SageMaker Ground Truth
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285145
- 🔧 Humans label data → Prepare a training dataset with humans
- Examples: image classification, bounding box…
- Human reviewers: Amazon Mechanical Turk workers, your employees, or third-party vendors
- ❗ Ground Truth creates its own model as humans label data → active learning (automated data labeling)
- Only images the model isn't sure about are sent to human labelers (can reduce labeling cost by up to 70%)
- Ground Truth Plus: Turnkey solution
- AWS experts manage the whole workflow
- Fill out a form
- Experts contact you, discuss pricing, manage labelers
- ‼️ Do NOT confuse with Amazon Augmented AI (A2I)!
- Ground Truth is primarily used for human labeling of training data; A2I is primarily used for human oversight of a trained model's predictions
- ❗ SageMaker Ground Truth and A2I can, however, use the same human workforce for their separate jobs!
- Benefits: consistency, efficiency, flexibility
- 💡 Other ways to generate training labels: Rekognition, Comprehend… Some pre-trained models or unsupervised techniques can be helpful
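The "only send uncertain items to humans" idea from the notes above can be sketched in a few lines. This is a minimal illustration of uncertainty-based routing, not Ground Truth's actual implementation: the `Prediction` class, the `route` function, and the 0.9 threshold are all hypothetical, and the service's real criteria are internal to AWS.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the real service decides this internally.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for its top label, in [0, 1]


def route(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into auto-labeled items and items needing human review."""
    auto_labeled, needs_human = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_labeled.append(p)   # model is confident: keep its label
        else:
            needs_human.append(p)    # model is unsure: send to a human labeler
    return auto_labeled, needs_human


predictions = [
    Prediction("img-001", "cat", 0.98),
    Prediction("img-002", "dog", 0.55),
    Prediction("img-003", "cat", 0.91),
]
auto_labeled, needs_human = route(predictions)
print([p.item_id for p in auto_labeled])  # ['img-001', 'img-003']
print([p.item_id for p in needs_human])   # ['img-002']
```

In a real loop, the human labels for the uncertain items would be fed back as training data, so the share of items needing human review shrinks over time.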
SageMaker Data Wrangler
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285151
- 🔧 Import, preview, visualize, transform data… in a visual UI
- Even includes “Quick Model” to quickly train a model on your data and gauge results
- Can also export data flow
- Many feature engineering capabilities (transform images, balance data, impute missing data, handle outliers, PCA…)
- ‼️ Data Wrangler doesn't actually perform the ETL or feature engineering transformations, but it does generate the code that you can execute to do that!
- Sources and destinations:
- Troubleshooting:
- SageMaker Studio should have correct IAM roles/permissions
- Data sources should allow access (e.g. AmazonSageMakerFullAccess policy)
- EC2 instance limit: the “The following instance type is not available…” error is usually a service quota problem → request a quota increase for that instance type
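To make two of the feature engineering capabilities listed above concrete (imputing missing data and handling outliers), here is a minimal plain-Python sketch. It is not the code Data Wrangler generates; the function names, the sample `ages` column, and the 0 to 100 clipping bounds are assumptions chosen for illustration only.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Clamp each value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical column with two missing values and one obvious outlier (250).
ages = [34, None, 29, 41, 250, None, 38]

imputed = impute_median(ages)                         # None -> 38 (median of observed)
cleaned = clip_outliers(imputed, lower=0, upper=100)  # 250 -> 100
print(cleaned)  # [34, 38, 29, 41, 100, 38, 38]
```

The median is used for imputation here because it is less sensitive to the outlier than the mean would be; other strategies (mean, mode, constant) follow the same pattern.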