Preface to SageMaker
- 💡 SageMaker is the “heart” of the MLA-C01 certification
- Most exam questions involve SageMaker, so knowing it inside and out is essential to doing well on the exam.
- It is important to distinguish between SageMaker Processing, SageMaker Training and SageMaker Hosting, which cover different stages of the end-to-end ML process.
- These notes first cover generic ML knowledge and concepts, and then their implementation in AWS (usually involving SageMaker and other AWS services).
- Some open-source Apache services like Hadoop or Spark are also covered, since they are also popular in ML environments.
- It is a good idea to review the high-level overview of SageMaker covered in the foundational AIF-C01 certification. MLA-C01 builds on top of that knowledge.
- ‼️  NOTE: at the end of 2024 (re:Invent), AWS rebranded SageMaker as "SageMaker AI"
- If you're looking for it in the AWS Console, look for it under that new name
- Just plain "SageMaker" takes you instead to "SageMaker Unified Studio" or the "SageMaker Platform," which is a wrapper around SageMaker and Amazon DataZone
- The exam only covers SageMaker AI at this time. These notes only use “SageMaker”, which should be read as a synonym for “SageMaker AI”
My SageMaker notes for AIF-C01 (good for reference or review)
Amazon SageMaker
Introduction to Amazon SageMaker
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285053 and https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285373
- 🔧 AWS service that can handle the whole E2E process in ML
- E2E ML process = Data processing, model training, model deployment and model hosting
- Tons of features and sub-products (will go into depth in these notes)
- Diagram
- SageMaker Training and Deployment Architecture
E2E ML Process in SageMaker
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285065 and https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285373
- Data Preparation (data prep)
- Data usually comes from S3
- Data can also come from Athena, EMR, Redshift, Amazon Keyspaces…
- Integration with Apache Spark
- Data Processing
- Processing job: copy raw data from S3 → spin up processing container → output processed data to S3
- Container can be SageMaker built-in or user-provided (your own code)
- Diagram
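In code, the processing job flow above (S3 in → container → S3 out) maps onto the request that the SageMaker `CreateProcessingJob` API accepts. The following is a minimal sketch that only builds the request dictionary and does not call AWS. The bucket, image URI, role ARN and job name are hypothetical placeholders, and the instance type and volume size are arbitrary assumptions.

```python
def build_processing_request(job_name, image_uri, role_arn, input_s3, output_s3):
    """Build keyword arguments for boto3's SageMaker create_processing_job.

    Every value passed in is a placeholder chosen by the caller.
    """
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        # Container that runs the processing code (built-in or user-provided).
        "AppSpecification": {"ImageUri": image_uri},
        # Raw data is copied from S3 into the container's local file system.
        "ProcessingInputs": [{
            "InputName": "raw",
            "S3Input": {
                "S3Uri": input_s3,
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        # Processed data is uploaded back to S3 when the job ends.
        "ProcessingOutputConfig": {"Outputs": [{
            "OutputName": "processed",
            "S3Output": {
                "S3Uri": output_s3,
                "LocalPath": "/opt/ml/processing/output",
                "S3UploadMode": "EndOfJob",
            },
        }]},
        # Compute used by the job (assumed values).
        "ProcessingResources": {"ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }},
    }


# Hypothetical names, for illustration only.
processing_request = build_processing_request(
    job_name="example-processing-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-processing:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    input_s3="s3://example-bucket/raw/",
    output_s3="s3://example-bucket/processed/",
)
```

With credentials configured, the dictionary could be submitted with `boto3.client("sagemaker").create_processing_job(**processing_request)`.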
- Training
- Training job requires
- URL of S3 bucket with training data
- ML compute resources
- URL of S3 bucket for output → trained model artifacts are written to S3
- ECR path of the container image that holds the training code
- Many training options available
- Built-in algorithms, Spark MLlib, TensorFlow, PyTorch, Scikit-learn, XGBoost, Hugging Face, your own Docker image, AWS Marketplace-purchased algorithms…
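The four training job requirements listed above (training data in S3, compute resources, S3 output location, ECR image) correspond to fields of the SageMaker `CreateTrainingJob` request. This is a rough sketch that builds such a request without calling AWS. All names, URIs and ARNs are hypothetical, and the instance type, volume size and runtime limit are assumed values.

```python
def build_training_request(job_name, image_uri, role_arn, train_s3, output_s3,
                           instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for boto3's SageMaker create_training_job."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        # ECR path of the container image that holds the training code.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # S3 location of the training data.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # S3 location where the trained model artifacts are written.
        "OutputDataConfig": {"S3OutputPath": output_s3},
        # ML compute resources.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        # Upper bound on job runtime (assumed: one hour).
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Hypothetical names, for illustration only.
training_request = build_training_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-training:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3="s3://example-bucket/processed/train/",
    output_s3="s3://example-bucket/models/",
)
```

With credentials configured, `boto3.client("sagemaker").create_training_job(**training_request)` would start the job.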
- Deployment
- 2 ways:
- Persistent endpoint for individual predictions/inference on demand
- SageMaker Batch Transform for predictions of an entire dataset
- Many other options: inference pipelines, SageMaker Neo (edge devices), Elastic Inference (since deprecated by AWS), automatic scaling, shadow testing…
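The two deployment paths above differ mainly in the request they need: a persistent endpoint is described by an endpoint configuration (`CreateEndpointConfig`), while Batch Transform is a one-off job over data in S3 (`CreateTransformJob`). The sketch below builds both requests without calling AWS. It assumes a model named `example-model` has already been registered with `create_model`; every name and URI is hypothetical, and the instance types and CSV content type are assumptions.

```python
def build_endpoint_config(config_name, model_name,
                          instance_type="ml.m5.large", instance_count=1):
    """Keyword arguments for create_endpoint_config (persistent endpoint)."""
    return {
        "EndpointConfigName": config_name,
        # One production variant receiving all traffic.
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }


def build_transform_request(job_name, model_name, input_s3, output_s3,
                            instance_type="ml.m5.large", instance_count=1):
    """Keyword arguments for create_transform_job (whole-dataset predictions)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # Dataset to score, read from S3 (assumed to be CSV).
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3,
            }},
            "ContentType": "text/csv",
        },
        # Predictions are written back to S3.
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


# Hypothetical names, for illustration only.
endpoint_config = build_endpoint_config("example-endpoint-config", "example-model")
transform_request = build_transform_request(
    "example-transform-job", "example-model",
    input_s3="s3://example-bucket/to-score/",
    output_s3="s3://example-bucket/predictions/",
)
```

The endpoint path needs one more call, `create_endpoint(EndpointName=..., EndpointConfigName=...)`, and keeps instances running until the endpoint is deleted; the transform job releases its instances when it finishes.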