💡 SageMaker is the “heart” of the MLA-C01 certification
- Most exam questions involve SageMaker, so knowing it inside and out is essential to doing well on the exam.
- It is important to distinguish between SageMaker Processing, SageMaker Training and SageMaker Hosting, which each cover a different stage of the end-to-end ML process (data processing and evaluation, model training, and model deployment for inference, respectively).
- These notes first cover generic ML knowledge and concepts, and then their implementation in AWS (usually involving SageMaker and other AWS services).
- Some open-source Apache projects such as Hadoop and Spark are also covered, since they are popular in ML environments.
- It is a good idea to review the high-level overview of SageMaker that was done in the foundational AIF-C01 certification. MLA-C01 builds on top of that knowledge.
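One way to keep Processing, Training and Hosting apart is by the API call that launches each one. This is a hypothetical lookup table (`STAGE_TO_CALLS` and `calls_for()` are made up for illustration); only the method names are real boto3 `sagemaker` client methods. Hosting is shown for real-time endpoints only; batch inference uses a separate call, `create_transform_job`.

```python
"""Sketch: which boto3 SageMaker API call starts each stage.

Only the method names are real boto3 `sagemaker` client methods; the
STAGE_TO_CALLS table and calls_for() helper are hypothetical.
"""

# Stage -> ordered boto3 `sagemaker` client calls that launch it.
STAGE_TO_CALLS: dict[str, list[str]] = {
    # Processing: data prep, feature engineering, model evaluation jobs
    "Processing": ["create_processing_job"],
    # Training: a managed training job on dedicated instances
    "Training": ["create_training_job"],
    # Hosting (real-time): register the model, describe the instances, create the endpoint
    "Hosting": ["create_model", "create_endpoint_config", "create_endpoint"],
}


def calls_for(stage: str) -> list[str]:
    """Return a copy of the API calls for a stage; raise ValueError if unknown."""
    try:
        return list(STAGE_TO_CALLS[stage])
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


if __name__ == "__main__":
    for stage in STAGE_TO_CALLS:
        print(f"{stage:<10} -> {', '.join(calls_for(stage))}")
```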
SageMaker Generic Overview
- See the "Amazon SageMaker" section in the AWS AIF-C01 notes
- 💡 These notes only cover new details and insights!
- ‼️ NOTE: in late 2024 (announced at re:Invent 2024), AWS rebranded SageMaker as "SageMaker AI"
- In the AWS Console, look for it under that new name
- Plain "SageMaker" instead takes you to "SageMaker Unified Studio" or the "SageMaker Platform", which is a wrapper around SageMaker and Amazon DataZone
- At this time, the exam only covers SageMaker AI
SageMaker Notebooks
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285053
- 🔧 The old/classic way to do ML in SageMaker: writing ML code yourself
- Spins up EC2 instances to host ML notebooks, which drive the end-to-end (E2E) ML process:
- S3 data access
- ML code written in the notebook (SageMaker notebook instances run Jupyter notebooks)
- Libraries such as scikit-learn, NumPy, pandas, Apache Spark and TensorFlow at your disposal
- Wide variety of built-in models
- Can spin up training instances
- Can deploy trained models to make predictions (inference) at scale
- Alternative to notebooks: spin up ML models without writing code, using the SageMaker console
- SageMaker Notebook Diagram
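The "spin up training instances" step boils down to a single API request sent from the notebook. This is a minimal sketch, assuming boto3's `sagemaker.create_training_job()` request shape; it only builds the request as a dict and never calls AWS. `training_job_request()` is a hypothetical helper (not an SDK function), and the job name, image URI, role ARN and bucket are hypothetical placeholders.

```python
"""Sketch: the request a notebook sends to start a SageMaker training job.

Builds keyword arguments for boto3's sagemaker.create_training_job() without
calling AWS. training_job_request() and all names/ARNs/URIs are hypothetical.
"""


def training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",
    instance_count: int = 1,
) -> dict:
    if not (train_s3_uri.startswith("s3://") and output_s3_uri.startswith("s3://")):
        raise ValueError("training data and output locations must be S3 URIs")
    return {
        "TrainingJobName": job_name,
        # Container image holding the algorithm (built-in or your own)
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        # IAM role SageMaker assumes to read the data and write artifacts
        "RoleArn": role_arn,
        # Training data read from S3 (the "S3 data access" step)
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # Trained model artifact (model.tar.gz) is written back under this S3 path
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Separate training instances, provisioned only for the duration of the job
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


if __name__ == "__main__":
    req = training_job_request(
        job_name="demo-job",  # hypothetical
        image_uri="<training-image-uri>",  # hypothetical placeholder
        role_arn="arn:aws:iam::123456789012:role/HypotheticalSageMakerRole",
        train_s3_uri="s3://hypothetical-bucket/train/",
        output_s3_uri="s3://hypothetical-bucket/output/",
    )
    # In a real notebook: boto3.client("sagemaker").create_training_job(**req)
    print(sorted(req))
```

Keeping the request as a plain dict makes it easy to inspect before anything is billed. In a real notebook you would pass it to `boto3.client("sagemaker").create_training_job(**req)`, or let the higher-level SageMaker Python SDK `Estimator` build an equivalent request for you.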
E2E ML Process in SageMaker
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45285065
- Data prep
- Data usually comes from S3
- Data can also come from Athena, EMR, Redshift, Amazon Keyspaces DB…
- Integration with Apache Spark
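As a minimal sketch of the "data usually comes from S3" step, assuming a CSV file with a header row: `parse_s3_uri()` and `rows_from_csv_bytes()` are hypothetical helpers, while `fetch_csv()` wraps boto3's real `s3.get_object()` call and only needs boto3 and AWS credentials when it is actually invoked.

```python
"""Sketch: pull a CSV object from S3 into a notebook for data prep.

parse_s3_uri() and rows_from_csv_bytes() are hypothetical helpers (not part
of any SDK); fetch_csv() uses boto3's S3 get_object() call.
"""
import csv
import io
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not an S3 object URI: {uri!r}")
    return parsed.netloc, key


def rows_from_csv_bytes(body: bytes) -> list[dict[str, str]]:
    """Decode UTF-8 CSV bytes (first row = header) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))


def fetch_csv(uri: str) -> list[dict[str, str]]:
    """Download an S3 object and parse it; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    bucket, key = parse_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return rows_from_csv_bytes(body)
```

In practice you would often hand the `s3://` URI straight to pandas (`pd.read_csv("s3://...")` works when the `s3fs` package is installed); the stdlib version just makes each step explicit.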