Deployment Safeguards
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45286467 and https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45286781
Production Variants
- 🔧 Test out multiple models on live traffic
- Variant weight → how much traffic it should get
- e.g. 10% variant weight gets 10% of traffic
- Once confident in the new model's performance, ramp its variant weight up to 100% (see the sketch after this list)
- Needed for A/B tests & real-world performance validation
- Some models (e.g. recommender systems) can't be effectively evaluated offline
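A minimal boto3 sketch of this setup (the model, variant, and endpoint names and instance types here are placeholders, not from the course):

```python
import boto3

sm = boto3.client("sagemaker")

# Endpoint config with two production variants: SageMaker splits live
# traffic in proportion to the variant weights (0.9 : 0.1 here).
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "prod-variant",
            "ModelName": "prod-model",        # existing SageMaker model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,      # ~90% of traffic
        },
        {
            "VariantName": "challenger-variant",
            "ModelName": "challenger-model",  # new model under test
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,      # ~10% of traffic
        },
    ],
)

# Once confident in the challenger, shift all traffic to it in place,
# without redeploying the endpoint.
sm.update_endpoint_weights_and_capacities(
    EndpointName="ab-test-endpoint",
    DesiredWeightsAndCapacities=[
        {"VariantName": "prod-variant", "DesiredWeight": 0.0},
        {"VariantName": "challenger-variant", "DesiredWeight": 1.0},
    ],
)
```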
Shadow Tests
- 🔧 Compare performance of shadow variant to production variant
- Shadow variant receives a copy of a configurable sample of the production traffic; its responses are logged but never returned to callers
- You monitor the results in the SageMaker console and decide when to promote the shadow variant to production (see the sketch below)
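A sketch using `ShadowProductionVariants` in `create_endpoint_config` (names are placeholders; as I understand it, the shadow variant's weight sets the fraction of production requests that get mirrored to it):

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="shadow-test-config",
    ProductionVariants=[
        {
            "VariantName": "production-variant",
            "ModelName": "prod-model",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,   # serves all live responses
        }
    ],
    # Shadow variant: receives mirrored copies of sampled requests;
    # its responses are logged for comparison, never returned to callers.
    ShadowProductionVariants=[
        {
            "VariantName": "shadow-variant",
            "ModelName": "candidate-model",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,   # mirror ~50% of requests
        }
    ],
)
```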
Deployment Guardrails
- 🔧 Control shifting traffic to new models (see the sketch after this list)
- “Blue/Green” Deployments (Blue fleet: traffic on old model; Green fleet: traffic on new model)
- All at once: Shift all traffic → Monitor that everything looks good → Terminate blue fleet
- Canary: Shift a small portion of traffic → Monitor that traffic on new model looks good → Shift the rest of the traffic
- Linear: Shift traffic in equally sized steps, waiting between each step
- Auto-rollbacks if something goes wrong during deployment
- ❗ Only for real-time and asynchronous inference endpoints
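A sketch of a guarded (canary) blue/green update via `update_endpoint` with a `DeploymentConfig`; the endpoint, config, and alarm names are placeholders, and `Type` could instead be `"LINEAR"` or `"ALL_AT_ONCE"`:

```python
import boto3

sm = boto3.client("sagemaker")

# Blue/green update: provision a green fleet from the new endpoint
# config, send a 10% canary to it, wait, then shift the rest if healthy.
sm.update_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="new-model-config",      # green fleet config
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                "WaitIntervalInSeconds": 600,   # bake time before full shift
            },
            "TerminationWaitInSeconds": 600,    # keep blue fleet briefly
        },
        # If this CloudWatch alarm fires mid-deployment, SageMaker rolls
        # traffic back to the blue fleet automatically.
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "endpoint-5xx-alarm"}]
        },
    },
)
```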
SageMaker + Docker
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45286781