ML Model Fit, Bias, and Variance
Ref: https://www.udemy.com/course/aws-ai-practitioner-certified/learn/lecture/44886641
Model Fit
- Overfitting
- Performs well on training data
- Doesn't perform well on evaluation data
- Underfitting
- Performs poorly on training data
- Could indicate the model is too simple or the data features are poor
- Balanced ← Goal
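A quick sketch of all three fit regimes, using made-up noisy quadratic data (the dataset and polynomial degrees are assumptions for illustration): a degree-1 polynomial underfits (high error everywhere), degree 2 is balanced, and degree 15 memorizes the training set but does worse on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = x^2 plus noise, split into train/eval halves
x = rng.uniform(-1, 1, 40)
y = x**2 + rng.normal(0, 0.05, 40)
x_tr, y_tr, x_ev, y_ev = x[:20], y[:20], x[20:], y[20:]

def mse(degree):
    """Fit a polynomial on the training half, return (train MSE, eval MSE)."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    train = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
    evalu = np.mean((np.polyval(coefs, x_ev) - y_ev) ** 2)
    return train, evalu

for d in (1, 2, 15):  # underfit, balanced, overfit
    tr, ev = mse(d)
    print(f"degree {d:2d}: train MSE {tr:.4f}, eval MSE {ev:.4f}")
```

The underfit model shows high error on both sets; the overfit model drives training error down while eval error stops improving.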
Model Bias
- 🔧 Difference or error between predicted and actual value
- High bias → model doesn't closely match training data → underfitting
- Reduce by:
- using a more complex model
- increasing the number of features
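A minimal sketch of the second fix, on assumed toy data whose true relationship is quadratic: a plain linear model has high bias (it cannot bend to match the curve), and adding one feature (x²) lets it match the training data closely.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 50)
y = x**2 + rng.normal(0, 0.1, 50)   # true relationship is quadratic

def train_mse(features):
    """Least-squares fit on the given feature columns plus an intercept."""
    X = np.column_stack(features + [np.ones_like(x)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

simple = train_mse([x])        # linear model: high bias, misses the curve
richer = train_mse([x, x**2])  # extra feature: bias drops sharply
print(simple, richer)
```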
Model Variance
- 🔧 How much model performance changes when the model is trained on a different dataset drawn from a similar distribution
- High variance → model very sensitive to changes in training data → good performance on one training dataset, bad performance on the evaluation dataset → overfitting
- Reduce by:
- feature selection (fewer, more important features)
- splitting into training and test datasets multiple times (cross-validation)
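One way to see variance directly (a sketch on assumed synthetic data): train the same model repeatedly on fresh samples from the same distribution and measure how much its prediction at one fixed point swings. A complex model's predictions vary far more across datasets than a simple model's.

```python
import numpy as np

rng = np.random.default_rng(2)

def prediction_spread(degree, n_runs=200):
    """Train a polynomial of the given degree on many freshly drawn
    datasets and return the std dev of its prediction at x = 0.5."""
    preds = []
    for _ in range(n_runs):
        x = rng.uniform(-1, 1, 15)
        y = np.sin(3 * x) + rng.normal(0, 0.2, 15)
        coefs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coefs, 0.5))
    return np.std(preds)

low = prediction_spread(1)    # simple model: predictions stay stable
high = prediction_spread(10)  # complex model: predictions swing with the data
print(low, high)
```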
ML - Regression Model Metrics
Ref: https://www.udemy.com/course/aws-ai-practitioner-certified/learn/lecture/45375513
- Example: Imagine you’re trying to predict how well students do on a test based on how many hours they study.
- MAE, MAPE, RMSE – measure error: how “accurate” the model is
- if MAE is 5 → on average, your model’s prediction of a student's score is about 5 points off from their actual score
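The three error metrics computed by hand on made-up student scores (the numbers are assumptions for illustration); each is a different way of averaging the gap between predicted and actual values.

```python
import numpy as np

# Hypothetical scores: actual vs. predicted test results for five students
actual    = np.array([60.0, 70.0, 80.0, 90.0, 50.0])
predicted = np.array([65.0, 68.0, 77.0, 96.0, 52.0])

errors = predicted - actual
mae  = np.mean(np.abs(errors))                 # average absolute error, in points
mape = np.mean(np.abs(errors / actual)) * 100  # same idea, as a percentage of each actual value
rmse = np.sqrt(np.mean(errors ** 2))           # squares first, so large errors weigh more
print(f"MAE {mae:.2f} points, MAPE {mape:.2f}%, RMSE {rmse:.2f} points")
```

RMSE is always at least as large as MAE on the same data; a big gap between them signals a few large outlier errors.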
- R² (R Squared) – measures variance
- If R² is 0.8 → 80% of the changes in test scores can be explained by how much students studied, and the remaining 20% is due to other factors that can't be explained by the model (maybe natural ability or luck)
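R² computed on the same hypothetical scores: it compares the model's leftover error against the total variation in the actual scores.

```python
import numpy as np

actual    = np.array([60.0, 70.0, 80.0, 90.0, 50.0])
predicted = np.array([65.0, 68.0, 77.0, 96.0, 52.0])

ss_res = np.sum((actual - predicted) ** 2)        # variation the model fails to explain
ss_tot = np.sum((actual - np.mean(actual)) ** 2)  # total variation in the actual scores
r2 = 1 - ss_res / ss_tot
print(f"R² = {r2:.3f}")  # fraction of the variation the model explains
```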