Overfitting & Underfitting
Remember the story of Goldilocks? One porridge was too hot, one was too cold, and one was just right. Machine learning has the exact same problem, but with models instead of porridge.
Imagine you're studying for a history exam:
- Student A reads the chapter titles and calls it a day. "Something happened in 1776... America, I think?" Way too shallow: they underfitted the material.
- Student B memorizes the textbook word-for-word, including page numbers and typos. When the exam asks a slightly different question, they freeze. They overfitted: they memorized instead of understanding.
- Student C understands the key themes, cause-and-effect relationships, and can apply them to new questions. Just right.
Your ML model needs to be Student C.
Underfitting: "I barely tried"
An underfitting model is too simple to capture the patterns in the data. It performs poorly on both training data and test data.
Think of fitting a straight line through data that clearly curves. The line doesn't match the training points, and it certainly won't match new points either.
Signs of underfitting:
- Low training accuracy
- Low test accuracy
- The model is "too dumb" for the problem
Common causes:
- Model is too simple (e.g., linear model for a nonlinear problem)
- Not enough features
- Too much regularization (we'll cover this later)
- Not trained long enough
Overfitting: "I memorized the textbook"
An overfitting model is too complex for the amount of data it has. It learns the training data perfectly, including the noise and random quirks that aren't real patterns. Then it bombs on new data because those quirks don't generalize.
Think of fitting a wild, squiggly curve that passes through every single training point. It looks perfect on paper, but it's capturing noise, not signal.
Signs of overfitting:
- High training accuracy (often near perfect)
- Much lower test accuracy
- The gap between train and test scores is large
Common causes:
- Model is too complex (too many parameters)
- Not enough training data
- Training for too long
- No regularization
Seeing Overfit vs. Underfit in Action
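Here is a minimal sketch of the three students as polynomial models. Everything in it is an assumption made for illustration: the data is synthetic (a sine curve plus noise), `true_curve` is a hypothetical stand-in for the "real pattern", and the degrees 1, 3, and 14 are picked by hand. Because this is a regression problem, it reports mean squared error instead of accuracy, so lower is better.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def true_curve(x):
    # Hypothetical "real pattern" the model should learn.
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger held-out test set (synthetic data).
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_curve(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = true_curve(x_test) + rng.normal(0.0, 0.2, x_test.size)

def mse(model, x, y):
    # Mean squared error of a fitted polynomial on the given points.
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 14):
    # Least-squares polynomial fit of the chosen degree.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

The straight line (degree 1) should score badly on both sets, which is the underfitting signature. The degree-14 curve passes through all 15 training points, so its training error is essentially zero while its test error is much worse: the large train/test gap that signals overfitting. The cubic sits in between and plays the role of Student C.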
How to fix underfitting
- Use a more complex model: switch from linear to polynomial, or from a shallow tree to a deeper one
- Add more features: give the model more information to work with
- Reduce regularization: let the model be more flexible
- Train longer: the model might not have converged yet
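Of the fixes above, "train longer" is the easiest to see in isolation. The sketch below is an illustration, not a recipe: it fits a one-feature linear model with plain gradient descent on synthetic data, and the `train` helper, the learning rate, and the step counts are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a known line, y = 3x + 0.5, plus a little noise.
x = rng.uniform(-1.0, 1.0, 200)
y = 3.0 * x + 0.5 + rng.normal(0.0, 0.1, x.size)

def train(steps, lr=0.1):
    # Gradient descent on mean squared error, starting from w = b = 0.
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y
        w -= lr * 2.0 * np.mean(err * x)  # d(MSE)/dw
        b -= lr * 2.0 * np.mean(err)      # d(MSE)/db
    loss = float(np.mean((w * x + b - y) ** 2))
    return w, b, loss

history = {}
for steps in (5, 50, 500):
    history[steps] = train(steps)
    w, b, loss = history[steps]
    print(f"{steps:3d} steps: w = {w:.3f}, b = {b:.3f}, training MSE = {loss:.4f}")
```

After only a few steps the weights are still far from the line that generated the data, so the model underfits even though a linear model is the right shape for this problem. Given more steps, the same model converges and the training error drops.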
How to fix overfitting
- Get more training data: it's harder to memorize 100,000 examples than 100
- Use a simpler model: fewer parameters means less room for memorization
- Add regularization: penalize overly complex models (L1, L2, dropout)
- Early stopping: stop training before the model starts memorizing
- Cross-validation: evaluate on multiple train/test splits for a more robust estimate, so you can spot overfitting and pick settings that generalize
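To make the regularization fix concrete, here is a small sketch of L2 (ridge) regularization applied to a deliberately oversized polynomial model. It is written from scratch with NumPy and is not any particular library's API. The data is synthetic, and `poly_features`, `ridge_fit`, and the three penalty strengths are assumptions for the demo. For simplicity the intercept is penalized along with the other weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_curve(x):
    # Hypothetical underlying pattern.
    return np.sin(np.pi * x)

def poly_features(x, degree=14):
    # Columns are x**0, x**1, ..., x**degree.
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    # Minimizes ||Xw - y||^2 + lam * ||w||^2 by solving an equivalent
    # augmented least-squares problem. lam = 0 means no regularization.
    n_features = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

# 15 noisy training points and a 15-parameter model: plenty of room to memorize.
x_train = np.linspace(-1.0, 1.0, 15)
y_train = true_curve(x_train) + rng.normal(0.0, 0.3, x_train.size)
x_test = rng.uniform(-1.0, 1.0, 500)
y_test = true_curve(x_test) + rng.normal(0.0, 0.3, x_test.size)

X_train, X_test = poly_features(x_train), poly_features(x_test)

scores = {}
for lam in (0.0, 1e-3, 1e3):
    w = ridge_fit(X_train, y_train, lam)
    train_mse = float(np.mean((X_train @ w - y_train) ** 2))
    test_mse = float(np.mean((X_test @ w - y_test) ** 2))
    scores[lam] = (train_mse, test_mse)
    print(f"lambda = {lam:g}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

With no penalty the model memorizes the training points and does poorly on the test set. A small penalty gives up a little training error in exchange for a much better test error. A very large penalty squashes the weights toward zero and pushes the model into underfitting, which is why "too much regularization" appears among the causes of underfitting.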
The bias-variance tradeoff
This tension has a formal name: the bias-variance tradeoff.
- Bias = how much the model's assumptions cause it to miss patterns (underfitting)
- Variance = how much the model's predictions change when trained on different data (overfitting)
You want both to be low, but reducing one tends to increase the other. The sweet spot is in the middle.
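You can estimate both quantities with a small simulation. The sketch below assumes a known `true_curve` (only possible with synthetic data) and treats each fresh draw of noise as a "different training set". The `bias_variance` helper, the noise level, and the polynomial degrees are illustrative choices, not a standard procedure.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_curve(x):
    # Hypothetical underlying pattern; known here only because the data is synthetic.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 20)   # fixed inputs, fresh noise for each dataset
x_eval = np.linspace(0.05, 0.95, 50)  # points where predictions are compared
noise_sd = 0.3
n_datasets = 300

def bias_variance(degree):
    # Fit the same kind of model on many independently drawn noisy datasets.
    preds = np.empty((n_datasets, x_eval.size))
    for i in range(n_datasets):
        y_train = true_curve(x_train) + rng.normal(0.0, noise_sd, x_train.size)
        preds[i] = Polynomial.fit(x_train, y_train, degree)(x_eval)
    # Bias^2: how far the average prediction is from the truth.
    bias_sq = float(np.mean((preds.mean(axis=0) - true_curve(x_eval)) ** 2))
    # Variance: how much predictions move from one dataset to the next.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

summary = {}
for degree in (1, 4, 12):
    summary[degree] = bias_variance(degree)
    print(f"degree {degree:2d}: bias^2 = {summary[degree][0]:.4f}, "
          f"variance = {summary[degree][1]:.4f}")
```

The straight line should show the highest bias and the lowest variance, and the degree-12 model the reverse. The middle degree trades a little of each, which is the sweet spot described above.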