Foundations · 7 min read

Overfitting & Underfitting

Goldilocks and the three models — too simple, too complex, just right

Underfitting: Model too simple · misses patterns
Overfitting: Model too complex · memorizes noise
Just right: Generalizes well · balances bias & variance

Remember the story of Goldilocks? One porridge was too hot, one was too cold, and one was just right. Machine learning has the exact same problem — but with models instead of porridge.

Imagine you're studying for a history exam:

Student A reads the chapter titles and calls it a day. "Something happened in 1776... America, I think?" Way too shallow — they underfitted the material.
Student B memorizes the textbook word-for-word, including page numbers and typos. When the exam asks a slightly different question, they freeze. They overfitted — they memorized instead of understanding.
Student C understands the key themes, cause-and-effect relationships, and can apply them to new questions. Just right.

Your ML model needs to be Student C.

Underfitting: "I barely tried"

An underfitting model is too simple to capture the patterns in the data. It performs poorly on both training data and test data.

Think of fitting a straight line through data that clearly curves. The line doesn't match the training points, and it certainly won't match new points either.
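
Here is a minimal sketch of that straight-line-through-a-curve situation. The data is made up for illustration (y = x² plus a little noise, with x centered on zero), and R² is used as the score, where 1.0 is perfect and values near 0 mean the model does no better than predicting the average.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical curved data: y = x^2 + noise (an assumption for this demo)
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = X_train.ravel() ** 2 + rng.normal(0, 0.5, size=200)
X_test = rng.uniform(-3, 3, size=(200, 1))
y_test = X_test.ravel() ** 2 + rng.normal(0, 0.5, size=200)

# A straight line cannot bend to follow the curve
line = LinearRegression().fit(X_train, y_train)

train_r2 = r2_score(y_train, line.predict(X_train))
test_r2 = r2_score(y_test, line.predict(X_test))
print(f"Train R^2: {train_r2:.2f}")
print(f"Test R^2:  {test_r2:.2f}")
```

Both scores should come out low, which is the underfitting signature: the model is bad even on the data it was trained on.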

Signs of underfitting:

Low training accuracy
Low test accuracy
The model is "too dumb" for the problem

Common causes:

Model is too simple (e.g., linear model for a nonlinear problem)
Not enough features
Too much regularization (we'll cover this later)
Not trained long enough

Overfitting: "I memorized the textbook"

An overfitting model is too complex. It learns the training data perfectly — including the noise and random quirks that aren't real patterns. Then it bombs on new data because those quirks don't generalize.

Think of fitting a wild, squiggly curve that passes through every single training point. It looks perfect on paper, but it's capturing noise, not signal.

Signs of overfitting:

High training accuracy (often near perfect)
Much lower test accuracy
The gap between train and test scores is large
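
To see these signs with accuracy scores, here is a small sketch using an unrestricted decision tree on a synthetic dataset. The dataset (from scikit-learn's `make_classification`, with some labels deliberately randomized via `flip_y`) and the choice of a decision tree are assumptions made for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels (flip_y randomizes a share of them)
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# No depth limit: the tree keeps splitting until it fits every training point
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy:  {test_acc:.2f}")
print(f"Gap:            {train_acc - test_acc:.2f}")
```

The tree memorizes the training set, noisy labels included, so the training accuracy is perfect while the test accuracy lags well behind.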

Common causes:

Model is too complex (too many parameters)
Not enough training data
Training for too long
No regularization

Seeing Overfit vs. Underfit in Action

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# True pattern: y = 2x + noise
np.random.seed(42)
X = np.random.uniform(0, 10, 20).reshape(-1, 1)
y = 2 * X.ravel() + np.random.normal(0, 2, 20)

# Split: first 15 points for training, last 5 for testing
X_train, X_test = X[:15], X[15:]
y_train, y_test = y[:15], y[15:]

# Model 1: Too simple (always predicts the training mean, like a degree-0 fit)
simple = DummyRegressor(strategy='mean')
simple.fit(X_train, y_train)
print("=== Underfitting (just predicts the mean) ===")
print(f"Train error: {mean_squared_error(y_train, simple.predict(X_train)):.1f}")
print(f"Test error:  {mean_squared_error(y_test, simple.predict(X_test)):.1f}")

# Model 2: Just right (degree 1 — linear)
right = LinearRegression()
right.fit(X_train, y_train)
print("\n=== Just Right (linear) ===")
print(f"Train error: {mean_squared_error(y_train, right.predict(X_train)):.1f}")
print(f"Test error:  {mean_squared_error(y_test, right.predict(X_test)):.1f}")

# Model 3: Too complex (degree-15 polynomial: 16 coefficients for only 15 training points)
poly = PolynomialFeatures(degree=15)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
complex_model = LinearRegression()
complex_model.fit(X_train_poly, y_train)
print("\n=== Overfitting (degree-15 polynomial) ===")
print(f"Train error: {mean_squared_error(y_train, complex_model.predict(X_train_poly)):.1f}")
print(f"Test error:  {mean_squared_error(y_test, complex_model.predict(X_test_poly)):.1f}")

Output

=== Underfitting (just predicts the mean) ===
Train error: 36.2
Test error:  32.8

=== Just Right (linear) ===
Train error: 3.4
Test error:  4.1

=== Overfitting (degree-15 polynomial) ===
Train error: 0.0
Test error:  9847.3

Key Metrics

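🧊 Underfitting

Model is too simple and can't even learn the training data

Low train score, low test score · High bias

🔥 Overfitting

Model memorized the training data and fails on new data

High train score, low test score · High variance

✅ Good Fit

Model learned real patterns that generalize

High train score, high test score · Balanced

📏 The Gap

Small gap with high scores = good. Large gap = overfitting.

Train score - test score · Key diagnostic

(Scores here are higher-is-better metrics such as accuracy. For an error metric such as the MSE printed in the example above, swap "high" and "low".)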
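
The four cards above can be folded into one tiny helper. This is a rough sketch, not a standard tool: the function name `diagnose` and both thresholds (`low` and `max_gap`) are hypothetical values picked for illustration, and sensible cutoffs depend on your problem.

```python
def diagnose(train_score, test_score, low=0.7, max_gap=0.1):
    """Rough label from two higher-is-better scores in [0, 1].

    The thresholds are illustrative assumptions, not standard values.
    """
    if train_score < low:
        # The model can't even fit the data it was trained on
        return "underfitting"
    if train_score - test_score > max_gap:
        # Good on training data, much worse on new data
        return "overfitting"
    return "good fit"


print(diagnose(0.98, 0.61))  # large gap
print(diagnose(0.55, 0.53))  # both scores low
print(diagnose(0.92, 0.90))  # high scores, small gap
```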

How to fix underfitting

Use a more complex model — switch from linear to polynomial, or from a shallow tree to a deeper one
Add more features — give the model more information to work with
Reduce regularization — let the model be more flexible
Train longer — the model might not have converged yet
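
As one example of these fixes, here is a sketch of "reduce regularization" using scikit-learn's `Ridge`, where `alpha` controls how strongly the model is held back. The data (y = 2x plus noise) and the two `alpha` values are assumptions chosen to make the effect obvious.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Hypothetical data with a simple linear pattern: y = 2x + noise
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + rng.normal(0, 2, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

scores = {}
for alpha in (1e6, 1.0):
    # Larger alpha = stronger penalty = flatter, less flexible model
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    scores[alpha] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"alpha={alpha:g}: train R^2={scores[alpha][0]:.2f}, "
          f"test R^2={scores[alpha][1]:.2f}")
```

With the huge penalty the slope is squeezed toward zero and the model barely beats predicting the mean. Turning the penalty down lets the same model pick up the pattern on both training and test data.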

How to fix overfitting

Get more training data — harder to memorize 100,000 examples than 100
Use a simpler model — fewer parameters means less room for memorization
Add regularization — penalize overly complex models (L1, L2, dropout)
Early stopping — stop training before the model starts memorizing
Cross-validation — evaluate on multiple train/test splits for a more robust estimate
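
Two of these ideas, a simpler model and cross-validation, can be tried together in a few lines. The sketch below assumes a synthetic noisy dataset and uses a decision tree's `max_depth` as the complexity knob. Note that cross-validation doesn't shrink the gap by itself: it gives you a more trustworthy score to compare the simple and complex versions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with noisy labels (an assumption for this demo)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

results = {}
for name, max_depth in [("deep", None), ("shallow", 3)]:
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    # 5-fold cross-validation on the training data only
    cv_scores = cross_val_score(tree, X_train, y_train, cv=5)
    tree.fit(X_train, y_train)
    results[name] = {
        "train": tree.score(X_train, y_train),
        "test": tree.score(X_test, y_test),
        "cv_mean": cv_scores.mean(),
    }
    print(name, {k: round(v, 2) for k, v in results[name].items()})
```

The deep tree fits the training set perfectly, while the depth-limited tree cannot, and its training score typically sits much closer to its cross-validated and test scores.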

The bias-variance tradeoff

This tension has a formal name: the bias-variance tradeoff.

Bias = how much the model's assumptions cause it to miss real patterns (underfitting)
Variance = how much the model's predictions change when it is trained on a different sample of the data (overfitting)

You want both to be low, but reducing one tends to increase the other. The sweet spot is in the middle.
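
Bias and variance can be estimated by simulation: train the same kind of model many times on fresh noisy samples and look at how the predictions behave. The sketch below is one way to do that, under assumptions made for the demo: the true pattern is sin(πx), the noise level is 0.3, and the two models are a degree-1 and a degree-9 polynomial fitted with `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # The "real pattern" (an assumption for this demo)
    return np.sin(np.pi * x)

x_train = np.linspace(-1, 1, 30)   # same inputs every time, fresh noise
x_eval = np.linspace(-1, 1, 50)    # where we compare predictions
noise_sd = 0.3
n_repeats = 200

def bias_variance(degree):
    preds = np.empty((n_repeats, x_eval.size))
    for i in range(n_repeats):
        y_train = true_fn(x_train) + rng.normal(0, noise_sd, x_train.size)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coefs, x_eval)
    # Bias^2: how far the average prediction is from the truth
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_eval)) ** 2)
    # Variance: how much predictions wobble from one training set to the next
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 9)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree {degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

The straight line shows the larger bias (it misses the curve no matter which sample it sees), while the degree-9 polynomial shows the larger variance (it chases the noise in each sample).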

Note: As a practical rule of thumb, if your training score is much higher than your test score, you're overfitting. If both scores are low, you're underfitting. Start simple, increase complexity gradually, and stop when the test score starts dropping, even if the training score keeps going up.
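
The "start simple, increase complexity gradually" advice can be sketched as a loop over polynomial degrees. Everything specific here is an assumption for illustration: the sin(πx) data, the noise level, the degree range, and the use of a second held-out sample in place of test data. The sketch uses error (MSE), so lower is better.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical noisy data: y = sin(pi * x) + noise
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(40)
x_held, y_held = make_data(40)   # held-out data the model never trains on

degrees = range(1, 11)
train_mse, held_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, d)
    train_mse.append(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    held_mse.append(np.mean((np.polyval(coefs, x_held) - y_held) ** 2))
    print(f"degree {d:2d}: train MSE={train_mse[-1]:.3f}, "
          f"held-out MSE={held_mse[-1]:.3f}")

# Pick the complexity with the lowest held-out error
best_degree = degrees[int(np.argmin(held_mse))]
print("Best degree by held-out error:", best_degree)
```

The training error only goes down as the degree grows, so it can't tell you when to stop. The held-out error is the number to watch.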

Quick check

Your model gets 99% accuracy on training data but 52% on test data. What's happening?
