
Cross-Validation

Test on every fold so no data goes to waste
  • k-fold CV: train k times, more reliable estimate
  • holdout split: train once, quick but noisy
  • leave-one-out: train n times, most thorough but slowest

Imagine a band rehearsing for a big gig. They want honest feedback, but they can't hire an outside critic every day. So they come up with a clever system: each rehearsal, one member sits out and listens while the others play. The drummer critiques on Monday, the guitarist on Tuesday, the bassist on Wednesday.

By the end of the week, every member has been both a player and a critic. They get feedback from multiple perspectives, and no one's opinion dominates.

That's cross-validation in a nutshell. Instead of testing your model on just one chunk of data, you rotate through multiple chunks so every data point gets a turn in the test set.

The problem with a single train/test split

When you split your data 80/20, you're trusting that the 20% you picked is representative. But what if, by bad luck, all the easy examples ended up in the test set? Your model looks amazing, but it's a lie. Or what if the hardest examples all landed in the test set? Now your model looks terrible when it's actually fine.

A single split is like asking one person to review your restaurant. They might love it, they might hate it; either way, one opinion is unreliable.
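
To see how much a single split can swing, here is a minimal sketch that repeats an 80/20 split with ten different random seeds. The iris dataset, the decision tree, and the seed range are assumptions chosen for illustration; the exact numbers will depend on your data and model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

split_scores = []
for seed in range(10):
    # Same data and same model; only the random 80/20 split changes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X_train, y_train)
    split_scores.append(model.score(X_test, y_test))

print(f"Lowest accuracy:  {min(split_scores):.3f}")
print(f"Highest accuracy: {max(split_scores):.3f}")
```

Any gap between the lowest and highest score comes purely from which rows landed in the test set, not from a change in the model.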

K-Fold Cross-Validation

Here's the fix. Split your data into k roughly equal chunks (called folds). With k = 5, for example:

  1. Use fold 1 as the test set, train on folds 2-5
  2. Use fold 2 as the test set, train on folds 1, 3-5
  3. Use fold 3 as the test set, train on folds 1-2, 4-5
  4. ...keep going until every fold has been the test set

Now you have k different accuracy scores. Average them, and you get a much more reliable estimate of how your model performs.
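
The rotation above can be written out by hand. This is a sketch using scikit-learn's `KFold` splitter; the dataset, the model, and the choice to shuffle are assumptions for illustration, not the only way to do it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# shuffle=True because the iris rows are ordered by class
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for fold_number, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    # A fresh model per fold: train on 4 folds, test on the held-out fold
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[test_idx], y[test_idx])
    fold_scores.append(score)
    print(f"Fold {fold_number}: tested on {len(test_idx)} samples, accuracy {score:.3f}")

print(f"Mean accuracy: {np.mean(fold_scores):.3f}")
```

Each sample appears in exactly one test fold, so every row is used for testing once and for training k - 1 times.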

K-Fold Cross-Validation in Action

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
# Load a classic dataset
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=42)
# 5-fold cross-validation
# (for a classifier, an integer cv uses stratified folds by default)
scores = cross_val_score(model, X, y, cv=5)
print(f"Fold scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f}")
print(f"Std deviation: {scores.std():.3f}")
# The mean over 5 folds is a steadier estimate than a single 80/20 split
Output
Fold scores: [0.97 0.97 0.90 0.97 0.93]
Mean accuracy: 0.947
Std deviation: 0.027

Flavors of cross-validation

  • 5-Fold or 10-Fold CV: the most common. Good balance of speed and reliability.
  • Leave-One-Out (LOO): each sample is its own fold. Most thorough, but painfully slow for large datasets (you train n separate models!).
  • Stratified K-Fold: ensures each fold has the same ratio of classes. Critical when your data is imbalanced (e.g., 95% not-fraud, 5% fraud).
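
To make the stratification point concrete, here is a small sketch on made-up labels (95 not-fraud, 5 fraud, sorted by class). The labels and placeholder features are hypothetical; only the splitter classes come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold

# Hypothetical imbalanced labels: 95 "not fraud" (0) followed by 5 "fraud" (1)
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # placeholder features; only the labels matter here

splitters = {
    "KFold": KFold(n_splits=5),
    "StratifiedKFold": StratifiedKFold(n_splits=5),
}

fraud_counts = {}
for name, splitter in splitters.items():
    # Count how many fraud cases land in each test fold
    fraud_counts[name] = [
        int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)
    ]
    print(f"{name}: fraud cases per test fold = {fraud_counts[name]}")

# Leave-one-out produces one split (and one trained model) per sample
print(f"LeaveOneOut splits: {LeaveOneOut().get_n_splits(X)}")
```

With unshuffled plain k-fold, all five fraud cases fall into the last fold, so four folds test on no fraud at all. The stratified splitter puts one fraud case in each fold.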

When to use it

Cross-validation is essential when you're comparing models or tuning hyperparameters. If Model A gets 92% and Model B gets 91% on a single split, that difference might be noise. But if Model A consistently beats Model B across 5 different folds, you can be much more confident.
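
A fold-by-fold comparison might look like the sketch below. The two models (a decision tree and logistic regression) and the shared splitter settings are assumptions for illustration; the key idea is that both models are scored on identical folds.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One shared splitter with a fixed seed, so both models see the same folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores_a = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)
scores_b = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# Per-fold differences show whether one model wins consistently or only on average
diffs = scores_a - scores_b
print(f"Model A mean: {scores_a.mean():.3f}")
print(f"Model B mean: {scores_b.mean():.3f}")
print(f"Folds where A beats B: {int((diffs > 0).sum())} of {len(diffs)}")
```

If one model only wins on a fold or two, the gap in the means is more likely to be noise than a real difference.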

Note: Cross-validation doesn't give you a better model; it gives you a better ESTIMATE of how good your model is. You still train a final model on all the data after you've picked the best approach.
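
One way to get that pick-then-retrain workflow in scikit-learn is `GridSearchCV`, which scores each candidate with cross-validation and, with `refit=True` (the default), retrains the winner on all the data. The parameter grid below is an arbitrary illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [1, 2, 3, 5]},  # illustrative grid
    cv=5,        # each candidate is scored with 5-fold cross-validation
    refit=True,  # then the best candidate is retrained on all of X, y
)
search.fit(X, y)

print(f"Best params: {search.best_params_}")
print(f"Best mean CV accuracy: {search.best_score_:.3f}")

# The final model, already trained on the full dataset
final_model = search.best_estimator_
print(f"Predictions for the first 3 rows: {final_model.predict(X[:3])}")
```

The cross-validation score describes how the approach is expected to perform; `final_model` is the thing you would actually deploy.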

Key Metrics

Single Train/Test Split
Fast but noisy: results depend on the random split
1× training, roughly O(n) if training cost grows linearly with n
K-Fold CV (k=5)
Reliable estimate with manageable cost
5× training, roughly 5 times the cost of a single fit
Leave-One-Out CV
Most thorough but can be extremely slow
n× training, roughly O(n²) under the same linear-cost assumption
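
The training counts above can be checked directly. This sketch counts the number of fits and the total rows trained on for a hypothetical dataset of n = 1000 rows; the placeholder data and the "rows trained on" measure are assumptions used as a rough proxy for cost.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

n = 1000                 # hypothetical dataset size
X = np.zeros((n, 1))     # placeholder features; only the row count matters

strategies = {
    "5-fold": KFold(n_splits=5),
    "leave-one-out": LeaveOneOut(),
}

cost = {}
for name, splitter in strategies.items():
    n_fits = splitter.get_n_splits(X)
    # Total rows processed across all fits, a rough stand-in for training cost
    rows_trained_on = sum(len(train_idx) for train_idx, _ in splitter.split(X))
    cost[name] = (n_fits, rows_trained_on)
    print(f"{name}: {n_fits} fits, {rows_trained_on} rows trained on in total")
```

For comparison, a single 80/20 split on the same data is 1 fit on 800 rows. Leave-one-out trains on n - 1 rows n times, which is where the quadratic growth comes from.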

Quick check

In 5-fold cross-validation, how many times is the model trained?