Bias vs Variance
Imagine you're at a dartboard competition. Two of your friends are playing:
- Alice throws every dart in a tight cluster, but the cluster is consistently to the left of the bullseye. She's precise but off-target.
- Bob has darts scattered all over the board. Some hit near the center, others land in the wall. He's not biased toward any direction, but his throws are wildly inconsistent.
Alice has high bias, low variance. Bob has low bias, high variance. The goal? Throw like neither of them. You want your darts tight AND centered: that's the sweet spot in machine learning too.
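The analogy maps directly onto numbers. Below is a small NumPy simulation, a sketch with made-up values: Alice aims 3 units left of the bullseye with a steady hand, and Bob aims dead center with a shaky one. Bias² is the squared distance from the average dart to the bullseye, variance is the average squared distance of the darts from their own average, and the two add up exactly to the average squared miss. The helper names `throw_darts` and `bias_sq_and_variance` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
BULLSEYE = np.array([0.0, 0.0])


def throw_darts(aim, hand_shake, n=1000):
    """Landing points: where the player aims plus Gaussian jitter from an unsteady hand."""
    return np.asarray(aim, dtype=float) + rng.normal(0.0, hand_shake, size=(n, 2))


def bias_sq_and_variance(points):
    center = points.mean(axis=0)
    # How far the average dart lands from the bullseye
    bias_sq = float(np.sum((center - BULLSEYE) ** 2))
    # How far the darts scatter around their own average
    variance = float(np.mean(np.sum((points - center) ** 2, axis=1)))
    return bias_sq, variance


alice = throw_darts(aim=(-3.0, 0.0), hand_shake=0.3)  # steady hand, aims left
bob = throw_darts(aim=(0.0, 0.0), hand_shake=3.0)     # aims at center, shaky hand

for name, pts in (("Alice", alice), ("Bob", bob)):
    b, v = bias_sq_and_variance(pts)
    miss = float(np.mean(np.sum((pts - BULLSEYE) ** 2, axis=1)))
    print(f"{name}: bias^2={b:.2f}  variance={v:.2f}  mean squared miss={miss:.2f}")
```

Alice's error is almost all bias², Bob's is almost all variance; a thrower who is both tight and centered keeps both terms small.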
What is bias?
Bias is the error that comes from overly simplistic assumptions built into your model. It's like fitting a straight line through data that's clearly curved: no matter how much data you give it, the model just can't capture the real pattern. This is called underfitting.
Imagine using a ruler to trace the outline of a cloud. The ruler isn't flexible enough: it will always give you a straight line, no matter how curvy the cloud is.
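The "no matter how much data" part is easy to check. The NumPy sketch below assumes a curved pattern (sin(3x), an arbitrary choice) plus a little Gaussian noise, fits a straight line by least squares, and reports the error on the training data itself as the sample size grows. A cubic fit to the same kind of data is included for contrast; the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE = 0.1  # assumed noise standard deviation


def make_data(n):
    """Noisy samples from a curved pattern (sin(3x) is an arbitrary choice)."""
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, NOISE, n)


def train_mse(n, degree):
    """Least-squares polynomial fit, scored on the same data it was trained on."""
    x, y = make_data(n)
    model = np.polynomial.Polynomial.fit(x, y, deg=degree)
    return float(np.mean((model(x) - y) ** 2))


for n in (20, 2_000, 200_000):
    print(f"n={n:>7}: straight line {train_mse(n, 1):.3f}   cubic {train_mse(n, 3):.3f}")
```

The straight line's training error should stay well above the noise floor (0.1² = 0.01) even at 200,000 points, while the cubic gets close to it: more data doesn't rescue a model that is too rigid for the pattern.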
What is variance?
Variance is the error that comes from a model being too sensitive to the particular training data it saw. It memorizes every twist, bump, and bit of noise, so it performs amazingly on the training set but falls apart on new data. This is called overfitting.
Imagine tracing that same cloud with a shaky hand and a super-fine pen. You capture every tiny wisp, but your drawing looks completely different every time the wind shifts.
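Here is the "drawing changes every time the wind shifts" effect as a sketch: draw many fresh training sets of 15 noisy points from the same assumed curve (sin(3x)), refit each time, and watch how much the prediction at a single input (x = 0.9, an arbitrary choice) moves. A degree-3 polynomial stands in for a modest model and degree 12 for an overly flexible one; both degrees and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)


def true_f(x):
    return np.sin(3 * x)  # assumed underlying pattern


def predictions_at(x0, degree, n_datasets=200, n_points=15, noise=0.3):
    """Refit on many fresh training sets and collect each model's prediction at x0."""
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        y = true_f(x) + rng.normal(0, noise, n_points)
        preds[i] = np.polynomial.Polynomial.fit(x, y, deg=degree)(x0)
    return preds


steady = predictions_at(0.9, degree=3)
shaky = predictions_at(0.9, degree=12)
print(f"degree 3:  predictions at x=0.9 have std {steady.std():.2f}")
print(f"degree 12: predictions at x=0.9 have std {shaky.std():.2f}")
```

The spread of those predictions across training sets is exactly what "variance" measures; the flexible model's spread should be many times larger.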
Seeing Bias vs Variance with Polynomial Fits
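One way to reproduce this kind of comparison is the sketch below, which fits polynomials of degree 1, 3, and 15 to the same 20 noisy points and scores each on a large held-out set drawn from the same source. The curve (sin(3x)), the noise level, and the three degrees are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
NOISE = 0.2  # assumed noise standard deviation


def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, NOISE, n)


x_train, y_train = sample(20)    # small training set
x_test, y_test = sample(1_000)   # large held-out set from the same source


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 3, 15):
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree:>2}: train MSE {train_err:.3f}  test MSE {test_err:.3f}")
```

Training error can only go down as the degree rises, because each polynomial family contains the smaller ones. Test error typically tells a different story: degree 1 underfits (high on both), degree 15 overfits (lowest on training, poor on test), and degree 3 sits near the sweet spot for this curve.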
The tradeoff
Here's the painful truth: reducing bias usually increases variance, and vice versa. It's a seesaw.
- Make your model more complex: bias drops, but variance rises
- Make your model simpler: variance drops, but bias rises
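The art of machine learning is finding the balance point where total error is minimized. For squared-error loss, the expected error on new data breaks down into bias² + variance + irreducible noise. The noise is beyond any model's control, so the balance point is where bias² + variance is smallest. This is called the bias-variance tradeoff.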
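Both the seesaw and the decomposition can be checked numerically. The sketch below works under assumed settings (sin(3x) as the true curve, Gaussian noise with standard deviation 0.2, 25 fixed training inputs): it refits polynomials of degree 1, 3, and 7 on 500 freshly noised training sets, then estimates bias² and variance on a grid of inputs. Holding the training inputs fixed and redrawing only the noise is a simplifying assumption that keeps the variance term easy to read.

```python
import numpy as np

rng = np.random.default_rng(7)
NOISE = 0.2                          # assumed noise standard deviation
x_train = np.linspace(-1, 1, 25)     # training inputs held fixed; only the noise is redrawn
x_eval = np.linspace(-0.9, 0.9, 50)  # inputs where the error terms are measured


def true_f(x):
    return np.sin(3 * x)             # assumed "true" pattern


def decompose(degree, n_datasets=500):
    """Monte Carlo estimates of bias^2, variance, and expected squared error on new data."""
    preds = np.empty((n_datasets, x_eval.size))
    for i in range(n_datasets):
        y = true_f(x_train) + rng.normal(0, NOISE, x_train.size)
        preds[i] = np.polynomial.Polynomial.fit(x_train, y, deg=degree)(x_eval)
    bias_sq = float(np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    # Score every fitted model against fresh noisy targets it never saw
    y_new = true_f(x_eval) + rng.normal(0, NOISE, preds.shape)
    expected_error = float(np.mean((preds - y_new) ** 2))
    return bias_sq, variance, expected_error


results = {degree: decompose(degree) for degree in (1, 3, 7)}
for degree, (b, v, err) in results.items():
    print(f"degree {degree}: bias^2={b:.4f}  variance={v:.4f}  "
          f"bias^2+variance+noise={b + v + NOISE**2:.4f}  measured={err:.4f}")
```

Bias² should fall and variance rise as the degree grows, and the measured error on fresh targets should land close to bias² + variance + 0.2² (the noise variance, a floor no model can get under).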
How to spot the problem
| Symptom | Problem | Fix |
|---|---|---|
| Bad on training AND test data | High bias (underfitting) | Use a more complex model, add features |
| Great on training, bad on test | High variance (overfitting) | Get more data, simplify model, regularize |
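The table's logic can be written as a tiny helper. This is a hypothetical function, not a library API, and both thresholds are judgment calls you'd set per problem: `target_error` is the error you'd accept, and `max_gap` is how much worse test error may be than training error before you call it overfitting. A model can suffer from both problems at once; this sketch reports only the first one it finds.

```python
def diagnose(train_error: float, test_error: float,
             target_error: float, max_gap: float) -> str:
    """Hypothetical helper mapping train/test error to the table's rows."""
    if train_error > target_error:
        # Bad on training data (and therefore almost always on test data too)
        return "high bias (underfitting): use a more complex model, add features"
    if test_error - train_error > max_gap:
        # Fine on training data, much worse on unseen data
        return "high variance (overfitting): get more data, simplify model, regularize"
    return "no obvious problem"


print(diagnose(train_error=0.30, test_error=0.32, target_error=0.05, max_gap=0.05))
print(diagnose(train_error=0.01, test_error=0.40, target_error=0.05, max_gap=0.05))
```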