Confusion Matrix
Your apartment has a fire alarm. It can do four things:
- True Positive (TP): There's a real fire, and the alarm goes off. Perfect. This is what you want.
- True Negative (TN): No fire, no alarm. Also perfect. Quiet night.
- False Positive (FP): You burnt some toast, and the alarm screams at 3 AM. Annoying, but you're alive.
- False Negative (FN): There's a real fire, but the alarm stays silent. You're asleep. This is the worst case.
A confusion matrix is simply a table that counts how many of each type your model produced. It's called "confusion" because it shows you exactly where your model gets confused.
Reading the matrix
For a binary classifier (like fire/no-fire), the confusion matrix is a 2×2 grid:
| | Predicted: Fire | Predicted: No Fire |
|---|---|---|
| Actual: Fire | TP (correct alarm) | FN (missed fire!) |
| Actual: No Fire | FP (false alarm) | TN (quiet night) |
The metrics that flow from it
- Accuracy = (TP + TN) / Total: how often you're right overall
- Precision = TP / (TP + FP): when you say "fire," how often is it real?
- Recall = TP / (TP + FN): of all real fires, how many did you catch?
- F1 Score = 2 × (Precision × Recall) / (Precision + Recall): harmonic mean of both
For the fire alarm, recall matters most. You'd rather have 10 false alarms than miss one real fire.
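The formulas above translate almost directly into code. Here is a minimal Python sketch; the function name `classification_metrics`, the zero-denominator convention, and the example counts are all assumptions made for illustration, not something taken from a particular library.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from the four counts."""
    total = tp + fp + fn + tn

    def safe_div(numerator, denominator):
        # Assumption: report 0.0 when a denominator is zero
        # (for example, precision when the model never predicts "fire").
        return numerator / denominator if denominator else 0.0

    accuracy = safe_div(tp + tn, total)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical fire alarm over 100 nights:
# 8 real fires caught, 10 false alarms, 2 missed fires, 80 quiet nights.
metrics = classification_metrics(tp=8, fp=10, fn=2, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, recall is high while precision is low: the alarm catches most fires but also goes off for a lot of toast, which is the trade-off you would accept for a fire alarm.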
Building a Confusion Matrix
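One way to build the matrix by hand is to walk through the actual and predicted labels side by side and tally each pair into one of the four cells. The sketch below assumes labels are plain strings with "fire" as the positive class; the helper name `confusion_counts` and the sample data are hypothetical.

```python
def confusion_counts(actual, predicted, positive="fire"):
    """Tally TP, FP, FN, and TN for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1  # real fire, alarm went off
        elif a == positive:
            counts["FN"] += 1  # real fire, alarm stayed silent
        elif p == positive:
            counts["FP"] += 1  # no fire, alarm went off anyway
        else:
            counts["TN"] += 1  # no fire, no alarm
    return counts


# Hypothetical data: six nights of ground truth and alarm behavior.
actual = ["fire", "no fire", "no fire", "fire", "no fire", "no fire"]
predicted = ["fire", "fire", "no fire", "no fire", "no fire", "no fire"]

counts = confusion_counts(actual, predicted)

# Print in the same layout as the 2x2 grid: rows are actual, columns are predicted.
print("                 Pred: Fire  Pred: No Fire")
print(f"Actual: Fire     {counts['TP']:^10}  {counts['FN']:^13}")
print(f"Actual: No Fire  {counts['FP']:^10}  {counts['TN']:^13}")
```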
When accuracy lies
Imagine 1,000 people go through airport security. Only 2 are actually carrying something dangerous. A lazy detector that says "nobody is dangerous" gets 998/1000 = 99.8% accuracy. Sounds incredible, but it missed both actual threats. Its recall is 0%.
This is why the confusion matrix exists. Accuracy alone is dangerous for imbalanced datasets. You need to see the full picture: where exactly is the model failing?
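The airport numbers are easy to check in a few lines of Python. This sketch encodes "dangerous" as 1 and "safe" as 0 (an encoding chosen here for illustration) and simulates the lazy detector that always predicts 0.

```python
# 2 dangerous passengers out of 1,000; 1 = dangerous, 0 = safe.
actual = [1, 1] + [0] * 998
# The lazy detector says "nobody is dangerous" for everyone.
predicted = [0] * 1000

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy = {accuracy:.1%}")
print(f"recall   = {recall:.1%}")
```

The accuracy comes almost entirely from the 998 true negatives. The confusion matrix makes the failure visible because the TP cell is empty and both real threats land in FN.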