Practical ML

Confusion Matrix

True positives, false alarms — measure what matters

[Figure: fire-alarm outcomes. True positive: real fire detected. False positive: burnt toast set off the alarm. False negative: real fire, no alarm.]

Your apartment has a fire alarm. On any given night, one of four things happens:

True Positive (TP) — There's a real fire, and the alarm goes off. Perfect. This is what you want.
True Negative (TN) — No fire, no alarm. Also perfect. Quiet night.
False Positive (FP) — You burnt some toast, and the alarm screams at 3 AM. Annoying, but you're alive.
False Negative (FN) — There's a real fire, but the alarm stays silent. You're asleep. This is the worst case.

A confusion matrix is simply a table that counts how many predictions of each type your model made. It's called "confusion" because it shows exactly which classes your model confuses with each other.
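Under the hood, building the matrix is just counting. A minimal pure-Python sketch, assuming the labels are exactly 0 and 1 (1 = fire); `count_outcomes` is a hypothetical helper for illustration, not a library function:

```python
# Count the four outcomes by comparing each true label with its prediction.
# Assumes binary labels: 1 = fire, 0 = no fire.
def count_outcomes(y_true, y_pred):
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1  # real fire, alarm rang
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1  # no fire, quiet night
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1  # burnt toast, false alarm
        else:
            counts["FN"] += 1  # real fire, alarm stayed silent
    return counts


y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
print(count_outcomes(y_true, y_pred))
# {'TP': 3, 'TN': 4, 'FP': 1, 'FN': 2}
```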

Reading the matrix

For a binary classifier (like fire/no-fire), the confusion matrix is a 2×2 grid:

	Predicted: Fire	Predicted: No Fire
Actual: Fire	TP (correct alarm)	FN (missed fire!)
Actual: No Fire	FP (false alarm)	TN (quiet night)
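scikit-learn's `confusion_matrix` uses the same convention (rows are actual classes, columns are predicted classes), but it orders the classes according to its `labels` argument, which defaults to sorted order (0 before 1). A sketch that passes `labels=[1, 0]` so fire comes first and the result lines up with the table, using the same ten labels as the scikit-learn example later in this article:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # 1 = fire, 0 = no fire
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

# Rows = actual class, columns = predicted class, both in the order of `labels`.
# Listing fire (1) first reproduces the table's layout:
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[3 2]
#  [1 4]]
```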

The metrics that flow from it

Accuracy = (TP + TN) / Total — how often you're right overall
Precision = TP / (TP + FP) — when you say "fire," how often is it real?
Recall = TP / (TP + FN) — of all real fires, how many did you catch?
F1 Score = 2 × (Precision × Recall) / (Precision + Recall) — harmonic mean of both
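The formulas are simple enough to check by hand. A sketch plugging in the counts from the fire example (TP = 3, TN = 4, FP = 1, FN = 2) and cross-checking them against scikit-learn's metric functions:

```python
import math

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Counts from the fire-alarm example (1 = fire, 0 = no fire)
tp, tn, fp, fn = 3, 4, 1, 2

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.70 precision=0.75 recall=0.60 f1=0.67

# Same ten labels as the scikit-learn example; fire (1) is the positive class
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
assert math.isclose(accuracy, accuracy_score(y_true, y_pred))
assert math.isclose(precision, precision_score(y_true, y_pred))
assert math.isclose(recall, recall_score(y_true, y_pred))
assert math.isclose(f1, f1_score(y_true, y_pred))
```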

For the fire alarm, recall matters most. You'd rather have 10 false alarms than miss one real fire.
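If you want a single score that leans toward recall, one option is the F-beta score, a generalization of F1 in which beta > 1 makes recall count more than precision. A sketch using the example's precision of 0.75 and recall of 0.60; `f_beta` is a hypothetical helper written for illustration, and scikit-learn's `fbeta_score` computes the same quantity from the labels:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # 1 = fire, 0 = no fire
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

precision, recall = 0.75, 0.60  # from TP=3, FP=1, FN=2


def f_beta(precision, recall, beta):
    # beta > 1 weights recall more heavily; beta = 1 is the ordinary F1
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_beta(precision, recall, beta=1), 3))    # 0.667
print(round(f_beta(precision, recall, beta=2), 3))    # 0.625
print(round(fbeta_score(y_true, y_pred, beta=2), 3))  # 0.625
```

F2 comes out below F1 here because recall is this model's weaker number, and F2 penalizes that more heavily.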

Building a Confusion Matrix

from sklearn.metrics import confusion_matrix, classification_report

# Actual labels and model predictions (1 = fire, 0 = no fire)
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)
print()
print(classification_report(y_true, y_pred, 
      target_names=['No Fire', 'Fire']))

Output

Confusion Matrix:
[[4 1]
 [2 3]]

              precision    recall  f1-score   support

     No Fire       0.67      0.80      0.73         5
        Fire       0.75      0.60      0.67         5

    accuracy                           0.70        10
   macro avg       0.71      0.70      0.70        10
weighted avg       0.71      0.70      0.70        10
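Note the orientation of the printed matrix: scikit-learn sorts the labels, so No Fire (0) comes first and [[4 1] [2 3]] reads as [[TN, FP], [FN, TP]], flipped relative to the fire-first table earlier. That gives TN = 4, FP = 1, FN = 2, TP = 3. For a binary problem, flattening the matrix row by row with NumPy's `ravel()` is a common way to pull out the four counts:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # 1 = fire, 0 = no fire
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

# Default label order is sorted ([0, 1]), so the matrix is [[TN, FP], [FN, TP]]
# and ravel() flattens it row by row in exactly that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=4 FP=1 FN=2
```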

When accuracy lies

Imagine 1,000 people go through airport security. Only 2 are actually carrying something dangerous. A lazy detector that says "nobody is dangerous" gets 998/1000 = 99.8% accuracy. Sounds incredible — but it missed both actual threats. Its recall is 0%.
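The lazy detector is easy to reproduce. A sketch, assuming the two people carrying something dangerous are labeled 1 and everyone else 0:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# 1,000 people: 2 carry something dangerous (1), 998 do not (0)
y_true = [1] * 2 + [0] * 998
# The lazy detector flags nobody
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.998
print(recall_score(y_true, y_pred))    # 0.0
print(confusion_matrix(y_true, y_pred))
# [[998   0]
#  [  2   0]]
```

The bottom row is the whole story: both real threats land in the false-negative cell.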

This is why you look at the confusion matrix. On imbalanced datasets, where one class vastly outnumbers the other, accuracy alone is misleading. You need the full picture: where exactly is the model failing?

Note: The "cost" of errors is often asymmetric. A false negative in cancer screening (missed cancer) is far worse than a false positive (unnecessary follow-up). Always ask: which type of error is more expensive for YOUR use case?
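Once you put numbers on those costs, the confusion matrix turns into a single figure you can compare across models. A minimal sketch with made-up costs (1 per false alarm, 10 per missed fire, loosely echoing the fire-alarm rule of thumb); `total_cost`, `COST_FP`, and `COST_FN` are hypothetical names, and the right costs depend entirely on your use case:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical costs for illustration only: a missed fire is treated as
# 10x worse than a false alarm.
COST_FP = 1
COST_FN = 10


def total_cost(y_true, y_pred):
    # labels=[0, 1] gives [[TN, FP], [FN, TP]]; ravel() flattens it in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(COST_FP * fp + COST_FN * fn)


y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # 1 = fire, 0 = no fire
model = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]   # the example model's predictions
always_alarm = [1] * len(y_true)         # rings every single night

print(total_cost(y_true, model))         # 1*1 + 10*2 = 21
print(total_cost(y_true, always_alarm))  # 1*5 + 10*0 = 5
```

Under these assumed costs, the detector that rings every night (5 false alarms, no misses) comes out far cheaper than the example model, even though its accuracy is lower (50% versus 70%).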