K-Nearest Neighbours (KNN)
You just moved to a new city and you're starving. You don't know any restaurants, so you knock on the doors of your 5 nearest neighbours and ask each one: "What's the best restaurant around here?"
Three of them say "Mario's Pizza." One says "Thai Palace." One says "Burger Barn." You go with the majority β Mario's Pizza it is.
That's KNN. When it needs to classify a new data point, it looks at the K closest data points in the training set and lets them vote. The majority class wins.
No training? Really?
Here's what's wild about KNN: there is no training step. The model just memorizes the entire dataset. When a new point arrives, it calculates the distance to every single stored point, finds the K closest ones, and takes a vote.
This makes KNN a "lazy learner" β it doesn't learn an equation or build a tree ahead of time. It does all the work at prediction time.
How do we measure "closeness"?
The most common way is Euclidean distance β the straight-line distance between two points. Imagine two houses plotted on a map. The distance between them is just the length of a line drawn from one to the other.
For two features (x1, y1) and (x2, y2):
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
It's the Pythagorean theorem, dressed up for data science.
Choosing K: the magic number
The choice of K changes everything:
- K = 1: Just copy the nearest neighbour. Very sensitive to noise β one weird data point can throw everything off.
- K = 3 or 5: The sweet spot for many problems. Smooths out noise while staying responsive.
- K = 100: Very smooth predictions, but might miss local patterns. Like asking 100 neighbours β you get the city's average opinion, not your block's opinion.
Pro tip: always use an odd K for binary classification. If K = 4 and you get a 2-2 tie... awkward.
KNN from Scratch
Key Metrics
Quick check
Continue reading