Core Algorithms7 min read

K-Nearest Neighbours (KNN)

Ask your closest neighbours, go with the majority
training:O(1) β€” literally stores the data, nothing elseprediction:O(n * d) β€” compares against every stored pointmemory:O(n * d) β€” keeps entire dataset in memory

You just moved to a new city and you're starving. You don't know any restaurants, so you knock on the doors of your 5 nearest neighbours and ask each one: "What's the best restaurant around here?"

Three of them say "Mario's Pizza." One says "Thai Palace." One says "Burger Barn." You go with the majority β€” Mario's Pizza it is.

That's KNN. When it needs to classify a new data point, it looks at the K closest data points in the training set and lets them vote. The majority class wins.

No training? Really?

Here's what's wild about KNN: there is no training step. The model just memorizes the entire dataset. When a new point arrives, it calculates the distance to every single stored point, finds the K closest ones, and takes a vote.

This makes KNN a "lazy learner" β€” it doesn't learn an equation or build a tree ahead of time. It does all the work at prediction time.

How do we measure "closeness"?

The most common way is Euclidean distance β€” the straight-line distance between two points. Imagine two houses plotted on a map. The distance between them is just the length of a line drawn from one to the other.

For two features (x1, y1) and (x2, y2):

distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)

It's the Pythagorean theorem, dressed up for data science.

Choosing K: the magic number

The choice of K changes everything:

  • K = 1: Just copy the nearest neighbour. Very sensitive to noise β€” one weird data point can throw everything off.
  • K = 3 or 5: The sweet spot for many problems. Smooths out noise while staying responsive.
  • K = 100: Very smooth predictions, but might miss local patterns. Like asking 100 neighbours β€” you get the city's average opinion, not your block's opinion.

Pro tip: always use an odd K for binary classification. If K = 4 and you get a 2-2 tie... awkward.

KNN from Scratch

import math
from collections import Counter
def euclidean(a, b):
return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
def knn_predict(train_X, train_y, new_point, k=3):
# Calculate distance to every training point
distances = []
for i, x in enumerate(train_X):
d = euclidean(x, new_point)
distances.append((d, train_y[i]))
# Sort by distance and pick K nearest
distances.sort()
k_nearest = [label for _, label in distances[:k]]
# Majority vote
vote = Counter(k_nearest).most_common(1)[0][0]
return vote
# Fruit data: [weight_g, sweetness]
train_X = [[150, 7], [170, 8], [130, 3], [140, 2], [180, 9], [120, 4]]
train_y = ['apple', 'apple', 'lemon', 'lemon', 'apple', 'lemon']
mystery = [155, 6]
print(f"Mystery fruit: {knn_predict(train_X, train_y, mystery, k=3)}")
Output
Mystery fruit: apple
Note: KNN is heavily affected by feature scale. If one feature ranges from 0-1000 (like salary) and another from 0-5 (like rating), the salary will dominate distance calculations. Always normalize your features before using KNN!

Key Metrics

Training
No training at all β€” just stores the data
Instant O(1)
Prediction
Must compare against every stored point
Slow O(n * d)
Memory
Entire dataset stays in memory
High O(n * d)
Best for
Gets impractical for millions of points
Small datasets n < 10,000

Quick check

Why is KNN called a 'lazy learner'?
Challenge

Continue reading