
Linear Regression

Draw the best straight line through your data

training: O(n * d) per gradient-descent pass over the data (solving for the weights in closed form costs O(n * d² + d³))
prediction: O(d), just a dot product
interpretability: High, since you can read the weights directly
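To see why prediction is "just a dot product", here is a minimal sketch assuming d = 3 hypothetical features (size in sq ft, bedrooms, age in years) with made-up weights, not fitted ones. Predicting means multiplying each feature by its weight, summing, and adding the bias, which is d multiply-adds.

```python
import numpy as np

# Hypothetical weights for three features: size (sq ft), bedrooms, age (years).
# Illustrative values only, not learned from any data.
w = np.array([0.2, 10.0, -1.5])
b = 20.0

def predict(x: np.ndarray) -> float:
    """Predict one target value: dot product of weights and features, plus bias."""
    return float(np.dot(w, x) + b)

house = np.array([1500.0, 3.0, 10.0])
print(predict(house))  # 335.0
```

Adding a feature adds one more multiply-add, which is where the O(d) cost comes from.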

Imagine you're a kid with a ruler and a piece of graph paper covered in dots. Each dot is a house — the x-axis is the house's size in square feet, and the y-axis is its price. Your job: place the ruler so the line passes as close to all the dots as possible.

That's linear regression in a nutshell. You're drawing the best straight line through your data so you can use it to predict new values.

The equation behind the line

Every straight line can be written as:

y = mx + b

You already know this from school. In ML-speak, we just rename things:

y — the prediction (house price)
x — the input feature (house size)
m — the weight (how much price changes per square foot)
b — the bias (the base price even for a "zero-size" house)
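A small sketch of what the weight and bias mean, using hand-picked illustrative values m = 0.2 and b = 30 (not fitted to anything) and the units of the later example (size in sq ft, price in $1000s). Each extra square foot adds m to the prediction, so a house 100 sq ft larger is predicted to cost 100 * m more, and a zero-size house gets exactly b.

```python
# Illustrative parameters, picked by hand (not learned from data).
# Units: size in sq ft, price in $1000s.
m = 0.2   # weight: price change per square foot
b = 30.0  # bias: predicted price of a "zero-size" house

def predict_price(size_sqft: float) -> float:
    """y = m * x + b for a single input feature."""
    return m * size_sqft + b

print(predict_price(1000))                        # 230.0
print(predict_price(1100) - predict_price(1000))  # equals 100 * m
print(predict_price(0))                           # 30.0, which is just b
```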

Training a linear regression model means finding the best values of m and b so the line fits the data as closely as possible.

How do we measure "best"?

We use something called Mean Squared Error (MSE). For every data point, we calculate the difference between the actual value and our line's prediction, square it (so negatives don't cancel out), and average all those squared errors.
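In symbols, MSE = (1/n) Σ (yᵢ - ŷᵢ)², where ŷᵢ = m·xᵢ + b is the line's prediction for point i. A minimal NumPy sketch of that recipe follows; the helper name `mse` and the three-point toy dataset are made up for illustration.

```python
import numpy as np

def mse(m: float, b: float, x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error of the line y_hat = m * x + b on points (x, y)."""
    y_hat = m * x + b                    # the line's prediction for every point
    errors = y - y_hat                   # actual minus predicted (may be negative)
    return float(np.mean(errors ** 2))   # square each error, then average

# Toy data and a candidate line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 8.0])
print(mse(2.0, 1.0, x, y))  # only the last point is off (by 1), so MSE = 1/3
```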

Think of it like a teacher grading your guesses. Each guess that's off by 10 gets penalized 100 (10 squared). A guess that's off by 2 only gets penalized 4. Big mistakes get punished way more than small ones — that's the beauty of squaring.
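One way to see the effect of squaring: the two hypothetical error sets below have the same total absolute error (12). Mean absolute error (MAE, shown only for contrast) scores them the same, while MSE doubles the cost when that error is concentrated in a single big miss.

```python
import numpy as np

def mae(errors: np.ndarray) -> float:
    """Mean absolute error: every unit of error costs the same."""
    return float(np.mean(np.abs(errors)))

def mse(errors: np.ndarray) -> float:
    """Mean squared error: large errors cost disproportionately more."""
    return float(np.mean(errors ** 2))

# Two hypothetical error sets with the same total absolute error (12).
spread_out = np.array([6.0, 6.0])  # two medium mistakes
one_big = np.array([0.0, 12.0])    # one perfect guess, one big mistake

print(mae(spread_out), mse(spread_out))  # 6.0 36.0
print(mae(one_big), mse(one_big))        # 6.0 72.0
```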

The model's goal: minimize the MSE. Find the m and b that make the average squared error as small as possible.
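One common way to do this minimizing is gradient descent: start from zero and repeatedly nudge both parameters downhill along the MSE gradient. The sketch below illustrates that technique on the same house data used in the from-scratch example; it is not the closed-form method that example uses. It assumes one extra step, standardizing the sizes, so that a single learning rate (0.1 here, an arbitrary choice) converges quickly; the fitted parameters are then converted back to m and b in original units.

```python
import numpy as np

# House sizes (sq ft) and prices ($1000s), as in the from-scratch example.
X = np.array([600, 800, 1000, 1200, 1400, 1600], dtype=float)
y = np.array([150, 200, 250, 280, 340, 380], dtype=float)

def fit_gradient_descent(X, y, lr=0.1, steps=1000):
    """Minimize MSE for y = m*x + b with batch gradient descent."""
    # Standardize x so both parameters live on a similar scale.
    mu, sigma = X.mean(), X.std()
    z = (X - mu) / sigma
    w, c = 0.0, 0.0                  # parameters of y = w*z + c
    n = len(z)
    for _ in range(steps):
        residual = y - (w * z + c)                # actual minus predicted
        grad_w = -2.0 / n * np.sum(residual * z)  # dMSE/dw
        grad_c = -2.0 / n * np.sum(residual)      # dMSE/dc
        w -= lr * grad_w
        c -= lr * grad_c
    # Undo standardization: w*(x - mu)/sigma + c = (w/sigma)*x + (c - w*mu/sigma)
    m = w / sigma
    b = c - w * mu / sigma
    return m, b

m, b = fit_gradient_descent(X, y)
print(f"y = {m:.2f}x + {b:.2f}")  # y = 0.23x + 15.24
```

On one feature, the closed-form formula gets the same answer in a single pass; gradient descent earns its keep when there are many features and solving the equations directly gets expensive.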

Linear Regression from Scratch

```python
import numpy as np

# House sizes (sq ft) and prices ($1000s)
X = np.array([600, 800, 1000, 1200, 1400, 1600])
y = np.array([150, 200, 250, 280, 340, 380])

# Closed-form least-squares estimates of m (slope) and b (intercept)
n = len(X)
m = (n * np.sum(X * y) - np.sum(X) * np.sum(y)) / \
    (n * np.sum(X**2) - np.sum(X)**2)
b = (np.sum(y) - m * np.sum(X)) / n  # the line passes through (mean x, mean y)

print(f"y = {m:.2f}x + {b:.2f}")
print(f"1500 sq ft house: ${m * 1500 + b:.0f}k")
```

Output

```
y = 0.23x + 15.24
1500 sq ft house: $358k
```
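As a sanity check, NumPy's `np.polyfit` with degree 1 fits the same least-squares line and returns the coefficients highest power first, i.e. `[slope, intercept]`. A short sketch on the same data:

```python
import numpy as np

X = np.array([600, 800, 1000, 1200, 1400, 1600], dtype=float)
y = np.array([150, 200, 250, 280, 340, 380], dtype=float)

# Degree-1 least-squares fit; coefficients come back as [slope, intercept].
m, b = np.polyfit(X, y, 1)
print(f"y = {m:.2f}x + {b:.2f}")                  # y = 0.23x + 15.24
print(f"1500 sq ft house: ${m * 1500 + b:.0f}k")  # 1500 sq ft house: $358k
```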

Note: Linear regression assumes a straight-line relationship. If your data curves (like diminishing returns), a straight line will miss the pattern. Always plot your data first!
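A numeric companion to plotting is to inspect the residuals (actual minus predicted). When a straight line is fit to curved data, the residuals follow a systematic pattern instead of scattering randomly. A minimal sketch, assuming synthetic diminishing-returns data y = 10·√x made up for illustration:

```python
import numpy as np

# Synthetic diminishing-returns data (illustrative only): y = 10 * sqrt(x).
x = np.arange(1.0, 11.0)
y = 10.0 * np.sqrt(x)

# Fit a straight line, then compute residuals (actual minus predicted).
m, b = np.polyfit(x, y, 1)
residuals = y - (m * x + b)

# For this concave curve the signs form a pattern:
# negative at both ends, positive in the middle.
print(np.sign(residuals))
```

Residuals from a good straight-line fit would hop between signs with no such structure.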