
Logistic Regression

Yes or no? Draw the line that separates them
training: O(n * d * iterations), gradient descent
prediction: O(d), dot product + sigmoid
output: probability between 0 and 1

Imagine you're a bouncer at an exclusive club. Every person who walks up has certain features: age, dress code score, VIP status. Your job isn't to rate them on a scale of 1 to 100. It's binary: you're in, or you're out.

But here's the thing: you don't just flip a coin. You have a gut feeling that's actually pretty mathematical. You weigh each factor, add them up, and if the total crosses a threshold... the velvet rope opens.

That's logistic regression. Despite the name, it's not about regression (predicting numbers). It's a classification algorithm: it predicts which category something belongs to.

From straight line to S-curve

Remember linear regression? It gives you a number: y = mx + b. But for yes/no problems, we need a probability, a number between 0 and 1.

Enter the sigmoid function. It takes any number (from negative infinity to positive infinity) and squishes it into the range (0, 1). The formula is:

sigmoid(z) = 1 / (1 + e^(-z))

Think of it like a dimmer switch that's been modified. No matter how far you turn the knob in either direction, the light never goes below 0% or above 100%. Turn it way to the left? It approaches 0 but never quite reaches it. Way to the right? Approaches 1.
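To see the squishing in action, here is a minimal sketch that feeds a handful of inputs through the sigmoid formula above. The specific z values are arbitrary picks for illustration, not anything special.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Arbitrary sample inputs, from "knob far left" to "knob far right"
zs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probs = sigmoid(zs)

for z, p in zip(zs, probs):
    print(f"z = {z:6.1f} -> sigmoid(z) = {p:.5f}")
```

An input of 0 lands exactly on 0.5, and the curve is symmetric around that point: sigmoid(-z) equals 1 - sigmoid(z). One practical caveat: the math never reaches 0 or 1, but floating-point numbers eventually round to them for very large inputs.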

If the output is above 0.5, we predict "yes" (class 1). Below 0.5? "No" (class 0).

The decision boundary

Here's where it gets cool. Logistic regression draws an invisible line (or plane, or hyperplane in higher dimensions) through your feature space. On one side: class 0. On the other side: class 1.

Imagine scattering red and blue marbles on a table. Logistic regression finds the best straight line that separates reds from blues. Points near the line? The model is less confident. Points far from the line? Very confident.
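The link between distance from the line and confidence can be sketched in a few lines. The weights, bias, and points below are made up for illustration: they define a hypothetical boundary x1 + x2 = 4 on a 2-D table, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical 2-D model: the boundary is the line x1 + x2 = 4
weights = np.array([1.0, 1.0])
bias = -4.0

points = np.array([
    [2.0, 2.0],   # exactly on the line
    [2.5, 2.0],   # just past the line
    [6.0, 5.0],   # far on the class-1 side
    [0.0, 0.5],   # far on the class-0 side
])

z = points @ weights + bias                   # raw score; its sign picks the side
distance = z / np.linalg.norm(weights)        # signed distance to the line
prob = sigmoid(z)                             # probability of class 1

for pt, dist, p in zip(points, distance, prob):
    print(f"point {pt} | distance {dist:+.2f} | P(class 1) = {p:.3f}")
```

A point sitting on the line gets exactly 0.5, and the probability moves toward 0 or 1 as the point moves away. How quickly it moves also depends on the size of the weights, so the same distance can mean different confidence in different models.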

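The model doesn't just say "spam" or "not spam". It says "I'm 94% sure this is spam." That probability is useful in practice: you can rank emails by how suspicious they are, or move the cutoff away from 0.5 when false alarms are costly.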

Logistic Regression: Spam or Not?

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Features: [num_exclamation_marks, contains_free, num_links]
X = np.array([[1, 0, 1], [5, 1, 8], [0, 0, 2],
              [8, 1, 10], [1, 0, 0], [6, 1, 7]])
y = np.array([0, 1, 0, 1, 0, 1])  # 0=not spam, 1=spam

# Pretrained weights (normally learned via gradient descent)
weights = np.array([0.3, 1.5, 0.2])
bias = -1.5

# Predict
for i in range(len(X)):
    z = np.dot(X[i], weights) + bias
    prob = sigmoid(z)
    label = "SPAM" if prob > 0.5 else "not spam"
    print(f"Email {i+1}: {prob:.2f} → {label}")
Output
Email 1: 0.27 → not spam
Email 2: 0.96 → SPAM
Email 3: 0.25 → not spam
Email 4: 0.99 → SPAM
Email 5: 0.23 → not spam
Email 6: 0.96 → SPAM
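The example above uses hand-picked weights. For a feel of how they could be learned instead, here is a minimal gradient descent sketch on the same six emails. It assumes the standard log-loss gradient, and the function name train, the learning rate, and the iteration count are illustrative choices, not values from the original example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Same toy data: [num_exclamation_marks, contains_free, num_links]
X = np.array([[1, 0, 1], [5, 1, 8], [0, 0, 2],
              [8, 1, 10], [1, 0, 0], [6, 1, 7]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1], dtype=float)

def train(X, y, learning_rate=0.05, iterations=5000):
    """Batch gradient descent on the mean log loss (hyperparameters are assumptions)."""
    n, d = X.shape
    weights = np.zeros(d)
    bias = 0.0
    for _ in range(iterations):
        probs = sigmoid(X @ weights + bias)           # O(n * d) per pass
        error = probs - y                             # gradient of log loss w.r.t. z
        weights -= learning_rate * (X.T @ error) / n  # step each weight downhill
        bias -= learning_rate * error.mean()
    return weights, bias

weights, bias = train(X, y)
predictions = (sigmoid(X @ weights + bias) > 0.5).astype(int)

print("weights:", np.round(weights, 2), "bias:", round(float(bias), 2))
print("predictions:", predictions)
```

Each pass touches every example and every feature once, which is where the O(n * d * iterations) training cost comes from. The learned weights will not match the hand-picked ones, since many different weight settings separate this tiny dataset.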
Note: Logistic regression is linear under the hood: the decision boundary is always a straight line (or flat plane) in the features you give it. If your classes are separated by a curve, plain logistic regression will struggle. But for many real-world problems, a straight boundary works surprisingly well.

Key Metrics

Training: Iterative, O(n * d * iterations). Uses gradient descent, same as linear regression.
Prediction: Very fast, O(d). Dot product + sigmoid, takes milliseconds.
Output: Probability, 0 to 1. Not just a label: you get confidence scores.
Decision Boundary: Linear, straight line/plane. Can't capture curved boundaries without feature engineering.
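To make the feature engineering point concrete, here is a small sketch where the classes are split by a circle instead of a line. The helper add_squares, the weights, and the sample points are all hypothetical, chosen by hand so the boundary is the unit circle. The model stays linear, just in the squared features.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def add_squares(points):
    """Hypothetical feature map: [x1, x2] -> [x1^2, x2^2]."""
    return points ** 2

# Hand-picked parameters: z = x1^2 + x2^2 - 1, so z > 0 means "outside the unit circle"
weights = np.array([1.0, 1.0])
bias = -1.0

points = np.array([
    [0.0, 0.0], [0.5, 0.5], [-0.3, 0.2],    # inside the circle
    [1.5, 0.0], [-1.0, 1.0], [0.0, -2.0],   # outside the circle
])

prob = sigmoid(add_squares(points) @ weights + bias)
labels = (prob > 0.5).astype(int)            # 1 = outside, 0 = inside

for pt, p, label in zip(points, prob, labels):
    print(f"point {pt} | P(outside) = {p:.2f} | label {label}")
```

No straight line through the original x1, x2 plane separates these points, but a flat boundary in the squared features corresponds to a circle in the original space.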

Quick check

What does the sigmoid function do?