Core Algorithms8 min read

Decision Trees

A flowchart that learns which questions to ask
training:O(n * d * log n) β€” sorting features at each splitprediction:O(depth) β€” just follow the branchesinterpretability:Very high β€” you can draw and read the tree

Remember playing 20 Questions as a kid? "Is it alive? Is it bigger than a breadbox? Does it have legs?" Each question narrows down the possibilities until you zero in on the answer.

A decision tree works exactly the same way. It learns a series of yes/no questions about your data, and each answer leads you down a different branch until you reach a prediction at the bottom.

The genius part? The tree figures out the best questions to ask on its own, just by looking at the data.

How does the tree pick its questions?

Imagine you're sorting a mixed bag of apples and oranges. You could ask:

  • "Is it red?" β€” This splits pretty well! Most apples are red, most oranges aren't.
  • "Does it weigh more than 300 grams?" β€” Not as useful. Both apples and oranges can be heavy or light.

The tree picks the question that creates the purest split β€” the one that best separates the classes. It measures this using a concept called information gain (or Gini impurity).

Think of it like this: a pile of only apples is pure (Gini = 0). A pile that's 50/50 apples and oranges is maximally impure (Gini = 0.5). The tree always picks the question that reduces impurity the most.

Growing the tree

The algorithm is beautifully recursive:

  1. Look at all features and all possible split points
  2. Pick the split that gives the best information gain
  3. Split the data into two groups
  4. Repeat steps 1-3 for each group
  5. Stop when a group is pure (all same class) or you hit a depth limit

Without limits, the tree will keep splitting until every single training example is in its own leaf. That's overfitting β€” the tree memorized the training data instead of learning general patterns. It's like studying the answer key instead of understanding the material.

To prevent this, we use pruning: setting a maximum depth, requiring a minimum number of samples per leaf, or cutting branches that don't improve performance on validation data.

Decision Tree: Should I Go Outside?

class Node:
def __init__(self, feature=None, threshold=None,
left=None, right=None, prediction=None):
self.feature = feature
self.threshold = threshold
self.left = left
self.right = right
self.prediction = prediction
# Hand-built decision tree for: go outside?
# Features: [temperature, is_raining (0/1), wind_speed]
tree = Node(feature=1, threshold=0.5, # Is it raining?
left=Node(feature=0, threshold=15, # Temp > 15?
left=Node(prediction="Stay in"), # Cold + not raining
right=Node(prediction="Go outside!")), # Warm + not raining
right=Node(prediction="Stay in")) # Raining β†’ stay in
def predict(node, x):
if node.prediction is not None:
return node.prediction
if x[node.feature] <= node.threshold:
return predict(node.left, x)
return predict(node.right, x)
test_cases = [
([25, 0, 5], "Warm, no rain"),
([10, 0, 3], "Cold, no rain"),
([20, 1, 2], "Warm but raining"),
]
for features, desc in test_cases:
print(f"{desc}: {predict(tree, features)}")
Output
Warm, no rain: Go outside!
Cold, no rain: Stay in
Warm but raining: Stay in
Note: Decision trees are one of the few ML models you can actually draw on a whiteboard and explain to a non-technical person. That's their superpower. But a single tree tends to overfit β€” which is why we usually combine many trees together (see: Random Forests).

Key Metrics

Training
Must sort features and evaluate splits at each node
Moderate O(n * d * log n)
Prediction
Just follow the branches β€” typically O(log n)
Very fast O(tree depth)
Interpretability
You can literally visualize and explain each decision
Excellent Human-readable
Overfitting risk
Unrestricted trees memorize training data
High Without pruning

Quick check

What does a decision tree use to pick the best feature to split on?
Challenge

Continue reading