Decision Trees

Remember playing 20 Questions as a kid? "Is it alive? Is it bigger than a breadbox? Does it have legs?" Each question narrows down the possibilities until you zero in on the answer.

A decision tree works exactly the same way. It learns a series of yes/no questions about your data, and each answer leads you down a different branch until you reach a prediction at the bottom.

The genius part? The tree figures out the best questions to ask on its own, just by looking at the data.

How does the tree pick its questions?

Imagine you're sorting a mixed bag of apples and oranges. You could ask:

"Is it red?" — This splits pretty well! Most apples are red, most oranges aren't.
"Does it weigh more than 300 grams?" — Not as useful. Both apples and oranges can be heavy or light.

The tree picks the question that creates the purest split — the one that best separates the classes. It measures this using a concept called information gain (or Gini impurity).

Think of it like this: a pile of only apples is pure (Gini = 0). A pile that's 50/50 apples and oranges is maximally impure (Gini = 0.5). The tree always picks the question that reduces impurity the most.

Growing the tree

The algorithm is beautifully recursive:

Look at all features and all possible split points
Pick the split that gives the best information gain
Split the data into two groups
Repeat steps 1-3 for each group
Stop when a group is pure (all same class) or you hit a depth limit

Without limits, the tree will keep splitting until every single training example is in its own leaf. That's overfitting — the tree memorized the training data instead of learning general patterns. It's like studying the answer key instead of understanding the material.

To prevent this, we use pruning: setting a maximum depth, requiring a minimum number of samples per leaf, or cutting branches that don't improve performance on validation data.

Decision Tree: Should I Go Outside?

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, prediction=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.prediction = prediction

# Hand-built decision tree for: go outside?
# Features: [temperature, is_raining (0/1), wind_speed]
tree = Node(feature=1, threshold=0.5,                   # Is it raining?
    left=Node(feature=0, threshold=15,                   # Temp > 15?
        left=Node(prediction="Stay in"),                 # Cold + not raining
        right=Node(prediction="Go outside!")),            # Warm + not raining
    right=Node(prediction="Stay in"))                    # Raining → stay in

def predict(node, x):
    if node.prediction is not None:
        return node.prediction
    if x[node.feature] <= node.threshold:
        return predict(node.left, x)
    return predict(node.right, x)

test_cases = [
    ([25, 0, 5], "Warm, no rain"),
    ([10, 0, 3], "Cold, no rain"),
    ([20, 1, 2], "Warm but raining"),
]
for features, desc in test_cases:
    print(f"{desc}: {predict(tree, features)}")

Output

Warm, no rain: Go outside!
Cold, no rain: Stay in
Warm but raining: Stay in

Key Metrics

Training

Must sort features and evaluate splits at each node

Moderate O(n * d * log n)

Prediction

Just follow the branches — typically O(log n)

Very fast O(tree depth)

Interpretability

You can literally visualize and explain each decision

Excellent Human-readable

Overfitting risk

Unrestricted trees memorize training data

High Without pruning

How does the tree pick its questions?

Growing the tree

Decision Tree: Should I Go Outside?

Key Metrics

Quick check