Decision Trees
Remember playing 20 Questions as a kid? "Is it alive? Is it bigger than a breadbox? Does it have legs?" Each question narrows down the possibilities until you zero in on the answer.
A decision tree works exactly the same way. It learns a series of yes/no questions about your data, and each answer leads you down a different branch until you reach a prediction at the bottom.
The genius part? The tree figures out the best questions to ask on its own, just by looking at the data.
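To make the "series of yes/no questions" concrete, here is a minimal sketch of a tree written by hand rather than learned from data. The feature names (`color`, `weight_g`) and the 150 gram threshold are made up for illustration.

```python
# Internal nodes ask a yes/no question; leaves hold a prediction.
# This tree is hand-written for illustration, not learned from data.
TREE = {
    "ask": lambda fruit: fruit["color"] == "red",
    "yes": {"predict": "apple"},
    "no": {
        "ask": lambda fruit: fruit["weight_g"] > 150,  # hypothetical threshold
        "yes": {"predict": "orange"},
        "no": {"predict": "apple"},
    },
}


def predict(node, fruit):
    """Follow the yes/no answers down the tree until a leaf is reached."""
    while "predict" not in node:
        node = node["yes"] if node["ask"](fruit) else node["no"]
    return node["predict"]


print(predict(TREE, {"color": "red", "weight_g": 120}))     # apple
print(predict(TREE, {"color": "orange", "weight_g": 180}))  # orange
```

A learned tree has the same shape; the difference is that the questions and thresholds come from the training data instead of from a person.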
How does the tree pick its questions?
Imagine you're sorting a mixed bag of apples and oranges. You could ask:
- "Is it red?" This splits pretty well! Most apples are red, most oranges aren't.
- "Does it weigh more than 300 grams?" Not as useful. Both apples and oranges can be heavy or light.
The tree picks the question that creates the purest split, the one that best separates the classes. It scores each candidate split with an impurity measure such as Gini impurity or entropy. The drop in entropy that a split achieves is called information gain.
Think of it like this: a pile of only apples is pure (Gini = 0). A pile that's 50/50 apples and oranges is maximally impure (Gini = 0.5). The tree always picks the question that reduces impurity the most.
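The numbers above can be checked with a few lines of code. This is a minimal sketch that assumes the standard definition of Gini impurity (1 minus the sum of squared class proportions) and scores a split by the size-weighted impurity of the two resulting piles. The function names and the example piles are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(parent, left, right):
    """How much a split lowers the size-weighted Gini impurity."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


pile = ["apple"] * 4 + ["orange"] * 4
print(gini(["apple"] * 4))  # 0.0 (pure pile)
print(gini(pile))           # 0.5 (50/50 pile)

# "Is it red?": 3 apples + 1 orange say yes, 1 apple + 3 oranges say no.
red_yes = ["apple"] * 3 + ["orange"]
red_no = ["apple"] + ["orange"] * 3
print(impurity_reduction(pile, red_yes, red_no))  # 0.125

# "Heavier than 300 g?": both sides stay 50/50, so nothing is gained.
heavy_yes = ["apple"] * 2 + ["orange"] * 2
heavy_no = ["apple"] * 2 + ["orange"] * 2
print(impurity_reduction(pile, heavy_yes, heavy_no))  # 0.0
```

The color question lowers impurity and the weight question does not, which is why the tree would ask about color first.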
Growing the tree
The algorithm is beautifully recursive:
1. Look at all features and all possible split points
2. Pick the split that gives the best information gain
3. Split the data into two groups
4. Repeat steps 1-3 for each group
5. Stop when a group is pure (all same class) or you hit a depth limit
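The steps above can be sketched in plain Python. This is an illustrative toy, not how any particular library implements it: it assumes numeric features, splits of the form "feature <= threshold", and Gini impurity as the score (choosing the lowest weighted impurity is the same as choosing the largest impurity reduction). The names `best_split`, `build_tree`, and the tiny dataset are made up for the example.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Steps 1-2: try every feature and threshold, keep the purest split."""
    best = None  # (weighted_impurity, feature_index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send data both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 5: stop when the group is pure or the depth limit is reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return {"predict": majority}
    split = best_split(rows, labels)
    if split is None:  # identical rows cannot be separated
        return {"predict": majority}
    _, f, threshold = split
    # Step 3: split the data into two groups.
    left = [i for i, row in enumerate(rows) if row[f] <= threshold]
    right = [i for i, row in enumerate(rows) if row[f] > threshold]
    # Step 4: repeat for each group.
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(node, row):
    while "predict" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["predict"]


# Toy data: each row is [redness from 0 to 1, weight in grams].
rows = [[0.9, 150], [0.8, 170], [0.7, 160], [0.2, 140], [0.1, 180], [0.3, 155]]
labels = ["apple", "apple", "apple", "orange", "orange", "orange"]

tree = build_tree(rows, labels)
print(tree["feature"], tree["threshold"])  # 0 0.3
print(predict(tree, [0.85, 165]))          # apple
```

On this toy data a single question about redness already separates the classes, so both groups are pure and the recursion stops after one split.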
Without limits, the tree will keep splitting until every leaf is pure, which in the extreme means a separate leaf for every single training example. That's overfitting: the tree has memorized the training data instead of learning general patterns. It's like studying the answer key instead of understanding the material.
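To prevent this, we limit or prune the tree. We can stop it from growing too far in the first place by setting a maximum depth or requiring a minimum number of samples per leaf (often called pre-pruning), or we can grow the full tree and then cut back branches that don't improve performance on validation data (post-pruning).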
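Cutting branches with validation data can be sketched as follows. This is one simple variant, usually called reduced-error pruning, and it is an assumption that this is the kind of pruning meant here. The tree is hand-built to contain one overfit branch, and the `majority` field on internal nodes is a convenience invented for this example.

```python
def predict(node, row):
    while "predict" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["predict"]


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def prune(node, root, val_rows, val_labels):
    """Bottom-up: turn a subtree into a leaf unless validation accuracy drops."""
    if "predict" in node:
        return
    prune(node["left"], root, val_rows, val_labels)
    prune(node["right"], root, val_rows, val_labels)
    before = accuracy(root, val_rows, val_labels)
    saved = dict(node)
    node.clear()
    node["predict"] = saved["majority"]  # try replacing the subtree with a leaf
    if accuracy(root, val_rows, val_labels) < before:
        node.clear()
        node.update(saved)  # pruning hurt, so restore the subtree


# Rows are [redness from 0 to 1, weight in grams]. The right-hand branch
# memorized a single heavy "orange" outlier from the training data.
tree = {
    "feature": 0, "threshold": 0.5, "majority": "apple",
    "left": {"predict": "orange"},
    "right": {
        "feature": 1, "threshold": 175, "majority": "apple",
        "left": {"predict": "apple"},
        "right": {"predict": "orange"},
    },
}

val_rows = [[0.9, 180], [0.8, 150], [0.2, 160], [0.1, 190]]
val_labels = ["apple", "apple", "orange", "orange"]

print(accuracy(tree, val_rows, val_labels))  # 0.75
prune(tree, tree, val_rows, val_labels)
print(accuracy(tree, val_rows, val_labels))  # 1.0
print(tree["right"])                         # {'predict': 'apple'}
```

The overfit weight question is removed because dropping it does not hurt validation accuracy, while the redness question at the root survives because removing it would.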
[Interactive figure: Decision Tree: Should I Go Outside?]