What is AI?

AI vs ML vs Deep Learning

Russian nesting dolls: AI is the biggest, ML fits inside it, Deep Learning fits inside ML
Scope: Foundational | Difficulty: Beginner

The Matryoshka of Intelligence

You've probably heard these three terms thrown around like confetti: Artificial Intelligence, Machine Learning, and Deep Learning. People use them interchangeably, and it drives computer scientists up the wall, because they're not the same thing.

Think of Russian nesting dolls (matryoshka). The biggest doll is AI. Open it up, and inside sits Machine Learning. Open that one, and inside sits Deep Learning.

  • AI (biggest doll): Any technique that enables machines to mimic human intelligence. This is the whole field.
  • ML (medium doll): A subset of AI where machines learn from data instead of being explicitly programmed.
  • DL (smallest doll): A subset of ML that uses neural networks with many layers to learn complex patterns.

Every Deep Learning system is Machine Learning. Every Machine Learning system is AI. But not every AI system uses Machine Learning, and not every ML system uses Deep Learning.
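
One way to picture the nesting is with plain Python sets. This is only a sketch: the system names are illustrative labels borrowed from examples in this article, not a formal taxonomy.

```python
# Illustrative labels only: each set holds example systems, not a formal taxonomy.
deep_learning = {"CNN image classifier", "Transformer language model"}
machine_learning = deep_learning | {"decision-tree spam filter"}
artificial_intelligence = machine_learning | {"expert system", "ELIZA chatbot"}

# Every DL system is ML, and every ML system is AI (proper subsets).
print(deep_learning < machine_learning < artificial_intelligence)  # True

# But some AI is not ML, and some ML is not DL.
print(sorted(artificial_intelligence - machine_learning))
print(sorted(machine_learning - deep_learning))
```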

Confused? Let's unpack each doll.

The Biggest Doll: Artificial Intelligence

AI is the broadest term. It means: any system that can perform tasks normally requiring human intelligence. That's it. It doesn't specify how the system works.

The earliest AI systems didn't learn from data at all. They used hand-written rules:

  • Expert systems (1980s): Thousands of if-then rules written by humans. "If the patient has fever AND cough AND sore throat, THEN suggest flu test." No learning involved, just a large rule base that humans carefully wrote by hand.
  • Game AI (classic): The ghosts in Pac-Man follow simple rules: chase the player, scatter, repeat. That's AI! But there's no learning happening.
  • Rule-based chatbots: ELIZA (1966) could hold a conversation by pattern-matching keywords. "I feel sad" triggered "Why do you feel sad?" Clever, but no real understanding.

These are all AI, but none of them are Machine Learning. They're programmed, not trained.
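
To make the chatbot example concrete, here is a minimal ELIZA-style sketch. The three rules and the fallback reply are hypothetical and written for this illustration; the original ELIZA script was much larger and worked differently in its details.

```python
import re

# Hypothetical rules for illustration; the real ELIZA script was far larger.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(message):
    """Return the reply for the first matching rule, or a stock fallback."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1).lower())
    return "Please go on."

print(respond("I feel sad"))           # Why do you feel sad?
print(respond("My mother called."))    # Tell me more about your mother.
print(respond("The weather is nice"))  # Please go on.
```

Nothing here is learned: changing the bot's behavior means editing RULES by hand.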

AI Without Machine Learning: A Rule-Based System

# This is AI, but NOT Machine Learning.
# Every rule was written by a human. It never learns.
def diagnose(symptoms):
    """A simple rule-based medical AI (expert system)."""
    # Normalize to a lowercase set so each rule is a subset check.
    symptoms = {s.lower() for s in symptoms}
    if {"fever", "cough", "sore throat"} <= symptoms:
        return "Possible flu: suggest rapid test"
    elif {"headache", "stiff neck", "fever"} <= symptoms:
        return "Possible meningitis: urgent care needed"
    elif {"sneezing", "runny nose"} <= symptoms:
        return "Likely common cold: rest and fluids"
    elif {"chest pain", "shortness of breath"} <= symptoms:
        return "Possible cardiac event: call emergency"
    else:
        return "Insufficient data: consult a doctor"

# It works, but every rule was hand-coded:
print(diagnose(["fever", "cough", "sore throat"]))
print(diagnose(["sneezing", "runny nose"]))
print(diagnose(["headache", "fatigue"]))
Output
Possible flu: suggest rapid test
Likely common cold: rest and fluids
Insufficient data: consult a doctor

The Middle Doll: Machine Learning

Here's the big idea that changed everything: what if we stopped writing rules and let the machine figure them out from data?

That's Machine Learning. Instead of a programmer saying "if email contains 'free money', mark as spam," you give the machine thousands of examples of spam and non-spam emails, and it learns the patterns on its own.

The key ingredients of ML:

  • Data: Lots of it. More data usually helps, as long as it is representative of what the model will see later.
  • Algorithm: A mathematical method for finding patterns (decision trees, linear regression, SVMs, etc.).
  • Training: The process of feeding data to the algorithm so it can learn.
  • Model: The end result. A trained system that can make predictions on new data.
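
The four ingredients can be seen in a few lines of plain Python. This is a minimal sketch that assumes a made-up dataset of study hours and exam scores and uses ordinary least squares as the algorithm; the function names are chosen for this illustration.

```python
# Data: (hours studied, exam score) pairs, invented for illustration.
data = [(1, 52), (2, 54), (3, 56), (4, 58), (5, 60)]

def train_linear_regression(points):
    """Algorithm + training: ordinary least squares for one input feature."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the "model" is just these two learned numbers

def predict_score(model, hours):
    """Use the trained model on an input it has not seen."""
    slope, intercept = model
    return slope * hours + intercept

model = train_linear_regression(data)
print(model)                    # (2.0, 50.0)
print(predict_score(model, 8))  # 66.0
```

Nobody wrote the rule "each extra hour adds 2 points"; the slope and intercept were computed from the examples.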

There are three main flavors of ML:

  • Supervised learning: You give it labeled examples. "Here's a photo of a cat (labeled 'cat'). Here's a dog (labeled 'dog'). Now classify this new photo." The machine learns from the answers you provide.
  • Unsupervised learning: No labels. "Here are 10,000 customer profiles. Find me groups of similar customers." The machine discovers patterns on its own.
  • Reinforcement learning: The machine learns by trial and error, getting rewards for good actions and penalties for bad ones. Think of training a dog with treats.
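
As a rough illustration of the unsupervised flavor, here is a one-dimensional k-means sketch. The spending figures and starting centers are invented for this example, and real customer segmentation would use many features and a library implementation.

```python
def k_means_1d(values, centers, rounds=10):
    """Assign each value to its nearest center, then move centers to group means."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Keep the old center if a group ended up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Monthly spend per customer (invented numbers). No labels are provided.
spend = [12, 15, 14, 10, 95, 102, 98, 110]
centers, groups = k_means_1d(spend, centers=[0.0, 50.0])
print(groups)   # [[12, 15, 14, 10], [95, 102, 98, 110]]
print(centers)  # [12.75, 101.25]
```

The algorithm was never told which customers are "low spenders" or "high spenders"; the two groups fall out of the data.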

Machine Learning: Learning from Data

# This IS Machine Learning: the system learns rules from data,
# rather than having rules hand-written by a programmer.
from collections import Counter

def train_naive_classifier(training_data):
    """Train a simple word-frequency classifier."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in training_data:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
    return word_counts, class_counts

def predict(text, word_counts, class_counts):
    """Predict spam or ham using learned word frequencies."""
    scores = {}
    for label in ["spam", "ham"]:
        score = class_counts[label]  # prior: how many examples of this class we saw
        for word in text.lower().split():
            score += word_counts[label].get(word, 0)
        scores[label] = score
    return max(scores, key=scores.get)

# Training data: the machine learns from these examples
data = [
    ("free money click now winner", "spam"),
    ("congratulations you won free prize", "spam"),
    ("claim your free gift today", "spam"),
    ("meeting tomorrow at 3pm", "ham"),
    ("project update attached report", "ham"),
    ("lunch plans for friday team", "ham"),
]

# Train the model (it learns patterns from data!)
wc, cc = train_naive_classifier(data)

# Now it can classify NEW emails it hasn't seen before:
print(predict("you won a free vacation", wc, cc))    # spam
print(predict("meeting agenda for monday", wc, cc))  # ham
print(predict("click here for free money", wc, cc))  # spam
Output
spam
ham
spam

The Smallest Doll: Deep Learning

Deep Learning is ML on steroids. It uses artificial neural networks, structures loosely inspired by the human brain, with many layers (that's the "deep" part).

Why does depth matter? Each layer learns to recognize increasingly complex patterns:

  • Layer 1: Detects edges and simple shapes in an image
  • Layer 2: Combines edges into textures and parts ("this looks like fur")
  • Layer 3: Combines parts into objects ("this looks like an ear")
  • Layer 10+: Recognizes full concepts ("this is a golden retriever")
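
To give a feel for what "detecting an edge" means, here is a small sketch that slides a two-value filter across one row of pixel brightness. The filter is written by hand as an assumption for illustration; in a trained convolutional network, first-layer filters resembling this are learned from data rather than typed in.

```python
def convolve_1d(signal, kernel):
    """Slide the kernel across the signal and take a weighted sum at each position.

    This is the cross-correlation form that deep learning libraries
    commonly call "convolution".
    """
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

# One row of pixel brightness: a dark region, then a bright region.
row = [0, 0, 0, 9, 9, 9]

# Hand-written edge filter (an assumption for illustration).
edge_kernel = [-1, 1]

print(convolve_1d(row, edge_kernel))  # [0, 0, 9, 0, 0]
```

The output is zero wherever brightness is flat and spikes exactly where dark meets bright. Later layers combine many such responses into the textures and parts described above.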

Deep learning is behind almost every AI breakthrough you've heard about recently:

  • Image recognition: Convolutional Neural Networks (CNNs)
  • Language understanding: Transformers (GPT, BERT, Claude)
  • Game playing: AlphaGo, AlphaZero
  • Art and music generation: Diffusion models, GANs
  • Speech recognition: Whisper, voice assistants

The catch? Deep learning is hungry. It needs massive amounts of data and computing power. Training a large language model can cost millions of dollars in GPU time. That's why deep learning only became practical when we got powerful GPUs, huge datasets (the internet!), and clever optimizations.
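
One reason for that appetite is how quickly the number of learnable values grows with network size. The sketch below counts weights and biases in a fully connected network; the layer sizes are hypothetical examples (784 would correspond to a 28x28 pixel image), and real architectures such as CNNs and Transformers count their parameters differently.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection
        total += outputs           # one bias per unit
    return total

print(count_parameters([2, 2, 1]))            # 9
print(count_parameters([784, 512, 512, 10]))  # 669706
```

Even this small hypothetical image classifier has well over half a million numbers to tune, and large language models have billions, each of which has to be adjusted over many passes through the training data.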

Deep Learning Intuition: A Tiny Neural Network

import math

def sigmoid(x):
    """Activation function: squashes any value into the range 0-1."""
    return 1 / (1 + math.exp(-x))

class TinyNeuralNetwork:
    """A minimal 2-layer neural network (deep learning!).

    Input (2 features) -> Hidden layer (2 neurons) -> Output (1 neuron)
    """

    def __init__(self):
        # Hand-picked illustrative weights (normally learned via backpropagation)
        self.w_hidden = [[0.5, -0.3], [0.8, 0.1]]  # 2 neurons, 2 inputs each
        self.b_hidden = [-0.1, 0.2]
        self.w_output = [0.6, -0.4]
        self.b_output = 0.1

    def forward(self, x):
        """Forward pass through the network."""
        # Hidden layer: each neuron combines inputs with weights
        hidden = []
        for i in range(2):
            z = sum(x[j] * self.w_hidden[i][j] for j in range(2))
            z += self.b_hidden[i]
            hidden.append(sigmoid(z))  # activation
        # Output layer: combines hidden neurons
        z = sum(hidden[j] * self.w_output[j] for j in range(2))
        z += self.b_output
        return sigmoid(z)

    def predict(self, x):
        prob = self.forward(x)  # probability of "Yes"
        label = "Yes" if prob >= 0.5 else "No"
        # Report confidence in the predicted label, not always in "Yes".
        confidence = prob if prob >= 0.5 else 1 - prob
        return f"{label} (confidence: {confidence:.1%})"

# A tiny neural net making predictions
nn = TinyNeuralNetwork()

# Each input is [feature1, feature2]
print("Input [1, 0]:", nn.predict([1, 0]))
print("Input [0, 1]:", nn.predict([0, 1]))
print("Input [1, 1]:", nn.predict([1, 1]))
print("Input [0, 0]:", nn.predict([0, 0]))
print("\n--- What makes this 'deep'? ---")
print("This net has 2 layers (hidden + output).")
print("Real deep learning uses 10-1000+ layers!")
print("More layers = more complex patterns learned.")
Output
Input [1, 0]: Yes (confidence: 54.2%)
Input [0, 1]: Yes (confidence: 52.8%)
Input [1, 1]: Yes (confidence: 52.9%)
Input [0, 0]: Yes (confidence: 54.1%)

--- What makes this 'deep'? ---
This net has 2 layers (hidden + output).
Real deep learning uses 10-1000+ layers!
More layers = more complex patterns learned.
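
The network above uses fixed weights. As a hedged sketch of how weights can be learned instead, the example below trains a single sigmoid neuron with gradient descent on a toy dataset (logical OR). The learning rate and epoch count are arbitrary choices for this illustration, and training a multi-layer network additionally needs backpropagation to pass the error back through the hidden layers.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def train_neuron(examples, learning_rate=0.5, epochs=2000):
    """Fit one sigmoid neuron with per-example gradient descent on log loss."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = sigmoid(z) - target  # gradient of log loss with respect to z
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Logical OR as a toy dataset: the target is 1 if either input is 1.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(or_examples)

for inputs, target in or_examples:
    prob = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
    print(inputs, "target:", target, "predicted:", round(prob))
```

The weights start at zero and are nudged after every example in the direction that reduces the error, which is the same basic idea that deep networks apply at a much larger scale.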

Key Metrics

  • Rule-Based AI (hand-coded; if-then rules): Fast, transparent, but doesn't scale to complex tasks.
  • Classical ML (learns from data; decision trees, SVMs, regression): Needs feature engineering, works great on structured data.
  • Deep Learning (learns features AND patterns; neural nets with 10-1000+ layers): Needs massive data and GPUs, excels at unstructured data (images, text, audio).

Note: When to use what? More complex isn't always better. If you're classifying data with 10 clear features, a decision tree (classical ML) might beat a deep neural network, and it'll be faster, cheaper, and easier to understand. Deep learning shines when the data is unstructured (images, text, audio) and massive. Don't use a sledgehammer to hang a picture frame. Pick the right tool for the job.

Putting It All Together

Let's revisit our nesting dolls with a concrete example, email:

  • AI approach (no ML): A programmer writes 500 rules. "If subject contains 'FREE', mark as spam. If sender is in contacts, mark as safe." This works... until spammers change tactics. Then you need to write 500 more rules.
  • ML approach: You feed the system 100,000 labeled emails. It learns that certain word combinations, sender patterns, and formatting cues predict spam. When spammers adapt, you retrain with new data.
  • DL approach: You feed the system millions of emails and let a neural network figure out everything β€” word patterns, sender behavior, even the tone of the writing. It discovers features no human would think to look for.

Each level is more powerful but also more complex, more data-hungry, and harder to interpret. The art of AI engineering is knowing which level of sophistication your problem actually needs.

Now when someone says "AI" when they mean "Machine Learning," you'll know the difference. And when someone calls a basic if-else chatbot "Deep Learning," you can politely set them straight.

Quick check

A programmer writes 200 if-then rules to diagnose diseases. Is this Machine Learning?