Generative AI

Create new images, text, and music from patterns

scope: Core Concept, difficulty: Intermediate

The Artist Who Studied 1,000 Paintings

Imagine an art student who spent years studying a thousand paintings — impressionist landscapes, Renaissance portraits, abstract expressionism. The student hasn't memorized any specific painting, but they've absorbed the patterns: how light falls on water, how shadows define a face, how colors create mood.

Now the student picks up a brush and paints something entirely new. It's not a copy of any painting they studied. It's original. But it captures the essence of what they learned.

That's generative AI — models that learn patterns from existing data and use those patterns to create new content. Text, images, music, video, code — if it can be represented as data, a generative model can learn to produce more of it.

Two Approaches to Generation

Most of today's best-known generative models fall into one of two families (GANs and VAEs, covered below, are other approaches):

Autoregressive models: generate one piece at a time. GPT writes text one token at a time, with each token conditioned on all previous tokens, like writing a story one word at a time.
Diffusion models: start with pure noise and gradually refine it into a clear output. DALL-E 2 and Stable Diffusion work this way, like a sculptor revealing a statue from a block of marble.

Large Language Models (LLMs)

The most famous generative models today are LLMs like GPT-4, Claude, and Gemini. Their core mechanism is deceptively simple: predict the next token.

Given "The capital of France is", the model outputs a probability distribution over all possible next tokens. "Paris" gets a very high probability. "pizza" gets a very low one. The model generates text by sampling from these probabilities, one token at a time.
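The step from raw model scores to a sampled token can be sketched in a few lines. This is a minimal illustration, not how any particular LLM is implemented: the four-token vocabulary and the `logits` values are made up, and `softmax` with a `temperature` parameter is one common way to turn scores into probabilities.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the prompt "The capital of France is"
tokens = ["Paris", "Lyon", "London", "pizza"]
logits = [9.0, 4.0, 3.0, -2.0]

probs = softmax(logits)

# Sample one token in proportion to its probability
rng = random.Random(0)
next_token = rng.choices(tokens, weights=probs, k=1)[0]

for tok, p in zip(tokens, probs):
    print(f"{tok:>7}: {p:.4f}")
print("sampled:", next_token)
```

Raising `temperature` above 1 flattens the distribution, so unlikely tokens are sampled more often. Lowering it toward 0 makes the model pick the top token almost every time.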

What's remarkable is that this simple objective, next-token prediction over trillions of tokens of training text, produces systems that can write essays, solve math problems, write code, translate languages, and hold conversations. Much of that capability appears to emerge from scale: larger models trained on more data.

Image Generation: From GANs to Diffusion

Image generation has its own story:

GANs (2014): two networks compete. A generator creates fake images. A discriminator tries to tell real from fake. They push each other to improve, like a counterfeiter versus a detective.
VAEs: learn a compressed representation of images and generate new ones by sampling from that compressed space.
Diffusion models (2020+): the approach behind most current state-of-the-art image generators. They learn to reverse a noise-adding process: start with noise, denoise step by step, and a clear image emerges. In practice this tends to produce higher-quality, more diverse images than GANs.
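The noise-adding process that diffusion models learn to reverse can be sketched on a toy example. The code below is a simplified illustration under stated assumptions: a four-value list stands in for an image, the linear noise schedule and its `beta_start` and `beta_end` values are illustrative choices, and `recover` cheats by using the true noise. In a real diffusion model, a trained neural network estimates that noise instead.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after each step of a linear schedule."""
    bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        bars.append(running)
    return bars

def add_noise(x0, a_bar, rng):
    """Noisy version of x0: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps

def recover(xt, eps, a_bar):
    """Invert add_noise given the noise (a trained network would supply an estimate of eps)."""
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar) for x, e in zip(xt, eps)]

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # toy "image" of four pixels
bars = alpha_bars(1000)

# The further along the schedule, the less of the original signal survives
for t in (0, 499, 999):
    xt, _ = add_noise(x0, bars[t], rng)
    print(f"t={t:>3}  signal scale={math.sqrt(bars[t]):.3f}  x_t={[round(v, 2) for v in xt]}")

# With the exact noise in hand, the original values can be recovered
xt, eps = add_noise(x0, bars[499], rng)
x0_hat = recover(xt, eps, bars[499])
print("recovered:", [round(v, 6) for v in x0_hat])
```

The point of the sketch is the last step: if the noise is known, undoing it is simple arithmetic, so the learning problem reduces to predicting the noise well.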

Generative AI Concepts in Code

import random

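# === Simple text generation with Markov chains ===
# A character-level Markov chain is one of the simplest generative models for text:
# it picks the next character based only on the previous `order` characters.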

def build_markov_chain(text, order=2):
    """Build a character-level Markov chain."""
    chain = {}
    for i in range(len(text) - order):
        key = text[i:i+order]
        next_char = text[i+order]
        if key not in chain:
            chain[key] = []
        chain[key].append(next_char)
    return chain

def generate_text(chain, order, length=100):
    """Generate text using the Markov chain."""
    key = random.choice(list(chain.keys()))
    result = list(key)
    for _ in range(length):
        if key in chain:
            next_char = random.choice(chain[key])
            result.append(next_char)
            key = ''.join(result[-order:])
        else:
            break
    return ''.join(result)

# Train on some sample text
training_text = (
    "the cat sat on the mat the cat ate the rat "
    "the dog sat on the log the dog ate the frog "
    "the bird sat on the word the bird flew to the sky "
) * 10  # Repeat for more data

random.seed(42)
chain = build_markov_chain(training_text, order=3)

print("Generated text (Markov chain):")
for i in range(3):
    text = generate_text(chain, 3, length=50)
    print(f"  {i+1}. {text[:50]}")

print("\n--- How LLMs work (conceptually) ---")
# Hand-picked probabilities for illustration only; a real LLM computes them
# with a neural network over a vocabulary of many thousands of tokens.
print("Input: 'the cat sat on the'")
print("Next-token probabilities:")
probs = {"mat": 0.45, "cat": 0.20, "dog": 0.15, "sat": 0.10, "the": 0.05, "ate": 0.03, "on": 0.02}
for word, p in probs.items():
    bar = '#' * int(p * 40)  # scale each probability to a bar of up to 40 characters
    print(f"  {word:>5}: {p:.0%} {bar}")

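Output (the Markov samples are illustrative; a real run starts from a random key and prints different text)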

Generated text (Markov chain):
  1. the cat sat on the mat the dog ate the frog 
  2. the bird flew to the sky the cat ate the rat
  3. the dog sat on the log the bird sat on the w

--- How LLMs work (conceptually) ---
Input: 'the cat sat on the'
Next-token probabilities:
    mat: 45% ##################
    cat: 20% ########
    dog: 15% ######
    sat: 10% ####
    the: 5% ##
    ate: 3% #
     on: 2%

Note: Are generative models creative? This is one of the biggest debates in AI today. Generative models produce novel outputs: combinations never seen in the training data. But they have no intentions or emotions, and whether they understand anything in a human sense is itself disputed. At minimum, they are pattern-completion engines of extraordinary sophistication. Whether that counts as "creativity" depends on your definition, and reasonable people disagree.

The Generative AI Landscape

Generative AI is rapidly expanding beyond text and images:

Code generation: Copilot, Cursor, and Claude write code from natural language descriptions
Music: Suno and Udio generate songs with vocals, instruments, and lyrics
Video: Sora and Runway generate video clips from text prompts
3D models: Point-E and DreamFusion generate 3D objects from text
Science: AlphaFold predicts 3D protein structures from amino-acid sequences, accelerating drug discovery research

The common thread: learn the statistical patterns in existing data, then use those patterns to generate new instances. The better the model and the more data, the more convincing the output.
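The "learn the patterns, then generate" loop can be shown at its smallest scale with a single bell curve. This is a deliberately tiny sketch: the `data` values are made-up heights, and fitting a normal distribution stands in for training a model with billions of parameters.

```python
import random
import statistics

# Hypothetical "training data": ten heights in cm
data = [158.0, 162.5, 165.0, 170.2, 171.8, 174.0, 176.5, 180.1, 183.3, 189.0]

# "Training": estimate the parameters of a normal distribution
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# "Generation": draw new values from the fitted distribution
rng = random.Random(0)
samples = [rng.gauss(mu, sigma) for _ in range(5)]

print(f"learned mean={mu:.2f}, std={sigma:.2f}")
print("new samples:", [round(s, 1) for s in samples])
```

The samples are new numbers that were not in the training data, yet they look like plausible heights. Large generative models follow the same two steps with a far richer description of the data than a mean and a standard deviation.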

Quick check

What is the core training objective of most Large Language Models?
