Generative AI
The Artist Who Studied 1,000 Paintings
Imagine an art student who spent years studying a thousand paintings: impressionist landscapes, Renaissance portraits, abstract expressionism. The student hasn't memorized any specific painting, but they've absorbed the patterns: how light falls on water, how shadows define a face, how colors create mood.
Now the student picks up a brush and paints something entirely new. It's not a copy of any painting they studied. It's original. But it captures the essence of what they learned.
That's generative AI: models that learn patterns from existing data and use those patterns to create new content. Text, images, music, video, code: if it can be represented as data, a generative model can learn to produce more of it.
Two Approaches to Generation
Two families of generative models dominate today:
- Autoregressive models: generate one piece at a time. GPT writes text one token at a time, each token conditioned on all previous tokens. Like writing a story one word at a time.
- Diffusion models: start with pure noise and gradually refine it into a clear output. DALL-E 2 and Stable Diffusion work this way. Like a sculptor revealing a statue from a block of marble.
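The autoregressive idea can be sketched with a toy bigram model. This is a drastic simplification: each new word here depends only on the single previous word, whereas GPT conditions on all previous tokens. The corpus, the function names `train_bigram` and `generate`, and the random seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def train_bigram(text):
    """Record, for every word, the words that followed it in the corpus."""
    followers = defaultdict(list)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        followers[prev].append(nxt)
    return followers


def generate(followers, start, max_tokens, rng):
    """Generate one token at a time, each conditioned on the previous token."""
    output = [start]
    for _ in range(max_tokens - 1):
        candidates = followers.get(output[-1])
        if not candidates:  # dead end: nothing ever followed this word
            break
        output.append(rng.choice(candidates))
    return output


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram(corpus)
sample = generate(model, "the", 8, random.Random(0))
print(" ".join(sample))
```

Every adjacent word pair in the output was seen in the corpus, yet the sentence as a whole need not appear there, which is the same "new but pattern-consistent" behavior described above on a very small scale.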
Large Language Models (LLMs)
The most famous generative models today are LLMs like GPT-4, Claude, and Gemini. Their core mechanism is deceptively simple: predict the next token.
Given "The capital of France is", the model outputs a probability distribution over all possible next tokens. "Paris" gets a very high probability. "pizza" gets a very low one. The model generates text by sampling from these probabilities, one token at a time.
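A minimal sketch of that sampling step, assuming a four-token vocabulary and made-up scores (logits). A real model produces scores over tens of thousands of tokens; the numbers and the `temperature` knob below are illustrative assumptions, not values from any actual model.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the token after "The capital of France is".
vocab = ["Paris", "Lyon", "London", "pizza"]
logits = [9.0, 4.0, 3.0, -2.0]

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"{token:>7}: {p:.4f}")

# Sample the next token in proportion to its probability.
rng = random.Random(42)
next_token = rng.choices(vocab, weights=probs, k=1)[0]
print("sampled:", next_token)
```

Dividing the scores by a temperature above 1 flattens the distribution, so unlikely tokens are picked more often; a temperature below 1 sharpens it toward the top choice.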
What's remarkable is that this simple objective (next-token prediction, trained on trillions of words) produces systems that can write essays, solve math problems, write code, translate languages, and hold conversations. These capabilities emerge largely from scale: more parameters, more training data, more compute.
Image Generation: From GANs to Diffusion
Image generation has its own story:
- GANs (2014): two networks compete. A generator creates fake images. A discriminator tries to tell real from fake. They push each other to improve, like a counterfeiter and a detective.
- VAEs: learn a compressed (latent) representation of images and generate new ones by sampling from that compressed space.
- Diffusion models (2020+): the current state of the art. They learn to reverse a noise-adding process. Start with noise, iteratively denoise, and a clear image emerges. This tends to produce higher-quality, more diverse images than GANs.
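The noise-adding process and its reversal can be sketched on a single number standing in for one pixel. This assumes the common DDPM-style closed form, where the noisy value is a weighted blend of the clean value and Gaussian noise. The linear schedule, the step count, and the function names are illustrative assumptions. The sketch also cheats in one important way: a trained network would have to predict the noise, while here the true noise is reused, so it shows only the arithmetic, not the learning. Real samplers also denoise over many small steps rather than in one shot.

```python
import math
import random


def alpha_bars(betas):
    """Cumulative product of (1 - beta): the share of signal surviving at each step."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result


def add_noise(x0, alpha_bar, noise):
    """Forward process: blend the clean value with Gaussian noise."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * noise


def estimate_clean(xt, alpha_bar, predicted_noise):
    """Invert the blend, given a prediction of the noise that was added."""
    return (xt - math.sqrt(1.0 - alpha_bar) * predicted_noise) / math.sqrt(alpha_bar)


rng = random.Random(0)
steps = 50
# Assumed linear noise schedule from 0.001 to 0.2.
betas = [0.001 + (0.2 - 0.001) * t / (steps - 1) for t in range(steps)]
abar = alpha_bars(betas)

x0 = 0.8                   # one "pixel" of a clean image
eps = rng.gauss(0.0, 1.0)  # the noise mixed in
xt = add_noise(x0, abar[-1], eps)

# Stand-in for a trained denoiser: reuse the true noise as the "prediction".
recovered = estimate_clean(xt, abar[-1], eps)
print(f"signal kept after {steps} steps: {abar[-1]:.4f}")
print(f"noisy: {xt:.4f}  recovered: {recovered:.4f}")
```

By the last step almost none of the original signal survives, which is why generation can start from pure noise: the quality of the result depends entirely on how well the network predicts the noise to remove.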
Generative AI Concepts in Code
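As a minimal sketch of the core idea (learn patterns from data, then produce new data that fits them), the example below fits about the simplest generative model there is: a single Gaussian. The training values, the variable names, and the seed are illustrative assumptions; real generative models learn far richer distributions with millions or billions of parameters.

```python
import random
import statistics

# Hypothetical training data: heights in centimetres.
training_data = [158.0, 162.5, 165.0, 167.5, 170.0, 171.0, 174.5, 178.0, 181.0, 185.5]

# "Training": summarize the data with two learned parameters.
mu = statistics.mean(training_data)
sigma = statistics.stdev(training_data)

# "Generation": draw new values from the learned distribution.
rng = random.Random(7)
generated = [rng.gauss(mu, sigma) for _ in range(5)]

print(f"learned mean={mu:.1f}, std={sigma:.1f}")
print("generated:", [round(x, 1) for x in generated])
```

The generated values are not copies of the training data, but they are plausible under the pattern the model extracted from it.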
The Generative AI Landscape
Generative AI is rapidly expanding beyond text and images:
- Code generation: Copilot, Cursor, and Claude write code from natural language descriptions
- Music: Suno and Udio generate songs with vocals, instruments, and lyrics
- Video: Sora and Runway generate video clips from text prompts
- 3D models: Point-E and DreamFusion generate 3D objects from text
- Science: AlphaFold predicts 3D protein structures, helping to accelerate drug discovery
The common thread: learn the statistical patterns in existing data, then use those patterns to generate new instances. The better the model and the more data, the more convincing the output.