What are LLMs? (10 min read)

What is an LLM?

A super-reader that has devoured every book ever written and now finishes your sentences
Scope: Foundational | Difficulty: Beginner

The Super-Reader

Imagine someone who has read every book in the library. Every Wikipedia article. Every Reddit thread. Every news story, poem, recipe, and love letter ever posted online. Trillions and trillions of words.

Now imagine that this super-reader didn't memorize everything word-for-word. Instead, they noticed patterns. They learned that after "once upon a," the next word is almost always "time." They learned that recipe instructions usually start with a verb. They learned that after someone says "I'm sorry for your," the next word is probably "loss."

That super-reader is an LLM: a Large Language Model.
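To make "noticing patterns" a bit more concrete, here is a minimal sketch that counts which word tends to follow which in a tiny made-up corpus. It is only an illustration: real LLMs learn far richer patterns with neural networks, and the corpus and function names below are invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
corpus = (
    "once upon a time there was a king . "
    "once upon a time there was a queen . "
    "once upon a hill stood a castle ."
)

def build_next_word_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the word seen most often after `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

counts = build_next_word_counts(corpus)
print(most_likely_next(counts, "upon"))  # a
print(most_likely_next(counts, "a"))     # time
```

Even this crude counter "learns" that "time" is the likeliest word after "a" in its little library. An LLM does something loosely similar, but over trillions of words and with much longer context.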

What Does LLM Stand For?

Let's break it down:

  • Large: These models are enormous. GPT-3 has 175 billion parameters (think of these as tiny dials that got tuned during training), and newer models are believed to be larger still. The biggest ones need warehouses full of specialized computers to train and run.
  • Language: They work with human language: English, Spanish, Python code, mathematical notation, anything made of text.
  • Model: A model is a simplified representation of something complex. A globe is a model of Earth. An LLM is a model of how language works.

Put together: an LLM is a massive mathematical model that has learned the patterns of human language by reading an enormous amount of text.
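To get a feel for what "large" means, here is a back-of-the-envelope sketch of how much storage the parameters alone would take. It assumes each parameter is stored as a 2-byte (16-bit) number, which is a common choice but still an assumption here, and it uses the published parameter counts for GPT-2 (1.5 billion) and GPT-3 (175 billion). The function name is made up for this example.

```python
def weights_size_gb(num_parameters, bytes_per_parameter=2):
    """Approximate size of the weights alone, in gigabytes (1 GB = 10**9 bytes).

    Assumes every parameter takes `bytes_per_parameter` bytes (2 = 16-bit).
    """
    return num_parameters * bytes_per_parameter / 10**9

gpt2_gb = weights_size_gb(1_500_000_000)
gpt3_gb = weights_size_gb(175_000_000_000)

print(f"GPT-2: about {gpt2_gb:.0f} GB")  # GPT-2: about 3 GB
print(f"GPT-3: about {gpt3_gb:.0f} GB")  # GPT-3: about 350 GB
```

Under that assumption, GPT-2's dials would fit on a laptop, while GPT-3's would not fit in the memory of any single consumer machine. That is before counting the extra memory needed to actually run the model.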

The World's Smartest Autocomplete

You know the autocomplete on your phone? When you type "See you" and it suggests "later" or "tomorrow"? An LLM is that, but cranked up to a million.

Your phone's autocomplete might look at the last 3-5 words. An LLM looks at thousands of words at once. Your phone picks from common phrases. An LLM picks from a deep understanding of grammar, facts, stories, logic, code, and even humor.

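At its core, an LLM does one thing: given all the text so far, predict which word is most likely to come next. (Strictly speaking it predicts tokens, which can be whole words or pieces of words, but "word" is close enough for now.) Then it adds that word to the text, and predicts the next next word. And the next. And the next. That's how it writes entire paragraphs, essays, and even code.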

This is called autoregressive generation, a fancy way of saying "one word at a time, each word depends on all the words before it."
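The loop described above can be sketched in a few lines. This is a toy, not how a real LLM is built: the "model" here is a hand-written lookup table (`TOY_MODEL`, invented for this example) that maps the whole text so far to scores for possible next words. A real LLM computes those scores with a neural network, and often picks randomly among the likely words instead of always taking the top one.

```python
import math

# Hypothetical toy "model": the full text so far -> raw scores for next words.
TOY_MODEL = {
    ("see",): {"you": 2.0, "the": 1.0},
    ("see", "you"): {"later": 1.5, "tomorrow": 1.0, "soon": 0.2},
    ("see", "you", "later"): {"<end>": 3.0, "alligator": 1.0},
}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {word: math.exp(score - top) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

def predict_next(words):
    """Return the most probable next word given ALL the words so far."""
    scores = TOY_MODEL.get(tuple(words), {"<end>": 0.0})
    probabilities = softmax(scores)
    return max(probabilities, key=probabilities.get)

def generate(prompt, max_new_words=10):
    """Autoregressive loop: predict a word, append it, repeat."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word == "<end>":  # the toy model's signal to stop
            break
        words.append(next_word)
    return " ".join(words)

print(generate("See you"))  # see you later
```

Notice that `predict_next` receives the entire list of words each time, not just the last one. That is the "each word depends on all the words before it" part.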

But Does It Actually Understand?

This is the great debate of our time. There are two camps:

  • Team "Stochastic Parrot": LLMs are just very sophisticated pattern matchers. They don't understand anything. They're like a parrot that has overheard every conversation in the world. It can repeat things that sound right, but there's nobody home.
  • Team "Emergent Understanding": When you get enough patterns and enough scale, something that looks a lot like understanding emerges. An LLM can solve logic puzzles, write working code, and explain physics. Is that really "just" pattern matching?

The truth? Nobody knows for sure. What we do know is that LLMs are incredibly useful, regardless of whether they "truly" understand. And that's what matters for most people.

Your First LLM API Call

# Using OpenAI's API to talk to an LLM
from openai import OpenAI

client = OpenAI()  # uses the OPENAI_API_KEY environment variable

# The LLM predicts the next words based on your prompt
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain LLMs to a 10-year-old in 2 sentences."}
    ],
)

print(response.choices[0].message.content)
Example output (the exact wording will vary from run to run)
An LLM is like a super-smart autocomplete that has read almost
everything on the internet. When you ask it a question, it guesses
the best words to say next, one at a time, until it forms a
complete answer!
Note: Size matters, a lot. GPT-2 (2019) had 1.5 billion parameters and could write okay-ish paragraphs. GPT-3 (2020) had 175 billion parameters and could write essays. GPT-4 (2023) is rumored to have over a trillion parameters (OpenAI has not published the figure) and scored well enough to pass a simulated bar exam. As models get larger and see more data, they don't just get a little better: they can appear to gain entirely new abilities. Researchers call these emergent capabilities, and they're one of the most fascinating (and debated) phenomena in AI.

What Can LLMs Actually Do?

The list keeps growing, but here's a sample:

  • Write: Essays, emails, stories, poems, marketing copy, tweets
  • Code: Write, debug, and explain programs in dozens of languages
  • Translate: Between human languages, and even between programming languages
  • Summarize: Condense long documents into short summaries
  • Reason: Solve math problems, logic puzzles, and standardized tests
  • Chat: Hold conversations, answer questions, role-play characters
  • Analyze: Extract insights from data, research papers, legal documents

What Can't LLMs Do?

Just as important:

  • They hallucinate: Sometimes they confidently state things that are completely wrong. They'll invent fake citations, make up statistics, or describe events that never happened.
  • They have no memory between conversations: Each chat starts fresh (unless the system adds memory features on top).
  • They can't access the real world: They can't browse the web, check today's weather, or open your files (unless given tools to do so).
  • They don't have opinions or feelings: When an LLM says "I think," it's a figure of speech learned from training data, not an actual thought.
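The "no memory between conversations" point is easy to see in code. The sketch below uses a hypothetical stand-in called `fake_llm` instead of a real model, so it runs without an API key. The only thing it is meant to show is that the model can use what is in the list of messages it is handed, and nothing else. The role/content message format mirrors the API example earlier.

```python
def fake_llm(messages):
    """Hypothetical stand-in for a real model call.

    Like a real LLM, it can only "see" what is inside `messages`.
    """
    user_text = " ".join(
        m["content"] for m in messages if m["role"] == "user"
    ).lower()
    last_message = messages[-1]["content"].lower()
    if "what is my name" in last_message:
        if "my name is ada" in user_text:
            return "Your name is Ada."
        return "I don't know your name."
    return "Nice to meet you!"

# Conversation 1: the app keeps a history and re-sends it every turn.
history = [{"role": "user", "content": "Hi, my name is Ada."}]
history.append({"role": "assistant", "content": fake_llm(history)})
history.append({"role": "user", "content": "What is my name?"})
with_history = fake_llm(history)

# Conversation 2: a fresh chat, so nothing from conversation 1 is sent.
fresh = fake_llm([{"role": "user", "content": "What is my name?"}])

print(with_history)  # Your name is Ada.
print(fresh)         # I don't know your name.
```

Any "memory" you see in a chat app comes from the app re-sending earlier messages (or stored notes) along with your new one.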

The LLM Family Tree

There are many LLMs in the world today. Here are the major ones:

  • GPT series (OpenAI): Models such as GPT-3.5, GPT-4, and GPT-4o have powered ChatGPT. The most well-known LLMs.
  • Claude (Anthropic): Built with a focus on safety and helpfulness. Known for handling very long documents.
  • Gemini (Google): Google's multimodal AI that can handle text, images, audio, and video.
  • LLaMA (Meta): An openly released family of models whose weights researchers and companies can download, use, and modify under Meta's license terms.
  • Mistral (Mistral AI): A European company making efficient open-source models that punch above their weight.

The landscape is changing fast. New models launch every few months, each one pushing the boundaries of what's possible.

Challenge

Quick check

What does an LLM fundamentally do at each step of text generation?
