Fundamentals | 10 min read

What is System Design?

The blueprint behind every app you use
Scope: Foundational | Difficulty: Beginner

Why System Design Matters

Imagine you're building a treehouse. You could just start nailing boards together and hope for the best. But if you want it to hold 5 friends, survive a storm, and have a rope ladder — you need a plan.

System design is that plan, but for software. It's the art of deciding how to build something before you write a single line of code. Which database should you use? How will millions of users connect at the same time? What happens when a server crashes at 3 AM?

Every app you love — Instagram, YouTube, Google Maps — started with someone sketching boxes and arrows on a whiteboard. That sketch is system design.

Figure: A single server handles all requests. As traffic grows, the server struggles.

Functional vs Non-Functional Requirements

Before designing anything, you need to know what you're building and how well it needs to work. These split into two buckets:

  • Functional requirements — What the system does. "Users can upload photos." "Users can send messages." Think of these as the features on the box.
  • Non-functional requirements — How the system behaves. "The page loads in under 200ms." "The system handles 10 million users." "Data is never lost." Think of these as the fine print.

Non-functional requirements are where interviews get interesting. Anyone can say "users can post tweets." The real challenge is making that work for 500 million users with 99.99% uptime.

Common non-functional requirements include:

  • Scalability — Can it grow?
  • Availability — Is it always up?
  • Latency — Is it fast?
  • Consistency — Does everyone see the same data?
  • Durability — Is data safe even if servers explode?
Note (interview tip): Always spend the first 3-5 minutes of a system design interview clarifying requirements. Ask questions like "How many users?", "Which matters more: speed or consistency?", and "Do we need real-time updates?" This shows maturity and keeps you from designing the wrong thing.
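
To make an availability target such as the 99.99% mentioned above more concrete, it helps to convert it into a yearly downtime budget. The following is a minimal sketch; the function name and the 365-day year are assumptions made for illustration, not part of any standard.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return round(unavailable_fraction * MINUTES_PER_YEAR, 1)


for target in (99.0, 99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {minutes} minutes of downtime per year")
```

Under these assumptions, 99.99% uptime leaves roughly 52.6 minutes of downtime per year, which is why each extra "nine" is a much harder engineering target than it looks.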

The Design Process

Great system design follows a repeatable recipe. Think of it like building with LEGO — you follow the steps, but you still get to be creative.

Step 1: Understand the problem. What are we building? Who uses it? What are the most important features?

Step 2: Estimate the scale. How many users? How much data? How many requests per second? (More on this below.)

Step 3: Define the API. What endpoints or interfaces will the system expose? This is the contract between your system and the outside world.

Step 4: Design the high-level architecture. Draw the big boxes — clients, servers, databases, caches, load balancers. Show how data flows between them.

Step 5: Deep-dive into components. Pick the most interesting or tricky part and zoom in. How does the database schema look? What caching strategy do you use?

Step 6: Address bottlenecks. What breaks first when traffic spikes? Where are the single points of failure? How do you handle them?

Figure: The design process
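
Step 3 (define the API) can be sketched in code before any real implementation exists. The example below is only an illustration: `PhotoService`, `Photo`, and both method names are hypothetical, chosen to match the "users can upload photos" requirement mentioned earlier. The in-memory class exists only to show that the contract can be exercised.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Photo:
    photo_id: int
    owner_id: int
    caption: str


class PhotoService(Protocol):
    """Hypothetical contract: what the system promises, not how it works."""

    def upload_photo(self, owner_id: int, caption: str) -> Photo: ...

    def get_feed(self, user_id: int, limit: int = 20) -> list[Photo]: ...


class InMemoryPhotoService:
    """Toy implementation, just enough to exercise the contract."""

    def __init__(self) -> None:
        self._photos: list[Photo] = []

    def upload_photo(self, owner_id: int, caption: str) -> Photo:
        photo = Photo(photo_id=len(self._photos) + 1, owner_id=owner_id, caption=caption)
        self._photos.append(photo)
        return photo

    def get_feed(self, user_id: int, limit: int = 20) -> list[Photo]:
        # Newest first; a real feed would filter by who user_id follows.
        return list(reversed(self._photos))[:limit]


service: PhotoService = InMemoryPhotoService()
service.upload_photo(owner_id=1, caption="treehouse")
service.upload_photo(owner_id=2, caption="rope ladder")
print([photo.caption for photo in service.get_feed(user_id=1, limit=1)])
```

Writing the contract first keeps the later steps honest: the architecture and the deep-dive only need to satisfy these two calls, whatever databases or caches sit behind them.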

Back-of-the-Envelope Estimation

System design loves big numbers. You'll often need to estimate things like "how much storage does YouTube need per day?" This is called back-of-the-envelope estimation — quick, rough math to guide your design decisions.

Here are the key numbers you'll work with:

  • DAU (Daily Active Users) — How many people use the app each day.
  • QPS (Queries Per Second) — How many requests hit your servers each second. If 10M users each make 10 requests/day: 10M × 10 / 86400 ≈ 1,157 QPS.
  • Storage — How much disk space you need. If each user uploads one 2MB photo/day with 10M DAU: 10M × 2MB = 20TB/day.
  • Bandwidth — How much data flows in/out. 20TB/day ÷ 86400 ≈ 231 MB/s.
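
The arithmetic in the list above can be checked with a few lines of Python. This is a sketch that simply replays the example numbers (10M DAU, 10 requests and one 2 MB photo per user per day). It assumes decimal units, where 1 MB is 10^6 bytes and 1 TB is 10^12 bytes. The variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

dau = 10_000_000              # daily active users (from the example above)
requests_per_user = 10        # requests per user per day
photo_size_bytes = 2 * 10**6  # one 2 MB photo per user per day (decimal units)

avg_qps = dau * requests_per_user / SECONDS_PER_DAY
daily_storage_bytes = dau * photo_size_bytes
bandwidth_bytes_per_sec = daily_storage_bytes / SECONDS_PER_DAY

print(f"QPS:       {avg_qps:,.0f}")
print(f"Storage:   {daily_storage_bytes / 10**12:.0f} TB/day")
print(f"Bandwidth: {bandwidth_bytes_per_sec / 10**6:.0f} MB/s")
```

These are daily averages. Real traffic is bursty, so peak values will be noticeably higher.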

Some handy numbers to memorize:

  • 1 day = 86,400 seconds (round to ~100K for quick math)
  • 1 million requests/day ≈ 12 QPS
  • 1 ASCII char = 1 byte (UTF-8 uses 1-4 bytes per character), 1 int = 4 bytes, 1 long/timestamp = 8 bytes
  • A typical tweet-sized text = ~300 bytes
  • A typical image = 200KB–2MB
  • A typical video (1 min) = 50–100MB
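
Rounding a day to ~100K seconds trades accuracy for speed, and it is worth knowing how much. The sketch below compares the exact and shortcut results for 1 million requests per day. The function names are made up for this illustration.

```python
def exact_qps(requests_per_day: int) -> float:
    return requests_per_day / 86_400


def shortcut_qps(requests_per_day: int) -> float:
    # Mental-math version: pretend a day has 100,000 seconds.
    return requests_per_day / 100_000


requests_per_day = 1_000_000
exact = exact_qps(requests_per_day)
rough = shortcut_qps(requests_per_day)
underestimate = (exact - rough) / exact

print(f"exact: {exact:.1f} QPS, shortcut: {rough:.1f} QPS")
print(f"shortcut underestimates by {underestimate:.0%}")
```

The shortcut comes out about 14% low. That is fine for ballpark sizing, as long as you remember which direction the error goes.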

Quick Estimation Helper

def estimate_qps(dau: int, requests_per_user: int) -> dict[str, float]:
    """Estimate average and peak queries per second from daily active users."""
    seconds_per_day = 86_400
    qps = (dau * requests_per_user) / seconds_per_day
    peak_qps = qps * 3  # peak is usually 2-5x average
    return {"avg_qps": round(qps, 1), "peak_qps": round(peak_qps, 1)}


def estimate_storage(dau: int, data_per_user_bytes: int, years: int) -> str:
    """Estimate total storage needed over time, using 1024-based units."""
    daily = dau * data_per_user_bytes
    total = daily * 365 * years
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    idx = 0
    val = float(total)
    # Divide by 1024 until the value fits the current unit.
    while val >= 1024 and idx < len(units) - 1:
        val /= 1024
        idx += 1
    return f"{val:.1f} {units[idx]}"


# Example: Twitter-like app
print(estimate_qps(dau=300_000_000, requests_per_user=20))
# {'avg_qps': 69444.4, 'peak_qps': 208333.3}
print(estimate_storage(
    dau=300_000_000,
    data_per_user_bytes=300,  # ~300 bytes per tweet, one tweet per user per day
    years=5,
))
# 149.4 TB

Output
{'avg_qps': 69444.4, 'peak_qps': 208333.3}
149.4 TB
Note: Think of estimation like planning a road trip. You don't need to know the exact distance to the mile — you just need to know if it's a 2-hour drive or a 20-hour drive. That difference changes whether you pack snacks or book a hotel. Same idea: your estimates don't need to be perfect, they need to be in the right ballpark to guide design decisions.
Figure: Break the system apart to distribute the work, then add resilience with a cache, a CDN, and queues.

Common Building Blocks

Almost every system design uses the same set of LEGO pieces. Learning these is like learning your ABCs — once you know them, you can spell anything.

  • Load Balancer — Distributes traffic across multiple servers so no single server gets overwhelmed.
  • Cache — Fast temporary storage (usually in memory) that saves you from hitting the slower database on every request.
  • Database — Where your data lives permanently. SQL databases suit structured, relational data; NoSQL databases suit flexible or schema-less data.
  • Message Queue — A line where tasks wait to be processed, so your system doesn't choke during traffic spikes.
  • CDN (Content Delivery Network) — Servers spread around the world that serve static files (images, videos) from a location near the user.
  • API Gateway — The front door of your system that handles authentication, rate limiting, and routing.
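
As a small taste of how one of these blocks behaves, here is a minimal sketch of a load balancer that uses round-robin, one of several common distribution strategies. The class and the server names are hypothetical. A real load balancer also tracks server health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(servers)

    def next_server(self) -> str:
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical server names
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
```

Six requests end up spread evenly, two per server, so no single machine takes the whole load.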

In the following lessons, we'll deep-dive into each of these. By the end, you'll be able to combine them like a pro to design systems that handle millions of users.

Key Metrics

| Operation | Notes | Typical latency | Complexity |
|---|---|---|---|
| Read from cache | RAM-speed lookups | < 1 ms | \(O(1)\) |
| Read from SSD | Fast disk | ~0.1 ms | \(O(1)\) |
| Read from HDD | Spinning disk seek time | ~5 ms | \(O(1)\) |
| Network round-trip (same datacenter) | Local network | ~0.5 ms | - |
| Network round-trip (cross-continent) | Speed of light limit | ~100 ms | - |
| Database query (indexed) | Depends on DB and data size | ~1-10 ms | \(O(\log n)\) |
| Database query (full scan) | Avoid in production | ~100-1000 ms | \(O(n)\) |
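
These latencies explain why caches matter so much. The sketch below estimates the average read latency for a given cache hit ratio. It assumes a cache read costs 1 ms and an indexed database query costs 10 ms (the upper ends of the ranges above), and that a miss pays for both. The function name and parameters are illustrative.

```python
def average_read_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Expected latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # A miss pays for the cache lookup and then the database query.
    miss_ms = cache_ms + db_ms
    return hit_ratio * cache_ms + (1 - hit_ratio) * miss_ms


for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    latency = average_read_latency_ms(hit_ratio, cache_ms=1.0, db_ms=10.0)
    print(f"hit ratio {hit_ratio:.0%}: ~{latency:.1f} ms per read")
```

With these assumed numbers, a 90% hit ratio brings the average read from 11 ms down to about 2 ms.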
