Fundamentals · 10 min read

What is System Design?

The blueprint behind every app you use

scope: Foundational · difficulty: Beginner

Why System Design Matters

Imagine you're building a treehouse. You could just start nailing boards together and hope for the best. But if you want it to hold 5 friends, survive a storm, and have a rope ladder — you need a plan.

System design is that plan, but for software. It's the art of deciding how to build something before you write a single line of code. Which database should you use? How will millions of users connect at the same time? What happens when a server crashes at 3 AM?

Every app you love — Instagram, YouTube, Google Maps — started with someone sketching boxes and arrows on a whiteboard. That sketch is system design.

▸ A single server handling all requests

▸ Traffic grows — the server struggles

Functional vs Non-Functional Requirements

Before designing anything, you need to know what you're building and how well it needs to work. These split into two buckets:

Functional requirements — What the system does. "Users can upload photos." "Users can send messages." Think of these as the features on the box.
Non-functional requirements — How well the system does it. "The page loads in under 200 ms." "The system handles 10 million users." "Data is never lost." Think of these as the fine print.

Here's a trick: non-functional requirements are where interviews get interesting. Anyone can say "users can post tweets." The real challenge is making it work for 500 million users with 99.99% uptime.

Common non-functional requirements include:

Scalability — Can it grow?
Availability — Is it always up?
Latency — Is it fast?
Consistency — Does everyone see the same data?
Durability — Is data safe even if servers explode?
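To make an availability target concrete, it helps to turn the percentage into a downtime budget. Here is a minimal sketch; the function name `allowed_downtime_minutes` is made up for this lesson, and it assumes a 365-day year.

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 365) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Compare a few common targets (the list of targets is illustrative).
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99% uptime the budget works out to roughly 52.6 minutes per year, which is why each extra "nine" is so much harder to deliver than the last.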

Note: In a system design interview, spend the first 3-5 minutes clarifying requirements. Ask questions like "How many users?" "Which matters more: speed or consistency?" "Do we need real-time updates?" This shows maturity and keeps you from designing the wrong thing.

The Design Process

Great system design follows a repeatable recipe. Think of it like building with LEGO — you follow the steps, but you still get to be creative.

Step 1: Understand the problem. What are we building? Who uses it? What are the most important features?

Step 2: Estimate the scale. How many users? How much data? How many requests per second? (More on this below.)

Step 3: Define the API. What endpoints or interfaces will the system expose? This is the contract between your system and the outside world.

Step 4: Design the high-level architecture. Draw the big boxes — clients, servers, databases, caches, load balancers. Show how data flows between them.

Step 5: Deep-dive into components. Pick the most interesting or tricky part and zoom in. How does the database schema look? What caching strategy do you use?

Step 6: Address bottlenecks. What breaks first when traffic spikes? Where are the single points of failure? How do you handle them?
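For Step 3, one lightweight way to pin down the contract is to write the endpoints out as plain data before drawing any boxes. The sketch below does that for the "users can upload photos" requirement. Every path and name in it (`Endpoint`, `PHOTO_API`, `/photos`) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One entry in the API contract (all values below are hypothetical)."""
    method: str
    path: str
    purpose: str


# Hypothetical contract for a photo-sharing feature.
PHOTO_API = [
    Endpoint("POST", "/photos", "Upload a photo"),
    Endpoint("GET", "/photos/{photo_id}", "Fetch one photo"),
    Endpoint("GET", "/users/{user_id}/photos", "List a user's photos"),
]


def describe(api: list[Endpoint]) -> str:
    """Render the contract as aligned text, one endpoint per line."""
    return "\n".join(f"{e.method:<5}{e.path:<26}{e.purpose}" for e in api)


print(describe(PHOTO_API))
```

Writing the contract down this early tends to surface missing requirements, for example whether listing a user's photos needs pagination.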

▸ The four-phase design process

Back-of-the-Envelope Estimation

System design loves big numbers. You'll often need to estimate things like "how much storage does YouTube need per day?" This is called back-of-the-envelope estimation — quick, rough math to guide your design decisions.

Here are the key numbers you'll work with:

DAU (Daily Active Users) — How many people use the app each day.
QPS (Queries Per Second) — How many requests hit your servers each second. If 10M users each make 10 requests/day: 10M × 10 / 86400 ≈ 1,157 QPS.
Storage — How much disk space you need. If each user uploads one 2MB photo/day with 10M DAU: 10M × 2MB = 20TB/day.
Bandwidth — How much data flows in/out. 20TB/day ÷ 86400 ≈ 231 MB/s.
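The storage and bandwidth figures above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and that traffic is spread evenly over the day; the helper names are made up for this lesson.

```python
SECONDS_PER_DAY = 86_400
MB = 10**6    # decimal megabyte, matching the 20 TB/day figure in the text
TB = 10**12   # decimal terabyte


def daily_upload_bytes(dau: int, bytes_per_user: int) -> int:
    """Total bytes uploaded per day if every active user uploads once."""
    return dau * bytes_per_user


def average_bandwidth(bytes_per_day: int) -> float:
    """Average bytes per second, assuming traffic is spread evenly."""
    return bytes_per_day / SECONDS_PER_DAY


daily = daily_upload_bytes(dau=10_000_000, bytes_per_user=2 * MB)
print(f"{daily / TB:.0f} TB/day")                    # 20 TB/day
print(f"{average_bandwidth(daily) / MB:.0f} MB/s")   # 231 MB/s
```

Real traffic is rarely even, so peak bandwidth will sit well above this average.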

Some handy numbers to memorize:

1 day = 86,400 seconds (round to ~100K for quick math)
1 million requests/day ≈ 12 QPS
1 ASCII char = 1 byte, 1 int = 4 bytes, 1 long/timestamp = 8 bytes
A typical tweet-sized text = ~300 bytes
A typical image = 200KB–2MB
A typical video (1 min) = 50–100MB
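Rounding a day to 100K seconds makes mental math easier, but it is worth knowing how far off it puts you. A small sketch that compares the exact and rounded versions for 1 million requests/day (the function name `qps` is only for illustration):

```python
EXACT_SECONDS_PER_DAY = 86_400
ROUNDED_SECONDS_PER_DAY = 100_000  # the quick-math shortcut


def qps(requests_per_day: int, seconds_per_day: int = EXACT_SECONDS_PER_DAY) -> float:
    """Average queries per second for a given daily request count."""
    return requests_per_day / seconds_per_day


exact = qps(1_000_000)
quick = qps(1_000_000, ROUNDED_SECONDS_PER_DAY)
error = (exact - quick) / exact  # fraction by which the shortcut underestimates

print(f"exact: {exact:.1f} QPS, shortcut: {quick:.1f} QPS, shortcut is {error:.0%} low")
```

The shortcut lands about 14% below the exact figure, which is well within ballpark range for design decisions.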

Quick Estimation Helper

def estimate_qps(dau: int, requests_per_user: int) -> dict[str, float]:
    """Estimate average and peak queries per second from daily active users."""
    seconds_per_day = 86_400
    qps = (dau * requests_per_user) / seconds_per_day
    peak_qps = qps * 3  # assumed multiplier; peak is usually 2-5x average
    return {"avg_qps": round(qps, 1), "peak_qps": round(peak_qps, 1)}

def estimate_storage(dau: int, data_per_user_bytes: int, years: int) -> str:
    """Estimate total storage over time, in binary units (1 KB = 1,024 B)."""
    daily = dau * data_per_user_bytes
    total = daily * 365 * years
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    idx = 0
    val = float(total)
    while val >= 1024 and idx < len(units) - 1:
        val /= 1024
        idx += 1
    return f"{val:.1f} {units[idx]}"

# Example: Twitter-like app
print(estimate_qps(dau=300_000_000, requests_per_user=20))
# {'avg_qps': 69444.4, 'peak_qps': 208333.3}

print(estimate_storage(
    dau=300_000_000,
    data_per_user_bytes=300,   # ~300 bytes per tweet
    years=5
))
# 149.4 TB

Output

{'avg_qps': 69444.4, 'peak_qps': 208333.3}
149.4 TB

Note: Think of estimation like planning a road trip. You don't need to know the exact distance to the mile — you just need to know if it's a 2-hour drive or a 20-hour drive. That difference changes whether you pack snacks or book a hotel. Same idea: your estimates don't need to be perfect, they need to be in the right ballpark to guide design decisions.

▸ Break it apart — distribute the work

▸ Add resilience with cache, CDN, and queues

Common Building Blocks

Almost every system design uses the same set of LEGO pieces. Learning these is like learning your ABCs — once you know them, you can spell anything.

Load Balancer — Distributes traffic across multiple servers so no single server gets overwhelmed.
Cache — Fast temporary storage (usually in memory) that saves you from hitting the slower database on every request.
Database — Where your data lives permanently. SQL databases suit structured, relational data; NoSQL databases suit flexible schemas and workloads that need to scale out easily.
Message Queue — A line where tasks wait to be processed, so your system doesn't choke during traffic spikes.
CDN (Content Delivery Network) — Servers spread around the world that serve static files (images, videos) from a location near the user.
API Gateway — The front door of your system that handles authentication, rate limiting, and routing.
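To get a feel for two of these pieces, here is a toy sketch of a load balancer and a cache working together. It assumes round-robin as the balancing policy and a cache-aside read path, which are common choices but not the only ones. The class names, server names, and the dictionary standing in for a database are all made up for illustration.

```python
from itertools import cycle
from typing import Optional


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation (one common balancing policy)."""

    def __init__(self, servers: list[str]) -> None:
        self._rotation = cycle(servers)

    def pick(self) -> str:
        return next(self._rotation)


class CacheAsideStore:
    """Checks an in-memory cache first and falls back to the slower store."""

    def __init__(self, database: dict[str, str]) -> None:
        self._database = database           # stands in for a real database
        self._cache: dict[str, str] = {}    # stands in for an in-memory cache
        self.database_reads = 0

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]         # cache hit: no database read
        self.database_reads += 1
        value = self._database.get(key)
        if value is not None:
            self._cache[key] = value        # populate the cache for next time
        return value


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
store = CacheAsideStore({"user:1": "Ada"})

for _ in range(4):
    print(balancer.pick(), store.get("user:1"))
print("database reads:", store.database_reads)
```

Four requests are spread across three servers, and only the first one reaches the database. A real cache also needs eviction and invalidation rules, which this sketch leaves out.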

In the following lessons, we'll deep-dive into each of these. By the end, you'll be able to combine them like a pro to design systems that handle millions of users.