Design a Video Streaming Service

Upload, transcode, and stream video to millions — like YouTube

Scope: Real-World System, Difficulty: Advanced

Understanding the Problem

We're designing a video streaming service like YouTube or Netflix. Users upload videos, the system transcodes them to multiple resolutions, and millions of viewers stream them with minimal buffering.

Functional Requirements:

Upload videos — Creators upload raw video files (potentially gigabytes).
Stream videos — Viewers watch videos with smooth playback.
Search & discovery — Users can search for videos by title, tags, and description.
Recommendations — Suggest relevant videos based on watch history and preferences.

Non-Functional Requirements:

Low buffering: Video should start playing within 2 seconds. Rebuffering ratio (time spent stalled divided by total watch time) should stay below 1%.
Adaptive bitrate: Video quality adjusts automatically based on the viewer's bandwidth — no manual 720p/1080p switching needed.
Global delivery: Viewers worldwide should get fast, consistent playback via CDN edge servers.
Fault tolerance: Partial failures shouldn't stop playback. The system degrades gracefully (lower quality, not a black screen).

Estimation

Let's size the system:

500M daily active users — watching an average of 5 videos/day
5M uploads/day — average raw video size ~200 MB
Transcoding: Each video is transcoded to 4 resolutions (360p, 480p, 720p, 1080p). Total output per video ≈ 1.5× raw size = ~300 MB of transcoded variants.
Daily upload storage: 5M × 200 MB = 1 PB/day raw + 1.5 PB/day transcoded = 2.5 PB/day total
Concurrent streams: ~1M peak concurrent viewers
Bandwidth: 1M streams × 4 Mbps (720p avg) = 4 Tbps of egress bandwidth

This is a storage-heavy, bandwidth-heavy system. The key challenges are efficient transcoding, smart CDN caching, and adaptive streaming.
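The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: every input is one of the estimates from the list, decimal units are assumed (1 PB = 10^15 bytes), and the variable names are illustrative.

```python
# Back-of-envelope sizing; every input is an estimate from the list above.
MB = 10**6   # bytes, decimal units assumed
PB = 10**15

uploads_per_day = 5_000_000
raw_size_bytes = 200 * MB
transcode_ratio = 1.5            # transcoded output relative to raw size

raw_per_day = uploads_per_day * raw_size_bytes
transcoded_per_day = raw_per_day * transcode_ratio
total_per_day_pb = (raw_per_day + transcoded_per_day) / PB

peak_streams = 1_000_000
avg_bitrate_mbps = 4             # roughly 720p
egress_tbps = peak_streams * avg_bitrate_mbps / 1_000_000  # Mbps -> Tbps

daily_views = 500_000_000 * 5    # DAU x videos per user per day

print(f"storage/day: {total_per_day_pb:.1f} PB")
print(f"peak egress: {egress_tbps:.1f} Tbps")
print(f"views/day:   {daily_views:,}")
```

Keeping the inputs as named variables makes it easy to rerun the numbers when an interviewer changes an assumption, such as the average bitrate or the upload rate.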

API Design

The upload path is a chunked REST API. The playback path uses HLS or DASH, which run over plain HTTP but follow their own manifest and segment conventions:

Upload (Chunked)

Endpoint	`POST /api/v1/upload/init`
Request	`{"title": "My Video", "description": "...", "tags": ["tech"]}`
Response	`{"upload_id": "abc123", "chunk_size": 5242880}`

Endpoint	`PUT /api/v1/upload/:upload_id/chunk/:n`
Body	Binary chunk data (5 MiB per chunk, matching `chunk_size`)
Response	`{"chunk_n": 3, "status": "received"}`

Streaming (HLS/DASH)

Manifest	`GET /videos/:id/manifest.m3u8` (HLS) or `.mpd` (DASH)
Segment	`GET /videos/:id/segment/:quality/:n.ts`

Why chunked upload? Large videos (1 GB+) can't be uploaded reliably in a single request: network drops, timeouts, and memory limits make it impractical. Chunking enables resumable uploads. If the connection drops after chunk 47 of 200 has been acknowledged, the client resumes from chunk 48 instead of starting over.
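To make the resume logic concrete, here is a minimal sketch of the bookkeeping a server might keep per upload. The `UploadSession` class, its fields, and the 1-based chunk numbering are assumptions for illustration; the API above does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Server-side record of one chunked upload (hypothetical structure)."""
    upload_id: str
    total_chunks: int
    received: set[int] = field(default_factory=set)

    def record_chunk(self, n: int) -> None:
        # Storing chunk numbers in a set makes a repeated PUT harmless.
        if not 1 <= n <= self.total_chunks:
            raise ValueError(f"chunk {n} out of range 1..{self.total_chunks}")
        self.received.add(n)

    def missing_chunks(self) -> list[int]:
        """Chunks the client still has to send, in ascending order."""
        return [n for n in range(1, self.total_chunks + 1) if n not in self.received]

    def is_complete(self) -> bool:
        return len(self.received) == self.total_chunks


session = UploadSession(upload_id="abc123", total_chunks=200)
for n in range(1, 48):               # chunks 1..47 arrive, then the connection drops
    session.record_chunk(n)

print(session.missing_chunks()[0])   # 48
print(session.is_complete())         # False
```

On reconnect, the client would ask the server which chunks are missing and send only those. Tracking a set rather than a single "last chunk" counter also tolerates chunks arriving out of order or in parallel.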

Upload flow: chunked upload → object store → async transcode → CDN distribution

HLS / DASH: How Video Streaming Actually Works

Modern video streaming doesn't send one giant file. Instead, it uses adaptive bitrate streaming (ABR):

Segmentation: Each transcoded video is split into small segments (2–10 seconds each). With 4-second segments, a 10-minute video is 150 segments per quality level, or ~600 segments across 4 quality levels.
Manifest file: An .m3u8 (HLS) or .mpd (DASH) file lists all available quality levels and their segment URLs. The client downloads this first.
Adaptive switching: The player monitors download speed in real time. Slow connection? Switch to 360p segments. Fast WiFi? Jump to 1080p. The switch happens seamlessly between segments — no buffering, no restart.
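The switching step can be sketched as a small throughput-based rule. This is a simplified illustration, not how any particular player works: the bitrate ladder, the 0.8 safety factor, and the function names are all assumptions, and production players typically also take buffer level into account.

```python
# Hypothetical bitrate ladder: (label, bits per second needed for that rendition).
LADDER = [
    ("360p", 1_000_000),
    ("480p", 2_000_000),
    ("720p", 4_000_000),
    ("1080p", 8_000_000),
]

SAFETY_FACTOR = 0.8  # assumed headroom so a small dip does not stall playback


def estimate_throughput(samples_bps: list[float]) -> float:
    """Harmonic mean of recent segment download speeds; it discounts brief spikes."""
    return len(samples_bps) / sum(1.0 / s for s in samples_bps)


def pick_rendition(samples_bps: list[float]) -> str:
    """Return the highest rendition whose bitrate fits within the usable bandwidth."""
    usable = estimate_throughput(samples_bps) * SAFETY_FACTOR
    chosen = LADDER[0][0]  # always fall back to the lowest rung
    for label, bitrate in LADDER:
        if bitrate <= usable:
            chosen = label
    return chosen


print(pick_rendition([12_000_000, 11_000_000, 13_000_000]))  # 1080p
print(pick_rendition([6_000_000, 5_000_000, 5_500_000]))     # 720p
print(pick_rendition([1_200_000, 900_000, 1_000_000]))       # 360p
```

The player would run this check before requesting each new segment, which is why a quality change only ever takes effect at a segment boundary.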

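HLS (HTTP Live Streaming): Developed by Apple. Uses .m3u8 playlists and .ts segments (fragmented MP4 segments are also allowed). Plays natively on Apple platforms and is widely supported elsewhere through JavaScript players.

DASH (Dynamic Adaptive Streaming over HTTP): Open standard. Uses .mpd manifests and .m4s segments. More flexible, but Apple platforms do not play it natively, so services that target them usually serve HLS as well.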

Both work over standard HTTP/HTTPS — no special protocols needed. This is why CDN caching works beautifully for video.
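Because the manifest is just a text file served over HTTP, generating one is straightforward. The sketch below builds an HLS master playlist using the standard `#EXTM3U` and `#EXT-X-STREAM-INF` tags; the bitrates and the per-rendition playlist paths are illustrative assumptions, not part of the API defined earlier.

```python
RENDITIONS = [
    # (name, peak bits per second, width, height); bitrates are illustrative
    ("360p", 1_000_000, 640, 360),
    ("480p", 2_000_000, 854, 480),
    ("720p", 4_000_000, 1280, 720),
    ("1080p", 8_000_000, 1920, 1080),
]


def build_master_playlist(video_id: str) -> str:
    """Return an HLS master playlist listing one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for name, bandwidth, width, height in RENDITIONS:
        # Each variant is a tag line followed by the URI of its media playlist.
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"/videos/{video_id}/{name}/playlist.m3u8")  # hypothetical path
    return "\n".join(lines) + "\n"


print(build_master_playlist("v42"))
```

The player downloads this file first, then fetches the media playlist of whichever variant it selects; each media playlist in turn lists that rendition's segment URLs.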

CDN Strategy: The Key to Global Streaming

Serving video at scale is fundamentally a CDN problem. Here's the strategy:

Popular content (top 10%): Pre-warmed on edge servers worldwide. When a video goes viral, it's already cached close to viewers. This handles ~80% of all views.

Long-tail content (bottom 90%): Fetched on-demand from the origin. First viewer experiences a cold cache miss (~500ms extra latency), then subsequent viewers in that region get it from edge cache.

Multi-tier caching:

L1 — Edge PoPs (200+ locations): Closest to users. Cache hot segments only.
L2 — Regional hubs (20-30 locations): Larger capacity. Cache warm + lukewarm content.
L3 — Origin: Object store (S3) with all content. Only hit on full cache misses.
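A toy model of the three tiers shows how a request falls through from edge to regional hub to origin, filling the caches on the way back. This is a sketch under simplifying assumptions: each tier is a small in-memory LRU cache, the origin is a plain dictionary, and the class and function names are invented for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


def fetch_segment(key: str, edge: LRUCache, regional: LRUCache, origin: dict):
    """Return (data, tier_that_served_it), populating caches on the way back."""
    data = edge.get(key)
    if data is not None:
        return data, "edge"
    data = regional.get(key)
    if data is not None:
        edge.put(key, data)
        return data, "regional"
    data = origin[key]                      # full miss: read from the object store
    regional.put(key, data)
    edge.put(key, data)
    return data, "origin"


origin = {f"seg{i}": b"video-bytes" for i in range(3)}
edge, regional = LRUCache(capacity=2), LRUCache(capacity=100)

tiers = [fetch_segment(k, edge, regional, origin)[1]
         for k in ["seg0", "seg0", "seg1", "seg2", "seg0"]]
print(tiers)  # ['origin', 'edge', 'origin', 'origin', 'regional']
```

The last request illustrates the value of the middle tier: the small edge cache has already evicted `seg0`, but the larger regional hub still holds it, so the origin is not hit again.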

Pre-warming: When a channel with 10M subscribers uploads a new video, don't wait for cache misses. Proactively push transcoded segments to edge PoPs in regions where subscribers are concentrated.
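One simple way to decide where to push is to pick the fewest regions that together cover most of the channel's subscribers. The sketch below assumes a per-region subscriber count is available; the function name, the region labels, and the 80% coverage target are all illustrative.

```python
def regions_to_prewarm(subscribers_by_region: dict[str, int],
                       coverage: float = 0.8) -> list[str]:
    """Pick the largest regions until `coverage` of all subscribers is reached."""
    total = sum(subscribers_by_region.values())
    chosen: list[str] = []
    covered = 0
    ranked = sorted(subscribers_by_region.items(), key=lambda kv: kv[1], reverse=True)
    for region, count in ranked:
        if covered >= coverage * total:
            break
        chosen.append(region)
        covered += count
    return chosen


# Hypothetical distribution for a channel with 10M subscribers.
subscribers = {
    "us-east": 4_000_000,
    "eu-west": 3_000_000,
    "ap-south": 2_000_000,
    "sa-east": 600_000,
    "af-south": 400_000,
}
print(regions_to_prewarm(subscribers))  # ['us-east', 'eu-west', 'ap-south']
```

Regions left out of the list still work; their first viewers simply take the on-demand path through the regional hub and origin.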

Segment-level caching: Because videos are split into 2-10s segments, the CDN can cache at segment granularity. The first 30 seconds of a video (segments 1-5 with 6-second segments) get cached aggressively since most viewers watch at least that much.

Interview tip: Video streaming has challenges that set it apart from other system designs. Key talking points: (1) chunked/resumable uploads for large files, (2) an async transcoding pipeline with multiple output resolutions, (3) HLS/DASH adaptive bitrate, including how the manifest works, (4) the CDN as the primary serving layer with multi-tier caching. Mentioning segment-level caching and pre-warming shows deeper understanding.