Design a Video Streaming Service
Understanding the Problem
We're designing a video streaming service like YouTube or Netflix. Users upload videos, the system transcodes them to multiple resolutions, and millions of viewers stream them with minimal buffering.
Functional Requirements:
- Upload videos: Creators upload raw video files (potentially gigabytes).
- Stream videos: Viewers watch videos with smooth playback.
- Search & discovery: Users can search for videos by title, tags, and description.
- Recommendations: Suggest relevant videos based on watch history and preferences.
Non-Functional Requirements:
- Low buffering: Video should start playing within 2 seconds. Rebuffering ratio < 1%.
- Adaptive bitrate: Video quality adjusts automatically based on the viewer's bandwidth, with no manual 720p/1080p switching needed.
- Global delivery: Viewers worldwide should get fast, consistent playback via CDN edge servers.
- Fault tolerance: Partial failures shouldn't stop playback. The system degrades gracefully (lower quality, not a black screen).
Estimation
Let's size the system:
- 500M daily active users, each watching an average of 5 videos/day
- 5M uploads/day, average raw video size ~200 MB
- Transcoding: Each video is transcoded to 4 resolutions (360p, 480p, 720p, 1080p). Total output per video ≈ 1.5× raw size = ~300 MB of transcoded variants.
- Daily upload storage: 5M × 200 MB = 1 PB/day raw + 1.5 PB/day transcoded
- Concurrent streams: ~1M peak concurrent viewers
- Bandwidth: 1M streams × 4 Mbps (720p avg) = 4 Tbps of egress bandwidth
This is a storage-heavy, bandwidth-heavy system. The key challenges are efficient transcoding, smart CDN caching, and adaptive streaming.
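To double-check the arithmetic, here is a small sketch that recomputes the numbers above. It assumes decimal units (1 PB = 10^15 bytes, 1 Tbps = 10^12 bits/s); the yearly total is an extrapolation added for illustration and assumes the daily figures stay constant.

```python
# Back-of-envelope sizing using the figures from the estimate above.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 PB = 10**15 bytes).

MB = 10**6
PB = 10**15

DAILY_ACTIVE_USERS = 500_000_000
VIDEOS_PER_USER = 5
UPLOADS_PER_DAY = 5_000_000
RAW_SIZE_BYTES = 200 * MB
TRANSCODE_RATIO = 1.5            # transcoded output relative to raw size
PEAK_STREAMS = 1_000_000
AVG_BITRATE_BPS = 4 * 10**6      # ~720p average


def summarize():
    """Return the derived sizing numbers as a dict."""
    raw_per_day = UPLOADS_PER_DAY * RAW_SIZE_BYTES
    transcoded_per_day = raw_per_day * TRANSCODE_RATIO
    return {
        "views_per_day": DAILY_ACTIVE_USERS * VIDEOS_PER_USER,
        "raw_pb_per_day": raw_per_day / PB,
        "transcoded_pb_per_day": transcoded_per_day / PB,
        # Extrapolation, assuming a constant upload rate all year.
        "total_pb_per_year": (raw_per_day + transcoded_per_day) * 365 / PB,
        "egress_tbps": PEAK_STREAMS * AVG_BITRATE_BPS / 10**12,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

At a constant rate, 2.5 PB/day adds up to roughly 900 PB per year, which is why storage tiering matters as much as raw capacity.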
API Design
Video streaming uses specialized protocols beyond simple REST:
Upload (Chunked)
| Endpoint | POST /api/v1/upload/init |
| Request | {"title": "My Video", "description": "...", "tags": ["tech"]} |
| Response | {"upload_id": "abc123", "chunk_size": 5242880} |
| Endpoint | PUT /api/v1/upload/:upload_id/chunk/:n |
| Body | Binary chunk data (5 MB per chunk) |
| Response | {"chunk_n": 3, "status": "received"} |
Streaming (HLS/DASH)
| Manifest | GET /videos/:id/manifest.m3u8 (HLS) or .mpd (DASH) |
| Segment | GET /videos/:id/segment/:quality/:n.ts |
Why chunked upload? Large videos (1 GB+) can't be uploaded reliably in a single request: network drops, timeouts, and memory limits make it impractical. Chunking enables resumable uploads: if the connection drops after chunk 47 of 200 has been acknowledged, you resume from chunk 48 instead of starting over.
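The resume logic can be sketched as follows. This is a minimal in-memory model, not a real server: `UploadSession`, `missing_chunks`, and `resume_upload` are hypothetical names, and 1-based chunk numbering is an assumption chosen to match the "chunk 47 of 200" example.

```python
import math

CHUNK_SIZE = 5 * 1024 * 1024  # 5,242,880 bytes, the chunk_size from the init response


class UploadSession:
    """Hypothetical in-memory stand-in for server-side upload state."""

    def __init__(self, upload_id, total_bytes, chunk_size=CHUNK_SIZE):
        self.upload_id = upload_id
        self.total_chunks = math.ceil(total_bytes / chunk_size)
        self.received = set()  # chunk numbers acknowledged so far

    def put_chunk(self, n, data):
        """Record chunk n (1-based, an assumption). Re-sending is harmless."""
        if not 1 <= n <= self.total_chunks:
            raise ValueError(f"chunk {n} out of range 1..{self.total_chunks}")
        # A real server would persist `data`; this sketch only tracks the number.
        self.received.add(n)
        return {"chunk_n": n, "status": "received"}

    def missing_chunks(self):
        """Chunk numbers the client still has to send, in order."""
        return [n for n in range(1, self.total_chunks + 1) if n not in self.received]

    def is_complete(self):
        return len(self.received) == self.total_chunks


def resume_upload(session, read_chunk):
    """Send only the chunks the server has not acknowledged yet."""
    sent = []
    for n in session.missing_chunks():
        session.put_chunk(n, read_chunk(n))
        sent.append(n)
    return sent
```

Because `put_chunk` is idempotent, a client that is unsure whether a chunk arrived can simply send it again.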
HLS / DASH: How Video Streaming Actually Works
Modern video streaming doesn't send one giant file. Instead, it uses adaptive bitrate streaming (ABR):
- Segmentation: Each transcoded video is split into small segments (2-10 seconds each). A 10-minute video with 4-second segments at 4 quality levels = ~600 segments total.
- Manifest file: An .m3u8 (HLS) or .mpd (DASH) file lists all available quality levels and their segment URLs. The client downloads this first.
- Adaptive switching: The player monitors download speed in real time. Slow connection? Switch to 360p segments. Fast WiFi? Jump to 1080p. The switch happens seamlessly between segments: no buffering, no restart.
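A throughput-based version of the adaptive switching step might look like the sketch below. Real players use more elaborate heuristics (buffer level, smoothing over several segments), so treat this as one simple rule, not how any specific player works. The 720p bitrate comes from the estimate earlier; the other bitrates and the 0.8 safety factor are illustrative assumptions.

```python
# (label, bitrate in bits per second), sorted from lowest to highest.
# Only the 720p figure comes from the text; the rest are assumptions.
RENDITIONS = [
    ("360p", 800_000),
    ("480p", 1_400_000),
    ("720p", 4_000_000),
    ("1080p", 6_000_000),
]


def measured_throughput_bps(segment_bytes, download_seconds):
    """Estimate throughput from the last segment download."""
    return segment_bytes * 8 / download_seconds


def choose_rendition(throughput_bps, safety_factor=0.8):
    """Pick the highest rendition that fits within a fraction of throughput.

    Falls back to the lowest rendition when nothing fits, so playback
    degrades in quality instead of stopping.
    """
    budget = throughput_bps * safety_factor
    chosen = RENDITIONS[0][0]
    for label, bitrate in RENDITIONS:
        if bitrate <= budget:
            chosen = label
    return chosen
```

The player would call `choose_rendition` after each segment finishes downloading and request the next segment at the chosen quality.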
HLS (HTTP Live Streaming): Developed by Apple. Uses .m3u8 playlists and .ts segments. Supported natively on Apple devices and almost everywhere else through player libraries.
DASH (Dynamic Adaptive Streaming over HTTP): Open standard. Uses .mpd manifests and .m4s segments. More flexible, but with weaker support on Apple platforms.
Both work over standard HTTP/HTTPS β no special protocols needed. This is why CDN caching works beautifully for video.
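To make the manifest idea concrete, here is a sketch that generates a minimal HLS master playlist and a VOD media playlist. The playlist tags are standard HLS tags; the relative URL layout (`720p/playlist.m3u8`, `0.ts`) and all bitrates except 720p are assumptions for illustration.

```python
import math

# (label, bandwidth in bits/s, resolution). The 720p bitrate follows the
# earlier estimate; the other bitrates are illustrative assumptions.
RENDITIONS = [
    ("360p", 800_000, "640x360"),
    ("480p", 1_400_000, "854x480"),
    ("720p", 4_000_000, "1280x720"),
    ("1080p", 6_000_000, "1920x1080"),
]


def master_playlist(renditions=RENDITIONS):
    """List every quality level and where to find its media playlist."""
    lines = ["#EXTM3U"]
    for label, bandwidth, resolution in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{label}/playlist.m3u8")  # hypothetical relative URL
    return "\n".join(lines) + "\n"


def media_playlist(duration_s, segment_s=4):
    """List the segments of one quality level for an on-demand video."""
    count = math.ceil(duration_s / segment_s)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_s)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for n in range(count):
        # The last segment may be shorter than the others.
        length = min(segment_s, duration_s - n * segment_s)
        lines.append(f"#EXTINF:{length:.3f},")
        lines.append(f"{n}.ts")  # hypothetical segment name
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

For a 10-minute video, `media_playlist(600)` lists 150 segments per quality level, so four quality levels give the ~600 segments mentioned earlier. Every file here is a plain static resource, which is what lets a CDN cache it like any other HTTP object.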
CDN Strategy: The Key to Global Streaming
Serving video at scale is fundamentally a CDN problem. Here's the strategy:
Popular content (top 10%): Pre-warmed on edge servers worldwide. When a video goes viral, it's already cached close to viewers. This handles ~80% of all views.
Long-tail content (bottom 90%): Fetched on-demand from the origin. First viewer experiences a cold cache miss (~500ms extra latency), then subsequent viewers in that region get it from edge cache.
Multi-tier caching:
- L1 - Edge PoPs (200+ locations): Closest to users. Cache hot segments only.
- L2 - Regional hubs (20-30 locations): Larger capacity. Cache warm + lukewarm content.
- L3 - Origin: Object store (e.g., S3) with all content. Only hit on full cache misses.
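The lookup path through the three tiers can be sketched like this. It is a toy model: `LRUCache` and `TieredCDN` are hypothetical names, LRU eviction is an assumed policy, a plain dict stands in for the object store, and the tiny capacities exist only to make eviction visible.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache (assumed eviction policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class TieredCDN:
    """Edge -> regional -> origin lookup, filling caches on the way back."""

    def __init__(self, origin, edge_capacity=2, regional_capacity=8):
        self.edge = LRUCache(edge_capacity)
        self.regional = LRUCache(regional_capacity)
        self.origin = origin  # dict standing in for the object store

    def fetch(self, key):
        """Return (data, tier that served the request)."""
        data = self.edge.get(key)
        if data is not None:
            return data, "edge"
        data = self.regional.get(key)
        tier = "regional"
        if data is None:
            data = self.origin[key]  # raises KeyError for unknown segments
            tier = "origin"
            self.regional.put(key, data)
        self.edge.put(key, data)
        return data, tier
```

Because the regional tier is larger than the edge, a segment evicted from the edge is usually still one hop away rather than requiring a trip to the origin.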
Pre-warming: When a channel with 10M subscribers uploads a new video, don't wait for cache misses. Proactively push transcoded segments to edge PoPs in regions where subscribers are concentrated.
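One way to decide where to push is to rank regions by subscriber count and stop once a target share of subscribers is covered. The sketch below assumes per-region subscriber counts are available; the function names, region names, and the 80% default are hypothetical, while the segment path follows the streaming endpoint defined earlier.

```python
def regions_to_prewarm(subscribers_by_region, coverage=0.8):
    """Pick the largest regions until `coverage` of subscribers is reached."""
    total = sum(subscribers_by_region.values())
    if total == 0:
        return []
    ranked = sorted(subscribers_by_region.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0
    for region, count in ranked:
        if covered / total >= coverage:
            break
        chosen.append(region)
        covered += count
    return chosen


def prewarm_plan(video_id, qualities, first_segments, regions):
    """List (region, segment path) pairs to push ahead of the first viewer."""
    return [
        (region, f"/videos/{video_id}/segment/{quality}/{n}.ts")
        for region in regions
        for quality in qualities
        for n in range(first_segments)
    ]


if __name__ == "__main__":
    subs = {"us-east": 5_000_000, "eu-west": 3_000_000,
            "ap-south": 1_500_000, "sa-east": 500_000}
    regions = regions_to_prewarm(subs, coverage=0.75)
    for region, path in prewarm_plan("v1", ["720p", "1080p"], 5, regions)[:3]:
        print(region, path)
```

Pushing only the opening segments of a couple of quality levels keeps the pre-warm cheap; the rest of the video can still be pulled on demand.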
Segment-level caching: Because videos are split into 2-10s segments, the CDN can cache at segment granularity. The first 30 seconds of a video (segments 1-5, assuming 6-second segments) get cached aggressively since most viewers watch at least that much.
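As a sketch of what "cached aggressively" could mean in practice, the origin can hand out a longer cache lifetime for the opening segments. The TTL values and function names below are assumptions for illustration; only the 30-second window comes from the text, and 0-based segment indexes are assumed.

```python
HOT_WINDOW_S = 30            # the "first 30 seconds" from the text
HOT_TTL_S = 7 * 24 * 3600    # assumed: keep opening segments for a week
DEFAULT_TTL_S = 24 * 3600    # assumed: one day for everything else


def segment_ttl(index, segment_s=6):
    """Cache lifetime in seconds for a 0-based segment index."""
    starts_at = index * segment_s
    return HOT_TTL_S if starts_at < HOT_WINDOW_S else DEFAULT_TTL_S


def cache_control(index, segment_s=6):
    """Cache-Control header value the origin could attach to a segment."""
    return f"public, max-age={segment_ttl(index, segment_s)}"
```

With 6-second segments, indexes 0 through 4 fall inside the hot window and index 5 (starting at the 30-second mark) gets the default lifetime.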