
Design a Music Streaming Service

Stream 100M+ songs to millions of listeners — like Spotify

Scope: Real-World System | Difficulty: Advanced

Understanding the Problem

Design a music streaming service like Spotify that can serve 100M+ songs to millions of concurrent users worldwide.

Functional Requirements:

Search for songs, artists, and albums
Play songs with seamless streaming
Create and manage playlists
Get personalized recommendations (e.g. Discover Weekly)
Download songs for offline listening

Non-Functional Requirements:

Low playback latency: Song playback must start within 200ms of pressing play
Gapless playback: No silence between consecutive tracks
Massive scale: Handle 50M concurrent streams
Global availability: Low-latency streaming worldwide via CDN edge nodes

▸ The idea: catalog → stream → personalize

Estimation

Let's size this system:

500M total users, 50M concurrent streams at peak
100M songs in the catalog, average ~3 MB per song (compressed) = ~300 TB of audio storage per encoding (roughly 1 PB once each song is stored at several bitrates and replicated)
200M playlist operations/day — creates, adds, deletes, reorders
Bandwidth: 50M streams × 160 kbps average = ~8 Tbps aggregate bandwidth
Metadata: 100M songs × ~5 KB metadata each = ~500 GB (small enough to cache in memory across a handful of nodes)
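
As a sanity check, the arithmetic behind these estimates can be reproduced in a few lines. This is a minimal sketch that uses only the round numbers assumed in this section; the constant names are illustrative.

```python
# Back-of-envelope sizing using the round numbers assumed in this section.
SONGS = 100_000_000
AVG_SONG_BYTES = 3 * 10**6             # ~3 MB per compressed song
PEAK_STREAMS = 50_000_000
AVG_BITRATE_BPS = 160_000              # 160 kbps
METADATA_BYTES_PER_SONG = 5 * 10**3    # ~5 KB
PLAYLIST_OPS_PER_DAY = 200_000_000
SECONDS_PER_DAY = 86_400

audio_storage_tb = SONGS * AVG_SONG_BYTES / 10**12
peak_bandwidth_tbps = PEAK_STREAMS * AVG_BITRATE_BPS / 10**12
metadata_gb = SONGS * METADATA_BYTES_PER_SONG / 10**9
playlist_writes_per_sec = PLAYLIST_OPS_PER_DAY / SECONDS_PER_DAY

print(f"audio storage (one encoding): {audio_storage_tb:.0f} TB")
print(f"peak bandwidth: {peak_bandwidth_tbps:.0f} Tbps")
print(f"metadata: {metadata_gb:.0f} GB")
print(f"playlist writes: {playlist_writes_per_sec:.0f}/s on average")
```

A single encoding of the catalog comes to about 300 TB, so the audio itself is not the hard part; the 8 Tbps of egress is.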

The main challenges are: (1) serving audio at massive scale via CDN, (2) pre-fetching the next track for gapless playback, and (3) building a recommendation engine that keeps users engaged.

Audio Encoding & Chunked Streaming

Music is encoded at different quality levels to adapt to network conditions:

Low: 96 kbps (Ogg Vorbis) — mobile data saver
Normal: 160 kbps (Ogg Vorbis) — default quality
High: 320 kbps (Ogg Vorbis) — premium tier

Spotify uses Ogg Vorbis for streaming; Apple Music uses AAC. Both are lossy but perceptually excellent at 160+ kbps.

Chunked streaming: Songs are split into ~10-second chunks, similar to HLS/DASH for video. This enables:

Adaptive bitrate: Switch quality mid-song based on bandwidth
Fast start: Begin playback after buffering just 1-2 chunks rather than the whole file, which is what makes the 200ms start target feasible
Pre-fetch: Start loading the next track's first chunks while the current song is still playing (gapless playback)
Seek: Jump to any point without downloading the entire file
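
The chunk arithmetic behind these four properties can be sketched as follows. This is an illustrative client-side helper, not Spotify's actual player logic: the function names, the 0.8 safety factor, and the 20-second prefetch lead are all assumptions.

```python
CHUNK_SECONDS = 10                 # chunk length assumed in this section
BITRATES_KBPS = [96, 160, 320]     # the three quality levels listed above


def chunk_index_for_seek(position_seconds: float) -> int:
    """Seek: map a playback position to the chunk that contains it."""
    return int(position_seconds // CHUNK_SECONDS)


def chunk_size_bytes(bitrate_kbps: int) -> int:
    """Approximate size of one chunk at a given bitrate."""
    return bitrate_kbps * 1000 * CHUNK_SECONDS // 8


def pick_bitrate(measured_kbps: float, safety_factor: float = 0.8) -> int:
    """Adaptive bitrate: highest level that fits within a safety margin.

    Falls back to the lowest level when even that does not fit.
    """
    budget = measured_kbps * safety_factor
    fitting = [b for b in BITRATES_KBPS if b <= budget]
    return max(fitting) if fitting else min(BITRATES_KBPS)


def should_prefetch_next(duration_seconds: float, position_seconds: float,
                         lead_seconds: float = 20.0) -> bool:
    """Pre-fetch: start loading the next track near the end of this one."""
    return duration_seconds - position_seconds <= lead_seconds


if __name__ == "__main__":
    print(chunk_index_for_seek(95))        # chunk holding 1:35
    print(chunk_size_bytes(160))           # bytes per chunk at 160 kbps
    print(pick_bitrate(250))               # level chosen on a 250 kbps link
    print(should_prefetch_next(210, 195))  # 15 s left in a 3:30 track
```

At 160 kbps a 10-second chunk is about 200 KB, so a seek costs one small request instead of a full-file download.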

▸ Audio streaming: chunked delivery


Play flow: the client gets a manifest, then streams chunks from the CDN. Cache misses fall through to the origin object store.
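
To make the flow concrete, here is a minimal sketch of an edge node that serves chunks from a small LRU cache and falls through to the origin on a miss. The chunk key format, the `EdgeCache` class, and `build_manifest` are hypothetical names chosen for illustration; a real CDN and manifest format would be far more involved.

```python
from collections import OrderedDict
from typing import Callable, List


def build_manifest(song_id: str, duration_seconds: int, bitrate_kbps: int,
                   chunk_seconds: int = 10) -> List[str]:
    """List the chunk keys a client would request for one song (assumed format)."""
    chunk_count = -(-duration_seconds // chunk_seconds)  # ceiling division
    return [f"{song_id}/{bitrate_kbps}/{i}" for i in range(chunk_count)]


class EdgeCache:
    """LRU cache of audio chunks with fall-through to the origin store."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self._chunks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, chunk_key: str) -> bytes:
        if chunk_key in self._chunks:
            # Cache hit: mark the chunk as most recently used.
            self._chunks.move_to_end(chunk_key)
            self.hits += 1
            return self._chunks[chunk_key]
        # Cache miss: fetch from the origin object store and keep a copy.
        self.misses += 1
        data = self.fetch_from_origin(chunk_key)
        self._chunks[chunk_key] = data
        if len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # evict least recently used
        return data


if __name__ == "__main__":
    edge = EdgeCache(capacity=100, fetch_from_origin=lambda key: key.encode())
    for key in build_manifest("song42", 25, 160):
        edge.get(key)   # first listener: every chunk misses
    for key in build_manifest("song42", 25, 160):
        edge.get(key)   # second listener: every chunk hits
    print(edge.hits, edge.misses)
```

Because popular songs are requested over and over, most chunk requests for them never reach the origin.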

Recommendation Engine

Personalized recommendations are the killer feature that keeps users on the platform. Spotify's Discover Weekly uses a multi-stage pipeline:

Collaborative Filtering:

"Users who liked song X also liked song Y"
Matrix factorization on the user-song interaction matrix (billions of plays)
Works well for popular songs but struggles with new/niche tracks (cold start problem)
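
The "users who liked X also liked Y" idea can be illustrated with plain co-occurrence counting. Note that this is a deliberately simpler stand-in for matrix factorization, and the toy data and function names are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_occurrence(likes_by_user):
    """Count, for each song, how often every other song is liked by the same user."""
    co = defaultdict(Counter)
    for songs in likes_by_user.values():
        for a, b in permutations(set(songs), 2):
            co[a][b] += 1
    return co


def also_liked(co, song, k=3):
    """Top-k songs most often liked together with `song` (ties broken by name)."""
    counts = co.get(song, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


if __name__ == "__main__":
    likes = {
        "u1": ["x", "y", "z"],
        "u2": ["x", "y"],
        "u3": ["x", "y", "w"],
        "u4": ["z", "w"],
    }
    co = build_co_occurrence(likes)
    print(also_liked(co, "x", k=2))
    print(also_liked(co, "brand_new_song"))  # no plays yet, so no neighbours
```

The second call shows the cold start problem directly: a song with no interactions has no co-occurrence counts, so it can never be recommended by this signal alone.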

Content-Based Filtering:

Analyze audio features: tempo, key, energy, danceability, acousticness
Use deep learning models on raw audio spectrograms
Solves the cold start problem — new songs can be recommended based on their audio features
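
As a rough illustration of content-based matching, a new song can be compared to catalog songs by the cosine similarity of their audio feature vectors. The four-feature layout and the feature values below are invented for the example, and hand-picked features stand in for whatever a learned audio model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(new_song_features, catalog, k=2):
    """Return the k catalog song IDs whose features are closest to the new song."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(new_song_features, item[1]),
        reverse=True,
    )
    return [song_id for song_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Assumed layout: [tempo (normalized), energy, danceability, acousticness]
    catalog = {
        "dance_hit": [0.80, 0.90, 0.90, 0.10],
        "folk_ballad": [0.30, 0.20, 0.30, 0.90],
        "club_remix": [0.85, 0.95, 0.80, 0.05],
    }
    new_release = [0.82, 0.90, 0.85, 0.10]  # zero plays so far
    print(most_similar(new_release, catalog))
```

No listening history is needed, so a track can be placed next to similar-sounding songs on the day it is released.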

Hybrid Approach:

Combine collaborative and content-based signals
Add contextual features: time of day, listening history, skip patterns
Re-rank with business rules (boost new releases, licensed content)
Spotify runs this pipeline weekly for Discover Weekly, daily for Daily Mix
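
One simple way to combine the signals is a weighted blend followed by a rule-based boost. The sketch below is an assumption about how such a re-ranker might look, not a description of Spotify's pipeline; the 0.7 weight, the 0.1 boost, and the candidate fields are illustrative.

```python
def hybrid_score(cf_score, content_score, cf_weight=0.7):
    """Blend collaborative and content-based scores.

    Songs with no collaborative signal yet (cold start) fall back to
    the content-based score alone.
    """
    if cf_score is None:
        return content_score
    return cf_weight * cf_score + (1 - cf_weight) * content_score


def rank(candidates, new_release_boost=0.1):
    """Score each candidate, apply a business-rule boost, and sort best first."""
    scored = []
    for c in candidates:
        score = hybrid_score(c["cf_score"], c["content_score"])
        if c.get("is_new_release"):
            score += new_release_boost
        scored.append((score, c["song_id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [song_id for _, song_id in scored]


if __name__ == "__main__":
    candidates = [
        {"song_id": "popular", "cf_score": 0.9, "content_score": 0.5},
        {"song_id": "niche_new", "cf_score": None, "content_score": 0.8,
         "is_new_release": True},
        {"song_id": "meh", "cf_score": 0.4, "content_score": 0.4},
    ]
    print(rank(candidates))
```

Contextual features such as time of day or skip rate would enter as further terms in the score; they are left out here to keep the example short.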

▸ Recommendation engine pipeline

Playlist Management & Offline Sync

Playlists:

Stored as ordered lists of song IDs in a database
Support collaborative playlists (multiple editors) with operational transformation or CRDTs for conflict resolution
200M operations/day averages ~2,300 writes/second (peaks will be higher), which is manageable with sharding by user_id
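
A minimal sketch of the two ideas above: routing a user's playlists to a shard with a stable hash, and applying a reorder to an ordered list of song IDs. The shard count of 64 and the function names are assumptions for illustration, and the reorder ignores concurrent editors entirely.

```python
import hashlib

NUM_SHARDS = 64  # illustrative; a real deployment would size this from load


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard with a hash that is stable across processes.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used instead.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def move_track(song_ids, from_index, to_index):
    """Return a new playlist with one track moved to a new position."""
    reordered = list(song_ids)
    track = reordered.pop(from_index)
    reordered.insert(to_index, track)
    return reordered


if __name__ == "__main__":
    print(shard_for_user("user-123"))
    print(move_track(["a", "b", "c", "d"], 0, 2))
    avg_writes_per_shard = 200_000_000 / 86_400 / NUM_SHARDS
    print(f"{avg_writes_per_shard:.0f} writes/s per shard on average")
```

With 64 shards the average load is only a few dozen writes per second per shard, which leaves plenty of headroom for peaks.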

Offline Download:

Songs are downloaded encrypted (DRM) to the device
Playback requires a valid license token (checked periodically when online)
Sync service tracks which songs are cached locally and handles cleanup when storage is low
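
The license check and storage cleanup might look roughly like this on the device. It is a sketch under stated assumptions: the 30-day offline window, the least-recently-played eviction policy, and all names are illustrative, and real DRM involves encrypted keys rather than a simple timestamp comparison.

```python
from dataclasses import dataclass
from typing import List

MAX_OFFLINE_SECONDS = 30 * 24 * 3600  # assumed offline window of 30 days


@dataclass
class OfflineTrack:
    song_id: str
    size_bytes: int
    last_played_at: float  # Unix timestamp


def license_is_valid(last_online_check: float, now: float,
                     max_offline_seconds: float = MAX_OFFLINE_SECONDS) -> bool:
    """Offline playback is allowed only if the device checked in recently enough."""
    return now - last_online_check <= max_offline_seconds


def tracks_to_evict(tracks: List[OfflineTrack], bytes_needed: int) -> List[str]:
    """Pick least recently played tracks until enough space would be freed."""
    evicted, freed = [], 0
    for track in sorted(tracks, key=lambda t: t.last_played_at):
        if freed >= bytes_needed:
            break
        evicted.append(track.song_id)
        freed += track.size_bytes
    return evicted


if __name__ == "__main__":
    day = 24 * 3600
    print(license_is_valid(last_online_check=0, now=29 * day))
    print(license_is_valid(last_online_check=0, now=31 * day))
    cached = [
        OfflineTrack("a", 5_000_000, last_played_at=100.0),
        OfflineTrack("b", 3_000_000, last_played_at=50.0),
        OfflineTrack("c", 4_000_000, last_played_at=200.0),
    ]
    print(tracks_to_evict(cached, bytes_needed=6_000_000))
```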

Royalty & Licensing:

Every stream is logged for royalty calculations
Stream events go to a Kafka pipeline for aggregation
Royalties are calculated per-stream based on licensing agreements (pro-rata or user-centric model)
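
The difference between the two royalty models is easiest to see with a toy calculation. The sketch below assumes every user pays the same fee and that the whole fee is distributed; real agreements subtract the platform's share and vary per label, so treat the numbers as illustrative only.

```python
from collections import defaultdict


def pro_rata(streams_by_user, fee_per_user):
    """Pool all fees, then split the pool by each artist's share of total streams."""
    pool = fee_per_user * len(streams_by_user)
    totals = defaultdict(int)
    for plays in streams_by_user.values():
        for artist, count in plays.items():
            totals[artist] += count
    all_streams = sum(totals.values())
    if all_streams == 0:
        return {}
    return {artist: pool * count / all_streams for artist, count in totals.items()}


def user_centric(streams_by_user, fee_per_user):
    """Split each user's own fee among only the artists that user played."""
    payouts = defaultdict(float)
    for plays in streams_by_user.values():
        user_total = sum(plays.values())
        if user_total == 0:
            continue  # a user with no streams contributes nothing here
        for artist, count in plays.items():
            payouts[artist] += fee_per_user * count / user_total
    return dict(payouts)


if __name__ == "__main__":
    streams = {
        "heavy_listener": {"pop_star": 90},
        "light_listener": {"indie_band": 10},
    }
    print(pro_rata(streams, fee_per_user=10))
    print(user_centric(streams, fee_per_user=10))
```

With two users paying 10 each, pro-rata gives pop_star 18 and indie_band 2, while user-centric gives each artist 10, because the light listener's fee goes entirely to the one band they played.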

▸ Full architecture

Interview tip: The CDN is the hero of music streaming. Emphasize that audio chunks are cached at edge nodes worldwide, and the client pre-fetches the next track's first chunks while the current song plays. This is how you achieve <200ms playback start and gapless transitions, two things interviewers love to hear about.