
Design a Music Streaming Service

Stream 100M+ songs to millions of listeners, like Spotify
Scope: Real-World System | Difficulty: Advanced

Understanding the Problem

Design a music streaming service like Spotify that can serve 100M+ songs to millions of concurrent users worldwide.

Functional Requirements:

  • Search for songs, artists, and albums
  • Play songs with seamless streaming
  • Create and manage playlists
  • Get personalized recommendations (e.g. Discover Weekly)
  • Download songs for offline listening

Non-Functional Requirements:

  • Low playback latency: Song playback must start within 200ms of pressing play
  • Gapless playback: No silence between consecutive tracks
  • Massive scale: Handle 50M concurrent streams
  • Global availability: Low-latency streaming worldwide via CDN edge nodes
▸ The idea: catalog → stream → personalize

Estimation

Let's size this system:

  • 500M total users, 50M concurrent streams at peak
  • 100M songs in the catalog, average ~3 MB per song (compressed) = ~300 TB of audio storage (before replication and additional quality tiers)
  • 200M playlist operations/day: creates, adds, deletes, reorders
  • Bandwidth: 50M streams × 160 kbps average = ~8 Tbps aggregate bandwidth
  • Metadata: 100M songs × ~5 KB metadata each = ~500 GB (small enough to hold in memory across a modest cache cluster)
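
As a quick sanity check, the figures above can be recomputed from the listed inputs. This is only a back-of-the-envelope sketch using decimal units (1 MB = 10^6 bytes); the variable names are illustrative. Note that 100M songs at ~3 MB each comes to roughly 300 TB, not petabytes.

```python
# Back-of-the-envelope sizing, decimal units (1 MB = 10**6 bytes).
SONGS = 100_000_000
AVG_SONG_BYTES = 3 * 10**6            # ~3 MB per compressed song
CONCURRENT_STREAMS = 50_000_000
AVG_BITRATE_BPS = 160_000             # 160 kbps
METADATA_BYTES_PER_SONG = 5 * 10**3   # ~5 KB

audio_storage_tb = SONGS * AVG_SONG_BYTES / 10**12
bandwidth_tbps = CONCURRENT_STREAMS * AVG_BITRATE_BPS / 10**12
metadata_gb = SONGS * METADATA_BYTES_PER_SONG / 10**9

print(f"audio storage: {audio_storage_tb:.0f} TB")    # 300 TB
print(f"peak bandwidth: {bandwidth_tbps:.0f} Tbps")   # 8 Tbps
print(f"metadata: {metadata_gb:.0f} GB")              # 500 GB
```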

The main challenges are: (1) serving audio at massive scale via CDN, (2) pre-fetching the next track for gapless playback, and (3) building a recommendation engine that keeps users engaged.

Audio Encoding & Chunked Streaming

Music is encoded at different quality levels to adapt to network conditions:

  • Low: 96 kbps (Ogg Vorbis), mobile data saver
  • Normal: 160 kbps (Ogg Vorbis), default quality
  • High: 320 kbps (Ogg Vorbis), premium tier

Spotify uses Ogg Vorbis for streaming; Apple Music uses AAC. Both are lossy but perceptually excellent at 160+ kbps.

Chunked streaming: Songs are split into ~10-second chunks, similar to HLS/DASH for video. This enables:

  • Adaptive bitrate: Switch quality mid-song based on bandwidth
  • Fast start: Begin playback after buffering just 1-2 chunks, which keeps start-up within the ~200 ms target
  • Pre-fetch: Start loading the next track's first chunks while the current song is still playing (gapless playback)
  • Seek: Jump to any point without downloading the entire file
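
The chunking ideas above can be sketched in a few lines. This is a minimal illustration, not any real player's logic: it assumes constant-bitrate chunks (real Vorbis streams vary), and the URL layout, the `headroom` factor, and all function names are hypothetical.

```python
import math

CHUNK_SECONDS = 10
BITRATES_KBPS = {"low": 96, "normal": 160, "high": 320}

def chunk_count(duration_s: float) -> int:
    """Number of chunks needed to cover a track of the given length."""
    return math.ceil(duration_s / CHUNK_SECONDS)

def chunk_bytes(bitrate_kbps: int, seconds: float = CHUNK_SECONDS) -> int:
    """Approximate chunk size, assuming a constant bitrate."""
    return int(bitrate_kbps * 1000 * seconds / 8)

def chunk_index_for_seek(position_s: float) -> int:
    """Seek: map a playback position to the chunk that contains it."""
    return int(position_s // CHUNK_SECONDS)

def pick_quality(measured_kbps: float, headroom: float = 0.8) -> str:
    """Adaptive bitrate: highest tier that fits within a share of measured bandwidth."""
    budget = measured_kbps * headroom
    fitting = [q for q, kbps in BITRATES_KBPS.items() if kbps <= budget]
    if not fitting:
        return min(BITRATES_KBPS, key=BITRATES_KBPS.get)  # fall back to lowest tier
    return max(fitting, key=BITRATES_KBPS.get)

def build_manifest(song_id: str, duration_s: float) -> dict:
    """Manifest listing one chunk URL per tier per index (hypothetical path scheme)."""
    n = chunk_count(duration_s)
    return {
        "song_id": song_id,
        "chunk_seconds": CHUNK_SECONDS,
        "chunks": {
            q: [f"/audio/{song_id}/{q}/{i}.ogg" for i in range(n)]
            for q in BITRATES_KBPS
        },
    }

manifest = build_manifest("song123", duration_s=215)
print(len(manifest["chunks"]["normal"]))   # 22 chunks for a 215 s track
print(chunk_bytes(160))                    # 200000 bytes per 10 s chunk at 160 kbps
print(chunk_index_for_seek(95.5))          # 9
print(pick_quality(250))                   # normal
```

Under these assumptions a 160 kbps chunk is about 200 KB, so a seek or quality switch costs one small request rather than a whole-file download.
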
▸ Audio streaming: chunked delivery
Play flow: the client gets a manifest, then streams chunks from the CDN. Cache misses fall through to the origin object store.
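
The fall-through from edge to origin can be illustrated with a toy cache. This sketch assumes an LRU eviction policy (the text does not specify one), and `EdgeCache` and `fake_origin` are hypothetical names standing in for a CDN node and the origin object store.

```python
from collections import OrderedDict

class EdgeCache:
    """Toy edge node: serve chunks from memory, fall through to origin on a miss."""

    def __init__(self, capacity, origin_fetch):
        self.capacity = capacity          # max number of chunks held at the edge
        self.origin_fetch = origin_fetch  # callable: chunk key -> bytes
        self._store = OrderedDict()       # ordered oldest -> most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        data = self.origin_fetch(key)     # cache miss: go to the origin store
        self._store[key] = data
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used chunk
        return data

def fake_origin(key):
    """Stand-in for the origin object store."""
    return f"bytes-of-{key}".encode()

edge = EdgeCache(capacity=2, origin_fetch=fake_origin)
edge.get("song1/normal/0")   # miss, fetched from origin
edge.get("song1/normal/0")   # hit, served from the edge
edge.get("song1/normal/1")   # miss
edge.get("song2/normal/0")   # miss, evicts song1/normal/0
print(edge.hits, edge.misses)  # 1 3
```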

Recommendation Engine

Personalized recommendations are the killer feature that keeps users on the platform. Spotify's Discover Weekly uses a multi-stage pipeline:

Collaborative Filtering:

  • "Users who liked song X also liked song Y"
  • Matrix factorization on the user-song interaction matrix (billions of plays)
  • Works well for popular songs but struggles with new/niche tracks (cold start problem)
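
To make "users who liked X also liked Y" concrete, here is a minimal sketch using plain co-occurrence counting. This is a deliberately simpler stand-in for matrix factorization, not Spotify's method; the data and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

def co_occurrence(likes):
    """Count, for each pair of songs, how many users liked both."""
    counts = defaultdict(lambda: defaultdict(int))
    for songs in likes.values():
        for a, b in combinations(sorted(songs), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def similar_songs(counts, song, k=3):
    """Top-k songs most often liked together with `song` (ties broken by name)."""
    neighbours = counts.get(song, {})
    return sorted(neighbours, key=lambda s: (-neighbours[s], s))[:k]

likes = {
    "u1": {"X", "Y", "Z"},
    "u2": {"X", "Y"},
    "u3": {"X", "W"},
    "u4": {"Y", "Z"},
}
counts = co_occurrence(likes)
print(similar_songs(counts, "X"))        # ['Y', 'W', 'Z']
print(similar_songs(counts, "brand_new"))  # [] (cold start: no interactions yet)
```

The empty result for a song with no plays is the cold start problem in miniature.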

Content-Based Filtering:

  • Analyze audio features: tempo, key, energy, danceability, acousticness
  • Use deep learning models on raw audio spectrograms
  • Solves the cold start problem: new songs can be recommended based on their audio features
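
A content-based match can be sketched as cosine similarity between feature vectors. The feature layout and all values below are hypothetical; a production system would more likely use learned embeddings from the audio models mentioned above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical features, each scaled to 0..1:
# [tempo, energy, danceability, acousticness]
catalog = {
    "known_dance_hit":       [0.80, 0.90, 0.85, 0.05],
    "known_acoustic_ballad": [0.30, 0.20, 0.25, 0.90],
}

# A brand-new track with no listening history, only audio features.
new_song = [0.78, 0.88, 0.80, 0.10]

best = max(catalog, key=lambda s: cosine(new_song, catalog[s]))
print(best)  # known_dance_hit
```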

Hybrid Approach:

  • Combine collaborative and content-based signals
  • Add contextual features: time of day, listening history, skip patterns
  • Re-rank with business rules (boost new releases, licensed content)
  • Spotify runs this pipeline weekly for Discover Weekly, daily for Daily Mix
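
One simple way to picture the hybrid stage is a weighted blend followed by a rule-based boost. The weights, the boost factor, and the field names here are assumptions chosen for illustration; real systems typically learn the combination rather than hand-tune it.

```python
def hybrid_score(cf, content, context, weights=(0.5, 0.3, 0.2)):
    """Blend collaborative, content-based, and contextual scores (each in 0..1)."""
    w_cf, w_content, w_ctx = weights
    return w_cf * cf + w_content * content + w_ctx * context

def rerank(candidates, new_release_boost=1.2):
    """Score candidates, apply a business-rule boost, return song IDs best-first."""
    scored = []
    for c in candidates:
        score = hybrid_score(c["cf"], c["content"], c["context"])
        if c.get("new_release"):
            score *= new_release_boost  # business rule: favour new releases
        scored.append((score, c["song_id"]))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [song_id for _, song_id in scored]

candidates = [
    {"song_id": "a", "cf": 0.9, "content": 0.2, "context": 0.5},
    # "b" has no interaction data (cf = 0), so only content and context contribute.
    {"song_id": "b", "cf": 0.0, "content": 0.9, "context": 0.6, "new_release": True},
    {"song_id": "c", "cf": 0.6, "content": 0.6, "context": 0.4, "new_release": True},
]
print(rerank(candidates))  # ['c', 'a', 'b']
```
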
▸ Recommendation engine pipeline

Playlist Management & Offline Sync

Playlists:

  • Stored as ordered lists of song IDs in a database
  • Support collaborative playlists (multiple editors) with operational transforms or CRDTs for conflict resolution
  • 200M operations/day averages ~2,300 writes/second (peaks will be higher), which is manageable with sharding by user_id
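
The write rate and the sharding idea can be sketched as follows. The shard count and the hash-modulo scheme are assumptions for illustration (a real deployment might prefer consistent hashing to ease resharding), and `move_track` shows a reorder on the ordered list of song IDs.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard with a stable hash, so a user's playlists stay together."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def move_track(song_ids, from_index, to_index):
    """Reorder: return a new playlist with one track moved to a new position."""
    reordered = list(song_ids)
    track = reordered.pop(from_index)
    reordered.insert(to_index, track)
    return reordered

avg_writes_per_sec = 200_000_000 / 86_400
print(round(avg_writes_per_sec))                  # 2315
print(move_track(["s1", "s2", "s3", "s4"], 0, 2))  # ['s2', 's3', 's1', 's4']
```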

Offline Download:

  • Songs are downloaded encrypted (DRM) to the device
  • Playback requires a valid license token (checked periodically when online)
  • Sync service tracks which songs are cached locally and handles cleanup when storage is low
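
The license check and the storage cleanup might look roughly like this. It is a sketch under stated assumptions: the 30-day offline window, the `OfflineLicense` fields, and least-recently-played eviction are all hypothetical, and real DRM involves encrypted keys that are out of scope here.

```python
from dataclasses import dataclass

@dataclass
class OfflineLicense:
    song_id: str
    issued_at: float       # Unix seconds when the license was last refreshed online
    max_offline_s: float   # how long playback is allowed without re-checking

def can_play_offline(lic: OfflineLicense, now: float) -> bool:
    """Allow playback only while the license is inside its offline window."""
    return 0 <= now - lic.issued_at <= lic.max_offline_s

def songs_to_evict(cached, bytes_needed):
    """Pick least recently played songs until enough space is freed.

    cached: list of (song_id, size_bytes, last_played_ts) tuples.
    """
    freed, victims = 0, []
    for song_id, size, _last_played in sorted(cached, key=lambda c: c[2]):
        if freed >= bytes_needed:
            break
        victims.append(song_id)
        freed += size
    return victims

DAY = 86_400
lic = OfflineLicense("song123", issued_at=1_000.0, max_offline_s=30 * DAY)
print(can_play_offline(lic, now=1_000.0 + 1 * DAY))    # True
print(can_play_offline(lic, now=1_000.0 + 31 * DAY))   # False

cached = [("a", 5, 300), ("b", 4, 100), ("c", 6, 200)]
print(songs_to_evict(cached, bytes_needed=8))          # ['b', 'c']
```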

Royalty & Licensing:

  • Every stream is logged for royalty calculations
  • Stream events go to a Kafka pipeline for aggregation
  • Royalties are calculated from aggregated stream counts according to licensing agreements (pro-rata or user-centric model)
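
The difference between the two payout models is easiest to see on a tiny example. This sketch ignores everything real agreements add (rights-holder splits, minimums, territories); the function names and the numbers are illustrative only.

```python
from collections import Counter, defaultdict

def pro_rata(streams, pool):
    """Split one shared pool by each artist's share of all streams.

    streams: list of (user_id, artist_id) events.
    """
    totals = Counter(artist for _, artist in streams)
    all_streams = sum(totals.values())
    return {artist: pool * n / all_streams for artist, n in totals.items()}

def user_centric(streams, fee_per_user):
    """Split each user's own fee across only the artists that user played."""
    per_user = defaultdict(Counter)
    for user, artist in streams:
        per_user[user][artist] += 1
    payouts = defaultdict(float)
    for counts in per_user.values():
        user_streams = sum(counts.values())
        for artist, n in counts.items():
            payouts[artist] += fee_per_user * n / user_streams
    return dict(payouts)

# Heavy listener u1 plays artist A nine times; u2 plays artist B once.
streams = [("u1", "A")] * 9 + [("u2", "B")]

print(pro_rata(streams, pool=20.0))              # {'A': 18.0, 'B': 2.0}
print(user_centric(streams, fee_per_user=10.0))  # {'A': 10.0, 'B': 10.0}
```

With the same total payout of 20, pro-rata rewards the artist with heavy listeners, while user-centric sends each subscriber's fee only to the artists that subscriber actually played.
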
▸ Full architecture
Interview tip: The CDN is the hero of music streaming. Emphasize that audio chunks are cached at edge nodes worldwide, and that the client pre-fetches the next track's first chunks while the current song plays. This is how you achieve <200 ms playback start and gapless transitions, two things interviewers love to hear about.

Key Metrics

  • Playback start (CDN hit): manifest fetch + first chunk from edge; <200 ms, \(O(1)\)
  • Catalog search: Elasticsearch inverted index; ~50-100 ms, \(O(\log n)\)
  • Audio storage (100M songs): 100M songs × ~3 MB avg; ~300 TB
  • Concurrent streams: ~8 Tbps aggregate bandwidth; 50M

Quick check

Why are songs split into ~10-second chunks rather than streamed as a single file?
