Design a Social Network
Understanding the Problem
Design a social network like Facebook β a platform where users create profiles, connect with friends, share content, and browse a personalized news feed.
This is one of the most complex systems you can design because it touches graphs, real-time feeds, messaging, search, and massive scale β all at once.
Functional Requirements:
- Create profile: Users can sign up, set a profile picture, bio, and personal info.
- Add friends: Send/accept friend requests, forming a bidirectional connection.
- Post content: Share text, images, and videos on your timeline.
- News feed: A personalized, ranked feed of posts from friends and followed pages.
- Search users: Find people by name, email, or mutual connections.
Non-Functional Requirements:
- Low latency feed: News feed must load in under 500ms β users expect instant content.
- Handle billions of relationships: The social graph has 2B+ nodes and hundreds of billions of edges.
- High availability: 99.99% uptime. Social networks are always-on services.
- Eventually consistent: A new post doesn't need to appear in every friend's feed instantly β seconds of delay is acceptable.
Back-of-the-Envelope Estimation
Let's size this system:
- 2 billion total users, 500 million daily active users (DAU)
- Average 300 friends per user β that's 300 billion friendship edges (bidirectional)
- 10 million new posts per day β ~115 posts/second on average, peaks at 5-10x
- News feed reads: If each DAU checks their feed 20 times/day, that's 10 billion feed reads/day (~115K reads/second)
- Storage: User profiles ~1KB each = 2 TB. Posts with media metadata ~5KB each, 10M/day Γ 365 Γ 5 years β 90 TB. Media (images/videos) in object storage β petabytes.
- Graph storage: 300B edges Γ ~16 bytes each (two user IDs) β 5 TB just for the adjacency data
This is a read-heavy, graph-heavy system. The news feed and social graph are the two hardest problems to solve at scale.
Data Models
The core entities and how they relate:
User Profile: user_id, name, email, profile_pic_url, bio, created_at. Stored in a relational database (PostgreSQL) β structured data with complex queries.
Friendship (Social Graph): An adjacency list representation β for each user, store the list of friend IDs. This is the heart of the system.
- Option A β Relational: A
friendshipstable with(user_id_1, user_id_2, status, created_at). Simple but expensive for graph traversals (friend-of-friend requires self-joins). - Option B β Graph Database: Use Neo4j or TAO (Facebook's graph store). Optimized for traversals like "mutual friends" and "people you may know."
- Option C β Adjacency list in key-value store: Key = user_id, Value = list of friend IDs. Fast reads, but harder to query relationships.
Post: post_id, author_id, content, media_urls, created_at, privacy_level. Stored in a wide-column store (Cassandra) partitioned by author_id for fast timeline reads.
Comment & Like: Linked to posts. High write volume β likes can spike virally. Often stored in separate tables/stores optimized for counters.
Feed Generation: Fan-Out Strategies
The news feed is the hardest part. When User A makes a post, how does it reach all 300 friends' feeds?
Fan-out on Write (Push Model):
- When a user publishes a post, immediately push it to every friend's pre-computed feed cache.
- Pro: Feed reads are instant β just fetch the pre-built cache. O(1) read latency.
- Con: Celebrity problem. If a user has 10M followers, one post triggers 10M cache writes. Slow and expensive.
Fan-out on Read (Pull Model):
- Do nothing at write time. When a user opens their feed, fetch posts from all friends on-the-fly, merge, rank, and return.
- Pro: Write is instant. No wasted work for inactive users.
- Con: Read is slow β you need to query hundreds of friend timelines and merge them. High latency.
Hybrid Approach (What Facebook Does):
- For normal users (< 5K friends): fan-out on write. Push to friend caches immediately.
- For celebrities/pages (millions of followers): fan-out on read. Their posts are fetched at read time and merged into the feed.
- This gives you the best of both worlds β fast reads for most content, no write amplification for celebrities.
Graph Traversal & Friend Suggestions
The social graph enables powerful features beyond just listing friends:
Friend-of-Friend Suggestions ("People You May Know"):
- Find users who are 2 hops away: friends of your friends who aren't already your friend.
- Rank by number of mutual connections β more mutual friends = stronger suggestion.
- This is a BFS/2-hop traversal on the graph. At scale (300 friends Γ 300 = 90K candidates), it needs to be pre-computed and cached.
Mutual Friends:
- Intersection of two users' friend lists. With sorted adjacency lists, this is O(n) merge.
- Displayed on profiles and used for trust signals.
Privacy & Access Control:
- Every post has a privacy level: public, friends-only, friends-of-friends, custom lists.
- Feed generation must check permissions β is the viewer allowed to see this post?
- Graph queries power this: "Is viewer a friend of the author?" must be O(1) β use a hash set for each user's friend list.