Design a Chat System
Understanding the Requirements
We're designing a chat system like WhatsApp, Slack, or Discord. Let's start by asking the right questions.
Functional Requirements:
- 1:1 messaging — Alice sends a message to Bob. Bob sees it (almost) instantly.
- Group chat — Up to 500 members in a group.
- Online/offline status — See who's currently active.
- Read receipts — Know when your message was delivered and read.
- Push notifications — Notify users who are offline.
- Message history — Scroll back to see old messages.
Non-Functional Requirements:
- Low latency: Messages should arrive in < 200ms (real-time feel).
- Reliability: Messages must never be lost. If Bob's phone is off, the message waits for him.
- Ordering: Messages in a conversation appear in the correct order.
- Scale: 50M daily active users, 1B messages/day.
WebSockets vs Polling vs Long Polling
Chat needs real-time, bidirectional communication. The server needs to push messages to clients instantly — not wait for the client to ask. Let's compare the options:
Short Polling: The client asks the server "any new messages?" every few seconds. Simple but wasteful — 99% of requests return "nope, nothing new." Like a kid asking "are we there yet?" every 30 seconds on a road trip.
Long Polling: The client asks "any new messages?" but the server holds the connection open until there's something to return (or a timeout). Better than short polling — fewer empty responses. But each "poll" ties up a server connection.
WebSockets: A persistent, two-way connection between client and server. Once established, either side can send data at any time. This is the gold standard for chat. Low latency, low overhead, no wasted requests.
The trade-off: WebSocket connections are stateful — you need to know which server each user is connected to. This adds complexity compared to stateless HTTP. We'll see how to handle this.
Simple WebSocket Chat Server
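Before diving into the full architecture, here is a minimal sketch of a chat server's connection-handling logic. This is an in-memory model, not a real network server: each "connection" is an `asyncio.Queue` standing in for a WebSocket, and the `ChatServer` class, its method names, and the offline buffer are illustrative assumptions, not a specific library's API.

```python
import asyncio
from collections import defaultdict

class ChatServer:
    """In-memory model of one chat server's routing logic.

    A real deployment would accept WebSocket connections (e.g. via a
    WebSocket library); here each connection is an asyncio.Queue so the
    logic can run without sockets.
    """

    def __init__(self):
        self.connections = {}                    # user_id -> asyncio.Queue
        self.offline_buffer = defaultdict(list)  # user_id -> pending messages

    def connect(self, user_id):
        q = asyncio.Queue()
        self.connections[user_id] = q
        # Flush any messages that arrived while the user was offline.
        for msg in self.offline_buffer.pop(user_id, []):
            q.put_nowait(msg)
        return q

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def send(self, sender, recipient, text):
        msg = {"from": sender, "text": text}
        if recipient in self.connections:
            self.connections[recipient].put_nowait(msg)  # push instantly
        else:
            self.offline_buffer[recipient].append(msg)   # wait for reconnect

async def demo():
    server = ChatServer()
    bob_conn = server.connect("bob")
    server.send("alice", "bob", "hi Bob!")
    print(await bob_conn.get())  # {'from': 'alice', 'text': 'hi Bob!'}

asyncio.run(demo())
```

Note how the offline buffer captures the reliability requirement from earlier: if Bob's device is disconnected, the message waits for him instead of being dropped.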
High-Level Architecture
Let's lay out the components:
Chat Servers (WebSocket): Handle persistent WebSocket connections. Each server manages thousands of active connections. When a message arrives, the chat server looks up which server the recipient is connected to.
Connection Registry (Redis): Stores the mapping of user_id → chat_server_id. When user Alice connects to Server 3, we write alice → server-3 in Redis. When Bob sends Alice a message, we look up Alice's server and route the message there.
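The registry's interface can be sketched in a few lines. A plain dict stands in for Redis here; the class name, method names, and key format are assumptions for illustration, and the Redis commands in the comments show what each call would be in production.

```python
class ConnectionRegistry:
    """Maps user_id -> chat_server_id. A dict stands in for Redis; in
    production these would be SET/GET/DEL calls, typically with a TTL so
    stale entries expire if a chat server dies."""

    def __init__(self):
        self._store = {}

    def register(self, user_id, server_id):
        self._store[user_id] = server_id  # Redis: SET user:<id>:server <server_id> EX 60

    def unregister(self, user_id):
        self._store.pop(user_id, None)    # Redis: DEL user:<id>:server

    def lookup(self, user_id):
        return self._store.get(user_id)   # Redis: GET user:<id>:server

registry = ConnectionRegistry()
registry.register("alice", "server-3")
print(registry.lookup("alice"))  # server-3
```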
Message Queue (Kafka): Decouples message sending from processing. When a message is sent, it's published to Kafka. Consumers handle storage, delivery, and push notifications independently.
Message Storage: Messages need to be durable and retrievable. Two options:
- For 1:1 chats: A key-value store keyed by (user_id, conversation_id, timestamp). Cassandra is perfect — great write throughput, range queries by timestamp.
- For group chats: Store messages by (group_id, timestamp). Each member retrieves messages for their groups.
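To make the storage layout concrete, here is a small model of the partition-plus-clustering scheme. The `MessageStore` class is a sketch under stated assumptions: a sorted Python list per conversation stands in for Cassandra's clustering order on disk, and `history` is the range query behind "scroll back to see old messages".

```python
from bisect import insort
from collections import defaultdict

class MessageStore:
    """Models the Cassandra layout: partition key = conversation_id,
    clustering key = timestamp. A sorted list per partition stands in
    for Cassandra's on-disk clustering order."""

    def __init__(self):
        self._partitions = defaultdict(list)  # conversation_id -> [(ts, msg)]

    def write(self, conversation_id, ts, msg):
        # insort keeps each partition ordered by timestamp on insert.
        insort(self._partitions[conversation_id], (ts, msg))

    def history(self, conversation_id, before_ts, limit=50):
        """Range query: the newest `limit` messages older than before_ts."""
        rows = [r for r in self._partitions[conversation_id] if r[0] < before_ts]
        return rows[-limit:]

store = MessageStore()
store.write("chat:alice:bob", 100, "hey")
store.write("chat:alice:bob", 200, "you there?")
print(store.history("chat:alice:bob", before_ts=300))
```

Because every message in a conversation lands in one partition, a history fetch touches a single node and reads a contiguous timestamp range.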
Push Notification Service: When a user is offline, send push notifications via APNs (iOS) or FCM (Android). This is a separate service consuming from the message queue.
Message Ordering and IDs
Messages must appear in the right order. But in a distributed system, timestamps from different devices can be out of sync. How do you guarantee order?
Server-side timestamps: The chat server assigns a timestamp when it receives the message. This works for a single server but becomes tricky across multiple servers (their clocks might differ by milliseconds).
Sequence numbers per conversation: Each conversation has an auto-incrementing sequence number. Message 1, 2, 3, 4... This guarantees perfect ordering within a conversation. Store the latest sequence number in Redis for fast increments.
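A sequence allocator is essentially one atomic increment per message. In this sketch a `defaultdict` stands in for Redis (the class and key format are illustrative assumptions); in production each call would be a single Redis `INCR`, which is atomic even with many chat servers incrementing concurrently.

```python
from collections import defaultdict

class SequenceAllocator:
    """Per-conversation sequence numbers. The dict stands in for Redis;
    in production this is one atomic INCR per message."""

    def __init__(self):
        self._counters = defaultdict(int)

    def next_seq(self, conversation_id):
        self._counters[conversation_id] += 1  # Redis: INCR seq:<conversation_id>
        return self._counters[conversation_id]

alloc = SequenceAllocator()
print(alloc.next_seq("chat:42"))  # 1
print(alloc.next_seq("chat:42"))  # 2
print(alloc.next_seq("chat:99"))  # 1 (counters are independent per conversation)
```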
Snowflake IDs: Twitter's Snowflake ID generates globally unique, time-ordered IDs. Each ID encodes: timestamp (41 bits) + machine_id (10 bits) + sequence (12 bits). IDs are sortable by time, unique across machines, and generated without coordination.
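The bit layout above translates directly into a generator. This is a simplified sketch: the custom epoch below is an arbitrary value chosen for illustration, and handling for a backwards-moving clock is omitted.

```python
import threading
import time

class Snowflake:
    """Snowflake-style IDs: 41-bit ms timestamp | 10-bit machine id |
    12-bit per-millisecond sequence. The epoch is an arbitrary choice
    for this sketch; clock-skew handling is omitted."""

    EPOCH_MS = 1_600_000_000_000  # assumed custom epoch (2020-09-13)

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH_MS
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:
                    # 4096 IDs issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - self.EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.sequence

gen = Snowflake(machine_id=3)
a, b = gen.next_id(), gen.next_id()
print(a < b)  # True: IDs are time-ordered
```

Note that no network coordination is needed: each machine only needs a unique 10-bit `machine_id`, and uniqueness plus time-ordering fall out of the bit layout.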
Online/Offline Status
How do you know if someone is "online"? The standard approach combines heartbeats with a last-seen timestamp:
Heartbeat-based: Each connected client sends a "heartbeat" (a tiny ping) every 30 seconds. The server records the last heartbeat time. If it's more than, say, 60 seconds old, the user is considered offline.
Store last-seen timestamps in Redis: user:42:last_seen → 1640000000. Query it to show "Online" or "Last seen 5 min ago."
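The heartbeat-plus-threshold logic fits in a few lines. In this sketch a dict stands in for the Redis `user:<id>:last_seen` keys, and the class and method names are illustrative assumptions; the 30-second ping and 60-second threshold come from the text above.

```python
import time

ONLINE_THRESHOLD_S = 60  # no heartbeat for 60s => considered offline

class PresenceTracker:
    """Heartbeat-based presence. The dict stands in for Redis keys like
    user:42:last_seen; clients ping roughly every 30 seconds."""

    def __init__(self):
        self._last_seen = {}

    def heartbeat(self, user_id, now=None):
        self._last_seen[user_id] = time.time() if now is None else now

    def status(self, user_id, now=None):
        now = time.time() if now is None else now
        seen = self._last_seen.get(user_id)
        if seen is None:
            return "offline"
        if now - seen <= ONLINE_THRESHOLD_S:
            return "online"
        return f"last seen {int((now - seen) // 60)} min ago"

p = PresenceTracker()
p.heartbeat("alice", now=1_640_000_000)
print(p.status("alice", now=1_640_000_030))  # online
print(p.status("alice", now=1_640_000_330))  # last seen 5 min ago
```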
For group chats: Broadcasting online status to all 500 members every time someone connects/disconnects would be extremely chatty. Solution: only show online status when a user opens a specific chat. Fetch the status on demand rather than pushing it constantly.
Read Receipts
Read receipts ("delivered" ✓ and "read" ✓✓) add another layer:
- Sent: The server received the message (acknowledged to the sender).
- Delivered: The recipient's device received the message (the recipient's client acknowledges).
- Read: The recipient opened the chat (the client sends a "read" event when the user views the conversation).
For 1:1 chats, this is straightforward — one acknowledgment per message. For group chats, tracking who has read each message gets expensive. WhatsApp solves this by only showing detailed read info when you tap the message — fetching it on demand rather than tracking it in real-time for everyone.
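For the 1:1 case, the per-message state machine can be sketched directly. The `ReceiptTracker` class and its method names are assumptions for illustration; the one subtlety worth encoding is that acknowledgments can arrive out of order, so state must only ever move forward.

```python
STATES = ("sent", "delivered", "read")

class ReceiptTracker:
    """Per-message receipt state for a 1:1 chat. Group chats would
    instead keep a per-member read cursor, fetched on demand."""

    def __init__(self):
        self._status = {}

    def ack(self, message_id, new_state):
        # States only move forward: a late 'delivered' ack must not
        # downgrade a message already marked 'read'.
        cur = self._status.get(message_id, "sent")
        if STATES.index(new_state) > STATES.index(cur):
            self._status[message_id] = new_state
        return self._status.get(message_id, "sent")

t = ReceiptTracker()
t.ack("msg-1", "delivered")
print(t.ack("msg-1", "read"))       # read
print(t.ack("msg-1", "delivered"))  # still read: late acks never downgrade
```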
Scaling the Chat System
50M DAU sending 1B messages/day is ~12,000 messages/second. At peak (3x), that's ~36,000 msg/sec. Here's how to scale each component:
- Chat servers: Each handles ~10K concurrent WebSocket connections. For 50M DAU (maybe 10M concurrent), you need ~1,000 chat servers. Use consistent hashing to assign users to servers.
- Redis (connection registry): A cluster of Redis nodes storing user→server mappings. With 10M entries and small values, this fits easily in memory.
- Kafka: Partition message topics by conversation_id so all messages in a conversation are ordered on the same partition.
- Cassandra (message storage): Partition by conversation_id, cluster by timestamp. This gives you efficient writes and fast message history retrieval.
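The estimates above come from quick back-of-envelope arithmetic, which is worth writing out. The 10M concurrent figure is the assumption stated in the chat-server bullet (roughly 20% of DAU online at once).

```python
dau = 50_000_000
messages_per_day = 1_000_000_000
seconds_per_day = 86_400

avg_qps = messages_per_day / seconds_per_day  # ~11,600 msg/sec, rounded to ~12,000
peak_qps = avg_qps * 3                        # ~35,000 msg/sec at a 3x peak
concurrent = 10_000_000                       # assumed ~20% of DAU online at once
servers = concurrent // 10_000                # at ~10K WebSocket connections each

print(round(avg_qps), round(peak_qps), servers)  # 11574 34722 1000
```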