Design a Chat System

Real-time messaging for millions — delivered instantly
Scope: Real-World System · Difficulty: Intermediate-Advanced

Understanding the Requirements

We're designing a chat system like WhatsApp, Slack, or Discord. Let's start by asking the right questions.

Functional Requirements:

  • 1:1 messaging — Alice sends a message to Bob. Bob sees it (almost) instantly.
  • Group chat — Up to 500 members in a group.
  • Online/offline status — See who's currently active.
  • Read receipts — Know when your message was delivered and read.
  • Push notifications — Notify users who are offline.
  • Message history — Scroll back to see old messages.

Non-Functional Requirements:

  • Low latency: Messages should arrive in < 200ms (real-time feel).
  • Reliability: Messages must never be lost. If Bob's phone is off, the message waits for him.
  • Ordering: Messages in a conversation appear in the correct order.
  • Scale: 50M daily active users, 1B messages/day.

[Figure: Two users want to chat in real time]

WebSockets vs Polling vs Long Polling

Chat needs real-time, bidirectional communication. The server needs to push messages to clients instantly — not wait for the client to ask. Let's compare the options:

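Short Polling: The client asks the server "any new messages?" every few seconds. Simple but wasteful: in a typical conversation the vast majority of requests return "nope, nothing new," and a message can wait up to a full polling interval before it is picked up. Like a kid asking "are we there yet?" every 30 seconds on a road trip.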

Long Polling: The client asks "any new messages?" but the server holds the connection open until there's something to return (or a timeout). Better than short polling — fewer empty responses. But each "poll" ties up a server connection.
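To make the "hold the connection open" idea concrete, here is a minimal sketch of the server side of long polling using only asyncio. The `Mailbox` class and its method names are hypothetical, and a real server would sit behind an HTTP framework; this only models the wait-or-timeout behavior.

```python
import asyncio


class Mailbox:
    """Hypothetical per-user inbox that a long-poll request waits on."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def publish(self, message):
        await self._queue.put(message)

    async def long_poll(self, timeout):
        """Hold the request until a message arrives or the timeout expires."""
        try:
            return [await asyncio.wait_for(self._queue.get(), timeout)]
        except asyncio.TimeoutError:
            return []  # empty response; the client would poll again right away


async def demo():
    mailbox = Mailbox()
    # Nothing queued: the request is held for the full timeout, then returns empty.
    empty = await mailbox.long_poll(timeout=0.05)
    # A message arrives while the next request is still pending.
    pending = asyncio.create_task(mailbox.long_poll(timeout=1.0))
    await asyncio.sleep(0.01)
    await mailbox.publish("hi Bob")
    delivered = await pending
    return empty, delivered


empty, delivered = asyncio.run(demo())
print(empty, delivered)
```

The second request returns as soon as the message is published rather than at the end of its timeout, which is the latency advantage over short polling. Each pending request still occupies a connection for its whole wait.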

WebSockets: A persistent, two-way connection between client and server. Once established, either side can send data at any time. This is the gold standard for chat. Low latency, low overhead, no wasted requests.

The trade-off: WebSocket connections are stateful — you need to know which server each user is connected to. This adds complexity compared to stateless HTTP. We'll see how to handle this.

[Figure: WebSocket, a persistent bidirectional connection]

Simple WebSocket Chat Server

import asyncio
import json

import websockets  # third-party: pip install websockets

# Track users connected to this server: user_id -> websocket connection.
# In production, use Redis pub/sub to route across servers.
connected_users = {}


async def handle_connection(websocket, path=None):
    # `path` is only passed by older versions of the websockets library.
    user_id = None
    try:
        # First message should be authentication
        auth = json.loads(await websocket.recv())
        user_id = auth["user_id"]
        connected_users[user_id] = websocket
        print(f"User {user_id} connected")

        # Listen for messages
        async for raw_message in websocket:
            message = json.loads(raw_message)
            recipient_id = message["to"]
            timestamp = message["timestamp"]

            # Build the message to deliver
            outgoing = {
                "from": user_id,
                "content": message["content"],
                "timestamp": timestamp,
                "message_id": f"{user_id}_{timestamp}",
            }

            # Store in database (async, don't block)
            # await store_message(outgoing)

            # Deliver to recipient if online
            recipient = connected_users.get(recipient_id)
            if recipient is not None:
                await recipient.send(json.dumps(outgoing))
                # Send delivery receipt back to sender
                await websocket.send(json.dumps({
                    "type": "delivered",
                    "message_id": outgoing["message_id"],
                }))
            else:
                # User offline: queue for push notification
                # await push_notification_service.notify(recipient_id, outgoing)
                print(f"User {recipient_id} offline, queuing notification")
    finally:
        # Remove the mapping only if it still points at this connection,
        # so a reconnect from another device is not dropped by mistake.
        if user_id is not None and connected_users.get(user_id) is websocket:
            del connected_users[user_id]
            print(f"User {user_id} disconnected")


async def main():
    # Start the server and keep it running until the process is stopped
    async with websockets.serve(handle_connection, "localhost", 8765):
        print("Chat server ready on ws://localhost:8765")
        await asyncio.Future()


if __name__ == "__main__":
    asyncio.run(main())
Output
Chat server ready on ws://localhost:8765

High-Level Architecture

Let's lay out the components:

Chat Servers (WebSocket): Handle persistent WebSocket connections. Each server manages thousands of active connections. When a message arrives, the chat server looks up which server the recipient is connected to.

Connection Registry (Redis): Stores the mapping of user_id → chat_server_id. When user Alice connects to Server 3, we write alice → server-3 in Redis. When Bob sends Alice a message, we look up Alice's server and route the message there.
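The registry lookup can be sketched with a plain dictionary standing in for Redis. The class and function names here (`ConnectionRegistry`, `route`) are illustrative assumptions, not part of any library; in a real deployment the dictionary operations would be Redis reads and writes.

```python
class ConnectionRegistry:
    """In-memory stand-in for the Redis mapping user_id -> chat_server_id."""

    def __init__(self):
        self._user_to_server = {}

    def register(self, user_id, server_id):
        self._user_to_server[user_id] = server_id

    def unregister(self, user_id, server_id):
        # Only remove the entry if it still points at this server; the user
        # may already have reconnected somewhere else.
        if self._user_to_server.get(user_id) == server_id:
            del self._user_to_server[user_id]

    def lookup(self, user_id):
        return self._user_to_server.get(user_id)


def route(registry, recipient_id):
    """Decide where a message for recipient_id should go."""
    server_id = registry.lookup(recipient_id)
    if server_id is None:
        return ("push", recipient_id)   # offline: hand off to push notifications
    return ("forward", server_id)       # online: forward to that chat server


registry = ConnectionRegistry()
registry.register("alice", "server-3")
print(route(registry, "alice"))  # ('forward', 'server-3')
print(route(registry, "bob"))    # ('push', 'bob')
```

The guarded `unregister` matters when a user reconnects to a new server before the old connection's cleanup runs; without the check, the stale cleanup would erase the fresh mapping.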

Message Queue (Kafka): Decouples message sending from processing. When a message is sent, it's published to Kafka. Consumers handle storage, delivery, and push notifications independently.
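The decoupling works because each consumer reads the same log at its own pace. The sketch below models that idea with an in-memory append-only list and per-group offsets; it is a toy stand-in for illustration, not the Kafka client API, and the `Topic` class and group names are assumptions.

```python
class Topic:
    """Append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # group name -> index of the next unread message

    def publish(self, message):
        self._log.append(message)

    def poll(self, group):
        """Return every message this group has not seen yet."""
        offset = self._offsets.get(group, 0)
        batch = self._log[offset:]
        self._offsets[group] = len(self._log)
        return batch


topic = Topic()
topic.publish({"id": 1, "to": "bob"})
topic.publish({"id": 2, "to": "bob"})

stored = topic.poll("storage")          # storage consumer catches up first
topic.publish({"id": 3, "to": "carol"})
notified = topic.poll("notifications")  # a slower consumer still sees everything
stored_later = topic.poll("storage")    # storage only gets what is new to it

print(len(stored), len(notified), len(stored_later))  # 2 3 1
```

A slow notification consumer does not hold up storage, and neither one affects how quickly the sender gets its acknowledgment.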

Message Storage: Messages need to be durable and retrievable. The key layout depends on the chat type:

  • For 1:1 chats: A key-value store keyed by (user_id, conversation_id, timestamp). Cassandra is a common choice because it offers high write throughput and efficient range queries by timestamp.
  • For group chats: Store messages by (group_id, timestamp). Each member retrieves messages for their groups.
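A small in-memory model shows why this key layout makes history queries cheap. It assumes messages are grouped by conversation (or group) ID and kept sorted by timestamp, which is the property a range query relies on. `MessageStore` and its methods are hypothetical names, not a database API.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Toy store: one sorted list of rows per conversation."""

    def __init__(self):
        # conversation_id -> list of (timestamp, message_id, body), kept sorted
        self._partitions = defaultdict(list)

    def append(self, conversation_id, timestamp, message_id, body):
        bisect.insort(self._partitions[conversation_id],
                      (timestamp, message_id, body))

    def history(self, conversation_id, before, limit):
        """Return up to `limit` messages older than `before`, newest first."""
        rows = self._partitions.get(conversation_id, [])
        end = bisect.bisect_left(rows, (before,))  # first row with ts >= before
        return rows[max(0, end - limit):end][::-1]


store = MessageStore()
store.append("alice:bob", 300, "m3", "see you then")
store.append("alice:bob", 100, "m1", "hi")
store.append("alice:bob", 200, "m2", "lunch?")

page = store.history("alice:bob", before=300, limit=2)
print([message_id for _, message_id, _ in page])  # ['m2', 'm1']
```

Scrolling further back is the same call with `before` set to the oldest timestamp already shown, so each page touches only the rows it returns.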

Push Notification Service: When a user is offline, send push notifications via APNs (iOS) or FCM (Android). This is a separate service consuming from the message queue.

[Figure: Sending a message: Alice → Server → Bob]
[Figure: Message delivery flow: online delivery via WebSocket routing, offline fallback via push notifications]

Message Ordering and IDs

Messages must appear in the right order. But in a distributed system, timestamps from different devices can be out of sync. How do you guarantee order?

Server-side timestamps: The chat server assigns a timestamp when it receives the message. This works for a single server but becomes tricky across multiple servers (their clocks might differ by milliseconds).

Sequence numbers per conversation: Each conversation has an auto-incrementing sequence number. Message 1, 2, 3, 4... This guarantees perfect ordering within a conversation. Store the latest sequence number in Redis for fast increments.
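A per-conversation counter can be sketched as follows. The in-memory `SequenceAllocator` is a hypothetical stand-in for an atomic counter such as Redis `INCR` on a per-conversation key; the lock plays the role of Redis's atomicity.

```python
import threading
from collections import defaultdict


class SequenceAllocator:
    """In-memory stand-in for an atomic per-conversation counter."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._lock = threading.Lock()

    def next(self, conversation_id):
        # Increment and read under one lock so no two messages in the same
        # conversation can ever receive the same number.
        with self._lock:
            self._counters[conversation_id] += 1
            return self._counters[conversation_id]


allocator = SequenceAllocator()
first = allocator.next("alice:bob")
second = allocator.next("alice:bob")
other = allocator.next("alice:carol")
print(first, second, other)  # 1 2 1
```

Clients can then sort by sequence number and detect gaps: if message 4 arrives after message 2, the client knows to wait for or request message 3.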

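Snowflake IDs: Twitter's Snowflake scheme generates globally unique, time-ordered IDs. Each ID packs timestamp (41 bits) + machine_id (10 bits) + sequence (12 bits) into a 64-bit integer, leaving the sign bit unused. IDs are unique across machines and generated without coordination. They sort by time, although ordering between IDs from different machines is only as accurate as those machines' clocks.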
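The bit layout is easier to see in code. This is a minimal, single-threaded sketch of a Snowflake-style generator using the 41/10/12 split described above; the `EPOCH_MS` value is an arbitrary assumption (any fixed past instant works), and the injectable `clock` exists only to make the example deterministic.

```python
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
MACHINE_BITS, SEQUENCE_BITS = 10, 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        now = self._clock()
        if now < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            self._sequence = (self._sequence + 1) & MAX_SEQUENCE
            if self._sequence == 0:
                # 4096 IDs already issued this millisecond: wait for the next one
                while now <= self._last_ms:
                    now = self._clock()
        else:
            self._sequence = 0
        self._last_ms = now
        return (((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence)


def decode(snowflake_id):
    """Split an ID back into (ms since epoch, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    return snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS), machine_id, sequence


generator = SnowflakeGenerator(machine_id=7)
a, b = generator.next_id(), generator.next_id()
print(a < b, decode(a)[1])  # True 7
```

Because the timestamp occupies the high bits, comparing two IDs as plain integers compares their timestamps first, then machine, then sequence.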

Online/Offline Status

How do you know if someone is "online"? The usual approach combines heartbeats with a stored last-seen timestamp:

Heartbeat-based: Each connected client sends a "heartbeat" (a tiny ping) every 30 seconds. The server records the last heartbeat time. If it's more than, say, 60 seconds old, the user is considered offline.

Store last-seen timestamps in Redis: user:42:last_seen → 1640000000. Query it to show "Online" or "Last seen 5 min ago."
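The heartbeat rule can be sketched in a few lines. The dictionary stands in for the Redis last-seen keys, the 60-second cutoff follows the example above, and `PresenceTracker` with its injectable `clock` is a hypothetical helper for illustration.

```python
import time


class PresenceTracker:
    """Derives Online / Last seen from the most recent heartbeat time."""

    def __init__(self, timeout_seconds=60, clock=time.time):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # user_id -> unix timestamp of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def status(self, user_id):
        last = self._last_seen.get(user_id)
        if last is None:
            return "Offline"
        age = self._clock() - last
        if age <= self._timeout:
            return "Online"
        return f"Last seen {int(age // 60)} min ago"


# A fake clock makes the example deterministic.
now = [1640000000]
tracker = PresenceTracker(clock=lambda: now[0])
tracker.heartbeat(42)
print(tracker.status(42))  # Online
now[0] += 300              # five minutes pass with no heartbeat
print(tracker.status(42))  # Last seen 5 min ago
```

Setting the cutoff to roughly twice the heartbeat interval tolerates one lost heartbeat before a user flips to offline.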

For group chats: Broadcasting online status to all 500 members every time someone connects/disconnects would be extremely chatty. Solution: only show online status when a user opens a specific chat. Fetch the status on demand rather than pushing it constantly.

[Figure: Group chat: fan-out to all members]

Read Receipts

Read receipts (the ✓ and ✓✓ marks next to a message) add another layer. Each message moves through three states:

  • Sent: The server received the message (acknowledged to the sender).
  • Delivered: The recipient's device received the message (the recipient's client acknowledges).
  • Read: The recipient opened the chat (the client sends a "read" event when the user views the conversation).

For 1:1 chats, this is straightforward — one acknowledgment per message. For group chats, tracking who has read each message gets expensive. WhatsApp solves this by only showing detailed read info when you tap the message — fetching it on demand rather than tracking it in real-time for everyone.
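One way to model these states is as a value that only moves forward, so a late "delivered" acknowledgment cannot overwrite an earlier "read". The sketch below tracks status per (message, recipient) pair and answers the "who has read this?" question on demand; all names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class ReceiptTracker:
    def __init__(self):
        self._status = {}  # (message_id, user_id) -> Status

    def record(self, message_id, user_id, status):
        """Apply an acknowledgment; status never moves backwards."""
        key = (message_id, user_id)
        current = self._status.get(key, Status.SENT)
        self._status[key] = max(current, status)
        return self._status[key]

    def readers(self, message_id):
        """Computed only when asked, e.g. when the sender taps the message."""
        return sorted(user for (msg, user), status in self._status.items()
                      if msg == message_id and status == Status.READ)


tracker = ReceiptTracker()
tracker.record("m1", "bob", Status.READ)
tracker.record("m1", "bob", Status.DELIVERED)   # arrives late, ignored
tracker.record("m1", "carol", Status.DELIVERED)
print(tracker.readers("m1"))  # ['bob']
```

In a group chat the per-recipient records still exist, but nothing is pushed to the sender until they ask, which keeps receipt traffic proportional to interest rather than to group size.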

Note: Think of the chat system like a post office. WebSockets are like having a direct phone line to each person — instant communication. The message queue is the sorting room — messages come in, get organized, and go to the right destinations. The database is the filing cabinet — you can always go back and find old letters. Push notifications are the carrier pigeon for people who aren't home.

[Figure: Full architecture: WS servers, message queue, and storage]

Scaling the Chat System

50M DAU sending 1B messages/day averages about 11,600 messages/second (1B / 86,400 seconds), which we round to ~12,000. Assuming peak traffic is 3x the average, that's ~36,000 msg/sec. Here's how to scale each component:

  • Chat servers: Each handles ~10K concurrent WebSocket connections. For 50M DAU (maybe 10M concurrent), you need ~1,000 chat servers. Use consistent hashing to assign users to servers.
  • Redis (connection registry): A cluster of Redis nodes storing user→server mappings. With 10M entries and small values, this fits easily in memory.
  • Kafka: Partition message topics by conversation_id so all messages in a conversation are ordered on the same partition.
  • Cassandra (message storage): Partition by conversation_id, cluster by timestamp. This gives you efficient writes and fast message history retrieval.
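The arithmetic behind these figures is worth writing out once. The inputs come from the requirements and the list above; the 3x peak factor and the 10M concurrent users are the article's working assumptions rather than measured values.

```python
DAILY_ACTIVE_USERS = 50_000_000
MESSAGES_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400
PEAK_FACTOR = 3                          # assumed peak-to-average ratio
CONCURRENT_USERS = 10_000_000            # assumed share of DAU online at once
CONNECTIONS_PER_SERVER = 10_000

average_rate = MESSAGES_PER_DAY / SECONDS_PER_DAY        # about 11,574 msg/s
peak_rate = average_rate * PEAK_FACTOR                   # about 34,722 msg/s
chat_servers = CONCURRENT_USERS // CONNECTIONS_PER_SERVER
messages_per_user = MESSAGES_PER_DAY / DAILY_ACTIVE_USERS

print(f"average: {average_rate:,.0f} msg/s")
print(f"peak:    {peak_rate:,.0f} msg/s")
print(f"chat servers: {chat_servers:,}")
print(f"messages per user per day: {messages_per_user:.0f}")
```

The exact peak is a little under 35,000 msg/sec; rounding the average up to 12,000 first gives the ~36,000 used in the text. Either figure is fine for sizing, since the peak factor itself is a guess.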

Note: Interview tip: The key insight for chat systems is the connection registry. With thousands of stateful WebSocket servers, you MUST have a way to find which server a user is on. Redis is the standard answer. Also always mention the offline flow: push notifications for offline users are just as important as the real-time WebSocket flow.
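The consistent hashing mentioned for assigning users to chat servers can be sketched with a sorted ring of virtual nodes. This is one common construction, not necessarily what any particular product uses; `HashRing`, the SHA-256 choice, and the 100 virtual nodes per server are all assumptions for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Several virtual nodes per server spread its load around the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def server_for(self, user_id):
        if not self._ring:
            raise LookupError("no servers in the ring")
        # First virtual node at or after the user's position, wrapping around.
        index = bisect.bisect_left(self._ring, (_hash(user_id), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["server-1", "server-2", "server-3"])
users = [f"user-{i}" for i in range(1000)]
before = {u: ring.server_for(u) for u in users}
ring.remove("server-3")
after = {u: ring.server_for(u) for u in users}
moved = [u for u in users if before[u] != after[u]]
print(all(before[u] == "server-3" for u in moved))  # True
```

Only users who were on the removed server get reassigned, which is the point of the technique: with plain `hash(user) % server_count`, removing one server would reshuffle almost everyone and force a wave of reconnects.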

Key Metrics

  • WebSocket message delivery (end-to-end for online users): ~10-50 ms, O(1)
  • Redis user lookup (find user's WebSocket server): < 1 ms, O(1)
  • Message storage in Cassandra (append-optimized): ~1-5 ms, O(1)
  • Load message history (k = messages per page): ~5-20 ms, O(k)
  • Group message fan-out, 500 members (n = group size): ~50-200 ms, O(n)
  • Push notification delivery (depends on APNs/FCM): ~100-3000 ms, O(1)
