CAP Theorem
The Impossible Triangle
Imagine you're running a chain of ice cream shops across the country. You want three things:
- Every shop has the exact same menu at all times (Consistency)
- Every shop is always open, no matter what (Availability)
- Shops keep working even when the phone lines between them go down (Partition Tolerance)
The CAP Theorem says: you can only guarantee two of these three at the same time.
This was conjectured by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002. It's one of the most important theorems in distributed systems, and it comes up in almost every system design interview.
C, A, and P — Explained
Consistency (C) — Every read receives the most recent write. If Alice updates her profile picture on Server 1, and Bob reads from Server 2 one millisecond later, Bob sees the new picture. Everyone sees the same data at the same time.
Availability (A) — Every request receives a response (not an error), even if it might not contain the most recent data. The system is always "open for business." No request goes unanswered.
Partition Tolerance (P) — The system continues to operate even when network communication between servers breaks. Messages between nodes get lost or delayed. In the real world, network partitions always happen — cables get cut, switches fail, data centers lose connectivity.
Since network partitions are unavoidable in distributed systems, you must always choose P. The real choice is between C and A during a partition:
- CP system: When a partition occurs, the system sacrifices availability. Some requests will fail or wait, but you'll never get stale data.
- AP system: When a partition occurs, the system sacrifices consistency. All requests get a response, but some might return outdated data.
Proof Intuition
Let's make this concrete. You have two database servers, Server A and Server B, that replicate data between each other. A network partition cuts the connection between them.
A client writes x = 5 to Server A. Now a different client reads x from Server B.
Option 1 (Choose Consistency): Server B knows it can't verify it has the latest value (it can't reach Server A). So it refuses the read request — returning an error. The data is consistent, but the system is unavailable for reads on Server B.
Option 2 (Choose Availability): Server B returns whatever value it has for x (maybe the old value, x = 3). The system is available — the client got a response — but the data is inconsistent (Server A says 5, Server B says 3).
There's no magic third option. You must choose.
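The two-server scenario above can be sketched in a few lines of code. This is a toy model, not a real database: each node holds a local copy of `x`, and a `partitioned` flag simulates the severed link. The `Node` class and the "CP"/"AP" mode names are invented for illustration.

```python
# Toy model of the two-server scenario (illustrative sketch, not a real database).
class Node:
    def __init__(self, data):
        self.data = dict(data)          # local replica
        self.peer = None                # the other node
        self.partitioned = False        # True = network link is down

    def write(self, key, value):
        self.data[key] = value
        if not self.partitioned:        # replicate only while the link is up
            self.peer.data[key] = value

    def read(self, key, mode):
        if mode == "CP" and self.partitioned:
            # Consistency: can't verify freshness with the peer, so refuse.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # Availability: answer with whatever we have, possibly stale.
        return self.data[key]

a, b = Node({"x": 3}), Node({"x": 3})
a.peer, b.peer = b, a

a.partitioned = b.partitioned = True    # the partition happens
a.write("x", 5)                         # client writes to Server A

print(b.read("x", mode="AP"))           # -> 3 (stale, but available)
try:
    b.read("x", mode="CP")
except RuntimeError as e:
    print(e)                            # -> unavailable: cannot confirm latest value
```

Running this shows both horns of the dilemma: in "AP" mode Server B happily returns the stale `x = 3`; in "CP" mode it refuses to answer at all.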
Real-World Examples
CP Systems (Consistency over Availability):
- ZooKeeper — Used for coordination in distributed systems. If it can't guarantee consistent data, it refuses to serve. Better to be unavailable than wrong.
- HBase — A consistent column-family store. Writes go to a single leader; if the leader is unreachable, writes fail.
- MongoDB (with majority write concern) — Can be configured to prioritize consistency by requiring writes to be acknowledged by a majority of nodes.
- Banking systems — Your bank balance must be correct. If the system can't guarantee accuracy, it's better to show "temporarily unavailable" than a wrong balance.
AP Systems (Availability over Consistency):
- Cassandra — Always accepts writes, even during partitions. Uses consistent hashing for data distribution. Uses eventual consistency to sync later.
- DynamoDB — Amazon's key-value store, built on the original Dynamo design. Designed to always be writable: "Add to cart" should never fail, even if the cart might be briefly inconsistent.
- DNS — The internet's phone book. It caches old values and eventually updates. A slightly outdated DNS record is better than no DNS at all.
- Social media feeds — If your Instagram feed is 5 seconds behind, nobody notices. But if Instagram is down, everyone notices.
Eventual Consistency
Many AP systems use eventual consistency as a compromise. The idea: after a partition heals, all nodes will eventually converge to the same data. There might be a window of inconsistency, but it's temporary.
How does this work in practice?
- Read repair: When a read detects inconsistency between replicas, it triggers a background update to bring them in sync.
- Anti-entropy (Merkle trees): Nodes periodically compare their data using hash trees and exchange any differences.
- Conflict resolution: When two nodes have conflicting writes, the system resolves them. Last-write-wins (LWW) is the simplest strategy. Vector clocks or CRDTs are more sophisticated approaches.
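The simplest of these strategies, last-write-wins, can be sketched as a merge of two replicas where each value carries a timestamp and the newer write wins. A minimal sketch only; real systems (Cassandra, for instance) must also deal with clock skew and deleted-value tombstones, which this ignores:

```python
# Minimal last-write-wins (LWW) merge over {key: (value, timestamp)} replicas.
def lww_merge(replica_a, replica_b):
    """Merge two replicas, keeping the write with the newer timestamp."""
    merged = dict(replica_a)
    for key, (value, ts) in replica_b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

a = {"cart": (["milk"], 100), "name": ("Alice", 90)}
b = {"cart": (["milk", "eggs"], 120)}   # newer write made during the partition

print(lww_merge(a, b))
# -> {'cart': (['milk', 'eggs'], 120), 'name': ('Alice', 90)}
```

Note the trade-off LWW makes: it is trivially convergent, but any concurrent write with an older timestamp is silently discarded, which is exactly what vector clocks and CRDTs are designed to avoid.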
The key question for eventual consistency: how eventual is eventual? In practice, it's usually milliseconds to seconds. But during severe partitions, it could be minutes or hours.
Beyond CAP: PACELC
CAP only describes behavior during partitions. But what about normal operation? That's where PACELC comes in (pronounced "pass-elk").
PACELC says: If there's a Partition (P), choose between Availability (A) and Consistency (C). Else (E), when the system is running normally, choose between Latency (L) and Consistency (C).
- PA/EL — During partition: choose availability. Normally: choose low latency. Examples: Cassandra, DynamoDB. These are fast and always available, with eventual consistency.
- PC/EC — During partition: choose consistency. Normally: still choose consistency (even at the cost of latency). Examples: ZooKeeper, and traditional RDBMS like PostgreSQL.
- PA/EC — During partition: choose availability. Normally: choose consistency. Example: MongoDB (default config). This is a common practical trade-off.
PACELC gives you a more complete picture of how a system actually behaves in both good times and bad.