Consistent Hashing
The Problem with Simple Hashing
Imagine you have 4 servers and you want to distribute data across them. The simple approach: server = hash(key) % 4. Key "alice" hashes to 2, so it goes to Server 2. Easy.
But what happens when you add a 5th server? Now it's hash(key) % 5. Suddenly, most keys map to a different server (going from 4 to 5 servers, about 80% of them). "alice" might now map to Server 3 instead of Server 2. You'd have to move most of your data: a massive reshuffling.
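You can measure this churn directly. A small sketch (key names and the MD5-based hash are illustrative choices, not part of any particular system):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Map a key to one of n servers with simple modular hashing."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key-{i}" for i in range(100_000)]

# Compare assignments with 4 servers vs. 5 servers.
moved = sum(1 for k in keys if bucket(k, 4) != bucket(k, 5))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 80% move
```

A key keeps its server only when hash % 4 and hash % 5 agree, which happens for just 4 of every 20 hash values, so about 80% of keys relocate.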
In real systems, this reshuffling is catastrophic. Imagine a cache cluster with 100 servers and billions of keys. Adding one server would invalidate nearly all cached data, causing a cache stampede that could bring down your database.
Consistent hashing solves this elegantly.
The Hash Ring
Instead of a line of numbers, imagine a circle (a ring) of positions from 0 to 2^32 − 1. Both servers and keys are placed on this ring using a hash function.
Here's how it works:
- Place servers on the ring. Hash each server's name/IP to get its position. Server A lands at position 10,000, Server B at 40,000, Server C at 80,000.
- Place keys on the ring. Hash each key to get its position. Key "alice" lands at position 25,000.
- Find the owner. Walk clockwise from the key's position until you hit a server. "alice" at 25,000 walks clockwise and hits Server B at 40,000. So Server B owns "alice."
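The clockwise walk in the steps above is just a binary search over sorted positions. A sketch using the example positions from the text (in practice both servers and keys get positions from the hash function):

```python
import bisect

# Illustrative fixed positions from the text.
ring = {10_000: "Server A", 40_000: "Server B", 80_000: "Server C"}
positions = sorted(ring)

def owner(key_pos: int) -> str:
    """Walk clockwise: first server at or after the key's position,
    wrapping to the smallest position if none is larger."""
    i = bisect.bisect_left(positions, key_pos)
    return ring[positions[i % len(positions)]]

print(owner(25_000))  # "alice" at 25,000 -> Server B
```

The modulo handles the wrap-around: a key past the last server (say at 90,000) circles back to Server A at 10,000.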
Now the magic: when you add or remove a server, only the keys between the new server and the previous server need to move. Instead of reshuffling everything, only a small fraction of keys are affected.
Consistent Hashing Implementation
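A minimal sketch of a ring in Python (server names and the MD5-derived 32-bit hash are illustrative assumptions; production rings also add virtual nodes, covered next):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring (no virtual nodes)."""

    def __init__(self):
        self._positions = []   # sorted hash positions
        self._owners = {}      # position -> server name

    @staticmethod
    def _hash(s: str) -> int:
        # 32-bit position on the ring (first 4 bytes of MD5)
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

    def add(self, server: str) -> None:
        pos = self._hash(server)
        bisect.insort(self._positions, pos)
        self._owners[pos] = server

    def remove(self, server: str) -> None:
        pos = self._hash(server)
        self._positions.remove(pos)
        del self._owners[pos]

    def get(self, key: str) -> str:
        """First server clockwise from the key's position."""
        pos = self._hash(key)
        i = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._owners[self._positions[i]]

ring = HashRing()
for name in ("cache-a", "cache-b", "cache-c"):
    ring.add(name)
print(ring.get("alice"))  # deterministic: always the same server for "alice"
```

Lookups are O(log n) via binary search, and removing a server that doesn't own a key never moves that key — only the removed server's own arc is reassigned.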
Virtual Nodes
There's a problem with basic consistent hashing: servers might be placed unevenly on the ring. If Server A and Server B land next to each other, Server A might own a tiny arc while Server C owns a huge arc. The load distribution is unbalanced.
Virtual nodes fix this. Instead of placing each server once on the ring, you place it many times (100-200 copies). Each copy is called a virtual node. Server A gets positions for "A:vnode0", "A:vnode1", ..., "A:vnode149".
With 150 virtual nodes per server, the positions are much more evenly spread around the ring. The load distribution becomes nearly uniform — typically within 5-10% variance between servers.
Virtual nodes also make it easy to handle heterogeneous servers. A powerful server can get 300 virtual nodes while a smaller one gets 100, naturally receiving proportionally more traffic.
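The smoothing effect is easy to demonstrate. A sketch comparing 1 vs. 150 virtual nodes per server (server/key names and the hash are illustrative):

```python
import bisect
import hashlib
from collections import Counter

def ring_pos(s: str) -> int:
    """32-bit position on the ring."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def build_ring(servers, vnodes):
    """Place each server on the ring `vnodes` times."""
    ring = {}
    for s in servers:
        for v in range(vnodes):
            ring[ring_pos(f"{s}:vnode{v}")] = s
    return sorted(ring), ring

def owner(key, positions, ring):
    """First server clockwise from the key's position."""
    i = bisect.bisect_left(positions, ring_pos(key)) % len(positions)
    return ring[positions[i]]

servers = [f"server-{i}" for i in range(4)]
keys = [f"key-{i}" for i in range(100_000)]

for vnodes in (1, 150):
    positions, ring = build_ring(servers, vnodes)
    counts = Counter(owner(k, positions, ring) for k in keys)
    ratio = max(counts.values()) / min(counts.values())
    print(f"{vnodes:>3} vnode(s): busiest/quietest server ratio = {ratio:.2f}")
```

With a single position per server the arcs are wildly uneven; with 150 virtual nodes the busiest and quietest servers end up close to each other. Weighting is the same idea: give a bigger server more vnodes.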
What Happens When Servers Change?
Adding a server: The new server takes over some keys from its clockwise neighbor. Only those keys need to move. Going from N to N+1 servers moves only ~1/(N+1) of the total keys. Compare this to modular hashing where nearly ALL keys move.
Removing a server: The removed server's keys are taken over by its clockwise neighbor. Again, only ~1/N keys are affected.
Let's see the math with 100 servers and 1 million keys:
- Modular hashing: Add one server → ~990,000 keys move (99%!)
- Consistent hashing: Add one server → ~10,000 keys move (1%)
That's a 99x improvement. In a cache cluster, this means 99% of your cache stays valid instead of being invalidated. Your database barely notices.
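This claim is checkable with the ring itself. A sketch that builds a 100-server ring (with virtual nodes), adds a 101st server, and counts how many of 1 million keys — scaled down to 100,000 here for speed — change owner (names and hash are illustrative):

```python
import bisect
import hashlib

def ring_pos(s: str) -> int:
    """32-bit position on the ring."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def build_ring(servers, vnodes=100):
    """Place each server on the ring `vnodes` times."""
    ring = {}
    for s in servers:
        for v in range(vnodes):
            ring[ring_pos(f"{s}:vnode{v}")] = s
    return sorted(ring), ring

def owner(key, positions, ring):
    """First server clockwise from the key's position."""
    i = bisect.bisect_left(positions, ring_pos(key)) % len(positions)
    return ring[positions[i]]

servers = [f"server-{i}" for i in range(100)]
keys = [f"key-{i}" for i in range(100_000)]

before = build_ring(servers)
after = build_ring(servers + ["server-100"])

moved = sum(1 for k in keys if owner(k, *before) != owner(k, *after))
print(f"{moved / len(keys):.2%} of keys moved")  # close to 1/101, i.e. ~1%
```

Only keys landing in the new server's arcs change owner, so the moved fraction hovers around 1%, versus the ~99% a modulus change would cause.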
Real-World Use Cases
Consistent hashing is everywhere in distributed systems:
Amazon DynamoDB — Uses consistent hashing to distribute data across storage nodes. When a node is added or removed, only neighboring data partitions are affected.
Apache Cassandra — Each node is assigned a position on the hash ring. Data is automatically distributed based on the partition key's hash. Adding a node triggers minimal data migration.
Akamai CDN — Actually one of the earliest users! Uses consistent hashing to decide which edge server caches which content. This was described in the original 1997 paper.
Discord — Uses consistent hashing to route users to specific server processes. When they add capacity, only a fraction of users are rebalanced.
Load Balancers — Some load balancers use consistent hashing for session affinity. The same key (user ID, session ID) always maps to the same backend server, without the reshuffling problems of simple hash-mod.