Hash Tables5 min read

Chaining

Collisions? Just grow a list at that bucket
lookup avg: \(O(1 + α)\)insert avg: \(O(1)\)delete avg: \(O(1 + α)\)space: \(O(n + m)\)

Picture each bucket in your hash table as a coat hook rather than a single locker. You can hang multiple coats on the same hook — they just form a little chain dangling down. That's exactly what separate chaining does: each bucket is the head of a linked list, and any keys that hash to the same bucket are appended to that list.

When you look up a key, you hash to the bucket (\(O(1)\)), then walk the chain at that bucket until you find the key or reach the end. If chains are short — which they will be if the load factor is low — this is effectively \(O(1)\).

Chaining vs Open Addressing

Chaining pros:

  • Simple to implement
  • Handles high load factors gracefully — chains just get longer
  • Deletion is straightforward (unlink the node)
  • Each chain can use any data structure (sorted list, BST, even another hash table)

Open addressing pros:

  • Everything stays inside the array — better cache locality
  • No extra memory for pointers
  • Works better when load factor stays below ~0.5

Java's HashMap uses chaining. Python's dict uses open addressing with a custom probing scheme. Both work — the tradeoff is memory vs. cache performance.

Collision resolution — chaining appends to a list, open addressing probes forward
Explore hash chaining

← → arrow keys to step · click elements to interact

Hash Table with Chaining

class HashTable:
def __init__(self, size=8):
self.size = size
self.buckets = [[] for _ in range(size)]
def _hash(self, key):
return hash(key) % self.size
def put(self, key, value):
idx = self._hash(key)
for pair in self.buckets[idx]:
if pair[0] == key:
pair[1] = value; return
self.buckets[idx].append([key, value])
def get(self, key):
idx = self._hash(key)
for pair in self.buckets[idx]:
if pair[0] == key: return pair[1]
return None
ht = HashTable()
ht.put('name', 'Alice')
ht.put('age', 30)
ht.put('city', 'NYC')
print(ht.get('name')) # Alice
print(ht.get('age')) # 30
print(ht.get('zip')) # None
Output
Alice
30
None

When to resize

With chaining, the load factor α = n / m (n keys, m buckets) directly predicts average chain length. When α = 1, on average each chain has one node — that's fine. When α = 3, chains average 3 nodes per lookup — still manageable but slowing down.

The typical resize threshold for chaining is α > 0.75 (75% full). At that point the table doubles its bucket count and every key is rehashed into the new, larger array. The operation is \(O(n)\) but amortized \(O(1)\) per insert.

After resizing, α drops to roughly 0.375 (half of 0.75), giving plenty of breathing room before the next resize.

Treeifying long chains (Java 8+)

In Java's HashMap, if a single chain grows beyond 8 nodes, it automatically converts that chain into a red-black tree. Lookup in that bucket then becomes \(O(\log n)\) instead of \(O(n)\) — a safety net against adversarial inputs that hammer one bucket.

Note: The expected number of keys per bucket is exactly the load factor α. Keep α below 1.0 for chaining and you are statistically guaranteed \(O(1)\) average performance with a good hash function.

Complexity

Lookup (average)α = load factor; stays near \(O(1)\) when α is small
Constant + chain\(O(1 + α)\)
Insert (average)Hash to bucket, prepend to chain
Instant\(O(1)\)
Delete (average)Find node in chain, unlink it
Constant + chain\(O(1 + α)\)
Worst caseAll keys in one chain — terrible hash or adversarial input
Linear\(O(n)\)
Spacen entries + m bucket pointers
Linear\(O(n + m)\)

Load Factor & Rehashing

class HashTable:
def __init__(self, size=4):
self.size = size
self.count = 0
self.buckets = [[] for _ in range(size)]
def _hash(self, key): return hash(key) % self.size
def put(self, key, value):
if (self.count + 1) / self.size > 0.75:
self._rehash()
idx = self._hash(key)
for pair in self.buckets[idx]:
if pair[0] == key: pair[1] = value; return
self.buckets[idx].append([key, value])
self.count += 1
def _rehash(self):
old = self.buckets
self.size *= 2
self.buckets = [[] for _ in range(self.size)]
self.count = 0
for bucket in old:
for key, val in bucket: self.put(key, val)
print(f"Rehashed to size {self.size}")
ht = HashTable(4)
for i in range(6): ht.put(f'k{i}', i)
print(ht.count) # 6
Output
Rehashed to size 8
6
Challenge

Quick check

In a chaining hash table with 4 buckets and 8 keys, what is the load factor?

Continue reading

Hash Functions
Turn any key into a bucket number — instantly
Sets & Maps
Instant membership and key-value lookup — the everyday hash table
Consistent HashingSystem Design
Add or remove servers without reshuffling everything