Hash Tables5 min read

Chaining

Collisions? Just grow a list at that bucket
lookup avg: \(O(1 + Ξ±)\)insert avg: \(O(1)\)delete avg: \(O(1 + Ξ±)\)space: \(O(n + m)\)

Picture each bucket in your hash table as a coat hook rather than a single locker. You can hang multiple coats on the same hook β€” they just form a little chain dangling down. That's exactly what separate chaining does: each bucket is the head of a linked list, and any keys that hash to the same bucket are appended to that list.

When you look up a key, you hash to the bucket (\(O(1)\)), then walk the chain at that bucket until you find the key or reach the end. If chains are short β€” which they will be if the load factor is low β€” this is effectively \(O(1)\).

Chaining vs Open Addressing

Chaining pros:

  • Simple to implement
  • Handles high load factors gracefully β€” chains just get longer
  • Deletion is straightforward (unlink the node)
  • Each chain can use any data structure (sorted list, BST, even another hash table)

Open addressing pros:

  • Everything stays inside the array β€” better cache locality
  • No extra memory for pointers
  • Works better when load factor stays below ~0.5

Java's HashMap uses chaining. Python's dict uses open addressing with a custom probing scheme. Both work β€” the tradeoff is memory vs. cache performance.

Click chart to zoom
Collision resolution β€” chaining appends to a list, open addressing probes forward
β—ˆ Explore hash chaining

← β†’ arrow keys to step Β· click elements to interact

Hash Table with Chaining

class HashTable:
def __init__(self, size=8):
self.size = size
self.buckets = [[] for _ in range(size)]
def _hash(self, key):
return hash(key) % self.size
def put(self, key, value):
idx = self._hash(key)
for pair in self.buckets[idx]:
if pair[0] == key:
pair[1] = value; return
self.buckets[idx].append([key, value])
def get(self, key):
idx = self._hash(key)
for pair in self.buckets[idx]:
if pair[0] == key: return pair[1]
return None
ht = HashTable()
ht.put('name', 'Alice')
ht.put('age', 30)
ht.put('city', 'NYC')
print(ht.get('name')) # Alice
print(ht.get('age')) # 30
print(ht.get('zip')) # None
Output
Alice
30
None

When to resize

With chaining, the load factor Ξ± = n / m (n keys, m buckets) directly predicts average chain length. When Ξ± = 1, on average each chain has one node β€” that's fine. When Ξ± = 3, chains average 3 nodes per lookup β€” still manageable but slowing down.

The typical resize threshold for chaining is Ξ± > 0.75 (75% full). At that point the table doubles its bucket count and every key is rehashed into the new, larger array. The operation is \(O(n)\) but amortized \(O(1)\) per insert.

After resizing, Ξ± drops to roughly 0.375 (half of 0.75), giving plenty of breathing room before the next resize.

Treeifying long chains (Java 8+)

In Java's HashMap, if a single chain grows beyond 8 nodes, it automatically converts that chain into a red-black tree. Lookup in that bucket then becomes \(O(\log n)\) instead of \(O(n)\) β€” a safety net against adversarial inputs that hammer one bucket.

Note: The expected number of keys per bucket is exactly the load factor Ξ±. Keep Ξ± below 1.0 for chaining and you are statistically guaranteed \(O(1)\) average performance with a good hash function.

Complexity

Lookup (average)
Ξ± = load factor; stays near \(O(1)\) when Ξ± is small
Constant + chain \(O(1 + Ξ±)\)
Insert (average)
Hash to bucket, prepend to chain
Instant \(O(1)\)
Delete (average)
Find node in chain, unlink it
Constant + chain \(O(1 + Ξ±)\)
Worst case
All keys in one chain β€” terrible hash or adversarial input
Linear \(O(n)\)
Space
n entries + m bucket pointers
Linear \(O(n + m)\)

Load Factor & Rehashing

class HashTable:
def __init__(self, size=4):
self.size = size
self.count = 0
self.buckets = [[] for _ in range(size)]
def _hash(self, key): return hash(key) % self.size
def put(self, key, value):
if (self.count + 1) / self.size > 0.75:
self._rehash()
idx = self._hash(key)
for pair in self.buckets[idx]:
if pair[0] == key: pair[1] = value; return
self.buckets[idx].append([key, value])
self.count += 1
def _rehash(self):
old = self.buckets
self.size *= 2
self.buckets = [[] for _ in range(self.size)]
self.count = 0
for bucket in old:
for key, val in bucket: self.put(key, val)
print(f"Rehashed to size {self.size}")
ht = HashTable(4)
for i in range(6): ht.put(f'k{i}', i)
print(ht.count) # 6
Output
Rehashed to size 8
6
Challenge

Quick check

In a chaining hash table with 4 buckets and 8 keys, what is the load factor?

Continue reading