Chaining

Picture each bucket in your hash table as a coat hook rather than a single locker. You can hang multiple coats on the same hook — they just form a little chain dangling down. That's exactly what separate chaining does: each bucket is the head of a linked list, and any keys that hash to the same bucket are appended to that list.

When you look up a key, you hash to the bucket (\(O(1)\)), then walk the chain at that bucket until you find the key or reach the end. If chains are short — which they will be if the load factor is low — this is effectively \(O(1)\).

Chaining vs Open Addressing

Chaining pros:

Simple to implement
Handles high load factors gracefully — chains just get longer
Deletion is straightforward (unlink the node)
Each chain can use any data structure (sorted list, BST, even another hash table)

Open addressing pros:

Everything stays inside the array — better cache locality
No extra memory for pointers
Works better when load factor stays below ~0.5

Java's HashMap uses chaining. Python's dict uses open addressing with a custom probing scheme. Both work — the tradeoff is memory vs. cache performance.

Hash Table with Chaining

class HashTable:
    def __init__(self, size=8):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        idx = self._hash(key)
        for pair in self.buckets[idx]:
            if pair[0] == key:
                pair[1] = value; return
        self.buckets[idx].append([key, value])

    def get(self, key):
        idx = self._hash(key)
        for pair in self.buckets[idx]:
            if pair[0] == key: return pair[1]
        return None

ht = HashTable()
ht.put('name', 'Alice')
ht.put('age', 30)
ht.put('city', 'NYC')
print(ht.get('name'))  # Alice
print(ht.get('age'))   # 30
print(ht.get('zip'))   # None

Output

Alice
30
None

When to resize

With chaining, the load factor α = n / m (n keys, m buckets) directly predicts average chain length. When α = 1, on average each chain has one node — that's fine. When α = 3, chains average 3 nodes per lookup — still manageable but slowing down.

The typical resize threshold for chaining is α > 0.75 (75% full). At that point the table doubles its bucket count and every key is rehashed into the new, larger array. The operation is \(O(n)\) but amortized \(O(1)\) per insert.

After resizing, α drops to roughly 0.375 (half of 0.75), giving plenty of breathing room before the next resize.

Treeifying long chains (Java 8+)

In Java's HashMap, if a single chain grows beyond 8 nodes, it automatically converts that chain into a red-black tree. Lookup in that bucket then becomes \(O(\log n)\) instead of \(O(n)\) — a safety net against adversarial inputs that hammer one bucket.

Complexity

Lookup (average)

α = load factor; stays near \(O(1)\) when α is small

Constant + chain \(O(1 + α)\)

Insert (average)

Hash to bucket, prepend to chain

Instant \(O(1)\)

Delete (average)

Find node in chain, unlink it

Constant + chain \(O(1 + α)\)

Worst case

All keys in one chain — terrible hash or adversarial input

Linear \(O(n)\)

Space

n entries + m bucket pointers

Linear \(O(n + m)\)

Load Factor & Rehashing

class HashTable:
    def __init__(self, size=4):
        self.size = size
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key): return hash(key) % self.size

    def put(self, key, value):
        if (self.count + 1) / self.size > 0.75:
            self._rehash()
        idx = self._hash(key)
        for pair in self.buckets[idx]:
            if pair[0] == key: pair[1] = value; return
        self.buckets[idx].append([key, value])
        self.count += 1

    def _rehash(self):
        old = self.buckets
        self.size *= 2
        self.buckets = [[] for _ in range(self.size)]
        self.count = 0
        for bucket in old:
            for key, val in bucket: self.put(key, val)
        print(f"Rehashed to size {self.size}")

ht = HashTable(4)
for i in range(6): ht.put(f'k{i}', i)
print(ht.count)  # 6