Hash Functions
Imagine a locker room with 100 numbered lockers. Instead of searching every locker for your stuff, you have a magic formula that tells you exactly which locker number to use based on your name. That magic formula is a hash function.
A hash function takes any key — a string, a number, an object — and converts it to an integer index into an array. The goal: spread keys evenly across all available buckets so lookups stay close to \(O(1)\).
What makes a good hash function?
- Deterministic — same input always produces the same output
- Uniform distribution — keys spread evenly, no hot buckets
- Fast to compute — otherwise you lose the speed benefit
- Avalanche effect — small input change, totally different output
A naive hash for strings might sum the ASCII values and take modulo array length: hash = (sum of char codes) % capacity. Simple, but it clusters anagrams in the same bucket. Better functions like djb2 or FNV-1a mix bits more aggressively.
← → arrow keys to step · click elements to interact
Polynomial Hash & Collisions
Collisions — when two keys land in the same bucket
No matter how good your hash function is, with enough keys you will get collisions — two keys mapping to the same bucket. This is unavoidable (the pigeonhole principle: if you have more keys than buckets, at least one bucket holds two keys).
There are two classic ways to handle collisions:
- Chaining — each bucket holds a linked list; colliding keys are appended to the list
- Open Addressing — when a bucket is occupied, probe for the next empty slot (linear probing, quadratic probing, double hashing)
Load Factor
The load factor α = n / capacity (where n = number of stored keys) measures how full the table is. When α gets too high:
- Chains get longer (chaining)
- Clusters form (open addressing)
- Average lookup degrades toward \(O(n)\)
Most implementations resize (rehash into a bigger array) when α exceeds 0.75 for chaining or 0.5 for open addressing. Rehashing copies every entry — \(O(n)\) one-time cost — but keeps future operations \(O(1)\).
Complexity
Word Frequency Counter
Quick check
Continue reading