Hash Functions
Imagine a locker room with 100 numbered lockers. Instead of searching every locker for your stuff, you have a magic formula that tells you exactly which locker number to use based on your name. That magic formula is a hash function.
A hash function takes any key β a string, a number, an object β and converts it to an integer index into an array. The goal: spread keys evenly across all available buckets so lookups stay close to \(O(1)\).
What makes a good hash function?
- Deterministic β same input always produces the same output
- Uniform distribution β keys spread evenly, no hot buckets
- Fast to compute β otherwise you lose the speed benefit
- Avalanche effect β small input change, totally different output
A naive hash for strings might sum the ASCII values and take modulo array length: hash = (sum of char codes) % capacity. Simple, but it clusters anagrams in the same bucket. Better functions like djb2 or FNV-1a mix bits more aggressively.
β β arrow keys to step Β· click elements to interact
Polynomial Hash & Collisions
Collisions β when two keys land in the same bucket
No matter how good your hash function is, with enough keys you will get collisions β two keys mapping to the same bucket. This is unavoidable (the pigeonhole principle: if you have more keys than buckets, at least one bucket holds two keys).
There are two classic ways to handle collisions:
- Chaining β each bucket holds a linked list; colliding keys are appended to the list
- Open Addressing β when a bucket is occupied, probe for the next empty slot (linear probing, quadratic probing, double hashing)
Load Factor
The load factor Ξ± = n / capacity (where n = number of stored keys) measures how full the table is. When Ξ± gets too high:
- Chains get longer (chaining)
- Clusters form (open addressing)
- Average lookup degrades toward \(O(n)\)
Most implementations resize (rehash into a bigger array) when Ξ± exceeds 0.75 for chaining or 0.5 for open addressing. Rehashing copies every entry β \(O(n)\) one-time cost β but keeps future operations \(O(1)\).
Complexity
Word Frequency Counter
Quick check
Continue reading