Rabin-Karp
Imagine you're searching for a friend's name in a 500-page book. Checking every single letter of every page would take forever. But what if you could create a quick fingerprint of your friend's name — a tiny summary number — and compare fingerprints first? Only when a fingerprint matches do you bother reading the actual letters.
That's exactly what Rabin-Karp does. It computes a fingerprint (called a hash) for the pattern, then slides a window across the text and compares fingerprints. Most windows fail the fingerprint test instantly. Only rare matches require a full character comparison.
The Rolling Hash Trick
Here's the clever part: when you slide the window one step to the right, you don't recompute the whole fingerprint from scratch. Instead, you roll it — subtract the leftmost character's contribution, shift everything left, and add the new rightmost character. This makes each slide cost \(O(1)\) instead of \(O(m)\). That's the rolling hash.
Mathematically, for a window of length m starting at position i:hash = (s[i]·b^(m-1) + s[i+1]·b^(m-2) + ... + s[i+m-1]) mod p
Sliding one step: new_hash = (old_hash - s[i]·b^(m-1)) × b + s[i+m]) mod p
Spurious Hits: When Fingerprints Lie
Sometimes two different strings have the same fingerprint — a hash collision. When this happens, Rabin-Karp does a full character-by-character comparison to confirm. If they don't actually match, it's called a spurious hit. With a good hash and a large prime modulus, spurious hits are extremely rare and the algorithm stays near \(O(n+m)\) in practice.
Searching for Multiple Patterns at Once
This is where Rabin-Karp really shines. If you want to search for k different patterns simultaneously, compute all k hashes and store them in a hash set. Each window only needs one \(O(1)\) lookup against the set. Total time: \(O(n + Σm_i)\) — linear in the combined input size. KMP would need k separate passes.
Rabin-Karp Pattern Search
When Should You Use Rabin-Karp?
- Searching for multiple patterns at once — this is its superpower
- When you want a simple implementation with good average-case behavior
- 2D pattern matching (extend hashing to grids)
For a single pattern, KMP or Boyer-Moore may be faster in the worst case. But when you have dozens or hundreds of patterns, Rabin-Karp's hash-set trick makes it hard to beat.