String Algorithms7 min read

Rabin-Karp

Find a word in a book by its fingerprint — check character-by-character only when fingerprints match
average case:Fast · \(O(n+m)\)worst case:Slow · \(O(nm)\)space:Tiny · \(O(1)\)

Imagine you're searching for a friend's name in a 500-page book. Checking every single letter of every page would take forever. But what if you could create a quick fingerprint of your friend's name — a tiny summary number — and compare fingerprints first? Only when a fingerprint matches do you bother reading the actual letters.

That's exactly what Rabin-Karp does. It computes a fingerprint (called a hash) for the pattern, then slides a window across the text and compares fingerprints. Most windows fail the fingerprint test instantly. Only rare matches require a full character comparison.

The Rolling Hash Trick

Here's the clever part: when you slide the window one step to the right, you don't recompute the whole fingerprint from scratch. Instead, you roll it — subtract the leftmost character's contribution, shift everything left, and add the new rightmost character. This makes each slide cost \(O(1)\) instead of \(O(m)\). That's the rolling hash.

Mathematically, for a window of length m starting at position i:
hash = (s[i]·b^(m-1) + s[i+1]·b^(m-2) + ... + s[i+m-1]) mod p
Sliding one step: new_hash = (old_hash - s[i]·b^(m-1)) × b + s[i+m]) mod p

Spurious Hits: When Fingerprints Lie

Sometimes two different strings have the same fingerprint — a hash collision. When this happens, Rabin-Karp does a full character-by-character comparison to confirm. If they don't actually match, it's called a spurious hit. With a good hash and a large prime modulus, spurious hits are extremely rare and the algorithm stays near \(O(n+m)\) in practice.

Searching for Multiple Patterns at Once

This is where Rabin-Karp really shines. If you want to search for k different patterns simultaneously, compute all k hashes and store them in a hash set. Each window only needs one \(O(1)\) lookup against the set. Total time: \(O(n + Σm_i)\) — linear in the combined input size. KMP would need k separate passes.

Note: The rolling hash is like a conveyor belt scale at a grocery store — you don't reweigh everything from scratch when one item moves off and another rolls on. You just subtract one weight and add another.

Rabin-Karp Pattern Search

def rabin_karp(text, pattern):
n, m = len(text), len(pattern)
if m > n:
return []
BASE, MOD = 31, 10**9 + 7
# Precompute base^(m-1) mod MOD
power = pow(BASE, m - 1, MOD)
def char_val(c):
return ord(c) - ord('a') + 1
# Compute initial hashes
ph = th = 0
for i in range(m):
ph = (ph * BASE + char_val(pattern[i])) % MOD
th = (th * BASE + char_val(text[i])) % MOD
results = []
for i in range(n - m + 1):
if th == ph and text[i:i+m] == pattern: # check spurious hits
results.append(i)
if i < n - m:
th = (th - char_val(text[i]) * power) % MOD
th = (th * BASE + char_val(text[i + m])) % MOD
return results
print(rabin_karp("abcabcabc", "abc")) # [0, 3, 6]
print(rabin_karp("hello world", "world")) # [6]
Output
[0, 3, 6]
[6]

When Should You Use Rabin-Karp?

  • Searching for multiple patterns at once — this is its superpower
  • When you want a simple implementation with good average-case behavior
  • 2D pattern matching (extend hashing to grids)

For a single pattern, KMP or Boyer-Moore may be faster in the worst case. But when you have dozens or hundreds of patterns, Rabin-Karp's hash-set trick makes it hard to beat.

Complexity

🚀 Average / expected case\(O(m)\) to hash pattern, \(O(n)\) to slide window — spurious hits are negligible with a good hash
Lightning fast\(O(n+m)\)
💀 Worst case (bad hash)If every window is a spurious hit, each triggers \(O(m)\) verification — extremely rare with large prime
Slow — every window collides\(O(nm)\)
💾 SpaceOnly stores a few hash values and one precomputed power — no \(O(m)\) array like KMP
Barely anything\(O(1)\)
🎯 Multiple patterns (k patterns)Hash all patterns into a set; each window does one \(O(1)\) set lookup — no extra passes
Still one pass\(O(n + Σm_i)\)

Quick check

You're sliding a window of size 5 across a text. How does a rolling hash update when the window moves one step right?

Continue reading

KMP Algorithm
When searching for a word, never go backwards — use what you already know
Z-Algorithm
Z-array: match length at every position — linear pattern matching
Aho-Corasick
Find all patterns at once — one pass, \(O(n + matches)\)