Z-Algorithm
Imagine you wrote a word on a long banner. Now you walk along the banner starting at each position and ask: "how many letters from the beginning of the banner match here?" The answer at each position is the Z-value.
For example, for the string "aabxaa":
- Position 0: skip (it's the start)
- Position 1:
'a'matches'a'at position 0, then'a'vs'b'— no. Z[1] = 1 - Position 2:
'b'vs'a'— no match. Z[2] = 0 - Position 3:
'x'vs'a'— no match. Z[3] = 0 - Position 4:
'aa'matches prefix'aa'. Z[4] = 2 - Position 5:
'a'matches'a'. Z[5] = 1
The full Z-array: [_, 1, 0, 0, 2, 1]. Simple! And it turns out you can compute this entire array in just one pass.
Why Not Just Check Each Position Naively?
The brute-force approach — try to extend a match at every position character by character — takes \(O(n^2)\) in the worst case. For a string like "aaaaaaa", every position extends all the way to the right, so you'd do \(n^2\) comparisons.
The Z-algorithm avoids this with a neat bookkeeping trick called the Z-box.
The Z-Box: Reusing What You Already Know
As you walk right through the string, you remember the rightmost interval [L, R] where you've confirmed a prefix match. This interval is the Z-box.
When you're at position i inside the Z-box (i ≤ R), you already know that S[i..R] matches S[i-L..R-L] — because that's what the Z-box guarantees. So you can initialize Z[i] = min(Z[i-L], R-i+1) for free, without any character comparisons!
Then you try to extend beyond R. The key insight: R only ever moves right. Every character you compare either extends R rightward, or ends the extension. So across the entire string, each character is compared at most a constant number of times — giving \(O(n)\) total.
Z-Array + Pattern Matching
Using Z for Pattern Matching
The trick is beautifully simple: concatenate pattern + '$' + text, where '$' is a character that never appears in either string. Compute the Z-array of this combined string.
Now look at positions in the text portion. If Z[i] ≥ length of pattern, it means the text starting at that position matches the pattern entirely — you found an occurrence!
The '$' separator is crucial: it ensures Z-values in the text part can never exceed m even if the text happens to start with the same characters as the pattern. Without it, the first position in the text would always show a match equal to the prefix length, which could be misleading.
Z-Algorithm vs KMP
Both solve string matching in \(O(n+m)\). KMP builds a failure function (longest proper prefix that is also a suffix). Z builds a Z-array (longest prefix match at each position). They're two different lenses on the same prefix-suffix structure of a string.
Many programmers find Z simpler to implement — one clean Z-box invariant, no tricky lps table. KMP's failure function, however, is the foundation for Aho-Corasick, making it more relevant for multi-pattern algorithms.