Z-Algorithm

Imagine you wrote a word on a long banner. Now you walk along the banner starting at each position and ask: "how many letters from the beginning of the banner match here?" The answer at each position is the Z-value.

For example, for the string "aabxaa":

Position 0: skip (it's the start)
Position 1: 'a' matches 'a' at position 0, then 'a' vs 'b' — no. Z[1] = 1
Position 2: 'b' vs 'a' — no match. Z[2] = 0
Position 3: 'x' vs 'a' — no match. Z[3] = 0
Position 4: 'aa' matches prefix 'aa'. Z[4] = 2
Position 5: 'a' matches 'a'. Z[5] = 1

The full Z-array: [_, 1, 0, 0, 2, 1]. Simple! And it turns out you can compute this entire array in just one pass.

Why Not Just Check Each Position Naively?

The brute-force approach — try to extend a match at every position character by character — takes $O(n^2)$ in the worst case. For a string like "aaaaaaa", every position extends all the way to the right, so you'd do $n^2$ comparisons.

The Z-algorithm avoids this with a neat bookkeeping trick called the Z-box.

The Z-Box: Reusing What You Already Know

As you walk right through the string, you remember the rightmost interval [L, R] where you've confirmed a prefix match. This interval is the Z-box.

When you're at position i inside the Z-box (i ≤ R), you already know that S[i..R] matches S[i-L..R-L] — because that's what the Z-box guarantees. So you can initialize Z[i] = min(Z[i-L], R-i+1) for free, without any character comparisons!

Then you try to extend beyond R. The key insight: R only ever moves right. Every character you compare either extends R rightward, or ends the extension. So across the entire string, each character is compared at most a constant number of times — giving $O(n)$ total.

Using Z for Pattern Matching

The trick is beautifully simple: concatenate pattern + '$' + text, where '$' is a character that never appears in either string. Compute the Z-array of this combined string.

Now look at positions in the text portion. If Z[i] ≥ length of pattern, it means the text starting at that position matches the pattern entirely — you found an occurrence!

The '$' separator is crucial: it ensures Z-values in the text part can never exceed m even if the text happens to start with the same characters as the pattern. Without it, the first position in the text would always show a match equal to the prefix length, which could be misleading.

Z-Algorithm vs KMP

Both solve string matching in $O(n+m)$. KMP builds a failure function (longest proper prefix that is also a suffix). Z builds a Z-array (longest prefix match at each position). They're two different lenses on the same prefix-suffix structure of a string.

Many programmers find Z simpler to implement — one clean Z-box invariant, no tricky lps table. KMP's failure function, however, is the foundation for Aho-Corasick, making it more relevant for multi-pattern algorithms.

Z-Algorithm

Why Not Just Check Each Position Naively?

The Z-Box: Reusing What You Already Know

Z-Array + Pattern Matching

Using Z for Pattern Matching

Z-Algorithm vs KMP

Complexity

Quick check