String Algorithms6 min read

Z-Algorithm

Z-array: match length at every position — linear pattern matching
Z-array build:Fast · \(O(n)\)pattern matching:Fast · \(O(n+m)\)space:\(O(n+m)\)

Imagine you wrote a word on a long banner. Now you walk along the banner starting at each position and ask: "how many letters from the beginning of the banner match here?" The answer at each position is the Z-value.

For example, for the string "aabxaa":

  • Position 0: skip (it's the start)
  • Position 1: 'a' matches 'a' at position 0, then 'a' vs 'b' — no. Z[1] = 1
  • Position 2: 'b' vs 'a' — no match. Z[2] = 0
  • Position 3: 'x' vs 'a' — no match. Z[3] = 0
  • Position 4: 'aa' matches prefix 'aa'. Z[4] = 2
  • Position 5: 'a' matches 'a'. Z[5] = 1

The full Z-array: [_, 1, 0, 0, 2, 1]. Simple! And it turns out you can compute this entire array in just one pass.

Why Not Just Check Each Position Naively?

The brute-force approach — try to extend a match at every position character by character — takes \(O(n^2)\) in the worst case. For a string like "aaaaaaa", every position extends all the way to the right, so you'd do \(n^2\) comparisons.

The Z-algorithm avoids this with a neat bookkeeping trick called the Z-box.

The Z-Box: Reusing What You Already Know

As you walk right through the string, you remember the rightmost interval [L, R] where you've confirmed a prefix match. This interval is the Z-box.

When you're at position i inside the Z-box (i ≤ R), you already know that S[i..R] matches S[i-L..R-L] — because that's what the Z-box guarantees. So you can initialize Z[i] = min(Z[i-L], R-i+1) for free, without any character comparisons!

Then you try to extend beyond R. The key insight: R only ever moves right. Every character you compare either extends R rightward, or ends the extension. So across the entire string, each character is compared at most a constant number of times — giving \(O(n)\) total.

Note: The Z-box is like a cheat sheet. Once you've confirmed a prefix match in some region, you don't have to re-read that region again — you look up what you already know. R only moves right, so total work stays \(O(n)\).

Z-Array + Pattern Matching

def build_z(s):
n = len(s)
z = [0] * n
L = R = 0
for i in range(1, n):
if i < R:
z[i] = min(R - i, z[i - L])
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
if i + z[i] > R:
L, R = i, i + z[i]
return z
def z_search(text, pattern):
combined = pattern + '$' + text
z = build_z(combined)
m = len(pattern)
return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] >= m]
print(build_z("aabxaa")) # [0, 1, 0, 0, 2, 1]
print(z_search("abcabcabc", "abc")) # [0, 3, 6]
Output
[0, 1, 0, 0, 2, 1]
[0, 3, 6]

Using Z for Pattern Matching

The trick is beautifully simple: concatenate pattern + '$' + text, where '$' is a character that never appears in either string. Compute the Z-array of this combined string.

Now look at positions in the text portion. If Z[i] ≥ length of pattern, it means the text starting at that position matches the pattern entirely — you found an occurrence!

The '$' separator is crucial: it ensures Z-values in the text part can never exceed m even if the text happens to start with the same characters as the pattern. Without it, the first position in the text would always show a match equal to the prefix length, which could be misleading.

Z-Algorithm vs KMP

Both solve string matching in \(O(n+m)\). KMP builds a failure function (longest proper prefix that is also a suffix). Z builds a Z-array (longest prefix match at each position). They're two different lenses on the same prefix-suffix structure of a string.

Many programmers find Z simpler to implement — one clean Z-box invariant, no tricky lps table. KMP's failure function, however, is the foundation for Aho-Corasick, making it more relevant for multi-pattern algorithms.

Complexity

⚡ Z-array constructionR only moves right — each character is compared at most twice across the whole algorithm
One clean pass\(O(n)\)
🔍 Pattern matching (total)Build combined string \(O(n+m)\), compute Z \(O(n+m)\), scan Z \(O(n+m)\)
Linear in pattern + text\(O(n+m)\)
💾 SpaceStores the concatenated string and Z-array — slightly more than KMP's \(O(m)\) extra space
Proportional to combined string\(O(n+m)\)

Quick check

For string S = "abcab", what is Z[3]?

Continue reading

KMP Algorithm
When searching for a word, never go backwards — use what you already know
Rabin-Karp
Find a word in a book by its fingerprint — check character-by-character only when fingerprints match
Aho-Corasick
Find all patterns at once — one pass, \(O(n + matches)\)