Dynamic Programming7 min read

Longest Common Subsequence

Find the longest story two sequences share — skipping is allowed
time:\(O(m·n)\)space:\(O(m·n)\) or \(O(min(m,n)\))

Imagine two friends kept diaries over the summer. Their entries don't match word-for-word, but there's a common thread — shared moments they both wrote about, in the same order, even if other entries appear in between. Finding that longest shared storyline is exactly the Longest Common Subsequence (LCS) problem.

A subsequence keeps relative order but can skip elements. So 'ACE' is a subsequence of 'ABCDE' — you skip B and D. A substring must be contiguous — 'BCD' is a substring, 'ACE' is not. LCS finds the longest subsequence that appears in both strings. It is closely related to Edit Distance, which measures the minimum operations to transform one string into another.

The 2D DP Table

Define dp[i][j] = length of the LCS of the first i characters of s and the first j characters of t. The table is (m+1) × (n+1) — one extra row and column for the "empty prefix" base cases: dp[0][j] = 0 and dp[i][0] = 0 (LCS with an empty string is always 0).

The Recurrence

For each cell (i, j):

If s[i-1] == t[j-1] (characters match): dp[i][j] = dp[i-1][j-1] + 1. We extend the LCS of the two shorter prefixes by this matching character.

If they don't match: dp[i][j] = max(dp[i-1][j], dp[i][j-1]). We take the best LCS we can get by either ignoring the last character of s or the last character of t.

Final answer: dp[m][n].

Tracing Back the Actual Sequence

The table gives the length. To find the actual LCS, trace back from dp[m][n]: if s[i-1] == t[j-1], this character is in the LCS — record it and move diagonally to (i-1, j-1). Otherwise, move toward the larger neighbor (up or left). Collect characters during diagonal moves (reverse the order at the end).

Space Optimization

dp[i][j] only depends on dp[i-1][j], dp[i-1][j-1], and dp[i][j-1] — just the previous row and one extra variable. You can reduce to \(O(min(m,n)\)) space by keeping only two rows. The tradeoff: you lose the ability to trace back the actual sequence.

Note: LCS is the engine behind the Unix 'diff' command — but on lines of text, not individual characters. Two files share many lines; diff highlights what changed between them.

LCS — length and reconstruction

def lcs(s, t):
m, n = len(s), len(t)
dp = [[0]*(n+1) for _ in range(m+1)]
for i in range(1, m+1):
for j in range(1, n+1):
if s[i-1] == t[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
# reconstruct
res, i, j = [], m, n
while i > 0 and j > 0:
if s[i-1] == t[j-1]:
res.append(s[i-1]); i -= 1; j -= 1
elif dp[i-1][j] > dp[i][j-1]:
i -= 1
else:
j -= 1
return dp[m][n], ''.join(reversed(res))
print(lcs('ABCBDAB', 'BDCAB')) # (4, 'BCAB') or similar
Output
(4, 'BCAB')

Complexity

⏱️ Timem = length of s, n = length of t; each cell takes \(O(1)\)
Fill m×n table\(O(m·n)\)
📊 Space (full table)Full table needed to trace back the actual LCS string
For reconstruction\(O(m·n)\)
🪄 Space (length only)Process the shorter string as columns; keep only two rows at a time
Two rows\(O(min(m,n)\))
🔍 ReconstructionAt most m+n steps moving up or left through the table
Trace-back\(O(m+n)\)
🐌 Brute forceGenerate all 2^m subsequences of s, check each against t in \(O(n)\)
All subsequences\(O(2^m · n)\)

Quick check

The LCS of 'ABCBDAB' and 'BDCAB' has length 4. Which of these is a valid LCS?

Continue reading

Edit Distance
How many typo-fixes does it take to turn one word into another?
Longest Increasing Subsequence
Find the longest staircase hidden inside a jumbled sequence