Dynamic Programming7 min read

Edit Distance

How many typo-fixes does it take to turn one word into another?
time:\(O(mΒ·n)\)space:\(O(mΒ·n)\) or \(O(min(m,n)\))

You type "kitten" and autocorrect suggests "sitting". How does it know they're close? It counts the minimum number of single-character changes needed to turn one into the other. Insert a letter, delete a letter, or replace one letter with another β€” each costs 1. That minimum count is the Edit Distance (also called Levenshtein distance). It is closely related to the Longest Common Subsequence problem.

kitten β†’ sitten (replace k with s) β†’ sittin (replace e with i) β†’ sitting (insert g). Three operations. Edit distance = 3.

The Three Operations

Insert: Add a character anywhere. "cat" β†’ "coat" = 1 insertion.

Delete: Remove a character. "cats" β†’ "cat" = 1 deletion.

Replace: Swap one character for another. "cat" β†’ "bat" = 1 replacement.

The 2D DP Table

Define dp[i][j] = minimum edits to transform the first i characters of s into the first j characters of t. Table is (m+1) Γ— (n+1).

Base cases: dp[i][0] = i (delete all i characters of s) and dp[0][j] = j (insert j characters to build t from nothing).

The Recurrence

If s[i-1] == t[j-1]: dp[i][j] = dp[i-1][j-1]. Characters match β€” no operation needed, just inherit from the diagonal.

If s[i-1] != t[j-1]: dp[i][j] = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1]). One operation, plus the minimum of: replacing (diagonal), deleting s[i-1] (up), or inserting t[j-1] (left).

Space Optimization

dp[i][j] only depends on the cell diagonally up-left, directly above, and directly left β€” so one row of storage plus one extra variable is enough. Space drops from \(O(mΒ·n)\) to \(O(min(m,n)\)). You lose reconstruction ability but keep the distance value.

Note: Spell checkers suggest corrections with edit distance 1 or 2 β€” words that are 'one typo away' or 'two typos away'. DNA alignment tools use the same idea to measure how genetically similar two sequences are.

Edit Distance β€” classic DP

def edit_distance(s, t):
m, n = len(s), len(t)
dp = [[0]*(n+1) for _ in range(m+1)]
for i in range(m+1): dp[i][0] = i
for j in range(n+1): dp[0][j] = j
for i in range(1, m+1):
for j in range(1, n+1):
if s[i-1] == t[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])
return dp[m][n]
print(edit_distance('kitten', 'sitting')) # 3
print(edit_distance('horse', 'ros')) # 3
Output
3
3

Complexity

⏱️ Time
m = length of s, n = length of t; each cell is \(O(1)\)
Fill mΓ—n table \(O(mΒ·n)\)
πŸ“Š Space (full table)
Full table needed to trace back the exact sequence of edits
For reconstruction \(O(mΒ·n)\)
πŸͺ„ Space (distance only)
Keep only the previous row β€” loses reconstruction ability
One row + one variable \(O(min(m,n)\))
🐌 Brute force
DP is exponentially faster than exhaustively trying all insert/delete/replace sequences
All operation sequences \(O(3^(m+n)\))

Quick check

What is the edit distance between 'kitten' and 'sitting'?

Continue reading