Introduction
Edit Distance measures the minimum number of single-character operations needed to transform one string into another. You have operations available:
Insert a character Delete a character Replace a character with another
For example, transforming "kitten" to "sitting" requires operations: replace 'k' with 's', replace 'e' with 'i', and insert 'g' at the end.
You'll find Edit Distance everywhere: spell checkers suggest corrections, DNA sequencing aligns genetic strings, and fuzzy search matches similar queries.
Intuition
I'll walk you through transforming "cat" to "cut".
Start with "cat". You compare characters from left to right. 'c' matches 'c', so no operation needed. 'a' doesn't match 'u', so you replace 'a' with 'u'. Now you have "cut". 't' matches 't'. Total cost: operation.
The key insight is optimal substructure. If you know the minimum edits for shorter prefixes, you can build the answer for longer strings. The best way to transform the first characters depends only on the best ways to transform smaller prefixes. This makes it perfect for dynamic programming.
DP State & Transition
You define as the minimum operations to transform the first characters of string into the first characters of string .
Base cases:
- (insert characters to build from empty string)
- (delete characters to reach empty string)
Transition: When comparing and :
If they match:
If they don't match, take the minimum of:
- (delete from )
- (insert into )
- (replace)
Implementation
Here's the complete pseudocode:
function editDistance(s, t):
m = length(s)
n = length(t)
dp = new array[m + 1][n + 1]
for i from 0 to m:
dp[i][0] = i
for j from 0 to n:
dp[0][j] = j
for i from 1 to m:
for j from 1 to n:
if s[i - 1] == t[j - 1]:
dp[i][j] = dp[i - 1][j - 1]
else:
dp[i][j] = 1 + min(
dp[i - 1][j],
dp[i][j - 1],
dp[i - 1][j - 1]
)
return dp[m][n]
Complexity & Optimization
Time complexity: where and are the lengths of the two strings. You fill every cell in the table exactly once.
Space complexity: for the full DP table.
Optimization: Since each row only depends on the previous row, you can reduce space to by keeping just rows and swapping them.
If you need to reconstruct the actual operations, you either keep the full table and backtrack, or store parent pointers. This brings space back to .