Introduction
The Knuth-Morris-Pratt (KMP) algorithm finds all occurrences of a pattern in a text in time, where and .
##### ###### ##### ### # # ### # # ###### ## ## ## ## ## ## ## # # # # # ## ##### #### ##### # # # # # # # #### ## # ## ## ## ## # # # # # ## ## # ###### ## ### # ### # ######
##### ###### ##### ### # # ### # # ###### ## ## ## ## ## ## ## # # # # # ## ##### #### ##### # # # # # # # #### ## # ## ## ## ## # # # # # ## ## # ###### ## ### # ### # ######
The Knuth-Morris-Pratt (KMP) algorithm finds all occurrences of a pattern in a text in time, where and .
Apply what you learned by solving the practice problem for KMP Pattern Matching.
Go to Practice ProblemNaive approach: For each position in , check if matches starting at . This takes time in the worst case.
The problem: When a mismatch occurs, the naive algorithm restarts matching from the next position. But we may have already seen characters that could help us skip ahead.
Key insight: If we have matched some characters and then fail, parts of what we matched might still be useful.
Example: Searching for "ABAB" in "ABABABC"
The failure function (also called prefix function or -function) is the key to KMP.
For position in pattern , = length of the longest proper prefix of that is also a suffix.
Example: Pattern "ABAB"
Computing :
function compute_failure(P):
m = length(P)
pi = array of size m, all zeros
k = 0
for i from 1 to m-1:
while k > 0 and P[k] != P[i]:
k = pi[k-1]
if P[k] == P[i]:
k = k + 1
pi[i] = k
return pi
Time:
Using the failure function, we can search efficiently:
function kmp_search(T, P):
n = length(T)
m = length(P)
pi = compute_failure(P)
matches = []
j = 0 // characters matched so far
for i from 0 to n-1:
// On mismatch, use failure function to skip
while j > 0 and P[j] != T[i]:
j = pi[j-1]
// Check if characters match
if P[j] == T[i]:
j = j + 1
// Found complete match
if j == m:
matches.append(i - m + 1)
j = pi[j-1] // Continue searching
return matches
Why this works:
Text: "ABABABC", Pattern: "ABAB"
Failure function:
| i | T[i] | j (before) | Action | j (after) | Match? |
|---|---|---|---|---|---|
| 0 | A | 0 | P[0]=A matches | 1 | |
| 1 | B | 1 | P[1]=B matches | 2 | |
| 2 | A | 2 | P[2]=A matches | 3 | |
| 3 | B | 3 | P[3]=B matches | 4 | Match at pos 1! |
| 3 | B | Continue with j=2 | 2 | ||
| 4 | A | 2 | P[2]=A matches | 3 | |
| 5 | B | 3 | P[3]=B matches | 4 | Match at pos 3! |
| 5 | B | Continue with j=2 | 2 | ||
| 6 | C | 2 | P[2]=A C, j= | 0 | |
| 6 | C | 0 | P[0]=A C | 0 |
Result: Matches at positions 1 and 3
Time Complexity:
Why for search? Although there is a while loop inside the for loop, the total number of iterations is bounded.
Each iteration of the for loop:
Each iteration of the while loop:
Since starts at 0 and can increase at most times total, it can also decrease at most times total. So the while loop runs at most times across all iterations.
Space Complexity: for the failure function array.
Comparison:
| Algorithm | Time | Space |
|---|---|---|
| Naive | ||
| KMP |
| Rabin-Karp | avg |