Introduction
The Knuth-Morris-Pratt (KMP) algorithm finds all occurrences of a pattern in a text in time, where and .
Naive approach: For each position in , check if matches starting at . This takes time in the worst case.
The problem: When a mismatch occurs, the naive algorithm restarts matching from the next position. But we may have already seen characters that could help us skip ahead.
Key insight: If we have matched some characters and then fail, parts of what we matched might still be useful.
Example: Searching for "ABAB" in "ABABABC"
- At position 1, we match "ABAB" but then fail at "C"
- The naive approach restarts at position 2
- KMP realizes: "AB" at the end of our match equals "AB" at the start, so we can continue from there
The Failure Function
The failure function (also called prefix function or -function) is the key to KMP.
For position in pattern , = length of the longest proper prefix of that is also a suffix.
Example: Pattern "ABAB"
- (by definition, no proper prefix)
- ("AB" has no matching prefix/suffix)
- ("ABA": "A" matches at start and end)
- ("ABAB": "AB" matches at start and end)
Computing :
function compute_failure(P):
m = length(P)
pi = array of size m, all zeros
k = 0
for i from 1 to m-1:
while k > 0 and P[k] != P[i]:
k = pi[k-1]
if P[k] == P[i]:
k = k + 1
pi[i] = k
return pi
Time:
The Search Algorithm
Using the failure function, we can search efficiently:
function kmp_search(T, P):
n = length(T)
m = length(P)
pi = compute_failure(P)
matches = []
j = 0 // characters matched so far
for i from 0 to n-1:
// On mismatch, use failure function to skip
while j > 0 and P[j] != T[i]:
j = pi[j-1]
// Check if characters match
if P[j] == T[i]:
j = j + 1
// Found complete match
if j == m:
matches.append(i - m + 1)
j = pi[j-1] // Continue searching
return matches
Why this works:
- tracks how many characters of we have matched
- On mismatch, tells us the longest prefix that could still match
- We never re-examine characters in
Trace Through Example
Text: "ABABABC", Pattern: "ABAB"
Failure function:
| i | T[i] | j (before) | Action | j (after) | Match? |
|---|---|---|---|---|---|
| 0 | A | 0 | P[0]=A matches | 1 | |
| 1 | B | 1 | P[1]=B matches | 2 | |
| 2 | A | 2 | P[2]=A matches | 3 | |
| 3 | B | 3 | P[3]=B matches | 4 | Match at pos 1! |
| 3 | B | Continue with j=2 | 2 | ||
| 4 | A | 2 | P[2]=A matches | 3 | |
| 5 | B | 3 | P[3]=B matches | 4 | Match at pos 3! |
| 5 | B | Continue with j=2 | 2 | ||
| 6 | C | 2 | P[2]=A C, j= | 0 | |
| 6 | C | 0 | P[0]=A C | 0 |
Result: Matches at positions 1 and 3