#####   ######  #####    ###    #   #  ###  #   #  ######
##  ##  ##      ##  ##  ## ##   #   #   #   #   #  ##
#####   ####    #####   #   #   #   #   #   #   #  ####
##  #   ##      ##      ## ##    # #    #    # #   ##
##   #  ######  ##       ###      #    ###    #    ######

$ curl repovive.com/algorithms/kmp-pattern-matching

KMP Pattern Matching Algorithm Tutorial

Medium | Strings | 8 min read

Introduction

The Knuth-Morris-Pratt (KMP) algorithm finds all occurrences of a pattern $P$ in a text $T$ in $O(n + m)$ time, where $n = |T|$ and $m = |P|$.

Naive approach: For each position $i$ in $T$, check if $P$ matches starting at $i$. This takes $O(nm)$ time in the worst case.
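As a baseline for comparison, the naive approach can be sketched in Python as follows. The function name `naive_search` and the choice of 0-based result indices are assumptions made for this illustration, not part of the original tutorial.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based index where pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every alignment i at which the pattern still fits inside the text.
    for i in range(n - m + 1):
        # Each slice comparison may inspect up to m characters,
        # which is where the O(nm) worst case comes from.
        if text[i:i + m] == pattern:
            matches.append(i)
    return matches


print(naive_search("ABABABC", "ABAB"))  # [0, 2]
```

Every alignment is checked from scratch here, which is exactly the repeated work KMP avoids.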

The problem: When a mismatch occurs, the naive algorithm restarts matching from the next position. But we may have already seen characters that could help us skip ahead.

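Key insight: If we have matched some characters and then fail, we already know what those characters are. Any suffix of the matched part that is also a prefix of the pattern can be kept as a partial match instead of being compared again.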

Example: Searching for "ABAB" in "ABABABC"

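Starting at index 0 (0-based), we match all of "ABAB" against T[0..3]
The naive approach restarts at index 1 and compares characters it has already seen
KMP realizes: "AB" at the end of our match equals "AB" at the start of the pattern, so it keeps those two characters as matched and continues from T[4]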

The Failure Function

The failure function (also called the prefix function or $\pi$-function) is the key to KMP.

For position $i$ in pattern $P$, $\pi[i]$ is the length of the longest proper prefix of $P[0..i]$ that is also a suffix of $P[0..i]$. A proper prefix is any prefix shorter than the string itself.

Example: Pattern "ABAB"

$\pi[0] = 0$ (by definition, no proper prefix)
$\pi[1] = 0$ ("AB" has no matching prefix/suffix)
$\pi[2] = 1$ ("ABA": "A" matches at start and end)
$\pi[3] = 2$ ("ABAB": "AB" matches at start and end)
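The definition can be checked directly with a brute-force sketch that tries every proper prefix length. This is only meant to make the definition concrete (it is far slower than the linear-time computation); the function name `prefix_function_bruteforce` is hypothetical.

```python
def prefix_function_bruteforce(p: str) -> list[int]:
    """Compute pi straight from the definition, without any optimization."""
    pi = []
    for i in range(len(p)):
        s = p[:i + 1]  # the prefix P[0..i]
        best = 0
        # Proper prefix lengths run from 1 to len(s) - 1.
        for k in range(1, len(s)):
            if s[:k] == s[-k:]:  # prefix of length k equals suffix of length k
                best = k
        pi.append(best)
    return pi


print(prefix_function_bruteforce("ABAB"))  # [0, 0, 1, 2]
```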

Computing $\pi$:

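function compute_failure(P):
    m = length(P)
    pi = array of size m, all zeros
    k = 0  // length of the prefix currently matched against the suffix ending at i-1

    for i from 1 to m-1:
        // On mismatch, fall back to the next shorter prefix that is also a suffix
        while k > 0 and P[k] != P[i]:
            k = pi[k-1]
        // Extend the matched prefix if the next characters agree
        if P[k] == P[i]:
            k = k + 1
        pi[i] = k

    return pi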

Time: $O(m)$
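For readers who want something runnable, here is a minimal Python sketch that follows the pseudocode line by line. It keeps the name `compute_failure` from the pseudocode; the type hints and the second test pattern are additions for illustration.

```python
def compute_failure(pattern: str) -> list[int]:
    """Return the failure (prefix) function of pattern as a list of ints."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the prefix matched so far
    for i in range(1, m):
        # Fall back through shorter candidates until one can be extended.
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]
        # Extend the matched prefix by one character if possible.
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    return pi


print(compute_failure("ABAB"))     # [0, 0, 1, 2]
print(compute_failure("AABAAAB"))  # [0, 1, 0, 1, 2, 2, 3]
```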

The Search Algorithm

Using the failure function, we can search efficiently:

function kmp_search(T, P):
    n = length(T)
    m = length(P)  // assumes m >= 1, since P[0] is read below
    pi = compute_failure(P)
    matches = []

    j = 0  // characters matched so far

    for i from 0 to n-1:
        // On mismatch, use failure function to skip
        while j > 0 and P[j] != T[i]:
            j = pi[j-1]

        // Check if characters match
        if P[j] == T[i]:
            j = j + 1

        // Found complete match
        if j == m:
            matches.append(i - m + 1)
            j = pi[j-1]  // Continue searching

    return matches

Why this works:

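$j$ tracks how many characters of $P$ we have matched so far
On mismatch, $\pi[j-1]$ is the length of the longest prefix of $P$ that is also a suffix of the $j$ characters just matched, so matching resumes from there
The index $i$ into $T$ never moves backward, although $T[i]$ may be compared against several pattern positions while $j$ falls back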
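The search can be sketched in Python along the same lines. The block repeats `compute_failure` so that it runs on its own. Returning an empty list for an empty pattern is an assumption of this sketch, since the pseudocode does not cover that case.

```python
def compute_failure(pattern: str) -> list[int]:
    """Failure (prefix) function, as in the pseudocode."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern yields no matches
    pi = compute_failure(pattern)
    matches = []
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On mismatch, fall back to a shorter partial match.
        while j > 0 and pattern[j] != ch:
            j = pi[j - 1]
        if pattern[j] == ch:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = pi[j - 1]  # keep going so overlapping matches are found
    return matches


print(kmp_search("ABABABC", "ABAB"))  # [0, 2]
```

Resetting `j` to `pi[j - 1]` after a full match, rather than to 0, is what lets the overlapping occurrence at index 2 be reported.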

Trace Through Example

Text: "ABABABC", Pattern: "ABAB"

Failure function: $\pi = [0, 0, 1, 2]$

i	T[i]	j (before)	Action	j (after)	Match?
0	A	0	P[0]=A matches	1
1	B	1	P[1]=B matches	2
2	A	2	P[2]=A matches	3
3	B	3	P[3]=B matches	4	Match at index 0
3	B	4	j == m, so reset j = $\pi[3] = 2$	2
4	A	2	P[2]=A matches	3
5	B	3	P[3]=B matches	4	Match at index 2
5	B	4	j == m, so reset j = $\pi[3] = 2$	2
6	C	2	P[2]=A $\ne$ C, fall back to j = $\pi[1] = 0$	0
6	C	0	P[0]=A $\ne$ C, j stays 0	0

Result: Matches at indices 0 and 2 (0-based, as computed by $i - m + 1$)

Complexity Analysis

Time Complexity:

Preprocessing (failure function): $O(m)$
Searching: $O(n)$
Total: $O(n + m)$

Why $O(n)$ for search? Although there is a while loop inside the for loop, the total number of while-loop iterations over the whole search is at most $n$. This is an amortized bound, not a per-iteration one.

Each iteration of the for loop:

Increases $i$ by 1
May increase $j$ by at most 1

Each iteration of the while loop:

Strictly decreases $j$, because $\pi[j-1] < j$

Since $j$ starts at 0, never becomes negative, and increases at most $n$ times in total (once per for-loop iteration), it can decrease at most $n$ times in total. So the while loop runs at most $n$ times across all iterations of the for loop.
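The bound can also be observed empirically. The sketch below instruments the search with a counter for while-loop iterations; the function name `kmp_fallback_count` and the counter are hypothetical additions for illustration, and the pattern is assumed to be non-empty.

```python
def kmp_fallback_count(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (matches, while_steps) for a non-empty pattern."""
    m = len(pattern)
    # Build the failure function (same procedure as the preprocessing step).
    pi = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k

    matches = []
    while_steps = 0  # total iterations of the inner while loop during search
    j = 0
    for i, ch in enumerate(text):
        while j > 0 and pattern[j] != ch:
            j = pi[j - 1]
            while_steps += 1
        if pattern[j] == ch:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = pi[j - 1]
    return matches, while_steps


print(kmp_fallback_count("ABABABC", "ABAB"))  # ([0, 2], 1)

# The fallback count should never exceed the text length.
text = "AAAB" * 50
_, steps = kmp_fallback_count(text, "AAAA")
assert steps <= len(text)
```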

Space Complexity: $O(m)$ for the failure function array.

Comparison:

Algorithm	Time	Space
Naive	$O(nm)$	$O(1)$
KMP	$O(n+m)$	$O(m)$
Rabin-Karp	$O(n+m)$ avg, $O(nm)$ worst	$O(1)$