Introduction
String Hashing is a technique to compare substrings in time after preprocessing.
The Problem: Comparing two substrings of length takes time. With queries, the naive approach is .
##### ###### ##### ### # # ### # # ###### ## ## ## ## ## ## ## # # # # # ## ##### #### ##### # # # # # # # #### ## # ## ## ## ## # # # # # ## ## # ###### ## ### # ### # ######
##### ###### ##### ### # # ### # # ###### ## ## ## ## ## ## ## # # # # # ## ##### #### ##### # # # # # # # #### ## # ## ## ## ## # # # # # ## ## # ###### ## ### # ### # ######
String Hashing is a technique to compare substrings in time after preprocessing.
The Problem: Comparing two substrings of length takes time. With queries, the naive approach is .
Apply what you learned by solving the practice problem for String Hashing (Polynomial Hash).
Go to Practice ProblemThe Idea: Map each string to a number (its "hash"). If two strings have different hashes, they are definitely different. If they have the same hash, they are probably equal.
Applications:
A polynomial hash treats a string as a polynomial evaluated at some base , modulo a prime .
For string :
Typical choices:
Example: Hash of "abc" with , :
To query substring hashes in , precompute prefix hashes.
Define as the hash of :
Building prefix hashes:
prefix[0] = 0
for i from 0 to n-1:
prefix[i+1] = (prefix[i] * b + s[i]) mod p
Also precompute powers of :
power[0] = 1
for i from 1 to n:
power[i] = (power[i-1] * b) mod p
Time: preprocessing
To get the hash of substring (0-indexed):
In practice (since we cannot divide):
Make sure to handle negative modulo:
function hash(l, r):
len = r - l + 1
h = prefix[r+1] - prefix[l] * power[len]
return ((h mod p) + p) mod p
Comparing substrings:
function equal(l1, r1, l2, r2):
if (r1 - l1) != (r2 - l2):
return false
return hash(l1, r1) == hash(l2, r2)
Time: per query
Hash collisions occur when two different strings have the same hash.
Probability: With modulus and queries, collision probability is roughly .
For and : probability .
Strategies to reduce collisions:
equal = (hash1(l1,r1) == hash1(l2,r2)) AND
(hash2(l1,r1) == hash2(l2,r2))
Use larger modulus: Use instead of .
Randomize base: Choose base randomly at runtime.
For competitive programming, single hashing with is usually sufficient, but be aware of anti-hash test cases.