ts00ey

kmp and z-function

I started doing the weekly leetcode contests with my friend and this week we got destroyed by KMP. I decided to revisit/revise some notes I wrote in the past.


KMP allows you to do quick string search (needle in haystack). It does this by calculating the "longest prefix suffix" (LPS) for the string "{needle}{character in neither}{haystack}". The LPS of a string is the longest prefix that's also a suffix (but it can't be the entire string), we find the LPS for every index i -- for every substring of the from "{X}...{X}" find the length of X (this isn't entirely correct cause the prefix and suffix could overlap e.g. "abaabaa" the LPS is "abaa").

This is easier to see with an example:

index:       0  1  2  3  4  5  6
string:      A  B  A  B  C  A  B
LPS:         0  0  1  2  0  1  2

at index 0 "A" the longest prefix that's also a suffix is "", it's not "A" because it can't be the entire string "A" itself.

at index 1 "AB" the longest prefix that's also a suffix is "".

at index 2 "ABA" the longest prefix that's also a suffix is "A".

at index 3 "ABAB" the longest prefix that's also a suffix is "AB".

at index 4 "ABABC" the longest prefix that's also a suffix is "".

at index 5 "ABABCA" the longest prefix that's also a suffix is "A".

at index 6 "ABABCAB" the longest prefix that's also a suffix is "AB".

For KMP we now calculate the LPS for the string "{needle}#{haystack}" (assume # doesn't appear in needle or haystack). Each time in the longest prefix suffix is of length needle, we know that we have a match for the needle.


The Z-function is very similar to KMP, except at index i it returns the longest prefix starting at i (so at index 0 it's the whole string).

The Z-function for "ABABCAB":

index:       0  1  2  3  4  5  6
string:      A  B  A  B  C  A  B
Z-function:  7  0  2  0  0  2  0