kmp and z-function
I started doing the weekly leetcode contests with my friend and this week we got destroyed by KMP. I decided to revisit/revise some notes I wrote in the past.
KMP allows you to do quick string search (needle in haystack). It does this by calculating the "longest prefix suffix" (LPS) for the string "{needle}{character in neither}{haystack}". The LPS of a string is the longest prefix that's also a suffix (but it can't be the entire string), we find the LPS for every index i -- for every substring of the from "{X}...{X}" find the length of X (this isn't entirely correct cause the prefix and suffix could overlap e.g. "abaabaa" the LPS is "abaa").
This is easier to see with an example:
index: 0 1 2 3 4 5 6
string: A B A B C A B
LPS: 0 0 1 2 0 1 2
at index 0 "A" the longest prefix that's also a suffix is "", it's not "A" because it can't be the entire string "A" itself.
at index 1 "AB" the longest prefix that's also a suffix is "".
at index 2 "ABA" the longest prefix that's also a suffix is "A".
at index 3 "ABAB" the longest prefix that's also a suffix is "AB".
at index 4 "ABABC" the longest prefix that's also a suffix is "".
at index 5 "ABABCA" the longest prefix that's also a suffix is "A".
at index 6 "ABABCAB" the longest prefix that's also a suffix is "AB".
For KMP we now calculate the LPS for the string "{needle}#{haystack}" (assume # doesn't appear in needle or haystack). Each time in the longest prefix suffix is of length needle, we know that we have a match for the needle.
The Z-function is very similar to KMP, except at index i it returns the longest prefix starting at i (so at index 0 it's the whole string).
The Z-function for "ABABCAB":
index: 0 1 2 3 4 5 6
string: A B A B C A B
Z-function: 7 0 2 0 0 2 0