Z-algorithm walk-through
Pseudo-code:
Z[0] = 0 // The z array
maxZ = 0 // The furthest we've reached into the string (the largest j + Z[j] value seen so far)
j = 0 // the location where maxZ occurs
for (i = 1; i < n; i++)
if (maxZ < i) // case 0: must do the hard work
l = 0 // index within currently matched region
Z[i] = 0
while (T[i + l] == T[l]) // naive match - while characters match increment Z[i]
Z[i]++; l++
end while
maxZ = i + Z[i] - 1// this is the farthest we've gotten in the string
j = i // and this is where it happened
else // now we have the 3 cases from the lecture notes
k = i – j // this is how far we are in the Z box of j
if (Z[k] + i < maxZ) // case 1
Z[i] = Z[k]
else if (Z[k] + i > maxZ) // case 2
Z[i] = maxZ – i + 1
else // case 3 Z[k] + i = maxZ
Z[i] = maxZ – i + 1 // I know I've gotten this far
l = Z[i] + 1
while (T[i + l] == T[l]) // all the following characters must be checked
Z[i]++; l++
end while
maxZ = i + Z[i] - 1
j = i
end if
end if
end for
Now going through the algorithm
T = A G A G A G A C C A G C A G A G A G A G A C G T
i = 1; maxZ is less than 1, hence we are in case 0 - brute force
the first while loop ends immediately as T[l] != T[i + l] (T[0] != T[1])
Z[1] is set to 0; maxZ remains 0, j remains 0
i=2; maxZ is less than 1, hence we are in case 0 - brute force
the first while loop:
l=0, T[0] == T[2] => Z[2] = 1
l=1, T[1] == T[3] => Z[2] = 2
l=2, T[2] == T[4] => Z[2] = 3
...
l = 5, T[5] == T[7] => Z[2] = 5
l =6, T[6] != T[8] => Z[2] = 5
loop ends
j = 2; maxZ = 6
i=3; maxZ > 3, hence we look at the second part of the if statement
k = i - j = 1
Z[k] = 0
since Z[k] + i = 3 < maxZ we are in case 1, and can directly set Z[3] = Z[1] = 0;
i=4; maxZ > 3 hence we look at the second part of the if statement
k = i - j = 2
Z[k] = 5
since Z[k] + i = 9 > maxZ we are in case 2 and can directly set Z[4] = maxZ - i + 1 = 3
...
i=14; maxZ = 18 , j=12 (pen and paper will help figure this out)
k = i -j = 2
Z[k] = 5
since Z[k] + i = 18 = maxZ we are in case 3
we start the loop counting from l = maxZ - i + 1= 5; and setting Z[i] = 4
we match GAC and mismatch at T[8] != T[22], making Z[i] = 4 + 3 = 7
maxZ = 21, j = 14
and so on...