February 21, 2013

Knuth-Morris-Pratt and Boyer-Moore

For this post I made two scripts one for the Knuth-Morris-Pratt and other for Boyer-Moore, both in python.

Knuth-Morris-Pratt.

This algorithm searches in a text T the occurrences of the pattern P, this is made by employing the observation that when a mismatch occurs, the pattern shifts k positions in the text, k is the number of matches (T[i] == P[j]).

For example:

T: a b a c a a b a c c
P: a b a c a b

A mismatch occurs in T[6] and P[6] so k will be equal to 5 and we shift 5 spaces.


T: a b a c a a b a c c
P:              a b a c a b


This is the script:


Boyer-Moore

This algorithm is based on two ideas:
  • comparing pattern characters to text from right to left
  • Use of two heuristics
    • bad-symbol that indicates how much to shift based on the text character that caused a mismatch
    • good-suffix that indicates how much to shift based on matched part (suffix) of the pattern


Bad-character
A mismatched character “B” from the text appears only in the beginning of the pattern. Thus we can simply shift the pattern to the right and align both characters B, skipping comparisons.


Good-Suffix

In this case it move the pattern thus the repeated portion must now align with its first occurrence in the pattern.

Again the shift must align the second portion with its first occurrence.

Now it align the left end of the pattern with the rightmost occurrence of “B”.


Here is the script

Since after doing this scripts I was little of time I just take some times


Conclusions

I made more tests I seeing the results I see that the Boyer-Moore algorithm is faster than Knuth-Morris-Pratt.

I have to say that sometimes Boyer-Moore crashes when I use big patterns, that is something I couldn't fix

1 comment:

  1. Me hubiera gustado ver un script para lo del experimento; 4 pts código. En el reporte esperaba gráficas y una discusión tipo O(...) de complejidad. Van 2 pts por el reporte.

    ReplyDelete