Developer: Knuth-Morris-Pratt and Boyer-Moore

For this post I made two scripts one for the Knuth-Morris-Pratt and other for Boyer-Moore, both in python.

Knuth-Morris-Pratt.

This algorithm searches in a text T the occurrences of the pattern P, this is made by employing the observation that when a mismatch occurs, the pattern shifts k positions in the text, k is the number of matches (T[i] == P[j]).

For example:

T: a b a c a a b a c c

P: a b a c a b

A mismatch occurs in T[6] and P[6] so k will be equal to 5 and we shift 5 spaces.

T: a b a c a a b a c c
P: a b a c a b

This is the script:

Boyer-Moore

This algorithm is based on two ideas:

comparing pattern characters to text from right to left
Use of two heuristics

bad-symbol that indicates how much to shift based on the text character that caused a mismatch
good-sufﬁx that indicates how much to shift based on matched part (sufﬁx) of the pattern

Bad-character

A mismatched character “B” from the text appears only in the beginning of the pattern. Thus we can simply shift the pattern to the right and align both characters B, skipping comparisons.

Good-Suffix

In this case it move the pattern thus the repeated portion must now align with its first occurrence in the pattern.

Again the shift must align the second portion with its first occurrence.

Now it align the left end of the pattern with the rightmost occurrence of “B”.

Here is the script

Since after doing this scripts I was little of time I just take some times

Conclusions

I made more tests I seeing the results I see that the Boyer-Moore algorithm is faster than Knuth-Morris-Pratt.

I have to say that sometimes Boyer-Moore crashes when I use big patterns, that is something I couldn't fix

Sources
iti.fh-flensburg.de
personal.kent.edu
www.iti.fh-flensburg.de
cs.utexas.edu

Developer

Labels

February 21, 2013

Knuth-Morris-Pratt and Boyer-Moore

Knuth-Morris-Pratt.

Boyer-Moore

Conclusions

1 comment:

HTML Gadget Sample