Rules for Boyer-Moore search algorithm

The Boyer-Moore algorithm is a string-searching algorithm. It was developed in 1977 by Robert S. Boyer and J. Strother Moore.

It consists of rules that determine the skipping of characters while matching alignments during the search process. We'll discuss these rules in this Answer.

Rules for searching

These are the following two rules for searching in the Boyer-Moore algorithm:

The bad character rule
The good suffix rule

Let $T$ be the original text string and $P$ be the pattern we want to search. The length of the original text is $n$ and the length of the pattern is $m$ .

The bad character rule

In the bad character rule, we find a character in $T$ that fails to match $P$ . The successive occurrence on the left of $P$ is found. Another shift is proposed that brings the occurrence in line with the mismatched occurrence present in $T$ . If the mismatched occurrence of the character is not found on the left of $P$ , a shift is proposed that shifts $P$ past the point of mismatch.

In a nutshell, if we mismatch a character, we use the knowledge of the mismatched text character to skip alignments.

The good suffix rule

Suppose we have a substring $t$ of $T$ . Let $t$ match a suffix of $P$ but a mismatch occurs.

The rightmost copy $t'$ of $t$ in $P$ is found such that $t'$ is not a suffix of $P$ . The character to the left of $t'$ in $P$ differs from the characters to the left of $t$ in $P$ . $P$ is shifted to the right such that $t'$ in $P$ aligns with $t$ in $T$ .

If $t'$ doesn't exist, then shift the search window towards the left end of $P$ , past the left end of $t$ in $T$ . This occurs such that the prefix of the shifted pattern matches the suffix of $t$ in $T$ .

If such a shift isn't possible, then $P$ is shifted by $m$ indexes to the right. If $P$ is found, then it shifts $P$ by at least an amount so that a proper prefix of the shifted $P$ matches a suffix of the occurrence of $P$ in $T$ . If the aforementioned shift isn't possible, then it shifts $P$ by $m$ places so that it's shift past $t$ .

In a nutshell, if we match some characters, we use knowledge of the matched characters to skip alignments.

Free Resources

Learn in-demand tech skills in half the time

PRODUCTS

Mock Interview

New

Courses

Skill Paths

Projects

Assessments

TRENDING TOPICS

Learn to Code

Tech Interview Prep

Generative AI

Data Science

Machine Learning

GitHub Students Scholarship

Early Access Courses

Blind 75

Layoffs

Pricing

For Individuals

Try for Free

Gift a Subscription

CONTRIBUTE

Become an Author

Become an Affiliate

Earn Referral Credits

RESOURCES

Blog

Cheatsheets

Webinars

Answers

ABOUT US

Our Team

Careers

Hiring

Frequently Asked Questions

Press

LEGAL

Cookie Policy

Business Terms of Service

Data Processing Agreement

INTERVIEW PREP COURSES

Grokking the Modern System Design Interview

Grokking the Product Architecture Design Interview

Grokking the Coding Interview Patterns

Machine Learning System Design

Rules for Boyer-Moore search algorithm

Rules for searching

The bad character rule

The good suffix rule

Conclusion