What is pattern matching?

Given an input string and determining whether it contains a patternA pattern is a particular sequence of characters. is known as pattern matching. Figuring out whether some characters from the pattern need to match or all the characters should, depends entirely on the application.

The pattern (in blue) identified within the input string (in red)

In many programming languages, patterns are described using regular expressions (RegEx), and algorithms such as those mentioned below are used to match the patterns:

Note: To learn more about regular expressions, refer to this link.

Algorithms

Pattern Matching can be achieved using various algorithms, some of which are mentioned below:

  • Brute-force (Naive)

  • Knuth-Morris-Pratt (KMP)

  • Bitap

Brute-force (Naive)

The Brute-force algorithm traverses every character of the input string and checks if the pattern to match begins from it. If the pattern isn't found to start on a particular character of the input string, we move on to the next character and repeat. This process is illustrated in the slides below:

1 of 13

Unfortunately, as seen from the slides above, this approach has a vital flaw—the amount of backtracking it involves.

Bitap

Bitap is also known as the Shift-Or/Shift-And algorithm. The idea of the algorithm is to use bitmasks, where bits correspond to the elements (letters) in the pattern. The algorithm has a few constraints:

  • The number of letters in the alphabet has to be known.

  • The length of the search pattern should be short enough to fit into one machine word.

Note: To learn more about this, please refer to this Answer.

Knuth-Morris-Pratt

The KMP algorithm is used to find a pattern inside a string. This algorithm compares characters from left to right. However, anytime a mismatch occurs, it employs a preprocessed table known as the prefix table to bypass character comparison when matching.

Note: To learn more about this please refer to this Answer.

Applications

Pattern Matching is prevalent in programming and has found a place in the following applications:

  • Parsing of source files by high-level language compilers to determine if they are syntactically correct.

  • Identify if a matching pattern exists in a given input string (as described above).

  • Substituting a matching pattern with another sequence of characters if found, also known as Find and Replace.

  • Used in spelling and grammar checkers, spam detectors, and sentiment analysis tools.

  • Used in biomedical imaging for tumor identification.

Copyright ©2024 Educative, Inc. All rights reserved