Given an input string and determining whether it contains a
In many programming languages, patterns are described using regular expressions (RegEx), and algorithms such as those mentioned below are used to match the patterns:
Note: To learn more about regular expressions, refer to this link.
Pattern Matching can be achieved using various algorithms, some of which are mentioned below:
Brute-force (Naive)
Knuth-Morris-Pratt (KMP)
Bitap
The Brute-force algorithm traverses every character of the input string and checks if the pattern to match begins from it. If the pattern isn't found to start on a particular character of the input string, we move on to the next character and repeat. This process is illustrated in the slides below:
Unfortunately, as seen from the slides above, this approach has a vital flaw—the amount of backtracking it involves.
Bitap is also known as the Shift-Or/Shift-And algorithm. The idea of the algorithm is to use bitmasks, where bits correspond to the elements (letters) in the pattern. The algorithm has a few constraints:
The number of letters in the alphabet has to be known.
The length of the search pattern should be short enough to fit into one machine word.
Note: To learn more about this, please refer to this Answer.
The KMP algorithm is used to find a pattern inside a string. This algorithm compares characters from left to right. However, anytime a mismatch occurs, it employs a preprocessed table known as the prefix table to bypass character comparison when matching.
Note: To learn more about this please refer to this Answer.
Pattern Matching is prevalent in programming and has found a place in the following applications:
Parsing of source files by high-level language compilers to determine if they are syntactically correct.
Identify if a matching pattern exists in a given input string (as described above).
Substituting a matching pattern with another sequence of characters if found, also known as Find and Replace.
Used in spelling and grammar checkers, spam detectors, and sentiment analysis tools.
Used in biomedical imaging for tumor identification.