In bioinformatics, pattern searching is an essential problem. We search for strings in different types of sequences using such algorithms.
The Rabin-Karp algorithm is a string searching or pattern matching algorithm. This algorithm uses the hash function to search the pattern.
Note: To learn more about hashing visit the Answer here.
Let's assume the original string is
Spurious hits occur in the searching process. These occur when the hash value of the search pattern is matched with the hash value of the text window, but the window is not the actual pattern.
Let's see the following example to understand the working of the algorithm.
The Rabin-Karp algorithm can work with different hash functions. The hash function we use in the above-explained example is to measure the length of the original string
We consider the window size to be the same as the length of the search string
If we get the same result as the one taken at the beginning, i.e.,
The code for the Rabin-Karp algorithm is as follows:
number_of_characters = 256def search(search_string, original_text, q):n = len(search_string)m = len(original_text)i = 0 # indexj = 0 # indexp = 0 # hash key for search_stringt = 0 # hash key for original_texth = 1# h would be assigned "pow(number_of_characters, n-1)%q"for i in range(n-1):h = (h*number_of_characters)%q# calculate the hash value of search_string and first occurence of textfor i in range(n):p = (number_of_characters*p + ord(search_string[i]))%qt = (number_of_characters*t + ord(original_text[i]))%q# slide over the search_string character by characterfor i in range(m-n+1):# we check the hash values for the current window of original_text and search_string.# if the hash value matches then we check for characters one by oneif p==t:for j in range(n):if original_text[i+j] != search_string[j]:breakelse: j+=1# if p == t and search_string[0...n-1] = original_text[i, i+1, ...i+n-1]if j==n:print ("search_string found at slide " + str(i))# we calculate the hash value for the next window of text: Remove# leading digit, add trailing digitif i < m-n:t = (number_of_characters*(t-ord(original_text[i])*h) + ord(original_text[i+n]))%q# if we encounter negative values of t, we convert it to positiveif t < 0:t = t+q# mainoriginal_text = "31415926535"search_string = "41"# prime numberq = 101search(search_string,original_text,q)
In main.py
:
Line 1: number_of_characters
is the number of characters in the input alphabet.
Lines 3–42: We use the search
function to search the pattern that makes use of the Rabin-Karp algorithm for the given search_string
.
This algorithm searches for more than one occurrence of the search_string
.
The best and average runtime case for this algorithm is