Charging Station: Find Frequent Words with Mismatching by Sorting
Learn an algorithm that finds frequent words with mismatching by sorting.
We'll cover the following
This lesson uses some notation from Charging Station: Finding Frequent Words by Sorting.
The following pseudocode reduces the Frequent Words with Mismatches Problem to sorting. It first generates all neighbors (with up to d mismatches) for all k-mers in Text and combines them all into an array NeighborhoodArray. Note that a k-mer Pattern appears Count(Text, Pattern) times in this array. The only thing left is to sort this array, count how many times each k-mer appears in the sorted array, and then return the k-mers that occur the maximum number of times.
Algorithm
The algorithm for the Frequent Words with Mismatches Problem is provided below:
FindingFrequentWordsWithMismatchesBySorting(Text, k, d)
FrequentPatterns ← an empty set
Neighborhoods ← an empty list
for i ← 0 to |Text|−k
add Neighbors(Text(i, k), d) to Neighborhoods
form an array NeighborhoodArray holding all strings in Neighborhoods
for i ← 0 to |Neighborhoods| − 1
Pattern ← NeighborhoodArray(i)
Index(i) ← PatternToNumber(Pattern)
Count(i) ← 1
SortedIndex ← Sort(Index)
for i ← 0 to |Neighborhoods| − 2
if SortedIndex(i) = SortedIndex(i + 1)
Count(i + 1) ← Count(i) + 1
maxCount ← maximum value in array Count
for i ← 0 to |Neighborhoods| − 1
if Count(i) = maxCount
Pattern ← NumberToPattern(SortedIndex(i), k)
add Pattern to FrequentPatterns
return FrequentPatterns
Get hands-on with 1400+ tech skills courses.