Charging Station: The Frequency Array

Learn how we can make the FREQUENTWORDS algorithm work faster, transform patterns, and different algorithms for frequent words problem.

To make FrequentWords faster, we’ll think about why this algorithm is slow in the first place. It slides a window of length k down Text, identifying a k-mer Pattern of Text at each step. For each such k-mer, it must slide a window down the entire length of Text in order to compute PatternCount(Text, Pattern). Instead of doing all this sliding, we aspire to slide a window down Text only once. As we slide this window, we’ll keep track of the number of times that each k-mer Pattern has already appeared in Text, updating these numbers as we proceed.

Ordering 4k^{k} k-mers

To achieve this goal, we’ll first order all 4k^{k} k-mers lexicographically (i.e., according to how they would appear in the dictionary) and then convert them into the 4k^{k} different integers between 0 and 4k1^{k} − 1. Given an integer k, we define the frequency array of a string Text as an array of length 4k^{k}, where the i-th element of the array holds the number of times that the i-th k-mer (in lexicographic order) appears in Text.

Get hands-on with 1200+ tech skills courses.