The Frequent Words Problem
Find the most frequent k-mers in a string.
Most frequent k-mer
We say that Pattern is a most frequent k-mer in Text if it maximizes Count(Text, Pattern) among all k-mers. You can see that ACTAT is a most frequent 5-mer for Text = ACAACTATGCATACTATCGGGAACTATCCT, and ATA is a most frequent 3-mer for Text = CGATATATCCATAG.
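To make the definition concrete, here is a minimal sketch of Count(Text, Pattern). It assumes that Count includes overlapping occurrences, which is consistent with the two examples above. The sliding-window implementation is one straightforward way to compute it, not necessarily the one used elsewhere in this course.

```python
def Count(Text, Pattern):
    """Count occurrences of Pattern in Text, including overlapping ones."""
    k = len(Pattern)
    count = 0
    # Slide a window of length k across Text and compare each window to Pattern.
    for i in range(len(Text) - k + 1):
        if Text[i:i + k] == Pattern:
            count += 1
    return count


# The two examples from the paragraph above.
print(Count("ACAACTATGCATACTATCGGGAACTATCCT", "ACTAT"))  # 3
print(Count("CGATATATCCATAG", "ATA"))                    # 3
```

In the second example, two of the three occurrences of ATA overlap (ATATA). Python's built-in str.count would skip the overlapping one, so it is not used here.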
STOP and Think: Can a string have multiple most frequent k-mers?
We now have a rigorously defined computational problem.
Frequent Words Problem
Problem overview:
Find the most frequent k-mers in a string.
Input: A string Text and an integer k.
Output: All most frequent k-mers in Text.
Sample dataset:
AGCATGCACGTAAGCTAGC, 3
Sample output:
AGC
import collections

def FrequentWords(Text, k):
    return
Solution explanation
- Line 3: We define a counter, kmer_count, which stores the count of every k-mer that occurs in Text, using collections.Counter.
- Line 4: We define a variable, max_count, that stores the highest count of any k-mer in Text. We obtain it with the counter's most_common() method.
- Line 5: We define a list, max_count_kmers, which stores every k-mer whose count in kmer_count equals max_count. We build it by iterating over kmer_count and keeping each k-mer with count == max_count.
- Line 6: We return the list max_count_kmers.
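The steps described above can be sketched as follows. This is one possible reconstruction based on the explanation, not necessarily the exact reference solution. It is laid out so that lines 3 to 6 match the line numbers in the explanation, and it assumes 1 <= k <= len(Text), since otherwise there are no k-mers to count.

```python
import collections
def FrequentWords(Text, k):
    kmer_count = collections.Counter(Text[i:i + k] for i in range(len(Text) - k + 1))  # count every k-mer window
    max_count = kmer_count.most_common(1)[0][1]  # count of the single most common k-mer
    max_count_kmers = [kmer for kmer, count in kmer_count.items() if count == max_count]  # keep all ties
    return max_count_kmers

# Sample dataset from the problem statement.
print(FrequentWords("AGCATGCACGTAAGCTAGC", 3))  # ['AGC']
```

Because the list collects every k-mer whose count equals max_count, tied k-mers are all returned. For example, FrequentWords("ACGT", 1) returns all four nucleotides, each with a count of 1.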