The Frequent Words Problem
Find the most frequent k-mers in a string.
Most frequent k-mer
We say that Pattern is a most frequent k-mer in Text if it maximizes Count(Text, Pattern) among all k-mers. You can see that ACTAT is a most frequent 5-mer for Text = ACAACTATGCATACTATCGGGAACTATCCT, and ATA is a most frequent 3-mer for Text = CGATATATCCATAG.
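The two examples above can be checked with a short sketch. It assumes that Count(Text, Pattern) is the number of times Pattern appears in Text as a substring, with overlapping occurrences included. The function name pattern_count is a hypothetical helper chosen for illustration, not something defined in this lesson.

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones.

    Assumption: this mirrors Count(Text, Pattern) as used in the lesson.
    """
    k = len(pattern)
    # Slide a window of width k over text and compare each window to pattern.
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


print(pattern_count("ACAACTATGCATACTATCGGGAACTATCCT", "ACTAT"))  # 3
print(pattern_count("CGATATATCCATAG", "ATA"))  # 3
```

Note that two of the ATA occurrences in the second string overlap (they start two positions apart), which is why a sliding window is used here rather than str.count, which skips overlapping matches.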
STOP and Think: Can a string have multiple most frequent k-mers?
We now have a rigorously defined computational problem.
Frequent Words Problem
Problem overview:
Find the most frequent k-mers in a string.
Input: A string Text and an integer k.
Output: All most frequent k-mers in Text.
Sample dataset:
AGCATGCACGTAAGCTAGC, 3
Sample output:
AGC
import collections
def FrequentWords(Text, k):
    return
Solution explanation
- Line 3: We define a collection kmer_count, which stores the count of every k-mer that occurs in Text, using collections.Counter.
- Line 4: We define a variable max_count that stores the highest count of any k-mer in Text. We get it from the Counter's most_common() method.
- Line 5: We define a list max_count_kmers, which stores all the k-mers in kmer_count whose count is max_count. It does this by iterating over kmer_count and keeping each k-mer with count == max_count.
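The three steps described above might look like the following minimal sketch. It is one possible reading of the explanation, not necessarily the lesson's exact solution code. The comments and the guard for a Text shorter than k are additions, so the line numbers here do not match the ones cited in the explanation. Returning an empty list in that case is an assumption, since the problem statement does not cover it.

```python
import collections


def FrequentWords(Text, k):
    # Step 1: tally every k-mer, one per window of width k in Text.
    kmer_count = collections.Counter(
        Text[i:i + k] for i in range(len(Text) - k + 1)
    )
    # Assumed behaviour: no k-mers exist when Text is shorter than k.
    if not kmer_count:
        return []
    # Step 2: most_common(1) returns [(kmer, count)] for the top entry.
    max_count = kmer_count.most_common(1)[0][1]
    # Step 3: keep every k-mer that reaches the maximum count.
    max_count_kmers = [
        kmer for kmer, count in kmer_count.items() if count == max_count
    ]
    return max_count_kmers


print(FrequentWords("AGCATGCACGTAAGCTAGC", 3))  # ['AGC']
```

Because step 3 collects every k-mer tied for the maximum count, the function returns a list rather than a single string. For the sample dataset, that list holds only AGC, which occurs three times.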