...

The Frequent Words Problem

Find the most frequent k-mers in a string.

Most frequent k-mer

We say that Pattern is a most frequent k-mer in Text if it maximizes Count(Text, Pattern) among all k-mers. You can see that ACTAT is a most frequent 5-mer for Text = ACAACTATGCATACTATCGGGAACTATCCT, and ATA is a most frequent 3-mer for Text = CGATATATCCATAG.
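
The two counts above can be checked with a short sketch. It assumes that Count(Text, Pattern) counts every occurrence of Pattern, including overlapping ones; the ATA example only reaches 3 under that reading. The helper name pattern_count is hypothetical and is not part of the lesson.

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    k = len(pattern)
    # Slide a window of length k across text and compare each window to pattern.
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


print(pattern_count("ACAACTATGCATACTATCGGGAACTATCCT", "ACTAT"))  # 3
print(pattern_count("CGATATATCCATAG", "ATA"))  # 3
```

Python's built-in str.count skips overlapping matches, so "CGATATATCCATAG".count("ATA") gives 2 rather than 3; that is why the sketch uses an explicit sliding window.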

STOP and Think: Can a string have multiple most frequent k-mers?

We now have a rigorously defined computational problem.

Frequent Words Problem

Problem overview:
Find the most frequent k-mers in a string.

Input: A string Text and an integer k.
Output: All most frequent k-mers in Text.

Sample dataset:

AGCATGCACGTAAGCTAGC, 3

Sample output:

AGC

import collections
def FrequentWords(Text, k):
    return

Solution explanation
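
The steps described in the bullets below can be sketched as follows. This is one possible reconstruction based on those bullets, not necessarily the lesson's own solution code: the exact expressions are assumptions, and the guard for inputs with no k-mers is an addition made for this sketch. The comments indicate which bullet each statement corresponds to.

```python
import collections


def FrequentWords(Text, k):
    # Guard (an addition for this sketch): no k-mers exist for these inputs.
    if k <= 0 or k > len(Text):
        return []
    # "Line 3" step: count every k-mer window that occurs in Text.
    kmer_count = collections.Counter(Text[i:i + k] for i in range(len(Text) - k + 1))
    # "Line 4" step: most_common(1) returns [(kmer, count)] for the top entry.
    max_count = kmer_count.most_common(1)[0][1]
    # "Line 5" step: keep every k-mer whose count ties the maximum.
    max_count_kmers = [kmer for kmer, count in kmer_count.items() if count == max_count]
    return max_count_kmers


print(FrequentWords("AGCATGCACGTAAGCTAGC", 3))  # ['AGC']
```

Counting all windows once and then filtering keeps the work proportional to the length of Text, and the final list holds every k-mer that reaches the maximum count rather than only the first one found.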

  • Line 3: We define a counter kmer_count, which stores the count of every k-mer that occurs in Text, using collections.Counter.
  • Line 4: We define a variable max_count that stores the highest count of any k-mer in Text. We obtain it with the Counter.most_common() method.
  • Line 5: We define a list max_count_kmers, which stores all the k-mers whose count in kmer_count equals max_count. It does this by iterating over the kmer_count collection and storing each k-mer with count == max_count in
...