Frequent Words
Solve a medium-level problem to find the K most frequent words in sentences.
Problem statement
Given a list of sentences and an integer
Example
Sample input
words: ["i learn to code", "i love to code"]K: 2
Sample output
["code", "i"]
Explanation
"i"
, "code"
and "to"
are the three most frequent words with a frequency of two. But only "i"
and "code"
are returned as output due to lexicographical sorting.
Try it yourself
Try to solve the problem yourself before reading the solution.
#include <iostream>#include <vector>using namespace std;vector<string> topKFrequentWords(vector<string> words, int k) {// your code goes herevector<string> answer;return answer;}
Intuition
Sorting-based approach
The most intuitive method to solve this problem is sorting with a custom comparator. The comparator first compares the frequency of two words and sorts them in descending order of frequency. If the frequency for two words is the same, then they are sorted in alphabetically ascending (lexicographical) order.
// custom comparator for sortingcompare(pair a, pair b){// a and b are pairs of format<int,string > to store frequency and word// sort based on frequency// first and then lexicographical orderif (a.cnt is not equal to b.cnt)return a.cnt > b.cntelsereturn a.word < b.word}topKFrequent(sentences, k){// create a hashmapmp = map(string, int)// iterate through all sentencesfor every sentence in sentences{//fetch the words in the sentencesfor every word in sentences{increment countfor word in map mp}}// create a list of pairs (int, string) to store frequency related datafreqData = list(pair<int, string>)// push map data into freqData listfor (word, cnt) pair in mpfreqData.add(new_pair{cnt, word})// sort the freqData list using custom comparatorsort(freqData, compare)answer = list(string)// pick the top K elements from the list// add those to final answerfor every(cnt, word) pair in frequencyData{if (k > 0){answer.add(word)decrement k}}return answer}
Let the total number of words combined from all the sentences be
The frequency of each word is stored in an unordered map. The size of the map can be