...
/Solution Review: Search Similar Words
Solution Review: Search Similar Words
See the solution to the problem challenge.
We'll cover the following...
Solution
#include <iostream>#include <vector>using namespace std;// Definition of a trie nodeclass TrieNode{public:// children array store pointer to TrieNode// of next 26 alphabets + 1 special characterTrieNode *children[27];// A flag used to identify if a// node denotes last character of a wordbool isEndOfWord;// a vector to store the complete word// at the last node of each path// this helps in easy retrieval of wordsvector<string> words;};class Trie{public:TrieNode * root;Trie(){root = new TrieNode();}// insert a word into trievoid insert(string word, string completeWord){// point the current node to the rootTrieNode *curr = root;for (char curChar: word){int idx = 26;if (curChar != '_'){idx = curChar - 'a';}//create a new trie node// if not already existing in trieif (curr->children[idx] == NULL){curr->children[idx] = new TrieNode();}curr = curr->children[idx];}//mark isEndOfWord as true for the last nodecurr->isEndOfWord = true;curr->words.push_back(completeWord);}// search a word into trievector<string> search(string word){// point the current node to the rootTrieNode *curr = root;// iterate through all characters of the given wordfor (char curChar: word){int idx = 26;if (curChar != '_'){idx = curChar - 'a';}// if a character in the word is not found, return falseif (curr->children[idx] == NULL){return {};}curr = curr->children[idx];}// return the value of isEndOfWord for the last character of the word being searched// if isEndOfWord is true, it means word was found completely// else false if returnedreturn curr->words;}};// replaces all indexes by special character "_"// one by one and push to a result vectorvector<string> addSpecialCharAtAllIndexes(string word){vector<string> result;int len = word.size();for (int index = 0; index < len; index++){char temp = word[index];word[index] = '_';result.push_back(word);word[index] = temp;}return result;}vector<int> queryMatchingWordsCount(vector<string> words, vector<string> queryWords){// Initialise a trieTrie *trie = new Trie();// Iterate through the wordsfor (string word: words){vector<string> oneCharAwayStrings = addSpecialCharAtAllIndexes(word);//insert the words oneCharAwayStrings in triefor (string oneCharAwayString: oneCharAwayStrings){trie->insert(oneCharAwayString, word);}}vector<int> result;// Iterate through the query wordsfor (string queryWord: queryWords){vector<string> resultForQueryWord;// find all one character away words for given query word// check if any of those exist in the trievector<string> oneCharAwayStrings = addSpecialCharAtAllIndexes(queryWord);for (string oneCharAwayString: oneCharAwayStrings){vector<string> nearByWords = trie->search(oneCharAwayString);for (string s: nearByWords)resultForQueryWord.push_back(s);}result.push_back(resultForQueryWord.size());}return result;}
Intuition
This problem is a modification of the first problem: spelling correction–based search. Let's modify the problem to a simpler version first. Suppose a list of words and a string is given to us and we're asked to find if the given string exists in the dictionary. In this case, we can insert all the dictionary words into a trie, iterate through all the letters in the strings, and search for the query word in the trie. The above problem requires us to solve something similar, but with the added condition that one letter of the word may not match.
Let's take an example where the word "cat" is present in the list. As per the given conditions, if a user is trying to find any of "aat", "bat", "dat", "eat", "fat", etc., the output should be true in all cases since these words differ in only one letter from the original word “cat.”
This provides an intuition that we can insert all the possible one-letter-away words into the trie and then search for the given query word. However, this way, we would be inserting many more words in the trie than required, consuming much more memory. Alternatively, we can replace each character of the word with a unique special character and then search for the query word. We modify the input query word by replacing each character with a special character and searching for it in the trie. For a word of length
For example, when searching for the word fat
in the dictionary, we replace the first letter f
with a special character _
and insert _at
in the trie. Similarly, we insert f_t
and fa_
.
Now if we need to search for any one-character-away word from bat
, we can convert bat
into ba_
, b_t
, and _at
. We'll find a match for the string _at
, denoting a word in the list that is just one character away from the word bat
.
In our trie node, we maintain a list of the words corresponding to the current modified string. For example, the keywords bat
, cat
, and mat
present in the database are associated with the string _at
.
// Definition of a trie nodeclass TrieNode {public:// children array store pointer to TrieNode// of next 26 alphabets + 1 special characterTrieNode *children[27];// A flag used to identify if a// node denotes last character of a wordbool isEndOfWord;// a vector to store the complete word// at the last node of each path// this helps in easy retrieval of wordsvector<string> words;};
Algorithm
The summary of the algorithm is given below.
Step 1: Insertion in the trie
For every word of the datastore, generate a list of words where a special character, _
, replaces each character of the word. Insert all these words in the trie. Add an extra pointer in the children's array to handle the special character. Add the complete word string in words
array of the last trie node.
Step 2: Searching in the trie
For each query word, generate all possible one-letter-away words, same as in the step above, by replacing each character with _
one by one and creating a new word for each character replacement. Search for these query words in the trie. If the word is found, return the words
list associated with the last trie node; else, return an empty array.
Visualization
The picture below represents the insertion and the searching procedure.
Complexity analysis
Variables
Number of words in the list =
...