...

/

Solution Review: Search Similar Words

Solution Review: Search Similar Words

See the solution to the problem challenge.

Solution

#include <iostream>
#include <vector>
using namespace std;
// Definition of a trie node
class TrieNode
{
public:
// children array store pointer to TrieNode
// of next 26 alphabets + 1 special character
TrieNode *children[27];
// A flag used to identify if a
// node denotes last character of a word
bool isEndOfWord;
// a vector to store the complete word
// at the last node of each path
// this helps in easy retrieval of words
vector<string> words;
};
class Trie
{
public:
TrieNode * root;
Trie()
{
root = new TrieNode();
}
// insert a word into trie
void insert(string word, string completeWord)
{
// point the current node to the root
TrieNode *curr = root;
for (char curChar: word)
{
int idx = 26;
if (curChar != '_')
{
idx = curChar - 'a';
}
//create a new trie node
// if not already existing in trie
if (curr->children[idx] == NULL)
{
curr->children[idx] = new TrieNode();
}
curr = curr->children[idx];
}
//mark isEndOfWord as true for the last node
curr->isEndOfWord = true;
curr->words.push_back(completeWord);
}
// search a word into trie
vector<string> search(string word)
{
// point the current node to the root
TrieNode *curr = root;
// iterate through all characters of the given word
for (char curChar: word)
{
int idx = 26;
if (curChar != '_')
{
idx = curChar - 'a';
}
// if a character in the word is not found, return false
if (curr->children[idx] == NULL)
{
return {};
}
curr = curr->children[idx];
}
// return the value of isEndOfWord for the last character of the word being searched
// if isEndOfWord is true, it means word was found completely
// else false if returned
return curr->words;
}
};
// replaces all indexes by special character "_"
// one by one and push to a result vector
vector<string> addSpecialCharAtAllIndexes(string word)
{
vector<string> result;
int len = word.size();
for (int index = 0; index < len; index++)
{
char temp = word[index];
word[index] = '_';
result.push_back(word);
word[index] = temp;
}
return result;
}
vector<int> queryMatchingWordsCount(vector<string> words, vector<string> queryWords)
{
// Initialise a trie
Trie *trie = new Trie();
// Iterate through the words
for (string word: words)
{
vector<string> oneCharAwayStrings = addSpecialCharAtAllIndexes(word);
//insert the words oneCharAwayStrings in trie
for (string oneCharAwayString: oneCharAwayStrings)
{
trie->insert(oneCharAwayString, word);
}
}
vector<int> result;
// Iterate through the query words
for (string queryWord: queryWords)
{
vector<string> resultForQueryWord;
// find all one character away words for given query word
// check if any of those exist in the trie
vector<string> oneCharAwayStrings = addSpecialCharAtAllIndexes(queryWord);
for (string oneCharAwayString: oneCharAwayStrings)
{
vector<string> nearByWords = trie->search(oneCharAwayString);
for (string s: nearByWords)
resultForQueryWord.push_back(s);
}
result.push_back(resultForQueryWord.size());
}
return result;
}
Solution for problem challenge in C++

Intuition

This problem is a modification of the first problem: spelling correction–based search. Let's modify the problem to a simpler version first. Suppose a list of words and a string is given to us and we're asked to find if the given string exists in the dictionary. In this case, we can insert all the dictionary words into a trie, iterate through all the letters in the strings, and search for the query word in the trie. The above problem requires us to solve something similar, but with the added condition that one letter of the word may not match.

Let's take an example where the word "cat" is present in the list. As per the given conditions, if a user is trying to find any of "aat", "bat", "dat", "eat", "fat", etc., the output should be true in all cases since these words differ in only one letter from the original word “cat.”

This provides an intuition that we can insert all the possible one-letter-away words into the trie and then search for the given query word. However, this way, we would be inserting many more words in the trie than required, consuming much more memory. Alternatively, we can replace each character of the word with a unique special character and then search for the query word. We modify the input query word by replacing each character with a special character and searching for it in the trie. For a word of length WW, we would generate WW new words, but this is much less than the number of words we get by replacing each character with all possible English letters.

For example, when searching for the word fat in the dictionary, we replace the first letter f with a special character _ and insert _at in the trie. Similarly, we insert f_t and fa_.

Now if we need to search for any one-character-away word from bat, we can convert bat into ba_, b_t, and _at. We'll find a match for the string _at, denoting a word in the list that is just one character away from the word bat.

In our trie node, we maintain a list of the words corresponding to the current modified string. For example, the keywords bat, cat, and mat present in the database are associated with the string _at.

Press + to interact
// Definition of a trie node
class TrieNode {
public:
// children array store pointer to TrieNode
// of next 26 alphabets + 1 special character
TrieNode *children[27];
// A flag used to identify if a
// node denotes last character of a word
bool isEndOfWord;
// a vector to store the complete word
// at the last node of each path
// this helps in easy retrieval of words
vector<string> words;
};

Algorithm

The summary of the algorithm is given below.

Step 1: Insertion in the trie

For every word of the datastore, generate a list of words where a special character, _, replaces each character of the word. Insert all these words in the trie. Add an extra pointer in the children's array to handle the special character. Add the complete word string in words array of the last trie node.

Step 2: Searching in the trie

For each query word, generate all possible one-letter-away words, same as in the step above, by replacing each character with _ one by one and creating a new word for each character replacement. Search for these query words in the trie. If the word is found, return the words list associated with the last trie node; else, return an empty array.

Visualization

The picture below represents the insertion and the searching procedure.

Complexity analysis

Variables

  • Number of words in the list = ...

Access this course and 1400+ top-rated courses and projects.