Spelling Correction-Based Search
Solve a medium-level problem of querying one-character-away strings from search engines using tries.
Problem statement
Design and implement a search system that returns one-character-away suggestions. Strings that differ in only one letter are one-character-away strings. For example, kit
and kat
differ only in the second position. Below is a list of unique words present in the datastore and an array of string queries. Return one character away from strings present in the datastore for all the string queries.
Example
Sample input
words: ["bit", "bat", "cat", "cut", "but","sip","sat"]queryWord: ["sit", "mat", "mut"]
Sample output
[["bit","sat","sip"],["bat","cat","sat"],["cut","but"]]
Explanation
"bit"
, "sat"
and "sip"
are one character away from "sit"
."bat"
, "cat"
and "sat"
are one character away from "mat"
."cut"
and "but"
are one character away from "mut"
.
Try it yourself
Try to solve the problem yourself before reading the solution.
#include <iostream>#include <vector>using namespace std;vector<vector<string>> queryMatchingWords(vector<string> words, vector<string> queryWords){// your code goes herevector<vector<string>> result;return result;}
Intuition
Let's modify the problem to a simpler version first. Suppose a list of words and a string is given to us, and we're asked to find if the given string exists in the dictionary or not. In this case, we can insert all the dictionary words into a trie, iterate through all the letters in the strings, and search for the query word in the trie. The above problem requires us to solve something similar, but there's a condition that one letter of the word might not match.
Let's take the example where the word cat
is present in the list. As per the given conditions, if a user is trying to find any of aat
, bat
, dat
, eat
, fat
, etc., the output should be true in all these cases as these words differ only in a letter from the original word cat
.
This provides an intuition that we can insert all the possible one-letter-away words into the trie and then search for the given query word. However, this way we'd insert many words in the trie, consuming a lot of memory. Alternatively, we can replace each character of the word with a unique special character and then search for the query word. We modify the input query word by replacing each character with a special character and searching for it in the trie. For a word of length
For example, when searching for the word fat
in the dictionary, we replace the first letter f
with a special character _
and insert _at
in the trie. Similarly, we insert f_t
and fa_
.
Now if we need to search for any one-character-away the word from bat
we can convert bat
into ba_
, b_t
, _at
. We'll find a match for the string _at
denoting a word in the list that is just one character away from the word bat
.
In our trie node, we maintain a list of the words corresponding to the current modified string. For example, the keywords bat
, cat
, mat
present in the database are associated with the string _at
.
// Definition of a trie nodeclass TrieNode{public:// children array store pointer to TrieNode// of next 26 alphabets + 1 special characterTrieNode *children[27];// A flag used to identify if a// node denotes last character of a wordbool isEndOfWord;// a vector to store the complete word// at the last node of each path// this helps in easy retrieval of wordsvector<string> words;};
Algorithm
The summary of the algorithm is given below.
Step 1: Insertion in the trie
For every word of the datastore, generate a list of words where a special character, _
, replaces each character of the word. Insert all these words in the trie. Add an extra pointer in the children's array to handle the special character. Finally, add the complete word string in words
array of the last trie node.
Step 2: Searching in the trie
For each query word, generate all possible one-letter-away words, same as in the step above, by replacing each character with _
one by one and creating a new word for each character replacement. Then, search for these query words in the trie. If the word is found, return the words
list associated with the last trie node; else, return an empty array.