Spelling Correction-Based Search

Solve a medium-level problem of querying one-character-away strings from search engines using tries.

Problem statement

Design and implement a search system that returns one-character-away suggestions. Strings that differ in only one letter are one-character-away strings. For example, kit and kat differ only in the second position. Below is a list of unique words present in the datastore and an array of string queries. Return one character away from strings present in the datastore for all the string queries.

Example

Sample input

words: ["bit", "bat", "cat", "cut", "but","sip","sat"]
queryWord: ["sit", "mat", "mut"]

Sample output

[["bit","sat","sip"],["bat","cat","sat"],["cut","but"]]

Explanation

"bit", "sat" and "sip" are one character away from "sit".
"bat", "cat" and "sat" are one character away from "mat".
"cut" and "but" are one character away from "mut".

Try it yourself

Try to solve the problem yourself before reading the solution.

#include <iostream>
#include <vector>
using namespace std;
vector<vector<string>> queryMatchingWords(vector<string> words, vector<string> queryWords){
// your code goes here
vector<vector<string>> result;
return result;
}
Solve spelling correction based search in C++

Intuition

Let's modify the problem to a simpler version first. Suppose a list of words and a string is given to us, and we're asked to find if the given string exists in the dictionary or not. In this case, we can insert all the dictionary words into a trie, iterate through all the letters in the strings, and search for the query word in the trie. The above problem requires us to solve something similar, but there's a condition that one letter of the word might not match.

Let's take the example where the word cat is present in the list. As per the given conditions, if a user is trying to find any of aat, bat, dat, eat, fat, etc., the output should be true in all these cases as these words differ only in a letter from the original word cat.

This provides an intuition that we can insert all the possible one-letter-away words into the trie and then search for the given query word. However, this way we'd insert many words in the trie, consuming a lot of memory. Alternatively, we can replace each character of the word with a unique special character and then search for the query word. We modify the input query word by replacing each character with a special character and searching for it in the trie. For a word of length WW, we would generate WW new words, but this is much less than the number of words we get by replacing each character with all possible English letters.

For example, when searching for the word fat in the dictionary, we replace the first letter f with a special character _ and insert _at in the trie. Similarly, we insert f_t and fa_.

Now if we need to search for any one-character-away the word from bat we can convert bat into ba_, b_t, _at. We'll find a match for the string _at denoting a word in the list that is just one character away from the word bat.

In our trie node, we maintain a list of the words corresponding to the current modified string. For example, the keywords bat, cat, mat present in the database are associated with the string _at.

// Definition of a trie node
class TrieNode
{
public:
// children array store pointer to TrieNode
// of next 26 alphabets + 1 special character
TrieNode *children[27];
// A flag used to identify if a
// node denotes last character of a word
bool isEndOfWord;
// a vector to store the complete word
// at the last node of each path
// this helps in easy retrieval of words
vector<string> words;
};
Definition of a trie node in C++

Algorithm

The summary of the algorithm is given below.

Step 1: Insertion in the trie

For every word of the datastore, generate a list of words where a special character, _, replaces each character of the word. Insert all these words in the trie. Add an extra pointer in the children's array to handle the special character. Finally, add the complete word string in words array of the last trie node.

Step 2: Searching in the trie

For each query word, generate all possible one-letter-away words, same as in the step above, by replacing each character with _ one by one and creating a new word for each character replacement. Then, search for these query words in the trie. If the word is found, return the words list associated with the last trie node; else, return an empty array.

Visuali

...
Access this course and 1400+ top-rated courses and projects.