Optimized Prefix and Suffix Search
Solve a hard-level problem of finding the words that start with a given prefix and end with a given suffix optimally using tries.
Problem statement
Given a list of words and two strings pre
and suff
, return the index of the word with the string pre
as its prefix and the string suff
as its suffix. If there are multiple valid answers, then return the bigger index. Solve this problem using a single trie.
Example 1
Sample input
words: ["app", "apple", "apply","applicable"]pref: "a"suff: "y"
Sample output
2
Explanation
The word at index 2, "apply"
, has prefix as "a"
and suffix as "y"
.
Example 2
Sample input
words: ["ape", "ace", "ate"]pre: "a"suff: "e"
Sample output
3
Explanation
The word at indices 0, 1, and 2 ("ape"
, "ace"
and "ate"
) have the prefix "a"
and suffix "e"
, but the larger index 2 is returned as output.
Try it yourself
Try to solve the problem yourself before reading the solution.
#include <iostream>#include <vector>using namespace std;int findWordsWithPrefAndSuff(vector<string>& words, string pref, string suff) {// your code goes herereturn -1;}
Intuition
We just need basic trie operations of insertion and search to solve this problem. We can create two separate tries for prefix and suffix searching. But in this problem, we've been constrained to use only a single trie. It implies that we need to search for both prefixes and suffixes in a single trie. To enable the search for a prefix-suffix combination in a trie, we need to ensure that the prefix-suffix combinations are also inserted in the trie.
We generate all possible suffixes of that word and append them ahead of the word, separated by a special character {
. You can use any special character as a separator. The reason for choosing opening braces is that we don’t need to take care of a special character index separately, and it can be included as the 26
th
index in the trie node children
array itself. The value of int ch = '{' - 'a' = 26
.
For example, for the word apple
, the combinations inserted in the trie would be e{apple
, le{apple
, ple{apple
, pple{apple
and apple{apple
. When a search query comes up, we just concatenate the suffix with a prefix separated by a {
and then simply search the trie. If the prefix is ap
and the suffix is e
, then the query string is suffix query + '{' + prefix query = "e{ap"
.
In case of multiple valid answers, we have to resolve the tie and return the word with the larger index as an answer. To get the most recent word (with the largest index) out of many possible options, we'll maintain an idx
variable for each node of the trie, which will get updated if a newer word arrives in that path. This way we can return the most recent word and hence the largest index.
// Definition of TrieNodeclass TrieNode{public:// children array stores pointers to TrieNode of next 26 alphabets// 1 special character "{" used as a seperatorTrieNode *children[27];// An integer to store the largest index// of the words array with current prefix-suffix combinationint endIdx;};
Algorithm
Let's divide the problem into two smaller parts: inserting and searching in the trie.
Step 1: Create combinations and insert them into a trie
Iterate through all the words given in the list. For every word, generate all the possible suffixes and concatenate them with a special character and the word—that is, suffix + { + word
. Insert all these combinations into a trie. While inserting the combination in the trie, update the idx
value of all the trie nodes in the path to the array index of the current word being inserted.
Step 2: Searching in a trie
All the possible suffix-prefix combinations are already present in the trie. To find the largest array index with a given prefix and suffix, we concatenate the suffix and prefix separated by the special character {
. This is done to ensure that the query string matches the format in which the data was inserted in the trie. Once the query string (suffix + { + prefix)
is created, iterate over the query string characters, search them in the trie, and return the idx
value associated with the last trie node in the search path.
Visualization
The illustration below represents the optimized prefix and suffix searching procedure.
Complexity analysis
Variables
Number of words in the list =
...