Optimized Prefix and Suffix Search

Solve a hard-level problem of finding the words that start with a given prefix and end with a given suffix optimally using tries.

Problem statement

Given a list of words and two strings pre and suff, return the index of the word with the string pre as its prefix and the string suff as its suffix. If there are multiple valid answers, then return the bigger index. Solve this problem using a single trie.

Example 1

Sample input

words: ["app", "apple", "apply","applicable"]
pref: "a"
suff: "y"

Sample output

2

Explanation

The word at index 2, "apply", has prefix as "a" and suffix as "y".

Example 2

Sample input

words: ["ape", "ace", "ate"]
pre: "a"
suff: "e"

Sample output

3

Explanation

The word at indices 0, 1, and 2 ("ape", "ace" and "ate") have the prefix "a" and suffix "e", but the larger index 2 is returned as output.

Try it yourself

Try to solve the problem yourself before reading the solution.

#include <iostream>
#include <vector>
using namespace std;
int findWordsWithPrefAndSuff(vector<string>& words, string pref, string suff) {
// your code goes here
return -1;
}

Intuition

We just need basic trie operations of insertion and search to solve this problem. We can create two separate tries for prefix and suffix searching. But in this problem, we've been constrained to use only a single trie. It implies that we need to search for both prefixes and suffixes in a single trie. To enable the search for a prefix-suffix combination in a trie, we need to ensure that the prefix-suffix combinations are also inserted in the trie.

We generate all possible suffixes of that word and append them ahead of the word, separated by a special character {. You can use any special character as a separator. The reason for choosing opening braces is that we don’t need to take care of a special character index separately, and it can be included as the 26th index in the trie node children array itself. The value of int ch = '{' - 'a' = 26.

For example, for the word apple, the combinations inserted in the trie would be e{apple, le{apple, ple{apple, pple{apple and apple{apple. When a search query comes up, we just concatenate the suffix with a prefix separated by a { and then simply search the trie. If the prefix is ap and the suffix is e, then the query string is suffix query + '{' + prefix query = "e{ap".

In case of multiple valid answers, we have to resolve the tie and return the word with the larger index as an answer. To get the most recent word (with the largest index) out of many possible options, we'll maintain an idx variable for each node of the trie, which will get updated if a newer word arrives in that path. This way we can return the most recent word and hence the largest index.

// Definition of TrieNode
class TrieNode
{
public:
// children array stores pointers to TrieNode of next 26 alphabets
// 1 special character "{" used as a seperator
TrieNode *children[27];
// An integer to store the largest index
// of the words array with current prefix-suffix combination
int endIdx;
};
Definition of a trie node in C++

Algorithm

Let's divide the problem into two smaller parts: inserting and searching in the trie.

Step 1: Create combinations and insert them into a trie

Iterate through all the words given in the list. For every word, generate all the possible suffixes and concatenate them with a special character and the word—that is, suffix + { + word. Insert all these combinations into a trie. While inserting the combination in the trie, update the idx value of all the trie nodes in the path to the array index of the current word being inserted.

Step 2: Searching in a trie

All the possible suffix-prefix combinations are already present in the trie. To find the largest array index with a given prefix and suffix, we concatenate the suffix and prefix separated by the special character {. This is done to ensure that the query string matches the format in which the data was inserted in the trie. Once the query string (suffix + { + prefix) is created, iterate over the query string characters, search them in the trie, and return the idx value associated with the last trie node in the search path.

Visualization

The illustration below represents the optimized prefix and suffix searching procedure.

Complexity analysis

Variables

  • Number of words in the list =NN ...

Access this course and 1400+ top-rated courses and projects.