Solution: Index Pairs of a String

Let’s solve the Index Pairs of a String problem using the Trie pattern.

Statement

Given a string text and an array of strings words, return a list of all index pairs [i, j] such that the substring text[i...j] is present in words.

Return the pairs [i, j] in sorted order, first by the value of i, and if two pairs have the same i, by the value of j.

Constraints:

  • 11 \leq text.length 100\leq 100

  • 11 \leq words.length 20\leq 20

  • 11 \leq words[i].length 50\leq 50

  • text and words[i] consist of lowercase English letters.

  • All the strings of words are unique.

Solution

The algorithm uses a trie to find all pairs of start and end indexes for substrings in a given text that match words from a list. First, it builds the trie by inserting each word from the list. Then, it iterates over each starting index in the text to match substrings using the trie. For each character sequence, it checks if the current character exists in the trie and traverses accordingly. If a word’s end is found (marked by a flag), the start and end indexes of the matched substring are recorded in the result. This method optimizes substring searching using the trie structure to avoid redundant checks and efficiently match multiple words in the text.

TrieNode and Trie Class

  • TrieNode: This represents a single character in the trie with two components: is_end_of_word (a boolean indicating if the node marks the end of a word) and a children dictionary, where the keys are characters, and the values are TrieNode objects representing the next possible characters in the word.

  • Trie: This contains a root node and a method for inserting words. The insert method adds each word character by character, creating new TrieNodes as needed. When the last character of any word is inserted, is_end_of_word is set to True.

The algorithm to solve this problem is as follows:

  1. Insert each word from the list into the trie. Each character is added as a node, and is_end_of_word is set to mark the end of a word.

  2. Loop through each character in text (starting at index i). For each starting index, try to find substrings that match words in the trie by traversing them.

  3. For each character at position i, the algorithm begins traversing the trie from the root node. It then checks whether each subsequent character (from index i to j) is a child node in the trie. If the character is found, the traversal continues to the next character. A valid match has been found if the current node in the Trie marks the end of a word (i.e., is_end_of_word is True). In that case, the index pair [i, j] is recorded, where i is the start index, and j is the end index of the matched word.

  4. After checking all starting indexes, return the list of index pairs representing matched words’ start and end positions.

Let’s look at the following illustration to get a better understanding of the solution:

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.