Solution: Index Pairs of a String
Let’s solve the Index Pairs of a String problem using the Trie pattern.
We'll cover the following
Statement
Given a string text
and an array of strings words
, return a list of all index pairs [i, j]
such that the substring text[i...j]
is present in words
.
Return the pairs [i, j]
in sorted order, first by the value of i
, and if two pairs have the same i
, by the value of j
.
Constraints:
text.length
words.length
words[i].length
text
andwords[i]
consist of lowercase English letters.All the strings of
words
are unique.
Solution
The algorithm uses a trie to find all pairs of start and end indexes for substrings in a given text that match words from a list. First, it builds the trie by inserting each word from the list. Then, it iterates over each starting index in the text to match substrings using the trie. For each character sequence, it checks if the current character exists in the trie and traverses accordingly. If a word’s end is found (marked by a flag), the start and end indexes of the matched substring are recorded in the result. This method optimizes substring searching using the trie structure to avoid redundant checks and efficiently match multiple words in the text.
TrieNode and Trie Class
TrieNode: This represents a single character in the trie with two components:
is_end_of_word
(a boolean indicating if the node marks the end of a word) and achildren
dictionary, where the keys are characters, and the values are TrieNode objects representing the next possible characters in the word.Trie: This contains a
root
node and a method for inserting words. Theinsert
method adds each word character by character, creating new TrieNodes as needed. When the last character of any word is inserted,is_end_of_word
is set toTrue
.
The algorithm to solve this problem is as follows:
Insert each word from the list into the trie. Each character is added as a node, and
is_end_of_word
is set to mark the end of a word.Loop through each character in
text
(starting at indexi
). For each starting index, try to find substrings that match words in the trie by traversing them.For each character at position
i
, the algorithm begins traversing the trie from the root node. It then checks whether each subsequent character (from indexi
toj
) is a child node in the trie. If the character is found, the traversal continues to the next character. A valid match has been found if the current node in the Trie marks the end of a word (i.e.,is_end_of_word
isTrue
). In that case, the index pair[i, j]
is recorded, wherei
is the start index, andj
is the end index of the matched word.After checking all starting indexes, return the list of index pairs representing matched words’ start and end positions.
Let’s look at the following illustration to get a better understanding of the solution:
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.