Solution: Word Break II
Let's solve the Word Break II problem using the Dynamic Programming pattern.
Statement
You are given a string, s
, and an array of strings, wordDict
, representing a dictionary. Your task is to add spaces to s
to break it up into a sequence of valid words from wordDict
. We are required to return an array of all possible sequences of words (sentences). The order in which the sentences are listed is not significant.
Note: The same dictionary word may be reused multiple times in the segmentation.
Constraints:
-
s.length
-
wordDict.length
-
wordDict[i].length
-
s
andwordDict[i]
consist of only lowercase English letters. -
All the strings of
wordDict
are unique.
Solution
So far, you’ve probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.
Naive approach
The naive approach to solve this problem is to use a traditional recursive strategy in which we take each prefix of the input string, s
, and compare it to each word in the dictionary. If it matches, we take the string’s suffix and repeat the process.
Here is how the algorithm works:
-
Base case: If the string is empty, there are no characters in the string that are left to process, so there’ll be no sentences that can be formed. Hence, we return an empty array.
-
Otherwise, the string will not be empty, so we’ll iterate every word of the dictionary and check whether or not the string starts with the current dictionary word. This ensures that only valid word combinations are considered:
-
If it doesn’t start with the current dictionary word, no valid combinations can be formed from this word, so we move on to the next dictionary word.
-
If it does start with the current dictionary word, we have two options:
-
If the length of the current dictionary word is equal to the length of the string, it means the entire string can be formed from the current dictionary word. In this case, the string
s
is directly added to the result without any further processing. -
Recursive case: Otherwise, the length of the current dictionary word will be less than the length of the string. This means that the string can be broken down further. Therefore, we make a recursive call to evaluate the remaining portion (suffix) of the string.
-
-
We’ll then concatenate the prefix and the result of the suffix computed by the recursive call above and store it in the result.
-
-
After all possible combinations have been explored, we return the result.
The time complexity of this solution is , where is the number of words in the dictionary, is the length of the string, and is the length of the longest word.
The space complexity of this solution is , where is the number of words in the dictionary and is the length of the string.
Optimized approach using dynamic programming
Since the recursive solution to this problem is very costly, let’s see if we can reduce this cost in any way. Dynamic programming helps us avoid recomputing the same subproblems. Therefore, let’s analyze our recursive solution to see if it has the properties needed for conversion to dynamic programming.
-
Optimal substructure: Given an input string ,
s
, that we want to break up into dictionary words, we find the first word that matches a word from the dictionary, and then repeat the process for the remaining, shorter input string. This means that, to solve the problem for input , we need to solve the same problem for , where is at least one character shorter than . Therefore, this problem obeys the optimal substructure property. -
Overlapping subproblems: The algorithm solves the same subproblems repeatedly. Consider input string “ancookbook” and the dictionary [“an”, “book”, “cook”, “cookbook”]. The following is the partial call tree for the naive recursive solution:
"ancookbook" / \ "ancookbook" "cookbook" / \ / \ "cookbook" ... "book" ...
From the tree above, it can be seen that the subproblem “cookbook” is evaluated twice.
To take advantage of these opportunities for optimization, we will use bottom-up dynamic programming, also known as the tabulation approach. This is an iterative method of solving dynamic programming problems. The idea is that if a prefix of the input string matches any word in the dictionary, we can split the string into two parts: the matching word and the suffix of the input string. We start from an empty prefix which is the base case. The prefix would eventually develop into the complete input string.
Here’s how the algorithm works:
-
We initialize an empty lookup table,
dp
, of length, , wheredp[i]
will correspond to the prefix of lengthi
. This table will be used to store the solutions to previously solved subproblems. It will have the following properties:-
The first entry of the table will represent a prefix of length , i.e., an empty string “”.
-
The rest of the entries will represent the other prefixes of the string
s
. For example, the input string “vegan” will have the prefixes “v”, “ve”, “veg”, “vega”, and “vegan”. -
Each entry of the table will contain an array containing the sentences that can be formed from the respective prefix. At this point, all the arrays are empty.
-
-
For the base case, we add an empty string to the array corresponding to the first entry of the
dp
table. This is because the only sentence that can be formed from an empty string is an empty string itself. -
Next, we traverse the input string by breaking it into its prefixes by including a single character, one at a time, in each iteration.
-
For the current prefix, we initialize an array,
temp
, that will store the valid sentences formed from that prefix. Let’s suppose that the input string is “vegan”, and that the current prefix is “vega”. -
For all possible suffixes of the current prefix, we check if the suffix exists in the given dictionary. In our example, this would mean checking the dictionary for the suffixes “vega”, “ega”, “ga”, and “a”. For each suffix, it will either match a dictionary word, or not:
-
If it does, we know that the suffix is a valid word from the dictionary and can be used as part of the solution. Therefore, in the
dp
table, we retrieve all the possible sentences for the prefix to the left of this suffix. Supposing that the current suffix of “vega” is “a”, and that “a” is present in the dictionary, we would retrieve all the sentences already found for “veg”. This means that we reuse the solutions of the subproblem smaller than the current subproblem. Now, we form new sentences for the current prefix by appending a space character and the current suffix (which is a valid dictionary word) to each of the retrieved sentences. Supposing that the valid sentences for the subproblem “veg” are “v eg”, and “ve g”, we will add these new sentences for the current subproblem, “vega”: “veg a”, “v eg a”, and “ve g a”. We add the new sentences to thetemp
array of this prefix. -
If the suffix is not present in the dictionary, no sentences can be made from the current prefix, so the
temp
array of that prefix remains empty.
-
-
We repeat the above steps for all suffixes of the current prefix.
-
We set the entry corresponding to the current prefix in the
dp
table equal to thetemp
array.
-
-
We repeat the steps above for all prefixes of the input string.
-
After all the prefixes have been evaluated, the last entry of the
dp
table will be an array containing all the sentences formed from the largest prefix, i.e., the complete string. Therefore, we return this array.
The slides below illustrate how the algorithm runs: