Shortest Common Supersequence
Let's solve the Shortest Common Supersequence problem using Dynamic Programming.
Statement
Given two strings, str1
and str2
, find the length of the shortest string that has both the input strings as subsequences.
Note: A string is a subsequence of string if deleting some number of characters from results in the string .
Let’s assume that we have two strings as follows:
str1 = "yabc"
str2 = "xabc"
There can be multiple strings that have str1
and str2
as subsequences, for example, "xyaabcc"
and "xyabbc"
, but the shortest string that has these input strings as subsequences is "xyabc"
. Please note that the sequence of alphabets in the string can be altered.
Constraints:
-
str1.length
,str2.length
str1
andstr2
consist of lowercase English letters.
Examples
No. | str1 | str2 | Length of Shortest Common Supersequence |
1 | "bcf" | "bdcf" | 4 |
2 | "dynamic" | "programming" | 15 |
Try it yourself
Implement your solution in the following playground.
def shortest_common_supersequence(str1, str2):# your code will replace the placeholder return statement belowreturn -1
Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized.
Hint: Use dynamic programming and see the magic.
Solution
We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.
Naive approach
A naive approach is to try all the supersequences, one character at a time.
The two changing parameters in our recursive function are the two indices, i1
and i2
which are incremented based on the following two conditions:
- If the characters at
i1
andi2
are a match, skip one character from both the sequences and make a recursive call for the remaining lengths. - If the characters do not match, call the function recursively twice by skipping one character from each string. Return the minimum result of these two calls.
Let’s implement the algorithm as discussed above:
def shortest_common_supersequence_recursive(str1, str2, i1, i2):# if any of the pointers has iterated over all the characters of the string,# return the remaining length of the other stringif i1 == len(str1):return len(str2) - i2if i2 == len(str2):return len(str1) - i1# if both the characters pointed by i1 and i2 are same, increment both pointersif str1[i1] == str2[i2]:return 1 + shortest_common_supersequence_recursive(str1, str2, i1 + 1, i2 + 1)# recursively call the function twice by skipping one character from each stringlength1 = 1 + shortest_common_supersequence_recursive(str1, str2, i1, i2 + 1)length2 = 1 + shortest_common_supersequence_recursive(str1, str2, i1 + 1, i2)# return the minimum of the two lengthsreturn min(length1, length2)def shortest_common_supersequence(str1, str2):return shortest_common_supersequence_recursive(str1, str2, 0, 0)# driver codedef main():input_1_strings = ["abcd", "educativeisfun", "cpprocks", "abcf", "dynamic"]input_2_strings = ["efgh", "algorithmsarefun", "cppisfun", "bdcf", "programming"]# you can uncomment the two lines below and check how this recursive solution causes a time-out# input_1_strings.append("iloveprogrammingbutprogrammingdoesnotloveme")# input_2_strings.append("computersarefastprogrammerskeepthemslow")for i in range(len(input_1_strings)):print(i+1, ".", "\tString 1: ", input_1_strings[i], sep="")print("\tString 2: ", input_2_strings[i], sep="")length = shortest_common_supersequence(input_1_strings[i], input_2_strings[i])print("\tLength of Shortest Common Supersequence: ", length, sep="")print("-" * 100)if __name__ == '__main__':main()
Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.
Time complexity
The time complexity of the naive approach is exponential ...