...

/

Solving the Longest Common Substring Problem

Solving the Longest Common Substring Problem

Let's solve the Longest Common Substring problem using Dynamic Programming.

Statement

Given two strings s1 and s2, you have to find the length of the Longest Common Substring (LCS) in both these strings.

Let’s say we have two strings, “helloworld” and “yelloword”, there are multiple common substrings, such as “llo”, “ello”, “ellowor”, “low”, and “d”. The longest common substring is “ellowor”, with length 7.

Constraints:

  • 11 \leq s1.length , s2.length \leq 500500

  • s1 and s2 consist of only lowercase English characters.

  • Spaces are not allowed.

Examples

No.

s1

s2

Length of LCS

1

helloworld

helloold

5

2

educativeexplore

educativeexpress

12

3

educativeisfun

algorithmsarefun

3

Try it yourself

Implement your solution in the following playground.

Press + to interact
Python
usercode > main.py
def lcs_length(s1, s2):
# Write Your Code Here
return 0
Longest Common Substring

Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized in terms of running time.

Hint: Use dynamic programming and see the magic.

Solution

We will first explore the naive recursive solution to this problem and then see how it can be improved using dynamic programming.

Naive approach

Let’s look at this problem on a smaller scale. We are comparing each character one by one with the other string. There can be three possibilities for the ith^{th} character of s1 and the jth^{th} character of s2:

  1. If both characters match, these could be part of a common substring, meaning we should count this character towards the length of the longest common substring.

  2. ith^{th}character of s1 might match with (j+1)th^{th}character of s2.

  3. jth^{th}character of s2 might match with (i+1)th^{th}character of s1.

Therefore, we take the maximum among all three of these possibilities. Look at the following visualization.

Let’s implement the algorithm as discussed above:

def lcs_length_rec(s1, s2, i, j, count):
# base case of when either string has been exhausted
if i >= len(s1) or j >= len(s2):
return count
# if i and j characters match, increment the count and compare the rest of the strings
if s1[i] == s2[j]:
count = lcs_length_rec(s1, s2, i+1, j+1, count+1)
# compare s1[i:] with s2, s1 with s2[j:], and take max of current count and these two results
return max(count, lcs_length_rec(s1, s2, i+1, j, 0), lcs_length_rec(s1, s2, i, j+1, 0))
def lcs_length(s1, s2):
return lcs_length_rec(s1, s2, 0, 0, 0)
# Driver code
def main():
s1 = ["educative", "bcdcdcd", "arefun", "yourocks", "abc"]
s2 = ["education", "aacdcdcd", "isfun", "youawesome", "def"]
# You can uncomment the two lines below and check how this recursive solution causes a time-out
#s1.append("ypzrvyigwdiqrnbglvviozqzruvmwivgvqvrfhqi")
#s2.append("wdiqrnbglvviozqzruvmwivgvqvrfhqiypzrvyigwdiqrn")
for i in range(0, len(s1)):
print(i+1, ".\tString 1: \"",s1[i], "\" \n\tString 2: \"",s2[i], "\"", sep="")
result = lcs_length(s1[i], s2[i])
print("\n\tThe Length of Longest Common Substring is: ", result, sep="")
print("-"*100, "\n", sep="")
if __name__ == '__main__':
main()
Longest Common Substring using recursion

Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided ...