Longest Common Subsequence
Learn to use dynamic programming in string-based problems.
We'll cover the following
Problem statement
Given two sequences, find the length of the longest subsequence present in both of them.
A subsequence is a sequence that appears in the same relative order but is not necessarily contiguous.
For example, abc, abg, bdf, aeg, acefg…etc., are subsequences of abcdefg. So a string of length n
has different possible subsequences.
Solution: Naïve method
Let X be a sequence of length m
and Y be a sequence of length n
. Check if every subsequence of X is a subsequence of Y, and return the longest common subsequence found.
There are subsequences of X. To check whether or not the sequences are a subsequence of Y takes O(n) time. Thus, the naïve algorithm would take O(n. ) time.
Let us consider an example to understand this problem better.
-
LCS for input Sequences ABCDGH and AEDFHR is ADH of length 3.
-
LCS for input Sequences AGGTAB and GXTXAYB is GTAB of length 4.
The naïve approach discussed above takes exponential time, which is why we need to find a way to optimize the solution. This problem can be solved using the dynamic programming approach (using Tabulation or Memoization).
Solution: Dynamic programming
Before moving on to implementation, let’s first look at the recurrence relation.
- If m = 0 or n = 0, then LCS(str1, str2, m, n) = 0. (Base case)
- If the last two characters of both strings are the same, i.e., if str1[m] = str2[n], then LCS(str1, str2, m, n) = 1 + LCS(str1, str2, m-1, n-1).
- Otherwise, LCS(str1, str2, m, n) = max{LCS(str1, str2, m-1, n), LCS( str1, str2, m, n-1)}
- LCS can take a value between 0 and
min(m,n)
.
Finally, let’s look at how we will calculate the LCS using the slideshow shown below:
Level up your interview prep. Join Educative to access 80+ hands-on prep courses.