Longest Common Subsequence

Learn to use dynamic programming in string-based problems.

Problem statement

Given two sequences, find the length of the longest subsequence present in both of them.

A subsequence is a sequence that appears in the same relative order but is not necessarily contiguous. For example, abc, abg, bdf, aeg, acefg…etc., are subsequences of abcdefg. So a string of length n has 2n{2^{n}} different possible subsequences.

Solution: Naïve method

Let X be a sequence of length m and Y be a sequence of length n. Check if every subsequence of X is a subsequence of Y, and return the longest common subsequence found.

There are 2m{2^{m}} subsequences of X. To check whether or not the sequences are a subsequence of Y takes O(n) time. Thus, the naïve algorithm would take O(n.2m{2^{m}} ) time.

Let us consider an example to understand this problem better.

  • LCS for input Sequences ABCDGH and AEDFHR is ADH of length 3.

  • LCS for input Sequences AGGTAB and GXTXAYB is GTAB of length 4.

The naïve approach discussed above takes exponential time, which is why we need to find a way to optimize the solution. This problem can be solved using the dynamic programming approach (using Tabulation or Memoization).

Solution: Dynamic programming

Before moving on to implementation, let’s first look at the recurrence relation.

  • If m = 0 or n = 0, then LCS(str1, str2, m, n) = 0. (Base case)
  • If the last two characters of both strings are the same, i.e., if str1[m] = str2[n], then LCS(str1, str2, m, n) = 1 + LCS(str1, str2, m-1, n-1).
  • Otherwise, LCS(str1, str2, m, n) = max{LCS(str1, str2, m-1, n), LCS( str1, str2, m, n-1)}
  • LCS can take a value between 0 and min(m,n).

Finally, let’s look at how we will calculate the LCS using the slideshow shown below:

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.