...

/

Solution Review: The Edit Distance Problem

Solution Review: The Edit Distance Problem

In this lesson, we will look at some solutions to the edit distance problem.

Solution 1: Simple recursion #

Press + to interact
def editDistanceRecurse(str1, str2, i, j):
if i == len(str1): # base case of reaching the end of str1
return len(str2) - j
if j == len(str2): # base case of reaching the end of str2
return len(str1) - i
if str1[i] == str2[j]: # if the characters match, we move ahead
return editDistanceRecurse(str1, str2, i+1, j+1)
# if characters don't match
return 1 + min(editDistanceRecurse(str1, str2, i+1, j+1), # we can change characters
editDistanceRecurse(str1, str2, i, j+1), # we can have an insertion in str1 (or skip a character in str1)
editDistanceRecurse(str1, str2, i+1, j)) # we can have a deletion in str1 (or skip a character in str2)
def editDistance(str1, str2):
return editDistanceRecurse(str1, str2, 0, 0)
print(editDistance("teh", "the"))

Explanation

The idea behind the algorithm is to align both the sequences so that the fewest number of operations is used. Let’s first see some examples of how alignment helps us convert str1 into str2.

  • dog and dodge can be aligned as do-g- and dodge. And then we can insert d and e in both the blanks.

  • read and red can be aligned as read and re-d. A dash in the second string means we may remove the corresponding character in the first string thus converting read to red.

  • thr and the can be aligned as thr and the. Here, aligning mismatching characters means that we replace the mismatching character in the first string with the corresponding character in the second string, i.e., replace r with e.

Now let’s look at the algorithm we wrote. First, we have the base cases: if we run out of str1, we will have to insert all the remaining characters of str2(lines 2-3). Similarly, if we run out of str2, we will need to delete all the remaining characters from str1 (lines 5-6).

We have two cases: if the current characters match, we are all good and can move forward in both strings (lines 8-9). However, if they do not match we need to check each of the three possibilities and return the one with the minimum cost. The three recursive calls correspond to the change of character (line 11), insertion of the character (line 12), or removal of the character (line 13).

Let’s look at a visualization of this algorithm below.

Time complexity

At each point, we are faced with three options in the worst case. Thus, if we keep in mind the worst case of two strings of ...