...
/Solution Review: The Edit Distance Problem
Solution Review: The Edit Distance Problem
In this lesson, we will look at some solutions to the edit distance problem.
We'll cover the following...
Solution 1: Simple recursion #
def editDistanceRecurse(str1, str2, i, j):if i == len(str1): # base case of reaching the end of str1return len(str2) - jif j == len(str2): # base case of reaching the end of str2return len(str1) - iif str1[i] == str2[j]: # if the characters match, we move aheadreturn editDistanceRecurse(str1, str2, i+1, j+1)# if characters don't matchreturn 1 + min(editDistanceRecurse(str1, str2, i+1, j+1), # we can change characterseditDistanceRecurse(str1, str2, i, j+1), # we can have an insertion in str1 (or skip a character in str1)editDistanceRecurse(str1, str2, i+1, j)) # we can have a deletion in str1 (or skip a character in str2)def editDistance(str1, str2):return editDistanceRecurse(str1, str2, 0, 0)print(editDistance("teh", "the"))
Explanation
The idea behind the algorithm is to align both the sequences so that the fewest number of operations is used. Let’s first see some examples of how alignment helps us convert str1
into str2
.
-
dog
anddodge
can be aligned asdo-g-
anddodge
. And then we can insertd
ande
in both the blanks. -
read
andred
can be aligned asread
andre-d
. A dash in the second string means we may remove the corresponding character in the first string thus convertingread
tored
. -
thr
andthe
can be aligned asthr
andthe
. Here, aligning mismatching characters means that we replace the mismatching character in the first string with the corresponding character in the second string, i.e., replacer
withe
.
Now let’s look at the algorithm we wrote. First, we have the base cases: if we run out of str1
, we will have to insert all the remaining characters of str2
(lines 2-3). Similarly, if we run out of str2
, we will need to delete all the remaining characters from str1
(lines 5-6).
We have two cases: if the current characters match, we are all good and can move forward in both strings (lines 8-9). However, if they do not match we need to check each of the three possibilities and return the one with the minimum cost. The three recursive calls correspond to the change of character (line 11), insertion of the character (line 12), or removal of the character (line 13).
Let’s look at a visualization of this algorithm below.
Time complexity
At each point, we are faced with three options in the worst case. Thus, if we keep in mind the worst case of two strings of ...