...

/

Solution: The Edit Distance Problem

Solution: The Edit Distance Problem

This review provides a detailed analysis of the ways to solve the famous edit distance problem.

Solution 1: brute force

def min_edit_dist_recursive(str1, str2, m, n):
"""
Calculates minimum Levenshtein distance operations
:param str1: String 1
:param str2: String 2
:param m: Length of str1
:param n: Length of str2
:return: minimum Levenshtein distance operations
"""
# If first string is empty, the only option is to
# insert all characters of second string into first
if m == 0:
return n
# If second string is empty, the only option is to
# remove all characters of first string
if n == 0:
return m
# If last characters of two strings are same, nothing
# much to do. Ignore last characters and get count for
# remaining strings.
if str1[m - 1] == str2[n - 1]:
return min_edit_dist_recursive(str1, str2, m - 1, n - 1)
# If last characters are not same, consider all three
# operations on last character of first string, recursively
# compute minimum cost for all three operations and take
# minimum of three values.
return 1 + min(min_edit_dist_recursive(str1, str2, m, n - 1), # Insert
min_edit_dist_recursive(str1, str2, m - 1, n), # Remove
min_edit_dist_recursive(str1, str2, m - 1, n - 1) # Replace
)
def min_edit_dist(str1, str2):
"""
Calculates minimum Levenshtein distance operations
:param str1: String 1
:param str2: String 2
:return: minimum Levenshtein distance operations
"""
return min_edit_dist_recursive(str1, str2, len(str1), len(str2))
# Driver code to test the above function
if __name__ == '__main__':
str1 = "sunday"
str2 = "saturday"
print(min_edit_dist(str1, str2))

Explanation

We process all the characters of each string, one-by-one. There are two possibilities for every pair of characters being considered:

  • The end characters of both the strings match, in which case, the lengths of both the strings are reduced by one and the function is called recursively on the remaining strings

  • The end characters of both the strings do not match, in which case, all three operations (insertion, deletion, and substitution) are carried out on the last character of the first string

  • The minimum cost for all three operations is computed recursively and returned

Time complexity

The time complexity of this code is exponential, i.e. O(3n)O(3^n) ...

Access this course and 1400+ top-rated courses and projects.