Solution: Edit Distance
Solutions for the Edit Distance Problem.
We'll cover the following...
Solution 1: Recursive algorithm
An alignment of two strings is a two-row matrix such that the first (or second) row contains the ordered symbols of the first (or second) string, interspersed with the space symbols () in such a way that no two space symbols appear in the same column.
Exercise break: Compute the number of different alignments of two strings of lengths and .
We classify the columns of an alignment as follows:
- A column with a symbol and a space is a .
- A column with a gap and a symbol is an .
- A column with two equal symbols is a .
- A column with two different symbols is a .
An alignment is optimal if it minimizes the total number of mismatches, deletions, and insertions among all possible alignments.
Exercise break: Prove that the Edit Distance Problem can be reduced to finding an optimal alignment of two strings.
In the example above, the last column represents an insertion. By dropping this column, we get an optimal alignment of the first string and a prefix of the second string. Here is an idea—let’s compute the edit distance between each pair of prefixes of two strings.
Given strings and , we consider their prefixes and of length and and denote the edit distance between them as . The last column of an optimal alignment of and is either an , a , a , or a .
Then,
...