Solution: Edit Distance

Solutions for the Edit Distance Problem.

Solution 1: Recursive algorithm

An alignment of two strings is a two-row matrix such that the first (or second) row contains the ordered symbols of the first (or second) string, interspersed with the space symbols () in such a way that no two space symbols appear in the same column.

Exercise break: Compute the number of different alignments of two strings of lengths nn and mm.

We classify the columns of an alignment as follows:

  • A column with a symbol and a space is a deletion\color{green}\text{deletion}.
  • A column with a gap and a symbol is an insertion\color{blue}\text{insertion}.
  • A column with two equal symbols is a match\color{orange}\text{match}.
  • A column with two different symbols is a mismatch\color{purple}\text{mismatch}.

An alignment is optimal if it minimizes the total number of mismatches, deletions, and insertions among all possible alignments.

Exercise break: Prove that the Edit Distance Problem can be reduced to finding an optimal alignment of two strings.

In the example above, the last column represents an insertion. By dropping this column, we get an optimal alignment of the first string and a prefix of the second string. Here is an idea—let’s compute the edit distance between each pair of prefixes of two strings.

Given strings A[1n]A[1\ldots n] and B[1m]B[1\ldots m], we consider their prefixes A[1i]A[1\dots i] and B[1j]B[1\ldots j] of length ii and jj and denote the edit distance between them as EditDistance(i,j)EditDistance(i,j). The last column of an optimal alignment of A[1i]A[1\ldots i] and B[1j]B[1\ldots j] is either an insertion\color{blue}\text{insertion}, a deletion\color{green}\text{deletion}, a mismatch\color{purple}\text{mismatch}, or a match\color{orange}\text{match}.

Then,

EditDistance(i,j)=min{EditDistance(i,j1)+1EditDistance(i1,j)+1EditDistance(i1,j1)+1if   A[i]B[j]EditDistance(i1,j1)if   A[i]=B[j]EditDistance(i,j) = min \begin{cases} \color{blue}EditDistance(i,j-1)+1 \\ \color{green}EditDistance(i-1,j)+1 \\ \color{purple}EditDistance(i-1,j-1)+1 & \text{if }~~ A[i]\neq B[j]\\ \color{orange}EditDistance(i-1,j-1) & \text{if }~~ A[i]= B[j]\\ \end{cases} ...