Edit Distance

Get to know the edit distance problem and its solution using dynamic programming.

Edit distance and its applications

The edit distance between two strings is the minimum number of letter insertions, letter deletions, and letter substitutions required to transform one string into the other. For example, the edit distance between FOOD{\color{Red} FOOD} and MONEY{\color{Red} MONEY} is at most four:

FOODMOODMONDMONEDMONEY{\color{Red} \underset{-}{F}OOD } \to {\color{Red}MO\underset{-}{O}D} \to {\color{Red} MON\underset{\wedge}{}D} \to {\color{Red} MONE\underset{-}D} \to {\color{Red} MONEY}

This distance function was independently proposed by Vladimir Levenshtein in 1965 (working on coding theory), Taras Vintsyuk in 1968 (working on speech recognition), and Stanislaw Ulam in 1972 (working with biological sequences). For this reason, edit distance is sometimes called Levenshtein distance or Ulam distance (but strangely, never “Vintsyuk distance”).

We can visualize this editing process by aligning the strings one above the other, with a gap in the first word for each insertion and a gap in the second word for each deletion. Columns with two different characters correspond to substitutions. In this representation, the number of editing steps is just the number of columns that do not contain the same character twice.

FOODMONE.Y\hspace{6cm}{\color{Red} F\hspace{0.3cm} O \hspace{0.3cm}O \hspace{0.87cm} D} \\ \hspace{5.9cm}{\color{Red} M\hspace{0.275cm} O \hspace{0.27cm} N \hspace{0.3cm} \underset{.}E \hspace{0.3cm} Y}

It’s fairly obvious that we can’t transform FOOD{\color{Red} FOOD} into MONEY{\color{Red} MONEY} in three steps, so the edit distance between FOOD{\color{Red} FOOD} and MONEY{\color{Red} MONEY} is exactly four. Unfortunately, it’s not so easy in general to tell when a sequence of edits is as short as possible. For example, the following alignment shows that the distance between the strings ALGORITHM{\color{Red} ALGORITHM} and ALTRUISTIC{\color{Red} ALTRUISTIC} is at most 6. Is that the best we can do?

ALGORITHMALTRUISTIC\hspace{5.94cm}{\color{Red} A\hspace{0.3cm} L \hspace{0.3cm}G \hspace{0.3cm} O \hspace{0.3cm} R \hspace{0.9cm} I \hspace{0.84cm} T \hspace{0.3cm} H \hspace{0.3cm} M} \\ \hspace{5.9cm}{\color{Red} A\hspace{0.275cm} L \hspace{0.94cm} T \hspace{0.3cm} R\hspace{0.3cm} U \hspace{0.3cm} I \hspace{0.3cm} S \hspace{0.3cm} T \hspace{0.35cm} I \hspace{0.45cm} C} ...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy