What are fuzzy search algorithms?

Overview

Multiple algorithms are used in fuzzy search systems. Some of the most common algorithms are the following:

  • Hamming distance
  • Levenshtein/edit distance
  • Damerau–Levenshtein distance

Hamming distance

Hamming distance is an algorithm to find the distance of the actual user query (string) from strings of length equal to the actual query length. It is measured by the number of substitute operations required to convert the actual query to another string of equal length. 

For example, to convert "burn" into "turn", we need to substitute 't' for 'b'. Other characters in both strings are the same. As we substituted only one character, the hamming distance here is 1. Similarly, the hamming distance between "help" and "heal" is calculated to be 2.

Levenshtein/edit distance

Levenshtein/edit distance is used to find the distance between two strings of unequal length. It is measured by the number of insertion, deletion, or substitution operations required to convert the actual query (string) to another string.

For example, the edit distance between "close" and "open" is 4 . There are 2 remove operations for 'c' and 'l', one substitute operation where we put 'p' in place of 's', and one insertion operation to add 'n' at the end of the string.

Damerau–Levenshtein distance

Damerau–Levenshtein distance is an extended form of Levenshtein distance, where we also include transposition operation. It is measured by the number of insertions, deletions, substitutions, and transpositions needed to convert the actual query to another string.

For example, to convert "educative" to "activate", we need to do the following:

  • Remove the first three characters 'e', 'd', and 'u' from "educative". After removal, the string is now "cative".
  • Swap 'a' and 'c', and the string becomes "active".
  • Substitute 'a' in place of 'e'. The string becomes "activa".
  • Insert two characters 't' and 'e' at the end of the string to finally get the "activate".

We performed 3 delete operations, 1 transposition, 1 substitution, and 2 insertions. The total number of operations is 7, which is the Damerau–Levenshtein distance between "educative" and "activate".

Free Resources

Copyright ©2024 Educative, Inc. All rights reserved