Multiple algorithms are used in fuzzy search systems. Some of the most common algorithms are the following:
Hamming distance is an algorithm to find the distance of the actual user query (string) from strings of length equal to the actual query length. It is measured by the number of substitute operations required to convert the actual query to another string of equal length.
For example, to convert "burn" into "turn", we need to substitute 't' for 'b'. Other characters in both strings are the same. As we substituted only one character, the hamming distance here is 1. Similarly, the hamming distance between "help" and "heal" is calculated to be 2.
Levenshtein/edit distance is used to find the distance between two strings of unequal length. It is measured by the number of insertion, deletion, or substitution operations required to convert the actual query (string) to another string.
For example, the edit distance between "close" and "open" is 4 . There are 2 remove operations for 'c' and 'l', one substitute operation where we put 'p' in place of 's', and one insertion operation to add 'n' at the end of the string.
Damerau–Levenshtein distance is an extended form of Levenshtein distance, where we also include transposition operation. It is measured by the number of insertions, deletions, substitutions, and transpositions needed to convert the actual query to another string.
For example, to convert "educative" to "activate", we need to do the following:
We performed 3 delete operations, 1 transposition, 1 substitution, and 2 insertions. The total number of operations is 7, which is the Damerau–Levenshtein distance between "educative" and "activate".
Free Resources