Fuzzy Search

Learn about fuzzy logic and how to apply it in Elasticsearch.

Fuzzy search

Fuzzy search is a search method that allows for approximate matching of terms or phrases, rather than strict, exact matches. It is specifically designed to handle situations where variations, misspellings, or similar terms are present in the search query. In essence, fuzzy search is a technique that helps us address typos or minor spelling errors in search queries.

For instance, let’s imagine a scenario where a user intends to find Aldous Huxley’s book "Brave New World" but misspells "World" as "Word" in their search query, resulting in "Brave New Word". Elasticsearch’s fuzzy search feature can rectify this by enabling the search to produce accurate results by accommodating minor spelling errors. Consequently, "Brave New Word" would be identified as a match for "Brave New World".

Another instance involves searching for the term "Apple". Through the utilization of fuzzy search, variations like "Aaple", "Aple", or "Aplle" can be considered matches for the original term.

Press + to interact
A fuzzy search example
A fuzzy search example

Fuzzy search is valuable in scenarios where the search system needs to accommodate potential mistakes made by the users. By incorporating fuzzy search, the search experience becomes more user-friendly, improving the likelihood of finding relevant information, even in cases of slight deviations from the exact search terms.

How does fuzzy search work?

When fuzzy search is enabled in a search query, Elasticsearch uses fuzzy logic to match each term instead of an exact match.

Fuzzy logic is a mathematical logic that assigns truth values to variables within a range of 0 to 1. In contrast to traditional logic’s binary representation of 0 or 1 for matching terms, fuzzy logic assigns a value to each term, indicating the degree of matching or similarity.

In Elasticsearch, fuzzy logic is utilized within the fuzzy search functionality. When performing a fuzzy search query, Elasticsearch compares the specified search term with the terms stored in the inverted index, using an edit distance algorithm that represents the minimum number of operations required to transform one term into another.

The edit distance algorithm employed in fuzzy search takes into account three main types of operations:

  • Changing a character: This operation involves substituting one character with another. For instance, the word “box” can be transformed into “fox” by changing the letter “b” to an “f.”

  • Inserting a character: This operation involves adding a character to a word. An example is transforming “sic” into “sick” by inserting the letter “k.”

  • Removing a character: This operation entails deleting a character from a word. For example, removing the letter “b” from the word “black” results in the word “lack.”

Let’s consider an example using the edit distance algorithm to convert the word “horse” into “ros,” resulting in an edit distance of 3.

Here’s the step-by-step explanation:

  • “horse” -> “rorse” (replacing “h” ...