Analyzers in Elasticsearch

Learn about analyzers in Elasticsearch.

Analyzer

An analyzer is the process, or sequence of processes, that performs a series of operations on text, such as breaking it into individual words, lowercasing it, or removing common words. These operations prepare text data for indexing and searching.
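As a quick illustration, the request below runs Elasticsearch's built-in standard analyzer on a sample sentence via the _analyze API; the sample text is arbitrary:

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "The QUICK Brown Foxes!"
}
```

The response lists four tokens, the, quick, brown, and foxes: the text has been split on word boundaries, lowercased, and stripped of punctuation.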

An analyzer in Elasticsearch comprises three main components:

  • Character filters
  • Tokenizer
  • Token filters

When an analyzer receives text data, it first preprocesses it with zero or more character filters. It then passes the result to exactly one tokenizer, which converts the text into individual tokens (words). After tokenization, the analyzer runs zero or more token filters, which can modify tokens (e.g., lowercasing), delete tokens (e.g., removing stopwords), or add tokens (e.g., injecting synonyms).
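This pipeline is easy to observe because the _analyze API also accepts the three components directly. Below is a minimal sketch; the sample text is arbitrary, and the html_strip, standard, lowercase, and stop components are all built in:

```json
POST _analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "<b>The Quick Brown Foxes</b>"
}
```

The character filter strips the <b> tags, the tokenizer splits the remaining text into The, Quick, Brown, and Foxes, and the token filters lowercase each token and drop the stopword the, leaving quick, brown, and foxes.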

Elasticsearch provides built-in analyzers: predefined combinations of character filters, a tokenizer, and token filters that can be used out of the box without creating or configuring anything. Alternatively, Elasticsearch lets us create our own custom analyzer, as sketched below, from the appropriate combination of:

  • Zero or more character filters
  • A tokenizer
  • Zero or more token filters
[Figure: Analyzer workflow]
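A custom analyzer is declared under an index's analysis settings. The request below is a minimal sketch of one such combination; the index name my_index and the analyzer name my_custom_analyzer are placeholders:

```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```

Once the index exists, the analyzer can be referenced by name in a field mapping or tested directly with POST my_index/_analyze.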

Character filters

A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. It is used to modify or clean up the input text, for example by removing special characters, converting case, or replacing specific characters or character sequences.

One example of using a character filter is to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Latin equivalents (0123456789) or to strip HTML elements such as the <b> tag from the stream.
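The digit conversion can be reproduced with the built-in mapping character filter, whose mappings parameter lists character-level substitution rules. The sketch below pairs it with the keyword tokenizer so the whole text comes back as a single token; the sample text is arbitrary:

```json
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "٠ => 0", "١ => 1", "٢ => 2", "٣ => 3", "٤ => 4",
        "٥ => 5", "٦ => 6", "٧ => 7", "٨ => 8", "٩ => 9"
      ]
    }
  ],
  "text": "My license plate is ٢٥٠١٥"
}
```

The single token returned reads My license plate is 25015. The second use case is covered by the built-in html_strip character filter, which removes tags such as <b> before tokenization.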

...