Built-In Analyzers
Explore the most commonly used built-in analyzers and how to use them.
Overview
A built-in analyzer in Elasticsearch is a preconfigured combination of character filters, a tokenizer, and token filters that processes and analyzes text data. These analyzers work out of the box, without creating or configuring a custom analyzer.
Elasticsearch offers a variety of built-in analyzers that facilitate the processing and analysis of text data stored in its indexes. Here is a list of the commonly used built-in analyzers:
- Standard analyzer
- Whitespace analyzer
- Keyword analyzer
- Fingerprint analyzer
- Language analyzer
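Any of these analyzers can be applied to a text field simply by referencing it by name in the field mapping. Here is a minimal sketch in Kibana Dev Tools console syntax, assuming a hypothetical index my-index with a single text field description:

```
# Use the built-in whitespace analyzer for one field
PUT my-index
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
```

If no analyzer is specified, text fields fall back to the standard analyzer described next.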
Standard analyzer
The standard analyzer is the default analyzer in Elasticsearch. It divides text into tokens on word boundaries, removes most punctuation, lowercases the terms, and supports removing stop words such as "a" and "the" (stop word removal is disabled by default).
Note: The standard analyzer uses the Unicode Text Segmentation algorithm to tokenize the input text.
For example, when the standard analyzer processes the text "The Fast fox is running.", it produces the following tokens after lowercasing the terms and removing the punctuation:
["the", "fast", "fox", "is", "running"]
Whitespace analyzer
The whitespace analyzer splits the text into tokens whenever it encounters a whitespace character. Unlike the standard analyzer, it does not lowercase the terms or remove punctuation.
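Running the same sample text through the whitespace analyzer (again a minimal sketch in Dev Tools console syntax) illustrates the difference:

```
# Analyze the same string with the whitespace analyzer
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The Fast fox is running."
}
```

Because this analyzer splits only on whitespace, the expected tokens are ["The", "Fast", "fox", "is", "running."], with the original casing and the trailing period preserved.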