What Is Text Analysis?

Understand the important concepts about text analysis in Elasticsearch.

Overview

In Elasticsearch, text analysis is the process of converting unstructured text into a structured format optimized for search. The process breaks the text into individual tokens (usually words), filters and normalizes those tokens (for example, by lowercasing them), and stores the resulting terms in an inverted index that is then used for search operations.
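To make the tokenize-then-filter idea concrete, here is a minimal Python sketch of an analysis pipeline. It is a simplified illustration, not Elasticsearch's actual implementation: the regex tokenizer, the `lowercase_filter` and `stop_filter` functions, and the `STOP_WORDS` list are all hypothetical stand-ins for a real tokenizer and token filters.

```python
import re

# Hypothetical stop-word list for illustration only. (Elasticsearch's
# built-in standard analyzer does not remove stop words by default.)
STOP_WORDS = {"the", "a", "an", "and", "of", "is"}


def tokenize(text: str) -> list[str]:
    """Split text into word tokens; a rough stand-in for a real tokenizer."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: list[str]) -> list[str]:
    """Token filter: normalize every token to lowercase."""
    return [t.lower() for t in tokens]


def stop_filter(tokens: list[str]) -> list[str]:
    """Token filter: drop tokens that appear in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str) -> list[str]:
    """Run the tokenizer, then apply each token filter in order."""
    tokens = tokenize(text)
    # Lowercasing runs first so that "The" is caught by the stop filter.
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Quick Brown Fox jumps over the lazy dog!"))
# ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

The order of the filters matters: if the stop filter ran before lowercasing, the capitalized "The" would slip through.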

Text analysis in Elasticsearch happens at two points: index time and search time. When a document containing a text field is indexed, Elasticsearch passes the field value through an analyzer, which applies a series of operations to it before the resulting terms are stored in the inverted index. Likewise, when a user runs a full-text search against a text field, Elasticsearch passes the query text through the search analyzer and then looks up the resulting terms in the inverted index. By default, the search analyzer is the same analyzer used at index time.
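The reason the query text is analyzed too is that it must be reduced to the same form as the indexed terms before they can be compared. The sketch below assumes a hypothetical `analyze` function (split on non-word characters, then lowercase) and, as a simplification, treats a document as matching only when every query term was indexed. Elasticsearch's `match` query actually combines terms with OR by default.

```python
import re


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: split on non-word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


# Index time: the field value is analyzed once and its terms are stored.
doc_terms = set(analyze("Elasticsearch Text Analysis"))

# Search time: the query goes through the same analyzer, so "TEXT"
# becomes "text" and can be found among the indexed terms.
query = "TEXT analysis"
query_terms = analyze(query)
print(all(term in doc_terms for term in query_terms))  # True

# Skipping analysis at search time leaves "TEXT" unnormalized, so it
# no longer matches the lowercased term stored at index time.
print(all(term in doc_terms for term in query.split()))  # False
```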

We will discuss some essential concepts before diving into text analysis in Elasticsearch.

Inverted index

An inverted index is a data structure used in Elasticsearch to store and retrieve information efficiently. It maps words or terms to their location in a set of documents. The index lists all the unique words in a collection of documents and the document IDs where each word occurs. When a search query is made, the inverted index is used to quickly identify the documents that contain the query terms, allowing for ...
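As a rough illustration of the mapping described above, the following Python sketch builds a toy inverted index from terms to sorted document IDs and answers a query by intersecting the ID lists of its terms. The `analyze`, `build_inverted_index`, and `search` functions are hypothetical; a real Lucene index also stores term positions, frequencies, and other data, and uses far more compact structures.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: split on non-word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each unique term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = analyze(query)
    if not terms:
        return []
    # Start from the first term's documents and intersect with the rest.
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Elasticsearch stores data in an inverted index",
    2: "An inverted index maps terms to documents",
    3: "Text analysis runs before indexing",
}
index = build_inverted_index(docs)
print(index["inverted"])                # [1, 2]
print(search(index, "Inverted INDEX"))  # [1, 2]
print(search(index, "text analysis"))   # [3]
```

Because each lookup goes straight from a term to its document list, the search never has to scan the text of every document.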