Keyword vs. Text Data Type

Understand the concept of the inverted index and the differences between the keyword and text data types.


Elasticsearch has two main data types for storing textual data: text and keyword. Understanding the differences between them is crucial for indexing and searching textual data efficiently. Before learning what text and keyword fields are and how they are stored and searched, let’s look at an essential concept in Elasticsearch: the inverted index.

Inverted index

An inverted index is a data structure commonly used in text search engines such as Elasticsearch to quickly locate documents containing specific words or phrases. Elasticsearch uses it to identify and retrieve the documents that match a user’s query.

It maps each unique term found in any document to a list of the IDs of the documents in which that term appears. In effect, it serves as a lookup table from terms to documents, so a search can go straight to the documents that contain a term instead of scanning every document.
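The term-to-document mapping described above can be sketched in a few lines of Python. This is only a minimal illustration, not how Elasticsearch implements its index internally: the function name `build_inverted_index`, the sample documents, and the simple lowercase-and-split tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of IDs of documents containing it.

    `documents` is a dict of {doc_id: text}. Tokenization here is a
    simplifying assumption: lowercase the text and split on whitespace.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    # Convert the sets to sorted lists so lookups return stable results.
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


# Hypothetical sample documents, invented for this sketch.
documents = {
    1: "quick brown fox",
    2: "quick red fox",
    3: "lazy brown dog",
}

index = build_inverted_index(documents)

print(index["quick"])        # [1, 2]
print(index["brown"])        # [1, 3]
print(index.get("cat", []))  # [] (term appears in no document)
```

Looking up a term is a single dictionary access that returns the matching document IDs, which is the property that makes this structure useful for search.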

The following example will give us a clearer understanding of an inverted index. Let’s suppose we’re storing documents that contain names in an inverted index. Here are examples of these documents:

Document 1: ...

Document 2: ...