Bucket Aggregation

Learn about bucket aggregation and how to group data into buckets based on certain criteria.

Overview

Bucket aggregation is a type of aggregation in Elasticsearch that groups documents into distinct buckets based on specific criteria. Each bucket is identified by a key, such as a term or a range, that determines which documents belong to it. By default, each bucket reports the number of documents it contains, returned as “doc_count”.
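To make the idea concrete, here is a minimal local sketch in Python (not Elasticsearch code) of what a bucket aggregation computes: every document is assigned a bucket key by some criterion, and each bucket reports its document count. The function name bucket_doc_counts and the sample documents are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Any, Callable, Iterable


def bucket_doc_counts(
    docs: Iterable[dict[str, Any]],
    key_fn: Callable[[dict[str, Any]], Any],
) -> dict[Any, int]:
    """Group documents by the key returned from key_fn and count each group.

    This mimics the default output of a bucket aggregation (a doc count per
    bucket); it is a local illustration, not the Elasticsearch implementation.
    """
    return dict(Counter(key_fn(doc) for doc in docs))


# Hypothetical sample documents.
docs = [
    {"name": "Ana", "age": 23},
    {"name": "Ben", "age": 37},
    {"name": "Cai", "age": 31},
]

# Bucket by decade of age: ages 20-29 get key 20, ages 30-39 get key 30.
print(bucket_doc_counts(docs, lambda d: d["age"] // 10 * 10))  # {20: 1, 30: 2}
```

Only the criterion (key_fn) changes between different kinds of bucket aggregations; the grouping and counting stay the same.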

Elasticsearch offers a wide variety of bucket aggregations, which differ in how they assign documents to buckets. The following are commonly used bucket aggregations:

  • Terms aggregation

  • Range aggregation

  • Histogram aggregation

Terms aggregation

The terms aggregation groups documents by the unique terms found in a specified field. It creates a separate bucket for each distinct term in that field and reports the number of documents associated with each term. By default, it returns only the ten terms with the most documents.

For example, if we have an index containing customer data and apply a terms aggregation to the “country” field, Elasticsearch creates a bucket for each unique country name present in that field. Each bucket collects the documents that match that particular country. The following illustration shows how the terms aggregation on the “country” field works.
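A minimal sketch in Python of the request body for this example, assuming the “country” field is mapped as keyword. The aggregation name “by_country” and the sample customers are hypothetical; the body would be sent with a search request such as POST /customers/_search, where “customers” is also a hypothetical index name. Because no cluster is involved, simulate_terms_buckets (a hypothetical helper) only approximates locally the buckets Elasticsearch would return: one per distinct value, ordered by descending document count and capped at the default size of 10.

```python
import json
from collections import Counter
from typing import Any

# Search body with a terms aggregation on "country" (assumed keyword field).
# "size": 0 asks Elasticsearch not to return the matching documents themselves.
request_body = {
    "size": 0,
    "aggs": {
        "by_country": {
            "terms": {"field": "country"}
        }
    },
}


def simulate_terms_buckets(
    docs: list[dict[str, Any]], field: str, size: int = 10
) -> list[dict[str, Any]]:
    """Approximate terms buckets locally: one bucket per distinct value,
    sorted by descending doc_count (ties broken by key in this sketch),
    keeping at most `size` buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Hypothetical customer documents.
customers = [
    {"name": "Ana", "country": "Spain"},
    {"name": "Ben", "country": "Canada"},
    {"name": "Cai", "country": "Spain"},
]

print(json.dumps(request_body))
print(simulate_terms_buckets(customers, "country"))
# [{'key': 'Spain', 'doc_count': 2}, {'key': 'Canada', 'doc_count': 1}]
```

The simulated list mirrors the shape of the “buckets” array inside an Elasticsearch aggregation response, where each entry carries a “key” and a “doc_count”.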

Term aggregation on the country field

The terms aggregation can be applied to fields of type keyword, numeric, ip, boolean, or binary. A text field can also be used, but only after enabling the fielddata mapping parameter on it; the aggregation then generates buckets for the field’s analyzed terms. Because enabling fielddata can significantly increase memory usage, aggregating on a keyword sub-field is usually the preferred alternative.
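A sketch of the two mapping options, written as Python dictionaries that serialize to the JSON bodies Elasticsearch expects. It assumes a text field named “description”, and “description_terms” is a hypothetical aggregation name. The first mapping enables fielddata so the field’s analyzed terms can be aggregated; the second keeps the field as text and adds a keyword sub-field, which buckets whole values instead.

```python
import json

# Option 1: enable fielddata on the text field so that a terms aggregation
# on "description" creates buckets for its analyzed terms (memory-heavy).
fielddata_mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text", "fielddata": True}
        }
    }
}

# Option 2: keep "description" as text for full-text search and add a
# keyword sub-field; aggregating on "description.keyword" buckets whole values.
multi_field_mapping = {
    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }
}


def terms_request(field: str) -> dict:
    """Build a search body with a single terms aggregation on `field`.
    "description_terms" is a hypothetical aggregation name."""
    return {"size": 0, "aggs": {"description_terms": {"terms": {"field": field}}}}


# Bodies as they would be sent when creating the index (PUT /<index>) and searching.
print(json.dumps(fielddata_mapping, indent=2))
print(json.dumps(terms_request("description")))          # analyzed terms (option 1)
print(json.dumps(terms_request("description.keyword")))  # whole values (option 2)
```

Which option fits depends on the goal: option 1 counts individual words, while option 2 counts complete descriptions without the memory cost of fielddata.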

In a terms aggregation, each bucket is defined by a term. In Elasticsearch, a term is a token as it is stored in the inverted index, that is, after the analyzer has processed the field value. Consequently, a terms aggregation on a text field creates a bucket for each term that the analyzer produced, not for each complete field value.
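The difference between bucketing whole values and bucketing analyzed terms can be sketched locally in Python. This assumes a very rough stand-in for the standard analyzer (lowercase, then split into word characters); the real analyzer follows Unicode text segmentation rules. The helper names and the two sample strings are hypothetical. Note that doc_count counts documents, so a term repeated within one document counts that document only once.

```python
import re
from collections import Counter


def approx_standard_analyze(text: str) -> list[str]:
    """Rough approximation of the standard analyzer: lowercase the text and
    split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def keyword_buckets(values: list[str]) -> Counter:
    # Keyword field: the whole, unanalyzed value is the single term.
    return Counter(values)


def text_buckets(values: list[str]) -> Counter:
    # Text field with fielddata: each analyzed token is a term. Using a set
    # per document means a repeated token counts that document only once.
    return Counter(term for v in values for term in set(approx_standard_analyze(v)))


descriptions = ["Red running shoes", "red shoes for kids"]
print(keyword_buckets(descriptions))  # two buckets, one per full description
print(text_buckets(descriptions))     # one bucket per analyzed term
```

With keyword-style bucketing, “Red running shoes” and “red shoes for kids” land in separate buckets; with analyzed terms, both documents fall into the “red” and “shoes” buckets because the analyzer lowercased and split them.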

For example, let’s consider an Elasticsearch index containing product descriptions. The “description” field is of type “text” and is analyzed using the standard analyzer to break down the text into individual terms. Let’s suppose the index includes the following documents:

  • Product 1 with the description “Apple watch ultra”

  • Product 2 with the description “Apple AirPods”

The following image illustrates ...