Nested Bucket Aggregations

Learn how to use nested bucket aggregations.

Overview

In Elasticsearch, we can place bucket aggregations inside other bucket aggregations (Elasticsearch calls these sub-aggregations), which lets us break data down by more than one criterion in a single request.

Nesting aggregations lets us explore data along several dimensions. The primary (outer) bucket aggregation first divides the documents into buckets based on a criterion such as a term or a date range. Within each of these buckets, a secondary (inner) bucket aggregation partitions the documents further, revealing finer-grained patterns.

How do nested bucket aggregations work?

Nested bucket aggregations in Elasticsearch work like layers. The first layer, the outer aggregation, groups documents by a main characteristic and produces distinct buckets. The second layer, the inner aggregation, runs inside each of those buckets and breaks its documents down into smaller groups.

In simpler terms, Elasticsearch evaluates the inner aggregation once per outer bucket, so each outer bucket gets its own set of inner buckets. The result can be pictured as a grid in which each broad category from the outer layer is paired with the detailed subgroups from the inner layer.

For a clearer picture, consider an example where the outer aggregation produces three buckets labeled “outer_bucket1”, “outer_bucket2”, and “outer_bucket3”, while the inner aggregation produces two buckets named “inner_bucket1” and “inner_bucket2”. Elasticsearch arranges these so that each outer bucket contains both inner buckets, as the following illustration shows.

Illustration of a nested bucket aggregation
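
The pairing described above can be sketched in a few lines of Python. This is only an illustration of the grid idea, not how Elasticsearch computes aggregations internally; the bucket labels are the ones used in the example. It also assumes every inner bucket has matching documents under every outer bucket. In a real response, a terms aggregation with default settings omits inner buckets that contain no documents.

```python
from itertools import product

# Bucket labels taken from the example above.
outer_buckets = ["outer_bucket1", "outer_bucket2", "outer_bucket3"]
inner_buckets = ["inner_bucket1", "inner_bucket2"]

# Pair every outer bucket with every inner bucket (3 x 2 = 6 combinations).
grid = list(product(outer_buckets, inner_buckets))

for outer, inner in grid:
    print(f"{outer} -> {inner}")
```

Each printed line corresponds to one cell of the grid: one inner bucket nested inside one outer bucket.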

To make this concrete, suppose we have order data from an online store and want the daily order count for each product category. We can use a date_histogram aggregation to bucket the orders by day and then embed a terms aggregation inside it, which creates one bucket per product category within each day.
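
As a minimal sketch, the request body for such a query could look like the Python dictionary below. The aggregation names `orders_per_day` and `orders_per_category` are hypothetical labels chosen for illustration. The sketch also assumes that `date` is mapped as a date field whose format accepts values like those in the sample data, and that `category` is mapped as a keyword field.

```python
import json

# Search request body; "size": 0 skips the matching documents and
# returns only the aggregation results.
request_body = {
    "size": 0,
    "aggs": {
        # Outer aggregation: one bucket per calendar day.
        "orders_per_day": {  # hypothetical aggregation name
            "date_histogram": {
                "field": "date",
                "calendar_interval": "day",
            },
            # Inner aggregation: one bucket per category within each day.
            "aggs": {
                "orders_per_category": {  # hypothetical aggregation name
                    "terms": {"field": "category"},
                },
            },
        },
    },
}

# Print the body as JSON, ready to send to the index's _search endpoint.
print(json.dumps(request_body, indent=2))
```

In the response, the `doc_count` of each inner bucket is the number of orders for that category on that day.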

We have the following product order data:

Order A

  "date": "2023/1/1",
  "category": "Electronics",
  "price": 100  

Order C

  "date": "2023/1/2",
  "
...

Order B

  "date": "2023/1/1",
  "category": "Clothing",
  "price": 15

Order D

  "date": "2023/1/2",
  "catego
...