Pipeline Aggregations
Learn about pipeline aggregations and the process of combining and transforming data seamlessly.
We'll cover the following...
- Overview
- Sibling aggregation
- Parent aggregations
- Pipeline aggregation syntax
- Example
- Creating data
- Aggregation requests
- Aggregation identifying the product category with the highest cumulative profit
- Aggregation comparing the total sales and profits for each product category
- Aggregation that computes the average sales for each product category, excludes categories with an average below 140, and arranges the results in ascending order based on the average
- Kibana widget
Overview
A pipeline aggregation in Elasticsearch performs calculations on the results of other aggregations rather than directly on documents. It’s a way to build multi-step aggregations that work on the intermediate results of previous aggregations. This is especially useful when a calculation needs already-aggregated data as its input.
The basic idea of a pipeline aggregation is to take the output of one or more bucket or metrics aggregations and then perform additional calculations or aggregations on that output. This enables us to create advanced analytical queries that go beyond simple aggregations.
Here are the two types of pipeline aggregation:
- Parent aggregation
- Sibling aggregation
Sibling aggregation
A sibling aggregation receives the output of another aggregation at the same level and uses it to calculate a new aggregation result. In essence, a sibling aggregation leverages the results of existing aggregations to generate additional insights, such as computing the overall average across the buckets of a terms aggregation or determining the maximum value among the buckets produced by a bucket aggregation.
Consider a scenario where we have a collection of product data and want to determine the category with the highest cumulative product price. To achieve this, we start with a terms aggregation on the product categories and compute the sum of prices within each category using the sum aggregation. We then add a sibling aggregation, specifically the max bucket aggregation, to identify the category with the greatest summed value. This aggregation type finds the bucket with the maximum value of a given metric in a preceding bucket aggregation.
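The scenario above can be sketched as a request body built in Python. This is a minimal sketch, not a definitive implementation: the field names `category` and `price` and the aggregation names `categories`, `total_price`, and `top_category` are assumptions chosen for illustration. The `buckets_path` value uses the `>` separator to point from the bucket aggregation to the metric inside it.

```python
import json

# Hypothetical field names: "category" (keyword) and "price" (numeric).
request_body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        # Bucket aggregation: one bucket per product category
        "categories": {
            "terms": {"field": "category"},
            "aggs": {
                # Metric aggregation: sum of prices inside each bucket
                "total_price": {"sum": {"field": "price"}}
            },
        },
        # Sibling pipeline aggregation: declared at the same level as
        # "categories" and reading its per-bucket "total_price" values
        "top_category": {
            "max_bucket": {"buckets_path": "categories>total_price"}
        },
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON could be sent as the body of a search request against an index that holds the product data. Note that `top_category` sits next to `categories`, not inside it, which is what makes it a sibling.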
Here are the commonly used sibling aggregation types in Elasticsearch:
- Average bucket aggregation: This aggregation computes the average value from the buckets generated by a bucket aggregation.
- Max bucket aggregation: This aggregation calculates the maximum value from the buckets created by a bucket aggregation.
- Min bucket aggregation: This aggregation determines the minimum value from the buckets produced by a bucket aggregation.
- Stats bucket aggregation: This aggregation provides statistical insights, including the count, sum, average, minimum, and maximum, for the buckets generated by a bucket aggregation.
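To make the sibling aggregation types listed above concrete, here is a small local simulation in plain Python. It does not call Elasticsearch and is not how Elasticsearch implements these aggregations. It only mirrors the arithmetic they perform over per-bucket values. The bucket names and numbers are made up for illustration, and the helper functions `stats_bucket` and `max_bucket` are hypothetical.

```python
from statistics import mean

# Made-up per-category sums, standing in for the buckets
# produced by a bucket aggregation with a sum metric.
bucket_values = {"electronics": 1200.0, "clothing": 450.0, "toys": 300.0}


def stats_bucket(values):
    """Mirror the fields a stats bucket aggregation reports."""
    nums = list(values.values())
    return {
        "count": len(nums),
        "min": min(nums),   # what a min bucket aggregation reports
        "max": max(nums),   # what a max bucket aggregation reports
        "avg": mean(nums),  # what an average bucket aggregation reports
        "sum": sum(nums),
    }


def max_bucket(values):
    """Return the maximum value and the key(s) of the bucket(s) holding it."""
    top = max(values.values())
    keys = [key for key, value in values.items() if value == top]
    return {"value": top, "keys": keys}


print(stats_bucket(bucket_values))
print(max_bucket(bucket_values))
```

Each helper works only on the already-computed bucket values, never on the underlying documents, which is the defining trait of a pipeline aggregation.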
Parent aggregations
A parent aggregation belongs to a family of pipeline aggregations that receive the output of their parent aggregation and can compute new buckets or new aggregations to add to existing buckets.
Essentially, a parent aggregation leverages the outcome of the aggregation that encloses it to enrich or reshape that aggregation’s buckets. For instance, a parent aggregation could add extra data to the produced buckets, arrange these buckets in ascending order, or filter them based on particular criteria.
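The filtering and ordering described above can be sketched with the bucket selector and bucket sort aggregations. This is a minimal sketch under stated assumptions: the field names `category` and `sales`, the aggregation names, and the script variable `avgSales` are all chosen for illustration, and the threshold of 140 follows the example outlined at the start of this lesson.

```python
import json

# Hypothetical field names: "category" (keyword) and "sales" (numeric).
request_body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        "categories": {
            "terms": {"field": "category"},
            "aggs": {
                # Metric computed per category bucket
                "avg_sales": {"avg": {"field": "sales"}},
                # Parent pipeline aggregation: keeps only buckets whose
                # average is at least 140 (assumed threshold)
                "keep_high_avg": {
                    "bucket_selector": {
                        "buckets_path": {"avgSales": "avg_sales"},
                        "script": "params.avgSales >= 140",
                    }
                },
                # Parent pipeline aggregation: orders the remaining
                # buckets by their average, ascending
                "sort_by_avg": {
                    "bucket_sort": {
                        "sort": [{"avg_sales": {"order": "asc"}}]
                    }
                },
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```

Unlike a sibling aggregation, both pipeline aggregations here are nested inside the `categories` aggregation, so they operate on its buckets directly.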
Elasticsearch offers a range of parent aggregations, each with its own way of augmenting or modifying the buckets of a bucket aggregation. Here are the parent aggregations that are frequently used:
- Bucket sort aggregation
- Bucket selector aggregation