Aggregate Expressions

Learn how to use SQL to calculate aggregates over a group of rows.

We'll cover the following...

Overview
Simple aggregation
Aggregate expression with partition
Multiple aggregate expressions
Named windows

PostgreSQL

CREATE TABLE temperatures (
    day DATE,
    city VARCHAR(30),
    temperature INT
);
INSERT INTO temperatures (day, city, temperature) 
VALUES 
    ('2021-01-01', 'NY', 10),
    ('2021-01-02', 'NY', 12),
    ('2021-01-03', 'NY', 13),
    ('2021-01-04', 'NY', 14),
    ('2021-01-05', 'NY', 18),
    ('2021-01-06', 'NY', 15),
    ('2021-01-07', 'NY', 16),
    ('2021-01-08', 'NY', 17),
    
    ('2021-01-01', 'LA', 22),
    ('2021-01-02', 'LA', 21),
    ('2021-01-03', 'LA', 19),
    ('2021-01-04', 'LA', 22),
    ('2021-01-05', 'LA', 25),
    ('2021-01-06', 'LA', 27),
    ('2021-01-07', 'LA', 25),
    ('2021-01-08', 'LA', 27)
;
SELECT * FROM temperatures;

We first pre-calculated the hottest temperature in each city and stored it in a common table expression called hottest. We then joined the results to the temperatures table on the city column and calculated the difference between each temperature and the hottest temperature in its city in the diff_from_hottest column.
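A query matching this description could be sketched as follows. The names hottest and diff_from_hottest come from the text; the max_temperature alias, the table aliases t and h, the ORDER BY, and the sign of the difference (temperature minus the city's maximum) are assumptions made for illustration.

```sql
-- Sketch: compare each reading to the hottest temperature in its city.
WITH hottest AS (
    -- One row per city with its highest temperature.
    SELECT
        city,
        MAX(temperature) AS max_temperature  -- assumed alias
    FROM temperatures
    GROUP BY city
)
SELECT
    t.day,
    t.city,
    t.temperature,
    -- Assumed sign: zero on the hottest day, negative otherwise.
    t.temperature - h.max_temperature AS diff_from_hottest
FROM temperatures AS t
JOIN hottest AS h ON h.city = t.city
ORDER BY t.city, t.day;
```

With the sample data, the hottest temperature is 18 in NY and 27 in LA, so under this sign convention the NY row for 2021-01-01 would have a diff_from_hottest of 10 - 18 = -8.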

This type of analysis, where each value is compared to an aggregate of a larger group of values, is very common. In the next sections, we are going to use aggregate expressions to produce the same results more easily.

Aggregate expression with partition

Using aggregate expressions, we can calculate an aggregate over a group of rows without reducing the number of rows. For example, to add a column with the highest temperature in each city, we can use the following aggregate expression:
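A minimal sketch of such a query against the temperatures table defined earlier. The max_temperature_at_city alias follows the name used in the text; the ORDER BY is an addition for readable output and is not required.

```sql
-- Sketch: keep every row and attach the city's maximum temperature to it.
SELECT
    day,
    city,
    temperature,
    MAX(temperature) OVER (PARTITION BY city) AS max_temperature_at_city
FROM temperatures
ORDER BY city, day;  -- assumed ordering, for readability only
```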

The results now include a max_temperature_at_city column with the maximum temperature in each city.

Adding an OVER clause to the aggregate MAX function turns it into an aggregate expression. The result of the expression MAX(temperature) OVER (PARTITION BY city) is the maximum temperature in each city.

We use the PARTITION BY clause to tell the database which groups to use when calculating the aggregate expression, much like the GROUP BY clause does for regular aggregation. In this case, we want a group for each city, so we partition by the city column.

Notice that even though we used the aggregate MAX function, we still get all the rows in the table. The database calculated the result of the aggregate expression over each partition and added it to every row of the existing data set. This is the main difference between regular aggregation using GROUP BY and aggregate expressions.
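To make the contrast concrete, here is a hedged side-by-side sketch using the sample temperatures table. The max_temperature alias and the sign of diff_from_hottest (temperature minus the city's maximum) are assumptions for illustration.

```sql
-- Regular aggregation: rows collapse to one per city
-- (2 rows for the sample data: NY and LA).
SELECT
    city,
    MAX(temperature) AS max_temperature  -- assumed alias
FROM temperatures
GROUP BY city;

-- Aggregate expression: all 16 sample rows are kept, and each row
-- carries its city's maximum, so the comparison needs no join.
SELECT
    day,
    city,
    temperature,
    MAX(temperature) OVER (PARTITION BY city) AS max_temperature_at_city,
    temperature - MAX(temperature) OVER (PARTITION BY city) AS diff_from_hottest
FROM temperatures;
```

Because the aggregate is available on every row, the difference from the hottest temperature can be computed in a single SELECT, without first collapsing the rows and joining them back.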
