...

/

Using Window Functions

Using Window Functions

Learn about the available window functions and discover when to use window functions.

Any and all aggregate functions we already know can be used against a window frame rather than a grouping clause, so we can start to use sum, min, max, count, avg, and the others that we’re already accustomed to.

We might already know that with PostgreSQL, it’s possible to use the CREATE AGGREGATE command to register our own custom aggregate. Any such custom aggregate can also be given a window frame definition to work on.

PostgreSQL, of course, is included with built-in aggregate functions and a number of built-in window functions.

Press + to interact
select surname,
position as pos,
row_number()
over(order by fastestlapspeed::numeric)
as fast,
ntile(3) over w as "group",
lag(code, 1) over w as "prev",
lead(code, 1) over w as "next"
from results
join drivers using(driverid)
where raceid = 890
window w as (order by position)
order by position;

In this example, we can see that we’re reusing the same window definition several times, so we’re giving it a name to simplify the SQL. In this query for each driver, we’re fetching their position in the results, their position in terms of fastest lap speed, a group number if we divide the drivers into a set of four groups thanks to the ntile function, the name of the previous driver who made it, and the name of the driver immediately next to the current one, thanks to the lag and lead functions:

    surname    │ pos │ fast │ group │ prev │ next 
═══════════════╪═════╪══════╪═══════╪══════╪══════
 Hamilton      │   1 │   20 │     1 │ ¤    │ RAI
 Räikkönen     │   2 │   17 │     1 │ HAM  │ VET
 Vettel        │   3 │   21 │     1 │ RAI  │ WEB
 Webber        │   4 │   22 │     1 │ VET  │ ALO
 Alonso        │   5 │   15 │     1 │ WEB  │ GRO
 Grosjean      │   6 │   16 │     1 │ ALO  │ BUT
 Button        │   7 │   12 │     1 │ GRO  │ MAS
 Massa         │   8 │   18 │     1 │ BUT  │ PER
 Pérez         │   9 │   13 │     2 │ MAS  │ MAL
 Maldonado     │  10 │   14 │     2 │ PER  │ HUL
 Hülkenberg    │  11 │    9 │     2 │ MAL  │ VER
 Vergne        │  12 │   11 │     2 │ HUL  │ RIC
 Ricciardo     │  13 │    8 │     2 │ VER  │ VDG
 van der Garde │  14 │    6 │     2 │ RIC  │ PIC
 Pic           │  15 │    5 │     2 │ VDG  │ BIA
 Bianchi       │  16 │    3 │     3 │ PIC  │ CHI
 Chilton       │  17 │    4 │     3 │ BIA  │ DIR
 di Resta      │  18 │   10 │     3 │ CHI  │ ROS
 Rosberg       │  19 │   19 │     3 │ DIR  │ BOT
 Sutil         │   ¤ │    2 │     3 │ GUT  │ ¤
 Gutiérrez     │   ¤ │    1 │     3 │ BOT  │ SUT
 Bottas        │   ¤ │    7 │     3 │ ROS  │ GUT
(22 rows)

And we can see that the fastest lap speed is not as important as one might think, as both the two fastest drivers didn’t even finish the race. In SQL terms, we also see that we can have two different sequences returned from the same query, and again we can reference other rows.

When to use window functions

The real magic of window functions is actually the frame of data they can see when using the over () clause. This frame is specified thanks to the partition by and order by clauses.

It would be best to remember that the windowing clauses are always considered last in the query, meaning after the where clause. In any frame, we can only see rows that have been selected for output—that is, it’s not directly possible to compute a percentage of values that we don’t want to display. We would need to use a subquery in that case.

We use window functions whenever we want to compute values for each row of the result set, and those computations depend on other rows within the same result set. A classic example is a marketing analysis of weekly results: we typically output each day’s gross sales and the variation with the same day in comparison to the previous week.