Using Window Functions
Learn about the available window functions and discover when to use window functions.
We'll cover the following...
Any and all aggregate functions we already know can be used against a window frame rather than a grouping
clause, so we can start to use sum
, min
, max
, count
, avg
, and the others that we’re already accustomed to.
We might already know that with PostgreSQL, it’s possible to use the CREATE AGGREGATE command to register our own custom aggregate. Any such custom aggregate can also be given a window frame definition to work on.
PostgreSQL, of course, is included with built-in aggregate functions and a number of built-in window functions.
select surname,position as pos,row_number()over(order by fastestlapspeed::numeric)as fast,ntile(3) over w as "group",lag(code, 1) over w as "prev",lead(code, 1) over w as "next"from resultsjoin drivers using(driverid)where raceid = 890window w as (order by position)order by position;
In this example, we can see that we’re reusing the same window definition several times, so we’re giving it a name to simplify the SQL. In this query for each driver, we’re fetching their position in the results, their position in terms of fastest lap speed, a group number if we divide the drivers into a set of four groups thanks to the ntile
function, the name of the previous driver who made it, and the name of the driver immediately next to the current one, thanks to the lag
and lead
functions:
surname │ pos │ fast │ group │ prev │ next
═══════════════╪═════╪══════╪═══════╪══════╪══════
Hamilton │ 1 │ 20 │ 1 │ ¤ │ RAI
Räikkönen │ 2 │ 17 │ 1 │ HAM │ VET
Vettel │ 3 │ 21 │ 1 │ RAI │ WEB
Webber │ 4 │ 22 │ 1 │ VET │ ALO
Alonso │ 5 │ 15 │ 1 │ WEB │ GRO
Grosjean │ 6 │ 16 │ 1 │ ALO │ BUT
Button │ 7 │ 12 │ 1 │ GRO │ MAS
Massa │ 8 │ 18 │ 1 │ BUT │ PER
Pérez │ 9 │ 13 │ 2 │ MAS │ MAL
Maldonado │ 10 │ 14 │ 2 │ PER │ HUL
Hülkenberg │ 11 │ 9 │ 2 │ MAL │ VER
Vergne │ 12 │ 11 │ 2 │ HUL │ RIC
Ricciardo │ 13 │ 8 │ 2 │ VER │ VDG
van der Garde │ 14 │ 6 │ 2 │ RIC │ PIC
Pic │ 15 │ 5 │ 2 │ VDG │ BIA
Bianchi │ 16 │ 3 │ 3 │ PIC │ CHI
Chilton │ 17 │ 4 │ 3 │ BIA │ DIR
di Resta │ 18 │ 10 │ 3 │ CHI │ ROS
Rosberg │ 19 │ 19 │ 3 │ DIR │ BOT
Sutil │ ¤ │ 2 │ 3 │ GUT │ ¤
Gutiérrez │ ¤ │ 1 │ 3 │ BOT │ SUT
Bottas │ ¤ │ 7 │ 3 │ ROS │ GUT
(22 rows)
And we can see that the fastest lap speed is not as important as one might think, as both the two fastest drivers didn’t even finish the race. In SQL terms, we also see that we can have two different sequences returned from the same query, and again we can reference other rows.
When to use window functions
The real magic of window functions is actually the frame of data they can see when using the over ()
clause. This frame is specified thanks to the partition by
and order by
clauses.
It would be best to remember that the windowing clauses are always considered last in the query, meaning after the where
clause. In any frame, we can only see rows that have been selected for output—that is, it’s not directly possible to compute a percentage of values that we don’t want to display. We would need to use a subquery in that case.
We use window functions whenever we want to compute values for each row of the result set, and those computations depend on other rows within the same result set. A classic example is a marketing analysis of weekly results: we typically output each day’s gross sales and the variation with the same day in comparison to the previous week.