What is the PARTITION BY clause in PostgreSQL?

database is a collection of structured data stored in an organized manner. Using PostgreSQL, we can perform multiple operations, such as creating tables, inserting data into tables, modifying records, and deleting the existing data from tables. In this Answer, we’ll learn the PARTITION BY clause in PostgreSQL.

The PARTITION BY clause

A PARTITION BY clause is used in window functions, which allows us to divide the result set of a query into partitions to perform calculations on each partition separately. It is specifically used in conjunction with window functions and requires the OVER() clause to define the window or frame over which the window function operates. The syntax for the PARTITION BY clause is as follows:

SELECT column1,
column2,
window_function(expression) OVER (
PARTITION BY partition_expression
ORDER BY sort_expression
frame_specification
) AS result_column
FROM your_table;

Explanation

  • Line 1: The SELECT clause specifies the columns we want to retrieve in the result set.

  • Line 3: The window_function(expression) represents the specific window function we’re using (e.g., SUM, AVG), along with the expression we want to calculate.

  • Line 4: The PARTITION BY partition_expression defines the column by which we want to create partitions.

  • Line 5: The ORDER BY sort_expression is optional and specifies the order within each partition.

  • Line 6: The frame_specification is optional and allows us to further refine the window frame.

  • Line 7: The query uses the alias result_column using the AS keyword.

  • Line 8: The query will take data from the table written in place of your_table.

Let’s have a look at the following section, where the concept is illustrated through example.

Example

We’ll cover two topics in this section, including:

  1. Set up the database

  2. Try it yourself

Set up the database

We use a simple database named Employees having four columns, emp_idemp_namedept, and date_of_joining.

--Creating Employees table
CREATE TABLE IF NOT EXISTS Employees (
emp_id SERIAL PRIMARY KEY,
emp_name TEXT,
dept TEXT,
date_of_joining DATE
);
--Inserting data into Employees table
INSERT INTO Employees (emp_name, dept, date_of_joining)
VALUES
('Usama Khan', 'HR', '2022-01-01'),
('Fatima Ahmed', 'Sales', '2022-02-15'),
('Hassan Ali', 'IT', '2022-09-20'),
('Saira Khan', 'IT', '2022-06-15'),
('Ahmad Malik', 'HR', '2022-07-10'),
('Bilal Ahmed', 'HR', '2023-01-20'),
('Noor Fatima', 'Sales', '2023-02-15'),
('Zainab Khan', 'Sales', '2022-08-25'),
('Sana Khan', 'HR', '2022-10-15'),
('Aamir Malik', 'Sales', '2022-11-30'),
('Amna Raza', 'IT', '2022-12-25'),
('Ali Hassan', 'IT', '2022-03-10'),
('Ayesha Siddiqui', 'HR', '2022-04-05'),
('Usman Ahmed', 'Sales', '2022-05-20'),
('Imran Malik', 'IT', '2023-03-10');
SELECT * FROM Employees;

Code explanation

  • Lines 1–7: We create a table named Employees with four columns: the emp_id (serialSERIAL is a keyword used to create an auto-incrementing integer column. When you define a column as SERIAL, PostgreSQL automatically generates a unique integer value for each new row inserted into the table. This is commonly used for primary key columns to ensure that each row has a unique identifier., primary key), emp_namedept, and date_of_joining.

  • Lines 9–26: We insert a few records into the Employees table using the INSERT INTO command.

  • Line 28: We display the data using the SELECT statement.

Try it yourself

To rank employees within each department based on their joining date, we can use ROW_NUMBER() Window function with the PARTITION BY clause.

SELECT emp_name,
dept,
date_of_joining,
ROW_NUMBER() OVER (
PARTITION BY dept
ORDER BY date_of_joining
) AS rank_within_dept
FROM Employees;

Code explanation

  • Lines 1–3: The SELECT statement reads emp_namedept, and date_of_joining.

  • Line 4: The ROW_NUMBER() function assigns a unique integer value to each row within each partition, starting from 1 and incrementing by 1 for each subsequent row.

  • Lines 5–6: The PARTITION BY clause is used to create partitions or groups within the result set based on the values in the dept column. The ORDER BY clause orders the rows within each department based on the date_of_joining column in ascending order.

  • Line 7: The result of this calculation is named rank_within_dept.

  • Line 8: The table being used in terms of data is Employees.

Conclusion

The PARTITION BY clause is a key component of PostgreSQL that enables vital operations like performing calculations on each partition separately after the result set has been divided by using any window function. We can also sort the data by using the ORDER BY clause with the PARTITION BY clause within the same window function. It’s an essential skill as it unlocks advanced capabilities for querying and analyzing data.

Free Resources

Copyright ©2024 Educative, Inc. All rights reserved