What is ROW_NUMBER() function in PostgreSQL?

A database is a collection of structured data stored in an organized manner. PostgreSQL, a relational database management system, employs SQL for fundamental operations such as creating tables, inserting data into tables, modifying the records, and deleting the existing data from the tables. Understanding SQL keywords is crucial for being skilled in SQL. One such term is the ROW_NUMBER() function, which is an analytical function useful for organizing and sorting data inside result sets.

Significance of the ROW_NUMBER() function in PostgreSQL

The ROW_NUMBER() window function assigns a unique row number to each row within a result set. It is important for various tasks, including ranking, data partitioning, and paging. Its importance lies in its ability to bring order to unordered data, allowing us to address common analytical challenges, such as identifying the top performers in a sales report.

Syntax

ROW_NUMBER() OVER (
PARTITION BY column_name
ORDER BY column_name
)

Explanation

  • Line 1: The ROW_NUMBER() function assigns a unique row number to each row. The OVER clause defines the window specification for the function.

  • Line 2: The PARTITION BY enables us to partition the result set into subsets based on a specified column column_name.

  • Line 3: The ORDER BY clause determines the order in which rows for the column, column_name, are numbered (for example, 1, 2, 3, 4, 5) within each partition.

Setting up a database

Let’s use a simple database named Employees having four columns, employee_idemployee_name,  date_of_joining, and sales_revenue.

--Creating Employees table
CREATE TABLE IF NOT EXISTS Employees (
emp_id SERIAL PRIMARY KEY,
emp_name TEXT,
date_of_joining DATE,
sales_revenue DECIMAL(10, 2),
country TEXT
);
--Inserting data into Employees table
INSERT INTO Employees (emp_name, date_of_joining, sales_revenue, country)
VALUES
('Usama Khan', '2022-01-15', 1500.00, 'Brazil'),
('Fatima Ahmed', '2021-11-10', 2200.50, 'USA'),
('Hassan Ali', '2021-12-05', 1800.25, 'Pakistan'),
('Saira Khan', '2022-02-20', 1300.75, 'Australia'),
('Ahmad Malik', '2022-03-15', 2500.80, 'Brazil'),
('Bilal Ahmed', '2022-04-10', 1950.30, 'USA'),
('Noor Fatima', '2022-01-01', 1620.00, 'Pakistan'),
('Sarah Harris', '2021-11-30', 1850.50, 'Australia'),
('Aamir Malik', '2022-05-20', 1380.25, 'Brazil'),
('Ali Hassan', '2021-12-25', 1100.75, 'USA'),
('Ayesha Siddiqui', '2022-03-10', 2650.80, 'Pakistan'),
('Usman Ahmed', '2022-04-05', 1820.30, 'Australia');
SELECT * FROM Employees;

Explanation

  • Lines 1–7: We create a table named Employees with four columns: the emp_id (SERIAL, PRIMARY KEY), emp_namedate_of_joining, and sales_revenue.

  • Lines 9–24: We insert a few records into the Employees table using the INSERT INTO command.

  • Line 26: We display the data using the SELECT statement.

Code examples

Let's see the different aspects of the ROW_NUMBER() function through some examples.

Ranking using the ORDER BY clause in the ROW_NUMBER() function

In this example, we’ll use the ROW_NUMBER() function to rank employees based on their hiring dates, providing a clear understanding of who joined the company first.

SELECT emp_name,
sales_revenue,
ROW_NUMBER() OVER (
ORDER BY sales_revenue DESC
) AS rank_by_sales_revenue
FROM Employees;

Explanation

  • Lines 1–2: We use the SELECT statement to read the emp_name and sales_revenue.

  • Lines 3–4: We use the ROW_NUMBER() function to assign a unique integer value to each row, starting from 1 and incrementing by 1 for each subsequent row. The ORDER BY clause orders the rows based on the sales_revenue column in descending order.

  • Line 5: We name the result of this calculation rank_by_sales_revenue.

  • Lines 6: We use the Employees table for data.

Data partitioning using the PARTITION BY clause in the ROW_NUMBER() function

In this example, we’ll use the ROW_NUMBER() function to partition data by country and then rank employees within each country based on their sales revenue.

SELECT emp_name,
country,
sales_revenue,
ROW_NUMBER() OVER (
PARTITION BY country
ORDER BY sales_revenue DESC
) AS rank_within_country
FROM Employees;

Explanation

  • Lines 1–3: We use the SELECT statement to read emp_name, country, and sales_revenue.

  • Lines 4–6: We use the ROW_NUMBER() function to assign a unique integer value to each row within each partition, starting from 1 and incrementing by 1 for each subsequent row. The PARTITION BY clause creates partitions or groups within the result set based on the country column. The ORDER BY clause orders the rows within each country based on the sales_revenue column in descending order.

  • Line 7: We name the result of this calculation rank_within_country.

  • Lines 8: We use the Employees table for data.

Using common table expressions (CTE) with the ROW_NUMBER() function

In this example, we’ll use a common table expression (CTE)We can define a CTE using the WITH clause in SQL. It is a named query that we can reference in the same query as any other table or subquery. to apply the ROW_NUMBER() function for more complex ranking tasks, such as calculating cumulative ranks.

WITH RankedEmployees AS (
SELECT emp_name,
sales_revenue,
ROW_NUMBER() OVER (
ORDER BY sales_revenue DESC
) AS rank_by_sales_revenue
FROM Employees
)
SELECT emp_name,
sales_revenue,
rank_by_sales_revenue,
SUM(rank_by_sales_revenue) OVER (
ORDER BY sales_revenue DESC
) AS cumulative_rank
FROM RankedEmployees;

Code explanation

  • Lines 1–8: We create a RankedEmployees CTE to rank employees based on the sales_revenue column.

  • Lines 9-11: We use the SELECT statement to read the emp_name, sales_revenue, and rank_by_sales_revenue.

  • Lines 12-13: We use the SUM() function to calculate the cumulative ranks. The ORDER BY clause orders the rows within each country based on the sales_revenue column in descending order.

  • Line 14: We name the result of this calculation cumulative_rank.

  • Line 15: The data uses the result set of the RankedEmployees CTE.

Conclusion

The ROW_NUMBER() window function assigns a unique row number to each row (starting from 1 and incrementing by 1 for each subsequent row) within a result set. We can use it to rank a single attribute. We can also use it to rank rows in the partitions. Moreover, we can use this window function to do complex calculations using common table expressions. The ROW_NUMBER() window function is essential because it unlocks advanced capabilities for querying and analyzing data.

Free Resources

Copyright ©2024 Educative, Inc. All rights reserved