What is ROW_NUMBER() function in PostgreSQL?
A database is a collection of structured data stored in an organized manner. PostgreSQL, a relational database management system, employs SQL for fundamental operations such as creating tables, inserting data into tables, modifying the records, and deleting the existing data from the tables. Understanding SQL keywords is crucial for being skilled in SQL. One such term is the ROW_NUMBER() function, which is an analytical function useful for organizing and sorting data inside result sets.
Significance of the ROW_NUMBER() function in PostgreSQL
The ROW_NUMBER() window function assigns a unique row number to each row within a result set. It is important for various tasks, including ranking, data partitioning, and paging. Its importance lies in its ability to bring order to unordered data, allowing us to address common analytical challenges, such as identifying the top performers in a sales report.
Syntax
ROW_NUMBER() OVER (PARTITION BY column_nameORDER BY column_name)
Explanation
Line 1: The
ROW_NUMBER()function assigns a unique row number to each row. TheOVERclause defines the window specification for the function.Line 2: The
PARTITION BYenables us to partition the result set into subsets based on a specified columncolumn_name.Line 3: The
ORDER BYclause determines the order in which rows for the column,column_name, are numbered (for example, 1, 2, 3, 4, 5) within each partition.
Setting up a database
Let’s use a simple database named Employees having four columns, employee_id, employee_name, date_of_joining, and sales_revenue.
--Creating Employees tableCREATE TABLE IF NOT EXISTS Employees (emp_id SERIAL PRIMARY KEY,emp_name TEXT,date_of_joining DATE,sales_revenue DECIMAL(10, 2),country TEXT);--Inserting data into Employees tableINSERT INTO Employees (emp_name, date_of_joining, sales_revenue, country)VALUES('Usama Khan', '2022-01-15', 1500.00, 'Brazil'),('Fatima Ahmed', '2021-11-10', 2200.50, 'USA'),('Hassan Ali', '2021-12-05', 1800.25, 'Pakistan'),('Saira Khan', '2022-02-20', 1300.75, 'Australia'),('Ahmad Malik', '2022-03-15', 2500.80, 'Brazil'),('Bilal Ahmed', '2022-04-10', 1950.30, 'USA'),('Noor Fatima', '2022-01-01', 1620.00, 'Pakistan'),('Sarah Harris', '2021-11-30', 1850.50, 'Australia'),('Aamir Malik', '2022-05-20', 1380.25, 'Brazil'),('Ali Hassan', '2021-12-25', 1100.75, 'USA'),('Ayesha Siddiqui', '2022-03-10', 2650.80, 'Pakistan'),('Usman Ahmed', '2022-04-05', 1820.30, 'Australia');SELECT * FROM Employees;
Explanation
Lines 1–7: We create a table named
Employeeswith four columns: theemp_id(SERIAL,PRIMARY KEY),emp_name,date_of_joining, andsales_revenue.Lines 9–24: We insert a few records into the
Employeestable using theINSERT INTOcommand.Line 26: We display the data using the
SELECTstatement.
Code examples
Let's see the different aspects of the ROW_NUMBER() function through some examples.
Ranking using the ORDER BY clause in the ROW_NUMBER() function
In this example, we’ll use the ROW_NUMBER() function to rank employees based on their hiring dates, providing a clear understanding of who joined the company first.
SELECT emp_name,sales_revenue,ROW_NUMBER() OVER (ORDER BY sales_revenue DESC) AS rank_by_sales_revenueFROM Employees;
Explanation
Lines 1–2: We use the
SELECTstatement to read theemp_nameandsales_revenue.Lines 3–4: We use the
ROW_NUMBER()function to assign a unique integer value to each row, starting from 1 and incrementing by 1 for each subsequent row. TheORDER BYclause orders the rows based on thesales_revenuecolumn in descending order.Line 5: We name the result of this calculation
rank_by_sales_revenue.Lines 6: We use the
Employeestable for data.
Data partitioning using the PARTITION BY clause in the ROW_NUMBER() function
In this example, we’ll use the ROW_NUMBER() function to partition data by country and then rank employees within each country based on their sales revenue.
SELECT emp_name,country,sales_revenue,ROW_NUMBER() OVER (PARTITION BY countryORDER BY sales_revenue DESC) AS rank_within_countryFROM Employees;
Explanation
Lines 1–3: We use the
SELECTstatement to reademp_name,country, andsales_revenue.Lines 4–6: We use the
ROW_NUMBER()function to assign a unique integer value to each row within each partition, starting from 1 and incrementing by 1 for each subsequent row. ThePARTITION BYclause creates partitions or groups within the result set based on thecountrycolumn. TheORDER BYclause orders the rows within each country based on thesales_revenuecolumn in descending order.Line 7: We name the result of this calculation
rank_within_country.Lines 8: We use the
Employeestable for data.
Using common table expressions (CTE) with the ROW_NUMBER() function
In this example, we’ll use a ROW_NUMBER() function for more complex ranking tasks, such as calculating cumulative ranks.
WITH RankedEmployees AS (SELECT emp_name,sales_revenue,ROW_NUMBER() OVER (ORDER BY sales_revenue DESC) AS rank_by_sales_revenueFROM Employees)SELECT emp_name,sales_revenue,rank_by_sales_revenue,SUM(rank_by_sales_revenue) OVER (ORDER BY sales_revenue DESC) AS cumulative_rankFROM RankedEmployees;
Code explanation
Lines 1–8: We create a
RankedEmployeesCTE to rank employees based on thesales_revenuecolumn.Lines 9-11: We use the
SELECTstatement to read theemp_name,sales_revenue, andrank_by_sales_revenue.Lines 12-13: We use the
SUM()function to calculate the cumulative ranks. TheORDER BYclause orders the rows within each country based on thesales_revenuecolumn in descending order.Line 14: We name the result of this calculation
cumulative_rank.Line 15: The data uses the result set of the
RankedEmployeesCTE.
Conclusion
The ROW_NUMBER() window function assigns a unique row number to each row (starting from 1 and incrementing by 1 for each subsequent row) within a result set. We can use it to rank a single attribute. We can also use it to rank rows in the partitions. Moreover, we can use this window function to do complex calculations using common table expressions. The ROW_NUMBER() window function is essential because it unlocks advanced capabilities for querying and analyzing data.
Free Resources