What is relational algebra in database management systems?

Relational algebra, an essential concept in database management systems (DBMS), forms the basis for querying and managing relational databases.

Relational database

A relational database comprises a collection of interrelated tables, each consisting of rows and columns. The relational model is the most prevalent database model due to its simplicity, adaptability, and robustness. It allows for efficient storage, retrieval, and manipulation of data while maintaining consistency and integrity.

The relational model is underpinned by a formal structure known as relational algebra, which defines the operations and properties necessary for working with relational databases. Grasping relational algebra is vital for understanding the core principles of query languages like SQL, which are indispensable for managing relational databases.

The main building blocks of relational algebra are operations. Let's discuss them.

Operations

Relational algebra consists of operations that accept one or more relations (tables) as input and produce a new relation as output. These operations can be classified into two categories: unary and binary.

Unary operations

Unary operations work with a single relation. The most common unary operations in relational algebra include:

Select (σ\sigma)

The select operation filters rows from a relation based on a specified condition. The output is a new relation containing only the rows that satisfy the condition. Mathematically, it is denoted as σcondition(R)\sigma_{\text{condition}}(R), where RR represents the input relation.

Let's say we have the following employee table.

Id

Name

Age

Department

101

John

28

Sales

102

Emily

31

Marketing

103

Ben

25

Finance

104

Laura

27

HR

The σ(Age > 25)(Employees) query will select all the employees with an age greater than 25, as given in the following resulting table.

Id

Name

Age

Department

101

John

28

Sales

102

Emily

31

Marketing

104

Laura

27

HR

To perform selection in a programming language like SQL, we can use the "WHERE" clause to specify the condition. For example:

SELECT * FROM Employees WHERE Age > 25;

Project (π\pi)

The project operation extracts certain columns from a relation, discarding the others. The output is a new relation containing only the selected columns. It is denoted as πcolumn(s)(R)\pi_{\text{column(s)}}(R), where RR represents the input relation.

Let's say we have the following employee table.

Id

Name

Age

Department

101

John

28

Sales

102

Emily

31

Marketing

103

Ben

25

Finance

104

Laura

27

HR

The π(Name, Department)(Employees) query will give only the Name and Department columns, as given in the following resulting table.

Name

Department

John

Sales

Emily

Marketing

Ben

Finance

Laura

HR

To perform projection in SQL, we can specify the desired columns in the "SELECT" clause. For example:

SELECT Name, Department FROM Employees;

Binary operations

Binary operations work with two relations. The most common binary operations in relational algebra include:

Union (\cup)

The union operation combines the rows of two relations with the same schema (i.e., identical columns and data types). The output is a new relation containing all unique rows from both input relations. It is denoted as R1R2R_1 \cup R_2.

Let's say we have tables such as Table A and Table B. Table A consists of the "ID" and "Name" columns as given below

Table A

ID

Name

101

John

102

Emily

103

Ben

Table B also consists of "ID" and "Name" columns, as shown below.

Table B

ID

Name

102

Emily

103

Ben

104

Laura

Let's apply union operation on tables A and B, such as A ⋃ B. The union operation combines unique rows from Table A and Table B, eliminating duplicates. The resulting table includes all distinct IDs and names from both tables, with each row appearing only once.

ID

Name

101

John

102

Emily

103

Ben

104

Laura

To perform union in SQL, we can use the "UNION" operator. For example:

SELECT ID, Name FROM TableA
UNION
SELECT ID, Name FROM TableB;

Intersection (⋂)

The intersection operation retrieves rows that are common between two tables. Symbolically, it is represented as table A ⋂ table B.

Let's apply the intersection operation on the same aforementioned tables, A and B.

ID

Name

102

Emily

To perform intersection in SQL, we can use the "INTERSECT" operator. For example:

SELECT ID, Name FROM TableA
INTERSECT
SELECT ID, Name FROM TableB;

Difference (-)

The difference operation returns the rows present in the first relation but not in the second relation. Both input relations must have the same schema. It is denoted as R1R2R_1 - R_2. Let's apply the difference operation on the same tables, A and B.

ID

Name

101

John

103

Ben

To perform the difference operation in SQL, we can use the "EXCEPT" operator. For example:

SELECT ID, Name FROM TableA
EXCEPT
SELECT ID, Name FROM TableB;

Cartesian product (×\times)

The Cartesian product operation generates all possible combinations of rows from two input relations. The output relation contains columns from both input relations. It is denoted as R1×R2R_1 \times R_2.

Now let's consider we have two tables Table X and Table Y as shown below:

Table X

Color

Red

Blue

Table Y

Size

Small

Large

Now let's apply Cartesian product operation on these two tables. The resulting table obtained from the Cartesian product operation combines every row from Table X with every row from Table Y, generating all possible combinations. In this case, the resulting table shows all the combinations of colors and sizes, representing each color paired with both small and large sizes.

Color

Size

Red

Small

Red

Large

Blue

Small

Blue

Large

To perform a Cartesian product in SQL, we can use a combination of "SELECT" statements without specifying any condition. For example:

SELECT * FROM TableX, TableY;

Join (\Join)

The join operation combines rows from two relations based on a specified condition. There are various types of joins, such as inner join, outer join, and natural join. The join operation is denoted as R1conditionR2R_1 \Join_{\text{condition}} R_2.

Now let's consider we have two tables Table R and Table S. Table R contains "ID", "Name," and "Age" columns as shown below.

Table R

ID

Name

Age

101

John

28

102

Emily

31

103

Ben

25

Table S contains Id and Department columns as shown below.

Table S

ID

Department

101

Sales

102

Marketing

104

HR

Now let's apply the join operation on Table R and Table S. The resulting table from the join operation combines rows from Table R and Table S based on matching IDs, displaying the shared information of Id, name, age, and department for the matched records.

ID

Name

Age

Department

101

John

28

Sales

102

Emily

31

Marketing

To perform a join in SQL, we can use the "JOIN" keyword and the appropriate join condition. For example:

SELECT * FROM TableR
JOIN TableS ON TableR.ID = TableS.ID;

The connection between relational algebra and query languages

Relational algebra is the theoretical underpinning for query languages like SQL (structured query language), the most widely used language for managing relational databases. SQL incorporates the concepts and operations of relational algebra, enabling users to perform complex data manipulations and retrieve information from relational databases with ease.

While relational algebra relies on mathematical notation, SQL offers a more user-friendly, English-like syntax. SQL's SELECT, FROM, and JOIN clauses correspond to relational algebra's project, Cartesian product, select, and join operations, respectively. Similarly, SQL supports other relational algebra operations such as union, difference, and various types of joins.

For example, a relational algebra expression to find the names of employees working in a specific department can be represented as:

The equivalent SQL query would be:

SELECT name
FROM Employee
WHERE department = 'IT';

Pros and cons of relational algebra

Relational algebra offers several advantages:

  1. Formal foundation: Relational algebra provides a solid theoretical basis for relational databases and query languages, ensuring consistency and correctness in data manipulation.

  2. Expressiveness: It allows for the expression of complex queries through the composition of basic operations.

  3. Optimization: The well-defined properties and rules of relational algebra enable database systems to optimize query execution plans, enhancing performance.

However, relational algebra also has some limitations:

  1. Notation: The mathematical notation can be challenging to understand and use for those without a strong mathematical background.

  2. Limited support for aggregation: Relational algebra does not natively support aggregation operations like sum, average, or count. Extensions to the core relational algebra typically handle these operations.

Wrapping up

Relational algebra is a fundamental concept in database management systems. It provides the theoretical foundation for working with relational databases. It consists of operations that manipulate relations, enabling the expression of complex queries and data manipulations. Relational algebra is the basis for query languages like SQL, which are essential for managing relational databases. Understanding relational algebra is crucial for grasping the principles behind relational databases and developing effective database applications.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved