Are you gearing up for a database interview? Databases are important components of modern information systems, serving as efficient organizers and managers of data. Understanding databases is compulsory for various job positions, including data analysts, database administrators, software engineers, and more. To help you get ready, we’ve collected a list of the top 20 frequently asked database interview questions, along with their answers, to ensure you are well prepared. Explore the key concepts to boost your confidence and stand out in your next interview.
A database management system (DBMS) is software that serves as a bridge between the data and the software application. It is a collection of processes that collectively store, manage, and retrieve information for the user. DBMS makes system calls to work with the data stored in memory or disk. It also interacts with the OS for other system calls, for instance, network requests and responses.
Types of DBMS
Relational DBMS: Data is organized into tables with rows and columns. This uses SQL (Structured Query Language) for querying and managing data.
Object-oriented DBMS: Data is represented as objects, similar to object-oriented programming concepts. It is useful for managing complex data structures.
NoSQL DBMS: This stands for “not only SQL.” It is designed for large-scale distributed data storage and retrieval and is often used for big data and real-time web applications.
Hierarchical DBMS: Data is organized like a family tree. Each piece of information is connected to another, like branches on a tree.
Network DBMS: It is similar to hierarchical DBMS, but a complex relationship in which records can have multiple parent and child records.
A relational database management system (RDBMS) is a type of software that helps organize and manage data in a structured way. It manages application data stored on disk in a way that creates an abstraction of tables. Each table has rows (representing individual records) and columns (representing attributes or fields of the records). The arrows represent the relationship between the tables. This structure establishes the relationships between tables, enabling efficient data retrieval and manipulation. RDBMS uses SQL to query and manipulate data.
In a DBMS, we use the following languages to manage the databases:
Data Definition Language (DDL) defines the structure and schema of the database.
Data Manipulation Language (DML) manipulates the data stored in the database.
Data Control Language (DCL) manages user permissions and controls in the database system.
Transaction Control Language (TCL) manages transactions in the database.
Here are the differences between a primary key and a unique key:
Primary key:
It uniquely identifies each record in a table.
It cannot contain NULL
values.
Each table can have only one primary key.
Unique key:
It ensures that all values in a column are unique.
It allows NULL
values.
Each table can have multiple unique keys.
As you can see in the table on the right, the ID
column uniquely identifies each record. That’s why we label it as the primary key. The table that the Phone
column has a blank entry in the last row. That’s why we label it as a unique key.
Normalization organizes a database into tables and columns. It divides large tables into smaller tables and defines relationships between them. By organizing the fields and tables of a database, the goal is to minimize data redundancy and dependency.
Denormalization is used to add redundant data to a normalized database. It combines tables and columns to reduce the number of joins needed to retrieve data. The goal is to improve read performance, but it can also lead to data redundancy and increase the database’s complexity.
Normalized databases require data to be cross-referenced across multiple tables, which can sometimes suffer read performance. This is because retrieving information may involve joining multiple tables, which can impact the speed of data retrieval.
Let’s look at the example of normalization and denormalization:
Consider an example of a table, Students
, that stores information about students in a school. The table will look like this:
In the table above:
StudentID
uniquely identifies each student and serves as the primary key.
Name
, Age
, and Department_name
columns contain information about each student.
Let’s identify redundancy. Note that both Alice
and Charlie
are in the engineering department. Similarly, both Bob
and David
are in the accounting department. This repetition of information leads to redundancy.
To address this, we can normalize the table by creating separate tables for distinct entities like Departments
and linking them to the Students
table using foreign keys. This approach reduces redundancy and improves data integrity.
In this normalized structure:
The Students
table links to the Departments
table via the DID
column.
The Departments
table contains unique departments.
The Students
table references the appropriate department for each student.
This structure reduces redundancy and ensures data consistency by avoiding repeating groups and storing related information separately.
There are different forms of database normalization, each building on the previous one to improve data integrity.
The most common normal forms are:
First normal form (1NF): Each column in a table contains atomic (indivisible) values, and each column has a unique name.
Second normal form (2NF): The table is in 1NF, and all nonkey attributes are
Third normal form (3NF): The table is in 2NF, with no
Boyce-Codd normal form (BCNF): A stricter form of 3NF, where every
We can think of the differences between DDL, DML, and DCL in two different dimensions:
Effect on the data:
DDL commands affect the structure of the database schema.
DML commands affect the actual data stored in the database tables.
DCL commands affect the access and permissions of users and roles in the database.
Transaction management:
DDL and DCL commands are typically autocommitted, which means they are immediately applied and cannot be rolled back.
DML commands are transactions that can be committed or rolled back as a single unit.
The advantages of DBMS are as follows:
Data integrity: It ensures that data is accurate and consistent.
Data security: It provides mechanisms to protect data from unauthorized access.
Data independence: It allows changes in the database structure without affecting the applications.
Data backup and recovery: It offers features to backup and restore data in case of failures.
Efficient data access: It utilizes indexing and caching to retrieve data fast.
Concurrent access control: It manages simultaneous access to data by multiple users.
The WHERE
clause filters data from individual rows in a table based on specified conditions, while the HAVING
clause filters data from groups of rows in a query based on conditions involving aggregate values.
The WHERE
clause is used to filter rows before any groupings are made.
It is used with the SELECT
, UPDATE
, and DELETE
statements to specify a condition.
SELECT * FROM employees WHERE department = 'Sales';
The HAVING
clause filters rows after the grouping has been done.
It is used with the GROUP BY
clause in a SELECT
statement to specify a condition.
SELECT department, COUNT(*) FROM employees GROUP BY department HAVING COUNT(*) > 5;
A subquery, a nested query or inner query, is nested inside another query. For example, we have two tables, students
and grades
, and we want to find the names of students who scored above average. We will use a subquery to do this. Here’s how we can write the subquery:
SELECT name FROM students WHERE student_id IN (SELECT student_id FROM grades WHERE grade > (SELECT AVG(grade) FROM grades));
The subquery (SELECT AVG(grade) FROM grades)
calculates the average grade, and the outer query uses this result to find the names of students who scored above average.
Aggregate functions are used to calculate values and return a single result. These functions include finding the number of items (COUNT
), sum (SUM
), average (AVG
), minimum (MIN
), or maximum (MAX
) value in a column of a table.
Function Name | Meaning | Example |
SUM() | The sum of the values in a column |
|
AVG() | The average of the value in a column |
|
MAX() | The largest values in a column |
|
MIN() | The smallestt values in a column |
|
COUNT() | Number of values in a column |
|
A join
is like combining two tables in a database to get information from both.
Here are the different types of join:
Let’s look at an example of students and courses.
Inner join
combines rows from both tables where the specified condition is met.
SELECT students.Name, courses.CourseNameFROM studentsINNER JOIN courses ON students.StudentID = courses.StudentID;
Left join
returns all rows from the left table (students
) and the matched rows from the right table (courses
).
SELECT students.Name, courses.CourseNameFROM studentsLEFT JOIN courses ON students.StudentID = courses.StudentID;
Right join
returns all rows from the right table (courses
) and the matched rows from the left table (students
).
SELECT students.Name, courses.CourseNameFROM studentsRIGHT JOIN courses ON students.StudentID = courses.StudentID;
Outer join
combines the results of both left and right outer joins.
SELECT s.Name, c.CourseNameFROM students sFULL JOIN courses c ON s.StudentID = c.StudentID;
We use view
for the following:
Views are virtual tables that have rows and columns like tables.
Tables are like storage containers where data is kept on disk. Views, on the other hand, don’t store data directly. They’re like virtual windows that show data from tables based on a query. Views can also do more than just display data. They can group data, do aggregation, or set up rules for who can see what data.
Tables allow for data modification, including insert
, update
, and delete
operations. However, we can’t directly modify data through a view if it involves multiple tables.
Views can be used to precompute and store the results of complex queries, improving performance by reducing the need to recompute the results each time the query is run.
A trigger is a special kind of stored procedure in a database that automatically triggers or executes in response to certain events, such as inserting, updating, or deleting data in a table. It’s like a set of instructions that the database follows when specific actions occur, helping to maintain data integrity and automate tasks.
A stored procedure is a set of SQL statements stored in the database that can be called by name to perform a specific task or operation. It helps reuse code, improve performance, and maintain consistency in database operations.
A transaction is a unit of work performed on a database that involves one or more operations, such as reading, writing, or updating data. Transactions have ACID properties (atomicity, consistency, isolation, durability).
The ACIS properties are listed below:
Atomicity: Ensures that all operations within a transaction are completed successfully or none of them are.
Consistency: Ensures that a transaction brings the database from one valid state to another. It maintains the integrity constraints and rules defined for the database.
Isolation: Ensures that the execution of multiple transactions concurrently does not lead to data inconsistencies.
Durability: Ensures that once a transaction is committed, the changes made by the transaction persist, even in the event of a system failure.
The difference between clustered and nonclustered indexes are as follows:
Clustered index: In a clustered index, the rows of the table are stored in the order of the index key. There can only be one clustered index per table, as the rows are physically sorted based on the index key.
Nonclustered index: In a nonclustered index, the index key entries point to the actual rows in the table. This means that the rows are not stored in the order of the index key, and there can be multiple nonclustered indexes per table. It allows for fast retrieval of specific rows based on criteria not covered by the clustered index, improving query performance.
The difference between DROP
, TRUNCATE
, and DELETE
are as follows:
DROP
removes an entire table, including its structure, data, indexes, and triggers. It essentially deletes the table from the database.
TRUNCATE
removes all table rows but retains the table structure, indexes, and triggers. A DELETE
statement can achieve the same result, but a TRUNCATE
is faster because it does not log individual row deletions.
DELETE
removes one or more rows from a table based on a condition. It removes specific rows while leaving the table structure, indexes, and triggers intact. Unlike TRUNCATE
, DELETE
is logged and can be rolled back using a transaction.
The different abstraction levels are listed below:
Physical level: This is the lowest level of abstraction and deals with how data is stored on the storage medium, such as disks. It includes details like data structures, file organization, and access methods.
Logical level: This level describes what data is stored in the database and the relationships among the data. It hides the complexities of the physical storage and focuses on the logical structure of the database, including tables, views, and constraints.
View level: This is the highest level of abstraction and represents how users perceive the data. It defines different database views for different users or applications, providing a customized and simplified interface to the underlying database.
This blog has explored the key topics commonly asked in database interview questions. Specifically, we covered DBMS types, normalization, keys, relational databases, and joins.
If you want to further enhance your skills and tackle more interview questions, consider enrolling in our courses below. It’s designed to help you grasp the concepts guiding you from basic database principles to more advanced topics.
Free Resources