- What is conditional aggregation in SQL?
- Describe the concept of correlated subqueries in SQL.
- What is a self join in SQL, and how do you use aliases?
- Explain why understanding NULL values is essential for running SQL queries.
If you’re seeking a career in data science, data analysis, or plan on working with databases at all, consider adding SQL (Structured Query Language) to your skillset. It’s straightforward, easy to learn, and individuals with SQL experience are in high demand.
One of the reasons there’s such a strong demand for SQL skills is the widespread use of databases in virtually every field of business. Databases can process much more data than traditional spreadsheets like Microsoft Excel, and SQL is used to access and extract that data within relational databases. As a result, relational database management systems (RDBMS) like Microsoft SQL Server, MySQL, and Oracle are the most popular database systems currently in use.
Furthermore, SQL is frequently used alongside more powerful programming languages like Python or Java to analyze and manipulate data. Having SQL in your repertoire is a great way to increase your versatility and utility as a developer.
Check out An Introductory Guide to SQL if you’d like to become better acquainted with the fundamental concepts of SQL.
Let’s get started!
Master database design with our hands-on course today
Databases are universal - they underlie the technology you use every day. They're a crucial component of everything from telecommunications systems, banking systems, and video games to just about any other software system that maintains some amount of persistent data. In this course, you'll learn about the fundamental concepts of databases, why and when they're used, what relational databases are, and entity-relationship diagrams. You will also be exposed to techniques like normalization that help to increase the efficiency of databases. You'll wrap up with a look at the basics of Structured Query Language (SQL). After completing this course, you will be able to move onto more advanced courses involving database systems and SQL.
Due to the diverse applicability of SQL, there is a wide range of positions that could require you to demonstrate your knowledge of SQL, and the interview process will differ from role to role.
In general, you can prepare by practicing query optimization in SQL servers and familiarizing yourself with the concepts that require a particular command. Interviewers are looking for candidates who are comfortable discussing the strengths and weaknesses of different SQL methods, understand how databases process indexes, joins, views, and can correctly identify optimal solutions for a variety of situations. The more you practice, the easier it will be to confidently tackle more complex SQL problems.
An Introductory Guide to SQL goes over eight key areas of knowledge that frequently come up during SQL interviews:
Remember to take some time to reflect on what you’re learning and see if you can connect your technical knowledge to other experiences. It’s great to hone your skills using technical challenges, but we also recommend sharpening your behavioral interviewing skills. Check out our FREE course: Grokking the Behavioral Interview
This section will cover 40 SQL questions that you should know before heading into an interview and should provide a general idea of the kind of questions to expect.
SQL is a non-procedural programming language developed by IBM in the 1970s and then later by Oracle. SQL is used by almost all relational databases to write queries, access, edit, and retrieve data.
A database is an organized collection of digital information, or data, stored in a computer system. Unlike spreadsheets, a database can handle massive quantities of information while allowing multiple users access to the same database to run secure and independent queries.
An RDBMS or Relational Database Management System is essentially a database utilizing a tabular schema to organize multiple data elements related to each other. An RDBMS lets your define, create, and maintain relational databases in addition to providing controlled access to the data within. A DBMS or Database Management System functions similarly to the RDBMS described above. However, DBMS data elements are stored as files rather than in tabular form, and there is no relationship between different data elements.
Understand relational databases and Structured Query Language (SQL)
Relational databases store data in a row-based table structure. Structured Query Language (SQL) is a core language used to retrieve data from the relational database. One of its most widely used extensions, MySQL, also gives you the power to edit, create, and manage the queried data. In this course, you’ll be introduced to the basics of relational database management with SQL. You’ll use MySQL to create a table, edit the data, and even change the structure. You’ll also create indexes to quickly access data for faster performance on common queries. Additionally, you’ll explore more complex queries to extract useful data with aggregate functions, grouping, ordering, and creating limits and filters in output tables. Finally, you’ll learn how to combine data from different tables with join functions. By the end of this course, you’ll be able to create, manage, edit, and merge relational databases with confidence. MySQL will prepare you for data projects large and small.
Make sure you have adequate knowledge of relational databases before moving forward.
There are six basic types of SQL commands.
Type | Description | Commands |
---|---|---|
Data Definition Language (DDL) | Used to create and restructure relational database objects, such as tables. | CREATE TABLE , ALTER TABLE , DROP TABLE , CREATE INDEX , ALTER INDEX , DROP INDEX , CREATE VIEW , DROP VIEW |
Data Manipulation Language (DML) | Used to manipulate data within relational database objects. | INSERT , UPDATE , DELETE |
Data Query Language (DQL) | With only one command, DQL is used to perform queries within a relational database. | SELECT |
Data Control Language (DCL) | Used to control access to data within a relational database. | ALTER PASSWORD , GRANT , REVOKE , CREATE SYNONYM |
Data administration commands | Used to analyze database operations or conduct audits. | START AUDIT , STOP AUDIT |
Transactional control commands | Used to manage transactions within a relational database. | COMMIT , ROLLBACK , SAVEPOINT , SET TRANSACTION |
SQL is a programming language, while MySQL is a popular, open-source RDBMS. MySQL is used to store and organize data, while SQL is used to access, edit, update and maintain data in MySQL.
Learn how to use MySQL in this short useful guide
A PRIMARY KEY
constraint is a column (or combination of columns) used to designate each table row with a unique identifier. You can think of a primary key as having a similar function to national government-issued identification numbers, a citizen’s Social Security Number, or a vehicle identification number (VIN).
Note: There’s a limit of one
PRIMARY KEY
constraint per table. All columns defined within aPRIMARY KEY
constraint must be defined asNOT NULL
.
A FOREIGN KEY
is a column or collection of fields in a table referencing a PRIMARY KEY
in another table. The table containing the primary key is known as the parent table, and the table containing the foreign key is called the child table.
For example, the PRIMARY KEY
in the parent table below is OwnerID
. The PRIMARY KEY
uniquely identifies individual pet owners.
OwnerID | LastName | FirstName | Address | PetCount |
---|---|---|---|---|
2498 | Smith | Bonnie | 123 Mango Street | 2 |
2499 | Brown | Thomas | 456 Papaya Way | 0 |
2450 | Goes | Rosemary | 789 Apple Court | 1 |
For this child table, the PRIMARY KEY
is PetID
, and the OwnerID
column is a FOREIGN KEY
because it references the primary key of another table.
PetID | PetName | Species | OwnerID |
---|---|---|---|
1 | Whiskers | Cat | 2450 |
2 | Gilgamesh | Cockatiel | 2499 |
3 | Enkidu | Cockatiel | 2499 |
Like the PRIMARY KEY
, the UNIQUE
constraint also ensures that each value is different from the others in its column. However, tables can have multiple columns with UNIQUE
constraints, unlike the PRIMARY KEY
constraint, limited to just one.
In SQL, a JOIN
clause combines rows of data in different tables with a shared column. You can SELECT
and return records with matching values in both tables based on this relationship.
There are four kinds of JOIN
clauses in SQL:
JOIN type |
Description |
---|---|
INNER JOIN |
Returns only the records with matching values in both tables. |
LEFT JOIN |
Returns records in the left table in addition to records with matching values in both tables. |
RIGHT JOIN |
Returns records in the right table, in addition to records with matching values in both tables. |
FULL OUTER JOIN or FULL JOIN |
Returns all records with matches in either the left or right tables. |
Study SQL joins in detail here
A JOIN
clause combines rows from two or more tables based on a related column between them. A self join is a regular join, but the table is joined with itself – this is extremely useful for comparisons within a table.
Joining a table with itself means that each table row is combined with itself and with every other row of the table.
In SQL, the cross join combines each row of the first table with each row of the second table. It is also known as the Cartesian join since it returns the Cartesian product of the sets of rows from the joined tables.
The WHERE
clause can be used to establish the first condition that groups and returns only the rows that meet that condition into a result set. Then, secondary conditions can be applied using the HAVING
clause to return only the groups within that set that meet your new criteria.
Differences | DELETE |
TRUNCATE |
---|---|---|
Type | DML (Data Manipulation Language) | DDL (Data Definition Language) |
Function | Used to remove specific rows or tuples from tables or relations. | Used to delete all rows or tuples from a table. |
WHERE |
Can contain WHERE clause. |
Cannot contain WHERE clause. |
Transaction logging | Row deletions are logged. | Deleted data pages are not logged. |
The TRUNCATE
command is faster than DELETE
, but unlike the DELETE
command, data cannot be rolled back after using it to recover data that has been mistakenly deleted.
In the context of this article, a query is a set of instructions written in a query language like SQL that allows an individual to access information held in a database.
A subquery or nested query is a query within a query.
There are two types of subqueries: Correlated and Non-correlated.
FROM
keyword of the main query.The UNION
operation combines the results of two or more SELECT
statements. For example, getting the UNION
of sets A and B, this operation would return all rows from both sets, excluding any duplicate rows.
The UNION ALL
operation does the same thing as UNION
, but includes duplicate rows in its result set.
The INTERSECT
operation combines the results of two SELECT
statements but only returns the rows with matching values in both sets.
The MINUS
operation combines the results of two SELECT
statements but only returns rows with values that belong to the first set of the result.
Normalization refers to the methods used to remove redundancies and inconsistencies in a database.
Denormalization refers to methods used to improve the performance of queries.
Normalization introduces more tables to a database, whereas Denormalization reduces the number of tables.
Scalar functions are defined by the user and return a single value (i.e., int, char, float, etc.) based on the input value.
Common SQL scalar functions:
CONCAT()
concatenates two or more character strings.FORMAT()
sets the format to display a collection of values.LEN()
calculates the total length of a given column.MID()
extracts substrings from a collection of string values.ROUND()
rounds the integer value for a numeric field.NOW()
returns the current date and time.RAND()
calculates a random collection of numbers of a given length.In SQL, aggregate functions (also known as group functions) are applied to a group of values (or all values) to calculate and return a single value.
Common SQL aggregate functions:
AVG
calculates the average or mean of all values in a group.COUNT
calculates the number of rows in group, including rows with NULL
values.MIN
and MAX
returns the smallest and largest value in a group, respectively.SUM
returns the sum of all non-NULL
values in a group.STDDEV
calculates the standard deviation.VARIANCE
calculates the variance.Instead of writing the same SQL query multiple times, you can save it as a stored procedure and call on it whenever necessary to execute it.
Store an SQL query:
CREATE PROCEDURE procedure_name
AS
sql_statements
GO;
Execute a stored procedure:
EXEC procedure_name;
The SELECT
statement is used in SQL queries to store specific data elements or fields from a table and return them in a result set.
The SELECT
syntax:
SELECT column_1, column_2, ...
FROM table_name;
To select all data elements from a table, use
SELECT * FROM table_name;
An SQL index is a lookup table used by the database search engine to find and retrieve data quickly. An index can help makeSELECT
and WHERE
clauses faster but can slow down the use of UPDATE
and INSERT
statements. ]
To create an index:
CREATE INDEX index_name ON table_name;
The basic SQL SELECT
statement contains three clauses:
SELECT
specifies the table columns to retrieveFROM
specifies the tables to accessWHERE
is optional and specifies which rows in the FROM
tables to useThe GROUP BY
clause is used with aggregate functions to group the result set according to specified columns.
The HAVING
clause functions similarly to the WHERE
clause but allows the use of aggregate functions.
The ORDER BY
clause sorts the result set in ascending (ASC
) or descending (DESC
) order according to a specified column.
When writing your SELECT
queries, make sure that your syntax follows this order:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Character manipulation functions can edit, change, or reformat character strings.
For example, you can concatenate two character strings by passing them into the CONCAT
function using a SELECT
query.
When you want to process result sets one row at a time, you can use a database cursor, a control structure that allows you to traverse records in a database. Cursors can be used to point to individual rows in a group of rows.
You can DECLARE
a cursor after any variable declaration.
DECLARE variable_name CHAR(20)
DECLARE cursor_name CURSOR FOR
SELECT column_name
FROM table_name
F(K, N)
, where K
is critical and N
is the number of slots containing a pointer and row.The order of rows in a clustered index corresponds to the order of rows in the database. A table can only have one clustered index at a time.
A non-clustered index functions similarly to a clustered index, but is slower and creates a separate entity within the table that references the original table. A table can have multiple non-clustered indices.
The ACID properties refer to properties that must be followed for transactions in a database management system to remain consistent.
An SQL schema is an abstract representation of logically structured data elements. Database schemas in SQL are defined at the logical level by a database user known as the schema owner.
Revise database schemas in 5 minutes with this helpful article
The alias (AS
) command makes columns or tables easier to read by giving them temporary names for the duration of a query.
You can use shallow cloning to create a copy of an existing table’s data structure and column attributes.
CREATE TABLE new_table LIKE table_1;
This command creates an empty table based on the parent table.
The SELECT DISTINCT
clause will only return unique values from a table.
The default ordering of data is ascending (ASC
). You can change the order by using the descending (DESC
) keyword with the ORDER BY
clause like so:
SELECT * FROM table_name ORDER BY column_name DESC;
LOWER
or LCASE
takes in a given character string and converts it to lower case.UPPER
or UCASE
takes in a given character string and converts it to upper case.The general syntax is:
SELECT column_name, group_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name
ORDER BY column_name
The character or CHAR
datatype stores fixed length character strings.
The variable character or VARCHAR
datatype stores variable length character strings.
CHAR
has better performance than VARCHAR
, but VARCHAR
can be useful for anticipating data values without a set length.
There are two types of functions in SQL:
User-Defined Functions (UDFs) are similar to functions found in programming languages. UDFs accept parameters, perform complex calculations, and return their results.
There are three types of UDFs:
SELECT
statement can be prepared by the return statement, and this statement defines the structure of the table that function returns.SQL is non-procedural and interacts directly with the database server. It’s easy to learn and use, but if you need to solve more complicated SQL problems and are willing to learn some more complex concepts, then PL/SQL can be a powerful tool.
PL/SQL is a procedural language that doesn’t interact directly with the database server but offers a faster processing speed and an expanded range of supported features. You can accomplish everything you need to do in SQL and more using PL/SQL.
PL/SQL:
An entity can be a real-world object that can be identified by a collection of related attributes or properties. An example of an entity in a zoo database might include zookeepers, veterinarians, different public outreach initiatives, or species of animals.
Relationships are connections between entities that are associated with each other.
The logical relationship between entities creates a database.
Collation is a configuration setting that specifies how a database sorts and compares data. Different collation rules can be configured to determine the correct character sequence used to sort the character data.
Collation sensitivity can be used to specify how different characters are treated.
Learn to become a database professional with our hands-on courses today
SQL is a query language used for managing data in a relational database system. It is key to roles like data scientist, software engineer, and data engineer. This path will teach you essential SQL queries, including SELECT, WHERE, DISTINCT, GROUP BY, and ORDER BY, along with advanced concepts like joins, subqueries, and aggregate functions. By mastering these skills, you'll be prepared for the software industry with a strong foundation in SQL.
Learning how to access, analyze, and understand information has become an invaluable skill to have, and the competition for individuals with SQL experience is stiff. Businesses are modernizing at a breakneck pace, and learning practical data analysis with SQL has proven essential for companies eager to translate available data into innovative strategies and tangible results. We hope that you continue to seek and find new opportunities to expand your knowledge.
Happy learning!
To get started on learning these concepts and more, check out this course at Educative!
If you’re ready to commit to a more challenging curriculum, get certified with one of our learning paths!
Free Resources