Index in SQL is a lookup table that enables us to quickly retrieve data from the database. The index is applied to column(s) in a table.
Indexes help us by making search and data retrieval queries faster. SQL can easily identify the row(s) using the index and quickly fetch the data we need.
However, on every INSERT/UPDATE query on the table, the indexes associated with that table are also updated, which make the writes to the table slower. With multiple indexes, things can become pretty bad.
Ideally, we want to have a low number of indexes that optimize most of the data retrieval we need.
We will be using EXPLAIN
, which provides insights into the query. The one we are most interested in is which index the query is utilizing.
The EXPLAIN
output has two important columns, key
, which is the index that is actually chosen.
You can find more details about the explanation output in the official docs.
Let’s see an example. We create an index using:
CREATE INDEX index_name ON table_name (column_name);
CREATE TABLE person(id INT NOT NULL,first_name VARCHAR(15) NOT NULL,last_name VARCHAR(15) NOT NULL,age INT NOT NULL,PRIMARY KEY(id));/* Index on First Name */CREATE INDEX person_fname ON person (first_name);/* Will tell us whether our query uses the intended index */explain SELECT * FROM person WHERE first_name = "Donald" \G
We need to isolate the column for it to utilize the index. Isolating means that the column should not be part of function or expression.
Let’s see an example:
CREATE TABLE person(id INT NOT NULL,first_name VARCHAR(15) NOT NULL,last_name VARCHAR(15) NOT NULL,age INT NOT NULL,PRIMARY KEY(id));/* Index on Age */CREATE INDEX person_fname ON person (age);/* Column not isolated, Won't use Index */explain SELECT * FROM person WHERE age + 1 = 18 \G/* New Line in output, Please Scroll down in output*/SELECT "";/* This will use index */explain SELECT * FROM person WHERE age= 18 + 1 \G
As discussed earlier, it would not be efficient to create a separate index for each column. So, instead, we can set up a composite index.
Composite or multi-column indexes sound great, but they have one disadvantage: the order of the columns in the index is critical. For every column in the index, a column on its left must be included in the query.
We start from the leftmost column, and although we cannot skip over columns, we can ignore columns on the right.
For example:
INDEX(col_1, col_2, col_3 ... col_n)
The query, Q(col_1, col_2, col3)
would use the index, even though columns on the right of col_3
are not in the query. However, the query, Q(col_2, col_3, ..., col_n)
would not be able to.
Let’s see another example:
CREATE TABLE persons(id INT NOT NULL,first_name VARCHAR(15) NOT NULL,last_name VARCHAR(15) NOT NULL,age INT NOT NULL,address VARCHAR(64) NOT NULL,PRIMARY KEY(id));/* Multi Column Index */CREATE INDEX person_info ON persons (first_name, last_name, age);/* Following Query Uses above Index */explain SELECT * FROM persons WHERE first_name = "Donald" and last_name = "Qualls" and age = 16 \G/* Unable to use the above Index, because we skipped first_name */SELECT "";explain SELECT * FROM persons WHERE last_name="Qualls" and age > 20 \G/* Uses Index as only Columns on Right of the last column used are ignored */SELECT "";explain SELECT * FROM persons WHERE first_name="Donald" \G
Composite indexes also don’t optimize for columns in the index after the first range condition (>, <, like, etc.
) on a column.
So, for INDEX(col_1, col_2, col_3)
and query Q(col_1, col_2 > something, col_3)
, only col_1
and col_2
will be optimized using the Index.
Indexes are stored separately, and SQL uses them to find rows in the actual table quickly.
A covering index is when all required fields for a query are included, so data is taken from the index table instead of the actual table.
A covering index can be identified from the Explain
output when the Extra
property has the value Using Index
.
CREATE TABLE Vehicles(id INT NOT NULL,name VARCHAR(15) NOT NULL,model VARCHAR(15) NOT NULL,year INT NOT NULL,price INT NOT NULL,PRIMARY KEY(id));/* Multi Column Index */CREATE INDEX vehicle_info ON Vehicles (name, year, model);/* Covering, as All information is present in Index*/explain SELECT name, model, year FROM VehiclesWHERE name = 'Nissan' and model = 'modelA' and year = 2018 \G/* Not Covering, price is not in index*/explain SELECT name, model, year, price FROM VehiclesWHERE name = 'Nissan' and model = 'modelA' and year = 2018 \G/* Using Where with Using Index, Why? Due to Range Condition on year, model column is not optimized */explain SELECT name, model, year FROM VehiclesWHERE name = 'Nissan' and model = 'modelA' and year > 2018 \G
Always use indexes on columns with good Selectivity.
Selectivity is defined as the (no. of distinct values * 100) / total values
. For example, the Index on the Gender column has poor selectivity as it mostly has two values. Suppose there are 10,000 records. Then, the selectivity of this column would be 0.02%.
The more distinct values the column has, the better it will perform. For composite-index
, try to keep the columns with high selectivity on the left (or start).
Avoid redundant indexes.
For example,INDEX (A, B)
and INDEX(A)
are redundant as A is leftmost in both and we can remove them later.
Always design your indexes by looking at your queries.
Do not add indexes during Database design or before writing any queries. Index the one that is frequently used or becoming a bottleneck. Analyze, test, and implement.
Free Resources