Advanced Indexing
Explore the tools available for query performance analysis and query optimization in PostgreSQL.
We'll cover the following
The PostgreSQL documentation about indexes covers everything we need to know, in detail, including the following:
- Multicolumn indexes
- Indexes and ORDER BY
- Combining multiple indexes
- Unique indexes
- Indexes on expressions
- Partial indexes
- Partial unique indexes
- Index-only scans
There are more indexes so consider reading this PostgreSQL chapter in its entirety as the content is not repeated in this course. Still, we’ll need it to make informed decisions about our indexing strategy.
Adding indexes
Deciding which indexes to add is central to our indexing strategy. Not every query needs to be that fast, and the requirements are mostly user-defined. That said, a general system-wide analysis can be achieved thanks to the PostgreSQL extension pg_stat_statements
.
Once this PostgreSQL extension is installed and deployed, this needs a PostgreSQL restart because it needs to be registered in shared_preload_libraries
. Then, it’s possible to have a list of the most common queries in terms of the number of times the query is executed and the cumulative time it took to execute the query.
Query’s performance analysis
We can begin our indexing needs analysis by listing every query that averages out to more than 10 milliseconds or some other sensible threshold for our application. The only way to understand where time is spent in a query is by using the explain
command and reviewing the query plan. From the documentation of the command:
PostgreSQL devises a query plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. We can use the
explain
command to see what query plan the planner creates for any query. Plan reading is an art that requires some experience to master, but this section attempts to cover the basics.
Here’s a very rough guide to using explain
for fixing query performances:
-
Use the spelling below when using
explain
to understand the runtime characteristics of our queries:explain (analyze, verbose, buffers) <query here>;
-
In particular, when we’re new to reading query plans, use visual tools such as PostgreSQL Explain Visualizer or the one included in pgAdmin.
-
First, check for row count differences between the estimated and the effective numbers.
Good statistics are critical to the PostgreSQL query planner, and the collected statistics need to be reasonably up-to-date. When there’s a huge difference between estimated and effective row counts (several orders of magnitude, a thousand times off or more), check to see if tables are analyzed frequently enough by the Autovacuum Daemon, then check if we should adjust our statistics target.
-
Finally, check for time spent doing sequential scans of our data with a filter step, as that’s the part that a proper index might be able to optimize.
Remember Amdahl’s law when optimizing any system: if some step takes 10 percent of the run time, then the best optimization we can reach from dealing with this step is 10 percent less, and usually, that’s by removing the step entirely.
This very rough guide doesn’t take into account costly functions and expressions which may be indexed thanks to indexes on expressions nor ordering clauses that might be derived directly from a supporting index.
Query optimization in SQL
We’re covering all the SQL capabilities that we can use to retrieve exactly the result set needed by our application.
The vast majority of slow queries found in the wild are still queries that return way too many rows to the application, straining the network and the server’s memory. Returning millions of rows to an application displays a summary in a web browser is far too common.
Get hands-on with 1300+ tech skills courses.