What is an inverted index?

An inverted index is a data structure that enables efficient full-text search over a database. It is a core component of information retrieval systems and search engines: it stores a mapping from words (or any other kind of search term) to their locations in a database table or document.

Why do we need an inverted index?

I will explain this whole concept with an example.

Let’s assume we have a Quotes table in our database. Here is what the table will look like:

quote_id quote_text
101 Winter is coming
102 Chaos is a ladder
103 Are you coming, mylord
104 Winter has come

Let’s write a SQL query to find all the quotes that contain the text ‘winter’:

SELECT * FROM Quotes WHERE quote_text LIKE '%winter%';

This query looks for the text ‘winter’ in every row, but it is very expensive: a pattern with a leading wildcard generally cannot use an ordinary index, so the database has to scan the whole table. Imagine millions of users searching for winter quotes when a new season of Game of Thrones comes out; this would hurt the database throughput.

In this kind of scenario, where we have to do a full-text search in a database, it’s best to create an inverted index. This index allows for fast full-text searches at the cost of extra storage and extra processing whenever rows are added or updated.

A basic inverted index

This is how a basic inverted index looks for the Quotes table described above, with every term lowercased and punctuation removed.

term quote_id
winter 101,104
is 101,102
coming 101,103
chaos 102
a 102
are 103
you 103
mylord 103
has 104
come 104

Once this index is constructed, as shown in this table, we can find all quotes with the term ‘winter’ with just a quick lookup.
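
As a rough illustration of that lookup, here is a minimal Python sketch that builds the index above in memory. The names tokenize, build_index, and search are made up for this example, and lowercasing plus stripping punctuation is an assumed normalization step, not something a particular database does for you.

```python
import re
from collections import defaultdict

# Rows of the Quotes table from the example above.
quotes = {
    101: "Winter is coming",
    102: "Chaos is a ladder",
    103: "Are you coming, mylord",
    104: "Winter has come",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(rows):
    """Map each term to the sorted list of quote_ids that contain it."""
    index = defaultdict(set)
    for quote_id, text in rows.items():
        for term in tokenize(text):
            index[term].add(quote_id)
    return {term: sorted(ids) for term, ids in index.items()}

index = build_index(quotes)

def search(term):
    """Return the quote_ids for a single term, or an empty list if it is unknown."""
    return index.get(term.lower(), [])

print(search("winter"))  # [101, 104]
print(search("coming"))  # [101, 103]
```

The search is a single dictionary lookup, so its cost does not depend on the number of rows in the table; the work has been moved to build_index, which has to run again (or be updated incrementally) whenever quotes change.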

Improving the inverted index

While a basic inverted index can answer queries that have an exact match in the database, it may not work in all scenarios. For example:

  • Users may search for a term that is not present exactly in the inverted index but is still related to one that is. For example, searching for snow or snowing in place of snowfall. We can address this issue through stemming, a technique that reduces words to their root form by removing affixes. For example, the root form of the words eating, eats, and eaten is eat.

  • Or they can search for a synonym. For example, searching for glad or elated in place of happy. To solve this, the synonyms of the searched term are also looked up in the inverted index.

  • Users generally search for phrases, such as fastest car, rather than single words like fastest or car. To support phrase searching, word-level inverted indexes also record the position of each word in the document, which improves the search results.
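
The three improvements above can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the sample documents, the SYNONYMS table, and especially stem, which is a toy suffix stripper and not a real stemming algorithm (it does not handle forms such as eaten). A production system would use an established stemmer and a curated thesaurus instead.

```python
import re
from collections import defaultdict

# Hypothetical documents, chosen only to exercise the three ideas.
docs = {
    1: "The fastest car wins",
    2: "A fast red car",
    3: "He was happy eating",
}

# Hypothetical synonym table; a real system would load a curated thesaurus.
SYNONYMS = {"glad": ["happy"], "elated": ["happy"]}

def stem(word):
    """Toy suffix stripper (NOT a real stemmer): drops a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into alphanumeric words, and stem each word."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]

def build_positional_index(documents):
    """Map term -> {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index

index = build_positional_index(docs)

def search_term(word):
    """Look up a word and its synonyms; return the matching doc ids."""
    candidates = [word.lower()] + SYNONYMS.get(word.lower(), [])
    result = set()
    for candidate in candidates:
        result.update(index.get(stem(candidate), {}))
    return sorted(result)

def search_phrase(phrase):
    """Return doc ids where the phrase's terms occur at consecutive positions."""
    terms = tokenize(phrase)
    if not terms:
        return []
    matches = []
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + offset in index.get(term, {}).get(doc_id, [])
                   for offset, term in enumerate(terms)):
                matches.append(doc_id)
                break
    return sorted(matches)

print(search_term("glad"))           # [3]
print(search_term("eats"))           # [3]
print(search_phrase("fastest car"))  # [1]
print(search_phrase("car fastest"))  # []
```

Because positions are stored, the phrase fastest car matches only document 1, where the two words are adjacent and in that order; a basic index without positions could only report documents that contain both words somewhere.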