An inverted index is a data structure that enables efficient full-text search in a database. It is a core component of information retrieval systems and search engines, and it stores a mapping from words (or any other kind of search term) to their locations in a database table or document.
I will explain this whole concept with an example.
Let’s assume we have a Quotes table in our database. Here is what the table looks like:
quote_id | quote_text |
---|---|
101 | Winter is coming |
102 | Chaos is a ladder |
103 | Are you coming, mylord |
104 | Winter has come |
Let’s write a SQL query to find all the quotes that contain the text ‘winter’:
SELECT * FROM Quotes WHERE quote_text LIKE '%winter%';
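This query checks every row for the text ‘winter’, but it is slow on large tables, because a pattern with a leading wildcard generally forces the database to scan the text of every row.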
In this kind of scenario, where we have to do a full-text search in a database, it’s best to create an inverted index. This index allows for fast full-text searches at the cost of increased processing whenever a row is added or updated, since the index has to be kept in sync with the data.
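To make the cost of the LIKE query concrete, here is a minimal sketch that loads the example table into an in-memory SQLite database and asks SQLite how it would execute the query. The choice of SQLite and the variable names are assumptions made for illustration; other databases report their query plans differently.

```python
import sqlite3

# In-memory database holding the Quotes table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quotes (quote_id INTEGER PRIMARY KEY, quote_text TEXT)")
conn.executemany(
    "INSERT INTO Quotes VALUES (?, ?)",
    [
        (101, "Winter is coming"),
        (102, "Chaos is a ladder"),
        (103, "Are you coming, mylord"),
        (104, "Winter has come"),
    ],
)

query = "SELECT quote_id FROM Quotes WHERE quote_text LIKE '%winter%'"

# SQLite's LIKE is case-insensitive for ASCII letters by default,
# so the pattern 'winter' also matches 'Winter'.
matching_ids = sorted(row[0] for row in conn.execute(query))

# The last column of each EXPLAIN QUERY PLAN row describes the access strategy.
plan_details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(matching_ids)
print(plan_details)
```

Here `matching_ids` contains quotes 101 and 104, and the plan should report a scan of the Quotes table (the exact wording varies between SQLite versions), meaning every row is examined.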
This is how a basic inverted index will look for the Quotes table described above.
term | quote_id |
---|---|
winter | 101, 104 |
is | 101, 102 |
coming | 101, 103 |
chaos | 102 |
a | 102 |
ladder | 102 |
are | 103 |
you | 103 |
mylord | 103 |
has | 104 |
come | 104 |
Once this index is constructed, as shown in this table, we can find all quotes containing the term ‘winter’ with a single lookup instead of checking every row.
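The construction and lookup described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the `tokenize`, `build_inverted_index`, and `search` names are hypothetical, and the tokenizer assumes that lowercasing and splitting on non-letters is good enough for this example.

```python
import re
from collections import defaultdict

# The rows of the Quotes table, keyed by quote_id.
quotes = {
    101: "Winter is coming",
    102: "Chaos is a ladder",
    103: "Are you coming, mylord",
    104: "Winter has come",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms (a simplistic rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the ids of the documents containing the term, or an empty list."""
    return index.get(term.lower(), [])


index = build_inverted_index(quotes)
print(search(index, "winter"))  # [101, 104]
print(search(index, "coming"))  # [101, 103]
```

The lookup is a single dictionary access, so its cost does not grow with the number of rows the way a scan does. The price is paid in `build_inverted_index`, which has to process every row up front.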
While a basic inverted index can answer queries that have an exact match in the database, it may not work in all scenarios. For example:
Users may search for a term that is not present exactly in the inverted index but is still related to an indexed term, for example, searching for snow or snowing instead of snowfall. We can address this issue through stemming, a technique that extracts the root form of a word by removing its affixes. For example, the root form of the words eating, eats, and eaten is eat.
Or they can search for a
Users generally search for
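Returning to stemming, the affix-removal idea described above can be illustrated with a deliberately naive suffix stripper. The suffix list, the minimum stem length, and the `naive_stem` name are all assumptions chosen for this example; real systems typically rely on an established stemming algorithm with a far more complete rule set.

```python
# Suffixes tried in order (an illustrative list, not a real stemmer's rules).
SUFFIXES = ("ing", "en", "s")

# Refuse to strip a suffix if fewer than this many characters would remain,
# so that short words such as "is" or "has" are left untouched.
MIN_STEM_LENGTH = 3


def naive_stem(word):
    """Strip the first matching suffix from the word, if enough of it remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return word


for word in ("eating", "eats", "eaten", "eat"):
    print(word, "->", naive_stem(word))
```

All four words reduce to eat, so applying the same function to both the indexed terms and the search term lets a query for eating match a document that contains eats. The sketch is easy to break: it turns coming into com while leaving come unchanged, which is exactly the kind of case a real stemmer is designed to handle.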