Data Structure for Storing Prefixes

Learn about the efficient data structure that’s used to store the search suggestions.

The trie data structure

Before we move on to the discussion of the detailed design of the typeahead suggestion system, we must choose an efficient data structure to store the prefixes. Prefixes are the groups of characters a user types. The issue we’re attempting to tackle is that we have many strings that we need to store in a way that allows users to search for them using any prefix. Our service suggests the next words that match the provided prefix. Let’s suppose our database contains the phrases UNITED, UNIQUE, UNIVERSAL, and UNIVERSITY. Our system should suggest “UNIVERSAL” and “UNIVERSITY” when the user types “UNIV.”

There should be a method that can efficiently store our data and help us conduct fast searches because we have to handle a lot of requests with minimal latency. We can’t rely on a database for this because providing suggestions from the database takes longer as compared to reading suggestions from the RAMRecall from the Back-of-the-Envelope Calculation chapter that reading 1MB of data sequentially from RAM takes 9 microseconds while from persistent storage like SSD takes 200 microseconds.. Therefore, we need to store our index in memory in an efficient data structure. However, for durability and availability, this data is stored in the database.

The trie (pronounced “try”) is one of the data structures that’s best suited to our needs. A trie is a tree-like data structure for storing phrases, with each tree node storing a character in the phrase in order. If we needed to store UNITED, UNIQUE, UNIVERSAL, and UNIVERSITY in the trie, it would look like this: