...

/

Feature #2: Design Search Autocomplete System

Feature #2: Design Search Autocomplete System

Implementing the "Design Search Autocomplete System" feature for our "Search Engine" project.

Solution

To design this system, we will again use the trie data structure. Instead of simply storing the words in the prefix tree, as we did in the WordDictionary, we will now store the query strings. The AutocompleteSystem class will act as a trie that keeps a record of the previous queries and assigns them a rank based on their number of occurrences.

We will implement the AutocompleteSystem class as follows:

  • Constructor: In the constructor, we will feed the historical data into the system and create a trie out of it. We will initialize the root node of trie and call the addRecord() function to add all the records.

  • addRecord() function: This function inserts records in the trie by creating new nodes. Its functionality is similar to the insertWord() function that we discussed in Feature #1: Store and Fetch Words. Each node of the trie will have:

    • A children dictionary
    • A Boolean called isEnd to set the end of the query sentence
    • A new variable called data that is optional, but we can use it to store the whole query sentence in the last character of the sentence. This will increase space complexity but can make computation easier.
    • A rank variable to store the number of occurrences

    In the code below, you will notice that we are storing the rank as a negative value. There is a valid reason for doing this unintuitive step. Later, you will see that we will be using the sorted() method to obtain the top three results, and this negative rank will play a significant role. This will be explained in more detail in the explanation for the autoComplete() function.

  • search() function: This function checks to see if the first character exists in its children beginning with the root node. If it exists, we move on to the node with the first character and check its children for the next character. If the node corresponding to a character is not found, we return []. If the search string is found, the dfs() helper function is called on the node with the last character of the input query.

  • dfs() function: This function is the helper function that returns all the possible queries in the record that match the input. First, it will check if the node has isEnd set to True; if it does, the node’s rank and data will be appended as a tuple in an output list ret. Then, DFS exploration will be performed on all children nodes. All the possible queries will be added in ret and returned at the end.

  • autoComplete() function: This function checks if the input string is not the end of string delimiter "#". If it is not the end of the input string, we append the value to the keyword member variable. Then, we call the search() function, which returns the list of tuples as discussed above. On the other hand, if the input string is the end of the input, we add the value present in keyword to the trie by calling addRecord(). Next, the value of the keyword variable is reset to an empty string. Before returning, you can see that we do some computation on the result list. The list that we ...