Feature #3: Find Dictionary
Implementing the "Find Dictionary" feature for our "Cyber Security" project.
Description
We use a network protocol that encrypts all application messages using a proprietary scheme. The encryption scheme has a unique property that the sequence of encrypted messages in a session appears to be in sorted order according to a secret dictionary. However, the dictionary is not transmitted for security purposes.
Before the sender starts transmitting actual messages, it sends several encrypted training messages to the receiver. The sender guarantees that the training messages will follow the lexicographic order according to the unknown dictionary.
The receiver must reverse engineer the training messages and generate the dictionary for future communication with the sender. If the order of the messages is invalid, the receiver generates an empty dictionary and asks the sender to retransmit the training messages.
For simplicity’s sake, we can assume that the encrypted contents of the messages only consist of English lowercase letters.
Let’s review a few examples below:
Solution
We can essentially map this problem to a graph problem but, before exploring the exact details of the solution, there are a few things that we need to keep in mind:
-
The letters within a message don’t tell us anything about the relative order. For example, the message
educative
in the list does not tell us that the lettere
is before the letterd
. -
The input can contain messages followed by their prefix, for example,
educated
and theneducate
. These cases will never result in a valid alphabet (because in a valid alphabet, prefixes are always first). We’ll need to make sure our solution detects these cases correctly. -
There can be more than one valid alphabet ordering. It is fine for our algorithm to return any one of them.
-
The output dictionary must contain all unique letters within the messages’ list, including those that could be in any position within the ordering. It should not contain any additional letters that were not in the input.
Now back to the graph problem part, we can break this particular problem into three parts:
-
Extract the necessary information to identify the dependency rules from the messages. For example, in the messages provided in the slides above,
["decode", "interview"]
, the letterd
comes beforei
. -
With the gathered information, we can put these dependency rules into a directed graph with the letters as nodes and the dependencies (order) as the edges.
-
Lastly, we can sort the graph nodes topologically to generate the letter ordering (dictionary).
Let’s look at each part ...
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.