Solution: Alien Dictionary

Statement

In this challenge, you are given an array of words written in an alien language, where the words are sorted lexicographically by the rules of this language. Surprisingly, the aliens also use English lowercase letters, but possibly in a different order.

Given an array of words written in the alien language, you have to return a string of unique letters sorted in the lexicographical order of the alien language as derived from the array of words.

If there’s no solution, that is, no valid lexicographical ordering, you can return an empty string.

Note: The lexicographic order of a given language is defined by the order in which the letters of its alphabet appear. In English, the letter “n” appears before the letter “r” in the alphabet. As a result, in two words that are the same up to the point where one features “n” and the other features “r,” the former is considered the lexicographically smaller word of the two. For this reason, “ban” is considered lexicographically smaller than “bar.”

Similarly, if an input contains words followed by their prefix, such as “educated” and then “educate,” these cases will never result in a valid alphabet because in a valid alphabet, prefixes are always first.

Constraints:

  • 11 \leq words.length 103\leq 10^3
  • 11 \leq words[i].length 20\leq 20
  • All characters in words[i] are English lowercase letters.

Solution

So far, you’ve probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.

Naive approach

The naive approach is to generate all possible orders of alphabets in the alien language and then iterate over them, character by character, to select the ones that satisfy the dictionary dependencies. So, there’d be O(u!) permutations, where uu is the number of unique alphabets in the alien language, and for each permutation, we’d have to check if it’s a valid partial order. That requires comparing against the dictionary words repeatedly.

This is very expensive since there are an exponential number of possible orders (u!{u!}) and only a handful of valid ones. On top of that, there’d be additional effort to compare them against the dictionary. The time complexity for this approach is O(u!)O(u!). The space complexity is O(1).O(1).

Optimized approach using topological sort

We can solve this problem using the topological sort pattern. Topological sort is used to find a linear ordering of elements that have dependencies on or priority over each other. For example, if AA is dependent on BB or BB has priority over AA, then BB is listed before AA in topological order.

Using the array of words, we identify the relative precedence order of the letters in the words and generate a graph to represent this ordering. To traverse a graph, we can use breadth-first search to find the letters’ order.

We can essentially map this problem to a graph problem, but before exploring the exact details of the solution, there are a few things that we need to keep in mind:

  • The letters within a word don’t tell us anything about the relative order. For example, the word “educative” in the array doesn’t tell us that the letter “e” is before the letter “d.”

  • The input can contain words followed by their prefix, such as “educated” and then “educate.” These cases will never result in a valid alphabet because in a valid alphabet, prefixes are always first. We need to make sure our solution detects these cases correctly.

  • There can be more than one valid alphabet ordering. It’s fine for our algorithm to return any one of them.

  • The output dictionary must contain all unique letters within the words array, including those that could be in any position within the ordering. It shouldn’t contain any additional letters that weren’t in the input.