Shortest Encoding

Solve a medium-level problem of generating the shortest encoding from multiple strings using tries.

Problem statement

Given an array of strings, encode the given strings into a single reference string and an array of integers called indexes, such that the size of the word array is the same as the index array. Furthermore, for every word of the input, the reference string ends with a # character, and for each index indices[i], the substring of $S$ starting from indices[i] and up to the next # character is equal to words[i]. Based on these conditions, return the length of the shortest reference string.

Example

Sample input

Explanation

A valid encoding would be "beat#ball#" and indices = [0, 1, 5].

words[0] = "beat", the substring of $S$ starting from indices[0] = 0 to the next '#' is placed in brackets in "(beat)#ball#".

words[1] = "eat", the substring of $S$ starting from indices[1] = 1 to the next '#' is placed in brackets in "b(eat)#ball#".

words[2] = "ball", the substring of $S$ starting from indices[2] = 5 to the next '#' is placed in brackets in "beat#(ball)#".

Try it yourself

Try to solve the problem yourself before reading the solution.

Intuition

Hashset-based approach

A brute-force solution is to build a set of words. Next, iterate on all the input words and remove all suffixes of every word from the set. Finally, the set will contain all the encodings of the words. Iterate on the set and return the sum of the word length + 1 for every word in the set. One is added to the size to accommodate for # as a separator.

Let's try to understand this better for the example given above with these words: beat, eat, and ball.

Set $S$ : {"beat", "eat", "ball"}

Suffixes: {"t", "at", "eat", "at", "t", "all", "ll", "l"}

If any of the suffixes is present in the set $S$ , it's deleted from the set. In this case, eat is deleted from the set $S$ . So the set becomes {"beat", "ball"}.

For the words remaining in the set, the length of word + 1 is added to find the length of the shortest encoding. In this case, the final length becomes $((4+1)+(4+1)) = 10$ ...

Access this course and 1400+ top-rated courses and projects.

Preview Free Lessons→

Preview Free Lessons

Introduction to Tries

Prefix Search

Suffix Search

Bitwise Tries

Pattern Matching

File Systems

Trie Traversal

Search Engine

Miscellaneous

Conclusion

Shortest Encoding

Problem statement

Example

Try it yourself

Intuition