Shortest Encoding
Solve a medium-level problem of generating the shortest encoding from multiple strings using tries.
Problem statement
Given an array of strings, encode the given strings into a single reference string and an array of integers called indexes, such that the size of the word array is the same as the index array. Furthermore, for every word of the input, the reference string ends with a #
character, and for each index indices[i]
, the substring of indices[i]
and up to the next #
character is equal to words[i]
. Based on these conditions, return the length of the shortest reference string.
Example
Sample input
words: ["beat", "eat", "ball"]
Sample output
10
Explanation
A valid encoding would be "beat#ball#"
and indices = [0, 1, 5]
.words[0] = "beat"
, the substring of indices[0] = 0
to the next '#'
is placed in brackets in "(beat)#ball#"
. words[1] = "eat"
, the substring of indices[1] = 1
to the next '#'
is placed in brackets in "b(eat)#ball#"
. words[2] = "ball"
, the substring of indices[2] = 5
to the next '#'
is placed in brackets in "beat#(ball)#"
.
Try it yourself
Try to solve the problem yourself before reading the solution.
#include<iostream>#include<vector>using namespace std;int minimumLengthEncoding(vector<string>& words) {// your code goes herereturn -1;}
Intuition
Hashset-based approach
A brute-force solution is to build a set of words. Next, iterate on all the input words and remove all suffixes of every word from the set. Finally, the set will contain all the encodings of the words. Iterate on the set and return the sum of the word length + 1
for every word in the set. One is added to the size to accommodate for #
as a separator.
Let's try to understand this better for the example given above with these words: beat
, eat
, and ball
.
Set {"beat", "eat", "ball"}
Suffixes: {"t", "at", "eat", "at", "t", "all", "ll", "l"}
If any of the suffixes is present in the set eat
is deleted from the set {"beat", "ball"}
.
For the words remaining in the set, the length of word + 1
is added to find the length of the shortest encoding. In this case, the final length becomes