Shortest Encoding

Solve a medium-level problem of generating the shortest encoding from multiple strings using tries.

Problem statement

Given an array of strings, encode the given strings into a single reference string and an array of integers called indexes, such that the size of the word array is the same as the index array. Furthermore, for every word of the input, the reference string ends with a # character, and for each index indices[i], the substring of SS starting from indices[i] and up to the next # character is equal to words[i]. Based on these conditions, return the length of the shortest reference string.

Example

Sample input

words: ["beat", "eat", "ball"]

Sample output

10

Explanation

A valid encoding would be "beat#ball#" and indices = [0, 1, 5].

words[0] = "beat", the substring of SS starting from indices[0] = 0 to the next '#' is placed in brackets in "(beat)#ball#".

words[1] = "eat", the substring of SS starting from indices[1] = 1 to the next '#' is placed in brackets in "b(eat)#ball#".

words[2] = "ball", the substring of SS starting from indices[2] = 5 to the next '#' is placed in brackets in "beat#(ball)#".

Try it yourself

Try to solve the problem yourself before reading the solution.

#include<iostream>
#include<vector>
using namespace std;
int minimumLengthEncoding(vector<string>& words) {
// your code goes here
return -1;
}
Solve shortest encoding in C++

Intuition

Hashset-based approach

A brute-force solution is to build a set of words. Next, iterate on all the input words and remove all suffixes of every word from the set. Finally, the set will contain all the encodings of the words. Iterate on the set and return the sum of the word length + 1 for every word in the set. One is added to the size to accommodate for # as a separator.

Let's try to understand this better for the example given above with these words: beat, eat, and ball.

Set SS: {"beat", "eat", "ball"}

Suffixes: {"t", "at", "eat", "at", "t", "all", "ll", "l"}

If any of the suffixes is present in the set SS, it's deleted from the set. In this case, eat is deleted from the set SS. So the set becomes {"beat", "ball"}.

For the words remaining in the set, the length of word + 1 is added to find the length of the shortest encoding. In this case, the final length becomes ((4+1)+(4+1))=10((4+1)+(4+1)) = 10 ...

Access this course and 1400+ top-rated courses and projects.