Encoder for TinyURL
Understand the inner details of an encoder that are critical for URL shortening.
Introduction
We've discussed the overall design of a short URL generator (SUG) in detail, but two aspects need more clarification:
How does encoding improve the readability of the short URL?
How are the sequencer and the base-58 encoder related in short URL generation?
Why use encoding
Our sequencer generates a 64-bit ID in base-10, which can be converted to a base-64 short URL. Base-64 is the most common encoding for generating alphanumeric strings. However, sticking to base-64 has some inherent issues for this design problem: the generated short URL might be hard to read because of look-alike characters. Characters like O (capital o) and 0 (zero), or I (capital I) and l (lowercase L), can be confused with each other, while characters like + and / should be avoided because of other system-dependent encodings (for example, / is the path separator in a URL).
So, we drop these six characters and use base-58 instead of base-64 (which includes A-Z, a-z, 0-9, +, and /) for better readability. Let's look at our base-58 definition.
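The removal of the six characters can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's official definition: the names `BASE58_ALPHABET` and `encode_base58` are hypothetical, and the character ordering (digits, then uppercase, then lowercase) is an assumption that may differ from the mapping used elsewhere in the design.

```python
import string

# The 64 characters the text lists for base-64: A-Z, a-z, 0-9, + and /.
# Assumption: digits are placed first; the ordering is illustrative only.
BASE64_CHARS = string.digits + string.ascii_uppercase + string.ascii_lowercase + "+/"

# The six characters dropped: look-alikes (O, 0, I, l) and the
# system-dependent symbols (+, /).
EXCLUDED = set("O0Il+/")

# Filtering the 64 characters leaves 58.
BASE58_ALPHABET = "".join(ch for ch in BASE64_CHARS if ch not in EXCLUDED)


def encode_base58(number: int) -> str:
    """Convert a non-negative base-10 integer to a base-58 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE58_ALPHABET[0]
    digits = []
    while number > 0:
        # Repeatedly divide by 58; each remainder selects one character.
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))
```

Under these assumptions, `len(BASE58_ALPHABET)` is 58 and none of the six excluded characters can appear in an encoded string, whatever ID the sequencer produces.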