Encoder for TinyURL

Understand the inner details of an encoder that are critical for URL shortening.

Introduction

We've discussed the overall design of a short URL generator (SUG) in detail, but two aspects need more clarification:

  1. How does encoding improve the readability of the short URL?

  2. How are the sequencer and the base-58 encoder related in short URL generation?

Why use encoding

Our sequencer generates a 64-bit ID in base-10, which can be converted to a base-64 short URL. Base-64 is one of the most common encodings for generating alphanumeric strings. However, sticking to base-64 causes some inherent issues for this design problem: the generated short URL might be hard to read because of look-alike characters. O (capital o) is easily confused with 0 (zero), and I (capital i) with l (lowercase L). In addition, + and / should be avoided because other system-dependent encodings treat them specially; in a URL, for example, / separates path segments.
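To make the character count concrete, here is a minimal Python sketch that builds the standard base-64 alphabet and filters out the six characters flagged above. The names `BASE64_ALPHABET`, `LOOK_ALIKES`, `RESERVED`, and `readable_alphabet` are illustrative assumptions, and the order of the surviving characters is only a by-product of this filter, not necessarily the ordering used by our base-58 definition.

```python
import string

# Standard base-64 alphabet: A-Z, a-z, 0-9, '+' and '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

# Characters flagged in the text (names here are illustrative assumptions).
LOOK_ALIKES = set("O0Il")   # capital o, zero, capital i, lowercase L
RESERVED = set("+/")        # symbols with special meaning in other encodings


def readable_alphabet(alphabet: str = BASE64_ALPHABET) -> str:
    """Return the alphabet without the flagged characters, order preserved."""
    excluded = LOOK_ALIKES | RESERVED
    return "".join(ch for ch in alphabet if ch not in excluded)


remaining = readable_alphabet()
print(len(BASE64_ALPHABET), len(remaining))  # prints: 64 58
```

Dropping six of the 64 characters leaves exactly 58, which is where the base-58 name comes from.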

So, we remove these six characters from base-64 (which includes A-Z, a-z, 0-9, +, and /) and use base-58 instead for better readability. Let's look at our base-58 definition.