Encoder for TinyURL

Understand the inner details of an encoder that is critical for URL shortening.

Introduction

We have discussed the overall design of a short URL generator (SUG) in detail, but two aspects need more clarification:

How does encoding improve the readability of the short URL?
How the sequencer and the base-58 encoder in the short URL generation are related?

Why Encoding?

Our sequencer generates a 64-bit ID in base-10 which can be converted to a base-64 short URL; base-64 is the most common encoding for alphanumeric strings generation. However, there are some inherent issues with sticking to the base-64 for this design problem: the generated short URL might have readability issues because of look-alike characters. Characters like O(capital o) and 0(zero), I (capital I), and l (lower case L) can be confused while characters like + and / should be avoided because of other system-dependent encodings.

We, therefore, slash out the six characters and use base-58 instead of base-64 (includes A-Z, a-z, 0-9, + and /) for enhanced readability purposes. Let's have a look at our base-58 definition.

The highlighted cells contain the succeeding characters of the omitted ones -- 0, O, I and l.

Base 10 -> Base 58

As we are converting base-10 numeric IDs to base-58 alphanumeric IDs, explaining the conversion process will be helpful in grasping the underlying mechanism as well as the overall scope of the SUG. To achieve the above functionality, we use the modulus function.

Process: We keep diving the base-10 number by 58, making note of the remainder at each step. We stop where there is no remainder left. Then we assign the character indexes to the remainders, starting from assigning the recent-most remainder to the left-most place and the oldest remainder to the right-most place.

Example: Let's assume that the selected unique ID is 2468135791013. Following are the steps depicting the remainder calculations:

Base-10 = 2468135791013

$2468135791013 \ \% \ 58 = 17$
$42554065362 \ \% \ 58 = 6$
$733690782 \ \% \ 58 = 4$
$12649841 \ \% \ 58 = 41$
$218100 \ \% \ 58 = 20$
$3760 \ \% \ 58 = 48$
$64 \ \% \ 58 = 6$
$1 \ \% \ 58 = 1$

Now, we need to write the remainders in the recent-most to the oldest order.

Base-58 = [1] [6] [48] [20] [41] [4] [6] [17]

Using the table above, we can write the associated characters with the above-written remainders.

Base-58 = 27qMi57J

Both the base-10 numeric IDs and base-64 alphanumeric IDs have 64-bits, as per our sequencer design.

Let's see the above example with the help of the following illustration.

Base-58 -> Base-10

The decoding process holds equal importance as the encoding process, as we used a decoder in case of custom short URLs generation, explained in the design lesson.

Process: The process of converting a base-58 number into a base-10 number is also straightforward: we just need to multiply each character index (value column from the table above) by the number of 58s that position holds, and add all the individual multiplication results.

Example: Let's reverse engineer the above example to see how decoding works.

Base-58: 27qMi57J

$2_{58} = 1 \times 58^{7} = 2207984167552$ ...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy

Value	Character	Value	Charcter	Value	Character	Value	Charatcer
0	1	15	G	30	X	45	n
1	2	16	H	31	Y	46	o
2	3	17	J	32	Z	47	p
3	4	18	K	33	a	48	q
4	5	19	L	34	b	49	r
5	6	20	M	35	c	50	s
6	7	21	N	36	d	51	t
7	8	22	P	37	e	52	u
8	9	23	Q	38	f	53	v
9	A	24	R	39	g	54	w
10	B	25	S	40	h	55	x
11	C	26	T	41	i	56	y
12	D	27	U	42	j	57	z
13	E	28	V	43	k
14	F	29	W	44	m

Introduction

Abstractions

Non-functional System Characteristics

Back-of-the-Envelope Calculations

Building Blocks

Domain Name System (DNS)

Sequencer

Rate Limiter

Distributed Cache

Blob Store

Content Delivery Network (CDN)

Load Balancers

Key-Value Store

Distributed Messaging Queue

Pub-sub

Distributed Task Scheduler

Distributed Search

Distributed Logging

Distributed Monitoring

Monitoring Server Side Errors

Monitoring Client Side Errors

Databases

Sharded Counters

Concluding Building Blocks

Design YouTube

Design Quora

Design Google Maps

Designing a Proximity Server like Yelp

Design Uber

Design Twitter

Newsfeed System

Design Instagram

Design URL Shortening Service / TinyURL

Design a Web Crawler

Design WhatsApp

Design Typeahead Suggestion

Design Collaborative Document Editing Service / Google Docs

Spectacular Failures

Concluding Remarks

Appendix: System Design Interviews

All content below this will likely go away

Design Exercises

Archived temporary lessons

Design Resource Allocator for a Large Datacenter

Design Zoom

Continuous Monitoring using Data Processing

Design Live Commenting at Facebook

Security

For Noor: Placeholder for Illustration Making

Appendix

Backup of our Lessons

Caching Billions of Tiny Objects on Flash

Design Quora

Copy-Design YouTube

Identity & Access Management

Copy of CDN (02-03-2022)

Encoder for TinyURL

Introduction

Why Encoding?

Base-58

Base 10 -> Base 58

Base-58 -> Base-10