Unique IDs with Causality

Learn how to use the time to generate a unique ID and also maintain the causality of events.

Causality

In the previous lesson, we generated unique IDs to differentiate various events. Apart from the identification of events, we are also interested in finding the sequence of these events. Let’s consider an example where Peter and John are two Twitter users. John tweets on a Twitter post (event A), and Peter replies to John’s comment (event B). Event B is dependent on event A and cannot happen before it. The events are not concurrent here.

We can also have concurrent events, i.e., two events independent of each other. For example, if Peter and John comment on two different Twitter posts, there is no happened-before relationship (causality) between them. It is essential to identify the dependence of one event over the other but not in the case of concurrent events.

The above scenario can also be handled by assigning a unique ID and encoding the dependence of events using a social graph. We might also use a separate time data structure and a simple, unique ID. However, we want a unique ID to do the double duty—unique identification and also helping with the causality of events.

The following slides provide a visualization of concurrent and non-concurrent events.

Some applications need that events to be given unique identifiers and carry any causality information. An example is giving an identifier to the concurrent writes of a key into a key-value store to implement the last-write-wins strategy.

We can either use logical or physical clocks to infer causality. Some systems have additional requirements where we want event identifiers’ causality to map wall-clock time. An example is a financial application complying with European MiFID regulations. MiFID requires clocks to be within 100 micro-seconds of UTC to detect anomalies during high-volume / high-speed market trades.

There are many subtleties associated with logical or physical clocks. See the following optional text, “Time in a Distributed System,” to refresh your concepts of time.

We use the time to determine the sequence of events in our life. For example, if Sam took a bath at 6 A.M. and ate breakfast at 7:00 A.M., we can determine that Sam took a bath before breakfast by timestamps of each event. Timestamps, therefore, can be used to maintain causality.

Using UNIX timestamps

The UNIX timestamps are granular to the milli-second, and we can use them to distinguish the events. We have an ID Generating Server that can generate one ID in a single millisecond. Any request to generate a unique ID is routed to that server, and it returns a timestamp, and we have a unique ID. Generating in milliseconds allows us to generate a thousand identifiers per second (that means we can get $24 (hour)* 60 (min/hour) * 60 (sec/min) * 1000 (ID/sec) = 86400000 IDs$ in a day). That is less than a billion per day.

Connect to the following terminal to view the UNIX timestamp in milliseconds.

The explanation of the division of the bits is as follows:

• Sign bit: A single bit is assigned as a sign bit, and its value will always be 0. It makes the overall number positive. Doing so helps to ensure that any programming environment using these identifiers interprets them as positive integers.

• Timestamp: 41 bits are assigned for milliseconds. The Twitter snowflake default epoch will be used. Its value is 1288834974657 which is equivalent to Nov 04, 2010, 01:42:54 UTC. (We can initiate our own epoch when our system will be deployed, say 2022-01-20 at 12 midnight can be the start of our epoch ...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy

	Unique	Scalable	Available	64-bit numeric ID	Causality maintained
Using UUID	✖️	✔️	✔️	✖️	✖️
Using database	✖️	✖️	✔️	✔️	✖️
Using Rangehandler	✔️	✔️	✔️	✔️	✖️
Using UNIX timestamps	✖️	weak	✔️	✔️	weak

Introduction

Abstractions

Non-functional System Characteristics

Back-of-the-Envelope Calculations

Building Blocks

Domain Name System (DNS)

Sequencer

Rate Limiter

Distributed Cache

Blob Store

Content Delivery Network (CDN)

Load Balancers

Key-Value Store

Distributed Messaging Queue

Pub-sub

Distributed Task Scheduler

Distributed Search

Distributed Logging

Distributed Monitoring

Monitoring Server Side Errors

Monitoring Client Side Errors

Databases

Sharded Counters

Concluding Building Blocks

Design YouTube

Design Quora

Design Google Maps

Designing a Proximity Server like Yelp

Design Uber

Design Twitter

Newsfeed System

Design Instagram

Design URL Shortening Service / TinyURL

Design a Web Crawler

Design WhatsApp

Design Typeahead Suggestion

Design Collaborative Document Editing Service / Google Docs

Spectacular Failures

Concluding Remarks

Appendix: System Design Interviews

All content below this will likely go away

Design Exercises

Archived temporary lessons

Design Resource Allocator for a Large Datacenter

Design Zoom

Continuous Monitoring using Data Processing

Design Live Commenting at Facebook

Security

For Noor: Placeholder for Illustration Making

Appendix

Backup of our Lessons

Caching Billions of Tiny Objects on Flash

Design Quora

Copy-Design YouTube

Identity & Access Management

Copy of CDN (02-03-2022)

Unique IDs with Causality

Causality

Using UNIX timestamps

Pros

Cons

Twitter snowflake