There is another basic problem in distributed systems that is strongly related to the notion of time and order. It is stated below.

Problem

How can we record a snapshot of the state of a distributed system that consists of multiple nodes performing a continuous computation?

There are many more problems in distributed systems that can be expressed in terms of the problem of detecting a global state and specific properties associated with it. This problem is also known as stable property detection and has many uses, such as detecting deadlocks or the termination of a computation. We will focus only on the distributed snapshots problem in this lesson.

Distributed snapshots can be used as a recovery mechanism: when failures happen, the system can be restored to a recorded point in the past.

Capturing distributed snapshots

A seminal algorithm used for capturing distributed snapshots is the Chandy-Lamport algorithm (K. M. Chandy and L. Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems,” ACM Transactions on Computer Systems (TOCS), Volume 3, Issue 1, Feb. 1985).
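As a rough illustration of how such an algorithm can work, the sketch below simulates the marker-based approach from the Chandy-Lamport paper in a single Python program, assuming reliable FIFO channels between nodes. The `Process` class, the `connect` helper, and the money-transfer scenario are hypothetical and chosen only for illustration. This is a minimal sketch, not a production implementation.

```python
from collections import deque

MARKER = "MARKER"  # control message that separates pre- and post-snapshot traffic


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance        # local application state (hypothetical)
        self.out = {}                 # peer name -> outgoing FIFO channel
        self.inc = {}                 # peer name -> incoming FIFO channel
        self.recorded_state = None    # local state captured by the snapshot
        self.channel_log = {}         # peer name -> in-flight messages captured
        self.recording = set()        # incoming channels still being recorded

    def send(self, peer, amount):
        self.balance -= amount
        self.out[peer].append(amount)

    def start_snapshot(self):
        # Record local state, then send a marker on every outgoing channel.
        self.recorded_state = self.balance
        self.recording = set(self.inc)
        self.channel_log = {peer: [] for peer in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, peer):
        msg = self.inc[peer].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot()      # first marker seen: join the snapshot
            self.recording.discard(peer)   # stop recording this channel
        else:
            self.balance += msg
            if peer in self.recording:
                self.channel_log[peer].append(msg)  # message was in flight


def connect(a, b):
    # Create one FIFO channel in each direction between two processes.
    ab, ba = deque(), deque()
    a.out[b.name], b.inc[a.name] = ab, ab
    b.out[a.name], a.inc[b.name] = ba, ba


p, q = Process("p", 100), Process("q", 50)
connect(p, q)

p.send("q", 10)      # transfer in flight before the snapshot starts
p.start_snapshot()   # p records its state and sends a marker to q
q.send("p", 5)       # q has not yet seen the marker
q.receive("p")       # q receives the transfer of 10
q.receive("p")       # q receives the marker and records its own state
p.receive("q")       # p receives 5 and logs it as channel state
p.receive("q")       # p receives q's marker; the snapshot is complete

total = (p.recorded_state + q.recorded_state
         + sum(p.channel_log["q"]) + sum(q.channel_log["p"]))
print(p.recorded_state, q.recorded_state, p.channel_log, q.channel_log, total)
```

In this scenario, the recorded process states plus the recorded channel contents add up to the initial total of 150, even though no single instant was frozen across both nodes. That consistency is what the snapshot is meant to guarantee.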
