Solving the Distributed Snapshot Problem
Let's explore a seminal algorithm used for capturing distributed snapshots.
We'll cover the following
Chandy-Lamport algorithm
The Chandy Lamport algorithm solves the consistent snapshot problem in a distributed system.
Idea
The algorithm is based on the following main idea: a marker message is sent between nodes using the available communication channels that represent an instruction to a node to record a snapshot of the current state.
Working
The algorithm works as follows:
- The node that initiates the protocol records its state and then sends a marker message to all the outbound channels.
Importantly, the marker is sent after the node records its state and before any further messages are sent to the channels.
- When a node receives a marker message, its behaviour depends on whether the node has already recorded its state (while emitting the mark previously) or not.
- If the node has not recorded its state, it records its state, and then it records the state of the channel the marker was received from as an empty sequence. It then sends the marker to all the outbound channels.
- If the node has recorded its state, it records the state of the channel the marker was received from as the sequence of messages received from after the node’s state was recorded and before the node received the marker from .
Get hands-on with 1400+ tech skills courses.