Vector clocks are an extension of Lamport clocks used to capture causality between events in a distributed system. Vector clocks are widely used in large-scale distributed systems to resolve conflicting data.
To enhance performance, Dynamo uses optimistic replication. This means that there is a possibility of conflicting data. Vector clocks are used in Dynamo to resolve multiple conflicting values against the same key.
A single machine is the absolute or wall clock time:
Suppose we perform a write to key k
with timestamp t1
and then perform another write to k
with timestamp t2
. Since t2
> t1
, the second write must be newer than the first write, and therefore the database can safely overwrite the original value.
In a distributed system, this assumption does not hold. The problem is clock skew, such as, different clocks tend to run at different rates, so we cannot assume that time t
on node a
happened before time t + 1
on node b
.
The most practical techniques that help with synchronizing clocks, like
Instead of employing tight synchronization mechanics, Dynamo uses vector clocks to capture causality between different versions of the same object. A vector clock is effectively a (node, counter)
pair. Every version of an object stored in Dynamo is associated with a vector clock. To determine causality between two versions, one can examine their vector clocks. If the first object’s counters are strictly less than or equal to the second clock, the first object happened before the second. Otherwise, it suggests that the vector clocks do not have a causal connection and require further conflict reconciliation.
Dynamo resolves these conflicts at read-time. Let’s understand this with an example:
A
serves a write to key k1
, with the value foo
. It assigns it a version of [A:1]
. This write gets replicated to server B
.A
serves a write to key k1
, with value bar
. It assigns it a version of [A:2]
. This write also gets replicated to server B
.A
and B
cannot talk to each other.A
serves a write to key k1
, with the value baz
. It assigns it a version of [A:3]
. It cannot replicate it to server B
, but it gets stored in a hinted handoff buffer on another server.B
sees a write to key k1
, with the bax
value. It assigns it a version of [B:1]
. It cannot replicate it to server A
, but it gets stored in a hinted handoff buffer on another server.A
and B
can talk to each other again.k1
. It sees the same key with different versions [A:3]
and [A:2][B:1]
, but it does not know which one is newer. It returns both and tells the client to figure out the version and write the newer version back into the system.In the above example, most of the time, new versions subsume the previous version(s), and the system itself can determine the correct version (For example, [A:2]
is newer than [A:1]
). However, there are cases where the system cannot conclude on the correct version among the multiple versions of the same object, so this responsibility is given to the client. This is known as semantic reconciliation.
Such a situation can arise in the presence of node failures, combined with concurrent updates leading to conflicting versions of an object. An example of semantic reconciliation is the shopping cart feature provided by Amazon. This mechanism guarantees that an ‘Add to cart’ operation is never lost, but it is possible that deleted items will resurface.