Versioning Data and Achieving Configurability
Learn how to resolve conflicts via versioning and how to make the key-value store a configurable service.
Data versioning
When network partitions and node failures occur during an update, an object’s
To handle inconsistency, we need to maintain causality between events. One option is to use timestamps and overwrite all conflicting values with the value of the latest request. However, clocks aren’t reliably synchronized across the nodes of a distributed system, so we can’t use timestamps as the deciding factor.
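To see why timestamps can mislead, here is a minimal sketch of a last-write-wins rule under clock skew. The `Write` class, the `last_write_wins` function, and the five-second skew are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Write:
    value: str
    real_order: int    # the true order in which the writes happened
    wall_clock: float  # the timestamp reported by the node's local clock


def last_write_wins(writes: List[Write]) -> Write:
    """Pick the write with the largest local timestamp."""
    return max(writes, key=lambda w: w.wall_clock)


# Assumption: node 2's clock runs five seconds behind node 1's.
first = Write("v1", real_order=1, wall_clock=100.0)  # handled by node 1
second = Write("v2", real_order=2, wall_clock=96.0)  # handled by node 2, later in real time

winner = last_write_wins([first, second])
# The rule keeps the older write because its timestamp looks newer.
lost_update = winner.real_order < second.real_order
```

In this sketch, the later write is silently discarded, which is the kind of outcome that motivates tracking causality instead of wall-clock time.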
Another approach to maintaining causality effectively is by using vector clocks. A vector clock is a list of (node, counter) pairs. There’s a single vector clock for every version of an object. By comparing the vector clocks of two versions of an object, we’re able to tell whether they’re causally related or not (more on this in a bit). If neither version causally precedes the other, the two are considered to be in conflict until they are reconciled.
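The comparison can be sketched as follows. This is a minimal illustration, assuming a vector clock is stored as a dictionary from node ID to counter and that a node missing from a clock counts as zero. The names `descends` and `compare` are hypothetical.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> counter; absent nodes count as 0


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the causal relationship of version `a` relative to version `b`."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"       # a supersedes b, so b can be dropped
    if b_covers_a:
        return "before"      # b supersedes a, so a can be dropped
    return "concurrent"      # neither precedes the other: a conflict


newer = compare({"A": 2}, {"A": 1})
conflict = compare({"A": 1, "B": 1}, {"A": 1, "C": 1})
```

Two versions are causally related when one clock is greater than or equal to the other on every node. Otherwise, they were written concurrently and need reconciliation.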
Modify the API design
We talked about how we can decide if two events are causally related or not using a vector clock value. For this, we need information about which node performed the operation before and what its vector clock value was. This is the context of an operation. So, we’ll modify our API design as follows.
The API call to get a value should look like this:
get(key)
Parameter | Description |
key | This is the key against which we want to get the value. |
We return an object or a collection of conflicting objects along with a context. The context holds encoded metadata about the object, including details such as the object’s version.
The API call to put the value into the system should look like this:
put(key, context, value)
Parameter | Description |
key | This is the key against which we have to store the value. |
context | This holds the metadata for each object. |
value | This is the object that needs to be stored against the key. |
The function finds the node where the value should be placed on the basis of the key and stores the value associated with it. The context is returned by the system after the get operation. If we have a list of objects in context that raises a conflict, we’ll ask the client to resolve it.
To update an object in the key-value store, the client must supply the context obtained from a previous read operation. The system uses the vector clock in that context to determine which version the update is based on. If the key-value store has access to several branches, it provides all objects at the leaf nodes, together with their respective version information in context, when processing a read request. Reconciling disparate versions and merging them into a single new version is considered an update.
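The read-then-write flow can be sketched with a small in-memory store. This is only an illustration under several assumptions: the `TinyStore` class is hypothetical, the context is modeled as a single merged vector clock, and `put` takes an extra `node_id` argument (not part of the API above) to simulate which node coordinates the write.

```python
from typing import Any, Dict, List, Tuple

Clock = Dict[str, int]  # node id -> counter; absent nodes count as 0


def descends(a: Clock, b: Clock) -> bool:
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


class TinyStore:
    """Hypothetical single-process stand-in for the key-value store."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[Any, Clock]]] = {}

    def get(self, key: str) -> Tuple[List[Any], Clock]:
        """Return every leaf version of `key` plus a context that covers them all."""
        versions = self._versions.get(key, [])
        context: Clock = {}
        for _, clock in versions:
            for node, counter in clock.items():
                context[node] = max(context.get(node, 0), counter)
        return [value for value, _ in versions], context

    def put(self, key: str, context: Clock, value: Any, node_id: str) -> None:
        """Store `value` as a new version derived from `context`."""
        clock = dict(context)
        clock[node_id] = clock.get(node_id, 0) + 1  # the coordinator bumps its counter
        # Keep only the versions that the new clock does not supersede.
        survivors = [(v, c) for v, c in self._versions.get(key, [])
                     if not descends(clock, c)]
        self._versions[key] = survivors + [(value, clock)]


store = TinyStore()
store.put("cart", {}, ["milk"], node_id="A")
_, ctx = store.get("cart")
# Two clients update from the same context through different nodes.
store.put("cart", ctx, ["milk", "eggs"], node_id="B")
store.put("cart", ctx, ["milk", "tea"], node_id="C")
conflicting, ctx = store.get("cart")  # both branches come back
# The client merges the branches and writes back with the returned context.
store.put("cart", ctx, ["milk", "eggs", "tea"], node_id="A")
resolved, final_ctx = store.get("cart")
```

Because the final write carries a context that covers both branches, its clock supersedes them, and the store is left with a single version again.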
Note: This process of resolving conflicts is comparable to how it’s done in Git. If Git is able to merge multiple versions into one, merging is performed automatically. It’s up to the client (the developer) to resolve conflicts manually if automatic conflict resolution is not possible. Along the same lines, our system can try automatic conflict resolution and, if not possible, ask the application to provide a final resolved value.
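The two-stage resolution described in the note could look like the sketch below. It assumes versions are (value, vector clock) pairs and that the application supplies a `resolver` callback. The names `reconcile`, `resolver`, and `merge_clocks` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Clock = Dict[str, int]
Version = Tuple[Any, Clock]


def descends(a: Clock, b: Clock) -> bool:
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def merge_clocks(clocks: List[Clock]) -> Clock:
    """Take the per-node maximum over all clocks."""
    merged: Clock = {}
    for clock in clocks:
        for node, counter in clock.items():
            merged[node] = max(merged.get(node, 0), counter)
    return merged


def reconcile(versions: List[Version],
              resolver: Callable[[List[Any]], Any]) -> Version:
    """Try automatic resolution first, then fall back to the application."""
    if not versions:
        raise ValueError("nothing to reconcile")
    # Automatic step: drop every version that another version strictly supersedes.
    leaves = [
        (value, clock) for value, clock in versions
        if not any(descends(other, clock) and not descends(clock, other)
                   for _, other in versions)
    ]
    if len(leaves) == 1:
        return leaves[0]
    # Fallback: the application decides how concurrent values combine.
    merged_value = resolver([value for value, _ in leaves])
    return merged_value, merge_clocks([clock for _, clock in leaves])


def union_items(values: List[List[str]]) -> List[str]:
    """Example application rule: keep every item seen in any branch."""
    return sorted(set().union(*values))


auto = reconcile([("x", {"A": 1}), ("y", {"A": 2})], union_items)
manual = reconcile(
    [(["milk"], {"A": 1}),
     (["milk", "eggs"], {"A": 1, "B": 1}),
     (["milk", "tea"], {"A": 1, "C": 1})],
    union_items,
)
```

In the first call, one clock supersedes the other, so the callback is never used. In the second, two branches are concurrent, so the application's rule produces the merged value, and the merged clock serves as the context for writing it back.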
Vector clock usage example
Let’s consider an example. Say we have a write operation request. Node A handles the first version of the write request, E1, where E means event. The corresponding vector clock has node information and its counter, that is, [A, 1]. Node A handles another write for the same object on which the previous write was performed. So, for E2, we have [A, 2]. E1 is no longer required because ...