The Read Path

Examine the Apache Cassandra read path, along with the storage structures that speed up reads.

High-level read path

Because Cassandra is a peer-to-peer system, any node in the cluster can handle a read request. The node that receives the request becomes the coordinator node. The coordinator checks whether enough replicas are available to satisfy the consistency level (CL) specified for the operation. If not, an exception is thrown and the read operation fails.
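The availability check can be sketched as follows. This is a minimal illustration, not Cassandra's actual code: the names required_replicas, check_availability, and UnavailableError are hypothetical, and only the consistency levels ONE, TWO, THREE, QUORUM, and ALL are modeled.

```python
from typing import List


class UnavailableError(Exception):
    """Hypothetical error: too few live replicas for the requested CL."""


def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Return how many replicas must respond for the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        return fixed[consistency_level]
    if consistency_level == "QUORUM":
        # A quorum is a majority of the replicas.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def check_availability(consistency_level: str,
                       replication_factor: int,
                       live_replicas: List[str]) -> int:
    """Fail fast if the live replicas cannot satisfy the consistency level."""
    needed = required_replicas(consistency_level, replication_factor)
    if len(live_replicas) < needed:
        raise UnavailableError(
            f"need {needed} replicas, only {len(live_replicas)} are alive"
        )
    return needed
```

With a replication factor of 3, a read at CL TWO passes this check as long as at least two of the three replicas are alive.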

If the CL can be met, the coordinator uses a dynamic snitch to determine the fastest replica. It sends a “direct read request” to the fastest replica and “digest requests” to as many additional replicas as are needed to fulfill the consistency level. A direct read request results in the replica responding with the requested data. A digest request results in the replica sending only a digest (a hash) of the requested data. Because a digest is much smaller than the data itself, digest requests reduce network traffic.
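The request planning described above might look like the sketch below. It rests on two assumptions that are not in the text: the dynamic snitch is modeled as a simple map from replica name to an observed latency, and the coordinator sends digest requests to exactly the next-fastest replicas needed to reach the CL. The function name plan_read is hypothetical.

```python
from typing import Dict, List, Tuple


def plan_read(latency_ms: Dict[str, float],
              needed: int) -> Tuple[str, List[str]]:
    """Choose one replica for the direct read and the rest for digest requests.

    latency_ms stands in for the dynamic snitch's view of the live replicas;
    needed is the number of responses the consistency level requires.
    """
    if needed < 1 or len(latency_ms) < needed:
        raise ValueError("not enough replicas to satisfy the consistency level")
    # Rank replicas from fastest to slowest.
    ranked = sorted(latency_ms, key=lambda replica: latency_ms[replica])
    data_replica = ranked[0]            # receives the direct read request
    digest_replicas = ranked[1:needed]  # receive digest requests
    return data_replica, digest_replicas


# Three live replicas, consistency level TWO (two responses needed).
data_replica, digest_replicas = plan_read(
    {"n1": 4.0, "n2": 1.5, "n3": 9.0}, needed=2
)
print(data_replica, digest_replicas)  # n2 ['n1']
```

In this example the slowest replica, n3, is not contacted at all, since two responses are enough for CL TWO.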

Once the data from the direct (full) read request has been received, the coordinator calculates its digest and compares it to the digests sent by the other replicas. If all digests are identical and the required consistency level has been achieved, the data from the direct read request is delivered to the client.
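The comparison step can be illustrated with a short sketch. The helper names digest and digests_match are hypothetical, the row is treated as an opaque byte string, and MD5 is used purely as an example hash function.

```python
import hashlib
from typing import List


def digest(data: bytes) -> bytes:
    """Hash the serialized row; MD5 is an illustrative choice here."""
    return hashlib.md5(data).digest()


def digests_match(full_data: bytes, replica_digests: List[bytes]) -> bool:
    """Compare the digest of the full read against each replica's digest."""
    local = digest(full_data)
    return all(remote == local for remote in replica_digests)


row = b"user_id=42,name=alice"

# A replica holding the same data produces the same digest.
print(digests_match(row, [digest(row)]))                       # True

# A replica holding different data produces a different digest.
print(digests_match(row, [digest(b"user_id=42,name=alicia")]))  # False
```

Only when the comparison returns True for every digest does the coordinator hand the data from the direct read back to the client.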

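The diagram below illustrates the high-level read path with an example. The cluster consists of one datacenter named datacenter1, comprising five nodes. Assume the replication factor (RF) for the keyspace is 3 and the consistency level (CL) for the read operation is TWO. With these settings, three of the five nodes hold a replica of the requested data, and the coordinator needs responses from two of them: one direct read request and one digest request.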
