Failure in the World of Distributed Systems

Let's see why failures occur in distributed systems, and how we can detect them.

Failures are hard to identify because of the characteristics of distributed systems described in the Difficulties Designing Distributed Systems lesson. One of these characteristics is the asynchronous nature of the network.

One reason for failure

Because the network in a distributed system is asynchronous, it is very hard to differentiate between a node that has crashed and a node that is just slow to respond to requests.

One mechanism to detect failure

Timeouts are the main mechanism we can use ...
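
To make the idea concrete, here is a minimal Python sketch of timeout-based failure detection. It is only an illustration under stated assumptions: the function `ping` and its parameters `reply_delay` and `timeout` are hypothetical names, and a background thread with an in-memory queue stands in for a remote node and the network. The observer waits for a reply for at most `timeout` seconds and suspects the node if nothing arrives.

```python
import queue
import threading
import time
from typing import Optional


def ping(reply_delay: Optional[float], timeout: float) -> bool:
    """Send one simulated request; return True if a reply arrives in time.

    reply_delay is how long the simulated node takes to answer, in seconds.
    None models a crashed node that never answers. (Hypothetical model.)
    """
    replies = queue.Queue()

    def node() -> None:
        if reply_delay is None:
            return  # crashed: no reply is ever sent
        time.sleep(reply_delay)
        replies.put("pong")

    # The daemon thread stands in for the remote node and the network.
    threading.Thread(target=node, daemon=True).start()

    try:
        replies.get(timeout=timeout)
        return True
    except queue.Empty:
        # No reply within the timeout: the node is suspected to have failed,
        # but it may have crashed or may simply be slow.
        return False


if __name__ == "__main__":
    print("healthy node:", ping(reply_delay=0.0, timeout=0.3))
    print("crashed node:", ping(reply_delay=None, timeout=0.1))
    print("slow node:   ", ping(reply_delay=0.5, timeout=0.1))
```

In this sketch the crashed node and the slow node produce the same result for the observer, which is the ambiguity described above: a timeout can only raise a suspicion of failure, it cannot confirm one.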