Failure in the World of Distributed Systems
Let's see why failures occur in distributed systems, and how we can detect them.
Identifying failure in a distributed system is challenging because of the characteristics described in the Difficulties Designing Distributed Systems lesson. One of them is the asynchronous nature of the network.
One reason for failure
Because the network in a distributed system is asynchronous, it is very hard to differentiate between a node that has crashed and a node that is just slow to respond to requests.
One mechanism to detect failure
Timeouts are the main mechanism we can use ...
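To make the idea concrete, here is a minimal sketch of a timeout-based failure detector in Python. It is an illustration only, not a method the lesson prescribes: the class name `TimeoutFailureDetector`, its methods, and the injectable `clock` parameter are all hypothetical names chosen for this example. It assumes each node periodically sends a heartbeat, and it suspects a node once no heartbeat has arrived within the timeout.

```python
import time


class TimeoutFailureDetector:
    """Hypothetical sketch: suspects a node when no heartbeat
    has been recorded within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence we tolerate
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = {}       # node name -> time of last heartbeat

    def heartbeat(self, node):
        """Record that we just heard from `node`."""
        self.last_seen[node] = self.clock()

    def suspected(self, node):
        """Return True if `node` has been silent longer than the timeout.

        A node we have never heard from is also treated as suspected.
        """
        last = self.last_seen.get(node)
        if last is None:
            return True
        return self.clock() - last > self.timeout
```

Note that the detector can only suspect a node; it cannot prove a crash. A node that is merely slow, and whose heartbeat arrives after the timeout has elapsed, is first reported as suspected and then cleared again once the heartbeat is recorded. This is the same ambiguity between a crashed node and a slow node described above.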