Introduction
We'll cover the following...
Introduction
Writing code on a single node is fairly straightforward but the moment we switch to writing code that runs on multiple computers connected by a network (distributed systems), the number of ways faults and failures can occur is numerous, nondeterministic and unpredictable. For example:
Misconfiguration of network switches
Accidental power cycles
Power distribution unit (PDU) failures
Backbone failures for the entire datacenter
Power failure for the entire datacenter
Distributed systems also suffer from partial failures, where a part of the system experiences failure but not the entire system. A distributed system may continue to work intermittently ...