Handling hardware faults

For disk failures, a very common practice is to add redundancy by storing the same data on more than one disk drive. These disks are generally inexpensive. If one disk suffers an unrecoverable failure, the data can still be recovered from another disk that holds the same copy.
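The idea can be sketched in a few lines of Python. This is a minimal simulation, not a real storage driver: the `Disk`, `MirroredVolume`, and `DiskFailure` names are hypothetical and exist only for illustration, and each disk is modeled as an in-memory dictionary. Every write goes to all disks, and a read falls back to the next disk when one has failed.

```python
class DiskFailure(Exception):
    """Raised when a simulated disk cannot be read or written."""


class Disk:
    """A simulated disk: a dict of blocks plus a failure flag (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, block_id, data):
        if self.failed:
            raise DiskFailure(self.name)
        self.blocks[block_id] = data

    def read(self, block_id):
        if self.failed:
            raise DiskFailure(self.name)
        return self.blocks[block_id]


class MirroredVolume:
    """Writes every block to all disks; reads from the first healthy one."""

    def __init__(self, disks):
        self.disks = disks

    def write(self, block_id, data):
        written = 0
        for disk in self.disks:
            try:
                disk.write(block_id, data)
                written += 1
            except DiskFailure:
                continue  # surviving disks still receive the copy
        if written == 0:
            raise DiskFailure("all disks failed")

    def read(self, block_id):
        for disk in self.disks:
            try:
                return disk.read(block_id)
            except DiskFailure:
                continue  # try the next mirror
        raise DiskFailure("all disks failed")


volume = MirroredVolume([Disk("disk-a"), Disk("disk-b")])
volume.write(0, b"user records")
volume.disks[0].failed = True  # simulate an unrecoverable failure
recovered = volume.read(0)     # served from disk-b
```

With two mirrors, the volume tolerates the loss of either disk. If both disks fail, the read raises `DiskFailure`, which is why redundancy lowers the chance of data loss but does not eliminate it.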

Note: Arranging multiple disks so that data survives a disk failure is called RAID (Redundant Array of Independent Disks). Keeping an identical copy of the data on each disk is known as mirroring (RAID 1).

Say the initial system of My Cool App looked like this:

A single node.
The server and database process are both running on the same node.

Now, if the disk of that single node fails, the node is unusable. In this ...
