High Availability
This lesson explains how high availability is implemented for HDFS.
High Availability
High availability is a characteristic of a distributed system. It is defined as the ability of a system or system component to be continuously operational for a long period of time. For example, Amazon’s ubiquitous S3 storage boasts 99.99% availability over a given year.
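To get a feel for what an availability figure means in practice, it helps to convert it into the downtime it permits. The snippet below is a minimal sketch of that arithmetic, assuming a 365-day year; the function name `allowed_downtime_minutes` is made up for this illustration.

```python
# Convert an availability percentage into the downtime it allows per year.
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability -> {minutes:.1f} minutes of downtime per year")
```

At 99.99% availability, the system can be unavailable for roughly 52.6 minutes in a whole year.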
To achieve high availability for HDFS, we need more than one instance of the Namenode to avoid downtime and failures during software/hardware upgrades. In an HA setup, one Namenode serves client queries and is known as the active Namenode. The rest are known as standby Namenodes. If the active Namenode experiences a failure, a standby Namenode takes over.
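The active/standby idea can be illustrated with a small toy model. This is only a sketch and not how HDFS itself is implemented: the classes `Namenode` and `HACluster` and their methods are hypothetical names invented for this example, and the model deliberately ignores how a real cluster detects failures or keeps the standby's state in sync.

```python
from dataclasses import dataclass


@dataclass
class Namenode:
    """Hypothetical stand-in for a Namenode process."""
    name: str
    healthy: bool = True


class HACluster:
    """Toy model: one active Namenode, the rest are standbys."""

    def __init__(self, namenodes):
        if not namenodes:
            raise ValueError("at least one Namenode is required")
        self.namenodes = list(namenodes)
        self.active = self.namenodes[0]  # the first Namenode starts as active

    @property
    def standbys(self):
        # Every Namenode that is not currently active is a standby.
        return [nn for nn in self.namenodes if nn is not self.active]

    def failover(self):
        # Promote the first healthy standby to active.
        for candidate in self.standbys:
            if candidate.healthy:
                self.active = candidate
                return
        raise RuntimeError("no healthy standby Namenode available")

    def handle_query(self, query):
        # Only the active Namenode serves client queries.
        if not self.active.healthy:
            self.failover()
        return f"{self.active.name} served {query}"


if __name__ == "__main__":
    cluster = HACluster([Namenode("nn1"), Namenode("nn2")])
    print(cluster.handle_query("ls /"))
    cluster.active.healthy = False  # simulate a failure of the active Namenode
    print(cluster.handle_query("ls /"))
```

In this sketch, the client never addresses a standby directly. It talks to the cluster, and the cluster decides which Namenode is active at that moment.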
Working
Imagine a cluster with two Namenodes. In order for the standby Namenode to successfully take over in case of failure, it exactly ...