Heartbeat messages in distributed systems

A heartbeat is a small message that a node in a distributed system sends at regular intervals to signal that it is healthy and active. If no heartbeat arrives from a node within the expected window, other nodes can assume that the node has failed.
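
The timeout rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a production failure detector: the class name HeartbeatMonitor, its methods, and the 3-second timeout are all assumptions made for the example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (hypothetical helper)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def record(self, node_id: str, now: Optional[float] = None) -> None:
        # Store the arrival time of the latest heartbeat from this node.
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def suspected(self, now: Optional[float] = None) -> List[str]:
        # A node is suspected once no heartbeat has arrived within the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)  # assumed timeout, for illustration
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=2.5)  # node-a keeps sending; node-b goes silent
print(monitor.suspected(now=4.0))  # ['node-b']
```

In a real deployment the timestamps would come from a monotonic clock on the receiving node, which is why record() and suspected() fall back to time.monotonic() when no time is supplied.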

Pros vs. cons of heartbeats

The following are some advantages and disadvantages of using heartbeat messages.

Pros

  • Low overhead: only a lightweight message needs to be sent.

  • High availability: the system can quickly detect a failure and begin recovery.

  • Failure coverage: heartbeats detect crash failures directly. Byzantine failures (a node partially fails and gives incorrect responses) are harder to catch, because a faulty node may keep sending heartbeats; detecting them requires the heartbeat to carry state that receivers can check.

Cons

  • Limited information: in case of a failure, a missing heartbeat leaves no record of how the failure occurred.

  • False positives: if heartbeats are delayed or lost, a functional node can be assumed to be non-operational.

  • Network dependence: heartbeats depend on a reliable network.
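
One common way to reduce false positives is to tolerate a few missed heartbeats before declaring a node failed. The sketch below illustrates that trade-off; the function is_suspected and its parameters are hypothetical names chosen for this example, and the timings are made up.

```python
def is_suspected(last_seen_s: float, now_s: float,
                 interval_s: float, missed_allowed: int) -> bool:
    # Suspect the node only after more than `missed_allowed` extra
    # heartbeat intervals have passed with no message.
    return now_s - last_seen_s > interval_s * (missed_allowed + 1)


# Last heartbeat arrived at t=0.0; the next one is due at t=1.0 but is
# delayed by the network. The detector checks at t=1.2.
print(is_suspected(0.0, 1.2, interval_s=1.0, missed_allowed=0))  # True
print(is_suspected(0.0, 1.2, interval_s=1.0, missed_allowed=2))  # False
```

The strict setting flags a healthy but slow node, while the tolerant setting does not. The cost is slower detection: with missed_allowed=2, a node that really crashed is only suspected after three silent intervals.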

Examples

  • Kafka: A consumer has a daemon thread called the HeartbeatThread that sends heartbeat requests to the group coordinator at regular intervals. The default interval (heartbeat.interval.ms) is 3000 ms.

  • GFS: The master periodically exchanges heartbeat messages with each chunk server. The heartbeats in GFS also carry other pieces of information, such as a chunk server's available free space and the data chunks it holds.

  • HDFS: Here, each DataNode sends heartbeat messages to the (master) NameNode to report its status. Like GFS, HDFS heartbeat messages also carry metadata, such as a DataNode's storage capacity and usage.
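
To make the idea of a heartbeat that carries metadata concrete, here is a minimal Python sketch. It does not reproduce the GFS or HDFS wire formats: the Heartbeat class, its field names, and the JSON encoding are all assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Heartbeat:
    # All field names are illustrative, not the GFS or HDFS message layout.
    node_id: str
    sent_at_s: float
    free_bytes: int
    chunk_ids: List[str] = field(default_factory=list)

    def encode(self) -> bytes:
        # Serialize the heartbeat so it can be sent over the network.
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def decode(raw: bytes) -> "Heartbeat":
        # Rebuild the heartbeat on the receiving (master) side.
        return Heartbeat(**json.loads(raw.decode("utf-8")))


hb = Heartbeat("storage-1", 12.0, 5_000_000, ["c1", "c2"])
received = Heartbeat.decode(hb.encode())
print(received.free_bytes, received.chunk_ids)  # 5000000 ['c1', 'c2']
```

Piggybacking state on the heartbeat lets the master refresh its view of each storage node without a separate status request, at the cost of a larger message.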

Copyright ©2024 Educative, Inc. All rights reserved