How to build highly available, fault-tolerant services in Node.js

While working for an important client, I began thinking about non-functional requirements (NFRs) such as high availability and recovery. Our tech stack included Cassandra and Kafka, two distributed systems whose internal behavior I studied.

Kafka uses ZooKeeper to keep track of the partitions assigned to each consumer; Cassandra runs a gossip algorithm between nodes and divides data into partition ranges.

So I started wondering whether there was a library (not an external service like ZooKeeper) with a gossip algorithm already implemented, so that people could build new distributed systems more easily.

I could not find such a library, so I created ring-election.

You can integrate ring-election into your Node.js process and get several important NFRs out of the box.

What the ring-election driver offers you:

  • A default partitioner that, given an object, returns the partition to which it is assigned.
  • A leader election mechanism.
  • Failure detection between nodes.
  • Assignment and rebalancing of partitions between nodes.
  • Automatic re-election of the leader.
  • Listening for new assigned/revoked partitions.
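To make the partitioning features above more concrete, here is a minimal sketch of how a hash-based partitioner, a partition assignment, and an assigned/revoked diff could look. It is not ring-election's actual code or API: the function names `partitionFor`, `assignPartitions`, and `diffPartitions`, the MD5 hash, and the round-robin strategy are all assumptions chosen for illustration.

```javascript
const { createHash } = require('crypto');

// Hypothetical partitioner: map a JSON-serializable object to a partition.
// Assumes objects are serialized with a stable key order.
function partitionFor(obj, numPartitions) {
  const digest = createHash('md5').update(JSON.stringify(obj)).digest();
  // Interpret the first 4 bytes of the digest as an unsigned 32-bit integer.
  return digest.readUInt32BE(0) % numPartitions;
}

// Hypothetical assignment: spread partitions over the live nodes round-robin.
function assignPartitions(nodeIds, numPartitions) {
  const assignment = new Map(nodeIds.map((id) => [id, []]));
  for (let p = 0; p < numPartitions; p++) {
    assignment.get(nodeIds[p % nodeIds.length]).push(p);
  }
  return assignment;
}

// Hypothetical diff: what a single node gains and loses after a rebalance.
function diffPartitions(before, after) {
  return {
    assigned: after.filter((p) => !before.includes(p)),
    revoked: before.filter((p) => !after.includes(p)),
  };
}
```

With three nodes and six partitions, node `a` would own partitions 0 and 3. If node `c` leaves and the assignment is recomputed for `a` and `b`, node `a` would be assigned 2 and 4 and have 3 revoked. Round-robin is the simplest strategy to show, but it moves more partitions than strictly necessary on each rebalance; a real implementation may choose differently.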

What problems can you solve with this driver?

  • Scalability
  • High Availability
  • Concurrency between nodes in a cluster
  • Automatic Failover
  • Gossip between nodes

How it works under the hood

Terminology:

  • Leader – the node that will handle the cluster and has assigned partitions.
  • Follower – a node that will have assigned partitions and will work on them.
  • Heartbeat – a message that each follower periodically sends to the leader to show that it is still alive.
  • Heartcheck – a process that runs on the leader and checks the last heartbeat received by each follower.
  • Priority – assigned to each follower based on the time it joined the cluster. When a node dies, the priority is decreased by one. If the leader dies, the follower with the lowest priority will become the leader.
  • Node id – each follower node has an assigned id that is unique to the cluster.
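The terms above can be tied together in a small in-memory sketch. It is only an illustration under stated assumptions, not ring-election's implementation: the `ClusterView` class and its method names are hypothetical, timestamps are passed in explicitly instead of using timers, and "the priority is decreased by one" is interpreted as lowering the priority of every follower that joined after the dead node.

```javascript
// Hypothetical view of the cluster: follower id -> { priority, lastHeartbeat }.
class ClusterView {
  constructor(maxSilenceMs) {
    this.maxSilenceMs = maxSilenceMs; // silence tolerated before a node is considered dead
    this.followers = new Map();
    this.nextPriority = 1;
  }

  // A follower joins: priority reflects the order of arrival.
  join(id, now) {
    this.followers.set(id, { priority: this.nextPriority++, lastHeartbeat: now });
  }

  // Record a heartbeat received from a follower.
  heartbeat(id, now) {
    const follower = this.followers.get(id);
    if (follower) follower.lastHeartbeat = now;
  }

  // Heartcheck: remove followers whose last heartbeat is too old, return their ids.
  heartcheck(now) {
    const dead = [];
    for (const [id, follower] of this.followers) {
      if (now - follower.lastHeartbeat > this.maxSilenceMs) dead.push(id);
    }
    for (const id of dead) this.remove(id);
    return dead;
  }

  // Remove a node and decrease the priority of everyone who joined after it.
  remove(id) {
    const gone = this.followers.get(id);
    if (!gone) return;
    this.followers.delete(id);
    for (const follower of this.followers.values()) {
      if (follower.priority > gone.priority) follower.priority -= 1;
    }
    this.nextPriority -= 1;
  }

  // If the leader dies, the follower with the lowest priority takes over.
  nextLeader() {
    let best = null;
    for (const [id, follower] of this.followers) {
      if (best === null || follower.priority < best.priority) {
        best = { id, priority: follower.priority };
      }
    }
    return best ? best.id : null;
  }
}
```

In a real cluster the heartcheck would run on a timer on the leader, and each follower would need its own copy of the priorities to agree on the next leader; the sketch leaves out networking entirely to keep the bookkeeping visible.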

Start up phase

Detect follower failures

Leader failure

How to integrate it

Need to know how to integrate it? Visit https://github.com/pioardi/ring-election for more info. If you want to suggest new features or need help integrating ring-election, open an issue on GitHub and I will be happy to help you. Feature requests and pull requests are also welcome :)