Exploring High Availability and Fault Tolerance of a Cluster

Explore the high availability and fault tolerance of our cluster.

A cluster is not reliable unless it is fault-tolerant. kOps is designed to make the cluster fault-tolerant, but we’re going to validate that anyway.

Terminating a worker node

Let’s retrieve the list of worker node instances.

aws ec2 \
describe-instances | jq -r \
".Reservations[].Instances[] \
| select(.SecurityGroups[]\
.GroupName==\"nodes.$NAME\")\
.InstanceId"

We use aws ec2 describe-instances to retrieve all the instances (five in total). The output is sent to jq, which filters them by the security group dedicated to worker nodes.

The output is as follows:

i-063fabc7ad5935db5
i-04d32c91cfc084369
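To see how the jq filter picks out the worker nodes, it can help to run it against a small hand-written document instead of a live AWS account. The sketch below is only an illustration: the cluster name `example.k8s.local`, the instance IDs, and the `SAMPLE` and `WORKERS` variables are all made up, and the JSON is trimmed to the fields the filter reads. It also passes the group name through `--arg` and uses `any(...)`, which is a variation on the command above, not the lesson's exact form.

```shell
# Hypothetical cluster name, used only for this illustration.
NAME="example.k8s.local"

# A trimmed stand-in for the output of `aws ec2 describe-instances`.
# Real output contains many more fields; only the ones the filter reads are kept.
SAMPLE='{
  "Reservations": [
    {"Instances": [
      {"InstanceId": "i-worker-1",
       "SecurityGroups": [{"GroupName": "nodes.example.k8s.local"}]},
      {"InstanceId": "i-master-1",
       "SecurityGroups": [{"GroupName": "masters.example.k8s.local"}]}
    ]},
    {"Instances": [
      {"InstanceId": "i-worker-2",
       "SecurityGroups": [{"GroupName": "nodes.example.k8s.local"}]}
    ]}
  ]
}'

# Keep only the instances that have a security group named nodes.$NAME
# and print their IDs. --arg hands the shell value to jq as $group,
# so no quotes need escaping inside the filter.
WORKERS=$(printf '%s\n' "$SAMPLE" | jq -r --arg group "nodes.$NAME" \
  '.Reservations[].Instances[]
   | select(any(.SecurityGroups[]; .GroupName == $group))
   | .InstanceId')

echo "$WORKERS"
```

With this sample, only the two instances in the `nodes.` security group are printed; the one in the `masters.` group is filtered out. Using `any(...)` also means an instance is listed once even if it belongs to several security groups.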

We’ll terminate one of the worker nodes. To do that, we’ll pick a random one and retrieve its ID. ...