Exploring High Availability and Fault Tolerance of a Cluster
Explore the high availability and fault tolerance of our cluster.
A cluster is not reliable unless it is fault tolerant. kOps is designed to make the cluster fault tolerant, but we're going to validate that anyway.
Terminating a worker node
Let’s retrieve the list of worker node instances.
aws ec2 describe-instances | jq -r ".Reservations[].Instances[] | select(.SecurityGroups[].GroupName==\"nodes.$NAME\").InstanceId"
We use aws ec2 describe-instances to retrieve all the instances (five in total). The output is sent to jq, which filters them by the security group dedicated to worker nodes.
The output is as follows:
i-063fabc7ad5935db5
i-04d32c91cfc084369
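To see how the jq filter selects worker nodes without calling AWS, we can run it against a small hand-written response. This is only a sketch: the cluster name, the masters security group name, and the instance IDs in the sample are made up for illustration, and the JSON keeps only the fields the filter reads. It also passes the group name through jq's --arg option instead of escaping quotes, which is a stylistic choice rather than what the lesson's command does.

```shell
# Hypothetical cluster name; in the lesson, $NAME is already set.
NAME="example.k8s.local"

# Trimmed, made-up stand-in for the output of `aws ec2 describe-instances`.
sample_response='{
  "Reservations": [
    {"Instances": [{"InstanceId": "i-0000000000000000a",
                    "SecurityGroups": [{"GroupName": "masters.example.k8s.local"}]}]},
    {"Instances": [{"InstanceId": "i-0000000000000000b",
                    "SecurityGroups": [{"GroupName": "nodes.example.k8s.local"}]}]},
    {"Instances": [{"InstanceId": "i-0000000000000000c",
                    "SecurityGroups": [{"GroupName": "nodes.example.k8s.local"}]}]}
  ]
}'

# Keep only instances whose security group is nodes.$NAME, then print their IDs.
worker_ids=$(printf '%s' "$sample_response" | jq -r --arg group "nodes.$NAME" \
  '.Reservations[].Instances[]
   | select(.SecurityGroups[].GroupName == $group)
   | .InstanceId')

printf '%s\n' "$worker_ids"
```

With this sample data, the filter prints the two made-up worker IDs and skips the instance that belongs to the masters group.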
We’ll terminate one of the worker nodes. To do that, we’ll pick a random one and retrieve its ID. ...