Making Nodes Drainable
In this lesson, we will re-run the chaos experiment after making our nodes drainable by scaling our cluster and the Istio components.
Taking a look at Istio Deployments
Let’s take a look at Istio Deployments.
kubectl --namespace istio-system get deployments
The output is as follows.
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
istio-ingressgateway   1/1     1            1           12m
istiod                 1/1     1            1           13m
prometheus             1/1     1            1           12m
We can see that there are two Istio components, not counting prometheus. If we focus on the READY column, we can see that each of them is running a single replica.
Each of the two Istio components has a HorizontalPodAutoscaler (HPA) associated with it. The HPAs control how many replicas we have, based on metrics like CPU and memory usage. What we need to do is set the minimum number of replicas to 2.
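Before we change anything, we can confirm that those HPAs exist and see their current minimum and maximum number of replicas. This is only a quick check; the exact HPA names depend on how Istio was installed.

kubectl --namespace istio-system get hpa

If the MINPODS column shows 1 for istio-ingressgateway, that matches what the experiment uncovered.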
Since the experiment revealed that istio-ingressgateway should have at least two replicas, that's the one we'll focus on. Later on, the experiment might reveal other issues. If it does, we'll deal with them then.
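As an illustration of the change we're after, the HPA's minReplicas field could be bumped directly with kubectl patch, assuming the HPA is named istio-ingressgateway (the default name in a standard istioctl installation).

kubectl --namespace istio-system patch hpa istio-ingressgateway --patch '{"spec": {"minReplicas": 2}}'

Keep in mind that a change applied this way might be overwritten the next time Istio's manifests are re-applied, so treat it as a sketch of the field we need to modify rather than the final solution.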
Scaling the cluster
Before we dive into scaling Istio, we are going to explore scaling the cluster itself. It would be pointless to increase the number of replicas of Istio components as a way to solve the problem of not being able to drain a node if that is the only node in the cluster. We need the Gateway not only scaled but also distributed across different nodes of the cluster. Only then can we hope to drain a node successfully while the Gateway is running on it.

We'll assume that the experiment might shut down one replica while others are still running somewhere else. Fortunately for us, Kubernetes always does its best to distribute instances of our apps across different nodes. As long as it can, it will not run multiple replicas of the same app on a single node.
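Once the cluster has more than one node, we can confirm the node count and check how the istio-ingressgateway replicas are spread. These are generic checks; the commands for adding nodes themselves depend on the Kubernetes distribution or cloud provider you're using, so they're not shown here.

kubectl get nodes

kubectl --namespace istio-system get pods --output wide

The --output wide flag adds a NODE column, so we can see whether the replicas ended up on different nodes.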