Deleting Worker Nodes
In this lesson, we will carry out an experiment that will delete a node in the cluster. This experiment can help us understand how our cluster behaves if nodes are destroyed or damaged.
After resolving a few problems that we discovered through experiments, we are now able to drain nodes. As a result, we should be able to upgrade our cluster without doing something terribly wrong and, hopefully, without negatively affecting our applications.
Draining nodes is, most of the time, a voluntary action. We tend to drain our nodes when we choose to upgrade our cluster. The previous experiment was beneficial because we now have the confidence to upgrade the cluster without (much) fear. However, there is still something worse that can happen to our nodes.
More often than not, nodes will fail without our consent. They will not drain. They will get destroyed or damaged, they will go down, and they will be powered off. Bad things will happen to nodes, whether we like it or not.
Let’s see whether we can create an experiment that will validate how our cluster behaves when such things happen.
Inspecting the definition of node-delete.yaml and comparing it with node-uncordon.yaml
As always, we’re going to take a look at yet another experiment.
cat chaos/node-delete.yaml
The output is as follows.
version: 1.0.0
title: What happens if we delete a node
description: All the instances are distributed among healthy nodes and the applications are healthy
tags:
- k8s
- deployment
- node
configuration:
  node_label:
    type: env
    key: NODE_LABEL
steady-state-hypothesis:
  title: Nodes are indestructible
  probes:
  - name: all-apps-are-healthy
    type: probe
    tolerance: true
    provider:
      type: python
      func: all_microservices_healthy
      module: chaosk8s.probes
      arguments:
        ns: go-demo-8
method:
- type: action
  name: delete-node
  provider:
    type: python
    func: delete_nodes
    module: chaosk8s.node.actions
    arguments:
      label_selector: ${node_label}
      count: 1
      pod_namespace: go-demo-8
  pauses:
    after: 10
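What matters most is how this definition differs from the one we used in the previous experiment. Assuming that node-uncordon.yaml sits in the same chaos directory as before, we can compare the two files directly.

diff chaos/node-uncordon.yaml chaos/node-delete.yaml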
We can see that we replaced ...
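The configuration section tells us that the experiment reads the node label from the NODE_LABEL environment variable. As a rough sketch of how we might run it, we could export a label that matches our worker nodes and execute the experiment with the chaos CLI. The label value below is only an example; the one you need depends on how the nodes in your cluster are labeled.

export NODE_LABEL="kubernetes.io/os=linux"

chaos run chaos/node-delete.yaml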