Deleting Worker Nodes
In this lesson, we will carry out an experiment that will delete a node in the cluster. This experiment can help us understand how our cluster behaves if nodes are destroyed or damaged.
After resolving a few problems that we discovered through experiments, we are now able to drain nodes. As a result, we should be able to upgrade our cluster without doing something terribly wrong and, hopefully, without negatively affecting our applications.
Draining nodes is, most of the time, a voluntary action. We tend to drain our nodes when we choose to upgrade our cluster. The previous experiment was beneficial because we now have the confidence to upgrade the cluster without (much) fear. However, there is still something worse that can happen to our nodes.
More often than not, nodes will fail without our consent. They will not drain. They will get destroyed or damaged, they will go down, and they will be powered off. Bad things will happen to nodes, whether we like it or not.
Let’s see whether we can create an experiment that will validate how our cluster behaves when such things happen.
Inspecting the definition of node-delete.yaml and comparing it with node-uncordon.yaml
As always, we’re going to take a look at yet another experiment.
cat chaos/node-delete.yaml
The output is as follows.
version: 1.0.0
title: What happens if we delete a node
description: All the instances are distributed among healthy nodes and the applications are healthy
tags:
- k8s
- deployment
- node
configuration:
  node_label:
    type: env
    key: NODE_LABEL
steady-state-hypothesis:
  title: Nodes are indestructible
  probes:
  - name: all-apps-are-healthy
    type: probe
    tolerance: true
    provider:
      type: python
      func: all_microservices_healthy
      module: chaosk8s.probes
      arguments:
        ns: go-demo-8
method:
- type: action
  name: delete-node
  provider:
    type: python
    func: delete_nodes
    module: chaosk8s.node.actions
    arguments:
      label_selector: ${node_label}
      count: 1
      pod_namespace: go-demo-8
  pauses:
    after: 10
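What matters most is how this definition differs from the one we used in the previous experiment. Assuming that node-uncordon.yaml sits in the same chaos directory as before, we can compare the two files directly.

diff chaos/node-uncordon.yaml chaos/node-delete.yaml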
We can see that we replaced ...
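The configuration section tells us that the experiment reads the node label from the NODE_LABEL environment variable. As a rough sketch of how we might run it, we could export a label that matches our worker nodes and execute the experiment with the chaos CLI. The label value below is only an example; the one you need depends on how the nodes in your cluster are labeled.

export NODE_LABEL="kubernetes.io/os=linux"

chaos run chaos/node-delete.yaml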