Validating Application Availability
In this lesson, we will be running some chaos experiments to check if our application is highly available and remains accessible from outside.
Validating whether all the Pods are healthy and running is useful. But that does not necessarily mean that our application is accessible. Maybe the Pods are running, and everything is fantastic and peachy, but our customers cannot access our application.
Let’s see how we can validate whether we can send HTTP requests to our application and whether we can continue doing that after an instance of our app is destroyed. What would that definition look like?
Before we dive into application availability, we have a tiny problem that needs to be addressed. I couldn’t define the address of our application in YAML because your IP is almost certainly different from mine. And neither of us is using a real domain because that would be too complicated to set up. That problem allows me to introduce you to yet another feature of Chaos Toolkit.
We are going to define a variable that can be injected into our definition.
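To picture what an injected variable does, here is a minimal Python sketch of the idea. It is not Chaos Toolkit's actual implementation. The function names `resolve_env_config` and `substitute` are hypothetical, the address `203.0.113.10` is a placeholder from a documentation-only range, and failing early on a missing variable is a choice made for this sketch. The names `ingress_host` and `INGRESS_HOST`, and the URL, are the ones used by the definition in this lesson.

```python
from string import Template


def resolve_env_config(configuration, environ):
    """Resolve entries shaped like {"type": "env", "key": NAME} from environ.

    Hypothetical helper for illustration only.
    """
    resolved = {}
    for name, spec in configuration.items():
        if isinstance(spec, dict) and spec.get("type") == "env":
            # This sketch fails early when the variable is not set.
            if spec["key"] not in environ:
                raise KeyError(f"environment variable {spec['key']} is not set")
            resolved[name] = environ[spec["key"]]
        else:
            # Anything else is treated as a literal value.
            resolved[name] = spec
    return resolved


def substitute(text, values):
    """Replace ${name} placeholders in text with the resolved values."""
    return Template(text).substitute(values)


# Mirrors the shape of the configuration section in the definition.
configuration = {"ingress_host": {"type": "env", "key": "INGRESS_HOST"}}

# A plain dict stands in for os.environ so that the sketch is self-contained.
environ = {"INGRESS_HOST": "203.0.113.10"}  # placeholder address (assumption)

values = resolve_env_config(configuration, environ)
url = substitute("http://${ingress_host}/demo/person", values)
print(url)  # http://203.0.113.10/demo/person
```

The point of the indirection is that the definition itself stays the same for everyone, and only the environment differs from one cluster to another.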
Inspecting the definition of health-http.yaml
Let’s take a quick look at yet another YAML.
cat chaos/health-http.yaml
The output is as follows.
version: 1.0.0
title: What happens if we terminate an instance of the application?
description: If an instance of the application is terminated, the applications as a whole should still be operational.
tags:
- k8s
- pod
- http
configuration:
  ingress_host:
    type: env
    key: INGRESS_HOST
steady-state-hypothesis:
  title: The app is healthy
  probes:
  - name: app-responds-to-requests
    type: probe
    tolerance: 200
    provider:
      type: http
      timeout: 3
      verify_tls: false
      url: http://${ingress_host}/demo/person
      headers:
        Host: go-demo-8.acme.com
method:
- type: action
  name: terminate-app-pod
  provider:
    type: python
    module: chaosk8s.pod.actions
    func: terminate_pods
    arguments:
      label_selector: app=go-demo-8
      rand: true
      ns: go-demo-8
  pauses:
    after: 2
Compared to the previous definition, this one adds a configuration section and changes the steady-state-hypothesis. There are a few other changes. Some of those are cosmetic, while others are indeed important.
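To make the new probe more concrete, the following is a rough Python sketch of the check it describes: send a GET request to `/demo/person` on the Ingress host with the `Host` header set to `go-demo-8.acme.com`, wait at most 3 seconds, and treat a `200` response as healthy. It uses only the standard library and is an approximation, not the code Chaos Toolkit runs. The function names `build_probe_request`, `within_tolerance`, and `probe` are hypothetical.

```python
import urllib.error
import urllib.request


def build_probe_request(ingress_host):
    """Build the GET request described by the probe's provider section."""
    return urllib.request.Request(
        f"http://{ingress_host}/demo/person",
        # The Host header stands in for a real domain.
        headers={"Host": "go-demo-8.acme.com"},
        method="GET",
    )


def within_tolerance(status, tolerance=200):
    """The hypothesis holds only when the status code equals the tolerance."""
    return status == tolerance


def probe(ingress_host, timeout=3):
    """Send the request and report whether the response is within tolerance."""
    request = build_probe_request(ingress_host)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses still carry a status code to compare.
        status = error.code
    except (urllib.error.URLError, OSError):
        # Connection failures and timeouts count as "not healthy" here.
        return False
    return within_tolerance(status)
```

With the `INGRESS_HOST` environment variable set, calling `probe(os.environ["INGRESS_HOST"])` (after `import os`) before and after terminating a Pod would approximate what the steady-state hypothesis verifies.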
Checking the difference between health-pause.yaml and health-http.yaml
We’ll skip commenting on the contents of this file because it is hard to see what’s different when compared to what we had before. Instead, we’ll comment on the differences by executing a diff between the old and the new version of the definition.
diff chaos/health-pause.yaml chaos/health-http.yaml
The output is as follows.
3c3
< description: If an instance of
...