Targeting Chaos
Learn about randomness in chaos testing, faults and failures, cunning malevolent intelligence, chaos automation, and repeat.
Randomness
We could certainly use randomness. This is how Chaos Monkey works: it picks a cluster at random, picks an instance at random, and kills it. If you’re just getting started with chaos engineering, then random selection is as good a process as any. Most software has so many problems that shooting at random targets will uncover something alarming.
Faults and failures
We’re looking for faults that lead to failures. Many faults won’t cause failures. In fact, on any given day, most faults don’t result in failures. More about that later in this chapter. When we inject faults into service-to-service calls, we’re searching for the crucial calls. As with any search problem, we have to confront the challenge of dimensionality.
Suppose there’s a partner data load process that runs every Tuesday. A fault during one part of that process causes bad data in the database. Later, when using that data to present an API response, a service throws an exception and returns a 500
response code. How likely are we to find that problem via random search? Not very likely.
When to avoid randomness?
Randomness works well at the beginning because the ...