The Simian Army
Learn about Chaos Monkey, chaos engineering, robustness, and opt-in and opt-out environments.
Chaos Monkey
Probably the best-known example of chaos engineering is Netflix’s Chaos Monkey. Every once in a while, the monkey wakes up, picks an autoscaling cluster, and kills one of its instances. The cluster should recover automatically. If it doesn’t, the failure has exposed a real problem, and the team that owns the service has to fix it.
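The loop described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not Netflix's implementation: the `Cluster` class and the `reconcile`, `is_healthy`, and `chaos_monkey_run` names are hypothetical, and the "autoscaler" is simulated locally rather than calling any cloud API.

```python
import random
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for an autoscaling cluster (hypothetical, not a cloud API)."""
    name: str
    desired_size: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=count, init=False, repr=False)

    def __post_init__(self):
        # Start the cluster at its desired size.
        self.reconcile()

    def reconcile(self):
        """Mimic the autoscaler: launch instances until the desired size is met."""
        while len(self.instances) < self.desired_size:
            self.instances.append(f"{self.name}-{next(self._ids)}")

    def is_healthy(self):
        """The cluster is healthy when it runs exactly the desired instance count."""
        return len(self.instances) == self.desired_size


def chaos_monkey_run(clusters, rng):
    """One wake-up: pick a cluster at random and kill one of its instances."""
    cluster = rng.choice(clusters)
    victim = rng.choice(cluster.instances)
    cluster.instances.remove(victim)
    return cluster, victim


# Assumed example data: two clusters with made-up names and sizes.
rng = random.Random(42)
clusters = [Cluster("catalog", 3), Cluster("billing", 2)]

cluster, victim = chaos_monkey_run(clusters, rng)
print(f"killed {victim} in {cluster.name}; healthy: {cluster.is_healthy()}")

# In a real system the autoscaler would react on its own; here we trigger it.
cluster.reconcile()
print(f"after autoscaling, healthy: {cluster.is_healthy()}")
```

In this sketch, recovery is a single explicit `reconcile()` call. In a real deployment the interesting question is whether that recovery happens automatically and without user-visible impact, which is exactly what the monkey is meant to test.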
The Chaos Monkey tool was born during Netflix’s migration to Amazon Web Services (AWS) cloud infrastructure and a microservice architecture. As services proliferated, engineers found that every additional component was one more thing whose failure could jeopardize availability. Unless they found a way to make the whole service immune to component failures, they would be doomed. So every cluster needed to autoscale and recover from failure of ...