Governor

Learn about human and automation limitations for controlling errors, governors, uses of governors, and a quick wrap up.

Human vs. Automation

In the Force Multiplier lesson, we looked into an outage that Reddit suffered. As a quick reminder, Reddit’s configuration management system restarted a part of its infrastructure management that scales server instances up and down. This was in the middle of a ZooKeeper migration, so the autoscaler read a partial configuration and decided to shut down nearly every machine instance in Reddit.

The flip side of that coin is a job scheduler that spins up too many computational instances in order to process a queue before a deadline. The work still can’t get done fast enough, and, to add insult to injury, the cloud provider’s invoice that month is written in scientific notation. Automation has no judgment. When it goes wrong, it tends to go wrong really quickly. By the time a human perceives the problem, it’s a question of recovery rather than intervention. How can we allow human intervention without putting a human in the loop for everything? We should use automation for things humans are bad at: repetitive ...