Automation Goes Really Fast

Learn about several outages and autoscaling services, the difference between human operators and automation, and the drawbacks of automation.

AWS postmortem

Another fascinating bit of information shows up in Amazon’s AWS postmortem:

“While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly. We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level.”
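The postmortem names two safeguards: remove capacity in small increments rather than all at once, and refuse any removal that would take a subsystem below its minimum required capacity. The sketch below shows both ideas under stated assumptions. It is not Amazon's actual tool. The `Subsystem` model, `plan_removal`, `remove_capacity`, the `max_per_step` limit, and the `wait` hook are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Subsystem:
    """Hypothetical model: a pool of servers with a floor it must never drop below."""
    name: str
    capacity: int
    min_capacity: int


class CapacityRemovalRefused(Exception):
    """Raised when a removal would breach a subsystem's minimum capacity."""


def plan_removal(subsystem: Subsystem, requested: int, max_per_step: int) -> List[int]:
    """Split a removal request into small batches, refusing it outright if the
    end state would fall below the subsystem's minimum required capacity."""
    if requested < 0 or max_per_step <= 0:
        raise ValueError("requested must be >= 0 and max_per_step must be > 0")
    if subsystem.capacity - requested < subsystem.min_capacity:
        raise CapacityRemovalRefused(
            f"{subsystem.name}: removing {requested} would leave "
            f"{subsystem.capacity - requested}, below minimum {subsystem.min_capacity}"
        )
    batches = []
    remaining = requested
    while remaining > 0:
        step = min(max_per_step, remaining)
        batches.append(step)
        remaining -= step
    return batches


def remove_capacity(
    subsystem: Subsystem,
    requested: int,
    max_per_step: int,
    wait: Callable[[], None] = lambda: None,
) -> None:
    """Apply the planned batches one at a time, re-checking the floor before
    each batch and calling `wait` between batches to slow the removal down."""
    for step in plan_removal(subsystem, requested, max_per_step):
        # Re-check: capacity may have changed since the plan was made.
        if subsystem.capacity - step < subsystem.min_capacity:
            raise CapacityRemovalRefused(f"{subsystem.name}: floor reached mid-removal")
        subsystem.capacity -= step
        wait()  # hook for pacing, e.g. sleep and check health before the next batch


frontend = Subsystem("frontend", capacity=100, min_capacity=80)
print(plan_removal(frontend, requested=15, max_per_step=5))  # [5, 5, 5]
```

A request to remove 30 servers from `frontend` would raise `CapacityRemovalRefused`, because it would leave 70, below the floor of 80. In real use the `wait` hook might be something like `lambda: time.sleep(60)` plus a health check. The point of the design is that the tool itself enforces the pace and the floor, so a mistyped argument cannot remove everything at machine speed.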

Reddit outage

This part stuck out because it closely resembled the outage that Reddit.com suffered in August 2016 ...