Auto-scaling
Auto-scaling is a cloud feature that lets an application dynamically spin up more instances in response to workload intensity. Cloud engineers are responsible for the auto-scaling configuration, which decides when new instances are spun up and the maximum number of instances allowed.
Speed vs. CPU Cost
As a single application instance handles more workload, it slows down, like any other program. Another instance of the application can be spun up to split the workload and maintain the speed of service. Each instance consumes CPU, which taxes the server hardware, and only a limited amount of CPU is available. Cloud engineers try to maximize speed and minimize CPU usage, so they’re constantly working toward the optimal balance between the two.
All configurations use scheduled auto-scaling. Here, a cloud engineer sets a maximum number of instances or a maximum CPU usage, and once that limit is reached, no new instances are created. This helps to manage costs: computing power is expensive, and without this ceiling the auto-scaler could call for an unlimited number of instances, each drawing considerable computing power.
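The ceiling described above can be sketched in a few lines of Python. This is only an illustration, not any provider's API: the function name `clamp_instances`, the default limits, and the lower bound `min_instances` are assumptions added for the example.

```python
def clamp_instances(desired: int, min_instances: int = 1, max_instances: int = 10) -> int:
    """Keep a requested instance count inside the configured bounds.

    `max_instances` plays the role of the ceiling set by the cloud engineer.
    `min_instances` is an assumed floor so the service never scales to zero.
    """
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    return max(min_instances, min(desired, max_instances))


within_limit = clamp_instances(4)    # request is below the ceiling, so it is granted
capped = clamp_instances(25)         # request exceeds the ceiling, so it is cut back
print(within_limit, capped)
```

However many instances the auto-scaler asks for, the count that is actually provisioned never exceeds `max_instances`, which is what keeps costs bounded.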
Besides setting the scheduled auto-scaling ceiling, cloud engineers also determine which type of auto-scaling is most efficient for their company’s needs, either predictive auto-scaling or dynamic auto-scaling.
Predictive auto-scaling
Predictive auto-scaling involves using machine learning and previous data to anticipate how many instances will be required to handle the workload at any given time. It works best when your workload has consistent peak periods, because the extra instances are already running and don’t have to be spun up on the fly.
The downside is that if the workload is smaller than predicted, you pay for more capacity than you need, and if it is larger than predicted, the product slows down while more instances are spun up.
For example, when designing the auto-scaling policies for Netflix, it would be good to use predictive auto-scaling if you find that the application is consistently used more on weekends than on weekdays.
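To make the weekday-versus-weekend idea concrete, here is a minimal sketch that stands in for the machine-learning model with a plain per-weekday average of past peaks. The traffic figures, the `rps_per_instance` capacity, and the `headroom` safety factor are all made-up assumptions for illustration.

```python
import math
from collections import defaultdict
from datetime import date


def forecast_by_weekday(history):
    """Average the past peak requests per second for each weekday (0=Mon .. 6=Sun)."""
    peaks = defaultdict(list)
    for day, peak_rps in history:
        peaks[day.weekday()].append(peak_rps)
    return {weekday: sum(values) / len(values) for weekday, values in peaks.items()}


def instances_for(day, forecast, rps_per_instance, headroom=1.25):
    """Instances to provision ahead of time for `day`, with some spare capacity."""
    expected_rps = forecast[day.weekday()] * headroom
    return max(1, math.ceil(expected_rps / rps_per_instance))


# Hypothetical history: (date, peak requests per second observed that day).
history = [
    (date(2024, 1, 5), 400),    # Friday
    (date(2024, 1, 6), 900),    # Saturday
    (date(2024, 1, 12), 500),   # Friday
    (date(2024, 1, 13), 1100),  # Saturday
]

forecast = forecast_by_weekday(history)
friday_plan = instances_for(date(2024, 1, 19), forecast, rps_per_instance=100)
saturday_plan = instances_for(date(2024, 1, 20), forecast, rps_per_instance=100)
print(friday_plan, saturday_plan)
```

Because Saturdays have been busier in the sample history, the plan for the next Saturday provisions more instances in advance than the plan for the next Friday. A real predictive scaler would use a richer model, but the trade-off is the same: the plan is only as good as the pattern in the past data.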
Dynamic auto-scaling
Dynamic auto-scaling, on the other hand, spins up instances on the fly based on target metrics decided by the cloud engineer. Some common metrics are CPU usage, requests per minute to the program, and container resource usage. Most high-level dynamic auto-scaling implementations use all of these metrics to ensure a sufficient number of instances.
This is best used when the cloud workload has no foreseeable pattern or remains mostly constant. The upside of this type of auto-scaling is that you’ll never have more instances than needed.
The downside is that creating instances on the fly can slow performance, because the current instances are overworked while the new one is spinning up. Also, if a key metric is overlooked, the program will not create a new instance when needed and will run slowly as a result.
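One common way to turn target metrics into an instance count is proportional scaling: compare each observed metric with its target, compute how many instances that metric alone would call for, and take the largest answer. The sketch below assumes that approach. The metric names, target values, and the `desired_instances` function are hypothetical and not taken from a specific cloud provider.

```python
import math


def desired_instances(current, observed, targets, max_instances):
    """Propose an instance count from several target metrics.

    For each metric, scale the current count by observed / target, then take
    the largest proposal so that no single metric is left over its target.
    The result is kept between 1 and the `max_instances` ceiling.
    """
    proposals = [
        math.ceil(current * observed[name] / target)
        for name, target in targets.items()
    ]
    return max(1, min(max(proposals), max_instances))


# Hypothetical targets chosen by the cloud engineer, and current readings.
targets = {"cpu_percent": 60.0, "requests_per_minute": 1000.0}
observed = {"cpu_percent": 90.0, "requests_per_minute": 1200.0}

scaled = desired_instances(current=4, observed=observed, targets=targets, max_instances=10)
print(scaled)
```

Here CPU usage is the metric furthest over its target, so it drives the decision. If a metric that matters is left out of `targets`, the scaler has no way to react to it, which is the second downside described above.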