Implementing Autoscaling for Kubernetes Services

Kubernetes is a powerful orchestration platform that lets us control how few or how many system resources a given workload can access. Because Kubernetes can run on-premises, on cloud virtual machines, or as a managed service, there are several options for configuring autoscaling. These range from mechanisms built into Kubernetes itself, to features of cloud-managed services, to third-party plugins built for specific scenarios.

Native Kubernetes options

As an orchestrator, Kubernetes offers a rich ecosystem that lets us use as little or as much of the cluster’s compute power as needed. Some of its features control how applications scale out based on the primary indicators we covered in the previous section. In this section, we’ll start with a couple of native options that work in any cluster, wherever it runs.

Horizontal Pod Autoscalers

Horizontal Pod Autoscalers (HPAs) are Kubernetes resources that let us specify a target condition or threshold for a workload such as a Deployment. When the workload crosses that threshold, the cluster scales out by creating additional pods, and it scales back in when demand drops. The familiar minimum and maximum parameters apply here. They bound the number of pod replicas (minReplicas and maxReplicas), not nodes, which gives us guardrails around how far scaling can go. The other key parameter is the metric the workload is measured against, for example whether scaling is triggered by CPU or by memory utilization.
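To make the threshold idea concrete, the Kubernetes documentation describes the HPA's core calculation for a single metric as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The result is then kept within the minimum and maximum replica counts. The controller also skips scaling when the ratio is within a tolerance of 1.0, and that tolerance defaults to 0.1. The Python below is a simplified sketch of that arithmetic, not the controller's actual code. It ignores readiness handling, missing metrics, and stabilization windows, and the function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified sketch of the HPA replica calculation for one metric.

    Hypothetical helper: real HPAs also account for pod readiness,
    missing metrics, and scale-down stabilization.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    # How far the observed average is from the target (1.0 means "on target").
    ratio = current_metric / target_metric

    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        proposed = current_replicas
    else:
        # Scale proportionally, rounding up so capacity is never undershot.
        proposed = math.ceil(current_replicas * ratio)

    # Guardrails: never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, proposed))


# 4 pods averaging 90% CPU against a 60% target: ratio 1.5, so 6 pods.
print(desired_replicas(4, 90, 60, min_replicas=1, max_replicas=10))
```

With 8 pods at 120% against a 50% target, the proportional answer would be 20 pods. The maxReplicas guardrail of 10 caps it, which shows why those bounds matter.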

CPU and memory are the two metrics supported out of the box, but HPAs can also scale on custom and external metrics that a metrics adapter exposes to the cluster. The table below lists other common targets that can be combined to determine how and when autoscaling should start.
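When several metrics are configured on one HPA, the Kubernetes documentation states that the controller computes a proposed replica count for each metric and uses the largest. The sketch below illustrates that combination rule. It is an assumption-laden simplification that omits the tolerance check and readiness handling. `MetricSample`, `combined_replicas`, and the metric name `requests_per_second` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MetricSample:
    name: str        # e.g. "cpu" or a hypothetical custom metric like "requests_per_second"
    current: float   # observed average value per pod
    target: float    # desired average value per pod


def combined_replicas(current_replicas: int,
                      samples: List[MetricSample],
                      min_replicas: int,
                      max_replicas: int) -> int:
    """Hypothetical sketch: one proposal per metric, then take the largest."""
    if not samples:
        raise ValueError("at least one metric is required")

    # Proportional proposal for each metric, rounded up.
    proposals = [
        math.ceil(current_replicas * s.current / s.target) for s in samples
    ]

    # The most demanding metric wins, then the min/max guardrails apply.
    return max(min_replicas, min(max_replicas, max(proposals)))


samples = [
    MetricSample("cpu", current=30, target=60),                    # proposes 2 pods
    MetricSample("requests_per_second", current=150, target=100),  # proposes 6 pods
]
print(combined_replicas(4, samples, min_replicas=1, max_replicas=10))
```

Taking the maximum means no single metric can push the workload below what another metric says it needs. That is why combining metrics tends to err on the side of capacity.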
