This article is more than one year old. Older articles may contain outdated content. Check that the information in the page has not become incorrect since its publication.

Autoscaling in Kubernetes

Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.3

Customers using Kubernetes respond to end user requests quickly and ship software faster than ever before. But what happens when you build a service that is even more popular than you planned for, and run out of compute? In Kubernetes 1.3, we are proud to announce that we have a solution: autoscaling. On Google Compute Engine (GCE) and Google Container Engine (GKE) (and coming soon on AWS), Kubernetes will automatically scale up your cluster as soon as you need it, and scale it back down to save you money when you don’t.

Benefits of Autoscaling

To understand better where autoscaling would provide the most value, let’s start with an example. Imagine you have a 24/7 production service with a load that is variable in time, where it is very busy during the day in the US, and relatively low at night. Ideally, we would want the number of nodes in the cluster and the number of pods in deployment to dynamically adjust to the load to meet end user demand. The new Cluster Autoscaling feature together with Horizontal Pod Autoscaler can handle this for you automatically.

Setting Up Autoscaling on GCE

The following instructions apply to GCE. For GKE please check the autoscaling section in cluster operations manual available here.

Before we begin, we need to have an active GCE project with Google Cloud Monitoring, Google Cloud Logging and Stackdriver enabled. For more information on project creation, please read our Getting Started Guide. We also need to download a recent version of Kubernetes project (version v1.3.0 or later).

First, we set up a cluster with Cluster Autoscaler turned on. The number of nodes in the cluster will start at 2, and autoscale up to a maximum of 5. To implement this, we’ll export the following environment variables:

export NUM\_NODES=2




and start the cluster by running:


The script creates a cluster together with Cluster Autoscaler add-on. The autoscaler will try to add new nodes to the cluster if there are pending pods which could schedule on a new node.

Let’s see our cluster, it should have two nodes:

$ kubectl get nodes

NAME                           STATUS                     AGE

kubernetes-master              Ready,SchedulingDisabled   2m

kubernetes-minion-group-de5q   Ready                      2m

kubernetes-minion-group-yhdx   Ready                      1m

Run & Expose PHP-Apache Server

To demonstrate autoscaling we will use a custom docker image based on php-apache server. The image can be found here. It defines index.php page which performs some CPU intensive computations.

First, we’ll start a deployment running the image and expose it as a service:

$ kubectl run php-apache \\_containers/hpa-example \

  --requests=cpu=500m,memory=500M --expose --port=80  

service "php-apache" createddeployment "php-apache" created

Now, we will wait some time and verify that both the deployment and the service were correctly created and are running:

$ kubectl get deployment


php-apache   1         1         1            1           49s

$ kubectl get pods  
NAME                          READY     STATUS    RESTARTS   AGE

php-apache-2046965998-z65jn   1/1       Running   0          30s

We may now check that php-apache server works correctly by calling wget with the service's address:

$ kubectl run -i --tty service-test --image=busybox /bin/sh  
Hit enter for command prompt  
$ wget -q -O- http://php-apache.default.svc.cluster.local


Starting Horizontal Pod Autoscaler

Now that the deployment is running, we will create a Horizontal Pod Autoscaler for it. To create it, we will use kubectl autoscale command, which looks like this:

$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

This defines a Horizontal Ppod Autoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache deployment we created in the first step of these instructions. Roughly speaking, the horizontal autoscaler will increase and decrease the number of replicas (via the deployment) so as to maintain an average CPU utilization across all Pods of 50% (since each pod requests 500 milli-cores by kubectl run, this means average CPU usage of 250 milli-cores). See here for more details on the algorithm.

We may check the current status of autoscaler by running:

$ kubectl get hpa

NAME         REFERENCE                     TARGET    CURRENT   MINPODS   MAXPODS   AGE

php-apache   Deployment/php-apache/scale   50%       0%        1         20        14s

Please note that the current CPU consumption is 0% as we are not sending any requests to the server (the CURRENT column shows the average across all the pods controlled by the corresponding replication controller).

Raising the Load

Now, we will see how our autoscalers (Cluster Autoscaler and Horizontal Pod Autoscaler) react on the increased load of the server. We will start two infinite loops of queries to our server (please run them in different terminals):

$ kubectl run -i --tty load-generator --image=busybox /bin/sh  
Hit enter for command prompt  
$ while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

We need to wait a moment (about one minute) for stats to propagate. Afterwards, we will examine status of Horizontal Pod Autoscaler:

$ kubectl get hpa

NAME         REFERENCE                     TARGET    CURRENT   MINPODS   MAXPODS   AGE

php-apache   Deployment/php-apache/scale   50%       310%      1         20        2m

$ kubectl get deployment php-apache


php-apache        7         7         7            3           4m

Horizontal Pod Autoscaler has increased the number of pods in our deployment to 7. Let’s now check, if all the pods are running:

jsz@jsz-desk2:~/k8s-src$ kubectl get pods

php-apache-2046965998-3ewo6        0/1       Pending   0          1m

php-apache-2046965998-8m03k        1/1       Running   0          1m

php-apache-2046965998-ddpgp        1/1       Running   0          5m

php-apache-2046965998-lrik6        1/1       Running   0          1m

php-apache-2046965998-nj465        0/1       Pending   0          1m

php-apache-2046965998-tmwg1        1/1       Running   0          1m

php-apache-2046965998-xkbw1        0/1       Pending   0          1m

As we can see, some pods are pending. Let’s describe one of pending pods to get the reason of the pending state:

$ kubectl describe pod php-apache-2046965998-3ewo6

Name: php-apache-2046965998-3ewo6

Namespace: default



  FirstSeen From SubobjectPath Type Reason Message

  1m {default-scheduler } Warning FailedScheduling pod (php-apache-2046965998-3ewo6) failed to fit in any node

fit failure on node (kubernetes-minion-group-yhdx): Insufficient CPU

fit failure on node (kubernetes-minion-group-de5q): Insufficient CPU

  1m {cluster-autoscaler } Normal TriggeredScaleUp pod triggered scale-up, mig: kubernetes-minion-group, sizes (current/new): 2/3

The pod is pending as there was no CPU in the system for it. We see there’s a TriggeredScaleUp event connected with the pod. It means that the pod triggered reaction of Cluster Autoscaler and a new node will be added to the cluster. Now we’ll wait for the reaction (about 3 minutes) and list all nodes:

$ kubectl get nodes

NAME                           STATUS                     AGE

kubernetes-master              Ready,SchedulingDisabled   9m

kubernetes-minion-group-6z5i   Ready                      43s

kubernetes-minion-group-de5q   Ready                      9m

kubernetes-minion-group-yhdx   Ready                      9m

As we see a new node kubernetes-minion-group-6z5i was added by Cluster Autoscaler. Let’s verify that all pods are now running:

$ kubectl get pods

NAME                               READY     STATUS    RESTARTS   AGE

php-apache-2046965998-3ewo6        1/1       Running   0          3m

php-apache-2046965998-8m03k        1/1       Running   0          3m

php-apache-2046965998-ddpgp        1/1       Running   0          7m

php-apache-2046965998-lrik6        1/1       Running   0          3m

php-apache-2046965998-nj465        1/1       Running   0          3m

php-apache-2046965998-tmwg1        1/1       Running   0          3m

php-apache-2046965998-xkbw1        1/1       Running   0          3m

After the node addition all php-apache pods are running!

Stop Load

We will finish our example by stopping the user load. We’ll terminate both infinite while loops sending requests to the server and verify the result state:

$ kubectl get hpa

NAME         REFERENCE                     TARGET    CURRENT   MINPODS   MAXPODS   AGE

php-apache   Deployment/php-apache/scale   50%       0%        1         10        16m

$ kubectl get deployment php-apache


php-apache        1         1         1            1           14m

As we see, in the presented case CPU utilization dropped to 0, and the number of replicas dropped to 1.

After deleting pods most of the cluster resources are unused. Scaling the cluster down may take more time than scaling up because Cluster Autoscaler makes sure that the node is really not needed so that short periods of inactivity (due to pod upgrade etc) won’t trigger node deletion (see cluster autoscaler doc). After approximately 10-12 minutes you can verify that the number of nodes in the cluster dropped:

$ kubectl get nodes

NAME                           STATUS                     AGE

kubernetes-master              Ready,SchedulingDisabled   37m

kubernetes-minion-group-de5q   Ready                      36m

kubernetes-minion-group-yhdx   Ready                      36m

The number of nodes in our cluster is now two again as node kubernetes-minion-group-6z5i was removed by Cluster Autoscaler.

Other use cases

As we have shown, it is very easy to dynamically adjust the number of pods to the load using a combination of Horizontal Pod Autoscaler and Cluster Autoscaler.

However Cluster Autoscaler alone can also be quite helpful whenever there are irregularities in the cluster load. For example, clusters related to development or continuous integration tests can be less needed on weekends or at night. Batch processing clusters may have periods when all jobs are over and the new will only start in couple hours. Having machines that do nothing is a waste of money.

In all of these cases Cluster Autoscaler can reduce the number of unused nodes and give quite significant savings because you will only pay for these nodes that you actually need to run your pods. It also makes sure that you always have enough compute power to run your tasks.

-- Jerzy Szczepkowski and Marcin Wielgus, Software Engineers, Google