---
aliases:
- /2019/09/jetstream_kubernetes_magnum_autoscaler
categories:
- kubernetes
- jetstream
- jupyterhub
date: 2019-09-12 12:00
layout: post
slug: kubernetes-jetstream-autoscaler
title: Deploy Cluster Autoscaler for Kubernetes on Jetstream
---

The [Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler) is a service that runs within a Kubernetes cluster and, when there are not enough resources to accommodate the pods queued to run, contacts the API of the cloud provider to create more virtual machines that join the Kubernetes cluster. Initially the Cluster Autoscaler only supported commercial cloud providers, but back in March 2019 [a user contributed OpenStack support based on Magnum](https://github.com/kubernetes/autoscaler/pull/1690).

As a first step you should have a Magnum-based deployment running on Jetstream, see [my recent tutorial about that](https://zonca.github.io/2019/06/kubernetes-jupyterhub-jetstream-magnum.html). Therefore you should also already have a copy of the repository with all the configuration files checked out on the local machine you use to interact with the OpenStack API; if not:

```bash
git clone https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream.git
```

and enter the folder dedicated to the autoscaler:

```bash
cd jupyterhub-deploy-kubernetes-jetstream/kubernetes_magnum/autoscaler
```

## Setup credentials

We first create the service account needed by the autoscaler to interact with the Kubernetes API:

```bash
kubectl create -f cluster-autoscaler-svcaccount.yaml
```

Then we need to provide all the connection details the autoscaler needs to interact with the OpenStack API. Those are contained in the `cloud-config` file of our cluster, available on the master node and set up by Magnum.

Get the `IP` of your master node from:

```bash
openstack server list
IP=xxx.xxx.xxx.xxx
```

Now SSH into the master node and access the `cloud-config` file:

```bash
ssh fedora@$IP
cat /etc/kubernetes/cloud-config
```

Copy the `[Global]` section to the end of `cluster-autoscaler-secret.yaml` on your local machine, and remove the `ca-file` line. Then create the secret:

```bash
kubectl create -f cluster-autoscaler-secret.yaml
```

## Launch the Autoscaler deployment

Create the Autoscaler deployment:

```bash
kubectl create -f cluster-autoscaler-deployment-master.yaml
```

Alternatively, I also added a version for a cluster where we are not deploying pods on the master node, `cluster-autoscaler-deployment.yaml`.

Check that the deployment is active:

```bash
kubectl -n kube-system get deployments
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
cluster-autoscaler   1         1         1            0           10s
```

And check its logs (get the exact pod name from `kubectl -n kube-system get pods`):

```bash
kubectl -n kube-system logs cluster-autoscaler-59f4cf4f4-4k4p2
I0905 05:29:21.589062    1 leaderelection.go:217] attempting to acquire leader lease  kube-system/cluster-autoscaler...
I0905 05:29:39.412449    1 leaderelection.go:227] successfully acquired lease kube-system/cluster-autoscaler
I0905 05:29:43.896557    1 magnum_manager_heat.go:293] For stack ID 17ab3ae7-1a81-43e6-98ec-b6ffd04f91d3, stack name is k8s-lu3bksbwsln3
I0905 05:29:44.146319    1 magnum_manager_heat.go:310] Found nested kube_minions stack: name k8s-lu3bksbwsln3-kube_minions-r4lhlv5xuwu3, ID d0590824-cc70-4da5-b9ff-8581d99c666b
```

If you redeploy the cluster but keep an older authentication secret, you will see "Authentication failed" in the logs of the autoscaler pod: you need to update the secret every time you redeploy the cluster.
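For example, after a redeploy you could refresh the secret and restart the autoscaler with something like the following sketch (it assumes the credentials in `cluster-autoscaler-secret.yaml` have already been updated with the new `cloud-config`; substitute your own pod name):

```bash
# Recreate the secret with the credentials copied from the new cluster's cloud-config
kubectl delete -f cluster-autoscaler-secret.yaml
kubectl create -f cluster-autoscaler-secret.yaml

# Delete the autoscaler pod (use your own pod name) so the deployment
# recreates it and it picks up the updated secret
kubectl -n kube-system delete pod cluster-autoscaler-59f4cf4f4-4k4p2
```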
## Test the autoscaler

Now we need to produce a significant load on the cluster so that the autoscaler is triggered to ask OpenStack Magnum to create more virtual machines. We can create a deployment of the NGINX container (any other image would work for this test):

```bash
kubectl create deployment autoscaler-demo --image=nginx
```

And then create a large number of replicas:

```bash
kubectl scale deployment autoscaler-demo --replicas=300
```

We are using 2 nodes with a large amount of memory and CPU, so they can accommodate more than 200 of those pods; the rest remain in the queue:

```bash
kubectl get deployment autoscaler-demo
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
autoscaler-demo   300       300       300          213         18m
```

And this triggers the autoscaler:

```bash
kubectl -n kube-system logs cluster-autoscaler-59f4cf4f4-4k4p2
I0905 05:34:47.401149    1 scale_up.go:689] Scale-up: setting group DefaultNodeGroup size to 2
I0905 05:34:49.267280    1 magnum_nodegroup.go:101] Increasing size by 1, 1->2
I0905 05:35:22.222387    1 magnum_nodegroup.go:67] Waited for cluster UPDATE_IN_PROGRESS status
```

Check also with the OpenStack API:

```bash
openstack coe cluster list
+------+------+---------+------------+--------------+--------------------+
| uuid | name | keypair | node_count | master_count | status             |
+------+------+---------+------------+--------------+--------------------+
| 09fcf| k8s  | comet   | 2          | 1            | UPDATE_IN_PROGRESS |
+------+------+---------+------------+--------------+--------------------+
```

It takes about 4 minutes for a new VM to boot, be configured by Magnum and join the Kubernetes cluster. Checking the logs again should show another line:

```bash
I0912 17:18:28.290987    1 magnum_nodegroup.go:67] Waited for cluster UPDATE_COMPLETE status
```

Then you should have all 3 nodes available:

```bash
kubectl get nodes
NAME                        STATUS    ROLES     AGE       VERSION
k8s-6bawhy45wr5t-master-0   Ready     master    38m       v1.11.1
k8s-6bawhy45wr5t-minion-0   Ready               38m       v1.11.1
k8s-6bawhy45wr5t-minion-1   Ready               30m       v1.11.1
```

and all 300 NGINX containers deployed:

```bash
kubectl get deployments
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
autoscaler-demo   300       300       300          300         35m
```

You can also test scaling down: scale the number of NGINX replicas back to just a few and check in the autoscaler logs that this triggers the scale-down process, as shown below. In `cluster-autoscaler-deployment-master.yaml` I have configured the scale-down process to trigger after just 1 minute, to simplify testing. For production, better increase this to 10 minutes or more. Check the [documentation of Cluster Autoscaler 1.14](https://github.com/zonca/autoscaler/blob/cluster-autoscaler-1.14-magnum/cluster-autoscaler/FAQ.md) for all the other available options.
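A minimal sketch of the scale-down test, reusing the demo deployment from above (the autoscaler pod name will be different in your cluster):

```bash
# Scale the demo deployment back down so most pods terminate and the extra node becomes idle
kubectl scale deployment autoscaler-demo --replicas=10

# Follow the autoscaler logs and wait for the scale-down messages to appear
kubectl -n kube-system logs -f cluster-autoscaler-59f4cf4f4-4k4p2
```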
## Note about the Cluster Autoscaler container

The Magnum provider was added in Cluster Autoscaler 1.15; however, that version is not compatible with Kubernetes 1.11, which is currently available on Jetstream. Therefore I have taken the development version of Cluster Autoscaler 1.14 and compiled it myself. I also noticed that the scale-down process was not working due to incompatible IDs when the cloud provider tried to look up the ID of a minion in the Heat stack; I am now using the MachineID directly instead of going through those indices.

This version is available in [my fork of `autoscaler`](https://github.com/zonca/autoscaler/tree/cluster-autoscaler-1.14-magnum) and it is built into Docker containers in the [`zonca/k8s-cluster-autoscaler-jetstream` repository on Docker Hub](https://cloud.docker.com/repository/docker/zonca/k8s-cluster-autoscaler-jetstream). The image tags are the short version of the repository's git commit hash. I build the container using the `run_gobuilder.sh` and `run_build_autoscaler_container.sh` scripts included in the repository.

## Note about images used by Magnum

I have tested this deployment using the `Fedora-Atomic-27-20180419` image on Jetstream at Indiana University. The Fedora Atomic 28 image had a long hang during boot and took more than 10 minutes to start; that caused a timeout in the autoscaler, and in any case it would be too long a wait for a user trying to start a notebook. I also tried updating the Fedora Atomic 28 image with `sudo atomic host upgrade`: while this fixed the slow startup issue, it produced a broken Kubernetes installation, i.e. the Kubernetes services did not detect the master node as part of the cluster and `kubectl get nodes` only showed the minion.
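To check which Atomic images are available in your Jetstream allocation before choosing one for the Magnum cluster template, something along these lines should work (a convenience command, assuming your OpenStack credentials are already loaded in the shell):

```bash
# List the Fedora Atomic images registered in the OpenStack image service
openstack image list | grep -i "Fedora-Atomic"
```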