---
categories:
- jetstream2
- jupyterhub
- kubernetes
layout: post
date: '2024-02-08'
title: Deploy Kubernetes on Jetstream 2 with GPU support
---
This work has been supported by Indiana University and is cross-posted on the Jetstream 2 official documentation website.
Thanks to work by [Ana Espinoza](https://github.com/ana-v-espinoza), the standard recipe now supports GPUs out of the box, including hybrid clusters where some nodes are standard CPU nodes and some have GPUs.
The Jetstream 2 cloud includes [90 GPU nodes with 4 NVIDIA A100 GPUs each](https://docs.jetstream-cloud.org/overview/config/){target=\_blank}.
If we want to leverage the GPUs inside Kubernetes pods, for example for JupyterHub users, we need both a GPU-enabled containerd runtime and a compatible Docker image based on NVIDIA's base images.
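As a sketch of the second requirement, a JupyterHub single-user image can be built on top of an NVIDIA CUDA base image, so that the CUDA userspace libraries inside the container are compatible with the driver on the GPU nodes. The base image tag and package list below are only illustrative:

```
# Hypothetical single-user image for JupyterHub GPU users;
# the CUDA version and the packages installed are just examples.
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
# jupyterhub must be available in the image so the hub can spawn the single-user server
RUN pip3 install jupyterhub notebook
```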
## Deploy Kubernetes with NVIDIA runtime
Kubespray has built-in support for the NVIDIA runtime; a previous version of this tutorial relied on a dedicated branch to support clusters where all worker nodes had GPUs.
Now it is just a matter of following the [standard Kubespray deployment tutorial](https://www.zonca.dev/posts/2023-07-19-jetstream2_kubernetes_kubespray){target=\_blank} and properly configuring the variables in `cluster.tfvars`, following the comments available there.
In summary, for a GPU-only cluster we only set:

```
supplementary_node_groups = "gpu-node"
```
For a hybrid cluster, instead, we need to set the number of worker nodes to zero and explicitly list all the nodes we want Terraform to create, specifying each node's name and whether it should have a GPU.
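A minimal sketch of such a hybrid configuration, assuming the `k8s_nodes` map supported by the Kubespray OpenStack Terraform scripts (the node names and flavor names here are only examples; check the comments in `cluster.tfvars` for the exact keys):

```
# disable the homogeneous worker node count
number_of_k8s_nodes = 0
number_of_k8s_nodes_no_floating_ip = 0

# explicitly list each worker node instead
k8s_nodes = {
  "nf-cpu-1" = {
    "az"          = "nova"
    "flavor"      = "m3.medium"
    "floating_ip" = false
  },
  "nf-gpu-1" = {
    "az"           = "nova"
    "flavor"       = "g3.medium"
    "floating_ip"  = false
    "extra_groups" = "gpu-node"
  },
}
```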
If we deploy a hybrid GPU-CPU cluster in the default configuration from `cluster.tfvars`, we will have 2 CPU and 2 GPU nodes:
```
> kubectl get nodes
NAME                              STATUS   ROLES           AGE   VERSION
kubejetstream-1                   Ready    control-plane   44m   v1.25.6
kubejetstream-k8s-node-nf-cpu-1   Ready    <none>          43m   v1.25.6
kubejetstream-k8s-node-nf-cpu-2   Ready    <none>          43m   v1.25.6
kubejetstream-k8s-node-nf-gpu-1   Ready    <none>          43m   v1.25.6
kubejetstream-k8s-node-nf-gpu-2   Ready    <none>          43m   v1.25.6
```
Next we need to install the NVIDIA `k8s-device-plugin`; at the moment it is just a matter of executing:

```
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.4/nvidia-device-plugin.yml
```
However, make sure you check the latest [`k8s-device-plugin` documentation](https://github.com/NVIDIA/k8s-device-plugin){target=\_blank}.
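To verify the installation, we can check that the plugin pods are running and that the GPU nodes now advertise the `nvidia.com/gpu` resource (the label selector below assumes the default labels used by the plugin's DaemonSet):

```
# device plugin pods, one per GPU node
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds

# GPU nodes should now report nvidia.com/gpu in their capacity
kubectl describe nodes | grep nvidia.com/gpu
```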
For testing, you can run a simple GPU job, for example the CUDA `vectorAdd` sample pod from the `k8s-device-plugin` documentation. It requests a GPU, so on a hybrid cluster it will automatically be scheduled on a GPU node:

```
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```

Once the pod has completed, `kubectl logs gpu-pod` should show `Test PASSED`.