---
aliases:
- /2022/04/dask-gateway-jupyterhub
categories:
- kubernetes
- jetstream2
- jupyterhub
- dask
- python
date: '2022-04-04'
layout: post
title: Deploy Dask Gateway with JupyterHub on Kubernetes
---

Tutorial obsolete, see [the new version of the tutorial](./2023-09-28-dask-gateway-jupyterhub.md)

**Updated 28 April 2022**: switched to Dask Gateway 2022.4.0

In this tutorial we will install [Dask Gateway](https://gateway.dask.org/index.html) on Kubernetes and configure JupyterHub so that Jupyter Notebook users can launch private Dask clusters and connect to them.

I assume you start from a Kubernetes cluster that is already running, with JupyterHub deployed on top of it via Helm and SSL encryption activated (SSL probably isn't necessary, but I haven't tested without it). I tested on Jetstream 2, but the recipe should not depend on it.

## Preparation

On the machine you use to run `helm` and `kubectl`, clone the repository with the configuration files and scripts:

    git clone https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/

Then you need to set up one API token; create it with:

    openssl rand -hex 32

Then paste it into both `dask_gateway/config_jupyterhub.yaml` and `dask_gateway/config_dask-gateway.yaml`: look for the string `TOKEN` and replace it.

## Launch dask gateway

We can install version 2022.4.0 with:

    $ bash install_dask-gateway.sh

You might want to check `config_dask-gateway.yaml` for extra configuration options, but for initial setup and testing it shouldn't be necessary.

After this you should see the 3 dask gateway pods running, e.g.:

    $ kubectl -n jhub get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    api-dask-gateway-64bf5db96c-4xfd6          1/1     Running   2          23m
    controller-dask-gateway-7674bd545d-cwfnx   1/1     Running   0          23m
    traefik-dask-gateway-5bbd68c5fd-5drm8      1/1     Running   0          23m

## Modify the JupyterHub configuration

Only 2 options need to be changed in JupyterHub:

* We need to run an image which has the same version of `dask-gateway` we installed on Kubernetes (currently `2022.4.0`)
* We need to proxy `dask-gateway` through JupyterHub so the users can access the Dask dashboard

If you are using my `install_jhub.sh` script to deploy JupyterHub, you can modify it and add another `values` option at the end, `--values dask_gateway/config_jupyterhub.yaml`.

You can modify the image you are using for JupyterHub in `dask_gateway/config_jupyterhub.yaml`.

To ensure there are no compatibility issues, the "Client" (JupyterHub session), the dask gateway server, the scheduler and the workers should all have the same version of Python and the same versions of `dask`, `distributed` and `dask_gateway`. If this is not possible, you can test different combinations; some of them might still work.

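
To make the version requirement above easier to verify, here is a minimal sketch that collects the relevant versions in one environment and compares two such reports. The helper names `local_versions` and `find_mismatches` are hypothetical, not part of Dask or of the repository; the sketch only assumes the packages are installed under the distribution names `dask`, `distributed` and `dask-gateway`.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Distribution names to compare across client, scheduler and workers (assumed names).
PACKAGES = ("dask", "distributed", "dask-gateway")


def local_versions(packages=PACKAGES):
    """Return the Python version and the package versions of this environment."""
    found = {"python": platform.python_version()}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


def find_mismatches(reference, other):
    """Return {name: (reference_version, other_version)} for entries that differ."""
    return {
        name: (ref, other.get(name))
        for name, ref in reference.items()
        if other.get(name) != ref
    }
```

You could run `local_versions()` in a notebook session and in the image used for the workers, then pass the two dictionaries to `find_mismatches`; an empty result means the environments agree. Once a cluster is running, the `distributed` client's `get_versions(check=True)` method performs a similar comparison.
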

Then redeploy JupyterHub:

    bash install_jhub.sh && cd dask_gateway && bash install_dask-gateway.sh

Check that the service is working correctly: if you open a browser tab and access the `dask-gateway` health endpoint, you should see:

    {"status": "pass"}

If this is not working, log in to JupyterHub, open a terminal and first check whether the service is reachable from inside the cluster:

    > curl http://traefik-dask-gateway/services/dask-gateway/api/health

It should return:

    {"status": "pass"}

## Create a dask cluster

You can now log in to JupyterHub and check that you can connect properly to `dask-gateway`:

```python
from dask_gateway import Gateway

gateway = Gateway(
    address="http://traefik-dask-gateway/services/dask-gateway/",
    public_address="https://js-XXX-YYY.jetstream-cloud.org/services/dask-gateway/",
    auth="jupyterhub",
)
gateway.list_clusters()
```

Then create a cluster and use it:

```python
cluster = gateway.new_cluster()
cluster.scale(2)
client = cluster.get_client()
```

`client` is a standard `distributed` client, and all subsequent calls to dask will go through the cluster.

Printing the `cluster` object gives the link to the Dask dashboard.

For a full example and screenshots of the widgets and of the dashboard see:

(Click on the `Raw` button to download the notebook and upload it to your session).
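
As a quick check that work really runs on the new cluster, a small smoke test along these lines may help. This is only a sketch: `square` and `smoke_test` are hypothetical helper names, and the function assumes it is given the `client` object created above (or any object with compatible `map` and `gather` methods).

```python
def square(x):
    """Toy task executed on the workers."""
    return x * x


def smoke_test(client, n=10):
    """Square the integers 0..n-1 on the cluster and return their sum."""
    futures = client.map(square, range(n))  # one task per input value
    results = client.gather(futures)        # wait for the tasks and fetch results
    return sum(results)
```

With the default `n=10`, calling `smoke_test(client)` should return 285 once the two workers requested with `cluster.scale(2)` are up; the tasks should also show up in the Dask dashboard.
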