---
layout: center
highlighter: shiki
css: unocss
colorSchema: dark
transition: fade-out
title: Taming Dependency Chaos for LLM in K8S
exportFilename: KubeCon HK 2025.06 - Taming Dependency Chaos for LLM in K8S
lineNumbers: false
drawings:
  persist: false
mdc: true
clicks: 0
preload: false
glowSeed: 229
routerMode: hash
---
```bash
$ python train.py
ImportError: libcudart.so.11.0: cannot open shared object file
$ pip install torch --index-url https://download.pytorch.org/whl/cu118
RuntimeError: CUDA error: no kernel image is available for execution
$ ldd $(which python3) | grep 'not found'
libstdc++.so.6 => not found
```
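
The `ldd` check above can be scripted so it runs before a training job starts. A minimal sketch, assuming `ldd`-style lines of the form `name => not found`; the helper name `missing_libs` is hypothetical and not part of any tool shown in this deck.

```python
def missing_libs(ldd_output: str) -> list[str]:
    """Return the shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Each dependency line looks like "<name> => <resolved path or 'not found'>".
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


# Sample input: the unresolved line from the transcript above plus one resolved line.
sample = (
    "\tlibstdc++.so.6 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libs(sample))
```

A non-empty result is a reasonable signal to fail fast instead of waiting for an `ImportError` at runtime.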
---
class: py-10
glowSeed: 175
---
# Development vs Training: The Environment Gap
The Perfect Storm: When Python Code Meets C++ Underpinnings

Most ML libraries are thin Python wrappers around large C++ and CUDA codebases
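
One way to see how much native code sits behind a Python package is to list the compiled files it ships. A rough sketch, assuming the package is installed as a plain directory tree; `native_extensions` is a hypothetical helper, and the demo files are created in a temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path

# File suffixes that mark compiled extension modules or shared libraries.
NATIVE_SUFFIXES = {".so", ".pyd", ".dll", ".dylib"}


def native_extensions(package_dir: Path) -> list[Path]:
    """List compiled extension modules and shared libraries under a directory."""
    found = []
    for path in package_dir.rglob("*"):
        # Versioned libraries look like libfoo.so.11.0, so inspect every suffix.
        if path.is_file() and NATIVE_SUFFIXES.intersection(path.suffixes):
            found.append(path)
    return sorted(found)


# Demo on a throwaway directory that mimics a package layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    for name in ["__init__.py", "_C.cpython-311-x86_64-linux-gnu.so", "lib/libcudart.so.11.0"]:
        (root / name).touch()
    names = [p.relative_to(root).as_posix() for p in native_extensions(root)]

print(names)
```

Pointing the helper at a real package directory (for example the parent of a module's `__file__`) lists the binaries that must match the node's system libraries and CUDA driver.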
---
class: py-10
glowSeed: 123
---
# The Silent Saboteurs
One solution to rule them all
Python + C++ + CUDA harmony in Kubernetes
---
# Dataset CRD
One CRD to Rule Them All
```yaml
apiVersion: dataset.baizeai.io/v1alpha1
kind: Dataset
metadata:
  name: pytorch-env
spec:
  source:
    type: CONDA
    uri: conda://python?version=3.11.9
    options:
      packageManager: CONDA
      pythonVersion: 3.11.9
      condaEnvironmentYml: |-
        channels: ['nvidia', 'conda-forge']
        dependencies:
          - 'cuda'
          - 'cuda-libraries-dev'
          - 'cuda-nvcc'
          - 'cuda-nvtx'
          - 'cuda-cupti'
      pipRequirementsTxt: |-
        transformers==4.35.0
        torch
        torchaudio
        torchvision
```
```yaml
apiVersion: dataset.baizeai.io/v1alpha1
kind: Dataset
metadata:
  name: qwen3-32b
spec:
  dataSyncRound: 1
  secretRef: dataset-hf-qwen3-32b-secret
  source:
    options:
      endpoint: https://hf-mirror.com
      repoType: MODEL
    type: HUGGING_FACE
    uri: huggingface://Qwen/Qwen3-32B
  volumeClaimTemplate:
    metadata: {}
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: '0'
      storageClassName: juicefs-no-share-sc
status: {}
```
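
Both manifests identify their source with a URI (`conda://...`, `huggingface://...`). The sketch below shows one way such URIs could be decomposed with the standard library. It is an illustration only: how the Dataset controller actually parses them is not shown in these slides, and `split_source_uri` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def split_source_uri(uri: str) -> dict:
    """Break a Dataset source URI into scheme, target, and query parameters."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # The text after '//' may contain a slash (org/repo), so rejoin netloc and path.
        "target": parts.netloc + parts.path,
        # Keep the first value of each query parameter.
        "params": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


conda_source = split_source_uri("conda://python?version=3.11.9")
model_source = split_source_uri("huggingface://Qwen/Qwen3-32B")
print(conda_source)
print(model_source)
```

Under this reading, the scheme selects the source type, the target names the package or repository, and the query string carries version pins.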