--- name: Kubernetes description: Avoid common Kubernetes mistakes — resource limits, probe configuration, selector mismatches, and RBAC pitfalls. metadata: {"clawdbot":{"emoji":"☸️","requires":{"bins":["kubectl"]},"os":["linux","darwin","win32"]}} --- ## Resource Management - `requests` = guaranteed minimum — scheduler uses this for placement - `limits` = maximum allowed — exceeding memory = OOMKilled, CPU = throttled - No limits = can consume entire node — always set production limits - `requests` without `limits` = burstable — can use more if available ## Probes - `readinessProbe` controls traffic — fails = removed from Service endpoints - `livenessProbe` restarts container — fails = container killed and restarted - `startupProbe` for slow starts — disables liveness/readiness until success - Don't use same endpoint for liveness and readiness — liveness should be minimal health check ## Probe Pitfalls - Liveness probe checking dependencies — if DB down, all pods restart indefinitely - `initialDelaySeconds` too short — pod killed before app starts - `timeoutSeconds` too short — slow response = restart loop - HTTP probe to HTTPS endpoint — needs `scheme: HTTPS` ## Labels and Selectors - Service selector must match Pod labels exactly — typo = no endpoints - Deployment selector is immutable — can't change after creation - Use consistent labeling scheme — `app`, `version`, `environment` - `matchExpressions` for complex selection — `In`, `NotIn`, `Exists` ## ConfigMaps and Secrets - ConfigMap changes don't restart pods — mount as volume for auto-update, or restart manually - Secrets are base64 encoded, not encrypted — use external secrets manager for sensitive data - `envFrom` imports all keys — `env.valueFrom` for specific keys - Volume mount makes files — `subPath` for single file without replacing directory ## Networking - `ClusterIP` internal only — default, only accessible within cluster - `NodePort` exposes on node IP — 30000-32767 range, not for production - `LoadBalancer` provisions cloud LB — works only in supported environments - Ingress needs Ingress Controller — nginx-ingress, traefik, etc. installed separately ## Persistent Storage - PVC binds to PV — must match capacity and access modes - `storageClassName` must match — or use `""` for no dynamic provisioning - `ReadWriteOnce` = single node — `ReadWriteMany` needed for multi-pod - Pod deletion doesn't delete PVC — `persistentVolumeReclaimPolicy` controls PV fate ## Common Mistakes - `kubectl apply` vs `create` — apply for declarative (can update), create for imperative (fails if exists) - Forgetting namespace — `-n namespace` or set context default - Image tag `latest` in production — no version pinning, unpredictable updates - Not setting `imagePullPolicy` — `Always` for latest tag, `IfNotPresent` for versioned - Service port vs targetPort — port is Service's, targetPort is container's ## Debugging - `kubectl describe pod` for events — shows scheduling failures, probe failures - `kubectl logs -f pod` for logs — `-p` for previous container (after crash) - `kubectl exec -it pod -- sh` for shell — debug inside container - `kubectl get events --sort-by=.lastTimestamp` — cluster-wide events timeline ## RBAC - `ServiceAccount` per workload — not default, for least privilege - `Role` is namespaced — `ClusterRole` is cluster-wide - `RoleBinding` binds Role to user/SA — `ClusterRoleBinding` for cluster-wide - Check permissions: `kubectl auth can-i verb resource --as=system:serviceaccount:ns:sa`