---
name: Kubernetes
description: Avoid common Kubernetes mistakes — resource limits, probe configuration, selector mismatches, and RBAC pitfalls.
metadata: {"clawdbot":{"emoji":"☸️","requires":{"bins":["kubectl"]},"os":["linux","darwin","win32"]}}
---

## Resource Management
- `requests` = guaranteed minimum — scheduler uses this for placement
- `limits` = maximum allowed — exceeding memory = OOMKilled, CPU = throttled
- No limits = can consume entire node — always set production limits
- `requests` without `limits` = burstable — can use more if available
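
A minimal sketch of the points above, assuming a hypothetical `web` Pod and image name. The values are illustrative, not sizing recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                                  # hypothetical name
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2  # hypothetical image
      resources:
        requests:            # what the scheduler reserves on the node
          cpu: 250m
          memory: 256Mi
        limits:              # hard ceiling for the container
          cpu: 500m          # above this the container is throttled
          memory: 512Mi      # above this the container is OOMKilled
```

Because the requests are lower than the limits, this Pod would fall into the Burstable QoS class.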

## Probes
- `readinessProbe` controls traffic — fails = removed from Service endpoints
- `livenessProbe` restarts container — fails = container killed and restarted
- `startupProbe` for slow starts — disables liveness/readiness until success
- Don't use same endpoint for liveness and readiness — liveness should be minimal health check
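
As an illustration of how the three probes can be combined, here is a sketch that assumes the app listens on port 8080 and exposes `/healthz` and `/ready` (both paths are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                                  # hypothetical name
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2  # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:            # liveness/readiness wait until this succeeds
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 30   # allows up to 5s * 30 = 150s for startup
      livenessProbe:           # minimal check: is the process responsive?
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
      readinessProbe:          # separate endpoint: can we serve traffic now?
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```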

## Probe Pitfalls
- Liveness probe checking dependencies — if DB down, all pods restart indefinitely
- `initialDelaySeconds` too short — pod killed before app starts
- `timeoutSeconds` too short — slow response = restart loop
- HTTP probe to HTTPS endpoint — needs `scheme: HTTPS`
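
The timing and scheme pitfalls above might be addressed with a container-spec fragment like the following. The port, path, and timing values are assumptions to adapt per app:

```yaml
# Fragment of a container spec, not a complete manifest
livenessProbe:
  httpGet:
    path: /healthz           # hypothetical path; should not touch the DB
    port: 8443               # assumed TLS port
    scheme: HTTPS            # default is HTTP, which fails against a TLS port
  initialDelaySeconds: 30    # give the app time to start before the first check
  timeoutSeconds: 5          # default is 1s, often too short under load
  periodSeconds: 10
  failureThreshold: 3        # restart only after 3 consecutive failures
```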

## Labels and Selectors
- Service selector must match Pod labels exactly — typo = no endpoints
- Deployment selector is immutable — can't change after creation
- Use consistent labeling scheme — `app`, `version`, `environment`
- `matchExpressions` for complex selection — `In`, `NotIn`, `Exists`
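
To make the matching rules concrete, here is a sketch with a hypothetical `web` app. The Service selector, the Deployment selector, and the Pod template labels all carry the same `app: web` label:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:                  # immutable after creation
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web             # must satisfy the selector above
        version: "1.4.2"     # extra labels are fine; quoted so it stays a string
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2  # hypothetical image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                 # a typo here leaves the Service with no endpoints
  ports:
    - port: 80
      targetPort: 8080
```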

## ConfigMaps and Secrets
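- ConfigMap changes don't restart pods; volume mounts update automatically after a short delay, env vars need a pod restart (e.g. `kubectl rollout restart deployment/<name>`)
- Secrets are base64 encoded, not encrypted by default; enable encryption at rest or use an external secrets manager for sensitive data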
- `envFrom` imports all keys — `env.valueFrom` for specific keys
- Volume mount exposes each key as a file; `subPath` mounts a single file without hiding the rest of the directory, but `subPath` mounts don't receive updates
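
The consumption options above can be sketched in one Pod. The ConfigMap `app-config`, the Secret `app-secrets`, and all key names are hypothetical, and the Secret is assumed to exist already:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
  app.conf: |
    workers = 4
---
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2  # hypothetical image
      envFrom:
        - configMapRef:
            name: app-config       # imports every key as an env var
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:          # imports one specific key
              name: app-secrets
              key: db-password
      volumeMounts:
        - name: config
          mountPath: /etc/app/app.conf
          subPath: app.conf        # single file; rest of /etc/app stays visible
  volumes:
    - name: config
      configMap:
        name: app-config
```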

## Networking
- `ClusterIP` internal only — default, only accessible within cluster
- `NodePort` exposes on node IP — 30000-32767 range, not for production
- `LoadBalancer` provisions cloud LB — works only in supported environments
- Ingress needs Ingress Controller — nginx-ingress, traefik, etc. installed separately
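
As an example of the last point, an Ingress like the one below does nothing until a controller is running. This sketch assumes an nginx-based controller registered under the class name `nginx` and a hypothetical `web` Service on port 80:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx        # assumption: must match an installed controller
  rules:
    - host: web.example.com      # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web        # a ClusterIP Service is enough here
                port:
                  number: 80
```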

## Persistent Storage
- PVC binds to PV; the PV must offer at least the requested capacity and the requested access modes
- `storageClassName` must match — or use `""` for no dynamic provisioning
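- `ReadWriteOnce` = mountable by a single node (multiple pods on that node can share it); `ReadWriteMany` needed for pods spread across nodes
- Pod deletion doesn't delete PVC; when the PVC itself is deleted, `persistentVolumeReclaimPolicy` (`Retain` or `Delete`) controls PV fate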
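
A claim and a Pod that mounts it might look like this sketch. The StorageClass name `standard` is an assumption; available classes vary by cluster (`kubectl get storageclass` lists them):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce              # one node at a time
  storageClassName: standard     # assumption: must name an existing class
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2  # hypothetical image
      volumeMounts:
        - name: data
          mountPath: /var/lib/app
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data          # the PVC outlives this Pod
```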

## Common Mistakes
- `kubectl apply` vs `create` — apply for declarative (can update), create for imperative (fails if exists)
- Forgetting namespace — `-n namespace` or set context default
- Image tag `latest` in production — no version pinning, unpredictable updates
- Not setting `imagePullPolicy`: defaults to `Always` for `latest` or untagged images, `IfNotPresent` for versioned tags; set it explicitly
- Service port vs targetPort — port is Service's, targetPort is container's
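
Several of these mistakes can be avoided in the manifest itself. The sketch below assumes a hypothetical `demo` namespace and `web` app, and uses a named container port so that `targetPort` cannot drift from the container's actual port:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: demo                # explicit namespace, no reliance on context
  labels:
    app: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2  # pinned tag, not latest
      imagePullPolicy: IfNotPresent
      ports:
        - name: http
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: demo
spec:
  selector:
    app: web
  ports:
    - port: 80                   # what clients of the Service connect to
      targetPort: http           # resolves to containerPort 8080 above
```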

## Debugging
- `kubectl describe pod` for events — shows scheduling failures, probe failures
- `kubectl logs -f pod` for logs — `-p` for previous container (after crash)
- `kubectl exec -it pod -- sh` for shell — debug inside container
- `kubectl get events -A --sort-by=.lastTimestamp` for a cluster-wide events timeline; drop `-A` to see only the current namespace

## RBAC
- `ServiceAccount` per workload — not default, for least privilege
- `Role` is namespaced — `ClusterRole` is cluster-wide
- `RoleBinding` binds Role to user/SA — `ClusterRoleBinding` for cluster-wide
- Check permissions: `kubectl auth can-i verb resource --as=system:serviceaccount:ns:sa`
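
Putting the pieces together, a least-privilege setup could look like the following sketch. The `demo` namespace and the `pod-reader` names are hypothetical:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reader
  namespace: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: demo                # Role is valid only inside this namespace
rules:
  - apiGroups: [""]              # "" is the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader
  namespace: demo
subjects:
  - kind: ServiceAccount
    name: pod-reader
    namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

With these applied, `kubectl auth can-i list pods -n demo --as=system:serviceaccount:demo:pod-reader` should answer `yes`, while the same check with `delete` should answer `no`.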