---
name: kubernetes-patterns
description: Kubernetes workload patterns, resource management, RBAC, probes, autoscaling, ConfigMap/Secret handling, and kubectl debugging for production-grade deployments.
origin: ECC
---

# Kubernetes Patterns

Production-grade Kubernetes patterns for deploying, managing, and debugging workloads reliably.

## When to Activate

- Writing Kubernetes manifests (Deployments, Services, Ingress, Jobs)
- Configuring resource requests/limits, liveness/readiness probes
- Setting up RBAC, namespaces, or ServiceAccounts
- Managing configuration and secrets in K8s
- Debugging CrashLoopBackOff, OOMKilled, pending pods, or image pull errors
- Configuring HPA (Horizontal Pod Autoscaler) or PodDisruptionBudgets
- Reviewing K8s YAML for security or correctness

## When to Use

> Same as **When to Activate** above. This alias satisfies repo skill-format conventions.

Use this skill any time you are writing, reviewing, or debugging Kubernetes YAML and workloads.

## How It Works

This skill provides **copy-pasteable, production-grade YAML patterns** and **kubectl debugging commands** organized by task:

1. **Deployment template** — A fully configured production `Deployment` with security context, rolling update strategy, all three probe types, resource limits, and environment injection from ConfigMap/Secret.
2. **Probes** — Decision table for startup vs liveness vs readiness, with correct `failureThreshold × periodSeconds` math.
3. **Services & Ingress** — ClusterIP, LoadBalancer, and TLS Ingress patterns with cert-manager annotations.
4. **ConfigMaps & Secrets** — `envFrom`, file-mount, and external secrets guidance.
5. **Resource management** — Requests vs limits rules of thumb by workload type (web API, JVM, worker, sidecar).
6. **RBAC** — Least-privilege ServiceAccount → Role → RoleBinding chain.
7. **HPA & PDB** — Autoscaling and node-drain safety configurations.
8. **Jobs & CronJobs** — One-off and scheduled workload patterns with correct `restartPolicy`.
9. **kubectl cheatsheet** — Logs, exec, rollback, port-forward, dry-run, and common error diagnosis commands.
10. **Anti-patterns & checklist** — What NOT to do, and a security/reliability/observability checklist.

## Examples

See the sections below for complete, runnable examples. Quick references:

| Task | Jump to |
|------|---------|
| Full production Deployment YAML | [Core Workload Patterns](#core-workload-patterns) |
| Probe configuration | [Probes](#probes--liveness-readiness-startup) |
| RBAC least-privilege setup | [RBAC](#rbac--roles-and-serviceaccounts) |
| Debug a CrashLoopBackOff | [kubectl Debugging Cheatsheet](#kubectl-debugging-cheatsheet) |
| Autoscaling | [HPA](#horizontal-pod-autoscaler-hpa) |

---

## Core Workload Patterns

### Deployment — Production Template

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
    version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod during update
      maxUnavailable: 0  # Never reduce below desired count
  template:
    metadata:
      labels:
        app: my-app
        version: "1.0.0"
    spec:
      # Security context at pod level
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001

      # Graceful shutdown
      terminationGracePeriodSeconds: 30

      containers:
        - name: my-app
          image: ghcr.io/org/my-app:1.0.0  # Never use :latest
          imagePullPolicy: IfNotPresent

          ports:
            - containerPort: 8080
              protocol: TCP

          # Resource requests AND limits are both required
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"

          # Container security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL

          # Probes (see Probes section below)
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 5

          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 30
            failureThreshold: 3

          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2

          # Environment from ConfigMap and Secret
          envFrom:
            - configMapRef:
                name: my-app-config
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-app-secrets
                  key: db-password

          # Writable tmp directory when readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp

      volumes:
        - name: tmp
          emptyDir: {}
```

---

## Probes — Liveness, Readiness, Startup

Understanding when to use each probe is critical:

| Probe | Failure Action | Use For |
|-------|---------------|---------|
| `startupProbe` | Kills the container (restarted per `restartPolicy`) if it has not started within `failureThreshold × periodSeconds`; liveness and readiness checks are held off until it succeeds | Slow-starting apps (JVM, Python) |
| `livenessProbe` | Restarts container | Deadlock / hung process detection |
| `readinessProbe` | Removes from Service endpoints | Temporary unavailability (DB reconnect) |

```yaml
# Correct pattern: startupProbe covers slow startup,
# then liveness/readiness take over
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # 30 * 5s = 150s max startup time
  periodSeconds: 5

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 30
  failureThreshold: 3    # 3 * 30s = 90s before restart

readinessProbe:
  httpGet:
    path: /ready         # Separate endpoint: checks DB, cache, etc.
    port: 8080
  periodSeconds: 10
  failureThreshold: 2
```

```yaml
# WRONG: initialDelaySeconds without startupProbe
# If the app takes 60s to start, set a startupProbe instead
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # BAD: Arbitrary wait, race condition
```

---

## Services and Ingress

### Service Types

```yaml
# ClusterIP (default) — internal-only
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```

```yaml
# LoadBalancer — external traffic (cloud providers)
spec:
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 8080
```

### Ingress with TLS

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: my-app-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

---

## ConfigMaps and Secrets

### ConfigMap — Non-sensitive configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-namespace
data:
  LOG_LEVEL: "info"
  APP_ENV: "production"
  MAX_CONNECTIONS: "100"
  # Mount as a file for complex config
  app.yaml: |
    server:
      port: 8080
      timeout: 30s
```

```yaml
# Mount ConfigMap as a file
volumes:
  - name: config
    configMap:
      name: my-app-config
      items:
        - key: app.yaml
          path: app.yaml
volumeMounts:
  - name: config
    mountPath: /etc/app
    readOnly: true
```

### Secrets — Sensitive data

```bash
# Create secret from literal (CLI, then store in Vault/SOPS)
kubectl create secret generic my-app-secrets \
  --from-literal=db-password='s3cr3t' \
  --namespace=my-namespace \
  --dry-run=client -o yaml | kubectl apply -f -
```

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
  namespace: my-namespace
type: Opaque
# Values are base64-encoded (NOT encrypted — use Sealed Secrets or ESO for real encryption)
data:
  db-password: czNjcjN0  # base64 of 's3cr3t'
```

> **Important:** Raw Kubernetes Secrets are only base64-encoded, not encrypted at rest unless your cluster has encryption configured. Use [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) or [External Secrets Operator](https://external-secrets.io) for production.

---

## Resource Requests and Limits

```yaml
resources:
  requests:           # Scheduler uses this to place the pod
    cpu: "100m"       # 100 millicores = 0.1 CPU
    memory: "128Mi"
  limits:             # Above this: memory is OOM-killed, CPU is throttled
    cpu: "500m"
    memory: "256Mi"
```

**Rules of thumb:**

| Workload Type | CPU Request | Memory Request | Notes |
|---------------|-------------|----------------|-------|
| Web API | 100–250m | 128–256Mi | Set limits 2-4x requests |
| Worker/consumer | 250–500m | 256–512Mi | Memory limit = request for predictability |
| JVM app | 500m–1 | 512Mi–2Gi | Allow headroom above `-Xmx` for JVM overhead |
| Sidecar | 10–50m | 32–64Mi | Keep minimal |

```yaml
# WRONG: No requests or limits — unpredictable scheduling, OOM evictions
containers:
  - name: app
    image: myapp:latest
    # Missing resources: {} — this is dangerous in production

# WRONG: Limits without requests — requests default to limits, over-reserves capacity
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
  # requests missing — will default to limits values
```

---

## RBAC — Roles and ServiceAccounts

### Principle of Least Privilege

**Two patterns depending on whether the app calls the Kubernetes API:**

#### Pattern A — App does NOT need the Kubernetes API (most apps)

Disable token automounting on the ServiceAccount. No Role or RoleBinding is needed.
```yaml
# ServiceAccount with token disabled — safest default
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: false  # No K8s API token injected into pods
```

```yaml
# Reference in Deployment — no token, no API access
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      automountServiceAccountToken: false  # Belt-and-suspenders: also set at pod level
```

#### Pattern B — App DOES need the Kubernetes API (operators, controllers, config watchers)

Enable the token and grant only the permissions actually required.

```yaml
# 1. ServiceAccount — enable token for this SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: true  # Token required: app calls K8s API
```

```yaml
# 2. Role — grant only what the app needs (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]    # Read-only, specific resource
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["my-app-secrets"]  # Restrict to specific secret by name
    verbs: ["get"]
```

```yaml
# 3. Bind Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-rolebinding
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-namespace
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: my-app-role
```

```yaml
# 4. Reference SA in Deployment
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      # automountServiceAccountToken defaults to true from SA — token is injected
```

---

## Horizontal Pod Autoscaler (HPA)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2   # Always at least 2 for HA
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale up when avg CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

> HPA requires `resources.requests` to be set on all containers — it calculates utilization as `current / request`.

---

## PodDisruptionBudget (PDB)

Prevent too many pods from going down at once during voluntary disruptions such as node drains. A PDB does not constrain a Deployment's own rolling update, which is governed by `maxSurge` and `maxUnavailable`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-namespace
spec:
  minAvailable: 2   # OR use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```

---

## Namespaces and Multi-Tenancy

```bash
# Create namespace with resource quotas
kubectl create namespace my-namespace

# Apply ResourceQuota to limit namespace consumption
# (quota name and values below are illustrative assumptions; size them for your cluster)
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-namespace-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
EOF
```

---

## kubectl Debugging Cheatsheet

```bash
kubectl describe pod <pod-name> -n my-namespace              # Events and state details
kubectl logs <pod-name> -n my-namespace                      # Current logs
kubectl logs <pod-name> -n my-namespace --previous           # Logs from crashed container
kubectl logs <pod-name> -n my-namespace -c <container-name>  # Multi-container pod

# --- Execute into a running container ---
kubectl exec -it <pod-name> -n my-namespace -- sh
kubectl exec -it <pod-name> -n my-namespace -- bash

# --- Check resource usage ---
kubectl top pods -n my-namespace
kubectl top nodes

# --- Deployment operations ---
kubectl rollout status deployment/my-app -n my-namespace
kubectl rollout history deployment/my-app -n my-namespace
kubectl rollout undo deployment/my-app -n my-namespace  # Rollback
kubectl rollout undo deployment/my-app --to-revision=2 -n my-namespace

# --- Scale manually ---
kubectl scale deployment my-app --replicas=5 -n my-namespace

# --- Inspect events (cluster-wide issues) ---
kubectl get events -n my-namespace --sort-by='.lastTimestamp'

# --- Port-forward for local debugging ---
kubectl port-forward pod/<pod-name> 8080:8080 -n my-namespace
kubectl port-forward svc/my-app 8080:80 -n my-namespace

# --- Dry-run to validate YAML ---
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server  # Validates against live cluster
```

### Diagnosing Common Errors

```bash
# CrashLoopBackOff: container keeps crashing
kubectl logs <pod-name> --previous -n my-namespace  # Check crash logs
kubectl describe pod <pod-name> -n my-namespace     # Check exit code & OOMKilled

# ImagePullBackOff: can't pull image
kubectl describe pod <pod-name> -n my-namespace     # Check Events section
# Causes: wrong image tag, missing imagePullSecret, private registry

# Pending pod: not scheduled
kubectl describe pod <pod-name> -n my-namespace
# Causes: insufficient resources, no matching node selector, taint/toleration mismatch

# OOMKilled: out of memory
# Increase memory limits, check for memory leaks
kubectl describe pod <pod-name> -n my-namespace | grep -A5 "Last State"
```

---

## Anti-Patterns

```yaml
# BAD: Using :latest tag — non-deterministic deployments
image: myapp:latest

# GOOD: Pin to a specific immutable tag (SHA or semver)
image: ghcr.io/org/myapp:1.4.2
# or
image: ghcr.io/org/myapp@sha256:abc123...

# ---
# BAD: Running as root
securityContext: {}  # Falls back to the image's USER, which is often root

# GOOD: Non-root with explicit UID
securityContext:
  runAsNonRoot: true
  runAsUser: 1001

# ---
# BAD: No resource limits — one pod can starve the entire node
containers:
  - name: app
    image: myapp:1.0.0
    # No resources defined

# GOOD: Always set requests and limits
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

# ---
# BAD: Storing plaintext secrets in ConfigMaps
apiVersion: v1
kind: ConfigMap
data:
  DB_PASSWORD: "mysecretpassword"  # NEVER — use Secret or external secrets manager

# ---
# BAD: cluster-admin for application service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
roleRef:
  kind: ClusterRole
  name: cluster-admin  # Grants god-mode to your app

# ---
# BAD: minAvailable: 0 in PDB — defeats the purpose
spec:
  minAvailable: 0

# ---
# BAD: restartPolicy: Always in a Job (the API server rejects it; a Job's pods must be able to finish)
spec:
  restartPolicy: Always  # Use OnFailure or Never for Jobs
```

---

## Best Practices Checklist

### Security

- [ ] Container runs as non-root (`runAsNonRoot: true`, `runAsUser` set)
- [ ] `readOnlyRootFilesystem: true` with `emptyDir` for writable paths
- [ ] `allowPrivilegeEscalation: false`
- [ ] All capabilities dropped (`capabilities.drop: [ALL]`)
- [ ] Dedicated ServiceAccount per app, not `default`
- [ ] `automountServiceAccountToken: false` unless needed
- [ ] RBAC follows least privilege (use `Role`, not `ClusterRole` unless needed)
- [ ] Secrets managed via Sealed Secrets or External Secrets Operator

### Reliability

- [ ] All 3 probe types configured (startup + liveness + readiness)
- [ ] Resource requests AND limits set on every container
- [ ] `minReplicas: 2+` for any production workload
- [ ] PodDisruptionBudget defined for stateful or critical services
- [ ] `RollingUpdate` strategy with `maxUnavailable: 0`
- [ ] HPA configured for variable-load services

### Observability

- [ ] App exposes `/health` (liveness) and `/ready` (readiness) endpoints
- [ ] Structured JSON logging (no PII in logs)
- [ ] Resource labels: `app`, `version`, `environment`

---

## Related Skills

- `docker-patterns` — Multi-stage Dockerfiles and image security
- `deployment-patterns` — CI/CD pipelines, rollback strategy, health check endpoints
- `security-review` — Broader security hardening context
- `git-workflow` — GitOps integration with K8s (ArgoCD / Flux patterns)
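
---

## Jobs and CronJobs

For the Jobs & CronJobs item listed under How It Works, a minimal sketch of both workload types might look like the following. The names `db-migrate` and `nightly-cleanup`, the `/app/migrate` and `/app/cleanup` commands, the schedule, and the retry and history values are hypothetical placeholders chosen for illustration, not values taken from this skill. The image reuses the one from the Deployment template.

```yaml
# One-off Job: runs to completion with a bounded number of retries
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                 # hypothetical name
  namespace: my-namespace
spec:
  backoffLimit: 3                  # Retries before the Job is marked failed (assumed value)
  ttlSecondsAfterFinished: 3600    # Delete the finished Job after 1 hour (assumed value)
  template:
    spec:
      restartPolicy: Never         # Jobs accept only Never or OnFailure
      containers:
        - name: migrate
          image: ghcr.io/org/my-app:1.0.0
          command: ["/app/migrate"]  # hypothetical entrypoint
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
---
# Scheduled CronJob: creates a Job from jobTemplate on each tick
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup            # hypothetical name
  namespace: my-namespace
spec:
  schedule: "0 2 * * *"            # 02:00 every day (assumed schedule)
  concurrencyPolicy: Forbid        # Skip a run if the previous one is still active
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure # Restart the container in place on failure
          containers:
            - name: cleanup
              image: ghcr.io/org/my-app:1.0.0
              command: ["/app/cleanup"]  # hypothetical entrypoint
              resources:
                requests:
                  cpu: "100m"
                  memory: "128Mi"
                limits:
                  cpu: "500m"
                  memory: "256Mi"
```

`Never` creates a fresh pod for each retry, which keeps the failed pod's logs around for inspection. `OnFailure` restarts the container inside the same pod, which is cheaper but overwrites earlier container state.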