---
name: kubernetes-patterns
description: Provides Kubernetes resource management, Helm chart patterns, service mesh configuration, and autoscaling strategies. Covers HPA, VPA, KEDA, operators, security contexts, and namespace isolation. Use when user mentions 'kubernetes', 'k8s', 'helm', 'istio', 'linkerd', 'service mesh', 'HPA', 'VPA', 'KEDA', 'pod security', 'resource quotas', 'operators'.
type: skill
category: patterns
status: stable
origin: tibsfox
modified: false
first_seen: 2026-02-07
first_path: examples/kubernetes-patterns/SKILL.md
superseded_by: null
---

# Kubernetes Patterns

Best practices for deploying, scaling, securing, and managing workloads on Kubernetes. This skill covers resource management, Helm chart structure, service mesh configuration, autoscaling strategies, and security hardening.

## Resource Management

Every container must declare resource requests and limits. Without them, the scheduler cannot make informed placement decisions and nodes can become overcommitted.

| Resource Type | Request (Guaranteed) | Limit (Maximum) | What Happens at Limit |
|---------------|---------------------|-----------------|----------------------|
| CPU | Reserved on node | Throttled (not killed) | Container slows down |
| Memory | Reserved on node | OOM-killed | Container restarts |
| Ephemeral Storage | Reserved on node | Evicted | Pod removed from node |
| GPU | Reserved on node (requests must equal limits) | Hard limit | Cannot exceed |

### QoS Classes

Kubernetes assigns QoS classes based on resource declarations. This determines eviction priority.

| QoS Class | Condition | Eviction Priority |
|-----------|-----------|-------------------|
| Guaranteed | requests == limits for every container | Last (highest priority) |
| Burstable | Requests or limits set, but not meeting the Guaranteed criteria | Middle |
| BestEffort | No requests or limits set | First (lowest priority) |

### Resource Declaration Best Practices

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v2.1.0
    spec:
      # Topology spread for high availability
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
      containers:
        - name: api
          image: ghcr.io/our-org/api@sha256:a1b2c3d4e5f6
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 250m       # 0.25 cores -- baseline
              memory: 256Mi   # baseline memory
            limits:
              cpu: "1"        # burst to 1 core
              memory: 512Mi   # hard cap; the container is OOM-killed before it can starve the node
          # Probes are essential for rolling updates
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            failureThreshold: 30
            periodSeconds: 2
```
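### Protecting Availability During Disruptions

The deployment above runs three replicas spread across zones, but a node drain or cluster upgrade can still evict several at once. A PodDisruptionBudget caps voluntary disruptions; a minimal sketch matching the `app: api-server` labels used above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2        # with 3 replicas, at most 1 pod may be evicted voluntarily
  selector:
    matchLabels:
      app: api-server
```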
## Namespace Isolation Strategies

Namespaces provide logical boundaries. Combine with NetworkPolicies and RBAC for true isolation.

| Strategy | Isolation Level | Use Case |
|----------|----------------|----------|
| Per-team | Medium | Small org, shared cluster |
| Per-environment | Medium | Dev/staging/prod in one cluster |
| Per-application | High | Microservices with strict boundaries |
| Per-tenant | Highest | Multi-tenant SaaS |

### Resource Quotas and Limit Ranges

```yaml
# ResourceQuota: caps total resource consumption per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "20"
    persistentvolumeclaims: "10"
    secrets: "30"
    configmaps: "30"
---
# LimitRange: sets defaults and bounds per container
apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      min:
        cpu: 50m
        memory: 64Mi
      max:
        cpu: "4"
        memory: 4Gi
    - type: PersistentVolumeClaim
      min:
        storage: 1Gi
      max:
        storage: 50Gi
```

### Network Policy for Namespace Isolation

```yaml
# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow only within namespace + DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}
  egress:
    - to:
        - podSelector: {}
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

## Helm Chart Structure

Helm charts package Kubernetes manifests with templating and dependency management.

### Standard Chart Layout

```
my-app/
  Chart.yaml               # Chart metadata, version, dependencies
  Chart.lock               # Locked dependency versions
  values.yaml              # Default configuration values
  values-staging.yaml      # Environment-specific overrides
  values-production.yaml   # Environment-specific overrides
  templates/
    _helpers.tpl           # Template helper functions
    deployment.yaml        # Deployment manifest
    service.yaml           # Service manifest
    ingress.yaml           # Ingress manifest
    hpa.yaml               # HorizontalPodAutoscaler
    configmap.yaml         # ConfigMap
    secret.yaml            # Secret (sealed or external)
    serviceaccount.yaml    # ServiceAccount
    networkpolicy.yaml     # NetworkPolicy
    pdb.yaml               # PodDisruptionBudget
    tests/
      test-connection.yaml # Helm test hooks
  charts/                  # Dependency charts (vendored)
```

### Chart.yaml Best Practices

```yaml
apiVersion: v2
name: my-app
description: A Helm chart for the My App API service
type: application
version: 1.4.0        # Chart version (bump on chart changes)
appVersion: "2.1.0"   # Application version (bump on app changes)
dependencies:
  - name: postgresql
    version: "13.x"
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
  - name: redis
    version: "18.x"
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled
maintainers:
  - name: Platform Team
    email: platform@company.com
```
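### Helm Test Hooks

The `tests/` directory in the layout above holds test hooks that `helm test <release>` runs against a deployed release. A minimal connection test, assuming the `my-app.fullname` and `my-app.labels` helpers from `_helpers.tpl` and a `service.port` value:

```yaml
# templates/tests/test-connection.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "my-app.fullname" . }}-test-connection"
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
  annotations:
    # Marks this pod as a test hook; it is only created by `helm test`
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox:1.36
      command: ['wget']
      args: ['-qO-', '{{ include "my-app.fullname" . }}:{{ .Values.service.port }}']
```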
### Helm Template with Guards

```yaml
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        # Force rollout on config changes
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "my-app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          {{- with .Values.resources }}
          resources:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          {{- with .Values.env }}
          env:
            {{- toYaml . | nindent 12 }}
          {{- end }}
```

## Service Mesh: Istio Configuration

Service meshes handle traffic management, security, and observability at the infrastructure layer.

### Istio vs Linkerd Comparison

| Aspect | Istio | Linkerd |
|--------|-------|---------|
| Complexity | High (many CRDs, control plane components) | Low (minimal, opinionated) |
| Resource Overhead | ~100MB per sidecar | ~25MB per sidecar |
| mTLS | Configurable (permissive/strict) | On by default |
| Traffic Management | Very flexible (VirtualService, DestinationRule) | Basic (TrafficSplit, ServiceProfile) |
| Multi-cluster | Built-in | Supported with multicluster extension |
| Learning Curve | Steep | Gentle |
| Best For | Complex routing, advanced policies | Simple mTLS + observability |

### Istio VirtualService: Canary with Header Routing

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: api-server
  namespace: production
spec:
  hosts:
    - api-server
    - api.company.com
  gateways:
    - mesh          # In-mesh traffic
    - api-gateway   # External traffic
  http:
    # Route internal testers to canary via header
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: api-server
            subset: canary
          weight: 100
    # Weighted canary for production traffic
    - route:
        - destination:
            host: api-server
            subset: stable
          weight: 90
        - destination:
            host: api-server
            subset: canary
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: api-server
  namespace: production
spec:
  host: api-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        maxRequestsPerConnection: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
    - name: stable
      labels:
        version: v2.0.0
    - name: canary
      labels:
        version: v2.1.0
```
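### Enforcing Strict mTLS

The comparison table notes that Istio's mTLS is configurable rather than on by default. Enforcing strict mode for a namespace is a single resource; a sketch, assuming an Istio version recent enough (1.22+) to serve the `security.istio.io/v1` API:

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT   # reject plaintext traffic between sidecars in this namespace
```

Creating the same resource in the Istio root namespace (typically `istio-system`) sets a mesh-wide default that individual namespaces can still override.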
## Autoscaling Strategies

### HPA with Custom Metrics

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100          # Double capacity per minute
          periodSeconds: 60
        - type: Pods
          value: 5            # Or add 5 pods, whichever is higher
          periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25           # Remove 25% per 2 minutes
          periodSeconds: 120
      selectPolicy: Min
  metrics:
    # CPU-based scaling
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Memory-based scaling
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom metric: requests per second from Prometheus
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
```

### KEDA ScaledObject: Event-Driven Autoscaling

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15    # Check triggers every 15s
  cooldownPeriod: 60     # Wait 60s after last trigger before scale-down
  minReplicaCount: 1     # Minimum replicas (0 for scale-to-zero)
  maxReplicaCount: 100
  fallback:
    failureThreshold: 3
    replicas: 5          # Fallback if scaler fails
  triggers:
    # Scale based on Kafka consumer lag
    - type: kafka
      metadata:
        bootstrapServers: kafka.production:9092
        consumerGroup: order-processor
        topic: orders
        lagThreshold: "50"    # Scale up when lag > 50 per partition
    # Scale based on RabbitMQ queue depth
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq.production:5672
        queueName: order-queue
        queueLength: "100"
    # Scale based on Prometheus metric
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: |
          sum(rate(http_requests_total{service="order-processor"}[2m]))
        threshold: "500"
---
# Scale-to-zero for batch jobs
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: report-generator
  namespace: batch
spec:
  scaleTargetRef:
    name: report-generator
  minReplicaCount: 0     # Scale to zero when idle
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 2 * * *   # Scale up at 2 AM
        end: 0 6 * * *     # Scale down at 6 AM
        desiredReplicas: "5"
```

### Autoscaling Strategy Comparison

| Strategy | Scales On | Scale-to-Zero | Latency | Best For |
|----------|-----------|---------------|---------|----------|
| HPA (CPU/Memory) | Resource utilization | No | Seconds | Steady traffic patterns |
| HPA (Custom) | Application metrics | No | Seconds | API servers, web apps |
| VPA | Historical usage | No | Pod restart | Right-sizing resources |
| KEDA | External events | Yes | Seconds | Event-driven workloads |
| Cluster Autoscaler | Node pressure | No | Minutes | Node pool management |
| Karpenter | Pod scheduling needs | No | Seconds | Fast, flexible node scaling |
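### VPA: Right-Sizing Recommendations

VPA appears in the table above without an example. A minimal sketch, assuming the VPA components (CRDs, recommender, updater) are installed in the cluster:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    # "Off" records recommendations without applying them; "Auto" evicts and
    # recreates pods to apply new requests
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Avoid running VPA in `Auto` mode alongside an HPA that scales on CPU or memory for the same workload; the two controllers will fight over the same signal. `Off` mode is safe to combine and still surfaces right-sizing data.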
## Pod Security Best Practices

### Security Context Configuration

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: production
spec:
  # Pod-level security context
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  serviceAccountName: app-service-account
  automountServiceAccountToken: false   # Disable unless needed
  containers:
    - name: app
      image: ghcr.io/our-org/app@sha256:abc123
      # Container-level security context
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
          # Only add specific capabilities if absolutely needed
          # add:
          #   - NET_BIND_SERVICE
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
  volumes:
    # Writable dirs for read-only root filesystem
    - name: tmp
      emptyDir:
        sizeLimit: 100Mi
    - name: cache
      emptyDir:
        sizeLimit: 500Mi
```

### Pod Security Standards (PSS)

| Level | Description | Key Restrictions |
|-------|-------------|-----------------|
| Privileged | Unrestricted | None (cluster admin workloads) |
| Baseline | Minimally restrictive | No hostNetwork, hostPID, hostIPC, privileged containers |
| Restricted | Heavily restricted | runAsNonRoot, allowPrivilegeEscalation: false, drop ALL capabilities, RuntimeDefault/Localhost seccomp |

```yaml
# Enforce restricted standard on namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

## Operator Pattern

Operators extend Kubernetes with domain-specific controllers that encode operational knowledge.

### When to Build an Operator

| Use Operator | Don't Use Operator |
|-------------|-------------------|
| Stateful applications (databases, caches) | Stateless apps (use Deployment) |
| Complex lifecycle management | Simple CRUD workloads |
| Custom scaling logic | Standard HPA is sufficient |
| Automated backup/restore | Manual operations are fine |
| Multi-step provisioning | Single manifest applies cleanly |

### Operator Maturity Model

| Level | Capability | Example |
|-------|-----------|---------|
| 1 - Basic Install | Automated install, lifecycle hooks | Helm chart with operator |
| 2 - Seamless Upgrades | Patch and minor version upgrades | Rolling update strategy |
| 3 - Full Lifecycle | Backup, restore, failure recovery | Automated database failover |
| 4 - Deep Insights | Metrics, alerts, log processing | Custom Prometheus exporters |
| 5 - Auto Pilot | Auto-scaling, tuning, anomaly detection | Self-healing database cluster |
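### Example Custom Resource

What an operator exposes to users is a custom resource declaring intent; the controller reconciles actual state toward it. A hypothetical sketch (the `PostgresCluster` kind and `db.example.com` group are invented for illustration, not a real operator's API):

```yaml
apiVersion: db.example.com/v1alpha1
kind: PostgresCluster
metadata:
  name: orders-db
  namespace: production
spec:
  replicas: 3            # controller provisions a 3-node cluster with failover
  version: "16"
  backup:
    schedule: "0 3 * * *"  # controller runs and verifies nightly backups
    retention: 7d
```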
## Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| No resource requests/limits | Node overcommit, OOM kills, unpredictable scheduling | Set requests and limits on every container |
| `latest` image tag | Non-reproducible deployments, silent breakage | Use immutable tags or `@sha256:` digest |
| Running as root | Container escape leads to host compromise | `runAsNonRoot: true`, `runAsUser: 10001` |
| No readiness probe | Traffic sent to unready pods, user-facing errors | Always define readinessProbe with appropriate thresholds |
| No PodDisruptionBudget | Cluster upgrades kill all replicas simultaneously | Set PDB with `minAvailable` or `maxUnavailable` |
| Single replica in production | Any disruption causes downtime | Minimum 3 replicas with topology spread |
| Hardcoded config in images | Rebuilds needed for config changes | Use ConfigMaps, Secrets, environment variables |
| ClusterRole for app workloads | Excessive permissions across all namespaces | Namespace-scoped Roles with least privilege |
| No NetworkPolicy | All pods can talk to all pods (flat network) | Default-deny with explicit allow rules |
| Helm values in CI pipeline | Config scattered, hard to audit | `values-{env}.yaml` files in Git |
| `kubectl apply` in production | No rollback tracking, no drift detection | GitOps with Argo CD or Flux |
| Ignoring pod topology spread | All replicas on same node/zone | `topologySpreadConstraints` for HA |
| No seccomp profile | Containers can use any syscall | `seccompProfile.type: RuntimeDefault` |
| Mounting service account tokens | Compromised pod can access API server | `automountServiceAccountToken: false` unless needed |

## Kubernetes Security Checklist

- [ ] All containers define resource requests and limits
- [ ] Images pinned by digest (`@sha256:`) not mutable tags
- [ ] `runAsNonRoot: true` on all pods
- [ ] `allowPrivilegeEscalation: false` on all containers
- [ ] `readOnlyRootFilesystem: true` with explicit writable mounts
- [ ] All capabilities dropped (`drop: [ALL]`), add back only as needed
- [ ] `seccompProfile.type: RuntimeDefault` on all pods
- [ ] `automountServiceAccountToken: false` unless API access is needed
- [ ] NetworkPolicies enforce default-deny with explicit allow rules
- [ ] Pod Security Standards enforced at namespace level (`restricted`)
- [ ] RBAC uses namespace-scoped Roles (not ClusterRoles) for workloads (see the sketch after this checklist)
- [ ] Secrets encrypted at rest (EncryptionConfiguration or KMS provider)
- [ ] PodDisruptionBudgets defined for all production workloads
- [ ] Topology spread constraints distribute pods across zones
- [ ] Readiness, liveness, and startup probes configured on all containers
- [ ] Helm charts use `values-{env}.yaml` per environment, reviewed in PRs
- [ ] Image pull policies set to `IfNotPresent` for immutable tags, `Always` for mutable ones like `latest`
- [ ] Service mesh mTLS enabled for inter-service communication
- [ ] Audit logging enabled on API server with appropriate retention
- [ ] Cluster upgrades tested in staging before production rollout
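### Namespace-Scoped RBAC Example

A minimal sketch of the Role-per-namespace pattern from the checklist. The role name and the ConfigMap read access are illustrative; the ServiceAccount matches the security context example above:

```yaml
# Hypothetical least-privilege role: adjust resources/verbs to what the workload needs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-config-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-config-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: app-service-account
    namespace: production
roleRef:
  kind: Role
  name: app-config-reader
  apiGroup: rbac.authorization.k8s.io
```

Because both Role and RoleBinding live in `production`, a compromised pod cannot read anything outside its own namespace even if its token leaks.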