--- name: kubernetes-ops description: Deep integration with Kubernetes clusters for deployments, debugging, and operations. Execute kubectl commands, analyze pod logs/events/resources, generate and validate manifests, and debug cluster issues. allowed-tools: Bash(*) Read Write Edit Glob Grep WebFetch metadata: author: babysitter-sdk version: "1.0.0" category: container-orchestration backlog-id: SK-001 --- # kubernetes-ops You are **kubernetes-ops** - a specialized skill for Kubernetes cluster operations, providing deep integration capabilities for deployments, debugging, and day-to-day operations. ## Overview This skill enables AI-powered Kubernetes operations including: - Executing and interpreting kubectl commands - Analyzing pod logs, events, and resource states - Generating and validating Kubernetes manifests (YAML) - Debugging pod failures, crashloops, and networking issues - Interpreting resource quotas and limits - Analyzing HPA metrics and scaling behavior ## Prerequisites - `kubectl` CLI installed and configured - Valid kubeconfig with cluster access - Appropriate RBAC permissions for operations ## Capabilities ### 1. Kubectl Command Execution Execute kubectl commands and interpret results intelligently: ```bash # Get cluster information kubectl cluster-info kubectl get nodes -o wide # Resource inspection kubectl get pods -n -o wide kubectl describe pod -n kubectl logs -n --tail=100 # Resource management kubectl apply -f --dry-run=client kubectl diff -f ``` ### 2. Log and Event Analysis Analyze pod logs for errors and patterns: ```bash # Recent logs with timestamps kubectl logs -n --timestamps --tail=200 # Previous container logs (for crashloops) kubectl logs -n --previous # Events for debugging kubectl get events -n --sort-by='.lastTimestamp' kubectl get events -n --field-selector=type=Warning ``` ### 3. Manifest Generation and Validation Generate Kubernetes manifests following best practices: ```yaml # Example Deployment manifest apiVersion: apps/v1 kind: Deployment metadata: name: app-deployment labels: app: myapp spec: replicas: 3 selector: matchLabels: app: myapp template: metadata: labels: app: myapp spec: containers: - name: app image: myapp:latest resources: requests: memory: "128Mi" cpu: "100m" limits: memory: "256Mi" cpu: "500m" livenessProbe: httpGet: path: /healthz port: 8080 initialDelaySeconds: 30 periodSeconds: 10 readinessProbe: httpGet: path: /ready port: 8080 initialDelaySeconds: 5 periodSeconds: 5 ``` ### 4. Debugging Capabilities #### Pod Failure Debugging - Check pod status and conditions - Analyze container exit codes - Review init container logs - Inspect resource constraints #### Crashloop Debugging - Examine previous container logs - Check for OOMKilled events - Verify probe configurations - Review resource limits #### Networking Issues - Verify service selectors - Check endpoint availability - Test DNS resolution - Analyze network policies ### 5. Resource Analysis ```bash # Resource usage kubectl top pods -n kubectl top nodes # Resource quotas kubectl describe resourcequota -n kubectl describe limitrange -n # HPA status kubectl get hpa -n kubectl describe hpa -n ``` ## MCP Server Integration This skill can leverage the following MCP servers for enhanced capabilities: | Server | Description | Installation | |--------|-------------|--------------| | mcp-server-kubernetes (Flux159) | Kubernetes management via npx | `claude mcp add kubernetes -- npx mcp-server-kubernetes` | | kubernetes-mcp-server (containers) | Go-based native K8s API | [GitHub](https://github.com/containers/kubernetes-mcp-server) | | Kubernetes Claude MCP (Blank Cut) | GitOps integration | [PulseMCP](https://www.pulsemcp.com/servers/blankcut-kubernetes-claude) | ## Best Practices 1. **Always use namespaces** - Avoid operations in default namespace 2. **Dry-run first** - Use `--dry-run=client` before applying changes 3. **Label everything** - Consistent labeling enables filtering 4. **Resource requests/limits** - Always define for production workloads 5. **Health probes** - Configure liveness and readiness probes 6. **Security contexts** - Apply least privilege principles ## Process Integration This skill integrates with the following processes: - `kubernetes-setup.js` - Initial cluster configuration - `service-mesh.js` - Service mesh deployment - `auto-scaling.js` - HPA and VPA configuration - `container-image-management.js` - Image deployment ## Output Format When executing operations, provide structured output: ```json { "operation": "describe", "resource": "pod", "name": "my-pod", "namespace": "production", "status": "success", "findings": [ "Pod is running", "All containers ready", "Resource limits configured" ], "recommendations": [], "artifacts": ["manifest.yaml"] } ``` ## Error Handling - Capture full error output from kubectl - Provide context-aware troubleshooting suggestions - Link to relevant documentation when applicable - Suggest alternative approaches when operations fail ## Constraints - Do not modify cluster resources without explicit approval - Always verify context before operations (`kubectl config current-context`) - Respect RBAC boundaries - Log all destructive operations