---
name: kubernetes-flux
description: Kubernetes cluster management and troubleshooting. Query pods, deployments, services, logs, and events. Supports context switching, scaling, and rollout management. Use for Kubernetes debugging, monitoring, and operations.
version: 1.0
model: sonnet
invoked_by: both
user_invocable: true
tools: [Read, Write, Bash]
best_practices:
  - Verify kubectl is configured before operations
  - Use namespace flags for clarity
  - Check current context before cluster operations
  - Avoid destructive operations without confirmation
  - Mask secrets in output
error_handling: graceful
streaming: supported
safety_level: high
---

# Kubernetes Flux Skill

## Overview

This skill manages Kubernetes clusters through kubectl, letting AI agents inspect, troubleshoot, and manage cluster resources.

## When to Use

- Debugging application pods and containers
- Monitoring deployment rollouts and status
- Analyzing service networking and endpoints
- Investigating cluster events and errors
- Troubleshooting performance issues
- Managing application scaling
- Port forwarding for local development

## Requirements

- kubectl installed and configured
- Valid KUBECONFIG file or default context
- Cluster access credentials
- Appropriate RBAC permissions

## Quick Reference

```bash
# Get pods in current namespace
kubectl get pods

# Get pods in specific namespace
kubectl get pods -n production

# Get pods with labels
kubectl get pods -l app=web -n production

# Describe a pod
kubectl describe pod my-app-123 -n default

# Get pod logs
kubectl logs my-app-123 -n default

# Get logs with tail
kubectl logs my-app-123 -n default --tail=100

# Get logs since time
kubectl logs my-app-123 -n default --since=1h

# List recent events
kubectl get events -n default --sort-by='.lastTimestamp' | tail -20

# Watch events in real-time
kubectl get events -n default -w
```

## Resource Discovery

### Pods

```bash
# List all pods
kubectl get pods -n <namespace>

# List pods with wide output
kubectl get pods -n <namespace> -o wide

# List pods across all namespaces
kubectl get pods -A

# Filter by label
kubectl get pods -l app=nginx -n <namespace>
```

### Deployments

```bash
# List deployments
kubectl get deployments -n <namespace>

# Get deployment details
kubectl describe deployment <name> -n <namespace>

# Check rollout status
kubectl rollout status deployment/<name> -n <namespace>
```

### Services

```bash
# List services
kubectl get svc -n <namespace>

# Describe service
kubectl describe svc <name> -n <namespace>

# Get endpoints
kubectl get endpoints <name> -n <namespace>
```

### ConfigMaps and Secrets

```bash
# List ConfigMaps
kubectl get configmaps -n <namespace>

# Describe ConfigMap
kubectl describe configmap <name> -n <namespace>

# Get ConfigMap data
kubectl get configmap <name> -n <namespace> -o yaml

# List Secrets (names only)
kubectl get secrets -n <namespace>

# Describe Secret (values masked)
kubectl describe secret <name> -n <namespace>
```

### Namespaces

```bash
# List namespaces
kubectl get namespaces

# Get namespace details
kubectl describe namespace <name>
```

## Troubleshooting

### Pod Debugging

```bash
# Describe pod for events and conditions
kubectl describe pod <pod-name> -n <namespace>

# Get pod logs
kubectl logs <pod-name> -n <namespace>

# Get logs from specific container
kubectl logs <pod-name> -c <container> -n <namespace>

# Get previous container logs (after crash)
kubectl logs <pod-name> -n <namespace> --previous

# Exec into pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Run command in pod
kubectl exec <pod-name> -n <namespace> -- ls -la /app
```

### Events

```bash
# List events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Filter warning events
kubectl get events -n <namespace> --field-selector type=Warning

# Watch events live
kubectl get events -n <namespace> -w
```

## Management Operations

### Scaling

```bash
# Scale deployment
kubectl scale deployment <name> --replicas=5 -n <namespace>

# Autoscale deployment
kubectl autoscale deployment <name> --min=2 --max=10 --cpu-percent=80 -n <namespace>
```

### Rollouts

```bash
# Check rollout status
kubectl rollout status deployment/<name> -n <namespace>

# View rollout history
kubectl rollout history deployment/<name> -n <namespace>

# Rollback to previous version
kubectl rollout undo deployment/<name> -n <namespace>

# Rollback to specific revision
kubectl rollout undo deployment/<name> --to-revision=2 -n <namespace>
```

### Port Forwarding

```bash
# Forward local port to pod
kubectl port-forward <pod-name> 8080:80 -n <namespace>

# Forward to service
kubectl port-forward svc/<name> 8080:80 -n <namespace>
```

## Context Management

```bash
# Get current context
kubectl config current-context

# List all contexts
kubectl config get-contexts

# Switch context
kubectl config use-context <context-name>

# Set default namespace
kubectl config set-context --current --namespace=<namespace>
```

## Common Workflows

### Troubleshoot a Failing Pod

```bash
# 1. Find the problematic pod
kubectl get pods -n production

# 2. Describe for events
kubectl describe pod <pod-name> -n production

# 3. Check events
kubectl get events -n production --sort-by='.lastTimestamp' | tail -20

# 4. Get logs
kubectl logs <pod-name> -n production --tail=200
```

### Monitor Deployment Rollout

```bash
# 1. Check deployment status
kubectl get deployments -n production

# 2. Watch rollout
kubectl rollout status deployment/<name> -n production

# 3. Watch pods
kubectl get pods -l app=<name> -n production -w
```

### Debug Service Connectivity

```bash
# 1. Check service
kubectl describe svc <name> -n <namespace>

# 2. Check endpoints
kubectl get endpoints <name> -n <namespace>

# 3. Check backing pods
kubectl get pods -l <selector> -n <namespace>

# 4. Port forward for testing
kubectl port-forward svc/<name> 8080:80 -n <namespace>
```

## Safety Features

### Blocked Operations

The following are dangerous and require confirmation:

- `kubectl delete` commands
- Destructive exec commands (rm, dd, mkfs)
- Scale to 0 replicas in production

### Masked Output

Secret values are always masked. Only metadata is shown.

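The confirmation rule for blocked operations could be checked with a small classifier that runs before any command is executed. The sketch below is one possible approach, not the skill's actual implementation: `needs_confirmation` is a hypothetical helper name, the patterns only approximate the three categories listed under Blocked Operations, and treating `-n production` or `--namespace=production` as "production" is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical guard: returns 0 when the given command line matches one of
# the blocked operations, 1 otherwise. Patterns are illustrative only.
needs_confirmation() {
  # Pad with spaces so whole-word patterns also match at the line ends.
  local cmd=" $* "
  case "$cmd" in
    " kubectl delete "*)
      return 0 ;;
    " kubectl exec "*" -- rm "*|" kubectl exec "*" -- dd "*|" kubectl exec "*" -- mkfs"*)
      return 0 ;;
    " kubectl scale "*)
      case "$cmd" in
        *" --replicas=0 "*)
          # Assumption: "production" is identified by the namespace flag.
          case "$cmd" in
            *" -n production "*|*" --namespace=production "*) return 0 ;;
          esac ;;
      esac ;;
  esac
  return 1
}

# Usage: ask for confirmation instead of running the command directly.
if needs_confirmation kubectl delete pod my-app-123 -n default; then
  echo "confirmation required"
fi
```

String matching like this is deliberately conservative and easy to audit, but it will miss aliased or indirectly constructed commands, so it complements RBAC restrictions rather than replacing them.
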
## Error Handling

| Error                       | Cause               | Fix                   |
| --------------------------- | ------------------- | --------------------- |
| `kubectl not found`         | Not installed       | Install kubectl       |
| `Unable to connect`         | Cluster unreachable | Check network/VPN     |
| `Forbidden`                 | RBAC permissions    | Request permissions   |
| `NotFound`                  | Resource missing    | Verify name/namespace |
| `context deadline exceeded` | Timeout             | Check cluster health  |

## Related

- kubectl docs: https://kubernetes.io/docs/reference/kubectl/
- Kubernetes API: https://kubernetes.io/docs/reference/kubernetes-api/

## Memory Protocol (MANDATORY)

**Before starting:**

```bash
cat .claude/context/memory/learnings.md
```

**After completing:** Record any new patterns or exceptions discovered.

> ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
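
The "After completing" step could be done with a one-line append to the same file that the "Before starting" step reads. This is a minimal sketch under stated assumptions: `record_learning` and the `MEMORY_FILE` variable are hypothetical names, and the dated-bullet entry format is an illustration, since the protocol does not prescribe one.

```bash
#!/usr/bin/env bash
# Assumption: learnings live in the file read by the "Before starting" step.
MEMORY_FILE="${MEMORY_FILE:-.claude/context/memory/learnings.md}"

# Hypothetical helper: append one dated bullet to the learnings file.
record_learning() {
  local note="$1"
  # Create the directory on first use so the append cannot fail on a fresh checkout.
  mkdir -p "$(dirname "$MEMORY_FILE")"
  printf -- '- %s: %s\n' "$(date +%Y-%m-%d)" "$note" >> "$MEMORY_FILE"
}

# Example call (commented out so sourcing this file writes nothing):
# record_learning "Pods in production need -c <container> for logs"
```

Appending rather than rewriting keeps earlier entries intact if the session is interrupted partway through.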