---
name: devops-automator
description: "Expert DevOps engineer for CI/CD, IaC, Kubernetes, and deployment automation. Activate on: CI/CD, GitHub Actions, Terraform, Docker, Kubernetes, Helm, ArgoCD, GitOps, deployment pipeline, infrastructure as code, container orchestration. NOT for: application code (use language skills), database schema (use data-pipeline-engineer), API design (use api-architect)."
allowed-tools: Read,Write,Edit,Bash(docker:*,kubectl:*,terraform:*,helm:*,gh:*)
category: DevOps & Site Reliability
tags:
  - ci-cd
  - terraform
  - docker
  - kubernetes
  - gitops
pairs-with:
  - skill: site-reliability-engineer
    reason: Ensure deployed code is healthy
  - skill: security-auditor
    reason: Secure the deployment pipeline
---

# DevOps Automator

Expert DevOps engineer specializing in CI/CD pipelines, infrastructure as code, container orchestration, and deployment automation.

## Activation Triggers

**Activate on:** "CI/CD", "GitHub Actions", "deployment pipeline", "Terraform", "infrastructure as code", "IaC", "Docker", "Kubernetes", "K8s", "Helm", "container orchestration", "GitOps", "ArgoCD", "deployment automation", "secrets management", "monitoring setup"

**NOT for:** Application development → language skills | Database design → `data-pipeline-engineer` | API design → `api-architect`

## Quick Start

1. **Define deployment strategy**: Blue/Green, Canary, or Rolling
2. **Choose IaC tool**: Terraform for cloud resources, Helm for K8s apps
3. **Design CI stages**: lint → test → security scan → build → deploy
4. **Implement GitOps**: Config repo synced by ArgoCD
5. **Add observability**: Prometheus metrics, structured logging

## Core Capabilities

| Domain | Tools & Technologies |
|--------|---------------------|
| **CI/CD** | GitHub Actions, GitLab CI, Jenkins |
| **IaC** | Terraform, AWS CDK, Pulumi |
| **Containers** | Docker, Kubernetes, Helm |
| **GitOps** | ArgoCD, Flux, Kustomize |
| **Monitoring** | Prometheus, Grafana, ELK/EFK |

## Architecture Patterns

### CI/CD Pipeline Flow

```
Code Commit → Build → Test → Security Scan → Package
                                               ↓
Monitor ← Release Staging ← Smoke Tests ← Deploy Dev
                 ↓
          Manual Approval
                 ↓
         Deploy Production
```

### GitOps Architecture

```
App Repo ──CI──▶ Config Repo ──ArgoCD──▶ K8s Cluster
                      ▲                        │
                      └────Continuous Sync─────┘
```

## Reference Files

Full working examples are in `./references/`:

| File | Description | Lines |
|------|-------------|-------|
| `github-actions-patterns.yaml` | Complete CI/CD pipeline | 217 |
| `terraform-eks-module.tf` | Production EKS cluster | 282 |
| `kubernetes-deployment.yaml` | Deployment + HPA + ArgoCD | 200 |
| `dockerfile-multistage.dockerfile` | Optimized multi-stage build | 51 |

## Anti-Patterns (AVOID These)

### 1. YAML Copy-Paste Proliferation

**Symptom**: Nearly identical workflow files duplicated across repositories
**Fix**: Reusable workflows, Helm charts, Kustomize bases, Terraform modules

### 2. Hardcoded Secrets in Code

**Symptom**: API keys, passwords committed to git
**Fix**: Secret managers (Vault, AWS SM), sealed secrets, env vars from secure sources

### 3. No Rollback Strategy

**Symptom**: No plan for deployment failure, manual intervention required
**Fix**: Blue/green, canary with automated rollback, ArgoCD auto-revert

### 4. Monolithic CI Pipeline

**Symptom**: Single 45-minute pipeline rebuilding everything on every commit
**Fix**: Parallel jobs, caching, incremental builds, path-based triggers

### 5. No Resource Limits

**Symptom**: K8s pods without CPU/memory limits consuming all host resources
**Fix**: Always set requests/limits, use LimitRanges and ResourceQuotas

### 6. Running as Root in Containers

**Symptom**: Dockerfile without USER instruction, pods running privileged
**Fix**: Add USER instruction, set securityContext.runAsNonRoot: true

### 7. Using :latest Tags

**Symptom**: `FROM node:latest` or `image: app:latest` in production
**Fix**: Pin specific versions, use immutable tags with SHA digests

### 8. No Health Checks

**Symptom**: Missing HEALTHCHECK in Dockerfile, no liveness/readiness probes
**Fix**: Add health endpoints, configure probes with appropriate timeouts

### 9. Single Point of Failure

**Symptom**: replicas: 1, no pod anti-affinity, single availability zone
**Fix**: Multiple replicas, pod anti-affinity, topology spread constraints

### 10. Terraform State in Local File

**Symptom**: `terraform.tfstate` committed to git or stored locally
**Fix**: Remote backend (S3+DynamoDB, Terraform Cloud, GCS)

### 11. No Concurrency Control

**Symptom**: Multiple CI runs for same branch, deployment race conditions
**Fix**: Use concurrency groups, implement deployment locks

### 12. Ignoring Security Scanning

**Symptom**: No vulnerability scanning, no secret detection in CI
**Fix**: Trivy, Snyk, or Grype for vulnerabilities; TruffleHog for secrets

### 13. No Drift Detection

**Symptom**: Manual changes to infrastructure, config diverges from code
**Fix**: ArgoCD diff detection, `terraform plan` in CI, regular audits

### 14. Overly Permissive IAM

**Symptom**: IAM roles with `*` actions, service accounts with cluster-admin
**Fix**: Principle of least privilege, IRSA for pods, audit permissions

### 15. No Observability

**Symptom**: No metrics, logs only on stdout, no alerting
**Fix**: Export metrics, structured logging, define SLOs, configure alerts

## Validation Script

Run `./scripts/validate-devops-skill.sh` to check:

- GitHub Actions workflows for deprecated actions, missing caching
- Dockerfiles for security best practices
- Kubernetes manifests for resource limits, security contexts
- Terraform for version constraints, sensitive defaults

## Quality Checklist

```
[ ] All secrets in secret management (not in code)
[ ] Resource limits defined for all containers
[ ] Health checks configured (liveness, readiness)
[ ] Horizontal pod autoscaling enabled
[ ] Security contexts set (non-root, read-only)
[ ] Monitoring and alerting configured
[ ] Rollback strategy documented
[ ] Multi-environment support (dev, staging, prod)
[ ] Concurrency controls in CI pipelines
[ ] Remote state backend for Terraform
[ ] Vulnerability scanning in pipeline
[ ] Version pinning for all dependencies
```

## Output Artifacts

1. **CI/CD Workflows** - GitHub Actions, GitLab CI configs
2. **Terraform Modules** - Reusable infrastructure components
3. **Kubernetes Manifests** - Deployments, services, configs
4. **Helm Charts** - Packaged applications
5. **Docker Configurations** - Optimized multi-stage builds
6. **ArgoCD Applications** - GitOps deployment definitions

## Tools Available

- `Read`, `Write`, `Edit` - File operations for configs and manifests
- `Bash(docker:*)` - Build and manage containers
- `Bash(kubectl:*)` - Kubernetes operations
- `Bash(terraform:*)` - Infrastructure provisioning
- `Bash(helm:*)` - Helm chart management
- `Bash(gh:*)` - GitHub CLI operations
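
As a rough illustration of the Kubernetes manifest checks listed under Validation Script and the Quality Checklist, the sketch below flags anti-patterns 5 through 8 on a single container spec. It is not the contents of `./scripts/validate-devops-skill.sh`. The function name `lint_container` and the finding messages are hypothetical, and the container spec is assumed to have already been parsed from YAML into a Python dict.

```python
from typing import Any


def lint_container(container: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one Kubernetes container spec.

    Hypothetical helper for illustration only. It inspects container-level
    fields and does not see pod-level securityContext settings.
    """
    findings: list[str] = []
    name = container.get("name", "<unnamed>")

    # Anti-pattern 5: CPU and memory limits should both be set.
    limits = (container.get("resources") or {}).get("limits") or {}
    for resource in ("cpu", "memory"):
        if resource not in limits:
            findings.append(f"{name}: missing {resource} limit")

    # Anti-pattern 6: the container should declare runAsNonRoot: true.
    security = container.get("securityContext") or {}
    if security.get("runAsNonRoot") is not True:
        findings.append(f"{name}: securityContext.runAsNonRoot is not true")

    # Anti-pattern 7: images should be pinned by digest or a non-latest tag.
    image = container.get("image", "")
    if "@" not in image:  # no digest such as app@sha256:...
        # Look only at the last path segment so a registry port
        # (registry:5000/app) is not mistaken for a tag.
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            findings.append(f"{name}: image '{image}' is untagged or uses :latest")

    # Anti-pattern 8: liveness and readiness probes should both exist.
    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            findings.append(f"{name}: missing {probe}")

    return findings


if __name__ == "__main__":
    example = {"name": "web", "image": "registry:5000/app:latest"}
    for finding in lint_container(example):
        print(finding)
```

A check like this could run as an early CI step so that manifests missing limits, probes, or pinned images fail before the deploy stage. A real pipeline would also need to load the YAML and walk every container in each workload.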