--- name: devops description: Provides comprehensive DevOps guidance including CI/CD pipeline design, infrastructure as code (Terraform, CloudFormation, Bicep), container orchestration (Docker, Kubernetes), deployment strategies (blue-green, canary, rolling), monitoring and observability, configuration management (Ansible, Chef, Puppet), and cloud platform automation (AWS, GCP, Azure). Produces deployment scripts, pipeline configurations, infrastructure code, and operational procedures. Use when designing CI/CD pipelines, automating infrastructure, containerizing applications, setting up monitoring, implementing deployment strategies, managing configurations, or when users mention DevOps, CI/CD, infrastructure automation, Kubernetes, Docker, Terraform, deployment, monitoring, or cloud operations. --- # DevOps ## Core Capabilities Provides expert guidance covering the entire software delivery lifecycle: 1. **CI/CD Pipeline Design** - Automated build, test, and deployment workflows 2. **Infrastructure as Code** - Cloud resource provisioning with Terraform, CloudFormation, Bicep 3. **Container Orchestration** - Docker and Kubernetes deployment patterns 4. **Deployment Strategies** - Blue-green, canary, and rolling deployments 5. **Monitoring & Observability** - Metrics, logging, alerting with Prometheus, Grafana, ELK 6. **Configuration Management** - Ansible, Chef, Puppet automation 7. **Security & Compliance** - DevSecOps practices and container security ## Best Practices ## CI/CD - Keep pipelines fast (< 10 minutes for feedback) - Fail fast with quick tests first - Use pipeline as code (version controlled) - Implement proper secret management - Enable artifact caching and parallelize independent jobs ### Infrastructure as Code - Use remote state with locking - Create reusable modules and pin versions - Always review plan before apply - Implement proper tagging strategy - Document resource dependencies ### Container Orchestration - Set resource requests and limits - Implement health checks (liveness/readiness probes) - Use pod anti-affinity for high availability - Enable horizontal pod autoscaling - Implement proper logging and monitoring ### Deployment - Use rolling updates with zero downtime - Implement proper health checks and rollback capabilities - Use canary/blue-green for critical applications - Test thoroughly in staging environments - Monitor post-deployment metrics ### Security - Run containers as non-root with read-only root filesystems - Scan images for vulnerabilities regularly - Implement network policies and secrets management - Enable pod security standards and least privilege access ### Monitoring - Collect metrics using RED/USE methods - Implement structured logging with meaningful alerts - Create actionable dashboards and monitor SLIs/SLOs - Set up distributed tracing for microservices ## Detailed References Load reference files based on specific needs: - **CI/CD Pipeline Design**: See [cicd-pipeline-design.md](references/cicd-pipeline-design.md) for: - GitHub Actions, GitLab CI, Jenkins pipeline examples - Automated build, test, deploy workflow patterns - Pipeline optimization and caching strategies - **Infrastructure as Code**: See [infrastructure-as-code.md](references/infrastructure-as-code.md) for: - Terraform, CloudFormation, Bicep patterns - AWS, GCP, Azure resource provisioning - Module design and state management - **Container Orchestration**: See [container-orchestration.md](references/container-orchestration.md) for: - Kubernetes manifests, Helm charts, Kustomize - Docker best practices and multi-stage builds - Service mesh and networking patterns - **Deployment Strategies**: See [deployment-strategies.md](references/deployment-strategies.md) for: - Blue-green deployment implementation - Canary release patterns with traffic splitting - Rolling update strategies and rollback procedures - **Monitoring & Observability**: See [monitoring-and-observability.md](references/monitoring-and-observability.md) for: - Prometheus, Grafana setup and configuration - ELK stack deployment and log aggregation - Alert rules, dashboards, and SLO definitions - **Security Best Practices**: See [security-best-practices.md](references/security-best-practices.md) for: - DevSecOps pipeline integration - Container security scanning and hardening - Secret management and compliance validation - **Configuration Management**: See [configuration-management.md](references/configuration-management.md) for: - Ansible playbooks, Chef recipes, Puppet manifests - Server configuration automation patterns - Infrastructure drift detection - **Common Commands**: See [common-commands.md](references/common-commands.md) for: - Kubernetes kubectl command reference - Docker CLI operations - Terraform and cloud provider CLI commands - **Troubleshooting**: See [troubleshooting-guide.md](references/troubleshooting-guide.md) for: - Common issues and resolution steps - Debugging techniques for containers and orchestration - Performance optimization strategies