---
name: "senior-devops"
description: Comprehensive DevOps skill for CI/CD, infrastructure automation, containerization, and cloud platforms (AWS, GCP, Azure). Includes pipeline setup, infrastructure as code, deployment automation, and monitoring. Use when setting up pipelines, deploying applications, managing infrastructure, implementing monitoring, or optimizing deployment processes.
---

# Senior DevOps

A toolkit for senior DevOps work covering CI/CD pipelines, infrastructure as code, and container deployments.

## Quick Start

### Main Capabilities

This skill provides three core capabilities through automated scripts:

```bash
# Script 1: Pipeline Generator - scaffolds CI/CD pipelines for GitHub Actions or CircleCI
python scripts/pipeline_generator.py ./app --platform=github --stages=build,test,deploy

# Script 2: Terraform Scaffolder - generates and validates IaC modules for AWS/GCP/Azure
python scripts/terraform_scaffolder.py ./infra --provider=aws --module=ecs-service --verbose

# Script 3: Deployment Manager - orchestrates container deployments with rollback support
python scripts/deployment_manager.py deploy --env=production --image=app:1.2.3 --strategy=blue-green
```

## Core Capabilities

### 1. Pipeline Generator

Scaffolds CI/CD pipeline configurations for GitHub Actions or CircleCI, with stages for build, test, security scan, and deploy.

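As a rough illustration of how an ordered stage list can become a pipeline, the sketch below chains stages so that each job waits on the previous one. It is not the code in `scripts/pipeline_generator.py`: the function name `build_job_graph` and the shape of the returned mapping are assumptions, loosely modeled on the `needs` key that GitHub Actions uses to order jobs.

```python
from typing import Dict, List, Optional


def build_job_graph(stages: List[str]) -> Dict[str, dict]:
    """Return a jobs mapping in which each stage depends on the one before it.

    Hypothetical helper for illustration; the real generator may differ.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    if len(set(stages)) != len(stages):
        raise ValueError("stage names must be unique")

    jobs: Dict[str, dict] = {}
    previous: Optional[str] = None
    for stage in stages:
        job: dict = {"runs-on": "ubuntu-latest", "steps": []}
        if previous is not None:
            # Run this stage only after the prior stage has succeeded.
            job["needs"] = previous
        jobs[stage] = job
        previous = stage
    return jobs
```

Calling `build_job_graph("build,test,deploy".split(","))` yields three jobs in which `test` needs `build` and `deploy` needs `test`, which mirrors the job chain in the workflow example that follows.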
**Example: GitHub Actions workflow**

```yaml
# .github/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v4

  build-docker:
    needs: build-and-test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # required to push images to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR  # pushing fails without an authenticated session
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.ref == 'refs/heads/main' }}
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

  deploy:
    needs: build-docker
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Assumes AWS credentials are already available to this job.
      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster production \
            --service app-service \
            --force-new-deployment
```

**Usage:**

```bash
python scripts/pipeline_generator.py --platform=github|circleci --stages=build,test,deploy
```

### 2. Terraform Scaffolder

Generates, validates, and plans Terraform modules. Enforces a consistent module structure and runs `terraform validate` and `terraform plan` before any apply.

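The validate-then-plan gate described above could be sketched as follows. This is a minimal illustration, not the scaffolder's actual implementation: the names `run_command` and `validate_then_plan` are hypothetical, and the injectable `runner` argument is an assumption made so the logic can be exercised without Terraform installed. The Terraform commands themselves (`-chdir`, `init`, `validate`, `plan -out`) are standard CLI usage.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def validate_then_plan(workdir: str, runner: Runner = run_command) -> bool:
    """Run init, validate, and plan in order, stopping at the first failure.

    Returns True only if every step exits with code 0, meaning a saved
    plan file (tfplan) exists and is ready for review before apply.
    """
    steps: List[List[str]] = [
        ["terraform", f"-chdir={workdir}", "init", "-input=false"],
        ["terraform", f"-chdir={workdir}", "validate"],
        ["terraform", f"-chdir={workdir}", "plan", "-input=false", "-out=tfplan"],
    ]
    for cmd in steps:
        if runner(cmd) != 0:
            return False  # do not continue past a failed step
    return True
```

Keeping `apply` out of this function is deliberate: the saved plan is meant to be reviewed by a person before anything changes.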
**Example: AWS ECS service module**

```hcl
# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "app" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory

  container_definitions = jsonencode([{
    name      = var.service_name
    image     = var.container_image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    environment = [for k, v in var.env_vars : { name = k, value = v }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = "/ecs/${var.service_name}"
        awslogs-region        = var.aws_region
        awslogs-stream-prefix = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "app" {
  name            = var.service_name
  cluster         = var.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = var.service_name
    container_port   = var.container_port
  }
}
```

**Usage:**

```bash
python scripts/terraform_scaffolder.py --provider=aws|gcp|azure --module=ecs-service|gke-deployment|aks-service [--verbose]
```

### 3. Deployment Manager

Orchestrates deployments with blue/green or rolling strategies, health-check gates, and automatic rollback on failure.

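A health-check gate with automatic rollback can be sketched in a few lines. The code below is an illustration under stated assumptions, not the logic of `scripts/deployment_manager.py`: the function names `http_ok`, `wait_until_healthy`, and `gated_deploy` are hypothetical, and the deploy, traffic-switch, and rollback actions are passed in as callables so the control flow stays independent of any particular platform.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the check up to `attempts` times, pausing between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def gated_deploy(deploy: Callable[[], None], check: Callable[[], bool],
                 switch_traffic: Callable[[], None],
                 rollback: Callable[[], None], attempts: int = 10,
                 delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Deploy, then switch traffic only if the health check passes.

    If the new version never becomes healthy, roll back instead.
    """
    deploy()
    if wait_until_healthy(check, attempts, delay, sleep):
        switch_traffic()
        return True
    rollback()
    return False
```

In practice `check` might be `lambda: http_ok("https://app.example.com/healthz")`, using the health-check URL from the usage examples in this document.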
**Example: Kubernetes blue/green deployment (blue-slot specific elements)**

```yaml
# k8s/deployment-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    slot: blue  # slot label distinguishes blue from green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      slot: blue
  template:
    metadata:
      labels:
        app: myapp
        slot: blue
    spec:
      containers:
        - name: app
          image: ghcr.io/org/app:1.2.3
          readinessProbe:  # gate: pod must pass before traffic switches
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

**Usage:**

```bash
python scripts/deployment_manager.py deploy \
  --env=staging|production \
  --image=app:1.2.3 \
  --strategy=blue-green|rolling \
  --health-check-url=https://app.example.com/healthz

python scripts/deployment_manager.py rollback --env=production --to-version=1.2.2

python scripts/deployment_manager.py --analyze --env=production  # audit current state
```

## Resources

- Pattern Reference: `references/cicd_pipeline_guide.md` (detailed CI/CD patterns, best practices, anti-patterns)
- Workflow Guide: `references/infrastructure_as_code.md` (IaC step-by-step processes, optimization, troubleshooting)
- Technical Guide: `references/deployment_strategies.md` (deployment strategy configs, security considerations, scalability)
- Tool Scripts: `scripts/` directory

## Development Workflow

### 1. Infrastructure Changes (Terraform)

```bash
# Scaffold or update module
python scripts/terraform_scaffolder.py ./infra --provider=aws --module=ecs-service --verbose

# Validate and plan; review the diff before applying
terraform -chdir=infra init
terraform -chdir=infra validate
terraform -chdir=infra plan -out=tfplan

# Apply only after plan review
terraform -chdir=infra apply tfplan

# Verify resources are healthy
aws ecs describe-services --cluster production --services app-service \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'
```

### 2. Application Deployment

```bash
# Generate or update pipeline config
python scripts/pipeline_generator.py . --platform=github --stages=build,test,security,deploy

# Build and tag image
docker build -t ghcr.io/org/app:$(git rev-parse --short HEAD) .
docker push ghcr.io/org/app:$(git rev-parse --short HEAD)

# Deploy with health-check gate
python scripts/deployment_manager.py deploy \
  --env=production \
  --image=app:$(git rev-parse --short HEAD) \
  --strategy=blue-green \
  --health-check-url=https://app.example.com/healthz

# Verify pods are running
kubectl get pods -n production -l app=myapp
kubectl rollout status deployment/app-blue -n production

# Switch traffic after verification
kubectl patch service app-svc -n production \
  -p '{"spec":{"selector":{"slot":"blue"}}}'
```

### 3. Rollback Procedure

```bash
# Immediate rollback via deployment manager
python scripts/deployment_manager.py rollback --env=production --to-version=1.2.2

# Or via kubectl (deployment name matches the app-blue example above)
kubectl rollout undo deployment/app-blue -n production
kubectl rollout status deployment/app-blue -n production

# Verify rollback succeeded
kubectl get pods -n production -l app=myapp
curl -sf https://app.example.com/healthz || echo "ROLLBACK FAILED: escalate"
```

## Troubleshooting

See the troubleshooting section in `references/deployment_strategies.md`.