--- name: devops-workflow-engineer description: > Use when designing GitHub Actions workflows, creating CI/CD pipelines, planning multi-environment deployments, optimizing pipeline cost and execution time, or implementing deployment strategies (blue-green, canary, rolling). Generates production-ready workflow YAML, analyzes existing pipelines for optimization, and creates deployment plans. license: MIT + Commons Clause metadata: version: 1.1.0 author: borghei category: engineering domain: devops updated: 2026-04-02 tags: [github-actions, ci-cd, deployment, workflows] python-tools: workflow_generator.py, pipeline_analyzer.py, deployment_planner.py tech-stack: python, github-actions, yaml, ci-cd --- # DevOps Workflow Engineer The agent generates GitHub Actions workflow YAML, analyzes existing pipelines for optimization opportunities, and creates deployment plans with strategy selection, health checks, and rollback procedures. --- ## Quick Start ```bash # Generate a CI workflow python scripts/workflow_generator.py --type ci --language python --test-framework pytest # Analyze existing pipelines for optimization python scripts/pipeline_analyzer.py .github/workflows/ --format json # Plan a deployment strategy python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --strategy canary ``` ## Tools Overview | Tool | Input | Output | |------|-------|--------| | `workflow_generator.py` | Workflow type + language | GitHub Actions YAML (ci, cd, release, security-scan, docs-check) | | `pipeline_analyzer.py` | Workflow file or directory | Optimization findings, cost estimates, severity ratings | | `deployment_planner.py` | Project type + environments | Deployment plan with strategy, health checks, rollback | All tools support `--format json` and `--output` for file writing. --- ## Workflow 1: CI Pipeline Design The agent generates pipelines following fail-fast ordering: 1. **Lint and format** (~30s) -- cheapest gate first 2. **Unit tests** (~2-5m) -- matrix across versions 3. **Build verification** (~3-8m) 4. **Integration tests** (~5-15m, parallel with build) 5. **Security scanning** (~2-5m) ```yaml jobs: lint: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - run: make lint test: needs: lint strategy: matrix: python-version: ['3.10', '3.11', '3.12'] steps: - uses: actions/setup-python@v5 with: { python-version: "${{ matrix.python-version }}", cache: pip } - run: pip install -r requirements.txt - run: pytest --junitxml=results.xml security: needs: lint steps: - run: pip-audit -r requirements.txt ``` **CI targets:** | Metric | Target | Fix | |--------|--------|-----| | Total CI time | < 10 min | Parallelize, add caching | | Lint step | < 1 min | Use pre-commit locally | | Unit tests | < 5 min | Split suites, use matrix | | Flaky rate | < 1% | Quarantine flaky tests | | Cache hit rate | > 80% | Review cache keys | --- ## Workflow 2: CD Pipeline and Multi-Environment Deployment ```bash python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --format json ``` **Environment promotion flow:** ``` Build -> Dev (auto) -> Staging (auto) -> Production (manual approval) | Canary (10%) -> Full rollout ``` | Aspect | Dev | Staging | Production | |--------|-----|---------|------------| | Trigger | Every push | Merge to main | Manual approval | | Replicas | 1 | 2 | 3+ (auto-scaled) | | Secrets | Repository | Environment | Vault/OIDC | | Monitoring | Basic logs | Full observability | Full + alerting | **Key CD rules:** - Build once, deploy the same artifact everywhere - Tag artifacts with commit SHA for traceability - Use environment protection rules for production gates - Maintain rollback capability at every stage --- ## Workflow 3: Pipeline Optimization ```bash python scripts/pipeline_analyzer.py .github/workflows/ --format json -o report.json ``` The agent checks for: 1. **Missing caching** -- dependencies reinstalled every run 2. **No timeouts** -- stuck jobs burn budget 3. **Sequential chains** that could parallelize 4. **Deprecated actions** with newer versions available 5. **Security issues** -- secrets in logs, missing permissions scoping 6. **Cost inefficiency** -- oversized runners, no path filtering **Optimization techniques:** **Path-based filtering** -- skip CI for docs-only changes: ```yaml on: push: paths: ['src/**', 'tests/**', 'requirements*.txt'] paths-ignore: ['docs/**', '*.md'] ``` **Concurrency cancellation** -- cancel superseded runs: ```yaml concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true ``` **Dependency caching:** ```yaml - uses: actions/cache@v4 with: path: ~/.cache/pip key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt') }} ``` --- ## Deployment Strategies **Decision tree:** ``` Zero-downtime required? No -> Rolling deployment Yes -> Need instant rollback? No -> Rolling with health checks Yes -> Budget for 2x infrastructure? Yes -> Blue-green No -> Canary ``` **Canary traffic split schedule:** | Phase | % | Duration | Gate | |-------|---|----------|------| | 1 | 5% | 15 min | Error rate < 0.1% | | 2 | 25% | 30 min | P99 latency < 200ms | | 3 | 50% | 60 min | Business metrics stable | | 4 | 100% | -- | Full promotion | --- ## GitHub Actions Patterns **Reusable workflows** -- define once, call everywhere: ```yaml # .github/workflows/reusable-deploy.yml on: workflow_call: inputs: environment: { required: true, type: string } image_tag: { required: true, type: string } secrets: DEPLOY_KEY: { required: true } ``` **OIDC authentication** -- no long-lived credentials: ```yaml permissions: id-token: write contents: read steps: - uses: aws-actions/configure-aws-credentials@v4 with: role-to-assume: arn:aws:iam::123456789:role/github-actions aws-region: us-east-1 ``` **Secrets hierarchy:** Organization > Repository > Environment. Never echo secrets; use `add-mask` for dynamic values. Prefer OIDC for cloud auth. --- ## Runner Cost Optimization | Runner | vCPU | RAM | Cost/min | Best For | |--------|------|-----|----------|----------| | 2-core | 2 | 7 GB | $0.008 | Standard tasks | | 4-core | 4 | 16 GB | $0.016 | Build-heavy | | 8-core | 8 | 32 GB | $0.032 | Large compilations | | 16-core | 16 | 64 GB | $0.064 | Parallel test suites | **Monthly estimate:** `(runs/day) x (avg min/run) x 30 x (cost/min)` Example: 50 pushes/day x 8 min x 30 = 12,000 min x $0.008 = **$96/month**. --- ## Anti-Patterns | Anti-Pattern | Problem | Fix | |-------------|---------|-----| | Monolithic workflow | 45-min single workflow | Split into parallel jobs | | No caching | Reinstall deps every run | Cache dependencies and builds | | Secrets in logs | Leaked credentials | `add-mask`, avoid `echo` | | No timeout | Stuck jobs burn budget | `timeout-minutes` on every job | | Full matrix every push | 30-min matrix on every commit | Full nightly; reduced on push | | No rollback plan | Stuck with broken deploy | Automate rollback in CD pipeline | --- ## Troubleshooting | Problem | Cause | Solution | |---------|-------|----------| | Workflow never triggers | Wrong `on:` config or branch name mismatch | Verify triggers match branching strategy | | Cache miss every run | Volatile cache key (timestamp) | Use `hashFiles()` on lock files | | Matrix fails on one OS only | Platform-specific paths or deps | Use `shell: bash`; install OS deps per matrix entry | | Secret not available | Wrong environment scope | Ensure job declares correct `environment:` | | Health check fails after deploy | App not started before check | Add retry loop with backoff | | Concurrency cancels needed runs | Overly broad group key | Scope to `workflow-ref`; separate groups for deploy | --- ## References | Guide | Path | |-------|------| | GitHub Actions Patterns | `references/github-actions-patterns.md` | | Deployment Strategies | `references/deployment-strategies.md` | | Agentic Workflows Guide | `references/agentic-workflows-guide.md` | --- ## Integration Points | Skill | Integration | |-------|-------------| | `release-orchestrator` | Release workflows align with versioning and changelog | | `senior-devops` | Deployment strategies complement infra automation | | `senior-secops` | Security scanning steps feed SecOps dashboards | | `senior-qa` | CI quality gates map to QA acceptance criteria | | `incident-commander` | Rollback procedures connect to incident playbooks | --- **Last Updated:** April 2026 **Version:** 1.1.0