# Harness Deployment > CI/CD pipeline analysis, deployment strategy design, and environment management. From commit to production with confidence. ## When to Use - When setting up or reviewing CI/CD pipelines for a new or existing project - When evaluating deployment strategies (blue-green, canary, rolling) for a service - When auditing environment separation and promotion workflows - NOT for container image building or registry management (use harness-containerization) - NOT for infrastructure provisioning (use harness-infrastructure-as-code) - NOT for application performance under load (use harness-perf) ## Process ### Phase 1: DETECT -- Identify Pipeline and Environment Configuration 1. **Scan for CI/CD configuration files.** Search the project root for pipeline definitions: - `.github/workflows/*.yml` -- GitHub Actions - `.gitlab-ci.yml` -- GitLab CI - `Jenkinsfile` -- Jenkins - `.circleci/config.yml` -- CircleCI - `bitbucket-pipelines.yml` -- Bitbucket Pipelines - `azure-pipelines.yml` -- Azure DevOps - `deploy/`, `scripts/deploy*` -- custom deployment scripts 2. **Identify deployment targets.** Parse pipeline files for deployment steps and extract: - Target environments (dev, staging, production) - Deployment mechanisms (kubectl apply, aws ecs update-service, serverless deploy, rsync) - Cloud provider and region information - Container registry references 3. **Detect environment configuration.** Look for environment-specific config: - `.env.production`, `.env.staging` files - Environment variable injection in pipeline definitions - Secret references (GitHub Secrets, GitLab CI variables, Vault paths) - Feature flag provider configuration per environment 4. **Map the deployment topology.** Build a summary of what gets deployed where: - Service name, pipeline file, target environment, deployment mechanism - Dependencies between services (deploy order constraints) - Manual approval gates vs. automatic promotion 5. **Present detection summary.** Output the discovered topology before proceeding: ``` Deployment Topology: Platform: GitHub Actions Pipelines: 3 workflow files Environments: dev, staging, production Strategy: Rolling (detected from kubectl rolling-update) Approval gates: production (manual) ``` --- ### Phase 2: ANALYZE -- Evaluate Pipeline Quality and Gaps 1. **Check pipeline stage completeness.** A mature pipeline includes these stages. Flag any that are missing: - Build and compile - Unit tests - Integration tests - Security scan (SAST/DAST) - Artifact packaging - Deploy to staging - Smoke tests post-deploy - Deploy to production - Post-deploy verification 2. **Evaluate environment isolation.** Verify that environments are properly separated: - Staging and production use different credentials - Environment-specific variables are not shared across environments - Database connections point to the correct environment - No hardcoded production URLs in non-production configs 3. **Check deployment safety mechanisms.** Verify the pipeline includes: - Rollback procedures (automatic or documented manual) - Health checks after deployment - Timeout configuration on deployment steps - Concurrency controls (prevent parallel deploys to the same environment) - Branch protection rules that gate production deploys 4. **Analyze pipeline performance.** Identify bottlenecks: - Steps that could run in parallel but are sequential - Missing caching (dependencies, build artifacts, Docker layers) - Redundant steps across workflows - Total pipeline duration from commit to production 5. **Check secret hygiene in pipelines.** Verify: - No secrets hardcoded in pipeline files - Secrets are scoped to the minimum required environment - Secret rotation is possible without pipeline changes - OIDC or workload identity is used where available instead of long-lived credentials --- ### Phase 3: DESIGN -- Recommend Strategy Improvements 1. **Recommend deployment strategy.** Based on the service characteristics: - **Rolling** -- suitable for stateless services with backward-compatible changes - **Blue-green** -- suitable when zero-downtime cutover is required and rollback must be instant - **Canary** -- suitable for high-traffic services where gradual validation reduces blast radius - **Recreate** -- suitable only for development environments or when downtime is acceptable 2. **Design missing pipeline stages.** For each gap identified in Phase 2, provide: - The stage definition in the project's CI/CD platform syntax - Where it fits in the pipeline order - What tools or services it requires - Example configuration snippet 3. **Recommend environment promotion workflow.** Design the path from commit to production: - Automatic promotion from dev to staging after tests pass - Manual approval gate before production (with notification to the team channel) - Smoke test suite that runs post-deploy in each environment - Rollback trigger conditions (error rate spike, health check failure) 4. **Design rollback procedure.** Every deployment must have a documented rollback: - For container deployments: revert to previous image tag - For serverless: revert to previous function version - For database migrations: backward-compatible migration strategy - Maximum rollback time target (e.g., under 5 minutes) 5. **Recommend monitoring integration.** Connect deployment events to observability: - Deploy markers in APM tools (Datadog, New Relic, Grafana) - Automated alerts on error rate increase after deploy - Deployment frequency and lead time tracking --- ### Phase 4: VALIDATE -- Verify Pipeline Correctness 1. **Lint pipeline configuration.** Run syntax validation: - GitHub Actions: `actionlint` or YAML schema validation - GitLab CI: `gitlab-ci-lint` API endpoint - Jenkinsfile: Groovy syntax check - General: YAML structure validation for all config files 2. **Verify environment variable completeness.** For each environment: - All required variables are defined - No placeholder values remain (TODO, CHANGEME, xxx) - Variables referenced in code exist in the pipeline configuration 3. **Verify branch protection alignment.** Confirm that: - Production deploy pipelines only trigger from protected branches - Required status checks match the pipeline stages - Force-push is disabled on deployment branches 4. **Generate deployment readiness report.** Summarize findings: ``` Deployment Readiness: [PASS/WARN/FAIL] Pipeline stages: 7/9 present (missing: security scan, smoke tests) Environment isolation: PASS Rollback procedure: WARN (documented but not automated) Secret hygiene: PASS Pipeline performance: 12m avg (recommend parallelizing test stages) Recommendations: 1. Add SAST scan stage between build and deploy 2. Add post-deploy smoke test stage 3. Automate rollback on health check failure ``` 5. **Present results.** Use `emit_interaction` to deliver the report and ask whether to proceed with implementing recommendations. --- ## Harness Integration - **`harness skill run harness-deployment`** -- Primary invocation for deployment analysis. - **`harness validate`** -- Run after any pipeline configuration changes to verify project health. - **`harness check-deps`** -- Verify deployment script dependencies are available. - **`emit_interaction`** -- Present deployment readiness report and gather decisions on strategy. ## Success Criteria - All CI/CD configuration files in the project are identified and cataloged - Pipeline stage completeness is assessed against the standard checklist - Environment isolation is verified with no cross-environment credential leakage - A deployment strategy recommendation is provided with rationale - Rollback procedures are documented or flagged as missing - Pipeline lint passes without errors ## Examples ### Example: Node.js API with GitHub Actions ``` Phase 1: DETECT Found: .github/workflows/ci.yml, .github/workflows/deploy.yml Environments: staging (auto), production (manual dispatch) Strategy: Rolling (kubectl set image) Registry: ghcr.io/org/api-server Phase 2: ANALYZE Missing stages: security scan, post-deploy smoke tests Environment isolation: PASS Secret hygiene: WARN -- AWS_ACCESS_KEY_ID used instead of OIDC Pipeline duration: 18m (test and lint run sequentially) Phase 3: DESIGN Recommendation: Add trivy scan after Docker build Recommendation: Switch to AWS OIDC for keyless authentication Recommendation: Parallelize lint and test jobs (saves ~4m) Recommendation: Add smoke test job after deploy-staging Phase 4: VALIDATE actionlint: PASS Environment variables: PASS Branch protection: WARN -- main branch allows force-push Result: WARN -- 3 recommendations, 1 security improvement needed ``` ### Example: Python Service with GitLab CI and Canary Deploy ``` Phase 1: DETECT Found: .gitlab-ci.yml with 5 stages Environments: dev, staging, production Strategy: Canary (Istio VirtualService weight shifting) Registry: registry.gitlab.com/org/service Phase 2: ANALYZE All 9 standard stages present Environment isolation: PASS Canary configuration: 5% -> 25% -> 75% -> 100% over 30 minutes Rollback: Automatic on 5xx rate > 1% Phase 3: DESIGN Current strategy is well-configured. Minor recommendations: - Add canary duration metrics to Grafana dashboard - Add deployment event annotation to Prometheus - Consider adding a manual gate between 75% and 100% Phase 4: VALIDATE GitLab CI lint: PASS Environment variables: PASS Branch protection: PASS Result: PASS -- pipeline is production-ready ``` ## Gates - **No production deploy without staging validation.** If the pipeline allows direct-to-production deployment without a prior staging step, flag as a blocking issue. - **No long-lived credentials in pipelines.** Hardcoded secrets or long-lived access keys in pipeline files are blocking findings. OIDC or short-lived tokens must be used. - **No deploy without rollback.** Every deployment target must have a documented or automated rollback mechanism. Missing rollback is a blocking warning. - **No skipping pipeline lint.** Pipeline configuration must pass syntax validation before recommendations are made. ## Evidence Requirements When this skill makes claims about existing code, architecture, or behavior, it MUST cite evidence using one of: 1. **File reference:** `file:line` format (e.g., `src/auth.ts:42`) 2. **Code pattern reference:** `file` with description (e.g., `src/utils/hash.ts` — "existing bcrypt wrapper") 3. **Test/command output:** Inline or referenced output from a test run or CLI command 4. **Session evidence:** Write to the `evidence` session section via `manage_state` **Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The auth middleware supports refresh tokens`. ## Red Flags ### Universal These apply to ALL skills. If you catch yourself doing any of these, STOP. - **"I believe the codebase does X"** — Stop. Read the code and cite a file:line reference. Belief is not evidence. - **"Let me recommend [pattern] for this"** without checking existing patterns — Stop. Search the codebase first. The project may already have a convention. - **"While we're here, we should also [unrelated improvement]"** — Stop. Flag the idea but do not expand scope beyond the stated task. ### Domain-Specific - **"Deploying without a health check endpoint"** — Stop. Without health checks, the orchestrator cannot detect failed deployments. Add health checks before deploying. - **"Skipping canary deployment, it's a small change"** — Stop. Small changes cause outages too. Follow the deployment policy regardless of change size. - **"Rolling back manually if something goes wrong"** — Stop. Manual rollback under incident pressure fails. Automate rollback before deploying. - **"We can update the runbook after the deploy"** — Stop. If the deployment changes operational behavior, update the runbook first. Stale runbooks during incidents cause escalations. ## Rationalizations to Reject ### Universal These reasoning patterns sound plausible but lead to bad outcomes. Reject them. - **"It's probably fine"** — "Probably" is not evidence. Verify before asserting. - **"This is best practice"** — Best practice in what context? Cite the source and confirm it applies to this codebase. - **"We can fix it later"** — If it is worth flagging, it is worth documenting now with a concrete follow-up plan. ### Domain-Specific | Rationalization | Reality | | ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | | "It's just a config change, not a code change" | Config changes cause outages at the same rate as code changes. Deploy them with the same rigor and rollback strategy. | | "We tested this in staging" | Staging is not production. Traffic patterns, data volume, and edge cases differ. Staging success does not guarantee production safety. | | "Downtime will be brief" | Brief is not zero. Quantify the expected impact and communicate it to stakeholders before deploying. | ## Escalation - **When the CI/CD platform is unsupported:** Report which platform was detected and that analysis is limited to general best practices. Recommend the user provide platform-specific documentation for deeper analysis. - **When secrets are found hardcoded in pipeline files:** Immediately flag as a critical finding. Do not proceed with strategy recommendations until secrets are remediated. Recommend rotating the exposed credentials. - **When multiple deployment strategies are mixed across environments:** This is valid (e.g., rolling for staging, canary for production). Analyze each independently and verify the promotion workflow handles the strategy transition. - **When pipeline configuration is generated by a tool (Terraform, Pulumi):** Analyze the generated output but note that fixes must be applied to the generator configuration, not the output files.