---
name: qcsd-production-swarm
description: "Use when assessing post-release production health with DORA metrics, root cause analysis, defect prediction, or cross-phase feedback loops in the QCSD Production phase."
category: qcsd-phases
priority: critical
version: 1.0.0
tokenEstimate: 32000
# DDD Domain Mapping (from QCSD-AGENTIC-QE-MAPPING-FRAMEWORK.md)
domains:
  primary:
    - domain: learning-optimization
      agents: [qe-metrics-optimizer]
    - domain: defect-intelligence
      agents: [qe-defect-predictor, qe-root-cause-analyzer]
  conditional:
    - domain: chaos-resilience
      agents: [qe-chaos-engineer, qe-performance-tester]
    - domain: defect-intelligence
      agents: [qe-regression-analyzer, qe-pattern-learner]
    - domain: enterprise-integration
      agents: [qe-middleware-validator, qe-sap-rfc-tester, qe-sod-analyzer]
  feedback:
    - domain: learning-optimization
      agents: [qe-learning-coordinator, qe-transfer-specialist]
# Agent Inventory
agents:
  core: [qe-metrics-optimizer, qe-defect-predictor, qe-root-cause-analyzer]
  conditional: [qe-chaos-engineer, qe-performance-tester, qe-regression-analyzer, qe-pattern-learner, qe-middleware-validator, qe-sap-rfc-tester, qe-sod-analyzer]
  feedback: [qe-learning-coordinator, qe-transfer-specialist]
  total: 12
  sub_agents: 0
skills: [shift-right-testing, chaos-engineering-resilience, quality-metrics, performance-testing, holistic-testing-pact]
# Execution Models (Task Tool is PRIMARY)
execution:
  primary: task-tool
  alternatives: [mcp-tools, cli]
swarm_pattern: true
parallel_batches: 3
last_updated: 2026-02-17
enforcement_level: strict
tags: [qcsd, production, telemetry, dora, rca, defect-prediction, feedback-loop, learning, swarm, parallel, ddd]
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-config.json
  eval_path: evals/qcsd-production-swarm.yaml
---

# QCSD Production Swarm v1.0

Post-release production health assessment and QCSD feedback loop closure.

---

## Overview

The Production Swarm assesses release health in the live production environment using
DORA metrics, incident RCA, defect prediction, and cross-phase feedback loops. It renders
a HEALTHY / DEGRADED / CRITICAL decision and is the only QCSD phase with dual
responsibility: assessing current production health AND closing the feedback loop
back to Ideation and Refinement phases.

### QCSD Phase Positioning

| Phase | Swarm | Decision | When |
|-------|-------|----------|------|
| Ideation | qcsd-ideation-swarm | GO / CONDITIONAL / NO-GO | PI/Sprint Planning |
| Refinement | qcsd-refinement-swarm | READY / CONDITIONAL / NOT-READY | Sprint Refinement |
| Development | qcsd-development-swarm | SHIP / CONDITIONAL / HOLD | During Sprint |
| Verification | qcsd-cicd-swarm | RELEASE / REMEDIATE / BLOCK | Pre-Release / CI-CD |
| **Production** | **qcsd-production-swarm** | **HEALTHY / DEGRADED / CRITICAL** | **Post-Release** |

### Parameters

- `TELEMETRY_DATA`: Path to production telemetry, incident reports, and DORA metrics (required)
- `RELEASE_ID`: Release identifier for tracking (optional)
- `OUTPUT_FOLDER`: Where to save reports (default: `${PROJECT_ROOT}/Agentic QCSD/production/`)
- `SLA_DEFINITIONS`: Path to SLA/SLO target definitions (optional)
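As an illustration of how these four parameters might be validated before Step 1, here is a minimal Python sketch. The `SwarmParams` class and `resolve_params` function are hypothetical names introduced for this example and are not part of the skill. Only the required/optional status and the `OUTPUT_FOLDER` default come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class SwarmParams:
    """Hypothetical container for the four documented parameters."""
    telemetry_data: Path                    # TELEMETRY_DATA (required)
    release_id: Optional[str] = None        # RELEASE_ID (optional)
    output_folder: Optional[Path] = None    # OUTPUT_FOLDER (default applied below)
    sla_definitions: Optional[Path] = None  # SLA_DEFINITIONS (optional)


def resolve_params(raw: dict, project_root: Path) -> SwarmParams:
    """Reject a missing TELEMETRY_DATA and apply the OUTPUT_FOLDER default."""
    if not raw.get("TELEMETRY_DATA"):
        raise ValueError("TELEMETRY_DATA is required")
    output = raw.get("OUTPUT_FOLDER")
    sla = raw.get("SLA_DEFINITIONS")
    return SwarmParams(
        telemetry_data=Path(raw["TELEMETRY_DATA"]),
        release_id=raw.get("RELEASE_ID"),
        # Documented default: ${PROJECT_ROOT}/Agentic QCSD/production/
        output_folder=Path(output) if output else project_root / "Agentic QCSD" / "production",
        sla_definitions=Path(sla) if sla else None,
    )
```

Failing fast on a missing `TELEMETRY_DATA` keeps the swarm from spawning agents that would have nothing to analyze.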

---

## ENFORCEMENT RULES - READ FIRST

| Rule | Enforcement |
|------|-------------|
| **E1** | You MUST spawn ALL THREE core agents in Step 2. No exceptions. |
| **E2** | You MUST put all parallel Task calls in a SINGLE message. |
| **E3** | You MUST STOP and WAIT after each batch. No proceeding early. |
| **E4** | You MUST spawn conditional agents if flags are TRUE. No skipping. |
| **E5** | You MUST apply HEALTHY/DEGRADED/CRITICAL logic exactly as specified in Step 5. |
| **E6** | You MUST generate the full report structure. No abbreviated versions. |
| **E7** | Each agent MUST read its reference files before analysis. |
| **E8** | You MUST run BOTH feedback agents in Step 8 SEQUENTIALLY. Always. Both agents. |
| **E9** | You MUST execute Step 7 learning persistence. No skipping. |

**PROHIBITED BEHAVIORS:**
- Summarizing instead of spawning agents
- Skipping agents "for brevity"
- Proceeding before background tasks complete
- Providing your own analysis instead of spawning specialists
- Omitting report sections or using placeholder text

---

## Step Execution Protocol

This skill uses a micro-file step architecture. Each step is a self-contained file
loaded one at a time to avoid "lost in the middle" context degradation.

**Execute steps sequentially by reading each step file with the Read tool.**

### Steps

1. **Flag Detection** -- `steps/01-flag-detection.md` -- Retrieve CI/CD signals, detect telemetry source, evaluate all 7 flags
2. **Core Agents** -- `steps/02-core-agents.md` -- Spawn qe-metrics-optimizer, qe-defect-predictor, qe-root-cause-analyzer in parallel
3. **Batch 1 Results** -- `steps/03-batch1-results.md` -- Wait for core agents, extract all metrics
4. **Conditional Agents** -- `steps/04-conditional-agents.md` -- Spawn flagged conditional agents in parallel
5. **Decision Synthesis** -- `steps/05-decision-synthesis.md` -- Apply HEALTHY/DEGRADED/CRITICAL logic
6. **Report Generation** -- `steps/06-report-generation.md` -- Generate executive summary and full report
7. **Learning Persistence** -- `steps/07-learning-persistence.md` -- Store findings to memory, save persistence record
8. **Feedback Loop** -- `steps/08-feedback-loop.md` -- Run learning coordinator then transfer specialist (sequential)
9. **Final Output** -- `steps/09-final-output.md` -- Display completion summary with all scores

### Execution Instructions

1. Use the Read tool to load the current step file (e.g., `Read({ file_path: ".claude/skills/qcsd-production-swarm/steps/01-flag-detection.md" })`)
2. Execute the step's instructions completely
3. Verify all success criteria are met before proceeding
4. Pass the step's output as context to the next step
5. If a step fails, halt and report the failure point -- do not skip ahead
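The five instructions above amount to a sequential loop that halts on failure. The Python sketch below shows one way to express that control flow. It assumes a caller-supplied `execute_step` callable (hypothetical) that loads a step file, runs it, and returns a success flag plus the step's output. In the skill itself, the orchestrating agent performs these actions with the Read tool rather than with code.

```python
from typing import Any, Callable, Dict, List, Tuple

# Step files in the order listed under "Steps".
STEP_FILES: List[str] = [
    "steps/01-flag-detection.md",
    "steps/02-core-agents.md",
    "steps/03-batch1-results.md",
    "steps/04-conditional-agents.md",
    "steps/05-decision-synthesis.md",
    "steps/06-report-generation.md",
    "steps/07-learning-persistence.md",
    "steps/08-feedback-loop.md",
    "steps/09-final-output.md",
]

# Hypothetical executor signature: (step_path, context) -> (success, output)
StepExecutor = Callable[[str, Dict[str, Any]], Tuple[bool, Dict[str, Any]]]


class StepFailed(RuntimeError):
    """Raised when a step does not meet its success criteria."""


def run_steps(execute_step: StepExecutor) -> Dict[str, Any]:
    """Run every step in order, passing accumulated output forward."""
    context: Dict[str, Any] = {}
    for number, path in enumerate(STEP_FILES, start=1):
        ok, output = execute_step(path, context)
        if not ok:
            # Halt and report the failure point; never skip ahead.
            raise StepFailed(f"Step {number} ({path}) failed; halting")
        context = {**context, **output}
    return context
```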

### Resume Support

To resume from a specific step, specify `--from-step N`; the orchestrator skips
directly to step N. Before resuming, make sure the prerequisite data produced by the
prior steps is available.
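A minimal Python sketch of how a `--from-step N` option could be interpreted follows. The `steps_to_run` function is a hypothetical helper, and the range check is an assumption based on the nine steps listed above. How the orchestrator actually parses the option is not specified here.

```python
import argparse
from typing import List, Sequence

TOTAL_STEPS = 9  # Steps 1 through 9, as listed under "Steps"


def steps_to_run(argv: Sequence[str]) -> List[int]:
    """Return the step numbers to execute for the given arguments."""
    parser = argparse.ArgumentParser(description="Resume sketch (hypothetical)")
    parser.add_argument("--from-step", type=int, default=1, metavar="N")
    args = parser.parse_args(list(argv))
    if not 1 <= args.from_step <= TOTAL_STEPS:
        raise ValueError(f"--from-step must be between 1 and {TOTAL_STEPS}")
    # Steps before N are skipped; their outputs must already exist.
    return list(range(args.from_step, TOTAL_STEPS + 1))
```

For example, `steps_to_run(["--from-step", "5"])` yields steps 5 through 9, so the Batch 1 results and conditional-agent outputs from steps 1 to 4 would need to be on hand already.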

---

## Agent Inventory

| Agent | Type | Domain | Batch |
|-------|------|--------|-------|
| qe-metrics-optimizer | Core (always) | learning-optimization | 1 |
| qe-defect-predictor | Core (always) | defect-intelligence | 1 |
| qe-root-cause-analyzer | Core (always) | defect-intelligence | 1 |
| qe-chaos-engineer | Conditional (HAS_INFRASTRUCTURE_CHANGE) | chaos-resilience | 2 |
| qe-performance-tester | Conditional (HAS_PERFORMANCE_SLA) | chaos-resilience | 2 |
| qe-regression-analyzer | Conditional (HAS_REGRESSION_RISK) | defect-intelligence | 2 |
| qe-pattern-learner | Conditional (HAS_RECURRING_INCIDENTS) | defect-intelligence | 2 |
| qe-middleware-validator | Conditional (HAS_MIDDLEWARE) | enterprise-integration | 2 |
| qe-sap-rfc-tester | Conditional (HAS_SAP_INTEGRATION) | enterprise-integration | 2 |
| qe-sod-analyzer | Conditional (HAS_AUTHORIZATION) | enterprise-integration | 2 |
| qe-learning-coordinator | Feedback (always, sequential) | learning-optimization | 3 |
| qe-transfer-specialist | Feedback (always, sequential) | learning-optimization | 3 |

**Total: 12 agents (3 core + 7 conditional + 2 feedback)**
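The inventory can be read as a flag-to-agent lookup that determines the contents of each batch. The Python sketch below illustrates this. The `plan_batches` function and the dictionary layout are illustrative assumptions, while the agent names, flag names, and batch numbers are taken from the table.

```python
from typing import Dict, List

CORE_AGENTS: List[str] = [
    "qe-metrics-optimizer",
    "qe-defect-predictor",
    "qe-root-cause-analyzer",
]

# Flag -> conditional agent, in table order.
CONDITIONAL_AGENTS: Dict[str, str] = {
    "HAS_INFRASTRUCTURE_CHANGE": "qe-chaos-engineer",
    "HAS_PERFORMANCE_SLA": "qe-performance-tester",
    "HAS_REGRESSION_RISK": "qe-regression-analyzer",
    "HAS_RECURRING_INCIDENTS": "qe-pattern-learner",
    "HAS_MIDDLEWARE": "qe-middleware-validator",
    "HAS_SAP_INTEGRATION": "qe-sap-rfc-tester",
    "HAS_AUTHORIZATION": "qe-sod-analyzer",
}

# Feedback agents always run, one after the other, in this order.
FEEDBACK_AGENTS: List[str] = ["qe-learning-coordinator", "qe-transfer-specialist"]


def plan_batches(flags: Dict[str, bool]) -> Dict[int, List[str]]:
    """Return the agents to spawn per batch for the detected flags."""
    batch2 = [
        agent
        for flag, agent in CONDITIONAL_AGENTS.items()
        if flags.get(flag, False)  # a missing flag is treated as FALSE
    ]
    return {1: list(CORE_AGENTS), 2: batch2, 3: list(FEEDBACK_AGENTS)}
```

Batches 1 and 2 are spawned in parallel within the batch, while batch 3 runs sequentially. An empty batch 2 simply means no conditional flag was TRUE.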

---

## Quality Gate Thresholds

| Metric | HEALTHY | DEGRADED | CRITICAL |
|--------|---------|----------|----------|
| DORA Score | >= 0.7 | >= 0.4 and < 0.7 | < 0.4 |
| SLA Compliance | >= 99% | >= 95% and < 99% | < 95% |
| Incident Severity | P3/P4/NONE | P2 | P0/P1 |
| Defect Trend | declining/stable | stable (density > 2) | increasing + density > 5 |
| RCA Completeness | >= 80% | >= 50% and < 80% | < 50% |

---

## Report Filename Mapping

| Producer | Report Filename | Step |
|----------|----------------|------|
| Synthesis | `01-executive-summary.md` | 6 |
| qe-metrics-optimizer | `02-dora-metrics.md` | 2 |
| qe-defect-predictor | `03-defect-prediction.md` | 2 |
| qe-root-cause-analyzer | `04-root-cause-analysis.md` | 2 |
| qe-chaos-engineer | `05-chaos-resilience.md` | 4 |
| qe-performance-tester | `06-performance-sla.md` | 4 |
| qe-regression-analyzer | `07-regression-analysis.md` | 4 |
| qe-pattern-learner | `08-pattern-analysis.md` | 4 |
| Learning Persistence | `09-learning-persistence.json` | 7 |
| qe-middleware-validator | `10-middleware-health.md` | 4 |
| qe-sap-rfc-tester | `11-sap-health.md` | 4 |
| qe-sod-analyzer | `12-sod-compliance.md` | 4 |
| Feedback agents | `13-feedback-loops.md` | 8 |
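The mapping above can be captured as a lookup from producer to filename, resolved against `OUTPUT_FOLDER`. The Python sketch below is one possible representation. The `report_path` helper and the keys used for the three non-agent rows (`synthesis`, `learning-persistence`, `feedback-agents`) are hypothetical labels chosen for this example.

```python
from pathlib import Path
from typing import Dict

REPORT_FILES: Dict[str, str] = {
    "synthesis": "01-executive-summary.md",
    "qe-metrics-optimizer": "02-dora-metrics.md",
    "qe-defect-predictor": "03-defect-prediction.md",
    "qe-root-cause-analyzer": "04-root-cause-analysis.md",
    "qe-chaos-engineer": "05-chaos-resilience.md",
    "qe-performance-tester": "06-performance-sla.md",
    "qe-regression-analyzer": "07-regression-analysis.md",
    "qe-pattern-learner": "08-pattern-analysis.md",
    "learning-persistence": "09-learning-persistence.json",
    "qe-middleware-validator": "10-middleware-health.md",
    "qe-sap-rfc-tester": "11-sap-health.md",
    "qe-sod-analyzer": "12-sod-compliance.md",
    "feedback-agents": "13-feedback-loops.md",
}


def report_path(output_folder: Path, producer: str) -> Path:
    """Return where the given producer's report should be written."""
    try:
        return output_folder / REPORT_FILES[producer]
    except KeyError:
        raise ValueError(f"No report filename is mapped for {producer!r}") from None
```

Because the filenames carry a numeric prefix, sorting the output folder by name lists the executive summary first even though it is generated in Step 6.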

---

## Execution Model Options

| Model | When to Use | Agent Spawn |
|-------|-------------|-------------|
| **Task Tool** (PRIMARY) | Claude Code sessions | `Task({ subagent_type, run_in_background: true })` |
| **MCP Tools** | MCP server available | `fleet_init({})` / `task_submit({})` |
| **CLI** | Terminal/scripts | `swarm init` / `agent spawn` |

---

## Key Principle

**Production health is measured by outcomes, not intentions. This swarm provides
evidence-based production assessment and closes the QCSD feedback loop.**