---
name: end-to-end-orchestrator
description: Complete development workflow orchestrator coordinating all multi-ai skills (research → planning → implementation → testing → verification) with quality gates, failure recovery, and state management. Single-command complete workflows from objective to production-ready code. Use when implementing complete features requiring full pipeline, coordinating multiple skills automatically, or executing production-grade development cycles end-to-end.
allowed-tools: Task, Read, Write, Edit, Glob, Grep, Bash
---

# End-to-End Orchestrator

## Overview

end-to-end-orchestrator provides single-command complete development workflows, coordinating all 5 multi-ai skills from research through final verification.

**Purpose**: Transform "I want feature X" into production-ready code through automated skill coordination

**Pattern**: Workflow-based (5-stage pipeline with quality gates)

**Key Innovation**: Automatic orchestration of research → planning → implementation → testing → verification with failure recovery and quality gates

**The Complete Pipeline**:
```
Input: Feature description
  ↓
1. Research (multi-ai-research) [optional]
  ↓ [Quality Gate: Research complete]
2. Planning (multi-ai-planning)
  ↓ [Quality Gate: Plan ≥90/100]
3. Implementation (multi-ai-implementation)
  ↓ [Quality Gate: Tests pass, coverage ≥80%]
4. Testing (multi-ai-testing)
  ↓ [Quality Gate: Coverage ≥95%, verified]
5. Verification (multi-ai-verification)
  ↓ [Quality Gate: Score ≥90/100, all layers pass]
Output: Production-ready code
```

---

## When to Use

Use end-to-end-orchestrator when:

- Implementing complete features (not quick fixes)
- Want automated workflow (not manual skill chaining)
- Production-quality required (all gates must pass)
- Time optimization important (parallel where possible)
- Need failure recovery (automatic retry/rollback)

**When NOT to Use**:
- Quick fixes (<30 minutes)
- Exploratory work (uncertain requirements)
- Manual control preferred (step through each phase)

---

## Prerequisites

### Required
- All 5 multi-ai skills installed:
  - multi-ai-research
  - multi-ai-planning
  - multi-ai-implementation
  - multi-ai-testing
  - multi-ai-verification

### Optional
- agent-memory-system (for learning from past work)
- hooks-manager (for automation)
- Gemini CLI, Codex CLI (for tri-AI research)

---

## Complete Workflow

### Stage 1: Research (Optional)

**Purpose**: Ground implementation in proven patterns

**Process**:

1. **Determine if Research Needed**:
   ```typescript
   // Check whether similar work exists in memory.
   // recallMemory is assumed to come from agent-memory-system (optional).
   const similarWork = await recallMemory({
     type: 'episodic',
     query: objective
   });

   // No similar past work → unfamiliar domain → research needed.
   // Otherwise research can be skipped and past learnings reused.
   const needsResearch = similarWork.length === 0;
   ```

2. **Execute Research** (if needed):
   ```
   Use multi-ai-research for "[domain] implementation patterns and best practices"
   ```

   **What It Provides**:
   - Claude research: Official docs, codebase patterns
   - Gemini research: Web best practices, latest trends
   - Codex research: GitHub patterns, code examples
   - Quality: ≥95/100 with 100% citations

3. **Quality Gate: Research Complete**:
   ```markdown
   ✅ Research findings documented
   ✅ Patterns identified (minimum 2)
   ✅ Best practices extracted (minimum 3)
   ✅ Quality score ≥95/100
   ```

   **If Fail**: Research incomplete → retry research OR proceed without (user decides)

**Outputs**:
- Research findings (.analysis/ANALYSIS_FINAL.md)
- Patterns and best practices
- Implementation recommendations

**Time**: 30-60 minutes (can skip if familiar domain)

**Next**: Proceed to Stage 2

---

### Stage 2: Planning

**Purpose**: Create agent-executable plan with quality ≥90/100

**Process**:

1. **Load Research Context** (if research done):
   ```typescript
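   import { readFile } from 'node:fs/promises';

   let context = "";

   if (researchDone) {
     // Read as UTF-8 text; without an encoding readFile returns a Buffer
     context = await readFile('.analysis/ANALYSIS_FINAL.md', 'utf8');
   }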
   ```

2. **Invoke Planning**:
   ```
   Use multi-ai-planning to create plan for [objective]

   ${context ? `Research findings available in: .analysis/ANALYSIS_FINAL.md` : ''}

   Create comprehensive plan following 6-step workflow.
   ```

   **What It Does**:
   - Analyzes objective
   - Hierarchical decomposition (8-15 tasks)
   - Maps dependencies and identifies tasks that can run in parallel
   - Plans verification for all tasks
   - Scores quality (0-100)

3. **Quality Gate: Plan Approved**:
   ```markdown
   ✅ Plan created
   ✅ Quality score ≥90/100
   ✅ All tasks have verification
   ✅ Dependencies mapped
   ✅ No circular dependencies
   ```

   **If Fail** (score <90):
   - Review gap analysis
   - Apply recommended fixes
   - Re-verify
   - Retry up to 2 times
   - If still <90: Escalate to human review

4. **Save Plan to Shared State**:
   ```bash
   # Save for next stage
   cp plans/[plan-id]/plan.json .multi-ai-context/plan.json
   ```
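
The "No circular dependencies" item in the gate above can be checked mechanically. The following is a minimal sketch, not the planner's actual implementation: it assumes each task in `plan.json` carries a `dependencies` array of task ids (a hypothetical field name; this document only shows `id`), and `findDependencyCycle` is an illustrative helper.

```typescript
// Assumed task shape: `id` appears in this document; `dependencies` is a
// hypothetical field name used only for this sketch.
interface PlanTask {
  id: string;
  dependencies: string[];
}

// Returns one dependency cycle as a list of task ids, or null if none exists.
function findDependencyCycle(tasks: PlanTask[]): string[] | null {
  const deps = new Map<string, string[]>();
  for (const task of tasks) deps.set(task.id, task.dependencies);

  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (id: string): string[] | null => {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") {
      // Back edge: the cycle is the part of the current path starting at `id`.
      return [...path.slice(path.indexOf(id)), id];
    }
    state.set(id, "visiting");
    path.push(id);
    for (const dep of deps.get(id) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  };

  for (const task of tasks) {
    const cycle = visit(task.id);
    if (cycle) return cycle;
  }
  return null;
}
```

A non-null result would fail the gate, and the returned ids could be included in the escalation report.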

**Outputs**:
- plan.json (machine-readable)
- PLAN.md (human-readable)
- COORDINATION.md (execution guide)
- Quality ≥90/100

**Time**: 1.5-3 hours

**Next**: Proceed to Stage 3

---

### Stage 3: Implementation

**Purpose**: Execute plan with TDD, produce working code

**Process**:

1. **Load Plan**:
   ```typescript
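   import { readFileSync } from 'node:fs';

   // Read synchronously as UTF-8 text so JSON.parse receives a string
   const plan = JSON.parse(readFileSync('.multi-ai-context/plan.json', 'utf8'));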
   console.log(`📋 Loaded plan: ${plan.objective}`);
   console.log(`   Tasks: ${plan.tasks.length}`);
   console.log(`   Estimated: ${plan.metadata.estimated_total_hours} hours`);
   ```

2. **Invoke Implementation**:
   ```
   Use multi-ai-implementation following plan in .multi-ai-context/plan.json

   Execute all 6 steps:
   1. Explore & gather context
   2. Plan architecture (plan already created, refine as needed)
   3. Implement incrementally with TDD
   4. Coordinate multi-agent (if parallel tasks)
   5. Integration & E2E testing
   6. Quality verification before commit

   Success criteria from plan.
   ```

   **What It Does**:
   - Explores codebase (progressive disclosure)
   - Implements incrementally (<200 lines per commit)
   - Test-driven development (tests first)
   - Multi-agent coordination for parallel tasks
   - Continuous testing during implementation
   - Doom loop prevention (max 3 retries)

3. **Quality Gate: Implementation Complete**:
   ```markdown
   ✅ All plan tasks implemented
   ✅ All tests passing
   ✅ Coverage ≥80% (gate), ideally ≥95%
   ✅ No regressions
   ✅ Doom loop avoided (< max retries)
   ```

   **If Fail**:
   - Identify failing task
   - Retry with different approach
   - If 3 failures: Escalate to human
   - Save state for recovery

4. **Save Implementation State**:
   ```bash
   # Save for next stage
   echo '{
     "status": "implemented",
     "files_changed": [...],
     "tests_run": 95,
     "tests_passed": 95,
     "coverage": 87,
     "commits": ["abc123", "def456"]
   }' > .multi-ai-context/implementation-status.json
   ```
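
Doom loop prevention can be expressed as a check over recorded failures. This is a minimal sketch under stated assumptions: the record shape (`task_id`, `error`) is hypothetical, since this document names `failure-history.json` but does not define its schema, and the function names are illustrative.

```typescript
// Hypothetical record shape; the failure-history schema is not defined here.
interface FailureRecord {
  task_id: string;
  error: string;
}

const MAX_FAILURES = 3; // "If 3 failures: Escalate to human"

type NextAction = "retry" | "escalate";

// Decide what to do after the latest failure of `taskId` has been recorded.
function nextActionAfterFailure(history: FailureRecord[], taskId: string): NextAction {
  const failures = history.filter((record) => record.task_id === taskId);
  return failures.length >= MAX_FAILURES ? "escalate" : "retry";
}

// True when the last MAX_FAILURES failures of a task report the same error,
// which suggests the retries are not changing the outcome.
function isRepeatingSameError(history: FailureRecord[], taskId: string): boolean {
  const recent = history.filter((r) => r.task_id === taskId).slice(-MAX_FAILURES);
  return recent.length === MAX_FAILURES && recent.every((r) => r.error === recent[0]?.error);
}
```

Counting failures per task rather than per stage means one stubborn task escalates without blocking retries on unrelated tasks.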

**Outputs**:
- Working code
- Tests passing
- Coverage ≥80%
- Commits created

**Time**: 3-10 hours (varies by complexity)

**Next**: Proceed to Stage 4

---

### Stage 4: Testing (Independent Verification)

**Purpose**: Verify tests are comprehensive and prevent gaming

**Process**:

1. **Load Implementation Context**:
   ```typescript
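   import { readFileSync } from 'node:fs';

   // Read as UTF-8 text so JSON.parse receives a string
   const implStatus = JSON.parse(
     readFileSync('.multi-ai-context/implementation-status.json', 'utf8')
   );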

   console.log(`🧪 Testing implementation:`);
   console.log(`   Files changed: ${implStatus.files_changed.length}`);
   console.log(`   Current coverage: ${implStatus.coverage}%`);
   ```

2. **Invoke Independent Testing**:
   ```
   Use multi-ai-testing independent verification workflow

   Verify:
   - Tests in: tests/
   - Code in: src/
   - Specifications in: .multi-ai-context/plan.json

   Workflows to execute:
   1. Test quality verification (independent agent)
   2. Coverage validation (≥95% target)
   3. Edge case discovery (AI-powered)
   4. Multi-agent ensemble scoring (if critical feature)

   Score test quality (0-100).
   ```

   **What It Does**:
   - Independent verification (separate agent from impl)
   - Checks tests match specifications (not just what code does)
   - Generates additional edge case tests
   - Multi-agent ensemble for quality scoring
   - Prevents overfitting

3. **Quality Gate: Testing Verified**:
   ```markdown
   ✅ Test quality score ≥90/100
   ✅ Coverage ≥95% (target achieved)
   ✅ Independent verification passed
   ✅ No test gaming detected
   ✅ Edge cases covered
   ```

   **If Fail**:
   - Review test quality issues
   - Generate additional tests
   - Re-verify
   - Max 2 retries, then escalate

4. **Save Testing State**:
   ```bash
   echo '{
     "status": "tested",
     "test_quality_score": 92,
     "coverage": 96,
     "tests_total": 112,
     "edge_cases": 23,
     "gaming_detected": false
   }' > .multi-ai-context/testing-status.json
   ```

**Outputs**:
- Test quality ≥90/100
- Coverage ≥95%
- Independent verification passed

**Time**: 1-3 hours

**Next**: Proceed to Stage 5

---

### Stage 5: Verification (Multi-Layer QA)

**Purpose**: Final quality assurance before production

**Process**:

1. **Load All Context**:
   ```typescript
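   import { readFileSync } from 'node:fs';

   // Small helper: read a shared-state file as UTF-8 and parse it
   const readJson = (path: string) => JSON.parse(readFileSync(path, 'utf8'));

   const plan = readJson('.multi-ai-context/plan.json');
   const implStatus = readJson('.multi-ai-context/implementation-status.json');
   const testStatus = readJson('.multi-ai-context/testing-status.json');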

   console.log(`🔍 Final verification:`);
   console.log(`   Objective: ${plan.objective}`);
   console.log(`   Implementation: ${implStatus.status}`);
   console.log(`   Testing: ${testStatus.coverage}% coverage`);
   ```

2. **Invoke Multi-Layer Verification**:
   ```
   Use multi-ai-verification for complete quality check

   Verify:
   - Code: src/
   - Tests: tests/
   - Plan: .multi-ai-context/plan.json

   Execute all 5 layers:
   1. Rules-based (linting, types, schema, SAST)
   2. Functional (tests, coverage, examples)
   3. Visual (if UI: screenshots, a11y)
   4. Integration (E2E, API compatibility)
   5. Quality scoring (LLM-as-judge, 0-100)

   All 5 quality gates must pass.
   ```

   **What It Does**:
   - Runs all 5 verification layers
   - Each layer is independent
   - LLM-as-judge for holistic assessment
   - Agent-as-a-Judge can execute tools to verify claims
   - Multi-agent ensemble for critical features

3. **Quality Gate: Production Ready**:
   ```markdown
   ✅ Layer 1 (Rules): PASS
   ✅ Layer 2 (Functional): PASS, coverage 96%
   ✅ Layer 3 (Visual): PASS or SKIPPED
   ✅ Layer 4 (Integration): PASS
   ✅ Layer 5 (Quality): 92/100 ≥90 ✅

   ALL GATES PASSED → PRODUCTION APPROVED
   ```

   **If Fail**:
   - Review gap analysis from failed layer
   - Apply recommended fixes
   - Re-verify from failed layer (not all 5)
   - Max 2 retries per layer
   - If still failing: Escalate to human

4. **Generate Final Report**:
   ```markdown
   # Feature Implementation Complete

   **Objective**: [from plan]

   ## Pipeline Execution Summary

   ### Stage 1: Research
   - Status: ✅ Complete
   - Quality: 97/100
   - Time: 52 minutes

   ### Stage 2: Planning
   - Status: ✅ Complete
   - Quality: 94/100
   - Tasks: 23
   - Time: 1.8 hours

   ### Stage 3: Implementation
   - Status: ✅ Complete
   - Files changed: 15
   - Lines added: 847
   - Commits: 12
   - Time: 6.2 hours

   ### Stage 4: Testing
   - Status: ✅ Complete
   - Test quality: 92/100
   - Coverage: 96%
   - Tests: 112
   - Time: 1.5 hours

   ### Stage 5: Verification
   - Status: ✅ Complete
   - Quality score: 92/100
   - All layers: PASS
   - Time: 1.2 hours

   ## Final Metrics

   - **Total Time**: 11.6 hours
   - **Quality**: 92/100
   - **Coverage**: 96%
   - **Status**: ✅ PRODUCTION READY

   ## Commits
   - abc123: feat: Add database schema
   - def456: feat: Implement OAuth integration
   - [... 10 more ...]

   ## Next Steps
   - Create PR for team review
   - Deploy to staging
   - Production release
   ```

5. **Save to Memory** (if agent-memory-system available):
   ```typescript
   await storeMemory({
     type: 'episodic',
     event: {
       description: `Complete implementation: ${objective}`,
       outcomes: {
         total_time: 11.6,
         quality_score: 92,
         test_coverage: 96,
         stages_completed: 5
       },
       learnings: extractedDuringPipeline
     }
   });
   ```
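
The "re-verify from failed layer (not all 5)" rule with at most 2 retries per layer can be sketched as a loop that never re-runs a layer that already passed. This is an illustrative sketch only: the real checks are performed by multi-ai-verification, so the `check` and `applyFixes` callbacks here are hypothetical placeholders, kept synchronous for brevity.

```typescript
// Each layer is modelled as a named check (placeholder for the real layer).
interface Layer {
  name: string;
  check: () => boolean;
}

interface VerificationResult {
  passed: boolean;
  failedLayer?: string; // set when a layer exhausts its retries
  checksRun: number;
}

const MAX_RETRIES_PER_LAYER = 2;

function verifyLayers(layers: Layer[], applyFixes: (layer: string) => void): VerificationResult {
  let checksRun = 0;
  for (const layer of layers) {
    let retries = 0;
    // One initial attempt plus up to MAX_RETRIES_PER_LAYER retries of this
    // layer only; layers that already passed are not run again.
    while (true) {
      checksRun++;
      if (layer.check()) break;
      if (retries === MAX_RETRIES_PER_LAYER) {
        return { passed: false, failedLayer: layer.name, checksRun };
      }
      retries++;
      applyFixes(layer.name);
    }
  }
  return { passed: true, checksRun };
}
```

A result with `failedLayer` set corresponds to the "escalate to human" branch.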

**Outputs**:
- Production-ready code
- Comprehensive final report
- Commits created
- PR ready (if requested)
- Memory saved for future learning

**Time**: 30-90 minutes

**Result**: ✅ PRODUCTION READY

---

## Failure Recovery

### Failure Handling at Each Stage

**Stage Fails** → **Recovery Strategy**:

**Research Fails**:
- Retry with different sources
- Skip research (use memory if available)
- Escalate to human if critical gap

**Planning Fails** (score <90):
- Review gap analysis
- Apply fixes automatically if possible
- Retry planning (max 2 attempts)
- Escalate if still <90

**Implementation Fails**:
- Identify failing task
- Automatic rollback to last checkpoint
- Retry with alternative approach
- Doom loop prevention (max 3 retries)
- Escalate with full error context

**Testing Fails** (coverage <95% or quality <90):
- Generate additional tests for gaps
- Retry verification
- Max 2 retries
- Escalate with coverage report

**Verification Fails** (score <90 or layer fails):
- Apply auto-fixes for Layer 1-2 issues
- Manual fixes needed for Layer 3-5
- Re-verify from failed layer (not all 5)
- Max 2 retries per layer
- Escalate with quality report

---

### Escalation Protocol

**When to Escalate to Human**:
1. Any stage fails 3 times (doom loop)
2. Planning quality <90 after 2 retries
3. Implementation doom loop detected
4. Verification score <90 after 2 retries
5. Budget exceeded (if cost tracking enabled)
6. Circular dependency detected
7. Irrecoverable error (file system, permissions)

**Escalation Format**:
```markdown
# ⚠️ ESCALATION REQUIRED

**Stage**: Implementation (Stage 3)
**Failure**: Doom loop detected (3 failed attempts)

## Context
- Objective: Implement user authentication
- Failing Task: 2.2.2 Token generation
- Error: Tests fail with "undefined userId" repeatedly

## Attempts Made
1. Attempt 1: Added userId to payload → Same error
2. Attempt 2: Changed payload structure → Same error
3. Attempt 3: Different JWT library → Same error

## Root Cause Analysis
- Tests expect `user.id` but implementation uses `user.userId`
- Mismatch in data model between test and implementation
- Auto-fix failed 3 times

## Recommended Actions
1. Review test specifications vs. implementation
2. Align data model (user.id vs. user.userId)
3. Manual intervention required

## State Saved
- Checkpoint: checkpoint-003 (before attempts)
- Rollback available: `git checkout checkpoint-003`
- Continue after fix: Resume from Task 2.2.2
```

---

## Parallel Execution Optimization

### Identifying Parallel Opportunities

**From Plan**:
```typescript
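import { readFileSync } from 'node:fs';

// Parse the plan; the raw file contents have no parallel_groups property
const plan = JSON.parse(readFileSync('.multi-ai-context/plan.json', 'utf8'));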

// Plan identifies parallel groups
const parallelGroups = plan.parallel_groups;

// Example:
// Group 1: Tasks 2.1, 2.2, 2.3 (independent)
// Can execute in parallel
```
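
If a plan lists dependencies but not explicit groups, parallel groups could be derived from the dependency graph. The sketch below is one possible approach, not the planner's documented behaviour: it assumes a `dependencies` array per task (a hypothetical field name), and `toParallelWaves` is an illustrative helper.

```typescript
// `dependencies` is an assumed field name; this document only shows `id`.
interface PlanTask {
  id: string;
  dependencies: string[];
}

// Groups tasks into waves: every task in a wave depends only on tasks in
// earlier waves, so the tasks within one wave can run in parallel.
function toParallelWaves(tasks: PlanTask[]): string[][] {
  const pending = new Map<string, Set<string>>();
  for (const task of tasks) pending.set(task.id, new Set(task.dependencies));

  const waves: string[][] = [];
  while (pending.size > 0) {
    // Tasks with no unmet dependencies are ready to run now.
    const ready: string[] = [];
    pending.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    if (ready.length === 0) {
      throw new Error("Circular or unresolved dependency among remaining tasks");
    }
    for (const id of ready) pending.delete(id);
    // Mark the finished wave as satisfied for everything still pending.
    pending.forEach((deps) => {
      for (const id of ready) deps.delete(id);
    });
    waves.push(ready);
  }
  return waves;
}
```

Each inner array plays the role of one parallel group, and the outer order gives the sequence in which the groups can start.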

### Executing Parallel Tasks

**Pattern**:
```typescript
// Stage 3: Implementation with parallel tasks

const parallelGroup = plan.parallel_groups.find(g => g.group_id === 'pg2');

// Spawn parallel implementation agents
const results = await Promise.all(
  parallelGroup.tasks.map(taskId => {
    const task = plan.tasks.find(t => t.id === taskId);

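    // `Task` is the agent-spawning tool from allowed-tools; the lowercase
    // `task` above is the plan entry and is not callable.
    return Task({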
      description: `Implement ${task.description}`,
      prompt: `Implement task ${task.id}: ${task.description}

      Specifications from plan:
      ${JSON.stringify(task, null, 2)}

      Success criteria:
      ${task.verification.success_criteria.join('\n')}

      Write implementation and tests.
      Report completion status.`
    });
  })
);

// Verify all parallel tasks completed
const allSucceeded = results.every(r => r.status === 'complete');

if (allSucceeded) {
  // Proceed to integration
} else {
  // Handle failures
}
```

**Time Savings**: Roughly 20-40% compared with sequential execution, depending on how many tasks are independent
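
One caveat for the failure branch: `Promise.all` rejects as soon as any promise rejects, so the outcomes of the other tasks are lost. A minimal sketch of one way around this is to have each task convert its own failure into data. `runTask` is a hypothetical stand-in for spawning one implementation agent, and the outcome shape is an assumption.

```typescript
type TaskOutcome =
  | { taskId: string; status: "complete" }
  | { taskId: string; status: "failed"; error: string };

// `runTask` is a hypothetical stand-in for spawning one implementation agent.
async function runParallelGroup(
  taskIds: string[],
  runTask: (taskId: string) => Promise<void>,
): Promise<TaskOutcome[]> {
  return Promise.all(
    taskIds.map(async (taskId): Promise<TaskOutcome> => {
      try {
        await runTask(taskId);
        return { taskId, status: "complete" };
      } catch (err) {
        // Convert the rejection into data so the other tasks' results survive.
        return { taskId, status: "failed", error: String(err) };
      }
    }),
  );
}

// Ids of the tasks that need recovery (retry, rollback, or escalation).
function failedTasks(outcomes: TaskOutcome[]): string[] {
  return outcomes.filter((o) => o.status === "failed").map((o) => o.taskId);
}
```

With this shape, only the ids returned by `failedTasks` need to be retried; completed tasks in the same group keep their results.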

---

## State Management

### Cross-Skill State Sharing

**Shared Context Directory**: `.multi-ai-context/`

**Standard Files**:
```
.multi-ai-context/
├── research-findings.json     # From multi-ai-research
├── plan.json                  # From multi-ai-planning
├── implementation-status.json # From multi-ai-implementation
├── testing-status.json        # From multi-ai-testing
├── verification-report.json   # From multi-ai-verification
├── pipeline-state.json        # Orchestrator state
└── failure-history.json       # For doom loop detection
```

**Benefits**:
- Skills don't duplicate work
- Later stages read earlier outputs
- Failure recovery knows full state
- Memory can be saved from shared state

---

### Progress Tracking

**Real-Time Progress**:
```json
{
  "pipeline_id": "pipeline_20250126_1200",
  "objective": "Implement user authentication",
  "started_at": "2025-01-26T12:00:00Z",
  "current_stage": 3,
  "stages": [
    {
      "stage": 1,
      "name": "Research",
      "status": "complete",
      "duration_minutes": 52,
      "quality": 97
    },
    {
      "stage": 2,
      "name": "Planning",
      "status": "complete",
      "duration_minutes": 108,
      "quality": 94
    },
    {
      "stage": 3,
      "name": "Implementation",
      "status": "in_progress",
      "started_at": "2025-01-26T13:48:00Z",
      "tasks_total": 23,
      "tasks_complete": 15,
      "tasks_remaining": 8,
      "percent_complete": 65
    },
    {
      "stage": 4,
      "name": "Testing",
      "status": "pending"
    },
    {
      "stage": 5,
      "name": "Verification",
      "status": "pending"
    }
  ],
  "estimated_completion": "2025-01-26T20:00:00Z",
  "quality_target": 90,
  "current_quality_estimate": 92
}
```

**Query Progress**:
```bash
# Check the current stage and Stage 3 progress
jq '.current_stage, .stages[2].percent_complete' .multi-ai-context/pipeline-state.json

# Output (one value per line):
# 3
# 65
```
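
The same query can be made without hard-coding the array index. The sketch below looks up the current stage by its `stage` number and derives the percentage from the task counts. It uses only fields shown in the state example above; `summarizeProgress` is an illustrative helper name.

```typescript
// Subset of the pipeline-state.json fields shown above.
interface StageState {
  stage: number;
  name: string;
  status: "pending" | "in_progress" | "complete";
  tasks_total?: number;
  tasks_complete?: number;
}

interface PipelineState {
  current_stage: number;
  stages: StageState[];
}

function summarizeProgress(state: PipelineState): string {
  const current = state.stages.find((s) => s.stage === state.current_stage);
  if (!current) return `Stage ${state.current_stage}: unknown`;
  if (current.tasks_total && current.tasks_complete !== undefined) {
    // Derive the percentage instead of trusting a stored value.
    const percent = Math.round((current.tasks_complete / current.tasks_total) * 100);
    return `Stage ${current.stage} (${current.name}): ${percent}% complete`;
  }
  return `Stage ${current.stage} (${current.name}): ${current.status}`;
}
```

For the example state above (15 of 23 tasks complete in Stage 3), this returns `Stage 3 (Implementation): 65% complete`.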

---

## Workflow Modes

### Standard Mode (Full Pipeline)

**All 5 Stages**:
```
Research → Planning → Implementation → Testing → Verification
```

**Time**: 8-20 hours
**Quality**: Maximum (all gates, ≥90)
**Use For**: Production features, complex implementations

---

### Fast Mode (Skip Research)

**4 Stages** (familiar domains):
```
Planning → Implementation → Testing → Verification
```

**Time**: 6-15 hours
**Quality**: High (all gates except research)
**Use For**: Familiar domains, time-sensitive features

---

### Quick Mode (Essential Gates Only)

**Implementation + Basic Verification**:
```
Planning → Implementation → Testing (basic) → Verification (Layers 1-2 only)
```

**Time**: 3-8 hours
**Quality**: Good (essential gates only)
**Use For**: Internal tools, prototypes
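
The three modes differ only in which stages and verification layers run, so they can be captured as data. This is a hedged sketch: the `ModeConfig` shape and the `MODES` constant are assumptions for illustration, not an existing configuration format.

```typescript
type Mode = "standard" | "fast" | "quick";
type StageName = "research" | "planning" | "implementation" | "testing" | "verification";

// Hypothetical configuration shape for this sketch.
interface ModeConfig {
  stages: StageName[];
  verificationLayers: number[]; // which of the 5 verification layers run
  basicTesting: boolean;        // true when Stage 4 runs in its basic form
}

const MODES: Record<Mode, ModeConfig> = {
  standard: {
    stages: ["research", "planning", "implementation", "testing", "verification"],
    verificationLayers: [1, 2, 3, 4, 5],
    basicTesting: false,
  },
  fast: {
    // Same gates as standard, minus the research stage.
    stages: ["planning", "implementation", "testing", "verification"],
    verificationLayers: [1, 2, 3, 4, 5],
    basicTesting: false,
  },
  quick: {
    stages: ["planning", "implementation", "testing", "verification"],
    verificationLayers: [1, 2],
    basicTesting: true,
  },
};

function stagesFor(mode: Mode): StageName[] {
  return MODES[mode].stages;
}
```

Keeping the modes as data lets the orchestrator loop stay the same across modes and only consult the selected configuration.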

---

## Best Practices

### 1. Always Run Planning Stage
Even for "simple" features - planning quality ≥90 prevents issues

### 2. Use Memory to Skip Research
If similar work done before, recall patterns instead of researching

### 3. Monitor Progress
Check `.multi-ai-context/pipeline-state.json` to track progress

### 4. Trust the Quality Gates
If gate fails, there's a real issue - don't skip fixes

### 5. Save State Frequently
Each stage completion saves state (enables recovery)

### 6. Review Final Report
The report gives a complete picture of what was built and the quality achieved

---

## Integration Points

### With All 5 Multi-AI Skills

**Coordinates**:
1. multi-ai-research (Stage 1)
2. multi-ai-planning (Stage 2)
3. multi-ai-implementation (Stage 3)
4. multi-ai-testing (Stage 4)
5. multi-ai-verification (Stage 5)

**Provides**:
- Automatic skill invocation
- Quality gate enforcement
- Failure recovery
- State management
- Progress tracking
- Final reporting

---

### With agent-memory-system

**Before Pipeline**:
- Recall similar past work
- Load learned patterns
- Skip research if memory sufficient

**After Pipeline**:
- Save complete episode to memory
- Extract learnings
- Update procedural patterns
- Improve estimation accuracy

---

### With hooks-manager

**Session Hooks**:
- SessionStart: Load pipeline state
- SessionEnd: Save pipeline progress
- PostToolUse: Track stage completions

**Notification Hooks**:
- Send telemetry on stage completions
- Alert on gate failures
- Track quality scores

---

## Quick Reference

### The 5-Stage Pipeline

| Stage | Skill | Time | Quality Gate | Output |
|-------|-------|------|--------------|--------|
| 1 | multi-ai-research | 30-60m | ≥95/100 | Research findings |
| 2 | multi-ai-planning | 1.5-3h | ≥90/100 | Executable plan |
| 3 | multi-ai-implementation | 3-10h | Tests pass, ≥80% cov | Working code |
| 4 | multi-ai-testing | 1-3h | ≥95% cov, quality ≥90 | Verified tests |
| 5 | multi-ai-verification | 30-90m | ≥90/100, all layers | Production ready |

**Total**: 8-20 hours → Production-ready feature

### Workflow Modes

| Mode | Stages | Time | Quality | Use For |
|------|--------|------|---------|---------|
| **Standard** | All 5 | 8-20h | Maximum | Production features |
| **Fast** | 2-5 (skip research) | 6-15h | High | Familiar domains |
| **Quick** | 2,3,4,5 (basic) | 3-8h | Good | Internal tools |

### Quality Gates

- **Research**: ≥95/100, patterns identified
- **Planning**: ≥90/100, all tasks verifiable
- **Implementation**: Tests pass, coverage ≥80%
- **Testing**: Quality ≥90/100, coverage ≥95%
- **Verification**: ≥90/100, all 5 layers pass
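
The numeric thresholds above can be held in one table and checked with a single function. This is a minimal sketch under stated assumptions: the `StageMetrics` field names are hypothetical, and it covers only the numeric and pass/fail parts of each gate (not, for example, "patterns identified").

```typescript
type StageName = "research" | "planning" | "implementation" | "testing" | "verification";

interface Gate {
  minScore?: number;    // quality score out of 100
  minCoverage?: number; // test coverage in percent
  testsMustPass?: boolean;
  layersMustPass?: boolean;
}

// Metrics reported by a stage; the field names are assumptions for this sketch.
interface StageMetrics {
  score?: number;
  coverage?: number;
  testsPass?: boolean;
  layersPass?: boolean;
}

const GATES: Record<StageName, Gate> = {
  research: { minScore: 95 },
  planning: { minScore: 90 },
  implementation: { testsMustPass: true, minCoverage: 80 },
  testing: { minScore: 90, minCoverage: 95 },
  verification: { minScore: 90, layersMustPass: true },
};

// Returns the reasons a gate fails; an empty list means the gate passes.
// A metric the gate requires but the stage did not report counts as a failure.
function gateFailures(stage: StageName, metrics: StageMetrics): string[] {
  const gate = GATES[stage];
  const failures: string[] = [];
  if (gate.minScore !== undefined && !(metrics.score !== undefined && metrics.score >= gate.minScore)) {
    failures.push(`score below ${gate.minScore}`);
  }
  if (gate.minCoverage !== undefined && !(metrics.coverage !== undefined && metrics.coverage >= gate.minCoverage)) {
    failures.push(`coverage below ${gate.minCoverage}%`);
  }
  if (gate.testsMustPass && metrics.testsPass !== true) failures.push("tests not passing");
  if (gate.layersMustPass && metrics.layersPass !== true) failures.push("not all layers passed");
  return failures;
}
```

For instance, 87% coverage passes the implementation gate but fails the testing gate, which is why Stage 4 adds tests before verification begins.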

---

**end-to-end-orchestrator provides complete automation from feature description to production-ready code, coordinating all 5 multi-ai skills with quality gates, failure recovery, and state management - delivering enterprise-grade development workflows in a single command.**

For examples, see examples/. For failure handling, see the Failure Recovery section above.