---
name: silent-degradation-audit
description: |
  Production-ready skill for detecting silent degradation across codebases.
  Uses multi-wave audit system with 6 specialized category agents, multi-agent
  validation panel, and convergence detection.
---

# Silent Degradation Audit Skill

## Overview

Production-ready skill for detecting silent degradation across codebases. Uses multi-wave audit system with 6 specialized category agents, multi-agent validation panel, and convergence detection. Battle-tested on CyberGym codebase (~250 bugs found).

## When to Use This Skill

**Use this skill when:**

- Code has reliability issues but unclear where
- Systems fail silently without operator visibility
- Error handling exists but effectiveness unknown
- Need comprehensive audit across multiple failure modes
- Preparing for production deployment
- Post-mortem analysis after silent failures

**Don't use for:**

- Code style or formatting issues (use linters)
- Performance optimization (use profilers)
- Security vulnerabilities (use security scanners)
- Simple one-off code reviews (use /analyze)

## Key Features

### Multi-Wave Progressive Audit

- **Wave 1**: Broad scan, finds obvious issues (40-50% of total)
- **Wave 2-3**: Deeper analysis, finds hidden issues (30-40%)
- **Wave 4-6**: Edge cases and subtleties (10-20%)
- **Convergence**: Stops when < 10 new findings or < 5% of Wave 1

### 6 Category Agents

1. **Dependency Failures** (Category A): "What happens when X is down?"
2. **Config Errors** (Category B): "What happens when config is wrong?"
3. **Background Work** (Category C): "What happens when background work fails?"
4. **Test Effectiveness** (Category D): "Do tests actually detect failures?"
5. **Operator Visibility** (Category E): "Is the error visible to operators?"
6. **Functional Stubs** (Category F): "Does this code actually do what its name says?"

### Multi-Agent Validation Panel

- 3 agents review findings: Security, Architect, Builder
- 2/3 consensus required to validate finding
- Prevents false positives and unnecessary changes
- Tracks strong vs weak consensus

### Language-Agnostic

Supports 9 languages with language-specific patterns:

- Python, JavaScript, TypeScript
- Rust, Go, Java, C#
- Ruby, PHP

## Integration Modes

### Standalone Invocation

Direct skill invocation for focused audit:

```
/silent-degradation-audit path/to/codebase
```

### Sub-Loop in Quality Audit Workflow

Integrated as Phase 2 of quality-audit-workflow:

```
quality-audit-workflow calls silent-degradation-audit
→ Returns findings to quality workflow
→ Quality workflow applies fixes
→ Continues to next phase
```

## Usage

### Basic Usage

```bash
# Audit entire codebase
/silent-degradation-audit .

# Audit specific directory
/silent-degradation-audit ./src

# With custom exclusions
/silent-degradation-audit . --exclusions .my-exclusions.json
```

### Configuration

Create `.silent-degradation-config.json` in codebase root:

```json
{
  "convergence": {
    "absolute_threshold": 10,
    "relative_threshold": 0.05
  },
  "max_waves": 6,
  "exclusions": {
    "patterns": ["*.test.js", "test_*.py", "**/__tests__/**"]
  },
  "categories": {
    "enabled": [
      "dependency-failures",
      "config-errors",
      "background-work",
      "test-effectiveness",
      "operator-visibility",
      "functional-stubs"
    ]
  }
}
```

### Exclusion Lists

#### Global Exclusions

Edit `~/.amplihack/.claude/skills/silent-degradation-audit/exclusions-global.json`:

```json
[
  {
    "pattern": "*.test.*",
    "reason": "Test files excluded from production audits",
    "category": "*"
  },
  {
    "pattern": "**/vendor/**",
    "reason": "Third-party code",
    "category": "*"
  }
]
```

#### Repository-Specific Exclusions

Create `.silent-degradation-exclusions.json` in repository root:

```json
[
  {
    "pattern": "src/legacy/*.py",
    "reason": "Legacy code being replaced",
    "category": "*",
    "wave": 1
  },
  {
    "pattern": "api/endpoints.py:42",
    "reason": "Empty dict is valid API response",
    "category": "functional-stubs",
    "type": "exact"
  }
]
```

## Output

### Report Format

Generates `.silent-degradation-report.md`:

```markdown
# Silent Degradation Audit Report

## Summary

- **Total Waves**: 4
- **Total Findings**: 137
- **Converged**: Yes
- **Convergence Ratio**: 4.2%

## Convergence Progress

Wave 1: ██████████████████████████████████████████████████ 120
Wave 2: ███████████████████████████ 65 (54.2% of Wave 1)
Wave 3: ████████ 18 (15.0% of Wave 1)
Wave 4: ██ 5 (4.2% of Wave 1)

Status: ✓ CONVERGED
Reason: Relative threshold met: 4.2% < 5.0%

## Findings by Category

### dependency-failures (42 findings)

- High: 15
- Medium: 20
- Low: 7

[... continues for all 6 categories ...]
```

### Findings Format

Generates `.silent-degradation-findings.json`:

```json
[
  {
    "id": "dep-001",
    "category": "dependency-failures",
    "severity": "high",
    "file": "src/payments.py",
    "line": 89,
    "description": "Payment API failure silently falls back to mock",
    "impact": "Production system using mock payments, no real charges",
    "visibility": "None - no logs or metrics",
    "recommendation": "Add explicit failure logging and metric, or fail fast",
    "wave": 1,
    "validation": {
      "result": "VALIDATED",
      "consensus": "strong",
      "votes": {
        "security": "APPROVE",
        "architect": "APPROVE",
        "builder": "APPROVE"
      }
    }
  },
  ...
]
```

## Workflow Details

### Phase 1: Initialization

1. Create convergence tracker with thresholds
2. Initialize exclusion manager
3. Set up audit state

### Phase 2: Language Detection

1. Scan codebase for file extensions
2. Identify languages (> 5 files or > 5% threshold)
3. Load language-specific patterns

### Phase 3: Load Exclusions

1. Load global exclusions from skill directory
2. Load repository-specific exclusions
3. Merge into single exclusion list

### Phase 4: Wave Loop

For each wave (until convergence):

1. **Category Analysis** (6 agents in parallel)
   - Each agent scans for category-specific issues
   - Uses language-specific patterns
   - Excludes previous findings

2. **Validation Panel** (3 agents in parallel)
   - Security agent reviews security implications
   - Architect agent reviews design impact
   - Builder agent reviews implementation feasibility

3. **Vote Tallying**
   - Require 2/3 consensus (APPROVE)
   - Track strong vs weak consensus
   - Flag inconclusive for human review

4. **Exclusion Filtering**
   - Apply global and repo-specific exclusions
   - Filter out duplicates

5. **State Update**
   - Add new findings to total
   - Record wave metrics

6. **Convergence Check**
   - Absolute: < 10 new findings
   - Relative: < 5% of Wave 1 findings
   - Break if converged

### Phase 5: Report Generation

1. Generate convergence plot
2. Calculate metrics summary
3. Categorize findings by type and severity
4. Write markdown report
5. Write JSON findings

## Architecture

### Directory Structure

```
.claude/skills/silent-degradation-audit/
├── SKILL.md                    # This file
├── reference.md                # Detailed patterns and examples
├── examples.md                 # Usage examples
├── patterns.md                 # Language-specific patterns
├── README.md                   # Quick start
├── category_agents/            # 6 category agent definitions
│   ├── dependency-failures.md
│   ├── config-errors.md
│   ├── background-work.md
│   ├── test-effectiveness.md
│   ├── operator-visibility.md
│   └── functional-stubs.md
├── validation_panel/           # Validation panel specs
│   ├── panel-spec.md
│   └── voting-rules.md
├── recipe/                     # Recipe-based workflow
│   └── audit-workflow.yaml
└── tools/                      # Python utilities
    ├── exclusion_manager.py
    ├── language_detector.py
    ├── convergence_tracker.py
    └── __init__.py
```

### Component Responsibilities

**Category Agents**:

- Scan codebase for category-specific issues
- Use language-specific patterns
- Produce findings with severity, impact, recommendation

**Validation Panel**:

- Review findings from multiple perspectives
- Vote APPROVE/REJECT/ABSTAIN
- Require 2/3 consensus

**Convergence Tracker**:

- Track findings per wave
- Calculate convergence metrics
- Determine when to stop

**Exclusion Manager**:

- Load and merge exclusion lists
- Filter findings against patterns
- Add new exclusions

**Language Detector**:

- Identify languages in codebase
- Load language-specific patterns
- Support 9 languages

## Best Practices

### Running First Audit

1. **Start with small scope**: Audit single service/module first
2. **Review Wave 1 carefully**: Establishes baseline
3. **Tune exclusions**: Add false positives to exclusion list
4. **Verify fixes**: Test fixes before applying broadly

### Exclusion Management

**When to add exclusions:**

- False positives (finding not actually an issue)
- Intentional design (behavior is correct as-is)
- Legacy code (not worth fixing right now)
- Third-party code (can't modify)

**When NOT to add exclusions:**

- Real issues you don't want to fix
- Issues without time to fix now
- Issues that seem hard

Better approach: Fix real issues, prioritize by severity.

### Validation Tuning

**If too many false positives:**

- Review validation panel prompts
- Increase consensus threshold (require unanimous)
- Add category-specific validation rules

**If missing real issues:**

- Review category agent patterns
- Add language-specific patterns
- Decrease consensus threshold (1/3 approval)

### Wave Management

**Typical wave characteristics:**

- Wave 1: 40-50% of findings (obvious issues)
- Wave 2: 25-30% (deeper issues)
- Wave 3: 15-20% (subtle issues)
- Wave 4+: < 10% each (edge cases)

**If waves not converging:**

- Check for duplicate findings (exclusion not working)
- Review category agent overlap (agents finding same things)
- Consider lowering convergence threshold

## Metrics and Monitoring

### Success Metrics

Track these over time:

```
Audit Success:
- Convergence reached: Yes/No
- Waves to convergence: 4 (target: 3-5)
- Total findings: 137 (varies by codebase)
- Validation rate: 75% (target: 60-80%)

Finding Distribution:
- High severity: 15% (target: < 20%)
- Medium severity: 45% (target: 40-60%)
- Low severity: 40% (target: 30-50%)

Panel Effectiveness:
- Strong consensus: 60% (target: > 50%)
- Weak consensus: 30% (target: 20-40%)
- Inconclusive: 10% (target: < 10%)
- Abstention rate: 5% (target: < 10%)
```

### Quality Indicators

**Healthy audit:**

- Converges in 3-5 waves
- Validation rate 60-80%
- Strong consensus > 50%
- Abstention rate < 10%

**Warning signs:**

- Doesn't converge after 6 waves (agents finding same things)
- Validation rate > 95% (rubber stamping)
- Validation rate < 40% (too strict)
- Inconclusive rate > 20% (poor context)

## Troubleshooting

### "Audit not converging"

**Symptoms**: Reaches max waves without convergence

**Causes**:

- Category agents finding duplicate issues
- Exclusion filtering not working
- Convergence threshold too tight

**Solutions**:

1. Review findings for duplicates
2. Check exclusion patterns are matching
3. Increase relative threshold to 10%
4. Reduce max waves to 5

### "Too many false positives"

**Symptoms**: Validation rate > 95%, many non-issues

**Causes**:

- Category agents too aggressive
- Validation panel too permissive
- Patterns not tuned for codebase

**Solutions**:

1. Review category agent patterns
2. Add exclusions for false positive patterns
3. Require unanimous validation (3/3)
4. Tune language-specific patterns

### "Missing real issues"

**Symptoms**: Known issues not in findings

**Causes**:

- Category agent gaps
- Exclusion too broad
- Validation panel too strict

**Solutions**:

1. Check if issue matches any category
2. Review exclusion list for overly broad patterns
3. Lower consensus threshold to 1/3
4. Add specific patterns for missed issues

### "Validation panel abstaining"

**Symptoms**: High abstention rate (> 20%)

**Causes**:

- Insufficient context in findings
- Agent prompts unclear
- Findings outside agent expertise

**Solutions**:

1. Include more code context in findings
2. Review and improve agent prompts
3. Add fourth "generalist" agent
4. Improve finding descriptions

## Advanced Configuration

### Custom Category Agents

Create custom category agent in `category_agents/custom.md`:

```markdown
# Category Custom: My Special Cases

## Core Question

"What happens when [specific scenario]?"

## Detection Focus

[Patterns to detect...]

## Language-Specific Patterns

[Language examples...]
```

Then enable in config:

```json
{
  "categories": {
    "enabled": [
      "dependency-failures",
      "config-errors",
      "background-work",
      "test-effectiveness",
      "operator-visibility",
      "functional-stubs",
      "custom"
    ]
  }
}
```

### Custom Validation Panel

Override validation panel with different agents:

```yaml
# In recipe/audit-workflow.yaml
validation_panel:
  agents:
    - security
    - architect
    - builder
    - domain-expert # Add domain-specific agent

  consensus:
    required: 0.75 # Require 3/4 approval
```

### Staged Rollout

Audit codebase incrementally:

```bash
# Phase 1: Critical services only
/silent-degradation-audit ./services/payments ./services/auth

# Phase 2: All services
/silent-degradation-audit ./services

# Phase 3: Full codebase
/silent-degradation-audit .
```

## See Also

- `reference.md` - Detailed technical reference
- `examples.md` - Real-world usage examples
- `patterns.md` - Language-specific degradation patterns
- `README.md` - Quick start guide
- `category_agents/` - Individual category agent documentation
- `validation_panel/` - Validation panel specifications

## Changelog

### Version 1.0.0 (2025-02-24)

- Initial release
- 6 category agents (A-F)
- Multi-agent validation panel (2/3 consensus)
- Convergence detection (dual thresholds)
- Language-agnostic (9 languages)
- Battle-tested on CyberGym (~250 bugs)
- Integration modes: standalone + sub-loop