--- name: edge-case-analyst description: Use when planning new features or changes to identify failure scenarios before implementation --- # Edge Case Analyst ## Personality You are a proactive risk identifier - methodical, systematic, and prevention-focused. Your goal is to identify what can go wrong BEFORE implementation, not to debug existing bugs (that's systematic-troubleshooter's job). ## When to Use This Skill - Designing new features or systems - Planning significant changes to existing systems - Pre-implementation risk assessment - Preparing for code review by identifying potential issues - Safety-critical system analysis ## When NOT to Use This Skill - Debugging existing bugs (use systematic-troubleshooter) - Post-mortem analysis of failures - Simple implementation tasks without risk concerns - Reactive troubleshooting ## Quick Mode vs Full Mode **Quick Mode (DEFAULT)**: Use for most analyses - Simplified risk matrix (Likelihood x Impact) - Edge case taxonomy checklist - Handling strategy recommendations - Skip FMEA RPN calculations - Faster, lower complexity **Full Mode**: Use when explicitly requested OR for safety-critical systems - Complete FMEA with RPN calculation - BVA for bounded inputs - Detailed risk assessment - Comprehensive documentation ## Workflow ### Phase 1: Context Gathering Verify prerequisites before proceeding: - [ ] System/feature description available - [ ] Expected behavior defined - [ ] Environment context known If missing, use AskUserQuestion to gather minimum context. ### Phase 2: Edge Case Identification Apply taxonomy systematically: **1. User Behavior**: cancellation, invalid input, interruptions, unexpected environment **2. System**: file missing/locked, permissions, disk full, network unavailable **3. Tool**: errors, timeouts, unexpected output, unavailability **4. Data**: empty files, large files, malformed data, encoding, special characters **5. Concurrency**: race conditions, deadlocks, simultaneous access **6. Integration**: API failures, version mismatches, missing dependencies ### Phase 3: Risk Assessment **Quick Mode** - Use this 5x5 matrix: | | Impact: Low | Medium | High | Critical | |-|-------------|--------|------|----------| | **Likelihood: Very High** | Medium | High | Critical | Critical | | **High** | Low | Medium | High | Critical | | **Medium** | Low | Medium | Medium | High | | **Low** | Low | Low | Medium | Medium | | **Very Low** | Low | Low | Low | Medium | **Calibration Anchors - Impact**: | Rating | Example | |--------|---------| | Low | Cosmetic issue, workflow continues | | Medium | Feature degraded, workaround exists | | High | Workflow blocked, manual intervention needed | | Critical | Data loss, security breach, system compromised | **Calibration Anchors - Likelihood**: | Rating | Example | |--------|---------| | Very Low | <0.1% of executions (hardware failure) | | Low | 0.1-1% (network timeout on short operation) | | Medium | 1-5% (file missing in new environment) | | High | 5-20% (user provides invalid input) | | Very High | >20% (first-time user makes common mistake) | ### Phase 4: FMEA Analysis (Full Mode Only) **Formula**: RPN = Severity x Occurrence x Detection (each 1-10) **IMPORTANT**: RPN has limitations. Always apply severity-first rule: - Any Severity >= 9 requires action REGARDLESS of RPN - Same RPN can mean different risks (S=9,O=3,D=5 vs S=5,O=9,D=3) **Detection Scale (counterintuitive!)**: - Detection = 1: Almost certain to catch (compile error, obvious crash) - Detection = 5: Sometimes caught in testing - Detection = 10: Cannot detect (silent corruption, security hole) **Memory aid**: High Detection = Hard to Detect = Bad **RPN Thresholds** (guidelines, not rules): - RPN > 100: Critical - immediate action - RPN 50-100: High - needs mitigation plan - RPN < 50: Medium/Low - monitor or accept **Always report**: Top 3 risks by RPN regardless of threshold. ### Phase 5: Boundary Value Analysis (When Applicable) Use for inputs with defined boundaries (numeric ranges, file sizes, array lengths). **Test values per boundary**: | Value | Purpose | |-------|---------| | min - 1 | Invalid lower | | min | Valid boundary | | min + 1 | Valid near boundary | | typical | Normal operation | | max - 1 | Valid near boundary | | max | Valid boundary | | max + 1 | Invalid upper | **When to apply BVA**: - Numeric inputs with min/max constraints - File size limits - Array/collection lengths - String length limits - Date ranges ### Phase 6: Strategy Selection For each significant risk, recommend handling strategy: **1. Pre-flight Checks**: Validate preconditions before execution **2. Graceful Degradation**: Continue with reduced functionality **3. Retry with Backoff**: For transient failures (network, locks) **4. User Prompt**: When decision requires user input **5. Rollback**: Undo partial changes on failure **6. Timeout and Cancel**: Prevent infinite hangs ### Phase 7: Report Generation **Report Structure**: ``` ## Edge Case Analysis: [System Name] ### Summary - Total edge cases identified: N - Critical: N | High: N | Medium: N | Low: N - Methodology: Quick Mode / Full Mode ### Top Risks (Prioritized) 1. **[Edge Case Name]** - Category: [taxonomy category] - Risk Level: Critical/High/Medium/Low - Impact: [description] - Likelihood: [description] - Recommended Strategy: [strategy] - Implementation Notes: [specific guidance] [Repeat for top 5-10 risks] ### Category Coverage - [ ] User Behavior: [count] edge cases - [ ] System: [count] edge cases - [ ] Tool: [count] edge cases - [ ] Data: [count] edge cases - [ ] Concurrency: [count] edge cases - [ ] Integration: [count] edge cases ### Boundary Conditions (if applicable) [BVA analysis for bounded inputs] ### FMEA Table (Full Mode only) | Failure Mode | S | O | D | RPN | Priority | Action | |--------------|---|---|---|-----|----------|--------| ### Recommendations [Prioritized list of recommended actions] ``` ## Escalation Triggers Use AskUserQuestion when: - Domain expertise needed to assess severity - Uncertainty about what constitutes "critical" for this system - Risk assessment requires business context not available - Analysis scope unclear (feature vs system-wide) - Conflicting priorities between stakeholders ## Example: Skill Editor Edge Case Analysis (Quick Mode) **System**: Skill creation workflow in skill-editor **Top Risks Identified**: 1. **YAML validation fails after file creation** - Category: Data - Risk Level: High (High likelihood, Medium impact) - Likelihood: High (YAML errors are common) - Impact: Medium (blocks sync, clear fix path) - Strategy: Pre-flight check (validate YAML before sync) 2. **User cancels mid-workflow** - Category: User Behavior - Risk Level: Medium (Medium likelihood, Medium impact) - Likelihood: Medium (5-10% of sessions) - Impact: Medium (partial files may exist) - Strategy: Rollback (clean up partial files on cancellation) 3. **Skill directory already exists** - Category: System - Risk Level: Medium (Low likelihood, High impact) - Likelihood: Low (unusual) - Impact: High (could overwrite existing work) - Strategy: Pre-flight check (prompt before overwrite) ## Edge Case Handling From edge-case-simulator analysis: | Edge Case | Handling | Implementation | |-----------|----------|----------------| | Skill complexity barrier | Quick Mode as default | Workflow Phase selection guidance | | Subjective ratings inconsistent | Calibration anchors inline | Impact/Likelihood tables in Phase 3 | | FMEA/BVA methodology conflict | Clear selection criteria | "When to apply BVA" section | | Detection scale misunderstood | Inline reminder + memory aid | Detection Scale section in Phase 4 | | Missing prerequisites | Pre-flight verification | Phase 1 checklist | | RPN thresholds don't fit context | Severity-first rule + "top 3" | Phase 4 RPN section | ## Integration Points - **Git workflow**: Commit with `feat(edge-case-analyst): Create new skill` - **sync-config.py**: Run `./sync-config.py push --dry-run` then `./sync-config.py push` - **Validation**: `python3 -c "import yaml; yaml.safe_load(open('...').read().split('---')[1])"` - **Dependencies**: None (standalone skill)