# Error Recovery Patterns Skill This skill provides comprehensive guidance on error handling patterns, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw). ## Purpose Guide developers in implementing robust error recovery patterns to: - Reduce retry loops in agent sessions (target: <10% vs current 23%) - Implement circuit breakers to prevent infinite retry loops - Add proactive recovery for installation, dependency, and API failures - Improve debug logging for recovery attempts ## When to Use This Skill Invoke this skill when: - Implementing retry logic for network operations, installations, or API calls - Debugging retry loop issues in workflows or agent sessions - Adding error recovery patterns to new or existing code - Understanding transient vs non-transient error classification - Implementing circuit breakers or exponential backoff - Adding debug logging for recovery attempts ## Key Concepts Covered ### 1. Circuit Breaker Pattern - Maximum retry limits (standard: 3 attempts) - Exponential backoff strategies - Fail-fast on non-transient errors - Implementation in JavaScript, Shell, and Go ### 2. Installation Failure Recovery - NPM installation with cache clearing and registry fallbacks - Python pip installation with mirror alternatives - Docker image pull with retry and rate limit handling - Copilot CLI installation with network retry ### 3. API Timeout and Rate Limit Handling - GitHub API rate limit detection and backoff - Transient error detection patterns - Custom retry configuration for different APIs - Rate limit-specific retry strategies ### 4. Debug Logging for Recovery - Logger package usage for retry attempts - Category naming conventions (pkg:filename) - DEBUG environment variable patterns - Zero-overhead logging when disabled ### 5. Error Categorization - Transient vs non-transient errors - Network errors, timeout patterns - HTTP error codes (502, 503, 504) - GitHub-specific errors (rate limits, abuse detection) ## Anti-Patterns to Avoid This skill explicitly covers anti-patterns to avoid: - ❌ Infinite retry loops without maximum limits - ❌ Retrying validation errors that won't self-correct - ❌ No backoff delay between attempts - ❌ Silent retries without logging - ❌ Retrying non-transient errors ## Code Examples Provided The skill includes production-ready examples for: - JavaScript retry with `withRetry()` function - Shell script retry loops with exponential backoff - Go retry patterns with context and timeouts - NPM/pip/docker installation recovery - GitHub API rate limit handling - Debug logging for all recovery attempts ## Related Skills - **error-messages** - Error message formatting and style guide - **error-pattern-safety** - Safety guidelines for error pattern regex - **developer** - General development guidelines and conventions ## Full Documentation Complete documentation available at: `../../scratchpad/error-recovery-patterns.md` This skill references the comprehensive error recovery patterns document which includes: - Console formatting requirements - Error wrapping patterns - Common error scenarios with step-by-step resolution - Error message templates - Debugging runbook - Error categorization decision trees - Metrics and monitoring strategies