---
name: helpmetest-self-heal
description: "Autonomous test maintenance agent. Monitors test failures and fixes them automatically. Always use this when tests start failing after a UI or code change — it's far more systematic than trying to fix tests manually one by one. Use when user mentions 'fix failing tests', 'heal tests', 'auto-fix', 'monitor test health', 'tests broke after deploy', or test suite has multiple failures needing systematic repair. Distinguishes fixable test issues (selector changes, timing) from real application bugs."
allowed-tools: mcp__helpmetest-*
---

# Self-Healing Agent

Monitors test failures and fixes them autonomously.

**Mode:** On startup, fix all existing failures. Then monitor for new failures and fix them as they occur.

## Prerequisites

**MANDATORY:** Before healing, call:

```
how_to({ type: "context_discovery" })
how_to({ type: "debugging_self_healing" })
how_to({ type: "interactive_debugging" })
```

Use `context_discovery` to find the existing SelfHealing artifact (if any) and resume from it rather than creating a duplicate. It also identifies which Feature artifacts have known bugs, so you don't try to heal tests that are failing because of real application bugs.

## Startup: Fix Existing Failures

On startup, check all tests for failures:

1. Get the list of all failing tests using `helpmetest_status`
2. For each failing test:
   - Get recent test history to understand the failure pattern
   - Classify the failure type (selector change, timing issue, etc.)
   - Determine whether it is fixable
3. For fixable failures (test issues):
   - Investigate interactively using `helpmetest_run_interactive_command`
   - Find the fix (new selector, longer timeout, etc.)
   - Apply the fix using `helpmetest_upsert_test`
   - Verify the fix works using `helpmetest_run_test`
   - Document the fix in the SelfHealing artifact
4. For non-fixable failures (likely bugs):
   - Document why the failure is not fixable in the SelfHealing artifact
   - Include issue type, error message, recommendation
5. After processing all existing failures, enter monitoring mode

## Monitoring: Respond to New Failures

After startup, monitor for test failures using `listen_to_events`.

When a test fails:

1. Get error details
2. Classify the failure pattern
3. If fixable: investigate, fix, verify, document
4. If not fixable: document as a potential bug
5. Continue monitoring

## Healing Workflow

Use this workflow for both startup failures and new failures during monitoring:

1. **Fetch test history:**
   - Get the last 10 runs using `helpmetest_status`
   - Look for patterns (alternating PASS/FAIL? changing error values?)
2. **Classify failure pattern:**
   - Check the error type (selector? timeout? assertion?)
   - Determine the root cause using the `debugging_self_healing.md` categories
   - Decide: test issue (fixable) vs app bug (not fixable)
3. **If fixable (test issue):**
   - Investigate interactively using `helpmetest_run_interactive_command`
   - Find the fix (new selector, longer timeout, etc.)
   - Validate that the fix works interactively first
   - Apply the fix using `helpmetest_upsert_test`
   - Verify the fix by running the test multiple times (especially for isolation issues)
   - Document in the SelfHealing artifact:
     * test_id, pattern_detected, fix_applied, verification_result, timestamp
     * Tags: feature:self-healing, test:, pattern:
4. **If not fixable (app bug):**
   - Document in the SelfHealing artifact:
     * issue_type, error_message, why_not_fixable, recommendation
5. **Continue monitoring** for new failures

## Fixable vs Not Fixable

**Fixable (test issues):**

- Selector changed (element still exists, different selector needed)
- Timing issue (element appears later, need longer wait)
- Form field added/removed (form structure changed)
- Button moved (still exists, different location)
- Test isolation (shared state between tests, make idempotent)

**Not fixable (likely bugs):**

- Authentication broken
- HTTP error responses (500, 404)
- Missing pages/features
- Data corruption
- API endpoints removed

## SelfHealing Artifact

Track all healing activity in a SelfHealing artifact. This gives the user a clear record of what was fixed automatically vs what needs human attention.

```json
{
  "type": "SelfHealing",
  "id": "self-healing-log",
  "name": "SelfHealing: Test Maintenance Log",
  "content": {
    "fixed": [
      {
        "test_id": "test-login",
        "pattern_detected": "selector_change",
        "error": "Element not found: button.submit",
        "fix_applied": "Updated selector to [data-testid='submit-btn']",
        "verification_result": "Test passed on re-run",
        "timestamp": "2024-01-15T10:30:00Z",
        "tags": ["feature:authentication", "pattern:selector_change"]
      }
    ],
    "not_fixed": [
      {
        "test_id": "test-checkout",
        "issue_type": "server_error",
        "error_message": "500 Internal Server Error on POST /api/checkout",
        "why_not_fixable": "Backend endpoint returning 500 - application bug, not a test issue",
        "recommendation": "Investigate checkout API endpoint",
        "timestamp": "2024-01-15T10:35:00Z"
      }
    ],
    "summary": {
      "total_processed": 5,
      "fixed": 3,
      "not_fixable": 2,
      "last_run": "2024-01-15T10:35:00Z"
    }
  }
}
```

## Monitoring Loop

The monitor must run in the background; otherwise the agent is blocked while healing a test and can't receive new events at the same time.

**Preferred: use `/loop` if available.** The `/loop` skill provides background execution and will handle the event loop for you. Pass it the monitoring logic below.

**Fallback: spawn as a background Task.** If `/loop` isn't available, spawn a subagent with the Task tool and let it run the loop independently. The foreground conversation stays free while the background agent heals tests.

**Monitoring logic (runs inside the background agent):**

```
listen_to_events({ type: "test_run_completed" })
```

When an event arrives with a failed test:

1. Check `event.test_id` and `event.error`
2. Get test history: `helpmetest_status({ id: event.test_id, testRunLimit: 10 })`
3. Run the healing workflow (classify → investigate → fix or document)
4. Update the SelfHealing artifact with the result
5. Resume listening

Events that arrive while healing a test are queued and processed once the current fix completes. Report a summary to the SelfHealing artifact periodically so the user can check in at any time.
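
The queue-then-heal behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the HelpMeTest implementation: the keyword rules in `FIXABLE` and `NOT_FIXABLE`, the helper names (`classify`, `record`, `drain`, `new_artifact`), and the `heal` callback are all hypothetical, and no real MCP tool is called. It shows events being drained oldest first, each one classified, and the artifact's `fixed`, `not_fixed`, and `summary` fields being kept in sync.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical keyword rules. The real categories come from the
# debugging_self_healing guidance, not from this table.
NOT_FIXABLE = {
    "server_error": ("internal server error", "http 500"),
    "auth_broken": ("unauthorized",),
}
FIXABLE = {
    "selector_change": ("element not found", "no such element"),
    "timing": ("timeout", "timed out"),
}


def classify(error: str) -> tuple[str, bool]:
    """Return (pattern, fixable). Unknown errors are treated as not fixable."""
    text = error.lower()
    # Check bug-like patterns first so a server error is never "healed".
    for pattern, needles in NOT_FIXABLE.items():
        if any(needle in text for needle in needles):
            return pattern, False
    for pattern, needles in FIXABLE.items():
        if any(needle in text for needle in needles):
            return pattern, True
    return "unknown", False


def new_artifact() -> dict:
    """Empty structure mirroring the 'content' field of the SelfHealing artifact."""
    return {
        "fixed": [],
        "not_fixed": [],
        "summary": {"total_processed": 0, "fixed": 0, "not_fixable": 0, "last_run": None},
    }


def record(artifact: dict, event: dict, pattern: str, fix: Optional[str]) -> None:
    """Append one result and recompute the summary counters."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if fix is not None:
        artifact["fixed"].append({
            "test_id": event["test_id"], "pattern_detected": pattern,
            "error": event["error"], "fix_applied": fix, "timestamp": ts,
        })
    else:
        artifact["not_fixed"].append({
            "test_id": event["test_id"], "issue_type": pattern,
            "error_message": event["error"], "timestamp": ts,
        })
    summary = artifact["summary"]
    summary["fixed"] = len(artifact["fixed"])
    summary["not_fixable"] = len(artifact["not_fixed"])
    summary["total_processed"] = summary["fixed"] + summary["not_fixable"]
    summary["last_run"] = ts


def drain(queue: deque, artifact: dict,
          heal: Callable[[dict, str], Optional[str]]) -> dict:
    """Process queued failure events one at a time, oldest first.

    `heal` stands in for investigate -> fix -> verify; it returns a description
    of the applied fix, or None if no fix could be verified.
    """
    while queue:
        event = queue.popleft()
        pattern, fixable = classify(event["error"])
        fix = heal(event, pattern) if fixable else None
        record(artifact, event, pattern, fix)
    return artifact


if __name__ == "__main__":
    pending = deque([
        {"test_id": "test-login", "error": "Element not found: button.submit"},
        {"test_id": "test-checkout", "error": "500 Internal Server Error on POST /api/checkout"},
    ])
    result = drain(pending, new_artifact(), heal=lambda e, p: f"stub fix for {p}")
    print(result["summary"])
```

Treating unclassified errors as not fixable is a deliberately conservative choice in this sketch: it keeps the agent from rewriting a test to hide what may be a real application bug.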