---
name: ci-fix-loop
description: Autonomous CI fix loop with background monitoring and retry logic. Runs up to 10 fix-commit-push-wait cycles until CI passes or the attempt limit is reached.
---

# CI Fix Loop Skill

Orchestrates autonomous CI repair: analyze → fix → commit → push → monitor → repeat until success.

## When to Use

This skill is invoked when:
- The user runs `/fix-ci --loop` or `/fix-ci --auto`
- Multiple CI fix iterations are needed
- The user wants hands-off CI repair

## Configuration

| Setting | Value | Description |
|---------|-------|-------------|
| max_attempts | 10 | Maximum fix iterations |
| poll_interval | 60 | Seconds between CI status checks |
| ci_start_timeout | 120 | Seconds to wait for a CI run to start |
| ci_run_timeout | 1800 | Maximum seconds to wait for CI completion (30 min) |

## Workflow

### Phase 1: Initialize

Get context and validate:

```bash
BRANCH=$(git branch --show-current)   # empty on a detached HEAD
# "unknown" means gh could not resolve owner/repo; the gh api calls in Step 2.2 need it
REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner 2>/dev/null || echo "unknown")
```

**Safety checks:**

1. Block on protected branches (and on a detached HEAD):
```bash
if [[ -z "$BRANCH" || "$BRANCH" == "main" || "$BRANCH" == "master" ]]; then
  echo "Cannot run autonomous fixes on ${BRANCH:-a detached HEAD}"
  echo "Create a feature branch: git checkout -b fix/ci-errors"
  # STOP - do not proceed
fi
```

2. Handle uncommitted changes:
```bash
if [[ -n $(git status --porcelain) ]]; then
  echo "Stashing uncommitted changes..."
  # --include-untracked: otherwise untracked files stay in the working tree
  # and `git add -A` in Step 2.6 would commit them along with the fixes
  git stash push --include-untracked -m "pre-ci-fix-loop-$(date +%Y%m%d_%H%M%S)"
fi
```

Initialize state:
```
attempt = 1
max_attempts = 10
consecutive_same_errors = 0
last_errors = []
history = []
started_at = now
```

### Phase 2: Fix Loop

Repeat the steps below for each attempt from 1 to `max_attempts` (10).

#### Step 2.1: Display Progress

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CI Fix Loop - Attempt ${attempt}/${max_attempts}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Branch: ${branch}
Repository: ${repo}
```

#### Step 2.2: Fetch CI Logs

Get the most recent failed run:

```bash
# "// empty" prints nothing (instead of "null") when there is no failed run
RUN_ID=$(gh run list --branch "$BRANCH" --limit 5 --json databaseId,conclusion \
  --jq '[.[] | select(.conclusion == "failure")][0].databaseId // empty')

if [ -z "$RUN_ID" ]; then
  echo "No failed runs found - checking if CI is passing..."
  # May already be fixed, verify
fi
```

Fetch logs for the failed jobs:

```bash
# Clear logs from earlier attempts so the analyzer only sees this run
rm -f /tmp/ci-logs-*.txt

FAILED_JOBS=$(gh run view "$RUN_ID" --json jobs --jq '.jobs[] | select(.conclusion == "failure") | .databaseId')

for JOB_ID in $FAILED_JOBS; do
  gh api "repos/${REPO}/actions/jobs/${JOB_ID}/logs" > "/tmp/ci-logs-${JOB_ID}.txt" 2>/dev/null || true
done
```

#### Step 2.3: Analyze Errors

Invoke the `ci-log-analyzer` agent. It:
- Parses the CI logs in `/tmp/ci-logs-*.txt`
- Extracts a structured error list (type, file, line, message)
- Returns JSON with the errors categorized by type (lint/test/type/build)

#### Step 2.4: Check for Progress

Compare the current errors with the previous attempt's errors:

```
if current_errors is empty:
    # No errors found - CI might be passing
    # Skip to monitoring phase
elif attempt > 1 and current_errors == last_errors:
    # Same errors after a fix attempt = likely unfixable
    consecutive_same_errors += 1
    if consecutive_same_errors >= 2:
        echo "Same errors detected after 2 fix attempts - aborting"
        echo "These errors may require manual intervention"
        # STOP - exit loop with failure report
else:
    # Errors changed, so the last fix made progress
    consecutive_same_errors = 0
```

The empty check comes first because two empty lists would otherwise count as "the same errors".

#### Step 2.5: Apply Fixes

Invoke the `ci-error-fixer` agent with the error list. It:
- Applies targeted fixes based on error type
- Shows a diff for each change
- Reports how many errors were fixed and how many were flagged for manual review

Track results:
```
errors_fixed = count of successfully fixed errors
errors_flagged = count of errors needing manual review
```

#### Step 2.6: Commit & Push

Stage and commit the changes:

```bash
git add -A

if git diff --cached --quiet; then
  # Nothing changed (e.g. every error was flagged for manual review):
  # do not commit or push; go to the final report
  echo "No fixes applied - nothing to commit"
else
  # ${error_summary_list}: one "- <error>" line per addressed error
  git commit -m "fix(ci): automated fix attempt ${attempt}

Errors addressed:
- ${error_summary_list}

Attempt ${attempt} of ${max_attempts} (ci-fix-loop)"
fi
```

Push to trigger CI:

```bash
# UTC timestamp in the same format as gh's createdAt, used to spot the new run
PUSH_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
git push origin "$BRANCH"
```

#### Step 2.7: Wait for CI Run to Start

Poll until a new run appears (up to `ci_start_timeout`, 2 minutes):

```bash
TIMEOUT=120   # ci_start_timeout
START=$(date +%s)
NEW_RUN_ID=""

while true; do
  RUN_JSON=$(gh run list --branch "$BRANCH" --limit 1 --json databaseId,status,createdAt)
  CREATED=$(echo "$RUN_JSON" | jq -r '.[0].createdAt')

  # Both values are UTC ISO 8601 strings, so string comparison orders them.
  # jq prints "null" when no run exists, which would sort after any timestamp.
  if [[ "$CREATED" != "null" && "$CREATED" > "$PUSH_TIME" ]]; then
    NEW_RUN_ID=$(echo "$RUN_JSON" | jq -r '.[0].databaseId')
    echo "CI run started: $NEW_RUN_ID"
    break
  fi

  ELAPSED=$(( $(date +%s) - START ))
  if [ "$ELAPSED" -gt "$TIMEOUT" ]; then
    echo "Warning: No CI run started after ${TIMEOUT}s"
    echo "Check if workflows are enabled for this branch"
    # Do not start the monitor without NEW_RUN_ID
    break
  fi
  sleep 5
done
```

#### Step 2.8: Monitor CI (Background)

Spawn the `ci-monitor` agent with `run_in_background: true`. The monitor will:
- Poll `gh run list` every 60 seconds (`poll_interval`)
- Return when CI reaches a terminal state or `ci_run_timeout` expires
- Output one of `SUCCESS|RUN_ID`, `FAILURE|RUN_ID`, `CANCELLED|RUN_ID`, or `TIMEOUT|RUN_ID`

Wait for the monitor's result with the `TaskOutput` tool.

#### Step 2.9: Handle Result

Parse the monitor output:

```bash
case "$RESULT" in
  SUCCESS*)
    # CI passed! Exit loop with success
    ;;
  FAILURE*)
    # CI still failing - continue to the next attempt
    # (or go to the final report if attempt == max_attempts)
    ;;
  CANCELLED*)
    # Run was cancelled - warn and exit
    echo "CI run was cancelled externally"
    # EXIT with warning
    ;;
  TIMEOUT*)
    # Exceeded ci_run_timeout (30 min)
    echo "CI run timed out after 30 minutes"
    # Ask the user whether to keep waiting or abort
    ;;
  *)
    echo "Unexpected monitor output: $RESULT"
    # Abort and report
    ;;
esac
```

#### Step 2.10: Record History

Record every attempt, including the one that ends the loop:

```
history.append({
  attempt: attempt,
  errors_found: len(current_errors),
  errors_fixed: errors_fixed,
  errors_flagged: errors_flagged,
  run_id: new_run_id,
  result: monitor_result,   # SUCCESS / FAILURE / CANCELLED / TIMEOUT
  duration: attempt_duration
})

last_errors = current_errors
attempt += 1
```

### Phase 3: Final Report

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CI Fix Loop Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Result: [SUCCESS|FAILURE] after ${attempts} attempt(s)

Summary:
  Total time: ${total_duration}
  Commits created: ${commit_count}
  Errors fixed: ${total_errors_fixed}

History:
```

For each entry in history:
```
  Attempt ${n}: Found ${errors_found} errors, fixed ${fixed} → ${result}
```

If FAILURE:
```
Remaining Issues (require manual intervention):
  - ${file}:${line} - ${message}
    Type: ${type}

Suggested next steps:
  1. Review errors above
  2. Check CI logs: gh run view ${last_run_id} --log-failed
  3. Fix manually and push
```

If SUCCESS:
```
CI is now passing!

Next steps:
  1. Review automated commits: git log --oneline -${commit_count}
  2. Squash if desired: git rebase -i HEAD~${commit_count}
  3. Create PR: /github:create-pr
```

In either case, if Phase 1 stashed uncommitted changes, remind the user to restore them with `git stash pop` (the entry is named `pre-ci-fix-loop-*` in `git stash list`).

## Error Handling

### Network/API Failures
- Retry failed `gh` commands up to 3 times, waiting 5 seconds between tries
- If the failure persists, abort and report

### Git Conflicts
- If the push is rejected because of upstream changes:
```bash
echo "Upstream changes detected"
echo "Pull and retry: run git pull --rebase, then /fix-ci --loop"
```
- Abort the loop

### Unfixable Errors
- Track individual errors that persist across 2+ attempts
- Mark them as "unfixable" in the final report
- Keep attempting the other errors (the loop only aborts when the entire error set repeats, per Step 2.4)

### Timeout
- CI run timeout (30 min): report and suggest `gh run watch <run-id>`
- CI start timeout (2 min): check the workflow triggers and whether workflows are enabled for the branch

## Safety Mechanisms

1. **Branch protection**: Never runs on main/master
2. **Max attempts**: Hard limit of 10 iterations
3. **Stash protection**: Uncommitted changes are stashed before the loop starts
4. **Progress detection**: Aborts if the same errors remain after 2 fix attempts
5. **Timeout limits**: 30 min maximum CI wait per attempt
6. **Commit tracking**: Reports all commits for easy revert

## Token Efficiency

Estimated per iteration:
- Analysis (Sonnet): ~2,000 tokens
- Fix application (Sonnet): ~3,000 tokens
- CI monitoring (Haiku): ~500 tokens
- State/reporting: ~500 tokens
- **Total: ~6,000 tokens/iteration**
- **10 iterations max: ~60,000 tokens**

Key optimizations:
- Haiku for CI polling (much cheaper per token than Sonnet)
- No context accumulation between iterations: only `last_errors` and `history` carry over
- Minimal state tracking
- Background monitoring frees the terminal while CI runs
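The progress check from Step 2.4 (Check for Progress) can be expressed in plain bash. This is a sketch, not the skill's implementation. It assumes the `ci-log-analyzer` JSON has already been flattened to one error signature per line, for example `lint|src/app.ts|42|no-unused-vars` (a hypothetical format), so that two attempts can be compared as sets. `check_progress`, `same_error_set` and `progress_action` are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch of Step 2.4 (progress detection).
# Assumption: one error signature per line, e.g. "lint|src/app.ts|42|no-unused-vars".

consecutive_same_errors=0
progress_action=""   # set by check_progress: FIX, MONITOR or ABORT

# Succeeds when both files hold the same set of signatures,
# ignoring order and duplicate lines.
same_error_set() {
  cmp -s <(sort -u "$1") <(sort -u "$2")
}

# check_progress ATTEMPT LAST_ERRORS_FILE CURRENT_ERRORS_FILE
# Updates progress_action and consecutive_same_errors in the current shell,
# so call it directly rather than inside $(...).
check_progress() {
  local attempt="$1" last="$2" current="$3"

  # Empty list first: two empty lists would otherwise count as "the same errors".
  if [[ ! -s "$current" ]]; then
    progress_action="MONITOR"   # no errors found, skip to monitoring
    return
  fi

  if (( attempt > 1 )) && same_error_set "$last" "$current"; then
    consecutive_same_errors=$(( consecutive_same_errors + 1 ))
  else
    consecutive_same_errors=0   # errors changed, so the last fix made progress
  fi

  if (( consecutive_same_errors >= 2 )); then
    progress_action="ABORT"     # same errors after 2 fix attempts
  else
    progress_action="FIX"
  fi
}
```

Sorting and de-duplicating the signatures makes the comparison independent of log order. One trade-off: an error whose line number shifts after an unrelated edit counts as a new error. Leaving the line field out of the signature is one way to avoid that.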
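The `ci-monitor` output contract from Steps 2.8 and 2.9 can be approximated with a polling loop. This is a sketch under several assumptions and is not the agent's actual implementation. It queries the single run with `gh run view --json status,conclusion` rather than `gh run list`. It reuses `poll_interval` and `ci_run_timeout` from the Configuration table. It also assumes that every completed conclusion other than `success` and `cancelled` (for example `timed_out`) should be reported as `FAILURE`. `classify_run` and `monitor_run` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the ci-monitor contract: poll one run until it finishes, then print
# SUCCESS|RUN_ID, FAILURE|RUN_ID, CANCELLED|RUN_ID or TIMEOUT|RUN_ID.

POLL_INTERVAL=${POLL_INTERVAL:-60}      # poll_interval
CI_RUN_TIMEOUT=${CI_RUN_TIMEOUT:-1800}  # ci_run_timeout (30 min)

# classify_run STATUS CONCLUSION RUN_ID
# Prints the terminal result, or nothing while the run is still queued/in progress.
classify_run() {
  local status="$1" conclusion="$2" run_id="$3"
  [[ "$status" == "completed" ]] || return 0
  case "$conclusion" in
    success)   echo "SUCCESS|$run_id" ;;
    cancelled) echo "CANCELLED|$run_id" ;;
    *)         echo "FAILURE|$run_id" ;;  # assumption: failure, timed_out, ...
  esac
}

# monitor_run RUN_ID
# Blocks until the run reaches a terminal state or CI_RUN_TIMEOUT seconds pass.
monitor_run() {
  local run_id="$1" start status conclusion result
  start=$(date +%s)
  while (( $(date +%s) - start < CI_RUN_TIMEOUT )); do
    # If gh fails, status stays empty and the loop simply polls again.
    read -r status conclusion < <(gh run view "$run_id" --json status,conclusion \
      --jq '"\(.status) \(.conclusion)"')
    result=$(classify_run "$status" "$conclusion" "$run_id")
    if [[ -n "$result" ]]; then
      echo "$result"
      return 0
    fi
    sleep "$POLL_INTERVAL"
  done
  echo "TIMEOUT|$run_id"
}
```

Invoked as `RESULT=$(monitor_run "$NEW_RUN_ID")`, its output matches the patterns in the Step 2.9 `case` statement.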
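The retry policy under Error Handling (Network/API Failures) fits in a small wrapper. This minimal sketch reads "3 times" as three attempts in total and uses a fixed 5-second delay rather than a growing one. `retry_gh`, `RETRY_ATTEMPTS` and `RETRY_DELAY` are hypothetical names, not part of `gh`.

```shell
#!/usr/bin/env bash
# Sketch of "retry gh commands 3 times with 5s backoff".
# Assumption: 3 = total attempts, with a fixed delay between them.

RETRY_ATTEMPTS=${RETRY_ATTEMPTS:-3}
RETRY_DELAY=${RETRY_DELAY:-5}   # seconds

# retry_gh CMD [ARGS...]: run CMD, retrying while it exits non-zero.
# Returns 0 on the first success, otherwise the last exit status.
# Progress messages go to stderr so captured stdout stays clean.
retry_gh() {
  local try status=0
  for (( try = 1; try <= RETRY_ATTEMPTS; try++ )); do
    "$@" && return 0
    status=$?
    if (( try < RETRY_ATTEMPTS )); then
      echo "Attempt $try/$RETRY_ATTEMPTS failed (exit $status): $*; retrying in ${RETRY_DELAY}s" >&2
      sleep "$RETRY_DELAY"
    fi
  done
  echo "Giving up after $RETRY_ATTEMPTS attempts: $*" >&2
  return "$status"
}
```

A call such as `RUN_JSON=$(retry_gh gh run list --branch "$BRANCH" --limit 1 --json databaseId,status,createdAt)` can then be checked for a non-zero status, and the loop aborted and reported when the failure persists. If "3 times" is meant as three retries after the first failure, set `RETRY_ATTEMPTS=4`.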