---
name: evaluate
description: Review agent execution steps with structured feedback. Use when user says evaluate, review steps, or is not satisfied with the result.
user_invocable: true
---

# Evaluate Skill

Interactive step-by-step review of agent's execution. Creates an HTML page where the user sees what the agent did at each step of the Agent Execution Loop and can write targeted feedback.

## When to Use

- User calls `/evaluate`
- User is not satisfied with a result and wants to give precise feedback
- After a complex task where the user wants to review the execution path

Do NOT use after every task - only when evaluation is needed.

## Flow

1. Reflect on your actions in the current session
2. For each step (MATCH, THINK, ACT, VERIFY, LEARN) write what you did
3. Generate HTML with embedded data
4. Open in browser
5. User writes comments next to each step, clicks "Copy Feedback"
6. User pastes feedback in chat
7. Agent parses feedback per step and applies fixes (update skill, retry, etc.)

## Step 1: Reflect

Analyze the current session and fill in each step honestly:

- **MATCH**: Which skill was chosen? Why? Or why none matched?
- **THINK**: What was the expected result? What verification criteria were defined?
- **ACT**: Which tools were called? In what order? What was parallel vs sequential?
- **VERIFY**: What was checked? Did it pass or fail? Any skill audit issues?
- **LEARN**: Was anything updated? If not, why?

Be specific - include file paths, tool names, actual values. The user needs to see exactly what happened.

## Step 2: Generate HTML

Write a self-contained HTML file to `/tmp/evaluate-{timestamp}.html` with the data embedded inline. Use the template below. Replace `__TASK_DESCRIPTION__` and each `__STEP_*__` placeholder with actual content from your reflection. Use HTML-safe text (escape `<`, `>`, `&`).

```html
Evaluate - Step Review

1. Match

__STEP_MATCH__

Your feedback on MATCH

2. Think

__STEP_THINK__

Your feedback on THINK

3. Act

__STEP_ACT__

Your feedback on ACT

4. Verify

__STEP_VERIFY__

Your feedback on VERIFY

5. Learn

__STEP_LEARN__

Your feedback on LEARN

```

## Step 3: Open in Browser

```bash
open /tmp/evaluate-{timestamp}.html
```

Tell the user: "Opened Evaluate in the browser. Write comments on the steps you didn't like, click Copy Feedback, and paste the result here."

## Step 4: Parse and Fix Feedback

When the user pastes feedback back (format: `## Evaluation Feedback` with `### STEP_NAME` sections), parse each step:

1. Read each `**Feedback:**` line
2. Determine what needs fixing:
   - MATCH feedback -> update skill frontmatter description or CLAUDE.md skill matching rules
   - THINK feedback -> update skill instructions (expected result, verification criteria)
   - ACT feedback -> update skill steps, tool usage, order of operations
   - VERIFY feedback -> add/update verification checks in skill
   - LEARN feedback -> update skill Known Issues, add prevention rules
3. Apply the fix to the appropriate file
4. Git commit: `evaluate: {skill} - {what changed}`
5. If the task needs re-execution, re-run through the Agent Execution Loop

## Known Issues

- Browser clipboard API requires HTTPS or localhost - if clipboard fails, user can manually select all text from the output area
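
The placeholder substitution described in Step 2 can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names `fill_template` and `write_report`, the key names in `values`, and the epoch-seconds timestamp are all assumptions (the skill only specifies `{timestamp}`). Substitution happens in a single pass, so reflection text that happens to contain a placeholder name is not substituted again.

```python
import html
import re
import time
from pathlib import Path

# Matches __TASK_DESCRIPTION__ and the five __STEP_*__ placeholders.
PLACEHOLDER = re.compile(
    r"__(TASK_DESCRIPTION|STEP_(?:MATCH|THINK|ACT|VERIFY|LEARN))__"
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder with HTML-escaped text (hypothetical helper).

    Keys in `values` are placeholder names without the surrounding
    underscores, e.g. "STEP_MATCH". Missing keys become empty text.
    """
    def substitute(match: re.Match) -> str:
        # html.escape covers <, > and & (and quotes, which is harmless here).
        return html.escape(values.get(match.group(1), ""))

    return PLACEHOLDER.sub(substitute, template)


def write_report(page: str, out_dir: str = "/tmp") -> Path:
    """Write the page to evaluate-{timestamp}.html (timestamp format assumed)."""
    path = Path(out_dir) / f"evaluate-{int(time.time())}.html"
    path.write_text(page, encoding="utf-8")
    return path
```

The returned path is what Step 3 would pass to `open`.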
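
The parsing in Step 4 can be sketched in Python as follows. The skill only states that the feedback uses `## Evaluation Feedback`, `### STEP_NAME` sections, and `**Feedback:**` lines, so the exact layout assumed here (one single-line `**Feedback:**` entry per section, step names as bare words) is a guess. `parse_feedback` and `FIX_TARGETS` are hypothetical names; the target descriptions are copied from the mapping in Step 4.

```python
import re

STEP_HEADING = re.compile(r"^###\s+(\w+)\s*$")
FEEDBACK_LINE = re.compile(r"^\*\*Feedback:\*\*\s*(.*)$")

# Step name -> what to update, as listed in Step 4.
FIX_TARGETS = {
    "MATCH": "skill frontmatter description or CLAUDE.md skill matching rules",
    "THINK": "skill instructions (expected result, verification criteria)",
    "ACT": "skill steps, tool usage, order of operations",
    "VERIFY": "verification checks in skill",
    "LEARN": "skill Known Issues, prevention rules",
}


def parse_feedback(text: str) -> dict[str, str]:
    """Return {STEP_NAME: feedback} for every step with non-empty feedback."""
    result: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = STEP_HEADING.match(line)
        if heading:
            current = heading.group(1).upper()
            continue
        feedback = FEEDBACK_LINE.match(line)
        # Ignore feedback lines outside a step section and empty comments.
        if feedback and current and feedback.group(1):
            result[current] = feedback.group(1)
    return result
```

A caller could then loop over the parsed steps and look up `FIX_TARGETS[step]` to decide which file to edit before committing.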