---
name: evaluate
description: Review agent execution steps with structured feedback. Use when user says evaluate, review steps, or is not satisfied with the result.
user_invocable: true
---

# Evaluate Skill

Interactive step-by-step review of agent's execution. Creates an HTML page where the user sees what the agent did at each step of the Agent Execution Loop and can write targeted feedback.

## When to Use

- User calls `/evaluate`
- User is not satisfied with a result and wants to give precise feedback
- After a complex task where the user wants to review the execution path

Do NOT use after every task - only when evaluation is needed.

## Flow

1. Reflect on your actions in the current session
2. For each step (MATCH, THINK, ACT, VERIFY, LEARN) write what you did
3. Generate HTML with embedded data
4. Open in browser
5. User writes comments next to each step, clicks "Copy Feedback"
6. User pastes feedback in chat
7. Agent parses feedback per step and applies fixes (update skill, retry, etc.)

## Step 1: Reflect

Analyze the current session and fill in each step honestly:

- **MATCH**: Which skill was chosen? Why? Or why none matched?
- **THINK**: What was the expected result? What verification criteria were defined?
- **ACT**: Which tools were called? In what order? What was parallel vs sequential?
- **VERIFY**: What was checked? Did it pass or fail? Any skill audit issues?
- **LEARN**: Was anything updated? If not, why?

Be specific - include file paths, tool names, actual values. The user needs to see exactly what happened.

## Step 2: Generate HTML

Write a self-contained HTML file to `/tmp/evaluate-{timestamp}.html` with the data embedded inline.

Use the template below. Replace `__TASK_DESCRIPTION__` and each `__STEP_*__` placeholder with actual content from your reflection. Use HTML-safe text (escape `<`, `>`, `&`).

```html