--- name: reviewing-agent-prompting description: "Review and improve prompts for coding agents. Use PROACTIVELY when auditing, checking, or evaluating agent instructions, system prompts, or task delegation text. Applies state-machine thinking to identify structural gaps and improve effectiveness." --- **Why structure matters:** LLMs hallucinate when they have too much freedom and too little structure. Vague prompts force the model to guess intent, approach, tools, and completion criteria. Each guess compounds uncertainty. **The core insight:** Treat prompts as state machines, not prose. Effective prompts have: - Entry conditions (what triggers this phase) - Actions (what to do) - Exit criteria (how to know it's complete) - Transitions (what comes next) **The paradox:** Rigid structure creates reliable autonomy. Constraints clarify; they don't limit. **What would you like reviewed?** Provide the prompt text you want analyzed. This could be: - An agent system prompt - A task delegation prompt - Instructions for a coding assistant - Any prompt meant to guide autonomous behavior **Wait for the prompt text before proceeding.** **Phase 1: Structural Analysis** Check for these elements (present/absent/partial): **1. Request Classification** Does the prompt force categorization before action? **Strong pattern:** ``` Before acting, classify the request: - TYPE A (Simple): Single file, known location → direct action - TYPE B (Complex): Unknown scope → explore first - TYPE C (Research): External context needed → gather info ``` **Weak pattern:** No classification, jumps straight to action **Why it matters:** Classification prevents wrong-strategy application. A quick lookup shouldn't trigger comprehensive analysis. **2. Phase Gates with Exit Criteria** Are there explicit phases with completion conditions? **Strong pattern:** ``` ## Phase 1: Assessment - Check existing patterns - Review conventions **Exit criteria:** Know what patterns to follow before proceeding. ## Phase 2: Execution ... ``` **Weak pattern:** List of actions without phases or completion signals **Why it matters:** Without gates, agents skip steps or thrash. Exit criteria tell the agent when to move on. **3. Tool Constraints** Are tool permissions explicit and minimal? **Strong pattern:** ``` tools: Read, Grep, Glob (read-only, cannot modify) ``` **Weak pattern:** All tools available, or tools unspecified **Why it matters:** Constraints clarify. An agent that can't write files knows its purpose is analysis. **4. Mandatory Output Structure** Is output format required, not suggested? **Strong pattern:** ``` Every response MUST include: ... HIGH/MEDIUM/LOW ... Responses missing any section are incomplete. ``` **Weak pattern:** "Consider including..." or no format specified **Why it matters:** Required sections force completeness. The agent can't hand-wave. **5. Anti-Patterns (NEVER DO)** Are failure modes explicitly closed off? **Strong pattern:** ``` ## NEVER DO - Never suppress errors with workarounds - Never commit without running tests - Never guess file locations without searching ``` **Weak pattern:** Only positive instructions, no prohibitions **Why it matters:** Negative examples are as important as positive ones. They close off common failure modes. **6. Escalation Triggers** Does the prompt define when to stop and ask for help? **Strong pattern:** ``` Stop and escalate when: - 3 consecutive attempts have failed - Uncertainty exceeds 70% - Change would affect more than 5 files ``` **Weak pattern:** No acknowledgment of limits **Why it matters:** An agent that knows when to stop is more reliable than one that guesses forever. **7. First Action Clarity** Is the immediate first step crystal clear? **Strong pattern:** ``` When invoked: 1. Run git diff to see recent changes 2. Read each modified file 3. Begin analysis ``` **Weak pattern:** ``` When invoked: 1. Think about the problem 2. Consider options 3. Do something helpful ``` **Why it matters:** Agents need concrete starting points, not vague intentions. **Structural Completeness Score** | Element | Weight | Criteria | |---------|--------|----------| | Request Classification | 15% | Forces explicit categorization before action | | Phase Gates | 20% | Clear phases with exit criteria | | Tool Constraints | 10% | Explicit, minimal permissions | | Output Structure | 15% | Mandatory format with required sections | | Anti-Patterns | 15% | NEVER DO section with specific prohibitions | | Escalation Triggers | 10% | Knows when to stop and ask | | First Action | 15% | Concrete immediate step | **Scoring:** - 0: Element absent - 1: Element present but weak - 2: Element present and strong **Interpretation:** - 12-14: Production-ready prompt - 8-11: Functional but could be more reliable - 4-7: Significant gaps, expect inconsistent behavior - 0-3: Essentially unstructured, high hallucination risk **Agent Prompt Review** **Summary** [One sentence assessment] **Structural Score: X/14** | Element | Score | Notes | |---------|-------|-------| | Classification | 0/1/2 | ... | | Phase Gates | 0/1/2 | ... | | Tool Constraints | 0/1/2 | ... | | Output Structure | 0/1/2 | ... | | Anti-Patterns | 0/1/2 | ... | | Escalation | 0/1/2 | ... | | First Action | 0/1/2 | ... | **Critical Issues** [Elements scoring 0 that should be added] **Improvements** [Elements scoring 1 that could be strengthened] **Strengths** [Elements scoring 2, what's working well] **Recommended Additions** [Specific text to add, with examples] **Input prompt:** ``` You are a helpful code assistant. Help the user with their code questions. Be thorough and accurate. ``` **Review:** ### Summary Minimal prompt with no structural elements. High hallucination risk. ### Structural Score: 1/14 | Element | Score | Notes | |---------|-------|-------| | Classification | 0 | No request typing | | Phase Gates | 0 | No phases | | Tool Constraints | 0 | Unspecified | | Output Structure | 0 | No format required | | Anti-Patterns | 0 | No prohibitions | | Escalation | 0 | No limits defined | | First Action | 1 | "Help" is vague but present | **Critical Issues** - No classification: Agent will apply same approach to simple lookups and complex research - No phases: Agent may skip directly to answers without gathering context - No output structure: Responses will be inconsistent **Recommended Additions** Add request classification: ``` Before responding, classify the request: - LOOKUP: Specific fact or syntax → answer directly - EXPLAIN: Understanding code → read and summarize - DEBUG: Something broken → investigate systematically - IMPLEMENT: Write new code → clarify requirements first ``` Add output structure: ``` Every response must include: Direct response to the question HIGH/MEDIUM/LOW Code references or documentation ``` Add anti-patterns: ``` ## NEVER DO - Never guess without searching first - Never provide code without testing syntax - Never skip reading files you're asked about ``` **Additional Patterns to Check** **Parallel Execution Requirements** For tasks requiring thoroughness, does the prompt specify minimum parallel tool usage? ``` Minimum parallel calls by task type: - Simple lookup: 2+ tools - Code search: 3+ tools - Research: 4+ tools Cross-validate results. If tools disagree, investigate. ``` **Delegation Structure** When delegating to other agents, does the prompt require structured handoff? ``` Delegation prompts MUST include: 1. TASK: What specifically needs done 2. EXPECTED OUTCOME: What success looks like 3. MUST DO: Non-negotiable requirements 4. MUST NOT DO: Explicit prohibitions 5. EXIT CRITERIA: How to know it's done ``` **Tool Cost Awareness** Does the prompt guide efficient tool selection? ``` Tool allocation: - FREE: grep/glob/read (use liberally) - CHEAP: explore agent (use for unknown territory) - EXPENSIVE: oracle agent (reserve for complex decisions) Start FREE, escalate only when needed. ``` **Confidence Calibration** Does the prompt require confidence assessment? ``` HIGH: Found definitive answer with evidence MEDIUM: Found likely answer, some uncertainty LOW: Best guess, recommend verification If LOW, state what additional information would raise confidence. ``` **Quick Review Checklist** Before an agent prompt ships, verify: - [ ] **Classification**: Does it categorize requests before acting? - [ ] **Phases**: Are there explicit stages with exit criteria? - [ ] **First action**: Is step 1 concrete and specific? - [ ] **Output format**: Is structure required, not suggested? - [ ] **NEVER DO**: Are common failure modes explicitly prohibited? - [ ] **Escalation**: Does it know when to stop and ask? - [ ] **Tools**: Are permissions explicit and minimal? If more than 2 boxes unchecked, the prompt needs work.