---
name: reviewing-agent-prompting
description: "Review and improve prompts for coding agents. Use PROACTIVELY when auditing, checking, or evaluating agent instructions, system prompts, or task delegation text. Applies state-machine thinking to identify structural gaps and improve effectiveness."
---
**Why structure matters:** LLMs hallucinate when they have too much freedom and too little structure. Vague prompts force the model to guess intent, approach, tools, and completion criteria. Each guess compounds uncertainty.
**The core insight:** Treat prompts as state machines, not prose. Effective prompts have:
- Entry conditions (what triggers this phase)
- Actions (what to do)
- Exit criteria (how to know it's complete)
- Transitions (what comes next)
**The paradox:** Rigid structure creates reliable autonomy. Constraints clarify; they don't limit.
**What would you like reviewed?**
Provide the prompt text you want analyzed. This could be:
- An agent system prompt
- A task delegation prompt
- Instructions for a coding assistant
- Any prompt meant to guide autonomous behavior
**Wait for the prompt text before proceeding.**
**Phase 1: Structural Analysis**
Check for these elements (absent/weak/strong):
**1. Request Classification**
Does the prompt force categorization before action?
**Strong pattern:**
```
Before acting, classify the request:
- TYPE A (Simple): Single file, known location → direct action
- TYPE B (Complex): Unknown scope → explore first
- TYPE C (Research): External context needed → gather info
```
**Weak pattern:** No classification, jumps straight to action
**Why it matters:** Classification prevents wrong-strategy application. A quick lookup shouldn't trigger comprehensive analysis.
**2. Phase Gates with Exit Criteria**
Are there explicit phases with completion conditions?
**Strong pattern:**
```
## Phase 1: Assessment
- Check existing patterns
- Review conventions
**Exit criteria:** Know what patterns to follow before proceeding.
## Phase 2: Execution
...
```
**Weak pattern:** List of actions without phases or completion signals
**Why it matters:** Without gates, agents skip steps or thrash. Exit criteria tell the agent when to move on.
**3. Tool Constraints**
Are tool permissions explicit and minimal?
**Strong pattern:**
```
tools: Read, Grep, Glob (read-only, cannot modify)
```
**Weak pattern:** All tools available, or tools unspecified
**Why it matters:** Constraints clarify. An agent that can't write files knows its purpose is analysis.
**4. Mandatory Output Structure**
Is output format required, not suggested?
**Strong pattern:**
```
Every response MUST include:
...
HIGH/MEDIUM/LOW
...
Responses missing any section are incomplete.
```
**Weak pattern:** "Consider including..." or no format specified
**Why it matters:** Required sections force completeness. The agent can't hand-wave.
**5. Anti-Patterns (NEVER DO)**
Are failure modes explicitly closed off?
**Strong pattern:**
```
## NEVER DO
- Never suppress errors with workarounds
- Never commit without running tests
- Never guess file locations without searching
```
**Weak pattern:** Only positive instructions, no prohibitions
**Why it matters:** Negative examples are as important as positive ones. They close off common failure modes.
**6. Escalation Triggers**
Does the prompt define when to stop and ask for help?
**Strong pattern:**
```
Stop and escalate when:
- 3 consecutive attempts have failed
- Uncertainty exceeds 70%
- Change would affect more than 5 files
```
**Weak pattern:** No acknowledgment of limits
**Why it matters:** An agent that knows when to stop is more reliable than one that guesses forever.
**7. First Action Clarity**
Is the immediate first step crystal clear?
**Strong pattern:**
```
When invoked:
1. Run git diff to see recent changes
2. Read each modified file
3. Begin analysis
```
**Weak pattern:**
```
When invoked:
1. Think about the problem
2. Consider options
3. Do something helpful
```
**Why it matters:** Agents need concrete starting points, not vague intentions.
**Structural Completeness Score**
| Element | Weight | Criteria |
|---------|--------|----------|
| Request Classification | 15% | Forces explicit categorization before action |
| Phase Gates | 20% | Clear phases with exit criteria |
| Tool Constraints | 10% | Explicit, minimal permissions |
| Output Structure | 15% | Mandatory format with required sections |
| Anti-Patterns | 15% | NEVER DO section with specific prohibitions |
| Escalation Triggers | 10% | Knows when to stop and ask |
| First Action | 15% | Concrete immediate step |
**Scoring:**
- 0: Element absent
- 1: Element present but weak
- 2: Element present and strong
The total is an unweighted sum out of 14. The weights above rank each element's relative importance when prioritizing fixes; they are not score multipliers.
**Interpretation:**
- 12-14: Production-ready prompt
- 8-11: Functional but could be more reliable
- 4-7: Significant gaps, expect inconsistent behavior
- 0-3: Essentially unstructured, high hallucination risk
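The rubric above can be sketched as a small scoring helper (an illustrative sketch, assuming the total is the unweighted sum of the seven 0/1/2 ratings):

```python
ELEMENTS = (
    "classification", "phase_gates", "tool_constraints",
    "output_structure", "anti_patterns", "escalation", "first_action",
)

def interpret(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the 0/1/2 element ratings and map the total to a verdict."""
    assert set(scores) == set(ELEMENTS)
    assert all(s in (0, 1, 2) for s in scores.values())
    total = sum(scores.values())  # out of 14
    if total >= 12:
        verdict = "production-ready"
    elif total >= 8:
        verdict = "functional but could be more reliable"
    elif total >= 4:
        verdict = "significant gaps, expect inconsistent behavior"
    else:
        verdict = "essentially unstructured, high hallucination risk"
    return total, verdict
```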
**Agent Prompt Review**
**Summary**
[One sentence assessment]
**Structural Score: X/14**
| Element | Score | Notes |
|---------|-------|-------|
| Classification | 0/1/2 | ... |
| Phase Gates | 0/1/2 | ... |
| Tool Constraints | 0/1/2 | ... |
| Output Structure | 0/1/2 | ... |
| Anti-Patterns | 0/1/2 | ... |
| Escalation | 0/1/2 | ... |
| First Action | 0/1/2 | ... |
**Critical Issues**
[Elements scoring 0 that should be added]
**Improvements**
[Elements scoring 1 that could be strengthened]
**Strengths**
[Elements scoring 2, what's working well]
**Recommended Additions**
[Specific text to add, with examples]
**Example Review**
**Input prompt:**
```
You are a helpful code assistant. Help the user with their code questions.
Be thorough and accurate.
```
**Review:**
**Summary**
Minimal prompt with no structural elements. High hallucination risk.
**Structural Score: 1/14**
| Element | Score | Notes |
|---------|-------|-------|
| Classification | 0 | No request typing |
| Phase Gates | 0 | No phases |
| Tool Constraints | 0 | Unspecified |
| Output Structure | 0 | No format required |
| Anti-Patterns | 0 | No prohibitions |
| Escalation | 0 | No limits defined |
| First Action | 1 | "Help" is vague but present |
**Critical Issues**
- No classification: Agent will apply same approach to simple lookups and complex research
- No phases: Agent may skip directly to answers without gathering context
- No output structure: Responses will be inconsistent
**Recommended Additions**
Add request classification:
```
Before responding, classify the request:
- LOOKUP: Specific fact or syntax → answer directly
- EXPLAIN: Understanding code → read and summarize
- DEBUG: Something broken → investigate systematically
- IMPLEMENT: Write new code → clarify requirements first
```
Add output structure:
```
Every response must include:
- Direct response to the question
- Confidence: HIGH/MEDIUM/LOW
- Code references or documentation
```
Add anti-patterns:
```
## NEVER DO
- Never guess without searching first
- Never provide code without testing syntax
- Never skip reading files you're asked about
```
**Additional Patterns to Check**
**Parallel Execution Requirements**
For tasks requiring thoroughness, does the prompt specify minimum parallel tool usage?
```
Minimum parallel calls by task type:
- Simple lookup: 2+ tools
- Code search: 3+ tools
- Research: 4+ tools
Cross-validate results. If tools disagree, investigate.
```
**Delegation Structure**
When delegating to other agents, does the prompt require structured handoff?
```
Delegation prompts MUST include:
1. TASK: What specifically needs to be done
2. EXPECTED OUTCOME: What success looks like
3. MUST DO: Non-negotiable requirements
4. MUST NOT DO: Explicit prohibitions
5. EXIT CRITERIA: How to know it's done
```
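The five-field handoff can be enforced mechanically. A minimal sketch (field names taken from the template above; the builder itself is hypothetical):

```python
REQUIRED_FIELDS = ("TASK", "EXPECTED OUTCOME", "MUST DO", "MUST NOT DO", "EXIT CRITERIA")

def build_delegation(fields: dict[str, str]) -> str:
    """Render a delegation prompt, refusing if any required field is missing or blank."""
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f, "").strip()]
    if missing:
        raise ValueError(f"delegation prompt missing: {', '.join(missing)}")
    return "\n".join(
        f"{i}. {name}: {fields[name]}" for i, name in enumerate(REQUIRED_FIELDS, 1)
    )
```

Raising on a missing field is the point: an incomplete handoff fails loudly at build time instead of producing a vague delegation.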
**Tool Cost Awareness**
Does the prompt guide efficient tool selection?
```
Tool allocation:
- FREE: grep/glob/read (use liberally)
- CHEAP: explore agent (use for unknown territory)
- EXPENSIVE: oracle agent (reserve for complex decisions)
Start FREE, escalate only when needed.
```
**Confidence Calibration**
Does the prompt require confidence assessment?
```
HIGH: Found definitive answer with evidence
MEDIUM: Found likely answer, some uncertainty
LOW: Best guess, recommend verification
If LOW, state what additional information would raise confidence.
```
**Quick Review Checklist**
Before an agent prompt ships, verify:
- [ ] **Classification**: Does it categorize requests before acting?
- [ ] **Phases**: Are there explicit stages with exit criteria?
- [ ] **First action**: Is step 1 concrete and specific?
- [ ] **Output format**: Is structure required, not suggested?
- [ ] **NEVER DO**: Are common failure modes explicitly prohibited?
- [ ] **Escalation**: Does it know when to stop and ask?
- [ ] **Tools**: Are permissions explicit and minimal?
If more than two boxes are unchecked, the prompt needs work.
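The checklist lends itself to a first-pass automated scan. The keyword patterns below are crude, illustrative heuristics (my assumption, not a real linter); a human reviewer should still confirm each box:

```python
import re

# One rough keyword heuristic per checklist box.
CHECKS = {
    "classification": r"classify|TYPE [A-Z]",
    "phases": r"Phase \d|Exit criteria",
    "first_action": r"When invoked:\s*\n\s*1\.",
    "output_format": r"must include",
    "never_do": r"NEVER DO|\bnever\b",
    "escalation": r"escalate|stop and ask",
    "tools": r"tools:",
}

def quick_check(prompt: str) -> list[str]:
    """Return the checklist boxes this prompt appears to leave unchecked."""
    return [
        name for name, pattern in CHECKS.items()
        if not re.search(pattern, prompt, re.IGNORECASE)
    ]
```

Run against the minimal "helpful code assistant" example from earlier, all seven boxes come back unchecked, well past the two-box threshold.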