---
name: agent-creation
description: Systematically design and validate specialist agents with evidence-based prompting, tooling contracts, and adversarial evaluation.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite
model: sonnet
x-version: 3.2.0
x-category: foundry
x-vcl-compliance: v3.1.1
x-cognitive-frames:
  - HON
  - MOR
  - COM
  - CLS
  - EVD
  - ASP
  - SPC
---


### L1 Improvement
- Rebuilt the skill to follow the Skill Forge section cadence with explicit guardrails, validation hooks, and confidence ceilings.
- Added prompt-architect style constraint extraction, contract-first design, and adversarial testing steps to avoid shallow agent definitions.

## STANDARD OPERATING PROCEDURE

### Purpose
Create production-ready specialist agents with clear personas, tool wiring, evaluation scaffolding, and documentation that can be routed by orchestration layers.

### Trigger Conditions
- Positive: requests to create or refine an agent, multi-agent system design, agent prompt hardening, adding new tools to an agent.
- Negative/reroute: single-turn tasks better handled by micro-skill-creator, standalone prompt tuning handled by prompt-architect, or skill structure requests handled by skill-builder/skill-forge.

### Guardrails
- Use English-first outputs with explicit confidence ceilings following prompt-architect rules.
- Structure-first: deliver the agent spec (frontmatter + body), examples, and validation notes; avoid "created" confirmations without artifacts.
- Avoid duplication: check existing registry entries; prefer specialization over overlap.
- Run adversarial probes (boundary, failure, misuse) before considering the agent production-ready.
- Respect hooks latency targets from Skill Forge: pre_hook_target_ms 20/100 max; post_hook_target_ms 100/1000 max.

### Execution Phases
1. **Discovery**: Capture intent, domain, constraints (hard/soft/inferred), tools, and expected outputs.
2. **Design**: Define persona, responsibilities, tool permissions, memory strategy, and decision checkpoints.
3. **Prompt Construction**: Author the system prompt with role, inputs, outputs, style, and escalation rules; add few-shot patterns if helpful.
4. **Validation**: Create test cases (happy path + edge), run self-consistency checks, and document evaluation results with confidence ceilings.
5. **Delivery**: Package the agent spec, integration notes, and next-step risks; register metadata for agent-selector routing.

### Pattern Recognition
- Greenfield agent for new domain → emphasize constraint extraction and baseline tests.
- Refining underperforming agent → capture failure cases and add targeted guardrails.
- Multi-agent topology → design interaction contracts, escalation paths, and boundaries between agents.

### Advanced Techniques
- Use program-of-thought to decompose complex roles into capabilities and checkpoints.
- Apply contrastive prompting to defend against prompt drift and misuse.
- Capture evaluation harness steps so skill-forge or CI can rerun them.

### Common Anti-Patterns
- Shipping an agent without tool/contracts defined.
- Using generic personas without domain evidence or examples.
- Skipping adversarial tests or confidence ceilings.

### Practical Guidelines
- Prefer specific tool verbs over open-ended "help" instructions.
- Keep output formats deterministic (JSON or structured text) to ease downstream parsing.
- Record INFERRED constraints and seek confirmation before finalizing.

### Cross-Skill Coordination
- Upstream: prompt-architect for constraint extraction and clarity; skill-builder for directory scaffolding.
- Parallel: cognitive-lensing for alternative reasoning frames; meta-tools for tooling composition.
- Downstream: agent-selector for routing; recursive-improvement for iterative hardening.

### MCP Requirements
- Optional memory/vector MCP for loading prior agent outcomes; tag sessions with WHO=agent-creation-{session}, WHY=skill-execution.
- If external tools are added, document authentication and rate-limit assumptions in the agent spec.

### Input/Output Contracts
```yaml
inputs:
  task: string  # required description of the agent's job
  domain: string  # required domain or vertical
  tools: list[string]  # optional tools or MCP servers to integrate
  constraints: list[string]  # optional hard/soft/inferred constraints
outputs:
  agent_spec: file  # system prompt with frontmatter and body
  eval_notes: file  # tests, adversarial cases, and results
  integration: summary  # how to register and route the new agent
```

### Recursive Improvement
- Run recursive-improvement with failure cases and evaluation deltas; iterate until improvement delta < 2% or risks are documented.

### Examples
- Create a compliance-review agent with SOC2/GDPR guardrails and audit-ready outputs.
- Refine an existing API-tester agent to add contract testing and retry logic for flaky endpoints.

### Troubleshooting
- Ambiguous scope → rerun constraint extraction and confirm hard vs soft requirements.
- Overlapping agents → consult registry and propose consolidation or specialization.
- Tool misfires → add error-handling branches and safe defaults in the prompt body.

### Completion Verification
- [ ] Agent spec delivered with persona, contracts, and escalation rules.
- [ ] Tests and adversarial probes executed with recorded results.
- [ ] Confidence ceiling stated; memory/tags applied if MCP used.
- [ ] Registry metadata prepared for agent-selector.

Confidence: 0.70 (ceiling: inference 0.70) - SOP rewritten with Skill Forge structure and prompt-architect guardrails.