---
name: ai-agent-prd
version: 0.1.1
description: "Write comprehensive PRDs for AI Agent products, covering agent identity, capability architecture (skills, tools, memory, RAG, workflows), behavior specifications, safety guardrails, and evaluation frameworks. Use when: designing conversational agents, autonomous agents, copilots, multi-agent systems, or any LLM-powered agentic application. Triggers: 'AI agent PRD', 'agent product requirements', 'design AI agent', 'agent capability spec', 'LLM agent requirements', '智能体PRD', '智能体需求文档', '对话机器人PRD', '多智能体系统需求'. Anti-triggers: '传统PRD(非智能体)', '只润色提示词/只写Prompt', '只写用户故事/验收标准但不涉及工具调用、记忆或RAG'."
license: MIT
compatibility: "Works with any agent framework. Skeleton generator requires bash (macOS/Linux); Windows via WSL/Git Bash."
metadata:
  author: "肆〇柒/ForOhZen"
  version: 0.1.1
  category: product-management
  tags: "prd, ai-agent, llm, skills, tools, rag, workflow, memory"
---

# AI Agent PRD Guide

## Overview

Write PRDs for AI Agent products that define not just **what the agent does**, but **how it thinks, decides, and acts**.

### Relationship with Other Skills

This skill **extends** `prd-writing-guide` for AI Agent products specifically. You should:

- Apply `prd-writing-guide`'s **Seven Lenses** to each agent capability
- Follow `prd-writing-guide`'s **Writing Style Guide** for requirement clarity
- Use `prd-writing-guide`'s **Developer Test** as your quality bar

**Handoff:** The Agent PRD this skill produces feeds into `prd-to-engineering-spec` for technical design. That skill includes an Agent-specific validation branch for converting agent capabilities into engineering specs.

```
Traditional PRD: Input → Deterministic Logic → Output

Agent PRD:       Goal → Perceive → Think → Decide → Act → Learn
                        ↑                           │
                        └───────── Feedback ────────┘

You're not defining a function. You're defining a cognitive architecture.
```

### Quality Test

Can your engineering team answer these without asking you?

- What is the agent's purpose and identity?
- What capabilities (skills/tools) does it have?
- How does it decide what to do?
- What can it NOT do? (boundaries)
- When should humans intervene?
- How do we know if it's working well?

## Quick Start

1. Generate a document skeleton:
   ```bash
   bash scripts/generate_agent_prd_skeleton.sh ./docs/agent-prd "Customer Support Agent"
   ```
2. Fill in using templates from references
3. Validate completeness with checklist

**Note:** The skeleton generator writes a set of `.md` files into your output directory. Use a new/empty folder to avoid accidental overwrites.

---

## Workflow

```
Phase 1: Agent Identity ──────► Who is the agent? What's its purpose?
         ↓
Phase 2: Capability Architecture ──► Skills, Tools, Memory, RAG, Workflows
         ↓
Phase 3: Behavior & System Prompt ─► How does it think? What's its DNA?
         ↓
Phase 4: Conversation Design ────► Golden conversations, example behaviors
         ↓
Phase 5: Safety & Guardrails ────► What can't it do? Human oversight?
         ↓
Phase 6: Evaluation Framework ───► How do we measure success?
         ↓
Phase 7: Operational Model ──────► Cost, scaling, iteration
```

---

## Phase 1: Agent Identity

**Goal:** Define who the agent is and its relationship with users.

### Key Elements

| Element | Questions to Answer |
|---------|---------------------|
| **Persona** | Name, role, personality, expertise domain |
| **Mission** | Why does this agent exist? |
| **Boundaries** | What it IS vs what it is NOT |
| **User Relationship** | Copilot, Autopilot, Peer, Expert, or Executor? |

### User-Agent Relationship Models

| Model | Description | Example |
|-------|-------------|---------|
| **Copilot** | Human leads, agent assists | Code completion |
| **Autopilot** | Agent leads, human monitors | Customer support |
| **Peer** | Equal collaboration | Brainstorming |
| **Expert** | Agent advises, human decides | Medical advisor |
| **Executor** | Human commands, agent executes | Task automation |

---

## Phase 2: Capability Architecture

**Goal:** Define the building blocks that enable agent capabilities.

### Capability Stack

```
┌─────────────────────────────────────────────────────────────────┐
│              SKILLS          TOOLS        WORKFLOWS             │
│             (What it       (External     (Multi-step            │
│              can do)        actions)      processes)            │
│                 └──────────────┼──────────────┘                 │
│                                ↓                                │
│                AGENT CORE (Reasoning, Planning)                 │
│                                ↓                                │
│                 ┌──────────────┼──────────────┐                 │
│              MEMORY           RAG          CONTEXT              │
│          (State/History)  (Knowledge)    (Awareness)            │
└─────────────────────────────────────────────────────────────────┘
```

### 2.1 Skills

Reusable capability modules. See [skills-specification.md](references/skills-specification.md).

**Per skill, document:**

- Purpose & trigger conditions
- Input/output specification
- Process logic
- Examples & boundaries

### 2.2 Tools

External actions the agent can invoke. See [tools-specification.md](references/tools-specification.md).

**Per tool, document:**

- Interface definition (JSON schema)
- Execution details (endpoint, auth, timeout)
- Response handling
- Safety requirements (confirmation, audit)

### 2.3 Memory

Stateful, context-aware behavior. See [memory-patterns.md](references/memory-patterns.md).

| Type | Scope | Example |
|------|-------|---------|
| **Working** | Current request | Context window |
| **Session** | Current session | Conversation history |
| **Long-term** | Cross-session | User preferences |

### 2.4 Knowledge (RAG)

Knowledge grounding via retrieval. See [memory-patterns.md](references/memory-patterns.md) for architecture patterns.
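
As a rough illustration of what retrieval grounding means in practice, the sketch below keeps only retrieved chunks above a minimum similarity score, caps them at a top-k limit, and reports a knowledge gap when nothing qualifies. Everything here is hypothetical: the names `Chunk`, `select_chunks`, and `ground_answer`, the sample documents, and the default values `min_score=0.75` and `top_k=3` are placeholders, not values this guide prescribes. A real PRD would state its own thresholds and the rationale for them.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # where the text came from, reused for citations
    text: str
    score: float  # similarity score from the retriever, assumed to lie in [0, 1]


def select_chunks(chunks, min_score=0.75, top_k=3):
    """Keep the best-scoring chunks at or above min_score, at most top_k of them."""
    passing = [c for c in chunks if c.score >= min_score]
    passing.sort(key=lambda c: c.score, reverse=True)
    return passing[:top_k]


def ground_answer(chunks, min_score=0.75, top_k=3):
    """Return the sources to ground on, or flag a knowledge gap if none qualify."""
    selected = select_chunks(chunks, min_score, top_k)
    if not selected:
        return {"status": "knowledge_gap", "sources": []}
    return {"status": "grounded", "sources": [c.source for c in selected]}


# Made-up retrieval results, for illustration only.
retrieved = [
    Chunk("refund-policy.md", "Refunds are issued within 14 days.", 0.91),
    Chunk("faq.md", "Contact support for billing questions.", 0.62),
]
result = ground_answer(retrieved)
print(result)  # {'status': 'grounded', 'sources': ['refund-policy.md']}
```

Returning an explicit `knowledge_gap` status, rather than an empty context, gives the PRD a concrete hook for specifying what the agent does next (admit, search, or escalate).
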

**Per knowledge source, document:**

| Attribute | Specify |
|-----------|---------|
| **Source** | What data source? (docs, DB, API, web) |
| **Format** | Document types, data structure |
| **Volume** | How much data? Growth rate? |
| **Freshness** | Update frequency? Acceptable staleness? |
| **Authority** | Is this authoritative? What if conflicting sources? |

**Retrieval configuration:**

- Chunking strategy (semantic, fixed-size, hybrid) and chunk size rationale
- Embedding model and dimension
- Retrieval method (dense, sparse, hybrid) and top-k range
- Re-ranking strategy (if any)
- Quality threshold (minimum similarity score for inclusion)

**Knowledge gap handling:**

- How does the agent detect it doesn't know something?
- Response when knowledge is insufficient (admit? search? escalate?)
- Citation requirements (when must it cite? format? inline or footnote?)

**Knowledge conflict resolution:**

- When multiple sources disagree, which takes priority?
- Should the agent present conflicting views or choose one?

### 2.5 Workflows

Multi-step orchestrated processes. Document:

- Trigger and steps with success criteria
- Human checkpoints
- Timeout and cancellation handling

---

## Phase 3: Behavior & System Prompt

**Goal:** Define how the agent thinks, decides, and communicates, and encode it into a System Prompt specification.

### Reasoning Strategies

| Strategy | Description | Use When |
|----------|-------------|----------|
| **ReAct** | Think → Act → Observe → Repeat | Most tasks |
| **Plan-then-Execute** | Full plan upfront → Execute | Complex multi-step |
| **Tree of Thought** | Explore multiple paths | Exploration needed |
| **Reflexion** | Self-critique and improve | Quality-critical |

See [agent-patterns.md](references/agent-patterns.md) for detailed patterns.

### Decision Framework

Define priority order for agent decisions:

1. Safety first
2. User intent
3. Efficiency
4. Quality

### Conversation Design

| Aspect | Define |
|--------|--------|
| **Voice & Tone** | Persona, formality, verbosity |
| **Response Patterns** | By scenario (simple, complex, error, out-of-scope) |
| **Multi-turn** | Context retention, topic switching, reference resolution |

### System Prompt Specification ⭐ Core Deliverable

The System Prompt is the agent's DNA. The PRD must produce a **System Prompt Design Spec** (not the final prompt text, but its design intent). See [system-prompt-design.md](references/system-prompt-design.md).

**Required sections in the System Prompt Spec:**

| Section | Content | Example |
|---------|---------|---------|
| **Identity Declaration** | Who the agent is, role, personality | "You are Aria, a senior financial advisor..." |
| **Capability Declaration** | What tools/skills are available, when to use each | "You have access to: search_docs, calculate..." |
| **Behavioral Instructions** | How to reason, when to ask vs act, output style | "Always explain your reasoning before acting..." |
| **Constraint Boundaries** | What the agent must never do | "Never provide medical diagnoses..." |
| **Output Format Rules** | Response structure, length, formatting | "Use bullet points for lists of 3+..." |
| **Escalation Rules** | When and how to hand off to humans | "If user mentions legal action, transfer to..." |

---

## Phase 4: Example Conversations (Golden Conversations)

**Goal:** Define concrete conversation examples that serve as both behavioral spec and acceptance criteria.

See [conversation-design.md](references/conversation-design.md) for detailed methodology.

### Why Golden Conversations Matter

For Agent products, example conversations are the **most precise behavioral specification**. They are:

- Acceptance criteria (does the agent behave like this example?)
- Training signals (few-shot examples in the system prompt)
- Evaluation dataset (automated quality testing)
- Stakeholder alignment tool (shows exactly what "good" looks like)

### Coverage Requirements

Design golden conversations for each of these scenario types:

| Scenario Type | Count | Purpose |
|---------------|-------|---------|
| **Happy path** | 2-3 per use case | Shows ideal agent behavior |
| **Edge cases** | 1-2 per use case | Shows boundary handling |
| **Safety boundaries** | 3-5 total | Shows refusal/escalation |
| **Multi-turn complex** | 2-3 total | Shows context management |
| **Context switching** | 1-2 total | Shows topic change handling |
| **Error recovery** | 2-3 total | Shows tool failure handling |
| **Out-of-scope** | 2-3 total | Shows graceful boundary enforcement |

### Conversation Annotation Format

Each golden conversation should include:

```markdown
## Conversation: [Scenario Name]
**Type:** [happy-path | edge-case | safety | multi-turn | error]
**Tests:** [Which capabilities/rules this validates]

### Dialogue
User: [input]
Agent: [expected response]
// Annotation: [Why this response is correct. What rules apply.]

User: [follow-up]
Agent: [expected response]
// Annotation: [Key behavior being demonstrated]

### Unacceptable Alternatives
- Agent should NOT: [describe bad behavior]
- Agent should NOT: [describe bad behavior]

### Evaluation Criteria
- [ ] [Checkable criterion 1]
- [ ] [Checkable criterion 2]
```

---

## Phase 5: Safety & Guardrails

**Goal:** Define boundaries, controls, and human oversight.

See [safety-checklist.md](references/safety-checklist.md) for a comprehensive checklist.
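
One way to make such boundaries testable is to express them as a policy table that maps each action to one of three outcomes: allowed, needs confirmation, or prohibited. The sketch below is a minimal illustration under stated assumptions. The action names (`lookup_order`, `issue_refund`, `delete_account`), the `Decision` enum, and the choice to refuse unknown actions by default are all hypothetical and not prescribed by this guide.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # authorized action (CAN DO)
    CONFIRM = "confirm"  # requires user or human confirmation (MUST ASK)
    REFUSE = "refuse"    # prohibited action (CANNOT DO)


# Hypothetical policy for a support agent; a real PRD would list its own actions.
POLICY = {
    "lookup_order": Decision.ALLOW,
    "issue_refund": Decision.CONFIRM,
    "delete_account": Decision.REFUSE,
}


def check_action(action: str) -> Decision:
    """Look up the boundary for an action.

    Assumption: actions missing from the policy are refused, so the
    boundary fails closed rather than open.
    """
    return POLICY.get(action, Decision.REFUSE)


for name in ("lookup_order", "issue_refund", "wire_money"):
    print(name, "->", check_action(name).value)
```

Writing the policy as data keeps the PRD and the implementation comparable: each row in the PRD's boundary table can be checked against one entry in the policy.
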

### 5.1 Capability Boundaries

| Category | Document |
|----------|----------|
| **CAN DO** | Authorized actions with conditions |
| **CANNOT DO** | Prohibited actions with response |
| **MUST ASK** | Actions requiring confirmation |

### 5.2 Human-in-the-Loop

Define when humans must intervene:

- Approval triggers and workflow
- Escalation paths
- Override capabilities

### 5.3 Guardrails

**Input Guardrails:**

- Prompt injection protection
- Harmful request detection
- Input validation

**Output Guardrails:**

- Harmful content filtering
- PII leakage prevention
- Hallucination detection

### 5.4 Error Handling

| Error Type | Document |
|------------|----------|
| Tool failure | Detection, message, recovery |
| Knowledge gap | Detection, message, fallback |
| Reasoning failure | Detection, restart/escalate |

---

## Phase 6: Evaluation Framework

**Goal:** Define how to measure agent quality and success.

See [evaluation-rubrics.md](references/evaluation-rubrics.md) for detailed rubrics.

### Core Metrics

| Dimension | Metrics |
|-----------|---------|
| **Task Success** | Completion rate, first-turn resolution |
| **Quality** | Accuracy, relevance, completeness |
| **Safety** | Harmful response rate, boundary violations |
| **Efficiency** | Latency, token usage, cost |
| **User Experience** | CSAT, NPS, escalation rate |

### Evaluation Methods

| Method | Purpose | Frequency |
|--------|---------|-----------|
| **Automated Testing** | Regression, benchmarks | Every change |
| **Human Evaluation** | Quality assessment | Weekly |
| **LLM-as-Judge** | Scalable quality scoring | Continuous |
| **Red Team Testing** | Adversarial testing | Quarterly |
| **A/B Testing** | Compare variants | As needed |

---

## Phase 7: Operational Model

### 7.1 Cost Model

| Component | Document |
|-----------|----------|
| Per-request costs | LLM tokens, embeddings, tool calls |
| Projected costs | By scale (launch, 6 months, 1 year) |
| Cost controls | Budgets, alerts, throttling |

### 7.2 Scaling & Iteration

- Scaling strategy (horizontal, rate limiting, caching)
- Feedback collection mechanisms
- Continuous improvement cycle
- Version management

---

## Output Structure

```
agent-prd/
├── AGENT_PRD.md            # Main document
├── IDENTITY.md             # Agent persona & boundaries
├── USE_CASES.md            # Users and use cases
├── SKILLS.md               # Skills specification
├── TOOLS.md                # Tools specification
├── MEMORY.md               # Memory architecture
├── KNOWLEDGE.md            # RAG configuration
├── WORKFLOWS.md            # Workflow definitions
├── BEHAVIOR.md             # Reasoning & conversation
├── SYSTEM_PROMPT_SPEC.md   # System prompt design specification ⭐
├── CONVERSATIONS.md        # Golden conversations ⭐
├── SAFETY.md               # Guardrails
├── EVALUATION.md           # Metrics & testing
├── EXAMPLES.md             # Additional example interactions
└── CHECKLIST.md            # Completion checklist
```

---

## Resources

**Scripts:**

- `scripts/generate_agent_prd_skeleton.sh` - Generate PRD structure

**Core References:**

- `references/agent-prd-template.md` - Complete PRD template
- `references/skills-specification.md` - Skill definition guide
- `references/tools-specification.md` - Tool definition guide
- `references/memory-patterns.md` - Memory architecture patterns
- `references/agent-patterns.md` - Reasoning & architecture patterns
- `references/conversation-design.md` - Golden conversation methodology ⭐
- `references/worked-example.md` - **End-to-end worked example** (HelpBot agent) ⭐

**Safety & Evaluation:**

- `references/safety-checklist.md` - Safety requirements
- `references/evaluation-rubrics.md` - Evaluation frameworks

**Advanced Topics:**

- `references/multi-agent-design.md` - Multi-agent system design
- `references/system-prompt-design.md` - System prompt engineering
- `references/multimodal-design.md` - Multi-modal agent design
- `references/observability-operations.md` - Monitoring & operations
- `references/protocols-standards.md` - MCP, protocols, standards
- `references/domain-specific-design.md` - Domain-specific guidance

---

## Extensibility & Future-Proofing

This skill is designed to evolve with Agent technology:

| Current | Future-Ready |
|---------|--------------|
| Text I/O | Multimodal (vision, audio, video) |
| Single Agent | Multi-Agent orchestration |
| Custom tools | Protocol standards (MCP, Agent Protocol) |
| Basic metrics | Full observability stack |
| Generic | Domain-specific extensions |

**Adding new capabilities:**

1. Add reference file in `references/`
2. Update SKILL.md Resources section
3. Extend PRD template if needed

---

## Summary: Agent PRD Principles

```
┌─────────────────────────────────────────────────────────────────┐
│  1. DEFINE IDENTITY - Who is this agent? Not just features.     │
│  2. SPECIFY CAPABILITIES - Skills, Tools, Memory, Knowledge.    │
│  3. DESIGN THE PROMPT - System Prompt is the agent's DNA.       │
│  4. SHOW, DON'T TELL - Golden conversations are the spec.       │
│  5. BOUND THE BEHAVIOR - What it CAN'T do matters equally.      │
│  6. EVALUATE CONTINUOUSLY - Define metrics before building.     │
│  7. HUMANS IN THE LOOP - Know when to escalate, always.         │
└─────────────────────────────────────────────────────────────────┘
```

The goal is to **architect cognition**: define how an intelligent system should think, decide, and act within safe boundaries.
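
As a companion to the "Validate completeness" step in Quick Start, a PRD directory can be checked against the file list shown under Output Structure. The sketch below is one possible check, not part of the skill's scripts: `EXPECTED_FILES` is copied from that listing, `missing_files` is a hypothetical helper, and whether the skeleton generator emits exactly these files is an assumption.

```python
import tempfile
from pathlib import Path

# File names taken from the Output Structure listing in this guide.
EXPECTED_FILES = [
    "AGENT_PRD.md", "IDENTITY.md", "USE_CASES.md", "SKILLS.md", "TOOLS.md",
    "MEMORY.md", "KNOWLEDGE.md", "WORKFLOWS.md", "BEHAVIOR.md",
    "SYSTEM_PROMPT_SPEC.md", "CONVERSATIONS.md", "SAFETY.md",
    "EVALUATION.md", "EXAMPLES.md", "CHECKLIST.md",
]


def missing_files(prd_dir):
    """Return the expected PRD files that do not exist in prd_dir."""
    root = Path(prd_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo on a throwaway directory that contains only the main document.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "AGENT_PRD.md").write_text("# Draft\n")
    missing = missing_files(tmp)

print(f"{len(missing)} of {len(EXPECTED_FILES)} files missing")
```

This only confirms that the files exist; whether each one is actually filled in still has to be judged against `CHECKLIST.md`.
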