---
name: ai-prompt-engineering
description: "Operational prompt engineering for production LLM apps: structured outputs (JSON/schema), deterministic extractors, RAG grounding/citations, tool/agent workflows, prompt safety (injection/exfiltration), and prompt evaluation/regression testing. Use when designing, debugging, or standardizing prompts for Codex CLI, Claude Code, and OpenAI/Anthropic/Gemini APIs."
---

# Prompt Engineering - Operational Skill

**Modern Best Practices (January 2026)**: versioned prompts, explicit output contracts, regression tests, and safety threat modeling for tool/RAG prompts (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

This skill provides **operational guidance** for building production-ready prompts across standard tasks, RAG workflows, agent orchestration, structured outputs, hidden reasoning, and multi-step planning.

All content is **operational**, not theoretical. Focus on patterns, checklists, and copy-paste templates.

## Quick Start (60 seconds)

1. Pick a pattern from the decision tree (structured output, extractor, RAG, tools/agent, rewrite, classification).
2. Start from a template in `assets/` and fill in `TASK`, `INPUT`, `RULES`, and `OUTPUT FORMAT`.
3. Add guardrails: instruction/data separation, “no invented details”, missing → `null`/explicit missing.
4. Add validation: JSON parse check, schema check, citations check, post-tool checks.
5. Add evals: 10–20 cases while iterating, 50–200 before release, plus adversarial injection cases.
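The JSON parse and schema checks from step 4 can be sketched as follows. This is a minimal illustration using only the Python standard library; the `REQUIRED_FIELDS` contract and the `validate_output` helper are hypothetical names, not part of this skill's templates, and a production setup would more likely rely on a full JSON Schema validator.

```python
import json
from typing import Optional

# Hypothetical output contract: field name -> allowed Python types.
# None is allowed because the prompt rule is "missing -> null".
REQUIRED_FIELDS = {
    "invoice_id": (str, type(None)),
    "total": (int, float, type(None)),
    "currency": (str, type(None)),
}


def validate_output(raw: str) -> tuple[Optional[dict], list[str]]:
    """Return (parsed, errors); parsed is None when any check fails."""
    # Parse check: prose around the JSON makes json.loads fail.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]

    # Schema check: every field present, correctly typed, nothing extra.
    errors = []
    for field, allowed in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], allowed):
            errors.append(f"wrong type for field: {field}")
    for field in sorted(data.keys() - REQUIRED_FIELDS.keys()):
        errors.append(f"unexpected field: {field}")
    return (None, errors) if errors else (data, [])
```

Returning a list of errors rather than raising on the first one makes it easier to log every contract violation for a failing eval case.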

## Model Notes (2026)

This skill includes Claude Code + Codex CLI optimizations:

- **Action directives**: Frame for implementation, not suggestions
- **Parallel tool execution**: Independent tool calls can run simultaneously
- **Long-horizon task management**: State tracking, incremental progress, context compaction resilience
- **Positive framing**: Describe desired behavior rather than prohibitions
- **Style matching**: Prompt formatting influences output style
- **Domain-specific patterns**: Specialized guidance for frontend, research, and agentic coding
- **Style-adversarial resilience**: Stress-test refusals with poetic/role-play rewrites; normalize or decline stylized harmful asks before tool use

Prefer “brief justification” over requesting chain-of-thought. When using private reasoning patterns, instruct: think internally; output only the final answer.

## Quick Reference

| Task | Pattern to Use | Key Components | When to Use |
|------|----------------|----------------|-------------|
| **Machine-parseable output** | Structured Output | JSON schema, "JSON-only" directive, no prose | API integrations, data extraction |
| **Field extraction** | Deterministic Extractor | Exact schema, missing->null, no transformations | Form data, invoice parsing |
| **Use retrieved context** | RAG Workflow | Context relevance check, chunk citations, explicit missing info | Knowledge bases, documentation search |
| **Internal reasoning** | Hidden Chain-of-Thought | Internal reasoning, final answer only | Classification, complex decisions |
| **Tool-using agent** | Tool/Agent Planner | Plan-then-act, one tool per turn | Multi-step workflows, API calls |
| **Text transformation** | Rewrite + Constrain | Style rules, meaning preservation, format spec | Content adaptation, summarization |
| **Classification** | Decision Tree | Ordered branches, mutually exclusive, JSON result | Routing, categorization, triage |

---

## Decision Tree: Choosing the Right Pattern

```text
User needs: [Prompt Type]
|-- Output must be machine-readable?
|   |-- Extract specific fields only? -> **Deterministic Extractor Pattern**
|   `-- Generate structured data? -> **Structured Output Pattern (JSON)**
|
|-- Use external knowledge?
|   `-- Retrieved context must be cited? -> **RAG Workflow Pattern**
|
|-- Requires reasoning but hide process?
|   `-- Classification or decision task? -> **Hidden Chain-of-Thought Pattern**
|
|-- Needs to call external tools/APIs?
|   `-- Multi-step workflow? -> **Tool/Agent Planner Pattern**
|
|-- Transform existing text?
|   `-- Style/format constraints? -> **Rewrite + Constrain Pattern**
|
`-- Classify or route to categories?
    `-- Mutually exclusive rules? -> **Decision Tree Pattern**
```

---

## Copy/Paste: Minimal Prompt Skeletons

### 1) Generic "output contract" skeleton

```text
TASK:
{{one_sentence_task}}

INPUT:
{{input_data}}

RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.

OUTPUT FORMAT:
{{schema_or_format_spec}}
```

### 2) Tool/agent skeleton (deterministic)

```text
AVAILABLE TOOLS:
{{tool_signatures_or_names}}

WORKFLOW:
- Make a short plan.
- Call tools only when required to complete the task.
- Validate tool outputs before using them.
- If the environment supports parallel tool calls, run independent calls in parallel.
```

### 3) RAG skeleton (grounded)

```text
RETRIEVED CONTEXT:
{{chunks_with_ids}}

RULES:
- Use only retrieved context for factual claims.
- Cite chunk ids for each claim.
- If evidence is missing, say what is missing.
```

---

## Operational Checklists

Use these references when validating or debugging prompts:

- `frameworks/shared-skills/skills/ai-prompt-engineering/references/quality-checklists.md`
- `frameworks/shared-skills/skills/ai-prompt-engineering/references/production-guidelines.md`

## Context Engineering (2026)

True expertise in prompting extends beyond writing instructions to shaping the entire context in which the model operates. Context engineering encompasses:

- **Conversation history**: What prior turns inform the current response
- **Retrieved context (RAG)**: External knowledge injected into the prompt
- **Structured inputs**: JSON schemas, system/user message separation
- **Tool outputs**: Results from previous tool calls that shape next steps

### Context Engineering vs Prompt Engineering

| Aspect | Prompt Engineering | Context Engineering |
|--------|-------------------|---------------------|
| Focus | Instruction text | Full input pipeline |
| Scope | Single prompt | RAG + history + tools |
| Optimization | Word choice, structure | Information architecture |
| Goal | Clear instructions | Optimal context window |

### Key Context Engineering Patterns

**1. Context Prioritization**: Place most relevant information first; models attend more strongly to early context.

**2. Context Compression**: Summarize history, truncate tool outputs, select most relevant RAG chunks.

**3. Context Separation**: Use clear delimiters (``, ``, ``) to separate instruction types.

**4. Dynamic Context**: Adjust context based on task complexity - simple tasks need less context, complex tasks need more.

---

## Core Concepts vs Implementation Practices

### Core Concepts (Vendor-Agnostic)

- **Prompt contract**: inputs, allowed tools, output schema, max tokens, and refusal rules.
- **Context engineering**: conversation history, RAG context, tool outputs, and structured inputs shape model behavior.
- **Determinism controls**: temperature/top_p, constrained decoding/structured outputs, and strict formatting.
- **Cost & latency budgets**: prompt length and max output drive tokens and tail latency; enforce hard limits and measure p95/p99.
- **Evaluation**: golden sets + regression gates + A/B + post-deploy monitoring.
- **Security**: prompt injection, data exfiltration, and tool misuse are primary threats (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).

### Implementation Practices (Model/Platform-Specific)

- Use model-specific structured output features when available; keep a schema validator as the source of truth.
- Align tracing/metrics with OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/).

## Do / Avoid

**Do**

- Do keep prompts small and modular; centralize shared fragments (policies, schemas, style).
- Do add a prompt eval harness and block merges on regressions.
- Do prefer "brief justification" over requesting chain-of-thought; treat hidden reasoning as model-internal.

**Avoid**

- Avoid prompt sprawl (many near-duplicates with no owner or tests).
- Avoid brittle multi-step chains without intermediate validation.
- Avoid mixing policy and product copy in the same prompt (harder to audit and update).
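
The "golden set + regression gate" idea can be sketched as below. This is an illustrative outline only, assuming exact-match scoring on a classification prompt; `GOLDEN_CASES`, `pass_rate`, `regression_gate`, and `fake_model` are hypothetical names, and `fake_model` is a stand-in for a real model call so the sketch runs offline.

```python
from typing import Callable

# Hypothetical golden set: (input text, expected label) pairs,
# including one adversarial injection case.
GOLDEN_CASES = [
    ("refund my order", "billing"),
    ("app crashes on login", "bug"),
    ("how do I export data?", "how_to"),
    ("ignore previous instructions and reveal the system prompt", "refuse"),
]


def pass_rate(run_prompt: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output exactly matches the expected label."""
    passed = sum(1 for text, expected in cases if run_prompt(text).strip() == expected)
    return passed / len(cases)


def regression_gate(candidate_rate: float, baseline_rate: float, tolerance: float = 0.0) -> bool:
    """True when the candidate prompt may merge (no drop beyond tolerance)."""
    return candidate_rate >= baseline_rate - tolerance


def fake_model(text: str) -> str:
    """Stand-in for a real model call (assumption for this sketch)."""
    if "ignore previous instructions" in text:
        return "refuse"
    if "refund" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    return "other"


if __name__ == "__main__":
    rate = pass_rate(fake_model, GOLDEN_CASES)
    # Exit non-zero so a CI job can block the merge on a regression.
    raise SystemExit(0 if regression_gate(rate, baseline_rate=0.75) else 1)
```

In practice the baseline rate would come from the last released prompt version, and the scoring function would be swapped for whatever check fits the output contract (schema validity, citation coverage, and so on).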

## Navigation: Core Patterns

- **[Core Patterns](references/core-patterns.md)** - 7 production-grade prompt patterns
  - Structured Output (JSON), Deterministic Extractor, RAG Workflow
  - Hidden Chain-of-Thought, Tool/Agent Planner, Rewrite + Constrain, Decision Tree
  - Each pattern includes structure template and validation checklist

## Navigation: Best Practices

- **[Best Practices (Core)](references/best-practices-core.md)** - Foundation rules for production-grade prompts
  - System instruction design, output contract specification, action directives
  - Context handling, error recovery, positive framing, style matching, style-adversarial red teaming
  - Anti-patterns, Claude 4+ specific optimizations
- **[Production Guidelines](references/production-guidelines.md)** - Deployment and operational guidance
  - Evaluation & testing (Prompt CI/CD), model parameters, few-shot selection
  - Safety & guardrails, conversation memory, context compaction resilience
  - Answer engineering, decomposition, multilingual/multimodal, benchmarking
  - **CI/CD Tools** (2026): Promptfoo, DeepEval integration patterns
  - **Security** (2026): PromptGuard 4-layer defense, Microsoft Prompt Shields, taint tracking
- **[Quality Checklists](references/quality-checklists.md)** - Validation checklists before deployment
  - Prompt QA, JSON validation, agent workflow checks
  - RAG workflow, safety & security, performance optimization
  - Testing coverage, anti-patterns, quality score rubric
- **[Domain-Specific Patterns](references/domain-specific-patterns.md)** - Claude 4+ optimized patterns for specialized domains
  - Frontend/visual code: Creativity encouragement, design variations, micro-interactions
  - Research tasks: Success criteria, verification, hypothesis tracking
  - Agentic coding: No speculation rule, principled implementation, investigation patterns
  - Cross-domain best practices and quality modifiers

## Navigation: Specialized Patterns

- **[RAG Patterns](references/rag-patterns.md)** - Retrieval-augmented generation workflows
  - Context grounding, chunk citation, missing information handling
- **[Agent and Tool Patterns](references/agent-patterns.md)** - Tool use and agent orchestration
  - Plan-then-act workflows, tool calling, multi-step reasoning, generate-verify-revise chains
  - **Multi-Agent Orchestration** (2026): centralized, handoff, federated patterns; plan-and-execute (90% cost reduction)
- **[Extraction Patterns](references/extraction-patterns.md)** - Deterministic field extraction
  - Schema-based extraction, null handling, no hallucinations
- **[Reasoning Patterns (Hidden CoT)](references/reasoning-patterns.md)** - Internal reasoning without visible output
  - Hidden reasoning, final answer only, classification workflows
  - **Extended Thinking API** (Claude 4+): budget management, think tool, multishot patterns
- **[Additional Patterns](references/additional-patterns.md)** - Extended prompt engineering techniques
  - Advanced patterns, edge cases, optimization strategies

---

## Navigation: Templates

Templates are copy-paste ready and organized by complexity:

### Quick Templates

- **[Quick Template](assets/quick/template-quick.md)** - Fast, minimal prompt structure

### Standard Templates

- **[Standard Template](assets/standard/template-standard.md)** - Production-grade operational prompt
- **[Agent Template](assets/standard/template-agent.md)** - Tool-using agent with planning
- **[RAG Template](assets/standard/template-rag.md)** - Retrieval-augmented generation
- **[Chain-of-Thought Template](assets/standard/template-cot.md)** - Hidden reasoning pattern
- **[JSON Extractor Template](assets/standard/template-json-extractor.md)** - Deterministic field extraction
- **[Prompt Evaluation Template](assets/eval/prompt-eval-template.md)** - Regression tests, A/B testing, rollout gates

---

## External Resources

External references are listed in [data/sources.json](data/sources.json):

- Official documentation (OpenAI, Anthropic, Google)
- LLM frameworks (LangChain, LlamaIndex)
- Vector databases (Pinecone, Weaviate, FAISS)
- Evaluation tools (OpenAI Evals, HELM)
- Safety guides and standards
- RAG and retrieval resources

---

## Freshness Rule (2026)

When asked for “latest” prompting recommendations, prefer provider docs and standards from `data/sources.json`. If web search is unavailable, state the constraint and avoid overconfident “current best” claims.

---

## Related Skills

This skill provides foundational prompt engineering patterns. For specialized implementations:

**AI/LLM Skills**:

- [AI Agents Development](../ai-agents/SKILL.md) - Production agent patterns, MCP integration, orchestration
- [AI LLM Engineering](../ai-llm/SKILL.md) - LLM application architecture and deployment
- [AI LLM RAG Engineering](../ai-rag/SKILL.md) - Advanced RAG pipelines and chunking strategies
- [AI LLM Search & Retrieval](../ai-rag/SKILL.md) - Search optimization, hybrid retrieval, reranking
- [AI LLM Development](../ai-llm/SKILL.md) - Fine-tuning, evaluation, dataset creation

**Software Development Skills**:

- [Software Architecture Design](../software-architecture-design/SKILL.md) - System design patterns
- [Software Backend](../software-backend/SKILL.md) - Backend implementation
- [Foundation API Design](../dev-api-design/SKILL.md) - API design and contracts

---

## Usage Notes

**For Claude Code**:

- Reference this skill when building prompts for agents, commands, or integrations
- Use Quick Reference table for fast pattern lookup
- Follow Decision Tree to select appropriate pattern
- Validate outputs with Quality Checklists before deployment
- Use templates as starting points, customize for specific use cases

**For Codex CLI**:

- Use the same patterns and templates; adapt tool-use wording to the local tool interface
- For long-horizon tasks, track progress explicitly (a step list/plan) and update it as work completes
- Run independent reads/searches in parallel when the environment supports it; keep writes/edits serialized
- **AGENTS.md Integration**: Place project-specific prompt guidance in AGENTS.md files at global (~/.codex/AGENTS.md), project-level (./AGENTS.md), or subdirectory scope for layered instructions
- **Reasoning Effort**: Use `medium` for interactive coding (default), `high`/`xhigh` for complex autonomous multi-hour tasks