--- name: prompt-engeneering description: Universal prompt engineering techniques for any LLM. Use when crafting, optimizing, or reviewing prompts for AI models. Triggers on requests like "improve this prompt", "write a system prompt", "optimize my instructions", "help me prompt engineer", "audit this prompt", "review my prompt", or when building agentic systems that need structured prompts. --- # Prompt Engineering Universal techniques for crafting effective prompts across any LLM. ## Core Principles ### 1. Structure with XML Tags Use XML tags to create clear, parseable prompts: ```xml Background information here 1. First step 2. Second step Sample inputs/outputs Expected structure ``` **Benefits:** - **Clarity**: Separates context, instructions, and examples - **Accuracy**: Prevents model from mixing up sections - **Flexibility**: Easy to modify individual parts - **Parseability**: Enables structured output extraction **Best practices:** - Use consistent tag names throughout (``, not sometimes ``) - Reference tags explicitly: "Using the data in `` tags..." - Nest tags for hierarchy: `...` - Combine with other techniques: `` for chain-of-thought, `` for final output ### 2. Control Output Shape Specify explicit constraints on length, format, and structure: ```xml - Default: 3-6 sentences or ≤5 bullets - Simple yes/no questions: ≤2 sentences - Complex multi-step tasks: - 1 short overview paragraph - ≤5 bullets: What changed, Where, Risks, Next steps, Open questions - Use Markdown with headers, bullets, tables when helpful - Avoid long narrative paragraphs; prefer compact structure ``` ### 3. Prevent Scope Drift Explicitly constrain what the model should NOT do: ```xml - Implement EXACTLY and ONLY what is requested - No extra features, components, or embellishments - If ambiguous, choose the simplest valid interpretation - Do NOT invent values, make assumptions, or add unrequested elements ``` ### 4. Handle Ambiguity Explicitly Prevent hallucinations and overconfidence: ```xml - If the question is ambiguous: - Ask 1-3 precise clarifying questions, OR - Present 2-3 plausible interpretations with labeled assumptions - When facts may have changed: answer in general terms, state uncertainty - Never fabricate exact figures or references when uncertain - Prefer "Based on the provided context..." over absolute claims ``` ### 5. Long-Context Grounding For inputs >10k tokens, add re-grounding instructions: ```xml - First, produce a short internal outline of key sections relevant to the request - Re-state user constraints explicitly before answering - Anchor claims to sections ("In the 'Data Retention' section...") - Quote or paraphrase fine details (dates, thresholds, clauses) ``` ## Agentic Prompts ### Tool Usage Rules ```xml - Prefer tools over internal knowledge for: - Fresh or user-specific data (tickets, orders, configs) - Specific IDs, URLs, or document references - Parallelize independent reads when possible - After write operations, restate: what changed, where, any validation performed ``` ### User Updates ```xml - Send brief updates (1-2 sentences) only when: - Starting a new major phase - Discovering something that changes the plan - Avoid narrating routine operations - Each update must include a concrete outcome ("Found X", "Updated Y") - Do not expand scope beyond what was asked ``` ### Self-Check for High-Risk Outputs ```xml Before finalizing answers in sensitive contexts (legal, financial, safety): - Re-scan for unstated assumptions - Check for ungrounded numbers or claims - Soften overly strong language ("always", "guaranteed") - Explicitly state assumptions ``` ## Structured Extraction For data extraction tasks, always provide a schema: ```xml Extract data into this exact schema (no extra fields): { "field_name": "string", "optional_field": "string | null", "numeric_field": "number | null" } - If a field is not present in source, set to null (don't guess) - Re-scan source for missed fields before returning ``` ## Web Research Prompts ```xml - Browse the web for: time-sensitive topics, recommendations, navigational queries, ambiguous terms - Include citations after paragraphs with web-derived claims - Use multiple sources for key claims; prioritize primary sources - Research until additional searching won't materially change the answer - Structure output with Markdown: headers, bullets, tables for comparisons ``` ## Example: Before/After **Without structure:** ``` You're a financial analyst. Generate a Q2 report for investors. Include Revenue, Margins, Cash Flow. Use this data: {{DATA}}. Make it professional and concise. ``` **With structure:** ```xml You're a financial analyst at AcmeCorp generating a Q2 report for investors. AcmeCorp is a B2B SaaS company. Investors value transparency and actionable insights. {{DATA}} 1. Include sections: Revenue Growth, Profit Margins, Cash Flow 2. Highlight strengths and areas for improvement 3. Use concise, professional tone - Use bullet points with metrics and YoY changes - Include "Action:" items for areas needing improvement - End with 2-3 bullet Outlook section ``` ## Prompt Migration Checklist When adapting prompts across models or versions: 1. **Switch model, keep prompt identical** — isolate the variable 2. **Pin reasoning/thinking depth** to match prior model's profile 3. **Run evals** — if results are good, ship 4. **If regressions, tune prompt** — adjust verbosity/format/scope constraints 5. **Re-eval after each small change** — one change at a time ## Quick Reference | Technique | Tag Pattern | Use Case | |-----------|-------------|----------| | Separate sections | ``, ``, `` | Any complex prompt | | Control length | `` with word/bullet limits | Prevent verbosity | | Prevent drift | `` with explicit "do NOT" | Feature creep | | Handle uncertainty | `` | Factual queries | | Chain of thought | ``, `` | Reasoning tasks | | Extraction | `` with JSON structure | Data parsing | | Research | `` | Web-enabled agents | | Self-check | `` | High-risk domains | | Tool usage | `` | Agentic systems | | Eagerness control | ``, `` | Agent autonomy | | Persona | `` + behavioral constraints | Tone & style | ## Prompting Techniques Catalog Comprehensive catalog of prompting techniques. Full details, examples, and academic references in [references/prompting-techniques.md](references/prompting-techniques.md). | Technique | Use Case | |-----------|----------| | **Zero-Shot Prompting** | Direct task execution without examples; classification, translation, summarization | | **Few-Shot Prompting** | In-context learning via exemplars; format control, label calibration, style matching | | **Chain-of-Thought (CoT)** | Step-by-step reasoning; arithmetic, logic, commonsense reasoning tasks | | **Meta Prompting** | LLM as orchestrator delegating to specialized expert prompts; complex multi-domain tasks | | **Self-Consistency** | Sample multiple CoT paths, pick majority answer; boost accuracy on math & reasoning | | **Generated Knowledge** | Generate relevant knowledge first, then answer; commonsense & factual QA | | **Prompt Chaining** | Break complex tasks into sequential subtasks; document analysis, multi-step workflows | | **Tree of Thoughts (ToT)** | Explore multiple reasoning branches with lookahead/backtracking; planning, puzzles | | **RAG** | Retrieve external documents before generating; knowledge-intensive tasks, fresh data | | **ART (Auto Reasoning + Tools)** | Auto-select and orchestrate tools with CoT; tasks requiring calculation, search, APIs | | **APE (Auto Prompt Engineer)** | LLM generates and scores candidate prompts; prompt optimization at scale | | **Active-Prompt** | Identify uncertain examples, annotate selectively for CoT; adaptive few-shot | | **Directional Stimulus** | Add a hint/keyword to guide generation direction; summarization, dialogue | | **PAL (Program-Aided LM)** | Generate code instead of text for reasoning; math, data manipulation, symbolic tasks | | **ReAct** | Interleave reasoning traces with tool actions; search, QA, decision-making agents | | **Reflexion** | Agent self-reflects on failures with verbal feedback; iterative improvement, debugging | | **Multimodal CoT** | Two-stage: rationale generation then answer with text+image; visual reasoning tasks | | **Graph Prompting** | Structured graph-based prompts; node classification, relation extraction, graph tasks | ### Prompting Fundamentals LLM settings, prompt elements, formatting, and practical examples — see [references/prompting-introduction.md](references/prompting-introduction.md). Covers: - **LLM Settings** — temperature, top-p, max length, stop sequences, frequency/presence penalties - **Prompt Elements** — instruction, context, input data, output indicator - **Design Tips** — start simple, be specific, avoid impreciseness, say what TO do (not what NOT to do) - **Task Examples** — summarization, extraction, QA, classification, conversation, code generation, reasoning ### Risks & Misuses Adversarial attacks, factuality issues, and bias mitigation — see [references/prompting-risks.md](references/prompting-risks.md). Covers: - **Adversarial Prompting** — prompt injection, prompt leaking, jailbreaking (DAN, Waluigi Effect), defense tactics - **Factuality** — ground truth grounding, calibrated confidence, admit-ignorance patterns - **Biases** — exemplar distribution skew, exemplar ordering effects, balanced few-shot design ## Prompt Audit / Review When asked to audit, review, or improve a prompt, follow this workflow. Full checklist with per-check references: [prompt-audit-checklist.md](references/prompt-audit-checklist.md). ### Workflow 1. **Read the prompt fully** — identify its purpose, target model, and deployment context (interactive chat, agentic system, batch pipeline, RAG-augmented) 2. **Walk 8 dimensions** — check each, note issues with severity (Critical / Warning / Suggestion): | # | Dimension | What to Check | |---|-----------|---------------| | 1 | **Clarity & Specificity** | Task definition, success criteria, audience, output format, conflicting constraints | | 2 | **Structure & Formatting** | Section separation (XML tags), prompt smells (monolithic, mixed layers, negative bias) | | 3 | **Safety & Security** | Control/data separation, secrets in prompt, injection resilience, tool permissions | | 4 | **Hallucination & Factuality** | Role framing, grounding, citation-without-sources, uncertainty handling | | 5 | **Context Management** | Info placement (not buried in middle), context size, RAG doc count, re-grounding | | 6 | **Maintainability & Debt** | Hardcoded values, regenerated logic, model pinning, testability | | 7 | **Model-Specific Fit** | Model-specific params and gotchas (see Model-Specific Guides below) | | 8 | **Evaluation Readiness** | Eval criteria, adversarial test cases, schema enforcement, monitoring | 3. **Produce a report** — issues table (dimension, check, severity, issue, fix) + rewritten prompt or targeted fix suggestions. Use the report template from the checklist reference. 4. **For each issue**, cite the relevant reference file so the user can dive deeper. ### Quick Decision: Which Dimensions to Prioritize - **User-facing chatbot** → prioritize Safety (#3), Hallucination (#4), Clarity (#1) - **Agentic system with tools** → prioritize Safety (#3), Context (#5), Maintainability (#6) - **Batch/pipeline** → prioritize Structure (#2), Evaluation (#8), Maintainability (#6) - **RAG-augmented** → prioritize Context (#5), Safety (#3), Hallucination (#4) ## Common Mistakes & Anti-Patterns Three complementary layers — use the one matching your need: **Deep-dives by category** — root causes, mechanisms, prevention checklists (from "The Architecture of Instruction", 2026): | Mistake Category | Key Issues | Reference | |-----------------|------------|-----------| | **Hallucinations & Logic** | Ambiguity-induced confabulation, automation bias, overloaded prompts, logical failures in verification tasks, no role framing | [mistakes-hallucinations.md](references/mistakes-hallucinations.md) | | **Structural Fragility** | Formatting sensitivity (up to 76pp variance), reproducibility crisis, prompt smells catalog (6 anti-patterns), deliberation ladder | [mistakes-structure.md](references/mistakes-structure.md) | | **Context Rot** | "Lost in the middle" U-shaped attention, RAG over-retrieval, naive data loading, context engineering shift | [mistakes-context.md](references/mistakes-context.md) | | **Prompt Debt** | Token tax of regenerative code, debt taxonomy (prompt/hyperparameter/framework/cost), multi-agent solutions, automated repair | [mistakes-debt.md](references/mistakes-debt.md) | | **Security** | Direct/indirect injection, jailbreaking, system prompt leakage (OWASP LLM07:2025), RAG poisoning, multimodal injection, adversarial suffixes | [mistakes-security.md](references/mistakes-security.md) | **Quick reference** — 18-category taxonomy with MRPs, risk scores, case studies, action items: [failure-taxonomy.md](references/failure-taxonomy.md). Start here for an overview or to prioritize which categories to address first. Covers: control-plane vs data-plane model, heuristic risk scoring, real-world incidents (EchoLeak CVE-2025-32711, Mata v. Avianca, Samsung shadow AI). **How to measure & test** — eval metrics, CI gating, red-teaming, tooling: [evaluation-redteaming.md](references/evaluation-redteaming.md). Covers: TruthfulQA, FActScore, SelfCheckGPT, PromptBench, AILuminate, LLM-as-judge pitfalls, guardrail libraries, open research questions. ## Model-Specific Guides Each model family has unique parameters, gotchas, and patterns. Consult the reference for your target model: - **[Claude Family](references/claude-family-prompting.md)** — Opus/Sonnet 4.6: adaptive thinking (`effort` param), prefill deprecation (use Structured Outputs), tool overtriggering fix, prompt caching, citations, context engineering, agentic subagent patterns, vision, migration from 4.5 - **[GPT-5 Family](references/gpt5-family-prompting.md)** — GPT-5/5.1/5.2: `reasoning_effort` param (defaults vary per version), `verbosity` API control, named tools (`apply_patch`), agentic eagerness templates, compaction API, instruction conflict sensitivity, migration paths - **[Gemini 3 Family](references/gemini3-family-prompting.md)** — Gemini 2.5/3/3.1: temperature MUST be 1.0, `thinking_budget` vs `thinking_level`, constraint placement (end of prompt), persona priority, function calling, structured output, multimodal, image generation - **[GPT-5.2 Specifics](references/gpt5-prompting-guide.md)** — Compaction API code examples, web research agent prompt, full XML specification blocks