---
name: gpt-5-4-prompting
description: Apply when creating or editing prompts targeting GPT-5.4. Covers output contracts, scope discipline, reasoning_effort tuning, long-context handling, ambiguity management, tool persistence, completeness and verification contracts, structured extraction, research mode with citation discipline, personality controls, coding autonomy, compaction awareness, and migration from older GPT models.
---

# GPT-5.4 Prompt Writing Guidelines

## When to Use

- Creating or editing system prompts targeting GPT-5.4
- Writing prompts for long-context inputs (10k+ tokens)
- Writing agentic prompts with multi-step tool workflows
- Structuring extraction prompts for documents, PDFs, tables, and emails
- Managing verbosity, scope, and personality in GPT-5.4 outputs
- Writing tool descriptions and tool-calling instructions for GPT-5.4
- Writing research and synthesis agent prompts
- Migrating prompts from GPT-4o, GPT-4.1, GPT-5, GPT-5.1, GPT-5.2, or GPT-5.3-Codex

## Overview

GPT-5.4 is OpenAI's current frontier model. Compared to GPT-5.2, it delivers stronger personality and tone adherence (less drift on long answers), more disciplined multi-step execution, better long-context coherence, and more accurate batched/parallel tool calls. It remains prompt-sensitive and responds well to structured constraints and explicit output specifications.

Key behavioral characteristics to design prompts around:

- **Stronger Instruction Adherence**: More reliable with explicit output contracts; less drift from user intent
- **Improved Long-Context Coherence**: Stays coherent over longer, multi-turn conversations; fewer breakdowns as sessions grow
- **Agentic Discipline**: More reliably completes multi-step work and retries; still benefits from explicit persistence rules
- **Tone and Style Stability**: Holds persona and register across long outputs without drifting
- **Conservative Grounding Bias**: Favors correctness and explicit reasoning; research mode needs citation discipline to stay grounded
- **Token Efficiency**: More concise by default, though still responsive to length specs

## Output Contract

GPT-5.4 defaults to concise output and respects section/format contracts well. Use an explicit output contract rather than loose "be concise" instructions:

```
- Return exactly the sections requested, in the requested order.
- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.
- Apply length limits only to the section they are intended for.
- If a format is required (JSON, Markdown, SQL, XML), output only that format.
```

For quantitative length control, combine the contract with explicit limits per section:

```
- Default: 3-6 sentences or <=5 bullets for typical answers.
- Simple "yes/no + short explanation" questions: <=2 sentences.
- Complex multi-step or multi-file tasks:
  - 1 short overview paragraph
  - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
- Prefer compact bullets and short sections over long narrative paragraphs.
- Do not rephrase the user's request unless it changes semantics.
```

Quantitative constraints ("3-6 sentences") outperform qualitative ones ("be concise").

## Scope and Design Constraints

GPT-5.4 is stronger at structured code but may still produce more than minimal specs require. Use scope constraints adapted to the task domain to prevent over-engineering.

For code generation and UI tasks:

```
- Explore existing design systems deeply.
- Implement EXACTLY and ONLY what the user requests.
- No extra features, no added components, no UX embellishments.
- Style aligned to the design system at hand.
- Do NOT invent colors, shadows, tokens, animations, or new UI elements, unless requested.
- If any instruction is ambiguous, choose the simplest valid interpretation.
```

For extraction and data tasks, scope is schema adherence:

```
- Follow this schema exactly (no extra fields).
- If a field is not present in the source, set it to null rather than guessing.
```

For research and synthesis tasks, scope is query coverage -- cover plausible user intents rather than expanding into tangential topics.

## Long-Context Handling

GPT-5.4 remains more coherent over long sessions, but inputs exceeding ~10k tokens still benefit from explicit grounding instructions:

```
- For inputs longer than ~10k tokens (multi-chapter docs, long threads, multiple PDFs):
  - First, produce a short internal outline of key sections relevant to the request.
  - Re-state the user's constraints explicitly before answering.
  - In your answer, anchor claims to sections rather than speaking generically.
  - If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
```

## Ambiguity, Uncertainty, and Follow-Through

GPT-5.4 handles ambiguity better by default. Set explicit follow-through policy to avoid unnecessary clarifying questions:

```
- If the request is clear and low-risk, proceed without asking permission.
- Ask before: irreversible actions, external side effects, or operations requiring sensitive info not yet provided.
- Preserve earlier instructions that do not conflict with newer ones.
```

For genuine ambiguity and hallucination mitigation:

```
- If the question is ambiguous or underspecified, explicitly call this out and:
  - Ask up to 1-3 precise clarifying questions, OR
  - Present 2-3 plausible interpretations with clearly labeled assumptions.
- When external facts may have changed recently and no tools are available:
  - Answer in general terms and state that details may have changed.
- Never fabricate exact figures, line numbers, or external references when uncertain.
- When unsure, prefer language like "Based on the provided context..." instead of absolute claims.
```

For high-stakes domains, add a self-check step:

```
Before finalizing an answer in legal, financial, compliance, or safety-sensitive contexts:
- Briefly re-scan your own answer for:
  - Unstated assumptions,
  - Specific numbers or claims not grounded in context,
  - Overly strong language ("always," "guaranteed," etc.).
- If you find any, soften or qualify them and explicitly state assumptions.
```

## Reasoning Effort Tuning

GPT-5.4 supports `reasoning_effort` (`none` | `minimal` | `low` | `medium` | `high` | `xhigh`). It defaults to `none` for fast, low-deliberation behavior.

**Engineer the prompt before bumping reasoning.** Before increasing `reasoning_effort`, first add:

- Completeness contract
- Verification loop
- Tool persistence rules

These often restore behavior without added latency/cost. Only bump reasoning when evals show a gap the prompt can't close.
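When you do set it explicitly, pin it per request at the API level. A minimal sketch, assuming the Responses API shape in the OpenAI Python SDK; the `gpt-5.4` model name and the effort values are taken from this guide, not a confirmed API surface:

```python
from openai import OpenAI

client = OpenAI()

# Pin reasoning_effort explicitly rather than relying on the default,
# so migrations preserve the prior latency/depth profile.
response = client.responses.create(
    model="gpt-5.4",              # model name as used in this guide (assumption)
    reasoning={"effort": "low"},  # none | minimal | low | medium | high | xhigh
    instructions="Return exactly the sections requested, in the requested order.",
    input="Summarize the attached incident report in <=5 bullets.",
)
print(response.output_text)
```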
**Recommended defaults by task:**

| Task profile | reasoning_effort |
|---|---|
| Fast, cost-sensitive; no thinking needed | `none` |
| Latency-sensitive with instruction complexity | `low` |
| Stronger reasoning required; measure perf gain | `medium` or `high` |
| Long-horizon agents, research-heavy workflows | `medium` or `high` |
| Hardest analytical work where intelligence >> speed/cost | `xhigh` (reserve; do not default) |

### Migration Mapping

When migrating prompts across models, use the appropriate `reasoning_effort` to preserve behavior:

| Current Model | Target | reasoning_effort | Notes |
|---|---|---|---|
| GPT-4o | GPT-5.4 | `none` | Treat as "fast/low-deliberation" by default |
| GPT-4.1 | GPT-5.4 | `none` | Same as GPT-4o for snappy behavior |
| GPT-5 / 5.1 / 5.2 | GPT-5.4 | match current | Preserve prior setting |
| GPT-5.3-Codex | GPT-5.4 | match current | Preserves coding behavior |

## Tool Persistence

For agentic prompts, instruct the model not to stop early and to retry on empty or partial results:

```
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early when another tool call is likely to materially improve correctness or completeness.
- Keep calling tools until: (1) the task is complete, and (2) verification passes.
- If a tool returns empty or partial results, retry with a different strategy.
```

## Completeness Contract

For lists, batches, and paginated work, require an internal checklist and explicit `[blocked]` markers:

```
- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].
- Keep an internal checklist of required deliverables.
- For lists, batches, or paginated results: determine expected scope when possible, track processed items or pages, confirm coverage before finalizing.
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
```

## Verification Loop

For high-stakes outputs, require a pre-finalization check:

```
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by provided context or tool outputs?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
```

## Empty Result Recovery

Prevent premature "no results" conclusions:

```
If a lookup returns empty, partial, or suspiciously narrow results:
- Do not immediately conclude that no results exist.
- Try at least one or two fallback strategies, such as:
  - alternate query wording,
  - broader filters,
  - a prerequisite lookup,
  - or an alternate source or tool.
- Only then report that no results were found, along with what you tried.
```

## Agentic Steerability

Constrain update verbosity and enforce scope discipline:

```
- Send brief updates (1-2 sentences) only when:
  - You start a new major phase of work, or
  - You discover something that changes the plan.
- Avoid narrating routine tool calls ("reading file...", "running tests...").
- Each update must include at least one concrete outcome ("Found X", "Confirmed Y", "Updated Z").
- Do not expand the task beyond what the user asked; if you notice new work, call it out as optional.
```

## Tool-Calling Prompt Patterns

### Usage Rules

```
- Prefer tools over internal knowledge whenever:
  - You need fresh or user-specific data (tickets, orders, configs, logs).
  - You reference specific IDs, URLs, or document titles.
- Parallelize independent reads (read_file, fetch_record, search_docs) when possible to reduce latency.
- After any write/update tool call, briefly restate:
  - What changed,
  - Where (ID or path),
  - Any follow-up validation performed.
```

### Tool Description Guidance

- **Describe tools crisply** in 1-2 sentences. Verbose descriptions waste context without improving selection accuracy.
- **Encourage parallelism** for codebases, vector stores, and multi-entity operations. Mark tools as independent when they are.
- **Require verification** for high-impact operations (orders, billing, infrastructure). Add the verification step in the tool description or system prompt.
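A concrete instance of this guidance, sketched as a plain Python dict in a generic function-tool schema. The `fetch_record` tool, its fields, and the envelope are illustrative assumptions, not a confirmed API shape -- adapt to your SDK's tool format:

```python
# Hypothetical read-only tool: crisp two-sentence description, explicitly
# marked independent/parallel-safe, per the guidance above.
fetch_record_tool = {
    "type": "function",
    "name": "fetch_record",
    "description": (
        "Fetch a single customer record by exact ID. "
        "Read-only and independent; safe to call in parallel with other reads."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "record_id": {
                "type": "string",
                "description": "Exact record ID, e.g. 'ORD-1042'.",
            }
        },
        "required": ["record_id"],
    },
}
```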
## Structured Output (JSON / SQL)

For strict format outputs:

```
- Output only the requested format.
- Do not add prose or markdown fences unless requested.
- Validate that parentheses and brackets are balanced.
- Do not invent tables or fields.
- If required schema information is missing, ask for it or return an explicit error object.
```
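Prompt rules like these can be reinforced with schema enforcement at the API level. A minimal sketch against the Chat Completions `response_format` parameter; the `gpt-5.4` model name and the schema itself are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

# Enforce "output only the requested format" mechanically: a strict JSON
# schema leaves no room for prose or markdown fences.
response = client.chat.completions.create(
    model="gpt-5.4",  # model name as used in this guide (assumption)
    messages=[{"role": "user", "content": "Extract the party and effective date."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "contract_fields",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "party_name": {"type": "string"},
                    "effective_date": {"type": ["string", "null"]},
                },
                "required": ["party_name", "effective_date"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)
```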
## Structured Extraction

For extracting structured data from tables, PDFs, emails, and documents:

```
You will extract structured data from tables/PDFs/emails into JSON.
- Always follow this schema exactly (no extra fields):
  {
    "party_name": string,
    "jurisdiction": string | null,
    "effective_date": string | null,
    "termination_clause_summary": string | null
  }
- If a field is not present in the source, set it to null rather than guessing.
- Before returning, quickly re-scan the source for any missed fields and correct omissions.
```

For multi-table/multi-file extraction, serialize per-document results separately and include a stable ID (filename, contract title, page range).
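One conforming per-document result, sketched as a Python dict keyed by a stable ID as suggested above. The filename and field values are invented for illustration; `None` serializes to JSON `null`:

```python
# Per-document results under stable IDs. Absent fields become null -- never guessed.
results = {
    "acme_msa_2024.pdf": {  # stable ID: the source filename (hypothetical)
        "party_name": "Acme Corp",
        "jurisdiction": None,  # not stated in the source, so null
        "effective_date": "2024-03-01",
        "termination_clause_summary": "Either party may terminate on 30 days' written notice.",
    }
}
```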
### Bounding-Box Extraction

For layout-aware extraction with coordinates:

```
- Use the specified coordinate format exactly (e.g., [x1,y1,x2,y2] normalized 0..1).
- For each bbox, include: page, label, text snippet, confidence.
- Add a vertical-drift sanity check: ensure bboxes align with the line of text (not shifted up or down).
- If the layout is dense, process page by page and do a second pass for missed items.
```
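A single extracted item in that format, sketched as a Python dict. Field names follow the bullets above; the values and the drift threshold are invented:

```python
# One layout-aware extraction item: normalized [x1, y1, x2, y2] plus
# page, label, text snippet, and confidence, as specified above.
item = {
    "page": 3,
    "label": "effective_date",
    "text": "Effective Date: March 1, 2024",
    "bbox": [0.12, 0.41, 0.58, 0.44],  # x1, y1, x2, y2 in 0..1
    "confidence": 0.93,
}

# Vertical-drift sanity check: the box should span a thin band around a
# single text line, not drift across lines (threshold is illustrative).
assert 0 < item["bbox"][3] - item["bbox"][1] < 0.05
```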
## Personality and Writing Controls

GPT-5.4 holds persona well across long outputs. Separate persistent personality from per-response controls:

```
- Persona: [role]
- Channel: [chat, email, docs, ...]
- Emotional register: [desired register] + "not [register to avoid]"
- Formatting: [formatting conventions]
- Length: [default length]
- Default follow-through: if the request is clear and low-risk, proceed without asking permission.
```

For polished professional writing:

```
- Write in a polished, professional memo style.
- Use exact names, dates, entities, and authorities when supported by the record.
- Follow domain-specific structure if one is requested.
- Prefer precise conclusions over generic hedging.
- When uncertainty is real, tie it to the exact missing fact or conflicting source.
- Synthesize across documents rather than summarizing each one independently.
```

## Coding Autonomy

For coding tasks, require end-to-end persistence within the turn:

```
- Persist until the task is fully handled end-to-end within the current turn whenever feasible.
- Do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes.
- Stop only if the user explicitly pauses or redirects.
```

Pair with a terse update spec: 1 sentence on outcome + 1 sentence on next step; no routine tool narration.

## Research Mode

For multi-source research, use a 3-pass structure:

```
- Do research in 3 passes:
  1) Plan: list 3-6 sub-questions to answer.
  2) Retrieve: search each sub-question and follow 1-2 second-order leads.
  3) Synthesize: resolve contradictions and write the final answer with citations.
- Stop only when more searching is unlikely to change the conclusion.
```

### Citation Discipline

Lock citations to retrieved sources to prevent fabrication:

```
- Only cite sources retrieved in the current workflow.
- Never fabricate citations, URLs, IDs, or quote spans.
- Use exactly the citation format required by the host application.
- Attach citations to the specific claims they support, not only at the end.
```

### Web Research Template

```
- Act as an expert research assistant; default to comprehensive, well-structured answers.
- Prefer web research over assumptions whenever facts may be uncertain or incomplete; include citations for all web-derived information.
- Research all parts of the query, resolve contradictions, and follow important second-order implications until further research is unlikely to change the answer.
- Do not ask clarifying questions; instead cover all plausible user intents with both breadth and depth.
- Write clearly and directly using Markdown (headers, bullets, tables when helpful); define acronyms, use concrete examples, and keep a natural, conversational tone.
```

### Web Research Agent Reference Prompt

The official guide provides this structure for a comprehensive web research agent:

- **CORE MISSION**: Answer fully and helpfully with enough evidence for skeptical readers. Never invent facts. Go one step further by adding high-value adjacent material.
- **PERSONA**: Be the world's greatest research assistant. Engage warmly while avoiding ungrounded flattery. Default to a natural, conversational tone.
- **FACTUALITY AND ACCURACY**: Browse the web and include citations for all non-creative queries. Always browse for latest/current topics, time-sensitive info, recommendations, navigational queries, or ambiguous terms.
- **CITATIONS**: Include citations after paragraphs containing non-obvious web-derived claims. Use multiple sources for key claims, prioritizing primary sources.
- **HOW YOU RESEARCH**: Conduct deep research. Use parallel searches when helpful. Research until additional searching is unlikely to materially change the answer.
- **WRITING GUIDELINES**: Be direct and comprehensive. Use simple language. Use readable Markdown formatting. Do not add potential follow-up questions unless explicitly asked.
- **REQUIRED VALUE-ADD**: Provide concrete examples, specific numbers/dates, and "how it works" detail. Include relevant, well-researched material that makes answers more useful.
- **HANDLING AMBIGUITY**: Never ask clarifying questions unless explicitly requested. State your best-guess interpretation and comprehensively cover the most likely intent(s).
- **IF YOU CANNOT FULLY COMPLY**: Don't lead with a blunt refusal. First deliver what you can, then clearly state limitations.

## Mid-Conversation Task Updates

When changing scope mid-conversation, scope the update explicitly so earlier instructions survive:

```
For the next response only: [specific change]
All earlier instructions still apply unless they conflict with this update.
```

## Compaction Awareness

GPT-5.4 is more coherent over long sessions, but compaction still helps tool-heavy workflows. When writing prompts for systems that use compaction:

- **Keep prompts functionally identical when resuming** -- do not change system prompts between compacted sessions, as this causes behavior drift
- **Design prompts to be self-contained** -- compacted items are opaque; the system prompt is the only stable anchor across compaction boundaries
- **Compact after major milestones** (not every turn) -- frequent compaction loses nuance

## Prompt Migration Guide

When migrating prompts to GPT-5.4, follow these steps to isolate changes and preserve behavior:

1. **Switch models without prompt changes** -- test the model alone to isolate model-vs-prompt effects.
2. **Pin `reasoning_effort`** -- explicitly match the prior model's latency/depth profile (see migration mapping above).
3. **Run evals for baseline** -- measure post-switch performance before touching prompts.
4. **Add output contract + verbosity controls** if outputs are verbose or unfocused.
5. **Add tool persistence + completeness contract** for agent prompts.
6. **Add verification loop** for high-stakes outputs.
7. **Add citation rules + empty-result recovery** for research prompts.
8. **Tune `reasoning_effort` only after prompt engineering is exhausted.**

Change one thing at a time and re-run evals.

### Prompt-Specific Migration Notes

**From GPT-4o / GPT-4.1:**

- Remove defensive prompting (GPT-5.4 handles edge cases better by default)
- Add output contract if outputs are too long or too short
- Add scope constraints for code generation tasks

**From GPT-5 / GPT-5.1 / GPT-5.2:**

- Test without prompt changes first -- isolate model-vs-prompt effects
- Replace older persistence wording with this guide's updated templates where helpful
- Add tool persistence, completeness, and verification templates for agentic flows
- Long-context handling is still useful but less critical given improved coherence

**From GPT-5.3-Codex:**

- Preserve existing `reasoning_effort` to keep coding behavior stable
- Add the Coding Autonomy persistence block for end-to-end task completion

## Anti-Patterns

- **Asking clarifying questions when you can cover plausible intents** -- instruct the model to present interpretations instead
- **Expanding task scope beyond user request** -- implement only what was asked
- **Inventing exact figures, citations, or external references when uncertain** -- instruct to hedge, lock to retrieved sources, or use tools to verify
- **Rephrasing user requests unless semantics change** -- preserve the user's language
- **Narrating routine tool calls in agent updates** -- instruct to report only meaningful milestones
- **Creating extra UI/styling beyond design system specs** -- enforce scope constraints
- **Changing prompts during model migrations** -- test model alone first, then tune prompts
- **Over-prompting for default behavior** -- GPT-5.4's instruction adherence means less scaffolding is needed for basics
- **Verbose tool descriptions** -- keep to 1-2 sentences; more wastes context without improving selection
- **Bumping `reasoning_effort` before engineering the prompt** -- add completeness, verification, and tool persistence first
- **Defaulting to `xhigh` reasoning** -- reserve for long agentic tasks where intelligence outweighs speed/cost
- **Treating empty tool results as final** -- require fallback strategies first
- **Skipping prerequisites when the end state looks obvious** -- require dependency checks for multi-step flows

## Quality Checklist

- [ ] Output contract is set (sections, format, length limits per section)
- [ ] Scope constraints are defined for code and design tasks
- [ ] Long-context grounding is added for inputs over 10k tokens
- [ ] Default follow-through policy is set for agentic prompts
- [ ] Uncertainty handling is specified for the domain's risk level
- [ ] `reasoning_effort` is chosen deliberately; not defaulted to `xhigh`
- [ ] Tool persistence, completeness contract, and verification loop are set for agents
- [ ] Empty-result recovery is specified for tool-heavy flows
- [ ] Tool descriptions are crisp (1-2 sentences each)
- [ ] Extraction tasks include exact JSON schema with null handling
- [ ] Research prompts specify 3-pass structure and citation rules
- [ ] Personality and memo mode are set for customer-facing prose
- [ ] The Coding Autonomy persistence block is set for coding tasks
- [ ] Prompt has been tested without changes after model migration

## Reference

- Official GPT-5.4 Prompt Guidance: https://developers.openai.com/api/docs/guides/prompt-guidance