--- name: openai-responses description: | Build agentic AI with OpenAI Responses API - stateful conversations with preserved reasoning, built-in tools (Code Interpreter, File Search, Web Search), and MCP integration. Prevents 11 documented errors. Use when: building agents with persistent reasoning, using server-side tools, or migrating from Chat Completions/Assistants for better multi-turn performance. user-invocable: true --- # OpenAI Responses API **Status**: Production Ready **Last Updated**: 2026-01-21 **API Launch**: March 2025 **Dependencies**: openai@6.16.0 (Node.js) or fetch API (Cloudflare Workers) --- ## What Is the Responses API? OpenAI's unified interface for agentic applications, launched **March 2025**. Provides **stateful conversations** with **preserved reasoning state** across turns. **Key Innovation:** Unlike Chat Completions (reasoning discarded between turns), Responses **preserves the model's reasoning notebook**, improving performance by **5% on TAUBench** and enabling better multi-turn interactions. **vs Chat Completions:** | Feature | Chat Completions | Responses API | |---------|-----------------|---------------| | State | Manual history tracking | Automatic (conversation IDs) | | Reasoning | Dropped between turns | Preserved across turns (+5% TAUBench) | | Tools | Client-side round trips | Server-side hosted | | Output | Single message | Polymorphic (8 types) | | Cache | Baseline | **40-80% better utilization** | | MCP | Manual | Built-in | --- ## Quick Start ```bash npm install openai@6.16.0 ``` ```typescript import OpenAI from 'openai'; const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); const response = await openai.responses.create({ model: 'gpt-5', input: 'What are the 5 Ds of dodgeball?', }); console.log(response.output_text); ``` **Key differences from Chat Completions:** - Endpoint: `/v1/responses` (not `/v1/chat/completions`) - Parameter: `input` (not `messages`) - Role: `developer` (not `system`) - Output: `response.output_text` (not `choices[0].message.content`) --- ## When to Use Responses vs Chat Completions **Use Responses:** - Agentic applications (reasoning + actions) - Multi-turn conversations (preserved reasoning = +5% TAUBench) - Built-in tools (Code Interpreter, File Search, Web Search, MCP) - Background processing (60s standard, 10min extended timeout) **Use Chat Completions:** - Simple one-off generation - Fully stateless interactions - Legacy integrations --- ## Stateful Conversations **Automatic State Management** using conversation IDs: ```typescript // Create conversation const conv = await openai.conversations.create({ metadata: { user_id: 'user_123' }, }); // First turn const response1 = await openai.responses.create({ model: 'gpt-5', conversation: conv.id, input: 'What are the 5 Ds of dodgeball?', }); // Second turn - model remembers context + reasoning const response2 = await openai.responses.create({ model: 'gpt-5', conversation: conv.id, input: 'Tell me more about the first one', }); ``` **Benefits:** No manual history tracking, reasoning preserved, 40-80% better cache utilization **Conversation Limits:** 90-day expiration --- ## Built-in Tools (Server-Side) **Server-side hosted tools** eliminate backend round trips: | Tool | Purpose | Notes | |------|---------|-------| | `code_interpreter` | Execute Python code | Sandboxed, 30s timeout (use `background: true` for longer) | | `file_search` | RAG without vector stores | Max 512MB per file, supports PDF/Word/Markdown/HTML/code | | `web_search` | Real-time web information | Automatic source citations | | `image_generation` | DALL-E integration | DALL-E 3 default | | `mcp` | Connect external tools | OAuth supported, tokens NOT stored | **Usage:** ```typescript const response = await openai.responses.create({ model: 'gpt-5', input: 'Calculate mean of: 10, 20, 30, 40, 50', tools: [{ type: 'code_interpreter' }], }); ``` ### Web Search TypeScript Note **TypeScript Limitation**: The `web_search` tool's `external_web_access` option is missing from SDK types (as of v6.16.0). **Workaround**: ```typescript const response = await openai.responses.create({ model: 'gpt-5', input: 'Search for recent news', tools: [{ type: 'web_search', external_web_access: true, } as any], // ✅ Type assertion to suppress error }); ``` **Source**: [GitHub Issue #1716](https://github.com/openai/openai-node/issues/1716) --- ## MCP Server Integration Built-in support for **Model Context Protocol (MCP)** servers to connect external tools (Stripe, databases, custom APIs). ### User Approval Requirement **By default, explicit user approval is required** before any data is shared with a remote MCP server (security feature). **Handling Approval**: ```typescript const response = await openai.responses.create({ model: 'gpt-5', input: 'Get my Stripe balance', tools: [{ type: 'mcp', server_label: 'stripe', server_url: 'https://mcp.stripe.com', authorization: process.env.STRIPE_TOKEN, }], }); if (response.status === 'requires_approval') { // Show user: "This action requires sharing data with Stripe. Approve?" // After user approves, retry with approval token } ``` **Alternative**: Pre-approve MCP servers in OpenAI dashboard (users configure trusted servers via settings) **Source**: [Official MCP Guide](https://platform.openai.com/docs/guides/tools-connectors-mcp) ### Basic MCP Usage ```typescript const response = await openai.responses.create({ model: 'gpt-5', input: 'Roll 2d6 dice', tools: [{ type: 'mcp', server_label: 'dice', server_url: 'https://example.com/mcp', authorization: process.env.TOKEN, // ⚠️ NOT stored, required each request }], }); ``` **MCP Output Types:** - `mcp_list_tools` - Tools discovered on server - `mcp_call` - Tool invocation + result - `message` - Final response --- ## Reasoning Preservation **Key Innovation:** Model's internal reasoning state survives across turns (unlike Chat Completions which discards it). **Visual Analogy:** - Chat Completions: Model tears out scratchpad page before responding - Responses API: Scratchpad stays open for next turn **Performance:** +5% on TAUBench (GPT-5) purely from preserved reasoning **Reasoning Summaries** (free): ```typescript response.output.forEach(item => { if (item.type === 'reasoning') console.log(item.summary[0].text); if (item.type === 'message') console.log(item.content[0].text); }); ``` ### Important: Reasoning Traces Privacy **What You Get**: Reasoning summaries (not full internal traces) **What OpenAI Keeps**: Full chain-of-thought reasoning (proprietary, for security/privacy) For GPT-5-Thinking models: - OpenAI preserves reasoning **internally** in their backend - This preserved reasoning improves multi-turn performance (+5% TAUBench) - But developers only receive **summaries**, not the actual chain-of-thought - Full reasoning traces are not exposed (OpenAI's IP protection) **Source**: [Sean Goedecke Analysis](https://www.seangoedecke.com/responses-api/) --- ## Background Mode For long-running tasks, use `background: true`: ```typescript const response = await openai.responses.create({ model: 'gpt-5', input: 'Analyze 500-page document', background: true, tools: [{ type: 'file_search', file_ids: [fileId] }], }); // Poll for completion (check every 5s) const result = await openai.responses.retrieve(response.id); if (result.status === 'completed') console.log(result.output_text); ``` **Timeout Limits:** - Standard: 60 seconds - Background: 10 minutes ### Performance Considerations **Time-to-First-Token (TTFT) Latency:** Background mode currently has higher TTFT compared to synchronous responses. OpenAI is working to reduce this gap. **Recommendation:** - For user-facing real-time responses, use sync mode (lower latency) - For long-running async tasks, use background mode (latency acceptable) **Source**: [OpenAI Background Mode Docs](https://platform.openai.com/docs/guides/background) --- ## Data Retention and Privacy **Default Retention**: 30 days when `store: true` (default) **Zero Data Retention (ZDR)**: Organizations with ZDR automatically enforce `store: false` **Background Mode**: NOT ZDR compatible (stores data ~10 minutes for polling) **Timeline**: - September 26, 2025: OpenAI court-ordered retention ended - Current: 30-day default retention with `store: true` **Control Storage**: ```typescript // Disable storage (no retention) const response = await openai.responses.create({ model: 'gpt-5', input: 'Hello!', store: false, // ✅ No retention }); // ZDR organizations: store always treated as false const response = await openai.responses.create({ model: 'gpt-5', input: 'Hello!', store: true, // ⚠️ Ignored by OpenAI for ZDR orgs, treated as false }); ``` **ZDR Compliance**: - Avoid background mode (requires temporary storage) - Explicitly set `store: false` for clarity - Note: 60s timeout applies in sync mode **Source**: [OpenAI Data Controls](https://platform.openai.com/docs/guides/your-data) --- ## Polymorphic Outputs Returns **8 output types** instead of single message: | Type | Example | |------|---------| | `message` | Final answer, explanation | | `reasoning` | Step-by-step thought process (free!) | | `code_interpreter_call` | Python code + results | | `mcp_call` | Tool name, args, output | | `mcp_list_tools` | Tool definitions from MCP server | | `file_search_call` | Matched chunks, citations | | `web_search_call` | URLs, snippets | | `image_generation_call` | Image URL | **Processing:** ```typescript response.output.forEach(item => { if (item.type === 'reasoning') console.log(item.summary[0].text); if (item.type === 'web_search_call') console.log(item.results); if (item.type === 'message') console.log(item.content[0].text); }); // Or use helper for text-only console.log(response.output_text); ``` --- ## Migration from Chat Completions **Breaking Changes:** | Feature | Chat Completions | Responses API | |---------|-----------------|---------------| | Endpoint | `/v1/chat/completions` | `/v1/responses` | | Parameter | `messages` | `input` | | Role | `system` | `developer` | | Output | `choices[0].message.content` | `output_text` | | State | Manual array | Automatic (conversation ID) | | Streaming | `data: {"choices":[...]}` | SSE with 8 item types | **Example:** ```typescript // Before const response = await openai.chat.completions.create({ model: 'gpt-5', messages: [ { role: 'system', content: 'You are a helpful assistant.' }, { role: 'user', content: 'Hello!' }, ], }); console.log(response.choices[0].message.content); // After const response = await openai.responses.create({ model: 'gpt-5', input: [ { role: 'developer', content: 'You are a helpful assistant.' }, { role: 'user', content: 'Hello!' }, ], }); console.log(response.output_text); ``` --- ## Migration from Assistants API **CRITICAL: Assistants API Sunset Timeline** - **August 26, 2025**: Assistants API officially deprecated - **2025-2026**: OpenAI providing migration utilities - **August 26, 2026**: Assistants API sunset (stops working) **Migrate before August 26, 2026** to avoid breaking changes. **Source**: [Assistants API Sunset Announcement](https://community.openai.com/t/assistants-api-beta-deprecation-august-26-2026-sunset/1354666) **Key Breaking Changes:** | Assistants API | Responses API | |----------------|---------------| | Assistants (created via API) | Prompts (created in dashboard) | | Threads | Conversations (store items, not just messages) | | Runs (server-side lifecycle) | Responses (stateless calls) | | Run-Steps | Items (polymorphic outputs) | **Migration Example:** ```typescript // Before (Assistants API - deprecated) const assistant = await openai.beta.assistants.create({ model: 'gpt-4', instructions: 'You are helpful.', }); const thread = await openai.beta.threads.create(); const run = await openai.beta.threads.runs.create(thread.id, { assistant_id: assistant.id, }); // After (Responses API - current) const conversation = await openai.conversations.create({ metadata: { purpose: 'customer_support' }, }); const response = await openai.responses.create({ model: 'gpt-5', conversation: conversation.id, input: [ { role: 'developer', content: 'You are helpful.' }, { role: 'user', content: 'Hello!' }, ], }); ``` **Migration Guide**: [Official Assistants Migration Docs](https://platform.openai.com/docs/guides/migrate-to-responses) --- ## Known Issues Prevention This skill prevents **11** documented errors: **1. Session State Not Persisting** - Cause: Not using conversation IDs or using different IDs per turn - Fix: Create conversation once (`const conv = await openai.conversations.create()`), reuse `conv.id` for all turns **2. MCP Server Connection Failed** (`mcp_connection_error`) - Causes: Invalid URL, missing/expired auth token, server down - Fix: Verify URL is correct, test manually with `fetch()`, check token expiration **3. Code Interpreter Timeout** (`code_interpreter_timeout`) - Cause: Code runs longer than 30 seconds - Fix: Use `background: true` for extended timeout (up to 10 min) **4. Image Generation Rate Limit** (`rate_limit_error`) - Cause: Too many DALL-E requests - Fix: Implement exponential backoff retry (1s, 2s, 3s delays) **5. File Search Relevance Issues** - Cause: Vague queries return irrelevant results - Fix: Use specific queries ("pricing in Q4 2024" not "find pricing"), filter by `chunk.score > 0.7` **6. Cost Tracking Confusion** - Cause: Responses bills for input + output + tools + stored conversations (vs Chat Completions: input + output only) - Fix: Set `store: false` if not needed, monitor `response.usage.tool_tokens` **7. Conversation Not Found** (`invalid_request_error`) - Causes: ID typo, conversation deleted, or expired (90-day limit) - Fix: Verify exists with `openai.conversations.list()` before using **8. Tool Output Parsing Failed** - Cause: Accessing wrong output structure - Fix: Use `response.output_text` helper or iterate `response.output.forEach(item => ...)` checking `item.type` **9. Zod v4 Incompatibility with Structured Outputs** - **Error**: `Invalid schema for response_format 'name': schema must be a JSON Schema of 'type: "object"', got 'type: "string"'.` - **Source**: [GitHub Issue #1597](https://github.com/openai/openai-node/issues/1597) - **Why It Happens**: SDK's vendored `zod-to-json-schema` library doesn't support Zod v4 (missing `ZodFirstPartyTypeKind` export) - **Prevention**: Pin to Zod v3 (`"zod": "^3.23.8"`) or use custom `zodTextFormat` with `z.toJSONSchema({ target: "draft-7" })` ```typescript // Workaround: Pin to Zod v3 (recommended) { "dependencies": { "openai": "^6.16.0", "zod": "^3.23.8" // DO NOT upgrade to v4 yet } } ``` **10. Background Mode Web Search Missing Sources** - **Error**: `web_search_call` output items contain query but no sources/results - **Source**: [GitHub Issue #1676](https://github.com/openai/openai-node/issues/1676) - **Why It Happens**: When using `background: true` + `web_search` tool, OpenAI doesn't return sources in the response - **Prevention**: Use synchronous mode (`background: false`) when web search sources are needed ```typescript // ✅ Sources available in sync mode const response = await openai.responses.create({ model: 'gpt-5', input: 'Latest AI news?', background: false, // Required for sources tools: [{ type: 'web_search' }], }); ``` **11. Streaming Mode Missing output_text Helper** - **Error**: `finalResponse().output_text` is `undefined` in streaming mode - **Source**: [GitHub Issue #1662](https://github.com/openai/openai-node/issues/1662) - **Why It Happens**: `stream.finalResponse()` doesn't include `output_text` convenience field (only available in non-streaming responses) - **Prevention**: Listen for `output_text.done` event or manually extract from `output` items ```typescript // Workaround: Listen for event const stream = openai.responses.stream({ model: 'gpt-5', input: 'Hello!' }); let outputText = ''; for await (const event of stream) { if (event.type === 'output_text.done') { outputText = event.output_text; // ✅ Available in event } } ``` --- ## Critical Patterns **✅ Always:** - Use conversation IDs for multi-turn (40-80% better cache) - Handle all 8 output types in polymorphic responses - Use `background: true` for tasks >30s - Provide MCP `authorization` tokens (NOT stored, required each request) - Monitor `response.usage.total_tokens` for cost control **❌ Never:** - Expose API keys in client-side code - Assume single message output (use `response.output_text` helper) - Reuse conversation IDs across users (security risk) - Ignore error types (handle `rate_limit_error`, `mcp_connection_error` specifically) - Poll faster than 1s for background tasks (use 5s intervals) --- ## References **Official Docs:** - Responses API Guide: https://platform.openai.com/docs/guides/responses - API Reference: https://platform.openai.com/docs/api-reference/responses - MCP Integration: https://platform.openai.com/docs/guides/tools-connectors-mcp - Blog Post: https://developers.openai.com/blog/responses-api/ - Starter App: https://github.com/openai/openai-responses-starter-app **Skill Resources:** `templates/`, `references/responses-vs-chat-completions.md`, `references/mcp-integration-guide.md`, `references/built-in-tools-guide.md`, `references/migration-guide.md`, `references/top-errors.md` --- **Last verified**: 2026-01-21 | **Skill version**: 2.1.0 | **Changes**: Added 3 TIER 1 issues (Zod v4, background web search, streaming output_text), 2 TIER 2 findings (MCP approval, reasoning privacy), Data Retention & ZDR section, Assistants API sunset timeline, background mode TTFT note, web search TypeScript limitation. Updated SDK version to 6.16.0.