---
title: Responses API
---

# Responses API Reference

The Responses API provides an OpenAI-compatible interface for agentic workflows, with built-in support for multi-turn conversations, tool execution, and MCP (Model Context Protocol) integration.

---

## Overview

### Purpose vs Chat Completions API

The Responses API differs from the Chat Completions API in several key ways:

| Feature | Chat Completions | Responses API |
|---------|------------------|---------------|
| Conversation State | Stateless | Server-managed state |
| Tool Execution | Client-side | Server-side with MCP support |
| Multi-turn | Manual | Automatic with `previous_response_id` or `conversation` |
| Persistence | None | Built-in response/conversation storage |
| Agentic Workflows | Manual orchestration | Built-in tool loop execution |

### Agentic Workflow Concepts

The Responses API enables agentic workflows where the model can:

1. **Reason** about tasks using optional reasoning parameters
2. **Plan** tool usage with automatic tool selection
3. **Execute** tools via MCP servers or function calling
4. **Iterate** through multiple tool calls in a single request
5. **Persist** conversation history for multi-session workflows

---

## Base URL

```
http://localhost:30000/v1
```

---

## Create Response

Create a new response with optional tool execution and conversation management.

```
POST /v1/responses
```

### Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `model` | string | Yes | Model identifier |
| `input` | string or array | Yes | Input text or array of input items |
| `instructions` | string | No | System instructions for the model |
| `max_output_tokens` | integer | No | Maximum tokens to generate |
| `max_tool_calls` | integer | No | Maximum number of tool calls per request |
| `temperature` | number | No | Sampling temperature (0-2), default: 1.0 |
| `top_p` | number | No | Nucleus sampling parameter (0-1) |
| `stream` | boolean | No | Enable streaming responses |
| `store` | boolean | No | Store response for later retrieval, default: true |
| `tools` | array | No | Available tools (`function`, `mcp`, `web_search_preview`, `code_interpreter`) |
| `tool_choice` | string or object | No | Tool selection behavior: `auto`, `none`, `required`, or a specific tool |
| `parallel_tool_calls` | boolean | No | Allow parallel tool execution, default: true |
| `previous_response_id` | string | No | Continue from a previous response |
| `conversation` | string | No | Conversation ID (mutually exclusive with `previous_response_id`) |
| `reasoning` | object | No | Reasoning configuration |
| `text` | object | No | Text format for structured outputs |
| `metadata` | object | No | Custom metadata (max 16 properties) |
| `user` | string | No | End-user identifier |
| `background` | boolean | No | Run the request in the background (cannot be combined with `stream`) |
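
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_response_request` is a hypothetical helper, and it checks only the rules stated above (required fields, the `temperature` and `top_p` ranges, the 16-property metadata limit, and the two mutually exclusive combinations).

```python
from typing import Any


def build_response_request(model: str, user_input: Any, **options: Any) -> dict:
    """Assemble a POST /v1/responses body and check the documented constraints."""
    if not model:
        raise ValueError("'model' is required")
    if not isinstance(user_input, (str, list)) or not user_input:
        raise ValueError("'input' must be a non-empty string or array")

    # Drop unset options so the server applies its own defaults.
    body = {"model": model, "input": user_input}
    body.update({k: v for k, v in options.items() if v is not None})

    if "previous_response_id" in body and "conversation" in body:
        raise ValueError("provide only one of 'previous_response_id' or 'conversation'")
    if body.get("background") and body.get("stream"):
        raise ValueError("'background' cannot be combined with 'stream'")
    if not 0 <= body.get("temperature", 1.0) <= 2:
        raise ValueError("'temperature' must be between 0 and 2")
    if not 0 <= body.get("top_p", 1.0) <= 1:
        raise ValueError("'top_p' must be between 0 and 1")
    if len(body.get("metadata", {})) > 16:
        raise ValueError("'metadata' allows at most 16 properties")
    return body


request = build_response_request(
    "meta-llama/Llama-3.1-8B-Instruct",
    "What is the capital of France?",
    temperature=0.7,
    max_output_tokens=500,
)
print(request)
```

The server remains the source of truth for validation; a local check like this only surfaces obvious mistakes earlier.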

### Input Formats

**Simple text input:**

```json
{
  "input": "What is the capital of France?"
}
```

**Structured input items:**

```json
{
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    }
  ]
}
```
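
Client code that accepts both forms can normalize them to the structured form first. The sketch below assumes that a plain string is equivalent to a single `user` message with one `input_text` part; this page does not state that equivalence, and `to_input_items` is a hypothetical helper.

```python
from typing import Union


def to_input_items(user_input: Union[str, list]) -> list:
    """Return structured input items; wrap a plain string as one user message."""
    if isinstance(user_input, str):
        # Assumption: a bare string stands for a single user message.
        return [
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": user_input}],
            }
        ]
    return list(user_input)


items = to_input_items("Hello!")
print(items)
```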

### Tool Configuration

**Function tools:**

```json
{
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  ]
}
```

**MCP tools:**

```json
{
  "tools": [
    {
      "type": "mcp",
      "server_url": "http://localhost:8080/mcp",
      "server_label": "my-mcp-server",
      "server_description": "My MCP server for data access",
      "require_approval": "never",
      "allowed_tools": ["query_database", "search_files"]
    }
  ]
}
```
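
When several requests share the same tools, the entries can be generated instead of written by hand. This is a sketch with two hypothetical helpers, `function_tool` and `mcp_tool`, that emit the shapes shown above. The `require_approval="never"` default is copied from the example and is an assumption, as is leaving out `allowed_tools` when no restriction is wanted.

```python
from typing import Optional


def function_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build a function tool entry with a JSON Schema object for its parameters."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def mcp_tool(server_url: str, server_label: str,
             allowed_tools: Optional[list] = None,
             require_approval: str = "never") -> dict:
    """Build an MCP tool entry; 'allowed_tools' is omitted when not given."""
    tool = {
        "type": "mcp",
        "server_url": server_url,
        "server_label": server_label,
        "require_approval": require_approval,
    }
    if allowed_tools is not None:
        tool["allowed_tools"] = list(allowed_tools)
    return tool


tools = [
    function_tool("get_weather", "Get weather for a location",
                  {"location": {"type": "string"}}, ["location"]),
    mcp_tool("http://localhost:8080/mcp", "my-mcp-server",
             allowed_tools=["query_database", "search_files"]),
]
print(tools)
```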

### Reasoning Configuration

```json
{
  "reasoning": {
    "effort": "medium",
    "summary": "auto"
  }
}
```

Effort levels: `minimal`, `low`, `medium`, `high`

### Text Format (Structured Outputs)

```json
{
  "text": {
    "format": {
      "type": "json_schema",
      "name": "user_info",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "integer"}
        }
      },
      "strict": true
    }
  }
}
```
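
On the client side, the returned `output_text` still arrives as a string and has to be decoded. The following sketch decodes it and checks only the top-level property types against the schema; it is not a full JSON Schema validator, and `parse_structured_output` is a hypothetical helper.

```python
import json

# JSON Schema type names mapped to the Python types that json.loads produces.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}


def parse_structured_output(text: str, schema: dict) -> dict:
    """Decode model output and check top-level property types against the schema."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue  # the example schema lists no 'required' keys
        expected = JSON_TYPES.get(spec.get("type"))
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected and (not isinstance(value, expected)
                         or (isinstance(value, bool) and expected is not bool)):
            raise ValueError(f"'{name}' should be of type {spec['type']}")
    return data


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
user = parse_structured_output('{"name": "Ada", "age": 36}', schema)
print(user)
```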

### Example Request

```bash
curl http://localhost:30000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "input": "Search for the latest news about AI",
    "instructions": "Be concise and factual",
    "max_output_tokens": 500,
    "temperature": 0.7,
    "tools": [
      {
        "type": "mcp",
        "server_url": "http://localhost:8080/mcp",
        "server_label": "search"
      }
    ],
    "tool_choice": "auto"
  }'
```

### Response

```json
{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1705312345,
  "status": "completed",
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "output": [
    {
      "type": "mcp_list_tools",
      "id": "mcp_list_001",
      "server_label": "search",
      "tools": [
        {
          "name": "web_search",
          "description": "Search the web",
          "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}}
        }
      ]
    },
    {
      "type": "mcp_call",
      "id": "mcp_call_001",
      "status": "completed",
      "name": "web_search",
      "arguments": "{\"query\": \"latest AI news\"}",
      "output": "{\"results\": [...]}",
      "server_label": "search"
    },
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Based on my search, here are the latest AI developments..."
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 50,
    "output_tokens": 150,
    "total_tokens": 200
  },
  "tools": [
    {
      "type": "mcp",
      "server_label": "search",
      "server_url": "http://localhost:8080/mcp"
    }
  ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "store": true
}
```

### Streaming Response

With `"stream": true`, responses are sent as Server-Sent Events:

```bash
curl http://localhost:30000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "input": "Hello!",
    "stream": true
  }'
```

**Event sequence:**

```
event: response.created
data: {"type": "response.created", "sequence_number": 0, "response": {...}}

event: response.in_progress
data: {"type": "response.in_progress", "sequence_number": 1, "response": {...}}

event: response.output_item.added
data: {"type": "response.output_item.added", "sequence_number": 2, "output_index": 0, "item": {...}}

event: response.content_part.added
data: {"type": "response.content_part.added", "sequence_number": 3, "output_index": 0, "content_index": 0, "part": {...}}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "sequence_number": 4, "output_index": 0, "content_index": 0, "delta": "Hello"}

event: response.output_text.done
data: {"type": "response.output_text.done", "sequence_number": 5, "output_index": 0, "content_index": 0, "text": "Hello! How can I help you?"}

event: response.output_item.done
data: {"type": "response.output_item.done", "sequence_number": 6, "output_index": 0, "item": {...}}

event: response.completed
data: {"type": "response.completed", "sequence_number": 7, "response": {...}}

data: [DONE]
```
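
For clients that read the stream without an SDK, the event sequence above can be consumed line by line. This is a minimal sketch that assumes each `data:` payload fits on a single line, as in the sequence shown; `iter_sse_events` and `collect_text` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each 'data:' line until '[DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip 'event:' lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas of a streamed response."""
    return "".join(
        event["delta"]
        for event in iter_sse_events(lines)
        if event["type"] == "response.output_text.delta"
    )


sample = [
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "sequence_number": 4, '
    '"output_index": 0, "content_index": 0, "delta": "Hello"}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "sequence_number": 5, '
    '"output_index": 0, "content_index": 0, "delta": "!"}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))
```

The `type` field inside each payload repeats the `event:` name, so the sketch dispatches on the payload alone.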

**MCP-specific streaming events:**

```
event: response.mcp_list_tools.in_progress
data: {"type": "response.mcp_list_tools.in_progress", "output_index": 0, "item_id": "mcp_list_001"}

event: response.mcp_list_tools.completed
data: {"type": "response.mcp_list_tools.completed", "output_index": 0, "item_id": "mcp_list_001"}

event: response.mcp_call.in_progress
data: {"type": "response.mcp_call.in_progress", "output_index": 1, "item_id": "mcp_call_001"}

event: response.mcp_call_arguments.delta
data: {"type": "response.mcp_call_arguments.delta", "output_index": 1, "item_id": "mcp_call_001", "delta": "{\"query\": \"..."}

event: response.mcp_call_arguments.done
data: {"type": "response.mcp_call_arguments.done", "output_index": 1, "item_id": "mcp_call_001", "arguments": "{\"query\": \"...\"}"}

event: response.output_item.done
data: {"type": "response.output_item.done", "output_index": 1, "item": {"type": "mcp_call", "output": "...", ...}}
```
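
Argument deltas for an MCP call can be stitched together per `item_id`. The sketch below works on already-decoded event payloads and assumes, as the events above suggest, that the `done` event carries the complete argument string; `collect_mcp_arguments` is a hypothetical helper.

```python
import json
from collections import defaultdict


def collect_mcp_arguments(events: list) -> dict:
    """Map each MCP call's item_id to its arguments."""
    partial = defaultdict(str)
    arguments = {}
    for event in events:
        if event["type"] == "response.mcp_call_arguments.delta":
            partial[event["item_id"]] += event["delta"]
        elif event["type"] == "response.mcp_call_arguments.done":
            # The 'done' event carries the full string, so prefer it over the deltas.
            arguments[event["item_id"]] = json.loads(event["arguments"])
    # Calls whose 'done' event has not arrived yet stay as raw partial strings.
    for item_id, text in partial.items():
        arguments.setdefault(item_id, text)
    return arguments


events = [
    {"type": "response.mcp_call.in_progress", "output_index": 1, "item_id": "mcp_call_001"},
    {"type": "response.mcp_call_arguments.delta", "output_index": 1,
     "item_id": "mcp_call_001", "delta": "{\"query\": \"latest"},
    {"type": "response.mcp_call_arguments.delta", "output_index": 1,
     "item_id": "mcp_call_001", "delta": " AI news\"}"},
    {"type": "response.mcp_call_arguments.done", "output_index": 1,
     "item_id": "mcp_call_001", "arguments": "{\"query\": \"latest AI news\"}"},
]
print(collect_mcp_arguments(events))
```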

---

## Get Response

Retrieve a previously stored response by ID.

```
GET /v1/responses/{response_id}
```

### Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `response_id` | string | The response ID (e.g., `resp_abc123`) |

### Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `include` | array | Additional fields to include |

### Example Request

```bash
curl http://localhost:30000/v1/responses/resp_abc123def456
```

### Response

Returns the full response object as shown in the Create Response section.

---

## Cancel Response

```
POST /v1/responses/{response_id}/cancel
```

Attempts to cancel an in-progress response. Behavior depends on the connection mode:

- **gRPC workers**: Background mode is not supported. This endpoint always returns a `400 Bad Request` error with code `cancellation_not_supported`.
- **HTTP workers**: The request is proxied to the backend worker. Whether cancellation succeeds depends on backend support.

### Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `response_id` | string | The response ID to cancel |

### Example Request

```bash
curl -X POST http://localhost:30000/v1/responses/resp_abc123def456/cancel
```

### Response

**HTTP workers**: Returns the response object from the backend.

**gRPC workers**: Returns a `400 Bad Request` error:

```json
{
  "error": {
    "message": "Background mode is not supported. Synchronous and streaming responses cannot be cancelled.",
    "type": "Bad Request",
    "code": "cancellation_not_supported"
  }
}
```
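
A client that talks to both worker types can branch on the documented error code rather than on the message text. This is a sketch only; `cancel_outcome` and its three return labels are hypothetical.

```python
def cancel_outcome(status_code: int, body: dict) -> str:
    """Classify the result of POST /v1/responses/{response_id}/cancel."""
    error = body.get("error") or {}
    if status_code == 400 and error.get("code") == "cancellation_not_supported":
        return "unsupported"  # gRPC workers: retrying will not help
    if 200 <= status_code < 300:
        return "forwarded"  # HTTP workers: inspect the returned response object
    return "failed"


grpc_body = {
    "error": {
        "message": "Background mode is not supported. "
                   "Synchronous and streaming responses cannot be cancelled.",
        "type": "Bad Request",
        "code": "cancellation_not_supported",
    }
}
print(cancel_outcome(400, grpc_body))
```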

---

## Delete Response

Delete a stored response.

```
DELETE /v1/responses/{response_id}
```

### Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `response_id` | string | The response ID to delete |

### Example Request

```bash
curl -X DELETE http://localhost:30000/v1/responses/resp_abc123def456
```

### Response

```json
{
  "id": "resp_abc123def456",
  "object": "response.deleted",
  "deleted": true
}
```

---

## List Response Input Items

List the input items that were sent with a response.

```
GET /v1/responses/{response_id}/input_items
```

### Path Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `response_id` | string | The response ID |

### Example Request

```bash
curl http://localhost:30000/v1/responses/resp_abc123def456/input_items
```

### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "msg_input_001",
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    }
  ],
  "first_id": "msg_input_001",
  "last_id": "msg_input_001",
  "has_more": false
}
```

---

## Conversation Management

Conversations provide persistent storage for multi-turn interactions, so chat history is maintained across requests.

### Create Conversation

```
POST /v1/conversations
```

### Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `metadata` | object | No | Custom metadata (max 16 properties) |

### Example Request

```bash
curl http://localhost:30000/v1/conversations \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "project": "customer-support",
      "user_id": "user_123"
    }
  }'
```

### Response

```json
{
  "id": "conv_abc123def456",
  "object": "conversation",
  "created_at": 1705312345,
  "metadata": {
    "project": "customer-support",
    "user_id": "user_123"
  }
}
```

---

### Get Conversation

```
GET /v1/conversations/{conversation_id}
```

### Example Request

```bash
curl http://localhost:30000/v1/conversations/conv_abc123def456
```

### Response

```json
{
  "id": "conv_abc123def456",
  "object": "conversation",
  "created_at": 1705312345,
  "metadata": {
    "project": "customer-support"
  }
}
```

---

### Update Conversation

Update conversation metadata. Updates use merge semantics: set a key to `null` to delete it.

```
POST /v1/conversations/{conversation_id}
```

### Request Body

| Field | Type | Description |
|-------|------|-------------|
| `metadata` | object | Metadata to merge (null values delete keys) |

### Example Request

```bash
curl http://localhost:30000/v1/conversations/conv_abc123def456 \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "status": "resolved",
      "project": null
    }
  }'
```

### Response

Returns the updated conversation object.
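
The merge semantics can be modeled locally to predict the stored metadata after an update. The sketch below follows the description above, where `null` (Python `None`) deletes a key and any other value sets it; `merge_metadata` is a hypothetical helper, not server code.

```python
def merge_metadata(current: dict, patch: dict) -> dict:
    """Apply a metadata patch: None deletes a key, any other value sets it."""
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # deleting a missing key is a no-op
        else:
            merged[key] = value
    return merged


before = {"project": "customer-support", "user_id": "user_123"}
after = merge_metadata(before, {"status": "resolved", "project": None})
print(after)  # {'user_id': 'user_123', 'status': 'resolved'}
```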

---

### Delete Conversation

```
DELETE /v1/conversations/{conversation_id}
```

### Example Request

```bash
curl -X DELETE http://localhost:30000/v1/conversations/conv_abc123def456
```

### Response

```json
{
  "id": "conv_abc123def456",
  "object": "conversation.deleted",
  "deleted": true
}
```

---

### List Conversation Items

```
GET /v1/conversations/{conversation_id}/items
```

### Query Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `limit` | integer | 100 | Maximum items to return |
| `order` | string | `desc` | Sort order: `asc` or `desc` |
| `after` | string | - | Cursor for pagination |

### Example Request

```bash
curl "http://localhost:30000/v1/conversations/conv_abc123/items?limit=20&order=asc"
```

### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "item_001",
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello"}],
      "status": "completed",
      "created_at": 1705312345
    },
    {
      "id": "item_002",
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hi there!"}],
      "status": "completed",
      "created_at": 1705312346
    }
  ],
  "first_id": "item_001",
  "last_id": "item_002",
  "has_more": false
}
```
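
Long conversations need more than one page. The sketch below follows the cursor until `has_more` is false, assuming that `after` takes the `last_id` of the previous page (this page does not spell that out). `iter_conversation_items` is a hypothetical helper, and `fake_fetch` is an in-memory stand-in for the HTTP call so the example runs without a server.

```python
from typing import Callable, Iterator, Optional


def iter_conversation_items(
    fetch_page: Callable[[Optional[str]], dict],
) -> Iterator[dict]:
    """Yield every item by following the cursor until 'has_more' is false."""
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            break
        after = page["last_id"]  # assumed: 'after' takes the last item ID of the page


# In-memory stand-in for GET /v1/conversations/{conversation_id}/items.
ITEMS = [{"id": f"item_{n:03d}", "type": "message"} for n in range(1, 6)]


def fake_fetch(after: Optional[str], limit: int = 2) -> dict:
    start = 0 if after is None else [i["id"] for i in ITEMS].index(after) + 1
    data = ITEMS[start:start + limit]
    return {
        "object": "list",
        "data": data,
        "first_id": data[0]["id"] if data else None,
        "last_id": data[-1]["id"] if data else None,
        "has_more": start + limit < len(ITEMS),
    }


all_ids = [item["id"] for item in iter_conversation_items(fake_fetch)]
print(all_ids)
```

In real use, `fetch_page` would issue the `GET` request with `limit`, `order`, and `after` as query parameters and return the decoded JSON body.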

---

### Create Conversation Items

Add items to a conversation. Maximum 20 items per request.

```
POST /v1/conversations/{conversation_id}/items
```

### Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `items` | array | Yes | Array of items to add (max 20) |

### Supported Item Types

- `message` - User or assistant messages
- `reasoning` - Model reasoning content
- `mcp_list_tools` - MCP tool listing
- `mcp_call` - MCP tool invocation
- `item_reference` - Reference to an existing item
- `function_call` - Function tool call
- `function_call_output` - Function call result

### Example Request

```bash
curl http://localhost:30000/v1/conversations/conv_abc123/items \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What is 2+2?"}]
      },
      {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "2+2 equals 4."}]
      }
    ]
  }'
```

### Response

```json
{
  "object": "list",
  "data": [
    {
      "id": "item_003",
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "What is 2+2?"}],
      "status": "completed"
    },
    {
      "id": "item_004",
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "2+2 equals 4."}],
      "status": "completed"
    }
  ],
  "first_id": "item_003",
  "last_id": "item_004",
  "has_more": false
}
```
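
Importing a longer history means splitting it into several requests because of the 20-item limit. A minimal sketch of that batching, with `batch_items` as a hypothetical helper:

```python
from typing import Iterator

MAX_ITEMS_PER_REQUEST = 20  # limit stated for POST /v1/conversations/{conversation_id}/items


def batch_items(items: list, size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[dict]:
    """Yield request bodies that each carry at most 'size' items, in order."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield {"items": items[start:start + size]}


history = [
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": f"Message {n}"}]}
    for n in range(45)
]
bodies = list(batch_items(history))
print([len(body["items"]) for body in bodies])  # [20, 20, 5]
```

Sending the bodies in order keeps the items in their original sequence.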

---

### Get Conversation Item

```
GET /v1/conversations/{conversation_id}/items/{item_id}
```

### Example Request

```bash
curl http://localhost:30000/v1/conversations/conv_abc123/items/item_001
```

### Response

Returns the item object.

---

### Delete Conversation Item

Remove an item from a conversation. This is a soft delete: the item may still exist if other conversations reference it.

```
DELETE /v1/conversations/{conversation_id}/items/{item_id}
```

### Example Request

```bash
curl -X DELETE http://localhost:30000/v1/conversations/conv_abc123/items/item_001
```

### Response

Returns the updated conversation object.

---

## Examples

### Simple Agentic Workflow

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Create a response with MCP tools
response = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Search for the weather in San Francisco and summarize it",
    tools=[
        {
            "type": "mcp",
            "server_url": "http://localhost:8080/mcp",
            "server_label": "weather-service"
        }
    ],
    tool_choice="auto"
)

# The response includes tool calls and final answer
for output in response.output:
    if output.type == "mcp_call":
        print(f"Tool called: {output.name}")
        print(f"Result: {output.output}")
    elif output.type == "message":
        for content in output.content:
            if content.type == "output_text":
                print(f"Answer: {content.text}")
```

### Multi-turn Conversation with Tools

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Create a conversation
conversation = client.conversations.create(
    metadata={"session": "support-123"}
)

# First turn
response1 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="I need help with my order #12345",
    conversation=conversation.id,
    tools=[
        {
            "type": "mcp",
            "server_url": "http://localhost:8080/mcp",
            "server_label": "order-service"
        }
    ]
)
print(f"First response: {response1.id}")

# Second turn - continues the conversation
response2 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Can you also check if there are any discounts available?",
    conversation=conversation.id,
    tools=[
        {
            "type": "mcp",
            "server_url": "http://localhost:8080/mcp",
            "server_label": "order-service"
        }
    ]
)
print(f"Second response: {response2.id}")

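# List conversation history, oldest first (the default order is "desc")
items = client.conversations.items.list(conversation.id, order="asc")
for item in items.data:
    # Tool-call items such as mcp_call have no role, so print messages only
    if item.type == "message":
        print(f"{item.role}: {item.content}")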
```

### Streaming Response Handling

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Stream a response
with client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Explain quantum computing",
    stream=True
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
        elif event.type == "response.mcp_call.in_progress":
            print(f"\n[Calling tool: {event.item_id}]")
        elif event.type == "response.completed":
            print(f"\n\nTokens used: {event.response.usage.total_tokens}")
```

### Using Previous Response ID

```python
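from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Alternative to conversations: chain responses directly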
response1 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="What are the main programming paradigms?",
    store=True
)

# Continue from previous response
response2 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Can you elaborate on functional programming?",
    previous_response_id=response1.id,
    store=True
)
```

---

## Error Responses

### Error Format

```json
{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "param": "field_name",
    "code": "error_code"
  }
}
```

### Common Errors

| HTTP Status | Type | Description |
|-------------|------|-------------|
| 400 | `invalid_request_error` | Malformed request or validation failure |
| 401 | `authentication_error` | Invalid or missing API key |
| 404 | `not_found_error` | Response, conversation, or item not found |
| 429 | `rate_limit_error` | Rate limit exceeded |
| 500 | `internal_error` | Server error |
| 503 | `service_unavailable` | No healthy workers available |

### Validation Errors

```json
{
  "error": {
    "message": "Invalid 'conversation': 'invalid-id'. Expected an ID that begins with 'conv_'.",
    "type": "invalid_request_error",
    "param": "conversation",
    "code": "invalid_conversation_id"
  }
}
```

```json
{
  "error": {
    "message": "Mutually exclusive parameters. Ensure you are only providing one of: 'previous_response_id' or 'conversation'.",
    "type": "invalid_request_error",
    "code": "mutually_exclusive_parameters"
  }
}
```
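
Client code can turn the error envelope into an exception so callers branch on `code` and `param` instead of parsing messages. This is a sketch; `ApiError` and `raise_for_error` are hypothetical names, and `param` is treated as optional because the second example above omits it.

```python
class ApiError(Exception):
    """Error decoded from the documented {"error": {...}} envelope."""

    def __init__(self, status_code: int, error: dict):
        self.status_code = status_code
        self.type = error.get("type", "unknown_error")
        self.code = error.get("code")
        self.param = error.get("param")  # absent on some errors
        super().__init__(error.get("message", ""))


def raise_for_error(status_code: int, body: dict) -> None:
    """Raise ApiError for a non-2xx status; return None otherwise."""
    if 200 <= status_code < 300:
        return
    raise ApiError(status_code, body.get("error") or {})


body = {
    "error": {
        "message": "Invalid 'conversation': 'invalid-id'. "
                   "Expected an ID that begins with 'conv_'.",
        "type": "invalid_request_error",
        "param": "conversation",
        "code": "invalid_conversation_id",
    }
}
try:
    raise_for_error(400, body)
except ApiError as exc:
    print(exc.status_code, exc.code, exc.param)
```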

---

## SGLang Extensions

The Responses API includes additional sampling parameters specific to SGLang:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `top_k` | integer | -1 | Top-k sampling (-1 = disabled) |
| `min_p` | number | 0.0 | Min-p sampling threshold |
| `repetition_penalty` | number | 1.0 | Repetition penalty (1.0 = disabled) |
| `frequency_penalty` | number | - | OpenAI-compatible frequency penalty |
| `presence_penalty` | number | - | OpenAI-compatible presence penalty |
| `stop` | string or array | - | Stop sequences |

Example:

```json
{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "input": "Write a story",
  "top_k": 50,
  "min_p": 0.05,
  "repetition_penalty": 1.1
}
```
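
Typed client libraries may not accept these fields as keyword arguments. With the OpenAI Python SDK, one option is the `extra_body` request option, which merges additional JSON properties into the request body. The sketch below only prepares that dictionary; `sglang_extra_body` is a hypothetical helper, and the field names are those in the table above.

```python
SGLANG_FIELDS = {"top_k", "min_p", "repetition_penalty",
                 "frequency_penalty", "presence_penalty", "stop"}


def sglang_extra_body(**params) -> dict:
    """Return the extension fields to send alongside the standard request fields."""
    unknown = set(params) - SGLANG_FIELDS
    if unknown:
        raise ValueError(f"not an extension field: {sorted(unknown)}")
    # Leave out unset values so the server defaults apply.
    return {name: value for name, value in params.items() if value is not None}


extra = sglang_extra_body(top_k=50, min_p=0.05, repetition_penalty=1.1, stop=None)
print(extra)

# With the OpenAI Python SDK this could then be passed as:
#   client.responses.create(model=..., input=..., extra_body=extra)
```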