# Provider API Patterns ## Provider-neutral view Most provider APIs can support the same architecture: ```text instructions + context + tool schemas -> model output -> final response or tool call -> application executes tool -> application returns tool result -> repeat ``` Provider differences are mostly in message shape, state handling, hosted tools, streaming events, and reasoning/tool item formats. ## OpenAI Responses-style APIs Use Responses-style APIs for new OpenAI-native agent work when available. They provide typed output items, hosted tools, remote MCP support, stateful chaining options, and richer agent-like primitives. Implementation pattern: ```python response = client.responses.create( model=model, instructions=instructions, input=input_items, tools=visible_tools, store=True, ) for item in response.output: if item.type == "function_call": result = execute_tool(item.name, item.arguments) next_response = client.responses.create( model=model, previous_response_id=response.id, input=[{ "type": "function_call_output", "call_id": item.call_id, "output": result, }], ) ``` Use the harness for private/business tools, permission checks, durable state, and audit logs even when hosted tools are available. ## Chat Completions-style and OpenAI-compatible APIs Use Chat Completions-style APIs when you need compatibility with OpenAI-compatible providers or when your harness already owns message history manually. Implementation pattern: ```python messages = [ {"role": "system", "content": instructions}, {"role": "user", "content": task}, ] while True: response = client.chat.completions.create( model=model, messages=messages, tools=visible_tools, ) msg = response.choices[0].message messages.append(msg) if not msg.tool_calls: return msg.content for call in msg.tool_calls: result = execute_tool(call.function.name, call.function.arguments) messages.append({ "role": "tool", "tool_call_id": call.id, "content": result, }) ``` In this pattern, the harness owns: - conversation state; - message trimming; - compaction; - previous tool results; - tool-call ID matching; - approval pauses; - retries; - finalization. ## Anthropic API pattern With Anthropic APIs, use structured tool-use and tool-result blocks. The model emits a tool-use request; the application executes the operation and returns the corresponding result in the next request. Provider-neutral shape: ```text request: messages + tools response: assistant content with tool-use blocks application: validate and execute tool-use blocks next request: user/tool-result content blocks repeat until final answer ``` Keep the same harness rules: validate arguments locally, check permissions, return structured results, preserve budgets, and trace every step. ## API adapter layer Use an adapter so the rest of the harness is provider-neutral. Adapter responsibilities: ```text normalize input messages/items normalize tool schemas normalize model output into ToolCall or FinalAnswer events normalize tool results back to provider format handle streaming event conversion handle provider-specific state chaining capture token/cost/latency metadata ``` Internal event types should be stable even when provider APIs differ. ## Hosted tools versus client tools Hosted tools run in provider infrastructure. Client tools run in your application or sandbox. Hosted tools are useful for: - web search; - file search; - code execution; - image generation; - general computer/browser use; - remote connector calls supported by the provider. Client tools are preferred for: - private business APIs; - tenant-specific permissions; - regulated data; - financial actions; - communication sends; - state-changing operations; - custom audit requirements. Do not outsource business authorization to a hosted tool unless the product explicitly supports and logs the required approval policy. ## Strict schemas Use strict function schemas where available: ```text required fields explicit unknown fields rejected enums for actions minimum/maximum constraints validated IDs structured outputs ``` Then validate again in the harness before execution. ## Streaming Streaming can reduce latency but adds complexity. Rules: - buffer enough data to validate complete tool calls; - execute only when a tool call is complete; - keep result ordering deterministic; - handle aborts by sending synthetic tool results if required; - do not stream partial sensitive data to users before output guardrails run. ## State strategies Options: ```text stateless: every request sends full selected context previous-response chaining: provider stores prior state references conversation object: provider stores conversation items application event store: harness stores full operational history ``` Even when provider state is used, maintain an application event store for audit, replay, approvals, and evals. ## OpenAI-compatible provider caveats OpenAI-compatible APIs vary in: - tool-call schema fidelity; - support for parallel tool calls; - strict schema behavior; - streaming event shapes; - reasoning item visibility; - multimodal support; - context windows; - storage defaults; - hosted tools; - safety behavior. Do not assume full OpenAI parity. Test the exact provider and model. ## Prompt caching and retention Provider APIs differ in prompt-cache controls, but the harness rules are provider-neutral: ```text stable content first volatile content late deterministic tool/schema ordering append-only history until compaction cache usage fields logged on every call prompt/tool bundle versions tracked ``` OpenAI APIs expose prompt caching automatically on supported requests and report cached-token usage in response metadata. Some OpenAI APIs also support retention controls for longer-lived cached prefixes. Anthropic APIs expose prompt caching through provider-specific cache controls and usage fields. Use provider documentation for current marker syntax, TTL behavior, and breakpoint limits. OpenAI-compatible APIs vary. Confirm whether the provider actually implements prompt caching, how it reports cache hits, and whether routing keys or backend cache settings are available. See [prompt-caching-and-cost.md](prompt-caching-and-cost.md) for the detailed provider-neutral design pattern. ## Source links - OpenAI Responses migration: https://developers.openai.com/api/docs/guides/migrate-to-responses - OpenAI function calling: https://developers.openai.com/api/docs/guides/function-calling - OpenAI tools: https://developers.openai.com/api/docs/guides/tools - OpenAI Agents SDK: https://developers.openai.com/api/docs/guides/agents - OpenAI guardrails and human review: https://developers.openai.com/api/docs/guides/agents/guardrails-approvals - OpenAI prompt caching: https://developers.openai.com/api/docs/guides/prompt-caching - OpenAI Prompt Caching 201: https://developers.openai.com/cookbook/examples/prompt_caching_201 - Anthropic building effective agents: https://www.anthropic.com/research/building-effective-agents - Anthropic writing effective tools for agents: https://www.anthropic.com/engineering/writing-tools-for-agents