# Unified API Endpoint

Tuning Engines exposes an OpenAI-compatible inference API. Use it as the unified endpoint for tools, workers, coding agents, dashboards, and agent runtimes that already support custom OpenAI-compatible providers.

```text
Base URL: https://api.tuningengines.com/v1
API key:  sk-te-...
Models:   use tenant model IDs from te inference models
```

Requests routed through this endpoint can use Tuning Engines for model RBAC, routing, fallbacks, guardrails, AGT policy, MCP and agent access checks, traces, capture, usage metering, and cost attribution.

## Keys

Use the right key for the job:

| Key | Use |
| --- | --- |
| `sk-te-...` | Live inference through `https://api.tuningengines.com/v1` |
| `te_...` | CLI, MCP server, tenant admin automation, and trace upload |

List available models:

```bash
TE_API_URL=https://app.tuningengines.com \
TE_API_KEY=te_your_app_api_key \
te inference models
```

## OpenCode

OpenCode supports custom OpenAI-compatible providers. Add a credential through `/connect`, choose `Other`, and use `tuning-engines` as the provider ID. Paste your `sk-te-...` inference key when prompted.

Then add or update `opencode.json`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "tuning-engines": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Tuning Engines",
      "options": {
        "baseURL": "https://api.tuningengines.com/v1"
      },
      "models": {
        "llama-3.3-70b-fp8": {
          "name": "llama-3.3-70b-fp8"
        }
      }
    }
  }
}
```

Replace `llama-3.3-70b-fp8` with a model visible to the inference key. The provider ID in `/connect` must match the `provider` key in the config. Run `/models` in OpenCode and select the Tuning Engines model.

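If several tenant models need to appear in OpenCode, the `opencode.json` provider block described above can be generated rather than edited by hand. The following is a minimal sketch, not part of any Tuning Engines tooling: the helper name `build_opencode_config` and its parameters are hypothetical, and the config shape is copied from the example in this section.

```python
import json


def build_opencode_config(
    model_ids,
    provider_id="tuning-engines",
    base_url="https://api.tuningengines.com/v1",
):
    """Return an opencode.json-style dict exposing the given model IDs.

    Hypothetical helper: it only mirrors the config shape shown in this guide.
    """
    if not model_ids:
        raise ValueError("at least one model ID is required")
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            # Must match the provider ID entered in OpenCode's /connect step.
            provider_id: {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Tuning Engines",
                "options": {"baseURL": base_url},
                # One entry per model ID, using the ID as the display name.
                "models": {mid: {"name": mid} for mid in model_ids},
            }
        },
    }


if __name__ == "__main__":
    # Replace the model ID with one visible to your inference key.
    print(json.dumps(build_opencode_config(["llama-3.3-70b-fp8"]), indent=2))
```

Redirect the output to `opencode.json` once the model IDs match what `te inference models` reports for your tenant.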
## Python OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-te-your-inference-key",
    base_url="https://api.tuningengines.com/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-fp8",
    messages=[
        {"role": "user", "content": "Summarize the current deployment risk."},
    ],
)

print(response.choices[0].message.content)
```

## JavaScript OpenAI SDK

```js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.TE_INFERENCE_KEY,
  baseURL: "https://api.tuningengines.com/v1",
});

const response = await client.chat.completions.create({
  model: "llama-3.3-70b-fp8",
  messages: [
    { role: "user", content: "Summarize the current deployment risk." },
  ],
});

console.log(response.choices[0].message.content);
```

## Temporal Activities

Temporal should own durable workflow execution. Put nondeterministic model calls inside Activities, and call Tuning Engines from those Activities.

```python
import os

from openai import OpenAI
from temporalio import activity

client = OpenAI(
    api_key=os.environ["TE_INFERENCE_KEY"],
    base_url=os.environ.get("TE_INFERENCE_BASE", "https://api.tuningengines.com/v1"),
)


@activity.defn
def summarize_ticket(messages: list[dict[str, str]]) -> str:
    response = client.chat.completions.create(
        model=os.environ.get("TE_MODEL", "llama-3.3-70b-fp8"),
        messages=messages,
    )
    return response.choices[0].message.content or ""
```

For governed MCP calls, tenant-agent dispatch, and trace upload from Temporal, use the packaged runtime adapter:

```bash
pip install "tuning-agents[temporal] @ git+https://github.com/cerebrixos-org/tuning-engines-cli.git#subdirectory=packages/tuning-agents"
```

## LangGraph

LangGraph should own graph execution, state, memory, interrupts, and checkpointing. For a simple OpenAI-compatible setup, call Tuning Engines from a graph node with the standard OpenAI client.

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TE_INFERENCE_KEY"],
    base_url=os.environ.get("TE_INFERENCE_BASE", "https://api.tuningengines.com/v1"),
)


def call_tuning_engines(state: dict) -> dict:
    response = client.chat.completions.create(
        model=os.environ.get("TE_MODEL", "llama-3.3-70b-fp8"),
        messages=state["messages"],
    )
    return {"messages": state["messages"] + [response.choices[0].message.model_dump()]}
```

For governed MCP calls, tenant-agent dispatch, and trace upload from LangGraph, use the packaged runtime adapter:

```bash
pip install "tuning-agents[langgraph] @ git+https://github.com/cerebrixos-org/tuning-engines-cli.git#subdirectory=packages/tuning-agents"
```

## Other OpenAI-Compatible Clients

For any service that asks for a custom provider, AI gateway, OpenAI-compatible base URL, or OpenAI API override, use:

```text
Base URL / API base: https://api.tuningengines.com/v1
API key:             sk-te-your-inference-key
Model:               any model allowed by that key
```

For CLI or automation setup, keep the app API URL separate:

```bash
export TE_API_URL=https://app.tuningengines.com
export TE_API_KEY=te_your_app_api_key
export TE_INFERENCE_BASE=https://api.tuningengines.com/v1
export TE_INFERENCE_KEY=sk-te-your-inference-key
```

## Troubleshooting

- A `401` usually means the inference key is missing, expired, or not an `sk-te-...` key.
- A `403` usually means the model, MCP server, agent, skill, or tool is not allowed by the key, role, guardrail, or AGT policy.
- A provider authentication error can mean Tuning Engines allowed the request through but the upstream provider credential or deployment config needs to be fixed.
- If a coding agent relies on tool calls, make sure the selected model supports OpenAI-style tool calling and that the client sends tool definitions through the OpenAI-compatible request shape.
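
The `401` and `403` cases above can be turned into a small pre-flight check and a status-to-hint lookup. This is a sketch under assumptions: the function names `check_inference_key` and `explain_status` are hypothetical, the key check only looks at the `sk-te-` and `te_` prefixes described in this guide, and the hints paraphrase the list above rather than any official error catalog.

```python
INFERENCE_KEY_PREFIX = "sk-te-"  # live inference keys, per the Keys table
APP_KEY_PREFIX = "te_"           # CLI / admin automation keys

# Hints paraphrased from the troubleshooting list; not an official error catalog.
STATUS_HINTS = {
    401: "Inference key is missing, expired, or not an sk-te-... key.",
    403: "Model, MCP server, agent, skill, or tool is not allowed by the key, "
         "role, guardrail, or AGT policy.",
}


def check_inference_key(key):
    """Return a warning string if the key looks wrong for inference, else None."""
    if not key:
        return "No inference key set; expect a 401."
    if key.startswith(APP_KEY_PREFIX):
        return "This looks like a te_... app key; live inference needs an sk-te-... key."
    if not key.startswith(INFERENCE_KEY_PREFIX):
        return "Key does not start with sk-te-; expect a 401."
    return None


def explain_status(status_code):
    """Map an HTTP status code to a troubleshooting hint."""
    return STATUS_HINTS.get(
        status_code,
        "No specific hint; check upstream provider credentials and deployment config.",
    )


if __name__ == "__main__":
    import os

    warning = check_inference_key(os.environ.get("TE_INFERENCE_KEY", ""))
    print(warning or "Key prefix looks right for inference.")
    print(explain_status(403))
```

With the OpenAI Python SDK (v1 and later), one way to use this is to catch `openai.APIStatusError` around the request and pass `err.status_code` to `explain_status`. A prefix check cannot detect an expired or revoked key, so a `401` can still occur after the check passes.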