openapi: 3.0.1
info:
  title: Cerebrium Cortex Inference API
  description: >-
    Specification of the Cerebrium serverless GPU inference surface. Each
    function deployed with the Cortex framework and the Cerebrium CLI becomes
    an authenticated POST endpoint under
    /v4/{projectId}/{appName}/{functionName}. The same endpoint supports
    synchronous JSON responses, Server-Sent Events streaming (Accept:
    text/event-stream), asynchronous submission via the async=true query
    parameter, and OpenAI-compatible chat/embedding payloads. Endpoint hosts
    are regional (for example api.aws.us-east-1.cerebrium.ai); the legacy
    api.cortex.cerebrium.ai host is also documented. Authentication uses the
    JWT bearer token from the dashboard API Keys section. App deployment,
    scaling, logs, and status are managed through the Cerebrium CLI and
    dashboard rather than a documented public management REST API.
  termsOfService: https://www.cerebrium.ai/terms-of-service
  contact:
    name: Cerebrium Support
    url: https://www.cerebrium.ai/docs
    email: support@cerebrium.ai
  version: 'v4'
servers:
  - url: https://api.aws.us-east-1.cerebrium.ai
    description: AWS us-east-1 regional endpoint (region varies by deployment)
  - url: https://api.cortex.cerebrium.ai
    description: Cortex endpoint host (also documented for v4 invocation)
paths:
  /v4/{projectId}/{appName}/{functionName}:
    post:
      operationId: runFunction
      tags:
        - Inference
      summary: Invoke a deployed function
      description: >-
        Calls a deployed Cortex function. The JSON request body maps directly
        to the function's parameters. Returns a JSON object containing run_id,
        run_time_ms, and result. Use the function name `run` (or `predict`, or
        any public function defined in the app) as documented in examples. To
        stream output, send Accept: text/event-stream and have the function
        yield data; the response is a text/event-stream of `data:` lines.
      parameters:
        - name: projectId
          in: path
          required: true
          description: Cerebrium project identifier, for example p-xxxxxxxx.
          schema:
            type: string
        - name: appName
          in: path
          required: true
          description: Deployed app name from cerebrium.toml.
          schema:
            type: string
        - name: functionName
          in: path
          required: true
          description: >-
            Public function exposed by the app (for example run or predict).
            Functions prefixed with an underscore are private and not callable.
          schema:
            type: string
        - name: async
          in: query
          required: false
          description: >-
            When true, the request is accepted for asynchronous execution and
            the call returns 202 with a run_id instead of the result.
          schema:
            type: boolean
        - name: Accept
          in: header
          required: false
          description: >-
            Set to text/event-stream to receive a Server-Sent Events stream
            from a generator function.
          schema:
            type: string
      requestBody:
        required: false
        content:
          application/json:
            schema:
              type: object
              additionalProperties: true
              description: >-
                Free-form JSON whose keys map to the deployed function's
                parameters.
      responses:
        '200':
          description: Synchronous execution succeeded.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RunResponse'
            text/event-stream:
              schema:
                type: string
                description: Server-Sent Events stream of `data:` lines.
        '202':
          description: Async request accepted (async=true).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AsyncAcceptedResponse'
        '401':
          description: Missing or invalid authentication token.
        '403':
          description: Authenticated token is not authorized for this resource.
        '404':
          description: App or function not found.
        '500':
          description: Application exception or platform error.
  /v4/{projectId}/{appName}/{functionName}/chat/completions:
    post:
      operationId: openaiChatCompletions
      tags:
        - OpenAI Compatible
      summary: OpenAI-compatible chat completions
      description: >-
        OpenAI-compatible chat completions path served by a function that
        accepts the OpenAI request parameters and yields a JSON-serializable
        response. Use the standard OpenAI client with the function base URL
        and the Cerebrium JWT as the api_key.
      parameters:
        - name: projectId
          in: path
          required: true
          schema:
            type: string
        - name: appName
          in: path
          required: true
          schema:
            type: string
        - name: functionName
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              additionalProperties: true
              description: OpenAI-compatible chat completion request body.
      responses:
        '200':
          description: Chat completion response (JSON or streamed).
        '401':
          description: Missing or invalid authentication token.
        '404':
          description: App or function not found.
        '500':
          description: Application exception or platform error.
  /v4/{projectId}/{appName}/{functionName}/embedding:
    post:
      operationId: openaiEmbedding
      tags:
        - OpenAI Compatible
      summary: OpenAI-compatible embeddings
      description: >-
        OpenAI-compatible embeddings path served by a function implementing
        the embeddings interface. Invoked with the standard OpenAI client
        using the function base URL and the Cerebrium JWT.
      parameters:
        - name: projectId
          in: path
          required: true
          schema:
            type: string
        - name: appName
          in: path
          required: true
          schema:
            type: string
        - name: functionName
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              additionalProperties: true
              description: OpenAI-compatible embeddings request body.
      responses:
        '200':
          description: Embedding response.
        '401':
          description: Missing or invalid authentication token.
        '404':
          description: App or function not found.
        '500':
          description: Application exception or platform error.
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
      description: >-
        JWT token from the dashboard API Keys section, sent as
        Authorization: Bearer <token>.
  schemas:
    RunResponse:
      type: object
      properties:
        run_id:
          type: string
          description: Unique identifier for the request.
        run_time_ms:
          type: number
          description: Execution duration in milliseconds.
        result:
          description: The data returned by the function.
      required:
        - run_id
        - result
    AsyncAcceptedResponse:
      type: object
      properties:
        run_id:
          type: string
          description: Identifier of the accepted asynchronous run.
      required:
        - run_id
security:
  - bearerAuth: []
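
# Illustrative sketch only, not part of the published specification. The
# runFunction description above names three invocation modes (synchronous,
# async=true, and Server-Sent Events); the vendor extension below spells out
# what each exchange might look like. The key x-usage-sketch is a hypothetical
# name, and the app name my-app, the prompt parameter, and every
# angle-bracket value are placeholders assumed for illustration. Response
# shapes follow RunResponse and AsyncAcceptedResponse; the content of each
# streamed data: line depends on what the function yields.
x-usage-sketch:
  synchronous:
    request:
      method: POST
      url: 'https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run'
      headers:
        Authorization: 'Bearer <JWT>'
        Content-Type: application/json
      body:
        prompt: 'Hello'  # hypothetical function parameter
    response:
      status: 200
      body:
        run_id: '<run id>'
        run_time_ms: '<duration in milliseconds>'
        result: '<value returned by the function>'
  asynchronous:
    request:
      method: POST
      # Same endpoint, with the async query parameter set to true.
      url: 'https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run?async=true'
      headers:
        Authorization: 'Bearer <JWT>'
        Content-Type: application/json
      body:
        prompt: 'Hello'  # hypothetical function parameter
    response:
      status: 202
      body:
        run_id: '<run id>'  # no result field; only the run identifier
  streaming:
    request:
      method: POST
      url: 'https://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxxxx/my-app/run'
      headers:
        Authorization: 'Bearer <JWT>'
        Content-Type: application/json
        Accept: text/event-stream  # requires a generator (yielding) function
      body:
        prompt: 'Hello'  # hypothetical function parameter
    response:
      status: 200
      contentType: text/event-stream
      body: |
        data: <first yielded chunk>
        data: <next yielded chunk>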