aid: cerebras
name: Cerebras
description: >-
  Cerebras Systems designs wafer-scale processors, including the WSE-3, and
  the CS-2 and CS-3 AI systems built around them. It also operates Cerebras
  Inference, a high-throughput cloud platform for running open-weight large
  language models including the Llama, Qwen, and DeepSeek families. Cerebras
  Inference is positioned as one
  of the fastest token-generation services in the market, with OpenAI-compatible
  REST endpoints, first-party Python and Node.js SDKs, and dedicated and
  on-prem deployment options for enterprise customers. The company partners
  with OpenAI, AWS, GSK, Mayo Clinic, and Notion, and maintains an active
  open-source presence on GitHub, including its Model Zoo and inference
  cookbook.
type: Index
position: Provider
access: 3rd-Party
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - AI Inference
  - Large Language Models
  - Wafer Scale
  - Hardware
  - Cloud
  - OpenAI Compatible
  - LLM
  - SDK
  - Accelerator
  - High Performance Computing
url: https://raw.githubusercontent.com/api-evangelist/cerebras/refs/heads/main/apis.yml
created: '2026-05-23'
modified: '2026-05-23'
specificationVersion: '0.20'
apis:
  - aid: cerebras:cerebras-inference-api
    name: Cerebras Inference API
    description: >-
      The Cerebras Inference API exposes ultra-low-latency inference for
      open-weight large language models including Llama 3.1, Llama 4, Qwen,
      and other frontier open models. The API is OpenAI-compatible for chat
      completions, supports streaming, and can be called through the
      first-party Python and Node.js SDKs or over raw HTTP. Dedicated and
      on-prem deployments are available for production workloads.
    humanURL: https://inference-docs.cerebras.ai
    baseURL: https://api.cerebras.ai/v1
    tags:
      - Inference
      - LLM
      - Chat Completions
      - OpenAI Compatible
      - Streaming
      - REST
    properties:
      - type: Documentation
        url: https://inference-docs.cerebras.ai
      - type: GettingStarted
        url: https://inference-docs.cerebras.ai/quickstart
      - type: SDK
        url: https://github.com/Cerebras/cerebras-cloud-sdk-python
      - type: SDK
        url: https://github.com/Cerebras/cerebras-cloud-sdk-node
      - type: Cookbook
        url: https://github.com/Cerebras/Cerebras-Inference-Cookbook
      - type: VSCodeExtension
        url: https://github.com/Cerebras/vscode-cerebras-chat
      - type: MCP
        url: https://github.com/Cerebras/cerebras-code-mcp
    features:
      - name: OpenAI-Compatible Chat Completions
        description: >-
          Drop-in compatibility with OpenAI client libraries for fast
          migration of existing applications.
      - name: Ultra-Fast Token Generation
        description: >-
          WSE-3 wafer-scale silicon delivers token generation speeds
          marketed as up to 15x faster than GPU-based inference.
      - name: Open-Weight Model Catalog
        description: >-
          Hosted access to Llama, Qwen, DeepSeek, and other curated
          open-weight models with no infrastructure setup.
      - name: Streaming Responses
        description: >-
          Server-sent events (SSE) streaming for chat completions, enabling
          real-time agent and voice experiences.
      - name: Dedicated Endpoints
        description: >-
          Private capacity and custom model hosting via a dedicated endpoint
          tier for production workloads.
      - name: First-Party SDKs
        description: >-
          Official Python and TypeScript/Node.js SDKs with typed models and
          request parameters.
      - name: On-Premises Deployment
        description: >-
          CS-2 and CS-3 systems for private data center and sovereign AI
          deployments.
    useCases:
      - name: Real-Time Voice and Agent Applications
        description: >-
          Power voice agents, copilots, and tool-calling agents that need
          sub-second time-to-first-token.
      - name: Coding Copilots
        description: >-
          Drive code generation and review assistants with fast inference
          on open-weight coding models.
      - name: Reasoning and Research Workloads
        description: >-
          Run long-context reasoning loops and chain-of-thought workflows
          economically at high throughput.
      - name: Enterprise Inference Migration
        description: >-
          Move existing OpenAI-based workloads to Cerebras with minimal
          code changes to reduce cost and latency.
      - name: Healthcare and Life Sciences
        description: >-
          Used by partners including GSK and Mayo Clinic for biomedical
          and clinical AI workloads.
    integrations:
      - name: OpenAI SDK
      - name: LangChain
      - name: LlamaIndex
      - name: Vercel AI SDK
      - name: AWS
      - name: Hugging Face
      - name: VS Code
      - name: Model Context Protocol
    authentication:
      - type: API Key
        description: >-
          Requests authenticate with a Bearer token in the Authorization
          header, using an API key (CEREBRAS_API_KEY) provisioned from the
          Cerebras Cloud dashboard.
common:
  - type: Website
    url: https://cerebras.ai
  - type: Documentation
    url: https://inference-docs.cerebras.ai
  - type: Developer Portal
    url: https://cloud.cerebras.ai
  - type: Pricing
    url: https://www.cerebras.ai/inference
  - type: GitHubOrganization
    url: https://github.com/Cerebras
  - type: ModelZoo
    url: https://github.com/Cerebras/modelzoo
  - type: Blog
    url: https://www.cerebras.ai/blog
  - type: LinkedIn
    url: https://www.linkedin.com/company/cerebras-systems
  - type: Twitter
    url: https://twitter.com/CerebrasSystems
  - type: Status
    url: https://status.cerebras.ai
  - type: LLMsTxt
    url: https://inference-docs.cerebras.ai/llms.txt
maintainers:
  - FN: Kin Lane
    email: kin@apievangelist.com