aid: octoai
name: OctoAI
description: >-
  OctoAI (formerly OctoML) was a Seattle-based AI inference company founded
  in 2019 as a spin-out from the University of Washington's Paul G. Allen
  School, built around the Apache TVM project. The company originally
  focused on machine-learning model optimization and compilation across
  CPUs, GPUs, and accelerators, and in
  June 2023 launched a generative-AI SaaS inference platform that served
  open-source foundation models (Llama 2, Mixtral, SDXL, Stable Diffusion,
  Whisper) behind OpenAI-style REST APIs with Python and TypeScript SDKs.
  In January 2024 OctoML formally rebranded to OctoAI and in April 2024
  unveiled OctoStack, a self-contained generative-AI production stack for
  deploying models inside customer VPCs and on-premises environments across
  NVIDIA GPUs, AMD GPUs, and AWS Inferentia. NVIDIA acquired OctoAI in
  September 2024 for a reported $165M (down from a 2021 peak valuation of
  ~$900M), with CEO Luis Ceze and key staff joining NVIDIA. OctoAI sent
  customers a "Wind down of OctoAI Services" notice and terminated all
  hosted endpoints, accounts, and SDK access on 31 October 2024. The
  octo.ai domain now 301-redirects to nvidia.com and no public OctoAI
  product, API, dashboard, or developer portal remains; the technology has
  been absorbed into NVIDIA's internal AI inference stack and is not
  separately purchasable. This catalog entry is a historical record of the
  former OctoAI developer surface and the GitHub artifacts that remain.
type: Index
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - Acquired
  - Defunct
  - AI Inference
  - Generative AI
  - LLM
  - Foundation Models
  - Model Optimization
  - Apache TVM
  - GPU
  - Private AI
  - NVIDIA
url: https://raw.githubusercontent.com/api-evangelist/octoai/refs/heads/main/apis.yml
created: '2026-05-25'
modified: '2026-05-25'
specificationVersion: '0.20'
apis:
  - aid: octoai:octoai-text-gen-api
    name: OctoAI Text Gen Inference API
    description: >-
      OpenAI-compatible chat and text-completion endpoints serving open-source
      LLMs from a shared model catalog (Llama 2, Llama 3, Mixtral 8x7B,
      Mistral 7B, Code Llama) as well as customer fine-tunes. The endpoints
      supported streaming, function calling, and JSON mode. The API was
      reachable at https://text.octoai.run/v1 and shut down on 31 October 2024.
    humanURL: https://octo.ai
    baseURL: https://text.octoai.run/v1
    tags:
      - LLM
      - Chat
      - Completions
      - OpenAI Compatible
      - Defunct
    properties:
      - type: StatusPage
        url: https://octo.ai
        description: Domain now 301-redirects to nvidia.com; service terminated 31 October 2024.
  - aid: octoai:octoai-image-gen-api
    name: OctoAI Image Gen Inference API
    description: >-
      Text-to-image and image-to-image inference for SDXL, SDXL-Lightning,
      Stable Diffusion 1.5, and SSD-1B with ControlNet, LoRA, and adapter
      support, plus inpainting and asset-management endpoints. The API was
      reachable at https://image.octoai.run and shut down on 31 October 2024.
    humanURL: https://octo.ai
    baseURL: https://image.octoai.run
    tags:
      - Images
      - Diffusion
      - SDXL
      - ControlNet
      - Defunct
    properties:
      - type: StatusPage
        url: https://octo.ai
        description: Domain now 301-redirects to nvidia.com; service terminated 31 October 2024.
  - aid: octoai:octoai-asset-library-api
    name: OctoAI Asset Library API
    description: >-
      Endpoints for uploading, listing, and managing user assets — checkpoints,
      LoRAs, textual inversions, ControlNets, and VAE files — used by the image
      and text inference APIs. The API was reachable at https://api.octoai.cloud and
      shut down on 31 October 2024.
    humanURL: https://octo.ai
    baseURL: https://api.octoai.cloud
    tags:
      - Assets
      - LoRA
      - Checkpoints
      - Defunct
    properties:
      - type: StatusPage
        url: https://octo.ai
        description: Domain now 301-redirects to nvidia.com; service terminated 31 October 2024.
  - aid: octoai:octoai-compute-service-api
    name: OctoAI Compute Service API
    description: >-
      Container-deployment API ("Compute Service") that let customers build,
      register, and serve their own custom model containers on OctoAI's
      managed GPU fleet, with autoscaling and OpenAI-style invocation. The
      service shut down on 31 October 2024.
    humanURL: https://octo.ai
    baseURL: https://api.octoai.cloud
    tags:
      - Compute
      - Containers
      - Custom Models
      - Deployment
      - Defunct
    properties:
      - type: StatusPage
        url: https://octo.ai
        description: Domain now 301-redirects to nvidia.com; service terminated 31 October 2024.
  - aid: octoai:octostack
    name: OctoStack
    description: >-
      OctoStack was OctoAI's self-contained generative-AI production stack
      for deploying open and customer-trained foundation models inside a
      customer's VPC or on-premises environment. Announced April 2024, it
      supported NVIDIA GPUs, AMD GPUs, and AWS Inferentia, and bundled
      high-utilization batching, fine-tuning, and asset management; OctoAI
      claimed it delivered 4x better GPU utilization. OctoStack is no longer
      offered as
      a standalone product after the NVIDIA acquisition; its technology has
      been absorbed into NVIDIA's inference stack.
    humanURL: https://octo.ai
    tags:
      - Private AI
      - On-Prem
      - VPC
      - Inference
      - Defunct
    properties:
      - type: StatusPage
        url: https://octo.ai
        description: Product wound down after NVIDIA acquisition; absorbed into NVIDIA's inference stack.
common:
  - type: Website
    url: https://octo.ai
  - type: GitHubOrganization
    url: https://github.com/octoml
  - type: Acquirer
    url: https://www.nvidia.com
  - type: AcquisitionAnnouncement
    url: https://www.geekwire.com/2024/chip-giant-nvidia-acquires-octoai-a-seattle-startup-that-helps-companies-run-ai-models/
  - type: WindDownNotice
    url: https://www.sunsethq.com/blog/octoai-acquisition
  - type: Crunchbase
    url: https://www.crunchbase.com/organization/octoml
  - type: LinkedIn
    url: https://www.linkedin.com/company/octoml
  - type: Features
    data:
      - name: OpenAI-Compatible Inference
        description: >-
          OctoAI's text-generation endpoints implemented OpenAI-style
          request and response shapes, so existing OpenAI client code could
          be repointed by changing the base URL and API key.
      - name: Open-Source Model Catalog
        description: >-
          A shared catalog hosted Llama 2/3, Mixtral, Mistral, Code Llama,
          SDXL, SSD-1B, Stable Diffusion 1.5, and Whisper behind usage-based
          pricing (for example, per token or per image), with no GPU
          provisioning required.
      - name: Custom Model Compute Service
        description: >-
          Customers could package their own model containers and have OctoAI
          autoscale them on a managed GPU fleet, billed by GPU-second.
      - name: Asset Library
        description: >-
          Upload and manage LoRAs, checkpoints, textual inversions, VAEs,
          and ControlNets and apply them at request time to image and
          text-generation endpoints.
      - name: OctoStack Private Deployment
        description: >-
          Self-contained inference stack that ran inside a customer VPC or
          on-premises across NVIDIA, AMD, and AWS Inferentia hardware with
          fine-tuning, batching, and asset management built in.
      - name: TVM-Based Model Optimization
        description: >-
          OctoAI's optimization pipeline descended from Apache TVM (created
          by OctoAI co-founder Tianqi Chen and collaborators at the University
          of Washington) and used ML-guided compilation to improve
          throughput and latency across heterogeneous accelerators.
  - type: UseCases
    data:
      - name: Repointing OpenAI Workloads to Open Models
        description: >-
          Teams used the OpenAI-compatible endpoints to swap GPT-3.5 and GPT-4
          calls for Llama 2 or Mixtral at lower cost without rewriting client
          code.
      - name: Generative Image Pipelines
        description: >-
          Product, marketing, and creative teams ran SDXL-based image
          generation with custom LoRAs and ControlNets for branded asset
          production.
      - name: Private Generative AI in Regulated Industries
        description: >-
          Healthcare, financial-services, and government customers deployed
          OctoStack in-VPC or on-premises to keep prompts, completions, and
          model weights inside their security boundary.
      - name: Custom Fine-Tune Hosting
        description: >-
          Teams fine-tuned open-weights models and served the resulting
          adapters and full-weight checkpoints behind OctoAI inference
          endpoints without managing GPU infrastructure.
  - type: Integrations
    data:
      - name: NVIDIA
        description: >-
          Acquired OctoAI in September 2024 for a reported $165M; OctoAI
          team and technology absorbed into NVIDIA's AI inference stack and
          all OctoAI hosted services terminated on 31 October 2024.
      - name: Apache TVM
        description: >-
          OctoAI's optimization stack originated from Apache TVM, the
          deep-learning compiler created by OctoAI co-founder Tianqi Chen and
          collaborators at
          the University of Washington.
      - name: AWS
        description: >-
          OctoAI was an AWS Partner; OctoStack ran on AWS GPU instances and
          AWS Inferentia accelerators, and a sagemaker-examples repo was
          published in the GitHub org.
      - name: Docker
        description: >-
          OctoAI ran a DockerCon 2023 generative-AI workshop and published
          the dockercon23-octoai workshop repo.
      - name: LangChain & LlamaIndex
        description: >-
          OctoAI's LLM endpoints had documented LangChain and LlamaIndex
          integrations, demonstrated in the octoml-llm-qa sample
          repo.
  - type: SDK
    data:
      - name: Python SDK
        description: >-
          octoai-python-sdk — Python client for the OctoAI inference,
          asset-library, and compute-service APIs. Package and repo were
          retired alongside the service shutdown on 31 October 2024.
      - name: TypeScript SDK
        description: >-
          octoai-typescript-sdk — TypeScript / Node.js client for the
          OctoAI inference and asset APIs. Retired alongside the service
          shutdown on 31 October 2024.
  - type: SuccessorOrganization
    url: https://www.nvidia.com
maintainers:
  - FN: Kin Lane
    email: kin@apievangelist.com