aid: vllm
url: https://raw.githubusercontent.com/api-evangelist/vllm/refs/heads/main/apis.yml
name: vLLM
x-type: company
description: >-
  vLLM is a high-throughput, memory-efficient open-source inference and serving
  engine for LLMs. It provides an OpenAI-compatible REST server (vllm serve)
  plus a Python API. vLLM is Apache 2.0 licensed and runs on your own GPU
  infrastructure; there is no hosted vLLM SaaS from the project itself.
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - LLM
  - Inference
  - Open Source
  - GPU
  - OpenAI Compatible
  - Self-Hosted
created: '2026-05-08'
modified: '2026-05-08'
specificationVersion: '0.19'
apis:
  - aid: vllm:openai-compatible
    name: vLLM OpenAI-Compatible Server
    description: >-
      OpenAI-compatible REST API exposed by `vllm serve`. Endpoints include
      /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/score,
      /v1/audio/transcriptions, /v1/audio/translations, /v1/realtime
      (WebSocket), /tokenize, /detokenize, and /generative_scoring.
      Authentication is via the --api-key flag set on server start; clients can
      use the official OpenAI Python library unmodified, with vLLM-specific
      extensions passed via extra_body.
    humanURL: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
    baseURL: http://localhost:8000/v1
    tags:
      - Chat
      - Completions
      - Embeddings
      - Audio
      - Score
      - OpenAI-Compatible
    properties:
      - type: Documentation
        url: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
      - type: GitHub
        url: https://github.com/vllm-project/vllm
      - type: OpenAICompat
        url: https://platform.openai.com/docs/api-reference
common:
  - type: Website
    url: https://docs.vllm.ai/
  - type: DeveloperPortal
    url: https://docs.vllm.ai/
  - type: OpenSource
    url: https://github.com/vllm-project/vllm
  - type: Plans
    url: plans/vllm-plans-pricing.yml
  - type: RateLimits
    url: rate-limits/vllm-rate-limits.yml
  - type: FinOps
    url: finops/vllm-finops.yml
maintainers:
  - FN: Kin Lane
    email: kin@apievangelist.com