---
name: venice-ai-api
description: Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.
---

# Venice.ai API Skill

Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.

## Quick Reference

**Base URL:** `https://api.venice.ai/api/v1`
**Auth:** `Authorization: Bearer VENICE_API_KEY`
**SDK:** Use OpenAI SDK with custom base URL
**API Key Types:** `ADMIN` (full access) or `INFERENCE` (inference only)

## Setup

```python
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: process.env.VENICE_API_KEY,
    baseURL: 'https://api.venice.ai/api/v1'
});
```

## Account Tiers

| Tier | Qualification | Rate Limits | Use Case |
|------|--------------|-------------|----------|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |

## API Capabilities

### 1. Chat Completions
Text inference with multimodal support (text, images, audio, video).

```python
completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"}
    ]
)
```

**Popular Models:**
- `llama-3.3-70b` - Balanced performance (Tier M, 128K context)
- `zai-org-glm-4.7` - Complex tasks, deep reasoning (Tier L, 128K context)
- `mistral-31-24b` - Vision + function calling (Tier S, 131K context)
- `venice-uncensored` - No content filtering (Tier S, 32K context)
- `deepseek-ai-DeepSeek-R1` - Advanced reasoning, math, coding (Tier L, 64K context)
- `qwen3-235b` - Massive MoE reasoning (Tier L)
- `qwen3-4b` - Fast, lightweight (Tier XS, 40K context)

**Venice Parameters** (via `extra_body` in Python, direct in JS):
- `enable_web_search`: "off" | "on" | "auto"
- `enable_web_scraping`: boolean
- `enable_web_citations`: boolean — adds `^index^` citation format
- `include_venice_system_prompt`: boolean (default: true)
- `strip_thinking_response`: boolean
- `disable_thinking`: boolean
- `character_slug`: string
- `prompt_cache_key`: string — routing hint for cache hits
- `prompt_cache_retention`: "default" | "extended" | "24h"

See [references/chat-completions.md](references/chat-completions.md) for full parameter reference.

### 2. Image Generation
Generate images from text prompts.

```python
import requests

response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in images array
```

**Image Models:**
| Model | Best For | Pricing |
|-------|----------|---------|
| `qwen-image` | Highest quality, editing | Variable |
| `venice-sd35` | General purpose (default) | ~$0.01/image |
| `hidream` | Fast generation | ~$0.01/image |
| `flux-2-pro` | Professional quality | ~$0.04/image |
| `flux-2-max` | High-quality output | ~$0.02/image |
| `nano-banana-pro` | Photorealism, 2K/4K support | $0.18-$0.35 |

### 3. Image Upscaling
Enhance image resolution 2x or 4x.

```python
import base64

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)
# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)
```

**Pricing:** $0.02 (2x), $0.08 (4x)

### 4. Image Editing (Inpainting)
Modify existing images with AI-powered instructions.

```python
import base64

with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or URL starting with http/https
    }
)
# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)
```

**Model:** Uses Qwen-Image. **Pricing:** ~$0.04/edit.

See [references/image-api.md](references/image-api.md) for all parameters and style presets.

### 5. Video Generation
Async queue-based video generation. Always call `/video/quote` first for pricing.

**Full Workflow:**

```python
import requests
import time
import base64

api_key = os.getenv("VENICE_API_KEY")
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Step 1: Get price quote
quote = requests.post(
    "https://api.venice.ai/api/v1/video/quote",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
print(f"Estimated cost: ${quote.json()['quote']}")

# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "prompt": "A serene forest with sunlight filtering through trees",
        "negative_prompt": "low quality, blurry",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
queue_id = queue_resp.json()["queueid"]

# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    if (status_resp.status_code == 200
            and status_resp.headers.get("Content-Type") == "video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)

# Step 4: Cleanup (optional — deletes from Venice storage)
requests.post(
    "https://api.venice.ai/api/v1/video/complete",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "queueid": queue_id
    }
)
```

**Image-to-Video:**
```python
with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "wan-2.5-preview-image-to-video",
        "prompt": "Animate this scene with gentle motion",
        "image_url": f"data:image/png;base64,{img_b64}",
        "duration": "5s",
        "resolution": "720p"
    }
)
```

**Video Models:**
| Model | Type | Features |
|-------|------|----------|
| `kling-2.5-turbo-pro` | Text/Image-to-Video | Fast, high quality |
| `wan-2.5-preview` | Image-to-Video | Animation specialist |
| `ltx-2-full` | Text/Image-to-Video | Full quality |
| `veo3-fast` | Text/Image-to-Video | Speed-optimized |
| `sora-2` | Image-to-Video | High-end quality |

See [references/video-api.md](references/video-api.md) for full parameter reference.

### 6. Text-to-Speech
Convert text to audio with 60+ voices.

```python
response = requests.post(
    "https://api.venice.ai/api/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "input": "Hello, welcome to Venice.",
        "model": "tts-kokoro",
        "voice": "af_sky",
        "speed": 1.0,            # 0.25 to 4.0
        "response_format": "mp3"  # mp3, opus, aac, flac, wav, pcm
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```

**Voices:** `af_sky`, `af_nova`, `am_liam`, `bf_emma`, `zf_xiaobei`, `jm_kumo`, and 50+ more.
**Pricing:** $3.50 per 1M characters.

### 7. Speech-to-Text
Transcribe audio files.

```python
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.venice.ai/api/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "response_format": "json",  # json or text
            "timestamps": "true"
        }
    )
```

**Formats:** WAV, FLAC, MP3, M4A, AAC, MP4.
**Pricing:** $0.0001 per audio second.

### 8. Embeddings
Generate vector embeddings for RAG and semantic search.

```python
response = requests.post(
    "https://api.venice.ai/api/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "text-embedding-bge-m3",
        "input": "Privacy-first AI infrastructure",
        "encoding_format": "float"  # or "base64"
    }
)
```

### 9. Vision (Multimodal)
Analyze images with vision-capable models.

```python
response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
```

### 10. Function Calling
Define tools for the model to call.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Weather in SF?"}],
    tools=tools
)
```

### 11. Structured Outputs
Get guaranteed JSON schema responses.

```python
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False
            }
        }
    }
)
```

**Requirements:** `strict: true`, `additionalProperties: false`, all fields in `required`.

### 12. AI Characters
Interact with predefined AI personas.

```python
# List characters
characters = requests.get(
    "https://api.venice.ai/api/v1/characters",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"categories": "philosophy", "limit": 50}
).json()

# Chat with a character
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    extra_body={
        "venice_parameters": {"character_slug": "alan-watts"}
    }
)
```

### 13. Model Discovery
Query available models and capabilities programmatically.

```python
# List models by type
models = requests.get(
    "https://api.venice.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}  # text, image, audio, video, embedding
).json()

# Get model traits for auto-selection
traits = requests.get(
    "https://api.venice.ai/api/v1/models/traits",
    params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}

# Use trait as model ID for automatic routing
response = client.chat.completions.create(
    model="fastest",  # Venice routes to the current fastest model
    messages=[...]
)
```

## Error Handling

### Error Codes

| Status | Error Code | Meaning | Action |
|--------|------------|---------|--------|
| 400 | `INVALID_REQUEST` | Bad parameters | Check payload schema |
| 401 | `AUTHENTICATION_FAILED` | Invalid API key | Verify key and balance |
| 402 | — | Insufficient balance | Add USD or stake VVV |
| 403 | — | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | — | Payload too large | Reduce request size |
| 415 | — | Invalid content type | Use `application/json` |
| 422 | — | Content policy violation | Modify prompt |
| 429 | `RATE_LIMIT_EXCEEDED` | Too many requests | Backoff, wait for reset |
| 500 | `INFERENCE_FAILED` | Model error | Retry with backoff |
| 503 | — | Model at capacity | Retry later or switch model |
| 504 | — | Timeout | Use streaming for long responses |

### Abuse Protection
Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.

### Retry with Exponential Backoff (Python)

```python
import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # 1s, 2s, 4s
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session

session = create_venice_session()
response = session.post(url, json=payload, headers=headers)
```

### Retry with Exponential Backoff (JavaScript)

```javascript
async function veniceRequest(url, options, maxRetries = 3) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await fetch(url, options);

        if (response.ok) return response;

        if ([429, 500, 502, 503, 504].includes(response.status)) {
            if (attempt < maxRetries) {
                const delay = Math.pow(2, attempt) * 1000;
                console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
                await new Promise(r => setTimeout(r, delay));
                continue;
            }
        }

        throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
    }
}
```

### Rate Limit-Aware Client (Python)

```python
import time
import requests

class VeniceClient:
    """Wrapper that respects rate limits using response headers."""
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        self.session = create_venice_session()
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp
```

## Response Headers

Monitor these headers for production:
- `x-ratelimit-remaining-requests` — Requests left in window
- `x-ratelimit-remaining-tokens` — Tokens left in window
- `x-ratelimit-reset-requests` — Timestamp when request count resets
- `x-venice-balance-usd` — USD balance
- `x-venice-balance-diem` — DIEM balance
- `x-venice-is-blurred` — Image was blurred (safe mode)
- `x-venice-is-content-violation` — Content policy violation
- `x-venice-model-deprecation-warning` — Deprecation notice
- `x-venice-model-deprecation-date` — Sunset date
- `CF-RAY` — Request ID for support

## Rate Limits by Model Tier

**Text Models:**
| Tier | RPM | TPM | Example Models |
|------|-----|-----|----------------|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |

**Other Endpoints:**
| Endpoint | RPM |
|----------|-----|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |

## API Key Management

```bash
# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'

# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"

# List keys
curl https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY"

# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
  -H "Authorization: Bearer $VENICE_API_KEY"
```

## Reference Files

- [references/chat-completions.md](references/chat-completions.md) — Full chat API parameters
- [references/image-api.md](references/image-api.md) — Image generation, editing, upscaling details
- [references/video-api.md](references/video-api.md) — Video generation workflow and parameters
- [references/models.md](references/models.md) — Available models, tiers, pricing, and capabilities