# LLM API Guide

Novita AI provides OpenAI-compatible APIs for a continuously updated model catalog.

Last verified: 2026-02-09

## Table of Contents

- [Quick Setup](#quick-setup)
- [Basic Chat Completion](#basic-chat-completion)
- [Key Parameters](#key-parameters)
- [Function Calling (Tool Use)](#function-calling-tool-use)
- [Vision (Image Input)](#vision-image-input)
- [Structured Outputs (JSON Mode)](#structured-outputs-json-mode)
- [Batch API](#batch-api)
- [Advanced Features](#advanced-features)
- [API Reference](#api-reference)

## Quick Setup

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],  # your Novita API key, read from the environment
)
```

## Basic Chat Completion

```python
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=512,
    stream=True,  # recommended for long responses
)

for chunk in response:
    if chunk.choices:  # some chunks (e.g. a final usage chunk) carry no choices
        print(chunk.choices[0].delta.content or "", end="")
```

## Key Parameters

### Model Selection

- Browse models: https://novita.ai/models
- List via API: `GET https://api.novita.ai/openai/v1/models`
- ID format: `provider/model-name` (e.g., `deepseek/deepseek-r1`)

To fetch the latest list from the command line:

```bash
curl https://api.novita.ai/openai/v1/models \
  -H "Authorization: Bearer $NOVITA_API_KEY"
```

### Output Control

| Parameter | Description | Typical Value |
|-----------|-------------|---------------|
| `max_tokens` | Maximum number of tokens to generate | 512-4096 |
| `temperature` | Sampling randomness (0 = most deterministic, 2 = most random) | 0.7 |
| `top_p` | Nucleus sampling: sample only from the smallest set of tokens whose cumulative probability reaches `top_p` | 0.9 |
| `stream` | Return the response as incremental chunks | `true` |

### Repetition Control

| Parameter | Description |
|-----------|-------------|
| `presence_penalty` | Penalizes any token that has already appeared (encourages new topics) |
| `frequency_penalty` | Penalizes tokens in proportion to how often they have appeared (reduces verbatim repetition) |
| `stop` | Sequences that end generation as soon as they are produced |

---

## Function Calling (Tool Use)

Enable
LLMs to call external functions and APIs. The model does not run the function itself; it returns the function name and JSON-encoded arguments for your code to execute.

### Supported Models

- `deepseek/deepseek-v3.2`
- `qwen/qwen3-coder-next`
- `zai-org/glm-4.7-flash`
- [View all](https://novita.ai/models)

### Example

Define the tools, pass them via `tools`, then read the tool call from the response.

```python
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # None when the model answers directly
    tool_call = message.tool_calls[0]
    print(tool_call.function.name)  # e.g. "get_weather"
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    print(args["location"])  # e.g. "Tokyo, Japan"
```

---

## Vision (Image Input)

Send images to vision-language models alongside text.

### Supported Models

- `qwen/qwen2-vl-72b-instruct`
- `meta-llama/llama-4-maverick`
- [View all vision models](https://novita.ai/models)

### Image via URL

```python
response = client.chat.completions.create(
    model="qwen/qwen2-vl-72b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg",
                        "detail": "high",  # "high", "low", or "auto"
                    },
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

### Image via Base64

```python
import base64

# Encode a local file and embed it as a data URL
with open("image.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="qwen/qwen2-vl-72b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                        "detail": "high",
                    },
                },
                {"type": "text", "text": "What text is in this image?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

---

## Structured Outputs (JSON Mode)

Force the model to output valid JSON that matches your schema.
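Even with a schema, the reply arrives as a JSON string in `message.content`, so it is worth parsing and checking it before use. The sketch below assumes the expense schema used in the example that follows; `parse_expenses` is a hypothetical helper (not part of the OpenAI SDK or Novita's API), and it checks only the fields the schema marks as required rather than doing full JSON Schema validation.

```python
import json
from typing import Any


def parse_expenses(content: str) -> list[dict[str, Any]]:
    """Parse a structured-output reply and check the schema's required fields.

    Hypothetical helper: assumes an object with an "items" array whose
    entries must contain a string "description" and a numeric "amount".
    """
    data = json.loads(content)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        raise ValueError("expected an object with an 'items' array")
    for i, item in enumerate(data["items"]):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")
        if not isinstance(item.get("description"), str):
            raise ValueError(f"item {i} is missing a string 'description'")
        amount = item.get("amount")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise ValueError(f"item {i} is missing a numeric 'amount'")
    return data["items"]


# Usage with a hand-written reply string (no API call needed)
reply = '{"items": [{"description": "lunch", "amount": 50}, {"description": "coffee", "amount": 30}]}'
items = parse_expenses(reply)
print(sum(item["amount"] for item in items))  # 80
```

For production use, a full JSON Schema validator is the more thorough choice; this sketch only shows where such a check belongs.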
### Example

```python
import json

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[
        {"role": "system", "content": "Extract expense info as JSON."},
        {"role": "user", "content": "I spent $50 on lunch and $30 on coffee today."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "expenses",
            "schema": {
                "type": "object",
                "properties": {
                    "items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "description": {"type": "string"},
                                "amount": {"type": "number"},
                                "category": {"type": "string"},
                            },
                            "required": ["description", "amount"],
                        },
                    }
                },
                "required": ["items"],
            },
        },
    },
)

# The JSON arrives as a string in message.content
data = json.loads(response.choices[0].message.content)
print(data["items"])
```

---

## Batch API

Process large volumes of requests asynchronously at discounted batch pricing (check the live pricing and docs for current terms).

### Workflow

1. Upload a JSONL file of requests
2. Create a batch job
3. Poll until the job finishes
4. Download the results

```python
import time

# 1. Upload the JSONL request file
with open("requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

# 2. Create the batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal state
while True:
    status = client.batches.retrieve(batch.id)
    if status.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

# 4. Download the results
if status.status == "completed":
    results = client.files.content(status.output_file_id)
    print(results.text)
```

---

## Advanced Features

### Prompt Caching

Repeated prompt prefixes are cached automatically, giving faster responses and lower costs on requests that share them.
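Because caching works on prefixes, it helps to keep the stable parts of a prompt (system prompt, shared instructions, reference text) at the start and identical across requests, with the part that changes placed last. A minimal sketch of that ordering; `build_messages` and `SYSTEM_PROMPT` are hypothetical names, and the exact caching rules (minimum prefix length, cache lifetime, exact-match requirements) are assumptions to verify against the live docs.

```python
# Hypothetical shared prefix: kept byte-for-byte identical across requests
SYSTEM_PROMPT = "You are a support assistant for ExampleCo. Answer using the policy below.\n<long policy text>"


def build_messages(history: list[dict[str, str]], user_input: str) -> list[dict[str, str]]:
    """Put the stable prefix first and the changing user turn last.

    Assumes the server caches on an exact prefix match, so anything that
    varies per request (timestamps, user IDs) stays out of the system prompt.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": user_input},
    ]


first = build_messages([], "How do I reset my password?")
second = build_messages([], "What is the refund window?")
# Both requests begin with the same system message, so that prefix is cacheable
print(first[0] == second[0])  # True
```

Passing the result as `messages=` to `client.chat.completions.create` is all that changes at the call site.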
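### Reasoning Models

Models such as `deepseek/deepseek-r1` return their reasoning process separately from the final answer, in a `reasoning_content` field on the message:

```python
# `response` is a non-streaming chat completion from a reasoning model
message = response.choices[0].message
print(getattr(message, "reasoning_content", None))  # reasoning steps, if returned
print(message.content)  # final answer
```

### Streaming

Stream long outputs to avoid request timeouts while the model generates:

```python
stream = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[...],  # your messages
    stream=True,
)
for chunk in stream:
    if chunk.choices:  # skip chunks that carry no choices
        print(chunk.choices[0].delta.content or "", end="")
```

---

## API Reference

- **Endpoint**: `POST https://api.novita.ai/openai/v1/chat/completions`
- **Full API docs**: https://novita.ai/docs/api-reference/model-apis-llm-create-chat-completion
- **Rate limits**: Vary by account tier and model; check the docs or console for current limits.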
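Since limits vary by tier and model, clients should expect occasional rate-limit errors (HTTP 429) and retry with backoff. The sketch below is generic: `with_backoff` is a hypothetical helper, the exception types to retry on are passed in, and the delay values are illustrative rather than Novita recommendations. The injectable `sleep` argument lets the demo run without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff and full jitter on retry_on errors."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Upper bound grows 1s, 2s, 4s, ... capped at max_delay;
            # a random wait below it spreads out concurrent retries
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Demo with a stand-in error type; no network access needed
class FakeRateLimit(Exception):
    pass


calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeRateLimit("429 Too Many Requests")
    return "ok"


print(with_backoff(flaky, retry_on=(FakeRateLimit,), sleep=lambda s: None))  # ok
```

With the OpenAI SDK, the natural arguments would be `fn=lambda: client.chat.completions.create(...)` and `retry_on=(openai.RateLimitError,)`. The SDK also retries some errors on its own (see the `max_retries` client option), so a wrapper like this is only needed for a longer or custom retry policy.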