--- name: together-common-errors description: 'Together AI common errors for inference, fine-tuning, and model deployment. Use when working with Together AI''s OpenAI-compatible API. Trigger: "together common errors". ' allowed-tools: Read, Write, Edit, Bash(pip:*), Grep version: 1.0.0 license: MIT author: Jeremy Longshore tags: - saas - ai - inference - together compatibility: Designed for Claude Code --- # Together AI Common Errors ## Overview Together AI provides OpenAI-compatible inference, fine-tuning, and batch processing across 100+ open-source models (Llama, Mixtral, Qwen, FLUX). Common errors include model-not-available failures when requesting deprecated or gated models, token limit violations that differ per model architecture, and fine-tune job failures from dataset formatting issues. The API is compatible with any OpenAI client library at `base_url = 'https://api.together.xyz/v1'`. Model IDs use the full namespace format (e.g., `meta-llama/Meta-Llama-3.1-8B-Instruct`) and must match exactly. This reference covers inference, fine-tuning, and deployment errors. ## Error Reference | Code | Message | Cause | Fix | |------|---------|-------|-----| | `401` | `Unauthorized` | Invalid or missing `TOGETHER_API_KEY` | Verify key at api.together.xyz > Settings | | `400` | `Model not found` | Wrong model ID or model deprecated | Use `client.models.list()` to get valid model IDs | | `400` | `Token limit exceeded` | Input + max_tokens exceeds model context | Reduce input length or lower `max_tokens` parameter | | `400` | `Invalid fine-tune dataset` | JSONL format errors or missing required fields | Each line must be valid JSON with `messages` array | | `402` | `Insufficient credits` | Account balance depleted | Add credits at api.together.xyz > Billing | | `404` | `Fine-tune job not found` | Invalid job ID or job expired | List active jobs with `client.fine_tuning.list()` | | `429` | `Rate limit exceeded` | Too many concurrent requests | Implement backoff; use batch API for 50% cost reduction | | `500` | `Model overloaded` | High demand on specific model | Retry with backoff; try alternative model of same family | ## Error Handler ```typescript interface TogetherError { code: number; message: string; category: "auth" | "rate_limit" | "validation" | "billing"; } function classifyTogetherError(status: number, body: string): TogetherError { if (status === 401) { return { code: 401, message: body, category: "auth" }; } if (status === 402) { return { code: 402, message: body, category: "billing" }; } if (status === 429) { return { code: 429, message: "Rate limit exceeded", category: "rate_limit" }; } return { code: status, message: body, category: "validation" }; } ``` ## Debugging Guide ### Authentication Errors Together uses Bearer token authentication. Pass `TOGETHER_API_KEY` via `Authorization: Bearer` header or set it in the client constructor. Keys do not expire but can be revoked. If using the OpenAI client library, set `base_url='https://api.together.xyz/v1'` and pass the Together key as `api_key`. ### Rate Limit Errors Rate limits vary by plan tier and are enforced per-key. Free tier allows 5 requests/second; paid tiers scale higher. Use the batch inference API (`/v1/batch`) for non-real-time workloads at 50% cost reduction. Check `X-RateLimit-Remaining` header to monitor quota. ### Validation Errors Model IDs must match exactly (e.g., `meta-llama/Meta-Llama-3.1-8B-Instruct`). Use `client.models.list()` to enumerate available models. Token limits vary per model -- Llama 3.1 supports 128K context while older models may support only 4K. Fine-tune datasets must be JSONL with each line containing a `messages` array in chat format. Empty `messages` arrays or missing `role` fields cause silent validation failures. Validate each JSONL line independently before uploading. ## Error Handling | Scenario | Pattern | Recovery | |----------|---------|----------| | Model deprecated | 400 with "not found" | Check model list; migrate to successor model | | Token limit exceeded | 400 on long prompts | Truncate input or use model with larger context window | | Fine-tune dataset rejected | JSONL validation errors | Validate each line independently; fix and re-upload | | Credits depleted mid-batch | 402 after N successful calls | Add credits, resume from last successful request | | Model overloaded at peak | 500 on popular models | Fall back to alternative model in same family | ## Quick Diagnostic ```bash # Verify API connectivity and list available models curl -s -o /dev/null -w "%{http_code}" \ -H "Authorization: Bearer $TOGETHER_API_KEY" \ https://api.together.xyz/v1/models ``` ## Resources - [Together AI Documentation](https://docs.together.ai/) - [API Reference](https://docs.together.ai/reference/chat-completions-1) - [Supported Models](https://docs.together.ai/docs/inference-models) ## Next Steps See `together-debug-bundle`.