--- name: nebius-batch-synthetic description: Generate synthetic training data using Nebius Token Factory batch inference (50% cheaper than real-time, async, no rate-limit impact). Use this skill whenever the user wants to run batch inference on Nebius, generate synthetic datasets at scale, create instruction-tuning data, run async LLM jobs on large prompt sets, or export batch results as fine-tuning JSONL. Trigger for phrases like "generate synthetic data with Nebius", "run batch inference", "create training data at scale", "async LLM generation", "batch job on Token Factory", "generate QA pairs", or any question about bulk/offline inference or synthetic data pipelines on Nebius. --- # Nebius Batch Inference — Synthetic Data Generation Run large-scale async LLM jobs at 50% cost, no rate-limit impact. Ideal for generating synthetic training datasets, annotation, evaluation sets, or any offline bulk inference. ## Prerequisites ```bash pip install openai export NEBIUS_API_KEY="your-key" ``` API base: `https://api.tokenfactory.nebius.com/v1/` ## Limits & pricing | Constraint | Value | |------------|-------| | Max requests per file | 5,000,000 | | Max file size | 10 GB | | Completion window | 24 hours | | Cost vs real-time | **50% cheaper** | | Rate limits | Not consumed | ## Complete pipeline ### 1. Build JSONL batch file Each line = one inference request. All requests must use the same model. ```python import json, uuid prompts = [ "Explain vector databases for beginners.", "What is the difference between RAG and fine-tuning?", # ... up to 5M prompts ] with open("batch_requests.jsonl", "w") as f: for prompt in prompts: f.write(json.dumps({ "custom_id": str(uuid.uuid4()), # unique ID to match results "url": "/v1/chat/completions", "body": { "model": "meta-llama/Meta-Llama-3.1-70B-Instruct", "messages": [ {"role": "system", "content": "You are a helpful expert."}, {"role": "user", "content": prompt}, ], "max_tokens": 1024, "temperature": 0.7, }, }) + "\n") ``` ### 2. Upload + create batch job ```python from openai import OpenAI client = OpenAI(base_url="https://api.tokenfactory.nebius.com/v1/", api_key=API_KEY) with open("batch_requests.jsonl", "rb") as f: file_obj = client.files.create(file=f, purpose="batch") batch = client.batches.create( input_file_id=file_obj.id, endpoint="/v1/chat/completions", completion_window="24h", metadata={"description": "synthetic-data-gen"}, ) print(f"Batch: {batch.id} status={batch.status}") ``` ### 3. Poll until complete ```python import time while True: batch = client.batches.retrieve(batch.id) counts = batch.request_counts print(f"status={batch.status} done={counts.completed}/{counts.total}") if batch.status in ("completed", "failed", "cancelled", "expired"): break time.sleep(30) ``` ### 4. Download outputs ```python content = client.files.content(batch.output_file_id) results = [json.loads(line) for line in content.text.strip().splitlines()] ``` Each result record: ```json { "custom_id": "...", "response": { "body": { "choices": [{"message": {"content": "The model's response..."}}] } } } ``` ### 5. Export as fine-tuning JSONL ```python # Build custom_id → original prompt lookup id_to_prompt = {} with open("batch_requests.jsonl") as f: for line in f: req = json.loads(line) user_msg = next(m["content"] for m in req["body"]["messages"] if m["role"] == "user") id_to_prompt[req["custom_id"]] = user_msg with open("training.jsonl", "w") as out: for rec in results: reply = rec["response"]["body"]["choices"][0]["message"]["content"].strip() prompt = id_to_prompt.get(rec["custom_id"], "") if len(reply) < 50: # quality filter continue out.write(json.dumps({ "messages": [ {"role": "user", "content": prompt}, {"role": "assistant", "content": reply}, ] }) + "\n") ``` ## Tips for synthetic data quality - Use a **large teacher model** (70B+) to generate, then fine-tune a smaller model — teacher distillation - Set `temperature: 0.6–0.8` for diverse yet coherent outputs - Add a quality filter (min length, keyword checks) before using as training data - Run deduplication on `custom_id` before uploading as training file ## Clean up batch files You can have up to 500 batch files. Delete old ones: ```python client.files.delete("file_123") ``` ## Bundled reference Read `references/batch-format.md` when the user asks about JSONL structure, file limits, or output format. ## Reference script Full working script: `scripts/05_batch_inference_synthetic.py` Docs: https://docs.tokenfactory.nebius.com/ai-models-inference/batch-inference