---
name: bulk-inference
description: "Runs bulk VLM inference via vLLM, OpenAI, or Gemini. Async parallel with resume and JSONL append. Use for 'run inference', 'bulk inference', '추론 실행'."
model: sonnet
---

# Bulk Inference

## Purpose

Execute bulk VLM inference across multiple providers (vLLM local, OpenAI, Gemini) using [scripts/inference_runner.py](scripts/inference_runner.py). Handles JSONL input/output, resume from interruption, and concurrent async requests.

## Prerequisites

- Input JSONL file with at minimum: an image path field, a question/prompt field, and one or more ID fields.
- For `vllm_local`: running vLLM server(s) — use `/vllm-serve` first.
- For `openai`: `OPENAI_API_KEY` env var set.
- For `gemini`: `GOOGLE_API_KEY` env var set.

## Process

1. **Gather parameters from user**:
   - `--provider`: `vllm_local`, `openai`, or `gemini`
   - `--endpoints`: server URLs (`vllm_local`) or API base URL
   - `--model-id`: HF model name or API model ID
   - `--input`: path to input JSONL
   - `--output`: path for output JSONL
   - `--n-concurrent`: requests per endpoint (`vllm_local`) or total (API), default 6
   - `--max-tokens`: default 100
   - `--temperature`: default 0.0
   - Optional: `--api-key-env`, `--reasoning-effort`, `--thinking-budget`, `--rate-limit-delay`
   - Optional: `--image-field`, `--question-field`, `--id-fields`, `--prompt-template`
2. **Validate inputs** — Confirm the input JSONL exists and is readable. Check provider-specific requirements (API keys, server health). A sketch of these checks appears under the examples at the end of this file.
3. **Run inference**:

   ```bash
   python scripts/inference_runner.py \
     --provider {provider} \
     --endpoints {urls} \
     --model-id {model_id} \
     --input {input_jsonl} \
     --output {output_jsonl} \
     --n-concurrent {n} \
     --max-tokens {max_tokens} \
     --temperature {temp} \
     [--api-key-env {env_var}] \
     [--reasoning-effort {effort}] \
     [--thinking-budget {budget}] \
     [--rate-limit-delay {delay}] \
     [--no-resume] \
     [--image-field {field}] \
     [--question-field {field}] \
     [--id-fields {f1},{f2}] \
     [--prompt-template "Answer the question..."]
   ```

4. **Monitor output** — The script prints a tqdm progress bar and a final summary with total, success, errors, and throughput.
5. **Report results** — After completion, report: output file path, total processed, success rate, error count.

## Input JSONL Format

Each line is a JSON object. Required fields are configurable via `--image-field`, `--question-field`, `--id-fields`. Defaults:

- `image_path` — path to image file
- `question_string` — prompt/question text
- `triplet_id`, `condition` — composite ID for resume

A worked example of building such a file appears under the examples below.

## Output JSONL Format

Each output line preserves ALL original input fields plus:

```json
{"...original fields...", "model": "...", "raw_response": "...", "parsed_answer": "...", "error": null}
```

A snippet for summarizing these records appears under the examples below.

## Rules

- Resume is ON by default — interrupted runs continue from where they stopped (the skip mechanism is sketched in the examples below).
- Never modify the input JSONL file.
- Append mode: output JSONL is opened in append mode, one line per completed item.
- All errors are captured per-item; the runner does not abort on individual failures.
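
## Example: Pre-flight Checks (Sketch)

A minimal sketch of the validation in step 2, assuming the standard `GET /health` route of vLLM's OpenAI-compatible server; the function names here (`check_input`, `check_api_key`, `check_vllm`) are illustrative and not part of `scripts/inference_runner.py`.

```python
import os
import urllib.request


def check_input(path: str) -> None:
    """Raise if the input JSONL is missing or unreadable."""
    with open(path, encoding="utf-8") as f:
        f.readline()


def check_api_key(env_var: str) -> None:
    """Raise if the provider's API key env var (e.g. OPENAI_API_KEY) is unset."""
    if not os.environ.get(env_var):
        raise RuntimeError(f"{env_var} is not set")


def check_vllm(endpoint: str, timeout: float = 5.0) -> None:
    """Ping a vLLM server's /health route (assumes a default deployment)."""
    with urllib.request.urlopen(f"{endpoint.rstrip('/')}/health", timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"{endpoint} unhealthy: HTTP {resp.status}")
```

Running these before a long job means a dead endpoint or missing key fails in seconds rather than minutes into the run.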
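
## Example: Building an Input JSONL

A minimal sketch of a valid input file using the documented default field names; the paths, questions, and IDs are invented for illustration.

```python
import json

# Two rows with the default fields: image_path, question_string,
# and the composite ID pair triplet_id + condition used for resume.
rows = [
    {
        "image_path": "data/images/0001.png",
        "question_string": "What color is the car?",
        "triplet_id": "0001",
        "condition": "baseline",
    },
    {
        "image_path": "data/images/0002.png",
        "question_string": "How many people are visible?",
        "triplet_id": "0002",
        "condition": "baseline",
    },
]

with open("input.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

If your data uses different field names, pass `--image-field`, `--question-field`, and `--id-fields` rather than renaming columns.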
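
## Example: Summarizing an Output JSONL

A sketch of the step 5 report, derived only from the documented output schema (`error` is `null` on success); the file name `output.jsonl` is a placeholder.

```python
import json

total = errors = 0
with open("output.jsonl", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        record = json.loads(line)
        total += 1
        if record.get("error") is not None:
            errors += 1

success = total - errors
rate = 100.0 * success / total if total else 0.0
print(f"total={total} success={success} ({rate:.1f}%) errors={errors}")
```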
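
## Example: How Resume Skipping Works (Sketch)

An illustration of the resume mechanism implied by `--id-fields`: collect the composite IDs already written to the output, then skip matching input rows. This sketches the idea only; it is not the actual logic inside `scripts/inference_runner.py`.

```python
import json
import os

ID_FIELDS = ("triplet_id", "condition")  # the documented defaults


def completed_ids(output_path: str) -> set:
    """Composite IDs of items already appended to the output JSONL."""
    done = set()
    if os.path.exists(output_path):
        with open(output_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    rec = json.loads(line)
                    done.add(tuple(rec.get(k) for k in ID_FIELDS))
    return done


def pending_rows(input_path: str, output_path: str):
    """Yield input rows whose composite ID is not yet in the output."""
    done = completed_ids(output_path)
    with open(input_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            if tuple(row.get(k) for k in ID_FIELDS) not in done:
                yield row
```

Because the output is append-only and keyed this way, an interrupted run can be restarted with the same command and only unfinished items are sent.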