---
name: pipeline-context
description: Load context for pipeline, cron, Lambda, OCR, and translation work. Use when starting any pipeline monitoring, debugging, or processing task.
---

# Pipeline Context

Read these files before proceeding with pipeline work:

1. `memory/pipeline-ops.md`: emergency controls, worker architecture, operational details
2. `memory/lessons-learned.md`: operational postmortems and patterns
3. `.claude/docs/pipeline.md`: full processing pipeline (states, crons, prompts, costs)
4. `.claude/docs/worker-architecture.md`: Lambda worker details
5. `.claude/docs/page-lifecycle.md`: page processing states

## Critical Rules

- **Model selection:** Prefer `getModelForBook(book)` from `src/lib/types/ai-models.ts` over hardcoding a model name. It routes BPH books and non-Latin-script languages to `gemini-3-flash-preview` (full quality) and everything else to `gemini-3.1-flash-lite-preview` (50% cheaper, with comparable quality on Latin-script text). The enrich-worker uses `gemini-3.1-flash-lite-preview` for all phases (summary+index, chapters, quality scoring, collection assignment). Never use any model older than Gemini 3; `gemini-2.x` is deprecated.
- NEVER use the Gemini Batch API for translation; use the Lambda workers (SQS FIFO) instead.
- Any script that overwrites `ocr.data` or `translation.data` MUST call `createRevision()` first.
- MongoDB Atlas saturates at ~40 concurrent Lambda jobs; treat this as the global backpressure limit.
- Emergency stop: in `system_config`, set `paused: true` on the document with `_id: 'processing_control'`.

## Audit Trail

All AI calls are logged to the `gemini_usage` collection via `logGeminiCall()` in `src/lib/gemini-logger.ts`.
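One way to guarantee that every AI call produces a usage record, whether it succeeds or fails, is to route calls through a single wrapper. The sketch below is hypothetical: `withUsageLogging`, `UsageRecord`, and `UsageSink` are illustrative names, the record fields are assumptions rather than the real `gemini_usage` schema, and an in-memory sink stands in for the collection. It does not reproduce the actual `logGeminiCall()` signature.

```typescript
// Hypothetical record shape; the real gemini_usage documents may differ.
interface UsageRecord {
  model: string;
  operation: string;
  ok: boolean;
  durationMs: number;
  error?: string;
  timestamp: Date;
}

// Stand-in for writing to the gemini_usage collection.
type UsageSink = (record: UsageRecord) => void;

// Runs an AI call and always emits exactly one usage record,
// on success or failure. Failures are rethrown to the caller.
async function withUsageLogging<T>(
  sink: UsageSink,
  model: string,
  operation: string,
  call: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  let failed = false;
  let failure: unknown = undefined;
  try {
    return await call();
  } catch (err) {
    failed = true;
    failure = err;
    throw err;
  } finally {
    // Runs after both paths; an exception thrown by the sink itself propagates.
    sink({
      model,
      operation,
      ok: !failed,
      durationMs: Date.now() - started,
      error: failed
        ? failure instanceof Error
          ? failure.message
          : String(failure)
        : undefined,
      timestamp: new Date(),
    });
  }
}
```

Logging in `finally` keeps the success and failure paths from drifting apart, so a failed call still shows up in cost and error queries.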
Other audit and monitoring entry points:

- Book history timeline: `GET /api/books/[id]/history` (assembled from 6 collections)
- Processing dashboard: `GET /api/admin/processing-dashboard?provider=ia`
- Error classification: `classifyError(error)` in `src/lib/errors.ts`
- The `cost_tracking` collection is DEPRECATED; use `gemini_usage` for all cost queries.

## Staleness Check

After reading the memory files above, flag anything that contradicts what you observe in the codebase:

- File paths or function names that no longer exist
- Behavioral claims that don't match the current code
- Stats or counts dated more than 14 days ago (note these as potentially stale)

If you find contradictions, update the memory file immediately and tell the user what changed.

## Also Relevant

- Batch processing (Gemini Batch API): `.claude/docs/batch-processing.md`
- Observability and audit trail: `.claude/docs/observability.md`
- First translation identification: `.claude/docs/first-translation-system.md`
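The emergency stop and the ~40-job backpressure limit from Critical Rules both amount to an admission check a worker could run before starting a job. This is a hypothetical sketch: `ProcessingControl`, `admitJob`, and `MAX_CONCURRENT_JOBS` are illustrative names, and how the real Lambda workers read `processing_control` or count active jobs is not described in this file. Treating a missing control document as "not paused" is also an assumption; a fail-closed policy may be preferable.

```typescript
// Hypothetical shape of the system_config document with _id 'processing_control'.
interface ProcessingControl {
  _id: "processing_control";
  paused: boolean;
}

// Approximate ceiling: Atlas saturates around 40 concurrent Lambda jobs.
const MAX_CONCURRENT_JOBS = 40;

type Admission =
  | { admit: true }
  | { admit: false; reason: "paused" | "backpressure" };

// Decide whether a new job may start, given the control document
// (null if it could not be found) and the current global job count.
function admitJob(control: ProcessingControl | null, activeJobs: number): Admission {
  // Assumption: a missing control document does not block processing.
  if (control?.paused) {
    return { admit: false, reason: "paused" };
  }
  if (activeJobs >= MAX_CONCURRENT_JOBS) {
    return { admit: false, reason: "backpressure" };
  }
  return { admit: true };
}
```

Checking the pause flag before the concurrency limit means a paused pipeline reports "paused" even when it is also saturated, which makes the cause of a halt unambiguous.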