--- id: ins_kesava-mandiga-cheap-tier-delegation title: 'Cheap external model for grunt work; Claude only sees judgment' operator: Kesava Mandiga operator_role: 'Head of PMM, JustCall (SaaS Labs)' source_url: https://www.linkedin.com/in/k3sava/ source_type: thread source_title: 'Kesava on cheap-tier delegation as routing pattern' source_date: 2026-05-03 captured_date: 2026-05-03 domain: [ai-native, engineering] lifecycle: [routing, cost-quality] maturity: applied artifact_class: pattern score: { originality: 5, specificity: 5, evidence: 4, transferability: 5, source: 4 } tier: A related: [ins_kesava-mandiga-workflow-collapse-not-speedup] raw_ref: ../delegation/README.md --- # Cheap external model for grunt work; Claude only sees judgment ## Claim Claude Code's weekly limit is the constraint. The fix isn't a smaller Claude, Haiku still hits the same meter. The fix is a different provider for the dumb-pipe work. DeepSeek V4 Flash reads files, summarizes transcripts, generates boilerplate at $0.001 a call against a separate balance, and Claude only spends its limited tokens on the things that need judgment. Subagents don't help. Bash-shell-out to a non-Anthropic API does. ## Mechanism The two limits (token budget and weekly cap) are coupled today because everything routes through Anthropic. Decoupling them via an external dumb-pipe is the only architecture that touches the cap. Once decoupled, Opus-as-orchestrator becomes worthwhile because Opus stops being spent on grunt.