--- name: cost-accrual-tracker description: Track real-time API cost accrual during LLM execution. Activate on 'cost tracking', 'token usage', 'API costs', 'budget monitoring', 'usage metrics'. NOT for cost estimation, pricing tiers, or billing systems. allowed-tools: Read,Write,Edit --- # Cost Accrual Tracker Real-time tracking of API costs during LLM execution with support for partial costs on abort. ## When to Use ✅ **Use for**: - Implementing real-time cost tracking during execution - Capturing partial costs when executions are aborted - Building cost display widgets for execution UIs - Integrating token counting into execution pipelines - Adding budget thresholds with auto-stop ❌ **NOT for**: - Cost estimation before execution (use pricing calculators) - Billing system design (use billing-system skill) - Price tier management or discounts - Historical cost analytics dashboards ## Core Patterns ### 1. Token-Based Cost Calculation ```typescript interface TokenUsage { inputTokens: number; outputTokens: number; cacheReadTokens?: number; // Prompt caching hits cacheWriteTokens?: number; // Prompt caching misses } interface CostCalculation { inputCostUsd: number; outputCostUsd: number; cacheSavingsUsd?: number; totalCostUsd: number; } function calculateCost(usage: TokenUsage, model: string): CostCalculation { const pricing = MODEL_PRICING[model]; const inputCostUsd = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok; const outputCostUsd = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok; return { inputCostUsd, outputCostUsd, totalCostUsd: inputCostUsd + outputCostUsd, }; } ``` ### 2. Incremental Accrual Pattern Track costs as they accrue, not just at completion: ```typescript class CostAccrualTracker { private totalInputTokens = 0; private totalOutputTokens = 0; private accruedCostUsd = 0; private readonly model: string; constructor(model: string) { this.model = model; } /** * Called after each API response (streaming or complete) */ recordUsage(usage: TokenUsage): void { this.totalInputTokens += usage.inputTokens; this.totalOutputTokens += usage.outputTokens; const cost = calculateCost(usage, this.model); this.accruedCostUsd += cost.totalCostUsd; } /** * Get current accrued cost (for real-time display) */ getCurrentCost(): number { return this.accruedCostUsd; } /** * Finalize on completion or abort */ finalize(reason: 'completed' | 'aborted' | 'failed'): CostReport { return { totalInputTokens: this.totalInputTokens, totalOutputTokens: this.totalOutputTokens, totalCostUsd: this.accruedCostUsd, completionReason: reason, finalizedAt: Date.now(), }; } } ``` ### 3. Abort-Aware Cost Capture **Critical**: Always capture partial costs on abort: ```typescript // In execution handler const tracker = new CostAccrualTracker(model); try { for await (const chunk of executeStream(request)) { if (abortSignal.aborted) { // CRITICAL: Capture cost BEFORE throwing const partialCost = tracker.finalize('aborted'); onCostUpdate(partialCost); throw new AbortError('Execution aborted'); } tracker.recordUsage(chunk.usage); onCostUpdate(tracker.getCurrentCost()); } return tracker.finalize('completed'); } catch (error) { if (error instanceof AbortError) { throw error; // Already handled } return tracker.finalize('failed'); } ``` ### 4. Budget Threshold Pattern Auto-stop execution when budget is exceeded: ```typescript interface BudgetConfig { maxCostUsd: number; warnAtPercentage: number; // e.g., 0.8 for 80% onWarn?: (current: number, max: number) => void; onExceed?: (current: number, max: number) => void; } function createBudgetGuard(config: BudgetConfig) { return { check(currentCostUsd: number): 'ok' | 'warn' | 'exceed' { const percentage = currentCostUsd / config.maxCostUsd; if (percentage >= 1.0) { config.onExceed?.(currentCostUsd, config.maxCostUsd); return 'exceed'; } if (percentage >= config.warnAtPercentage) { config.onWarn?.(currentCostUsd, config.maxCostUsd); return 'warn'; } return 'ok'; } }; } ``` ## Anti-Patterns ### Lost Costs on Abort **Novice thinking**: "Just throw an error when aborted" **Reality**: If you don't capture costs before aborting, you lose: - Token usage data for partial execution - Accurate cost reporting for billing - Audit trail for debugging **Timeline**: Always been an issue, but became critical with expensive models (GPT-4, Claude Opus) **Correct approach**: Always call `finalize()` with partial data BEFORE throwing abort errors. ### Polling Without Debounce **Novice thinking**: "Poll cost endpoint every 100ms for real-time updates" **Reality**: - Wastes bandwidth and CPU - Cost updates only happen after API responses - Polling faster than response rate is pointless **Correct approach**: Poll at 1-2 second intervals, or use event-driven updates from the execution stream. ### Ignoring Prompt Caching **Novice thinking**: "Just multiply tokens by price per token" **Reality**: Claude's prompt caching changes the cost model: - Cache reads are 90% cheaper - Cache writes cost extra on first use - Ignoring caching leads to inaccurate costs **Timeline**: - Pre-2024: No caching, simple calculation - 2024+: Claude prompt caching requires separate tracking **Correct approach**: Track `cache_read_input_tokens` and `cache_creation_input_tokens` separately. ### Per-Request Cost Objects **Novice thinking**: "Create new tracker for each request" **Reality**: For DAG execution with multiple nodes: - Need aggregate cost across all nodes - Need to attribute costs to specific nodes - Need rollup for parent execution **Correct approach**: Hierarchical tracking - per-node trackers that roll up to execution-level. ## State Flow ``` ┌─────────────────────────────────────────┐ │ CostAccrualTracker │ └─────────────────────────────────────────┘ │ ┌─────────────────────────┼─────────────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ recordUsage() │ │ getCurrentCost()│ │ finalize() │ │ │ │ │ │ │ │ After each API │ │ For real-time │ │ On completion, │ │ response │ │ display │ │ abort, or fail │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ │ │ │ │ ▼ ▼ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ CostReport │ │ { inputTokens, outputTokens, totalCostUsd, completionReason } │ └─────────────────────────────────────────────────────────────────┘ ``` ## UI Display Pattern For real-time cost display in execution UIs: ```typescript // Poll every 2 seconds while executing useEffect(() => { if (status !== 'running') return; const interval = setInterval(async () => { const response = await fetch(`/api/execute/${executionId}`); const data = await response.json(); setAccruedCost(data.cost.accruedUsd); setTokens({ input: data.cost.inputTokens, output: data.cost.outputTokens, }); }, 2000); return () => clearInterval(interval); }, [executionId, status]); // Display format