---
name: agent-rate-limit-strategy
description: "Control LLM spend and Apex governor exposure for high-traffic Agentforce agents via per-user token budgets and graceful fallback. NOT for API rate-limiting of REST endpoints."
category: agentforce
salesforce-version: "Spring '25+"
well-architected-pillars:
  - Scalability
  - Operational Excellence
triggers:
  - "agent token budget blown"
  - "one user drives the whole agent quota"
  - "graceful fallback when agent is over limit"
  - "rate-limit agentforce per user"
tags:
  - agentforce
  - rate-limiting
  - cost
  - platform-events
inputs:
  - "Traffic forecast"
  - "per-tenant/per-user quotas"
  - "fallback UX"
outputs:
  - "Rate-limit policy CMDT"
  - "fallback messaging"
  - "observability dashboard"
dependencies: []
version: 1.0.0
author: Pranav Nagrecha
updated: 2026-04-28
---

# Agent Rate Limit Strategy

Agentforce exposes internal LLM quotas indirectly — you hit them as platform 503s with no forward signal. This skill builds a client-side budget gate in front of the agent: per-user token ledger, CMDT-driven thresholds, and a graceful fallback when the budget is exhausted.

## Recommended Workflow

1. Define budget: tokens/user/hour, tokens/tenant/day. Persist in `Agent_Rate_Limit__mdt` per-persona.
2. In the channel entry point (LWC wrapper or Connect API), call a `BudgetService.checkAndConsume(userId, estTokens)` before dispatching to the agent.
3. Log consumption via Platform Event `Agent_Token_Consumed__e` — the subscriber rolls into a `User_Token_Ledger__c` aggregate.
4. When the budget is exhausted, render the fallback UX (human handoff / retry-after message) instead of calling the agent.
5. Dashboard: p50/p95 consumption per persona, budget exhaustion count, tokens/turn — page SRE when exhaustion >1%.

## Key Considerations

- Estimate tokens conservatively from input length (`chars/4`) before the call; reconcile after.
- Ledger is eventually consistent — use the Platform Event pattern, not synchronous DML.
- Fallback UX must be pre-approved; a generic 'try again later' erodes trust.

## Worked Examples (see `references/examples.md`)

- *Budget service sketch* — 100 Service reps use agent summarization; one rep loops a bad input 500x.
- *Graceful fallback* — Budget exhausted mid-conversation.

## Common Gotchas (see `references/gotchas.md`)

- **Estimating tokens from chars misses long RAG context** — Estimate says 500 tokens; reality is 5000 after grounding.
- **Platform Event volume limits** — High-traffic agent saturates your 24h PE quota.
- **Ledger bucket skew** — Timezone boundary resets at midnight UTC, users see it mid-afternoon locally.

## Top LLM Anti-Patterns (full list in `references/llm-anti-patterns.md`)

- Trusting Agentforce to rate-limit for you — it only fails loudly on hard limits.
- Hard-coded tokens/minute without per-persona CMDT — cannot respond to traffic shifts.
- Synchronous ledger DML per turn — blows DML limits.

## Official Sources Used

- Agentforce Developer Guide — https://developer.salesforce.com/docs/einstein/genai/guide/agentforce.html
- Einstein Trust Layer — https://help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
- Invocable Actions (Apex) — https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_classes_invocable_action.htm
- Agentforce Testing Center — https://help.salesforce.com/s/articleView?id=sf.agentforce_testing_center.htm