specification: API Commons Rate Limits
specificationVersion: '0.1'
provider: Agno
providerId: agno
url: https://docs.agno.com/features/api
created: '2026-06-12'
modified: '2026-06-12'
notes: >-
  Agno's AgentOS API is a self-hosted FastAPI-based runtime deployed in the
  customer's own cloud infrastructure. As a result, rate limits are determined
  by the customer's deployment configuration rather than published centrally
  by Agno. The framework includes built-in fallback model support that
  activates when upstream LLM providers return rate limit errors (HTTP 429).
  Specific rate limit headers and quotas depend on the underlying LLM provider
  (OpenAI, Anthropic, Google, etc.) and the customer's AgentOS deployment
  settings.
headers:
  retryAfter: true
  retryAfterHeader: Retry-After
  rateLimitRemaining: false
  rateLimitReset: false
responseCodes:
  throttled: 429
  unauthorized: 401
  forbidden: 403
limits:
  - scope: agent-runs
    description: >-
      Concurrent agent run limit depends on AgentOS deployment resources and
      configuration. Agno supports background execution and streaming via
      Server-Sent Events to handle long-running tasks without blocking.
    metric: concurrent_runs
    limit: varies
    timeFrame: per-deployment
    notes: Configurable per AgentOS deployment; no centrally published limit
  - scope: llm-provider
    description: >-
      Rate limits from upstream LLM providers (OpenAI, Anthropic, Google
      Gemini, etc.) are surfaced as 429 errors. Agno's fallback model feature
      automatically switches to a configured backup model on rate limit
      errors, outages, or context window overflows.
    metric: requests
    limit: varies
    timeFrame: varies
    notes: Depends on the LLM provider plan; Agno handles retries via fallback models
  - scope: knowledge-uploads
    description: >-
      File and URL uploads to agent knowledge bases. Supports vector, keyword,
      and hybrid search. Limits depend on storage configuration in the
      customer's cloud deployment.
    metric: uploads
    limit: varies
    timeFrame: per-deployment
    notes: Configurable per AgentOS deployment
fallbackBehavior:
  description: >-
    Agno supports fallback models that activate automatically when the primary
    model fails due to rate limits, outages, or context window overflows. This
    is configured at the agent or team level in the framework.
  documentation: https://www.agno.com/changelog