--- name: podium-rate-limit-survival description: Survive the rate-limit failure modes that crater production Podium integrations — cascading 429s that burn the daily quota by lunch, ignored `Retry-After` hints, silent daily-quota breaches, per-endpoint budget exhaustion, end-of-day review-request bursts, and webhook-driven outbound amplification. Use when building the outbound API layer, instrumenting quota monitoring, smoothing end-of-day review-request bursts, or recovering from a 429 cascade. Trigger with "podium rate limit", "podium 429", "podium token bucket", "podium quota monitor", "podium burst smoothing", "podium retry-after". allowed-tools: Read, Write, Edit, Bash(curl:*), Bash(jq:*), Bash(python3:*), Bash(redis-cli:*), Grep version: 2.0.0 license: MIT author: Jeremy Longshore compatibility: Designed for Claude Code tags: - podium - rate-limits - token-bucket - quota-monitoring - resilience - burst-control --- # Podium Rate Limit Survival ## Overview Make the outbound side of a Podium integration survive a real production day. This is not a "just retry on 429" walkthrough — it is the rate-limiting code your integration runs when Shopify ships 80 orders at 5pm AEST and KombiLife fires 80 review-request POSTs in 30 seconds, when an inbound webhook burst fans out 5x outbound, and when a junior engineer's naive retry loop has already eaten 92% of the daily quota by 10:30am. The six production failures this skill prevents: 1. **Cascading 429s burn the whole day** — a naive `while status == 429: retry` loop stampedes the per-minute window for the rest of the minute, then the next minute, etc. By 11am you've consumed the 24-hour quota and every endpoint is hard-down until UTC midnight. 2. **`Retry-After` header ignored** — clients that retry on a fixed delay (or worse, no delay) miss Podium's server-side hint and hit the same rate wall again. The header supports both integer seconds and HTTP-date form; many clients parse one and crash on the other. 3. **No daily-quota monitor** — the 24-hour envelope quota is silent until you breach it. Operations discover the wall on a Friday afternoon when review-request automation collapses and the on-call has no leading indicator. 4. **No per-endpoint isolation** — the `conversations.write` endpoint blows its budget on a chatty inbound webhook; `contacts.read` also fails because the client treats the API as a single bucket. One endpoint family taking down siblings is a multiplier on every other failure mode. 5. **End-of-day burst overflow** — Shopify orders ship in a 5pm cluster, KombiLife fires ~80 review-request POSTs in 30 seconds, the per-minute ceiling rejects half. The integration "works" 23 hours a day and silently drops 30-50% of review requests during the only hour that matters commercially. 6. **Webhook-driven amplification** — one inbound webhook triggers 5 outbound API calls; 100 inbound webhooks in a burst = 500 outbound = quota collapse. The amplification factor is invisible until the cascade fires. ## Authentication This skill **does not** mint, refresh, or hold Podium credentials — those concerns live in the sibling `podium-auth` skill. Every wrapped HTTP call in this skill calls `auth.get_token()` immediately after the bucket releases, where `auth` is a `PodiumAuth` instance constructed by the consumer per the [podium-auth SKILL.md](../podium-auth/SKILL.md) instructions (OAuth2 refresh-token grant against `https://accounts.podium.com/oauth/token`). The bearer token is passed in the `Authorization: Bearer {token}` header on every `api.podium.com` request. If `auth.get_token()` raises, this skill propagates the auth error to the caller without retry — auth recovery is `podium-auth`'s responsibility, not this skill's. ## Prerequisites - A working `podium-auth` integration (this skill assumes a `PodiumAuth` instance is available — see the [podium-auth skill](../podium-auth/SKILL.md) in this pack) - Python 3.10+ with `asyncio` (the patterns translate to Node.js; see references/implementation.md) - A token-bucket library — `aiolimiter` recommended, or hand-rolled on `asyncio.sleep` - A daily-quota counter store — Redis preferred (atomic INCR + TTL), local SQLite acceptable for single-process integrations - Knowledge of which Podium endpoint families your integration hits (conversations, contacts, reviews, locations, webhooks) — bucket isolation is per-family ## Instructions Build in this order. Each section neutralizes one of the six production failures. ### 1. Token-bucket rate limiter (neutralizes cascading 429s) The Podium API's documented ceiling is 60 requests per minute per OAuth app. Treat it as a hard ceiling and stay under it by construction — never by reacting to 429s. Hand the hot path a token-bucket gate that paces requests at the documented rate; concurrent callers serialize on the bucket, no retry storm is possible. ```python import asyncio import time from contextlib import asynccontextmanager from typing import Optional class TokenBucket: """Async token-bucket limiter. Pace = rate tokens per second, max burst = capacity.""" def __init__(self, rate_per_minute: int, capacity: int): self.rate_per_sec = rate_per_minute / 60.0 self.capacity = capacity self._tokens = float(capacity) self._last_refill = time.monotonic() self._lock = asyncio.Lock() async def acquire(self, tokens: float = 1.0) -> None: while True: async with self._lock: self._refill() if self._tokens >= tokens: self._tokens -= tokens return deficit = tokens - self._tokens wait_s = deficit / self.rate_per_sec # Sleep OUTSIDE the lock so other callers can refill-and-check in parallel await asyncio.sleep(wait_s) def _refill(self) -> None: now = time.monotonic() elapsed = now - self._last_refill self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_sec) self._last_refill = now ``` Wire it into the outbound HTTP path: ```python PODIUM_LIMIT_PER_MIN = 60 # documented ceiling PODIUM_BURST_CAPACITY = 10 # conservative burst headroom; tune per endpoint bucket = TokenBucket(rate_per_minute=PODIUM_LIMIT_PER_MIN, capacity=PODIUM_BURST_CAPACITY) async def podium_call(method: str, path: str, **kwargs) -> httpx.Response: await bucket.acquire() token = await auth.get_token() async with httpx.AsyncClient(timeout=10) as c: return await c.request( method, f"https://api.podium.com{path}", headers={"Authorization": f"Bearer {token}"}, **kwargs, ) ``` The bucket converts what would be a 429 cascade into bounded queueing. Latency goes up on the burst; **success rate stays at 100%**. ### 2. `Retry-After` parsing for the residual 429s (neutralizes ignored hints) Even with a bucket, the residual 429s happen — clock drift between your process and Podium's edge, multiple processes sharing a quota, an inbound webhook fan-out that the bucket sees but the server already counted. When 429 happens, Podium returns a `Retry-After` header. Honor it. Support both forms: - `Retry-After: 30` — integer seconds to wait - `Retry-After: Wed, 21 Oct 2026 07:28:00 GMT` — HTTP-date (RFC 7231) ```python from email.utils import parsedate_to_datetime from datetime import datetime, timezone def parse_retry_after(header_value: str) -> float: """Return seconds to wait. Supports int-seconds and HTTP-date forms.""" header_value = header_value.strip() # Try integer seconds first — most common form Podium returns try: seconds = int(header_value) return max(0.0, float(seconds)) except ValueError: pass # HTTP-date form — RFC 7231 try: retry_at = parsedate_to_datetime(header_value) if retry_at.tzinfo is None: retry_at = retry_at.replace(tzinfo=timezone.utc) delta = (retry_at - datetime.now(timezone.utc)).total_seconds() return max(0.0, delta) except (TypeError, ValueError): # Malformed header — fall back to a safe default rather than crash return 60.0 ``` Wire it into the retry wrapper: ```python async def podium_call_with_retry(method: str, path: str, max_attempts: int = 4, **kwargs): for attempt in range(1, max_attempts + 1): await bucket.acquire() r = await _raw_call(method, path, **kwargs) if r.status_code != 429: return r wait_s = parse_retry_after(r.headers.get("Retry-After", "60")) # Cap the wait so a misconfigured server can't pin us indefinitely wait_s = min(wait_s, 120.0) await asyncio.sleep(wait_s) raise PodiumRateLimitError(f"429 persisted after {max_attempts} attempts on {path}") ``` Two things make this correct: parse both header forms, and cap the maximum wait. A server returning `Retry-After: 86400` would otherwise stall the integration for a day. ### 3. Daily quota monitor (neutralizes silent quota breaches) The per-minute ceiling is one envelope; Podium also enforces a 24-hour envelope per OAuth app. The 24-hour envelope is silent until you breach it. Track outbound call count in a counter with a UTC-midnight TTL; emit warn / page / hard-throttle alerts at 70 / 85 / 95% consumption. ```python import redis.asyncio as aioredis DAILY_QUOTA = 50_000 # set to your actual quota; conservative default WARN_THRESHOLD = 0.70 PAGE_THRESHOLD = 0.85 THROTTLE_THRESHOLD = 0.95 class DailyQuotaMonitor: def __init__(self, redis_url: str, quota: int = DAILY_QUOTA): self._redis = aioredis.from_url(redis_url, decode_responses=True) self.quota = quota def _key(self) -> str: return f"podium:quota:{datetime.utcnow().strftime('%Y-%m-%d')}" async def increment(self, n: int = 1) -> int: key = self._key() # INCR-then-EXPIRE is atomic enough — first-write-wins on the TTL is fine new_count = await self._redis.incr(key, n) if new_count == n: # First increment of the day — set TTL to UTC midnight + 1h grace await self._redis.expire(key, 90_000) return new_count async def check_and_alert(self) -> str: count = int(await self._redis.get(self._key()) or 0) ratio = count / self.quota if ratio >= THROTTLE_THRESHOLD: page_oncall(f"Podium daily quota at {ratio:.1%} ({count}/{self.quota}) — hard-throttle engaged") return "throttle" if ratio >= PAGE_THRESHOLD: page_oncall(f"Podium daily quota at {ratio:.1%} ({count}/{self.quota})", severity="high") return "page" if ratio >= WARN_THRESHOLD: log_warn(f"Podium daily quota at {ratio:.1%} ({count}/{self.quota})") return "warn" return "ok" ``` When the throttle threshold fires, drop the token-bucket rate by 50% for the rest of the day. Customers see slower processing of low-priority traffic; the integration does not collapse. ### 4. Per-endpoint bucket isolation (neutralizes cross-endpoint contagion) If `conversations.write` is busy on a chatty inbound webhook, `contacts.read` should not also start failing. Isolate buckets per endpoint family — one bucket each for conversations, contacts, reviews, locations, webhooks. Each gets a share of the per-minute ceiling proportional to its expected load: ```python ENDPOINT_BUCKETS = { "conversations": TokenBucket(rate_per_minute=20, capacity=5), "contacts": TokenBucket(rate_per_minute=15, capacity=5), "reviews": TokenBucket(rate_per_minute=15, capacity=10), # bursty "locations": TokenBucket(rate_per_minute=5, capacity=2), "webhooks": TokenBucket(rate_per_minute=5, capacity=2), } # Sum of per-minute rates = 60, matching the documented ceiling. def endpoint_family(path: str) -> str: # /v4/conversations/abc → "conversations" parts = path.strip("/").split("/") if len(parts) >= 2 and parts[0] == "v4": return parts[1] return "default" async def podium_call_isolated(method: str, path: str, **kwargs) -> httpx.Response: family = endpoint_family(path) bucket = ENDPOINT_BUCKETS.get(family) or ENDPOINT_BUCKETS["conversations"] await bucket.acquire() return await _raw_call(method, path, **kwargs) ``` The sum of per-family rates must equal the documented ceiling — over-allocating per-family rates means the global ceiling fires across all families simultaneously, which is the cross-contagion this section is meant to prevent. ### 5. End-of-day burst smoother (neutralizes the 5pm review-request cluster) KombiLife's pattern is documented: Shopify orders ship in a tight 5pm AEST cluster, the integration fires ~80 review-request POSTs in 30 seconds, the per-minute ceiling rejects half. The fix is to detect the burst, smooth it over the next 90 seconds, and absorb residual via the bucket. ```python class BurstSmoother: """Smooth a batch of N requests over a target window respecting the bucket rate.""" def __init__(self, bucket: TokenBucket, target_window_seconds: float = 90.0): self.bucket = bucket self.target_window = target_window_seconds async def submit_batch(self, requests: list[dict], handler) -> list: if not requests: return [] # Compute per-request delay so the batch completes within target_window # OR at bucket rate, whichever is slower (bucket rate wins on small windows). ideal_delay = self.target_window / len(requests) rate_delay = 1.0 / self.bucket.rate_per_sec delay = max(ideal_delay, rate_delay) results = [] for i, req in enumerate(requests): if i > 0: await asyncio.sleep(delay) await self.bucket.acquire() results.append(await handler(req)) return results ``` Usage: ```python smoother = BurstSmoother(bucket=ENDPOINT_BUCKETS["reviews"], target_window_seconds=120) # 80 review requests fire over 120s instead of 30s — bucket eats the residual smoothly results = await smoother.submit_batch(review_request_payloads, send_review_request) ``` For KombiLife specifically: 80 requests over 120s = 0.67 req/sec = 40 req/min, well under the 15 req/min the `reviews` bucket grants. The burst completes in 2 minutes with zero 429s and zero dropped review requests. ### 6. Webhook amplification accounting (neutralizes inbound→outbound multipliers) When an inbound Podium webhook (or Shopify webhook, or any other source) triggers N outbound Podium calls, the effective rate the bucket sees is N× the inbound rate. Estimate the amplification factor per inbound event type and admit-control at the front door rather than queue at the bucket: ```python AMPLIFICATION_FACTOR = { "shopify.order.created": 5, # contact upsert + 1 review request + 3 attribute writes "podium.conversation.new": 2, # ack + tag write "podium.review.received": 3, # contact update + sentiment write + slack mirror } class AdmissionController: """Reject inbound work when its projected outbound cost exceeds remaining budget.""" def __init__(self, bucket: TokenBucket, daily_monitor: DailyQuotaMonitor): self.bucket = bucket self.daily = daily_monitor async def admit(self, event_type: str) -> bool: cost = AMPLIFICATION_FACTOR.get(event_type, 1) # Reject if a single event would burn >5% of remaining daily quota remaining = self.daily.quota - int(await self.daily._redis.get(self.daily._key()) or 0) if cost > remaining * 0.05: log_warn(f"admission denied {event_type}: cost={cost} remaining={remaining}") return False return True ``` Reject-with-replay is acceptable for webhooks Podium delivers — Podium retries inbound webhooks on non-2xx. Reject-with-replay is **not** acceptable for Shopify webhooks unless your handler is replayable; queue them to a durable store instead and drain when the daily quota recovers. ## Error Handling | HTTP Status | Podium Error | Root Cause | Action | |---|---|---|---| | `429 Too Many Requests` | `rate_limited` | Per-minute or per-day envelope exceeded | Parse `Retry-After`; honor + cap at 120s; back off attempts | | `503 Service Unavailable` | `service_overloaded` | Podium-side overload (not client-attributable) | Exponential backoff + jitter; max 4 attempts | | `400 Bad Request` | `quota_exhausted` | 24h envelope hit (returned by some endpoints instead of 429) | Hard-stop the offending endpoint family until UTC midnight | | `502/504` | `gateway_timeout` | Upstream timeout, often during burst | Retry once with full bucket wait; do not retry-storm | | in-process | `BurstSmoother queue full` | Submitted batch larger than smoother capacity | Spill to a durable queue; drain on the next minute | | in-process | `AdmissionController denied` | Projected cost > 5% of remaining daily quota | Defer to a low-priority worker; alert on sustained denials | ## Examples ### Minimal — wrap an existing call site with the bucket ```python from podium_rate_limit import TokenBucket bucket = TokenBucket(rate_per_minute=60, capacity=10) async def safe_podium_call(method: str, path: str, **kwargs): await bucket.acquire() return await unsafe_podium_call(method, path, **kwargs) ``` One line of change at every call site. The bucket is global; safe under asyncio concurrency. ### Operator — simulate a request trace and see projected 429 count ```bash python3 scripts/bucket_simulator.py \ --trace ./traces/2026-05-09-prod-replay.csv \ --rate-per-minute 60 \ --capacity 10 ``` Output: ```json { "trace_requests": 4127, "trace_window_seconds": 3600, "projected_429_count": 0, "projected_p99_queue_wait_ms": 1840, "would_exhaust_daily_quota_at_request": null } ``` ### Operator — check today's quota consumption ```bash python3 scripts/quota_monitor.py --redis-url redis://localhost:6379 --quota 50000 # Exit 0 = healthy; 1 = warn; 2 = page; 3 = throttle ``` ### Operator — smooth a CSV of pending review requests ```bash python3 scripts/burst_smoother.py \ --input pending-reviews-2026-05-09.csv \ --rate-per-minute 15 \ --target-window-seconds 120 \ --output smoothed-schedule.csv ``` ### Operator — parse a `Retry-After` header from a real 429 response ```bash # Integer-seconds form python3 scripts/retry_after_parse.py --header "30" # {"wait_seconds": 30.0, "absolute_wakeup_utc": "2026-05-09T17:00:30+00:00"} # HTTP-date form python3 scripts/retry_after_parse.py --header "Wed, 09 May 2026 17:05:00 GMT" # {"wait_seconds": 287.4, "absolute_wakeup_utc": "2026-05-09T17:05:00+00:00"} ``` ## Output - Token-bucket rate limiter wired into every outbound Podium call site - `Retry-After` parser supporting integer-seconds AND HTTP-date forms, with a 120s cap - Daily quota monitor with three severity tiers (warn 70%, page 85%, throttle 95%) - Per-endpoint bucket isolation — one bucket per endpoint family, summed rates = 60/min - End-of-day burst smoother for the 5pm review-request cluster - Webhook amplification accounting with admission control on inbound events ## Resources - [Podium API docs — Rate limits](https://docs.podium.com/reference/rate-limiting) - [Token bucket algorithm — Wikipedia](https://en.wikipedia.org/wiki/Token_bucket) - [RFC 7231 § 7.1.3 — `Retry-After` header](https://datatracker.ietf.org/doc/html/rfc7231#section-7.1.3) - [config/settings.yaml](config/settings.yaml) — bucket sizes, thresholds, endpoint allocation - [references/errors.md](references/errors.md) — ERR_RL_* codes with cause + solution - [references/examples.md](references/examples.md) — 10 worked examples (single-call, isolation, burst, replay) - [references/implementation.md](references/implementation.md) — Node.js equivalents, Redis vs SQLite stores, the on-call playbook for a 429 cascade in progress - [scripts/bucket_simulator.py](scripts/bucket_simulator.py) — CLI: simulate token-bucket behavior on a request trace - [scripts/quota_monitor.py](scripts/quota_monitor.py) — CLI: query daily quota counter and emit warn/page/throttle - [scripts/burst_smoother.py](scripts/burst_smoother.py) — CLI: smooth a batched CSV against the per-minute ceiling - [scripts/retry_after_parse.py](scripts/retry_after_parse.py) — CLI: parse `Retry-After` to absolute wakeup time