--- name: cloudflare-traffic-investigator description: Investigate traffic anomalies, spikes, and service degradation on Cloudflare-protected domains. Uses Cloudflare MCP tools for GraphQL analytics, JA4 fingerprint analysis, bot/WAF security scoring, and incident reporting. Use this skill whenever traffic spikes, service overloads, 429 errors, circuit breaker events, Cloudflare analytics, or domain performance issues are mentioned — even if the user doesn't explicitly say "traffic spike". Also triggers when asked to check Cloudflare data for any domain. allowed-tools: mcp__cloudflare-api__search, mcp__cloudflare-api__execute, Read, Bash, Write, Edit argument-hint: "[domain] [zone-id] [timerange]" model: sonnet context: fork --- # Investigating Traffic on Cloudflare-Protected Domains ## Arguments | Argument | Description | |----------|-------------| | `$ARGUMENTS[0]` | Cloudflare-protected domain to investigate (e.g., `example.com`) | | `$ARGUMENTS[1]` | Cloudflare zone ID for the domain (e.g., `abc123def456`) | | `$ARGUMENTS[2]` | *(optional)* Time range to investigate (e.g., `"2025-06-01 04:00-05:00 NZST"`, `"today 9:00-10:00 AEDT"`). In current agent's local timezone (detect via system clock), not UTC. | If domain or zone ID is not provided, ask the user via `AskUserQuestion`. Time range is collected in Step 1 if not passed here. --- Investigate unusual traffic patterns on Cloudflare-protected domains that cause downstream service failures (e.g., service overload, database saturation, API rate limiting). This skill walks through a structured investigation from confirming the spike through to a full incident report. ## Investigation Workflow Follow these steps in order. Each step file contains detailed instructions and example Cloudflare GraphQL queries. 1. **[Get parameters](steps/step-01-get-parameters.md)** — Collect time range and zone info 2. **[Confirm spike](steps/step-02-confirm-spike.md)** — Query hourly traffic to verify the anomaly 3. **[Minute-level detail](steps/step-03-minute-detail.md)** — Narrow to exact spike timing 4. **[Identify culprit JA4](steps/step-04-identify-ja4.md)** — Find JA4 fingerprints with highest request counts 5. **[Analyze traffic](steps/step-05-analyze-traffic.md)** — For top JA4s, identify paths, user IDs, ASNs 6. **[Verify legitimacy](steps/step-06-verify-legitimacy.md)** — Check bot scores, WAF scores, User-Agent 7. **[Extract top users](steps/step-07-extract-users.md)** — Find which users made the most requests 8. **[Synthesize & report](steps/step-08-synthesize.md)** — Combine findings into an incident report ## Cloudflare API MCP All Cloudflare interactions use two tools: - `mcp__cloudflare-api__search` — Discover API endpoints by searching the OpenAPI spec - `mcp__cloudflare-api__execute` — Execute API calls via `cloudflare.request()` (GraphQL analytics via POST to `/graphql`, Radar via REST, zone operations via `/zones`) See **[Cloudflare API MCP Reference](references/cloudflare-api-mcp.md)** for query patterns and examples. ## JA4 TLS Fingerprints - Format: `t13d311200_e8f1e7e78f70_d339722ba4af` - A single fingerprint across millions of requests indicates backend service configuration, not individual users - Useful for identifying automated/service-to-service traffic - Cross-reference with [Known Fingerprints](references/known-fingerprints.json) before flagging as unknown ## Cloudflare Sampled Data Firewall events use **adaptive sampling**. Numbers are sampled counts, not actual totals. Use them for pattern identification and relative comparisons — top users in sample likely represent top users overall. Always note this in reports. ## Common Failure Patterns Quickly identify root causes using these patterns: | Pattern | Signal | Resolution | |---------|--------|------------| | Circuit Breaker Cascade | 429 → timeout → breaker opens | Scale service or add rate limiting | | Retry Storm | Error count exceeds initial traffic | Add exponential backoff, client-side circuit breaker | | Single User Amplification | One user dominates request count | Contact user, fix frontend logic | | Undersized Service | Normal distribution, fails at <10 req/sec | Scale service capacity urgently | | Cascading Failure | Multiple services failing sequentially | Isolate fault, restart root service | | Cache Stampede | Spike after cache expiration | Cache lock, stale-while-revalidate | **Detailed descriptions and resolution steps:** [Failure Patterns Reference](references/failure-patterns.md) ## Escalation Criteria | Priority | Condition | |----------|-----------| | **P1 — Immediate** | Service 429 errors / circuit breaker open, >10% error rate, cascading failures | | **P2 — High** | Single user >500 req/hour on critical endpoint, sustained spike >50% above baseline, multiple dependencies affected | | **P3 — Monitor** | Moderate increase <50% above baseline, isolated user anomalies | ## Incident Report Document findings using the **[Incident Report Template](references/incident-report-template.md)** covering metrics, timeline, security analysis, root cause, and recommendations. ## Tips - Ask for time range first using `AskUserQuestion` if not provided - Identify JA4 dynamically — query Cloudflare, don't assume - Only ask the user about unknown/suspicious User-Agents — skip well-known bots and clearly internal services - Calculate actual req/sec to understand service load - Document findings immediately using the incident template ## Reference Files ### Steps 1. [Get parameters](steps/step-01-get-parameters.md) 2. [Confirm spike](steps/step-02-confirm-spike.md) 3. [Minute-level detail](steps/step-03-minute-detail.md) 4. [Identify culprit JA4](steps/step-04-identify-ja4.md) 5. [Analyze traffic](steps/step-05-analyze-traffic.md) 6. [Verify legitimacy](steps/step-06-verify-legitimacy.md) 7. [Extract top users](steps/step-07-extract-users.md) 8. [Synthesize & report](steps/step-08-synthesize.md) ### References - [Cloudflare API MCP](references/cloudflare-api-mcp.md) - [Known Fingerprints](references/known-fingerprints.json) - [Security Scores](references/security-scores.md) - [Failure Patterns](references/failure-patterns.md) - [Incident Report Template](references/incident-report-template.md)