--- name: pagerduty-oncall description: Investigate PagerDuty incidents for Envato on-call escalation policies. Use when asked about incidents, on-call status, incident analysis, or PagerDuty investigation. argument-hint: "what to investigate (e.g. 'incidents last 24h', 'on-call yesterday', 'incidents today')" allowed-tools: Bash(pd auth *), Bash(pd ep list *), Bash(pd incident list *), Bash(pd incident log *), Bash(pd incident notes *), Bash(pd incident analytics *), Bash(node *), Bash(mkdir *), Bash(test *), Bash(chmod *), Read, Write, Edit model: sonnet context: fork --- # PagerDuty On-Call Incident Investigator Authenticate, list escalation policies, fetch all incidents and their details, then analyse relevance across Envato on-call teams. ## Arguments - `$ARGUMENTS[0]` — What to investigate (e.g. `"incidents last 24h"`, `"on-call yesterday"`, `"incidents today"`, `"incidents 2026-03-01 to 2026-03-05"`). Defaults to `"incidents today"`. Interpret the time range to derive `--since` and `--until` dates (YYYY-MM-DD) in the agent's local timezone (detect via system clock), not UTC. See the date derivation table in Step 5. ## Target Escalation Policies The list of escalation policies to investigate is resolved in order: 1. `config.json` — `escalation_policies` array in [config.json](config.json) 2. `PD_ESCALATION_POLICIES` — comma-separated env var (e.g. `"On Call, Platform Engineering On-Call"`) 3. If both are empty, all escalation policies are included ## System Requirements - `pd` CLI installed (https://github.com/martindstone/pagerduty-cli) - Environment variable `PAGERDUTY_API_TOKEN` set with a valid PagerDuty REST API token. **Important:** When checking this variable, verify at least 2 times before concluding it is not set. Environment variables can appear unset due to shell context differences. **Never expose the value** — use existence checks only (e.g. `test -n "$PAGERDUTY_API_TOKEN"`). ## Scripts All parsing and filtering is handled by scripts in `${CLAUDE_SKILL_DIR}/scripts/`: | Script | Purpose | |--------|---------| | `extract-json.js` | Extracts JSON from pd CLI output that may contain non-JSON text | | `filter-eps.js` | Filters EPs by target names, extracts relevant fields | | `filter-incidents.js` | Filters incidents by service IDs from `ep-list.json`, extracts fields | | `parse-log.js` | Extracts relevant fields from incident log entries | | `parse-notes.js` | Extracts relevant fields from incident notes | | `parse-analytics.js` | Extracts relevant fields from incident analytics | All scripts read from stdin and write to stdout. Pipe pd CLI output through `extract-json.js` first, then through the appropriate filter/parse script. ## Output Directory All intermediate JSON and the final report are saved to: ``` .pagerduty-oncall-tmp/ ├── ep-list.json # Filtered escalation policies ├── incidents.json # Incidents filtered by target EPs ├── logs/.json # Log entries per incident ├── notes/.json # Notes per incident ├── analytics/.json # Analytics per incident └── report.md # Final analysis report ``` ## Execution ### 1. Verify Connection Check that `PAGERDUTY_API_TOKEN` is set (verify at least 2 times before concluding it is missing): ```bash test -n "$PAGERDUTY_API_TOKEN" && echo "TOKEN_SET" || echo "TOKEN_MISSING" ``` If token is set, authenticate: ```bash pd auth add --token "$PAGERDUTY_API_TOKEN" ``` If authentication fails, use `AskUserQuestion` to inform the user and link to the [PagerDuty CLI User Guide](https://github.com/martindstone/pagerduty-cli/wiki/PagerDuty-CLI-User-Guide) for setup instructions. Do NOT continue until authentication succeeds. ### 2. Prepare Output Directory Create the output directory `.pagerduty-oncall-tmp/` and subdirectories `logs/`, `notes/`, `analytics/` before saving any files. ### 3. Load Escalation Policy Configuration Read [config.json](config.json) using the Read tool. Extract the `escalation_policies` array. If the array is empty, check for the `PD_ESCALATION_POLICIES` env var: ```bash test -n "$PD_ESCALATION_POLICIES" && echo "$PD_ESCALATION_POLICIES" || echo "EMPTY" ``` Build the list of target EP names for the next step. If both sources are empty, no names will be passed (all EPs included). ### 4. List and Filter Escalation Policies Fetch, extract JSON, filter by target names, and save in one pipeline: ```bash pd ep list --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/filter-eps.js "EP Name 1" "EP Name 2" > .pagerduty-oncall-tmp/ep-list.json ``` Pass each target EP name as a separate argument to `filter-eps.js`. If no target names, omit the arguments (all EPs pass through). ### 5. Fetch and Filter Incidents Derive `SINCE_DATE` and `UNTIL_DATE` (YYYY-MM-DD) from `$ARGUMENTS[0]` in the agent's local timezone. **Important:** `--until` is **exclusive** in the `pd` CLI — it does NOT include that day. Use this table (assuming today is 2026-03-06): | Input | SINCE_DATE | UNTIL_DATE | Why | |-------|-----------|-----------|-----| | "last 24h" | 2026-03-05 | 2026-03-07 | 24h back from now, until must be tomorrow to include today | | "today" | 2026-03-06 | 2026-03-07 | until=tomorrow to include today | | "yesterday" | 2026-03-05 | 2026-03-06 | until=today to include yesterday | | "last 3 days" | 2026-03-03 | 2026-03-07 | 3 days back, until=tomorrow | | "2026-03-01 to 2026-03-05" | 2026-03-01 | 2026-03-06 | until=day after end date | Then fetch, extract JSON, filter by service IDs from `ep-list.json`, and save: ```bash pd incident list --json --statuses=open --statuses=closed --statuses=triggered --statuses=acknowledged --statuses=resolved --since=SINCE_DATE --until=UNTIL_DATE 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/filter-incidents.js .pagerduty-oncall-tmp/ep-list.json > .pagerduty-oncall-tmp/incidents.json ``` Read `.pagerduty-oncall-tmp/incidents.json` to check the result. If the array is empty, write a report noting zero incidents and stop. ### 6. Gather Incident Details For each incident in `.pagerduty-oncall-tmp/incidents.json`, fetch details **sequentially** (to avoid PagerDuty API rate limits). Use the `id` field (e.g. `Q1V3O5Q3JX39LJ`), NOT `incident_number` — the `pd` CLI only accepts internal IDs. **Log entries:** ```bash pd incident log -i --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-log.js > .pagerduty-oncall-tmp/logs/.json ``` **Notes:** ```bash pd incident notes -i --output=json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-notes.js > .pagerduty-oncall-tmp/notes/.json ``` **Analytics:** ```bash pd incident analytics -i --json 2>&1 | node ${CLAUDE_SKILL_DIR}/scripts/extract-json.js | node ${CLAUDE_SKILL_DIR}/scripts/parse-analytics.js > .pagerduty-oncall-tmp/analytics/.json ``` ### 7. Analyse and Report Read all saved JSON files from `.pagerduty-oncall-tmp/` using the Read tool. Produce a structured analysis and save it using Write to `.pagerduty-oncall-tmp/report.md`: 1. **Incident Summary Table** — For each incident: ID, title, service, escalation policy, status, urgency, created/resolved timestamps (current agent's local timezone via system clock, not UTC), duration 2. **Cross-Team Correlation** — Identify incidents that overlap in time across different escalation policies. Flag potential cascading failures or shared root causes 3. **Timeline** — Chronological view of all incidents across all teams in current agent's local timezone, highlighting clusters of activity 4. **Key Findings** — Patterns, recurring services, repeated triggers, or escalation policy gaps 5. **Recommendations** — Actionable suggestions based on the analysis After writing the report, inform the user of the report location: `.pagerduty-oncall-tmp/report.md` think yes