---
name: observability
description: |
  Complete observability infrastructure. Error tracking, alerting, logging, health checks.
  Optimized for indie dev: minimal services, CLI-manageable, AI agent integration.
argument-hint: "[focus area, e.g. 'alerts' or 'health checks']"
---

# /observability

Production observability with two services: Sentry and PostHog. Audit, fix, verify, every time.

## Philosophy

**Two services, not twenty.** Sentry handles errors. PostHog handles product analytics. That's it. Vercel captures stdout automatically (no setup needed).

**CLI-first.** Everything manageable from command line. No dashboard clicking.

**AI-agent ready.** Errors should trigger automated analysis and fixes.

## What This Does

Examines project observability, identifies gaps, implements fixes, and verifies alerting works. Every run does the full cycle.

## Branching

Assumes you start on `master`/`main`. Before making code changes:

```bash
git checkout -b infra/observability-$(date +%Y%m%d)
```

## Architecture

```
App → Sentry (errors, performance)
    → PostHog (product analytics, feature flags)
    → stdout (Vercel captures automatically)
    → /api/health (uptime monitoring)

AI Integration:
    Sentry MCP → Claude (query errors, analyze, fix)
    PostHog MCP → Claude (query funnels, cohorts, events)
    Sentry webhook → GitHub Action → agent (auto-triage)
    CLI scripts → manual triage and resolution
```

**Services:** 2 (Sentry + PostHog)
**Built-in free:** Vercel logs
**CLI-manageable:** 100% (both have MCP servers)

## Process

### 1. Audit

**Check what exists:**
```bash
# Sentry configured?
~/.claude/skills/sentry-observability/scripts/detect_sentry.sh

# Health endpoint?
[ -f "app/api/health/route.ts" ] || [ -f "src/app/api/health/route.ts" ] && echo "✓ Health endpoint" || echo "✗ Health endpoint"

# Structured logging? Raw console calls listed here are candidates to migrate to JSON logs
grep -r "console.log\|console.error" --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5

# PostHog analytics?
grep -q "posthog" package.json && echo "✓ PostHog" || echo "✗ PostHog not installed (P1)"
```

**Spawn agent for deep review:**
Spawn `observability-advocate` agent to audit logging coverage, error handling, and silent failure risks.

### 2. Plan

Every project needs:

**Essential (every production app):**
- Sentry error tracking with source maps
- Health check endpoint (`/api/health`)
- Structured logging (JSON to stdout)
- At least one alert rule (new errors)

**Required (user-facing apps):**
- PostHog product analytics (funnels, cohorts, session replay)
- PostHog feature flags (replaces LaunchDarkly)

**Recommended:**
- Webhook for AI agent integration
- Triage scripts for CLI management

**Only if needed:**
- Custom uptime monitoring

### 3. Execute

**Install Sentry:**
```bash
pnpm add @sentry/nextjs
npx @sentry/wizard@latest -i nextjs
```

Or use the init script:
```bash
~/.claude/skills/sentry-observability/scripts/init_sentry.sh
```

**Configure PII redaction:**
```typescript
// sentry.client.config.ts
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  beforeSend(event) {
    // Scrub PII
    if (event.extra) delete event.extra.password;
    if (event.user) delete event.user.email;
    return event;
  },
});
```
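
The hook above removes two known fields. As a rough sketch of a more general approach, the helper below redacts any value whose key name looks sensitive. `scrub`, `SENSITIVE_KEYS`, and `MAX_DEPTH` are hypothetical names introduced for illustration (not part of the Sentry SDK), and the key list is an assumption to adapt to your data.

```typescript
// Hypothetical helper, not a Sentry API: recursively redact sensitive-looking keys.
// The key pattern is an assumption; extend it for your own data model.
const SENSITIVE_KEYS = /password|secret|token|authorization|cookie|email/i;
const MAX_DEPTH = 5;

export function scrub(value: unknown, depth = 0): unknown {
  // Primitives and null pass through unchanged.
  if (value === null || typeof value !== 'object') return value;
  // Past the depth limit, drop the subtree rather than risk leaking it.
  if (depth >= MAX_DEPTH) return '[truncated]';
  if (Array.isArray(value)) return value.map((item) => scrub(item, depth + 1));

  const out: Record<string, unknown> = {};
  for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
    out[key] = SENSITIVE_KEYS.test(key) ? '[redacted]' : scrub(inner, depth + 1);
  }
  return out;
}
```

Inside `beforeSend`, this could be applied as `event.extra = scrub(event.extra) as typeof event.extra;` before returning the event.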

**Create health endpoint:**
```typescript
// app/api/health/route.ts
export async function GET() {
  const checks = {
    app: 'ok',
    timestamp: new Date().toISOString(),
  };

  // Add service checks as needed
  // checks.database = await checkDb();
  // checks.stripe = await checkStripe();

  return Response.json(checks);
}
```
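
Once service checks are added, the endpoint is more useful to uptime monitors if it returns a non-200 status when a dependency is down. The sketch below is one way to do that, assuming each check is an async function that throws on failure; `runChecks`, the `Check` type, and the 2-second default timeout are hypothetical choices, not part of Next.js.

```typescript
// Hypothetical shape: a check resolves on success and throws (or rejects) on failure.
type Check = () => Promise<void>;

export async function runChecks(
  checks: Record<string, Check>,
  timeoutMs = 2000,
): Promise<{ status: number; body: Record<string, string> }> {
  const body: Record<string, string> = {
    app: 'ok',
    timestamp: new Date().toISOString(),
  };
  const failed: string[] = [];

  // Run all checks concurrently; each one races against its own timeout.
  await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
      });
      try {
        await Promise.race([check(), timeout]);
        body[name] = 'ok';
      } catch {
        body[name] = 'fail';
        failed.push(name);
      } finally {
        clearTimeout(timer);
      }
    }),
  );

  // 503 tells uptime monitors that the app is up but a dependency is not.
  return { status: failed.length === 0 ? 200 : 503, body };
}
```

In `route.ts`, the handler could then become `const { status, body } = await runChecks({ database: checkDb }); return Response.json(body, { status });`, where `checkDb` is whatever check function you write.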

**Set up structured logging:**
Use JSON logs that Vercel can parse:
```typescript
// lib/logger.ts
export function log(level: 'info' | 'warn' | 'error', message: string, data?: Record<string, unknown>) {
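  const entry = {
    ...data, // spread first so caller data cannot overwrite the core fields below
    level,
    message,
    timestamp: new Date().toISOString(),
  };
  // Errors go to stderr via console.error; info and warn go to stdout via console.log
  console[level === 'error' ? 'error' : 'log'](JSON.stringify(entry));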
}
```
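
One gotcha with JSON logs: `JSON.stringify(new Error('x'))` yields `{}`, because an error's `message` and `stack` are non-enumerable. A small helper can flatten errors before they are passed as `data`. `serializeError` and its field names are hypothetical, chosen so they do not collide with the logger's own `message` field.

```typescript
// Hypothetical helper: convert an unknown thrown value into JSON-safe fields.
export function serializeError(err: unknown): Record<string, unknown> {
  if (err instanceof Error) {
    return { errorName: err.name, errorMessage: err.message, stack: err.stack };
  }
  // Non-Error throws (strings, numbers, plain objects) are stringified.
  return { errorMessage: String(err) };
}
```

A call site might look like `log('error', 'Checkout failed', { orderId, ...serializeError(err) })`.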

**Create alert rule:**
```bash
~/.claude/skills/sentry-observability/scripts/create_alert.sh --name "New Errors" --type issue
```

**Set up webhook for AI integration (optional):**
In the Sentry dashboard, under Settings → Developer Settings → Custom Integrations (where internal integrations are created):
1. Create integration with webhook URL
2. Subscribe to issue events
3. Point to GitHub Action or custom endpoint

### 4. Verify

**Verify Sentry setup:**
```bash
~/.claude/skills/sentry-observability/scripts/verify_setup.sh
```

**Test error tracking:**
```typescript
// Trigger test error
throw new Error('Test error for Sentry verification');
```
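
One place to put that throw is a guarded route, so the test can be repeated against a deployed environment without exposing a crash button. This is a sketch only: the route path and the `SENTRY_TEST_KEY` variable are hypothetical, and it assumes the server-side instrumentation generated by the Sentry wizard is in place so that unhandled route errors are reported.

```typescript
// app/api/sentry-test/route.ts (hypothetical path)
export async function GET(request: Request): Promise<Response> {
  const key = new URL(request.url).searchParams.get('key');
  const expected = process.env.SENTRY_TEST_KEY; // hypothetical variable name

  // Behave like a missing route unless a key is configured and matches.
  if (!expected || key !== expected) {
    return new Response('not found', { status: 404 });
  }

  // Deliberately unhandled so the error reaches Sentry.
  throw new Error('Test error for Sentry verification');
}
```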

Then check the Sentry dashboard, or run:
```bash
~/.claude/skills/sentry-observability/scripts/list_issues.sh --limit 1
```

**Test health endpoint:**
```bash
curl -s http://localhost:3000/api/health | jq
```

**Test alerting:**
- Trigger an error
- Verify alert fires (check email/Slack/webhook)

If any verification fails, go back and fix it.

## AI Agent Integration

### Option A: Sentry MCP Server

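For direct Claude integration, register the Sentry MCP server in `claude_desktop_config.json`. Verify the package name against Sentry's MCP documentation before running it, since `npx -y` installs without prompting.
```json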
{
  "mcpServers": {
    "sentry": {
      "command": "npx",
      "args": ["-y", "@anthropic/sentry-mcp"],
      "env": {
        "SENTRY_AUTH_TOKEN": "your-token",
        "SENTRY_ORG": "your-org"
      }
    }
  }
}
```

Claude can then:
- Query recent errors
- Get full error context
- Analyze root causes
- Propose fixes

### Option B: Webhook → GitHub Action → Agent

For automated triage:
```yaml
# .github/workflows/sentry-triage.yml
name: Sentry Auto-Triage

on:
  repository_dispatch:
    types: [sentry-issue]

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
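      - name: Analyze with Claude
        env:
          # Pass the payload through env instead of interpolating it into the script
          ISSUE_ID: ${{ github.event.client_payload.issue_id }}
        run: |
          # Query issue details and spawn analysis agent
          # Assumes the claude CLI is installed and authenticated on the runner
          claude --print "Analyze Sentry issue $ISSUE_ID"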
```
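
A `repository_dispatch` event has to be created through GitHub's REST API with an authenticated request, so something must sit between the Sentry webhook and the workflow. The sketch below shows one possible relay as a Next.js route handler. The route path and the `SENTRY_CLIENT_SECRET`, `GITHUB_REPO`, and `GITHUB_TOKEN` variable names are hypothetical, and it assumes Sentry signs the raw request body with HMAC-SHA256 and sends the issue under `data.issue.id`; confirm both against the payload your integration actually delivers.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Check the Sentry-Hook-Signature header: hex HMAC-SHA256 of the body,
// keyed with the integration's client secret.
export function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  if (!secret) return false; // never accept requests when no secret is configured
  const expected = Buffer.from(createHmac('sha256', secret).update(rawBody, 'utf8').digest('hex'));
  const received = Buffer.from(signature);
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Build the request that fires the `sentry-issue` repository_dispatch event.
export function buildDispatch(issueId: string, repo: string, token: string) {
  return {
    url: `https://api.github.com/repos/${repo}/dispatches`,
    init: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        event_type: 'sentry-issue',
        client_payload: { issue_id: issueId },
      }),
    },
  };
}

// app/api/sentry-webhook/route.ts (hypothetical path)
export async function POST(request: Request): Promise<Response> {
  const rawBody = await request.text();
  const signature = request.headers.get('sentry-hook-signature') ?? '';
  if (!verifySignature(rawBody, signature, process.env.SENTRY_CLIENT_SECRET ?? '')) {
    return new Response('invalid signature', { status: 401 });
  }

  const payload = JSON.parse(rawBody);
  const issueId = String(payload?.data?.issue?.id ?? '');
  if (!issueId) return new Response('ignored', { status: 202 });

  const { url, init } = buildDispatch(
    issueId,
    process.env.GITHUB_REPO ?? '', // "owner/name"
    process.env.GITHUB_TOKEN ?? '',
  );
  const res = await fetch(url, init);
  return new Response(null, { status: res.ok ? 204 : 502 });
}
```

Keeping signature verification and request construction as pure functions makes them easy to unit test without calling Sentry or GitHub.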

### Option C: CLI Scripts (Already Exist)

```bash
# List and triage issues
~/.claude/skills/sentry-observability/scripts/list_issues.sh --env production
~/.claude/skills/sentry-observability/scripts/triage_score.sh --json

# Get issue details for analysis
~/.claude/skills/sentry-observability/scripts/issue_detail.sh PROJ-123

# Resolve after fixing
~/.claude/skills/sentry-observability/scripts/resolve_issue.sh PROJ-123
```

## Tool Choices

**Sentry over alternatives.** Best error tracking, mature CLI, AI-first roadmap (Seer webhooks, auto-fix features), excellent free tier.

**Vercel logs over log services.** stdout is captured automatically. No additional service needed. Query with `vercel logs`.

**PostHog for ALL analytics.** Official MCP server, Terraform provider, all-in-one platform. 1M events/month free.

**NOT Vercel Analytics.** It has no API, no CLI, no MCP server. Completely unusable for AI-assisted workflows. Do not install it.

## Environment Variables

```bash
# .env.example

# Sentry (required for error tracking)
NEXT_PUBLIC_SENTRY_DSN=
SENTRY_AUTH_TOKEN=
SENTRY_ORG=
SENTRY_PROJECT=
```
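
A missing variable tends to fail silently (no DSN means no events), so it can help to check for them in CI or a pre-deploy script. A minimal sketch, assuming the four variables listed above are all required in that context; `missingEnv` and `REQUIRED` are hypothetical names.

```typescript
// Variable names are taken from the .env.example above.
const REQUIRED = [
  'NEXT_PUBLIC_SENTRY_DSN',
  'SENTRY_AUTH_TOKEN',
  'SENTRY_ORG',
  'SENTRY_PROJECT',
] as const;

// Return the names of required variables that are unset or blank.
export function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED.filter((name) => !env[name]?.trim());
}
```

A script could call `missingEnv(process.env)` and exit non-zero if the returned list is not empty.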

## What You Get

When complete:
- Sentry capturing all errors with source maps
- Health check endpoint at `/api/health`
- Structured JSON logging (captured by Vercel)
- At least one alert rule configured
- PostHog for product analytics (user-facing apps)
- AI agent integration ready (MCP or webhooks)

User can:
- See errors in Sentry immediately when they occur
- Get alerted on new/critical errors
- Query errors via CLI (`list_issues.sh`, `triage_score.sh`)
- Trigger AI analysis of errors
- Monitor app health via `/api/health`
- View logs via `vercel logs`

## Related Skills

- `sentry-observability` — Detailed Sentry setup and scripts
- `observability-stack` — PostHog/analytics integration patterns
- `observability-advocate` — Agent for auditing observability coverage