# MCP Prompts and Workflow Guide

## Overview

IonHour's MCP server includes **7 built-in prompts**: workflow templates that guide AI assistants through multi-step monitoring operations. Each prompt provides structured instructions, so the assistant follows best practices without you having to spell out every step.

You can also configure **project rules** in your AI tool to customize how the assistant interacts with IonHour.

## Built-in Prompts

### `diagnose_incident`

**Purpose:** Step-by-step incident investigation workflow.

**Arguments:**
- `incidentId` (required): The incident ID to diagnose

**What it does:**
1. Fetches incident details with `get_incident`
2. Gets current check status and recent signals with `get_check_status`
3. Retrieves signal history with `list_signals`
4. Checks uptime trend with `get_check_uptime`
5. Looks for dependency involvement
6. Produces a summary: what happened, when, likely root cause, and recommended action

**Example usage:**
```
Use the diagnose_incident prompt for incident 42
```
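
A request like this typically resolves to an MCP `prompts/get` call. The sketch below builds that JSON-RPC message, assuming the standard MCP message shape in which prompt arguments are string-valued. The helper name `build_prompt_request` is hypothetical, and how a given AI tool transports the message is not shown.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # simple incrementing JSON-RPC request ids


def build_prompt_request(name: str, arguments: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 `prompts/get` request for an MCP prompt."""
    params: dict = {"name": name}
    if arguments:
        # Prompt arguments are sent as strings, so numeric ids are stringified.
        params["arguments"] = {key: str(value) for key, value in arguments.items()}
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "prompts/get", "params": params}


request = build_prompt_request("diagnose_incident", {"incidentId": 42})
print(json.dumps(request, indent=2))
```

The same helper covers prompts without arguments, such as `triage_all_incidents`, by omitting the `arguments` key.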

---

### `setup_monitoring`

**Purpose:** Guided workflow to set up monitoring for a new service from scratch.

**Arguments:**
- `serviceName` (required): Name of the service to monitor

**What it does:**
1. Lists existing projects to avoid duplicates
2. Creates a new project if needed
3. Asks what checks are needed (interval, endpoints)
4. Registers checks with `register_check`
5. Verifies alert channels exist, creates them if missing
6. Sets up escalation rules to connect the project to alert channels
7. Provides a summary with ping URLs and integration instructions

**Example usage:**
```
Use the setup_monitoring prompt for "payment-api"
```

---

### `deployment_checklist`

**Purpose:** Pre-deployment verification and deployment window management.

**Arguments:**
- `projectId` (required): The project ID being deployed

**What it does:**

*Pre-deploy:*
1. Lists all checks for the project
2. Checks for active incidents
3. Reports current status

*Deploy:*
4. Creates a deployment window (auto-pauses checks)
5. Confirms checks are paused

*Post-deploy:*
6. Waits for user confirmation
7. Ends the deployment window (resumes checks)
8. Verifies all checks are healthy

**Example usage:**
```
Use the deployment_checklist prompt for project 5
```
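
The three phases can be sketched as a single function. This illustrates the ordering only and is not the prompt's actual implementation: `call_tool` is a hypothetical callable standing in for an MCP client, `list_checks` and `list_incidents` are assumed tool names, and every argument and result shape is an assumption. Only `create_deployment` and `end_deployment` are named elsewhere in this guide.

```python
from typing import Any, Callable

ToolCaller = Callable[[str, dict], Any]  # (tool name, arguments) -> result


def deployment_checklist(call_tool: ToolCaller, project_id: int,
                         wait_for_user: Callable[[], None]) -> dict:
    """Run the pre-deploy, deploy, and post-deploy phases in order."""
    # Pre-deploy: gather checks and active incidents (assumed tool names).
    checks = call_tool("list_checks", {"projectId": project_id})
    incidents = call_tool("list_incidents", {"projectId": project_id, "status": "ACTIVE"})

    # Deploy: opening the window is what pauses the checks.
    window = call_tool("create_deployment", {"projectId": project_id})
    try:
        wait_for_user()  # the user deploys, then confirms
    finally:
        # Always close the window so checks are not left paused.
        call_tool("end_deployment", {"deploymentId": window["id"]})

    # Post-deploy: re-read the checks and flag anything that is not OK.
    after = call_tool("list_checks", {"projectId": project_id})
    unhealthy = [check["name"] for check in after if check["status"] != "OK"]
    return {"checks": len(checks), "active_incidents": len(incidents), "unhealthy": unhealthy}
```

The `try`/`finally` is a design choice of this sketch: it keeps checks from staying paused if the confirmation step raises.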

---

### `weekly_reliability_report`

**Purpose:** Generate a reliability summary across all checks.

**Arguments:**
- `daysBack` (optional, default: 7): Number of days to cover

**What it does:**
1. Gets workspace summary for current status
2. Lists all checks
3. Calculates uptime for each check (top 10)
4. Reviews incidents from the period
5. Compiles a report with:
   - Overall workspace uptime percentage
   - Best and worst performing checks
   - Incident summary (count, types, MTTR)
   - Recommendations for improving reliability

**Example usage:**
```
Use the weekly_reliability_report prompt with 30 days
```
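
How IonHour computes these figures is not documented here. As a rough worked example, the sketch below assumes uptime is the share of the period not covered by incident downtime, and MTTR is the mean of (resolved - opened) across incidents. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def uptime_percent(period: timedelta, downtime: timedelta) -> float:
    """Percentage of the period that was not spent in downtime."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    return 100.0 * (1 - downtime / period)


def mttr(incidents: list) -> timedelta:
    """Mean time to resolve over (opened, resolved) datetime pairs."""
    if not incidents:
        return timedelta(0)
    total = sum((resolved - opened for opened, resolved in incidents), timedelta(0))
    return total / len(incidents)


start = datetime(2024, 1, 1, 12, 0)
incidents = [
    (start, start + timedelta(minutes=30)),
    (start + timedelta(days=2), start + timedelta(days=2, minutes=90)),
]
downtime = sum((resolved - opened for opened, resolved in incidents), timedelta(0))

print(f"{uptime_percent(timedelta(days=7), downtime):.2f}%")  # 98.81%
print(mttr(incidents))  # 1:00:00
```

Two hours of downtime in a 168-hour week gives 100 × (1 − 2/168) ≈ 98.81%, and the two incidents (30 and 90 minutes) average to a 60-minute MTTR.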

---

### `triage_all_incidents`

**Purpose:** List and triage all active incidents.

**Arguments:** None

**What it does:**
1. Lists all active incidents
2. Fetches full details for each one
3. Assesses severity and impact
4. Presents a prioritized list
5. For each incident, asks if you want to:
   - Acknowledge it
   - Resolve it
   - Investigate further (delegates to `diagnose_incident`)
6. Provides a summary of all actions taken

**Example usage:**
```
Use the triage_all_incidents prompt
```

---

### `status_page_incident`

**Purpose:** Guided workflow to communicate an incident through status pages.

**Arguments:**
- `incidentId` (required): The incident ID to communicate about

**What it does:**
1. Fetches incident details to understand what happened
2. Lists status pages to find the right one
3. Drafts an announcement based on severity:
   - CRITICAL incidents → impact=CRITICAL, status=INVESTIGATING
   - WARNING incidents → impact=MINOR, status=INVESTIGATING
4. Creates the announcement on the status page
5. Asks if you want to post follow-up updates as the situation evolves
6. When resolved, creates a follow-up with status=RESOLVED

**Example usage:**
```
Use the status_page_incident prompt for incident 15
```
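
The severity mapping in step 3 can be written as a small lookup. This is a minimal sketch: the dictionary keys on the incident and announcement (`severity`, `title`, `message`) are assumed field names, not IonHour's documented schema.

```python
IMPACT_BY_SEVERITY = {"CRITICAL": "CRITICAL", "WARNING": "MINOR"}


def draft_announcement(incident: dict) -> dict:
    """Draft the first status page announcement for an incident."""
    severity = str(incident["severity"]).upper()
    if severity not in IMPACT_BY_SEVERITY:
        raise ValueError(f"no impact mapping for severity {severity!r}")
    return {
        "title": incident["title"],
        "impact": IMPACT_BY_SEVERITY[severity],
        "status": "INVESTIGATING",
    }


def resolution_update(announcement: dict, message: str) -> dict:
    """Draft the follow-up posted once the incident is resolved."""
    return {"title": announcement["title"], "status": "RESOLVED", "message": message}
```

Raising on an unmapped severity, rather than guessing an impact level, leaves the decision with the user.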

---

### `dependency_health_audit`

**Purpose:** Audit all external dependencies and assess their health impact.

**Arguments:** None

**What it does:**
1. Lists all registered dependencies
2. For each dependency with status DOWN or unknown:
   - Gets linked checks
   - Checks each linked check's current status
3. Reviews active incidents for correlation
4. Produces a health report:
   - Dependencies grouped by status (OK / DOWN / unknown)
   - Checks impacted by unhealthy dependencies
   - Recommendations for dependencies that need attention
5. Asks if you want to update any dependency statuses

**Example usage:**
```
Use the dependency_health_audit prompt
```
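
The grouping and impact steps of the report can be sketched as follows. The input shapes are assumptions made for illustration: each dependency is a dict with `name` and an optional `status`, and `linked_checks` maps a dependency name to the names of its linked checks.

```python
def audit_dependencies(dependencies: list, linked_checks: dict) -> dict:
    """Group dependencies by status and collect checks tied to unhealthy ones."""
    groups = {"OK": [], "DOWN": [], "unknown": []}
    for dep in dependencies:
        status = dep.get("status")
        # Anything that is not explicitly OK or DOWN is treated as unknown.
        groups[status if status in ("OK", "DOWN") else "unknown"].append(dep["name"])

    unhealthy = groups["DOWN"] + groups["unknown"]
    impacted = sorted({check for name in unhealthy for check in linked_checks.get(name, [])})
    return {"by_status": groups, "impacted_checks": impacted}


report = audit_dependencies(
    [{"name": "stripe", "status": "DOWN"}, {"name": "s3", "status": "OK"}, {"name": "smtp"}],
    {"stripe": ["payment-api-health"], "smtp": ["mailer-heartbeat", "payment-api-health"]},
)
print(report["impacted_checks"])  # ['mailer-heartbeat', 'payment-api-health']
```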

## Project Rules

Most AI tools support project-specific rules files that customize behavior. Add IonHour-specific guidance to ensure consistent, safe interactions.

### Rules File Locations

| AI Tool | File |
|---------|------|
| Claude Code | `CLAUDE.md` or `.claude/rules.md` |
| Cursor | `.cursorrules` |
| Windsurf | `.windsurfrules` |

### Example Rules

```markdown
# IonHour Monitoring Rules

## Before Making Changes
- Always call `get_workspace_summary` first to understand current state
- Use `find_check_by_name` to resolve check names to IDs before operations
- List existing resources before creating new ones to avoid duplicates

## Check Creation
- Use descriptive names: `{service}-{what}` (e.g., "payment-api-health", "order-processor-heartbeat")
- Default to 5-minute intervals unless the user specifies otherwise
- Always set up alert channels and escalation rules after creating checks
- Provide the ping URL and integration example after creation

## Incident Response
- Acknowledge incidents before investigating to signal awareness
- Always add a note explaining what you found during investigation
- When resolving, include a message explaining the resolution
- After resolving, verify checks return to OK status

## Deployments
- Check for active incidents before starting a deployment
- Always use deployment windows — never just pause checks manually
- Verify all checks are healthy after ending a deployment

## Destructive Operations
- Always confirm with the user before deleting any resource
- List what will be affected (e.g., "This check has 5 associated incidents that will be cleaned up")
- Never delete alert channels that have active escalation rules without warning

## Scheduling
- Avoid scheduling multiple checks at identical intervals to prevent load spikes
- For critical services, use short intervals (5 min or less)
- For background jobs, longer intervals are fine (30 min - 1 hour)
```

### Environment-Specific Rules

For teams managing multiple environments, add context about which workspace the API key connects to:

```markdown
## Environment Context
- This workspace is PRODUCTION — confirm all changes with the user
- Do not create test checks in this workspace
- Coordinate with the team before modifying shared alert channels
- All incidents should be communicated through the status page
```

For staging/development workspaces:

```markdown
## Environment Context
- This workspace is STAGING — safe to experiment
- Test alert channels before replicating to production
- Use the deployment_checklist prompt before deploying to production
```

## Best Practices

### 1. Start with Context

Before performing any operation, understand the current state:

```
What's the current status of my workspace?
→ AI calls get_workspace_summary, reports check counts by status and active incidents
```

### 2. Use Prompts for Complex Workflows

Instead of spelling out each step yourself, invoke a prompt for multi-step operations:

```
# Instead of:
"List my projects, then create a check, then set up alerts..."

# Use:
"Use the setup_monitoring prompt for my new user-service"
```

### 3. Name Resolution

Checks can be referenced by name instead of ID:

```
"What's the uptime of the payment-health check?"
→ AI uses find_check_by_name to resolve "payment-health" to a check ID
→ Then calls get_check_uptime with the resolved ID
```
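
The two-step lookup can be sketched as below. `call_tool` is a hypothetical callable standing in for an MCP client, and the argument names (`name`, `checkId`) and the list-shaped result of `find_check_by_name` are assumptions.

```python
from typing import Any, Callable


def uptime_by_name(call_tool: Callable[[str, dict], Any], name: str) -> Any:
    """Resolve a check name to an ID, then fetch its uptime."""
    matches = call_tool("find_check_by_name", {"name": name})
    if not matches:
        raise LookupError(f"no check named {name!r}")
    if len(matches) > 1:
        # Ask the user to disambiguate rather than picking one silently.
        names = ", ".join(match["name"] for match in matches)
        raise LookupError(f"{name!r} is ambiguous: {names}")
    return call_tool("get_check_uptime", {"checkId": matches[0]["id"]})
```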

### 4. Human-Readable Intervals

When creating checks, describe the interval in natural language:

```
"Create a check that runs every 15 minutes"
→ AI passes interval="every 15 minutes" to register_check
→ Server parses to 900 seconds
```
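
The server's actual parser is not shown in this guide. As an illustration of the conversion, here is a minimal sketch that handles phrases such as "every 15 minutes" or "every hour"; the function name and the accepted grammar are assumptions.

```python
import re

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
_PATTERN = re.compile(r"^(?:every\s+)?(\d+)?\s*(second|minute|hour|day)s?$", re.IGNORECASE)


def parse_interval(text: str) -> int:
    """Convert a phrase like 'every 15 minutes' to a number of seconds."""
    match = _PATTERN.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized interval: {text!r}")
    count = int(match.group(1) or 1)  # "every hour" means one hour
    if count == 0:
        raise ValueError("interval must be greater than zero")
    return count * _UNIT_SECONDS[match.group(2).lower()]


print(parse_interval("every 15 minutes"))  # 900
```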

### 5. Confirm Before Destructive Operations

AI assistants should always confirm before:
- Deleting checks, alert channels, escalation rules, or dependencies
- Resolving incidents (the user may not realize this closes the incident)
- Modifying production escalation rules

### 6. Post-Action Verification

After write operations, verify the result:
- After `register_check` → provide the ping URL and suggest a test heartbeat
- After `create_deployment` → confirm checks are paused
- After `end_deployment` → verify checks return to OK
- After `resolve_incident` → check that no new incidents appeared

## Prompt Priority

When multiple guidance sources exist, they apply in this order (highest priority first):

1. **Direct user instructions** in the current conversation
2. **Project rules files** (CLAUDE.md, .cursorrules, etc.)
3. **Built-in prompts** when explicitly invoked
4. **Resource guides** (ionhour://guides/workflows)
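
An assistant applies this ordering by judgment rather than by running code, but the rule itself can be modeled in a few lines. The sketch below is purely conceptual; the `Source` enum and `effective_guidance` function are hypothetical names.

```python
from enum import IntEnum
from typing import Mapping


class Source(IntEnum):
    """Guidance sources, lowest number wins."""
    USER_INSTRUCTION = 1
    PROJECT_RULES = 2
    BUILT_IN_PROMPT = 3
    RESOURCE_GUIDE = 4


def effective_guidance(guidance: Mapping[Source, str]) -> str:
    """Return the guidance from the highest-priority source present."""
    if not guidance:
        raise ValueError("no guidance available")
    return guidance[min(guidance)]


print(effective_guidance({
    Source.PROJECT_RULES: "default to 5-minute intervals",
    Source.USER_INSTRUCTION: "use a 1-minute interval for this check",
}))  # use a 1-minute interval for this check
```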

## Combining Prompts

Prompts can be chained. For example, during an incident response:

1. Start with `triage_all_incidents` to see all active issues
2. Use `diagnose_incident` for the most critical one
3. Use `status_page_incident` to communicate to users
4. After the fix, confirm recovery with the `get_workspace_summary` tool

Or during a release:

1. Start with `deployment_checklist` for the project
2. If issues arise after deploy, use `triage_all_incidents`
3. Generate a `weekly_reliability_report` to track the impact