---
name: pg-performance-baseline
description: Generates a comprehensive PostgreSQL performance baseline report capturing health metrics, query workload patterns, table statistics, configuration settings, index utilization, and I/O patterns. Designed for before/after comparison when making changes.
---

# PostgreSQL Performance Baseline

This skill guides creation of a comprehensive performance baseline report for a PostgreSQL database, intended for comparison before and after making configuration changes, schema modifications, or infrastructure upgrades.

## When to Use This Skill

Use this skill when the user:

- Is about to make database configuration changes and wants a "before" snapshot
- Wants to document current database performance for reference
- Needs to compare performance across time periods
- Asks "how is my database performing?" or "give me a performance report"
- Is planning a migration and needs to capture current metrics
- Wants to establish SLAs or performance targets based on current state

## MCP Resources Available

- `pgtuner://docs/tools` -- Tool reference for parameter details
- `pgtuner://docs/prompts` -- Available prompt templates
- `pgtuner://settings/{category}` -- Settings by category (use for structured capture: `memory`, `checkpoint`, `wal`, `autovacuum`, `connections`)
- `pgtuner://health/{check_type}` -- Targeted health metrics by type
- `pgtuner://table/{schema}/{table_name}/stats` -- Quick per-table stats

## Related MCP Prompt

This skill corresponds to the **`performance_baseline`** MCP Prompt. If the user triggers that prompt, follow this skill's workflow.

## Prerequisites

- The pgtuner-mcp server must be connected with a valid `DATABASE_URI`
- Required: `pg_stat_statements` extension (for query workload data)
- Recommended: `pgstattuple` extension (for bloat measurements)
- Recommended: Sufficient uptime since last `pg_stat_statements` reset for meaningful workload data

## Agent Decision Logic

```
Before starting:
  |
  +--> Is this a "before" baseline (pre-change)?
  |    --> Collect all 8 steps, save results systematically
  |    --> Tell user: "Baseline collected. After making changes, ask me to
  |        collect another baseline and I will generate a comparison."
  |
  +--> Is this an "after" baseline (post-change)?
  |    --> Collect all 8 steps with identical parameters
  |    --> Generate the Comparison Template with before/after diff
  |
  +--> Is this a standalone performance report?
       --> Collect all 8 steps
       --> Include recommendations inline (not just metrics)
```

## Baseline Collection Workflow

Collect all data systematically. Each step gathers a different performance dimension.

### Step 1: Health Metrics

```
Tool: check_database_health
Parameters:
  include_recommendations: true
  verbose: true
```

**Record:**
- Overall health score
- Per-dimension scores (connections, cache, locks, replication, wraparound, disk, bgwriter, checkpoints)
- PostgreSQL version
- Uptime since last restart

### Step 2: Query Workload Profile

```
Tool: get_slow_queries
Parameters:
  limit: 20
  min_calls: 1
  order_by: "mean_time"
```

Then also sort by total time to find the highest aggregate cost:
```
Tool: get_slow_queries
Parameters:
  limit: 20
  min_calls: 1
  order_by: "calls"
```

Also capture over-fetching patterns:
```
Tool: get_slow_queries
Parameters:
  limit: 10
  min_calls: 5
  order_by: "rows"
```

**Record:**
- Top 20 queries by mean execution time
- Top 20 queries by call frequency
- Top 10 queries by rows returned (over-fetching candidates)
- Total query count and aggregate execution time
- Cache hit ratios per query
- Queries with `temp_blks_written > 0` (work_mem issues)

### Step 3: Table Statistics

```
Tool: get_table_stats
Parameters:
  schema_name: "public"
  include_indexes: true
  order_by: "size"
```

Also check for vacuum maintenance gaps:
```
Tool: get_table_stats
Parameters:
  schema_name: "public"
  include_indexes: false
  order_by: "last_vacuum"
```

And tables with most sequential scans (index candidates):
```
Tool: get_table_stats
Parameters:
  schema_name: "public"
  include_indexes: false
  order_by: "seq_scans"
```

**Record:**
- Table sizes (data + TOAST + indexes)
- Row counts and dead tuple ratios
- Sequential scan vs index scan ratios
- Last vacuum and analyze timestamps
- Index count and sizes per table

### Step 4: Configuration Snapshot

Capture settings by category for structured comparison:

```
Tool: review_settings
Parameters:
  category: "memory"
  include_all_settings: false
```

```
Tool: review_settings
Parameters:
  category: "checkpoint"
  include_all_settings: false
```

```
Tool: review_settings
Parameters:
  category: "wal"
  include_all_settings: false
```

```
Tool: review_settings
Parameters:
  category: "autovacuum"
  include_all_settings: false
```

```
Tool: review_settings
Parameters:
  category: "connections"
  include_all_settings: false
```

**Record key settings in categories:**

**Memory:**
- shared_buffers
- effective_cache_size
- work_mem
- maintenance_work_mem
- huge_pages

**WAL / Checkpoints:**
- wal_level
- max_wal_size
- min_wal_size
- checkpoint_completion_target
- wal_compression

**Autovacuum:**
- autovacuum_max_workers
- autovacuum_naptime
- autovacuum_vacuum_scale_factor
- autovacuum_vacuum_cost_delay

**Connections:**
- max_connections
- superuser_reserved_connections

**Planner:**
- random_page_cost
- effective_io_concurrency
- default_statistics_target

### Step 5: Index Utilization

```
Tool: find_unused_indexes
Parameters:
  schema_name: "public"
  min_size_mb: 0.1
  include_duplicates: true
```

**Record:**
- Total number of indexes
- Unused indexes (count and total size)
- Duplicate/overlapping indexes
- Total index storage footprint

### Step 6: Wait Event Distribution

```
Tool: analyze_wait_events
Parameters:
  active_only: true
```

**Record:**
- Wait event type distribution
- Top wait events by count
- Any Lock or IO waits indicating contention

### Step 7: Disk I/O Patterns

```
Tool: analyze_disk_io_patterns
Parameters:
  analysis_type: "all"
  top_n: 20
```

**Record:**
- Buffer pool hit ratio
- Top tables by I/O volume
- Temp file usage statistics
- Checkpoint I/O statistics
- Backend vs checkpointer write ratio

### Step 8: Bloat Overview (Optional but recommended)

```
Tool: get_bloat_summary
Parameters:
  schema_name: "public"
  top_n: 10
```

**Record:**
- Top bloated tables and estimated waste
- Top bloated indexes
- Overall estimated reclaimable space

## Output Format

Structure the baseline as a timestamped report:

### Performance Baseline Report

**Database:** `<database_name>`
**Collected:** `<timestamp>`
**PostgreSQL Version:** `<version>`
**Uptime:** `<uptime>`

---

#### 1. Health Summary

| Dimension | Score | Notes |
|-----------|-------|-------|
| Overall | XX | |
| Connections | XX | X/Y used |
| Cache | XX | Hit ratio: XX.X% |
| Locks | XX | |
| Wraparound | XX | Oldest XID: XXM |
| Checkpoints | XX | Requested: X%, Timed: X% |

#### 2. Workload Profile

| Metric | Value |
|--------|-------|
| Total unique queries tracked | X |
| Top query mean time | X ms |
| Top query total time | X sec |
| Average cache hit ratio | XX.X% |
| Queries > 100ms mean time | X |
| Queries > 1s mean time | X |

**Top 5 by mean execution time:**

| Query (truncated) | Mean Time | Calls | Total Time |
|-------------------|-----------|-------|------------|
| SELECT ... | X ms | X | X s |

#### 3. Storage Profile

| Metric | Value |
|--------|-------|
| Total tables | X |
| Total data size | X GB |
| Total index size | X GB |
| Largest table | X (X GB) |
| Tables with > 10% dead tuples | X |
| Tables never vacuumed | X |

#### 4. Index Profile

| Metric | Value |
|--------|-------|
| Total indexes | X |
| Unused indexes | X (X MB wasted) |
| Duplicate indexes | X |
| Index-to-table size ratio | X:1 |

#### 5. Configuration Highlights

| Setting | Current | Recommended | Gap |
|---------|---------|-------------|-----|
| shared_buffers | X | X | OK / Needs change |
| work_mem | X | X | OK / Needs change |
| ... | ... | ... | ... |

#### 6. I/O Profile

| Metric | Value |
|--------|-------|
| Buffer cache hit ratio | XX.X% |
| Temp files created (since reset) | X |
| Checkpoint frequency (req/timed) | X% / X% |

#### 7. Current Issues

Numbered list of any active problems found during the baseline collection.

---

### Comparison Template

When collecting a second baseline after changes, present a diff:

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Health score | 82 | 91 | +9 |
| Cache hit ratio | 97.2% | 99.4% | +2.2% |
| Top query mean time | 450ms | 120ms | -73% |
| Unused index waste | 2.4 GB | 0.3 GB | -88% |

## Best Practices

- **Collect baselines at consistent times**: Same day of week, similar workload period
- **Do not reset pg_stat_statements** between before/after baselines
- **Record the exact timestamp** of collection for accurate comparison
- **Run the same version of tools** for both baselines
- **Capture system metrics too** (CPU, memory, disk I/O from OS) if possible, as they complement database-level metrics
- **Save the raw tool output** in addition to the summary for deeper post-analysis

## Important Notes

- All data collection is read-only and safe for production
- The quality of workload data depends on how long `pg_stat_statements` has been collecting since the last reset
- Very recently restarted databases will have incomplete statistics
- Table statistics require regular `ANALYZE` to be accurate: check `last_analyze` timestamps
- Bloat measurements with `pgstattuple` are I/O intensive on large tables (use `use_approx: true` for tables > 50 GB)

## Iterative Verification

Since baselines are snapshots, "verification" means collecting a second baseline for comparison:

1. **After configuration changes**: Collect a new baseline using identical parameters. Compare using the Comparison Template.
2. **After index changes**: Re-collect Step 2 (workload) and Step 5 (index utilization) to verify query improvements and index cleanup.
3. **After vacuum/bloat remediation**: Re-collect Step 8 (bloat overview) and Step 3 (table stats) to confirm space reclamation.
4. **After all changes**: Wait for representative workload (at least 24 hours of normal traffic) before collecting the "after" baseline.

## When to Stop and Ask the User

- **If pg_stat_statements was recently reset**: "Query workload data is limited because pg_stat_statements was recently reset. The baseline will be incomplete for the workload dimension. Would you like to proceed or wait for more data?"
- **If this is a post-change baseline**: "Is this baseline being collected after making changes? If so, I will generate a before/after comparison. Do you have the previous baseline results?"
- **If the database was recently restarted**: "The database was restarted recently. Cumulative statistics (buffer hits, checkpoint counts) may not be representative yet."

## Cross-References to Other Skills

After collecting a baseline, the agent should recommend specific skills based on findings:

- **Health score issues**: Use `pg-health-check` for deeper investigation
- **Slow queries identified**: Use `pg-slow-query-diagnosis` skill
- **Unused indexes found**: Use `pg-index-optimization` skill
- **Bloat detected**: Use `pg-bloat-analysis` skill
- **Configuration gaps**: Use `pg-config-tuning` skill
- **I/O concerns**: Use `pg-io-deep-dive` skill

## PostgreSQL Version Notes

| Feature | Version | Impact on Baseline |
|---------|---------|-------------------|
| `pg_stat_statements` `total_exec_time` | PG13+ | Column renamed from `total_time` |
| `pg_stat_io` | PG16+ | Additional I/O metrics available |
| `compute_query_id` | PG14+ | Required for queryid |
| `pgstattuple_approx()` | PG9.5+ | Faster bloat estimation |

## Production Safety

- All baseline collection is strictly read-only
- No data, configuration, or schema is modified
- Bloat measurements (`analyze_table_bloat` with `use_approx: false`) are I/O intensive and should use `use_approx: true` for very large tables during peak hours