--- name: pg-bloat-analysis description: Detects and remediates table and index bloat in PostgreSQL using pgstattuple analysis. Identifies fragmentation, dead tuples, and wasted space with prioritized maintenance actions including VACUUM, REINDEX, and pg_repack recommendations. --- # PostgreSQL Bloat Analysis This skill guides detection, analysis, and remediation of table and index bloat in PostgreSQL databases using pgtuner-mcp tools. ## When to Use This Skill Use this skill when the user: - Reports tables or indexes growing unexpectedly large - Asks about database bloat, dead tuples, or fragmentation - Mentions disk space running low - Sees slow sequential scans on tables that should be small - Asks "why is my table so large?" or "how do I reclaim disk space?" - Notices `VACUUM` not keeping up or autovacuum running constantly - Reports degraded performance after bulk DELETE or UPDATE operations ## MCP Resources Available - `pgtuner://health/bloat` -- Quick bloat health assessment (lightweight) - `pgtuner://table/{schema}/{table_name}/stats` -- Table statistics including dead tuple counts - `pgtuner://settings/autovacuum` -- Autovacuum configuration (relevant for root cause analysis) ## Related MCP Prompt This skill is referenced by the **`health_check`** MCP Prompt (Step 6: Quick Bloat Scan). For dedicated bloat investigation, follow this skill's full workflow. ## Prerequisites - The pgtuner-mcp server must be connected with a valid `DATABASE_URI` - Required: `pgstattuple` extension (for accurate bloat measurement) - Without `pgstattuple`: The tools fall back to statistical estimation (less accurate) ## Pre-Flight Checks - If `analyze_table_bloat` or `get_bloat_summary` returns an error about `pgstattuple`, inform the user: "The pgstattuple extension is not installed. Bloat measurements will be estimated from `pg_stat_user_tables` statistics, which is less accurate. For precise measurements, install pgstattuple: `CREATE EXTENSION pgstattuple;`" ## Agent Decision Tree ``` User reports bloat concerns | +--> General "how bloated is my database?" | --> Start with Step 1 (get_bloat_summary) | +--> Specific table is too large | --> Skip to Step 2 (analyze_table_bloat) for that table | +--> Disk space running low | --> Start with Step 1, then prioritize high-bloat tables | +--> After Step 1, triage results: | +--> Dead tuple ratio > 20% on any table? | --> Deep-dive Step 2, then investigate root cause (Step 5) | +--> XID age > 800M on any table? | --> URGENT: redirect to pg-vacuum-tuning emergency procedures | +--> Index bloat > 40%? | --> Step 3 (index bloat analysis), recommend REINDEX | +--> Autovacuum not keeping up? --> Step 4 (vacuum status), then pg-vacuum-tuning skill ``` ## Bloat Analysis Workflow ### Step 1: Database-Wide Bloat Survey Start with a high-level overview: ``` Tool: get_bloat_summary Parameters: schema_name: "public" top_n: 20 min_size_gb: 0.01 # Lower threshold to catch smaller tables too ``` **What this provides:** - Combined table and index bloat data - Prioritized maintenance action list (high/medium/low priority) - Total wasted space estimate **Triage the results:** - **High priority**: Tables with dead tuple ratio > 20% or approaching wraparound - **Medium priority**: Tables with dead tuple ratio 10-20% or significant index bloat - **Low priority**: Minor bloat that autovacuum should handle ### Step 2: Deep Table Bloat Analysis For tables flagged in the survey, check dead tuple density: ``` Tool: get_table_stats Parameters: schema_name: "public" include_indexes: false order_by: "dead_tuples" ``` Then do a deep analysis on the worst offenders: ``` Tool: analyze_table_bloat Parameters: table_name: "" schema_name: "public" use_approx: false # Use exact measurement for important tables include_toast: true # Check TOAST tables too (large text/json columns) ``` For very large tables (> 100 GB), use approximate mode to avoid long scans: ``` Tool: analyze_table_bloat Parameters: table_name: "" use_approx: true min_table_size_gb: 0.1 ``` **Interpret the results:** | Metric | Healthy | Warning | Critical | |--------|---------|---------|----------| | dead_tuple_percent | < 5% | 5-20% | > 20% | | free_space_percent | < 10% | 10-30% | > 30% | | tuple_percent (live data density) | > 80% | 50-80% | < 50% | **Severity scoring (from the tool):** - Score >= 6: Critical bloat - Score >= 4: High bloat - Score >= 2: Moderate bloat - Score >= 1: Low bloat ### Step 3: Index Bloat Analysis For indexes on bloated tables: ``` Tool: analyze_index_bloat Parameters: table_name: "" schema_name: "public" min_index_size_gb: 0.01 min_bloat_percent: 15 ``` **Interpret the results:** | Index Type | Metric | Healthy | Bloated | |-----------|--------|---------|---------| | B-tree | avg_leaf_density | > 80% | < 70% | | B-tree | estimated_bloat | < 20% | > 30% | | GIN | pending_pages | 0 | > 100 | **Common causes of index bloat:** - HOT updates not possible (indexed columns being updated) - Long-running transactions preventing dead tuple cleanup - Monotonically increasing keys with deletions (B-tree right-growth pattern) ### Step 4: Vacuum Status Check Understand why bloat accumulated: ``` Tool: monitor_vacuum_progress Parameters: action: "needs_vacuum" schema_name: "public" min_dead_tuples: 100 ``` Check autovacuum configuration: ``` Tool: monitor_vacuum_progress Parameters: action: "autovacuum_status" ``` Check recent vacuum history: ``` Tool: monitor_vacuum_progress Parameters: action: "recent_activity" ``` **What to look for:** - Tables that need vacuum but haven't been vacuumed - Autovacuum workers at capacity (all slots busy) - Tables with aggressive thresholds that never get vacuumed - Transaction wraparound risk (XID age approaching `autovacuum_freeze_max_age`) ### Step 5: Root Cause Analysis Based on the data collected, identify why bloat is accumulating: | Symptom | Root Cause | Evidence | |---------|-----------|----------| | High dead tuples, recent vacuum | Update/delete rate exceeds vacuum throughput | `n_dead_tup` still high after vacuum | | High dead tuples, no recent vacuum | Autovacuum not running on this table | `last_autovacuum` is NULL or very old | | High dead tuples, vacuum running | Long-running transaction holding back cleanup | `backend_xmin` in `pg_stat_activity` | | Index bloat but table is OK | HOT updates not possible | Updated columns are indexed | | Free space high, dead tuples low | Space not returned to OS after mass delete | Table was VACUUMed but not reclaimed | ## Remediation Plan ### For Table Bloat **Mild bloat (dead tuples 5-20%):** ```sql -- Standard VACUUM (reclaims dead tuples, does NOT return space to OS) VACUUM VERBOSE schema.table_name; ``` **Severe bloat (dead tuples > 20% or free space > 30%):** ```sql -- VACUUM FULL rewrites the table (LOCKS TABLE - schedule during maintenance window) VACUUM FULL VERBOSE schema.table_name; -- Then update statistics ANALYZE schema.table_name; ``` **Alternative for production (no downtime):** ```sql -- pg_repack (requires pg_repack extension) -- Rewrites table online without exclusive lock pg_repack --table schema.table_name --no-superuser-check -d dbname ``` ### For Index Bloat **Moderate index bloat (20-40%):** ```sql -- REINDEX CONCURRENTLY (PostgreSQL 12+, no blocking) REINDEX INDEX CONCURRENTLY schema.index_name; ``` **Severe index bloat (> 40%):** ```sql -- For all indexes on a table REINDEX TABLE CONCURRENTLY schema.table_name; ``` ### Autovacuum Tuning If autovacuum is not keeping up: ```sql -- Per-table autovacuum tuning for high-churn tables ALTER TABLE schema.table_name SET ( autovacuum_vacuum_scale_factor = 0.01, -- Vacuum at 1% dead tuples (default 20%) autovacuum_vacuum_threshold = 1000, -- Minimum dead tuples before vacuum autovacuum_analyze_scale_factor = 0.005, -- Analyze at 0.5% changes autovacuum_vacuum_cost_delay = '2ms' -- Faster vacuum (default is 20ms on PG<12, 2ms on PG12+) ); ``` Global autovacuum tuning: ```sql -- Make autovacuum more aggressive globally ALTER SYSTEM SET autovacuum_max_workers = 6; -- Default 3 ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms'; -- Default 20ms ALTER SYSTEM SET autovacuum_naptime = '15s'; -- Default 1min -- Requires reload: SELECT pg_reload_conf(); ``` ## Output Format ### Bloat Summary | Object | Type | Size | Bloat % | Dead Tuples | Priority | Action | |--------|------|------|---------|-------------|----------|--------| | orders | table | 12 GB | 35% | 4.2 GB | High | VACUUM FULL or pg_repack | | idx_orders_date | index | 3.2 GB | 42% | 1.3 GB | High | REINDEX CONCURRENTLY | ### Root Cause Brief explanation of why bloat accumulated. ### Maintenance Script Ordered SQL commands to resolve the issues, with timing and locking notes. ### Prevention Recommendations Autovacuum tuning and application-level changes to prevent recurrence. ## Important Notes - `analyze_table_bloat` with `use_approx: false` reads the entire table (I/O intensive on large tables) - `VACUUM FULL` requires an exclusive lock on the table (blocks all reads and writes) - `REINDEX CONCURRENTLY` requires PostgreSQL 12+ - Always run `ANALYZE` after `VACUUM FULL` or `REINDEX` to update statistics - TOAST table bloat is often overlooked: use `include_toast: true` for tables with large text/json columns - `pgstattuple` extension must be installed for accurate measurements. Without it, results are estimated. ## Iterative Verification After applying remediation: 1. **After VACUUM**: Re-run `analyze_table_bloat` to confirm dead tuples were cleaned up. Note: `VACUUM` does not reduce table size on disk (only marks space as reusable). 2. **After VACUUM FULL**: Re-run `analyze_table_bloat` to confirm both dead tuples and free space were reclaimed. 3. **After REINDEX**: Re-run `analyze_index_bloat` to confirm avg_leaf_density improved. 4. **After autovacuum tuning**: Monitor with `monitor_vacuum_progress` `action: "progress"` over the next hours/days to confirm vacuum is keeping up. ## When to Stop and Ask the User - **Before recommending VACUUM FULL**: "VACUUM FULL requires an exclusive lock on the table, blocking all reads and writes for the duration. For a X GB table, this could take Y minutes. When is your maintenance window? Alternatively, pg_repack can do this online." - **If pgstattuple is not installed**: "Bloat measurements are estimated. For accurate results, install pgstattuple. Would you like to proceed with estimates?" - **If bloat is moderate (< 20%)**: "The bloat level is within normal range and autovacuum should handle it. No immediate action needed. Would you like me to tune autovacuum for better ongoing maintenance?" - **If root cause is long-running transactions**: "Bloat is accumulating because of idle-in-transaction sessions blocking vacuum. This requires application-level fixes. Would you like me to investigate the blocking sessions?" ## Cross-References to Other Skills - **Autovacuum tuning for bloat prevention**: Use `pg-vacuum-tuning` skill - **Index-specific bloat and optimization**: Use `pg-index-optimization` skill - **I/O impact of bloated tables**: Use `pg-io-deep-dive` skill - **Lock issues blocking vacuum**: Use `pg-lock-diagnosis` skill ## PostgreSQL Version Notes | Feature | Version | |---------|---------| | `pgstattuple_approx()` | PG9.5+ | | `REINDEX CONCURRENTLY` | PG12+ | | `pg_stat_progress_vacuum` | PG9.6+ | | Improved autovacuum cost delay default (2ms) | PG12+ | ## Production Safety - `analyze_table_bloat` and `analyze_index_bloat` are read-only but I/O intensive (especially with `use_approx: false` on large tables) - `get_bloat_summary` is read-only and safe - All VACUUM, REINDEX, and ALTER TABLE commands are recommendations, not executed by the tools - For large tables (> 50 GB), always use `use_approx: true` to avoid long scan times