--- name: pg-lock-diagnosis description: Diagnoses PostgreSQL lock contention including blocking lock chains, deadlock patterns, idle-in-transaction problems, and wait event analysis. Guides the agent through identifying blockers, understanding lock types, and recommending transaction and configuration changes. --- # PostgreSQL Lock Contention Diagnosis This skill guides investigation and resolution of lock contention issues in PostgreSQL using pgtuner-mcp tools. ## When to Use This Skill Use this skill when the user: - Reports queries being blocked or timing out - Sees "Lock" wait events in `pg_stat_activity` - Experiences deadlocks (logged in PostgreSQL error log) - Reports `idle in transaction` sessions causing problems - Asks "why are my queries waiting?" or "what is blocking this query?" - Sees application-level timeout errors due to lock waits - Reports slow response times during peak concurrent usage ## MCP Resources Available - `pgtuner://health/locks` -- Quick lock health check (lightweight, targeted) - `pgtuner://health/connections` -- Connection state overview - `pgtuner://settings/connections` -- Connection and timeout configuration ## Related MCP Prompt This skill extends the lock analysis portions of the **`health_check`** MCP Prompt (which checks locks at a surface level). ## Prerequisites - The pgtuner-mcp server must be connected with a valid `DATABASE_URI` - Access to `pg_stat_activity`, `pg_locks` system views ## Agent Decision Tree ``` What is the user's symptom? | +--> "Queries are being blocked / timing out" | --> Start with Step 1 (Active Queries) to find blockers | +--> "Deadlock errors in logs" | --> Start with Step 2 (Wait Events) then Step 1 with include_idle | +--> "idle in transaction problems" | --> Start with Step 1 (include_idle: true, min_duration_seconds: 60) | +--> "General lock performance concerns" | --> Start with Step 3 (Health Check for locks) then Step 2 | +--> "Lock waits during maintenance (VACUUM, REINDEX, etc.)" --> Start with Step 1 then check Step 4 (Configuration) ``` ## Lock Investigation Workflow ### Step 1: Identify Active Queries and Blockers ``` Tool: get_active_queries Parameters: include_idle: true include_system: false min_duration_seconds: 0 ``` **What to look for:** | State | Duration | Severity | Action | |-------|----------|----------|--------| | `active` | < 5s | Normal | No issue | | `active` | > 30s | Warning | Investigate the query | | `active` | > 5 min | Critical | Consider canceling | | `idle in transaction` | > 60s | Warning | Application not closing transactions | | `idle in transaction` | > 5 min | Critical | Blocks vacuum and other transactions | | `idle in transaction (aborted)` | any | Warning | Failed transaction not rolled back | **Agent reasoning for blocking chains:** The tool output includes information about blocked queries and their blockers. Build a blocking chain: ``` Blocker (PID: 1234, idle in transaction, started 10 min ago) | +--> Blocked (PID: 2345, waiting for RowExclusiveLock on orders) | +--> Blocked (PID: 3456, waiting for AccessShareLock on orders) ``` The **root blocker** is the session that is NOT waiting on any lock itself. This is usually the target for resolution. **IF root blocker is `idle in transaction`:** The application opened a transaction but never committed/rolled back. **IF root blocker is `active` with a long-running query:** The query is holding locks while executing. May need optimization. ### Step 2: Analyze Wait Events ``` Tool: analyze_wait_events Parameters: active_only: false ``` Note: Using `active_only: false` to include idle-in-transaction sessions in the wait event analysis. **Lock-related wait events:** | Wait Event | Meaning | Common Cause | |------------|---------|-------------| | `Lock:relation` | Waiting for table-level lock | DDL operations, VACUUM FULL, LOCK TABLE | | `Lock:tuple` | Waiting for row-level lock | Concurrent UPDATE/DELETE on same row | | `Lock:transactionid` | Waiting for another transaction to complete | Transaction holding row lock hasn't committed | | `Lock:virtualxid` | Waiting for virtual transaction ID | Usually brief, indicates snapshot conflict | | `Lock:extend` | Waiting to extend a relation | Concurrent inserts on same table, consider fillfactor | | `Lock:advisory` | Waiting for advisory lock | Application-level locking | **IF many `Lock:tuple` waits:** Multiple transactions are competing for the same rows. Recommend: - Shorter transactions - Access rows in consistent order to avoid deadlocks - Consider `SELECT ... FOR UPDATE SKIP LOCKED` for queue-like patterns **IF many `Lock:relation` waits:** A DDL or maintenance operation is blocking normal queries: - `VACUUM FULL` takes `AccessExclusiveLock` - `ALTER TABLE` takes `AccessExclusiveLock` - `CREATE INDEX` (without `CONCURRENTLY`) takes `ShareLock` ### Step 3: Targeted Lock Health Check For a quick initial assessment, read the MCP resource first: Read resource: `pgtuner://health/locks` For a deeper check: ``` Tool: check_database_health Parameters: include_recommendations: true verbose: true ``` Focus on the "locks" dimension score: - Score >= 90: Lock contention is minimal - Score 70-89: Some contention, investigate - Score < 70: Significant contention, urgent ### Step 4: Review Lock-Related Configuration ``` Tool: review_settings Parameters: category: "connections" include_all_settings: false ``` **Key lock-related settings:** | Setting | Default | Recommended | Purpose | |---------|---------|-------------|---------| | `lock_timeout` | 0 (infinite) | 5s-30s | Maximum time to wait for a lock before failing | | `deadlock_timeout` | 1s | 1s (usually fine) | Time before checking for deadlocks | | `idle_in_transaction_session_timeout` | 0 (infinite) | 60s-300s | Auto-terminate idle-in-transaction | | `statement_timeout` | 0 (infinite) | 30s-120s | Maximum query execution time | | `max_locks_per_transaction` | 64 | 64 (increase if needed) | Lock table size (rarely needs changing) | ### Step 5: Correlate with Query Patterns If specific queries are repeatedly involved in lock contention: ``` Tool: get_slow_queries Parameters: limit: 20 min_calls: 5 order_by: "mean_time" ``` Look for UPDATE/DELETE queries with high mean time -- they may be holding locks longer than necessary. Then analyze the lock-holding query: ``` Tool: analyze_query Parameters: query: "" analyze: true buffers: true verbose: false ``` IF the query is slow due to missing indexes, adding an index will reduce lock hold time (because the query completes faster). ## Common Lock Contention Patterns and Solutions ### Pattern 1: Long-Running Transaction Blocks Everything **Symptoms:** Multiple sessions waiting on `Lock:transactionid`. One session in `idle in transaction` state for minutes. **Root cause:** Application opens transaction, does work outside the DB (API call, file processing), then comes back to commit. **Solution:** ```sql -- Set idle_in_transaction timeout ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s'; SELECT pg_reload_conf(); -- To immediately terminate the idle session (use with caution): SELECT pg_terminate_backend(); ``` ### Pattern 2: Deadlocks Between Concurrent Updates **Symptoms:** `deadlock detected` errors in PostgreSQL logs. **Root cause:** Two transactions update the same rows in different order: - Transaction A: UPDATE row 1, then UPDATE row 2 - Transaction B: UPDATE row 2, then UPDATE row 1 **Solution:** - Ensure transactions always access rows in a consistent order (e.g., by primary key) - Use `SELECT ... FOR UPDATE` to acquire locks upfront in a deterministic order - Keep transactions as short as possible ### Pattern 3: DDL Blocking DML **Symptoms:** `ALTER TABLE` or `CREATE INDEX` blocks all queries on the table. **Solution:** ```sql -- Use CONCURRENTLY for index operations CREATE INDEX CONCURRENTLY idx_name ON table_name (column); REINDEX INDEX CONCURRENTLY idx_name; -- For ALTER TABLE, consider: SET lock_timeout = '5s'; -- Fail fast if lock not acquired ALTER TABLE ...; -- Retry during low-traffic period ``` ### Pattern 4: VACUUM FULL Blocking Queries **Symptoms:** `VACUUM FULL` holds `AccessExclusiveLock`, blocking all access. **Solution:** - Use regular `VACUUM` instead (only takes `ShareUpdateExclusiveLock`, does not block reads/writes) - Use `pg_repack` for online table rewriting without exclusive locks - Schedule `VACUUM FULL` only during maintenance windows ### Pattern 5: Lock Queue Amplification **Symptoms:** One DDL statement queued behind a long-running query, then all subsequent queries queue behind the DDL. **Explanation:** PostgreSQL lock queue is FIFO. If an `ALTER TABLE` (needs `AccessExclusiveLock`) waits for a long SELECT, all new queries also wait behind the ALTER. **Solution:** ```sql -- Set a lock_timeout before DDL SET lock_timeout = '3s'; ALTER TABLE ...; -- If it fails, retry later. Don't let the DDL queue block everyone. ``` ## Output Format ### Lock Contention Report **Current Blocking Chains:** ``` Root Blocker: PID 1234 | State: idle in transaction | Duration: 12 min Query: BEGIN; UPDATE orders SET ... WHERE id = 42; | +--> Blocked: PID 2345 | Waiting: 3 min | Lock: RowExclusiveLock on orders | Query: UPDATE orders SET status = 'shipped' WHERE id = 42; | +--> Blocked: PID 3456 | Waiting: 2 min | Lock: AccessShareLock on orders Query: SELECT * FROM orders WHERE id = 42; ``` **Wait Event Distribution:** | Wait Type | Count | Top Events | |-----------|-------|------------| | Lock | 5 | tuple (3), transactionid (2) | | IO | 2 | DataFileRead (2) | **Recommendations:** 1. Immediate: Terminate PID 1234 (idle in transaction for 12 min) 2. Configuration: Set `idle_in_transaction_session_timeout = '60s'` 3. Application: Fix transaction handling in the orders update flow ## Iterative Verification After resolving lock contention: 1. Re-run `get_active_queries` with `include_idle: true` to confirm no more blocking chains 2. Re-run `analyze_wait_events` to confirm Lock wait events have decreased 3. Monitor over the next hour to ensure the pattern doesn't recur ## When to Stop and Ask the User - **Before terminating a backend**: Always ask: "PID XXXX is blocking N other sessions and has been idle in transaction for X minutes. Should I provide the command to terminate it?" - **If the blocker is a known batch job**: Ask: "The blocking session appears to be a batch operation. Is this expected? Should we wait for it to complete?" - **If lock contention is application-level**: "The lock contention is caused by application transaction patterns. This requires application code changes, not database tuning. Would you like me to describe the recommended changes?" ## PostgreSQL Version Notes | Feature | Version | |---------|---------| | `idle_in_transaction_session_timeout` | PG9.6+ | | `REINDEX CONCURRENTLY` | PG12+ | | `Lock wait event type breakdown` | PG9.6+ (detailed wait events) | | `pg_blocking_pids()` function | PG9.6+ | ## Production Safety - All diagnostic tools used here are read-only - `pg_cancel_backend()` attempts a graceful cancel (query stops, transaction stays open) - `pg_terminate_backend()` forcefully terminates the session (use as last resort) - Never terminate the `autovacuum` launcher unless absolutely necessary - Setting `lock_timeout` before DDL is a safe practice to prevent queue amplification