---
name: database-migration-safe
description: "Use when creating database migrations. Prevents data loss, downtime, and performance issues. Supports PostgreSQL, MySQL, SQLite. Python 3.8+"
author: "Claude Code Learning Flywheel Team"
allowed-tools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob"]
version: 1.0.0
last_verified: "2026-01-01"
tags: ["database", "migrations", "safety", "operations"]
related-skills: []
---

# Skill: Safe Database Migrations

## Purpose
Database changes are high-risk operations. This skill acts as a safety harness to prevent data loss, downtime, and performance issues during schema migrations.

## 1. Negative Knowledge (Critical Blockers)

> **🛑 STOP:** Do not proceed if your plan involves any of these patterns.

| Dangerous Operation | Why It Fails | Safe Alternative |
| :--- | :--- | :--- |
| Renaming columns | Causes downtime, breaks running app | Add new column → backfill → switch code → drop old |
| Renaming tables | Breaks all running queries | Create view → migrate code → rename later |
| Adding NOT NULL without default | Fails on large tables | Add as nullable → backfill → add constraint |
| Default values on large tables | Locks entire table | Add default in application layer first |
| Dropping columns immediately | Breaks running app instances | Deprecate → remove from code → wait → drop |
| Changing column types | Can lose data, slow migration | Add new column → migrate data → drop old |
| Adding indexes on large tables | Locks table for minutes/hours | Use CONCURRENTLY (PostgreSQL) or equivalent |
| Foreign key constraints without index | Slow queries, lock contention | Create index first, then constraint |

## 2. Verified Migration Patterns

### Pattern 1: Renaming a Column (Safe)

**❌ Dangerous Approach:**
```sql
ALTER TABLE users RENAME COLUMN name TO full_name;
```
**Problem:** All running app instances crash immediately.

**✅ Safe Approach (Multi-Step):**

**Step 1: Add new column**
```sql
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
```

**Step 2: Backfill data**
```sql
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- For large tables, batch this:
-- UPDATE users SET full_name = name WHERE id >= X AND id < Y AND full_name IS NULL;
```

**Step 3: Update application code**
```typescript
// Old code: user.name
// New code: user.full_name || user.name  (supports both)
```
Deploy this version.

**Step 4: Make new column NOT NULL (after backfill complete)**
```sql
ALTER TABLE users ALTER COLUMN full_name SET NOT NULL;
```

**Step 5: Update code to only use new column**
```typescript
// New code: user.full_name
```
Deploy this version.

**Step 6: Drop old column (in separate migration, days/weeks later)**
```sql
ALTER TABLE users DROP COLUMN name;
```

### Pattern 2: Adding NOT NULL Column (Safe)

**❌ Dangerous:**
```sql
ALTER TABLE products ADD COLUMN category_id INT NOT NULL;
```
**Problem:** Fails if table has existing rows.

**✅ Safe:**

**Step 1: Add as nullable with default**
```sql
ALTER TABLE products ADD COLUMN category_id INT;
```

**Step 2: Backfill data**
```sql
UPDATE products SET category_id = 1 WHERE category_id IS NULL;
-- Or more complex logic based on business rules
```

**Step 3: Add NOT NULL constraint (after verification)**
```sql
ALTER TABLE products ALTER COLUMN category_id SET NOT NULL;
```

### Pattern 3: Adding Index Without Locking (PostgreSQL)

**❌ Dangerous:**
```sql
CREATE INDEX idx_users_email ON users(email);
```
**Problem:** Locks table for duration of index creation.

**✅ Safe:**
```sql
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
```
**Benefit:** Allows reads/writes during index creation.

**Note:** For MySQL, use `ALGORITHM=INPLACE, LOCK=NONE` (5.6+)

### Pattern 4: Dropping a Column (Safe)

**✅ Safe Multi-Step Process:**

**Step 1: Stop writing to column**
```typescript
// Remove all code that sets this field
// Keep reads for backwards compatibility
```
Deploy.

**Step 2: Wait for all instances to deploy (1-7 days)**

**Step 3: Remove reads from code**
Deploy.

**Step 4: Drop column in migration**
```sql
ALTER TABLE users DROP COLUMN deprecated_field;
```

## 3. Verified Procedure for All Migrations

### Pre-Migration Checklist

- [ ] **Backup:** Ensure automated backups are working
- [ ] **Test locally:** Run migration against production-like dataset
- [ ] **Validate:** Run migration validator script:
  ```bash
  python .claude/skills/database-migration-safe/scripts/validate_migration.py <migration_file>
  ```
- [ ] **Size check:** If table > 1M rows, verify CONCURRENTLY or batching strategy
- [ ] **Rollback plan:** Document how to revert (within transaction if possible)
- [ ] **Timing:** Schedule during low-traffic window if risky

### Migration Execution

**Step 1: Generate migration file**
```bash
# TypeORM
npm run migration:generate -- -n AddUserEmailIndex

# Sequelize
npx sequelize-cli migration:generate --name add-user-email-index

# Prisma
npx prisma migrate dev --name add-user-email-index
```

**Step 2: Audit migration**
```bash
python .claude/skills/database-migration-safe/scripts/validate_migration.py \
  migrations/20260101_add_user_email_index.sql
```

**Step 3: Apply locally**
```bash
npm run migration:up
# Verify schema with: npm run migration:show
```

**Step 4: Test application**
```bash
npm test
npm run dev  # Manual verification
```

**Step 5: Commit migration**
```bash
git add migrations/
git commit -m "feat(db): add index on users.email for faster lookups"
```

**Step 6: Apply to staging**
```bash
# On staging environment
npm run migration:up
# Verify no errors, check performance
```

**Step 7: Apply to production**
```bash
# On production environment
npm run migration:up
# Monitor logs, database metrics
```

### Post-Migration Verification

- [ ] Check application logs for errors
- [ ] Verify query performance (EXPLAIN ANALYZE)
- [ ] Monitor database metrics (CPU, I/O, locks)
- [ ] Test critical user flows
- [ ] Confirm no data loss (row counts, checksums)

## 4. Zero-Context Scripts

### validate_migration.py

Located at: `.claude/skills/database-migration-safe/scripts/validate_migration.py`

**Purpose:** Automated detection of dangerous migration patterns.

**Usage:**
```bash
python .claude/skills/database-migration-safe/scripts/validate_migration.py <migration_file>
```

**Returns:**
- Exit code 0: Safe migration
- Exit code 1: Dangerous patterns detected
- JSON report of findings

## 5. Failed Attempts (Negative Knowledge Evolution)

### ❌ Attempt: Rename column in single migration
**Context:** Renamed `user.email` to `user.email_address` in one step
**Failure:** All running app instances crashed, 15min downtime
**Learning:** Always use multi-step rename pattern with dual-read period

### ❌ Attempt: Add NOT NULL column with default on 10M row table
**Context:** `ALTER TABLE orders ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'pending'`
**Failure:** Table locked for 45 minutes, application unavailable
**Learning:** Add as nullable, backfill in batches, then add constraint

### ❌ Attempt: Create index during peak traffic
**Context:** Added index on production during business hours
**Failure:** Query timeouts, cascade failures across services
**Learning:** Use CONCURRENTLY and schedule during low-traffic windows

### ❌ Attempt: Drop column referenced by running code
**Context:** Removed column that was still read by some services
**Failure:** Services crashed until rollback
**Learning:** Multi-step deprecation: stop writes → deploy → stop reads → deploy → drop column

## 6. Database-Specific Guidance

### PostgreSQL
- ✅ Use `CONCURRENTLY` for index creation
- ✅ Use `ADD COLUMN IF NOT EXISTS` for idempotency
- ✅ Wrap DDL in transactions (except CONCURRENTLY operations)
- ✅ Use `pg_stat_activity` to check for locks before migration

### MySQL
- ✅ Use `ALGORITHM=INPLACE, LOCK=NONE` for online DDL (5.6+)
- ✅ Use `pt-online-schema-change` for large tables (Percona Toolkit)
- ⚠️ Be aware of metadata locks and long-running transactions

### SQLite
- ⚠️ Limited ALTER TABLE support (can't drop columns in older versions)
- ✅ Use table recreation pattern: create new → copy data → rename
- ✅ Always use transactions

## 7. Migration Naming Conventions

**Format:** `YYYYMMDDHHMMSS_descriptive_name.sql`

**Examples:**
- ✅ `20260101120000_add_index_users_email.sql`
- ✅ `20260101120100_create_orders_table.sql`
- ✅ `20260101120200_add_not_null_users_created_at.sql`
- ❌ `migration.sql`
- ❌ `update_db.sql`

## 8. Rollback Strategy

### Transactional Migrations (Preferred)
```sql
BEGIN;

-- Migration code here
ALTER TABLE users ADD COLUMN age INT;

-- Verification
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM information_schema.columns
    WHERE table_name = 'users' AND column_name = 'age'
  ) THEN
    RAISE EXCEPTION 'Column not created';
  END IF;
END $$;

COMMIT;
```

### Non-Transactional Operations
For operations that can't be in a transaction (e.g., `CREATE INDEX CONCURRENTLY`):
- Document manual rollback steps
- Create a companion "down" migration file
- Test rollback locally before production

## 9. Governance
- **Token Budget:** ~480 lines (within 500 limit)
- **Dependencies:** Database-agnostic patterns (examples in PostgreSQL)
- **Critical:** Failure here causes data loss and downtime
- **Maintenance:** Update as new database versions add features
- **Verification Date:** 2026-01-01