--- name: schema-design description: "Universal database schema design principles: normalization strategy, data types, primary keys, constraints, anti-patterns, and red flags. Apply when designing schemas, reviewing database architecture, or refactoring data models." --- # Database Schema Design Principles **"Schema design debt compounds faster than code debt. Fix it now or pay 10x later."** ## Foundation Principles ### Primary Keys - Every table MUST have a primary key - Prefer surrogate keys (auto-increment, UUIDv7) over composite natural keys - UUIDv7 for distributed systems, auto-increment BIGINT for monoliths ### Foreign Keys - Use FK constraints unless specific reason not to (high-volume logging, sharded DBs) - ON DELETE: RESTRICT (safest), CASCADE (use sparingly), SET NULL (breaks audit) ### Data Types - Choose smallest sufficient type (BIGINT vs INT = 4 bytes * rows) - Money: DECIMAL (never FLOAT/DOUBLE) - Dates without time: DATE not DATETIME - Small sets: ENUM not VARCHAR ### Constraints - NOT NULL on required columns - UNIQUE on natural keys - CHECK for business rules - DEFAULT where appropriate ## Quality Checklist ### Structural Integrity - [ ] Every table has primary key - [ ] Foreign key constraints defined - [ ] Appropriate data types (smallest sufficient) - [ ] NOT NULL, UNIQUE, CHECK constraints ### Anti-Pattern Scan - [ ] No EAV (entity-attribute-value) patterns - [ ] No god tables (> 50 columns) - [ ] No multi-valued fields (CSV in columns) - [ ] No DATETIME for date-only data ### Performance - [ ] Indexes match query patterns - [ ] Foreign keys indexed - [ ] Composite index column order optimized ## Decision Trees ### "Should I denormalize this?" ``` Have evidence of query performance problem? ├─ NO → DON'T denormalize (premature optimization) └─ YES → Tried indexes, query optimization, caching? ├─ NO → Try those first └─ YES → Read-heavy (> 100:1)? ├─ NO → Normalize, optimize queries └─ YES → Denormalize specific fields ``` ### "UUID or auto-increment?" ``` Distributed system (multiple write nodes)? ├─ YES → UUIDv7 (time-ordered, better than v4) └─ NO → Exposed to users (issue-123)? ├─ YES → Auto-increment (better UX) └─ NO → Auto-increment (better performance) ``` ### "Soft or hard delete?" ``` GDPR "right to erasure" applies? ├─ YES → Hard delete or audit table └─ NO → Need audit trail? ├─ YES → Audit table pattern (recommended) └─ NO → High deletion rate (> 20%)? ├─ YES → Hard delete └─ NO → Soft delete acceptable ``` ## References Detailed patterns and examples: - `references/anti-patterns.md` — EAV, god tables, multi-valued fields, red flags - `references/normalization.md` — 1NF/2NF/3NF, when to denormalize, OLTP vs OLAP - `references/advanced-patterns.md` — Soft delete, temporal data, JSON columns - `references/naming-conventions.md` — Tables, columns, indexes, constraints - `references/performance-patterns.md` — Indexing strategy, partitioning, data types ## Remember **"The best schema is one you can understand in 6 months and modify with confidence."** Design schemas that: 1. **Enforce integrity** — Constraints, foreign keys, data types 2. **Optimize for common patterns** — Indexes, denormalization where proven 3. **Enable evolution** — Proper normalization, migration strategy 4. **Prevent known anti-patterns** — No EAV, god tables, multi-valued fields