--- name: minimal-modeling description: Use when designing database schemas from business requirements, translating vague requirements into concrete SQL definitions, or validating database designs with non-technical stakeholders. Provides a systematic methodology that decouples logical business modeling from physical implementation to reduce cognitive load and ensure business-driven design. --- # Minimal Modeling Database Design ## Overview Minimal Modeling is a database design methodology that bridges the gap between vague business requirements and concrete SQL schemas through systematic decoupling of logical and physical concerns. **Core Principle:** Separate the "what exists" (business logic) from "how to store it" (technical implementation) to reduce cognitive load and enable validation by non-technical stakeholders before writing any SQL. ## When to Use Use this skill when: - Translating business requirements ("I need a website for...") into database schemas - Starting a new project with unclear data requirements - Validating database design with non-technical stakeholders - Experiencing analysis paralysis from too many implementation details early - Need to explain database structure to business users Do NOT use when: - Database schema already exists and you're just querying it - Working with predefined ORM models that can't be changed - Technical constraints (like existing tables) must drive the design ## The Two-Phase Process ### Phase A: Logical Model (Business Domain) Focus entirely on business concepts without technical implementation. ### Phase B: Physical Schema (Implementation) Translate business concepts into tables, columns, and data types. ## Phase A: Logical Model ### Step 1: Gather Requirements Start with informal text or transcript describing the system. ### Step 2: Find Anchors (The "Nouns") **Definition:** Entities that exist independently with unique IDs. They represent things you count, not data itself. **Validation:** Use two sentences to test if a noun is an Anchor: ``` Counting Sentence: "We have 1,000 [Nouns] in our database." Adding Sentence: "The system inserts another [Noun] into the database." ``` If both sentences sound natural, it's an Anchor. **Examples:** - ✅ User, Post, Order, Product (you count these) - ❌ Email, Title, Price (these are data about things) ### Step 3: Find Attributes (The "Data") **Definition:** Information that describes an Anchor. **Validation:** Define each attribute with a Human-Readable Question: ``` "What is the [attribute name]?" "Is this [condition]?" ``` **Examples:** - User → "What is the email address?" (String) - Product → "What is the price?" (Decimal) - Room → "Is this wheelchair accessible?" (Boolean) **Common Attribute Types:** - Strings: Names, descriptions, addresses - Numbers: Counts, IDs, quantities - Money: Prices, balances (always use DECIMAL, never Float) - Booleans: Flags, yes/no questions - Dates/Times: Created dates, scheduled events - Binary: Files, images (or S3 URLs) ### Step 4: Find Links (The "Relationships") **Definition:** Connections between two Anchors (or an Anchor to itself). **Validation:** Use the Two-Sentence Rule to determine cardinality: ``` Sentence 1: "An [Anchor A] [verb] [one/several] [Anchor B]." Sentence 2: "An [Anchor B] [verb] [one/several] [Anchor A]." ``` **Cardinality Patterns:** | Sentence Pattern | Cardinality | Implementation | | ---------------------------------------- | ------------------ | --------------------------- | | "A has several B" + "B belongs to one A" | 1:N (One-to-Many) | Foreign key in B table | | "A has one B" + "B belongs to one A" | 1:1 (One-to-One) | Foreign key in either table | | "A has several B" + "B has several A" | N:M (Many-to-Many) | Junction table | **Examples:** ``` User ↔ Order "A User can place several Orders." "An Order is placed by one User." Result: 1:N → Add user_id to orders table User ↔ Post (likes) "A User can like several Posts." "A Post can be liked by several Users." Result: N:M → Create users_liked_posts junction table ``` ### Step 5: Verify Completeness Go back to original requirements. Highlight every word covered by your model. **Missing highlights indicate:** - Missed Anchors - Missed Attributes - Missed Links Continue iterating until all business concepts are captured. ## Phase B: Physical Schema ### Step 6: Choose Table Names **Convention:** Pluralize the Anchor name - User → `users` - Post → `posts` - Order → `orders` ### Step 7: Choose Column Types **Critical Type Decisions:** | Logical Type | SQL Type | Notes | | ------------------ | --------------------------- | ----------------------------------------------- | | ID | `INTEGER` or `BIGINT` | Use BIGINT for massive scale (billions of rows) | | String | `VARCHAR(N)` | Default to empty string, not NULL | | Money | `DECIMAL(M,D)` | NEVER use Float for money | | Boolean | `BOOLEAN` or `TINYINT` | Depends on database | | Timestamp (past) | `TIMESTAMP` in UTC | Store server events in UTC | | Timestamp (future) | Store local time + timezone | For user-scheduled events | | Binary | `BLOB` or S3 URL | Consider external storage for large files | **NULL vs Default Values:** - Prefer meaningful defaults over NULL when possible - Strings: Default to `''` (empty string) - Numbers: Consider if `0` makes sense or if NULL is semantic ### Step 8: Implement Links **One-to-Many (1:N):** Add foreign key column to the "Many" side table. ```sql -- User has many Orders CREATE TABLE orders ( id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, -- Foreign key to users total_amount DECIMAL(10,2), FOREIGN KEY (user_id) REFERENCES users(id) ); ``` **Many-to-Many (N:M):** Create a junction table with foreign keys to both sides. ```sql -- Users can like many Posts, Posts can be liked by many Users CREATE TABLE users_liked_posts ( user_id INTEGER NOT NULL, post_id INTEGER NOT NULL, liked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (user_id, post_id), FOREIGN KEY (user_id) REFERENCES users(id), FOREIGN KEY (post_id) REFERENCES posts(id) ); ``` **One-to-One (1:1):** Add foreign key to either table (choose based on query patterns). ```sql -- User has one Profile CREATE TABLE profiles ( id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE NOT NULL, -- UNIQUE enforces 1:1 bio TEXT, FOREIGN KEY (user_id) REFERENCES users(id) ); ``` ## Quick Reference: Validation Checklist **Anchors:** - [ ] Pass counting sentence test - [ ] Pass adding sentence test - [ ] Have unique IDs - [ ] Exist independently **Attributes:** - [ ] Have human-readable question - [ ] Belong to a specific Anchor - [ ] Are single-valued (not lists) - [ ] Appropriate SQL type chosen **Links:** - [ ] Two-sentence validation complete - [ ] Cardinality determined (1:1, 1:N, N:M) - [ ] Implementation strategy chosen - [ ] Foreign keys properly constrained **Physical Schema:** - [ ] Table names pluralized - [ ] Primary keys defined - [ ] Foreign keys with constraints - [ ] Column types match logical types - [ ] Defaults specified where appropriate - [ ] Indexes planned for common queries ## Common Mistakes **Mistake 1: Mixing Logical and Physical Too Early** ``` ❌ BAD: "Should User have a VARCHAR(255) email?" ✅ GOOD: "Does User have an email address? [Yes] → Later: VARCHAR(255)" ``` **Mistake 2: Attributes as Anchors** ``` ❌ BAD: EmailAddress as an Anchor ✅ GOOD: Email as an Attribute of User Ask: "We have 1,000 emails in our database?" (sounds weird → not an Anchor) ``` **Mistake 3: Using Float for Money** ``` ❌ BAD: price FLOAT ✅ GOOD: price DECIMAL(10,2) Reason: Float has rounding errors; DECIMAL is exact ``` **Mistake 4: Wrong Cardinality Side** ``` ❌ BAD: Adding order_id to users table (1:N wrong side) ✅ GOOD: Adding user_id to orders table (1:N correct side) Rule: Foreign key goes on the "Many" side ``` **Mistake 5: Missing Junction Tables** ``` ❌ BAD: Adding post_ids array column to users (N:M) ✅ GOOD: Create users_liked_posts junction table Reason: Relational databases don't handle arrays well ``` **Mistake 6: Storing Timezone as String** ``` ❌ BAD: "America/New_York" string for all times ✅ GOOD: - Past events: UTC timestamp - Future events: Local time + timezone Reason: Different semantic needs ``` ## Secondary Data **Definition:** Data that is duplicated, cached, or aggregated for performance. Not the source of truth. **Examples:** - `total_posts` count on User (redundant with COUNT of posts) - `last_login_at` on User (duplicated from sessions table) **When to Use:** - Performance optimization after proving it's needed - Denormalization for read-heavy workloads **Always:** - Document as secondary data - Have mechanism to rebuild from source of truth - Consider if query optimization or indexing is better solution ## Working with Query Builders and ORMs When using query builders like Kysely or ORMs, the process remains the same: 1. **Logical Model first** (Anchors, Attributes, Links) 2. **Define schema types** instead of raw SQL 3. **Generate migrations** from type definitions ```typescript // After logical modeling, translate to Kysely schema: // Define database schema interface interface Database { users: UserTable; // Anchor: User orders: OrderTable; // Anchor: Order } // Anchor: User interface UserTable { id: Generated; email: string; // Attribute created_at: Generated; } // Anchor: Order interface OrderTable { id: Generated; user_id: number; // Link (1:N) - Foreign key to users total: string; // Decimal stored as string for precision created_at: Generated; } // Create Kysely instance const db = new Kysely({ dialect: new PostgresDialect({ pool }), }); // Queries respect the logical model: const userOrders = await db .selectFrom('orders') .where('user_id', '=', userId) // Following 1:N relationship .selectAll() .execute(); ``` ## Time and Timezone Best Practices **Past Events (already happened):** ```sql -- Server events, log entries, message timestamps created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Store in UTC ``` **Future Events (scheduled by user):** ```sql -- Concert times, meeting schedules, alarm times event_time TIMESTAMP, event_timezone VARCHAR(50) -- e.g., "America/New_York" ``` **Reason:** Past events have definite UTC times. Future events need local context because timezone rules may change (DST, political changes). ## Iterative Refinement Database design is iterative. After creating initial schema: 1. **Test with sample data** - Does it feel natural? 2. **Write common queries** - Are they simple or complex? 3. **Show to stakeholders** - Do they understand the model? 4. **Adjust and repeat** - Refine based on feedback **Signs you need to revisit:** - Queries require 5+ JOINs regularly - Stakeholders confused by table names - Constant use of NULL checks in queries - Performance issues with simple queries - Adding "workaround" columns frequently ## Real-World Impact **Benefits:** - Non-technical stakeholders can validate the logical model - Catch misunderstandings before writing SQL - Faster iterations (change model, not migrations) - Clear documentation of business logic - Reduced cognitive load during design **When you skip this:** - Premature optimization (wrong indexes too early) - Schema doesn't match business needs - Expensive refactoring later - Technical debt from implementation-driven design