--- name: databases-architecture-skill description: Master database design (SQL, NoSQL), system architecture, API design (REST, GraphQL), and building scalable systems. Learn PostgreSQL, MongoDB, system design patterns, and enterprise architectures. --- # Databases & Architecture Skill Complete guide to designing databases, systems, and APIs that scale. ## Quick Start ### Learning Path ``` Data → Schema → APIs → Architecture ↓ ↓ ↓ ↓ SQL Normalize REST Microservices NoSQL Indexes GraphQL Patterns ``` ### Get Started in 5 Steps 1. **SQL Fundamentals** (2-3 weeks) - SELECT, INSERT, UPDATE, DELETE - Joins and aggregations 2. **Database Design** (3-4 weeks) - Normalization - Entity-relationship modeling - Indexing 3. **NoSQL Databases** (2-3 weeks) - Document stores (MongoDB) - Key-value (Redis) - When to use each 4. **API Design** (3-4 weeks) - REST principles - GraphQL basics - Error handling 5. **System Architecture** (ongoing) - Scalability patterns - Caching strategies - Distributed systems --- ## SQL Databases ### **SQL Fundamentals** ```sql -- CREATE TABLE CREATE TABLE users ( id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, email VARCHAR(255) UNIQUE, age INT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- INSERT INSERT INTO users (name, email, age) VALUES ('Alice', 'alice@example.com', 25); -- SELECT (basic) SELECT * FROM users; SELECT name, email FROM users; -- WHERE (filtering) SELECT * FROM users WHERE age > 25; SELECT * FROM users WHERE age >= 25 AND age <= 35; -- LIKE (pattern matching) SELECT * FROM users WHERE name LIKE 'A%'; -- Starts with A -- ORDER BY (sorting) SELECT * FROM users ORDER BY age DESC; -- Highest first -- LIMIT (pagination) SELECT * FROM users LIMIT 10 OFFSET 20; -- Skip 20, show 10 ``` ### **Advanced SQL** ```sql -- JOINS SELECT users.name, orders.amount FROM users INNER JOIN orders ON users.id = orders.user_id; -- LEFT JOIN (include nulls) SELECT users.name, COUNT(orders.id) as order_count FROM users LEFT JOIN orders ON users.id = orders.user_id GROUP BY users.id, users.name; -- GROUP BY & AGGREGATION SELECT age, COUNT(*) as count, AVG(salary) as avg_salary FROM users GROUP BY age HAVING COUNT(*) > 5; -- Filter groups -- Window functions SELECT name, salary, AVG(salary) OVER (PARTITION BY department) as dept_avg, RANK() OVER (ORDER BY salary DESC) as salary_rank FROM employees; -- CTEs (Common Table Expressions) WITH high_earners AS ( SELECT * FROM employees WHERE salary > 100000 ) SELECT department, COUNT(*) as count FROM high_earners GROUP BY department; -- UPDATE UPDATE users SET age = 26 WHERE name = 'Alice'; -- DELETE DELETE FROM users WHERE age < 18; ``` ### **Database Design** **Normalization (Reduce data redundancy):** ``` 1NF: Each column has atomic value 2NF: Remove partial dependencies 3NF: Remove transitive dependencies BCNF: Every determinant is a candidate key ``` **Example - Poor vs Good Design:** ```sql -- POOR (denormalized) CREATE TABLE students ( id INT PRIMARY KEY, name VARCHAR(255), course1 VARCHAR(255), course2 VARCHAR(255), course3 VARCHAR(255), teacher1 VARCHAR(255), teacher2 VARCHAR(255) ); -- GOOD (normalized) CREATE TABLE students ( id INT PRIMARY KEY, name VARCHAR(255) ); CREATE TABLE courses ( id INT PRIMARY KEY, name VARCHAR(255), teacher_id INT FOREIGN KEY ); CREATE TABLE enrollments ( student_id INT FOREIGN KEY, course_id INT FOREIGN KEY, PRIMARY KEY (student_id, course_id) ); ``` ### **Indexing & Performance** ```sql -- Create index CREATE INDEX idx_email ON users(email); CREATE INDEX idx_age_salary ON users(age, salary); -- Composite -- Analyze query performance EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com'; -- Index types -- B-tree: General purpose (default) -- Hash: Exact matches only -- GiST: Geospatial, full-text search -- BRIN: Large datasets, sequential data -- When to index -- ✓ Columns in WHERE clause -- ✓ Columns in JOIN ON clause -- ✗ Low cardinality (yes/no, status) -- ✗ Small tables ``` ### **PostgreSQL Advanced** ```sql -- JSONB (JSON with indexing) CREATE TABLE products ( id SERIAL PRIMARY KEY, name VARCHAR(255), metadata JSONB ); INSERT INTO products VALUES (1, 'Laptop', '{"color": "silver", "specs": {"cpu": "M1"}}'); -- Query JSONB SELECT * FROM products WHERE metadata->>'color' = 'silver'; SELECT * FROM products WHERE metadata->'specs'->>'cpu' = 'M1'; -- Array columns CREATE TABLE tags ( id SERIAL PRIMARY KEY, article_id INT, tags TEXT[] ); SELECT * FROM tags WHERE 'database' = ANY(tags); -- Full-text search CREATE TABLE articles ( id SERIAL PRIMARY KEY, title VARCHAR(255), content TEXT, search_vector tsvector ); UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || content); SELECT * FROM articles WHERE search_vector @@ to_tsquery('database'); ``` --- ## NoSQL Databases ### **MongoDB Document Storage** ```javascript // Insert documents db.users.insertOne({ _id: ObjectId(), name: "Alice", email: "alice@example.com", age: 25, tags: ["developer", "python"], address: { street: "123 Main St", city: "New York", zip: "10001" } }); // Query documents db.users.find({ name: "Alice" }); db.users.find({ age: { $gt: 25 } }); // Greater than db.users.find({ tags: "python" }); // Array contains // Update db.users.updateOne( { name: "Alice" }, { $set: { age: 26 } } ); db.users.updateOne( { _id: ObjectId(...) }, { $push: { tags: "javascript" } } // Add to array ); // Aggregation pipeline db.users.aggregate([ { $match: { age: { $gt: 20 } } }, { $group: { _id: null, avg_age: { $avg: "$age" } } }, { $sort: { avg_age: -1 } } ]); // Indexes db.users.createIndex({ email: 1 }); db.users.createIndex({ name: 1, age: 1 }); db.users.createIndex({ search: "text" }); // Full-text search ``` ### **Redis Caching** ```python import redis r = redis.Redis(host='localhost', port=6379) # Strings r.set('user:1:name', 'Alice') r.get('user:1:name') # b'Alice' r.incr('page:views') # Increment counter # TTL (Time to live) r.setex('token:xyz', 3600, 'valid') # Expires in 1 hour # Lists r.lpush('queue:jobs', 'job1', 'job2') r.rpop('queue:jobs') # Dequeue r.llen('queue:jobs') # Length # Sets r.sadd('tags:post:1', 'python', 'database', 'backend') r.smembers('tags:post:1') r.sismember('tags:post:1', 'python') # Is member? # Hashes r.hset('user:1', mapping={'name': 'Alice', 'email': 'alice@example.com'}) r.hgetall('user:1') # Pub/Sub r.publish('channel:notifications', 'New message') # Transactions pipe = r.pipeline() pipe.set('key1', 'value1') pipe.set('key2', 'value2') pipe.execute() ``` --- ## API Design ### **REST API Best Practices** ``` HTTP Methods: GET - Retrieve resource (safe, idempotent) POST - Create resource PUT - Replace entire resource (idempotent) PATCH - Partial update DELETE - Remove resource (idempotent) Status Codes: 200 OK - Success 201 Created - Resource created 204 No Content - Success, no body 400 Bad Request - Client error 401 Unauthorized - Auth required 403 Forbidden - Not allowed 404 Not Found - Resource missing 500 Internal Server Error ``` **Resource URLs:** ``` GET /api/users # List all GET /api/users/:id # Get one POST /api/users # Create PUT /api/users/:id # Update (full) PATCH /api/users/:id # Update (partial) DELETE /api/users/:id # Delete // Nested resources GET /api/users/:id/posts # User's posts POST /api/users/:id/posts # Create post for user ``` **Request/Response Example:** ```json POST /api/users Content-Type: application/json { "name": "Alice", "email": "alice@example.com", "age": 25 } Response (201 Created): { "id": 123, "name": "Alice", "email": "alice@example.com", "age": 25, "created_at": "2024-01-15T10:30:00Z" } ``` ### **GraphQL** ```graphql # Schema type User { id: ID! name: String! email: String! posts: [Post!]! } type Post { id: ID! title: String! content: String! author: User! } type Query { user(id: ID!): User users(limit: Int): [User!]! post(id: ID!): Post } type Mutation { createUser(name: String!, email: String!): User! updateUser(id: ID!, name: String): User deleteUser(id: ID!): Boolean! } ``` ```graphql # Query query GetUserWithPosts { user(id: "123") { name email posts { title id } } } # Mutation mutation CreateUser { createUser(name: "Alice", email: "alice@example.com") { id name email } } ``` **GraphQL vs REST:** | Aspect | REST | GraphQL | |--------|------|---------| | Over-fetching | Common | None | | Under-fetching | Need multiple requests | Single query | | Caching | Easy (HTTP caching) | More complex | | Learning curve | Low | High | | Use case | Simple CRUD | Complex, nested data | --- ## System Design & Architecture ### **Scalability Patterns** **Vertical Scaling (Scale Up):** - Add more CPU, RAM, storage - Simple but has limits - Single point of failure **Horizontal Scaling (Scale Out):** - Add more servers - Load balancing needed - Better resilience ### **Caching Strategy** ``` Cache Levels: 1. Client-side (browser cache) 2. CDN (edge caching) 3. Application cache (Redis, Memcached) 4. Database (query caching) ``` **Cache Invalidation Strategies:** ``` 1. TTL (Time to Live) - Automatic expiration 2. Event-based - Invalidate on change 3. Purge - Manual invalidation ``` ### **Microservices Architecture** ``` Advantages: ✓ Independent scaling ✓ Technology diversity ✓ Faster deployment Challenges: ✗ Network latency ✗ Distributed transactions ✗ Operational complexity Pattern: API Gateway → Services → Databases ↓ Service Discovery Message Queue Logging/Monitoring ``` ### **Database Sharding** ``` Split data across multiple databases - Range-based: User ID 1-1000 → DB1, 1001-2000 → DB2 - Hash-based: hash(user_id) % num_shards - Directory-based: Lookup table maps to shard Tradeoffs: ✓ Horizontal scaling ✗ Complex queries ✗ Operational overhead ``` --- ## Learning Checklist - [ ] Understand SQL SELECT with WHERE, JOIN - [ ] Can design normalized schema - [ ] Know when to use indexes - [ ] Understand NoSQL document stores - [ ] Built API with proper status codes - [ ] Know REST vs GraphQL trade-offs - [ ] Understand caching strategies - [ ] Know sharding and replication - [ ] Understand microservices patterns - [ ] Ready for architect role! --- **Source**: https://roadmap.sh/sql, https://roadmap.sh/system-design, https://roadmap.sh/api-design