--- name: architect description: Design system architecture and high-level technical strategy allowed-tools: - Read - Grep - Glob - Bash --- # Architecture Design Design system architecture and make strategic technical decisions. ## Core Principle **Good architecture enables change while maintaining simplicity.** ## Name han-core:architect - Design system architecture and high-level technical strategy ## Synopsis ``` /architect [arguments] ``` ## Architecture vs Planning **Architecture Design (this skill):** - Strategic: "How should the system be structured?" - Component interactions and boundaries - Technology and pattern choices - Long-term implications - System-level decisions **Technical Planning:** - Tactical: "How do I implement feature X?" - Specific implementation tasks - Execution details - Short-term focus **Use /architect when:** - Designing new systems or subsystems - Significant system change (new subsystem, major refactor) - Affects multiple components or teams - Major refactors affecting multiple components - Technology selection decisions - Defining system boundaries and interfaces - Long-term technical strategy needed - Need to evaluate multiple approaches - Decisions have broad impact **Use /plan when:** - Implementing within existing architecture - Implementing specific feature within existing architecture - Tactical execution planning - Breaking down known work - Architecture is already decided - Task sequencing and execution ## Architecture Process ### 1. Understand Context **Business context:** - What problem are we solving? - Who are the users? - What are the business goals? - What are the success metrics? **Technical context:** - What exists today? - What constraints exist? - What must we integrate with? - What scale must we support? **Team context:** - What's our expertise? - What can we maintain? - What's our velocity? ### 2. Gather Requirements **Functional requirements:** - What must the system do? - What are the features? - What are the user scenarios? **Non-functional requirements:** - **Performance**: Response time, throughput - **Scalability**: Expected load, growth - **Availability**: Uptime requirements - **Security**: Compliance, data protection - **Maintainability**: Team size, skills - **Cost**: Budget constraints **Example:** ```markdown ## Requirements ### Functional - Users can search products by name/category - Users can add items to cart - Users can checkout and pay ### Non-Functional - Search response time < 200ms (p95) - Support 10,000 concurrent users - 99.9% uptime - PCI DSS compliant for payments - Team of 5 developers can maintain ``` ### 3. Identify Constraints **Technical constraints:** - Must use existing authentication system - Must integrate with legacy inventory system - Database must be PostgreSQL (existing infrastructure) **Business constraints:** - Budget for infrastructure - Must support EU data residency **Team constraints:** - Team experienced in Python, less in Go - No DevOps specialist on team - Remote team across timezones ### 4. Consider Alternatives **Never design in a vacuum - consider options:** **Example: Data storage choice** **Option 1: PostgreSQL** - Pros: Team knows it, ACID guarantees, rich query support - Cons: Vertical scaling limits, setup complexity **Option 2: MongoDB** - Pros: Flexible schema, horizontal scaling - Cons: Team unfamiliar, eventual consistency **Option 3: DynamoDB** - Pros: Fully managed, auto-scaling - Cons: Vendor lock-in, query limitations, cost at scale **Decision: PostgreSQL** - Team expertise outweighs scaling concerns - Can re-evaluate if scale becomes issue - Faster initial development ### 5. Design System Structure **Define components and their responsibilities:** ``` ┌─────────────────────────────────────────────┐ │ Client Apps │ │ (Web, iOS, Android) │ └────────────────┬────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────┐ │ API Gateway / Load Balancer │ └────────────────┬────────────────────────────┘ │ ┌────────┴────────┐ ▼ ▼ ┌───────────────┐ ┌───────────────┐ │ Auth │ │ Core API │ │ Service │ │ Service │ └───────┬───────┘ └───────┬───────┘ │ │ │ ┌────────┴────────┐ │ ▼ ▼ │ ┌──────────────┐ ┌──────────────┐ │ │ PostgreSQL │ │ Redis │ │ │ (Primary) │ │ (Cache) │ │ └──────────────┘ └──────────────┘ │ ▼ ┌───────────────┐ │ User DB │ └───────────────┘ ``` **Component descriptions:** ```markdown ## Components ### API Gateway **Responsibility:** Route requests, rate limiting, authentication **Technology:** Nginx **Dependencies:** Auth Service, Core API Service **Scale:** 2-3 instances behind load balancer ### Auth Service **Responsibility:** User authentication, session management, JWT issuing **Technology:** Python (Flask), PostgreSQL **API:** REST **Scale:** Stateless, 2-N instances ### Core API Service **Responsibility:** Business logic, data access, external integrations **Technology:** Python (FastAPI), PostgreSQL, Redis **API:** REST **Scale:** Stateless, 2-N instances ### PostgreSQL **Responsibility:** Primary data store **Scale:** Primary with read replica ### Redis **Responsibility:** Session storage, caching, rate limiting **Scale:** Cluster mode (3 nodes) ``` ### 6. Define Interfaces **API contracts:** ```markdown ## API Design ### POST /api/auth/login **Purpose:** Authenticate user, issue JWT **Request:** ```json { "email": "user@example.com", "password": "secure_password" } ``` **Response (200):** ```json { "token": "eyJ...", "user": { "id": "123", "email": "user@example.com", "name": "John Doe" } } ``` **Errors:** - 400: Invalid request - 401: Invalid credentials - 429: Rate limit exceeded ``` ### 7. Plan for Failure **What can go wrong?** - Database unavailable - External API down - Network partition - High load - Data corruption **Mitigation strategies:** - Retry with exponential backoff - Circuit breakers for external services - Graceful degradation - Health checks and monitoring - Database backups **Example:** ```markdown ## Failure Scenarios ### Database Unavailable **Impact:** Cannot read/write data **Mitigation:** - Read replica failover (automated) - Circuit breaker after 3 failures - Cache serves stale data for 5 minutes - User sees degraded experience message **Recovery:** Manual failover to replica, fix primary ### External Payment API Down **Impact:** Cannot process payments **Mitigation:** - Retry 3 times with exponential backoff - Queue payments for later processing - User notified of delay - Alert on-call engineer **Recovery:** Process queued payments once API recovers ``` ### 8. Document Decisions **Architecture Decision Record (ADR):** ```markdown # ADR-001: Use PostgreSQL for Primary Database **Status:** Accepted **Date:** 2024-01-15 **Deciders:** Tech Lead, Backend Team ## Context We need to choose a primary database for user data, products, and orders. Requirements: - Strong consistency (ACID) - Complex queries (joins, aggregations) - < 200ms query time for 90% of queries - Support 100k users initially ## Decision Use PostgreSQL as primary database. ## Alternatives Considered ### MongoDB - **Pros:** Flexible schema, horizontal scaling - **Cons:** Team unfamiliar, eventual consistency issues - **Why not:** Team expertise more valuable than flexibility ### DynamoDB - **Pros:** Managed service, auto-scaling - **Cons:** Vendor lock-in, limited query capability, cost - **Why not:** Query limitations would hurt development velocity ### MySQL - **Pros:** Similar to PostgreSQL, team knows it - **Cons:** Less feature-rich than PostgreSQL - **Why not:** PostgreSQL offers JSON support, better full-text search ## Consequences **Positive:** - Team can be productive immediately - Strong consistency guarantees - Rich query capabilities - JSON support for flexible data **Negative:** - Vertical scaling limits (mitigated: can add read replicas) - More complex than managed services (mitigated: use RDS) - Higher operational overhead **Trade-offs:** - Chose familiarity over horizontal scaling - Chose rich queries over eventual consistency - Can re-evaluate if scale requirements change ## Validation - Team confirmed expertise in PostgreSQL - Load testing shows meets performance requirements - Cost analysis shows acceptable for first year ``` ## Architecture Document Structure ```markdown # Architecture Design: [System/Feature Name] ## Context ### Problem Statement [What business problem are we solving?] ### Goals [What are we trying to achieve?] ### Non-Goals [What are we explicitly NOT trying to achieve?] ### Requirements **Functional:** - [Requirement 1] - [Requirement 2] **Non-Functional:** - Performance: [e.g., < 200ms response time] - Scalability: [e.g., handle 10k concurrent users] - Security: [e.g., PCI compliance] - Maintainability: [e.g., easy to modify] ### Constraints [Technical, time, resource, or business constraints] ## Current Architecture [What exists today? What needs to change?] ## Proposed Architecture ### High-Level Design [Diagram or description of system components] ``` ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │ Client │────▶│ API │────▶│ Database │ └─────────────┘ └─────────────┘ └──────────────┘ │ ▼ ┌─────────────┐ │ Cache │ └─────────────┘ ``` ### Components #### Component A **Responsibility:** [What it does] **Interface:** [How others interact with it] **Dependencies:** [What it depends on] **Technology:** [Implementation stack] #### Component B [Similar structure...] ### Data Flow [How data moves through the system] ### API Design [Key endpoints, schemas, contracts] ### Data Model [Database schema, key entities] ### Security Model [Authentication, authorization, data protection] ## Alternative Approaches Considered ### Alternative 1: [Approach name] **Pros:** - [Advantage 1] - [Advantage 2] **Cons:** - [Disadvantage 1] - [Disadvantage 2] **Why not chosen:** [Reasoning] ### Alternative 2: [Another approach] [Similar structure...] ## Decision Rationale ### Why This Architecture? [Explain the key decisions and trade-offs] ### Trade-offs Accepted [What we gave up for what benefits] ### Assumptions [What we're assuming to be true] ## Implementation Strategy ### Phase 1: [Foundation] [What to build first] ### Phase 2: [Core Features] [Next phase] ### Phase 3: [Enhancement] [Final phase] ### Migration Strategy [If replacing existing system, how to transition?] ## Risks & Mitigation | Risk | Impact | Probability | Mitigation | |------|--------|-------------|------------| | [Risk 1] | High | Medium | [How to mitigate] | | [Risk 2] | Medium | Low | [How to mitigate] | ## Testing Strategy [How will we validate this architecture?] ## Monitoring & Observability [How will we know if it's working?] ## Success Metrics [How will we measure success?] ## Open Questions [What still needs to be resolved?] ## References [Links to research, related docs, RFCs] ``` ## Architecture Principles ### 1. Simplicity **Start simple, add complexity only when needed.** ``` BAD: Microservices from day 1 with 20 services GOOD: Start with monolith, split when needed ``` **Apply YAGNI:** You Aren't Gonna Need It - Don't build for hypothetical future - Add when actually needed - Simpler is easier to maintain ### 2. Separation of Concerns **Each component has one clear responsibility.** ``` GOOD: - Auth Service: Authentication only - User Service: User profile management - Order Service: Order processing BAD: - God Service: Does everything ``` **Apply SOLID principles:** - Single Responsibility - Open/Closed - Liskov Substitution - Interface Segregation - Dependency Inversion ### 3. Loose Coupling **Components depend on interfaces, not implementations.** ```typescript // BAD: Tight coupling class OrderService { constructor(private db: PostgresDatabase) {} } // GOOD: Loose coupling class OrderService { constructor(private db: Database) {} // Interface } ``` **Benefits:** - Easier to test (mock interface) - Easier to swap implementations - Components can evolve independently ### 4. High Cohesion **Related functionality stays together.** ``` GOOD: user/ - create_user.ts - update_user.ts - delete_user.ts - user_repository.ts BAD: create/ - create_user.ts - create_order.ts update/ - update_user.ts - update_order.ts ``` ### 5. Explicit Over Implicit **Make dependencies and contracts clear.** ```typescript // BAD: Implicit dependency function processOrder(orderId: string) { const db = global.database // Where does this come from? // ... } // GOOD: Explicit dependency function processOrder( orderId: string, db: Database, logger: Logger ) { // Dependencies are clear } ``` ### 6. Fail Fast **Detect and report errors early.** ```typescript // BAD: Silent failure function divide(a: number, b: number) { if (b === 0) return 0 // Wrong! return a / b } // GOOD: Fail fast function divide(a: number, b: number) { if (b === 0) { throw new Error('Division by zero') } return a / b } ``` ### 7. Design for Testability **Make it easy to test.** ```typescript // BAD: Hard to test class OrderService { processOrder(orderId: string) { const db = new PostgresDatabase() // Can't mock const api = new PaymentAPI() // Can't mock // ... } } // GOOD: Easy to test class OrderService { constructor( private db: Database, // Can inject mock private api: PaymentAPI // Can inject mock ) {} processOrder(orderId: string) { // ... } } ``` ## Common Architecture Patterns ### Layered Architecture ``` ┌─────────────────────┐ │ Presentation │ (UI, API controllers) ├─────────────────────┤ │ Business Logic │ (Domain, services) ├─────────────────────┤ │ Data Access │ (Repositories, ORMs) ├─────────────────────┤ │ Database │ (Storage) └─────────────────────┘ ``` **When to use:** Simple to moderate complexity ### Hexagonal Architecture (Ports & Adapters) ``` ┌───────────────────────┐ │ External Systems │ │ (UI, DB, APIs) │ └──────────┬────────────┘ │ ┌──────────▼────────────┐ │ Adapters │ (Implementation) │ (REST, PostgreSQL) │ └──────────┬────────────┘ │ ┌──────────▼────────────┐ │ Ports │ (Interfaces) │ (IUserRepo, IAuth) │ └──────────┬────────────┘ │ ┌──────────▼────────────┐ │ Core Domain │ (Business logic) │ (Pure logic) │ └───────────────────────┘ ``` **When to use:** Want to isolate business logic, multiple frontends ### Microservices ``` ┌─────────┐ ┌─────────┐ ┌─────────┐ │ User │ │ Order │ │ Payment │ │ Service │ │ Service │ │ Service │ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ └────────────┴────────────┘ │ ┌───────▼────────┐ │ Message Bus │ │ (Event-driven)│ └────────────────┘ ``` **When to use:** Large team, need independent deploy, clear boundaries **Avoid when:** Small team, unclear boundaries, early stage ### Event-Driven Architecture ``` ┌─────────┐ ┌─────────────┐ ┌─────────┐ │Producer │──────▶│ Event Bus │──────▶│Consumer │ └─────────┘ └─────────────┘ └─────────┘ ``` **When to use:** Async processing, decoupled systems, audit trails ## Architectural Patterns to Consider **System Patterns:** - Layered architecture - Microservices vs monolith - Event-driven architecture - CQRS (Command Query Responsibility Segregation) - Hexagonal architecture **Data Patterns:** - Database per service - Shared database - Event sourcing - CQRS - Cache-aside **Integration Patterns:** - API Gateway - Service mesh - Message queue - Pub/sub - GraphQL federation ## Anti-Patterns ### Premature Optimization **Don't optimize for scale you don't have.** ``` BAD: Build microservices for 100 users GOOD: Start with monolith, split when needed ``` ### Resume-Driven Architecture **Don't choose technology to pad resume.** ``` BAD: "I want to learn Kubernetes, let's use it" GOOD: "Kubernetes fits our scale needs" ``` ### Distributed Monolith **Microservices that are tightly coupled.** ``` BAD: Service A can't deploy without Service B GOOD: Services are independently deployable ``` ### Big Ball of Mud **No structure, everything depends on everything.** ``` BAD: Any code can call any other code GOOD: Clear layers and boundaries ``` ### Analysis Paralysis **Over-analyzing, never shipping.** ``` BAD: Spend months on perfect architecture GOOD: Design enough to start, iterate ``` ## Architecture Review Checklist - [ ] Business goals clearly understood - [ ] Functional requirements documented - [ ] Non-functional requirements defined - [ ] Constraints identified - [ ] Multiple alternatives considered - [ ] Trade-offs explicitly stated - [ ] Component responsibilities clear - [ ] Interfaces well-defined - [ ] Data flow documented - [ ] Failure scenarios planned for - [ ] Security model defined - [ ] Scalability addressed - [ ] Testability designed in - [ ] Decisions documented (ADRs) - [ ] Implementation phases outlined - [ ] Team can implement and maintain - [ ] Success metrics defined ## Examples When the user says: - "Design the architecture for our multi-tenant system" - "How should we structure our microservices?" - "Plan the technical approach for real-time notifications" - "Design the data model for our marketplace" - "Create architecture for migrating from monolith to services" ## Integration with Other Skills - Apply **solid-principles** - Guide component design - Apply **simplicity-principles** - KISS, YAGNI - Apply **orthogonality-principle** - Independent components - Apply **structural-design-principles** - Composition patterns - Use **technical-planning** - For implementation after design ## Notes - Use TaskCreate to track architecture design steps - Apply all relevant design principle skills - Create diagrams (ASCII art or reference drawing tools) - Architecture evolves - document decisions and changes - Consider using /plan for detailed implementation after architecture is approved - Archive decisions as ADRs for future reference ## Remember 1. **Simplicity first** - Start simple, add complexity when needed 2. **Document decisions** - Future you will thank you 3. **Consider alternatives** - Never the first idea only 4. **State trade-offs** - Every decision has consequences 5. **Design for change** - Systems evolve **The best architecture is the one that's simple enough to ship and flexible enough to evolve.**