---
name: senior-architect
description: Expert software architecture covering system design, distributed systems, microservices, scalability patterns, and technical decision-making.
version: 1.0.0
author: Claude Skills
category: engineering
tags: [architecture, system-design, distributed-systems, scalability, patterns]
---

# Senior Software Architect

Expert-level software architecture for scalable systems.

## Core Competencies

- System design
- Distributed systems
- Microservices architecture
- Scalability patterns
- Technical decision-making
- Architecture documentation
- Technology evaluation
- Performance optimization

## Architecture Patterns

### Monolith vs Microservices

| Aspect | Monolith | Microservices |
|--------|----------|---------------|
| Complexity | Lower initially | Higher |
| Deployment | Single unit | Independent |
| Scaling | Vertical/Horizontal | Service-level |
| Team Size | Small teams | Multiple teams |
| Data | Single DB | Distributed |
| Best For | Startups, MVPs | Scale, large orgs |

### Service Architecture Patterns

**Layered Architecture:**
```
┌─────────────────────────────────────┐
│         Presentation Layer          │
├─────────────────────────────────────┤
│          Application Layer          │
├─────────────────────────────────────┤
│            Domain Layer             │
├─────────────────────────────────────┤
│        Infrastructure Layer         │
└─────────────────────────────────────┘
```

**Hexagonal Architecture:**
```
            ┌─────────┐
     ┌──────│   API   │──────┐
     │      └─────────┘      │
┌────▼────┐           ┌──────▼──────┐
│   CLI   │           │  Database   │
└────┬────┘           └──────┬──────┘
     │      ┌─────────┐      │
     └──────│  Core   │──────┘
            │ Domain  │
     ┌──────│         │──────┐
     │      └─────────┘      │
┌────▼────┐           ┌──────▼──────┐
│  Queue  │           │  External   │
└─────────┘           │  Service    │
                      └─────────────┘
```

**Event-Driven Architecture:**
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Service A  │────▶│    Event    │────▶│  Service B  │
└─────────────┘     │     Bus     │     └─────────────┘
                    │             │
┌─────────────┐     │             │     ┌─────────────┐
│  Service C  │◀────│             │────▶│  Service D  │
└─────────────┘     └─────────────┘     │             │
                                        └─────────────┘
```

## Distributed Systems

### CAP Theorem

- **Consistency**: Every read sees the most recent write, so all nodes appear to hold the same data
- **Availability**: Every request to a healthy node receives a non-error response
- **Partition Tolerance**: The system keeps operating when the network drops or delays messages between nodes

**Choose Two:**
- CA: Traditional RDBMS (single node)
- CP: Distributed databases (MongoDB, HBase)
- AP: DNS, Cassandra, DynamoDB

Network partitions cannot be ruled out in a distributed system, so in practice the choice is between consistency and availability while a partition lasts. Several of these systems are tunable: Cassandra and DynamoDB, for example, can serve strongly consistent reads on request.

### Consistency Patterns

**Strong Consistency:**
- All reads return the most recent write
- Use: Financial transactions, inventory
- Trade-off: Higher latency

**Eventual Consistency:**
- Reads may return stale data temporarily
- Use: Social media feeds, analytics
- Trade-off: The application must tolerate stale or conflicting reads

**Causal Consistency:**
- Preserves cause-effect relationships
- Use: Collaborative editing, messaging
- Trade-off: Moderate complexity

### Distributed Transactions

**Saga Pattern:**
```
┌─────────────────────────────────────────────────────────┐
│                    Choreography Saga                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Order       Inventory       Payment      Shipping      │
│ Service       Service        Service       Service      │
│    │             │              │             │         │
│    │──Create────▶│              │             │         │
│    │             │───Reserve───▶│             │         │
│    │             │              │───Charge───▶│         │
│    │             │              │             │──Ship   │
│    │◀─────────────────────────────────Complete│         │
│                                                         │
│  Compensation (on failure):                             │
│    │◀──Release───│◀───Refund────│◀──Cancel────│         │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

**Two-Phase Commit:**
```
Phase 1 (Prepare):
  Coordinator → All Participants: "Prepare"
  All Participants → Coordinator: "Ready" or "Abort"

Phase 2 (Commit/Abort):
  If all Ready:
    Coordinator → All Participants: "Commit"
  Else:
    Coordinator → All Participants: "Abort"
```

## Scalability Patterns

### Horizontal Scaling

**Load Balancing Strategies:**
- Round Robin: Simple rotation
- Least Connections: Route to the least busy server
- IP Hash: Same client IP always maps to the same server (sticky sessions)
- Weighted: Based on server capacity

**Stateless Design:**
```
┌─────────────┐     ┌─────────────┐
│   Client    │────▶│    Load     │
└─────────────┘     │  Balancer   │
                    └──────┬──────┘
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
      ┌──────────┐    ┌──────────┐    ┌──────────┐
      │ Server 1 │    │ Server 2 │    │ Server 3 │
      └────┬─────┘    └────┬─────┘    └────┬─────┘
           │               │               │
           └───────────────┼───────────────┘
                           ▼
                ┌─────────────────────┐
                │    Shared State     │
                │     (Redis/DB)      │
                └─────────────────────┘
```

### Caching Strategies

**Cache Levels:**
```
┌─────────────────────────────────────────────────────┐
│  L1: Application Cache (in-memory)         <1ms     │
├─────────────────────────────────────────────────────┤
│  L2: Distributed Cache (Redis)             ~5ms     │
├─────────────────────────────────────────────────────┤
│  L3: CDN Cache                             ~50ms    │
├─────────────────────────────────────────────────────┤
│  L4: Database                              ~100ms   │
└─────────────────────────────────────────────────────┘
```

Latencies are rough orders of magnitude; actual figures depend on network distance, payload size, and query cost.

**Cache Patterns:**
```
Cache-Aside:
1. Check cache
2. If miss, read from DB
3. Store in cache
4. Return data

Write-Through:
1. Write to cache
2. Cache synchronously writes to DB
3. Return success

Write-Behind:
1. Write to cache
2. Return success
3. Cache asynchronously writes to DB
```

### Database Scaling

**Read Replicas:**
```
┌────────────┐
│   Master   │◀────── Writes
└─────┬──────┘
      │ Replication
      ├──────────────┬──────────────┐
      ▼              ▼              ▼
 ┌──────────┐   ┌──────────┐   ┌──────────┐
 │ Replica 1│   │ Replica 2│   │ Replica 3│
 └────┬─────┘   └────┬─────┘   └────┬─────┘
      │              │              │
      └──────────────┴──────────────┘
                     │
             Reads distributed
```

**Sharding Strategies:**
- Hash-based: Consistent hashing
- Range-based: Date ranges, alphabetical
- Geographic: By region
- Directory-based: Lookup service

## API Design

### REST Maturity Model

- **Level 0**: Single URI, POST everything
- **Level 1**: Multiple URIs, resources
- **Level 2**: HTTP methods (GET, POST, PUT, DELETE)
- **Level 3**: HATEOAS (Hypermedia controls)

### API Versioning

```
# URL Path
GET /api/v1/users

# Query Parameter
GET /api/users?version=1

# Header (media type parameter)
GET /api/users
Accept: application/vnd.company.api+json;version=1

# Content Negotiation (versioned media type)
GET /api/users
Accept: application/vnd.company.api.v1+json
```

### Rate Limiting

**Token Bucket Algorithm:**
```
Bucket Capacity: 100 tokens
Refill Rate: 10 tokens/second

Request arrives:
  Refill: tokens = min(capacity, tokens + elapsed_seconds * refill_rate)
  If tokens >= 1:
    tokens -= 1
    Process request
  Else:
    Reject (429 Too Many Requests)
```

## Reliability Patterns

### Circuit Breaker

```
States: CLOSED → OPEN → HALF-OPEN → CLOSED

CLOSED:
  - Normal operation
  - Track failures
  - If failure_count > threshold: → OPEN

OPEN:
  - Reject all requests immediately
  - After timeout: → HALF-OPEN

HALF-OPEN:
  - Allow limited requests
  - If success: → CLOSED
  - If failure: → OPEN
```

### Bulkhead

```
┌─────────────────────────────────────────────┐
│                 Application                 │
├─────────────┬─────────────┬─────────────────┤
│   Thread    │   Thread    │     Thread      │
│   Pool A    │   Pool B    │     Pool C      │
│  (Orders)   │   (Users)   │   (Analytics)   │
│             │             │                 │
│    [===]    │    [===]    │      [===]      │
│ 10 threads  │ 10 threads  │    5 threads    │
└─────────────┴─────────────┴─────────────────┘

If the Orders service hangs or fails, only Pool A is exhausted.
Pools B and C continue operating.
```

### Retry with Backoff

```python
import random
import time


class TransientError(Exception):
    """Placeholder for a retryable failure (e.g. a timeout or HTTP 503)."""


def exponential_backoff(attempt: int, base: float = 1.0, max_delay: float = 60.0) -> float:
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)  # up to 10% extra to spread retries out
    return delay + jitter


def call_with_retry(make_request, max_retries: int = 5):
    """Call make_request(), retrying on TransientError with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(exponential_backoff(attempt))
```

## Architecture Documentation

### Architecture Decision Record (ADR)

```markdown
# ADR-001: Use PostgreSQL for primary database

## Status
Accepted

## Context
We need to choose a primary database for our e-commerce platform.
Requirements:
- ACID transactions for orders
- Complex queries for reporting
- Scalability to 10M users

## Decision
We will use PostgreSQL as our primary database.

## Consequences
Positive:
- Strong consistency guarantees
- Rich SQL features
- Excellent JSON support
- Large ecosystem

Negative:
- Manual sharding required at scale
- More complex HA setup than managed NoSQL

## Alternatives Considered
- MongoDB: Better horizontal scaling, but weaker transactions
- CockroachDB: Better scaling, but higher operational complexity
- MySQL: Similar features, but less advanced JSON support
```

### C4 Model

```
Level 1 - System Context:
  [User] → [System] → [External Systems]

Level 2 - Container:
  [Web App] → [API] → [Database]
                ↓
          [Message Queue] → [Worker]

Level 3 - Component:
  API Container: [Controllers] → [Services] → [Repositories]

Level 4 - Code:
  Class diagrams, sequence diagrams
```

## Reference Materials

- `references/design_patterns.md` - Architecture patterns catalog
- `references/distributed_systems.md` - Distributed systems guide
- `references/scalability.md` - Scaling strategies
- `references/documentation.md` - Architecture documentation

## Scripts

```bash
# Architecture diagram generator
python scripts/arch_diagram.py --config services.yaml --output diagram.png

# Dependency analyzer
python scripts/dep_analyzer.py --path ./services --output deps.json

# Capacity planner
python scripts/capacity_plan.py --current 10000 --growth 0.5 --period 12

# Service mesh analyzer
python scripts/mesh_analyzer.py --namespace production
```
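
## Worked Example: Token Bucket

The token bucket pseudocode in the Rate Limiting section can be sketched in Python roughly as follows. This is a minimal, single-threaded illustration rather than a production limiter. The `TokenBucket` class, its `allow` method, and the injectable `clock` parameter are hypothetical names introduced here; the defaults simply mirror the 100-token capacity and 10 tokens/second refill rate from that section.

```python
import time


class TokenBucket:
    """Illustrative token bucket limiter (hypothetical helper, not thread-safe)."""

    def __init__(self, capacity: float = 100.0, refill_rate: float = 10.0,
                 clock=time.monotonic):
        self.capacity = capacity        # maximum burst size in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable time source (assumption, eases testing)
        self.tokens = capacity          # the bucket starts full
        self.last_refill = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, capped at capacity."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True, or return False (caller replies 429)."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_rate=1, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]       # [True, True, True, False]
fake_now[0] += 2.0                               # two seconds pass, two tokens refill
after_wait = [bucket.allow() for _ in range(3)]  # [True, True, False]
```

Refilling lazily on each request avoids a background timer. A shared deployment would more likely keep the counter in a store such as Redis, which this sketch does not attempt.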