# Microservices Architecture

## Overview

Microservices architecture structures an application as a collection of loosely coupled, independently deployable services, each organized around a business capability. Each service owns its data, runs in its own process, and communicates over the network.

The canonical reference is Sam Newman's *Building Microservices* (O'Reilly, 2nd edition 2021), supplemented by *Monolith to Microservices* (Newman, 2019) for migration strategies.

**The First Rule of Microservices:** Don't start with microservices. Start with a monolith, understand your domain, and decompose when you have evidence that the benefits outweigh the operational costs. (See `dev/architecture/monoliths` for the monolith-first approach.)

## Service Decomposition

### By Business Capability

Align services to what the business does (e.g., Order Management, Inventory, Payments). This creates stable boundaries because business capabilities change less frequently than technical layers.

### By Subdomain (DDD-Aligned)

Use Domain-Driven Design bounded contexts as service boundaries (see `dev/architecture/domain-driven-design`):

- **Core subdomains** -- Your competitive advantage; build custom services.
- **Supporting subdomains** -- Necessary but not differentiating; simpler services or libraries.
- **Generic subdomains** -- Commodity; buy or use off-the-shelf (auth, email, payments).

### Decomposition Heuristics

| Heuristic | Description |
|-----------|-------------|
| **Single Responsibility** | Each service does one thing well |
| **Data ownership** | Each service owns its data; no shared databases |
| **Independent deployability** | Changing one service does not require deploying another |
| **Team alignment** | One team can own and operate the service end-to-end |
| **Bounded context boundary** | Service boundaries align with DDD bounded contexts |

## Inter-Service Communication

### Synchronous Communication

| Pattern | Protocol | When to Use |
|---------|----------|-------------|
| **Request/Response (REST)** | HTTP/JSON | Simple CRUD, external APIs, broad tooling support |
| **Request/Response (gRPC)** | HTTP/2 + Protobuf | Internal service-to-service calls; high throughput, strong typing, streaming |
| **GraphQL** | HTTP/JSON | Client-driven queries; aggregating multiple services for a frontend |

### Asynchronous Communication

| Pattern | Mechanism | When to Use |
|---------|-----------|-------------|
| **Event Notification** | Message broker (topic/pub-sub) | Decoupled notification; consumers decide what to do |
| **Event-Carried State Transfer** | Message broker (event carries the state payload) | Reduce synchronous callbacks; the consumer receives the data it needs in the event |
| **Command Message** | Message broker (queue) | Tell a specific service to do something |
| **Async Request/Response** | Correlation ID + reply queue | A response is needed, but the caller should not block waiting for it |

**Rule of thumb:** Prefer asynchronous communication for inter-service calls. Use synchronous calls only when a real-time response is required (e.g., user-facing request/response).

### Communication Anti-Patterns

- **Distributed monolith** -- Services are "microservices" in name only; they deploy together, share databases, or cannot function independently.
- **Chatty interfaces** -- Excessive synchronous calls between services create long latency chains.
- **Shared database** -- Multiple services reading and writing the same tables couples their schemas and destroys independent deployability.

## API Gateway

An API gateway sits between external clients and internal services, providing:

- **Request routing** -- Route client requests to the appropriate microservice.
- **Protocol translation** -- For example, external REST to internal gRPC.
- **Authentication/Authorization** -- Centralized security enforcement.
- **Rate limiting and throttling** -- Protect services from traffic spikes.
- **Response aggregation** -- Combine responses from multiple services into a single client response.

Common implementations: Kong, AWS API Gateway, Azure API Management, Envoy, NGINX, Ocelot (.NET).

## Service Mesh

A service mesh handles service-to-service networking concerns transparently via sidecar proxies:

```
┌──────────────────────┐    ┌──────────────────────┐
│ Service A            │    │ Service B            │
│  ┌────────────────┐  │    │  ┌────────────────┐  │
│  │ App Container  │  │    │  │ App Container  │  │
│  └───────┬────────┘  │    │  └───────▲────────┘  │
│          │           │    │          │           │
│  ┌───────▼────────┐  │    │  ┌───────┴────────┐  │
│  │ Sidecar Proxy  │──┼────┼─▶│ Sidecar Proxy  │  │
│  │    (Envoy)     │  │    │  │    (Envoy)     │  │
│  └────────────────┘  │    │  └────────────────┘  │
└──────────────────────┘    └──────────────────────┘
          Control Plane (Istio / Linkerd)
```

**Capabilities:** Mutual TLS, traffic management, retries, circuit breaking, observability (distributed tracing, metrics), canary deployments.

**Implementations:** Istio, Linkerd, Consul Connect, AWS App Mesh.

## Saga Pattern -- Distributed Transactions

Because each microservice owns its data, distributed transactions using two-phase commit (2PC) are impractical. The saga pattern maintains data consistency across services through a sequence of local transactions; when a step fails, the steps already completed are undone by compensating actions.

### Choreography (Event-Driven)

Each service publishes events that trigger the next step; there is no central coordinator.

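
To make the event flow concrete, here is a minimal in-process sketch of a choreographed order saga in Python. It assumes a hypothetical synchronous `EventBus` standing in for a real message broker (no persistence, retries, or delivery guarantees); `run_order_saga`, the `fail_inventory` flag, and the extra `ShipmentScheduled` event are illustrative names, not part of any library. Each service only subscribes to events and publishes new ones, and the failure path runs the refund and the cancellation as compensating actions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory, synchronous pub/sub standing in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # event types in publish order, for inspection

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append(event_type)
        for handler in self._subscribers[event_type]:
            handler(payload)


def run_order_saga(order_id: str, fail_inventory: bool = False) -> Tuple[str, List[str]]:
    """Wire up the services, start the saga, and return (order status, event log)."""
    bus = EventBus()
    orders: Dict[str, str] = {}  # owned by the Order Service only

    # Payment Service: charge on OrderCreated; refund (compensate) on ReservationFailed.
    bus.subscribe("OrderCreated", lambda e: bus.publish("PaymentProcessed", e))
    bus.subscribe("ReservationFailed", lambda e: bus.publish("RefundProcessed", e))

    # Inventory Service: try to reserve stock once payment succeeds.
    def reserve(e: dict) -> None:
        bus.publish("ReservationFailed" if fail_inventory else "InventoryReserved", e)

    bus.subscribe("PaymentProcessed", reserve)

    # Shipping Service: "ShipmentScheduled" is a hypothetical completion event.
    bus.subscribe("InventoryReserved", lambda e: bus.publish("ShipmentScheduled", e))

    # Order Service: update its own state in response to the outcome.
    def confirm(e: dict) -> None:
        orders[e["order_id"]] = "CONFIRMED"

    def cancel(e: dict) -> None:  # compensating action
        orders[e["order_id"]] = "CANCELLED"

    bus.subscribe("ShipmentScheduled", confirm)
    bus.subscribe("RefundProcessed", cancel)

    orders[order_id] = "PENDING"  # stand-in for the Order Service's local transaction
    bus.publish("OrderCreated", {"order_id": order_id})
    return orders[order_id], bus.log


if __name__ == "__main__":
    print(run_order_saga("o-1"))
    print(run_order_saga("o-2", fail_inventory=True))
```

On the happy path the order ends `CONFIRMED`; with `fail_inventory=True` the log shows `ReservationFailed` followed by `RefundProcessed`, and the order ends `CANCELLED`. Note that no single function describes the saga as a whole: the flow only emerges from the subscriptions, which is the readability cost of choreography.
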
```
Order Service     ──(OrderCreated)──▶      Payment Service
Payment Service   ──(PaymentProcessed)──▶  Inventory Service
Inventory Service ──(InventoryReserved)──▶ Shipping Service

On failure:
Inventory Service ──(ReservationFailed)──▶ Payment Service (refund)
Payment Service   ──(RefundProcessed)──▶   Order Service (cancel)
```

**Pros:** Simple, decoupled, no single point of failure.

**Cons:** The overall flow is hard to see; debugging is difficult; there is a risk of cyclic dependencies between services.

### Orchestration (Central Coordinator)

A saga orchestrator (process manager) coordinates the steps explicitly, telling each service what to do and triggering compensations when a step fails.

```
    ┌─────────────────────────┐
    │    Saga Orchestrator    │
    │      (Order Saga)       │
    └─┬──────────┬──────────┬─┘
      │          │          │
      ▼          ▼          ▼
   Payment   Inventory  Shipping
   Service    Service    Service
```

**Pros:** Clear flow, easier to understand and debug, centralized compensation logic.

**Cons:** The orchestrator is a coupling point and risks becoming a "god service."

**Guidance:** Use choreography for simple sagas (2-3 steps). Use orchestration for complex flows (4+ steps or complex compensation).

## Distributed Data Management

| Pattern | Description |
|---------|-------------|
| **Database per Service** | Each service has its own database; no other service accesses it directly |
| **API Composition** | Query multiple services and aggregate the results |
| **CQRS** | Separate read and write models so each can be optimized independently (see `dev/architecture/event-driven`) |
| **Event Sourcing** | Store state changes as events; derive current state by replaying them (see `dev/architecture/event-driven`) |
| **Saga** | Manage distributed transactions through local transactions and compensating actions |
| **Outbox Pattern** | Write outgoing events to a local outbox table in the same transaction as the state change, so events are published reliably |

## Service Discovery

Services need to find each other in a dynamic environment where instances come and go.

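
As an illustration of client-side discovery, here is a minimal Python sketch of a registry with heartbeat-based expiry and a client that picks instances round-robin. `ServiceRegistry`, `DiscoveryClient`, the 30-second TTL default, and the injectable `clock` are hypothetical choices made for this example; real registries such as Eureka or Consul are networked, replicated services with their own APIs.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional


class ServiceRegistry:
    """Hypothetical in-memory registry: instances stay listed while they keep heartbeating."""

    def __init__(self, ttl: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl  # seconds an instance stays healthy without a heartbeat
        self._clock = clock
        # service name -> {instance address -> timestamp of last heartbeat}
        self._instances: Dict[str, Dict[str, float]] = {}

    def register(self, service: str, address: str) -> None:
        """Add an instance, or refresh its timestamp if it is already registered."""
        self._instances.setdefault(service, {})[address] = self._clock()

    heartbeat = register  # a heartbeat is just a re-registration

    def deregister(self, service: str, address: str) -> None:
        self._instances.get(service, {}).pop(address, None)

    def healthy(self, service: str) -> List[str]:
        """Instances heard from within the TTL, sorted for a stable order."""
        now = self._clock()
        return sorted(
            addr
            for addr, last_seen in self._instances.get(service, {}).items()
            if now - last_seen <= self.ttl
        )


class DiscoveryClient:
    """Client-side discovery: ask the registry, then choose an instance round-robin."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self._registry = registry
        self._counter = itertools.count()  # shared across services, for simplicity

    def pick(self, service: str) -> Optional[str]:
        instances = self._registry.healthy(service)
        if not instances:
            return None  # caller decides: fail, fall back, or retry later
        return instances[next(self._counter) % len(instances)]
```

Instances that stop heartbeating simply age out of `healthy()`, so the client stops routing to a crashed instance without an explicit deregistration. With server-side discovery, this selection logic lives in the load balancer or proxy instead of the client.
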
| Approach | Examples | Mechanism |
|----------|----------|-----------|
| **Client-side discovery** | Netflix Eureka, Consul | The client queries the registry and picks an instance |
| **Server-side discovery** | AWS ALB, Kubernetes Services | A load balancer or proxy routes each request to an available instance |
| **DNS-based** | Consul DNS, Kubernetes CoreDNS | The service name resolves to instance IP(s) via DNS |

In Kubernetes environments, server-side discovery via Services and DNS is built in and is usually sufficient.

## Resilience Patterns

| Pattern | Purpose |
|---------|---------|
| **Circuit Breaker** | Stop calling a failing service; fail fast and give it time to recover |
| **Retry with Backoff** | Retry transient failures with exponential backoff and jitter |
| **Bulkhead** | Isolate failures to prevent cascading (separate thread pools / connection pools) |
| **Timeout** | Set explicit timeouts on all remote calls; never wait forever |
| **Fallback** | Provide a degraded but functional response when a service is unavailable |
| **Health Check** | Expose liveness and readiness endpoints for orchestrators |

## When NOT to Use Microservices

Microservices introduce significant operational complexity. Avoid them when:

- **Your team is small** (< 8-10 developers) -- The operational overhead exceeds the benefit.
- **Your domain is not well understood** -- You will draw the wrong boundaries and create a distributed monolith.
- **You lack operational maturity** -- You need CI/CD, monitoring, distributed tracing, container orchestration, and on-call practices before microservices are viable.
- **Latency is critical** -- Every network hop adds latency; in-process calls within a monolith have no network overhead.
- **Strong consistency is required everywhere** -- Microservices embrace eventual consistency; if your domain requires ACID transactions across multiple entities, a monolith may be simpler.
- **You are building an MVP or prototype** -- Speed of iteration matters more than scalability at this stage.

## Tradeoffs Summary

| Benefit | Cost |
|---------|------|
| Independent deployability | Operational complexity (CI/CD per service, monitoring, tracing) |
| Technology heterogeneity | Polyglot overhead; harder to maintain standards |
| Team autonomy | Coordination overhead; contract management |
| Scalability per service | Network latency; serialization/deserialization cost |
| Fault isolation | Distributed failure modes (partial failures, network partitions) |
| Organizational alignment | Requires mature DevOps culture |

## Best Practices

- Design for failure from day one: circuit breakers, retries, timeouts, bulkheads.
- Own your data: one database per service, no shared database access.
- Make inter-service communication observable: distributed tracing (OpenTelemetry), centralized logging, metrics.
- Use consumer-driven contract testing (Pact, Spring Cloud Contract) to prevent breaking changes.
- Prefer asynchronous communication; use synchronous calls only when necessary.
- Keep services small enough to be owned by a single team, but large enough to justify the operational overhead.
- Deploy independently, test independently, fail independently.

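
The first practice above can be sketched in Python as a minimal, single-threaded circuit breaker plus retry with exponential backoff and full jitter. `CircuitBreaker`, `CircuitOpenError`, `retry_with_backoff`, and every threshold and delay default are hypothetical. The sketch is not thread-safe, retries every exception except `CircuitOpenError` (a real version would retry only errors known to be transient), and does not enforce timeouts, which would normally be set on the underlying HTTP or gRPC client. In production you would usually rely on a hardened resilience library or the service mesh's built-in retries and circuit breaking rather than hand-rolling this.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures;
    OPEN -> HALF_OPEN once `reset_timeout` seconds have passed;
    HALF_OPEN -> CLOSED on a successful trial call, or back to OPEN on failure."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "CLOSED"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call in HALF_OPEN re-opens the circuit immediately.
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit and resets the count
        self._opened_at = None
        return result


def retry_with_backoff(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
                       max_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn` up to `attempts` times; between failures, sleep a random time in
    [0, min(max_delay, base_delay * 2**attempt)] (exponential backoff, full jitter)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # an open circuit should fail fast, not be retried
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Wrapping the breaker inside the retry, for example `retry_with_backoff(lambda: breaker.call(fetch_inventory))` where `fetch_inventory` is a hypothetical remote call, lets retries absorb transient errors while an open circuit still fails fast instead of being hammered.
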