--- name: Infrastructure Sizing and Capacity Planning description: Methods for determining the optimal resource allocation for compute, database, and network systems to balance cost and performance. --- # Infrastructure Sizing and Capacity Planning ## Overview Infrastructure sizing is the process of determining the exact amount of CPU, Memory, Storage, and Network capacity required for a workload. Effective sizing avoids both **Over-provisioning** (wasted money) and **Under-provisioning** (poor performance/outages). **Core Principle**: "Sizing is not a one-time event; it is a continuous feedback loop based on real utilization metrics." --- ## 1. Right-Sizing Principles Traditional sizing used the "Peak + Buffer" model, leading to massive waste. Modern sizing uses **Demand-Driven Allocation**. | Principle | Description | | :--- | :--- | | **Utilization Thresholds**| Target 40-70% CPU utilization. Below 40% is over-provisioned; above 80% is risky. | | **Vertical first...** | Increase resource limits for single-threaded or monolithic apps. | | **...Horizontal usually**| Spread load across multiple small instances for resilience and elasticity. | | **Metric-Based** | Use P95 or P99 metrics for latency, but Average for base capacity sizing. | --- ## 2. Compute Sizing (EC2, VMs, GCE) ### Step 1: Resource Profiling Run your app in a staging environment and measure: * **CPU**: Is the app CPU-bound (mathematical calculations, compression)? * **Memory**: Is it memory-bound (caching, large payloads, in-memory DBs)? * **Thread Usage**: How many concurrent requests can one CPU core handle? ### Step 2: Instance Family Selection | Family | Best For | AWS Example | GCP Example | | :--- | :--- | :--- | :--- | | **General Purpose** | Balanced workloads, small DBs | `t3`, `m6g` | `n2`, `e2` | | **Compute Optimized**| Batch processing, high-traffic APIs | `c6g`, `c7i` | `c2`, `c3` | | **Memory Optimized** | Redis, high-RAM DBs, Analytics | `r6g`, `x2` | `m1`, `m2` | ### Sizing Formula (Basic) `Target Instances = (Total Peak Concurrent Requests * Avg Service Time per Req) / (Target Utilization per Core * Core Count)` --- ## 3. Database Sizing (RDS, Cloud SQL, Azure SQL) ### IOPS (Input/Output Operations Per Second) Disk performance is often the bottleneck, not CPU. * **GP3 (AWS)**: Baseline 3,000 IOPS included. Provision more for heavy writes. * **Provisioned IOPS (io2)**: For high-performance transactional DBs. ### Storage Growth Calculation `Required Storage = (Initial Data Size) + (Daily Ingest * Retention Period) * (1 + Overhead Buffer)` * *Buffer*: Always keep 20% free to allow for indexing and temp file creation. ### Connection Pool Sizing `Max Connections = (Instance RAM / 10MB) - (System Reserve)` * Too many connections lead to high "Context Switching" and performance degradation. --- ## 4. Cache Sizing (Redis/Memcached) Caching is a trade-off between **Memory Cost** and **Latency Benefits**. ### Formula: Working Set Size Not all data needs to be in cache. Only store the **Working Set** (frequently accessed data). 1. Measure Total Data Size. 2. Analyze Access Distribution (Pareto Principle: 80% access to 20% data). 3. **Cache Size = 20-30% of Total Data Size.** ### Eviction Policy Impact * **allkeys-lru**: Best for general caching. * **noeviction**: Returns errors when full (dangerous). --- ## 5. Container Sizing (Kubernetes) Understanding the difference between **Requests** and **Limits** is critical for both stability and cost. | Metric | Purpose | Cost Impact | | :--- | :--- | :--- | | **Requests** | Kubernetes guarantees this capacity. Used for scheduling. | **High**: Cloud Providers charge based on requests. | | **Limits** | The maximum a pod can burst to. | **Low**: Generally doesn't impact cost unless using serverless K8s. | ### The "OOMKill" Trap If `Memory Requests < Actual Usage`, the pod might be scheduled on a node that runs out of RAM, leading to an **OOMKill** (Out Of Memory). --- ## 6. Serverless Sizing (Lambda / Cloud Functions) Serverless "scaling" is handled by the provider, but "sizing" (Memory allocation) is handled by you. * **Power Tuning**: In AWS Lambda, increasing Memory also increases CPU proportionaly. * **Strategy**: Use `AWS Lambda Power Tuning` to find the "Sweet Spot" where performance and cost intersect. | Memory (MB) | Duration (ms) | Cost ($) | Result | | :--- | :--- | :--- | :--- | | 128 | 1000 | 0.0000021 | Slow | | 512 | 200 | 0.0000016 | **Winner (Faster & Cheaper)** | | 1024 | 150 | 0.0000025 | Diminishing returns | --- ## 7. Network and CDN Sizing * **Throughput**: Measure P99 payload size * Peak requests per second. * **CDN Coverage**: What % of your traffic can be served from the edge? * **Goal**: > 80% Cache Hit Ratio for static assets. * **Impact**: CDN bandwidth is 50-70% cheaper than origin egress. --- ## 8. Load Testing for Capacity Planning Never size based on assumptions. Use tools like **k6**, **Locust**, or **JMeter**. 1. **Stepping Test**: Gradually increase users until latency spikes (The "Knee" of the curve). 2. **Soak Test**: Run at 80% load for 24 hours to find memory leaks. 3. **Stress Test**: Find the "Breaking Point" to configure failover/auto-scaling. --- ## 9. Monitoring for Right-Sizing ### The Dashboard Template (Grafana/Datadog) * **CPU Heatmap**: Identify idle periods (e.g., weekends). * **RAM Saturation**: Identify "Memory Bloat". * **Disk Queue Depth**: Identify IOPS bottlenecks. * **Network In/Out**: Identify efficient vs inefficient regions. ### Automated Right-Sizing Tools * **AWS Compute Optimizer**: Provides JSON recommendations for instance types. * **VPA (Vertical Pod Autoscaler)**: Automatically adjusts K8s requests/limits. * **Goldilocks**: A K8s dashboard that visualizes VPA recommendations. --- ## 10. Capacity Planning Template | Component | Metric | Current Load | Growth (6mo) | Buffer | Target Spec | | :--- | :--- | :--- | :--- | :--- | :--- | | Web Tier | Peak Req/sec | 500 | 2x (1000) | 20% | 4x c6g.large | | Database | Storage | 500GB | +100GB/mo | 30% | 1.5TB GP3 | | Cache | Working Set | 8GB | 12GB | 10% | 16GB Node | --- ## 11. Real Sizing Scenario: SaaS API * **Initial Setup**: 10 nodes of `m5.xlarge` (4 vCPU, 16GB RAM). Monthly cost: $1,400. * **Observation**: CPU average 12%, RAM average 40%. * **Analysis**: The app is memory-bound, but CPU is idle. * **Action**: Switched to 5 nodes of `t3.large` ($350/mo) + enabled Auto-scaling. * **Result**: 75% cost reduction while maintaining the same performance metrics. --- ## Related Skills - `40-system-resilience/graceful-degradation` - `42-cost-engineering/cloud-cost-models` - `42-cost-engineering/budget-guardrails`