---
name: delivery-and-infra
description: Use when designing cloud infrastructure, CI/CD pipelines, or deployment strategies
---

# Delivery & Infrastructure

Guidelines for cloud architecture, CI/CD, and deployment engineering.

## When to Use

- Designing cloud infrastructure (AWS/Azure/GCP)
- Setting up CI/CD pipelines
- Container orchestration (Kubernetes)
- Infrastructure as Code (Terraform)
- Deployment strategy selection

## Core Principles

1. **Automate Everything** - No manual deployment steps
2. **Infrastructure as Code** - Version-controlled, reproducible
3. **Build Once, Deploy Anywhere** - Immutable artifacts
4. **Fast Feedback** - Fail fast, fix fast
5. **Security by Design** - Embedded throughout lifecycle
6. **GitOps as Truth** - Git is the single source of truth
7. **Zero-Downtime** - Deploy without user-visible impact

## CI/CD Pipeline

### Standard Stages
```
Lint → Test → Security Scan → Build → Deploy (staging) → Deploy (prod)
```

### Pipeline Checklist
- [ ] Parallel execution where possible
- [ ] Caching for faster builds
- [ ] Environment-based deployment gates
- [ ] Automated rollback on failure
- [ ] Security scanning (SAST/DAST)
- [ ] Artifact versioning and signing
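
The "environment-based deployment gates" item could be sketched in GitHub Actions roughly as follows. This is a minimal sketch, not a definitive setup: the environment names `staging` and `production`, the manual trigger, and the script `./scripts/deploy.sh` are all hypothetical, and the approval gate only takes effect if protection rules (for example, required reviewers) are configured on the `production` environment in the repository settings.

```yaml
# Sketch only: environment names and deploy script are assumptions
name: deploy-gates-sketch
on: workflow_dispatch

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging      # hypothetical deploy script

  deploy-prod:
    needs: deploy-staging                     # runs only after staging succeeds
    runs-on: ubuntu-latest
    environment: production                   # waits on the environment's protection rules
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production   # hypothetical deploy script
```

Keeping the gate on the environment rather than in workflow logic means the approval policy can change without editing the pipeline.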

### GitHub Actions Example
```yaml
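name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  # Assumed registry path; replace with your own image repository
  IMAGE: ghcr.io/example-org/example-app

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci   # install dependencies before linting
      - run: npm run lint

  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test -- --coverage

  build:
    needs: test
    if: github.event_name == 'push'   # do not publish images from pull requests
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3   # authenticate before pushing
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }}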
```

## Container Best Practices

### Dockerfile Checklist
- [ ] Multi-stage builds for minimal size
- [ ] Non-root user for runtime
- [ ] No secrets in image layers
- [ ] Minimal base image (alpine, distroless)
- [ ] Health checks defined
- [ ] Resource limits set at runtime (`docker run --memory`/`--cpus` or Kubernetes limits)

### Example Dockerfile
```dockerfile
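# Build stage: install all dependencies (the build usually needs devDependencies)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied forward
RUN npm prune --omit=dev

# Runtime stage: non-root user, only built output and runtime dependencies
FROM node:20-alpine
RUN adduser -S app -u 1001
WORKDIR /app
COPY --from=builder --chown=app /app/dist ./dist
COPY --from=builder --chown=app /app/node_modules ./node_modules
# Assumes healthcheck.js lives at the project root
COPY --from=builder --chown=app /app/healthcheck.js ./
USER app
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s CMD node healthcheck.js
CMD ["node", "dist/main.js"]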
```

## Deployment Strategies

| Strategy | Use Case | Rollback |
|----------|----------|----------|
| Rolling | Default, gradual | `kubectl rollout undo` |
| Blue-Green | Quick switch, validation period | Switch service back |
| Canary | Risk mitigation, A/B testing | Route 100% to stable |
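
For the rolling strategy, the "gradual" behavior is controlled by the Deployment's `strategy` field. The manifest below is a minimal sketch under stated assumptions: the name `app`, the replica count, and the image `registry.example.com/app:1.2.3` are placeholders, not values from this guide.

```yaml
# Sketch only: names, replica count, and image are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.2.3
          ports:
            - containerPort: 3000
```

With `maxUnavailable: 0`, old pods are removed only after their replacements pass the readiness probe, which is what makes `kubectl rollout undo` a safe fallback when a new version fails to become ready.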

## Kubernetes Essentials

### Production Deployment Checklist
- [ ] Resource requests and limits defined
- [ ] Liveness and readiness probes
- [ ] Security context (non-root user, read-only root filesystem)
- [ ] Pod anti-affinity for HA
- [ ] HPA for auto-scaling
- [ ] PDB for availability during updates
- [ ] Secrets from external manager
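
Several of the checklist items can be sketched together in one set of manifests. This is an illustrative sketch rather than a recommended configuration: the names, namespace, image, resource figures, and scaling thresholds are all assumptions to be tuned per workload. Probes and external secrets are left out here.

```yaml
# Sketch only: names, image, and all numeric values are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
spec:
  # replicas is omitted so the HPA below owns the replica count
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      affinity:
        podAntiAffinity:
          # Prefer spreading replicas across nodes
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: registry.example.com/app:1.2.3
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app
  namespace: production
spec:
  minAvailable: 2          # keep at least two pods during voluntary disruptions
  selector:
    matchLabels:
      app: app
```

Keeping `minAvailable` below the HPA's `minReplicas` leaves room for node drains to evict one pod at a time without blocking.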

### Key Resources
```yaml
# Container-level probes (under spec.template.spec.containers[] in a Deployment)
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10
```

## Infrastructure as Code

### Terraform Best Practices
- Use modules for reusable components
- Remote state with locking
- Input validation on variables
- Semantic versioning for modules
- Consistent naming conventions
- Document all outputs

### Module Structure
```
modules/
  vpc/
    main.tf
    variables.tf
    outputs.tf
```

## Cloud Architecture

### Design Principles
1. **Design for Failure** - Self-healing mechanisms
2. **Security by Design** - Least privilege, encryption
3. **Cost-Conscious** - Right-size, savings plans
4. **Managed Services** - Reduce operational overhead

### HA/DR Patterns
- Multi-AZ for availability
- Multi-region for disaster recovery
- Cross-region backup replication
- DNS-based failover

### Cost Optimization
- Right-sizing based on usage
- Reserved instances/savings plans
- Lifecycle policies for storage
- Auto-scaling to match demand
- Regular unused resource cleanup

## Security Checklist

- [ ] VPC with private subnets
- [ ] Security groups (least privilege)
- [ ] IAM roles (minimal permissions)
- [ ] Encryption at rest and in transit
- [ ] Secrets in dedicated manager
- [ ] Audit logging enabled
- [ ] DDoS protection configured

## Observability

### Metrics to Track
- **RED**: Rate, Errors, Duration
- **USE**: Utilization, Saturation, Errors
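
The RED signals could be precomputed as Prometheus recording rules along these lines. This is a sketch under assumptions: it reuses the `http_requests_total` counter with a `status` label, and it assumes a histogram named `http_request_duration_seconds` exists; the rule names are illustrative.

```yaml
# Sketch only: metric names other than http_requests_total are assumptions
groups:
  - name: red-metrics
    rules:
      # Rate: requests per second, per job
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Errors: share of responses that are 5xx
      - record: job:http_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m]))
      # Duration: 95th percentile latency from histogram buckets
      - record: job:http_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

Recording the error ratio once lets dashboards and alerts share the same definition instead of repeating the division in each query.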

### Key Dashboards
- Application performance (latency, errors)
- Infrastructure health (CPU, memory, disk)
- Business metrics (transactions, users)

### Alerting Rules
```yaml
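- alert: HighErrorRate
  # Ratio of 5xx responses to all responses, not the raw 5xx request rate
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "Error rate > 5%"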
```

## Runbook Template

```markdown
## Deployment Runbook

### Prerequisites
- kubectl access, valid kubeconfig
- Registry credentials
- Monitoring access

### Deploy
1. Merge to main → pipeline triggers
2. Monitor pipeline in CI
3. Verify in monitoring dashboard

### Rollback
`kubectl rollout undo deployment/app -n production`

### Troubleshooting
- Pods not starting: `kubectl describe pod <pod-name> -n production`, then `kubectl logs -f deployment/app -n production`
- High errors: Check app logs and metrics
```