---
# ═══════════════════════════════════════════════════════════════════════════
# SKILL: System Design
# Version: 2.0.0 | Updated: 2025-01
# ═══════════════════════════════════════════════════════════════════════════
name: system-design
description: System design, software architecture, API design, cybersecurity, and threat modeling. Build secure, scalable systems.

# ACTIVATION TRIGGERS
triggers:
  - architecture
  - system design
  - security
  - api
  - scalability
  - owasp
  - threat modeling
  - compliance

# SKILL PARAMETERS
parameters:
  system_type:
    type: string
    required: true
    description: Type of system (web app, API, data platform)
  scale:
    type: string
    enum: [startup, growth, enterprise]
    required: false
  security_level:
    type: string
    enum: [standard, high, compliance]
    required: false

# OUTPUT SPECIFICATION
outputs:
  architecture:
    type: object
  security_measures:
    type: array
  trade_offs:
    type: array

# RELIABILITY
retry:
  max_attempts: 3
  backoff: exponential

# OBSERVABILITY
observability:
  log_level: info

level: advanced
prerequisites:
  - core-development
  - data-structures

sasmp_version: "1.3.0"
bonded_agent: 01-core-paths
bond_type: PRIMARY_BOND
---

# System Design Skill

## Quick Reference

| Pattern | Best For | Complexity | Scaling |
|---------|----------|------------|---------|
| **Monolith** | Startups, MVPs | Low | Limited |
| **Microservices** | Large teams | High | Excellent |
| **Serverless** | Event-driven | Medium | Auto |
| **Event-Driven** | High throughput | High | Excellent |

---

## Scalability Progression

```
Level 1: Single Server
    │
    ▼ Bottleneck: CPU/Memory
Level 2: Load Balancer + Multiple Servers
    │
    ▼ Bottleneck: Database reads
Level 3: Caching Layer (Redis)
    │
    ▼ Bottleneck: Reads that miss the cache
Level 4: Read Replicas
    │
    ▼ Bottleneck: Write load on a single primary
Level 5: Sharding / Partitioning
    │
    ▼ Bottleneck: Cross-shard queries
Level 6: CQRS + Event Sourcing
```
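
The caching step (Level 3) is often implemented as cache-aside: read from the cache first, fall back to the database on a miss, and invalidate on writes. The sketch below uses an in-memory dictionary as a stand-in for Redis; the `CacheAside` class, the `loader` callback, and the TTL value are hypothetical names chosen for illustration, not part of any library.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside sketch; a dict stands in for an external cache such as Redis."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # hypothetical function that reads the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: load from the database and repopulate the cache.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._store.pop(key, None)
```

The hit and miss counters give a rough signal for when the cache stops absorbing read load and read replicas become worth considering.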

---

## Architecture Decision Tree

```
What's your team size and product stage?
│
├─► Team < 10, product unclear
│   └─► Monolith (start simple)
│
├─► Team > 10, clear domain boundaries
│   └─► Microservices
│
├─► Variable workloads, pay-per-use
│   └─► Serverless
│
└─► High throughput, async workflows
    └─► Event-Driven
```

---

## API Design

### REST Best Practices
```
GET    /api/v1/users              # List
GET    /api/v1/users/{id}         # Get
POST   /api/v1/users              # Create
PUT    /api/v1/users/{id}         # Replace
PATCH  /api/v1/users/{id}         # Partial update
DELETE /api/v1/users/{id}         # Delete

GET    /api/v1/users/{id}/orders  # Nested
```

### HTTP Status Codes
| Code | Meaning | Use When |
|------|---------|----------|
| 200 | OK | GET/PUT/PATCH success |
| 201 | Created | POST created a resource |
| 204 | No Content | DELETE success, empty body |
| 400 | Bad Request | Invalid input |
| 401 | Unauthorized | Missing or invalid credentials |
| 403 | Forbidden | Authenticated but not permitted |
| 404 | Not Found | Resource missing |
| 429 | Too Many Requests | Rate limited |
| 500 | Server Error | Server failure |
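
One way to keep status codes consistent across handlers is to map application-level outcomes to codes in a single place. The sketch below uses the standard-library `http.HTTPStatus` enum; the `Outcome` enum and `status_for` helper are hypothetical names for illustration.

```python
from enum import Enum
from http import HTTPStatus


class Outcome(Enum):
    """Hypothetical application-level results a handler might produce."""

    FETCHED = "fetched"
    CREATED = "created"
    DELETED = "deleted"
    INVALID_INPUT = "invalid_input"
    UNAUTHENTICATED = "unauthenticated"
    FORBIDDEN = "forbidden"
    MISSING = "missing"
    RATE_LIMITED = "rate_limited"
    UNEXPECTED = "unexpected"


STATUS_FOR_OUTCOME = {
    Outcome.FETCHED: HTTPStatus.OK,
    Outcome.CREATED: HTTPStatus.CREATED,
    Outcome.DELETED: HTTPStatus.NO_CONTENT,
    Outcome.INVALID_INPUT: HTTPStatus.BAD_REQUEST,
    Outcome.UNAUTHENTICATED: HTTPStatus.UNAUTHORIZED,
    Outcome.FORBIDDEN: HTTPStatus.FORBIDDEN,
    Outcome.MISSING: HTTPStatus.NOT_FOUND,
    Outcome.RATE_LIMITED: HTTPStatus.TOO_MANY_REQUESTS,
}


def status_for(outcome: Outcome) -> HTTPStatus:
    # Anything not explicitly mapped is treated as a server-side failure.
    return STATUS_FOR_OUTCOME.get(outcome, HTTPStatus.INTERNAL_SERVER_ERROR)
```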

---

## Database Selection

| Use Case | Best Choice | Notes |
|----------|-------------|-------|
| Transactions | PostgreSQL | ACID, most versatile |
| High write | Cassandra | Write-optimized |
| Caching | Redis | Sub-millisecond |
| Search | Elasticsearch | Full-text search |
| Analytics | BigQuery | Column-store |
| Time-series | TimescaleDB | Time-based data |
| Graph | Neo4j | Relationships |

---

## Security: OWASP Top 10 (2021)

| # | Vulnerability | Prevention |
|---|---------------|------------|
| 1 | Broken Access Control | Enforce authorization server-side on every request; deny by default |
| 2 | Cryptographic Failures | TLS 1.3, AES-256, Argon2id for passwords |
| 3 | Injection | Parameterized queries, input validation |
| 4 | Insecure Design | Threat modeling |
| 5 | Security Misconfiguration | Harden defaults |
| 6 | Vulnerable and Outdated Components | Dependency scanning, timely patching |
| 7 | Identification and Authentication Failures | MFA, rate limiting |
| 8 | Software and Data Integrity Failures | Sign data, verify sources |
| 9 | Security Logging and Monitoring Failures | Centralized logging, alerting |
| 10 | Server-Side Request Forgery (SSRF) | Allowlist URLs |
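
As an illustration of the injection row, the sketch below uses the standard-library `sqlite3` module with a `?` placeholder so user input is bound as data rather than concatenated into SQL. The `users` table and the helper names are assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    conn.commit()
    return conn


def find_users_by_name(conn: sqlite3.Connection, name: str) -> List[Tuple[int, str]]:
    # The ? placeholder binds `name` as a value; it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()
```

With this shape, an input such as `x' OR '1'='1` is compared literally against the `name` column and matches no rows.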

---

## Encryption Standards

| Layer | Standard | Notes |
|-------|----------|-------|
| In Transit | TLS 1.3 | HTTPS everywhere |
| At Rest | AES-256 | Encrypt sensitive data; prefer an authenticated mode such as GCM |
| Passwords | Argon2id | bcrypt acceptable |
| API Keys | SHA-256 | Store hashed; keys must be long random values |
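
The API key row can be sketched with the standard library alone: generate a high-entropy key, store only its SHA-256 digest, and compare digests in constant time. The function names below are hypothetical. A fast hash like SHA-256 is only reasonable here because the key is random and long; it is not a substitute for Argon2id on user-chosen passwords.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_api_key(key: str) -> str:
    """Return the hex SHA-256 digest that gets stored instead of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key() -> Tuple[str, str]:
    """Return (plaintext_key, stored_digest); show the plaintext to the user once."""
    key = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
    return key, hash_api_key(key)


def verify_api_key(presented: str, stored_digest: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(hash_api_key(presented), stored_digest)
```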

---

## Threat Modeling: STRIDE

```
┌─────────────────────────────────────────┐
│              STRIDE MODEL                │
├─────────────────────────────────────────┤
│  S - Spoofing                           │
│      → Strong auth, MFA                 │
│                                         │
│  T - Tampering                          │
│      → Integrity checks, signatures     │
│                                         │
│  R - Repudiation                        │
│      → Audit logging                    │
│                                         │
│  I - Information Disclosure             │
│      → Encryption, access control       │
│                                         │
│  D - Denial of Service                  │
│      → Rate limiting, DDoS protection   │
│                                         │
│  E - Elevation of Privilege             │
│      → Least privilege, RBAC            │
└─────────────────────────────────────────┘
```
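
The rate-limiting mitigation listed under Denial of Service is commonly built on a token bucket. The following is a minimal single-process sketch; the `TokenBucket` class and its parameters are hypothetical, and a production limiter would usually keep per-client state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_sec
        self._capacity = float(capacity)
        self._tokens = float(capacity)  # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A caller would typically answer with HTTP 429 when `allow()` returns `False`.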

---

## Compliance Requirements

| Standard | Domain | Key Requirements |
|----------|--------|------------------|
| GDPR | EU Data | Consent, right to delete |
| HIPAA | Healthcare | PHI encryption, audit logs |
| SOC 2 | Services | Security controls |
| PCI DSS | Payments | Card data protection |
| CCPA | CA Privacy | Consumer rights |

---

## Disaster Recovery

| Strategy | RTO | RPO | Cost |
|----------|-----|-----|------|
| Backup/Restore | Hours | Hours | Low |
| Pilot Light | Tens of minutes | Minutes | Medium |
| Warm Standby | Minutes | Seconds | High |
| Active-Active | Seconds | Near zero | Very High |

---

## Troubleshooting

```
System not scaling?
├─► Database bottleneck? → Add caching, replicas
├─► Single point of failure? → Add redundancy
├─► Stateful services? → Make stateless
└─► Network limits? → CDN, optimize payloads

Security incident response?
├─► 1. CONTAIN: Isolate affected systems
├─► 2. IDENTIFY: Scope and entry point
├─► 3. ERADICATE: Remove threat, patch
├─► 4. RECOVER: Restore from clean backup
└─► 5. LEARN: Post-mortem, improve
```

---

## Common Failure Modes

| Symptom | Root Cause | Recovery |
|---------|------------|----------|
| Cascading failures | Tight coupling | Circuit breakers |
| Works locally, fails elsewhere | Env differences | Containers, IaC |
| Data breach | Missing controls | Audit, RBAC |
| Audit failed | Missing compliance | Gap analysis |
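
The circuit breaker named as the recovery for cascading failures can be sketched as a small wrapper that fails fast after repeated errors and allows a trial call once a timeout has passed. The class names, threshold, and timeout below are assumptions for illustration, not a specific library's API.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without running."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a downstream dependency this way keeps a slow or failing service from tying up every caller while it recovers.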

---

## Next Actions

Describe your system requirements (system type, expected scale, and security level) to get architecture recommendations.