--- name: operational-readiness description: | Operational Readiness Checklist for Reown services. Use when service owners ask to: check production readiness, validate a service before launch, run operational readiness review, audit service compliance, check if service is ready for production, or validate infrastructure/security posture. Triggers: "operational readiness", "production readiness", "launch checklist", "service review", "pre-launch audit", "ORC", "is my service ready", "check my service", "readiness review" --- # Operational Readiness Checklist Comprehensive checklist to validate services before production launch. Analyzes codebase + asks interactive questions for items that cannot be detected from code. ## Workflow Overview 1. **Gather context** - Identify service type, tech stack, and traffic expectations 2. **Analyze codebase** - Scan for CI/CD configs, infrastructure code, security patterns 3. **Interactive verification** - Ask about items that cannot be detected from code 4. **Generate report** - Produce checklist report with priorities and remediation guidance ## Step 1: Gather Context Ask the user these questions using `AskUserQuestion`: **Service Classification:** - Service type: Backend API, Frontend/Web App, Infrastructure/Platform, or Hybrid - Expected traffic: <100 req/min (low), 100-1000 req/min (medium), >1000 req/min (high) - Data handling: Stores user data (yes/no), Processes PII (yes/no) - Public-facing: Yes/No - Has email functionality: Yes/No - Uses database: Yes/No (if yes, which: PostgreSQL, Supabase, DynamoDB, etc.) **Tech Stack Detection:** Auto-detect from files: - `Cargo.toml` → Rust service - `package.json` → Node.js/TypeScript - `*.tf` or `*.tfvars` → Terraform - `cdk.json` or `*.cdk.ts` → AWS CDK - `.github/workflows/*.yml` → GitHub Actions CI/CD - `next.config.js` → Next.js frontend - `Dockerfile` → Containerized service ## Step 2: Codebase Analysis Analyze the codebase for evidence of checklist items. Use Glob and Grep to find: **CI/CD Detection:** ``` .github/workflows/*.yml - GitHub Actions Cargo.toml + [profile.release] - Rust build config jest.config.* / vitest.config.* - Test configuration *.tf - Terraform files cdk.json - CDK configuration ``` **Security Detection:** ``` **/security*.yml - Security scanning workflows dependabot.yml - Dependency updates CODEOWNERS - Code ownership *.lock files - Dependency locking ``` **Observability Detection:** ``` **/tracing*.rs or opentelemetry* - Distributed tracing sentry.* or @sentry/* - Error tracking prometheus* or metrics* - Metrics collection **/logging*.* or log4* or tracing* - Logging config ``` **Infrastructure Detection:** ``` **/autoscaling* in .tf files - Autoscaling config **/secretsmanager* or **/ssm* - Secrets management health* endpoints in code - Health checks ``` ## Step 3: Interactive Verification For items that cannot be detected from code, ask yes/no questions. Group questions by category to avoid overwhelming the user. ## Step 4: Generate Report Output format: ```markdown # Operational Readiness Report: [Service Name] **Service Type:** [Backend API / Frontend / Infrastructure] **Tech Stack:** [Detected stack] **Generated:** [Date] ## Summary - **Overall Readiness:** [X/Y items passing] ([Z%]) - **Launch Blockers (P0):** [count] - **High Priority (P1):** [count] - **Medium Priority (P2):** [count] - **Low Priority (P3):** [count] ## Observability | Item | Status | Priority | Notes | |------|--------|----------|-------| | ... | ✅/❌/⚠️ | P0-P3 | ... | [Repeat for each category] ## Remediation Summary [List failing items with links to remediation guidance] ``` --- ## Checklist Items by Category ### Observability (O11Y) | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | Alarmable top-level metric OR Canary (OpsGenie integrated) | P0 | High traffic (>100 req/min) | Ask | | Canary coverage (if <100 req/min) | P0 | Low traffic | Ask | | DB/Queue monitoring (CPU/Disk/Memory) | P1 | Services with DB/Queue | Ask | | Logging configured and viewable | P1 | All | Grep for logging config | | Audit/security log retention (min 1 year for SOC 2 Type 2) | P1 | All | Ask | | Distributed tracing (OpenTelemetry/Jaeger) | P2 | Backend services | Grep for otel/tracing | | Sentry instrumentation | P1 | Frontend only | Grep for @sentry | | status.reown.com integration | P3 | Public-facing | Ask | > **Note on log retention scope:** The 1-year minimum retention applies specifically to **audit/security event logs** — authentication attempts, authorization decisions, admin actions, data access events, and configuration changes. General application logs and error tracking (e.g. Sentry) are not subject to this requirement. This aligns with SOC 2 Type 2 audit trail requirements. **Remediation:** See [references/remediation-o11y.md](references/remediation-o11y.md) --- ### CI/CD & Testing | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | CI runs unit/functional tests (>80% critical path coverage) | P0 | All | Check workflow files | | CD runs integration/e2e tests | P1 | All | Check workflow files | | Load testing performed | P1 | High traffic / user-facing | Ask | | Rollback procedure documented and tested | P1 | All | Ask | | Post-deploy health checks | P2 | All | Check workflow files | **Remediation:** See [references/remediation-cicd.md](references/remediation-cicd.md) --- ### Primitives (Infrastructure) | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | Runbook documented (failure modes, troubleshooting, escalation) | P0 | All | Ask | | Infrastructure as code (Terraform/CDK) | P0 | All | Check for .tf or cdk files | | Autoscaling configured | P1 | Backend services | Grep .tf for autoscaling | | Healthcheck endpoint (memory, filesystem, dependencies) | P1 | All | Grep for /health endpoint | | Multi-AZ deployment (2+ pods/instances) | P1 | All | Ask | | Secrets management (AWS SM, Vault) - no secrets in code | P0 | All | Grep for hardcoded secrets, check .tf | | Configuration management (env separation) | P2 | All | Check for env-specific configs | | Data Lake integration | P3 | Analytics needs | Ask | **Remediation:** See [references/remediation-primitives.md](references/remediation-primitives.md) --- ### Security | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | OWASP Top 10 2025 validation | P0 | All | Ask | | Secure design review (threat modeling) | P1 | All | Ask | | Dependency scanning enabled + SBOM | P1 | All | Check for dependabot, snyk | | Software/data integrity (code signing, CI/CD security) | P2 | All | Ask | | Fail-secure exception handling | P1 | All | Code review | | Service-to-service auth (mTLS, JWT, API keys) | P1 | Backend with internal APIs | Ask | | Clickjacking headers (X-Frame-Options, CSP) | P1 | Frontend only | Grep for security headers | | SPF records | P2 | Services with email | Ask | | DKIM records | P2 | Services with email | Ask | | RLS policies (Supabase/DB) | P0 | Services with Supabase | Ask | | Rate limiting | P1 | Public APIs | Grep for rate limit config | | DDoS protection (Cloudflare/AWS Shield) | P1 | Public-facing | Ask | | API authentication | P1 | Public APIs | Grep for auth middleware | | Audit logging (auth, admin, data access) | P2 | All | Grep for audit log | **Remediation:** See [references/remediation-security.md](references/remediation-security.md) --- ### 3rd Party Services | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | Metrics integration for 3rd parties | P2 | Services using 3rd parties | Ask | | Status page integration (Slack channel minimum) | P2 | Services using 3rd parties | Ask | | RPC rate limits configured | P1 | Services using RPCs | Ask | **Remediation:** See [references/remediation-dependencies.md](references/remediation-dependencies.md) --- ### Service Dependencies | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | Upstream dependencies documented | P1 | All | Ask | | Downstream dependencies documented | P1 | All | Ask | | Dependency health in service health endpoint | P2 | All | Code review | | Fallback behavior for non-critical deps | P2 | All | Ask | **Remediation:** See [references/remediation-dependencies.md](references/remediation-dependencies.md) --- ### Data Retention & Privacy | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | Data retention policy defined | P1 | Services with persistent data | Ask | | GDPR: Personal data identified | P1 | Services handling user data | Ask | | GDPR: DSAR process defined | P1 | Services handling user data | Ask | | GDPR: Right to be forgotten process | P1 | Services handling user data | Ask | | Privacy policy updated | P2 | User-facing services | Ask | | DPAs with third-party processors | P2 | Services sharing data | Ask | **Remediation:** See [references/remediation-privacy.md](references/remediation-privacy.md) --- ### Efficiency & Frugality | Item | Priority | Applies To | Detection Method | |------|----------|------------|------------------| | Resource-efficient implementation | P2 | All | Code review | | Cost scaling model documented | P2 | All | Ask | | Spend caps / usage alerts configured | P2 | All | Ask | | FinOps review completed | P3 | All | Ask | **Remediation:** See [references/remediation-efficiency.md](references/remediation-efficiency.md) --- ## Priority Definitions | Priority | Meaning | Action Required | |----------|---------|-----------------| | **P0** | Launch blocker | Must fix before production | | **P1** | High priority | Fix within current sprint | | **P2** | Medium priority | Fix within quarter | | **P3** | Nice to have | Address when convenient | ## Status Indicators - ✅ **Pass** - Item verified as compliant - ❌ **Fail** - Item not compliant, needs remediation - ⚠️ **Partial** - Partially compliant, improvements needed - ➖ **N/A** - Not applicable to this service type