# OathScore Deep Audit — Comprehensive Forensic Verification A full, exhaustive audit of the entire OathScore codebase, infrastructure, and documentation. Reads EVERY source file, traces EVERY relationship, verifies EVERY cross-reference. Leaves no stone unturned. **This is NOT the session startup skill.** The session skill runs a quick audit. This skill does a COMPLETE forensic sweep — the kind that found 10 bugs in a single session when first run against OathScore. It takes time. It reads everything. It trusts nothing. ## Invocation ``` /oathscore-audit ``` Run when: - Major changes have been made and you need full verification - Before any deployment - When something seems wrong but you can't pinpoint it - Periodically (weekly) as a health check - Owner requests a deep audit ## PHILOSOPHY The audit that found 10 real bugs followed one principle: **read the actual code, compare it against everything else, and prove correctness rather than assume it.** Every check in this skill follows the pattern: 1. Read source file A 2. Read source file B (or config, or docs, or API response) 3. Verify they agree 4. If they don't → BUG **Do NOT:** - Trust the session file's description of what code does - Assume env vars are correct because they were correct last time - Skip a check because "it probably hasn't changed" - Stop after finding the first bug — there are always more **DO:** - Read every file listed in every section - Print findings as you go (so owner sees progress) - Track EVERY discrepancy, no matter how small - Categorize by severity: CRITICAL (breaks functionality), MEDIUM (wrong but functional), LOW (cosmetic/docs) --- ## LAYER 1: SOURCE CODE INTEGRITY (every .py file) ### 1.1 Complete Python File Inventory ```bash find . -type f -name "*.py" -not -path "./.git/*" -not -path "./data/*" -not -name "*.pyc" -not -path "./__pycache__/*" | sort ``` For EACH .py file found: - Read it - Note its purpose (1 line) - Note its imports - Note what it exports (functions, classes, constants) - Check: is it referenced by any other file? (orphan check) ### 1.2 Import Chain Verification For every `from src.X import Y` or `import src.X` in the codebase: - Does the imported module exist? - Does the imported name exist in that module? - Is the import used? (check for dead imports) ```bash # Extract all internal imports grep -rn "from src\." --include="*.py" . | grep -v __pycache__ grep -rn "import src\." --include="*.py" . | grep -v __pycache__ ``` Flag: broken imports, circular imports, dead imports. ### 1.3 Route Handler Audit Read `src/main.py` completely. For EVERY `@app.get`, `@app.post`, `@app.middleware`: - What route does it handle? - What does it return? - What can go wrong? (uncaught exceptions, missing error handling) - Does it have rate limiting applied? - Does it check authentication where needed? Build a complete route table: ``` ROUTE TABLE =========== GET /now → aggregator.build_now_response() [free, rate-limited] GET /health → inline [free, no limit] GET /score/{api} → scoring.compute_score() [free, rate-limited] ... (every route) ``` Cross-reference against: - `public/llms.txt` endpoint list - `public/llms-full.txt` endpoint list - `src/mcp_server.py` tool definitions - `tests/test_api.py` test coverage Flag: routes not in docs, routes not tested, routes not in MCP server. ### 1.4 Middleware Chain Audit Read `src/main.py` middleware stack: - What order do middlewares execute? - Does kill switch middleware cover all routes except /health? - Does rate limiting apply to the right routes? - Does CORS allow the right origins? - Are there any middleware conflicts? ### 1.5 Error Handling Audit For every route handler and background task: - What happens on exception? - Is the error logged? - Does it return a proper error response? - Can it crash the whole server? (unhandled exception in middleware or startup) --- ## LAYER 2: CONFIGURATION & ENVIRONMENT ### 2.1 Environment Variable Complete Map Read EVERY source file that accesses `os.environ` or `os.getenv`: ```bash grep -rn "os\.environ\|os\.getenv\|environ\[" --include="*.py" . | grep -v __pycache__ ``` Build a complete table: ``` ENV VAR MAP =========== Variable Name | Read By | Required? | Railway Set? SUPABASE_URL | src/monitor/supabase_store | Yes | ? SUPABASE_ANON_KEY | src/monitor/supabase_store | Yes | ? STRIPE_SECRET_KEY | src/billing.py | Yes | ? ... (every env var) ``` Then verify against Railway: ```bash railway variables 2>/dev/null ``` Flag: any env var read by code but not set in Railway, any Railway var not read by code. ### 2.2 API Key Cross-Reference (Deep) Read `src/monitor/config.py` MONITORED_APIS dictionary completely. For EACH monitored API: - `key_env` value → does this env var exist in Railway? - `secret_env` value (if present) → does this env var exist in Railway? - `base_url` → is it the correct production URL? (not staging, not deprecated) - `health` endpoint → does it actually return 200? - Each endpoint in `endpoints[]`: - Is the path correct? - Are the params correct? - Are headers handled? (for Alpaca, yfinance) - Does `{key}` substitution work? - Does `{secret}` substitution work? Verify by actually calling each API's health/first endpoint: ```bash # For each API, test the actual endpoint curl -s --max-time 10 "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd" curl -s --max-time 10 "https://query1.finance.yahoo.com/v8/finance/chart/SPY?range=1d&interval=1d" -H "User-Agent: OathScore/1.0" # ... etc for each API ``` Flag: wrong URLs, missing keys, broken endpoints, auth that doesn't work. ### 2.3 Probe Code vs Config Alignment Read ALL probe files: - `src/monitor/ping_probe.py` - `src/monitor/freshness_probe.py` - `src/monitor/schema_probe.py` - `src/monitor/accuracy_probe.py` - `src/monitor/docs_probe.py` For EACH probe, verify: - Does it handle the `headers` dict from endpoint config? (needed for Alpaca, yfinance) - Does it handle `{key}` substitution in both params AND headers? - Does it handle `{secret}` substitution? - Does it skip endpoints correctly when keys are missing? - Does it import `get_api_secret` where needed? - Does it store results correctly? (what function does it call in store.py?) Flag: probes that can't handle new API patterns, probes not storing results. ### 2.4 Scoring Engine Deep Dive Read `src/monitor/scoring.py` line by line. For EACH of the 7 scoring components (accuracy, uptime, freshness, latency, schema, docs, trust): - HOW is it computed? (trace the exact logic) - WHAT data does it read? (which JSON file?) - WHAT happens when data is missing? (does it return None? 0? default?) - Is the weight correct? (compare against METHODOLOGY.md) - Are the brackets/thresholds reasonable? Verify the reweighting logic: - When a component is None, do the remaining weights sum to 1.0? - Can a score ever exceed 100 or go below 0? - Does the grade calculation match the thresholds? Verify `persist_daily_scores()`: - What fields does it write to Supabase? - Do those fields match the `daily_scores` table schema in migrations? - Is it called by the scheduler? At what interval? ### 2.5 Scheduler Complete Wiring Check Read `src/monitor/scheduler.py` completely. Build a table: ``` SCHEDULER WIRING ================ Task Name | Function | Interval | Import Source ping | ping_all | 60s | src.monitor.ping_probe freshness | check_freshness | 300s | src.monitor.freshness_probe schema | check_schemas | 3600s | src.monitor.schema_probe ... (every task) ``` Cross-reference: - Every probe file → is it wired in the scheduler? - Every scheduler task → does the imported function exist? - Are intervals reasonable? - What happens if a task throws an exception? (does it crash the loop?) ### 2.6 Storage Layer Verification Read `src/monitor/store.py` completely. Read `src/monitor/supabase_store.py` completely. For EACH store function: - Does it write to local JSON AND Supabase (dual-write)? - What file path does it write locally? - What Supabase table does it write to? - Does the data shape match the migration schema? - Is there error handling if Supabase is down? Read `migrations/001_initial.sql`: - List every table and column - Cross-reference against store.py write calls - Cross-reference against supabase_store.py Flag: tables that exist but nothing writes to them, store calls that write to non-existent tables. --- ## LAYER 3: BILLING & SECURITY ### 3.1 Stripe Integration Audit Read `src/billing.py` completely. - What Stripe API calls does it make? - How are API keys generated? (format, storage) - How is the webhook verified? (HMAC signature check) - What happens on subscription creation/cancellation? - Are there any hardcoded Stripe IDs that could go stale? - Does the pricing match what's in llms.txt and the pricing endpoint? Read `src/rate_limit.py` completely. - What are the tier limits? (requests per day for each tier) - How is the tier determined? (API key → Stripe lookup → tier) - What happens when rate limit is hit? (429 response?) - Is the rate limiter per-IP for free tier? ### 3.2 Kill Switch Verification Read the kill switch implementation in `src/main.py`: - Where is the kill switch file? (`data/kill_switch.json`) - What routes does it exempt? (should be /health only) - What status code does it return? (should be 503) - Can the kill switch be activated remotely? (should be file-based only) - Does /health show kill switch status? ### 3.3 x402 Micropayment Verification Read `src/x402.py` (if exists): - Is it active or dormant? - What wallet address is configured? - What are the per-call prices? - Is it wired into the middleware chain? ### 3.4 Security File Verification Read `public/robots.txt`: - Are training crawlers blocked? - Are discovery bots allowed? - Any new crawlers that should be added? Read `public/.well-known/security.txt`: - Is the Expires date still in the future? - Is the contact information correct? Read `public/.well-known/ai-plugin.json`: - Is the schema version current? - Are URLs correct? - Is the description accurate? --- ## LAYER 4: DOCUMENTATION & PUBLIC FILES ### 4.1 llms.txt Accuracy Read `public/llms.txt`. For EVERY claim: - Endpoint list → does each endpoint exist in main.py routes? - Rate limits → do they match rate_limit.py? - Pricing → does it match billing.py? - MCP tool count → does it match mcp_server.py? - Base URL → correct? ### 4.2 llms-full.txt Accuracy Read `public/llms-full.txt`. Same checks as 4.1, plus: - Response field descriptions → do they match actual API responses? - Rated APIs table → does it match MONITORED_APIS in config.py? - Scoring methodology → does it match scoring.py weights? - Contact/GitHub URLs → correct? ### 4.3 ai.txt Accuracy Read `public/ai.txt`: - All URLs correct and reachable? ### 4.4 README.md Accuracy Read `README.md`: - API count matches MONITORED_APIS? - Endpoint examples use correct URLs? - MCP config is correct? - Scoring methodology matches scoring.py? - Installation instructions work? ### 4.5 CLAUDE.md Accuracy Read `CLAUDE.md`: - Directory map matches actual directory structure? - Env var names match code? - Deploy instructions correct? ### 4.6 Session File Comprehensive Verification Read `tracking/OATHSCORE_SESSION.md`: - File map: EVERY file listed → does it exist on disk? - File map: ANY file on disk not in the map? - Constants section: EVERY constant → matches actual code? - Env var table: EVERY entry → matches config.py key_env AND Railway? - Known bugs: any resolved but not marked? - Pending tasks: any completed but not moved? ### 4.7 Project Tracker Sync Read `tracking/PROJECT_TRACKER.md`: - Tasks marked DONE → are they actually done? (check files exist, features work) - Tasks marked TODO → are any already done? (check recent commits) - Priority queue → still accurate? - Status summary counts → match the actual task statuses? --- ## LAYER 5: EXTERNAL INTEGRATIONS ### 5.1 Live API Comprehensive Test Test EVERY public endpoint: ```bash # Core endpoints curl -s --max-time 10 https://api.oathscore.dev/health | python -m json.tool curl -s --max-time 10 https://api.oathscore.dev/now | python -c "import sys,json; d=json.load(sys.stdin); print(json.dumps({k:type(v).__name__ for k,v in d.items()}, indent=2))" curl -s --max-time 10 https://api.oathscore.dev/scores | python -m json.tool curl -s --max-time 10 https://api.oathscore.dev/pricing | python -m json.tool # Score endpoints for each monitored API for api in curistat alphavantage polygon finnhub twelvedata eodhd fmp fred coingecko alpaca yfinance; do echo "=== $api ===" curl -s --max-time 10 "https://api.oathscore.dev/score/$api" | python -c "import sys,json; d=json.load(sys.stdin); print(f'score={d.get(\"composite_score\",\"N/A\")} grade={d.get(\"grade\",\"N/A\")} points={d.get(\"data_points\",0)}')" 2>/dev/null done # Compare endpoint curl -s --max-time 10 "https://api.oathscore.dev/compare?apis=polygon,finnhub" | python -m json.tool # Alerts endpoint curl -s --max-time 10 https://api.oathscore.dev/alerts | python -m json.tool # Status endpoint curl -s --max-time 10 https://api.oathscore.dev/status | python -c "import sys,json; d=json.load(sys.stdin); print(f'uptime probes: {d.get(\"probe_intervals\",{})}')" 2>/dev/null # Discovery files curl -s --max-time 10 https://api.oathscore.dev/llms.txt | head -5 curl -s --max-time 10 https://api.oathscore.dev/llms-full.txt | head -5 curl -s --max-time 10 https://api.oathscore.dev/.well-known/ai-plugin.json | python -m json.tool curl -s --max-time 10 https://api.oathscore.dev/.well-known/security.txt curl -s --max-time 10 https://api.oathscore.dev/robots.txt | head -10 ``` For EACH response, verify: - Status code is 200 (or expected code) - Response body has expected structure - Data makes sense (VIX not 0, exchanges have reasonable status) - Rate limit headers present where expected ### 5.2 Monitored API Reachability For EACH API in MONITORED_APIS, test one endpoint: ```bash # Test each monitored API is actually reachable curl -s --max-time 10 -o /dev/null -w "%{http_code}" "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd" curl -s --max-time 10 -o /dev/null -w "%{http_code}" "https://query1.finance.yahoo.com/v8/finance/chart/SPY?range=1d&interval=1d" -H "User-Agent: OathScore/1.0" curl -s --max-time 10 -o /dev/null -w "%{http_code}" "https://api.stlouisfed.org/fred/series/observations?series_id=DGS10&limit=1&sort_order=desc&file_type=json&api_key=FRED_KEY_HERE" # ... etc ``` Flag: any monitored API returning non-200. ### 5.3 GitHub Repository Sync ```bash # Check remote vs local cd && git fetch origin 2>/dev/null git log --oneline origin/main..HEAD # local ahead git log --oneline HEAD..origin/main # remote ahead # Uncommitted changes git status -s # Untracked files that should be committed git status -s | grep "^??" | grep -v "__pycache__\|.pyc\|data/\|.env" ``` Flag: uncommitted changes, untracked source files, local/remote divergence. ### 5.4 MCP Directory Status ```bash # Glama curl -s "https://glama.ai/mcp/servers/oathscore" -o /dev/null -w "%{http_code}" 2>/dev/null # GitHub PRs and issues gh issue view 668 --repo chatmcp/mcpso --json state,title 2>/dev/null || echo "Cannot check mcp.so" gh pr view 2694 --repo punkpeye/awesome-mcp-servers --json state,title 2>/dev/null || echo "Cannot check punkpeye PR" ``` ### 5.5 Stripe Health ```bash curl -s --max-time 10 https://api.oathscore.dev/pricing | python -c " import sys,json; d=json.load(sys.stdin) tiers=d.get('tiers',{}) for name,tier in tiers.items(): print(f'{name}: \${tier.get(\"price_monthly\",\"?\")}/mo, {tier.get(\"slots_remaining\",\"?\")} slots') " 2>/dev/null || echo "Pricing endpoint failed" ``` ### 5.6 Railway Deployment Health ```bash # Check Railway variables count railway variables 2>/dev/null | wc -l || echo "Railway CLI not available" # Check Railway logs for errors railway logs --limit 50 2>/dev/null | grep -iE "error|exception|traceback|critical" | head -10 || echo "Railway CLI not available" ``` --- ## LAYER 6: DEPENDENCY & INFRASTRUCTURE ### 6.1 Requirements.txt Audit Read `requirements.txt`: - For EACH dependency: is it actually imported somewhere in the code? - For EACH import in the code: is the package in requirements.txt? - Are there version pins? Are any outdated? ```bash # Cross-reference cat requirements.txt grep -rn "^import \|^from " --include="*.py" . | grep -v __pycache__ | awk '{print $2}' | sort -u ``` ### 6.2 Dockerfile Audit Read `Dockerfile`: - Base image reasonable? - Does it copy all necessary files? - Does it expose the right port? - Does it install all requirements? - Any security concerns? (running as root, exposing secrets) Read `.dockerignore`: - Are data files excluded? - Are .env files excluded? - Are .git files excluded? ### 6.3 GitHub Actions Audit Read `.github/workflows/health-check.yml`: - All URLs correct (api.oathscore.dev, not old Railway URL)? - Schedule reasonable? - Does it actually test what it claims to? ### 6.4 Railway Config Read `railway.json` (if exists): - Build command correct? - Start command correct? - Region set? --- ## FINAL REPORT After ALL layers complete, produce the full audit report: ``` ╔══════════════════════════════════════════════════════════════╗ ║ OATHSCORE DEEP AUDIT REPORT ║ ║ Date: [DATE] ║ ╚══════════════════════════════════════════════════════════════╝ LAYER 1: SOURCE CODE INTEGRITY Python files found: N Import chains verified: N OK / N broken Routes documented: N/N Routes tested: N/N Dead imports: N Orphan files: N LAYER 2: CONFIGURATION & ENVIRONMENT Env vars (code vs Railway): N matched / N mismatched Monitored APIs: N configured / N with keys / N reachable Probe-config alignment: N OK / N issues Scoring components: N active / N placeholder Scheduler tasks: N wired / N missing Storage dual-write: N OK / N single-write only LAYER 3: BILLING & SECURITY Stripe integration: [OK / issues] Rate limiting: [OK / issues] Kill switch: [OK / issues] x402: [Active / Dormant] Security files: N/N present and current LAYER 4: DOCUMENTATION llms.txt accuracy: N/N claims verified llms-full.txt accuracy: N/N claims verified README accuracy: N/N claims verified CLAUDE.md accuracy: N/N claims verified Session file accuracy: N/N entries verified Tracker sync: N/N tasks correctly statused LAYER 5: EXTERNAL INTEGRATIONS Live API endpoints: N/N responding Monitored APIs reachable: N/N GitHub sync: [Clean / N uncommitted / N diverged] MCP directories: [status of each] Stripe: [OK / issue] Railway: [OK / N errors in logs] LAYER 6: INFRASTRUCTURE Dependencies: N in requirements / N imported / N unused / N missing Dockerfile: [OK / issues] CI/CD: [OK / issues] ═══════════════════════════════════════════════════════════════ BUGS FOUND: N ────────────────────────────────────────────────────────────── [CRITICAL] Bug description — file:line — impact [CRITICAL] Bug description — file:line — impact [MEDIUM] Bug description — file:line — impact [LOW] Bug description — file:line — impact STALE DOCUMENTATION: N entries ────────────────────────────────────────────────────────────── [File] — what's stale — current value vs documented value RECOMMENDATIONS: ────────────────────────────────────────────────────────────── 1. [Fix critical bugs immediately] 2. [Update stale docs] 3. [Address medium bugs] 4. [Infrastructure improvements] ═══════════════════════════════════════════════════════════════ Audit complete. N files read. N cross-references checked. ``` --- ## CRITICAL RULES FOR THIS SKILL 1. **READ EVERY FILE.** Not "check if it exists." Open it. Read the contents. Verify the logic. 2. **TRUST NOTHING.** The session file, the docs, the README — they are claims to verify, not facts to accept. 3. **CROSS-REFERENCE EVERYTHING.** The bug pattern is always: File A says X, File B says Y, X ≠ Y. 4. **DON'T STOP AT THE FIRST BUG.** There are always more. The first 3 times this audit was attempted, it stopped too early. Go through EVERY layer. 5. **PRINT PROGRESS.** The owner should see each layer completing in real time, not wait 10 minutes for a silent result. 6. **SEVERITY MATTERS.** CRITICAL = breaks functionality for users. MEDIUM = wrong but works. LOW = cosmetic or docs-only. 7. **NO SHORTCUTS.** If a layer says "read every .py file," read every .py file. If it says "test every endpoint," test every endpoint. The bugs hide in the files you skip. 8. **INCLUDE LINE NUMBERS.** When reporting bugs, include the exact file path and line number so fixes are immediate. 9. **VERIFY FIXES.** If you fix a bug during the audit, re-run that specific check to prove the fix worked. 10. **UPDATE SESSION FILE.** After the audit, update tracking/OATHSCORE_SESSION.md with any bugs found, fixed, or remaining.