--- name: storage-debug-instrumentation description: Add comprehensive debugging and observability tooling for backend storage layers (PostgreSQL, ChromaDB) and startup metrics. Includes storage drift detection, raw data inspection endpoints, and a Next.js admin dashboard. --- # Storage Debug Instrumentation ## Purpose Enable rapid diagnosis of storage state, synchronization health, and backend performance bottlenecks by exposing: - Raw article inspection from both PostgreSQL and ChromaDB - Storage drift detection (missing/dangling entries) - Detailed startup timeline breakdown (DB init, cache preload, vector store, RSS refresh) - One-page debug dashboard consolidating all diagnostics ## Scope - Backend: `app/services/startup_metrics.py`, `app/main.py`, `app/vector_store.py`, `app/database.py`, `app/api/routes/debug.py` - Frontend: `frontend/lib/api.ts`, `frontend/app/debug/page.tsx` - No schema changes; purely additive instrumentation and debug routes ## Workflow ### 1. Create startup metrics service **File:** `backend/app/services/startup_metrics.py` - Implement thread-safe `StartupMetrics` class to record phase timings - Expose `record_event(name, started_at, detail, metadata)` for phase capture - Support `add_note(key, value)` for arbitrary annotations - Export singleton `startup_metrics` for app-wide use ### 2. Instrument vector store initialization **File:** `backend/app/vector_store.py` - Import `startup_metrics` - In `VectorStore.__init__()`, wrap initialization with `time.time()` timer - Record event with metadata: `host`, `port`, `collection`, `documents` - Catch connection errors and annotate them ### 3. Instrument FastAPI startup sequence **File:** `backend/app/main.py` - Call `startup_metrics.mark_app_started()` at beginning of `on_startup()` - Wrap each phase (DB init, schedulers, cache preload, RSS refresh, migration) with `record_event()` - Include metadata: `cache_size`, `article_count`, `oldest_article_hours` - Call `startup_metrics.mark_app_completed()` at end - Add app version notes via `add_note()` ### 4. Add database pagination helpers **File:** `backend/app/database.py` - Implement `fetch_articles_page()` to support: - Limit/offset pagination - Optional source filter - Missing-embeddings-only flag - Published date range filters - Sort direction (asc/desc) - Return oldest/newest timestamp bounds - Implement `fetch_article_chroma_mappings()` to return all article→chroma ID mappings for drift analysis ### 5. Add vector store pagination helpers **File:** `backend/app/vector_store.py` - Implement `list_articles(limit, offset)` to return paginated Chroma documents with metadata and previews - Implement `list_all_ids()` to return all stored Chroma IDs for drift detection (used by `/debug/storage/drift`) ### 6. Expose debug API endpoints **File:** `backend/app/api/routes/debug.py` - Add `GET /debug/startup` → returns startup metrics timeline (events + notes) - Add `GET /debug/chromadb/articles` → returns paginated raw Chroma entries with limit/offset - Add `GET /debug/database/articles` → returns paginated Postgres rows with filters (source, embeddings, date range, sort) - Add `GET /debug/storage/drift` → compares Chroma IDs vs Postgres mappings, returns missing/dangling counts + samples ### 7. Add frontend API bindings **File:** `frontend/lib/api.ts` - Export types: `StartupEventMetric`, `StartupMetricsResponse`, `ChromaDebugResponse`, `DatabaseDebugResponse`, `StorageDriftReport` - Export fetchers: `fetchStartupMetrics()`, `fetchChromaDebugArticles()`, `fetchDatabaseDebugArticles()`, `fetchStorageDrift()` - Ensure snake_case→camelCase mapping for response fields ### 8. Build debug dashboard page **File:** `frontend/app/debug/page.tsx` - Create `/debug` route with multi-tab inspection UI - Render startup timeline: phase name, duration, metadata badges (cache size, vectors, migrated records) - Display Chroma browser: paginated table with ID, title, source, preview - Display Postgres browser: paginated table with filters (source, date range, missing-embeddings-only flag) - Display drift report: sample tables for missing-in-chroma and dangling-in-chroma entries - Include summary cards for quick metrics (boot time, total articles, vector count, drift count) ## Implementation checklist - [ ] Create `backend/app/services/startup_metrics.py` - [ ] Instrument `backend/app/vector_store.py::VectorStore.__init__()` - [ ] Instrument `backend/app/main.py::on_startup()` (all phases) - [ ] Add `fetch_articles_page()` and `fetch_article_chroma_mappings()` to `backend/app/database.py` - [ ] Add `list_articles()` and `list_all_ids()` to `backend/app/vector_store.py` - [ ] Add `/debug/startup`, `/debug/chromadb/articles`, `/debug/database/articles`, `/debug/storage/drift` to `backend/app/api/routes/debug.py` - [ ] Add types and fetchers to `frontend/lib/api.ts` - [ ] Create `frontend/app/debug/page.tsx` with dashboard layout - [ ] Run `uvx ruff check backend` → all checks pass - [ ] Test endpoints in curl or Postman to verify response structure ## Verification checklist - [ ] `GET http://localhost:8000/debug/startup` returns valid timeline with events and notes - [ ] `GET http://localhost:8000/debug/chromadb/articles?limit=50&offset=0` returns paginated Chroma docs - [ ] `GET http://localhost:8000/debug/database/articles?source=bbc&missing_embeddings_only=false` filters correctly - [ ] `GET http://localhost:8000/debug/storage/drift` compares counts and returns drift samples - [ ] `http://localhost:3000/debug` loads without errors and displays all four sections - [ ] Refresh button triggers all four API calls in parallel - [ ] Pagination controls update limit/offset correctly - [ ] Database filters (source, date range) update and refresh data - [ ] Startup timeline shows non-zero phase durations if backend just started ## Future enhancements - Streaming startup metrics via SSE (live tail during boot) - Export startup report as JSON/CSV for performance tracking over time - Automated drift alerts (post to Slack/email if dangling > threshold) - Performance graphs (startup time trends, article throughput) - Sync-on-demand action (button to force vector store refresh for missing articles)