---
name: bug-hunting
description: Structured workflow for discovering, reproducing, and fixing bugs in Searchlite by inspecting real code paths (CLI/HTTP/core), validating durability invariants, and producing regression tests; use when investigating incorrect results, crashes, data loss risks, or “it sometimes fails” reports.
---

# Bug Hunting Workflow

## Intent

This skill is for **finding real bugs in the implementation**, not guessing. It forces a discovery phase (build a mental model, reproduce, narrow scope) before making changes.

You must prioritize:

1. correctness and durability
2. clear reproduction
3. minimal fix + regression test

---

## Non-negotiable rules

- **No fix without a reproduction** (a failing test, a minimal repro program, or a deterministic HTTP/CLI script).
- **No durability regressions**: do not weaken WAL/manifest/atomic rename/fsync ordering.
- **No “drive-by refactors”** while bug hunting; keep patches tight and reviewable.
- After changes, you must run the full quality suite from `code-quality`.

---

## Discovery phase (required)

### 0) Establish the boundaries

Before opening files, answer these questions in your notes:

- Which surface is affected? **CLI**, **HTTP**, **embedded API**, **wasm**, or **ffi**?
- Is it a **correctness** issue (wrong results), a **durability** issue (lost/duplicated data), or a **stability** issue (panic/hang)?
- Does it require a specific feature flag (`vectors`, `zstd`, `gpu`, wasm/IndexedDB, ffi)?
- Is the failure deterministic or intermittent?

### 1) Baseline the repo (don’t trust your local state)

Run in the workspace root:

- `cargo fmt --all`
- `cargo clippy --all --all-features --all-targets -- -D warnings`
- `cargo test --all --all-features`

If anything fails here, you may be chasing a known failure; fix or resolve that first.

### 2) Map the architecture you’re about to debug

Create a quick map in your notes (1–2 minutes):

- `searchlite-core`: schema, indexing, WAL/manifest, segments, query execution, filters/aggs, vectors
- `searchlite-cli`: user-facing commands and JSON output formatting
- `searchlite-http`: endpoints (`/init`, `/add`, `/bulk`, `/delete`, `/commit`, `/refresh`, `/search`, `/inspect`, `/stats`, `/compact`), limits/timeouts/concurrency
- `searchlite-wasm`: bindings + IndexedDB storage semantics (no fsync)
- `searchlite-ffi`: C ABI / safety edges

This map guides where you look first.

### 3) Reproduce the bug with the smallest possible corpus

Produce a “minimal reproducible case” artifact set:

- a small schema JSON
- a small NDJSON/doc list (5–50 docs)
- one request (HTTP JSON or CLI command) that fails

For HTTP:

- include the server flags/env (bind addr, index path, max-body-bytes, max-concurrency, request-timeout, refresh-on-commit)

For CLI:

- include the exact commands and any flags

**Goal:** a script a teammate can run in under 60 seconds. A sketch of the shape such a repro can take follows below.
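A minimal sketch of a self-contained repro, assuming an embedded API. Every name here (`searchlite::Index`, `open`, `add_json`, `commit`, `search`) is a hypothetical stand-in for illustration; substitute the real `searchlite-core` API and your actual failing request.

```rust
// repro.rs: deterministic minimal reproduction (sketch).
// NOTE: `Index::open`, `add_json`, `commit`, and `search` are hypothetical
// illustrative names, not the documented Searchlite API.

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1) Tiny schema: one indexed text field.
    let schema = r#"{"fields":{"title":{"type":"text"}}}"#;

    // 2) Fresh index in a temp dir so the repro is self-contained.
    let dir = std::env::temp_dir().join("searchlite-repro");
    let mut index = searchlite::Index::open(&dir, schema)?; // hypothetical

    // 3) Smallest corpus that still triggers the bug (keep it 5-50 docs).
    let ndjson = r#"{"title":"red fox"}
{"title":"red panda"}
{"title":"grey wolf"}"#;
    for line in ndjson.lines() {
        index.add_json(line)?; // hypothetical
    }
    index.commit()?; // writes are only visible after commit

    // 4) The single failing request, asserting observed vs expected.
    let hits = index.search(r#"{"query":{"match":{"title":"red"}}}"#)?; // hypothetical
    assert_eq!(hits.len(), 2, "expected 2 hits, got {}", hits.len());
    Ok(())
}
```

For an HTTP-surface bug, the same artifact set applies: ship the schema JSON, the NDJSON file, and the exact request body plus server flags instead of the embedded calls.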
### 4) Turn on observability (don’t add prints yet)

Use existing debugging features first:

- For search correctness: add `explain: true` to the request.
- For performance/odd slowdowns while reproducing: add `profile: true`.
- For server behavior: run with `RUST_LOG=debug` (or more targeted module filters) and capture the relevant lines.

If your issue is “writes not visible,” verify the lifecycle:

- you must `commit`
- you may need `/refresh` in HTTP, depending on config

### 5) Narrow the fault line by bisecting the pipeline

Pick one axis at a time:

**Correctness axis**

- Does the doc get ingested successfully?
- Does it appear in `/stats` documents?
- Does `/inspect` show a new segment after commit?
- Does the query match in bm25 mode vs wand/bmw mode?

**Durability axis**

- Is the WAL appended correctly?
- Is the manifest updated atomically?
- Is the WAL truncated only after manifest persistence?
- After restart, does replay do the expected thing?

**HTTP axis**

- Is the request being rejected due to max body / timeouts / concurrency?
- Is NDJSON streaming partial and rolling back?
- Are errors correctly surfaced as `{"error":{"type","reason"}}`?

### 6) Perform a targeted static sweep of the suspected modules

Once you know the rough area, do a deliberate scan for bug smells. Patterns to grep/ripgrep for in the relevant crate:

- panics: `unwrap()`, `expect(`, `panic!`, `unreachable!`, `todo!`
- suspicious error handling: `map_err(|_|`, `ok()?` in non-optional contexts, silent `let _ =`
- integer edge cases: casts (`as usize`), underflow (`- 1`), unchecked indexing
- concurrency hazards: `Mutex`/`RwLock` lock ordering, `.await` while holding locks, channels without bounds
- IO hazards: missing `fsync`, non-atomic file replace, unhandled partial writes, directory fsync omissions
- schema mismatch hazards: stringly-typed field lookups, analyzer lookup defaults, missing nested path validation
- vector hazards: dimension mismatch, normalization assumptions, unchecked candidate sizing, `NaN` propagation

You are not fixing anything in this step; you are only identifying suspects.

### 7) Dynamic bug probes by subsystem (choose what fits)

#### A) WAL / manifest / commit / rollback

If you suspect commit/durability issues:

- Reproduce a crash-window scenario:
  1. ingest docs
  2. start commit
  3. force termination mid-way (or simulate via test hooks if present)
  4. restart and observe replay
- Validate the invariants (see the sketch after this list):
  - manifest atomic update (rename) and directory sync
  - WAL truncation happens after manifest sync
  - rollback cleans up new segment artifacts

If you can’t safely crash in tests, create a unit test around the writer/commit phases and validate state.
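To make the durability invariants concrete, here is a minimal sketch of the required commit ordering using only the standard library. The file names (`MANIFEST`, `MANIFEST.tmp`, `wal.log`) are illustrative assumptions, not Searchlite’s actual on-disk layout; the ordering is the point.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::Write;
use std::path::Path;

/// Durable manifest swap (sketch). File names are illustrative only.
fn commit_manifest(index_dir: &Path, manifest_bytes: &[u8]) -> std::io::Result<()> {
    // 1) Write the new manifest to a temp file and fsync its contents.
    let tmp = index_dir.join("MANIFEST.tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(manifest_bytes)?;
    f.sync_all()?; // data must be durable before the rename

    // 2) Atomically replace the old manifest (rename is atomic on POSIX).
    fs::rename(&tmp, index_dir.join("MANIFEST"))?;

    // 3) fsync the directory so the rename itself survives a crash
    //    (opening a directory for fsync works on POSIX systems).
    File::open(index_dir)?.sync_all()?;

    // 4) Only now is it safe to truncate the WAL: the state it protected
    //    is durable. Truncating any earlier opens a data-loss window.
    OpenOptions::new()
        .write(true)
        .open(index_dir.join("wal.log"))?
        .set_len(0)?;
    Ok(())
}
```

When bisecting, check that the real writer performs these steps in this order; a swapped step 3 and step 4 is the classic crash-window bug.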
#### B) Segment lifecycle / compaction / cleanup

If you see duplicates, ghosts, or bloat:

- Inspect segments via `inspect` and compare to `stats`
- Run compaction and verify:
  - live docs are preserved
  - deleted/tombstoned docs are dropped
  - old segment files are removed (or are expected to remain until cleanup)

#### C) Query execution modes

If you suspect scoring/pruning bugs:

- Run the same query with `execution=bm25`, `execution=wand`, and `execution=bmw`
- Results should match for deterministic corpora (unless the contract explicitly says otherwise)
- If they diverge, you likely have a pruning-bound bug or a block-max metadata mismatch

#### D) Filters/aggs/nested correctness

If filters/aggs are wrong:

- Verify the required fields are `fast: true`
- For nested, verify the filter uses a `Nested` block and that clauses bind to the same object instance
- Add a repro doc with two nested objects that would catch “cross-object” leakage

#### E) Vectors (feature `vectors`)

If vector behavior is wrong:

- Check dimension mismatch handling
- Check cosine normalization expectations
- Confirm ANN parameters (`candidate_size`, `ef_search`) don’t overflow or accept nonsensical values
- Verify hybrid search doesn’t double-count or drop vector scores

---

## Fix phase (required)

### 1) Write the failing regression test first

- Prefer a unit test in the most local crate if possible.
- If the bug is surface-level behavior (HTTP contract, CLI output), add an integration test.
- Make it deterministic and small.
- Assert on structure, not strings (parse JSON output); a sketch of this pattern appears at the end of this document.

### 2) Implement the minimal fix

- Tight patch, minimal diff.
- Keep durability semantics intact.
- Avoid introducing allocations/locks in hot paths unless justified.

### 3) Prove it

Run:

- `cargo test --all --all-features`
- any relevant examples (e.g. the pruning example if you touched pruning logic)
- the relevant bench before/after, if the bug touched performance-sensitive code

### 4) Add a “root cause” note

In the PR description or internal notes, include:

- symptom
- root cause
- why the fix is correct
- what test prevents regressions

---

## Output format (what Codex should produce)

When using this skill, produce:

1. **Repro artifact** (schema + docs + request/command)
2. **Fault localization** (files + functions involved)
3. **Root cause analysis**
4. **Fix plan** (minimal changes)
5. **Regression test** (where it lives and what it asserts)
6. **Validation commands run** (from `code-quality`)
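As a reference for item 5 and the “assert on structure, not strings” rule above, here is a minimal sketch of a structural JSON assertion using `serde_json`. The response shape (`hits`, `_id`) is an assumed example, not Searchlite’s documented contract; in a real test the string would come from the repro’s CLI/HTTP/embedded call.

```rust
// Regression test sketch: parse the JSON output and assert on structure,
// so whitespace, key ordering, and float formatting cannot break the test.
// The `hits`/`_id` shape is an assumed example, not the documented contract.

#[test]
fn red_query_returns_both_docs() {
    // In the real test, obtain this string from the minimal repro instead.
    let raw = r#"{"hits":[{"_id":"doc-1","score":1.2},{"_id":"doc-2","score":1.1}]}"#;
    let v: serde_json::Value = serde_json::from_str(raw).expect("valid JSON");

    let hits = v["hits"].as_array().expect("hits is an array");
    assert_eq!(hits.len(), 2);

    let mut ids: Vec<&str> = hits
        .iter()
        .map(|h| h["_id"].as_str().expect("_id is a string"))
        .collect();
    ids.sort();
    assert_eq!(ids, ["doc-1", "doc-2"]);
}
```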