---
name: tune
description: Find the chunk-size and retrieval config that maximizes quality on YOUR corpus. Sweeps chunk sizes, runs a golden-query set, reports nDCG / MRR / Hit@5. Use after first ingest or whenever your corpus changes shape significantly.
disable-model-invocation: true
---

# Tune

Every RAG system has a distribution where it breaks. The only way to
know whether gnosis-mcp's defaults are right for *your* corpus is to
measure them against *your* queries. This skill runs that measurement.

## Usage

```
/gnosis:tune                       # Quick sweep (5 chunk sizes, keyword mode)
/gnosis:tune full                  # Extended sweep (keyword + hybrid + rerank on/off)
/gnosis:tune --golden ./q.jsonl    # Use a specific golden-query file
```

## Mode: $ARGUMENTS

---

## Prerequisites

You need:

1. A corpus — anywhere on disk. Will be passed to `gnosis-mcp ingest`.
2. A golden-query file — one JSON object per line:

   ```json
   {"query": "how does our auth work", "expected_paths": ["docs/auth", "architecture/auth"]}
   {"query": "stripe webhook failure runbook", "expected_paths": ["runbooks/stripe"]}
   ```

   `expected_paths` uses substring match against the returned `file_path`,
   case-insensitive. Generous enough that you don't need exact path
   memorization.

20 hand-written queries is enough to get signal. 50 is plenty.

---

## Quick sweep (default)

Runs `gnosis-mcp ingest` 5 times with different chunk sizes, scoring
each with your golden set. Keyword mode only (fast, no embedding cost).

```bash
# Default path if none given: ./docs + ./golden.jsonl
CORPUS=${CORPUS:-./docs}
GOLDEN=${GOLDEN:-./golden.jsonl}

for size in 1000 1500 2000 2500 3000; do
  uv run --with 'gnosis-mcp[embeddings] @ gnosis-mcp' \
    python tests/bench/bench_real_corpus.py \
      --corpus "$CORPUS" --golden "$GOLDEN" \
      --modes keyword --chunk-size $size \
      --out bench-results/tune-chunk${size}.json
done
```

(If gnosis-mcp was installed via pip rather than from source, invoke
`bench_real_corpus.py` from the repo you cloned, and drop the
`uv run --with '... @ .'` prefix.)

Expected runtime: 5–10 minutes per size on a typical laptop (ingest is
the dominant cost; scoring itself is sub-second).

Output table:

```
chunk  chars │ nDCG@10 │ MRR    │ Hit@5  │ p95      │ ingest
────────────┼─────────┼────────┼────────┼──────────┼────────
       1000 │ 0.8557  │ 0.8067 │ 0.92   │ 30 ms    │ 592 s
       1500 │ 0.8529  │ 0.7967 │ 0.92   │  7 ms    │ 234 s
       2000 │ 0.8702  │ 0.7933 │ 0.92   │  7 ms    │ 210 s   ← peak
       2500 │ 0.8602  │ 0.7880 │ 0.92   │  7 ms    │ 195 s
       3000 │ 0.8459  │ 0.7880 │ 0.92   │  7 ms    │ 182 s
```

Pick the chunk size with the best nDCG@10 that doesn't blow your
ingest-time budget. Persist it:

```bash
# In your shell profile / .env
export GNOSIS_MCP_CHUNK_SIZE=2000
```

If multiple sizes tie at the top (plateau), pick the **larger** one —
same quality, fewer chunks, faster ingest.

---

## Full sweep (`full`)

Adds hybrid mode and optional reranker A/B — ~30-40 minutes depending
on corpus size and whether `[reranking]` is installed.

```bash
for size in 1500 2000 3000; do
  for mode in keyword hybrid hybrid+rerank; do
    uv run --with 'gnosis-mcp[embeddings,reranking] @ gnosis-mcp' \
      python tests/bench/bench_real_corpus.py \
        --corpus "$CORPUS" --golden "$GOLDEN" \
        --modes "$mode" --chunk-size $size \
        --out "bench-results/tune-${mode//+/_}-${size}.json"
  done
done
```

### What to read from the output

1. **nDCG delta between keyword and hybrid**: if < 0.02, your corpus is
   vocabulary-matched (queries share exact terms with docs). Dense
   retrieval adds latency without lift — leave hybrid off.

2. **nDCG delta when reranker is added**: on dev docs specifically, we
   measure -27 nDCG (!). If you see a similar drop, your corpus has
   the same distribution problem — disable
   `GNOSIS_MCP_RERANK_ENABLED`. If it's +3-10 on your corpus, keep it
   on.

3. **Chunk-size peak location**: tells you the typical topic-coherent
   block length in your corpus. API-reference-heavy corpora peak
   lower (1000-1500 chars). Long-form prose peaks higher (2500-3000).

---

## Programmatic golden-set from existing access logs

If you've been using gnosis-mcp for a while, real queries are already
logged. Generate a seed golden file from them:

```bash
sqlite3 ~/.local/share/gnosis-mcp/docs.db \
  "SELECT DISTINCT query FROM search_access_log
   WHERE timestamp > date('now', '-30 days')
   ORDER BY count(*) DESC LIMIT 30;" \
  | awk '{print "{\"query\":\""$0"\", \"expected_paths\": []}"}' > golden-seed.jsonl
```

Then fill in `expected_paths` by hand — 5 minutes of work, gives you a
golden set grounded in how you actually use the system.

---

## Automate the winner

After picking a chunk size and reranker/hybrid choice, persist it so
every `gnosis-mcp ingest` (manual or via `--watch`) uses the tuned
settings:

```bash
# Persistent across processes
export GNOSIS_MCP_CHUNK_SIZE=2000
export GNOSIS_MCP_RERANK_ENABLED=false  # if tune showed it hurts

# Re-ingest with the new config
gnosis-mcp ingest ./docs --embed --wipe
```

---

## When to re-tune

- Corpus grows 2× or more
- You add a fundamentally different content type (e.g., API references
  added to a prose-only corpus, or vice versa)
- You switch embedder (`GNOSIS_MCP_EMBED_MODEL`) — the semantic side
  of hybrid now behaves differently
- After a new release of gnosis-mcp that touched chunking or fusion

Otherwise: tune once per year. It's not free but it's not frequent.

---

## See also

- [bench-experiments-2026-04-18](https://gnosismcp.com/doc/docs/bench-experiments-2026-04-18)
  — our own tune results with the methodology that informed these
  defaults
- [how-we-measure-search](https://gnosismcp.com/doc/docs/how-we-measure-search)
  — what these metrics mean, in plain English
- `/gnosis:ingest` — re-ingest after you've picked the winner