# Paper 3: Context Enrichment Hypothesis in Agentic Tool Pipelines

**Status:** draft
**Target venue:** EMNLP 2026 (short paper) / findings track
**Authors:** Andrei Mazniak

---

## Problem

When an LLM agent receives a list of issues or merge requests with thin per-item context
(few characters per item), it makes additional follow-up tool calls to enrich that context:
fetching comments, linked issues, epic details, etc. This enrichment overhead is implicit,
uncontrolled, and contributes to token cost and latency.

We call this the **Context Enrichment Hypothesis**:

> Items with low chars_per_item trigger more enrichment tool calls than items with high
> chars_per_item, because the agent needs more information before it can act.

## Empirical Evidence

Analysis of 523 real Claude Code sessions with 10,644 MCP tool responses:

**get_issues** (87 records with known item count):

| chars/item bucket | N  | E[enrichment] | % turns with enrichment |
|-------------------|----|---------------|-------------------------|
| tiny < 200        | 7  | 0.43          | **43%**                 |
| med 500–1.5k      | 21 | 0.29          | 14%                     |
| large 1.5k–4k     | 55 | 0.02          | **2%**                  |

Pearson r(chars_per_item, enrichment_count) = **−0.280**

Direction confirmed: thin context → more enrichment.

**get_merge_requests** (196 records):

| chars/item bucket | N   | E[enrichment] | top enrichment tool                           |
|-------------------|-----|---------------|-----------------------------------------------|
| tiny < 200        | 53  | 1.74          | get_merge_request_discussions (127% of turns) |
| small 200–500     | 143 | 1.10          | same                                          |

Agents almost always drill into discussions after seeing MR lists, regardless of context
size, which suggests that `get_merge_request_discussions` should be included in the primary
response.
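The per-bucket statistics and the correlation reported above can be computed along the following lines. This is a minimal sketch, not the `track-claude-usage` implementation: the `Record` fields, the helper names, and the toy values are assumptions made for illustration and are not drawn from the dataset.

```python
from dataclasses import dataclass
from math import sqrt


# Hypothetical record shape; field names are assumptions, not the real schema.
@dataclass
class Record:
    total_chars: int       # characters in the primary tool response
    items_shown: int       # number of list items in that response
    enrichment_count: int  # enrichment calls made in the same turn


def chars_per_item(r: Record) -> float:
    return r.total_chars / r.items_shown


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(records: list[Record]) -> dict:
    """Mean enrichment count and share of turns with at least one enrichment call."""
    counts = [r.enrichment_count for r in records]
    n = len(counts)
    return {
        "n": n,
        "mean_enrichment": sum(counts) / n,
        "p_enrichment": sum(c >= 1 for c in counts) / n,
    }


# Toy data for illustration only; NOT taken from the paper's dataset.
toy = [
    Record(total_chars=900, items_shown=6, enrichment_count=2),
    Record(total_chars=1000, items_shown=5, enrichment_count=1),
    Record(total_chars=4000, items_shown=4, enrichment_count=0),
    Record(total_chars=9000, items_shown=3, enrichment_count=0),
]

r = pearson_r(
    [chars_per_item(x) for x in toy],
    [float(x.enrichment_count) for x in toy],
)
stats = summarize(toy)
```

On the toy records `r` is negative, matching the direction the hypothesis predicts. With samples as small as the tiny bucket (N = 7), a rank-based measure such as Spearman ρ may be worth reporting alongside Pearson r.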
## Formal Definition

```
chars_per_item(response) = total_chars(response) / items_shown(response)

enrichment_call = a tool call in the same turn as the primary call,
                  where tool ∈ ENRICHMENT_TOOLS[primary_tool]

E[enrichment] = mean enrichment_count across all invocations of primary_tool
p(enrichment) = fraction of invocations with ≥ 1 enrichment call
```

Tool-specific enrichment sets:

- `get_issues` → {`get_issue`, `get_issue_comments`, `get_issue_relations`, `get_epics`}
- `get_merge_requests` → {`get_merge_request_discussions`, `get_merge_request_diffs`}
- `get_meeting_notes` → {`get_meeting_transcript`, `search_meeting_notes`}

## Intervention: Preemptive Enrichment

If the hypothesis holds, the optimal server behavior is:

1. Detect thin context (chars_per_item < threshold)
2. Automatically inline enrichment data in the primary response
3. Measure reduction in E[enrichment_calls] with vs without preemptive enrichment

This connects to Paper 2: the MCKP encoder can treat enrichment fields as low-cost nodes
that get included when the budget allows and the primary items are thin.

## Experiments

1. **Hypothesis validation**: larger dataset with more sessions.
   Target: 500+ records per tool, stratified by chars/item.
   Metric: Pearson r, Spearman ρ, Mann-Whitney U (thin vs rich groups).

2. **Causal study**: ablation using τ-bench.
   - Condition A: thin initial response (no descriptions)
   - Condition B: rich initial response (full descriptions)
   - Measure: number of enrichment tool calls to complete the task.

3. **Preemptive enrichment**: modify devboy MCP to inline comments when chars_per_item < 300.
   Compare E[enrichment_calls] before and after.

4. **Cross-tool generalization**: test hypothesis on meeting tools:
   `get_meeting_notes` (thin) → `get_meeting_transcript` follow-up rate
   vs `get_meeting_notes` (rich).

## Key Claims

1. Context Enrichment Hypothesis holds: Pearson r < −0.25 for `get_issues` (confirmed empirically)
2.
   `get_merge_request_discussions` should be bundled with `get_merge_requests` (127% co-occurrence)
3. Preemptive enrichment when chars_per_item < 300 reduces E[enrichment_calls] by ≥ 30%

## Implementation Status

- [x] `context-enrichment` CLI command in track-claude-usage
- [x] Tool-specific enrichment sets
- [x] Pearson r computation
- [x] Markdown table item count parser
- [ ] Larger dataset collection (need 500+ records per tool)
- [ ] Causal study design on τ-bench
- [ ] Preemptive enrichment implementation in devboy-tools

## Data Collection

`track-claude-usage context-enrichment --tool get_issues --format csv > enrichment_data.csv`

Current dataset: 87 records for `get_issues`, 196 for `get_merge_requests`.
Target for submission: ≥ 500 per tool (needs ~3 months of additional sessions).

## Related Work

- Retrieval-Augmented Generation (RAG): related in that both address "not enough context"
- Chain-of-thought tool use (ReAct, Toolformer): agents decide when to call tools
- API-Bank: evaluates API call accuracy, not enrichment patterns
- τ-bench: closest execution environment for causal study
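
Returning to the preemptive enrichment intervention (inline enrichment data when chars_per_item falls below 300), the server-side decision might look roughly like the sketch below. It is an illustration under stated assumptions, not the devboy-tools implementation: `preemptive_enrich` and the `fetch_enrichment` callback are hypothetical names, and items are assumed to be already rendered as strings.

```python
from typing import Callable

THIN_THRESHOLD = 300  # chars per item; the cut-off proposed for the experiment


def preemptive_enrich(
    rendered_items: list[str],
    fetch_enrichment: Callable[[int], str],
    threshold: float = THIN_THRESHOLD,
) -> list[str]:
    """Inline enrichment text when the response is thin.

    `fetch_enrichment` is a hypothetical callback returning extra text
    (for example, comments) for the item at the given index.
    """
    if not rendered_items:
        return rendered_items
    chars_per_item = sum(len(s) for s in rendered_items) / len(rendered_items)
    if chars_per_item >= threshold:
        return rendered_items  # rich enough: leave the response unchanged
    # Thin response: append enrichment text to every item.
    return [
        item + "\n" + fetch_enrichment(i)
        for i, item in enumerate(rendered_items)
    ]


# Illustrative input only; the item text is made up.
thin = ["#1 Fix login", "#2 Update docs"]
enriched = preemptive_enrich(thin, lambda i: f"(comments for item {i})")
```

Applying the threshold to the average over the whole response mirrors the chars_per_item definition; a per-item check would be an alternative design if item lengths vary widely within one response.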