---
title: Claude Token Optimization — Analysis & Action Plan
date: 2026-04-02
sources:
  - "https://www.youtube.com/watch?v=5ztI_dbj6ek — Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit (Nate B Jones)"
  - "https://www.youtube.com/watch?v=49V-5Ock8LU — 18 Claude Code Token Hacks in 18 Minutes (Nate Herk)"
transcripts: analysis/Transcripts/combined-token-usage-videos-20260402.md
---

# Claude Token Optimization — Analysis & Action Plan

## Source Videos

| # | Title | Creator | Tips Extracted |
|---|-------|---------|---------------|
| V1 | Your Claude Limit Burns In 90 Minutes Because Of One ChatGPT Habit | Nate B Jones | 18 |
| V2 | 18 Claude Code Token Hacks in 18 Minutes | Nate Herk | 23 |
| | **Total** | | **41** |

---

## Part 1: Combined Tip List

### Category: CLAUDE.md / System Prompt

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| SP-1 | V1-9 | Prune system prompt regularly — especially instructions written for older, less capable models | HIGH |
| SP-2 | V1-10 | Don't load entire repo into context if you haven't tested whether it's still necessary | HIGH |
| SP-3 | V1-12 | For API builders: enable prompt caching for stable context (system prompts, tool definitions, reference docs) | HIGH |
| SP-4 | V2-12 | Keep CLAUDE.md under 200 lines — treat it as an index, not a document. Point to files by path rather than embedding content. | HIGH |
| SP-5 | V2-21 | Use CLAUDE.md as a "systems constitution" — store stable architecture decisions, not conversations | HIGH |
| SP-6 | V2-22 | Add token-aware routing rules to CLAUDE.md: use Haiku sub-agents for multi-file exploration | MEDIUM |
| SP-7 | V2-23 | Add an "applied learning" section to CLAUDE.md that self-updates with one-line bullets on repeated failures (watch for bloat) | LOW-MEDIUM |

### Category: Conversation Discipline

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| CD-1 | V1-3 | Start fresh sessions every 10–15 turns — every turn re-reads the entire history | HIGH |
| CD-2 | V1-6 | If exploratory, declare intent at top and summarize before switching to execution | HIGH |
| CD-3 | V2-1 | Use /clear between unrelated tasks — don't carry context from topic A into topic B | HIGH |
| CD-4 | V2-5 | Edit original message and regenerate instead of sending follow-up corrections | HIGH |
| CD-5 | V2-14 | Run /compact manually at ~60% context capacity, not at the default 95% auto-compact threshold | HIGH |
| CD-6 | V2-15 | Compact or clear before stepping away — prompt cache expires after 5 minutes, causing full re-read on return | MEDIUM |

### Category: File & Input Hygiene

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| FH-1 | V1-1 | Convert documents to Markdown before feeding them to Claude (can reduce 100k+ tokens → 4-6k for a PDF) | HIGH |
| FH-2 | V1-2 | Avoid screenshots when text will do — screenshots are "terribly inefficient" | HIGH |
| FH-3 | V2-10 | Paste only the relevant section, not the whole file | HIGH |
| FH-4 | V2-13 | Name specific files and functions in prompts — don't say "check the repo," say "check verify_user() in auth.js" | HIGH |

### Category: Workflow Design

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| WF-1 | V1-4 | Separate exploration/thinking sessions from execution sessions — never mix the two | HIGH |
| WF-2 | V1-5 | Front-load your intent so the model can act in a single pass without clarification turns | HIGH |
| WF-3 | V1-11 | Match model tier to task — Opus for reasoning, Sonnet for execution, Haiku for polish | MEDIUM |
| WF-4 | V1-17 | Instrument every agent call: track input tokens, output tokens, model mix, cost ratio | HIGH (agent builders) |
| WF-5 | V2-4 | Batch multi-step instructions into a single prompt message | MEDIUM |
| WF-6 | V2-6 | Use plan mode before any real task to prevent wrong-path token waste | HIGH |
| WF-7 | V2-7 | Run /context and /cost to make invisible token consumption visible | HIGH |
| WF-8 | V2-8 | Set up a terminal status line to see real-time context usage as a progress bar | MEDIUM |
| WF-9 | V2-9 | Keep usage dashboard open; check every 20-40 minutes to pace yourself | LOW |
| WF-10 | V2-11 | Watch Claude work in real time and interrupt if it goes off track | HIGH |
| WF-11 | V2-17 | Pick right model per job: Sonnet for coding, Haiku for sub-agents/simple tasks, Opus sparingly (<20% of usage) | MEDIUM |
| WF-12 | V2-18 | Use sub-agents deliberately — delegate one-off tasks to Haiku; avoid multi-agent teams unless necessary | HIGH |
| WF-13 | V2-19 | Schedule heavy sessions and multi-agent runs for off-peak hours (afternoons, evenings, weekends) | LOW |
| WF-14 | V2-20 | Go heavy when near a reset and budget remains; step away when near limit with time remaining | MEDIUM |

### Category: Tool Use

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| TU-1 | V1-7 | Audit and prune plugins/connectors — each injects tokens into every turn whether used or not | HIGH |
| TU-2 | V1-8 | Run /context before typing — see what's loaded at session start | HIGH |
| TU-3 | V1-13 | Route web searches through Perplexity (via MCP) rather than native Claude browsing — saves 10-50k tokens/search | HIGH |
| TU-4 | V2-2 | Disconnect MCP servers not actively in use | HIGH |
| TU-5 | V2-3 | Prefer CLIs over MCP servers when a CLI equivalent exists | MEDIUM |
| TU-6 | V2-16 | Deny permissions for shell commands that produce large outputs Claude doesn't need | HIGH |

### Category: Agent Memory / Retrieval

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| AM-1 | V1-14 | Use indexed retrieval — never dump full document sets into the context window on every call | HIGH |
| AM-2 | V1-15 | Pre-process and pre-summarize reference documents before they enter context | HIGH |
| AM-3 | V1-16 | Scope each agent's context to the minimum it needs for its specific role | HIGH |

### Category: Infrastructure

| # | Source | Tip | Confidence |
|---|--------|-----|-----------|
| IN-1 | V1-18 | Build guardrails infrastructure: auto-Markdown conversion, index-first retrieval, minimum-viable context scoping | HIGH (teams) |

---

## Part 2: Adversarial Debate — Cluster Verdicts

### Cluster 1: Conversation Hygiene
**Verdict: HIGH confidence** — compaction and fresh-start discipline are correct as general principles. The 60% manual compact rule is actionable; the 10-15 turn rule is too coarse. Context entropy, not a counter, is the real signal. The "edit original and regenerate" tip (CD-4) is underused and eliminates a correction turn entirely.

### Cluster 2: File & Input Hygiene
**Verdict: HIGH confidence** — strongest and cheapest wins in the list. FH-3 (paste only relevant section) and FH-4 (name specific files) require no tooling. FH-1 (Markdown conversion) requires quality conversion tooling — naive conversion can be worse than the original. FH-2 has legitimate exceptions for visual/UI problems.

### Cluster 3: Tool & Plugin Discipline
**Verdict: HIGH confidence in principle, MEDIUM for tactics** — auditing tool lists is clearly correct. MCP-vs-CLI preference is useful but not absolute. TU-6 (deny large-output shell commands) is high-ROI and underappreciated. The cognitive overhead of frequent MCP lifecycle management is a real cost — set up defaults, not per-session toggling.

### Cluster 4: System Prompt & CLAUDE.md
**Verdict: HIGH confidence for pruning (SP-1, SP-4); MEDIUM for index pattern (SP-5); LOW-MEDIUM for self-updating applied learning section (SP-7)**. The 200-line ceiling is a proxy metric, not a mechanistic limit — optimize for signal density, not line count. The applied learning section (SP-7) introduces a feedback loop with no clear governance — monitor aggressively for drift.

### Cluster 5: Workflow Design
**Verdict: HIGH for WF-6 (plan mode) and WF-10 (watch and interrupt)**. These have clear, measurable payoffs. MEDIUM for explore/execute separation (WF-1) — correct as principle, impractical as strict rule. Batching (WF-5) is better for experienced users with well-formed prompts.

### Cluster 6: Model Tier Routing
**Verdict: HIGH confidence in principle; MEDIUM for specific assignments** — tier labels are snapshots that change as models improve. The sub-agent Haiku pattern (WF-12) is the most concrete application. Calibrate routing based on observed error rate per model per task, not generic tier labels.

### Cluster 7: Visibility & Instrumentation
**Verdict: HIGH for /context + /cost (TU-2, WF-7); MEDIUM for status line (WF-8); LOW for dashboard every 20-40 minutes (WF-9)**. Dashboard-watching is reactive and breaks flow — ambient monitoring (status line) is strictly better.

### Cluster 8: Retrieval & Memory Architecture
**Verdict: HIGH confidence, LOW daily relevance for individual Claude Code users** — these are production agentic system tips. Correct and important for building PPA/PhoneBuddy pipelines, not for daily conversational use.

### Cluster 9: Usage Pacing
**Verdict: MEDIUM for WF-20 (session reset strategy); LOW for WF-13 (off-peak scheduling)** — the off-peak performance claim is undocumented by Anthropic. Rate limit behavior is plan-dependent, not load-based.

### Cluster 10: Prompt Caching & Infrastructure
**Verdict: HIGH confidence, LOW relevance for Claude Code users; HIGH relevance for API builders** — prompt caching is an API feature. Irrelevant to conversational Claude Code use. Critical for ppa-api Phase 2.

---

## Part 3: Master Top 5 Tips (Highest Daily ROI)

| Rank | Tip | Why |
|------|-----|-----|
| **#1** | CD-5: /compact at 60%, not 95% | Purely mechanical, zero setup, immediately measurable. Better context quality per session. |
| **#2** | FH-3: Paste only the relevant section | Highest-leverage behavior with lowest implementation cost. 98% token reduction on file inputs. |
| **#3** | WF-6: Plan mode before any real task | Eliminates the most expensive failure mode — executing confidently in the wrong direction. |
| **#4** | WF-10: Watch and interrupt in real time | Catching a wrong path at turn 3 vs turn 15 is a 3-7x token multiplier difference. No setup required. |
| **#5** | SP-1 + SP-4: Prune system prompt / Keep CLAUDE.md lean | Compounds permanently across every future session. Highest long-term ROI of any single action. |

**Honorable mention:** CD-4 (edit original and regenerate) — eliminates correction turns entirely when caught early enough.

---

## Part 4: Application to This Codebase

### Current Token Budget at Session Start (RoadTrip workspace)

| Source | ~Tokens | Action |
|--------|---------|--------|
| Deferred tools | 10.1k | Unavoidable framework overhead |
| RoadTrip CLAUDE.md | 2.5k | **Reduce: 268 lines → target 170** |
| MEMORY.md (auto-mem) | 2.7k | **Prune stale implementation-status entries** |
| System tools | 7.9k | Unavoidable |
| Skills manifest | 1k | Acceptable |
| MCP (Railway) | 2.8k | **Disconnect during non-Railway sessions** |
| **Total** | **~27k** | **Target: ~22k** (save ~5k per session) |

---

### RoadTrip (research/planning workspace) — Biggest Opportunity

**CLAUDE.md: 268 lines → target 170 lines**

Bloat candidates:

| Section | ~Lines | Why |
|---------|--------|-----|
| Documentation Style Guide (emojis, tone, example pattern) | ~30 | Aesthetic preference. Not correctness-critical. Move to a reference file. |
| Quick Reference + Common Scenarios examples | ~25 | Redundant with the alias descriptions above them |
| "Shell Integration Ready" note | ~5 | Informational, not instructional |
| PowerShell alias full usage examples (gpush, gpush-dry, etc.) | ~40 | Point to `infra/RoadTrip_profile.ps1` instead of documenting inline |
| Plan Validation Process 5-step | ~15 | Better as a SKILL.md entry or linked document than always-on context |
| Session Logging alias list (log-help, log-start, log-end) | ~10 | Move to profile.ps1 comments |

**MEMORY.md: prune stale entries**

Entries that are code-derivable (not worth loading every session):
- Implementation status bullets (rules-engine DONE, auth-validator DONE, etc.) — read the code
- Python environment paths — run `which python`
- "Next session" goal entries that are past their session

**Action:** Move implementation status tracking out of MEMORY.md and into a `docs/status.md` that's not auto-loaded.

**MCP servers:** Railway MCP is 2.8k tokens injected every turn. Only connect when doing Railway work. For research-only sessions, disconnect at session start.

---

### PhoneBuddy (production) — Medium Opportunity

**CLAUDE.md: 100 lines — acceptable but trimmable**

| Section | ~Lines | Recommendation |
|---------|--------|---------------|
| File Structure tree | ~15 | Remove — derivable from `ls`. Just say "flat structure, no src/ nesting" |
| Docker + Azure Container Apps deploy sections | ~15 | Rarely needed. One line: "see Dockerfile for non-Railway deploy options" |
| Local Dev block | ~10 | README territory. Remove from CLAUDE.md. |

Estimated savings: 25-30 lines → from 100 to ~70.

**Production system prompt (highest-ROI change):**
PhoneBuddy uses Claude Haiku for call classification. The classification prompt in `main.py` was likely written for an older model version. Haiku's current capabilities mean:
- Multi-shot examples can often be reduced or removed
- Explicit "think step by step" scaffolding is less necessary
- Intent classification instructions can be shorter
- **Action:** Audit the classification prompt in `main.py`. Test with a 30% shorter prompt — if quality holds, the per-call cost drops proportionally. In production, this compounds across every call received.

---

### ppa-api (production) — Minimal Immediate Opportunity, High Future Impact

**CLAUDE.md: 54 lines — already lean. Leave as is.**

**Phase 2 preparation: enable prompt caching**

When Phase 2 adds SRCGEEE pipeline calls, the following should be cached at the API layer:
- System prompt / pipeline role definition
- Tool definitions
- SRCGEEE phase descriptions
- Static reference context (routing rules, agent constitution)

Per Nate B Jones: "Prompt caching can give you a 90% discount on repeated content... $0.50/M vs $5/M for Opus standard." For a pipeline that runs many times per day, this is the single highest-ROI architectural decision.

**Indexed retrieval for Phase 2:**
When Phase 2 adds Retrieve phase, implement indexed retrieval (AM-1) — never dump full history into each call. Retrieve only the relevant caller history chunks.

---

## Part 5: Immediate Action Items

Priority ordered by ROI × ease:

| # | Action | Where | Impact |
|---|--------|--------|--------|
| 1 | Add `/compact at 60%` habit to personal workflow | Personal habit | Immediate |
| 2 | Trim RoadTrip CLAUDE.md from 268 → ~170 lines | `RoadTrip/CLAUDE.md` | Every session |
| 3 | Prune MEMORY.md — remove implementation-status entries | `memory/MEMORY.md` | Every session |
| 4 | Disconnect Railway MCP during non-Railway sessions | MCP settings | Per session |
| 5 | Audit PhoneBuddy classification prompt in `main.py` | `PhoneBuddy/main.py` | Per API call |
| 6 | Add prompt caching to ppa-api Phase 2 plan | Phase 2 design | Future |
| 7 | Trim PhoneBuddy CLAUDE.md from 100 → ~70 lines | `PhoneBuddy/CLAUDE.md` | Every session |
| 8 | Move Plan Validation Process from CLAUDE.md to SKILL.md | `RoadTrip/CLAUDE.md` | Every session |

---

## Appendix: Key Quotes

> "Every time you take a turn in a conversation, you read it as sending one line back. But Claude reads it as sending the entire conversation back." — Nate B Jones

> "Keep [CLAUDE.md] under 200 lines. Treat this like an index route to where more data lives." — Nate Herk

> "Passing everything to every agent is architectural laziness and it has real costs both in tokens burned and frankly in degraded agent performance." — Nate B Jones

> "Most people don't need a bigger plan — they need to stop resending their entire conversation history 30 times. It's not a limits problem. It's a context hygiene problem." — Nate Herk

> "If your system prompt, your tool definitions, your reference documents aren't cached, what are you doing?" — Nate B Jones