# Performance & Cost

All numbers below are **measured**, not estimated, using `tiktoken` (cl100k_base)
for token counts and `tools/discovery_tax.py` for the discovery model. Reproduce
any of them with the commands shown.

## Extraction (real conversions)

Measured with `pdftotext` (PDF) and `ebooklib` (EPUB):

| Book | Format | Pages | Tokens | Chapters auto-detected |
|------|--------|------:|-------:|-----------------------:|
| Think Python 2 | PDF | 244 | 119K | 19 |
| Working Backwards | PDF | 371 | 175K | 10 |
| Pro Git | PDF | 501 | 229K | — † |
| Moby-Dick | EPUB | — | 301K | 133 |

† Pro Git heads chapters with section titles (no `Chapter N`), so it does not
auto-segment. Moby-Dick's bodies use bare titles, but its Roman-numeral table of
contents is detected (133) — see *Known limitations* in the README.

**Extraction method matters for technical books.** On a 103-page technical PDF:

| Method | Time | Tables | Code blocks |
|--------|-----:|-------:|------------:|
| pdftotext | 0.1s | 0 | 0 |
| Docling (technical mode) | 164s | 48 | 36 |

pdftotext is instant but flattens structure; Docling is ~1.5s/page but preserves
tables and code as markdown. Pick text mode for prose, technical mode for code/tables.

## The Discovery Loop Tax

Tokens entering context to answer **one** targeted question. book-to-skill loads a
resident core (~4K) plus one compiled chapter (~1K) ≈ **5,000 tokens**.

| Book (chapter size) | Context-dump | Discovery loop | book-to-skill | vs dump / loop |
|---------------------|-------------:|---------------:|--------------:|:--------------:|
| Think Python 2 (small) | 119,264 | 12,152 | ~5,000 | 24× / 2.4× |
| Working Backwards (medium) | 175,253 | 33,444 | ~5,000 | 35× / 6.7× |
| AI Engineering (large) | 256,287 | 77,866 | ~5,000 | 51× / 15.6× |

```bash
python3 tools/discovery_tax.py --full-text /tmp/book_skill_work/full_text.txt --target-chapter 5
```

- The **context-dump** advantage (24–51×) is the strongest claim: that cost recurs on
  *every conversation turn*.
- The **discovery-loop** advantage (2.4–15.6×) is a one-time cost and a model using
  the book's real ToC/chapter sizes; it scales with chapter size.

## Generation cost

One-pass full conversion, estimated from measured tokens (Claude Sonnet 4.5,
\$3 / \$15 per MTok input/output):

| Book | Input | Output | ~Cost |
|------|------:|-------:|------:|
| Think Python 2 | 155K | 28K | \$0.88 |
| Working Backwards | 228K | 19K | \$0.96 |
| Pro Git | 298K | 23K | \$1.23 |
| Moby-Dick | 391K | 17K | \$1.42 |

Roughly **\$1 per book** for a full skill — paid once. Re-reading the same PDF into
context every session costs far more over time (see the Discovery Loop Tax above).

## Generated-skill output quality

A before/after of the adaptive-depth change (`v1.0.0`, #20) on one chapter:

| Artifact | Old spec | New spec |
|----------|---------:|---------:|
| Chapter file (tokens) | 473 | 1,219 |
| Worked example present | no | yes |
| Cheatsheet decision rules | 0 | 32 |
| Cheatsheet keyword/definition lines | 9 | 0 |

The new spec turns the cheatsheet from a glossary into a decision layer and gives
study-depth chapters a reproduced worked example.