# Performance Report: Stream vs Chunked vs Full

This report summarizes measured performance across five configurations and four document sizes. Benchmarks were run on Node.js using synthetic paragraph-heavy content.

Treat these numbers as local harness data, not as a blanket claim that any one configuration is faster on every workload. The latest generated snapshot in `docs/perf-latest.json` records `benchmarkVersion`, `generatedAt`, Node version, platform, CPU, CPU count, and commit SHA so results can be reproduced or compared against later runs.

Default API note:
- Normal callers should keep using `md.parse(src)` / `md.render(src)`.
- Large finite strings can be handled by the default API via internal large-input optimizations on stock parser instances; plugin/custom-rule instances keep plain full-parse behavior unless chunking is explicitly enabled.
- Explicit chunk-stream APIs such as `parseIterable` / `UnboundedBuffer` are advanced tools for sources that already arrive as chunks; they are not required to benefit from the default large-text path.

Scenarios:
- S1: stream ON, cache OFF, chunk ON (stream + chunked, but reset cache each step)
- S2: stream ON, cache ON, chunk OFF (stream append fast-path only)
- S3: stream ON, cache ON, chunk ON (hybrid: chunked allowed and append fast-path)
- S4: stream OFF, chunk ON (full parse with chunked fallback)
- S5: stream OFF, chunk OFF (plain full parse)

Workloads measured per size:
- one-shot: single full parse of the entire document
- append workload: 1 initial parse + 5 append steps (growing to the target size)
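
The two workloads can be sketched as follows. This is a minimal illustration, not the code in `scripts/perf-matrix.mjs`: `runOneShot`, `runAppendWorkload`, the equal-slice split, and the injectable `now` clock are all assumptions made for the example, and `parse` stands in for whichever parser call is being timed.

```typescript
// Hypothetical harness sketch; `parse` is a placeholder for the call under test.
type Parse = (src: string) => unknown
type Clock = () => number

interface AppendRun {
  steps: number       // total parse calls: 1 initial + N appends
  finalLength: number // length of the document after the last append
  elapsedMs: number
}

// Date.now() keeps the sketch dependency-free; a real harness would pass a
// higher-resolution clock such as performance.now().
const defaultClock: Clock = () => Date.now()

// one-shot: a single full parse of the entire document
function runOneShot(parse: Parse, doc: string, now: Clock = defaultClock): number {
  const start = now()
  parse(doc)
  return now() - start
}

// append workload: parse the first slice, then re-parse after each append
// until the document reaches its target size.
function runAppendWorkload(
  parse: Parse,
  doc: string,
  appends = 5,
  now: Clock = defaultClock,
): AppendRun {
  const parts = appends + 1
  const sliceLen = Math.ceil(doc.length / parts)
  let current = ''
  let steps = 0
  const start = now()
  for (let i = 0; i < parts; i++) {
    current += doc.slice(i * sliceLen, (i + 1) * sliceLen)
    parse(current)
    steps++
  }
  return { steps, finalLength: current.length, elapsedMs: now() - start }
}
```

With the default of 5 appends, a 6,000-character document is parsed six times at lengths 1,000 through 6,000.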

Raw results (ms):

- size=5k chars
  - one-shot best: S5 0.65ms
  - append best: S3 0.79ms (S2: 1.28ms)
- size=20k chars
  - one-shot best: S5 0.94ms
  - append best: S2 2.03ms (S3: 2.58ms)
- size=50k chars
  - one-shot best: S5 2.72ms
  - append best: S3 2.57ms (S2: 2.81ms)
- size=100k chars
  - one-shot best: S5 5.25ms
  - append best: S3 6.57ms (S2: 6.91ms)

Append fast-path confirmation: With a stable env object, `appendHits` reached 5 (one per append) for S2/S3 across sizes.

## Conclusions

- One-shot parsing (no appends):
  - For the tested content, plain full parse (S5) was consistently fastest from 5k to 100k chars.
  - Chunked (S4) did not outperform full parse on these inputs. It may help on extremely large or fence/blank-line-heavy documents; tune thresholds if you enable it.

- Append-heavy editing (growing documents):
  - Stream with cache (S2/S3) clearly outperforms non-stream at medium and large sizes.
  - Hybrid (S3) was slightly faster than stream-only (S2) for larger docs (≥ 50k): 2.57ms vs 2.81ms at 50k and 6.57ms vs 6.91ms at 100k, primarily because it can choose chunked parsing on the initial parse when beneficial.
  - For smaller docs (~5k–20k) the ordering was mixed: S3 led at 5k (0.79ms vs 1.28ms), while S2 led at 20k (2.03ms vs 2.58ms). The result depends on thresholds, but both beat non-stream.
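
To put the S2 vs S3 gaps in relative terms, the append timings from the raw results list can be turned into percentages. The helper name `s3AdvantagePct` is made up for this sketch; the numbers are copied from the list above and nothing else is assumed.

```typescript
// Append-workload timings in ms, copied from the "Raw results" list.
const appendMs = [
  { size: '5k', s2: 1.28, s3: 0.79 },
  { size: '20k', s2: 2.03, s3: 2.58 },
  { size: '50k', s2: 2.81, s3: 2.57 },
  { size: '100k', s2: 6.91, s3: 6.57 },
]

// Positive: hybrid (S3) was faster than stream-only (S2) by that share of S2's time.
// Negative: S2 was faster.
function s3AdvantagePct(s2: number, s3: number): number {
  return ((s2 - s3) / s2) * 100
}

for (const row of appendMs) {
  console.log(`${row.size}: ${s3AdvantagePct(row.s2, row.s3).toFixed(1)}%`)
}
```

This works out to roughly +38% at 5k, -27% at 20k, +9% at 50k, and +5% at 100k, so the only size where stream-only led in this run was 20k.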

## Recommendations

- If you parse once (one-shot):
  - Default to full parse (S5). Enable full-chunked fallback only after testing on your workload; consider starting thresholds at ~20k chars/400 lines.

- If you support live editing with appends:
  - Enable stream mode with cache (S2): `stream: true` and leave `streamChunkedFallback: false`.
  - If initial parses are often large (tens of thousands of characters or more), enable hybrid (S3): `streamChunkedFallback: true` with chunk size ~10k chars/200 lines.

- Threshold tuning:
  - Start with `streamChunkSizeChars ≈ 10k`, `streamChunkSizeLines ≈ 200`.
  - For full parse chunked fallback, start with `fullChunkThresholdChars ≈ 20k`, `fullChunkThresholdLines ≈ 400`, and chunk size `8k–16k` chars, `150–250` lines.
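
The recommendations above can be collected into starting presets. The option names are the ones used in this report, but the `PerfOptions` interface, the `pickPreset` helper, and the 20,000-character cutoff for "large initial parse" are assumptions made for illustration, not the library's published types or defaults. The flag that switches on the full-parse chunked fallback is not named in this report, so it is left out.

```typescript
// Illustrative shape only: option names come from this report, the interface does not.
interface PerfOptions {
  stream?: boolean
  streamChunkedFallback?: boolean
  streamChunkSizeChars?: number
  streamChunkSizeLines?: number
  fullChunkThresholdChars?: number
  fullChunkThresholdLines?: number
}

// S2: stream append fast-path only
const streamOnly: PerfOptions = { stream: true, streamChunkedFallback: false }

// S3: hybrid, with the suggested starting chunk sizes (~10k chars / 200 lines)
const hybrid: PerfOptions = {
  stream: true,
  streamChunkedFallback: true,
  streamChunkSizeChars: 10_000,
  streamChunkSizeLines: 200,
}

// Starting thresholds to try if you test the full-parse chunked fallback
const fullChunkedStart: PerfOptions = {
  stream: false,
  fullChunkThresholdChars: 20_000,
  fullChunkThresholdLines: 400,
}

// Assumed cutoff for "initial parses are often large"; tune on your own data.
const LARGE_INITIAL_CHARS = 20_000

function pickPreset(workload: 'one-shot' | 'append', typicalInitialChars = 0): PerfOptions {
  if (workload === 'one-shot') return { stream: false } // plain full parse (S5)
  return typicalInitialChars >= LARGE_INITIAL_CHARS ? hybrid : streamOnly
}
```

`fullChunkedStart` is deliberately not returned by `pickPreset`, since the report suggests enabling that fallback only after testing on your workload.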

## Adaptive chunk sizing (default)

Both full and stream chunked fallbacks choose chunk sizes adaptively by default:

- Target around 8 chunks (`fullChunkTargetChunks` / `streamChunkTargetChunks`), clamped to a practical range.
- Effective sizes: `ceil(docChars / target)` clamped to `[8k, 32k]` for characters and `[150, 350]` for lines.
- Disable with `fullChunkAdaptive: false` or `streamChunkAdaptive: false` and pass fixed `*SizeChars/*SizeLines`.
- Optionally cap the number of chunks with `fullChunkMaxChunks`.

Notes:
- Adaptive sizing reduces the chance of over-chunking at ~100k+ where orchestration cost can dominate.
- Even with adaptive sizing, one-shot full parse (S5) can remain faster for some inputs. Validate on your data.
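
The sizing rule described above can be written out as a small function. This is a sketch of the stated formula, not the library's implementation: it assumes "8k" and "32k" mean 8,000 and 32,000 characters, that the line-based size uses the same `ceil(docLines / target)` form, and it skips the clamping of the target chunk count itself. `adaptiveChunkSize` and `clamp` are names invented for the example.

```typescript
// Keep v within [lo, hi].
const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v))

// Effective chunk size for a document, aiming at `targetChunks` chunks (default 8).
function adaptiveChunkSize(docChars: number, docLines: number, targetChunks = 8) {
  return {
    // ceil(docChars / target), clamped to [8k, 32k] characters (assumed 8,000 and 32,000)
    chars: clamp(Math.ceil(docChars / targetChunks), 8_000, 32_000),
    // assumed analogous rule for lines, clamped to [150, 350]
    lines: clamp(Math.ceil(docLines / targetChunks), 150, 350),
  }
}

console.log(adaptiveChunkSize(100_000, 2_000))    // { chars: 12500, lines: 250 }
console.log(adaptiveChunkSize(20_000, 400))       // { chars: 8000, lines: 150 }
console.log(adaptiveChunkSize(1_000_000, 20_000)) // { chars: 32000, lines: 350 }
```

Under these assumptions a 100k-character document lands exactly on the 8-chunk target, while a 20k document hits the lower clamp and ends up with about 3 chunks, and a 1M document hits the upper clamp and ends up with more than 8.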

## How to reproduce

- Build and run the matrix:

```bash
npm run build
node scripts/perf-matrix.mjs
```

- Optional: Sweep non-stream chunked settings on your own content:

```bash
npm run build
node scripts/full-vs-chunked-sweep.mjs
```

These scripts print best-per-size summaries and can export JSON when `PERF_JSON=1` is set.

When publishing or comparing benchmark numbers, include:

- Node.js version
- CPU and OS/platform
- benchmark version
- commit SHA for this repository
- baseline package versions
- content generator or fixture source
- warmup/iteration settings from the harness
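
Several of these fields can be captured programmatically under Node.js. The sketch below is not the schema of `docs/perf-latest.json`: the `BenchEnv` field names, the `describeBenchEnv` helper, and the `GIT_COMMIT` environment variable are hypothetical and exist only to show one way of recording the environment next to your numbers.

```typescript
import * as os from 'node:os'

// Hypothetical record shape; field names are illustrative.
interface BenchEnv {
  node: string
  platform: string
  arch: string
  cpuModel: string
  cpuCount: number
  generatedAt: string
  commit: string
}

// Collect environment details; the commit SHA is passed in by the caller
// because reading it requires git or CI-provided variables.
function describeBenchEnv(commit = 'unknown'): BenchEnv {
  const cpus = os.cpus()
  return {
    node: process.version,
    platform: os.platform(),
    arch: os.arch(),
    cpuModel: cpus[0]?.model ?? 'unknown',
    cpuCount: cpus.length,
    generatedAt: new Date().toISOString(),
    commit,
  }
}

// GIT_COMMIT is an assumed variable name; substitute whatever your CI exposes.
console.log(JSON.stringify(describeBenchEnv(process.env.GIT_COMMIT), null, 2))
```

Benchmark version, baseline package versions, fixture source, and warmup/iteration settings are harness-specific and still need to be added by hand or by the harness itself.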

## Baseline: markdown-it (JS) example

For parity, we include the upstream markdown-it as a baseline in the matrix (scenario M1):

```ts
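import MarkdownIt from 'markdown-it'

// Default-preset instance with no plugins
const md = new MarkdownIt()
// parse() returns the token array; the second argument is the env object
const tokens = md.parse('# Title\n\nHello', {})
// render() parses and renders to an HTML string in one call
const html = md.render('# Title\n\nHello')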
```

See the latest auto-generated numbers in `docs/perf-latest.md`.

## Remark parse (parse-only)

We also include a Remark parser scenario (R1) to compare pure parse throughput. It exercises:

```ts
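import { unified } from 'unified'
import remarkParse from 'remark-parse'

// Parser-only processor: no transform or stringify plugins attached
const u = unified().use(remarkParse)
// parse() returns the mdast syntax tree without running transforms
const tree = u.parse('# Title\n\nHello')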
```

Notes:
- This measures parse only (no HTML render). It appears in the perf matrix as `R1` when `unified` and `remark-parse` are installed.
- Install deps once: `pnpm add -D unified remark-parse`.
- Run the matrix as usual: `npm run perf:matrix`.

## Regenerate the report in CI

You can refresh `docs/perf-latest.md` on demand via GitHub Actions:

- Go to your repository on GitHub → Actions → “Perf Report” → “Run workflow”.
- Optional inputs:
  - ref: branch/tag/SHA to run against (defaults to current branch)
  - node-version: Node.js version (default 20)
  - package-manager: pnpm or npm (default pnpm)

The workflow will install deps, run `perf:generate`, upload the files as an artifact, and commit/push `docs/perf-latest.md` and `docs/perf-latest.json` if they changed.

Chinese version (zh-CN):
- Run the “Perf Report (zh-CN)” workflow. It executes `perf:generate:zh` and updates `docs/perf-latest.zh-CN.md` in the same way.