# `jsongrep` Benchmarking Criterion benchmark suite comparing `jsongrep` against popular JSON query tools. Inspired by the [ripgrep benchmarking methodology](https://burntsushi.net/ripgrep/): equivalent work across tools, realistic queries, transparency about what's measured, and statistical rigor (via Criterion). ## Compared Tools | Tool | Crate(s) | JSON Type | Compile Step | Query Syntax | | ----------------- | ----------------------------------- | -------------------------- | ------------------------------------------ | ------------------------------- | | **jsongrep** | this crate | `serde_json_borrow::Value` | `Query::from_str` + `QueryDFA::from_query` | `foo.bar`, `**`, `[*]`, `[1:3]` | | **jsonpath-rust** | `jsonpath-rust` | `serde_json::Value` | None (inline) | `$.foo.bar`, `$..x`, `$[*]` | | **jmespath** | `jmespath` | `jmespath::Variable` | `jmespath::compile()` | `foo.bar`, `[*]`, `[0:3]` | | **jaq** | `jaq-core` + `jaq-std` + `jaq-json` | `jaq_json::Val` | Arena + Loader + Compiler | `.foo.bar`, `..`, `.[]` | | **jql** | `jql-parser` + `jql-runner` | `serde_json::Value` | `jql_parser::parser::parse()` | `"foo"."bar"`, `..`, `[0]` | ## Running ```sh cargo bench --bench query ``` Or via justfile: ```sh just bench ``` HTML reports are generated at `target/criterion/report/index.html`. To publish the Criterion HTML reports to `gh-pages`: ```sh just bench-publish ``` This uses `ghp-import` to push `target/criterion/` as an orphan branch. ## Benchmark Groups ### 1. `document_parse` — JSON parse time by format Isolates the cost of parsing raw JSON into each tool's in-memory representation. jsongrep's `serde_json_borrow::Value` is zero-copy while others allocate. Formats: `serde_json_borrow::Value`, `serde_json::Value`, `jmespath::Variable` ### 2. `query_compile` — Query compilation time Measures DFA construction (jsongrep), expression compilation (jmespath, jaq, jql). jsonpath-rust has no separate compile step — its cost appears entirely in `query_search`. ### 3. `query_search` — Search with pre-compiled queries + pre-parsed docs The core benchmark. Pre-compiles queries and pre-parses documents, measuring only the traversal/execution. ### 4. `end_to_end` — Full pipeline (parse + compile + search) Closest to real CLI usage. Nothing pre-cached. ## Test Data | Tier | File | Size | Loading | | ------ | -------------------------------------------------------- | ------- | ---------------- | | Small | `tests/data/simple/simple.json` | 106 B | `include_str!` | | Medium | `tests/data/schemastore/.../kubernetes-definitions.json` | ~992 KB | `include_str!` | | Large | `tests/data/schemastore/.../kestra-0.19.0.json` | ~7.6 MB | `include_str!` | | XLarge | `benches/data/citylots.json` | ~190 MB | disk (see below) | Small–Large are loaded via `include_str!` for reproducibility (no disk I/O variance). XLarge is loaded from disk at runtime and silently skipped if the file is absent. Download it with: ```sh just bench-download ``` ## Query Equivalence **Generic queries (all documents):** | Name | jsongrep | JSONPath | JMESPath | jq (jaq) | jql | | ---------------- | ------------ | -------------- | ------------ | ------------- | --------------- | | `simple_field` | `name` | `$.name` | `name` | `.name` | `"name"` | | `nested_path` | `name.first` | `$.name.first` | `name.first` | `.name.first` | `"name""first"` | | `array_index` | `hobbies[0]` | `$.hobbies[0]` | `hobbies[0]` | `.hobbies[0]` | `"hobbies"[0]` | | `wildcard_field` | `*` | `$.*` | `*` | `.[]` | N/A | | `array_wildcard` | `hobbies[*]` | `$.hobbies[*]` | `hobbies[*]` | `.hobbies[]` | N/A | **Schema-specific queries (medium/large documents):** | Name | jsongrep | JSONPath | JMESPath | jq (jaq) | jql | | ----------------- | --------------------------------- | ----------------------------------- | --------------------------------- | ---------------------------------- | ---- | | `recursive_field` | `(* \| [*])*.description` | `$..description` | N/A | `.. \| .description? // empty` | `..` | | `deep_nested` | `definitions.*.properties.*.type` | `$.definitions.*.properties.*.type` | `definitions.*.properties.*.type` | `.definitions[].properties[].type` | N/A | **GeoJSON-specific queries (xlarge only):** | Name | jsongrep | JSONPath | JMESPath | jq (jaq) | jql | | ------------------------ | --------------------------- | ----------------------------- | --------------------------- | ------------------------------ | ---- | | `geo_all_geometry_types` | `features[*].geometry.type` | `$.features[*].geometry.type` | `features[*].geometry.type` | `.features[].geometry.type` | N/A | | `geo_recursive_coords` | `(* \| [*])*.coordinates` | `$..coordinates` | N/A | `.. \| .coordinates? // empty` | `..` | ## Fairness Notes 1. **Parse format difference:** jsongrep's zero-copy parsing is a genuine advantage, isolated in the `document_parse` group. 2. **Query compilation cost:** jsongrep's DFA construction is heavier upfront but enables faster traversal — shown in separate groups. 3. **Result consumption:** All results are fully collected/consumed via `black_box` to ensure equivalent work. 4. **Missing capabilities:** When a tool lacks a feature (e.g., JMESPath has no recursive descent), the benchmark is skipped — not faked. 5. **jsonpath-rust has no separate compile step** — its cost appears entirely in `query_search`, making those numbers not directly comparable to pre-compiled tools. 6. **Ownership semantics:** jaq and jmespath require ownership of the input value. In `query_search`, `iter_batched` separates the clone cost from the measured search time. ## Machine Info I ran the benchmarks on my 2021 MacBook Pro (M1, 16 GB RAM, 1 TB SSD). ## References - [ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}](https://burntsushi.net/ripgrep/)