# Benchmark Results

Results from JMH benchmarks comparing protokt's codec implementations against
protobuf-java and Wire.

## Environment

|                  |                              |
|------------------|------------------------------|
| **CPU**          | Apple M1 Pro (10 cores)      |
| **Memory**       | 32 GB                        |
| **Architecture** | arm64 (aarch64)              |
| **JDK**          | Amazon Corretto 17.0.7+7-LTS |
| **JMH**          | 1.37                         |
| **Commit**       | `744c52d4`                   |

### JMH configuration

|             |                                       |
|-------------|---------------------------------------|
| Mode        | Average time (ms/op, lower is better) |
| Warmup      | 3 iterations, 10s each                |
| Measurement | 5 iterations, 10s each                |
| Forks       | 2                                     |
| Threads     | 1                                     |

## Libraries under test

The protobuf-java and Wire columns are standalone benchmarks using each
library's native API directly. The three codec columns use protokt's generated code
with different codec backends:

| Column            | Description                                                                    |
|-------------------|--------------------------------------------------------------------------------|
| **protobuf-java** | Google's `protobuf-java` library, native API                                   |
| **Wire**          | Square's Wire library, native API                                              |
| **PBJ**           | protokt + `ProtobufJavaCodec` (delegates to protobuf-java for reading/writing) |
| **KxIo**          | protokt + `KotlinxIoCodec` (uses kotlinx-io `Source`/`Sink` internally)        |
| **Protokt**       | protokt + `ProtoktCodec` (pure Kotlin, zero external dependencies)             |

All protokt codec results use `DefaultCollectionFactory` unless otherwise noted.
`ProtoktCodec` does not implement `JvmCodec` or `StreamingCodec`, so streaming
deserialization and streaming serialization benchmarks show `---` for it.
(The `Message.serialize(Sink)` extension function always uses `KotlinxIoSinkWriter`
regardless of codec.)

## Datasets

- **Large**: ~225 MB dataset of `GenericMessage1` payloads with many populated fields
- **Medium**: Moderate-size `GenericMessage1` payloads
- **Small**: Compact `GenericMessage4` payloads
- **StringHeavy**: 100 messages with three 10K-character mixed-encoding UTF-8 string fields
- **StringOneof**: 100 messages with three 10K-character oneof string fields
- **StringOneof20k**: 100 messages with three 20K-character oneof string fields
- **StringVeryHeavy / StringOneofVeryHeavy**: 10 messages with three 1M-character string fields
- **StringMap**: 1000 iterations of a message with a `map<string, string>` field containing 100 entries
- **StringRepeated**: 1000 iterations of a message with a `repeated string` field containing 100 entries

## Results

All values are milliseconds per operation (ms/op). Lower is better.

### Deserialize (byte array)

| Benchmark                   | protobuf-java |   Wire |    PBJ |   KxIo |    Protokt |
|-----------------------------|--------------:|-------:|-------:|-------:|-----------:|
| deserializeLargeFromMemory  |          1487 |    873 |    811 |    878 |        808 |
| deserializeMediumFromMemory |         3.479 |  3.293 |  1.985 |  2.348 |  **1.830** |
| deserializeSmallFromMemory  |        0.0060 | 0.0088 | 0.0035 | 0.0042 | **0.0033** |
| deserializeStringMap        |         7.291 |  7.166 |  6.845 |  7.030 |      6.837 |
| deserializeStringRepeated   |         6.412 |  6.375 |  6.345 |  6.566 |      6.326 |

All three protokt codecs outperform both native protobuf-java and Wire on
deserialization from byte arrays. `ProtoktCodec` leads on medium and small
messages. On large messages, PBJ and Protokt are within noise of each other,
both ~46% faster than protobuf-java. String-collection benchmarks are tight
across all implementations.

### Deserialize (streaming)

| Benchmark                       | protobuf-java |   Wire |        PBJ |       KxIo |  Protokt |
|---------------------------------|--------------:|-------:|-----------:|-----------:|---------:|
| deserializeLargeStreaming       |          1526 |    955 |    **825** |        986 |      --- |
| deserializeMediumStreaming      |         3.122 |  4.519 |  **1.980** |      2.798 |      --- |
| deserializeSmallStreaming       |        0.0344 | 0.0146 |     0.0311 | **0.0111** |      --- |
| deserializeStringHeavyStreaming |            19 |     18 |         18 |         18 |      --- |
| deserializeStringOneofStreaming |            19 |     18 |         18 |         18 |      --- |

`ProtobufJavaCodec` is fastest on large and medium streaming deserialization,
46% faster than native protobuf-java and 14% faster than Wire on large messages.
`KotlinxIoCodec` wins on small messages. String-heavy results converge across
all implementations.

### Serialize (byte array)

| Benchmark               | protobuf-java |   Wire |    PBJ |   KxIo |    Protokt |
|-------------------------|--------------:|-------:|-------:|-------:|-----------:|
| serializeLargeToMemory  |          1206 |   1425 |    135 |    215 |    **134** |
| serializeMediumToMemory |        0.8811 |  1.123 | 0.7327 |  1.341 | **0.6742** |
| serializeSmallToMemory  |        0.0037 | 0.0063 | 0.0029 | 0.0057 | **0.0025** |

`ProtoktCodec` leads on all sizes, beating native protobuf-java by 89% on
large messages (134 ms vs 1206 ms) and Wire by 91%. Even on small messages,
`ProtoktCodec` is 32% faster than protobuf-java.

### Serialize (streaming)

| Benchmark                | protobuf-java |   Wire |     PBJ |   KxIo | Protokt |
|--------------------------|--------------:|-------:|--------:|-------:|--------:|
| serializeLargeStreaming  |          1221 |   1390 | **205** |    216 |     --- |
| serializeMediumStreaming |    **0.9861** |  1.143 |   1.144 |  1.270 |     --- |
| serializeSmallStreaming  |    **0.0045** | 0.0072 |  0.0058 | 0.0061 |     --- |

`ProtobufJavaCodec` leads on large streaming serialization at 205 ms, 83% faster
than native protobuf-java and 85% faster than Wire. Native protobuf-java retains
the edge on small and medium streaming serialization.

### Pass-through (deserialize then serialize)

| Benchmark                   | protobuf-java |   Wire |    PBJ |   KxIo |    Protokt |
|-----------------------------|--------------:|-------:|-------:|-------:|-----------:|
| passThroughLargeFromMemory  |          3154 |   2190 |    981 |   1144 |    **977** |
| passThroughMediumFromMemory |         4.794 |  4.415 |  3.136 |  3.987 |  **2.852** |
| passThroughSmallFromMemory  |        0.0107 | 0.0140 | 0.0109 | 0.0117 | **0.0075** |
| passThroughStringHeavy      |            48 |     39 |     18 |     19 |         18 |
| passThroughStringOneof      |            48 |     39 |     18 |     19 |         18 |
| passThroughStringMap        |            21 |     15 |  7.483 |  7.996 |      7.516 |
| passThroughStringRepeated   |            17 |     14 |  6.793 |  7.200 |      6.709 |

`ProtoktCodec` wins the core pass-through benchmarks (large, medium, small),
69% faster than native protobuf-java and 55% faster than Wire on large messages.

String-heavy and string-collection pass-through: all three protokt codecs
are substantially faster than both native libraries. The three protokt codecs
are within noise of each other on these workloads.

### Mutate and serialize

| Benchmark                                 | protobuf-java |       Wire |     PBJ |    KxIo | Protokt |
|-------------------------------------------|--------------:|-----------:|--------:|--------:|--------:|
| mutateAndSerializeStringHeavy             |            49 |     **39** |      51 |      51 |      51 |
| mutateAndSerializeStringHeavyStreaming    |            42 |     **39** |      51 |      51 |     --- |
| mutateAndSerializeStringOneof             |            49 |     **39** |      51 |      51 |      51 |
| mutateAndSerializeStringOneof20k          |            83 |     **79** |     101 |     102 |     101 |
| mutateAndSerializeStringOneof20kStreaming |            83 |     **78** |     101 |     102 |     --- |
| mutateAndSerializeStringOneofStreaming    |            42 |     **39** |      51 |      51 |     --- |
| mutateAndSerializeStringOneofVeryHeavy    |       **492** |        665 |     509 |     507 |     501 |
| mutateAndSerializeStringVeryHeavy         |       **482** |        682 |     504 |     514 |     505 |

Wire is fastest on most mutate-and-serialize benchmarks for standard-size
strings. For very heavy strings (1M characters), Wire becomes the slowest and
protobuf-java leads.

Protokt codecs are ~25-31% slower than Wire on standard-size
mutate-and-serialize. All protokt codecs perform similarly to each other.

### Copy/append

| Benchmark                 | protobuf-java |  Wire |    PBJ |       KxIo | Protokt |
|---------------------------|--------------:|------:|-------:|-----------:|--------:|
| copyAppendListLarge       |         3.824 | 3.672 |  2.068 |  **1.929** |   3.834 |
| copyAppendListMedium      |    **0.3002** | 1.486 |  1.513 |      1.435 |   1.796 |
| copyAppendListSmall       |    **0.2810** | 1.524 |  1.408 |      1.401 |   1.423 |
| copyAppendMapLarge        |            22 |    27 |     23 |         23 |      23 |
| copyAppendMapMedium       |            18 |    21 |     13 |         13 |      13 |
| copyAppendMapSmall        |            17 |    21 |     12 |         12 |      12 |
| copyAppendRepeatedString  |        0.5436 | 2.835 | 0.2903 |     0.2900 |  0.2916 |
| copyAppendMapStringString |            36 |    39 |     26 |         26 |      26 |

Copy/append performance measures 1000 iterations of appending a single element
to a list or map field via the `copy {}` DSL. The codec is irrelevant for these
benchmarks since the work is entirely in collection copying.

For lists, native protobuf-java is fastest on small and medium messages (its
mutable builder avoids the structural copy). On large messages, `KotlinxIoCodec`
leads at 1.929 ms, 47% faster than protobuf-java.

For maps, all three protokt codecs perform identically and are faster than
both protobuf-java and Wire on medium and small maps. On large maps, all
implementations are within noise except Wire, which is slowest.

For string collections, protokt codecs are roughly 2x faster than protobuf-java
on `copyAppendRepeatedString` and ~28% faster on `copyAppendMapStringString`.
Wire is 5-10x slower than protokt on repeated string append.

## Persistent collections

Persistent collections (`PersistentCollectionFactory`) use
`kotlinx-collections-immutable` to back `repeated` and `map` fields with
tree-based persistent data structures. This enables O(log n) structural sharing
on `copy {}` append operations instead of O(n) full copies.

### Impact on core operations

| Benchmark                   | Codec   | Default | Persistent |   Delta |
|-----------------------------|---------|--------:|-----------:|--------:|
| deserializeLargeFromMemory  | PBJ     |     811 |       1614 |  +99.0% |
| deserializeLargeFromMemory  | KxIo    |     878 |       1289 |  +46.8% |
| deserializeLargeFromMemory  | Protokt |     808 |       1750 | +116.6% |
| deserializeMediumFromMemory | PBJ     |   1.985 |      3.012 |  +51.8% |
| deserializeMediumFromMemory | KxIo    |   2.348 |      3.323 |  +41.5% |
| deserializeMediumFromMemory | Protokt |   1.830 |      2.705 |  +47.8% |
| serializeLargeToMemory      | PBJ     |     135 |        299 | +121.4% |
| serializeLargeToMemory      | KxIo    |     215 |        365 |  +69.7% |
| serializeLargeToMemory      | Protokt |     134 |        273 | +104.0% |
| serializeMediumToMemory     | PBJ     |   0.733 |      0.903 |  +23.2% |
| serializeMediumToMemory     | KxIo    |   1.341 |      1.335 |   -0.5% |
| serializeMediumToMemory     | Protokt |   0.674 |      0.888 |  +31.8% |
| passThroughLargeFromMemory  | PBJ     |     981 |       2075 | +111.7% |
| passThroughLargeFromMemory  | KxIo    |    1144 |       1810 |  +58.2% |
| passThroughLargeFromMemory  | Protokt |     977 |       2066 | +111.3% |
| passThroughMediumFromMemory | PBJ     |   3.136 |      4.803 |  +53.2% |
| passThroughMediumFromMemory | KxIo    |   3.987 |      5.496 |  +37.8% |
| passThroughMediumFromMemory | Protokt |   2.852 |      5.280 |  +85.1% |

Persistent collections add overhead to core operations. Large-message
deserialization sees 47-117% overhead, and serialization sees 70-121% overhead.
`KotlinxIoCodec` consistently shows the smallest persistent-collection penalty.

### Impact on copy/append

| Benchmark                 | Codec   | Default | Persistent |      Delta |
|---------------------------|---------|--------:|-----------:|-----------:|
| copyAppendListLarge       | PBJ     |   2.068 |      0.045 | **-97.8%** |
| copyAppendListLarge       | KxIo    |   1.929 |      0.034 | **-98.3%** |
| copyAppendListLarge       | Protokt |   3.834 |      0.042 | **-98.9%** |
| copyAppendMapLarge        | PBJ     |      23 |      0.319 | **-98.6%** |
| copyAppendMapLarge        | KxIo    |      23 |      0.252 | **-98.9%** |
| copyAppendMapLarge        | Protokt |      23 |      0.257 | **-98.9%** |
| copyAppendListSmall       | PBJ     |   1.408 |      1.470 |      +4.4% |
| copyAppendListSmall       | KxIo    |   1.401 |      1.428 |      +1.9% |
| copyAppendListSmall       | Protokt |   1.423 |      1.470 |      +3.3% |
| copyAppendMapSmall        | PBJ     |      12 |         27 |    +126.6% |
| copyAppendMapSmall        | KxIo    |      12 |         12 |      -0.4% |
| copyAppendMapSmall        | Protokt |      12 |         19 |     +62.6% |
| copyAppendRepeatedString  | PBJ     |   0.290 |      0.034 | **-88.4%** |
| copyAppendRepeatedString  | KxIo    |   0.290 |      0.021 | **-92.8%** |
| copyAppendRepeatedString  | Protokt |   0.292 |      0.032 | **-88.9%** |
| copyAppendMapStringString | PBJ     |      26 |      0.265 | **-99.0%** |
| copyAppendMapStringString | KxIo    |      26 |      0.242 | **-99.1%** |
| copyAppendMapStringString | Protokt |      26 |      0.262 | **-99.0%** |

Persistent collections provide 29-99x speedup on large-collection append
operations (98% improvement for lists, 99% for maps, 89-93% for repeated
strings, and 99% for string maps).

On small/empty lists, persistent and default collections perform identically.
Small-map persistent results vary by codec: `KotlinxIoCodec` shows no overhead,
while `ProtobufJavaCodec` and `ProtoktCodec` show increased cost.

### Cross-library context

Even with the overhead from persistent collections, the trade-off is compelling
for workloads that combine deserialization with incremental message building:

| Benchmark                   | protobuf-java |   Wire | protokt | protokt + persistent | vs protobuf-java | vs Wire |
|-----------------------------|--------------:|-------:|--------:|---------------------:|-----------------:|--------:|
| deserializeLargeFromMemory  |          1487 |    873 |     808 |                 1289 |             -13% |    +48% |
| deserializeMediumFromMemory |         3.479 |  3.293 |   1.830 |                2.705 |             -22% |    -18% |
| serializeLargeToMemory      |          1206 |   1425 |     134 |                  273 |             -77% |    -81% |
| serializeMediumToMemory     |        0.8811 |  1.123 |   0.674 |                0.888 |              +1% |    -21% |
| passThroughLargeFromMemory  |          3154 |   2190 |     977 |                 1810 |             -43% |    -17% |
| passThroughMediumFromMemory |         4.794 |  4.415 |   2.852 |                4.803 |              +0% |     +9% |
| copyAppendListLarge         |         3.824 |  3.672 |   1.929 |                0.034 |             -99% |    -99% |
| copyAppendMapLarge          |            22 |     27 |      23 |                0.252 |             -99% |    -99% |
| copyAppendRepeatedString    |        0.5436 |  2.835 |   0.290 |                0.021 |             -96% |    -99% |
| copyAppendMapStringString   |            36 |     39 |      26 |                0.242 |             -99% |    -99% |

The "vs" columns compare protokt with persistent collections (using the best
codec per benchmark) against each native library. Persistent-collection protokt
retains a 13-77% serialization and deserialization advantage over protobuf-java
on most benchmarks, though large-message deserialization with persistent
collections is 48% slower than Wire (a trade-off against the 99% copy/append
improvement). Copy/append of large collections is 96-99% faster than both other
libraries. Workloads that mix deserialization with incremental message building
via `copy {}` will see the largest overall benefit from persistent collections.

## Codec selection guide

| Workload                                | Recommended codec | Rationale                                                               |
|-----------------------------------------|-------------------|-------------------------------------------------------------------------|
| JVM general purpose                     | `OptimalJvmCodec` | Fastest byte-array paths + fastest streaming via protobuf-java.         |
| Multiplatform (JVM targets)             | `OptimalJvmCodec` | Same as JVM; use per-target deps to get protobuf-java on JVM targets.   |
| Multiplatform (non-JVM targets)         | `OptimalKmpCodec` | Fastest byte-array paths + streaming via kotlinx-io. No JVM dependency. |
| Minimal dependencies (byte arrays only) | `ProtoktCodec`    | Ships with `protokt-runtime`, no additional dependencies needed.        |

The default `optimal()` codec selection handles this automatically: KMP projects
get `OptimalKmpCodec` for common code and `OptimalJvmCodec` for JVM/Android
targets via per-target dependencies.