# Poker44 Training Benchmark

Public benchmark guide for Poker44 subnet `126`.

## Purpose

Poker44 provides a public training benchmark for miner development. Use it to:

- test your benchmark parser;
- build and validate feature pipelines;
- train and compare detection models;
- run regression tests across model versions;
- calibrate model outputs against labeled chunk data.

The benchmark is a development dataset for model training, validation, parser
testing, and regression testing.

## API Base

```text
https://api.poker44.net/api/v1/benchmark
```

## Endpoints

```text
GET /api/v1/benchmark
GET /api/v1/benchmark/releases
GET /api/v1/benchmark/chunks?sourceDate=YYYY-MM-DD
GET /api/v1/benchmark/chunks/:chunkId
```

## Status

`GET /api/v1/benchmark` returns aggregate availability:

- `releaseVersion`
- `schemaVersion`
- `releaseType`
- `totalChunks`
- `totalHands`
- `latestSourceDate`
- `latestReleasedAt`
- `currentUtcDate`
- `autoRelease`

Example:

```bash
curl -sS https://api.poker44.net/api/v1/benchmark
```

## Releases

`GET /api/v1/benchmark/releases` returns available benchmark dates.

Common query parameters:

- `limit`: number of releases to return.
- `before`: optional `YYYY-MM-DD` cursor for pagination.

Example:

```bash
curl -sS 'https://api.poker44.net/api/v1/benchmark/releases?limit=30'
```

Each release includes:

- `sourceDate`
- `releaseVersion`
- `schemaVersion`
- `chunkCount`
- `handCount`
- `releasedAt`
- `humanExampleCount`
- `syntheticBotExampleCount`
- `audit`
- `metadata`

## Chunks

`GET /api/v1/benchmark/chunks?sourceDate=YYYY-MM-DD` returns chunk payloads for
one release date.

Common query parameters:

- `sourceDate`: required release date in `YYYY-MM-DD` format.
- `limit`: number of chunks to return.
- `cursor`: optional pagination cursor.
- `split`: optional `train` or `validation`.

Example:

```bash
curl -sS 'https://api.poker44.net/api/v1/benchmark/chunks?sourceDate=2026-06-10&limit=24'
```

Each chunk includes:

- `chunkId`
- `chunkHash`
- `sourceDate`
- `releaseVersion`
- `split`
- `handCount`
- `batchCount`
- `chunks`
- `groundTruth`
- `groundTruthLabels`
- `metadata`

## Model Input

The `chunks` field is the miner-visible model input. It is a list of chunk
groups. Each group contains one or more poker hands.

Miners should produce one prediction per chunk group, matching the order of
`chunks`.

Current releases use at least 30 hands per chunk group.

The labels are returned separately:

- `groundTruth`: numeric labels, where `1` means bot and `0` means human.
- `groundTruthLabels`: string labels, `bot` or `human`.

Do not read labels from individual hand objects.

Minimal validation example:

```python
import requests

base_url = "https://api.poker44.net/api/v1/benchmark"
source_date = "2026-06-10"

payload = requests.get(
    f"{base_url}/chunks",
    params={"sourceDate": source_date, "limit": 100},
    timeout=30,
).json()["data"]

for chunk in payload["chunks"]:
    model_inputs = chunk["chunks"]
    labels = chunk["groundTruth"]

    predictions = model.predict_proba(model_inputs)

    assert len(predictions) == len(labels)
    assert all(0.0 <= score <= 1.0 for score in predictions)
```

## Hand Fields

Hands may include:

- `hand_id`
- `metadata`
- `players`
- `streets`
- `actions`
- `outcome`

Action records may include:

- `action_id`
- `street`
- `actor_seat`
- `action_type`
- `amount`
- `raise_to`
- `call_to`
- `normalized_amount_bb`
- `pot_before`
- `pot_after`

Code should tolerate missing optional fields and empty arrays.

## Training Guidance

Use each chunk group as one training example. The target is the matching entry
in `groundTruth`, in the same array position.

Recommended practices:

- keep release dates separate when creating train, validation, and local test
  sets;
- use the returned `split` field when it is present;
- train across multiple release dates instead of fitting one date tightly;
- save `sourceDate`, `releaseVersion`, `schemaVersion`, `chunkId`, and
  `chunkHash` with every local experiment;
- cache downloaded JSON by `chunkHash` so experiments are reproducible;
- ignore unknown response fields so clients keep working as the schema expands;
- avoid using identifiers such as `hand_id` or `chunkId` as model features.

Useful metrics:

- ROC AUC for ranking quality;
- average precision for bot-class retrieval;
- log loss or Brier score for probability calibration;
- per-release metrics to catch overfitting to one benchmark date.

## Recommended Workflow

1. Fetch release dates from `/releases`.
2. Download chunks by `sourceDate`.
3. Cache raw responses and record `chunkHash`.
4. Split by release date and by the returned `split` field when present.
5. Build features only from miner-visible hand and action data.
6. Train on `train` chunks.
7. Tune and compare on `validation` chunks.
8. Keep a held-out local set for model regression tests.
9. Track performance by release date and model version.

## Common Mistakes

- Producing one prediction per hand instead of one prediction per chunk group.
- Reordering `chunks` before pairing predictions with `groundTruth`.
- Training and validating on the same release date only.
- Treating optional fields as always present.
- Using IDs, dates, hashes, or pagination order as predictive features.
- Assuming every chunk group has the same number of hands or actions.

## Notes

- New releases may be added over time.
- Response fields may expand, so clients should ignore unknown fields.
- The chunk order and label order are significant.
- Avoid tuning a model against a single release only.
- Prefer testing across multiple release dates.
- Use the benchmark for model development, parser testing, feature validation,
  and regression testing.