# Telemetry

This project includes lightweight, anonymous telemetry to help us improve TabPFN. If you'd rather not send telemetry, you can always opt out (see **Opting out**).

---

## What we collect

We only gather **very high-level usage signals** — enough to guide development, never enough to identify you or your data.

Here's the full list:

### Events
- `ping` – periodic liveness heartbeat (daily / weekly / monthly cadence)
- `session` – sent when you initialize a TabPFN estimator (`TabPFNClassifier`, `TabPFNRegressor`)
- `model_load` – sent when TabPFN attempts to load model weights (reports `success` / `failed`)
- `dataset` – sent when a dataset is passed to `fit` or `predict` (no dataset content; shape only)
- `fit_called` – sent when you call `fit`
- `predict_called` – sent when you call `predict`
- `extension_entry` – sent when a TabPFN extension entry point (e.g. from `tabpfn-extensions`, `tabpfn-time-series`) is invoked

### Metadata (all events)
- `python_version` – Python version you're running
- `tabpfn_version` – TabPFN package version
- `numpy_version` – local NumPy version
- `pandas_version` – local pandas version
- `gpu_type` – type of GPU TabPFN is running on
- `platform_os` – operating system
- `runtime_kernel` – runtime kernel (e.g. CPython)
- `runtime_environment` – runtime environment (e.g. notebook / script / CI)
- `timestamp` – time of the event

### Extra metadata (per-event)
- `fit_called` / `predict_called`: `task` (classification or regression), `num_rows` (*rounded*), `num_columns` (*rounded*), `duration_ms`
- `model_load`: `model_name` (HuggingFace repo id), `status`
- `dataset`: `task`, `role` (train / test), `num_rows` (*rounded*), `num_columns` (*rounded*)
- `extension_entry`: `extension_name`

---

## How we protect your privacy

- **No inputs, no outputs, no code** ever leave your machine.
- **No personal data** is collected.
- Dataset shapes are **rounded into ranges** (e.g. `(953, 17)` → `(1000, 20)`) so exact dimensionalities can't be linked back to you.
- The data is strictly anonymous — it cannot be tied to individuals, projects, or datasets.

This approach lets us understand dataset *patterns* (e.g. "most users run with ~1k features") while ensuring no one's data is exposed.

---

## Why collect telemetry?

Open-source projects don't get much feedback unless people file issues. Telemetry helps us:
- See which parts of TabPFN are most used (fit vs predict, classification vs regression)
- Detect performance bottlenecks and stability issues
- Prioritize improvements that benefit the most users

This information goes directly into **making TabPFN better** for the community.

---

## Opting out

Don't want to send telemetry? No problem — just set the environment variable:

```bash
export TABPFN_DISABLE_TELEMETRY=1
```