# MinT Quickstart

[English](./README.md) | [中文](./README_zh.md)

The single entry repo for learning [MinT](https://github.com/MindLab-Research/mindlab-toolkit) (Mind Lab Toolkit), from first API call to advanced RL training. Visit the [MinT website](https://macaron.im/mindlab/mint).

> **Important:** All experiments run against an already deployed MinT server. This repo does **not** start MinT backend services locally. You only need a valid server endpoint and API key.

## Demo Portfolio

### Available Now

| # | Demo | Track | Reward Source / Shape | Script |
|---|------|-------|------------------------|--------|
| 1 | **RL-1 Verifiable Math** | RL | Deterministic verifier | [`demos/rl/adapters/verifiable_math.py`](demos/rl/adapters/verifiable_math.py) |
| 2 | **RL-2 Preference Chat** | RL | Pairwise/judge preference | [`demos/rl/adapters/preference_chat.py`](demos/rl/adapters/preference_chat.py) |
| 3 | **RL-3 Environment Tool Use** | RL | Code execution feedback | [`demos/rl/adapters/environment_tooluse.py`](demos/rl/adapters/environment_tooluse.py) |
| 4 | **Sampling Log** | Sampling | Train then inspect model responses | [`quickstart/sampling_log.py`](quickstart/sampling_log.py) |
| 5 | **Embodied-1 OpenPI FAST SDK** | Embodied | MinT-only `mintx` OpenPI client over 3 camera images + state + action-token supervision | [`demos/embodied/openpi_vla_sdk.py`](demos/embodied/openpi_vla_sdk.py) |

### Reference

| Demo | Track | Why it exists | Script |
|------|-------|---------------|--------|
| **OpenPI FAST HTTP** | Embodied | Shows the raw wire protocol directly for debugging and request-shape reference | [`demos/embodied/openpi_vla_http.py`](demos/embodied/openpi_vla_http.py) |

### Coming Soon

| # | Demo | Track | Description | Status |
|---|------|-------|-------------|--------|
| 6 | **VLM-1 Vision QA** | VLM | Image + question -> grounded answer | Planned (M2) |
| 7 | **VLM-2 Vision Instruction** | VLM | Image + task -> action/decision | Planned (M2) |

## Quick Start

**Requirements:** Python >= 3.11, a MinT API key

```bash
pip install git+https://github.com/MindLab-Research/mindlab-toolkit.git python-dotenv matplotlib numpy
```

Create `.env` in the repo root:

```
MINT_API_KEY=sk-your-api-key-here
```

Use the MinT endpoint that matches your region:

- Mainland China: `https://mint-cn.macaron.xin/`
- Outside Mainland China: `https://mint.macaron.xin/`

## Common First Questions

### Should I use SFT or RL?

- Use **SFT** when you already know what the model should say or do and you have labeled target outputs.
- Use **RL** when you do not have one fixed target answer but you can score the model's behavior with a reward, verifier, test suite, or environment feedback.
- If you have both, you can combine them. The common pattern is SFT for the basic behavior, then RL for optimization, but that is not a required order for every task.

### Does MinT support SFT?

Yes. MinT supports SFT directly. The standard SFT path is:

- `forward_backward(..., loss_fn="cross_entropy")`
- `optim_step(...)`

### Which domain should I use?

Choose by your network path:

- Mainland China -> `https://mint-cn.macaron.xin/`
- Outside Mainland China -> `https://mint.macaron.xin/`

If you are unsure, try the one that matches your region first. The practical goal is lower latency and stable connectivity.

### Where do I get `MINT_API_KEY`?

`MINT_API_KEY` is currently issued by the Mind Lab team.
To request access:

- go to `https://macaron.im/mindlab`
- use **Schedule a Demo**
- or email `contact@mindlab.ltd`

Run the quickstart (SFT then RL in one script):

```bash
python quickstart/quickstart.py
```

Or open the interactive notebook:

```bash
jupyter notebook quickstart/mint_quickstart.ipynb
```

Or run a focused quickstart recipe:

```bash
python quickstart/custom_reward.py
python quickstart/custom_loss.py
```

## Run a Demo

```bash
python demos/rl/adapters/verifiable_math.py      # RL-1: math with exact-match reward
python demos/rl/adapters/preference_chat.py      # RL-2: chat with helpfulness proxy
python demos/rl/adapters/environment_tooluse.py  # RL-3: code gen with execution reward
python demos/embodied/openpi_vla_sdk.py          # Embodied-1: OpenPI via mintx / mint.mint
python demos/embodied/openpi_vla_http.py         # Reference: raw OpenPI FAST HTTP wire shape
```

All demos are configurable via environment variables. See [`demos/rl/README.md`](demos/rl/README.md) for details.

## Advanced Workflows

### Checkpoint Loop (Save -> Download -> Upload -> Resume)

If you want a full checkpoint lifecycle:

```bash
python advanced/checkpoint.py save --name my-ckpt
python advanced/checkpoint.py download tinker:///weights/ -o ./ckpts
python advanced/checkpoint.py upload ./ckpts/.tar.gz
python advanced/checkpoint.py resume tinker:///weights/ --with-optimizer --steps 3
```

See [`advanced/README.md`](advanced/README.md) for the full command matrix, the optimizer-preserving resume shape (`create_lora_training_client(...)` + `load_state_with_optimizer(...)`), and guardrails (`sampler_weights` vs `weights`).

### MIS Rollout Correction Validation

If you want a focused end-to-end check for session-level Seq-MIS wiring:

```bash
python advanced/validate_mis_rollout_correction.py --base-model Qwen/Qwen3-30B-A3B-Instruct-2507
```

See [`docs/mis_rollout_correction.md`](docs/mis_rollout_correction.md) for prerequisites, env vars, expected output, and failure modes.
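
Returning to the checkpoint loop at the start of this section: if you would rather drive the four steps from a script than type them by hand, the sketch below builds the same command lines as argument lists. The helper name `checkpoint_loop_commands` and its parameters are hypothetical and not part of this repo. Only the subcommands and flags shown above are used, and the checkpoint URIs and archive path are values you supply yourself.

```python
import shlex


def checkpoint_loop_commands(name, download_uri, out_dir, archive_path,
                             resume_uri, steps=3):
    """Build argv lists for save -> download -> upload -> resume.

    Hypothetical helper: it only assembles the commands documented above
    and does not execute them or contact the MinT server.
    """
    script = "advanced/checkpoint.py"
    return [
        ["python", script, "save", "--name", name],
        ["python", script, "download", download_uri, "-o", out_dir],
        ["python", script, "upload", archive_path],
        ["python", script, "resume", resume_uri,
         "--with-optimizer", "--steps", str(steps)],
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with the URIs and paths from your run.
    commands = checkpoint_loop_commands(
        name="my-ckpt",
        download_uri="<download-uri>",
        out_dir="./ckpts",
        archive_path="<archive-path>",
        resume_uri="<resume-uri>",
    )
    for argv in commands:
        # shlex.join quotes each argument so the line is safe to paste.
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` so that a failed step stops the loop before the next one runs.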
### Queue Status Polling

Monitor queue position and estimated wait time for pending sample requests:

```bash
python advanced/queue_status.py
```

Uses the low-level `AsyncTinker` client with backpressure headers to read queue fields from 408 responses.

## Repo Structure

```
mint-quickstart/
  .env.example                  # Template for API key configuration
  quickstart/
    quickstart.py               # SFT -> RL in one script
    custom_reward.py            # Client-side reward shaping + importance_sampling
    custom_loss.py              # Pairwise preference training via forward_backward_custom
    sampling_log.py             # Train then inspect model responses
    mint_quickstart.ipynb       # Interactive notebook version
  demos/
    rl/                         # 3 RL demos (available)
      rl_core.py                # Shared GRPO training loop
      adapters/
        verifiable_math.py
        preference_chat.py
        environment_tooluse.py
    vlm/                        # 2 VLM demos (coming soon)
    embodied/                   # primary SDK demo + low-level HTTP reference
  advanced/                     # Checkpoint workflows, MIS validation, queue status
  docs/
    roadmap.md                  # 6-demo roadmap with status tags
    troubleshooting.md          # Common issues and fixes
    migration-from-minT-demo.md
    experiments/                # Validation reports for quickstart flows
  .pi/
    skills/                     # Project-local pi skills for API, debugging, and issue reporting
  mint-skill/                   # AI coding agent migration skill
```

## Tinker SDK Compatibility

If you have existing code using `import tinker`, the lowest-friction MinT migration is:

```python
import mint as tinker
```

Then point the Tinker-style client surface at MinT:

```bash
TINKER_BASE_URL=
TINKER_API_KEY=
```

Use the MinT endpoint that matches your region:

- Mainland China: `https://mint-cn.macaron.xin/`
- Outside Mainland China: `https://mint.macaron.xin/`

Why this is the recommended path:

- raw upstream `import tinker` still validates API keys with the `tml-` prefix
- MinT API keys start with `sk-`
- `import mint as tinker` keeps the Tinker-style code shape while enabling MinT compatibility patches

If you must keep the exact `import tinker` statement, import `mint` earlier in the same process before constructing Tinker clients.

## Docs

- [Roadmap](docs/roadmap.md): all 6 demos with availability status
- [Troubleshooting](docs/troubleshooting.md): common issues and solutions
- [Migration Guide](docs/migration-from-minT-demo.md): moving from old MinT-demo repo
- [Quickstart Guide](quickstart/README.md): first run plus focused custom reward / custom loss recipes
- [RL Demos](demos/rl/README.md): detailed docs for the 3 available RL demos
- [Embodied Demos](demos/embodied/README.md): primary OpenPI SDK example plus low-level HTTP reference
- [Advanced](advanced/README.md): checkpoint workflows and MIS validation entry points
- [MIS Rollout Correction](docs/mis_rollout_correction.md): targeted Seq-MIS validation flow and troubleshooting
- [Experiment Report](docs/experiments/quickstart-upload-download-resume-report.md): quickstart upload-download-resume validation template/results
- [Pi Skills](.pi/skills/README.md): project-local pi skills for API, debugging, and issue reporting
- [Migration Skill](mint-skill/SKILL.md): AI agent skill for migrating from verl/TRL/OpenRLHF
- [中文 README](README_zh.md): Chinese version of this document
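
For the Tinker SDK compatibility setup described earlier, a small helper can catch the two most likely misconfigurations (wrong regional endpoint, wrong key prefix) before any client is constructed. This is a minimal sketch, not part of this repo: `tinker_env_for_mint` and the region labels `"cn"` / `"global"` are hypothetical names, and it assumes your MinT API key is the value to place in `TINKER_API_KEY`. The endpoints and the `sk-` prefix rule come from this README.

```python
import os

# Regional endpoints as listed in this README.
MINT_ENDPOINTS = {
    "cn": "https://mint-cn.macaron.xin/",     # Mainland China
    "global": "https://mint.macaron.xin/",    # Outside Mainland China
}


def tinker_env_for_mint(region: str, api_key: str) -> dict:
    """Return the TINKER_* variables for a MinT endpoint (hypothetical helper)."""
    if region not in MINT_ENDPOINTS:
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(MINT_ENDPOINTS)}")
    if not api_key.startswith("sk-"):
        # Upstream Tinker keys use the tml- prefix; MinT keys start with sk-.
        raise ValueError("MinT API keys start with 'sk-'")
    return {
        "TINKER_BASE_URL": MINT_ENDPOINTS[region],
        "TINKER_API_KEY": api_key,
    }


if __name__ == "__main__":
    # Assumption: MINT_API_KEY is already exported, e.g. loaded from .env.
    key = os.environ.get("MINT_API_KEY", "sk-your-api-key-here")
    env = tinker_env_for_mint("global", key)
    print(env["TINKER_BASE_URL"])
```

Apply the result with `os.environ.update(env)` before running `import mint as tinker`, so the variables are set when clients are constructed.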