# EvolAI

LLM model evaluation subnet on Bittensor.

## Installation

Install [uv](https://github.com/astral-sh/uv), then:

```bash
git clone https://github.com/evolai-subnet/evolai.git
cd evolai
uv pip install -e .
```

Or with pip:

```bash
pip install -e .
```

Verify:

```bash
evolcli --help
```

## Mining

Requirements:

- Model name must contain `evolai`
- Model must be public on HuggingFace
- Supported tracks: `transformer`, `mamba2`

Check eligibility:

```bash
evolcli miner check --model username/evolai-0.4b --track transformer
evolcli miner check --model username/evolai-mamba2-0.4b --track mamba2
```

Get your challenge:

```bash
evolcli miner get-challenge
```

Register your model:

```bash
evolcli miner register --wallet-name miner1 --hotkey my-hotkey --track transformer
evolcli miner register --wallet-name miner1 --hotkey my-hotkey --track mamba2
```

Re-register after you publish a new model version.

## Evaluation

The validator loads your weights directly each round and measures:

| Signal | Weight | Description |
|---|---|---|
| **Quality** (KL × ThinkGain) | 60% | KL divergence from the reference model (Qwen3.5-9B), gated by how much `<think>` tokens help your model |
| **Flow** | 30% | Sharpe ratio of your KL improvement trend (consistent progress over time) |
| **Side Quests** | 10% | Accuracy on 2 arithmetic tasks per round (binary graded, ≤ 20 output tokens) |

```
quality = KL_absolute × (0.30 + 0.70 × think_gain) × gate_improve × gate_consistency
score   = 0.60 × quality + 0.30 × (flow × quality) + 0.10 × side_quest_accuracy × miner_scale
```

`miner_scale` combines your improvement trend (short vs. long loss EMA) and proximity to the current best model on the subnet. Smaller models (~0.47 B) receive a parameter-efficiency bonus: their loss is discounted by `(params_B / 1.8)^0.5`.

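To see how the weights interact, the two formulas above can be evaluated by hand. The following is a minimal sketch, not the validator's code: the `RoundSignals` container and the function names are hypothetical, every signal is taken as an already-computed number, the gates are assumed to be 0.0 or 1.0, and `miner_scale` is assumed to multiply only the Side Quest term, as ordinary operator precedence in the formula suggests.

```python
from dataclasses import dataclass


@dataclass
class RoundSignals:
    """Per-round inputs. All field names are illustrative assumptions."""
    kl_absolute: float          # KL-derived term from the formula (assumed: higher is better)
    think_gain: float           # assumed to lie in [0, 1]
    gate_improve: float         # assumed 1.0 when the improvement gate passes, else 0.0
    gate_consistency: float     # assumed 1.0 when the consistency gate passes, else 0.0
    flow: float                 # Sharpe-style trend signal
    side_quest_accuracy: float  # fraction of the arithmetic tasks answered correctly
    miner_scale: float


def quality(s: RoundSignals) -> float:
    """quality = KL_absolute x (0.30 + 0.70 x think_gain) x both gates."""
    return (
        s.kl_absolute
        * (0.30 + 0.70 * s.think_gain)
        * s.gate_improve
        * s.gate_consistency
    )


def score(s: RoundSignals) -> float:
    """Weighted sum of quality, flow-scaled quality, and Side Quests."""
    q = quality(s)
    return 0.60 * q + 0.30 * (s.flow * q) + 0.10 * s.side_quest_accuracy * s.miner_scale


def efficiency_discount(params_b: float) -> float:
    """Loss discount factor (params_B / 1.8)^0.5 for a model of params_b billion parameters."""
    return (params_b / 1.8) ** 0.5
```

With `think_gain = 0` the quality term keeps only 30% of `kl_absolute`, and a failed gate zeroes both the quality and the flow contributions, leaving only the Side Quest term.
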
Quality is also gated by two KL checks:

- **Improvement gate:** your current model must have at least 2% lower KL on the current eval set than your previous locked model had on the same set (`cur_kl ≤ prev_kl × 0.98`). Near-zero improvements do not pass.
- **Consistency gate:** your EMA KL on the next public eval set must not be more than 20% worse than your EMA KL on the current eval set. This discourages destroying general capability just to overfit one round.

Validator order is fixed: **lock miner revision SHA → publish next seed → evaluate**. The next seed is public immediately so miners have the full epoch to prepare, but the current model revision cannot change after it is locked.

### How to maximise your score

**The eval dataset is public and your exact eval indices are deterministic.** Each epoch the validator commits a seed to the chain and your challenge indices are derived as:

```
SHA256(seed : miner_uid : dataset_name : "eval") → sample indices
```

The active dataset is [`evolai/universal_qa`](https://huggingface.co/datasets/evolai/universal_qa) on HuggingFace.

**Recommended strategy:**

1. **Fetch your challenge each epoch.** The indices tell you exactly which samples will be used to measure your loss:

   ```bash
   evolcli miner get-challenge
   ```

2. **Fine-tune directly on your eval indices.** The goal is **low KL divergence from the reference model (Qwen3.5-9B) and high ThinkGain** on those exact samples. These two signals together determine your quality score, which drives 60% of your total reward.
3. **Train with chain-of-thought.** Add `<think>` traces to your fine-tuning data so that `think_CE < base_CE`. ThinkGain (`base_CE / (think_CE + base_CE)`) gates your entire quality score; a model that does not benefit from thinking earns only 30% of its raw KL score.
4. **Push new revisions regularly.** Flow and the improvement gate reward consistent KL improvement over time.
   Re-register after every publish; validators lock the exact HuggingFace commit SHA before releasing the next seed.
5. **Handle arithmetic.** Side Quests are always integer answers. Make sure your model produces a bare number within 20 tokens.

## Validating

Validators require two separate virtual environments because `bittensor` and `vllm` have incompatible dependency pins. The setup script handles this automatically:

```bash
bash scripts/setup-validator.sh
```

This creates `.venv` (bittensor + evolai) and `vllm_env` (vllm), and writes `VLLM_EXECUTABLE` to `.env`.

Copy [.env.example](.env.example) to `.env` and fill in your credentials, then run:

```bash
source .venv/bin/activate
evolcli validator run \
  --wallet validator1 \
  --hotkey default
```

A GPU with 80 GB VRAM is required. CUDA 13 (driver ≥ 575) is required for the validator; `torch>=2.7.0` uses CUDA 13 build artifacts and will not start on older drivers.
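
Before starting the validator, it can help to check the host against the hardware requirements stated above. The sketch below is not part of `evolcli`; the function names are hypothetical. It reads the driver version and total memory from `nvidia-smi` and compares them with thresholds taken from this README. The 80,000 MiB cutoff is an assumption, since cards sold as 80 GB report slightly different totals.

```python
import subprocess

MIN_DRIVER_MAJOR = 575   # driver threshold as stated in this README
MIN_VRAM_MIB = 80_000    # assumption: approximate cutoff for an "80 GB" card


def parse_gpu_line(line: str) -> tuple[int, int]:
    """Parse one 'driver_version, memory.total' CSV line into (driver major, MiB)."""
    driver, mem = [field.strip() for field in line.split(",")]
    return int(driver.split(".")[0]), int(mem)


def gpu_ok(line: str, min_driver: int = MIN_DRIVER_MAJOR, min_vram: int = MIN_VRAM_MIB) -> bool:
    """Return True when both the driver and the VRAM meet the thresholds."""
    major, mem = parse_gpu_line(line)
    return major >= min_driver and mem >= min_vram


def query_gpus() -> list[str]:
    """Ask nvidia-smi for one CSV line per GPU (requires the NVIDIA driver tools)."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=driver_version,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def report() -> None:
    """Print a pass/fail line per GPU; call this on the validator host."""
    for line in query_gpus():
        status = "OK" if gpu_ok(line) else "below the stated requirements"
        print(f"{line}: {status}")
```

Calling `report()` on the validator machine prints one line per GPU. The parsing and comparison are kept separate from the `nvidia-smi` call so they can be exercised on a machine without a GPU.
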