--- name: trueskill-rank description: Domain-agnostic TrueSkill batch ranking via LLM-as-judge. Ranks any list of text items using overlapping subsets dispatched to Codex Spark workers. Swappable rubrics. Use when you need to rank, score, curate, or sort a collection by quality. --- # TrueSkill Rank Rank any collection of text items by quality using TrueSkill + LLM-as-judge. ## Setup ```bash pip install trueskill ``` - **Python 3.11+** required - **agent-mux** for parallel dispatch (optional -- falls back to direct OpenAI API if `OPENAI_API_KEY` is set) - **Claude Code:** copy this skill folder into `.claude/skills/trueskill-rank/` - **Codex CLI:** append this SKILL.md content to your project's root `AGENTS.md` For the full installation walkthrough (prerequisites, verification, API fallback), see [references/installation-guide.md](references/installation-guide.md). ## Staying Updated This skill ships with an `UPDATES.md` changelog and `UPDATE-GUIDE.md` for your AI agent. After installing, tell your agent: "Check `UPDATES.md` in the trueskill-rank skill for any new features or changes." When updating, tell your agent: "Read `UPDATE-GUIDE.md` and apply the latest changes from `UPDATES.md`." Follow `UPDATE-GUIDE.md` so customized local files are diffed before any overwrite. --- ## Quick Start ```bash PYTHON="python3" CLI="~/.claude/skills/trueskill-rank/scripts/trueskill-rank.py" # Full pipeline: prepare + dispatch + aggregate $PYTHON $CLI run \ --input items.json \ --overlap 3 \ --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \ --output results.json # Or step by step: $PYTHON $CLI prepare --input items.json --overlap 3 \ --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \ --output-dir /tmp/ts-run/ # Dispatch is handled internally by trueskill-rank.py (no separate script needed) $PYTHON $CLI aggregate --run-dir /tmp/ts-run/ --output results.json ``` Cost: each subset of 10 items produces C(10,2)=45 implicit pairwise comparisons. 100 items at overlap 3 = 30 subsets = 30 API calls = 1,350 implicit comparisons. ## Decision Tree ### Mode Selection | Question | Answer | Mode | |----------|--------|------| | Ranking individual items (messages, posts)? | Yes | `--mode batch` (default) | | Comparing entities (channels, sources, candidates)? | Yes | `--mode pairwise` | | Items > 50? | Yes | `--mode batch` (far more efficient) | | Items < 20, need binary signal? | Yes | `--mode pairwise` | ### Overlap Selection | Overlap | When to use | Cost | |---------|-------------|------| | `--overlap 2` | Quick scan, low stakes, small sets | Lowest | | `--overlap 3` | Default. Good balance of speed and confidence | Medium | | `--overlap 4` | High-stakes curation, final rankings | Highest | ### Rubric Selection | Rubric | Use for | |--------|---------| | `practitioner-signal.md` | General content quality. 6 criteria led by practitioner signal | | `signal-serendipity-entropy.md` | Content curation emphasizing surprise and cross-domain bridges | | Custom rubric via `--rubric` | Any domain -- create from `example-template.md` | ## Input Format ```json {"items": [{"id": "item_001", "text": "...", "metadata": {...}}, ...]} ``` Source doesn't matter -- TG messages, HN posts, articles, papers, tweets. The `id` field is required, `text` is the content to rank, `metadata` is optional and passed through to output. ## CLI Reference ### prepare Generate subsets and prompt files from input items. ```bash $PYTHON $CLI prepare \ --input items.json \ # Required. JSON with {"items": [...]} --mode batch \ # batch (default) or pairwise --overlap 3 \ # 2-4, controls statistical robustness --subset-size 10 \ # Items per subset (batch) or matchups per batch (pairwise) --rubric rubric.md \ # Required. Scoring criteria file --output-dir /tmp/ts-run/ \ # Where to write subsets.json and prompts/ --seed 42 \ # Random seed for reproducibility --text-cap 1500 # Max chars per item in prompts ``` Output: `subsets.json` + `prompts/subset-NN.txt` (batch) or `prompts/batch-NN.txt` (pairwise) ### aggregate Parse dispatch results and run TrueSkill rating. ```bash $PYTHON $CLI aggregate \ --run-dir /tmp/ts-run/ \ # Directory with subsets.json and results/ --output results.json # Final rankings JSON ``` ### run Full pipeline: prepare + dispatch + aggregate. ```bash $PYTHON $CLI run \ --input items.json \ --overlap 3 \ --rubric rubric.md \ --output results.json ``` Dispatch runs automatically via the built-in `dispatch_workers()` function. No external scripts required. ## Output Format ```json { "rankings": [ {"id": "item_001", "mu": 35.2, "sigma": 2.1, "conservative": 28.9, "rank": 1, "appearances": 3, "wins": 8, "losses": 2} ], "stats": { "total_items": 100, "subsets": 30, "results_parsed": 30, "parse_errors": 0, "coverage_gaps": 0, "mode": "batch", "overlap": 3, "rubric": "practitioner-signal" } } ``` `conservative` = mu - 3*sigma. This is the ranking key. Penalizes items with few appearances (high uncertainty). ## Dispatch Built into `trueskill-rank.py` via `dispatch_workers()`. No external scripts. Primary: `agent-mux --engine codex --model gpt-5.3-codex-spark --reasoning low --effort low` (free via GPT subscription). Resolves agent-mux via `AGENT_MUX_PATH` env var, then `which agent-mux`, then relative to skill directory. Runs 6 workers in parallel via `concurrent.futures.ThreadPoolExecutor`. Fallback: If `agent-mux` not found, falls back to direct OpenAI API via `urllib.request` (stdlib, zero deps). Requires `OPENAI_API_KEY` env var. Uses `gpt-4o-mini`. Results written as `{"success": true, "response": "..."}` JSON. ## Creating Custom Rubrics Copy `rubrics/example-template.md` and fill in: ```markdown # Rubric Name ## Criteria (ordered by importance) 1. **CRITERION** -- Description 2. **CRITERION** -- Description ## Tiebreaker How to break ties. ## Context What kind of content this is for. ``` 3-6 criteria recommended. Order matters -- most important first. --- ## Anti-Patterns | Do NOT | Do Instead | |--------|------------| | Use overlap 2 for high-stakes curation | Use overlap 3-4 for reliable rankings | | Use pairwise mode for 50+ items | Use batch ranking (far more efficient at scale) | | Skip the rubric file | Always specify a rubric via `--rubric` | | Run without trueskill installed | `pip install trueskill` first | | Parse result files manually | Use the `aggregate` subcommand | ## Error Handling | Problem | Solution | |---------|----------| | `trueskill` not installed | `pip install trueskill` | | agent-mux not found + no `OPENAI_API_KEY` | Install agent-mux or set `OPENAI_API_KEY` env var | | Parse errors in results | Check result files in `results/` dir, retry failed subsets | | Coverage gaps in aggregate output | Increase overlap coefficient (`--overlap 3` or `--overlap 4`) | | Empty items array error | Check input JSON format -- must be `{"items": [...]}` with at least one item | --- ## Bundled Resources Index | Path | What | When to Load | |------|------|-------------| | `./SKILL.md` | Skill runbook (this file) | Always | | `./UPDATES.md` | Structured changelog for AI agents | When checking for new features or updates | | `./UPDATE-GUIDE.md` | Instructions for AI agents performing updates | When updating this skill | | `./scripts/trueskill-rank.py` | Main CLI script -- prepare, dispatch, aggregate | Always (execution) | | `./references/algorithm.md` | TrueSkill math, N-player mode, convergence, cost scaling | When tuning parameters or understanding ranking behavior | | `./references/prior-runs.md` | Previous runs with statistics and lessons learned | When calibrating overlap, rubrics, or interpreting results | | `./references/installation-guide.md` | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair | | `./rubrics/practitioner-signal.md` | General content quality rubric (6 criteria) | Default rubric for most ranking tasks | | `./rubrics/signal-serendipity-entropy.md` | Curation rubric emphasizing surprise and cross-domain bridges | Content discovery and curation | | `./rubrics/example-template.md` | Template for creating custom rubrics | When creating a new domain-specific rubric |