# AI Research `Skills` Library
> **The most comprehensive open-source skills library enabling AI agents to autonomously conduct AI research β from idea to paper**
---
## Table of Contents
- [Our Mission](#our-mission)
- [Path Towards AI Research Agent](#path-towards-ai-research-agent)
- [Available AI Research Engineering Skills](#available-ai-research-engineering-skills)
- [Demos](#demos)
- [Skill Structure](#skill-structure)
- [Roadmap](#roadmap)
- [Repository Structure](#repository-structure)
- [Use Cases](#use-cases)
- [Contributors](#contributors)
- [Citation](#citation)
- [Community](#community)
## Our Mission
We enable AI agents to **autonomously conduct AI research** β from literature survey and idea generation through experiment execution to paper writing. The library provides both the **research orchestration layer** (autoresearch, ideation, paper writing) and the **engineering skills** (training, evaluation, deployment) needed at each stage.
System diagram of an AI research agent
## Path Towards AI Research Agent
Modern AI research requires mastering dozens of specialized tools and frameworks.
AI Researchers spend more time debugging infrastructure than testing hypotheses β slowing the pace of scientific discovery.
We provide a comprehensive skills library that enables AI agents to autonomously conduct the full research lifecycle β from brainstorming ideas to writing the paper.
- Autonomous Research - The **autoresearch** skill orchestrates the entire research workflow using a two-loop architecture, routing to domain skills as needed
- Specialized Expertise - Each domain skill provides deep, production-ready knowledge of a specific framework (Megatron-LM, vLLM, TRL, etc.)
- End-to-End Coverage - 87 skills spanning the full AI research lifecycle, from ideation and literature survey to experiments and paper writing
- Research-Grade Quality - Documentation sourced from official repos, real GitHub issues, and battle-tested production workflows
## Available AI Research Engineering Skills
**Quality over quantity**: Each skill provides comprehensive, expert-level guidance with real code examples, troubleshooting guides, and production-ready workflows.
### π¦ Quick Install (Recommended)
**For humans** β interactive installer with one command:
```bash
npx @orchestra-research/ai-research-skills
```
**For AI agents** β point your agent to the welcome doc and it handles the rest:
```
Read https://www.orchestra-research.com/ai-research-skills/welcome.md and follow the instructions to install and use AI Research Skills.
```
This installs all 87 skills, loads the **autoresearch** orchestration layer, and starts autonomous research.
What the installer does
- **Auto-detects** your installed coding agents
- **Installs** skills to `~/.orchestra/skills/` with symlinks to each agent (falls back to copy on Windows)
- **Offers** everything, quickstart bundle, by category, or individual skills
- **Updates** installed skills with latest versions
- **Uninstalls** all or selected skills
CLI Commands
```bash
# Interactive installer (recommended)
npx @orchestra-research/ai-research-skills
# Direct commands
npx @orchestra-research/ai-research-skills list # View installed skills
npx @orchestra-research/ai-research-skills update # Update installed skills
```
Claude Code Marketplace (Alternative)
Install skill categories directly using the **Claude Code CLI**:
```bash
# Add the marketplace
/plugin marketplace add orchestra-research/AI-research-SKILLs
# Install by category (22 categories available)
/plugin install fine-tuning@ai-research-skills # Axolotl, LLaMA-Factory, PEFT, Unsloth
/plugin install post-training@ai-research-skills # TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge
/plugin install inference-serving@ai-research-skills # vLLM, TensorRT-LLM, llama.cpp, SGLang
/plugin install distributed-training@ai-research-skills
/plugin install optimization@ai-research-skills
```
### All 22 Categories (87 Skills)
| Category | Skills | Included |
|----------|--------|----------|
| **Autoresearch** | **1** | **Autonomous research orchestration β central layer that manages the full lifecycle and routes to all other skills** |
| Ideation | 2 | Research Brainstorming, Creative Thinking |
| ML Paper Writing | 2 | ML Paper Writing (LaTeX templates, citation verification), Academic Plotting |
| Model Architecture | 5 | LitGPT, Mamba, NanoGPT, RWKV, TorchTitan |
| Tokenization | 2 | HuggingFace Tokenizers, SentencePiece |
| Fine-Tuning | 4 | Axolotl, LLaMA-Factory, PEFT, Unsloth |
| Mech Interp | 4 | TransformerLens, SAELens, pyvene, nnsight |
| Data Processing | 2 | NeMo Curator, Ray Data |
| Post-Training | 8 | TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge |
| Safety | 4 | Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard |
| Distributed | 6 | DeepSpeed, FSDP, Accelerate, Megatron-Core, Lightning, Ray Train |
| Infrastructure | 3 | Modal, Lambda Labs, SkyPilot |
| Optimization | 6 | Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF |
| Evaluation | 3 | lm-eval-harness, BigCode, NeMo Evaluator |
| Inference | 4 | vLLM, TensorRT-LLM, llama.cpp, SGLang |
| MLOps | 3 | W&B, MLflow, TensorBoard |
| Agents | 4 | LangChain, LlamaIndex, CrewAI, AutoGPT |
| RAG | 5 | Chroma, FAISS, Pinecone, Qdrant, Sentence Transformers |
| Prompt Eng | 4 | DSPy, Instructor, Guidance, Outlines |
| Observability | 2 | LangSmith, Phoenix |
| Multimodal | 7 | CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, AudioCraft |
| Emerging | 6 | MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning |
View All 87 Skills in Details
### π¬ Autoresearch (1 skill) β Central Orchestration Layer
- **[Autoresearch](0-autoresearch-skill/)** - Autonomous research orchestration using a two-loop architecture (inner optimization + outer synthesis). Manages the full lifecycle from literature survey to paper writing, routing to all domain-specific skills. Supports Claude Code /loop and OpenClaw heartbeat for continuous operation (390 lines + 3 refs)
### ποΈ Model Architecture (5 skills)
- **[LitGPT](01-model-architecture/litgpt/)** - Lightning AI's 20+ clean LLM implementations with production training recipes (462 lines + 4 refs)
- **[Mamba](01-model-architecture/mamba/)** - State-space models with O(n) complexity, 5Γ faster than Transformers (253 lines + 3 refs)
- **[RWKV](01-model-architecture/rwkv/)** - RNN+Transformer hybrid, infinite context, Linux Foundation project (253 lines + 3 refs)
- **[NanoGPT](01-model-architecture/nanogpt/)** - Educational GPT in ~300 lines by Karpathy (283 lines + 3 refs)
- **[TorchTitan](01-model-architecture/torchtitan/)** - PyTorch-native distributed training for Llama 3.1 with 4D parallelism
### π€ Tokenization (2 skills)
- **[HuggingFace Tokenizers](02-tokenization/huggingface-tokenizers/)** - Rust-based, <20s/GB, BPE/WordPiece/Unigram algorithms (486 lines + 4 refs)
- **[SentencePiece](02-tokenization/sentencepiece/)** - Language-independent, 50k sentences/sec, used by T5/ALBERT (228 lines + 2 refs)
### π― Fine-Tuning (4 skills)
- **[Axolotl](03-fine-tuning/axolotl/)** - YAML-based fine-tuning with 100+ models (156 lines + 4 refs)
- **[LLaMA-Factory](03-fine-tuning/llama-factory/)** - WebUI no-code fine-tuning (78 lines + 5 refs)
- **[Unsloth](03-fine-tuning/unsloth/)** - 2x faster QLoRA fine-tuning (75 lines + 4 refs)
- **[PEFT](03-fine-tuning/peft/)** - Parameter-efficient fine-tuning with LoRA, QLoRA, DoRA, 25+ methods (431 lines + 2 refs)
### π¬ Mechanistic Interpretability (4 skills)
- **[TransformerLens](04-mechanistic-interpretability/transformer-lens/)** - Neel Nanda's library for mech interp with HookPoints, activation caching (346 lines + 3 refs)
- **[SAELens](04-mechanistic-interpretability/saelens/)** - Sparse Autoencoder training and analysis for feature discovery (386 lines + 3 refs)
- **[pyvene](04-mechanistic-interpretability/pyvene/)** - Stanford's causal intervention library with declarative configs (473 lines + 3 refs)
- **[nnsight](04-mechanistic-interpretability/nnsight/)** - Remote interpretability via NDIF, run experiments on 70B+ models (436 lines + 3 refs)
### π Data Processing (2 skills)
- **[Ray Data](05-data-processing/ray-data/)** - Distributed ML data processing, streaming execution, GPU support (318 lines + 2 refs)
- **[NeMo Curator](05-data-processing/nemo-curator/)** - GPU-accelerated data curation, 16Γ faster deduplication (375 lines + 2 refs)
### π Post-Training (8 skills)
- **[TRL Fine-Tuning](06-post-training/trl-fine-tuning/)** - Transformer Reinforcement Learning (447 lines + 4 refs)
- **[GRPO-RL-Training](06-post-training/grpo-rl-training/)** (TRL) - Group Relative Policy Optimization with TRL (569 lines, **gold standard**)
- **[OpenRLHF](06-post-training/openrlhf/)** - Full RLHF pipeline with Ray + vLLM (241 lines + 4 refs)
- **[SimPO](06-post-training/simpo/)** - Simple Preference Optimization, no reference model needed (211 lines + 3 refs)
- **[verl](06-post-training/verl/)** - ByteDance's HybridFlow RL framework, FSDP/Megatron + vLLM/SGLang backends (389 lines + 2 refs)
- **[slime](06-post-training/slime/)** - THUDM's Megatron+SGLang framework powering GLM-4.x models (464 lines + 2 refs)
- **[miles](06-post-training/miles/)** - Enterprise fork of slime with FP8, INT4, speculative RL for MoE training (315 lines + 2 refs)
- **[torchforge](06-post-training/torchforge/)** - Meta's PyTorch-native RL with Monarch+TorchTitan+vLLM (380 lines + 2 refs)
### π‘οΈ Safety & Alignment (4 skills)
- **[Constitutional AI](07-safety-alignment/constitutional-ai/)** - AI-driven self-improvement via principles (282 lines)
- **[LlamaGuard](07-safety-alignment/llamaguard/)** - Safety classifier for LLM inputs/outputs (329 lines)
- **[NeMo Guardrails](07-safety-alignment/nemo-guardrails/)** - Programmable guardrails with Colang (289 lines)
- **[Prompt Guard](07-safety-alignment/prompt-guard/)** - Meta's 86M prompt injection & jailbreak detector, 99%+ TPR, <2ms GPU (313 lines)
### β‘ Distributed Training (6 skills)
- **[Megatron-Core](08-distributed-training/megatron-core/)** - NVIDIA's framework for training 2B-462B param models with 47% MFU on H100 (359 lines + 4 refs)
- **[DeepSpeed](08-distributed-training/deepspeed/)** - Microsoft's ZeRO optimization (137 lines + 9 refs)
- **[PyTorch FSDP2](08-distributed-training/pytorch-fsdp2/)** - Fully Sharded Data Parallel v2 with `fully_shard` and DTensor (231 lines + 12 refs)
- **[Accelerate](08-distributed-training/accelerate/)** - HuggingFace's 4-line distributed training API (324 lines + 3 refs)
- **[PyTorch Lightning](08-distributed-training/pytorch-lightning/)** - High-level training framework with Trainer class (339 lines + 3 refs)
- **[Ray Train](08-distributed-training/ray-train/)** - Multi-node orchestration and hyperparameter tuning (399 lines + 1 ref)
### π Optimization (6 skills)
- **[Flash Attention](10-optimization/flash-attention/)** - 2-4x faster attention with memory efficiency (359 lines + 2 refs)
- **[bitsandbytes](10-optimization/bitsandbytes/)** - 8-bit/4-bit quantization for 50-75% memory reduction (403 lines + 3 refs)
- **[GPTQ](10-optimization/gptq/)** - 4-bit post-training quantization, 4Γ memory reduction, <2% accuracy loss (443 lines + 3 refs)
- **[AWQ](10-optimization/awq/)** - Activation-aware weight quantization, 4-bit with minimal accuracy loss (310 lines + 2 refs)
- **[HQQ](10-optimization/hqq/)** - Half-Quadratic Quantization, no calibration data needed, multi-backend (370 lines + 2 refs)
- **[GGUF](10-optimization/gguf/)** - llama.cpp quantization format, K-quant methods, CPU/Metal inference (380 lines + 2 refs)
### π Evaluation (3 skills)
- **[lm-evaluation-harness](11-evaluation/lm-evaluation-harness/)** - EleutherAI's standard for benchmarking LLMs across 60+ tasks (482 lines + 4 refs)
- **[BigCode Evaluation Harness](11-evaluation/bigcode-evaluation-harness/)** - Code model benchmarking with HumanEval, MBPP, MultiPL-E, pass@k metrics (406 lines + 3 refs)
- **[NeMo Evaluator](11-evaluation/nemo-evaluator/)** - NVIDIA's enterprise platform for 100+ benchmarks across 18+ harnesses with multi-backend execution (454 lines + 4 refs)
### βοΈ Infrastructure (3 skills)
- **[Modal](09-infrastructure/modal/)** - Serverless GPU cloud with Python-native API, T4-H200 on-demand (342 lines + 2 refs)
- **[SkyPilot](09-infrastructure/skypilot/)** - Multi-cloud orchestration across 20+ providers with spot recovery (390 lines + 2 refs)
- **[Lambda Labs](09-infrastructure/lambda-labs/)** - Reserved/on-demand GPU cloud with H100/A100, persistent filesystems (390 lines + 2 refs)
### π₯ Inference & Serving (4 skills)
- **[vLLM](12-inference-serving/vllm/)** - High-throughput LLM serving with PagedAttention (356 lines + 4 refs, **production-ready**)
- **[TensorRT-LLM](12-inference-serving/tensorrt-llm/)** - NVIDIA's fastest inference, 24k tok/s, FP8/INT4 quantization (180 lines + 3 refs)
- **[llama.cpp](12-inference-serving/llama-cpp/)** - CPU/Apple Silicon inference, GGUF quantization (251 lines + 3 refs)
- **[SGLang](12-inference-serving/sglang/)** - Structured generation with RadixAttention, 5-10Γ faster for agents (435 lines + 3 refs)
### π€ Agents (4 skills)
- **[LangChain](14-agents/langchain/)** - Most popular agent framework, 500+ integrations, ReAct pattern (658 lines + 3 refs, **production-ready**)
- **[LlamaIndex](14-agents/llamaindex/)** - Data framework for LLM apps, 300+ connectors, RAG-focused (535 lines + 3 refs)
- **[CrewAI](14-agents/crewai/)** - Multi-agent orchestration, role-based collaboration, autonomous workflows (498 lines + 3 refs)
- **[AutoGPT](14-agents/autogpt/)** - Autonomous AI agent platform, visual workflow builder, continuous execution (400 lines + 2 refs)
### π RAG (5 skills)
- **[Chroma](15-rag/chroma/)** - Open-source embedding database, local/cloud, 24k stars (385 lines + 1 ref)
- **[FAISS](15-rag/faiss/)** - Facebook's similarity search, billion-scale, GPU acceleration (295 lines)
- **[Sentence Transformers](15-rag/sentence-transformers/)** - 5000+ embedding models, multilingual, 15k stars (370 lines)
- **[Pinecone](15-rag/pinecone/)** - Managed vector database, auto-scaling, <100ms latency (410 lines)
- **[Qdrant](15-rag/qdrant/)** - High-performance vector search, Rust-powered, hybrid search with filtering (493 lines + 2 refs)
### π¨ Multimodal (7 skills)
- **[CLIP](18-multimodal/clip/)** - OpenAI's vision-language model, zero-shot classification, 25k stars (320 lines)
- **[Whisper](18-multimodal/whisper/)** - Robust speech recognition, 99 languages, 73k stars (395 lines)
- **[LLaVA](18-multimodal/llava/)** - Vision-language assistant, image chat, GPT-4V level (360 lines)
- **[Stable Diffusion](18-multimodal/stable-diffusion/)** - Text-to-image generation via HuggingFace Diffusers, SDXL, ControlNet (380 lines + 2 refs)
- **[Segment Anything](18-multimodal/segment-anything/)** - Meta's SAM for zero-shot image segmentation with points/boxes (500 lines + 2 refs)
- **[BLIP-2](18-multimodal/blip-2/)** - Vision-language pretraining with Q-Former, image captioning, VQA (500 lines + 2 refs)
- **[AudioCraft](18-multimodal/audiocraft/)** - Meta's MusicGen/AudioGen for text-to-music and text-to-sound (470 lines + 2 refs)
### π― Prompt Engineering (4 skills)
- **[DSPy](16-prompt-engineering/dspy/)** - Declarative prompt programming with optimizers, Stanford NLP, 22k stars (438 lines + 3 refs)
- **[Instructor](16-prompt-engineering/instructor/)** - Structured LLM outputs with Pydantic validation, 15k stars (726 lines + 3 refs)
- **[Guidance](16-prompt-engineering/guidance/)** - Constrained generation with regex/grammars, Microsoft Research, 18k stars (485 lines + 3 refs)
- **[Outlines](16-prompt-engineering/outlines/)** - Structured text with FSM, zero-overhead, 8k stars (601 lines + 3 refs)
### π MLOps (3 skills)
- **[Weights & Biases](13-mlops/weights-and-biases/)** - Experiment tracking, sweeps, artifacts, model registry (427 lines + 3 refs)
- **[MLflow](13-mlops/mlflow/)** - Model registry, tracking, deployment, autologging (514 lines + 3 refs)
- **[TensorBoard](13-mlops/tensorboard/)** - Visualization, profiling, embeddings, scalars/images (538 lines + 3 refs)
### ποΈ Observability (2 skills)
- **[LangSmith](17-observability/langsmith/)** - LLM observability, tracing, evaluation, monitoring for AI apps (422 lines + 2 refs)
- **[Phoenix](17-observability/phoenix/)** - Open-source AI observability with OpenTelemetry tracing and LLM evaluation (380 lines + 2 refs)
### π¬ Emerging Techniques (6 skills)
- **[MoE Training](19-emerging-techniques/moe-training/)** - Mixture of Experts training with DeepSpeed, Mixtral 8x7B, 5Γ cost reduction (515 lines + 3 refs)
- **[Model Merging](19-emerging-techniques/model-merging/)** - Combine models with TIES, DARE, SLERP using mergekit (528 lines + 3 refs)
- **[Long Context](19-emerging-techniques/long-context/)** - Extend context windows with RoPE, YaRN, ALiBi, 32k-128k tokens (624 lines + 3 refs)
- **[Speculative Decoding](19-emerging-techniques/speculative-decoding/)** - 1.5-3.6Γ faster inference with Medusa, Lookahead (379 lines)
- **[Knowledge Distillation](19-emerging-techniques/knowledge-distillation/)** - Compress models 70Bβ7B with MiniLLM, temperature scaling (424 lines)
- **[Model Pruning](19-emerging-techniques/model-pruning/)** - 50% sparsity with Wanda, SparseGPT, <1% accuracy loss (417 lines)
### π ML Paper Writing (2 skills)
- **[ML Paper Writing](20-ml-paper-writing/)** - Write publication-ready papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM with LaTeX templates, citation verification, and writing best practices (532 lines + 5 refs)
- **[Academic Plotting](20-ml-paper-writing/academic-plotting/)** - Generate publication-quality figures for ML papers: architecture diagrams via Gemini AI and data-driven charts via matplotlib/seaborn with venue-specific styling (479 lines + 3 refs)
### π‘ Ideation (2 skills)
- **[Research Brainstorming](21-research-ideation/brainstorming-research-ideas/)** - Structured ideation frameworks for discovering high-impact research directions with 10 complementary lenses (384 lines)
- **[Creative Thinking](21-research-ideation/creative-thinking-for-research/)** - Cognitive science frameworks (bisociation, structure-mapping, constraint manipulation) for genuinely novel research ideas (366 lines)
## Demos
All 87 skills in this repo are automatically synced to [Orchestra Research](https://www.orchestra-research.com/research-skills), where you can add them to your projects with one click and use them with AI research agents.
**See skills in action β [demos/](demos/README.md)**
We maintain a curated collection of demo repositories showing how to use skills for real AI research tasks:
| Demo | Skills Used | What It Does |
|------|-------------|--------------|
| **[Norm Heterogeneity β LoRA Brittleness](demos/autoresearch-norm-heterogeneity/)** | Autoresearch, ML Paper Writing, Ideation | Agent autonomously discovered norm heterogeneity predicts fine-tuning difficulty (r=-0.99), pivoting from a null result on ETF overlaps |
| **[RL Algorithm Brain Scan](demos/autoresearch-rl-brain-scan/)** | Autoresearch, GRPO, TRL, SAELens, TransformerLens, ML Paper Writing | Agent found DPO is a rank-1 perturbation (95.6% recovery from one SVD direction) while online RL is distributed and structure-preserving |
| **[NeMo Eval: GPQA Benchmark](https://github.com/zechenzhangAGI/Nemo-Eval-Skill-Demo)** | NeMo Evaluator | Compare Llama 8B/70B/405B on graduate-level science questions |
| **[LoRA Without Regret Reproduction](https://www.orchestra-research.com/perspectives/LLM-with-Orchestra)** | GRPO, TRL | Reproduce SFT + GRPO RL experiments via prompting |
| **[Layer-Wise Quantization Experiment](https://github.com/AmberLJC/llama-quantization-experiment)** | llama.cpp, GGUF | Investigate optimal layer precision allocationβearly layers at Q8 achieve 1.9Γ compression with 1.3% perplexity loss |
| **[Cross-Lingual Alignment Analysis](https://github.com/AmberLJC/faiss-demo)** | FAISS | Quantify how well multilingual embeddings align semantic concepts across 8 languages using FAISS similarity search |
| **[Scientific Plotting Demo](demos/scientific-plotting-demo/)** | Academic Plotting | Generate publication-quality figures for the Andes QoE-aware LLM serving paper β Gemini AI architecture diagrams + matplotlib data charts (CDF, multi-panel grids, bar charts) |
**Featured Demos**: Two papers produced entirely by AI agents using the **autoresearch** skill. The [Norm Heterogeneity paper](demos/autoresearch-norm-heterogeneity/) demonstrates autonomous research pivoting β the agent refuted its own hypothesis and discovered a stronger finding. The [RL Brain Scan paper](demos/autoresearch-rl-brain-scan/) demonstrates multi-skill orchestration β the agent trained RL models, analyzed internals with interpretability tools, and synthesized the insight that "DPO is rank-1 alignment." Both papers written end-to-end by the agent.
## Skill Structure
Each skill follows a battle-tested format for maximum usefulness:
```
skill-name/
βββ SKILL.md # Quick reference (50-150 lines)
β βββ Metadata (name, description, version)
β βββ When to use this skill
β βββ Quick patterns & examples
β βββ Links to references
β
βββ references/ # Deep documentation (300KB+)
β βββ README.md # From GitHub/official docs
β βββ api.md # API reference
β βββ tutorials.md # Step-by-step guides
β βββ issues.md # Real GitHub issues & solutions
β βββ releases.md # Version history & breaking changes
β βββ file_structure.md # Codebase navigation
β
βββ scripts/ # Helper scripts (optional)
βββ assets/ # Templates & examples (optional)
```
Quality Standards
- 300KB+ documentation from official sources
- Real GitHub issues & solutions (when available)
- Code examples with language detection
- Version history & breaking changes
- Links to official docs
## Roadmap
We're building towards 80 comprehensive skills across the full AI research lifecycle. See our [detailed roadmap](docs/ROADMAP.md) for the complete development plan.
[View Full Roadmap β](docs/ROADMAP.md)
View Detailed Statistics
| Metric | Current | Target |
|--------|---------|--------|
| **Skills** | **87** (high-quality, standardized YAML) | 80 β |
| **Avg Lines/Skill** | **420 lines** (focused + progressive disclosure) | 200-600 lines |
| **Documentation** | **~130,000 lines** total (SKILL.md + references) | 100,000+ lines |
| **Gold Standard Skills** | **65** with comprehensive references | 50+ |
| **Contributors** | 1 | 100+ |
| **Coverage** | Architecture, Tokenization, Fine-Tuning, Mechanistic Interpretability, Data Processing, Post-Training, Safety, Distributed, Optimization, Evaluation, Infrastructure, Inference, Agents, RAG, Multimodal, Prompt Engineering, MLOps, Observability, ML Paper Writing, Ideation, Autoresearch | Full Lifecycle β |
**Recent Progress**: npm package `@orchestra-research/ai-research-skills` for one-command installation across all coding agents
**Philosophy**: Quality > Quantity. Following [Anthropic official best practices](anthropic_official_docs/best_practices.md) - each skill provides 200-500 lines of focused, actionable guidance with progressive disclosure.
## Repository Structure
```
claude-ai-research-skills/
βββ README.md β You are here
βββ CONTRIBUTING.md β Contribution guide
βββ demos/ β Curated demo gallery (links to demo repos)
βββ docs/
βββ 0-autoresearch-skill/ (1 skill β - Autonomous research orchestration)
βββ 01-model-architecture/ (5 skills β - LitGPT, Mamba, RWKV, NanoGPT, TorchTitan)
βββ 02-tokenization/ (2 skills β - HuggingFace Tokenizers, SentencePiece)
βββ 03-fine-tuning/ (4 skills β - Axolotl, LLaMA-Factory, Unsloth, PEFT)
βββ 04-mechanistic-interpretability/ (4 skills β - TransformerLens, SAELens, pyvene, nnsight)
βββ 05-data-processing/ (2 skills β - Ray Data, NeMo Curator)
βββ 06-post-training/ (8 skills β - TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge)
βββ 07-safety-alignment/ (4 skills β - Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard)
βββ 08-distributed-training/ (6 skills β - Megatron-Core, DeepSpeed, FSDP, Accelerate, Lightning, Ray Train)
βββ 09-infrastructure/ (3 skills β - Modal, SkyPilot, Lambda Labs)
βββ 10-optimization/ (6 skills β - Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF)
βββ 11-evaluation/ (3 skills β - lm-evaluation-harness, BigCode, NeMo Evaluator)
βββ 12-inference-serving/ (4 skills β - vLLM, TensorRT-LLM, llama.cpp, SGLang)
βββ 13-mlops/ (3 skills β - Weights & Biases, MLflow, TensorBoard)
βββ 14-agents/ (4 skills β - LangChain, LlamaIndex, CrewAI, AutoGPT)
βββ 15-rag/ (5 skills β - Chroma, FAISS, Sentence Transformers, Pinecone, Qdrant)
βββ 16-prompt-engineering/ (4 skills β - DSPy, Instructor, Guidance, Outlines)
βββ 17-observability/ (2 skills β - LangSmith, Phoenix)
βββ 18-multimodal/ (7 skills β - CLIP, Whisper, LLaVA, Stable Diffusion, SAM, BLIP-2, AudioCraft)
βββ 19-emerging-techniques/ (6 skills β - MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning)
βββ 20-ml-paper-writing/ (2 skills β - ML Paper Writing with LaTeX templates, Academic Plotting)
βββ 21-research-ideation/ (2 skills β - Research Brainstorming, Creative Thinking)
βββ packages/ai-research-skills/ (npm package for one-command installation)
```
## Use Cases
### For Researchers
"I need to fine-tune Llama 3 with custom data"
β **03-fine-tuning/axolotl/** - YAML configs, 100+ model support
### For ML Engineers
"How do I optimize inference latency?"
β **12-inference-serving/vllm/** - PagedAttention, batching
### For Students
"I want to learn how transformers work"
β **01-model-architecture/litgpt/** - Clean implementations
### For Teams
"We need to scale training to 100 GPUs"
β **08-distributed-training/deepspeed/** - ZeRO stages, 3D parallelism
## License
MIT License - See [LICENSE](LICENSE) for details.
**Note**: Individual skills may reference libraries with different licenses. Please check each project's license before use.
## Citation
If you use AI Research Skills in your work or find it helpful for a publication, we'd appreciate a citation:
**BibTeX**
```bibtex
@software{ai_research_skills,
title = {AI Research Skills Library},
author = {{Orchestra Research}},
year = {2025},
url = {https://github.com/orchestra-research/AI-research-SKILLs},
note = {Open-source skills library enabling AI agents to autonomously conduct AI research}
}
```
**APA**
> Orchestra Research. (2025). *AI Research Skills Library* [Computer software]. https://github.com/orchestra-research/AI-research-SKILLs
**Chicago**
> Orchestra Research. "AI Research Skills Library." GitHub, 2025. https://github.com/orchestra-research/AI-research-SKILLs.
**IEEE**
> Orchestra Research, "AI Research Skills Library," 2025. [Online]. Available: https://github.com/orchestra-research/AI-research-SKILLs
> **Tip**: You can also click **"Cite this repository"** in the GitHub sidebar for auto-formatted citations.
## Acknowledgments
Built with:
- **[Claude Code](https://www.claude.com/product/claude-code)** - AI pair programming
- **[Skill Seeker](https://github.com/yusufkaraaslan/Skill_Seekers)** - Automated doc scraping
- **Open Source AI Community** - For amazing tools and docs
Special thanks to:
- EleutherAI, HuggingFace, NVIDIA, Lightning AI, Meta AI, Anthropic
- All researchers who maintain excellent documentation
## Contributors
Thanks to all the people who have contributed to the AI Research Skills Library:
We welcome contributions from the AI research community! See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on:
- Adding new skills
- Improving existing skills
- Quality standards and best practices
- Submission process
## Recent Updates
March 2026 - v1.4.0 π¬ Autoresearch & 86 Skills β Full Research Lifecycle
- π¬ **NEW SKILL**: **Autoresearch** β autonomous research orchestration using a two-loop architecture (inner optimization loop + outer synthesis loop)
- π§ Manages the full research lifecycle: literature survey β ideation β experiments β synthesis β paper writing
- π Routes to all 86 domain skills automatically β agents don't need to know which skill to use
- β° Mandatory `/loop` (Claude Code) and cron job (OpenClaw) for continuous autonomous operation
- π Generates research presentations (HTML/PDF) with optimization trajectory plots for human review
- π Findings.md as persistent project memory across sessions with "Lessons and Constraints" tracking
- ποΈ Structured workspace: research-state.yaml, findings.md, research-log.md, literature/, experiments/, src/, data/, to_human/
- π **Two demo papers produced by autoresearch**: [Norm Heterogeneity β LoRA Brittleness](demos/autoresearch-norm-heterogeneity/) and [RL Algorithm Brain Scan](demos/autoresearch-rl-brain-scan/)
- π WELCOME.md for cold-start agent bootstrap β one URL to go from zero to autonomous research
- π¦ npm v1.4.x with Windows symlink fallback, all 22 categories installable
- π **87 total skills** across **22 categories** β complete research lifecycle coverage
February 2026 - v0.15.0 π‘οΈ Prompt Guard & 83 Skills
- π‘οΈ **NEW SKILL**: Prompt Guard - Meta's 86M prompt injection & jailbreak detector
- β‘ 99%+ TPR, <1% FPR, <2ms GPU latency, multilingual (8 languages)
- π 3 workflows: user input filtering, third-party data filtering, batch RAG processing
- π **83 total skills** across 20 categories
January 2026 - v0.14.0 π¦ npm Package & 82 Skills
- π¦ **NEW**: `npx @orchestra-research/ai-research-skills` - One-command installation for all coding agents
- π€ **Supported agents**: Claude Code, OpenCode, Cursor, Codex, Gemini CLI, Qwen Code
- β¨ Interactive installer with category/individual skill selection
- π Update installed skills, selective uninstall
- π **82 total skills** (5 new post-training skills: verl, slime, miles, torchforge + TorchTitan)
- ποΈ Megatron-Core moved to Distributed Training category
January 2026 - v0.13.0 π ML Paper Writing & Demos Gallery
- π **NEW CATEGORY**: ML Paper Writing (20th category, 77th skill)
- π― Write publication-ready papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM
- π Writing philosophy from top researchers (Neel Nanda, Farquhar, Gopen & Swan, Lipton, Perez)
- π¬ Citation verification workflow - never hallucinate references
- π LaTeX templates for 6 major conferences
- πͺ **NEW**: Curated demos gallery (`demos/`) showcasing skills in action
- π Demo repos: NeMo Evaluator benchmark, LoRA Without Regret reproduction
- π 936-line comprehensive SKILL.md with 4 workflows
January 2026 - v0.12.0 π NeMo Evaluator SDK
- π **NEW SKILL**: NeMo Evaluator SDK for enterprise LLM benchmarking
- π§ NVIDIA's evaluation platform with 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM)
- β‘ Multi-backend execution: local Docker, Slurm HPC, Lepton cloud
- π¦ Container-first architecture for reproducible evaluation
- π 454 lines SKILL.md + 4 comprehensive reference files (~48KB documentation)
December 2025 - v0.11.0 π¬ Mechanistic Interpretability
- π¬ **NEW CATEGORY**: Mechanistic Interpretability (4 skills)
- π TransformerLens skill: Neel Nanda's library for mech interp with HookPoints, activation caching, circuit analysis
- π§ SAELens skill: Sparse Autoencoder training and analysis for feature discovery, monosemanticity research
- β‘ pyvene skill: Stanford's causal intervention library with declarative configs, DAS, activation patching
- π nnsight skill: Remote interpretability via NDIF, run experiments on 70B+ models without local GPUs
- π ~6,500 new lines of documentation across 16 files
- **76 total skills** (filling the missing 04 category slot)
November 25, 2025 - v0.10.0 π 70 Skills Complete!
- π **ROADMAP COMPLETE**: Reached 70-skill milestone!
- π Added 4 skills: Lambda Labs, Segment Anything (SAM), BLIP-2, AudioCraft
- βοΈ Lambda Labs skill: Reserved/on-demand GPU cloud with H100/A100, persistent filesystems, 1-Click Clusters
- πΌοΈ SAM skill: Meta's Segment Anything for zero-shot image segmentation with points/boxes/masks
- ποΈ BLIP-2 skill: Vision-language pretraining with Q-Former, image captioning, VQA
- π΅ AudioCraft skill: Meta's MusicGen/AudioGen for text-to-music and text-to-sound generation
- π ~10,000 new lines of documentation across 12 files
- **70 total skills** (100% roadmap complete!)
November 25, 2025 - v0.9.0
- π Added 2 infrastructure skills: Modal, SkyPilot
- βοΈ Modal skill: Serverless GPU cloud with Python-native API, T4-H200 on-demand, auto-scaling
- π SkyPilot skill: Multi-cloud orchestration across 20+ providers with spot recovery
- β¨ New Infrastructure category (2 skills - serverless GPU and multi-cloud orchestration)
- π ~2,500 new lines of documentation across 6 files
- **66 total skills** (94% towards 70-skill target)
November 25, 2025 - v0.8.0
- π Added 5 high-priority skills: HQQ, GGUF, Phoenix, AutoGPT, Stable Diffusion
- β‘ HQQ skill: Half-Quadratic Quantization without calibration data, multi-backend support
- π¦ GGUF skill: llama.cpp quantization format, K-quant methods, CPU/Metal inference
- ποΈ Phoenix skill: Open-source AI observability with OpenTelemetry tracing and LLM evaluation
- π€ AutoGPT skill: Autonomous AI agent platform with visual workflow builder
- π¨ Stable Diffusion skill: Text-to-image generation via Diffusers, SDXL, ControlNet, LoRA
- π ~9,000 new lines of documentation across 15 files
- **64 total skills** (91% towards 70-skill target)
November 25, 2025 - v0.7.0
- π Added 5 high-priority skills: PEFT, CrewAI, Qdrant, AWQ, LangSmith
- β¨ New Observability category with LangSmith for LLM tracing and evaluation
- π― PEFT skill: Parameter-efficient fine-tuning with LoRA, QLoRA, DoRA, 25+ methods
- π€ CrewAI skill: Multi-agent orchestration with role-based collaboration
- π Qdrant skill: High-performance Rust vector search with hybrid filtering
- β‘ AWQ skill: Activation-aware 4-bit quantization with minimal accuracy loss
- π ~8,000 new lines of documentation across 15 files
- **59 total skills** (84% towards 70-skill target)
November 15, 2025 - v0.6.0
- π Added 3 comprehensive MLOps skills: Weights & Biases, MLflow, TensorBoard
- β¨ New MLOps category (3 skills - experiment tracking, model registry, visualization)
- π ~10,000 new lines of documentation across 13 files
- π§ Comprehensive coverage: experiment tracking, hyperparameter sweeps, model registry, profiling, embeddings visualization
- **54 total skills** (77% towards 70-skill target)
November 12, 2025 - v0.5.0
- π― Added 4 comprehensive prompt engineering skills: DSPy, Instructor, Guidance, Outlines
- β¨ New Prompt Engineering category (4 skills - DSPy, Instructor, Guidance, Outlines)
- π ~10,000 new lines of documentation across 16 files
- π§ Comprehensive coverage: declarative programming, structured outputs, constrained generation, FSM-based generation
- **47 total skills** (67% towards 70-skill target)
November 9, 2025 - v0.4.0
- π€ Added 11 comprehensive skills: LangChain, LlamaIndex, Chroma, FAISS, Sentence Transformers, Pinecone, CLIP, Whisper, LLaVA
- β¨ New Agents category (2 skills - LangChain, LlamaIndex)
- π New RAG category (4 skills - Chroma, FAISS, Sentence Transformers, Pinecone)
- π¨ New Multimodal category (3 skills - CLIP, Whisper, LLaVA)
- π ~15,000 new lines of documentation
- **43 total skills** (61% towards 70-skill target)
November 8, 2025 - v0.3.0
- π Added 8 comprehensive skills: TensorRT-LLM, llama.cpp, SGLang, GPTQ, HuggingFace Tokenizers, SentencePiece, Ray Data, NeMo Curator
- β‘ Completed Inference & Serving category (4/4 skills)
- π€ New Tokenization category (2 skills)
- π New Data Processing category (2 skills)
- π 9,617 new lines of documentation across 30 files
- **32 total skills** (45% towards 70-skill target)
November 6, 2025 - v0.2.0
- Added 10 skills from GitHub (Megatron-Core, Lightning, Ray Train, etc.)
- Improved skill structure with comprehensive references
- Created strategic roadmap to 70 skills
- Added contribution guidelines
November 3, 2025 - v0.1.0
- π Initial release with 5 fine-tuning skills
## Community
Join our community to stay updated, ask questions, and connect with other AI researchers:
- **[SkillEvolve Meta-Skill](https://github.com/Skill-Evolve/meta-skill)** - Connect your agent to the collective intelligence of the community. Captures techniques discovered during sessions and shares them back as curated skills.
- **[Slack Community](https://join.slack.com/t/orchestrarese-efu1990/shared_invite/zt-3iu6gr8io-zJvpkZTPToEviQ9KFZvNSg)** - Chat with the team and other users
- **[Twitter/X](https://x.com/orch_research)** - Follow for updates and announcements
- **[LinkedIn](https://www.linkedin.com/company/orchestra-research/)** - Connect professionally
## Star History