# Contributing to Claude AI Research Skills

Thank you for your interest in contributing! This guide will help you add new skills to the library.

---

## 🎯 What We're Building

**Vision**: The most comprehensive open-source library of AI research skills for Claude Code.

**Target**: 70 comprehensive skills covering the entire AI research lifecycle, from model architecture to production deployment.

**Current Progress**: 7/70 skills (10%)

**Philosophy**: Quality > Quantity. We deleted 9 low-quality skills to maintain high standards.

---

## 🤝 How to Contribute

### Ways to Contribute

1. **Add a new skill** - Most valuable contribution
2. **Improve existing skills** - Update docs, add examples, fix errors
3. **Report issues** - Outdated information, broken links, missing content
4. **Share feedback** - What skills do you need? What's missing?

---

## 📝 Adding a New Skill

### Step 1: Choose a Skill

### Step 2: Fork and Clone

```bash
# Fork the repository on GitHub first
git clone https://github.com/YOUR_USERNAME/AI-research-SKILLs.git
# git clone names the directory after the repository
cd AI-research-SKILLs

# Create a feature branch
git checkout -b add-vllm-skill
```

### Step 3: Use Skill Seeker MCP

**Option A: Documentation Scraping**

```bash
# Create config file
python3 cli/doc_scraper.py --interactive

# Or copy and modify an existing config
cp configs/react.json configs/vllm.json

# Scrape and build
python3 cli/doc_scraper.py --config configs/vllm.json
```

**Option B: GitHub Scraping**

```bash
# Scrape from GitHub repository
export GITHUB_TOKEN=$(gh auth token)
python3 cli/github_scraper.py --repo vllm-project/vllm --name vllm --description "High-performance LLM inference with PagedAttention"
```

**Option C: Unified Scraping** (recommended for comprehensive skills)

```bash
# Combine documentation + GitHub + PDF
python3 cli/unified_scraper.py --config configs/vllm_unified.json
```

### Step 4: Move to Correct Directory

```bash
# Determine the category (see directory structure below)
mv output/vllm/ \
  12-inference-serving/vllm/

# Move metadata
mv output/vllm_data/ .metadata/vllm_data/
```

### Step 5: Validate Quality

**Based on [Anthropic Official Best Practices](anthropic_official_docs/best_practices.md)**

**Core Requirements** (or skill will be rejected):
- ✅ YAML frontmatter with `name` (gerund form, e.g., "serving-llms") and `description` (third person, includes what AND when)
- ✅ SKILL.md body: **200-300 lines** (under 500 lines maximum)
- ✅ Progressive disclosure: SKILL.md as overview, details in separate reference files
- ✅ Workflows with copy-paste checklists for complex tasks
- ✅ When to use vs alternatives guidance
- ✅ Common issues section with solutions
- ✅ Concise content: assume Claude is smart, no over-explaining basics
- ✅ Code examples with language detection (`` ```python ``, `` ```bash ``, etc.)

**Gold Standard** (aim for this):
- ✅ SKILL.md: 200-300 lines of focused, actionable guidance
- ✅ 2-3 complete workflows with step-by-step checklists
- ✅ Reference files for advanced topics (one level deep from SKILL.md)
- ✅ Feedback loops (validate → fix → repeat) for quality-critical operations
- ✅ Consistent terminology throughout
- ✅ Concrete examples (input/output pairs where helpful)
- ✅ Clear, concise troubleshooting guide

**NOT Acceptable**:
- ❌ SKILL.md over 500 lines (split into reference files instead)
- ❌ Over-explaining basics that Claude already knows
- ❌ First-person descriptions ("I can help you...")
- ❌ Vague skill names ("helper", "utils", "tools")
- ❌ Nested references (SKILL.md → ref1.md → ref2.md)
- ❌ Generic templates that just link to README/CHANGELOG
- ❌ Missing workflows with checklists for complex tasks
- ❌ Time-sensitive information (use "old patterns" section instead)

**Quick Quality Check**:

```bash
# Check SKILL.md has real code examples
cat 12-inference-serving/vllm/SKILL.md

# Check reference files exist
ls -lh 12-inference-serving/vllm/references/

# Verify total documentation size (should be
# 300KB+)
du -sh 12-inference-serving/vllm/references/
```

### YAML Frontmatter Format Standards

All SKILL.md files **must** include properly formatted YAML frontmatter with the following fields:

```yaml
---
name: skill-name-here
description: Clear description of when to use this skill
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Tag One, Tag Two, Tag Three]
dependencies: [package1>=1.0.0, package2>=2.0.0]
---
```

**Field Requirements:**

| Field | Required | Format | Notes |
|-------|----------|--------|-------|
| `name` | ✅ Yes | kebab-case | No quotes, lowercase with hyphens |
| `description` | ✅ Yes | Plain text | No quotes, concise explanation |
| `version` | ✅ Yes | Semantic version | Format: `MAJOR.MINOR.PATCH` |
| `author` | ✅ Yes | Plain text | Use "Orchestra Research" |
| `license` | ✅ Yes | License identifier | Typically `MIT` |
| `tags` | ✅ Yes | Array | Capitalized words, no quotes |
| `dependencies` | ⚠️ Optional | Array | Include version constraints |

**Tag Guidelines:**
- Use **Title Case** for all tags (capitalize first letter of each word)
- Keep acronyms **UPPERCASE** (e.g., `GRPO`, `TRL`, `RLHF`, `DPO`)
- Use descriptive, searchable terms
- Include 5-10 relevant tags
- No quotes around tags

**Example Tags:**

```yaml
tags: [Reinforcement Learning, GRPO, TRL, Post-Training, RLHF, Reward Modeling]
```

**Dependencies Guidelines:**
- Only include **direct dependencies** needed to use the skill
- Include **minimum version constraints** using `>=`
- No quotes around package names
- List core packages first, optional packages last

**Example Dependencies:**

```yaml
dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch]
```

**Complete Example:**

```yaml
---
name: grpo-rl-training
description: Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Reinforcement Learning, GRPO, TRL, Post-Training,
  RLHF, Reward Modeling, Reasoning, DPO, PPO, Structured Output]
dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch]
---
```

**Validation Checklist:**
- [ ] YAML frontmatter is present at the very beginning of SKILL.md
- [ ] All required fields are included
- [ ] No quotes around field values (except in arrays)
- [ ] Tags use Title Case (capitalized words)
- [ ] Dependencies include version constraints where appropriate
- [ ] YAML is valid (test with: `python -c "import yaml; yaml.safe_load(open('SKILL.md').read().split('---')[1])"`)

### Step 6: Update Marketplace

Add your skill to `.claude-plugin/marketplace.json` so it appears in the Claude Code plugin marketplace.

**Add a new entry to the `plugins` array:**

```json
{
  "name": "your-skill-name",
  "source": "./XX-category/skill-folder",
  "description": "Description from your SKILL.md frontmatter (what it does AND when to use it)"
}
```

**Example:**

```json
{
  "name": "serving-llms-vllm",
  "source": "./12-inference-serving/vllm",
  "description": "Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs or optimizing inference latency/throughput."
}
```

**Validation:**

```bash
# Verify JSON is valid after editing
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json'))"
```

**Important**: Place your entry in the correct position (skills are ordered by category number).
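
The JSON check above only confirms that the file parses. A minimal sketch of a stricter check follows, assuming the file has a top-level `plugins` array whose entries carry `name`, `source`, and `description`, and that every `source` starts with a `./XX-` category prefix as in the examples. The function name `check_marketplace` and the inline sample are hypothetical and not part of the repository.

```python
import json
import re

# Hypothetical sample for illustration; in practice, load the real file with
# json.load(open(".claude-plugin/marketplace.json")).
SAMPLE = """
{
  "plugins": [
    {"name": "serving-llms-vllm",
     "source": "./12-inference-serving/vllm",
     "description": "Serves LLMs with high throughput using vLLM."},
    {"name": "grpo-rl-training",
     "source": "./06-post-training/grpo-rl-training",
     "description": "Expert guidance for GRPO/RL fine-tuning with TRL."}
  ]
}
"""


def check_marketplace(data: dict) -> list[str]:
    """Return a list of problems found in a parsed marketplace.json."""
    problems = []
    highest_seen = -1  # largest category number encountered so far
    for index, plugin in enumerate(data.get("plugins", [])):
        label = plugin.get("name", f"entry {index}")
        # Every entry needs the three fields shown in Step 6.
        for field in ("name", "source", "description"):
            if not plugin.get(field):
                problems.append(f"{label}: missing '{field}'")
        # Assumption: sources follow the ./XX-category/skill-folder pattern.
        match = re.match(r"\./(\d+)-", plugin.get("source", ""))
        if match is None:
            problems.append(f"{label}: source should look like ./XX-category/skill-folder")
            continue
        category = int(match.group(1))
        # Entries must appear in non-decreasing category order.
        if category < highest_seen:
            problems.append(
                f"{label}: category {category:02d} is listed after {highest_seen:02d}"
            )
        highest_seen = max(highest_seen, category)
    return problems


if __name__ == "__main__":
    for problem in check_marketplace(json.loads(SAMPLE)):
        print(problem)
```

In the sample, the GRPO entry (category 06) sits after the vLLM entry (category 12), so the sketch reports it as out of order. An empty result means the entries passed these particular checks, not that the marketplace file is otherwise correct.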

### Step 7: Submit Pull Request

```bash
# Add your changes
git add 12-inference-serving/vllm/
git add .metadata/vllm_data/
git add .claude-plugin/marketplace.json

# Commit with descriptive message
git commit -m "Add vLLM inference serving skill

- 215 pages of documentation
- 12 GitHub issues with solutions
- API reference and examples
- Performance benchmarks included"

# Push to your fork
git push origin add-vllm-skill
```

Then create a Pull Request on GitHub with:
- **Title**: "Add [Skill Name] skill"
- **Description**:
  - What the skill covers
  - Source (docs, GitHub, or both)
  - Documentation size
  - Key features/examples included

---

## 📂 Directory Structure

Place skills in the correct category:

```
claude-ai-research-skills/
├── 01-model-architecture/      # Model architectures (GPT, LLaMA, etc.)
├── 02-tokenization/            # Tokenizers (HuggingFace, SentencePiece)
├── 03-fine-tuning/             # Fine-tuning frameworks (Axolotl, TRL)
├── 04-peft/                    # Parameter-efficient methods (LoRA, QLoRA)
├── 05-data-processing/         # Data curation and processing
├── 06-post-training/           # RLHF, DPO, PPO
├── 07-safety-alignment/        # Guardrails, safety, content moderation
├── 08-distributed-training/    # DeepSpeed, FSDP, distributed systems
├── 09-infrastructure/          # PyTorch Lightning, Ray, Composer
├── 10-optimization/            # Flash Attention, bitsandbytes, kernels
├── 11-evaluation/              # Benchmarks, evaluation frameworks
├── 12-inference-serving/       # vLLM, TensorRT-LLM, llama.cpp
├── 13-mlops/                   # Weights & Biases, MLflow, TensorBoard
├── 14-agents/                  # LangChain, LlamaIndex, CrewAI
├── 15-rag/                     # RAG pipelines, vector databases
├── 16-prompt-engineering/      # DSPy, Instructor, structured output
├── 17-observability/           # LangSmith, Phoenix, monitoring
├── 18-multimodal/              # LLaVA, Whisper, Stable Diffusion
└── 19-emerging-techniques/     # MoE, model merging, long context
```

---

## 📋 Skill Structure Template

Use
[SKILL_TEMPLATE.md](docs/SKILL_TEMPLATE.md) as a starting point.

Each skill should contain:

```
skill-name/
├── SKILL.md                    # Quick reference (50-150 lines)
│   ├── Metadata (name, description, version)
│   ├── When to use this skill
│   ├── Quick start examples
│   ├── Common patterns
│   └── Links to references
│
├── references/                 # Deep documentation (300KB+)
│   ├── README.md               # From GitHub/official docs
│   ├── api.md                  # API reference
│   ├── tutorials.md            # Step-by-step guides
│   ├── issues.md               # Real GitHub issues (if applicable)
│   ├── releases.md             # Version history (if applicable)
│   └── file_structure.md       # Codebase navigation (if applicable)
│
├── scripts/                    # Helper scripts (optional)
└── assets/                     # Templates & examples (optional)
```

---

## 🔍 Quality Standards

### Code Examples

All code examples MUST have language detection:

✅ **Good**:
````markdown
```python
from transformers import AutoModel
model = AutoModel.from_pretrained("gpt2")
```
````

❌ **Bad**:
````markdown
```
from transformers import AutoModel
model = AutoModel.from_pretrained("gpt2")
```
````

### Documentation Size

- **Minimum**: 100KB total in references/
- **Target**: 300KB+ total
- **Gold Standard**: 500KB+ with issues, releases, examples

### Real-World Content

Prefer skills with:
- ✅ Real GitHub issues and solutions
- ✅ Release notes and breaking changes
- ✅ Community discussions
- ✅ Performance benchmarks
- ✅ Troubleshooting guides

### Links and Citations

Always include:
- ✅ Official documentation link
- ✅ GitHub repository link
- ✅ License information
- ✅ Version/release information

---

## 🧪 Testing

Before submitting, verify:

```bash
# 1. SKILL.md is well-formatted
cat your-skill/SKILL.md

# 2. All reference files exist
ls -R your-skill/references/

# 3. Documentation size is adequate (300KB+ target)
du -sh your-skill/references/

# 4.
#    Code blocks have language tags
grep -A 1 '```' your-skill/SKILL.md | head -20

# 5. No broken links (manual check)
# Open SKILL.md and verify all [links](urls) work

# 6. Marketplace entry added and valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json'))"
```

---

## 🎓 Examples of High-Quality Skills

**Gold Standard** (emulate this):

1. **06-post-training/grpo-rl-training/** (569 lines) ⭐⭐⭐⭐⭐
   - Complete implementation workflow
   - 10+ code examples with explanations
   - Troubleshooting guide
   - Common pitfalls and solutions
   - Performance tips
   - **This is the quality bar**

**Good Examples**:

2. **03-fine-tuning/axolotl/** (151 lines)
   - Real configuration examples
   - When to use guidance
   - Comprehensive but could add more workflows

3. **08-distributed-training/deepspeed/** (132 lines)
   - ZeRO optimization patterns
   - Configuration examples
   - Good foundation, needs more troubleshooting

---

## 🚫 What NOT to Contribute

- ❌ Proprietary/closed-source tools
- ❌ Deprecated libraries (unless historically important)
- ❌ Duplicate skills (check existing skills first)
- ❌ Incomplete skills (<50 lines SKILL.md, <100KB refs)
- ❌ Skills without code examples

---

## 🎖️ Recognition

All contributors will be:
- ✅ Listed in [CONTRIBUTORS.md](CONTRIBUTORS.md)
- ✅ Mentioned in release notes
- ✅ Featured on project homepage (when launched)
- ✅ Attributed in SKILL.md metadata

**Top contributors** (5+ skills) receive special recognition and maintainer status.

---

## 📞 Getting Help

- **Issues**: [GitHub Issues](https://github.com/YOUR_USERNAME/claude-ai-research-skills/issues)
- **Discussions**: [GitHub Discussions](https://github.com/YOUR_USERNAME/claude-ai-research-skills/discussions)
- **Questions**: Open a discussion with "Question:" prefix

---

## 📅 Review Process

1. **Automated Checks** (when implemented):
   - File structure validation
   - Code block language detection
   - Documentation size check
   - Marketplace.json validation

2.
   **Manual Review** (by maintainers):
   - Content quality and accuracy
   - Code example validity
   - Proper categorization
   - License compliance

3. **Feedback Loop**:
   - Reviews within 48-72 hours
   - Constructive feedback provided
   - Iterate until approved

4. **Merge**:
   - Merged to main branch
   - Added to release notes
   - Contributor recognized

---

## 🙏 Thank You!

Your contributions help the entire AI research community. Every skill added makes Claude Code more powerful for researchers, engineers, and students worldwide.

**Let's build something amazing together!** 🚀