---
name: experiment-design
description: Design experiment plans with progressive stages (initial implementation, baseline tuning, creative research, and ablation studies). Plan baselines, datasets, hyperparameter sweeps, and evaluation metrics. Use when planning experiments for a research paper.
argument-hint: [idea-or-plan]
---

# Experiment Design

Design structured, progressive experiment plans for research papers.

## Input

- `$0`: Research idea, plan, or method description

## References

- 4-stage progressive experiment prompts: `~/.claude/skills/experiment-design/references/stage-prompts.md`

## Scripts

### Generate experiment design

```bash
python ~/.claude/skills/experiment-design/scripts/design_experiments.py --plan research_plan.json --output experiment_design.json
python ~/.claude/skills/experiment-design/scripts/design_experiments.py --method "contrastive learning" --task classification --format markdown
```

Generates baselines, ablation matrix, hyperparameter grid, metric selection. Stdlib-only.

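
The internals of `design_experiments.py` are not shown in this document. As a rough illustration of what an ablation matrix can look like, the sketch below builds a leave-one-out matrix: one row for the full method, then one row per removed component. The function name `build_ablation_matrix`, the row layout, and the component names are hypothetical and are not taken from the script.

```python
def build_ablation_matrix(components):
    """Leave-one-out ablation matrix (illustrative, not the script's code).

    Returns a list of run configs: the full method first, then one config
    per component with that single component switched off.
    """
    rows = [{"name": "full", "enabled": {c: True for c in components}}]
    for removed in components:
        rows.append({
            "name": f"no_{removed}",
            # Every component stays on except the one being ablated.
            "enabled": {c: c != removed for c in components},
        })
    return rows


if __name__ == "__main__":
    # Hypothetical component names, for demonstration only.
    for row in build_ablation_matrix(["component_A", "component_B"]):
        print(row["name"], row["enabled"])
```

A leave-one-out layout keeps the number of runs linear in the number of components. A full on/off factorial would grow exponentially and is rarely needed for a first ablation pass.
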
## 4-Stage Progressive Framework (from AI-Scientist-v2)

### Stage 1: Initial Implementation

- Focus on getting a basic working implementation
- Use a simple dataset
- Aim for basic functional correctness
- Completion: at least one working (non-buggy) implementation

### Stage 2: Baseline Tuning

- Tune hyperparameters (learning rate, epochs, batch size)
- Do NOT change model architecture
- Test on at least TWO datasets
- Completion: stable training curves, improvement over Stage 1

### Stage 3: Creative Research

- Explore novel improvements and insights
- Be creative and think outside the box
- Test on at least THREE datasets
- Completion: demonstrated novel improvement

### Stage 4: Ablation Studies

- Systematic component analysis
- Each ablation tests a different aspect
- Use same datasets as Stage 3
- Completion: all planned ablations done

## Output Format

```json
{
  "stages": [
    {
      "name": "initial_implementation",
      "goals": ["Basic working baseline", "Simple dataset"],
      "max_iterations": 5,
      "completion_criteria": "Working implementation with non-zero accuracy"
    }
  ],
  "baselines": ["Method A", "Method B"],
  "datasets": ["Dataset1", "Dataset2", "Dataset3"],
  "metrics": ["accuracy", "F1", "inference_time"],
  "ablation_components": ["component_A", "component_B"],
  "hyperparameter_grid": {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128]
  },
  "num_seeds": 3
}
```

## Rules

- Always start simple (Stage 1) before complex experiments
- Each stage builds on the best result from the previous stage
- Multi-seed evaluation for statistical significance
- Document every experiment run in notes.txt
- Generate figures for training curves and comparisons

## Related Skills

- Upstream: [research-planning](../research-planning/), [idea-generation](../idea-generation/)
- Downstream: [experiment-code](../experiment-code/), [data-analysis](../data-analysis/)
- See also: [paper-assembly](../paper-assembly/)
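
As a sketch of how the `hyperparameter_grid` and `num_seeds` fields of the output format might be consumed downstream, the example below expands them into a flat list of individual runs (every grid combination, repeated once per seed). The helper name `expand_runs`, the run layout, and the choice of seeds `0..num_seeds-1` are assumptions made for illustration. They are not part of this skill's scripts.

```python
import itertools


def expand_runs(design):
    """Expand a design dict into one entry per (hyperparameter combo, seed).

    Illustrative only: assumes `hyperparameter_grid` maps names to lists of
    values and `num_seeds` is a positive integer, as in the output format.
    """
    grid = design.get("hyperparameter_grid", {})
    names = sorted(grid)  # fixed ordering so run order is reproducible
    num_seeds = design.get("num_seeds", 1)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        for seed in range(num_seeds):
            runs.append({"hparams": dict(zip(names, values)), "seed": seed})
    return runs


if __name__ == "__main__":
    design = {
        "hyperparameter_grid": {
            "lr": [1e-4, 1e-3, 1e-2],
            "batch_size": [32, 64, 128],
        },
        "num_seeds": 3,
    }
    runs = expand_runs(design)
    print(len(runs))  # 3 learning rates x 3 batch sizes x 3 seeds = 27
    print(runs[0])
```

Listing runs explicitly makes the cost of a sweep visible before any training starts, and gives each run a concrete record to log in `notes.txt`.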