---
name: experiment-cycle
description: Run or design an experiment with fixed/varying variables, metrics, and recorded results. Use for ablations, evals, or hyperparameter runs.
---

# Experiment cycle

## When to use

- User wants to run an experiment (ablation, eval, hyperparameter, comparison).
- Task involves "vary X, measure Y, record Z".

## Input/Output

- **Input**: What varies, what is fixed, metrics, where to record (path or format).
- **Output**: Command(s) run, where results were written, facts (numbers), inference (optional), recommendations.

## Steps (SOP)

1. **Define**: Fixed variables, varying variables, metrics, acceptance (e.g. run completes, metric threshold).
2. **Run**: Execute with project scripts if any (e.g. `run_ablation.py`). Use stated config.
3. **Record**: Write results to agreed path (file, table). Do not leave only in chat.
4. **Conclude**: Facts (observed values) | Inference ("likely because...") | Recommendations ("next: ...").

## Acceptance criteria

- Results written to a file or artifact. Facts separated from inference. No "improved" without evidence or baseline.

## Common failures

- **No record**: Always write results somewhere reproducible.
- **Mixing fact and inference**: Label clearly. Do not state "better" without numbers or baseline.
- **Under-spec**: If config is vague, state assumptions (e.g. seed, single run) and note replication needs.

## When not to use

- One-off script with no metrics (use plan-then-implement).
- Bug fix (use debug-regression).
- Feature implementation (use implement-feature-with-gates).
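
## Example (sketch)

The Record and Conclude steps can be illustrated with a small helper that writes one experiment record to a JSON file, keeping observed facts apart from inference and recommendations. This is a minimal sketch, not part of the skill itself: the function name `record_experiment`, its parameters, and the JSON layout are all hypothetical, and a project may prefer its own format (CSV, a results table, etc.).

```python
import json
from pathlib import Path


def record_experiment(path, fixed, varying, metrics, baseline=None,
                      inference=None, recommendations=None):
    """Write one experiment record to `path` as JSON and return the record.

    Hypothetical helper: the field names below are illustrative assumptions.
    """
    facts = {"metrics": dict(metrics), "baseline": baseline}
    if baseline is not None:
        # Report deltas only for metrics the baseline also measured, so a
        # claim like "improved" is always backed by a baseline number.
        facts["delta_vs_baseline"] = {
            name: value - baseline[name]
            for name, value in metrics.items()
            if name in baseline
        }

    record = {
        "fixed": fixed,              # variables held constant (e.g. seed)
        "varying": varying,          # variables changed in this run
        "facts": facts,              # observed values only
        "inference": inference,      # optional "likely because ..." text
        "recommendations": recommendations or [],
    }

    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create results dir if missing
    out.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record
```

Calling it once per run with the stated config (for example `fixed={"seed": 0}` and `varying={"lr": 0.1}`) leaves a file on disk rather than results that live only in chat. Because `inference` is a separate field from `facts`, a reader can tell which statements are measured and which are interpretation.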