# Experiment Management Skill

## Purpose
Set up and manage the experiment folder structure. This is Phase 0 — it runs before any analysis begins. All bookkeeping files are JSON (never markdown).

## When to Use
- At the very start of a new experiment
- When updating `experiments.json` or `comparison.json` after an experiment completes

## Input
| Parameter | Type | Description |
|-----------|------|-------------|
| research_name | string | The experiment name **as given by the caller**. Use it verbatim — do not rename it, do not re-derive it from the source title. |
| research_source | ref | A free-form reference describing the new method. The caller can pass anything that identifies the content — a local file path, any URL (blog post, arXiv, docs, Hugging Face page, …), a git repository, a Kaggle link, a paper ID, or **a plain text idea** describing the approach to try. Do not reject unusual values; investigate and fetch whatever was given, or, for pure text ideas, save the text verbatim. |
| current_notebook | path | Path to the current baseline .ipynb |
| current_data | ref \| placeholder | Path to the current dataset (file or directory), a short description of how the notebook loads data, **or** the literal placeholder `"(not provided — infer it from the current notebook's data-loading cells)"`. When you see that placeholder, read the current notebook yourself and infer the source from its data-loading cells; do not ask the user. |
| experiments_directory | path | The directory (inside the workspace) where experiment folders live (e.g. `./experiments`). |

## Actions

### Setup (start of experiment)

1. **Create experiment directory:**
   ```
   experiments/{research_name}/
   ```

2. **Copy baseline files (NEVER move, NEVER modify originals):**
   ```bash
   cp {current_notebook} experiments/{research_name}/current.ipynb
   # Only when current_data is a real path on disk:
   cp -r {current_data}  experiments/{research_name}/current_data/
   ```

   Resolve `{current_data}` as follows before copying:

   - **Real local path** (file or directory that exists on disk) → `cp` / `cp -r` it into `current_data/`.
   - **Short description of a code-based download** (e.g. `"downloaded in notebook (ucimlrepo, id=2)"`) → leave `current_data/` empty and record the description verbatim in `log.json.metadata.original_data`.
   - **Placeholder `"(not provided — infer it from the current notebook's data-loading cells)"`** → open `current.ipynb` yourself, scan for data-loading cells (`pd.read_csv`, `fetch_openml`, `fetch_ucirepo`, `load_dataset`, `kaggle.api...`, `urllib`/`requests` downloads, `np.load`, local paths, …), write down the exact loader you found as `log.json.metadata.original_data`, and make sure Phase 4's `new.ipynb` uses the same loader. Do not ask the user for clarification — do the investigation yourself.

3. **Materialize the research source.** `{research_source}` can be anything — a local file, a URL of any kind, a git or Kaggle link, an arXiv / paper ID, a Hugging Face page, **or a plain text idea** describing the method to try. Your job is to bring its content into the experiment folder using whatever tool fits:

   - **Investigate first.** Check if the value is a path on disk (`ls` / `test -e`); if it looks like a URL, poke it with `curl -I`; look at the hostname; read any hint in the value itself. If the value does not look like a path or URL at all, treat it as a **text idea** (see below). Do not rely on a fixed detection list.
   - **Retrievable source** → fetch it with the most appropriate tool: `cp`, `git clone --depth 1`, `curl -L` / `wget`, `kaggle kernels pull …` / `kaggle datasets download …`, `huggingface-cli`, an arXiv PDF helper, Python downloaders, or anything else available. Install a missing CLI with `pip install` / `uv pip install` if it is the right tool for this source.
   - **Text idea** → do not fabricate a paper or URL. Save the description verbatim to `experiments/{research_name}/research_source.md` (optionally with a leading `# Idea` heading) and let Phase 2 turn it into a concrete implementation plan.
   - **Pick a sensible local name** based on what you actually produced:
     - A single PDF → `experiments/{research_name}/research.pdf`
     - Any other single file (including a text idea) → `experiments/{research_name}/research_source.{ext}` (`.md` for ideas; preserve the real extension for files you copied)
     - Multiple files or a cloned repo / dataset → `experiments/{research_name}/research_source/` (a directory)
     - A fetched HTML page → `research_source.html`, optionally with a cleaned `research_source.md` and/or a linked `research.pdf`
   - **Choose your own `research_source_kind` label** to describe what you did (e.g. `pdf`, `file`, `git`, `kaggle_notebook`, `kaggle_dataset`, `arxiv`, `huggingface_model`, `html`, `idea`, `other`). This label is just for observability — there is no closed enum.
   - **If fetching fails**, try an alternative (different CLI, raw HTML fallback, `curl` instead of a specialized tool). Only after genuine failure, log the attempts in `log.json`, mark the experiment `FAILED`, and explain what you tried in `result.json.explanation`. A text idea can never "fail to fetch" — it is always saved verbatim.

   Let `research_source_local` be whichever local path you produced. Use that path for Phase 2 onwards — never re-fetch.

4. **Create `log.json`** with the starting skeleton:
   ```json
   {
     "name": "{research_name}",
     "metadata": {
       "date": "YYYY-MM-DD",
       "original_notebook":      "{current_notebook}",
       "original_data":          "{current_data}",
       "research_source":        "{research_source_local}",
       "research_source_origin": "{research_source}",
       "research_source_kind":   "a short label you pick, e.g. pdf, file, git, kaggle_notebook, kaggle_dataset, arxiv, huggingface_model, html, idea, other"
     },
     "phases": []
   }
   ```
   Phases append entries here as they finish; never overwrite earlier entries.

4. **Register in `experiments/experiments.json`:**
   - If the file does not exist, create it with `{"experiments": []}`.
   - Append a new entry with `status: "in_progress"`:
     ```json
     {
       "name": "{research_name}",
       "date": "YYYY-MM-DD",
       "status": "in_progress",
       "paper": "{paper_title}",
       "baseline_model": null,
       "new_method": null,
       "verdict": null,
       "key_metric": null,
       "path": "experiments/{research_name}/"
     }
     ```

5. **Create initial `progress.json`** (see `skills/progress/SKILL.md` for schema) with:
   - `status: "RUNNING"`
   - all phases listed, all `pending`, Phase 0 marked `current`
   - `started_at` and `updated_at` set to now (UTC ISO-8601)

### Finalize (end of experiment)

1. Update the experiment entry in `experiments.json` with final `status`, `verdict`, and `key_metric` (see `skills/evaluate/SKILL.md`).
2. Append a row to `experiments/comparison.json`.
   - If the file does not exist, create it with `{"experiments": []}`.

## Output
- `experiments/{research_name}/` directory with copied files
- `experiments/{research_name}/log.json` initialized
- `experiments/{research_name}/progress.json` initialized
- `experiments/experiments.json` updated