# Implement Skill ## Purpose Create a new Jupyter notebook implementing the method from the research paper, using the same data as the baseline. Record implementation details and measured metrics as a structured JSON entry. ## When to Use Phase 4 — after benchmark metrics are defined and baseline values are extracted. ## Input | Parameter | Type | Description | |-----------|------|-------------| | experiment_path | path | `experiments/{research_name}/` | ## Actions 1. **Install dependencies:** - Install any new packages identified in Phase 2. - Capture installed package names and versions for the log entry below. 2. **Write `{experiment_path}/new_requirements.txt`:** - List all packages the new notebook needs (one per line, `package==version`). - Include both existing dependencies and new ones from the paper. 3. **Create `{experiment_path}/new.ipynb`** with this structure: ``` [Markdown] # {Research Name} - New Method Implementation [Markdown] ## 1. Setup & Imports [Code] import statements + dependency checks [Markdown] ## 2. Data Loading [Code] load from experiments/{research_name}/current_data/ (use the SAME data loading logic as current.ipynb) [Markdown] ## 3. Data Preprocessing [Code] preprocessing as required by the new method (note any differences from baseline preprocessing) [Markdown] ## 4. Model Implementation [Code] implement the new method from the paper [Markdown] ## 5. Training [Code] train the model (use same train/test split as baseline for fair comparison) [Markdown] ## 6. Evaluation [Code] compute ALL comparison metrics defined in Phase 3 [Markdown] ## 7. Results Summary [Code] print all metrics in a structured format ``` 4. **Implementation rules:** - Use the SAME train/test split (same random seed, same ratio) as the baseline. - Use the SAME data — load from `current_data/`, do not download new data. - Compute ALL metrics defined in Phase 3 (including any with `"needs_computation": true`). - Add timing measurements for training (`training_time_seconds`). - Handle errors gracefully — if the method fails, log why. - **Efficiency:** if data is large (100K+ rows), sample it to a manageable size (10K–30K rows). Both notebooks must use the exact same sample. Use paper's recommended hyperparameters — do not run exhaustive grid searches. If training takes more than 10 minutes, reduce data size or simplify config. The goal is a fair comparison, not a production model. 5. **Run the notebook** end-to-end and verify it executes without errors. 6. **Append a Phase 4 entry to `{experiment_path}/log.json`** under `phases`: ```json { "name": "Phase 4: Implement", "completed_at": "2026-04-17T11:30:00Z", "new_dependencies_installed": [ {"name": "catboost", "version": "1.2.5"} ], "training": { "split": 0.2, "seed": 42, "stratified": true }, "metrics": { "accuracy": 0.8721, "f1": 0.7310, "roc_auc": 0.9288, "training_time_seconds": 45.2 }, "notebook_executed": true, "errors": [], "warnings": [] } ``` Do not overwrite earlier entries; append to the `phases` array. ## Output - `{experiment_path}/new.ipynb` — complete, executed notebook - `{experiment_path}/new_requirements.txt` — written - `{experiment_path}/log.json` — updated with Phase 4 implementation entry