--- name: implement-paper-from-scratch description: Guides you through implementing a research paper step-by-step from scratch. Use when asked to implement a paper, code up a paper, reproduce research results, or build a model from a paper. Focuses on building understanding through implementation with checkpoint questions. --- # Implement Paper From Scratch The best way to truly understand a paper is to implement it. This skill guides you through that process methodically. ## Philosophy - **No copy-pasting from reference implementations** - We build understanding, not just working code - **Checkpoint questions verify understanding** - You should be able to answer "why" at each step - **Minimal dependencies** - Use NumPy/PyTorch fundamentals, not high-level wrappers - **Deliberate debugging** - Bugs are learning opportunities, not obstacles ## Process ### Phase 1: Pre-Implementation Analysis Before writing any code: 1. **Identify the core algorithm** - Strip away ablations, extensions, bells and whistles. What's the minimal version? 2. **List the components** - Break into modules: - Data pipeline - Model architecture - Loss function(s) - Training loop - Evaluation metrics 3. **Find the tricky parts** - What's non-obvious? - Custom layers or operations - Numerical stability concerns - Hyperparameter sensitivity - Implementation details buried in appendices 4. **Gather reference numbers** - What should we expect? - Training loss trajectory - Validation metrics at convergence - Compute requirements (if stated) ### Phase 2: Scaffolded Implementation Build up the implementation in this order: #### Step 1: Data ```python # Start with synthetic/toy data # Verify shapes and types before touching real data ``` **Checkpoint:** Can you describe what each tensor represents and its expected shape? #### Step 2: Model Architecture ```python # Build layer by layer # Print shapes at each stage # Verify parameter counts match paper ``` **Checkpoint:** If you randomly initialize and do a forward pass, do the output shapes match what the paper describes? #### Step 3: Loss Function ```python # Implement exactly as described # Test with known inputs/outputs # Check gradient flow ``` **Checkpoint:** Can you explain each term in the loss and why it's there? #### Step 4: Training Loop ```python # Minimal loop first (no logging, checkpointing, etc.) # Verify loss decreases on tiny overfit test # Then add bells and whistles ``` **Checkpoint:** Can you overfit a single batch? If not, something is broken. #### Step 5: Evaluation ```python # Implement paper's exact metrics # Compare against reported numbers ``` **Checkpoint:** On the same data split, how close are you to paper's numbers? ### Phase 3: The Debugging Gauntlet When it doesn't work (and it won't at first): 1. **The Overfit Test** - Can you memorize 1 example? 10? 100? - If not, architecture or gradient bug 2. **The Gradient Check** - Are gradients flowing to all parameters? - Any NaN or exploding gradients? 3. **The Initialization Check** - Match paper's initialization exactly - This matters more than people think 4. **The Learning Rate Sweep** - Log scale: 1e-5 to 1e-1 - Loss should decrease for some range 5. **The Ablation Debug** - Remove components until it works - Add back one at a time ### Phase 4: Checkpoint Questions At each stage, you should be able to answer: **Understanding:** - Why does this component exist? - What would happen without it? - What alternatives were considered? **Implementation:** - Why this specific implementation choice? - Where could numerical issues arise? - What's the computational complexity? **Debugging:** - What would it look like if this was broken? - How would you test this in isolation? - What are the most likely bugs? ## Output Format For each implementation session, provide: ```markdown ## Today's Implementation Goal [Specific component we're building] ## Prerequisites Check - [ ] Previous components working - [ ] Understand what we're building - [ ] Know expected behavior ## Implementation ### Code [Code blocks with extensive comments] ### Checkpoint Questions 1. [Question]
Answer[Answer]
2. [Question]
Answer[Answer]
### Verification Steps - [ ] Test 1: [What to check] - [ ] Test 2: [What to check] ### Common Bugs at This Stage 1. [Bug pattern]: [How to identify and fix] ## What's Next [Preview of next component and how it connects] ``` ## Tips for Specific Paper Types ### Transformer-based - Attention mask shapes are the #1 bug source - Verify positional encoding is applied correctly - Check layer norm placement (pre vs post) ### RL/Policy Gradient - Sign errors in policy gradient are silent killers - Advantage normalization matters - Verify discount factor handling ### Generative Models - KL term balancing is finicky - Check latent space distribution - Verify reconstruction looks reasonable before training ### Computer Vision - Normalization (ImageNet stats, batch norm) is crucial - Data augmentation can make or break results - Verify input preprocessing matches paper exactly ## Success Criteria You're done when: 1. **Numbers match** - Within reasonable variance of paper's results 2. **Understanding is deep** - You can explain every line of code 3. **You found the gotchas** - You know what breaks and why 4. **You could modify it** - Confident to try your own variations ## Anti-Patterns to Avoid - ❌ Copying code you don't understand - ❌ Skipping checkpoint questions - ❌ Using pre-built components for core algorithm - ❌ Ignoring discrepancies with paper - ❌ Moving on before current step works