---
name: tinkering
description: >
  Safe experimentation framework for AI agents. Creates isolated sandbox
  environments for trying new features, testing approaches, and exploring
  solutions without polluting the main codebase.

  USE WHEN: Agent needs to try something uncertain, explore multiple approaches,
  test a new library, prototype a feature, or run a technical spike before
  committing to implementation.

  PRIMARY TRIGGERS:
  "experiment with" = Setup sandbox + run experiment
  "try this approach" = Quick experiment in sandbox
  "spike" / "POC" / "prototype" = Time-boxed technical investigation
  "tinker" / "tinkering mode" = Enter experimentation workflow
  "explore options" = Multi-approach comparison in sandbox

  NOT FOR: Debugging (use debugger), testing (use test runner),
  or committed feature work (use git branches).

  DIFFERENTIATOR: Unlike git branches (for committed direction), tinkering is
  for "I don't know if this will work" exploration. Try 5 things in sandbox
  before committing to a branch. Faster feedback, zero codebase pollution.
---

# Tinkering

## Overview

Structured experimentation framework. When uncertain about an approach, don't
hack at production code - create an isolated sandbox, try freely, then graduate
successful experiments or discard failed ones cleanly.

**Core principle:** The output of tinkering is **knowledge**, not production code.
A successful experiment teaches you how to solve the problem. The actual
implementation happens after, informed by what you learned.

## When to Use

| Situation | Tinkering? | Why |
|-----------|-----------|-----|
| "Will this library work for our use case?" | Yes | Unknown outcome, need to explore |
| "Which of these 3 approaches is fastest?" | Yes | Comparing multiple options |
| "How do I integrate this API?" | Yes | Technical spike, learning-focused |
| "Add a login button to the header" | No | Clear requirement, use git branch |
| "Fix the null pointer on line 42" | No | Debugging, not experimenting |
| "Refactor auth module to use JWT" | Maybe | If approach uncertain, spike first |

---

## Workflow

### Phase 1: Setup Sandbox

Create isolated experiment environment:

```bash
# 1. Create experiment directory
mkdir -p _experiments/{experiment-name}

# 2. Add to .gitignore (if not already present)
grep -qxF '_experiments/' .gitignore 2>/dev/null || echo '_experiments/' >> .gitignore

# 3. Create manifest (first time only)
# See MANIFEST.md template below
```

**MANIFEST.md template** (create at `_experiments/MANIFEST.md`):
```markdown
# Experiment Log

## Active

### {experiment-name}
- **Date**: YYYY-MM-DD
- **Hypothesis**: What we're trying to learn
- **Status**: active
- **Result**: (pending)

## Completed
<!-- Move finished experiments here -->
```

**Rules:**
- NEVER modify production files during tinkering
- ALL experiment code goes inside `_experiments/{name}/`
- Copy source files into sandbox if you need to modify them

---

### Phase 2: Hypothesize

Before writing any code, state clearly:

```
Question : What specific question are we answering?
Success  : How will we know it works?
Time box : Maximum time to spend (default: 30 min)
Scope    : Which files/areas are involved?
```

Write this in `_experiments/{name}/HYPOTHESIS.md` or as a top comment.

**Example:**
```
Question : Can we replace moment.js with date-fns and reduce bundle size?
Success  : Bundle decreases >20%, all date formatting still works
Time box : 20 minutes
Scope    : src/utils/date.ts, package.json
```

---

### Phase 3: Experiment

Build freely in the sandbox.

**Modifying existing code:**
```bash
# Copy the file(s) you need to change
cp src/utils/date.ts _experiments/date-fns-migration/date.ts
# Edit the copy freely - zero risk to production
```

**New feature exploration:**
```bash
# Create new files directly in sandbox
touch _experiments/websocket-poc/server.ts
touch _experiments/websocket-poc/client.ts
```

**Library evaluation:**
```bash
# Minimal test script in sandbox
touch _experiments/redis-eval/test_redis.py
# Use isolated dependencies (venv, local node_modules)
```

**Multi-approach comparison:**
```
_experiments/caching-spike/
  approach-a-redis/
  approach-b-memory/
  approach-c-sqlite/
  COMPARISON.md       # Side-by-side evaluation
```

**Rules during experimentation:**
- Stay in sandbox - never touch production files
- Quick and dirty is fine - this is throwaway code
- Document learnings as you go
- Stop at time box, even if incomplete - partial answers are still answers

---

### Phase 4: Evaluate

Assess results against the hypothesis.

**Checklist:**
- Did the experiment answer the original question?
- Does it meet the success criteria from Phase 2?
- Any unexpected side effects or constraints discovered?
- Is the approach feasible for production implementation?
- What's the estimated effort to implement properly?

**Update MANIFEST.md:**
```markdown
- **Result**: SUCCESS - date-fns reduced bundle by 34%, all tests pass
- **Status**: graduated
- **Notes**: Need to handle timezone edge case in formatRelative()
```

**Decision:**
- Positive result -> Phase 5, Path A (Graduate)
- Negative result -> Phase 5, Path B (Discard)
- Inconclusive -> Extend time box OR try different approach

---

### Phase 5: Graduate or Discard

#### Path A: Graduate (success)

**Load reference:** `references/graduation-checklist.md`

Quick summary:
1. Do NOT copy-paste experiment code directly into production
2. Re-implement properly using what you learned
3. Write proper tests for the production implementation
4. Apply code standards (experiment was quick & dirty, production shouldn't be)
5. Reference experiment in commit message for context

#### Path B: Discard (failed)

Failed experiments are valuable - they tell you what NOT to do.

1. Update MANIFEST.md with failure reason and learnings
2. Delete experiment files: `rm -rf _experiments/{name}/`
3. Or keep briefly if learnings are worth referencing

---

### Phase 6: Cleanup

```bash
# Remove completed experiment
rm -rf _experiments/{experiment-name}/

# Update MANIFEST.md - move entry to "Completed" section
```

**MANIFEST.md after cleanup:**
```markdown
## Completed

### date-fns-migration (2025-01-15)
- GRADUATED - Implemented in commit abc123
- Learnings: date-fns 3x smaller, timezone handling needs explicit config

### graphql-evaluation (2025-01-10)
- DISCARDED - Too much overhead for our simple REST API
- Learnings: REST + OpenAPI better fit for <20 endpoints
```

---

## Quick Reference

```
Setup      ->  mkdir _experiments/{name}, add to .gitignore
Hypothesize ->  Question + success criteria + time box
Experiment  ->  Build in sandbox (never touch production)
Evaluate    ->  Check against success criteria
Graduate    ->  Re-implement properly in production
Cleanup     ->  Remove files, update manifest
```

## Edge Cases

**Needs database changes:** Use separate test DB or schema prefix. Document in hypothesis.

**Needs running server:** Run from sandbox, use different port to avoid conflicts.

**Multiple concurrent experiments:** Each gets own subdirectory. MANIFEST tracks all.

**Experiment grows into real feature:** Graduate it. Don't let experiments become shadow production code.

**Team member needs to see experiment:** Push to feature branch (temporarily track `_experiments/`) or share via patch.