# Storage Abstraction Design

The storage layer is designed around traits that allow for easy swapping of implementations.

## Core Storage Traits

The storage trait covers:

- Podcast CRUD operations
- Episode CRUD and per-podcast episode listing
- Playlist CRUD/list/existence operations
- Statistics persistence
- Backup/restore workflows

## JSON Implementation

The JSON implementation stores data in organized files:

```
data/
├── podcasts/
│   ├── podcast_1.json
│   ├── podcast_2.json
│   └── ...
├── episodes/
│   ├── podcast_1/
│   │   ├── episode_1.json
│   │   ├── episode_2.json
│   │   └── ...
│   └── ...
├── playlists/
│   ├── My Playlist/
│   │   ├── playlist.json
│   │   └── audio/
│   │       ├── 001-episode.mp3
│   │       └── ...
│   └── ...
├── stats.json
└── config.json
```

This design allows for:

- Easy manual editing of data files
- Simple backup (copy directory)
- Future implementations (SQLite, remote storage, etc.)
- Clean separation of concerns

## Cache Performance (issue #206)

The in-memory cache (#204) and persistent `cache_index.json` (#205) were benchmarked against a synthetic fixture of 30 podcasts × 200 episodes (6,000 total episodes). Results from `cargo run --release --example bench_storage_load`:

| config              | initialize | first traversal | subsequent traversal |
|---------------------|-----------:|----------------:|---------------------:|
| no cache            |       0 ms |          481 ms |               489 ms |
| cache (cold build)  |     456 ms |            0 ms |                 0 ms |
| cache (warm)        |      13 ms |            0 ms |                 0 ms |

**Warm-cache speedup vs no-cache traversal: ~946× — well above the 10× target set by epic #202.**

### Reading the table

- **no cache** uses `JsonStorage::with_cache(false)`. Every `list_podcasts` / `load_episodes` call hits disk. This is the pre-#204 baseline.
- **cache (cold build)** is the first launch after install: no `cache_index.json` yet on disk. `initialize()` builds the in-memory snapshot from the per-podcast / per-episode JSON files, so the build cost shows up here (456 ms in the table).
  All subsequent reads are in-memory hits (0 ms).
- **cache (warm)** is every subsequent launch: `initialize()` deserialises `cache_index.json` in one shot (~13 ms for 6,000 episodes); reads are pure in-memory hits.

### Tuning notes

- **Flush interval**: the background flush task ticks every `CACHE_FLUSH_INTERVAL` (5 s). The 13 ms warm-init time on a 6,000-episode fixture leaves no pressure to lower it; the dirty-bit gate already skips the write when nothing changed.
- **Snapshot shape**: a single `cache_index.json` is plenty. Splitting per podcast would only matter if warm init crossed the ~200 ms threshold, which the benchmark shows is more than an order of magnitude away.
- **Compression**: not worth it. The dataset is small and the I/O time is already negligible compared to the JSON parse cost.

### Reproducing

```bash
cargo run --release --example bench_storage_load

# Custom fixture size:
BENCH_PODCASTS=10 BENCH_EPISODES=50 cargo run --release --example bench_storage_load
```

The benchmark seeds a fresh `TempDir` each run, so it is safe to execute without touching real user data.
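### Appendix: trait sketch

For reference, the trait surface described under Core Storage Traits might be sketched roughly as follows. All names, signatures, and types here are illustrative assumptions, not the crate's actual API; the point is only that a trivial in-memory backend and the JSON backend (or a future SQLite one) can satisfy the same trait.

```rust
// Illustrative sketch only: trait, type, and method names are
// assumptions, not the real API.
use std::collections::HashMap;
use std::error::Error;

type StorageResult<T> = Result<T, Box<dyn Error>>;

// Hypothetical domain type standing in for the real model.
#[derive(Clone)]
pub struct Podcast {
    pub id: u64,
    pub title: String,
}

// Podcast CRUD slice of the storage trait; episode, playlist,
// stats, and backup operations would be shaped the same way.
pub trait Storage {
    fn save_podcast(&mut self, podcast: &Podcast) -> StorageResult<()>;
    fn load_podcast(&self, id: u64) -> StorageResult<Option<Podcast>>;
    fn delete_podcast(&mut self, id: u64) -> StorageResult<()>;
    fn list_podcasts(&self) -> StorageResult<Vec<Podcast>>;
}

// A trivial in-memory implementation; a JSON- or SQLite-backed
// backend would implement the same trait and swap in transparently.
#[derive(Default)]
pub struct MemoryStorage {
    podcasts: HashMap<u64, Podcast>,
}

impl Storage for MemoryStorage {
    fn save_podcast(&mut self, podcast: &Podcast) -> StorageResult<()> {
        self.podcasts.insert(podcast.id, podcast.clone());
        Ok(())
    }

    fn load_podcast(&self, id: u64) -> StorageResult<Option<Podcast>> {
        Ok(self.podcasts.get(&id).cloned())
    }

    fn delete_podcast(&mut self, id: u64) -> StorageResult<()> {
        self.podcasts.remove(&id);
        Ok(())
    }

    fn list_podcasts(&self) -> StorageResult<Vec<Podcast>> {
        Ok(self.podcasts.values().cloned().collect())
    }
}
```

Callers depend only on `dyn Storage` (or a generic bound), which is what makes the implementation swap cheap.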
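### Appendix: dirty-bit flush gate

The flush-interval behaviour from the tuning notes can also be sketched. `CACHE_FLUSH_INTERVAL` is named in the source; everything else below (the `CacheIndex` type, method names, the atomic dirty bit) is a hypothetical illustration of the pattern, not the real implementation, and the real task presumably runs on the app's async runtime rather than a bare loop.

```rust
// Sketch of a dirty-bit gate: the periodic flush writes the snapshot
// only when something actually changed. Names are assumptions.
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

// Interval named in the tuning notes (5 s).
pub const CACHE_FLUSH_INTERVAL: Duration = Duration::from_secs(5);

pub struct CacheIndex {
    dirty: AtomicBool,
}

impl CacheIndex {
    pub fn new() -> Self {
        Self { dirty: AtomicBool::new(false) }
    }

    // Any mutation to the in-memory snapshot marks it stale.
    pub fn mark_dirty(&self) {
        self.dirty.store(true, Ordering::Release);
    }

    // Called once per CACHE_FLUSH_INTERVAL tick. Clears the bit and
    // returns whether a write happened; when nothing changed since the
    // last tick, the disk write is skipped entirely.
    pub fn flush_if_dirty(&self) -> bool {
        if self.dirty.swap(false, Ordering::AcqRel) {
            // ... serialize the in-memory snapshot to cache_index.json here ...
            true
        } else {
            false
        }
    }
}
```

An idle app therefore pays nothing per tick beyond one atomic swap, which is why lowering the interval below 5 s buys little.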