# Architecture

## System Overview

shelfctl manages PDF/EPUB libraries using GitHub as a storage backend. Files are stored as GitHub Release assets (not in git), metadata is tracked in `catalog.yml` (in git), and books are cached locally for reading and annotation sync.

```mermaid
graph TD
    subgraph shelfctl
        CLI["CLI Commands"]
        TUI["Unified TUI<br/>single program"]
        CLI <--> TUI
        OPS["operations"]
        UNI["tui / unified"]
        CLI --> OPS
        TUI --> UNI
        CAT["catalog"]
        CFG["config"]
        CACHE["cache"]
        OPS --> CAT
        OPS --> CFG
        OPS --> CACHE
        UNI --> CAT
        UNI --> CFG
        UNI --> CACHE
        GH["github API client"]
        LC["local cache manager"]
        CAT --> GH
        CACHE --> LC
        OPS --> GH
    end
    GHR["GitHub Releases<br/>file storage"]
    LFS["~/.local/share/<br/>shelfctl/cache/"]
    GH --> GHR
    LC --> LFS
```

### TUI Views

```mermaid
graph LR
    HUB["Hub<br/>main menu + Ctrl+P palette"]
    HUB --> BROWSE["Browse<br/>book list + details panel"]
    HUB --> SHELVE["Shelve<br/>file picker → metadata → upload"]
    HUB --> EDIT["Edit<br/>metadata editor / carousel"]
    HUB --> MOVE["Move<br/>destination selector"]
    HUB --> DELETE["Delete<br/>confirmation dialog"]
    HUB --> CCLEAR["Cache Clear<br/>multi-select picker"]
    HUB --> CREATE["Create Shelf<br/>creation form"]
    BROWSE --> EDIT
    BROWSE --> DELETE
    BROWSE --> MOVE
```

### Sync Flow

```mermaid
sequenceDiagram
    participant User
    participant Cache as Local Cache
    participant CLI as shelfctl sync
    participant GH as GitHub
    User->>Cache: Open book, add annotations
    User->>CLI: shelfctl sync
    CLI->>Cache: Compute SHA256
    CLI->>CLI: Compare against catalog checksum
    alt Modified
        CLI->>GH: Delete old Release asset
        CLI->>GH: Upload modified file
        CLI->>GH: Update catalog.yml + commit
        CLI->>Cache: Update local checksum
    end
```

### Shelve Flow

```mermaid
sequenceDiagram
    participant User
    participant CLI as shelfctl shelve
    participant Ingest as ingest
    participant GH as GitHub
    participant Cache as Local Cache
    User->>CLI: shelve book.pdf --shelf prog
    CLI->>Ingest: Resolve source (file/URL/GitHub)
    CLI->>Ingest: Extract PDF metadata
    CLI->>GH: Upload as Release asset
    CLI->>GH: Append to catalog.yml + commit
    opt --cache flag
        CLI->>Cache: Store local copy
    end
```

## Storage Model

### Why Release Assets

GitHub Release assets store the actual book files. This avoids:

- Git history bloat (only `catalog.yml` is versioned)
- GitHub's 100 MB per-file limit for git-tracked files
- Git LFS costs
- Full-repo clones for single-file access

Each book is a single Release asset downloadable via GitHub CDN.

### Data Layout

```
GitHub repo (shelf-programming):
├── catalog.yml              # Metadata (git-tracked)
├── README.md                # Auto-generated inventory (git-tracked)
├── covers/                  # Optional curated cover images (git-tracked)
└── releases/
    └── library/             # Release tag
        ├── sicp.pdf         # Release asset (not in git)
        ├── gopl.pdf         # Release asset (not in git)
        └── ...

Local cache (~/.local/share/shelfctl/cache/):
└── shelf-programming/
    ├── sicp.pdf             # Downloaded book
    ├── gopl.pdf
    └── .covers/
        ├── sicp.jpg         # Auto-extracted thumbnail
        └── sicp-catalog.jpg # Downloaded catalog cover
```

### Catalog Schema

```yaml
- id: sicp
  title: "Structure and Interpretation of Computer Programs"
  author: "Abelson & Sussman"
  year: 1996
  tags: ["lisp", "cs", "textbook"]
  format: "pdf"
  checksum:
    sha256: "a1b2c3d4..."
  size_bytes: 6498234
  cover: "covers/sicp.jpg"  # Optional, git-tracked
  source:
    type: "github_release"
    owner: "your-username"
    repo: "shelf-programming"
    release: "library"
    asset: "sicp.pdf"
  meta:
    added_at: "2024-01-15T10:30:00Z"
```

- Required: `id`, `title`, `format`, `source.*`
- Recommended: `checksum`, `author`, `tags`, `year`, `size_bytes`
- Optional: `cover`, `meta.*`

### Configuration

```yaml
# ~/.config/shelfctl/config.yml
github:
  owner: "your-username"
  token_env: "SHELFCTL_GITHUB_TOKEN"  # Reads from env var, never stores token
  api_base: "https://api.github.com"

defaults:
  release: "library"
  cache_dir: "~/.local/share/shelfctl/cache"
  asset_naming: "id"  # "id" or "original"

shelves:
  - name: "programming"        # Short name for CLI
    repo: "shelf-programming"  # GitHub repo
    owner: "other-user"        # Optional: override default owner
```

Each shelf can override the default owner for multi-user/org setups.

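
The owner override can be illustrated with a minimal Go sketch. The `Config` and `Shelf` types, the `EffectiveOwner` method, and the `history` shelf are hypothetical names chosen for this example; they are not taken from shelfctl's `internal/config` package, which may model this differently.

```go
package main

import "fmt"

// Shelf mirrors one entry under `shelves:` (hypothetical type).
type Shelf struct {
	Name  string
	Repo  string
	Owner string // empty means "fall back to github.owner"
}

// Config holds only the fields this sketch needs (hypothetical type).
type Config struct {
	DefaultOwner string // corresponds to github.owner
	Shelves      []Shelf
}

// EffectiveOwner returns the shelf's own owner when set, otherwise the
// default owner. It returns an error for an unknown shelf name.
func (c Config) EffectiveOwner(shelfName string) (string, error) {
	for _, s := range c.Shelves {
		if s.Name != shelfName {
			continue
		}
		if s.Owner != "" {
			return s.Owner, nil
		}
		return c.DefaultOwner, nil
	}
	return "", fmt.Errorf("shelf %q is not configured", shelfName)
}

func main() {
	cfg := Config{
		DefaultOwner: "your-username",
		Shelves: []Shelf{
			{Name: "programming", Repo: "shelf-programming", Owner: "other-user"},
			{Name: "history", Repo: "shelf-history"}, // no override
		},
	}
	for _, name := range []string{"programming", "history"} {
		owner, err := cfg.EffectiveOwner(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", name, owner)
	}
}
```

Resolving the owner in one place keeps the fallback rule out of every call site that builds a GitHub API URL.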
## Package Structure

```
internal/
├── app/          # CLI commands (cobra) and TUI launcher
├── catalog/      # Book metadata model, YAML loading, search
├── config/       # Config loading and validation
├── github/       # GitHub REST API client
├── ingest/       # PDF metadata extraction, file source resolution
├── cache/        # Local file storage, cover art, HTML index generation
├── migrate/      # Migration scanning, ledger tracking
├── operations/   # Shelf creation, README management
├── tui/          # All TUI view components
├── unified/      # TUI orchestrator, hub, view routing
└── util/         # TTY detection, formatting helpers
```

## GitHub API Client

`internal/github/` implements a focused REST client:

- **Token handling**: Bearer token from env var, stripped on S3 redirects
- **Assets**: List, find, download (streamed with progress), upload (multipart)
- **Contents**: Read/write `catalog.yml` with commit messages
- **Releases**: Get by tag, create if missing
- **Repos**: Get info, create (public/private)

Upload timeout is 5 minutes for large files. Downloads stream directly to cache.

## Commands

| Command | Description |
|---------|-------------|
| `init` | Create shelf repo, release, and config entry |
| `shelve` | Add book from local file, URL, or GitHub path |
| `open` | Download (if needed) and open with system viewer |
| `browse` | Interactive TUI browser or text listing |
| `search` | Full-text search across title, author, tags |
| `edit-book` | Update metadata (title, author, year, tags) |
| `delete-book` | Remove book, asset, and cache entry |
| `move` | Move books between releases or shelves |
| `sync` | Upload locally-modified books back to GitHub |
| `status` | Show sync status and statistics per shelf |
| `tags list` | List all tags with book counts |
| `tags rename` | Bulk rename tags across shelves |
| `split` | Interactive wizard to reorganize a shelf |
| `import` | Copy books from another shelf |
| `verify` | Detect catalog/release mismatches, `--fix` to repair |
| `shelves` | Validate all configured shelves |
| `delete-shelf` | Remove shelf (optionally delete GitHub repo) |
| `cache info` | Cache disk usage statistics |
| `cache clear` | Remove cached books (interactive or by ID) |
| `info` | Show book details and cache status |
| `index` | Generate static HTML library viewer |
| `migrate scan` | List files in source repo for migration |
| `migrate batch` | Batch-migrate with resumable ledger |
| `migrate one` | Single file migration |

## Sync Mechanism

Annotation sync detects locally-modified books and re-uploads them:

1. User opens book, adds annotations in PDF reader
2. Modified file saved to local cache
3. `shelfctl sync` compares local SHA256 against catalog checksum
4. Deletes old Release asset, uploads modified file
5. Updates catalog with new checksum
6. Single commit: "sync: update X books with local changes"

Modified files in cache are protected: `cache clear` won't delete them without `--force`.

## Cover Art

Two types, with display priority: catalog > extracted > none.

**Catalog covers** (user-curated): specified in `catalog.yml` `cover` field, stored in git, downloaded to `.covers/<id>-catalog.jpg`. Portable across machines.

**Auto-extracted thumbnails**: extracted from PDF first page via `pdftoppm` (poppler-utils) during download. Stored in `.covers/<id>.jpg`. Local-only, regenerated per machine. Parameters: JPEG, 300px max, quality 85.

## Terminal Image Rendering

`internal/tui/image.go` auto-detects terminal image protocol:

| Protocol | Terminals | Method |
|----------|-----------|--------|
| Kitty Graphics | Kitty, Ghostty | `\x1b_Ga=T,f=100,t=f;\x1b\\` |
| iTerm2 Inline | iTerm2 | `\x1b]1337;File=inline=1;width=30px:\x07` |
| None | Others | Text-only fallback |

Detection result is cached with `sync.Once` to avoid per-frame overhead.

## PDF Metadata Extraction

`internal/ingest/pdfmeta.go` extracts title/author from PDF Info dictionaries. Pure Go, no external dependencies. Scans first 8KB + last 8KB for metadata. Handles parentheses and UTF-16BE hex formats. Used to pre-populate metadata forms during shelving.

## File Source Resolution

`internal/ingest/` resolves three input types:

- **Local file**: direct filesystem read
- **HTTP URL**: streamed download
- **GitHub path**: `github:user/repo@ref:path/file.pdf` via Contents API

## Migration System

For importing from existing repos with hundreds of files:

1. `migrate scan --source owner/repo` lists all files → `queue.txt`
2. User edits queue with shelf assignments and metadata
3. `migrate batch queue.txt --n 20 --continue` processes in chunks
4. `.shelfctl-ledger.txt` tracks completed/failed entries for resumability

## HTML Index Generation

`shelfctl index` generates a self-contained `index.html` with:

- Visual grid layout with cover thumbnails
- Real-time client-side search (title, author, tags)
- Clickable tag cloud with counts
- Sort by date added, title, author, year
- `file://` links to open cached books locally
- Works completely offline

## Caching and Performance

### Parallel Operations

- **Catalog loading**: Per-shelf goroutines with pre-allocated result slices
- **Status checks**: Per-shelf goroutines for sync status
- **Cover art fetching**: Bounded concurrency with semaphore channel (8 concurrent)
- **Downloads**: Background downloads while TUI remains responsive

### TUI Performance

- Package-level lipgloss styles (avoid per-frame allocations)
- `sync.Once` for image protocol detection
- Cached divider strings rebuilt only on window resize
- Hub details pane cached until content type changes

## Editing catalog.yml Directly

`catalog.yml` is a standard YAML file in your shelf repo's default branch. You can edit it directly with git if you prefer:

```bash
git clone https://github.com/you/shelf-books
cd shelf-books
# Edit catalog.yml: add/remove/reorder books, change metadata
vim catalog.yml
git add catalog.yml && git commit -m "update metadata" && git push
```

shelfctl reads `catalog.yml` from GitHub on every operation, so changes are picked up immediately.

The file is an array of book objects, using the fields described under Catalog Schema:

```yaml
- id: sicp
  title: "Structure and Interpretation of Computer Programs"
  author: "Abelson & Sussman"
  tags: ["cs", "lisp"]
  format: "pdf"
  checksum:
    sha256: "abc123..."
  source:
    type: "github_release"
    owner: "your-username"
    repo: "shelf-programming"
    release: "library"
    asset: "sicp.pdf"
```

Fields managed by shelfctl (`checksum.sha256`, `source.asset`, `size_bytes`) should not be edited manually; they're computed from the actual release assets.

## Security

- GitHub token read from environment variable, never written to config
- Custom HTTP redirect handler strips Bearer token on S3 redirects
- SHA256 checksums verify file integrity after download
- Modified-file protection prevents accidental cache deletion