# Repository Guidelines

## Project Structure & Module Organization
- `app.py`: Flask web UI for tagging, manual video review, and NFO metadata plan/write/history workflows.
- `scanner.py`: Video scanning, face detection/encoding, clustering, OCR extraction, auto-classify. CLI entry point for scanner commands.
- `nfo_services.py`: Active NFO metadata engine with `NfoPlanner`, `NfoWriter`, `NfoHistoryService`, and backup/parse services.
- `metadata_services.py`: Legacy ffmpeg-comment metadata services (`MetadataPlanner`, `MetadataWriter`, etc.) retained for compatibility/tests.
- `config.py`: Env-driven settings (paths, thresholds, cores, OCR config). Centralize changes here.
- `text_utils.py`: OCR text fragment ranking/filtering via `calculate_top_text_fragments()`.
- `util.py`: File hashing for content-based video tracking.
- `signal_handler.py`: `SignalHandler` class for graceful shutdown on Ctrl+C.
- `e2e_test.py`: End-to-end pipeline test runner.
- `scripts/`: Maintenance utilities (config checks, DB stats, NFO actor migration, manual review status updates).
- `tests/`: Pytest suite with `conftest.py` fixtures and test modules for each component.
- `templates/`: Jinja2 templates for the UI (tagging, manual review, metadata preview/history).
- `thumbnails/`, `video_faces.db`: Generated at runtime (gitignored).
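
To illustrate the content-based tracking that `util.py` provides, the sketch below hashes a file in fixed-size chunks, so a renamed or moved video still maps to the same identifier. The function name `hash_file`, the choice of SHA-256, and the chunk size are assumptions made for illustration; the actual helper in `util.py` may differ.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks.

    Hypothetical helper: the real `util.py` may use another name or algorithm.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large videos never load fully into memory.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the digest depends only on file contents, two paths pointing at identical bytes produce the same value.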

## Build, Test, and Development Commands
- Install deps (preferred): `uv sync`  • Alt: `pip install -e .`
- Run scanner: `INDEXIUM_VIDEO_DIR=/path python scanner.py`
- Start web UI: `python app.py` → http://localhost:5001
- Run tests: `pytest -q`  • Example single test: `pytest -q tests/test_scanner.py::test_cluster_faces_updates_ids`
- End-to-end check: `python e2e_test.py test_vids` (creates temp DB/thumbs).

### Scanner Commands
```bash
python scanner.py                        # Full scan
python scanner.py retry                  # Reprocess previously failed videos
python scanner.py cleanup                # Remove orphaned/failed thumbnails
python scanner.py refresh_ocr            # Refresh OCR for all completed videos
python scanner.py refresh_ocr HASH1 ...  # Refresh one or more specific file hashes
python scanner.py continue_ocr           # Process videos missing OCR
python scanner.py cleanup_ocr            # Remove short OCR text (default <4 chars)
python scanner.py cleanup_ocr 6          # Custom minimum length
python scanner.py ocr_diagnose           # Print OCR environment diagnostics
```
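
The `cleanup_ocr` length rule can be pictured as a simple filter. The sketch below is a guess at the behavior, not the scanner's code: `drop_short_fragments` is a hypothetical name, and trimming whitespace before measuring length is an assumption.

```python
def drop_short_fragments(fragments: list[str], min_length: int = 4) -> list[str]:
    """Keep only fragments whose trimmed length is at least `min_length`."""
    return [text for text in fragments if len(text.strip()) >= min_length]


fragments = ["ab", " ok ", "Chapter 1", "intro"]
print(drop_short_fragments(fragments))     # ['Chapter 1', 'intro']
print(drop_short_fragments(fragments, 6))  # ['Chapter 1']
```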

## Coding Style & Naming Conventions
- Python 3.10+. Follow PEP 8 (4-space indents, 100–120 col soft limit).
- Use snake_case for functions/variables, PascalCase for classes, module names in lowercase.
- Add concise docstrings for public functions and routes; prefer type hints where practical.
- Keep functions small and aware of their side effects; prefer explicit configuration via `config.py` over implicit defaults.
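
A small function written to these conventions might look like the sketch below. `normalize_threshold` is a hypothetical helper used only to show naming, type hints, and docstring style; it is not part of the codebase.

```python
def normalize_threshold(value: float, lower: float = 0.0, upper: float = 1.0) -> float:
    """Clamp `value` to the inclusive range [`lower`, `upper`].

    Raises:
        ValueError: If `lower` is greater than `upper`.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(value, upper))
```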

## Testing Guidelines
- Framework: Pytest. Place tests under `tests/` as `test_*.py`; name tests `test_*`.
- Use fixtures/monkeypatching to point DB/paths to temp locations (see `tests/conftest.py`).
- Run locally with `pytest -q`; target specific tests during development for speed.
- Coverage: `pytest --cov --cov-report=term-missing` • HTML report: `pytest --cov --cov-report=html`
- Test files: `test_app.py`, `test_scanner.py`, `test_nfo_services.py`, `test_metadata_services.py`, `test_metadata_writer.py`, `test_config.py`, `test_text_utils.py`, `test_util.py`, `test_signal_handler.py`, `test_e2e.py`, `test_e2e_ui.py`.
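
One way to redirect state to a temporary location uses pytest's built-in `tmp_path` and `monkeypatch` fixtures, as sketched below. The helper name `use_temp_db` is hypothetical, and the sketch assumes the code under test reads `INDEXIUM_DB` from the environment when it runs; if `config.py` caches the value at import time, patch the config attribute instead. The fixtures in `tests/conftest.py` remain the reference.

```python
import os
from pathlib import Path


def use_temp_db(monkeypatch, tmp_path: Path) -> Path:
    """Point INDEXIUM_DB at a throwaway file for the duration of one test."""
    db_path = tmp_path / "video_faces.db"
    # monkeypatch restores the original environment when the test finishes.
    monkeypatch.setenv("INDEXIUM_DB", str(db_path))
    return db_path


def test_db_path_is_redirected(monkeypatch, tmp_path):
    db_path = use_temp_db(monkeypatch, tmp_path)
    assert os.environ["INDEXIUM_DB"] == str(db_path)
    assert db_path.parent == tmp_path
```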

## Commit & Pull Request Guidelines
- Commit messages: imperative mood, concise summary (e.g., "Add pagination for tag_group"). Optional scope tags (e.g., `test:`) are welcome.
- PRs should include: clear description, rationale, testing steps, and screenshots/GIFs for UI changes.
- Link related issues; keep diffs focused. Update `README.md` and the `config.py` docstrings when adding new settings.

## Security & Configuration Tips
- Do not commit generated assets: `video_faces.db`, `thumbnails/`, `.env` (already in `.gitignore`).
- Required tools: ffmpeg (includes ffprobe), OpenCV/dlib system deps (see README).
- For OCR: EasyOCR (preferred) or Tesseract as fallback.
- Key env vars: `INDEXIUM_VIDEO_DIR`, `INDEXIUM_DB`, `FLASK_DEBUG`, `INDEXIUM_OCR_ENABLED`, `INDEXIUM_OCR_ENGINE`, `MANUAL_VIDEO_REVIEW_ENABLED`, `METADATA_PLAN_WORKERS`, `NFO_REMOVE_STALE_ACTORS`, `NFO_BACKUP_MAX_AGE_DAYS`.
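
Env-driven settings of this kind are typically read through small parsing helpers. The sketch below assumes hypothetical helpers `env_flag` and `env_int` and made-up default values; the accepted truthy strings and the actual defaults in `config.py` may differ.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean switch."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int) -> int:
    """Read an integer setting, falling back to `default` if unset or invalid."""
    try:
        return int(os.environ[name])
    except (KeyError, ValueError):
        return default


# Illustrative defaults only; not the values used by config.py.
OCR_ENABLED = env_flag("INDEXIUM_OCR_ENABLED", default=True)
PLAN_WORKERS = env_int("METADATA_PLAN_WORKERS", default=4)
```

Falling back to a default on invalid input keeps a typo in the environment from crashing startup, at the cost of hiding the mistake; raising an error instead is an equally reasonable choice.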

## Manual UI Testing with Playwright

To verify UI changes, use Playwright MCP tools to interact with the running app.

1. **Check if app is running**: `curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:5001/ || echo "not running"`
2. **Start app in background (if needed)**: `python app.py &`, and note whether you started it so that you only shut down a process you launched
3. **Navigate and interact** using Playwright MCP tools:
   - `browser_navigate` — go to a URL (e.g., `http://127.0.0.1:5001/videos/manual`)
   - `browser_snapshot` — capture accessibility snapshot for page structure and element refs
   - `browser_take_screenshot` — visual screenshot to verify layout/styling
   - `browser_click`, `browser_type`, `browser_fill_form` — interact with elements
4. **Shut down app (if you started it)**: `pkill -f "python app.py"`
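
Steps 1 and 4 depend on knowing whether the app was already up. The sketch below is a Python equivalent of the `curl` probe; `app_is_running` is a hypothetical helper, not part of the repository, and it treats any HTTP response (including error statuses) as "running".

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the check targets a local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def app_is_running(url: str = "http://127.0.0.1:5001/", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url`, even with an error status."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example 404 or 500), so it is running.
        return True
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return False
```

Recording the result before starting the app gives a simple way to decide later whether the shutdown step applies.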