# Summarize 📝 — Chrome Side Panel + CLI

Fast summaries from URLs, files, and media. Works in the terminal, a Chrome Side Panel and Firefox Sidebar.

## Highlights

- Chrome Side Panel **chat** (streaming agent + history) inside the sidebar.
- **Video slides**: screenshots + OCR + transcript cards for YouTube, direct video URLs, and local video files.
- Media-aware summaries: auto‑detect video/audio vs page content.
- Coding CLI backends: Codex, Claude, Gemini, Cursor Agent, OpenClaw, OpenCode.
- Streaming Markdown + metrics + cache‑aware status.
- CLI supports URLs, files, podcasts, YouTube, audio/video, PDFs.

## Feature overview

- URLs, files, and media: web pages, PDFs, images, audio/video, YouTube, podcasts, RSS.
- Slide extraction for video sources (YouTube, direct video URLs, local video files) with OCR + timestamped cards.
- Transcript-first media flow: published transcripts when available, then Groq/ONNX/whisper.cpp/AssemblyAI/Gemini/OpenAI/FAL transcription fallback when not.
- Coding CLI providers: Claude, Codex, Gemini, Cursor Agent, OpenClaw, OpenCode, GitHub Copilot, Antigravity, pi.
- Streaming output with Markdown rendering, metrics, and cache-aware status.
- Local, paid, and free models: OpenAI‑compatible local endpoints, paid providers, plus an OpenRouter free preset.
- Output modes: Markdown/text, JSON diagnostics, extract-only, metrics, timing, and cost estimates.
- Smart default: if content is shorter than the requested length, we return it as-is (use `--force-summary` to override).

## Get the extension (recommended)

![Summarize extension screenshot](docs/assets/summarize-extension.png)

One‑click summarizer for the current tab. Chrome Side Panel + Firefox Sidebar + local daemon for streaming Markdown.

**Chrome Web Store:** [Summarize Side Panel](https://chromewebstore.google.com/detail/summarize/cejgnmmhbbpdmjnfppjdfkocebngehfg)

YouTube slide screenshots (from the browser):

![Summarize YouTube slide screenshots](docs/assets/youtube-slides.png)

### Beginner quickstart (extension)

1. Install the extension (Chrome Web Store link above) and open the Side Panel.
2. Choose **Direct** or **Daemon**. Direct uses Gemini Nano by default when no provider key is configured, or calls your selected provider from Chrome.
3. Choose Browser media for daemonless transcription/slides.

Optional: install the CLI and pair the daemon for native tools, CLI model fallbacks, OCR, and broader media support:

- **npm** (cross-platform): `npm i -g @steipete/summarize`
- **Homebrew** (Homebrew/core): `brew install summarize`
- `summarize daemon install --token `

Why a daemon/service?

- Direct mode works without the daemon. Auto uses a configured OpenAI, OpenRouter, Anthropic, Gemini, xAI, Z.AI, NVIDIA, MiniMax, GitHub Models, or Ollama provider, otherwise Gemini Nano on-device; keys remain in extension-local storage.
- The optional daemon on `127.0.0.1` adds CLI model fallbacks, shared caches/diagnostics, native ffmpeg, configurable transcription providers, OCR, and broader media support.
- The service autostarts (launchd/systemd/Scheduled Task) so the Side Panel is always ready.

If you only want the **CLI**, you can skip the daemon install entirely.

Notes:

- Summarization only runs when the Side Panel is open.
- Auto mode summarizes on navigation (incl. SPAs); otherwise use the button.
- Daemon is localhost-only and requires a shared token; rerunning `summarize daemon install --token ` adds another paired browser token instead of invalidating the old one.
- Autostart: macOS (launchd), Linux (systemd user), Windows (Scheduled Task).
- Windows containers: `summarize daemon install` starts the daemon for the current container session but does not register a Scheduled Task. Run it each time the container starts or add that command to your container startup, and publish port `8787` so the host browser can reach the daemon.
- Tip: configure `free` via `summarize refresh-free` (needs `OPENROUTER_API_KEY`). Add `--set-default` to set model=`free`.

More:

- Step-by-step install: [apps/chrome-extension/README.md](apps/chrome-extension/README.md)
- Architecture + troubleshooting: [docs/chrome-extension.md](docs/chrome-extension.md)
- Firefox compatibility notes: [apps/chrome-extension/docs/firefox.md](apps/chrome-extension/docs/firefox.md)

### Slides (extension)

- Select **Video + Slides** in the Summarize picker.
- Slides render at the top; expand to full‑width cards with timestamps.
- Click a slide to seek the video; toggle **Transcript/OCR** when OCR is significant.
- Browser mode uses MediaBunny with native WebCodecs and ranged network reads for fetchable videos, then falls back to visible-tab capture when the source or codec is unavailable.
- Daemon mode adds `yt-dlp`, native ffmpeg, and optional `tesseract` OCR.

### Advanced (unpacked / dev)

1. Build + load the extension (unpacked):
   - Chrome: `pnpm -C apps/chrome-extension build`
     - `chrome://extensions` → Developer mode → Load unpacked
     - Pick: `apps/chrome-extension/.output/chrome-mv3`
   - Firefox: `pnpm -C apps/chrome-extension build:firefox`
     - `about:debugging#/runtime/this-firefox` → Load Temporary Add-on
     - Pick: `apps/chrome-extension/.output/firefox-mv3/manifest.json`
2. Open Side Panel/Sidebar → copy token.
3. Install daemon in dev mode:
   - `pnpm summarize daemon install --token --dev`

## CLI

![Summarize CLI screenshot](docs/assets/summarize-cli.png)

### Install

Requires Node 24+.

- npx (no install):

```bash
npx -y @steipete/summarize "https://example.com"
```

- npm (global):

```bash
npm i -g @steipete/summarize
```

- npm (library / minimal deps):

```bash
npm i @steipete/summarize-core
```

```ts
import { createLinkPreviewClient } from "@steipete/summarize-core/content";
```

- Homebrew:

```bash
brew install summarize
```

Homebrew ships from `homebrew/core` via `brew install summarize`. If Homebrew is unavailable in your environment, use the npm global install above.

### Optional local dependencies

Install these if you want media-heavy features:

- `ffmpeg`: optional native accelerator with broader codec support; bundled WebAssembly is the fallback
- `yt-dlp`: required for YouTube slide extraction and some remote media flows
- `tesseract`: optional OCR for `--slides-ocr`
- Optional cloud transcription providers:
  - `GROQ_API_KEY`
  - `ASSEMBLYAI_API_KEY`
  - `ELEVENLABS_API_KEY` (speaker diarization)
  - `GEMINI_API_KEY` / `GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY`
  - `OPENAI_API_KEY`
  - `FAL_KEY`

macOS (Homebrew):

```bash
brew install ffmpeg yt-dlp
brew install tesseract # optional, for --slides-ocr
```

If native `ffmpeg`/`ffprobe` are unavailable, Summarize uses the bundled WebAssembly build. Native ffmpeg remains recommended for speed and broader codec/filter support.

### CLI vs extension

- **CLI only:** just install via npm/Homebrew and run `summarize ...` (no daemon needed).
- **Chrome extension:** Direct mode defaults to Gemini Nano without a key and supports provider-backed summaries/chat/automation/hover when configured; Browser media provides daemonless transcription and slides. Install the daemon for CLI fallbacks and native media tools.
- **Firefox extension:** install the CLI and daemon for media extraction.

### Quickstart

```bash
summarize "https://example.com"
```

Inspect the effective model setup. Status only lists configured or usable providers; it never prints keys or missing-provider noise.

```bash
summarize status
summarize status --verbose
summarize status --probe
summarize status --json
```

`--probe` checks supported model-list endpoints without running paid inference. CLI providers are reported as available when their enabled executable is present; API providers are reported as configured when an effective key is present.

### Inputs

URLs or local paths:

```bash
summarize "/path/to/file.pdf" --model google/gemini-3-flash
summarize "https://example.com/report.pdf" --model google/gemini-3-flash
summarize "/path/to/audio.mp3"
summarize "/path/to/video.mp4"
```

Stdin (pipe content using `-`):

```bash
echo "content" | summarize -
pbpaste | summarize -
# binary stdin also works (PDF/image/audio/video bytes)
cat /path/to/file.pdf | summarize -
```

**Notes:**

- Stdin has a 50MB size limit
- The `-` argument tells summarize to read from standard input
- Text stdin is treated as UTF-8 text (whitespace-only input is rejected as empty)
- Binary stdin is preserved as raw bytes and file type is auto-detected when possible
- Useful for piping clipboard content or command output

YouTube (supports `youtube.com` and `youtu.be`):

```bash
summarize "https://youtu.be/dQw4w9WgXcQ" --youtube auto
```

Podcast RSS (transcribes latest enclosure):

```bash
summarize "https://feeds.npr.org/500005/podcast.xml"
```

Apple Podcasts episode page:

```bash
summarize "https://podcasts.apple.com/us/podcast/2424-jelly-roll/id360084272?i=1000740717432"
```

Spotify episode page (best-effort; may fail for exclusives):

```bash
summarize "https://open.spotify.com/episode/5auotqWAXhhKyb9ymCuBJY"
```

HLS playlist:

```bash
summarize "https://example.com/master.m3u8"
```

### Output length

`--length` controls how much output we ask for (guideline), not a hard cap. The built-in default is `long`. Set a default in `~/.summarize/config.json` with `output.length`.

```bash
summarize "https://example.com" --length long
summarize "https://example.com" --length 20k
```

- Presets: `short|medium|long|xl|xxl`
- Character targets: `1500`, `20k`, `20000`
- Optional hard cap: `--max-output-tokens ` (e.g. `2000`, `2k`)
  - Provider/model APIs still enforce their own maximum output limits.
  - If omitted, no max token parameter is sent (provider default).
  - Prefer `--length` unless you need a hard cap.
- Short content: when extracted content is shorter than the requested length, the CLI returns the content as-is.
  - Override with `--force-summary` to always run the LLM.
- Minimums: `--length` numeric values must be >= 10 chars; `--max-output-tokens` must be >= 16.
- Preset targets (source of truth: `packages/core/src/prompts/summary-lengths.ts`):
  - short: target ~900 chars (range 600-1,200)
  - medium: target ~1,800 chars (range 1,200-2,500)
  - long: target ~4,200 chars (range 2,500-6,000)
  - xl: target ~9,000 chars (range 6,000-14,000)
  - xxl: target ~17,000 chars (range 14,000-22,000)

### What file types work?

Best effort and provider-dependent. These usually work well:

- `text/*` and common structured text (`.txt`, `.md`, `.json`, `.yaml`, `.xml`, ...)
  - Text-like files are inlined into the prompt for better provider compatibility.
- PDFs: `application/pdf` (provider support varies; Google is the most reliable here)
- Images: `image/jpeg`, `image/png`, `image/webp`, `image/gif`
- Audio/Video: `audio/*`, `video/*` (local audio/video files MP3/WAV/M4A/OGG/FLAC/MP4/MOV/WEBM automatically transcribed, when supported by the model)

Notes:

- If a provider rejects a media type, the CLI fails fast with a friendly message.
- xAI models do not support attaching generic files (like PDFs) via the AI SDK; use Google/OpenAI/Anthropic for those.

### Model ids

Use gateway-style ids: `/`.

Examples:

- `openai/gpt-5.4`
- `openai/gpt-5.4-mini`
- `openai/gpt-5.4-nano`
- `openai/gpt-5-mini`
- `openai/gpt-5-nano`
- `github-copilot/gpt-5.4`
- `anthropic/claude-sonnet-4-5`
- `xai/grok-4-fast-non-reasoning`
- `google/gemini-3-flash`
- `zai/glm-4.7`
- `minimax/MiniMax-M3`
- `openrouter/openai/gpt-5-mini` (force OpenRouter)

Note: some models/providers do not support streaming or certain file media types. When that happens, the CLI prints a friendly error (or auto-disables streaming for that model when supported by the provider). `gpt-5.4-mini` and `gpt-5.4-nano` are treated as real model ids; the same shorthand also works under `github-copilot/...`.

### OpenAI fast mode and thinking

Fast mode is a request option, not a model id:

```bash
summarize "https://example.com" --model openai/gpt-5.5 --fast --thinking medium
summarize "https://example.com" --model openai/gpt-5.4 --service-tier fast --thinking low
```

- `--fast` is shorthand for `--service-tier fast`.
- `--service-tier default|fast|priority|flex` controls OpenAI service tier. `fast` is the summarize/Codex-facing spelling and is sent to OpenAI as `service_tier="priority"`.
- `--thinking none|low|medium|high|xhigh` controls OpenAI reasoning effort. Aliases: `off` → `none`, `min` → `low`, `mid` / `med` → `medium`, `x-high` / `extra-high` → `xhigh`.
- `--service-tier default` clears a configured tier for one run.

Config equivalent:

```json
{
  "model": "openai/gpt-5.5",
  "openai": {
    "serviceTier": "fast",
    "thinking": "medium"
  }
}
```

Compatibility aliases still work, but prefer the explicit flags above:

- `--model gpt-fast` / `--model fast` → `openai/gpt-5.5` + fast tier + medium thinking
- `--model openai/gpt-5.5-fast` → `openai/gpt-5.5` + fast tier

### Limits

- Text inputs over 10 MB are rejected before tokenization.
- Text prompts are preflighted against the model input limit (LiteLLM catalog), using a GPT tokenizer.
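
The two limits above amount to a two-stage preflight: a cheap size check first, then a token count against the model input limit. The following is a minimal TypeScript sketch of that ordering, not Summarize's actual implementation. `preflightText`, `MAX_TEXT_BYTES`, and the `countTokens` callback are hypothetical names, and reading "10 MB" as 10 MiB is an assumption.

```typescript
// Hypothetical sketch: none of these names come from the Summarize codebase.
const MAX_TEXT_BYTES = 10 * 1024 * 1024; // assumption: "10 MB" read as 10 MiB

type Preflight =
  | { ok: true; tokens: number }
  | { ok: false; reason: "too-large" | "over-token-limit" };

function preflightText(
  text: string,
  modelInputLimit: number,
  countTokens: (text: string) => number,
): Preflight {
  // Stage 1: byte-size check, performed before any tokenization happens.
  const bytes = new TextEncoder().encode(text).length;
  if (bytes > MAX_TEXT_BYTES) return { ok: false, reason: "too-large" };

  // Stage 2: count tokens and compare against the model's input limit.
  const tokens = countTokens(text);
  if (tokens > modelInputLimit) return { ok: false, reason: "over-token-limit" };

  return { ok: true, tokens };
}

// Stand-in tokenizer for the demo only: roughly 4 characters per token.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

console.log(preflightText("hello world", 8, roughTokens));
```

Passing the tokenizer in as a callback keeps the sketch independent of any particular tokenizer library; a real caller would supply a GPT tokenizer and a limit looked up from a model catalog.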

### Common flags

```bash
summarize [flags]
```

Use `summarize --help` or `summarize help` for the full help text.

- `--model `: which model to use (defaults to `auto`)
  - `--model auto`: automatic model selection + fallback (default)
  - `--model `: use a built-in or config-defined preset (see Configuration)
- `--timeout `: `30s`, `2m`, `5000ms` (default `2m`)
- `--retries `: LLM retry attempts on timeout (default `1`)
- `--length short|medium|long|xl|xxl|s|m|l|`
- `--language, --lang `: output language (`auto` = match source)
- `--max-output-tokens `: hard cap for LLM output tokens
- `--cli [provider]`: use a CLI provider (`--model cli/`). Supports `claude`, `gemini`, `codex`, `agent`, `openclaw`, `opencode`, `copilot`, `agy`, `pi`. If omitted, uses auto selection with CLI enabled.
- `--stream auto|on|off`: stream LLM output (`auto` = TTY only; disabled in `--json` mode)
- `--plain`: keep raw output (no ANSI/OSC Markdown rendering)
- `--no-color`: disable ANSI colors
- `--theme `: CLI theme (`aurora`, `ember`, `moss`, `mono`)
- `--format md|text`: website/file content format (default `text`)
- `--markdown-mode off|auto|llm|readability`: HTML -> Markdown mode (default `readability`)
- `--preprocess off|auto|always`: controls `uvx markitdown` usage (default `auto`)
  - Install `uvx`: `brew install uv` (or https://astral.sh/uv/)
  - Image-only PDFs can fall back to OpenAI vision OCR when `OPENAI_API_KEY` is set; override the OCR model with `MARKITDOWN_OCR_MODEL` or page render DPI with `MARKITDOWN_OCR_DPI`.
- `--extract`: print extracted content and exit (URLs only; stdin `-` is not supported)
  - Deprecated alias: `--extract-only`
- `--slides`: extract slides for YouTube, direct video URLs, or local video files and render them inline in the summary narrative (auto-renders inline in supported terminals)
- `--slides-ocr`: run OCR on extracted slides (requires `tesseract`)
- `--slides-dir `: base output dir for slide images (default `./slides`)
- `--slides-scene-threshold `: scene detection threshold (0.1-1.0)
- `--slides-max `: maximum slides to extract (default `6`)
- `--slides-min-duration `: minimum seconds between slides
- `--json`: machine-readable output with diagnostics, prompt, `metrics`, and optional summary
- `--verbose`: debug/diagnostics on stderr
- `--metrics off|on|detailed`: metrics output (default `on`)

### Coding CLIs (Codex, Claude, Gemini, Agent, OpenClaw, OpenCode, Copilot, Antigravity, pi)

Summarize can use common coding CLIs as local model backends:

- `codex` -> `--cli codex` / `--model cli/codex/`
- `claude` -> `--cli claude` / `--model cli/claude/`
- `gemini` -> `--cli gemini` / `--model cli/gemini/`
- `agent` (Cursor Agent CLI) -> `--cli agent` / `--model cli/agent/`
- `openclaw` -> `--cli openclaw` / `--model cli/openclaw/` or `--model openclaw/`
- `opencode` -> `--cli opencode` / `--model cli/opencode/` (`--model cli/opencode` uses the OpenCode runtime default)
- `agy` (Antigravity CLI) -> `--cli agy` / `--model cli/agy` (uses agy's active session model; per-call model selection is not supported by agy print mode)
- `pi` (Pi Coding Agent) -> `--cli pi` / `--model cli/pi` or `--model cli/pi/`

Built-in preset:

- `--model codex-fast` runs Codex with GPT-5.5 Fast mode and requires `codex login`.
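
The `cli/...` model ids listed above follow a simple shape: a `cli` prefix, a provider name, and an optional model part. As an illustration only, the sketch below splits such an id in TypeScript. `parseCliModelId` and `CLI_PROVIDERS` are hypothetical names rather than Summarize's API; the provider list is copied from the `--cli` flag description, and treating everything after the provider as the model is an assumption.

```typescript
// Hypothetical sketch: not taken from the Summarize codebase.
// Provider names as listed for the `--cli` flag.
const CLI_PROVIDERS = [
  "claude", "gemini", "codex", "agent", "openclaw",
  "opencode", "copilot", "agy", "pi",
] as const;

type CliProvider = (typeof CLI_PROVIDERS)[number];

interface CliModelId {
  provider: CliProvider;
  model?: string; // undefined means "use the CLI's own default model"
}

function parseCliModelId(id: string): CliModelId | null {
  const [prefix, provider, ...rest] = id.split("/");
  if (prefix !== "cli" || !provider) return null;
  if (!(CLI_PROVIDERS as readonly string[]).includes(provider)) return null;

  // Assumption: everything after the provider is the model, slashes included.
  const model = rest.join("/");
  return { provider: provider as CliProvider, model: model || undefined };
}

console.log(parseCliModelId("cli/opencode"));
console.log(parseCliModelId("cli/codex/gpt-5.5"));
console.log(parseCliModelId("openai/gpt-5.4"));
```

The sketch does not model provider-specific rules such as `agy` ignoring a per-call model, or the `openclaw/` shorthand without the `cli` prefix.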

Requirements:

- Binary installed and on `PATH` (or set `CODEX_PATH`, `CLAUDE_PATH`, `GEMINI_PATH`, `AGENT_PATH`, `OPENCLAW_PATH`, `OPENCODE_PATH`, `AGY_PATH`, `PI_PATH`)
- Provider authenticated (`codex login`, `claude auth`, `gemini` login flow, `agent login` or `CURSOR_API_KEY`, `opencode auth login`, `agy` login flow or `ANTIGRAVITY_API_KEY`, `pi` uses configured provider API keys)

Quick smoke test:

```bash
printf "Summarize CLI smoke input.\nOne short paragraph. Reply can be brief.\n" >/tmp/summarize-cli-smoke.txt
summarize --cli codex --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli claude --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli gemini --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli agent --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli openclaw --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli opencode --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli agy --plain --timeout 2m /tmp/summarize-cli-smoke.txt
summarize --cli pi --plain --timeout 2m /tmp/summarize-cli-smoke.txt
```

Set explicit CLI allowlist/order:

```json
{
  "cli": {
    "enabled": ["codex", "claude", "gemini", "agent", "openclaw", "opencode", "agy", "pi"]
  }
}
```

Configure implicit auto CLI fallback:

```json
{
  "cli": {
    "autoFallback": {
      "enabled": true,
      "onlyWhenNoApiKeys": true,
      "order": ["claude", "gemini", "codex", "agent", "openclaw", "opencode"]
    }
  }
}
```

More details: [`docs/cli.md`](docs/cli.md)

### Auto model ordering

`--model auto` builds candidate attempts from built-in rules (or your `model.rules` overrides). CLI attempts are prepended when:

- `cli.enabled` is set (explicit allowlist/order), or
- implicit auto selection is active and `cli.autoFallback` is enabled.

Default fallback behavior: only when no API keys are configured, order `claude, gemini, codex, agent, openclaw, opencode, copilot`, and remember/prioritize last successful provider (`~/.summarize/cli-state.json`).
Antigravity and pi are opt-in unless you add them to `cli.autoFallback.order`.

Set explicit CLI attempts:

```json
{
  "cli": {
    "enabled": ["gemini"]
  }
}
```

Disable implicit auto CLI fallback:

```json
{
  "cli": {
    "autoFallback": {
      "enabled": false
    }
  }
}
```

Note: explicit `--model auto` does not trigger implicit auto CLI fallback unless `cli.enabled` is set.

### Website extraction (Firecrawl + Markdown)

Non-YouTube URLs go through a fetch -> extract pipeline. When direct fetch/extraction is blocked or too thin, `--firecrawl auto` can fall back to Firecrawl (if configured).

- `--firecrawl off|auto|always` (default `auto`)
- `--extract --format md|text` (default `text`; if `--format` is omitted, `--extract` defaults to `md` for non-YouTube URLs)
- `--markdown-mode off|auto|llm|readability` (default `readability`)
  - `auto`: use an LLM converter when configured; may fall back to `uvx markitdown`
  - `llm`: force LLM conversion (requires a configured model key)
  - `off`: disable LLM conversion (still may return Firecrawl Markdown when configured)
- Plain-text mode: use `--format text`.

### YouTube transcripts

`--youtube auto` tries best-effort web transcript endpoints first. When captions are not available, it falls back to:

1. yt-dlp + Whisper (if `yt-dlp` is available): downloads audio, then transcribes with local `whisper.cpp` when installed (preferred), otherwise falls back to Groq (`GROQ_API_KEY`), AssemblyAI (`ASSEMBLYAI_API_KEY`), Gemini (`GEMINI_API_KEY` / Google aliases), OpenAI (`OPENAI_API_KEY`), then FAL (`FAL_KEY`)
2. Android VR direct audio + the same configured transcription chain when `yt-dlp` is unavailable or fails
3. Apify (if `APIFY_API_TOKEN` is set): uses a scraping actor (`faVsWy9VTSNVIhWpR`)

Environment variables for yt-dlp mode:

- `YT_DLP_PATH` - optional path to yt-dlp binary (otherwise `yt-dlp` is resolved via `PATH`)
- `SUMMARIZE_WHISPER_CPP_MODEL_PATH` - optional override for the local `whisper.cpp` model file
- `SUMMARIZE_WHISPER_CPP_BINARY` - optional override for the local binary (default: `whisper-cli`)
- `SUMMARIZE_DISABLE_LOCAL_WHISPER_CPP=1` - disable local whisper.cpp (force remote)
- `GROQ_API_KEY` - Groq Whisper transcription
- `ASSEMBLYAI_API_KEY` - AssemblyAI transcription
- `GEMINI_API_KEY` - Gemini transcription (`GOOGLE_GENERATIVE_AI_API_KEY` / `GOOGLE_API_KEY` also work)
- `OPENAI_API_KEY` - OpenAI Whisper transcription
- `OPENAI_WHISPER_BASE_URL` - optional OpenAI-compatible Whisper endpoint override
- `FAL_KEY` - FAL AI Whisper fallback

Apify costs money but tends to be more reliable when captions exist.

Speaker-labelled transcripts for YouTube, local audio/video, and direct media URLs:

```bash
summarize "https://www.youtube.com/watch?v=..." --extract --diarize
summarize "./interview.mp3" --extract --diarize
summarize "https://cdn.example.com/interview.mp4" --extract --diarize openai
summarize "./interview.mp4" --extract --diarize openai \
  --identify-speakers --speaker-at "0:00=Host" --speaker-at "0:12=Guest"
summarize "https://www.youtube.com/watch?v=..." --extract --diarize elevenlabs
summarize "https://www.youtube.com/watch?v=..." --extract --diarize openai --timestamps
summarize "https://www.youtube.com/watch?v=..." --extract --diarize elevenlabs \
  --identify-speakers --speaker-profile my-podcast \
  --speaker-at "0:12=Host Name" --remember-speakers
```

Bare `--diarize` prefers ElevenLabs Scribe v2 (`ELEVENLABS_API_KEY`) and falls back to OpenAI `gpt-4o-transcribe-diarize` (`OPENAI_API_KEY`). Speaker changes are emitted as `Speaker