---
name: summarize
description: Content extraction skill for AI coding agents. YouTube transcripts, podcasts, PDFs, images (OCR), audio/video via the summarize CLI.
---

# Summarize

Extract clean text and media transcripts from URLs, files, and streams so your AI workflow can reason over reliable source content without hand-coding brittle scraper logic.

Use this skill when you need deterministic extraction for YouTube, podcast feeds, PDFs, scanned images, or local media files.

Terminology used in this file:
- **DOM:** Document Object Model, the page element structure used by browser-based extractors.
- **OCR:** Optical character recognition (extracting text from images/scans).
- **ANSI codes:** Terminal color/control sequences; `--plain` removes them for machine parsing.

## Setup

```bash
brew tap steipete/tap
brew install summarize
```

- **Claude Code:** copy this skill folder into `.claude/skills/summarize/`
- **Codex CLI:** append this SKILL.md content to your project's root `AGENTS.md`

For the full installation walkthrough (prerequisites, optional dependencies, verification, troubleshooting), see [references/installation-guide.md](references/installation-guide.md).

## Staying Updated

This skill ships with an `UPDATES.md` changelog and `UPDATE-GUIDE.md` for your AI agent.

After installing, tell your agent: "Check `UPDATES.md` in the summarize skill for any new features or changes."

When updating, tell your agent: "Read `UPDATE-GUIDE.md` and apply the latest changes from `UPDATES.md`."

Follow `UPDATE-GUIDE.md` so customized local files are diffed before any overwrite.

---

## Quick Start

Run one extraction flow end-to-end:

```bash
summarize --version
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain
summarize --extract "/path/to/document.pdf" --plain
```

Use `--extract --plain` as the default pattern for deterministic, non-ANSI output.

## Decision Tree: summarize vs Other Tools

```text
Need content from the web?
  |
  +-- Static web page (article, docs, blog)?
  |     --> WebFetch (built-in, zero deps, faster)
  |     --> Jina r.jina.ai (zero install alternative)
  |     --> summarize ONLY if above tools fail or return garbage
  |
  +-- JS-heavy SPA / dynamic content?
  |     --> Crawl4AI crwl (full browser rendering)
  |     --> summarize will NOT help here (no JS rendering)
  |
  +-- Anti-bot / paywalled / Cloudflare-protected?
  |     --> summarize --firecrawl always (requires FIRECRAWL_API_KEY)
  |     --> browser-based workflow as fallback
  |
  +-- YouTube video?
  |     --> summarize --extract (ONLY option for transcript)
  |     --> Add --youtube web for captions-only (faster)
  |     --> Add --slides for visual slide extraction
  |
  +-- Podcast / RSS feed?
  |     --> summarize --extract (ONLY option)
  |     --> Supports Apple Podcasts, Spotify, RSS feeds, Podbean, etc.
  |
  +-- PDF (URL or local file)?
  |     --> summarize --extract (ONLY CLI option)
  |     --> Requires: uvx/markitdown (brew install uv)
  |
  +-- Image (OCR)?
  |     --> summarize --extract (ONLY CLI option)
  |     --> Requires: tesseract
  |
  +-- Audio / video file?
        --> summarize --extract (ONLY CLI option)
        --> Requires: whisper-cli (local) or OPENAI_API_KEY (cloud)
```

**Rule of thumb:** summarize is the default for media extraction (YouTube, podcasts, audio, video, images). For web pages, prefer WebFetch/Jina/Crawl4AI depending on DOM complexity (how hard the page structure is to parse). Use summarize for web only when other tools fail.

## Extraction Mode (Primary)

`--extract` prints raw extracted content and exits. No LLM involved.
Use this first. You can handle any downstream synthesis in your own workflow.

```bash
# Web page extraction (plain text, default)
summarize --extract "https://example.com" --plain

# Web page extraction (markdown format)
summarize --extract "https://example.com" --format md --plain

# YouTube transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --plain

# YouTube transcript with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# YouTube transcript formatted as markdown (requires LLM -- uses API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain

# YouTube slides + transcript
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --slides --plain

# Podcast (RSS feed)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# Apple Podcasts episode
summarize --extract "https://podcasts.apple.com/us/podcast/EPISODE_ID" --plain

# PDF from URL
summarize --extract "https://example.com/document.pdf" --plain

# PDF from local file
summarize --extract "/path/to/document.pdf" --plain

# Image OCR
summarize --extract "/path/to/image.png" --plain

# Audio transcription
summarize --extract "/path/to/audio.mp3" --plain

# Video transcription
summarize --extract "/path/to/video.mp4" --plain

# Stdin (pipe content)
pbpaste | summarize --extract - --plain
cat document.pdf | summarize --extract - --plain
```

**Always use `--plain`** when extracting for agent consumption. It suppresses ANSI/OSC rendering.

**Extraction defaults:**
- URLs default to `--format md` in extract mode
- Files default to `--format text`
- PDF requires uvx/markitdown (`--preprocess auto`, which is default)

## LLM Summarization Mode (Secondary)

Use this mode only when you explicitly want summarize to perform synthesis itself.

```bash
# Summarize a URL (requires API key for the chosen model)
summarize "https://example.com" --model anthropic/claude-sonnet-4-5 --length long

# Summarize with a custom prompt
summarize "https://example.com" --prompt "Extract key technical decisions and their rationale"

# Summarize YouTube video
summarize "https://www.youtube.com/watch?v=VIDEO_ID" --length xl

# JSON output with metrics
summarize "https://example.com" --json --model openai/gpt-5-mini
```

**API keys for LLM mode** (set in `~/.summarize/config.json` or env vars):
- `ANTHROPIC_API_KEY` -- for anthropic/ models
- `OPENAI_API_KEY` -- for openai/ models
- `GEMINI_API_KEY` -- for google/ models
- `XAI_API_KEY` -- for xai/ models

## Dependency Matrix

| Feature | Required Deps |
|---------|--------------|
| Web page extraction | None |
| YouTube transcript (captions) | None (web mode) |
| YouTube transcript (no captions) | yt-dlp + whisper or API key |
| YouTube slides | yt-dlp + ffmpeg |
| Podcast transcription | yt-dlp + whisper or API key |
| PDF extraction | uvx/markitdown |
| Image OCR | tesseract |
| Audio/video transcription | whisper-cli (local) or OPENAI_API_KEY |
| Anti-bot sites (Firecrawl) | FIRECRAWL_API_KEY |
| Slide OCR | tesseract |

**What is not installed (by design):**
- `whisper-cli` / whisper.cpp -- heavy binary, install when audio transcription is needed
- Firecrawl API key -- paid service, configure when anti-bot extraction is needed
- LLM API keys in summarize config -- only add if you use LLM Summarization Mode

## Key Flags Quick Reference

| Flag | Purpose | Example |
|------|---------|---------|
| `--extract` | Raw content extraction, no LLM | `summarize --extract URL` |
| `--plain` | No ANSI rendering (agent-safe output) | Always use for agents |
| `--format md\|text` | Output format (md default for URLs in extract) | `--format md` |
| `--youtube auto\|web\|yt-dlp` | YouTube transcript source | `--youtube web` (captions only) |
| `--slides` | Extract video slides with ffmpeg | `--slides --slides-ocr` |
| `--timestamps` | Include timestamps in transcripts | `--timestamps` |
| `--firecrawl off\|auto\|always` | Firecrawl for anti-bot sites | `--firecrawl always` |
| `--preprocess off\|auto\|always` | Preprocessing (markitdown for PDFs) | Default `auto` |
| `--markdown-mode` | HTML-to-MD conversion mode | `--markdown-mode readability` |
| `--timeout` | Fetch/LLM timeout | `--timeout 2m` |
| `--verbose` | Debug output to stderr | Troubleshooting |
| `--json` | Structured JSON output with metrics | `--json` |
| `--length` | Summary length (LLM mode only) | `--length xl` |
| `--model` | LLM model (LLM mode only) | `--model anthropic/claude-sonnet-4-5` |
| `--max-extract-characters` | Limit extract output length | `--max-extract-characters 50000` |
| `--language\|--lang` | Output language | `--lang en` |
| `--video-mode` | Video handling mode | `--video-mode transcript` |
| `--transcriber` | Audio backend | `--transcriber whisper` |

## Verified Services (YouTube/Podcasts)

**YouTube:** All public videos with captions. Falls back to yt-dlp audio download + transcription for videos without captions.

**Podcasts (verified):**
- Apple Podcasts
- Spotify (best-effort; may fail for exclusives)
- Amazon Music / Audible podcast pages
- Podbean
- Podchaser
- RSS feeds (Podcasting 2.0 transcripts when available)
- Embedded YouTube podcast pages

## Common Patterns

### 1. YouTube Transcript for Analysis

```bash
# Quick: captions only (fastest, no deps beyond summarize)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --youtube web --plain

# Full: with timestamps
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --timestamps --plain

# Formatted as clean markdown (requires LLM API key)
summarize --extract "https://www.youtube.com/watch?v=VIDEO_ID" --format md --markdown-mode llm --plain
```

### 2. Podcast Episode Transcript

```bash
# From RSS feed (transcribes latest episode)
summarize --extract "https://feeds.example.com/podcast.xml" --plain

# From Apple Podcasts link
summarize --extract "https://podcasts.apple.com/us/podcast/SHOW/EPISODE" --plain
```

### 3. PDF Content Extraction

```bash
# From URL
summarize --extract "https://example.com/report.pdf" --plain

# From local file
summarize --extract "/path/to/file.pdf" --plain

# Limit output length
summarize --extract "/path/to/huge.pdf" --max-extract-characters 50000 --plain
```

### 4. Image OCR

```bash
summarize --extract "/path/to/screenshot.png" --plain
summarize --extract "/path/to/scanned-doc.jpg" --plain
```

### 5. Anti-Bot Website (Firecrawl Fallback)

```bash
# Requires FIRECRAWL_API_KEY in env or config
summarize --extract "https://paywalled-site.com/article" --firecrawl always --plain
```

### 6. Batch Extraction (Shell Loop)

```bash
# Extract multiple YouTube videos
for url in "URL1" "URL2" "URL3"; do
  echo "=== $url ==="
  summarize --extract "$url" --plain
done
```

## Error Handling

| Symptom | Cause | Fix |
|---------|-------|-----|
| `Missing uvx/markitdown` | PDF preprocessing not available | `brew install uv` |
| `does not support extracting binary files` | Preprocessing disabled for PDF | Use `--preprocess auto` (default) with uvx installed |
| YouTube returns empty transcript | No captions available, no yt-dlp/whisper | Install yt-dlp; for whisper fallback, install whisper-cli or set OPENAI_API_KEY |
| `FIRECRAWL_API_KEY not set` | Anti-bot mode requires Firecrawl | Set key in env or `~/.summarize/config.json` |
| Timeout on large content | Default 2m timeout too short | Use `--timeout 5m` |
| Audio transcription fails | No whisper backend available | Install whisper-cli locally or set OPENAI_API_KEY/FAL_KEY |
| Podcast extraction fails | Audio download failed | Check yt-dlp is installed and updated: `brew upgrade yt-dlp` |
| Garbled web extraction | JS-rendered content | summarize has no JS engine; use Crawl4AI instead |

## Configuration

Config file: `~/.summarize/config.json`

```json
{
  "model": "auto",
  "env": {
    "FIRECRAWL_API_KEY": "fc-..."
  },
  "ui": {
    "theme": "mono"
  }
}
```

Configure only what your workflow needs. If you use LLM Summarization Mode, add the required API keys.

## Anti-Patterns

| Do NOT | Do Instead |
|--------|------------|
| Use summarize for static web pages | WebFetch or Jina (faster, zero deps) |
| Use summarize for JS-heavy SPAs | Crawl4AI crwl (has browser rendering) |
| Use summarize's LLM mode as default | Use `--extract` and run synthesis in your own workflow unless explicitly required |
| Skip `--plain` for any non-interactive run | Always use `--plain` to avoid ANSI escape codes |
| Install whisper.cpp preemptively | Install only when audio transcription use case arises |
| Forget `--timeout` for large media | Podcasts/videos can take minutes; set `--timeout 5m` |
| Use summarize when WebFetch works | summarize is heavier; reserve for media and fallback |
| Use summarize for local repo/codebase search | Use your local knowledge search tools |

## Bundled Resources Index

| Path | What | When to Load |
|------|------|-------------|
| `./UPDATES.md` | Structured changelog for AI agents | When checking for new features or updates |
| `./UPDATE-GUIDE.md` | Instructions for AI agents performing updates | When updating this skill |
| `./references/installation-guide.md` | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
| `./references/commands.md` | Full CLI flag reference with all options | When you need exact flag syntax or env var names |