--- name: tuzi-url-to-markdown description: 通过 Chrome CDP 抓取任意 URL 并转换为 Markdown。支持两种模式——页面加载后自动抓取,或等待用户信号(适用于需要登录的页面)。当用户要求将网页保存为 Markdown 时使用。 --- # URL to Markdown Fetches any URL via Chrome CDP and converts HTML to clean markdown. ## Script Directory **Important**: All scripts are located in the `scripts/` subdirectory of this skill. **Agent Execution Instructions**: 1. Determine this SKILL.md file's directory path as `SKILL_DIR` 2. Script path = `${SKILL_DIR}/scripts/.ts` 3. Replace all `${SKILL_DIR}` in this document with the actual path **Script Reference**: | Script | Purpose | |--------|---------| | `scripts/main.ts` | CLI entry point for URL fetching | ## Preferences (EXTEND.md) Use Bash to check EXTEND.md existence (priority order): ```bash # Check project-level first test -f .tuzi-skills/tuzi-url-to-markdown/EXTEND.md && echo "project" # Then user-level (cross-platform: $HOME works on macOS/Linux/WSL) test -f "$HOME/.tuzi-skills/tuzi-url-to-markdown/EXTEND.md" && echo "user" ``` ┌────────────────────────────────────────────────────────┬───────────────────┐ │ Path │ Location │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ .tuzi-skills/tuzi-url-to-markdown/EXTEND.md │ Project directory │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ $HOME/.tuzi-skills/tuzi-url-to-markdown/EXTEND.md │ User home │ └────────────────────────────────────────────────────────┴───────────────────┘ ┌───────────┬───────────────────────────────────────────────────────────────────────────┐ │ Result │ Action │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Found │ Read, parse, apply settings │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Not found │ **MUST** run first-time setup (see below) — do NOT silently create defaults │ └───────────┴───────────────────────────────────────────────────────────────────────────┘ **EXTEND.md Supports**: Download media by default | Default output directory | Default capture mode | Timeout settings ### First-Time Setup (BLOCKING) **CRITICAL**: When EXTEND.md is not found, you **MUST use `AskUserQuestion`** to ask the user for their preferences before creating EXTEND.md. **NEVER** create EXTEND.md with defaults without asking. This is a **BLOCKING** operation — do NOT proceed with any conversion until setup is complete. Use `AskUserQuestion` with ALL questions in ONE call: **Question 1** — header: "Media", question: "How to handle images and videos in pages?" - "Ask each time (Recommended)" — After saving markdown, ask whether to download media - "Always download" — Always download media to local imgs/ and videos/ directories - "Never download" — Keep original remote URLs in markdown **Question 2** — header: "Output", question: "Default output directory?" - "url-to-markdown (Recommended)" — Save to ./url-to-markdown/{domain}/{slug}.md - (User may choose "Other" to type a custom path) **Question 3** — header: "Save", question: "Where to save preferences?" - "User (Recommended)" — ~/.tuzi-skills/ (all projects) - "Project" — .tuzi-skills/ (this project only) After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue. Full reference: [references/config/first-time-setup.md](references/config/first-time-setup.md) ### Supported Keys | Key | Default | Values | Description | |-----|---------|--------|-------------| | `download_media` | `ask` | `ask` / `1` / `0` | `ask` = prompt each time, `1` = always download, `0` = never | | `default_output_dir` | empty | path or empty | Default output directory (empty = `./url-to-markdown/`) | **Value priority**: 1. CLI arguments (`--download-media`, `-o`) 2. EXTEND.md 3. Skill defaults ## Features - Chrome CDP for full JavaScript rendering - Two capture modes: auto or wait-for-user - Clean markdown output with metadata - Handles login-required pages via wait mode - Download images and videos to local directories ## Usage ```bash # Auto mode (default) - capture when page loads npx -y bun ${SKILL_DIR}/scripts/main.ts # Wait mode - wait for user signal before capture npx -y bun ${SKILL_DIR}/scripts/main.ts --wait # Save to specific file npx -y bun ${SKILL_DIR}/scripts/main.ts -o output.md # Download images and videos to local directories npx -y bun ${SKILL_DIR}/scripts/main.ts --download-media ``` ## Options | Option | Description | |--------|-------------| | `` | URL to fetch | | `-o ` | Output file path (default: auto-generated) | | `--wait` | Wait for user signal before capturing | | `--timeout ` | Page load timeout (default: 30000) | | `--download-media` | Download image/video assets to local `imgs/` and `videos/`, and rewrite markdown links to local relative paths | ## Capture Modes | Mode | Behavior | Use When | |------|----------|----------| | Auto (default) | Capture on network idle | Public pages, static content | | Wait (`--wait`) | User signals when ready | Login-required, lazy loading, paywalls | **Wait mode workflow**: 1. Run with `--wait` → script outputs "Press Enter when ready" 2. Ask user to confirm page is ready 3. Send newline to stdin to trigger capture ## Output Format YAML front matter with `url`, `title`, `description`, `author`, `published`, `captured_at` fields, followed by converted markdown content. ## Output Directory ``` url-to-markdown//.md ``` - ``: From page title or URL path (kebab-case, 2-6 words) - Conflict resolution: Append timestamp `-YYYYMMDD-HHMMSS.md` When `--download-media` is enabled: - Images are saved to `imgs/` next to the markdown file - Videos are saved to `videos/` next to the markdown file - Markdown media links are rewritten to local relative paths ## Media Download Workflow Based on `download_media` setting in EXTEND.md: | Setting | Behavior | |---------|----------| | `1` (always) | Run script with `--download-media` flag | | `0` (never) | Run script without `--download-media` flag | | `ask` (default) | Follow the ask-each-time flow below | ### Ask-Each-Time Flow 1. Run script **without** `--download-media` → markdown saved 2. Check saved markdown for remote media URLs (`https://` in image/video links) 3. **If no remote media found** → done, no prompt needed 4. **If remote media found** → use `AskUserQuestion`: - header: "Media", question: "Download N images/videos to local files?" - "Yes" — Download to local directories - "No" — Keep remote URLs 5. If user confirms → run script **again** with `--download-media` (overwrites markdown with localized links) ## Environment Variables | Variable | Description | |----------|-------------| | `URL_CHROME_PATH` | Custom Chrome executable path | | `URL_DATA_DIR` | Custom data directory | | `URL_CHROME_PROFILE_DIR` | Custom Chrome profile directory | **Troubleshooting**: Chrome not found → set `URL_CHROME_PATH`. Timeout → increase `--timeout`. Complex pages → try `--wait` mode. ## Extension Support Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.