---
name: content-parser
description: |
Extract and parse content from URLs. Triggers on: user provides a URL to extract
content from, another skill needs to parse source material, "parse this URL",
"extract content", "解析链接", "提取内容".
metadata:
openclaw:
emoji: "🔗"
requires:
env: ["LISTENHUB_API_KEY"]
primaryEnv: "LISTENHUB_API_KEY"
---
## When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
## When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
## Purpose
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.
## Hard Constraints
- No shell scripts. Construct curl commands from the API Reference (Inlined) section below
- See § API Reference (Inlined) below for API key and headers
- See § API Reference (Inlined) below for polling, errors, and interaction patterns
- URL must be a valid HTTP(S) URL
- Always read config following `shared/config-pattern.md` before any interaction
- Never save files to `~/Downloads/` or `.listenhub/` — save to the current working directory
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API.
## Step -1: API Key Check
Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.
## Step 0: Config Setup
Follow `shared/config-pattern.md` Step 0 (Zero-Question Boot).
**If file doesn't exist** — silently create with defaults and proceed:
```bash
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
**Do NOT ask any setup questions.** Proceed directly to the Interaction Flow.
**If file exists** — read config silently and proceed:
```bash
CONFIG_PATH=".listenhub/content-parser/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
```
### Setup Flow (user-initiated reconfigure only)
Only run when the user explicitly asks to reconfigure. Display current settings:
```
当前配置 (content-parser):
自动下载:{是 / 否}
```
Then ask:
1. **autoDownload**: "自动保存提取的内容到当前目录?"
- "是(推荐)" → `autoDownload: true`
- "否" → `autoDownload: false`
Save immediately:
```bash
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```
## Interaction Flow
### Step 1: URL Input
Free text input. Ask the user:
> What URL would you like to extract content from?
### Step 2: Options (optional)
Ask if the user wants to configure extraction options:
```
Question: "Do you want to configure extraction options?"
Options:
- "No, use defaults" — Extract with default settings
- "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count
```
If "Yes", ask follow-up questions:
- **Summarize**: "Generate a summary of the content?" (Yes/No)
- **Max Length**: "Set maximum content length?" (Free text, e.g., "5000")
- **Twitter count** (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
### Step 3: Confirm & Extract
Summarize:
```
Ready to extract content:
URL: {url}
Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
```
Wait for explicit confirmation before calling the API.
## Workflow
1. **Validate URL**: Must be HTTP(S). Normalize if needed (see `references/supported-platforms.md`)
2. **Build request body**:
```json
{
"source": {
"type": "url",
"uri": "{url}"
},
"options": {
"summarize": true/false,
"maxLength": 5000,
"twitter": {
"count": 50
}
}
}
```
Omit `options` if user chose defaults.
3. **Submit (foreground)**: `POST /v1/content/extract` → extract `taskId`
4. Tell the user extraction is in progress
5. **Poll (background)**: Run the following **exact** bash command with `run_in_background: true` and `timeout: 300000`. Note: status field is `.data.status` (not `processStatus`), interval is 5s, values are `processing`/`completed`/`failed`:
```bash
TASK_ID=""
for i in $(seq 1 60); do
RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "X-Source: skills" 2>/dev/null)
STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
case "$STATUS" in
completed) echo "$RESULT"; exit 0 ;;
failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
*) sleep 5 ;;
esac
done
echo "TIMEOUT" >&2; exit 2
```
6. When notified, **download and present result**:
If `autoDownload` is `true`, generate a slug from the extracted title (falling back to domain name if no title). Follow `shared/config-pattern.md` § Artifact Naming for slug generation and dedup.
- Write `{slug}.md` to the **current directory** — full extracted content in markdown
- Write `{slug}.json` to the **current directory** — full raw API response data
```bash
SLUG="{title-slug}" # e.g. "topology-wikipedia"
# Dedup: check if files exist
BASE="$SLUG"; i=2
while [ -e "${SLUG}.md" ] || [ -e "${SLUG}.json" ]; do SLUG="${BASE}-${i}"; i=$((i+1)); done
echo "$CONTENT_MD" > "${SLUG}.md"
echo "$RESULT" > "${SLUG}.json"
```
Present:
```
内容提取完成!
来源:{url}
标题:{metadata.title}
长度:~{character count} 字符
消耗积分:{credits}
已保存到当前目录:
{slug}.md
{slug}.json
```
7. Show a preview of the extracted content (first ~500 chars)
8. Offer to use content in another skill (e.g. `/podcast`, `/tts`)
**Estimated time**: 10-30 seconds depending on content size and platform.
## API Reference (Inlined)
### Authentication
**Environment variable**: `LISTENHUB_API_KEY` (format: `lh_sk_...`)
Store in `~/.zshrc` (macOS) or `~/.bashrc` (Linux):
```bash
export LISTENHUB_API_KEY="lh_sk_..."
```
**How to obtain**: Visit https://listenhub.ai/settings/api-keys (Pro plan required).
**Base URL**: `https://api.marswave.ai/openapi/v1`
**Required headers** (every request):
```
Authorization: Bearer $LISTENHUB_API_KEY
Content-Type: application/json
X-Source: skills
```
The `X-Source: skills` header identifies requests as coming from Claude Code skills (CLI tool).
**curl template:**
```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/{endpoint}" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{ ... }'
```
For GET requests, omit `-d` and change `-X POST` to `-X GET`.
**Security notes:**
- Never log or display full API keys in output
- API keys are transmitted via HTTPS only
- Do not pass sensitive or confidential information as content input — it is sent to external APIs for processing
---
### POST /v1/content/extract
Create a content extraction task for a URL. Returns a `taskId` for polling.
**Request body:**
| Field | Required | Type | Description |
|-------|----------|------|-------------|
| source | **Yes** | object | Source to extract from |
| source.type | **Yes** | string | Must be `"url"` |
| source.uri | **Yes** | string | Valid HTTP(S) URL to extract content from |
| options | No | object | Extraction options |
| options.summarize | No | boolean | Whether to generate a summary |
| options.maxLength | No | integer | Maximum content length |
| options.twitter | No | object | Twitter/X specific options |
| options.twitter.count | No | integer | Number of tweets to fetch (1-100, default 20) |
**Response:**
```json
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb"
}
}
```
**Error codes:**
| Code | Meaning |
|------|---------|
| 29003 | Validation error (`"source.uri" is required`, `"source.uri" must be a valid uri`) |
| 21007 | Invalid API key |
---
### GET /v1/content/extract/{taskId}
Get extraction task status and results.
**Path params:**
| Param | Type | Description |
|-------|------|-------------|
| taskId | string | 24-char hex task ID |
**Response states:**
- **processing** — Task is still running
- **completed** — Extraction finished, data available
- **failed** — Extraction failed, check `failCode` and `message`
**Response (processing):**
```json
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb",
"status": "processing",
"createdAt": "2025-04-09T12:00:00Z",
"data": null,
"credits": 0,
"failCode": null,
"message": null
}
}
```
**Response (completed):**
```json
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb",
"status": "completed",
"createdAt": "2025-04-09T12:00:00Z",
"data": {
"content": "Extracted text content...",
"metadata": {
"title": "Article Title",
"author": "Author Name",
"publishedAt": "2025-04-01T08:00:00Z"
},
"references": [
"https://example.com/related-article"
]
},
"credits": 5,
"failCode": null,
"message": null
}
}
```
**Response (failed):**
```json
{
"code": 0,
"message": "success",
"data": {
"taskId": "69a7dac700cf95938f86d9bb",
"status": "failed",
"createdAt": "2025-04-09T12:00:00Z",
"data": null,
"credits": 0,
"failCode": "EXTRACT_FAILED",
"message": "Unable to extract content from the provided URL"
}
}
```
**Key fields:**
| Field | Type | Description |
|-------|------|-------------|
| status | string | `processing`, `completed`, or `failed` |
| data.data.content | string | Extracted text content |
| data.data.metadata | object | Page metadata (title, author, publishedAt) |
| data.data.references | array | Referenced URLs (array of strings) |
| credits | integer | Credits consumed |
| failCode | string | Error code (null on success) |
| message | string | Error message (null on success) |
**Error codes:**
| Code | Meaning |
|------|---------|
| 29003 | Invalid taskId format |
| 25002 | Task not found |
---
### Polling Pattern
5-second interval, 60 polls max. Run with `run_in_background: true` and `timeout: 300000`.
**Two-step pattern:**
1. **Submit (foreground)**: POST the creation request, extract `taskId` from the response.
2. **Poll (background)**: Run the polling loop with `run_in_background: true`. You will be notified automatically when it completes.
The exact polling bash command is already specified in the Workflow section (Step 5).
---
### Error Handling
**HTTP status codes:**
| Code | Meaning | Action |
|------|---------|--------|
| 200 | Success | Parse response body |
| 400 | Bad request | Check parameters |
| 401 | Invalid API key | Re-check `LISTENHUB_API_KEY` |
| 402 | Insufficient credits | Inform user to recharge |
| 403 | Forbidden | No permission for this resource |
| 429 | Rate limited | Exponential backoff, retry after delay |
| 500/502/503/504 | Server error | Retry up to 3 times |
**Retry strategy:**
- **429 rate limit**: Wait 15 seconds, then retry (exponential backoff)
- **5xx server errors**: Retry up to 3 times with 5-second intervals
- **Network errors**: Retry up to 3 times
**Application error codes:**
| Code | Meaning |
|------|---------|
| 21007 | Invalid user API key |
| 25429 | Rate limited (application-level) |
## Example
**User**: "Parse this article: https://en.wikipedia.org/wiki/Topology"
**Agent workflow**:
1. URL: `https://en.wikipedia.org/wiki/Topology`
2. Options: defaults (omit options)
3. Submit extraction
```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://en.wikipedia.org/wiki/Topology"
}
}'
```
4. Poll until complete:
```bash
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "X-Source: skills"
```
5. Present extracted content preview and offer next actions.
---
**User**: "Extract recent tweets from @elonmusk, get 50 tweets"
**Agent workflow**:
1. URL: `https://x.com/elonmusk`
2. Options: `{"twitter": {"count": 50}}`
3. Submit extraction
```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://x.com/elonmusk"
},
"options": {
"twitter": {
"count": 50
}
}
}'
```
4. Poll until complete, present results.