---
name: html-parser-rule
description: 编写与测试 HTML 解析规则。用于从网页提取文章标题、链接、日期等信息。
---
# HTML Parser Rule Writer Skill
## Trigger Keywords
- "write html parser rule"
- "create html parser"
- "parse html source"
- "add html parser"
- "编写html解析规则"
- "创建html解析器"
- "添加html解析规则"
## Skill Description
This skill helps you write and test HTML parsing rules for article-flow project. It follows a systematic approach:
1. Fetch the HTML content and save locally
2. Write regex patterns step by step
3. Test each regex individually
4. Complete the full parsing rule
5. Run final integration test
## Instructions
### Step 1: Fetch HTML Content
When user provides a URL to parse, first fetch and save the HTML:
```bash
# Fetch HTML and save to temporary file
curl -L -H "User-Agent: Mozilla/5.0" "{URL}" > /tmp/source.html
# Show file size and first 100 lines to understand structure
wc -l /tmp/source.html
head -100 /tmp/source.html
```
Ask user to confirm the HTML looks correct and identify the article listing structure.
### Step 2: Identify Article Pattern
Examine the HTML structure and identify:
1. Article container element (div, article, li, etc.)
2. Title element and its pattern
3. Link element and its pattern
4. Date element and its pattern (if available)
5. Description element and its pattern (if available)
Show the user example HTML snippets and ask for confirmation.
### Step 3: Write Title Regex
Create and test the title extraction regex:
```typescript
// Example title regex
const titleRegex = /