---
name: xhs
description: 小红书 (Xiaohongshu/RedNote) research - search, analyze posts in depth, view images, read comments, output Chinese recommendations. Combines CLI tool usage with research methodology.
metadata: {"clawdbot":{"emoji":"📕","requires":{"bins":["rednote-mcp"]}}}
---

# 小红书 Research 📕

Research tool for Chinese user-generated content: travel, food, lifestyle, local discoveries.

## When to Use

- Travel planning and itineraries
- Restaurant/cafe/bar recommendations
- Activity and weekend planning
- Product reviews and comparisons
- Local discovery and hidden gems
- Any question where Chinese perspectives help

## Recommended Model

When spawning as a sub-agent: **Sonnet 4.5** (`model: "claude-sonnet-4-5-20250929"`)

- Fast enough for the slow XHS API calls
- Good at Chinese content understanding
- More cost-effective than Opus for research grunt work
- Opus is overkill for a search-then-synthesize workflow

---

## Context Management (Always Use)

**ALWAYS use dynamic context monitoring** - even 5 posts with images can hit 75-300k tokens.

### The Problem

- Each post with images = 15-60k tokens
- A 200k context fills fast
- Context is append-only (you can't "forget" within a session)

### The Solution: Monitor + Checkpoint + Continue

**1. After EACH post, do two things:**

```markdown
a) Write findings to disk immediately:
   /research/{task-id}/findings/post-{n}.md

b) Check context usage:
   session_status → look for "Context: XXXk/200k (YY%)"
```

**2. When context hits 70%, STOP and checkpoint:**

```markdown
Write state file: /research/{task-id}/state.json
{
  "processed": 15,
  "pendingUrls": ["url16", "url17", ...],
  "summaries": ["Post 1: 火塘...", ...]
}

Return to caller:
{
  "complete": false,
  "processed": 15,
  "remaining": 25,
  "statePath": "/research/{task-id}/state.json",
  "findingsDir": "/research/{task-id}/findings/"
}
```

**3. Caller spawns a fresh sub-agent to continue:**

```
spawn_subagent(
  task="Continue XHS research from /research/{task-id}/state.json",
  model="claude-sonnet-4-5-20250929"
)
```

The new sub-agent has a fresh 200k context, reads state.json, and continues from post 16.

### State File Schema

```json
{
  "taskId": "kunming-food-2026-02-01",
  "query": "昆明美食",
  "searchesCompleted": ["昆明美食", "昆明美食推荐"],  // Keywords already searched
  "processedUrls": ["url1", "url2", ...],            // Explicit URL tracking (prevents duplicates)
  "pendingUrls": ["url3", "url4", ...],              // Remaining URLs to process
  "nextPostNumber": 16,                              // Next post-XXX.md number
  "summaries": [                                     // 1-liner per post for final synthesis
    "Post 1: 火塘餐厅 | 🟢 | ¥80 | 本地人推荐",
    "Post 2: 野生菌火锅 | 🟢 | ¥120 | 菌子新鲜"
  ],
  "batchNumber": 1,
  "contextCheckpoint": "70%"
}
```

**Critical fields for handoff:**

- `processedUrls`: prevents re-processing the same post across sub-agents
- `pendingUrls`: the exact work remaining
- `nextPostNumber`: ensures sequential file naming
- `searchesCompleted`: prevents duplicate searches

### Workflow for Large Research

**Caller should use a longer timeout:**

```
sessions_spawn(
  task="...",
  model="claude-sonnet-4-5-20250929",
  runTimeoutSeconds=1800  // 30 minutes for research tasks
)
```

The default is 600s (10 min), which is too short for XHS research with slow API calls.

**Interleave search and processing** (don't collect all URLs first):

```
[XHS Sub-agent 1]
├── Check for state.json (none = fresh start)
├── Search keyword 1 → get 20 URLs
├── Process 5-10 posts immediately (writing each to disk)
├── Search keyword 2 → get more URLs (dedupe)
├── Process more posts
├── Context hits 70% → write state.json
└── Return {complete: false, remaining: N}
```

This prevents a timeout from losing all work - each post is saved as soon as it is processed.
**Full continuation pattern:**

```
[Caller]
  ↓ spawn (runTimeoutSeconds=1800)
[XHS Sub-agent 1]
├── Search + process interleaved
├── Context hits 70% → write state.json
└── Return {complete: false, remaining: 25}

[Caller sees incomplete]
  ↓ spawn continuation (runTimeoutSeconds=1800)
[XHS Sub-agent 2]  ← fresh 200k context!
├── Read state.json (has processedUrls, pendingUrls)
├── Continue processing + more searches if needed
├── Context hits 70% → write state.json
└── Return {complete: false, remaining: 10}

[Caller sees incomplete]
  ↓ spawn continuation
[XHS Sub-agent 3]
├── Read state.json
├── Process remaining posts
├── All done → write synthesis.md
└── Return {complete: true, synthesisPath: "..."}
```

### Output Directory Structure

```
/research/{task-id}/
├── state.json        # Checkpoint for continuation
├── findings/
│   ├── post-001.md   # Full analysis + image paths
│   ├── post-002.md
│   └── ...
├── images/
│   ├── post-001/
│   │   ├── 1.jpg
│   │   └── 2.jpg
│   └── ...
├── summaries.md      # All 1-liners (for quick scan)
└── synthesis.md      # Final output (when complete)
```

### Key Rules (ALWAYS FOLLOW)

1. **Write after EVERY post** - crash-safe, no work lost
2. **Check context after EVERY post** - use the `session_status` tool
3. **Stop at 70%** - leave room for synthesis plus a buffer
4. **Return a structured result** - the caller decides the next step
5. **Read all images** - they're pre-compressed (600px, q85)
6. **Skip videos** - already marked in fetch-post

⚠️ **This is not optional.** Even small research can overflow context with image-heavy posts.
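The caller's side of the continuation pattern above is mechanical. A sketch, assuming `spawn` wraps whatever spawn primitive the caller actually has (e.g. `sessions_spawn`) and returns the structured result shown earlier:

```python
def run_until_complete(spawn, task: str, max_rounds: int = 6) -> dict:
    """Re-spawn fresh sub-agents until one returns complete=True.

    `spawn` is a caller-supplied callable (hypothetical wrapper around the
    real spawn primitive); each round gets a fresh context and resumes
    from the state.json checkpoint written by the previous round.
    """
    result = {"complete": False}
    for _ in range(max_rounds):
        result = spawn(task)
        if result.get("complete"):
            break
        # Point the next round at the checkpoint the last round wrote.
        task = f"Continue XHS research from {result['statePath']}"
    return result
```

`max_rounds` is just a safety valve so a stuck research task can't spawn sub-agents forever.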
---

## Scripts (Mechanical Tasks)

These scripts handle the repetitive CLI work:

| Script | Purpose |
|--------|---------|
| `bin/preflight` | Verify the tool is working before research |
| `bin/search "keywords" [limit] [timeout] [sort]` | Search for posts (sort: general/newest/hot) |
| `bin/get-content "url"` | Get full note content (text only) |
| `bin/get-comments "url"` | Get comments on a note |
| `bin/get-images "url" [dir]` | Download images only |
| `bin/fetch-post "url" [cache] [retries]` | Fetch content + comments + images (with retries) |

All scripts are at `/root/clawd/skills/xhs/bin/`.

### Preflight (always run first)

```bash
/root/clawd/skills/xhs/bin/preflight
```

Checks: rednote-mcp installed, cookies valid, stealth patches applied, test search.

**Don't proceed until preflight passes.**

### Search

```bash
/root/clawd/skills/xhs/bin/search "昆明美食推荐" [limit] [timeout] [sort]
```

Returns JSON with post results.

**Parameters:**

| Param | Default | Description |
|-------|---------|-------------|
| keywords | (required) | Search terms in Chinese |
| limit | 10 | Max results (scroll pagination when >20) |
| timeout | 180 | Seconds before giving up |
| sort | general | Sort order (see below) |

**Sort options:**

| Value | XHS Label | When to use |
|-------|-----------|-------------|
| `general` | 综合 | **Default**: the XHS algorithm balances relevance and engagement. Best for most research. |
| `newest` | 最新 | 舆情监控 (sentiment monitoring), breaking news, recent experiences, time-sensitive topics |
| `hot` | 最热 | Finding viral/popular posts, trending content |

**Examples:**

```bash
# Default sort (recommended for most research)
bin/search "昆明美食推荐" 20

# Recent posts first (舆情/sentiment, current events)
bin/search "某品牌 评价" 20 180 newest

# Most popular posts
bin/search "网红打卡地" 15 180 hot
```

**Scroll pagination enabled** (patched): when `limit > 20`, the tool scrolls to load more results via XHS infinite scroll.
Actual results depend on available content. **For maximum coverage**, combine:

1. Higher limits (e.g., `limit=50`) to scroll for more
2. Multiple keyword variations for different result sets:
   - 香蕉攀岩, 香蕉攀岩馆, 香蕉攀岩体验, 香蕉攀岩评价
   - 昆明美食, 昆明美食推荐, 昆明必吃, 昆明本地人推荐

**Results vary by query** - popular topics may return 30-50+ posts, niche topics fewer.

**Choosing sort order:**

- **Most research** → `general` (default). Let XHS's algorithm surface the best content.
- **舆情监控 / sentiment tracking** → `newest`. You want recent opinions, not old viral posts.
- **Trend discovery** → `hot`. See what's currently popular.

### Get Content

```bash
/root/clawd/skills/xhs/bin/get-content "FULL_URL_WITH_XSEC_TOKEN"
```

⚠️ Must use the full URL with the `xsec_token` from search results.

### Get Comments

```bash
/root/clawd/skills/xhs/bin/get-comments "FULL_URL_WITH_XSEC_TOKEN"
```

### Get Images

Download all images from a post to local files:

```bash
/root/clawd/skills/xhs/bin/get-images "FULL_URL" /tmp/my-images
```

### Fetch Post (Deep Dive with Images)

Fetch content, comments, and images in one call, with built-in retries:

```bash
/root/clawd/skills/xhs/bin/fetch-post "FULL_URL" /path/to/cache [max_retries]
```

**Features:**

- Retries on timeout (60s → 90s → 120s)
- Clear error reporting in JSON output
- Images cached locally, bypassing CDN protection

**Returns JSON:**

```json
{
  "success": true,
  "postId": "abc123",
  "content": {
    "title": "...",
    "author": "...",
    "desc": "...",
    "likes": "983",
    "tags": [...],
    "postDate": "2025-09-04"  // ← Added via patch!
  },
  "comments": [
    { "author": "...", "content": "...", "likes": "3" },
    ...
  ],
  "imagePaths": ["/cache/images/abc123/1.jpg", ...],
  "errors": []
}
```

**Date filtering:** Use `postDate` to filter out old posts. Skip posts older than your threshold (e.g., 6-12 months for restaurants).

**Workflow:**

```
1. fetch-post → JSON + cached images
2. Read each imagePath directly (Claude sees images natively)
3. Combine text + comments + what you see into findings
```

**Viewing images:**

```
Read("/path/to/1.jpg")  # Claude sees it directly - no special tool needed
```

Look for: visible text (addresses, prices, hours), atmosphere, food presentation, crowd levels.

---

## Research Methodology (Judgment Tasks)

This is where you think. Scripts do the fetching; you do the analyzing.

### Depth Levels

| Depth | Posts | When to Use |
|-------|-------|-------------|
| Minimum | 5+ | Quick checks, simple queries |
| Standard | 8-10 | Default for most research |
| Deep | 15+ | Complex topics, trip planning |

**The minimum is 5** - unless fewer exist. Note the limited coverage if there are fewer than 5 results.

### Research Workflow

#### Step 0: Preflight

Run `bin/preflight`. Don't proceed until it passes.

#### Step 1: Plan Your Searches

Think: "What would a Chinese user search on 小红书?"

- Include the location when relevant
- Add qualifiers: 推荐, 攻略, 测评, 探店, 打卡, 避坑
- Consider synonyms and variations
- Plan 2-3 different search angles

**Date filtering:** Posts include a `postDate` field (e.g., "2025-09-04"). **The calling agent specifies the date filter** based on research type:

| Research Type | Suggested Filter | Why |
|---------------|------------------|-----|
| 舆情监控 (sentiment) | 1-4 weeks | Only current discourse matters |
| Breaking news/events | 1-7 days | Time-critical |
| Travel planning | 6-12 months | Recent but reasonable window |
| Product reviews | 1-2 years | Longer product cycles |
| Trend analysis | Custom range | Compare specific periods |
| Historical/general | No limit | Want the full archive |

**The caller should specify the filter** in the task description, e.g.:

- "Only posts from last 30 days" (舆情/sentiment)
- "Posts from 2025 or later" (travel)
- "No date filter" (general research)

**If no filter is specified:** default to 12 months (a safe middle ground).
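Applying the filter is then a simple per-post check. A sketch (the `within_window` helper is hypothetical, not one of the `bin/` scripts; `post_date` is the `postDate` string from fetch-post):

```python
from datetime import date, timedelta
from typing import Optional

def within_window(post_date: Optional[str], max_age_days: Optional[int],
                  today: Optional[date] = None) -> bool:
    """Keep a post if its postDate falls inside the caller's date window.

    post_date is the "YYYY-MM-DD" string from fetch-post (None when the
    patch found no date - then fall back to keyword hints in the content).
    max_age_days=None means "no date filter".
    """
    if max_age_days is None:
        return True
    if post_date is None:
        return True  # unknown date: keep it, but verify recency from content
    today = today or date.today()
    return date.fromisoformat(post_date) >= today - timedelta(days=max_age_days)
```

For the 12-month default, call it with `max_age_days=365` and skip any post where it returns `False`.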
**Fallback when `postDate` is null:** use keyword hints: `2025`, `最近`, `最新`.

**Language strategy:**

| Location | Language | Example |
|----------|----------|---------|
| China | Chinese | `昆明攀岩` |
| English-named venues | Both | `Rock Tenet 昆明` |
| International | Chinese | `巴黎旅游` |

#### Step 2: Search & Scan

Run your searches. Results are **already ranked by XHS's algorithm** (relevance + engagement).

**Use judgment based on the preview** - like a human deciding what to click. Think: "Given my research goal, would this post likely contain useful information?"

| Research Type | What to prioritize |
|---------------|-------------------|
| 舆情监控 (sentiment) | Any opinion/experience, even low engagement - complaints matter! |
| Travel planning | High engagement + detailed experiences |
| Product reviews | Mix of positive AND negative reviews |
| Trend analysis | Variety of perspectives |

| Preview Signal | Action |
|----------------|--------|
| Relevant content in preview | ✅ Fetch |
| Matches research goal | ✅ Fetch |
| Low engagement but relevant opinion | ✅ Fetch (esp. for 舆情) |
| High engagement but off-topic | ❌ Skip |
| Official announcements only | ⚠️ Context-dependent |
| 广告/合作 (ad/sponsorship) markers | ⚠️ Note as sponsored if fetching |
| Clearly off-topic | ❌ Skip |
| Duplicate content | ❌ Skip |

**Key insight:** For 舆情监控, a 3-like complaint post may be more valuable than a 500-like promotional post. Engagement ≠ relevance for all research types.

#### Step 3: Deep Dive Each Post

For each selected post, use `fetch-post` to get everything:

```bash
bin/fetch-post "url_from_search" {{RESEARCH_DIR}}/xhs
```

Returns JSON with content, comments, and cached images. Has built-in retries.

Then:

**A. Review content**

- Extract key facts from the title/description
- Note the author's perspective/bias
- Check tags for categorization

**B. View images** (critical!)
For each `imagePath` in the result, just read it:

```
Read("/path/to/1.jpg")  # You see it directly
```

- Look for text overlays: addresses, prices, hours
- Note visual details: ambiance, crowd levels, food presentation

⚠️ **Don't describe images in isolation.** Synthesize what you see with the post content and comments to form a holistic view. An image of a crowded restaurant + the author saying "周末排队1小时" (1-hour weekend queue) + comments confirming "人超多" (super crowded) = that's your finding about crowds.

**C. Review comments** (gold for updates)

- "已经关门了" = already closed
- Real experiences vs sponsored hype
- Tips not in the main post

**D. Return picked images**

Include paths to the best/most informative images in your findings. The calling agent decides whether and how to use them (embed in reports, reference, etc.). You're curating: pick images that show something useful (venue exterior, menu with prices, actual food, atmosphere), not just decorative shots.

#### Step 4: Synthesize

- What do multiple sources agree on?
- Any contradictions?
- What's the overall consensus?
- What would you actually recommend?

#### Step 5: Output

**Facts + Flavor** - structured findings that preserve the XHS voice.

```markdown
## XHS Research: [Topic]

### Search Summary

| Search | Results | Notes |
|--------|---------|-------|
| 昆明攀岩 | 10 | Good coverage |

### Findings

#### [Venue Name] (中文名)
- **Type:** Restaurant / Activity / Attraction
- **Address:** [from post or image]
- **Price:** ¥XX/person
- **Hours:** [if found]
- **The vibe:** [atmosphere, energy - preserved voice]
- **Why people like it:** [opinions, impressions]
- **Watch out for:** [warnings from comments]
- **Source:** [full URL]
- **Engagement:** X likes
- **Images:** [paths for calling agent to use]
  - `/path/to/1.jpg` - exterior/entrance
  - `/path/to/3.jpg` - menu with prices

> "引用原文..."
— @username

### Overall Impressions
- Consensus across posts
- Patterns in preferences
- Things only locals know
- Disagreements worth noting
```

**The XHS value is the human perspective.** A recommendation that says "环境一般但是味道绝了" (so-so atmosphere, but the food is amazing) tells you more than "Rating: 4.2/5". Think: "What would a friend who just spent an hour on XHS tell me?"

---

## Quality Signals

**Trustworthy:**

- 100+ likes with real comments
- Detailed personal experience
- Multiple photos from an actual visit
- Specific details (prices, hours)
- Recent posts (look for date mentions in content: "上周", "昨天", "2025年X月")
- Year in the title (e.g., "2025上海咖啡必喝榜")

**Checking recency:**

- Look for dates in the post text/title
- Check whether prices seem current
- Comments asking "还在吗" or "现在还有吗" (is it still there?) = might be outdated
- Comments with recent dates confirm the post is still relevant

**Suspicious:**

- 广告/合作/赞助 (ad/collab/sponsored) markers
- Overly positive, no specifics
- Stock photos only
- No comments, or only generic ones
- Very old posts

---

## Timing & Efficiency

### XHS is SLOW - Plan Accordingly

The rednote-mcp CLI is slow (30-90s per search). Don't rapid-fire poll.

**When running searches via `exec`:**

```bash
# GOOD: Give it time to complete
exec(command, yieldMs: 60000)  # Wait 60s before checking
process(poll)                  # Then poll every 30s if still running
```

**DON'T:**

- Poll every 2-3 seconds (wastes tokens, no benefit)
- Start multiple searches simultaneously (overloads the tool)
- Wait indefinitely without writing partial results

### Write Incrementally

Don't wait until you've analyzed everything to start writing. After each batch of 3-5 posts:

- Append findings to your output file
- This protects against a timeout or termination losing all your work

```markdown
## Findings (in progress)

### Batch 1: 美食搜索 (3 posts analyzed)
[findings...]

### Batch 2: 攻略搜索 (analyzing...)
```

### Time Budget Awareness

If you've been running 15+ minutes:

- Prioritize writing what you have
- Note incomplete searches in the output
- Better to deliver 80% of the findings than lose 100% to termination

## Retry Pattern

rednote-mcp is slow. If a command times out:

```
Attempt 1: default timeout
Attempt 2: +60s
Attempt 3: +120s
```

If all attempts fail, **report the failure**. Do NOT fall back to web_search; that defeats the purpose of XHS-native research.

---

## Error Handling

| Error | Cause | Fix |
|-------|-------|-----|
| Timeout | Network/XHS slow | Retry with a longer timeout |
| Login/cookie error | Session expired | `xvfb-run -a rednote-mcp init` |
| 404 / xsec_token | Missing token | Use the full URL from search |
| Empty results | No posts | Try different keywords |

---

## Setup & Maintenance

### First-Time Setup

```bash
npm install -g rednote-mcp
npx playwright install
/root/clawd/skills/xhs/patches/apply-all.sh
xvfb-run -a rednote-mcp init
```

### Re-login (when cookies expire)

```bash
xvfb-run -a rednote-mcp init
```

### After rednote-mcp updates

```bash
/root/clawd/skills/xhs/patches/apply-all.sh
```

---

## Role Clarification

**This skill** = research tool that outputs structured findings
**Calling agent** = synthesizes XHS + other sources into final reports, decides which images to embed

**You return:**

- Synthesized findings (text + images + comments → holistic view)
- Curated image paths (the calling agent decides how to use them)
- Preserved human voice (opinions, vibes, tips)

**You don't:**

- Describe images in isolation ("I see a restaurant...")
- Generate final reports (that's the caller's job)
- Decide image layout/placement

XHS is like having a Chinese-speaking friend spend an hour researching for you. They'd give you facts, but also opinions, vibes, and insider tips. That's what you're capturing.

---

**Remember:** Research like a curious human. Explore, cross-reference, look at pictures, read comments. The "这家真的绝了" ("this place is genuinely amazing") matters as much as the address.