--- name: webthinker-deep-research description: "Deep web research for VCO: multi-hop search+browse+extract with an auditable action trace and a structured report (WebThinker-style)." --- # WebThinker Deep Research (VCO) ## When to use Use this skill when the task requires **deep web research** (not just one-shot search), for example: - Multi-hop questions (“find → open → follow links → verify”) - “Deep research report” / “调研报告” / “竞品调研” / “技术调研” - Need an **auditable trace** of web actions and sources - Need to merge findings into a structured deliverable (report / brief / spec) ## Non-goals (avoid redundancy) - For **quick citations** or “give me 3 sources”, prefer `research-lookup`. - For **interactive UI flows** (login / forms / downloads), prefer `playwright` or `turix-cua` overlays. - For **codebase structure / call chains**, prefer GitNexus overlays (not web research). ## Output contract (must) Produce a folder with: - `report.md` — structured report (problem → findings → implications → next steps) - `sources.json` — all sources (URL/title/access time/snippet) - `trace.jsonl` — append-only action trace (search/open/extract/decision) - `notes.md` — working notes with per-source anchors Use `scripts/init_webthinker_run.py` to scaffold the folder. ## Runtime (Upstream vendoring) This VCO skill supports a **stable Lite mode** by default, and keeps the upstream WebThinker repo **vendored** for optional advanced use. - Vendored upstream paths: - `C:\Users\羽裳\.codex\_external\ruc-nlpir\WebThinker\` - Runtime config (no secrets stored): - `C:\Users\羽裳\.codex\skills\vibe\config\ruc-nlpir-runtime.json` - Preflight / install (no secrets echoed): - `pwsh C:\Users\羽裳\.codex\skills\vibe\scripts\ruc-nlpir\preflight.ps1` - Manually create an isolated venv for the vendored runtime and install only the minimal packages you need. The old `install-upstreams.ps1` auto-install path has been removed on purpose. LLM endpoint conventions (recommended): - Base URL: `OPENAI_BASE_URL` (or runtime default) - API key: `OPENAI_API_KEY` (**env var only; never write into files or CLI args**) ## Modes ### Mode A (Recommended): Lite — tool-orchestrated deep research Use existing tools (no heavy model hosting): 1. Scaffold outputs: - `python C:\Users\羽裳\.codex\skills\webthinker-deep-research\scripts\init_webthinker_run.py --topic "…" --out outputs/webthinker` 2. Search (broad → narrow): - Use `web.run` search queries or `mcp__tavily__tavily_search` if available. 3. Browse/extract: - Use `web.run open/click/find` for structured pages - Use `playwright` when pages require dynamic rendering / interactions 4. Draft + iterate: - Update `notes.md` and `sources.json` continuously - Write `report.md` as you go (think-search-and-draft), not only at the end 5. Verification: - Triangulate key claims across ≥2 sources when possible - Flag uncertainties explicitly ### Mode B (Optional): Full WebThinker stack Only choose this if you want to run the upstream system end-to-end and you have the environment: - Requires heavy deps (`torch`, `transformers`, `vllm`) + a served reasoning model - Requires a search API (Serper recommended by upstream) - Optional: Crawl4AI parser client for JS-heavy pages This mode is for **high-throughput** deep research runs; for most VCO tasks, Lite mode is enough and cheaper. ## Action trace format (trace.jsonl) Each line is one JSON object, e.g.: - `{"ts":"…","type":"search","query":"…","provider":"web.run"}` - `{"ts":"…","type":"open","url":"…"}` - `{"ts":"…","type":"extract","url":"…","highlights":["…","…"]}` - `{"ts":"…","type":"decision","reason":"why this source matters","next":"…"}` ## Quality gates - Every major claim in `report.md` links back to at least one entry in `sources.json`. - `sources.json` contains the exact URLs you used (no “I saw somewhere…”). - Keep the report actionable: add “Next steps” with concrete verification tasks.