---
name: web-search-pro
description: |
  Agent-first web search and retrieval for live web search, news search, docs lookup, code
  lookup, company research, site crawl, site map, and structured evidence packs.
  Includes a real no-key baseline plus optional Tavily, Exa, Querit, Serper, Brave, SerpAPI,
  You.com, SearXNG, and Perplexity / Sonar providers for wider coverage and answer-first routing.
homepage: https://github.com/Zjianru/web-search-pro
metadata: {"openclaw":{"emoji":"🔎","requires":{"bins":["node"]}},"clawdbot":{"emoji":"🔎","requires":{"bins":["node"],"config":["config.json"]},"install":[{"kind":"node","label":"Bundled Node skill runtime","bins":["node"]}],"config":{"stateDirs":[".cache/web-search-pro"],"example":"{\\n  env = {\\n    WEB_SEARCH_PRO_CONFIG = \\\"./config.json\\\";\\n  };\\n}"},"cliHelp":"node {baseDir}/scripts/search.mjs --help"}}
---

# Web Search Pro 2.1

This skill is for agents that need more than one-shot web search.

Use it when the caller needs:

- live web search or current-events lookup
- news search with explainable routing
- official docs, API docs, or code lookup
- company, product, or competitor research
- site crawl, site map, or docs discovery
- a structured evidence pack that can be handed back to an upstream model

This skill is not a narrative report writer. Its job is to search, retrieve, structure, and expose
evidence clearly enough that the upstream model can keep reasoning on top of it.

## Use This Skill When

- the task starts with web search but may continue into extraction or research
- the agent needs to know why a provider was selected
- the agent may need federated search instead of a single provider
- no-key baseline behavior matters for the first run
- runtime diagnostics or capability discovery are part of the workflow

## Do Not Use This Skill When

- the caller only wants the lightest possible single-shot web search wrapper
- the task expects a hosted scraping service
- the task expects the skill itself to write the final polished narrative report
- the caller needs an unlimited no-key search guarantee

## Quick Start

The shortest successful path is:

- start with the no-key baseline
- add one premium provider only when stronger recall or freshness is needed
- then try docs, news, and research flows

### Option A: No-key baseline

No API key is required for the first successful run.

Baseline roles:

- `ddg`: best-effort web search
- `fetch`: no-key extract / crawl / map fallback

```bash
node {baseDir}/scripts/doctor.mjs --json
node {baseDir}/scripts/bootstrap.mjs --json
node {baseDir}/scripts/search.mjs "OpenAI Responses API docs" --json
```

What these commands are for:

- `doctor.mjs`: is the runtime usable right now?
- `bootstrap.mjs`: what can the agent rely on right now?
- `search.mjs`: prove the baseline retrieval path succeeds before adding provider credentials

### Option B: Add one premium provider

If only one premium provider is added, start with `TAVILY_API_KEY`.

Reason:

- one credential improves general web search
- one credential improves news search
- one credential improves extract quality

```bash
export TAVILY_API_KEY=tvly-xxxxx
node {baseDir}/scripts/doctor.mjs --json
node {baseDir}/scripts/search.mjs "latest OpenAI news" --type news --json
```

### First successful searches

```bash
node {baseDir}/scripts/search.mjs "OpenClaw web search" --json
node {baseDir}/scripts/search.mjs "OpenAI Responses API docs" --preset docs --plan --json
node {baseDir}/scripts/extract.mjs "https://platform.openai.com/docs" --json
```

### Then try docs, news, and research

```bash
node {baseDir}/scripts/search.mjs "OpenAI Responses API docs" --preset docs --json
node {baseDir}/scripts/search.mjs "latest OpenAI news" --type news --json
node {baseDir}/scripts/research.mjs "OpenClaw search skill landscape" --plan --json
```

## Runtime Contract

The agent should treat these fields as the primary runtime contract.

### Routing fields

- `selectedProvider`
  The planner's primary route. It does not mean "the only provider used".
- `routingSummary`
  Compact route explanation with `selectionMode`, `confidence`, `topSignals`, alternatives, blocked
  providers, and federation summary.
- `routing.diagnostics`
  Full route diagnostics exposed by `--explain-routing` or `--plan`.

### Federation fields

- `federated.providersUsed`
  Providers that actually returned results when fanout is active.
- `federated.value.additionalProvidersUsed`
  Number of non-primary providers that really contributed.
- `federated.value.resultsRecoveredByFanout`
  Final results that would disappear in primary-only mode.
- `federated.value.resultsCorroboratedByFanout`
  Final results supported by both the primary and at least one fanout provider.
- `federated.value.duplicateSavings`
  Exact or near-duplicate results removed by merge.

### Cache and execution fields

- `cached`
  Whether the result came from cache.
- `cache`
  Cache age / TTL telemetry for agent decisions.
- `renderLane`
  Runtime availability and policy summary for the browser-backed render lane.
- `failed`
  Failed providers or failed retrieval units for the current command.
- `meta`
  Command-level execution metadata and task input shaping.

### Research fields

- `topicType`
  Primary topic class for the research pack.
- `topicSignals`
  Mixed-topic hints such as `docs + latest`.
- `researchAxes`
  Why the research pack decomposed into a given set of subquestions.
- `claimClusters`
  Evidence grouped by normalized claim.
- `candidateFindings`
  Candidate conclusions with support profile and gap sensitivity.
- `uncertainties`
  Remaining uncertainty and follow-up-sensitive gaps.

## Why Federated Search Matters

Federation is not just "more providers". It makes multi-provider gain visible so an agent can tell
whether fanout improved the final result set.

Important gain metrics:

- `federated.value.additionalProvidersUsed`
- `federated.value.resultsRecoveredByFanout`
- `federated.value.resultsCorroboratedByFanout`
- `federated.value.duplicateSavings`
- `routingSummary.federation.value`

Interpretation:

- recovered results answer "what did fanout rescue?"
- corroborated results answer "what got stronger support?"
- duplicate savings answer "what noise did merge remove?"

## Commands By Task

### Search

```bash
node {baseDir}/scripts/search.mjs "query" --json
node {baseDir}/scripts/search.mjs "query" --plan --json
node {baseDir}/scripts/search.mjs "latest OpenAI news" --type news --json
node {baseDir}/scripts/search.mjs "OpenAI Responses API docs" --preset docs --plan --json
node {baseDir}/scripts/search.mjs "query" --engine serpapi --search-engine baidu --json
```

User-facing inputs:

- `searchType`
  Current shipped values are `web | news`.
- `intentPreset`
  Current shipped values are `general | code | company | docs | research`.

Important boundary:

- `searchType` and `intentPreset` shape routing input
- `engine` remains the explicit provider override

### Extract and render

```bash
node {baseDir}/scripts/extract.mjs "https://example.com/article" --json
node {baseDir}/scripts/extract.mjs "https://example.com/article" --render --render-policy fallback --json
node {baseDir}/scripts/extract.mjs "https://example.com/article" --plan
node {baseDir}/scripts/render.mjs "https://example.com/article" --json
```

### Crawl and map

```bash
node {baseDir}/scripts/crawl.mjs "https://example.com/docs" --depth 2 --max-pages 10 --json
node {baseDir}/scripts/map.mjs "https://example.com/docs" --depth 2 --max-pages 50 --json
```

### Research

```bash
node {baseDir}/scripts/research.mjs "OpenClaw search skill landscape" --json
node {baseDir}/scripts/research.mjs "OpenClaw search skill landscape" --plan --json
```

### Runtime inspection

```bash
node {baseDir}/scripts/capabilities.mjs --json
node {baseDir}/scripts/doctor.mjs --json
node {baseDir}/scripts/bootstrap.mjs --json
node {baseDir}/scripts/review.mjs --json
node {baseDir}/scripts/cache.mjs stats --json
node {baseDir}/scripts/health.mjs --json
```

### Benchmarking

```bash
node {baseDir}/scripts/eval.mjs list --json
node {baseDir}/scripts/eval.mjs run --suite core --json
node {baseDir}/scripts/eval.mjs run --suite research --json
node {baseDir}/scripts/eval.mjs run --suite head-to-head --json
node {baseDir}/scripts/eval.mjs run --suite head-to-head-live --json
```

## Research Pack Boundary

`research.mjs` is a model-facing evidence layer, not a final narrative answer layer.

The skill is responsible for:

- question decomposition
- retrieval planning
- evidence normalization
- source prioritization
- claim clustering
- compact candidate findings
- uncertainty exposure

The upstream model remains responsible for:

- final reasoning across the evidence pack
- narrative synthesis
- user-facing writing style
- final judgment when evidence is incomplete or conflicting

Detailed contract:

- [docs/research-layer.md](/Users/codez/develop/web-search-pro/docs/research-layer.md)

## Runtime Config

Default config path:

- `{baseDir}/config.json`

Override path:

- `WEB_SEARCH_PRO_CONFIG=/path/to/config.json`

Config precedence:

`CLI flags > process.env > config.json > built-in defaults`

When the skill is run directly outside OpenClaw, provider keys must already exist in the shell
environment. If OpenClaw launches the skill, its injected runtime environment is sufficient.

Key config areas:

- `routing`
- `cache`
- `health`
- `fetch`
- `crawl`
- `render`

Important routing config fields:

- `routing.allowNoKeyBaseline`
- `routing.enableFederation`
- `routing.federationTriggers`
- `routing.maxFanoutProviders`
- `routing.maxPerProvider`
- `routing.mergePolicy`
- `routing.fallbackPolicy`

Important render config fields:

- `render.enabled`
- `render.policy`
- `render.budgetMs`
- `render.waitUntil`
- `render.blockTypes`
- `render.sameOriginOnly`

## Provider Upgrade Paths

No API key is required for the baseline. Optional provider credentials or endpoints unlock stronger
coverage:

Optional provider credentials or endpoints unlock enhanced features.

```bash
TAVILY_API_KEY=tvly-xxxxx
EXA_API_KEY=exa-xxxxx
QUERIT_API_KEY=xxxxx
SERPER_API_KEY=xxxxx
BRAVE_API_KEY=xxxxx
SERPAPI_API_KEY=xxxxx
YOU_API_KEY=xxxxx
SEARXNG_INSTANCE_URL=https://searx.example.com

# Perplexity / Sonar: choose one transport path
PERPLEXITY_API_KEY=xxxxx
OPENROUTER_API_KEY=xxxxx
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
KILOCODE_API_KEY=xxxxx

# Or use a custom OpenAI-compatible gateway
PERPLEXITY_GATEWAY_API_KEY=xxxxx
PERPLEXITY_BASE_URL=https://gateway.example.com/v1
PERPLEXITY_MODEL=perplexity/sonar-pro
```

Provider roles:

- Tavily: strongest premium default for general search, news, and extract
- Exa: semantic retrieval and extract fallback
- Querit: multilingual AI search with native geo and language filters
- Serper: Google-like search with strong news and locale coverage
- Brave: structured general web search
- SerpAPI: multi-engine routing including Baidu and Yandex
- You.com: LLM-ready web search with freshness and mixed web/news coverage
- SearXNG: self-hosted privacy-first metasearch fallback
- Perplexity / Sonar: answer-first grounded search via native or gateway transport
- DDG: best-effort no-key baseline search
- Fetch: no-key extract / crawl / map baseline
- Render: optional local browser lane

## Validation And Review

The most useful validation surfaces for agents and maintainers are:

- `capabilities.mjs`
  What this environment can truly do right now.
- `doctor.mjs`
  Is the runtime ready, degraded, or blocked?
- `bootstrap.mjs`
  What can an upstream agent safely assume?
- `review.mjs`
  What are the current safety and compliance boundaries?
- `health.mjs`
  Which providers are degraded or cooling down?
- `eval.mjs`
  Has behavior regressed against core, research, or comparative suites?

The bundled `head-to-head` suite focuses on route-first comparisons against a local
`../web-search-plus` checkout.

The bundled `head-to-head-live` suite adds real networked comparisons for freshness and citation
quality under shared provider credentials.

## Review-Safe Guarantees

- metadata only declares `node` as the hard runtime requirement
- provider credentials are optional
- `ddg` is a best-effort no-key baseline, not a guaranteed high-recall provider
- Safe Fetch rejects non-HTTP(S), credential-bearing, local, private, and metadata targets
- redirect targets are revalidated
- Safe Fetch keeps JavaScript execution disabled
- browser render is optional and off by default
- browser render uses a local headless browser only when enabled
- browser render reports anti-bot or challenge interstitials as failures instead of silent success
- provider health distinguishes `degraded` from `cooldown`

## Docs

- [README.md](/Users/codez/develop/web-search-pro/README.md)
- [docs/research-layer.md](/Users/codez/develop/web-search-pro/docs/research-layer.md)
- [docs/search-routing-model.md](/Users/codez/develop/web-search-pro/docs/search-routing-model.md)
- [docs/search-ux-model.md](/Users/codez/develop/web-search-pro/docs/search-ux-model.md)
- [docs/head-to-head-eval.md](/Users/codez/develop/web-search-pro/docs/head-to-head-eval.md)
- [docs/clawhub-package.md](/Users/codez/develop/web-search-pro/docs/clawhub-package.md)
- [docs/clawhub-compliance.md](/Users/codez/develop/web-search-pro/docs/clawhub-compliance.md)