# smart-search > **Original Project**: This project is based on [konbakuyomu/smartsearch](https://github.com/konbakuyomu/smartsearch) [简体中文](README.zh-CN.md) | English CLI-first, skill-driven web research for AI agents and terminal users. `smart-search` gives AI tools one reproducible command layer for live search, source discovery, page fetching, site mapping, provider diagnostics, and live Deep Research execution.
## What It Is `smart-search` is not an MCP server. It is a normal CLI that AI agents can call through a skill: ```powershell smart-search search "latest OpenAI Responses API changes" --format json smart-search fetch "https://example.com/article" --format markdown smart-search research "Compare Responses API web_search with Chat Completions search" --format markdown ``` The current architecture has two layers: | Layer | Responsibility | | --- | --- | | CLI executor | Runs deterministic commands, provider routing, fallback, JSON/Markdown output, local config | | Skill / AI orchestration | Infers user intent, chooses normal search vs Deep Research, executes planned CLI steps, writes final source-backed answers | Default `smart-search search` stays fast and live. `smart-search research` is the live Deep Research executor: it builds an internal plan, then runs discovery, fetch/read, gap check, and evidence-only synthesis. ## Install Stable channel: ```powershell npm install -g @blxzer/smart-search@latest smart-search --version smart-search setup ``` Test channel: ```powershell npm install -g @blxzer/smart-search@next smart-search --version ``` The npm package creates an isolated Python runtime during install. You still use the single `smart-search` command. Prerequisites: - Node.js / npm. - Python 3.10 or newer available as `python`, `python3`, or `py -3` on Windows. ## Quick Start 1. Configure providers: ```powershell smart-search setup smart-search doctor --format json ``` 2. If OpenAI-compatible `search` hangs or times out, generate the short troubleshooting report: ```powershell smart-search doctor --format markdown smart-search diagnose openai-compatible --format markdown ``` 3. Run a normal live search: ```powershell smart-search search "today's important AI news" --validation balanced --extra-sources 2 --format json ``` 4. Fetch exact page evidence: ```powershell smart-search fetch "https://example.com/source" --format markdown --output evidence.md ``` 5. Run live Deep Research when you want the CLI to execute the staged workflow: ```powershell smart-search research "Deep research recent Bitcoin market movement" --budget deep --format markdown ``` ## Current Architecture | Capability | Main commands | Providers | Role | | --- | --- | --- | --- | | `main_search` | `search` | OpenAI-compatible Chat Completions | Broad answer generation and synthesis | | `docs_search` | `context7-library`, `context7-docs`, `exa-search` | Context7, Exa | Official docs, SDKs, APIs, framework/library evidence | | `web_search` | Bilingual source discovery inside `search`; `zhipu-search` only for deprecated manual compatibility | Tavily, Firecrawl; Zhipu Web Search API only when explicitly requested | Chinese and English web discovery for every normal research question | | `web_fetch` | `fetch` | Tavily, Jina Reader, Firecrawl | Exact URL content extraction for evidence | | `site_map` | `map` | Tavily | Site/documentation structure discovery | | `research_executor` | `research` / `rs` | Registered providers by capability | Live staged research: plan, discover, fetch/read, gap check, evidence-only synthesis | Fallback is same-capability only: | Capability | Fallback chain | | --- | --- | | `main_search` | OpenAI-compatible | | `docs_search` | Context7 for library/API/docs intent; Exa for official domains, papers, product pages, and trusted-site discovery | | `web_search` | Tavily -> Firecrawl; Zhipu only when explicitly selected for the deprecated legacy command | | `web_fetch` | Tavily -> Jina Reader with `JINA_API_KEY` -> Firecrawl | Jina Reader is a `web_fetch` provider only. `JINA_API_KEY` is required before Jina satisfies `SMART_SEARCH_MINIMUM_PROFILE=standard`; anonymous `r.jina.ai` behavior is treated as explicit/experimental fetch behavior and must not weaken fail-closed setup checks. The CLI exposes observability fields such as `routing_decision`, `provider_attempts`, `providers_used`, `fallback_used`, `primary_sources`, `extra_sources`, and `source_warning`. Default `balanced` and `strict` `search` run bilingual `web_search` source discovery through Tavily / Firecrawl when configured: one Chinese-source query and one English-source query for the same user question. `--validation fast` skips supplemental discovery. Strict queries without primary, docs, fetch, or explicit source evidence can still return `evidence_error`; use `--extra-sources N`, source-first commands such as `exa-search`, or `fetch` when citable evidence is required. Docs supplemental routing stays keyword-based for explicit docs/API/library/framework intent. `extra_sources` are explicit discovery candidates from `--extra-sources N`, which defaults to `0`. For high-risk claims, news, policy, finance, health, selection decisions, and serious reviews, fetch key pages first and cite fetched text rather than treating a broad search answer as proof. Routing rule of thumb: start with `search` for broad bilingual discovery and synthesis; use `research` when you want the CLI to execute the deeper evidence workflow; use Context7 first for library/API/framework docs; use Exa for official domains, papers, product pages, trusted sites, and low-noise discovery; use Tavily/Firecrawl for bilingual web discovery and URL/page evidence; use Jina for known-URL extraction. Zhipu is retained only as a deprecated manual compatibility command, not a default route. ## Deep Research Use normal search when you want a fast answer: ```powershell smart-search search "React useEffect cleanup docs" --format json ``` Use live Deep Research execution when you want the CLI to run the staged workflow: ```powershell smart-search research "OpenAI Responses API web_search vs Chat Completions search: which should I use?" --budget deep --fallback auto --format json smart-search research "query" --locale-scope cn --budget quick --format json smart-search research "query" --dry-run --format json smart-search research "query" --progress --format markdown smart-search rs "https://example.com/source" --fallback off --format markdown ``` Research CLI options: | Flag | Values | Purpose | | --- | --- | --- | | `--budget` | `quick`, `standard`, `deep` (default) | Controls decomposition depth and candidate fetch limits (`quick=3`, `standard=5`, `deep=6` URLs) | | `--locale-scope` | `cn`, `en`, `both` (default) | Bilingual web discovery scope; use `cn` or `en` to reduce duplicate search cost | | `--dry-run` | flag | Plan + routing preview only; no live provider calls | | `--progress` | flag | `[research]` stage logs to stderr (suppressed during `--dry-run`) | | `--fallback` | `auto` (default), `off` | Same-capability provider fallback inside each stage | | `--evidence-dir` | path | Override default evidence artifact directory | `research` builds an internal Deep Research plan, then runs plan -> discover -> fetch/read -> gap check -> evidence-only synthesis. The plan stage produces: - `intent_signals`, such as recency, docs/API intent, known URL, claim risk, source authority, and cross-validation need; - `decomposition`, with 1-6 subquestions depending on budget and difficulty; - `capability_plan`, choosing from existing CLI blocks; - `steps[]`, each with `tool`, `purpose`, `command`, `output_path`, and `subquestion_id`; - `evidence_policy="fetch_before_claim"`; - `gap_check`, which fetches missing evidence or downgrades unsupported claims. Deep Research is not a fixed topic recipe system. Market research, product comparison, technical docs, news or policy, claim verification, and URL-first prompts are examples of user language, not required schema enums. The plan only composes existing CLI blocks: ```text search, exa-search, exa-similar, context7-library, context7-docs, fetch, map ``` `doctor` is preflight, not a research step. `research` defaults to `--fallback auto`, which permits same-capability fallback even when a normal `search` configuration is conservative. `--fallback off` tries only the first provider selected inside each capability, which is useful for debugging provider behavior. Research JSON includes `final_answer`, `citations`, `evidence_items`, `gap_check`, `provider_attempts`, `fallback_used`, `degraded`, `route_policy_version`, `evidence_dir`, and `output_schema_version` (currently `1`). Citations are structured objects with `id`, `source_type`, `subquestion_id`, `verified`, and `content_len`. `provider_attempts` and `stage_results` may include `cache_hit` when the provider TTL cache serves a repeat query. Discovery snippets are candidates only; citations are produced only from fetched/read evidence. If fallback cannot close a gap, `research` finishes degraded and lists unsupported gaps instead of inventing evidence. The research router is capability-first plus provider-advantage: - Context7 first for library/API/framework docs, with Exa as official-domain, paper, product, or trusted low-noise discovery. - Tavily / Firecrawl for bilingual Chinese and English broad source discovery. Zhipu is deprecated from default routing and is not used for Chinese/current/domestic searches unless explicitly requested through the legacy command. - Jina is favored for known public URLs, PDFs, and arXiv extraction; ReaderLM-v2 still requires `JINA_API_KEY`. - Firecrawl is favored for JS-heavy, dynamic, browser-like, OCR/PDF, or robust fallback extraction. Advanced routing overrides are available through `SMART_SEARCH_RESEARCH_PREFERRED_PROVIDERS` and `SMART_SEARCH_RESEARCH_DISABLED_PROVIDERS`. They can reorder or disable registered providers inside their supported capability, but they cannot move a provider across capability boundaries. Good user-facing research prompts: ```powershell smart-search research "深度搜索一下最近的比特币行情" --format json smart-search research "OpenAI Responses API web_search 和 Chat Completions 联网搜索怎么选" --budget deep --format json smart-search research "帮我核验这个说法是真是假:某某工具已经完全替代 Tavily 做 AI 搜索了" --format json smart-search research "https://example.com/source" --format json ``` ## Provider And API Key Guide Use `smart-search setup` for normal configuration. Environment variables remain supported for CI and advanced users. | Provider / route | Used for | Main config keys | Official docs | Key / dashboard | | --- | --- | --- | --- | --- | | OpenAI-compatible Chat Completions | Primary live search through OpenAI or a compatible relay | `OPENAI_COMPATIBLE_API_URL`, `OPENAI_COMPATIBLE_API_KEY`, `OPENAI_COMPATIBLE_MODEL`, `OPENAI_COMPATIBLE_STREAM` | [OpenAI platform docs](https://platform.openai.com/docs) | [OpenAI API keys](https://platform.openai.com/api-keys) or your relay provider | | Exa | Low-noise official docs, API, paper, product, trusted-page discovery | `EXA_API_KEY` | [Exa docs](https://docs.exa.ai/) | [Exa API keys](https://dashboard.exa.ai/api-keys) | | Context7 | SDK, library, framework, and API documentation fallback | `CONTEXT7_API_KEY`, `CONTEXT7_BASE_URL` | [Context7 docs](https://context7.com/docs) | [Context7](https://context7.com/) | | Zhipu Web Search API | Deprecated manual `zhipu-search` compatibility only; not used by default routing | `ZHIPU_API_KEY`, `ZHIPU_API_URL`, `ZHIPU_SEARCH_ENGINE` | [Zhipu web search docs](https://docs.bigmodel.cn/cn/guide/tools/web-search) | [Zhipu API keys](https://open.bigmodel.cn/usercenter/apikeys) | | Tavily | Extra web sources, URL fetch, and site map | `TAVILY_API_URL`, `TAVILY_API_KEY` | [Tavily docs](https://docs.tavily.com/) | [Tavily app](https://app.tavily.com/home) | | Jina Reader | Known URL page extraction for `web_fetch`; key required for standard minimum profile | `JINA_API_KEY`, `JINA_READER_API_URL`, `JINA_RESPOND_WITH`, `JINA_TIMEOUT_SECONDS` | [Jina Reader](https://jina.ai/reader/) | [Jina AI](https://jina.ai/) | | Firecrawl | Fetch fallback and supplementary web sources | `FIRECRAWL_API_URL`, `FIRECRAWL_API_KEY` | [Firecrawl docs](https://docs.firecrawl.dev/) | [Firecrawl API keys](https://www.firecrawl.dev/app/api-keys) | Important boundaries: - OpenAI-compatible relays and gateways use the Chat Completions `/chat/completions` route through `OPENAI_COMPATIBLE_*`. - `OPENAI_COMPATIBLE_STREAM=true` or `smart-search search --stream` sets `stream=true` only for OpenAI-compatible `search` and provider-side `fetch` calls. It is a relay compatibility switch for long requests and does not change URL description or source ranking. - Legacy `SMART_SEARCH_API_URL`, `SMART_SEARCH_API_KEY`, `SMART_SEARCH_API_MODE`, and `SMART_SEARCH_MODEL` are not supported config keys. Use `OPENAI_COMPATIBLE_*` explicitly. - Default web discovery is bilingual Tavily / Firecrawl. `zhipu-search` is retained only as a deprecated manual compatibility command and is not used by normal `search` or `research` routing. - `zhipu-search` support is the Web Search API route, not Zhipu Chat Completions `tools=[web_search]`, not Search Agent, and not the MCP Server. - Jina Reader is not a general search provider. `JINA_API_KEY` is required for Jina to count toward `standard`; `JINA_RESPOND_WITH=readerlm-v2` also requires `JINA_API_KEY`. - `ZHIPU_SEARCH_ENGINE` defaults to `search_std`. Supported official values include `search_std`, `search_pro`, `search_pro_sogou`, and `search_pro_quark`; custom values remain allowed for future services. - `TAVILY_API_URL` affects Tavily only. It does not proxy Zhipu. For Tavily Hikari / pooled endpoints, use `https://