# Setting Up Your Search Keys

This guide shows you how to get the API keys that power your searches. You only need one provider to get started — or set up several, and the server will automatically switch between them if one goes down.

## How to Configure Keys

Pass API keys as environment variables. How you set them depends on your MCP client:

**Claude Code** (CLI / VS Code / JetBrains), in `~/.claude.json` under `"mcpServers"`:
```json
{
  "web-researcher": {
    "command": "web-researcher-mcp",
    "env": {
      "BRAVE_API_KEY": "your-key",
      "EPO_OPS_CONSUMER_KEY": "your-key",
      "EPO_OPS_CONSUMER_SECRET": "your-secret"
    }
  }
}
```

**Claude Desktop**, in `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
```json
{
  "mcpServers": {
    "web-researcher": {
      "command": "/path/to/web-researcher-mcp",
      "env": {
        "BRAVE_API_KEY": "your-key"
      }
    }
  }
}
```

**Shell (direct / Docker)**:
```bash
export BRAVE_API_KEY=your-key
web-researcher-mcp
```

Keys set in the MCP client config are passed directly to the server process — no `.env` file needed.

---

## DuckDuckGo (Zero-Config Default)

**Free**: No API key, no registration, no query limits to configure.

DuckDuckGo is the built-in fallback and works out of the box. If you set no provider keys at all, web search still works through DuckDuckGo. There is nothing to configure — but you can select it explicitly:

```bash
export SEARCH_PROVIDER=duckduckgo
web-researcher-mcp
```

For better result quality and higher volume, add one of the keyed providers below and the server will prefer it.

---

## Hacker News (Zero-Config)

**Free**: No API key, no registration. Searches Hacker News stories through the public [HN Algolia](https://hn.algolia.com/) index.

This is a domain-specific provider — it returns Hacker News stories only, not general web results. Select it when you want HN discussion and submission results:

```bash
export SEARCH_PROVIDER=hackernews
web-researcher-mcp
```

### Good to know

- **`web_search` and `news_search` only.** `image_search` with Hacker News returns empty (no error). Keep an image-capable provider (Google, Brave, SearchAPI) in `SEARCH_ROUTING` if you need images.
- **Date filtering works.** `dateRange` is honored via the Algolia `numericFilters` (`created_at_i`); `num_results` accepts 1–100 (values outside that range reset to the default of 10).
- **Reading threads.** `scrape_page` on a `news.ycombinator.com` item, user, or list URL is read natively through the HN Firebase API (story + top comments) — independent of which `SEARCH_PROVIDER` is set.
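
The `num_results` clamping and date filtering described above can be sketched in shell. This is a minimal illustration, not the server's code: `clamp_num_results` and `hn_search_url` are hypothetical names, and the URL layout assumes the public HN Algolia search endpoint with its `tags`, `hitsPerPage`, and `numericFilters` parameters.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not the server's internals.

# Keep num_results within 1-100; anything else becomes the default of 10.
clamp_num_results() {
  local n="${1:-}"
  # 10# forces base 10 so inputs like "08" are not read as octal.
  if [[ "$n" =~ ^[0-9]{1,3}$ ]] && (( 10#$n >= 1 && 10#$n <= 100 )); then
    echo "$(( 10#$n ))"
  else
    echo 10
  fi
}

# Build an HN Algolia story-search URL.
# $1 = query (assumed to be URL-encoded already)
# $2 = requested num_results
# $3 = optional Unix timestamp; only stories created after it are returned
hn_search_url() {
  local query="$1" hits since="${3:-}"
  hits="$(clamp_num_results "${2:-}")"
  local url="https://hn.algolia.com/api/v1/search?query=${query}&tags=story&hitsPerPage=${hits}"
  if [[ -n "$since" ]]; then
    url+="&numericFilters=created_at_i%3E${since}"   # %3E is an encoded '>'
  fi
  echo "$url"
}
```

For example, `hn_search_url rust 500 1700000000` produces a URL with `hitsPerPage=10`, because 500 falls outside the accepted range and resets to the default.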

---

## Google Custom Search (Programmable Search Engine)

**Free tier**: 100 queries/day (paid: $5 per 1,000 queries)

### Step 1: Get an API Key

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project (or select an existing one)
3. Navigate to **APIs & Services > Library**
4. Search for "Custom Search API" and enable it
5. Go to **APIs & Services > Credentials**
6. Click **Create Credentials > API Key**
7. Copy the key

### Step 2: Create a Programmable Search Engine

1. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/)
2. Click **Add** to create a new search engine
3. Under "What to search", select **Search the entire web**
4. Give it a name and click **Create**
5. Copy the **Search Engine ID** (cx)

### Step 3: Configure

```bash
export GOOGLE_CUSTOM_SEARCH_API_KEY=AIzaSy...your-key
export GOOGLE_CUSTOM_SEARCH_ID=your-search-engine-id
```

---

## Brave Search

**Free tier**: 2,000 queries/month (paid plans available)

### Step 1: Get an API Key

1. Go to [Brave Search API](https://brave.com/search/api/)
2. Click **Get Started** and create an account
3. Subscribe to the **Free** plan (or a paid plan for higher volume)
4. Navigate to your dashboard and copy your API key

### Step 2: Configure

```bash
export BRAVE_API_KEY=BSAxxxxxxxxxxxxxxxxxx
```

To use Brave as your primary (or only) provider:

```bash
export SEARCH_PROVIDER=brave
export BRAVE_API_KEY=BSAxxxxxxxxxxxxxxxxxx
```

---

## Serper.dev

**Free tier**: 2,500 queries (one-time credit, then paid)

### Step 1: Get an API Key

1. Go to [Serper.dev](https://serper.dev/)
2. Sign up for an account
3. Your API key is shown on the dashboard immediately after sign-up
4. Copy the key

### Step 2: Configure

```bash
export SERPER_API_KEY=your-serper-key
```

To use Serper as your primary provider:

```bash
export SEARCH_PROVIDER=serper
export SERPER_API_KEY=your-serper-key
```

---

## Tavily

**Free tier**: monthly credits for development; paid plans for higher volume

Tavily is a search API purpose-built for AI agents — it returns clean, extracted, LLM-ready content rather than raw result pages. It supports web and news search (no native image search; image queries fall through to another provider when routing is enabled).

### Step 1: Get an API Key

1. Go to [app.tavily.com](https://app.tavily.com/)
2. Sign up for an account
3. Copy the `tvly-...` API key from your dashboard

### Step 2: Configure

```bash
export SEARCH_PROVIDER=tavily
export TAVILY_API_KEY=tvly-your-key
```

The key is sent as an `Authorization: Bearer` header (never in the request body), and queries are capped at Tavily's 400-character limit automatically.

### Good to know

- **No image search.** `image_search` with Tavily returns empty (no error). Keep an image-capable provider (Google, Brave, SearchAPI) in `SEARCH_ROUTING` if you need images — the Router falls through automatically. Best used as a routing member rather than the sole provider: `SEARCH_ROUTING=tavily,brave,google`.
- **Web `time_range` is strict.** Tavily's web recency filter is aggressive — a `time_range=week` web search may return nothing for terms that have older results. For recent *news* use `news_search` (its `freshness` window works well); for recent *web* content, widen `time_range` or omit it.
- **Some filters don't apply.** Tavily honors `site`, `lens`, `num_results`, `time_range`/`freshness`, but ignores `country`, `language`, `safe`, and exact/exclude-term filters (it has no API field for them). Use Google if you need hard country/language/exact-phrase control.

---

## Exa

**Free tier**: 1,000 requests/month; paid per call beyond that

Exa is a neural/semantic search API. Beyond ordinary web and news search, an Exa key unlocks several capabilities no other provider offers here:

- **Grounded answers** — backs the provider-independent `answer` tool (one synthesized answer with citations).
- **Structured extraction** — backs the provider-independent `structured_search` tool (schema-defined fields and company/people entities, as JSON per result).
- **Academic search** — `academic_search` can route to Exa via the research-paper category.
- **A paid scrape fallback** — Exa's `/contents` API becomes the last-resort extraction tier for `scrape_page`, recovering hard pages the free tiers can't (only when the local tiers all fail).

### Step 1: Get an API Key

1. Go to [dashboard.exa.ai](https://dashboard.exa.ai/)
2. Sign up for an account
3. Copy your API key from the dashboard

### Step 2: Configure

```bash
export SEARCH_PROVIDER=exa
export EXA_API_KEY=your-exa-key
```

The key is sent as the `x-api-key` header (never in the request body or logs).

### Good to know

- **Paid per call.** Exa charges per request (free tier: 1,000/month). Each `answer` / `structured_search` response (when served by Exa) reports the estimated `costUsd` of that call, and the cost is recorded in the audit trail as `cost_usd`. The estimate is not an invoice.
- **No image search.** `image_search` with Exa returns empty (no error) — keep an image-capable provider (Google, Brave, SearchAPI) in `SEARCH_ROUTING` if you need images.
- **Search type is fixed to `auto`.** The expensive deep/deep-reasoning tiers are deliberately not exposed; `auto` is the balanced, predictable-cost default.
- **The scrape fallback is opt-in by cost.** The Exa `/contents` tier runs only when the free scrape tiers (markdown → stealth → HTML → browser, when Chrome is configured) all fail to extract content — the common path never spends an Exa credit on scraping.
- **Best used as a routing member** when you also want a free default: `SEARCH_ROUTING=brave,exa`.

---

## SearXNG (Self-Hosted)

**Free**: Open source, self-hosted — no API key needed, no query limits

SearXNG is a privacy-respecting metasearch engine that aggregates results from multiple sources. Ideal for air-gapped deployments or organizations that need full control over search infrastructure.

### Step 1: Run SearXNG

The fastest way is Docker:

```bash
docker run -d --name searxng \
  -p 8080:8080 \
  -e SEARXNG_SECRET=your-secret-key \
  searxng/searxng:latest
```

For production deployments, see the [SearXNG documentation](https://docs.searxng.org/) for configuration options (engine selection, rate limiting, result ranking).

### Step 2: Enable JSON API

SearXNG needs the JSON format enabled. Create or edit `settings.yml`:

```yaml
search:
  formats:
    - html
    - json
```
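
After restarting SearXNG with this setting, a quick request can confirm that JSON output is really enabled. The sketch below is an assumption-laden helper rather than part of the server: `searxng_json_url` and `check_searxng_json` are hypothetical names, `curl` is assumed to be installed, and the base URL defaults to the `localhost:8080` address from the Docker command above when `SEARXNG_URL` is unset.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking a SearXNG instance by hand.

# Build the JSON search URL for a URL-safe test query (default: "test").
searxng_json_url() {
  local base="${SEARXNG_URL:-http://localhost:8080}"
  echo "${base%/}/search?q=${1:-test}&format=json"   # %/ drops one trailing slash
}

# Print only the HTTP status code of a JSON search request.
check_searxng_json() {
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    "$(searxng_json_url "${1:-test}")"
}
```

Running `check_searxng_json` should print `200` once the format is enabled. SearXNG typically answers `403` when `json` is missing from `search.formats`, so that status is a hint to recheck `settings.yml`.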

### Step 3: Configure

```bash
export SEARCH_PROVIDER=searxng
export SEARXNG_URL=http://localhost:8080
```

### Step 4: Authenticating to a protected SearXNG (optional)

If your instance is behind HTTP Basic auth or a reverse proxy that requires a token, supply the credential at deploy time. Both variables are optional; when neither is set, the server sends no credentials to SearXNG.

```bash
# HTTP Basic auth (the most common case):
export SEARXNG_BASIC_AUTH=user:password   # everything after the first ':' is the password, so colons in the password are fine

# Non-Basic schemes (bearer token, Cloudflare Access service token, API-gateway shared secret) —
# comma-separated "Name: Value" pairs:
export SEARXNG_HEADERS="X-Proxy-Token: abc123, CF-Access-Client-Id: client.id"
```

Notes:

- **Never logged.** The credential and header values never appear in logs, errors, or audit records — messages name only the variable or the header name.
- **Fail-closed & validated.** A malformed value — Basic auth without a `user:password` shape, a header missing its `:`, an invalid header name, or a newline/control character in a value — is rejected at startup and never sent on the wire. In HTTP mode (`PORT` set) the server refuses to start; in STDIO mode it logs the error and drops the bad value (matching the existing zero-config startup behavior). Either way the malformed credential is never used.
- **No commas or newlines inside a header value** — commas delimit the pairs, and newlines are rejected to prevent header injection.
- **Precedence.** A custom `Authorization` header in `SEARXNG_HEADERS` overrides `SEARXNG_BASIC_AUTH` (last writer wins), which lets a bearer-token proxy take priority.
- Auth applies whenever `SEARXNG_URL` is set — including when SearXNG is only a `SEARCH_ROUTING` or fallback target, not just when `SEARCH_PROVIDER=searxng`.
- Never commit real credentials; set these as deployment secrets.
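
The parsing and validation rules in these notes can be sketched as follows. This is only an illustration of the documented rules, not the server's implementation: `trim`, `split_basic_auth`, and `parse_headers` are hypothetical names, and the header-name check is deliberately narrower than what HTTP allows.

```bash
#!/usr/bin/env bash
# Hypothetical validation sketch; function names are illustrative only.

# Trim leading and trailing whitespace from $1.
trim() {
  local s="$1"
  s="${s#"${s%%[![:space:]]*}"}"
  s="${s%"${s##*[![:space:]]}"}"
  printf '%s' "$s"
}

# Split "user:password" at the FIRST colon so the password may contain colons.
# Sets BASIC_USER and BASIC_PASSWORD instead of printing, so nothing is echoed.
split_basic_auth() {
  local cred="$1"
  [[ "$cred" == *:* ]] || return 1       # no colon: not user:password shaped
  [[ -n "${cred%%:*}" ]] || return 1     # empty user is rejected in this sketch
  BASIC_USER="${cred%%:*}"
  BASIC_PASSWORD="${cred#*:}"
}

# Parse comma-separated "Name: Value" pairs.
# Prints one normalized header per line, or prints nothing and fails.
parse_headers() {
  local raw="$1" pair name value out=""
  local -a pairs
  [[ "$raw" != *[[:cntrl:]]* ]] || return 1   # newline or control character
  IFS=',' read -r -a pairs <<< "$raw"
  for pair in "${pairs[@]}"; do
    [[ "$pair" == *:* ]] || return 1          # header missing its ':'
    name="$(trim "${pair%%:*}")"
    value="$(trim "${pair#*:}")"
    # Simplified name check; real HTTP header names allow a few more symbols.
    [[ "$name" =~ ^[A-Za-z0-9-]+$ ]] || return 1
    out+="${name}: ${value}"$'\n'
  done
  printf '%s' "$out"
}
```

Output is buffered and printed only after every pair passes, which mirrors the fail-closed rule: one bad pair rejects the whole value instead of letting a partial header set through.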

---

## SearchAPI.io

**Free tier**: 100 searches/month (paid plans available)

### Step 1: Get an API Key

1. Go to [SearchAPI.io](https://www.searchapi.io/)
2. Sign up for an account
3. Navigate to your dashboard
4. Copy your API key

### Step 2: Configure

```bash
export SEARCHAPI_API_KEY=your-searchapi-key
```

To use SearchAPI.io as your primary provider:

```bash
export SEARCH_PROVIDER=searchapi
export SEARCHAPI_API_KEY=your-searchapi-key
```

---

## Multi-Provider Routing

For maximum reliability, configure multiple providers and let the server route automatically with fallback:

```bash
# Keys for every provider named in SEARCH_ROUTING below
export BRAVE_API_KEY=BSA...
export GOOGLE_CUSTOM_SEARCH_API_KEY=AIza...
export GOOGLE_CUSTOM_SEARCH_ID=017...
export SERPER_API_KEY=...

# Priority-ordered routing with automatic failover
export SEARCH_ROUTING=brave,google,serper
```

If Brave is down or rate-limited, requests automatically switch to Google, then Serper. If one provider starts failing repeatedly, the server stops trying it and routes to the next one.
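
Failover only helps if every provider in the list is actually configured. The following preflight sketch checks a comma-separated routing list against the environment variables named in this guide. It is a hypothetical helper, not a feature of `web-researcher-mcp`; `required_vars` and `check_routing` are made-up names, and the sketch covers only the general web providers and the simple comma-separated form of `SEARCH_ROUTING`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of web-researcher-mcp.

# Print the environment variables a provider needs (names taken from this guide).
required_vars() {
  case "$1" in
    brave)      echo "BRAVE_API_KEY" ;;
    google)     echo "GOOGLE_CUSTOM_SEARCH_API_KEY GOOGLE_CUSTOM_SEARCH_ID" ;;
    serper)     echo "SERPER_API_KEY" ;;
    tavily)     echo "TAVILY_API_KEY" ;;
    exa)        echo "EXA_API_KEY" ;;
    searchapi)  echo "SEARCHAPI_API_KEY" ;;
    searxng)    echo "SEARXNG_URL" ;;
    duckduckgo|hackernews) echo "" ;;   # zero-config providers
    *)          return 1 ;;             # unknown to this sketch
  esac
}

# Print one line per problem found in a comma-separated routing list.
# No output means every listed provider has its variables set.
check_routing() {
  local routing="$1" provider var vars
  local -a providers
  IFS=',' read -r -a providers <<< "$routing"
  for provider in "${providers[@]}"; do
    if ! vars="$(required_vars "$provider")"; then
      echo "unknown provider: $provider"
      continue
    fi
    for var in $vars; do
      # ${!var} is indirect expansion: the value of the variable named by $var.
      [[ -n "${!var:-}" ]] || echo "$provider: $var is not set"
    done
  done
}
```

A typical use is `check_routing "$SEARCH_ROUTING"` just before starting the server. The messages name only the variable, never its value.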

For per-operation routing (different providers for different search types):

```bash
export SEARCH_ROUTING='{"web":"brave,google","news":"brave,serper","images":"google,brave","academic":"openalex,crossref","patents":"epo,lens,searchapi,uspto","default":"brave,google,searchapi"}'
```

See [docs/DEPLOYMENT.md](DEPLOYMENT.md) for advanced routing configuration.

---

## Choosing a Provider

Not sure which provider to pick? See **[docs/PROVIDERS.md](PROVIDERS.md)** for a full comparison: index classification (own index vs. Google-backed vs. aggregator), capability matrix per tool, free-tier limits, and a quick-pick guide.

**Short recommendation**: Start with Brave (2,000/month free, own independent index) and add Google as a fallback. Use `SEARCH_ROUTING=brave,google` for a good balance of coverage and reliability.

---

## Patent Search Providers (Optional)

These providers enable structured patent search via `patent_search`. All are optional — without them, patent search falls back to web discovery via your configured web search provider.

### EPO Open Patent Services (Worldwide)

Free access to 100M+ patent documents across all major offices.

**Step 1**: Register at [developers.epo.org](https://developers.epo.org) and create an app with "OPS - EPO OPS Core APIs" enabled.

**Step 2**: Configure

```bash
export EPO_OPS_CONSUMER_KEY=your-consumer-key
export EPO_OPS_CONSUMER_SECRET=your-consumer-secret
```

**Notes**: Free tier is rate-limited (throttled, not hard-capped). Authentication uses OAuth2 Client Credentials (handled automatically). Coverage is worldwide — all patent offices.

### USPTO (US Patents)

Access to US patent applications and grants.

**Step 1**: Request an API key at [data.uspto.gov](https://data.uspto.gov).

**Step 2**: Configure

```bash
export USPTO_API_KEY=your-api-key
```

**Notes**: Covers US patents only. Queries for non-US patent offices automatically skip this provider.

### The Lens (Worldwide + Scholarly Links)

Access to 100+ jurisdictions with links between patents and scholarly works.

**Step 1**: Register at [lens.org](https://www.lens.org) and request API access from your account settings.

**Step 2**: Configure

```bash
export LENS_API_TOKEN=your-api-token
```

**Notes**: Free tier allows limited monthly requests. Unique capability: links patents to citing academic papers.

### Patent Routing

When you have multiple patent providers configured, the server tries each one in order — if the first doesn't have results, it moves to the next:

```bash
export SEARCH_ROUTING='{"patents":"epo,lens,searchapi,uspto","default":"brave,google"}'
```

Without explicit routing, all configured patent providers are tried in order until one returns results. The `patent_office` parameter in search requests enables intelligent routing — e.g., a search restricted to `EP` skips USPTO automatically.
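
To make the office-aware skipping concrete, here is a minimal sketch of how a routing list could be filtered by `patent_office`. It illustrates the idea only; the server's real logic may differ. `patent_providers_for_office` is a hypothetical name, and the only rule modeled is the one stated in this guide: USPTO covers US patents only.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of office-aware filtering; the server's real logic may differ.

# $1 = comma-separated patent routing list
# $2 = patent office code (e.g. US, EP); empty means "no restriction"
patent_providers_for_office() {
  local routing="$1" office provider
  local -a providers kept=()
  office="$(printf '%s' "${2:-}" | tr '[:lower:]' '[:upper:]')"
  IFS=',' read -r -a providers <<< "$routing"
  for provider in "${providers[@]}"; do
    # USPTO covers US patents only, so skip it for any other office.
    if [[ "$provider" == "uspto" && -n "$office" && "$office" != "US" ]]; then
      continue
    fi
    kept+=("$provider")
  done
  local IFS=','
  echo "${kept[*]}"   # join the remaining providers with commas
}
```

With the routing list above, a search restricted to `EP` would leave `epo,lens,searchapi`, while a `US` search keeps all four providers.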

---

## Academic Search Providers (Optional)

These providers enable structured scholarly search via `academic_search`. All are optional — without them, academic search falls back to site-restricted web discovery via your configured web search provider.

### OpenAlex (Worldwide — 287M+ Works)

Open scholarly metadata covering all academic disciplines. Returns DOIs, authors with affiliations, citation counts, open-access status, PDF links, and funding data.

**Step 1**: No registration needed — just provide a contact email for the polite pool (100 RPS instead of 10 RPS).

**Step 2**: Configure

```bash
export OPENALEX_EMAIL=you@example.com
```

**Notes**: CC0 data, no API key required. The email is used for the "polite pool" (higher rate limits, priority support). Abstracts are returned in inverted index format and reconstructed automatically.

### CrossRef (Worldwide — 140M+ DOI Works)

Authoritative DOI metadata with 99.94% documented uptime. Returns structured journal metadata, publication dates, and citation counts for peer-reviewed works.

**Step 1**: No registration needed — just provide a contact email for the polite pool (50 RPS instead of 5 RPS).

**Step 2**: Configure

```bash
export CROSSREF_EMAIL=you@example.com
```

**Notes**: The email is used for the polite pool (higher rate limits). CrossRef is a DOI registration agency, so every work whose DOI is registered through CrossRef appears here with authoritative metadata; DOIs issued by other agencies (for example, DataCite) are not covered.

### Semantic Scholar (Worldwide — 200M+ Papers)

Adds AI `tldr` summaries and citation intent/influence signals, and powers `citation_graph` with rich edges. Works **without** a key at a lower shared rate limit; a key raises throughput.

**Step 1**: (Optional) Request a key at [semanticscholar.org/product/api](https://www.semanticscholar.org/product/api).

**Step 2**: Configure (optional)

```bash
export SEMANTIC_SCHOLAR_API_KEY=your-key
```

**Notes**: Keyless use is rate-limited by a shared public pool and may return a `rate_limited` error under load — set a key to avoid this. Also selectable as a `citation_graph` provider.

### PubMed (Biomedical Literature — 35M+ Citations)

NIH's NCBI E-utilities index of biomedical and life-science literature. Works **keyless** at ~3 requests/second; a free API key raises it to ~10 req/s.

**Step 1**: (Optional) Sign in at [ncbi.nlm.nih.gov/account](https://www.ncbi.nlm.nih.gov/account) and go to **Settings → API Key Management** to generate a key.

**Step 2**: Configure (both are optional)

```bash
export PUBMED_API_KEY=your-ncbi-key     # raises rate limit (~10 req/s)
export PUBMED_EMAIL=you@example.com     # NCBI contact param; falls back to OPENALEX_EMAIL when unset
```

**Notes**: Keyless use works out of the box. A key is recommended for sustained or high-volume use. `PUBMED_EMAIL` falls back to `OPENALEX_EMAIL` — setting the OpenAlex email is sufficient to cover both. Also selectable as an `academic_search` provider via `provider: pubmed`.

### Unpaywall (Open-Access Enrichment)

Not a search provider — it fills free-PDF links on DOI-bearing `academic_search` results that lack one. Best-effort; never fails or slows a search beyond its own bounded request.

**Step 1**: No registration — just provide a contact email.

**Step 2**: Configure

```bash
export UNPAYWALL_EMAIL=you@example.com
```

**Notes**: Falls back to `OPENALEX_EMAIL` when unset; a complete no-op when neither is set.
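
The shared email fallback used by PubMed and Unpaywall can be pictured with a short sketch. `resolve_contact_email` is a hypothetical helper that mirrors the documented behavior; it is not the server's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented fallback.

# $1 = name of the provider-specific variable, e.g. PUBMED_EMAIL or UNPAYWALL_EMAIL
# Prints the provider-specific email if set, else OPENALEX_EMAIL, else an empty line.
resolve_contact_email() {
  local var_name="$1"
  local specific="${!var_name:-}"            # indirect expansion: value of the named variable
  echo "${specific:-${OPENALEX_EMAIL:-}}"    # empty output means no email is configured
}
```

In practice this means a single `export OPENALEX_EMAIL=you@example.com` is enough for all three services, and a provider-specific variable is only needed when you want a different contact address for that provider.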

### Academic Routing

When multiple academic providers are configured, the router tries them in priority order with automatic fallback:

```bash
export SEARCH_ROUTING='{"academic":"openalex,crossref","patents":"epo,lens","default":"brave,google"}'
```

Without explicit routing, all configured academic providers are tried in order until one returns results. If no academic providers are configured, `academic_search` automatically falls back to site-restricted web search.

---

## Structured-Domain Providers (Optional)

These enable dedicated structured-research tools. Each is independent; `filing_search` is the only one that requires configuration (a contact email). The rest are always available — see each section for optional keys that raise rate limits or add data sources.

### SEC EDGAR (US Public-Company Filings)

Backs `filing_search`. SEC requires a contact email in the request User-Agent — there is **no API key**.

**Step 1**: No registration. SEC asks only that automated requests identify a contact email.

**Step 2**: Configure

```bash
export EDGAR_CONTACT_EMAIL=you@example.com
```

**Notes**: Falls back to `OPENALEX_EMAIL` if `EDGAR_CONTACT_EMAIL` is unset; `filing_search` registers only when one of the two is set. Returns recent filings or, with `facts=true`, structured XBRL company facts passed through exactly as filed.

### CourtListener (US Case Law)

Backs `legal_search`. Works **keyless** — `legal_search` is always available. An optional token raises the rate limit.

**Step 1**: (Optional) Register at [courtlistener.com](https://www.courtlistener.com) and create an API token in your account settings.

**Step 2**: Configure (optional)

```bash
export COURTLISTENER_API_TOKEN=your-token
```

**Notes**: Without a token, roughly 100 requests/day; a token raises this to ~5,000/day. Coverage is US federal and state court opinions.

### World Bank + OECD + Eurostat (keyless) + FRED (key)

`econ_search` is backed by four providers. **World Bank Open Data**, **OECD**, and **Eurostat** are all keyless and always available. **FRED** (Federal Reserve Economic Data) adds 800K+ US macro series and needs a free key.

- World Bank (`provider: worldbank`) — global development indicators for 200+ economies, scope by `country`
- OECD (`provider: oecd`) — OECD economy indicators via SDMX
- Eurostat (`provider: eurostat`) — European official statistics
- FRED (`provider: fred`) — US macro series (GDP, CPI, unemployment, rates)

So `econ_search` works out of the box (World Bank, OECD, Eurostat); add the FRED key to also reach US macro series.

**FRED — Step 1**: Request a free key at [fred.stlouisfed.org](https://fred.stlouisfed.org/docs/api/api_key.html) (sign in → My Account → API Keys).

**FRED — Step 2**: Configure

```bash
export FRED_API_KEY=your-fred-key
```

**Notes**: World Bank, OECD, and Eurostat require no configuration. FRED is enabled by `FRED_API_KEY`. Observation values pass through exactly as each source returns them — no rounding; the FRED key is sent as a query param and never logged.

### ClinicalTrials.gov (Clinical Trials)

Backs `clinical_search`. Works **keyless** — `clinical_search` is always available. No registration or API key.

**Notes**: Queries the ClinicalTrials.gov v2 API (NIH registry of 400K+ studies). Returns trial registrations as typed data (status, phase, sponsor, conditions, interventions, results availability); read the full record via `scrape_page` on the returned `url`. Discovery + primary-source retrieval only — not medical advice.

### Internet Archive — Save Page Now (Optional, for `archive_source`)

The `archive_source` tool triggers an Internet Archive Save Page Now (SPN) capture. It works **keyless** by default — no registration is required. An optional S3-style key pair raises the rate limit and improves capture reliability for high-volume use.

**Step 1**: (Optional) Sign in at [archive.org/account/s3.php](https://archive.org/account/s3.php) to generate an access/secret key pair.

**Step 2**: Configure (optional; set both keys or neither)

```bash
export IA_ACCESS_KEY=your-ia-access-key
export IA_SECRET_KEY=your-ia-secret-key
```

**Notes**: Both keys are required together — set neither or both. Values are never logged or included in error messages. Keyless SPN is sufficient for occasional archiving; keys are recommended for production deployments that archive frequently.
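
Because a half-set key pair is easy to miss, a small check before startup can catch it. The sketch below is a hypothetical preflight helper (`ia_key_mode` is not part of the server) that reports which mode the current environment implies.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check; not part of web-researcher-mcp.

# Prints "authenticated" when both keys are set and "keyless" when neither is.
# Fails with a message on stderr when only one of the two is set.
ia_key_mode() {
  local access="${IA_ACCESS_KEY:-}" secret="${IA_SECRET_KEY:-}"
  if [[ -n "$access" && -n "$secret" ]]; then
    echo "authenticated"
  elif [[ -z "$access" && -z "$secret" ]]; then
    echo "keyless"
  else
    # The message names the variables only; key values are never printed.
    echo "error: set both IA_ACCESS_KEY and IA_SECRET_KEY, or neither" >&2
    return 1
  fi
}
```

Calling `ia_key_mode || exit 1` in a deployment script stops the rollout early if only one key of the pair made it into the environment.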