---
name: firecrawl-mcp
description: >
  Using the Firecrawl MCP server to scrape, search, crawl, extract, and browse the web.
  Use this skill whenever the Firecrawl MCP tools are available and you need to retrieve
  web content, discover URLs on a site, search the web with full-page content retrieval,
  extract structured data from pages, perform autonomous multi-source web research, or
  interact with web pages through a remote browser sandbox. Trigger this skill for any
  task involving firecrawl_scrape, firecrawl_search, firecrawl_map, firecrawl_crawl,
  firecrawl_extract, firecrawl_agent, or firecrawl_browser_* tools. Also trigger when the
  user asks you to "scrape", "crawl", "map a site", "extract data from a page", "search
  with Firecrawl", or "use the browser sandbox", even if they don't mention Firecrawl by
  name — provided the MCP tools are connected.
---

# Firecrawl MCP — Agent Skill

This skill governs how to use the Firecrawl MCP server tools effectively. It assumes the MCP server is already connected and authenticated.

## Tool inventory

The Firecrawl MCP exposes 12 tools across seven capabilities:

| Capability | Tools | Async? |
|---|---|---|
| **Scrape** | `firecrawl_scrape` | No |
| **Search** | `firecrawl_search` | No |
| **Map** | `firecrawl_map` | No |
| **Crawl** | `firecrawl_crawl`, `firecrawl_check_crawl_status` | Yes |
| **Extract** | `firecrawl_extract` | No |
| **Agent** | `firecrawl_agent`, `firecrawl_agent_status` | Yes |
| **Browser** | `firecrawl_browser_create`, `firecrawl_browser_execute`, `firecrawl_browser_delete`, `firecrawl_browser_list` | Session |

## Choosing the right tool

Apply this decision tree top-to-bottom. Pick the **first** match.

1. **You have a single URL and need its content** → `firecrawl_scrape`
2. **You need to find pages on the open web by query** → `firecrawl_search`
3. **You need to discover URLs within a single domain** → `firecrawl_map`
4. **You need content from many pages under one domain** → `firecrawl_crawl`
5. **You need structured fields from one or more known URLs** → `firecrawl_extract`
6. **You have a complex, open-ended research question spanning multiple unknown sources** → `firecrawl_agent`
7. **You need to interact with a page (fill forms, click, authenticate)** → `firecrawl_browser_*`

When in doubt between scrape and search: if you already have the URL, scrape. If you need to find the URL first, search.

When in doubt between extract and scrape-with-JSON-format: `firecrawl_extract` operates on multiple URLs and uses Firecrawl's server-side LLM. The JSON format on `firecrawl_scrape` works on a single page and also uses server-side extraction. Prefer `firecrawl_extract` when pulling uniform structured data from several pages. Prefer scrape with JSON format when you want markdown *and* structured data from the same single page in one call.

When in doubt between crawl and map-then-scrape: crawl is a single async job that handles traversal and scraping together. Map-then-scrape gives you more control (you can filter the URL list before scraping selectively). Prefer map-then-scrape when you only need a subset of pages; prefer crawl when you want everything under a domain up to a depth/limit.
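As an illustration, a map-then-scrape pass is just two calls. The URL and `search` term below are placeholders; the parameters are the same ones described in Patterns 1 and 3 below:

```json
{
  "name": "firecrawl_map",
  "arguments": {
    "url": "https://docs.example.com",
    "search": "webhooks",
    "limit": 50
  }
}
```

Then, after filtering the returned URL list, scrape only the pages you actually need:

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://docs.example.com/guides/webhooks",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
```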
## Credit costs — be frugal

Every tool call consumes API credits. Minimise unnecessary calls.

| Tool | Base cost |
|---|---|
| `firecrawl_scrape` | 1 credit per page |
| `firecrawl_search` | 1 credit per result (+ scrape costs if `scrapeOptions` used) |
| `firecrawl_map` | 1 credit per call (regardless of URL count returned) |
| `firecrawl_crawl` | 1 credit per page crawled |
| `firecrawl_extract` | Varies; LLM extraction adds cost |
| `firecrawl_agent` | Varies by research scope |
| `firecrawl_browser_*` | Session-based billing |

**Additional surcharges:** JSON mode adds 4 credits/page. Enhanced proxy adds 4 credits/page. PDF parsing adds 1 credit per PDF page.

Always set `limit` on crawl and map calls. The default crawl limit is 10,000 pages — a runaway crawl will burn through credits fast. Start with a low limit (10–50) and increase only if needed.

## Core patterns

### Pattern 1: Scrape a known URL

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/pricing",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
```

Set `onlyMainContent: true` to strip nav, footer, and sidebar boilerplate. This reduces token count and improves downstream processing.

**Available formats:** `markdown`, `html`, `rawHtml`, `screenshot`, `links`, `json`, `images`, `branding`, `summary`. Request only the formats you need. Multiple formats in one call are fine — the page is fetched once.

Firecrawl handles JavaScript rendering and dynamic content automatically. If a standard scrape fails or returns incomplete content, consider using `waitFor` (milliseconds) to let JS finish, or use `actions` for pages that need interaction before content appears.

→ For full scrape options, read `references/scrape-options.md`.

### Pattern 2: Search the web

```json
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "Rust async runtime benchmarks 2025",
    "limit": 5
  }
}
```

Without `scrapeOptions`, search returns metadata only (URL, title, description, position). Add `scrapeOptions` to get full page content from each result in one operation — but note this multiplies credit cost.

**Time-based filtering** with `tbs`: `qdr:d` (past day), `qdr:w` (past week), `qdr:m` (past month). Essential for finding recent content.

**Source types** via `sources`: `["web"]` (default), `["news"]`, `["images"]`, or combinations. The `limit` applies per source type.

**Category filtering** via `categories`: `["github"]`, `["research"]`, `["pdf"]`. Narrows results to specific domains (GitHub repos, academic sites, PDF documents respectively).

→ For full search options, read `references/search-options.md`.

### Pattern 3: Map a site's URL structure

```json
{
  "name": "firecrawl_map",
  "arguments": {
    "url": "https://docs.example.com",
    "search": "authentication",
    "limit": 100
  }
}
```

Map returns an array of URLs (with optional title/description). It does **not** return page content. Use it as a reconnaissance step before selective scraping.

The `search` parameter filters returned URLs by relevance to a term — useful when you only need the authentication docs from a large site, for instance. Set `ignoreQueryParameters: true` to deduplicate URLs that differ only by query string.

### Pattern 4: Crawl an entire site (async)

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://docs.example.com",
    "maxDiscoveryDepth": 2,
    "limit": 50,
    "deduplicateSimilarURLs": true
  }
}
```

Crawl is **asynchronous**. It returns a job ID immediately. Poll with `firecrawl_check_crawl_status` using that ID. Allow 15–30 seconds between polls. The status will be `scraping`, `completed`, or `failed`.
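A status check is a single call with the job ID returned by the crawl. The `id` parameter name here is an assumption for illustration; confirm the exact argument shape in `references/crawl-options.md`:

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "<job ID returned by firecrawl_crawl>"
  }
}
```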
By default, crawl stays within the URL's path hierarchy. Set `allowExternalLinks: true` to follow links to other domains (use with caution — credit implications). Set `allowSubdomains: true` to include subdomains like `blog.example.com` when crawling `example.com`.

All scrape options (formats, `onlyMainContent`, actions, location, tags) can be passed via `scrapeOptions` and apply to every page the crawler visits.

→ For full crawl options, read `references/crawl-options.md`.

### Pattern 5: Extract structured data

```json
{
  "name": "firecrawl_extract",
  "arguments": {
    "urls": ["https://example.com/product/1", "https://example.com/product/2"],
    "prompt": "Extract the product name, price, and availability status",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["name", "price"]
    }
  }
}
```

The `schema` follows JSON Schema format. If omitted, the LLM chooses its own structure guided by `prompt`. Providing a schema is strongly recommended for consistent, parseable output.

### Pattern 6: Autonomous research agent (async)

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Find the pricing tiers and feature limits for Vercel, Netlify, and Cloudflare Pages. Compare them.",
    "schema": {
      "type": "object",
      "properties": {
        "providers": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "tiers": { "type": "array", "items": { "type": "object" } }
            }
          }
        }
      }
    }
  }
}
```

The agent is async — it returns a job ID. Poll `firecrawl_agent_status` every 15–30 seconds. Allow at least 2–3 minutes before treating it as failed.

The agent autonomously searches, navigates, and extracts. Provide `urls` to focus the agent on specific pages. Omit `urls` to let it search freely. The `prompt` is limited to 10,000 characters.

Best for: complex cross-site research where you don't know the exact URLs in advance, or where content is spread across many pages.

### Pattern 7: Browser sandbox sessions

For interactive web tasks (form filling, authentication, multi-step navigation), use the browser sandbox.

**Lifecycle:**

1. `firecrawl_browser_create` — start a session (returns session ID)
2. `firecrawl_browser_execute` — run code in the session (repeatable)
3. `firecrawl_browser_delete` — destroy the session when finished

**Always delete sessions when done.** Sessions have TTLs but leaving them open wastes resources.
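A minimal sketch of that lifecycle as tool calls. The argument names (`sessionId`, `code`) are assumptions used only to show the shape of the sequence; the real argument names and the supported execute languages are documented in `references/browser-options.md`:

```json
[
  { "name": "firecrawl_browser_create", "arguments": {} },
  {
    "name": "firecrawl_browser_execute",
    "arguments": {
      "sessionId": "<session ID returned by firecrawl_browser_create>",
      "code": "<script to run inside the session>"
    }
  },
  {
    "name": "firecrawl_browser_delete",
    "arguments": {
      "sessionId": "<session ID returned by firecrawl_browser_create>"
    }
  }
]
```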
→ For full browser options and commands, read `references/browser-options.md`.

## Handling asynchronous tools

Both `firecrawl_crawl` and `firecrawl_agent` are async. The workflow is:

1. Call the tool → receive a job ID.
2. Poll the status tool with that ID every 15–30 seconds.
3. On `completed`, the response includes the results.
4. On `failed`, report the error. Consider retrying with adjusted parameters.

Do not poll more frequently than every 15 seconds — it wastes rate-limit budget and the status endpoints have their own rate limits.

## Error handling

The MCP server handles retries internally with exponential backoff (default: 3 attempts, starting at 1s, doubling each time, capped at 10s). If a call still fails after retries, you will receive an error response. Common errors:

- **Rate limit exceeded:** Back off and retry after the indicated delay. Check whether you're making unnecessary calls that can be consolidated.
- **Credit limit warnings:** The server emits warnings at configurable thresholds. If you see a credit warning, inform the user and stop non-essential operations.
- **Timeout:** Increase the `timeout` parameter or simplify the request (fewer actions, simpler schema, lower page count).

## Anti-patterns

- **Scraping then extracting the same page:** Use `firecrawl_scrape` with `formats: ["markdown", "json"]` to get both in one call, or use `firecrawl_extract` if you only need structured data.
- **Crawling an entire domain to find one page:** Use `firecrawl_map` with the `search` parameter first, then scrape the specific URL.
- **Polling status every 2 seconds:** Wastes rate-limit budget. Use 15–30 second intervals.
- **Omitting `limit` on crawl:** The default is 10,000 pages. Always set an explicit limit.
- **Using `firecrawl_agent` for single-page tasks:** The agent is designed for multi-source research. For single pages, `firecrawl_scrape` or `firecrawl_extract` are faster, cheaper, and more predictable.
- **Requesting `rawHtml` when `markdown` suffices:** `rawHtml` is large and rarely needed for LLM consumption. Use `markdown` by default; `html` (cleaned) if you need structure; `rawHtml` only for debugging or when you need the exact original markup.
- **Leaving browser sessions open:** Always call `firecrawl_browser_delete` when your task is complete. Use `firecrawl_browser_list` to check for orphaned sessions.

## Caching

Firecrawl caches scraped pages with a default freshness window of 2 days (`maxAge: 172800000` ms). Cached responses are significantly faster (up to 5×). Set `maxAge: 0` to force a fresh scrape — but only when you genuinely need the absolute latest content. A non-zero `maxAge` is almost always the right choice.

## Reference files

For detailed parameter documentation on each tool, read the appropriate reference file:

| File | Contents |
|---|---|
| `references/scrape-options.md` | All `firecrawl_scrape` parameters, formats, actions, and location settings |
| `references/search-options.md` | All `firecrawl_search` parameters, source types, categories, and scrape integration |
| `references/crawl-options.md` | All `firecrawl_crawl` parameters, path filtering, scope, and status polling |
| `references/browser-options.md` | Browser session lifecycle, execute languages, agent-browser commands, TTL config |
| `references/extract-agent-options.md` | `firecrawl_extract` schema design and `firecrawl_agent` usage patterns |

Read these files when you need parameter-level detail beyond what this document covers. For most tasks, the patterns above are sufficient.
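To close with one concrete instance of the caching guidance above: forcing a fresh scrape is just the standard Pattern 1 call with `maxAge` set to zero (the URL is illustrative).

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/status",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "maxAge": 0
  }
}
```

In every other case, keep the default; cached responses are much faster and the content is usually fresh enough.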