---
name: enact-firecrawl
version: 1.2.1
description: Scrape, crawl, search, and extract structured data from websites using Firecrawl API - converts web pages to LLM-ready markdown
enact: "2.0"
from: python:3.12-slim
build:
  - pip install requests
env:
  FIRECRAWL_API_KEY:
    description: Your Firecrawl API key from firecrawl.dev
    secret: true
command: python /workspace/firecrawl.py ${action} ${url} ${formats} ${limit} ${only_main_content} ${prompt} ${schema}
timeout: 300s
license: MIT
tags:
  - web-scraping
  - crawling
  - markdown
  - llm
  - ai
  - data-extraction
  - search
  - structured-data
annotations:
  readOnlyHint: true
  openWorldHint: true
inputSchema:
  type: object
  properties:
    action:
      type: string
      description: |
        The action to perform:
        - scrape: Extract content from a single URL
        - crawl: Discover and scrape all subpages of a website
        - map: Get all URLs from a website (fast discovery)
        - search: Search the web and get scraped results
        - extract: Extract structured data using AI
      enum:
        - scrape
        - crawl
        - map
        - search
        - extract
      default: scrape
    url:
      type: string
      description: The URL to process (for scrape, crawl, map, extract) or search query (for search action)
    formats:
      type: string
      description: Comma-separated output formats (markdown, html, links, screenshot). Used by scrape and crawl actions.
      default: markdown
    limit:
      type: integer
      description: Maximum number of pages to crawl (crawl action) or search results to return (search action)
      default: 10
    only_main_content:
      type: boolean
      description: Extract only the main content, excluding headers, navs, footers (scrape action)
      default: true
    prompt:
      type: string
      description: |
        Multi-purpose field:
        - For map: Search query to filter URLs
        - For extract: Natural language instruction for what to extract
      default: ""
    schema:
      type: string
      description: JSON schema string for structured extraction (extract action only). Define the shape of data you want to extract.
      default: ""
  required:
    - url
outputSchema:
  type: object
  properties:
    success:
      type: boolean
      description: Whether the operation succeeded
    action:
      type: string
      description: The action that was performed
    url:
      type: string
      description: The URL or query that was processed
    data:
      type: object
      description: The scraped/crawled/extracted data including markdown, metadata, and structured content
    error:
      type: string
      description: Error message if the operation failed
examples:
  - input:
      url: "https://example.com"
      action: "scrape"
    description: Scrape a single page and get markdown
  - input:
      url: "https://docs.example.com"
      action: "crawl"
      limit: 5
    description: Crawl a documentation site (up to 5 pages)
  - input:
      url: "https://example.com"
      action: "map"
    description: Get all URLs from a website
  - input:
      url: "latest AI news"
      action: "search"
      limit: 5
    description: Search the web and get scraped results
  - input:
      url: "https://news.ycombinator.com"
      action: "extract"
      prompt: "Extract the top 5 news headlines with their URLs and point counts"
    description: Extract structured data from a page using AI
---

# Firecrawl Web Scraping Tool

A powerful web scraping tool that uses the [Firecrawl API](https://firecrawl.dev) to convert websites into clean, LLM-ready markdown and extract structured data.
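The manifest's `command` passes every input positionally to `firecrawl.py` in the order `action url formats limit only_main_content prompt schema`. As a rough sketch of how such an entry point might forward a `scrape` request (the endpoint path, payload field names, and response shape below are assumptions based on Firecrawl's public v1 REST API, not the shipped script):

```python
# Illustrative sketch only; the real /workspace/firecrawl.py is not shown here.
import json
import os
import sys

import requests


def main() -> None:
    # Argument order mirrors the command template in the manifest; all seven are expected.
    action, url, formats, limit, only_main_content, prompt, schema = sys.argv[1:8]

    headers = {"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"}

    if action == "scrape":
        # Assumed Firecrawl v1 scrape endpoint and payload shape.
        payload = {
            "url": url,
            "formats": [f.strip() for f in formats.split(",") if f.strip()],
            "onlyMainContent": only_main_content.lower() == "true",
        }
        resp = requests.post(
            "https://api.firecrawl.dev/v1/scrape",
            json=payload, headers=headers, timeout=120,
        )
        resp.raise_for_status()
        result = {"success": True, "action": action, "url": url,
                  "data": resp.json().get("data", {})}
    else:
        result = {"success": False, "action": action, "url": url,
                  "error": f"action {action!r} not covered in this sketch"}

    # Emit the result object described by outputSchema above.
    print(json.dumps(result))


if __name__ == "__main__":
    main()
```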
## Features

- **Scrape**: Extract content from a single URL as markdown, HTML, or with screenshots
- **Crawl**: Automatically discover and scrape all accessible subpages of a website
- **Map**: Get a list of all URLs from a website without scraping content (extremely fast)
- **Search**: Search the web and get full scraped content from results
- **Extract**: Use AI to extract structured data from pages with natural language prompts

## Setup

1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
2. Set your API key as a secret:

```bash
enact env set FIRECRAWL_API_KEY --secret --namespace enact
```

This stores your API key securely in your OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service).

## Usage Examples

### CLI

#### Scrape a single page

```bash
enact run enact/firecrawl -a '{"url": "https://example.com", "action": "scrape"}'
```

#### Crawl an entire documentation site

```bash
enact run enact/firecrawl -a '{"url": "https://docs.example.com", "action": "crawl", "limit": 20}'
```

#### Map all URLs on a website

```bash
enact run enact/firecrawl -a '{"url": "https://example.com", "action": "map"}'
```

#### Search the web

```bash
enact run enact/firecrawl -a '{"url": "latest AI developments 2024", "action": "search", "limit": 5}'
```

#### Extract structured data with AI

```bash
enact run enact/firecrawl -a '{"url": "https://news.ycombinator.com", "action": "extract", "prompt": "Extract the top 10 news headlines with their URLs"}'
```

#### Extract with a JSON schema

```bash
enact run enact/firecrawl -a '{
  "url": "https://example.com/pricing",
  "action": "extract",
  "prompt": "Extract pricing information",
  "schema": "{\"type\":\"object\",\"properties\":{\"plans\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"name\":{\"type\":\"string\"},\"price\":{\"type\":\"string\"}}}}}}"
}'
```

### MCP (for LLMs/Agents)

When using this tool via MCP, call `enact__firecrawl` with these parameters:

#### Scrape a single page

Call with `url` set to the target URL and `action` set to `"scrape"`.

#### Crawl a documentation site

Call with `url`, `action` set to `"crawl"`, and `limit` to control the maximum number of pages.

#### Map all URLs on a website

Call with `url` and `action` set to `"map"` to discover all URLs without scraping content.

#### Search the web

Call with `url` set to your search query (e.g., "latest AI news") and `action` set to `"search"`. Use `limit` to control result count.

#### Extract structured data with AI

Call with `url`, `action` set to `"extract"`, and `prompt` describing what data to extract. Optionally provide a `schema` for structured output.

## Output

The tool returns JSON with:

- **markdown**: Clean, LLM-ready content
- **metadata**: Title, description, language, source URL
- **extract**: Structured data (for extract action)
- **links**: Discovered URLs (for map action)

## API Features

Firecrawl handles the hard parts of web scraping:

- Anti-bot mechanisms
- Dynamic JavaScript content
- Proxies and rate limiting
- PDF and document parsing
- Screenshot capture
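If you want to drive the tool from another Python program rather than the CLI or MCP, one option is to shell out to `enact run` and parse the JSON it prints. The snippet below is a sketch under that assumption (that the CLI writes the result object described under Output to stdout); the `firecrawl` helper is hypothetical, not part of this package:

```python
# Hypothetical helper; assumes `enact run` prints the result JSON on stdout.
import json
import subprocess


def firecrawl(args: dict) -> dict:
    """Invoke enact/firecrawl with the given input object and return the parsed result."""
    proc = subprocess.run(
        ["enact", "run", "enact/firecrawl", "-a", json.dumps(args)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    result = firecrawl({"url": "https://example.com", "action": "scrape"})
    if result.get("success"):
        # Print the first 500 characters of the scraped markdown.
        print(result.get("data", {}).get("markdown", "")[:500])
    else:
        print("scrape failed:", result.get("error"))
```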