--- name: firecrawl-known-pitfalls description: 'Identify and avoid Firecrawl anti-patterns and common integration mistakes. Use when reviewing Firecrawl code, onboarding new developers, or auditing existing integrations for best practices violations. Trigger with phrases like "firecrawl mistakes", "firecrawl anti-patterns", "firecrawl pitfalls", "firecrawl what not to do", "firecrawl code review". ' allowed-tools: Read, Grep version: 1.0.0 license: MIT author: Jeremy Longshore tags: - saas - firecrawl - audit compatibility: Designed for Claude Code, also compatible with Codex and OpenClaw --- # Firecrawl Known Pitfalls ## Overview Real gotchas from production Firecrawl integrations. Each pitfall includes the bad pattern, why it fails, and the correct approach. Use this as a code review checklist. ## Pitfall 1: Unbounded Crawl (Credit Bomb) ```typescript import FirecrawlApp from "@mendable/firecrawl-js"; const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY!, }); // BAD: no limit — a docs site with 50K pages burns your entire credit balance await firecrawl.crawlUrl("https://docs.large-project.org"); // GOOD: always set limit, maxDepth, and path filters await firecrawl.crawlUrl("https://docs.large-project.org", { limit: 100, maxDepth: 3, includePaths: ["/api/*", "/guides/*"], excludePaths: ["/changelog/*", "/blog/*"], scrapeOptions: { formats: ["markdown"] }, }); ``` ## Pitfall 2: Not Specifying Output Format ```typescript // BAD: default format may not include markdown const result = await firecrawl.scrapeUrl("https://example.com"); console.log(result.markdown); // might be undefined! // GOOD: explicitly request the format you need const result = await firecrawl.scrapeUrl("https://example.com", { formats: ["markdown"], onlyMainContent: true, }); console.log(result.markdown); // guaranteed present ``` ## Pitfall 3: Not Waiting for JS-Heavy Pages ```typescript // BAD: SPAs show loading state, not content const result = await firecrawl.scrapeUrl("https://app.example.com/dashboard"); // result.markdown === "Loading..." or empty // GOOD: wait for JS to render const result = await firecrawl.scrapeUrl("https://app.example.com/dashboard", { formats: ["markdown"], waitFor: 5000, // wait 5s for JS rendering onlyMainContent: true, }); // BETTER: wait for a specific element const result = await firecrawl.scrapeUrl("https://app.example.com/dashboard", { formats: ["markdown"], actions: [ { type: "wait", selector: ".main-content" }, ], }); ``` ## Pitfall 4: Wrong Package Name / Import ```typescript // BAD: these packages don't exist or are wrong import FirecrawlApp from "firecrawl-js"; // wrong import { FireCrawlClient } from "@firecrawl/sdk"; // wrong // GOOD: the correct npm package import FirecrawlApp from "@mendable/firecrawl-js"; // correct! // Install: npm install @mendable/firecrawl-js ``` ## Pitfall 5: Polling Too Aggressively ```typescript // BAD: polling every 100ms wastes resources and may trigger rate limits let status = await firecrawl.checkCrawlStatus(jobId); while (status.status !== "completed") { status = await firecrawl.checkCrawlStatus(jobId); // No delay! Hammering the API } // GOOD: poll with backoff let status = await firecrawl.checkCrawlStatus(jobId); let interval = 2000; while (status.status === "scraping") { await new Promise(r => setTimeout(r, interval)); status = await firecrawl.checkCrawlStatus(jobId); interval = Math.min(interval * 1.5, 30000); // back off to 30s } ``` ## Pitfall 6: No Error Handling on Scrape ```typescript // BAD: assuming scrape always succeeds const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] }); processContent(result.markdown!); // crashes if scrape failed // GOOD: check result and handle failures const result = await firecrawl.scrapeUrl(url, { formats: ["markdown"] }); if (!result.success || !result.markdown || result.markdown.length < 50) { console.error(`Scrape failed or empty for ${url}`); return null; } processContent(result.markdown); ``` ## Pitfall 7: Ignoring includePaths Start URL Match ```typescript // BAD: start URL doesn't match includePaths — crawl returns 0 pages await firecrawl.crawlUrl("https://example.com/docs/intro", { includePaths: ["/api/*"], // start URL /docs/intro doesn't match /api/* limit: 50, }); // GOOD: start URL must match (or omit) the include pattern await firecrawl.crawlUrl("https://example.com", { includePaths: ["/docs/*", "/api/*"], // start from root, filter paths limit: 50, }); ``` ## Pitfall 8: Requesting Screenshots Unnecessarily ```typescript // BAD: screenshots are expensive (latency and bandwidth) await firecrawl.scrapeUrl(url, { formats: ["markdown", "html", "screenshot"], // screenshot adds 5-10s to every scrape }); // GOOD: only request screenshot when you actually need visual capture await firecrawl.scrapeUrl(url, { formats: ["markdown"], // just what you need onlyMainContent: true, }); ``` ## Pitfall 9: Not Using Batch for Multiple URLs ```typescript // BAD: sequential scrapes (slow, N API calls) const results = []; for (const url of urls) { results.push(await firecrawl.scrapeUrl(url, { formats: ["markdown"] })); } // GOOD: batch scrape (1 API call, internally parallel) const batchResult = await firecrawl.batchScrapeUrls(urls, { formats: ["markdown"], onlyMainContent: true, }); ``` ## Pitfall 10: Not Validating Extracted Content ```typescript // BAD: trusting LLM extraction blindly const result = await firecrawl.scrapeUrl(url, { formats: ["extract"], extract: { schema: productSchema }, }); await db.insert(result.extract); // could be null, malformed, or hallucinated // GOOD: validate with Zod before persisting import { z } from "zod"; const ProductSchema = z.object({ name: z.string().min(1), price: z.number().positive(), }); const parsed = ProductSchema.safeParse(result.extract); if (parsed.success) { await db.insert(parsed.data); } else { console.error("Extraction validation failed:", parsed.error.issues); } ``` ## Code Review Checklist - [ ] All `crawlUrl` calls have `limit` set - [ ] `formats` explicitly specified (never rely on defaults) - [ ] `waitFor` or `actions` used for SPAs - [ ] Import is `@mendable/firecrawl-js` - [ ] Async crawl polls with backoff, not tight loop - [ ] Scrape result checked for success and content length - [ ] Batch scrape used for multiple known URLs - [ ] Extract results validated before persistence - [ ] Error handling for 429, 402, and empty content ## Resources - [Firecrawl Docs](https://docs.firecrawl.dev) - [Scrape vs Crawl](https://docs.firecrawl.dev/features/crawl) - [Advanced Scraping Guide](https://docs.firecrawl.dev/advanced-scraping-guide) ## Next Steps For reference architecture, see `firecrawl-reference-architecture`.