--- name: camoufox-stealth-browser homepage: https://github.com/kesslerio/camoufox-stealth-browser-clawhub-skill description: C++ level anti-bot browser automation using Camoufox (patched Firefox) in isolated containers. Bypasses Cloudflare Turnstile, Datadome, Airbnb, Yelp. Superior to Chrome-based solutions (undetected-chromedriver, puppeteer-stealth) which only patch at JS level. Use when standard Playwright/Selenium gets blocked. metadata: openclaw: emoji: "🦊" requires: bins: ["distrobox"] env: [] --- # Camoufox Stealth Browser 🦊 **C++ level** anti-bot evasion using Camoufox — a custom Firefox fork with stealth patches compiled into the browser itself, not bolted on via JavaScript. ## Why Camoufox > Chrome-based Solutions | Approach | Detection Level | Tools | |----------|-----------------|-------| | **Camoufox (this skill)** | C++ compiled patches | Undetectable fingerprints baked into browser | | undetected-chromedriver | JS runtime patches | Can be detected by timing analysis | | puppeteer-stealth | JS injection | Patches applied after page load = detectable | | playwright-stealth | JS injection | Same limitations | **Camoufox patches Firefox at the source code level** — WebGL, Canvas, AudioContext fingerprints are genuinely spoofed, not masked by JavaScript overrides that anti-bot systems can detect. ## Key Advantages 1. **C++ Level Stealth** — Fingerprint spoofing compiled into the browser, not JS hacks 2. **Container Isolation** — Runs in distrobox, keeping your host system clean 3. **Dual-Tool Approach** — Camoufox for browsers, curl_cffi for API-only (no browser overhead) 4. **Firefox-Based** — Less fingerprinted than Chrome (everyone uses Chrome for bots) ## When to Use - Standard Playwright/Selenium gets blocked - Site shows Cloudflare challenge or "checking your browser" - Need to scrape Airbnb, Yelp, or similar protected sites - `puppeteer-stealth` or `undetected-chromedriver` stopped working - You need **actual** stealth, not JS band-aids ## Tool Selection | Tool | Level | Best For | |------|-------|----------| | **Camoufox** | C++ patches | All protected sites - Cloudflare, Datadome, Yelp, Airbnb | | **curl_cffi** | TLS spoofing | API endpoints only - no JS needed, very fast | ## Quick Start All scripts run in `pybox` distrobox for isolation. ⚠️ **Use `python3.14` explicitly** - pybox may have multiple Python versions with different packages installed. ### 1. Setup (First Time) ```bash # Install tools in pybox (use python3.14) distrobox-enter pybox -- python3.14 -m pip install camoufox curl_cffi # Camoufox browser downloads automatically on first run (~700MB Firefox fork) ``` ### 2. Fetch a Protected Page **Browser (Camoufox):** ```bash distrobox-enter pybox -- python3.14 scripts/camoufox-fetch.py "https://example.com" --headless ``` **API only (curl_cffi):** ```bash distrobox-enter pybox -- python3.14 scripts/curl-api.py "https://api.example.com/endpoint" ``` ## Architecture ``` ┌─────────────────────────────────────────────────────────┐ │ OpenClaw Agent │ ├─────────────────────────────────────────────────────────┤ │ distrobox-enter pybox -- python3.14 scripts/xxx.py │ ├─────────────────────────────────────────────────────────┤ │ pybox Container │ │ ┌─────────────┐ ┌─────────────┐ │ │ │ Camoufox │ │ curl_cffi │ │ │ │ (Firefox) │ │ (TLS spoof)│ │ │ └─────────────┘ └─────────────┘ │ └─────────────────────────────────────────────────────────┘ ``` ## Tool Details ### Camoufox - **What:** Custom Firefox build with C++ level stealth patches - **Pros:** Best fingerprint evasion, passes Turnstile automatically - **Cons:** ~700MB download, Firefox-based - **Best for:** All protected sites - Cloudflare, Datadome, Yelp, Airbnb ### curl_cffi - **What:** Python HTTP client with browser TLS fingerprint spoofing - **Pros:** No browser overhead, very fast - **Cons:** No JS execution, API endpoints only - **Best for:** Known API endpoints, mobile app reverse engineering ## Critical: Proxy Requirements **Datacenter IPs (AWS, DigitalOcean) = INSTANT BLOCK on Airbnb/Yelp** You MUST use residential or mobile proxies: ```python # Example proxy config proxy = "http://user:pass@residential-proxy.example.com:8080" ``` See **[references/proxy-setup.md](references/proxy-setup.md)** for proxy configuration. ## Behavioral Tips Sites like Airbnb/Yelp use behavioral analysis. To avoid detection: 1. **Warm up:** Don't hit target URL directly. Visit homepage first, scroll, click around. 2. **Mouse movements:** Inject random mouse movements (Camoufox handles this). 3. **Timing:** Add random delays (2-5s between actions), not fixed intervals. 4. **Session stickiness:** Use same proxy IP for 10-30 min sessions, don't rotate every request. ## Headless Mode Warning ⚠️ Old `--headless` flag is DETECTED. Options: 1. **New Headless:** Use `headless="new"` (Chrome 109+) 2. **Xvfb:** Run headed browser in virtual display 3. **Headed:** Just run headed if you can (most reliable) ```bash # Xvfb approach (Linux) Xvfb :99 -screen 0 1920x1080x24 & export DISPLAY=:99 python scripts/camoufox-fetch.py "https://example.com" ``` ## Troubleshooting | Problem | Solution | |---------|----------| | "Access Denied" immediately | Use residential proxy | | Cloudflare challenge loops | Try Camoufox instead of Nodriver | | Browser crashes in pybox | Install missing deps: `sudo dnf install gtk3 libXt` | | TLS fingerprint blocked | Use curl_cffi with `impersonate="chrome120"` | | Turnstile checkbox appears | Add mouse movement, increase wait time | | `ModuleNotFoundError: camoufox` | Use `python3.14` not `python` or `python3` | | `greenlet` segfault (exit 139) | Python version mismatch - use `python3.14` explicitly | | `libstdc++.so.6` errors | NixOS lib path issue - use `python3.14` in pybox | ### Python Version Issues (NixOS/pybox) The `pybox` container may have multiple Python versions with separate site-packages: ```bash # Check which Python has camoufox distrobox-enter pybox -- python3.14 -c "import camoufox; print('OK')" # Wrong (may use different Python) distrobox-enter pybox -- python3.14 scripts/camoufox-session.py ... # Correct (explicit version) distrobox-enter pybox -- python3.14 scripts/camoufox-session.py ... ``` If you get segfaults or import errors, always use `python3.14` explicitly. ## Examples ### Scrape Airbnb Listing ```bash distrobox-enter pybox -- python3.14 scripts/camoufox-fetch.py \ "https://www.airbnb.com/rooms/12345" \ --headless --wait 10 \ --screenshot airbnb.png ``` ### Scrape Yelp Business ```bash distrobox-enter pybox -- python3.14 scripts/camoufox-fetch.py \ "https://www.yelp.com/biz/some-restaurant" \ --headless --wait 8 \ --output yelp.html ``` ### API Scraping with TLS Spoofing ```bash distrobox-enter pybox -- python3.14 scripts/curl-api.py \ "https://api.yelp.com/v3/businesses/search?term=coffee&location=SF" \ --headers '{"Authorization": "Bearer xxx"}' ``` ## Session Management Persistent sessions allow reusing authenticated state across runs without re-logging in. ### Quick Start ```bash # 1. Login interactively (headed browser opens) distrobox-enter pybox -- python3.14 scripts/camoufox-session.py \ --profile airbnb --login "https://www.airbnb.com/account-settings" # Complete login in browser, then press Enter to save session # 2. Reuse session in headless mode distrobox-enter pybox -- python3.14 scripts/camoufox-session.py \ --profile airbnb --headless "https://www.airbnb.com/trips" # 3. Check session status distrobox-enter pybox -- python3.14 scripts/camoufox-session.py \ --profile airbnb --status "https://www.airbnb.com" ``` ### Flags | Flag | Description | |------|-------------| | `--profile NAME` | Named profile for session storage (required) | | `--login` | Interactive login mode - opens headed browser | | `--headless` | Use saved session in headless mode | | `--status` | Check if session appears valid | | `--export-cookies FILE` | Export cookies to JSON for backup | | `--import-cookies FILE` | Import cookies from JSON file | ### Storage - **Location:** `~/.stealth-browser/profiles//` - **Permissions:** Directory `700`, files `600` - **Profile names:** Letters, numbers, `_`, `-` only (1-63 chars) ### Cookie Handling - **Save:** All cookies from all domains stored in browser profile - **Restore:** Only cookies matching target URL domain are used - **SSO:** If redirected to Google/auth domain, re-authenticate once and profile updates ### Login Wall Detection The script detects session expiry using multiple signals: 1. **HTTP status:** 401, 403 2. **URL patterns:** `/login`, `/signin`, `/auth` 3. **Title patterns:** "login", "sign in", etc. 4. **Content keywords:** "captcha", "verify", "authenticate" 5. **Form detection:** Password input fields If detected during `--headless` mode, you'll see: ``` 🔒 Login wall signals: url-path, password-form ``` Re-run with `--login` to refresh the session. ### Remote Login (SSH) Since `--login` requires a visible browser, you need display forwarding: **X11 Forwarding (Preferred):** ```bash # Connect with X11 forwarding ssh -X user@server # Run login (opens browser on your local machine) distrobox-enter pybox -- python3.14 scripts/camoufox-session.py \ --profile mysite --login "https://example.com" ``` **VNC Alternative:** ```bash # On server: start VNC session vncserver :1 # On client: connect to VNC vncviewer server:1 # In VNC session: run login distrobox-enter pybox -- python3.14 scripts/camoufox-session.py \ --profile mysite --login "https://example.com" ``` ### Security Notes ⚠️ **Cookies are credentials.** Treat profile directories like passwords: - Profile dirs have `chmod 700` (owner only) - Cookie exports have `chmod 600` - Don't share profiles or exported cookies over insecure channels - Consider encrypting backups ### Limitations | Limitation | Reason | |------------|--------| | localStorage/sessionStorage not exported | Use browser profile instead (handles automatically) | | IndexedDB not portable | Stored in browser profile, not cookie export | | No parallel profile access | No file locking in v1; use one process per profile | ## References - [references/proxy-setup.md](references/proxy-setup.md) — Proxy configuration guide - [references/fingerprint-checks.md](references/fingerprint-checks.md) — What anti-bot systems check