# CHANGELOG

## Unreleased

## v2.45.0

1. chore(deps): bump hashbrown 0.15 → 0.16
1. chore(deps): bump strum 0.26 → 0.27
1. chore(deps): bump thiserror 1 → 2
1. chore(deps): bump indexmap 1 → 2
1. chore(deps): bump async-openai 0.29 → 0.32
1. chore(deps): bump tiktoken-rs 0.7 → 0.9
1. chore(deps): bump sysinfo 0.35 → 0.38
1. chore(deps): bump quick-xml 0.38 → 0.39
1. chore(deps): bump criterion 0.5 → 0.8
1. fix(openai): update async-openai imports for v0.32 types::chat module
1. fix(page): use resolver().resolve_element() for quick-xml 0.39

## v2.44.42

1. feat(agent): integrate llm_models_spider v0.1.9 with smart model selection
1. perf(website): use `take()` instead of `clone()` for subdomain base URL
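
The `take()` change can be pictured with a minimal sketch, assuming the base URL is held in an `Option<String>` that is not needed again after it is read. The `SubdomainState` type and its field are hypothetical and are not spider's actual internals.

```rust
/// Hypothetical holder for a subdomain base URL; not spider's real type.
struct SubdomainState {
    base_url: Option<String>,
}

impl SubdomainState {
    /// Copies the string: allocates a new buffer and leaves the field intact.
    fn base_cloned(&self) -> Option<String> {
        self.base_url.clone()
    }

    /// Moves the string out without allocating; the field becomes `None`.
    fn base_taken(&mut self) -> Option<String> {
        self.base_url.take()
    }
}
```

`take()` only fits when the original slot may be left empty afterwards; if the value is read again later, a clone or a borrow is still required.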

## v2.44.41

1. fix(website): use page's own URL for relative link resolution on subdomains (#351)

## v2.44.40

1. docs: add all missing feature flags (`spider_cloud`, `agent`, `search`, `webdriver`, `wreq`, `adblock`, `simd`, `tracing`, etc.)
1. docs: add Spider Cloud and Chrome rendering integration examples

## v2.44.39

1. refactor(agent): move all skill content to `spider_skills` crate (110 skills via `include_str!`)

## v2.44.38

1. feat(agent): add L15-L48 not-a-robot skills (34 new levels)
1. feat(agent): add NEST engine loop for recursive nested challenges
1. feat(agent): add CIRCLE engine loop for drawing challenges
1. feat(agent): add haiku benchmark for agent evaluation

## v2.44.37

1. fix: add `should_use_chrome_ai` and `use_chrome_ai` to stub `RemoteMultimodalConfigs` (chrome-only builds)

## v2.44.36

1. feat(agent): add L8-L14 not-a-robot skills (license plate, nested, whack-a-mole, waldo, chihuahuas, reverse, affirmations)
1. feat(agent): add WAM engine loop for whack-a-mole challenges
1. feat(agent): Chrome AI element probe improvements

## v2.44.33

1. feat(cache): optimize automation caching for skip-browser flows

## v2.44.30

1. feat: add spider cloud end-to-end examples

## v2.44.29

1. feat(agent): improve remote multimodal automation reliability

## v2.44.28

1. feat(agent): expose optional automation reasoning in metadata

## v2.44.26

1. feat(spider_cli): add runtime `--http` and `--headless` mode controls

## v2.44.25

1. feat(agent): dual-model routing with per-endpoint URL and API key configuration
1. feat(agent): extraction-only mode optimization for single-round data extraction

## v2.44.21

1. fix: feature flag compilation across `wreq`, `agent_full`, and `cache` combos
1. fix: `agent_full` memvid `!Send` compat via `spawn_blocking`
1. fix: `cache_chrome_hybrid` GEMINI_CLIENT lazy_static cfg gate
1. fix: `detect_cf_turnstyle` cfg gate (chrome, not real_browser)

## v2.44.20

1. fix: broken `chrome` feature — missing `relevant` field and cfg gate on `detect_cf_turnstyle` (#349)

## v2.44.18

1. feat(agent): add URL-level relevance pre-filter for crawling — classify URLs via text model before fetching, skip irrelevant ones
1. feat(agent): add `url_prefilter` and `relevance_gate` configuration

## v2.44.17

1. perf: trie `entry_ref` optimization (-11% lookup time)
1. perf: robot parser hoisted `to_lowercase` (-13% parse time)
1. perf: `prepare_url` byte indexing optimization

## v2.44.16

1. feat(agent): add relevance gate for remote multimodal — LLM returns `relevant: true|false`, irrelevant pages get wildcard budget refunded
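
A minimal sketch of the refund behaviour described above, assuming the budget is a simple page counter. `WildcardBudget` and `apply_relevance_gate` are illustrative names, not the crate's API, and `relevant` stands in for the flag returned by the LLM.

```rust
/// Hypothetical page budget; the names here are illustrative, not spider's API.
struct WildcardBudget {
    remaining: u32,
}

impl WildcardBudget {
    /// Reserve one page from the budget; returns false when it is exhausted.
    fn charge(&mut self) -> bool {
        if self.remaining == 0 {
            return false;
        }
        self.remaining -= 1;
        true
    }

    /// Give back one page, used when the model judged the page irrelevant.
    fn refund(&mut self) {
        self.remaining = self.remaining.saturating_add(1);
    }
}

/// Apply the gate for one fetched page.
/// Returns true when the page should be kept.
fn apply_relevance_gate(budget: &mut WildcardBudget, relevant: bool) -> bool {
    if !budget.charge() {
        return false; // nothing left to spend
    }
    if !relevant {
        budget.refund(); // irrelevant pages do not count against the budget
    }
    relevant
}
```

With a budget of 2, any number of irrelevant pages leaves the counter at 2, while each relevant page lowers it by one.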

## v2.44.15

1. perf: trie 49-70% faster via optimized hot paths
1. perf: robot parser 50% faster
1. perf: HTML cleaner selector merging
1. chore: add criterion benchmarks for trie, robot parser, and URL preparation

## v2.44.13

1. feat(spider): add `spider_cloud` integration and S3 skills loading

## v2.44.12

1. feat(spider_agent): add dual-model routing (vision + text model selection)
1. feat(spider_agent): add long-term experience memory for automation sessions

## v2.44.10

1. feat(spider_agent): improve skill triggers and board reading for web challenges

## v2.44.9

1. feat(spider_agent): add Claude-optimized automation features
1. feat(agent): add `pre_evaluate` field on skills — engine runs JS before LLM inference

## v2.44.8

1. feat(spider_agent): add concurrent page spawning with `OpenPage` action

## v2.44.7

1. feat(spider_agent): integrate `llm_models_spider` for auto-updated vision model detection

## v2.44.6

1. feat(spider): enable HTTP extraction without `agent_chrome` feature

## v2.44.5

1. feat(agent): enhance CAPTCHA handling and lock system prompt
1. feat(agent): Chrome AI (Gemini Nano) integration — on-device LLM via `LanguageModel` API

## v2.44.3

1. feat(agent): consolidate automation into `spider_agent` with seamless feature integration
1. feat: granular `AutomationUsage` tracking (tokens, api_calls, screenshots)
1. feat(spider_agent): add usage limits, custom tools, and granular tracking

## v2.43.22

1. feat(automation): add `api_calls` tracking to AutomationUsage
1. feat(page): make `remote_multimodal_usage` and `extra_remote_multimodal_data` work for HTTP-only crawls (not just Chrome)
1. feat(page): add `usage` field to AutomationResults for per-result token tracking

## spider_agent v0.5.1

1. feat(automation): add `api_calls` tracking to AutomationUsage for counting LLM API calls

## v2.43.21

1. chore(spider): update spider_agent dependency to 0.5

## spider_agent v0.5.0

1. feat(actions): add complete WebAutomation parity with 17 new ActionType variants
   - Click variants: ClickAll, ClickPoint, ClickHold, ClickHoldPoint, ClickDrag, ClickDragPoint, ClickAllClickable
   - Wait variants: WaitFor, WaitForWithTimeout, WaitForNavigation, WaitForDom, WaitForAndClick
   - Scroll variants: ScrollX, ScrollY, InfiniteScroll
   - Input: Fill (clear + type)
   - Chain control: ValidateChain
1. feat(automation): add PromptUrlGate for URL-based prompt/config overrides
   - Exact URL matching
   - Path-prefix matching (case-insensitive)
   - Per-URL config overrides
1. feat(browser): add comprehensive browser methods for all new action types
   - click_all, click_point, click_hold, click_hold_point
   - click_drag, click_drag_point, click_all_clickable
   - wait_for_timeout, wait_for_navigation, wait_for_dom, wait_and_click
   - scroll_x, scroll_y, infinite_scroll
   - fill, find_elements, get_element_bounds
1. feat(config): add system_prompt, system_prompt_extra, user_message_extra to AutomationConfig
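
The URL gating rules listed for `PromptUrlGate` can be sketched roughly as follows. This is not the crate's implementation: `UrlGate`, `Override`, and `path_of` are hypothetical, the path extraction is deliberately naive, and letting exact rules win over prefix rules is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical per-URL override; a real config would carry more fields.
#[derive(Clone, Debug, PartialEq)]
struct Override {
    prompt: String,
}

/// Illustrative gate: exact URL rules are checked before path-prefix rules.
#[derive(Default)]
struct UrlGate {
    exact: HashMap<String, Override>,
    /// (lowercased path prefix, override), checked in insertion order.
    prefixes: Vec<(String, Override)>,
}

impl UrlGate {
    fn add_exact(&mut self, url: &str, o: Override) {
        self.exact.insert(url.to_string(), o);
    }

    fn add_prefix(&mut self, prefix: &str, o: Override) {
        self.prefixes.push((prefix.to_ascii_lowercase(), o));
    }

    /// Return the override for `url`, or `None` to use the default config.
    fn resolve(&self, url: &str) -> Option<&Override> {
        if let Some(o) = self.exact.get(url) {
            return Some(o);
        }
        // Case-insensitive comparison: both sides are lowercased.
        let path = path_of(url).to_ascii_lowercase();
        self.prefixes
            .iter()
            .find(|(prefix, _)| path.starts_with(prefix.as_str()))
            .map(|(_, o)| o)
    }
}

/// Naive path extraction: skip "scheme://host", drop query and fragment.
fn path_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    let path = rest.find('/').map_or("/", |i| &rest[i..]);
    let end = path
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(path.len());
    &path[..end]
}
```

Registering the prefix `/Docs` would match both `https://example.com/docs/intro` and `https://example.com/DOCS`, while an unmatched URL falls back to the default configuration.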

## v2.43.20

1. fix(spider): fix doctest and update chromey for adblock compatibility
1. fix(search): use reqwest::Client directly for cache feature compatibility
1. chore(spider): update spider_agent dependency to 0.4

## spider_agent v0.4.0

1. feat(cache): add SmartCache with size-aware LRU eviction and TTL expiration
1. feat(executor): add ChainExecutor for parallel step execution with response caching
1. feat(executor): add BatchExecutor for efficient batch processing
1. feat(executor): add PrefetchManager for predictive page loading
1. feat(router): add ModelRouter for smart model selection based on task complexity
1. feat(llm): add MessageContent helper methods (as_text, full_text, is_text, has_images)
1. fix(config): default ModelPolicy now allows High tier routing
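
To illustrate what size-aware LRU eviction combined with TTL expiration means, here is a small standard-library sketch. It is not the `SmartCache` implementation: `ByteLru` and all of its fields are hypothetical, and the recency list uses a simple `VecDeque` scan rather than an optimized structure.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

struct Entry {
    value: Vec<u8>,
    expires_at: Instant,
}

/// Illustrative cache bounded by total bytes; evicts least recently used first.
struct ByteLru {
    max_bytes: usize,
    used_bytes: usize,
    ttl: Duration,
    entries: HashMap<String, Entry>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<String>,
}

impl ByteLru {
    fn new(max_bytes: usize, ttl: Duration) -> Self {
        Self {
            max_bytes,
            used_bytes: 0,
            ttl,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    fn remove(&mut self, key: &str) {
        if let Some(e) = self.entries.remove(key) {
            self.used_bytes -= e.value.len();
            self.order.retain(|k| k.as_str() != key);
        }
    }

    fn insert(&mut self, key: &str, value: Vec<u8>) {
        self.remove(key);
        if value.len() > self.max_bytes {
            return; // too large to ever fit
        }
        // Evict from the cold end until the new value fits.
        while self.used_bytes + value.len() > self.max_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some(e) = self.entries.remove(&oldest) {
                        self.used_bytes -= e.value.len();
                    }
                }
                None => break,
            }
        }
        self.used_bytes += value.len();
        let expires_at = Instant::now() + self.ttl;
        self.entries.insert(key.to_string(), Entry { value, expires_at });
        self.order.push_back(key.to_string());
    }

    fn get(&mut self, key: &str) -> Option<&[u8]> {
        let expired = match self.entries.get(key) {
            Some(e) => Instant::now() >= e.expires_at,
            None => return None,
        };
        if expired {
            self.remove(key); // TTL elapsed: drop the entry and report a miss
            return None;
        }
        // Mark the key as most recently used.
        self.order.retain(|k| k.as_str() != key);
        self.order.push_back(key.to_string());
        self.entries.get(key).map(|e| e.value.as_slice())
    }
}
```

Eviction is driven by the byte total rather than the entry count, so one large value can push out several small ones.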

## spider_agent v0.3.0

1. feat(automation): add comprehensive automation module with action chains
1. feat(automation): add self-healing selector cache with LRU eviction
1. feat(automation): add content analysis for smart screenshot decisions
1. feat(automation): add configurable model policies and retry strategies

## spider_agent v0.2.0

1. feat(memory): enhance memory with URL, action, and extraction history
1. feat(webdriver): add webdriver support via thirtyfour
1. feat(browser): add chrome browser and temp storage support

## v2

### Multimodal AI Integration

1. feat(openai): OpenAI integration for dynamic browser scripting and automation
1. feat(gemini): Gemini AI support for intelligent web interaction
1. feat(solver): built-in Gemini Nano support for web challenge solving
1. feat(chrome): remote multimodal web automation with vision capabilities
1. feat(automation): token usage tracking for LLM-powered extraction

### Agentic Web Automation

1. feat(automation): simplified agentic APIs: `act()`, `observe()`, `extract()`
1. feat(automation): agentic memory for multi-round automation sessions
1. feat(automation): prompt-based website configuration
1. feat(automation): selector cache with self-healing and LRU eviction
1. feat(automation): structured outputs with ExtractionSchema
1. feat(automation): autonomous agent with action chaining and error recovery
1. feat(automation): intelligent screenshot detection based on content analysis
1. feat(automation): byte-size-based smart HTML cleaning for optimal performance
1. feat(llm_json): robust JSON parsing from LLM outputs with thinking model support

### WebDriver Support

1. feat(webdriver): WebDriver support via thirtyfour crate
1. feat(webdriver): Selenium Grid and remote browser connectivity
1. feat(webdriver): multi-browser support (Chrome, Firefox, Edge)
1. feat(webdriver): stealth mode with spider_fingerprint integration
1. feat(webdriver): automation script support
1. feat(webdriver): screenshot capabilities

### Web Search Integration

1. feat(search): web search integration with multiple providers
1. feat(search): Serper.dev, Brave Search, Bing, and Tavily AI Search support
1. feat(search): `search_and_extract()` for combined search + data extraction
1. feat(search): `research()` method for multi-source topic research

### spider_agent Crate

1. feat(agent): standalone concurrent-safe multimodal agent crate
1. feat(agent): feature-gated LLM providers (OpenAI, OpenAI-compatible)
1. feat(agent): feature-gated search providers (Serper, Brave, Bing, Tavily)
1. feat(agent): Chrome browser automation support
1. feat(agent): smart caching with LRU eviction and TTL expiration
1. feat(agent): high-performance chain executor with parallel step support
1. feat(agent): batch processing and prefetch management
1. feat(agent): smart model routing based on task complexity

### Browser & Chrome Enhancements

1. feat(chrome): remote cache support (disk and memory)
1. feat(chrome): skip browser mode with smart HTML cleaning
1. feat(chrome): adblock integration via chromey
1. feat(chrome): idle network detection for page load completion
1. feat(chrome): auto geo-detection
1. feat(chrome): max page bytes control
1. feat(smart): improved smart mode with JS rendering detection
1. feat(smart): Imperva and sessionStorage detection handling

### Anti-Bot & Security

1. feat(antibot): anti-bot detection capabilities
1. feat(fingerprint): centralized browser fingerprint emulation
1. feat(fingerprint): header emulation for stealth
1. feat(solver): deterministic and AI-powered web challenge solvers
1. feat(solver): Lemin solver support
1. feat(firewall): firewall integration for request filtering

### Data Processing

1. feat(transform): HTML transformation crate with spider_transformations
1. feat(css_scraping): CSS/XPath scraping with spider_utils
1. feat(page): metadata extraction from pages
1. feat(website): seeded page link and metadata extraction
1. feat(decentralized): improved decentralized crawling with remote multimodal support

### Performance & Infrastructure

1. feat(cache): hybrid caching (Chrome + HTTP cache)
1. feat(cache): memory and disk cache options
1. feat(cmd): command-line crawling support
1. feat(disk): shared state multi-profiling
1. perf(website): reduced unnecessary clones and allocations
1. chore(chrome): stabilized concurrent screenshot handling

## v1.98.0

1. feat(whitelist): add route whitelisting to restrict the crawl to matching routes

## v1.85.0

1. feat(openai): use OpenAI to dynamically drive the browser.

## v1.84.1

1. feat(chrome): add chrome_headless_new flag

## v1.83.11

1. chore(chrome): add wait_for events

## v1.60.0

1. feat(smart): add smart mode feature flag (HTTP until JS Rendering is needed per page)

## v1.50.1

1. feat(cron): add cron feature flag (#153)

## v1.36.0

1. feat(sync): subscribe to page updates to perform async handling of data

## v1.31.0

1. feat(js): add initial script parsing

## v1.30.5

1. feat(worker): add tls support

## v1.30.3

1. chore(request): add custom domain redirect policy

## v1.30.2

1. chore(glob): fix glob crawl establish

## v1.30.1

1. chore(crawl): fix crawl asset detection and trailing start

## v1.29.0

1. feat(fs): add temp storage resource handling (#112)
1. feat(url-glob): URL globbing (#113, thanks to [@roniemartinez](https://github.com/roniemartinez))

## v1.28.5

1. chore(request): fix resource success handling

## v1.28.0

1. feat(proxies): add proxy support

## v1.27.2

1. feat(decentralization): add workload split

## v1.19.36

1. perf(crawl): add join handle task management

## v1.19.26

1. perf(links): add fast pre-serialized URL anchor link extraction and reduce memory usage
1. perf(links): fix case sensitivity handling
1. perf(crawl): reduce memory usage on link gathering
1. chore(crawl): remove `Website.reset` method and improve crawl resource usage (`reset` is no longer needed)
1. chore(crawl): add heap usage of links visited
1. perf(crawl): allow massive scans to utilize more CPU
1. feat(timeout): add optional `configuration.request_timeout` duration
1. build(tokio): remove unused `net` feature
1. chore(docs): add missing scrape section

## v1.10.7

- perf(req): enable brotli
- chore(tls): add ALPN tls defaults
- chore(statics): add initial static media ignore
- chore(robots): add shared client handling across parsers
- feat(crawl): add subdomain and tld crawling

## v1.6.1

- perf(links): filter dup links after async batch
- chore(delay): fix crawl delay thread groups
- perf(page): slim page channel sends down to the required props

## v1.5.3

- feat(regex): add optional regex black listing

## v1.5.0

- chore(bin): fix bin executable [#17](https://github.com/madeindjs/spider/pull/17/commits/b41e25fc507c6cd3ef251d2e25c97b936865e1a9)
- feat(cli): add cli separation binary [#17](https://github.com/madeindjs/spider/pull/17/commits/b41e25fc507c6cd3ef251d2e25c97b936865e1a9)
- feat(robots): add robots crawl delay respect and ua assign [#24](https://github.com/madeindjs/spider/pull/24)
- feat(async): add async page body gathering
- perf(latency): add connection re-use across requests [#25](https://github.com/madeindjs/spider/pull/25)

## v1.4.0

- feat(cli): add cli ability ([#16](https://github.com/madeindjs/spider/pull/16) thanks to [@j-mendez](https://github.com/j-mendez))
- feat(concurrency): dynamic concurrent cpu defaults ([#15](https://github.com/madeindjs/spider/pull/15) thanks to [@j-mendez](https://github.com/j-mendez))
- docs: add a changelog

## v1.3.1

- fix(crawl): fix field type ([#14](https://github.com/madeindjs/spider/pull/14) thanks to [@j-mendez](https://github.com/j-mendez))

## v1.3.0

- feat(crawl): callback to run when link is found ([#13](https://github.com/madeindjs/spider/pull/13) thanks to [@j-mendez](https://github.com/j-mendez))

## v1.2.0

- Add User Agent configuration ([#5](https://github.com/madeindjs/spider/pull/5) thanks to [@Dragnucs](https://github.com/Dragnucs))
- Add polite delay ([#6](https://github.com/madeindjs/spider/pull/6) thanks to [@Dragnucs](https://github.com/Dragnucs))

## v1.1.3

- Handle page get errors ([#4](https://github.com/madeindjs/spider/pull/4) thanks to [@Dragnucs](https://github.com/Dragnucs))
- Fix link resolution ([#3](https://github.com/madeindjs/spider/pull/3) thanks to [@Dragnucs](https://github.com/Dragnucs))