---
title: Automated DOM Hierarchy Visualization for Enhanced AI Web Perception
permalink: /futureproof/automated-dom-hierarchy-ai-web-perception/
description: This entry captures the essence of iterative, AI-assisted development. The author, Mike, engages directly with Gemini, setting clear goals, implementing the solution, and then immediately verifying and reflecting on the output. His "Me" sections demonstrate a hands-on approach, sharing exact terminal outputs, git diffs, and personal observations on the visualization's fidelity. The abrupt "Cut!" at the end highlights the modular nature of the development process, focusing on one feature at a time before moving to the next logical step, like the accessibility tree.
meta_description: This entry details integrating a DOM hierarchy visualization into an AI web scraper, providing a 'logical blueprint' alongside spatial layout for advanced web understanding.
meta_keywords: DOM hierarchy, AI web scraping, web automation, Selenium, Python, DevTools, visualization, AI perception
layout: post
sort_order: 6
---

## Setting the Stage: Context for the Curious Book Reader

### Enhancing AI's Web Perception with DOM Hierarchy

In this journal entry, we continue our exploration of building a robust, local-first, AI-driven framework for web interaction. A crucial aspect of endowing our AI with true web understanding is providing it with diverse perspectives on a web page's structure. Here, we tackle the implementation of a DOM hierarchy visualization, a significant step in completing the AI's "two views" system, offering both a logical blueprint and a spatial layout of any examined page. This foundational work enhances the AI's ability to interpret and interact with web content more effectively.

---

## Technical Journal Entry Begins

Alright, let's add generating `dom_hierarchy.txt` to the files that get generated with a `pip.scrape()`. Implementation plan, please.

**Gemini**: Context verified. Excellent. Let's add the second "cone" to the AI's retina. The plan is straightforward: we'll update our main scraper tool to call the `visualize_dom_hierarchy` function you've already created and save its output alongside the box visualization.
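
For orientation, since `visualize_dom_hierarchy` lives in `tools/dom_tools.py` and isn't reproduced in this entry, here is a minimal sketch of what such a renderer might look like. BeautifulSoup for parsing and Rich for tree drawing are assumptions, as are the helper names; only the `{"file_path": ...}` argument and the `{"success": ..., "output": ...}` return shape are taken from how the function is called in the scraper code below.

```python
# Hypothetical sketch only; the real implementation lives in tools/dom_tools.py
# and may differ. BeautifulSoup and Rich are assumed dependencies here.
from pathlib import Path

from bs4 import BeautifulSoup      # assumed parser
from rich.console import Console   # assumed renderer
from rich.tree import Tree


async def visualize_dom_hierarchy(params: dict) -> dict:
    """Render the tag hierarchy of a saved DOM file as an ASCII tree."""
    try:
        html = Path(params["file_path"]).read_text(encoding="utf-8")
        soup = BeautifulSoup(html, "html.parser")

        def add_children(node: Tree, element) -> None:
            # Recurse over element children only; skip text and comment nodes,
            # whose .name attribute is None in BeautifulSoup.
            for child in element.children:
                if child.name:
                    add_children(node.add(f"<{child.name}>"), child)

        tree = Tree("🌐 Document Root")
        if soup.html:
            add_children(tree.add("<html>"), soup.html)

        # Record the rendered tree so it can be returned as plain text.
        console = Console(record=True, width=120)
        console.print(tree)
        return {"success": True, "output": console.export_text()}
    except Exception as e:
        return {"success": False, "error": str(e)}
```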
## Implementation Plan: Modifying `scraper_tools.py`

This will complete the "two views" system, giving the AI both the logical blueprint and the spatial layout for any page it examines.

-----

### Implementation Plan: Adding the Hierarchy View

The only file that needs to be modified is `tools/scraper_tools.py`. We will add a new block to the `selenium_automation` function to generate the second visualization.

#### 1\. Update `tools/scraper_tools.py`

Replace the entire content of `tools/scraper_tools.py` with the code below. I've added a new "Step 6" to the process to generate the hierarchy artifact.

## The Updated `scraper_tools.py` Code

```python
# /home/mike/repos/pipulate/tools/scraper_tools.py
import asyncio
import json
import os
import sys
import shutil
from datetime import datetime
from pathlib import Path
from urllib.parse import quote, urlparse

from loguru import logger
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from seleniumwire import webdriver as wire_webdriver
from webdriver_manager.chrome import ChromeDriverManager

from tools import auto_tool
from . import dom_tools


def get_safe_path_component(url: str) -> tuple[str, str]:
    """Converts a URL into filesystem-safe components for directory paths."""
    parsed = urlparse(url)
    domain = parsed.netloc
    path = parsed.path
    if not path or path == '/':
        path_slug = "%2F"
    else:
        path_slug = quote(path, safe='').replace('/', '_')[:100]
    return domain, path_slug


@auto_tool
async def selenium_automation(params: dict) -> dict:
    """
    Performs an advanced browser automation scrape of a single URL.
    Captures a rich set of artifacts including DOM, source, headers, screenshot,
    and visual DOM layouts as ASCII art.
    """
    url = params.get("url")
    domain = params.get("domain")
    url_path_slug = params.get("url_path_slug")
    take_screenshot = params.get("take_screenshot", False)
    headless = params.get("headless", True)
    is_notebook_context = params.get("is_notebook_context", False)

    if not all([url, domain, url_path_slug is not None]):
        return {"success": False, "error": "URL, domain, and url_path_slug parameters are required."}

    driver = None
    artifacts = {}

    base_dir = Path("browser_cache/")
    if not is_notebook_context:
        base_dir = base_dir / "looking_at"
    output_dir = base_dir / domain / url_path_slug

    try:
        if output_dir.exists():
            logger.info(f"🗑️ Clearing existing artifacts in: {output_dir}")
            shutil.rmtree(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        logger.info(f"💾 Saving new artifacts to: {output_dir}")

        chrome_options = Options()
        if headless:
            chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--start-maximized")
        chrome_options.add_argument("--window-size=1920,1080")

        effective_os = os.environ.get('EFFECTIVE_OS', sys.platform)
        service = Service(ChromeDriverManager().install()) if effective_os == 'darwin' else Service()

        logger.info(f"🚀 Initializing Chrome driver (Headless: {headless})...")
        driver = wire_webdriver.Chrome(service=service, options=chrome_options)

        logger.info(f"Navigating to: {url}")
        driver.get(url)
        await asyncio.sleep(3)

        # --- Capture Core Artifacts ---
        dom_path = output_dir / "rendered_dom.html"
        dom_path.write_text(driver.execute_script("return document.documentElement.outerHTML;"), encoding='utf-8')
        artifacts['rendered_dom'] = str(dom_path)

        source_path = output_dir / "source_html.html"
        source_path.write_text(driver.page_source, encoding='utf-8')
        artifacts['source_html'] = str(source_path)

        main_request = next((r for r in driver.requests if r.response and r.url == url), driver.last_request)
        if main_request and main_request.response:
            headers_path = output_dir / "response_headers.json"
            headers_path.write_text(json.dumps(dict(main_request.response.headers), indent=2))
            artifacts['response_headers'] = str(headers_path)

        if take_screenshot:
            screenshot_path = output_dir / "screenshot.png"
            driver.save_screenshot(str(screenshot_path))
            artifacts['screenshot'] = str(screenshot_path)

        # --- Generate Visualization Artifacts ---
        logger.info(f"🎨 Generating DOM box visualization...")
        viz_result = await dom_tools.visualize_dom_boxes({"file_path": str(dom_path)})
        if viz_result.get("success"):
            viz_path = output_dir / "dom_layout_boxes.txt"
            viz_path.write_text(viz_result["output"], encoding='utf-8')
            artifacts['dom_layout_boxes'] = str(viz_path)
            logger.success("✅ DOM box layout saved.")
        else:
            logger.warning(f"⚠️ Could not generate DOM box visualization: {viz_result.get('error')}")

        # --- NEW: Generate Hierarchy Visualization Artifact ---
        logger.info(f"🌳 Generating DOM hierarchy visualization...")
        hierarchy_viz_result = await dom_tools.visualize_dom_hierarchy({"file_path": str(dom_path)})
        if hierarchy_viz_result.get("success"):
            hierarchy_viz_path = output_dir / "dom_hierarchy.txt"
            hierarchy_viz_path.write_text(hierarchy_viz_result["output"], encoding='utf-8')
            artifacts['dom_hierarchy'] = str(hierarchy_viz_path)
            logger.success("✅ DOM hierarchy saved.")
        else:
            logger.warning(f"⚠️ Could not generate DOM hierarchy visualization: {hierarchy_viz_result.get('error')}")

        logger.success(f"✅ Scrape successful for {url}")
        return {"success": True, "looking_at_files": artifacts}

    except Exception as e:
        logger.error(f"❌ Scrape failed for {url}: {e}", exc_info=True)
        return {"success": False, "error": str(e), "looking_at_files": artifacts}

    finally:
        if driver:
            driver.quit()
            logger.info("Browser closed.")
```
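
A quick aside on where artifacts land: `get_safe_path_component` percent-encodes the URL path, mapping the bare root path `/` to the literal slug `%2F`, which is why the verification directory below is `browser_cache/example.com/%2F/`. A hypothetical REPL check of that mapping, derived directly from the function above:

```python
>>> from tools.scraper_tools import get_safe_path_component
>>> get_safe_path_component("https://example.com/")
('example.com', '%2F')
>>> get_safe_path_component("https://example.com/blog/post-1/")
('example.com', '%2Fblog%2Fpost-1%2F')
```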
#### 2\. Next Steps

1.  **Save** the updated content to `tools/scraper_tools.py`.
2.  **Commit** the change with a clear message:
    ```bash
    git commit -am "feat: Add DOM hierarchy visualization to scrape artifacts"
    git push
    ```
3.  **Test It:** Rerun your `open_eyes.ipynb` notebook. The output should now show that `dom_hierarchy.txt` has also been created, and you can verify the file exists in the `browser_cache/example.com/%2F/` directory.

## Verifying the Implementation

With this change, the AI's "eyes" will have both the logical blueprint and the spatial layout, completing this phase of the vision system.

**Me**:

```bash
[mike@nixos:~/repos/pipulate/tools]$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

[mike@nixos:~/repos/pipulate/tools]$ ls
advanced_automation_tools.py  botify_tools.py  conversation_tools.py  dom_tools.py  __init__.py  keychain_tools.py  mcp_tools.py  __pycache__  scraper_tools.py  system_tools.py

[mike@nixos:~/repos/pipulate/tools]$ xv scraper_tools.py
```

```diff
[mike@nixos:~/repos/pipulate/tools]$ git --no-pager diff
diff --git a/tools/scraper_tools.py b/tools/scraper_tools.py
index b36f05e9..879d7ec8 100644
--- a/tools/scraper_tools.py
+++ b/tools/scraper_tools.py
@@ -33,14 +33,14 @@ async def selenium_automation(params: dict) -> dict:
     """
     Performs an advanced browser automation scrape of a single URL.
     Captures a rich set of artifacts including DOM, source, headers, screenshot,
-    and a visual DOM layout as ASCII art.
+    and visual DOM layouts as ASCII art.
     """
     url = params.get("url")
     domain = params.get("domain")
     url_path_slug = params.get("url_path_slug")
     take_screenshot = params.get("take_screenshot", False)
     headless = params.get("headless", True)
-    is_notebook_context = params.get("is_notebook_context", False)  # NEW: Get context flag
+    is_notebook_context = params.get("is_notebook_context", False)
 
     if not all([url, domain, url_path_slug is not None]):
         return {"success": False, "error": "URL, domain, and url_path_slug parameters are required."}
@@ -48,7 +48,6 @@ async def selenium_automation(params: dict) -> dict:
     driver = None
     artifacts = {}
 
-    # NEW: Conditionally set the base directory
     base_dir = Path("browser_cache/")
     if not is_notebook_context:
         base_dir = base_dir / "looking_at"
@@ -80,6 +79,7 @@ async def selenium_automation(params: dict) -> dict:
         driver.get(url)
         await asyncio.sleep(3)
 
+        # --- Capture Core Artifacts ---
         dom_path = output_dir / "rendered_dom.html"
         dom_path.write_text(driver.execute_script("return document.documentElement.outerHTML;"), encoding='utf-8')
         artifacts['rendered_dom'] = str(dom_path)
@@ -99,15 +99,27 @@ async def selenium_automation(params: dict) -> dict:
             driver.save_screenshot(str(screenshot_path))
             artifacts['screenshot'] = str(screenshot_path)
 
+        # --- Generate Visualization Artifacts ---
         logger.info(f"🎨 Generating DOM box visualization...")
         viz_result = await dom_tools.visualize_dom_boxes({"file_path": str(dom_path)})
         if viz_result.get("success"):
             viz_path = output_dir / "dom_layout_boxes.txt"
             viz_path.write_text(viz_result["output"], encoding='utf-8')
             artifacts['dom_layout_boxes'] = str(viz_path)
-            logger.success("✅ DOM visualization saved.")
+            logger.success("✅ DOM box layout saved.")
         else:
-            logger.warning(f"⚠️ Could not generate DOM visualization: {viz_result.get('error')}")
+            logger.warning(f"⚠️ Could not generate DOM box visualization: {viz_result.get('error')}")
+
+        # --- NEW: Generate Hierarchy Visualization Artifact ---
+        logger.info(f"🌳 Generating DOM hierarchy visualization...")
+        hierarchy_viz_result = await dom_tools.visualize_dom_hierarchy({"file_path": str(dom_path)})
+        if hierarchy_viz_result.get("success"):
+            hierarchy_viz_path = output_dir / "dom_hierarchy.txt"
+            hierarchy_viz_path.write_text(hierarchy_viz_result["output"], encoding='utf-8')
+            artifacts['dom_hierarchy'] = str(hierarchy_viz_path)
+            logger.success("✅ DOM hierarchy saved.")
+        else:
+            logger.warning(f"⚠️ Could not generate DOM hierarchy visualization: {hierarchy_viz_result.get('error')}")
 
         logger.success(f"✅ Scrape successful for {url}")
         return {"success": True, "looking_at_files": artifacts}
```

```bash
[mike@nixos:~/repos/pipulate/tools]$ git commit -am "feat: Add DOM hierarchy visualization to scrape artifacts"
git push
[main d2399571] feat: Add DOM hierarchy visualization to scrape artifacts
 1 file changed, 17 insertions(+), 5 deletions(-)
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 48 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 645 bytes | 645.00 KiB/s, done.
Total 4 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To github.com:miklevin/pipulate
   562f3762..d2399571  main -> main

[mike@nixos:~/repos/pipulate/tools]$
```

Test runs... OMG and there it is!

```
🌐 Document Root
└── <html>
    ├── <head>
    │   ├──
```
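
To confirm from a notebook cell that the full artifact set landed where expected, something like the following would do; the snippet is illustrative rather than part of the repo, and the path assumes the notebook-context layout from the code above:

```python
# Illustrative check only; assumes the notebook-context output path used by
# selenium_automation (base_dir without the "looking_at" segment).
from pathlib import Path

output_dir = Path("browser_cache/example.com/%2F")
for artifact in sorted(output_dir.iterdir()):
    print(f"{artifact.name:24} {artifact.stat().st_size:>8} bytes")

# A scrape that worked end-to-end should now include dom_hierarchy.txt
# alongside rendered_dom.html, source_html.html, and dom_layout_boxes.txt.
assert (output_dir / "dom_hierarchy.txt").exists()
```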