--- title: "Accelerating Pipulate Development: Integrating Core SEO Automation Features" permalink: /futureproof/accelerating-seo-automation/ description: This weekend, my goal is to significantly accelerate Pipulate's development by getting the first versions of key workflows running – specifically Playwright browser automation, robust Botify CSV downloads, and the groundwork for URL redirect mapping projects. This article outlines that plan, serves as part of a 'super-prompt' to get AI guidance on the next steps, and emphasizes creating practical code examples in the `precursors` folder to make future integrations easier and bust myths about Nix's difficulty for end-users. meta_description: "Implementation plan for Pipulate: adding Playwright browser automation, Botify API CSV export/polling, and URL redirect mapping workflows using Nix. Includes AI code suggestions." meta_keywords: Pipulate, implementation plan, Playwright, browser automation, Botify API, CSV export, polling, download, URL remapping, 301 redirects, Nix flake, FastHTML, HTMX, local-first, SEO automation, developer acceleration, precursors, plugins, workflows, super-prompt, context curation, aiohttp, MCP, JupyterLab, Pandas, Gemini, Grok, Claude Sonnet, o4-mini-high layout: post sort_order: 4 --- {% raw %} ## Pipulate Completion Timeline Okay, by the time Monday rolls around, Pipulate is going to be finished. At least it's going to have the first few finished workflows that's going to make it gold around the company. That means enough to do a show-and-tell of something that is of high value and would be difficult to do any other way. Provide the path of least resistance. And bust the myth with the AI code assistants that there's a steep Nix learning curve merely to use apps that are written to run on a nix flake. People using that don't have to learn very much — just enough to get the app running: installing nix and typing `nix develop` in the appropriate directory, then going to `localhost:port`. That's about it. The rest is much like an Electron app, or JupyterLab running locally. ## Purpose: Developer Acceleration Through Super-Prompts So the purpose of this article is to cause developer/development acceleration. I started it in my last article but then got distracted by the fact that when I turned an article like this into a *super-prompt* I needed to have a way then to feed the entire codebase plus the article to whatever AI model that I wanted to get a recommendation from. I had this effectively done with my `prompt_foo.py` script and only had to expand it to accept an article like this as part of the context. And so I write this now knowing that it will become part of the prompt, and therefore it will end on the instructions to the AI what to do next given having examined the entire Pipulate codebase plus this article. ## Accelerating Progress on Multiple Fronts So how do I accelerate progress surrounding Pipulate, and the progress and growth of a particular site optimized with it over which I have full freedom, and put it into everyday use with my day-job SEO clients via my employer? ### Improving Code Samples in Precursors Folder It starts with having better code-sample files in the `pipulate/precursors` folder. These are the experimental files and precursors to becoming actual Pipulate workflows. 
It's going to be easier to flesh ideas out, playfully implementing new things like crawlers, broken link checkers, various Botify integrations, redirect chain checkers, 301 redirect maps from site migrations and the like — all the basic toolbox of SEO, plus a bunch of new tools made possible by the unique architecture and execution context of Pipulate. There's going to be a lot of churn in `pipulate/precursors` as I experiment, promote and formalize some things, and throw away others. But the point is there will at least be some code implementation samples for things like Google Search Console integration, Pandas data transforms, and the like. All the building blocks can be freely experimented with there, with little to no impact on the rest of the system.

### File Types and Development Environments

Files in `pipulate/precursors` might be executable Python `.py` script files that you're getting AI assistance on from Cursor, Windsurf or even VSCode and the like. They might also be Jupyter Notebook `.ipynb` files that you're using in the instance of JupyterLab that gets installed side-by-side with Pipulate. Generally, you can reach Pipulate on `localhost:5001` and JupyterLab on `localhost:8888`, if you typed `start` to start JupyterLab. Getting Notebook help from AIs is still a bit of uncharted territory. It's technically possible and I have had some limited success with it, but it's a kooky code execution context that the AIs don't fully understand and can't provide Agent-mode help with.

### JupyterLab as a Complementary Tool

As such, my new habit is shaping up to be using a non-AI-assisted instance of JupyterLab (even though technically JupyterAI is installed), where I copy/paste snippets of code when I want Python code execution totally outside the context of VSCode or its forks. No matter how rapidly Agent mode comes along in those environments, I continue to find the Notebook environment useful. This is particularly true with Pandas workflow pipelines. It's like having a little Python *calculator* on the side that exists separately from the deeply AI-integrated IDE. But they share the same Python `.venv` virtual environment, so what you pip install on one side is instantly available to the other.

## Accelerating Through Finished Plugins

Okay, now another area where things should start to accelerate is having the actual finished Pipulate plugins. They might be Workflows that simulate the Jupyter Notebook top-down cell execution flow, or they might be more Rails-like CRUD apps (create, read, update, delete). Either way, there will be more and more of them as starting templates in the `pipulate/plugins` folder. It's a lot like how AIs are good at vibe coding with you when you tell them the GitHub project that you want your program to be exactly like. Examples help, ahaha!

### The Need for Pipulate-Specific Examples

Okay, the reason that Pipulate examples in particular are necessary to accelerate future projects, instead of just any generic GitHub project, is that Pipulate has new patterns. It leans into many of the patterns set forth by the new FastHTML Python web framework that leverages HTMX, and therefore does a lot differently than the rich JavaScript client frameworks like Angular, React, Vue and Svelte. It's almost the entire opposite: sending HTML fragments over the wire that hit the DOM directly, updating the page very close to the HTML spec. It puts more burden on the server, but it's snappy and lean. But it's an anti-pattern, so lots of examples help.
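To illustrate that pattern, here is a minimal, hedged sketch using FastHTML's documented `fast_app`/`rt` conventions. The route paths and element IDs are made up, and this is not Pipulate's actual code: the server returns an HTML fragment, and HTMX swaps it straight into the DOM.

```python
# Minimal FastHTML/HTMX sketch of the "HTML fragments over the wire" pattern.
# Routes and IDs are hypothetical; this is not Pipulate's actual code.
from fasthtml.common import *

app, rt = fast_app()

@rt("/")
def get():
    # The button asks the server for a fragment and targets the #result div.
    return Div(
        Button("Check status",
               hx_post="/check", hx_target="#result", hx_swap="innerHTML"),
        Div(id="result"),
    )

@rt("/check")
def post():
    # No JSON, no client-side framework: just an HTML fragment hitting the DOM.
    return P("Everything looks good.")

serve()
```

The point is the shape of the exchange: the server does the rendering, and the page updates without any client-side build step.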
### Localhost as Final Destination

Also, I use localhost not as a temporary local place to test code before uploading it to some cloud service, but as the final destination. This seems so radical to people, but it's exactly how the increasingly popular Electron platform works, and Electron is the platform on which more and more multi-platform software is built. But where Electron mandates how your web app works underneath, I use Nix flakes, which simply give you a generic Linux environment in a subfolder of whatever Mac, Linux or Windows/WSL system you happen to be working on.

## Leveraging Local-First Architecture for Powerful Automations

Because Pipulate runs as a local-first, single-tenant application, you have the full power of the local machine at your disposal, even to the point of launching browser automation with Playwright using the user's own default Chrome profile. That gives semi-automated access (often with user oversight) to web-based apps requiring browser-context logins, with credentials often already stored in that profile. Consequently, certain automations that leverage Google Search Console, SEMrush or Ahrefs access without an API become possible. Getting guidance here is exactly what this article is about. You'll find comments regarding the Playwright integration, but it's not actually implemented yet.

### Building Robust Automations

Such automations are going to be tricky to flesh out and often fragile at first, requiring DOM simplification, ARIA labels and even local LLM participation to make them robust and reliable. But once one is done, template examples will exist, which make the next and the next and the next easier. Thus the acceleration: fleshing out new `.py` and `.ipynb` versions and porting them to Pipulate gets faster as more and more examples exist. That creates the accumulating effect, and keeping it all super-well indexed means I can selectively include bits in the very `prompt_foo.py` process I'm using to make this super-prompt.

## Weekend Checklist and Integration Overview

Each of these super-prompts will generally be focused on one such integration, Playwright being a prime example. But because this is my first, I'm going to mention a few, both to make this weekend's checklist and to give you a broad first-pass overview.

### Botify API Integration

The key enabling piece for Botify integration is already done: acquiring the API key and making sure it's good. That's done. The next big thing is going to be CSV downloads, but this is surprisingly complex because it requires making an export request, then polling a URL it gives you until the export is ready for download. Then it gives you a download URL from which you download a gzipped file, which has to be unzipped. To make it more complex, there are two different versions of the API for this process, one for WebLogs and another for the whole rest of the product. I have lots of sample code for this, so this is really just a heads-up. In the same way that API authentication is a key enabling piece, this CSV download opens up lots of workflow possibilities.

### CSV Exports as Workflow Building Blocks

Botify CSV exports are one of the key building blocks of many workflows. Given that Botify has Google Search Console (GSC) API integration itself, any of the code samples that I do with GSC can generally be done by getting the same data from Botify instead, going around the need for GSC API access.
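Since that export-request, poll, download, decompress dance is the crux of the CSV piece, here is a rough sketch of the flow in plain `aiohttp`. It is an illustration only: the endpoint URL, auth header format and response field names are placeholders standing in for the real Botify API, which, as noted, differs between WebLogs and the rest of the product.

```python
# Rough sketch of the export-request / poll / download / gunzip flow.
# Endpoint and field names are placeholders, not a verified Botify client.
import asyncio
import gzip
import shutil

import aiohttp

API_TOKEN = "YOUR_BOTIFY_TOKEN"                      # hypothetical; load from config
EXPORT_ENDPOINT = "https://api.example.com/exports"  # placeholder URL

async def export_poll_download(payload: dict, dest_csv: str) -> str:
    headers = {"Authorization": f"Token {API_TOKEN}"}
    async with aiohttp.ClientSession(headers=headers) as session:
        # 1. Request the export; the response says where to poll.
        async with session.post(EXPORT_ENDPOINT, json=payload) as resp:
            job = await resp.json()
        poll_url = job["job_url"]                    # placeholder field name

        # 2. Poll until the export job reports that it is done.
        while True:
            async with session.get(poll_url) as resp:
                status = await resp.json()
            if status["job_status"] == "DONE":       # placeholder field/value
                download_url = status["download_url"]
                break
            await asyncio.sleep(10)

        # 3. Stream the gzipped CSV to disk, then decompress it.
        gz_path = dest_csv + ".gz"
        async with session.get(download_url) as resp:
            with open(gz_path, "wb") as fh:
                async for chunk in resp.content.iter_chunked(1 << 16):
                    fh.write(chunk)
    with gzip.open(gz_path, "rb") as src, open(dest_csv, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_csv
```

In a Pipulate workflow, each of those stages would likely become its own step, with the polling driven by HTMX triggers rather than a blocking loop.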
The last piece of that puzzle is that the Botify API has a download limit of a million rows per CSV, so any query you can write to request an export also has a simplified "list" version for paging through results, from which you can fetch just the first page. You can also alter the list query a bit to get aggregate counts instead, so with a single request you can know how many rows an export would produce before making the export request. This is good practice and is going to be baked in. This is not a request to do that coding yet, but rather just a heads-up for planning.

### URL Remapping Project

Another project that I am going to have to do imminently and urgently is a URL remapping with a set of from-URLs and a set of to-URLs. There are implementations of varying complexity, and because it involves such large-scale list processing and needs to fit within the systems we're discussing, it is going to have to be programmed so it can be run repeatedly and safely, without excessively re-processing URLs it has already checked and visited. There are implementations that visit the live URLs and risk triggering 429 Too Many Requests errors. There are implementations that use proxies to get around this, or that simply work intelligently enough to only check URLs when they need to. There's going to be cosine similarity and the like for matching, either against just the keyword patterns in the URL, or potentially taking into account the actual page contents found at the URL. There's also a lot of already-crawled data in the Botify system we could leverage for this. Again, this is not a request to implement, but merely a heads-up. The idea is to make 301 redirect projects like this a breeze.

### MCP Client Integration

And not to overwhelm you with stuff, but we're also going to make the built-in LLM an MCP (Model Context Protocol) client. At some point I will start implementing things as MCP servers that will be at the disposal of either the LLM or other aspects of the system. It's not talked about a lot, but while MCP servers are certainly useful for LLM-driven MCP clients, they are also just generic servers that anything else in your system can connect to and send properly formatted requests to. It's almost like adding API syntax on top of HTTP. In fact, there's a local way of using MCP that doesn't require HTTP, for better performance and lower overhead. I might be interested in that. I'm not sure.

## Weekend Success Criteria and Next Steps

So there you have it. Playwright automation — maybe the MCP candidate, but I don't want to add complexity to what might become core functionality. Botify CSV aggregate functions (to get counts), export requests, polling, download and decompression. And getting ready for these redirect projects. If I just do these three things this weekend, it will be a huge success. Those will become the building blocks for many other things, and the acceleration effect will kick in.

So perhaps what this is, is an invitation to look at how I put together this super-prompt and to instruct me on my next step. Maybe it's to gather the bits of code samples from here and there to do the Botify CSV downloads. All of which leads to the notion that this is a sort of context-curation task. So, have at it. Just think it through open-endedly and riff, given all the context I've given you.
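As a starting point for the Playwright piece, here is the kind of minimal precursor sketch I have in mind: Playwright's synchronous API with `launch_persistent_context`, so that logins saved in the profile are available to the automation. The profile path is hypothetical and OS-dependent, and this is an illustration rather than Pipulate's actual implementation; inside a workflow it would use the async API so the server isn't blocked.

```python
# Minimal Playwright sketch using a persistent browser profile so that
# existing logins are available to the automation. The profile path is
# hypothetical and differs per OS; this is an illustration only.
from pathlib import Path
from playwright.sync_api import sync_playwright

# Hypothetical: point at a dedicated automation profile rather than the
# daily-driver Chrome profile to avoid profile-lock issues.
PROFILE_DIR = Path.home() / ".pipulate-chrome-profile"

with sync_playwright() as p:
    context = p.chromium.launch_persistent_context(
        user_data_dir=str(PROFILE_DIR),
        headless=False,    # keep the browser visible for user oversight
        channel="chrome",  # use the installed Chrome build if available
    )
    page = context.new_page()
    page.goto("https://example.com/")
    # Prefer resilient selectors (roles, ARIA labels) over brittle CSS paths.
    print(page.title())
    context.close()
```

---

## Let's Build The Super-Prompt!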
This is the step where we take this entire article as it is written so far and feed it *AS THE PROMPT* with all the carefully selected bits of a git repo codebase included, all XML-wrapped for optimum context to the various frontier AI models. ```bash python prompt_foo.py --article-mode --article-path /home/mike/repos/MikeLev.in/_posts/2025-04-18-accelerating-seo-automation.md ``` Let's see what they make of it! --- ## Gemini 2.5 Experimental Okay, I have thoroughly reviewed the Pipulate codebase and the "Planning Acceleration" article. Here's an implementation plan focusing on the key requirements outlined in the article, while respecting the established architecture and patterns of Pipulate. ## Implementation Suggestions ### 1. Enhancing the `precursors` Folder & Experimentation Workflow * **Goal:** Make experimentation in `pipulate/precursors` easier and more productive for developing crawlers, checkers, API integrations (Botify, GSC), Pandas transforms, and Playwright automation before promoting them to full plugins. * **Required Changes:** * No direct code changes needed in `server.py` or core components *initially*. The focus is on developer workflow. * **Documentation:** Add a `precursors/README.md` explaining the purpose of this directory, how to run scripts (`.py` and `.ipynb`), and the criteria for promoting a precursor to a plugin. Emphasize sharing the `.venv` between Jupyter (`start`/`stop` commands in `flake.nix`) and standalone scripts. * **Example Scripts:** Create well-commented starter scripts in `precursors/` demonstrating: * Basic `aiohttp` usage for API calls (placeholder for GSC/Botify). * Simple `pandas` DataFrame creation and manipulation. * Minimal `playwright` setup (launching browser, navigating). See section below for more details. * **Integration Strategy:** * Keep precursors separate from the main `plugins/` directory to avoid cluttering the UI menu. * Leverage the shared `.venv` managed by `flake.nix` for dependency consistency between precursors and the main app. * **Implementation Plan:** * **Milestone 1:** Create `precursors/README.md`. * **Milestone 2:** Add basic example scripts (`example_api.py`, `example_pandas.py`, `example_playwright.py`). * **Challenge:** Ensuring precursors remain lightweight and don't accidentally introduce dependencies or state conflicts with the main server if run concurrently. ### 2. Playwright Integration * **Goal:** Implement browser automation capabilities using Playwright, potentially leveraging the user's existing browser profile for accessing authenticated web applications. * **Required Changes:** * **Dependencies:** Add `playwright` to `requirements.txt`. Run `nix develop` followed by `pip install -r requirements.txt` to install it into the `.venv`. Playwright also requires browser binaries; the first time `playwright install` is run (potentially manually or scripted), it will download them. Consider how to manage this within the Nix environment for better reproducibility if needed, though manual install might be simpler initially. * **New Precursor Script (`precursors/playwright_example.py`):** Create a script demonstrating: * Launching a browser (`playwright.chromium.launch()`). * Optionally using `launch_persistent_context` to attempt using a default user profile (needs user data dir path - this might be complex across OSes and requires careful handling/documentation). * Navigating to a page (`page.goto(...)`). * Basic DOM interaction (finding elements, clicking, filling forms). 
Use robust selectors (ARIA labels, data attributes) as suggested in the article. * Taking screenshots (`page.screenshot(...)`). * Closing the browser. * **New Workflow Plugin (`plugins/xx_playwright_workflow.py`):** Once patterns are clearer, create an experimental workflow: * **Step 1:** Input target URL and task description. * **Step 2 (Execution):** Define an `async` function within the step handler (e.g., `step_02_submit`) that runs the Playwright logic. Use `asyncio.create_task` or similar if running Playwright needs to happen in the background without blocking the server request. *Crucially*, Playwright's `async` API should be used (`async with async_playwright() as p:`). * **State Management:** Store results (e.g., extracted text, success/failure status, screenshot path) in the pipeline's JSON state via `pipulate.write_state`. * **UI:** Display results in subsequent steps or update the current step's card using HTMX swaps. Show loading indicators during execution. * **Error Handling:** Implement `try...except` blocks around Playwright calls to catch timeouts, navigation errors, or missing elements. Log errors clearly. * **Integration Strategy:** * Start with standalone precursor scripts for initial testing. * Introduce as a workflow plugin, following the WET pattern. Each significant Playwright action (login, navigate, extract) could potentially be a separate step for better observability and error recovery. * Use the existing `Pipulate` helper methods (`pipulate.read_state`, `pipulate.write_state`, `pipulate.stream` for feedback) within the workflow steps. * Avoid blocking the main `uvicorn` server thread. Use `asyncio` properly for Playwright operations within request handlers. * **Implementation Plan:** * **Milestone 1:** Get basic Playwright running in `precursors/playwright_example.py` (install, launch, navigate, close). * **Milestone 2:** Experiment with `launch_persistent_context` in the precursor script (document findings/challenges). * **Milestone 3:** Create the `xx_playwright_workflow.py` with a simple task (e.g., get page title). * **Milestone 4:** Implement asynchronous execution and UI feedback (loading state) in the workflow. * **Milestone 5 (Future):** Explore integrating LLM for making Playwright interactions more robust (e.g., interpreting DOM structure). * **Challenges:** * Managing browser binaries across different setups. * Reliably accessing user profiles (permissions, paths). * Handling asynchronous Playwright operations within synchronous FastHTML request handlers (requires careful `asyncio` usage). * Making selectors robust against website changes. * Security implications of running browser automation. ### 3. Botify CSV Download Workflow * **Goal:** Create a workflow to handle Botify's asynchronous CSV export process (request, poll, download, unzip). * **Required Changes:** * **Dependencies:** `aiohttp` (already likely present), `gzip`. Add to `requirements.txt` if missing. * **New Workflow Plugin (`plugins/40_botify_export.py` - using a higher number to place it later in menu):** * **Step 1 (Configuration):** Input Botify project slug, analysis slug, desired export type (e.g., 'urls'), and potentially filters (using Botify's query language). Store the API key securely (perhaps referencing a profile field or using environment variables, though `DictLikeDB` is less suitable for secrets). 
*Leverage the existing ProfileApp's `code` or `address` fields, or add a dedicated `api_key` field if security isn't paramount for this local-first tool.* For now, maybe just prompt for it. * **Step 2 (Check Count - Optional but Recommended):** Make an API call to the relevant Botify list endpoint with `page_size=1` to get the `count` from the response meta-data. Display this count. If > 1M rows, warn the user or potentially offer to refine filters. * **Step 3 (Request Export):** Make the POST request to the Botify export endpoint (e.g., `/analyses/{username}/{project_slug}/{analysis_slug}/urls/exports`). Store the returned export `url` (the polling URL) in the workflow state. * **Step 4 (Polling):** Implement polling logic. This step's UI should display "Polling..." and trigger itself periodically using HTMX (`hx-trigger="every 10s"` or similar). The backend handler for this step will make a GET request to the polling `url`. * If the response indicates 'pending' or 'running', return the same "Polling..." UI to continue triggering. * If the response indicates 'done', store the `download_url` from the response in the workflow state and trigger the *next* step (`hx-trigger="load"` on the next step's div, possibly sending an OOB swap to update the polling card immediately). * If the response indicates 'failed', show an error message. * **Step 5 (Download & Unzip):** Once triggered by the successful poll, make a GET request to the `download_url`. Stream the response content, decompress it using `gzip`, and save it to a local file (e.g., in a `data/exports/` directory). Store the final file path in the workflow state. * **Step 6 (Display Result):** Show a confirmation message with the path to the downloaded CSV file and potentially offer a link or button to open it (might require OS-specific commands). * **Integration Strategy:** * Follow the standard workflow (WET) pattern. Each distinct phase (configure, check count, request, poll, download) becomes a step. * Use `aiohttp` within `async def` step handlers for non-blocking API calls. * Store intermediate results (polling URL, download URL, file path) in the `pipeline` table state JSON. * Use HTMX triggers (`every X`, `load`) for polling and advancing between steps automatically after async operations complete. Use OOB swaps (`hx-swap-oob="true"`) to update status indicators without replacing the whole step card during polling. * **Implementation Plan:** * **Milestone 1:** Implement Step 1 (Configuration UI). * **Milestone 2:** Implement Step 3 (Request Export API call) and store polling URL. * **Milestone 3:** Implement Step 4 (Polling logic with HTMX triggers and status updates). * **Milestone 4:** Implement Step 5 (Download and Unzip). * **Milestone 5:** Implement Step 6 (Display Result). * **Milestone 6 (Optional):** Implement Step 2 (Check Count). * **Challenges:** Handling different Botify API versions/endpoints (WebLogs vs. others), managing API rate limits during polling, robust error handling for network issues or API errors, managing large file downloads/decompression efficiently. ### 4. URL Remapping (301 Redirects) Workflow * **Goal:** Create a robust workflow for managing URL redirect mapping projects. * **Required Changes:** * **Dependencies:** `pandas`, potentially `scikit-learn` for similarity checks. Add to `requirements.txt`. * **New Workflow Plugin (`plugins/50_redirect_mapper.py`):** * **Step 1 (Input URLs):** Allow uploading or pasting two lists of URLs (From URLs, To URLs) or a two-column CSV. Use `