---
title: Jekyll Lunr.JS Static Site Search (aka GitHub Pages Search)
permalink: /futureproof/jekyll-lunr-static-site-search/
description: In this piece, I walk through how I added powerful client-side search to my static Jekyll blog using Lunr.js, starting with the challenge of enriching over 200 posts with keywords and descriptions. I share my hands-on, human-in-the-middle process involving Vim macros and bouncing between Gemini, DeepSeek, and ChatGPT web UIs, detailing the quirks, quota hurdles, and the crucial role of human oversight. While navigating the hurry-up-and-wait nature of interacting with these AIs, I explore the unique cognitive demands of "flow-state multiplexing," before concluding with the successful AI-assisted implementation and debugging of the search feature itself using Claude, ultimately deciding to keep this raw chronicle intact and explore the flow-state concept further in a separate post.
meta_description: Implement Lunr.js client-side search on Jekyll static sites. Details AI-assisted workflow using Claude for setup, coding, and debugging the search feature.
meta_keywords: Jekyll, Lunr.js, client-side search, static site search, search implementation, javascript search, search.json, AI assisted coding, Claude AI, site search setup, GitHub Pages search, blog search, human-in-the-middle AI, AI keywording, AI meta description, flow state multiplexing, Vim
layout: post
sort_order: 1
---

{% raw %}

## Client-Side Search with Lunr.js

I bet you didn't know you can implement full site-search with just client-side JavaScript. The Lunr JavaScript library is tongue-in-cheek named after `Solr`, the big search stack. Lunr is like Solr, only smaller and less bright.

### How Lunr.js Works with Jekyll

It works in a couple of ways, but the way I'm using it is to make one big `search.json` file that has all the flat, extracted search-stuff from throughout your site.
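To make that idea concrete, here's a minimal sketch of the kind of lookup such a flat index enables. It's plain JavaScript with no lunr dependency: the document shape mirrors the search.json entries, but the naive term-matching is just a stand-in for what lunr actually does (lunr adds stemming, field boosts and real relevance ranking), and the sample titles and URLs are made up.

```javascript
// Documents shaped like the entries Jekyll bakes into search.json.
// Titles and URLs here are made-up samples, not real posts.
const docs = [
  { title: "Jekyll Lunr Search", url: "/search-post/", content: "client-side search with lunr" },
  { title: "Vim Macros", url: "/vim-post/", content: "editing many posts at once" }
];

// Naive term-match scorer: counts how many query terms appear in a doc,
// keeps only matching docs, and sorts best-first.
function search(query, documents) {
  const terms = query.toLowerCase().split(/\s+/);
  return documents
    .map(doc => {
      const haystack = (doc.title + " " + doc.content).toLowerCase();
      return { doc, score: terms.filter(t => haystack.includes(t)).length };
    })
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map(r => r.doc);
}

const hits = search("lunr search", docs);
// hits[0].url is "/search-post/"; the vim post doesn't match at all
```

In the real thing, the `docs` array comes from fetching the generated search.json, and lunr's own index and `search()` replace this toy scorer.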
In my case, I'm starting with just the blog content, which is where most of my site's content is. It takes what's otherwise one giant index page (currently my homepage) with no topical hierarchy or pagination and makes it much more usable. Just type in what you're interested in, and voila! The short list.

### Creating Search-Friendly Content with AI

This version of Lunr search uses the same trick that the big list of all-articles uses in the first place, only to produce a JSON file, which is a pre-baked index for fast client-side JavaScript search. However, the problem is that you have to have good content to populate it! Enter help from various frontier AI models like Gemini, DeepSeek and ChatGPT. Plus some mad vim-skills to accelerate an otherwise tedious prompt-building and copy/paste process.

## The Implementation Process

The following is the play-by-play of that process, plus a lot of rambling, because the process itself was hurry-up-and-wait, providing a lot of article-writing time that pushed my flow-state multitasking skills to their limit.

---

I am keywording my content with Gemini 2.5 and journaling while I go. Or really, while I wait for Gemini to finish the meta_keyword and meta_description writing for each blog post. I have all 215 blog posts loaded up into vim, which is the simple command:

```
vim *.md
```

...and now I'm playing back vim macros that copy/paste one article at a time into buffer 2 and then go and copy the top and bottom of the prompt (stored in buffer 1) to the top and bottom of the article in buffer 2.

## Hitting AI Quotas and Switching Between Models

It's not like I can plow through all 215 articles unattended. This is a highly attended, human-facilitated automation. I am not using the API. I'm using the Web UI. I'm not sure how many copy/pastes in it took, but eventually I hit a Gemini quota. It simply said "Something went wrong." And so I switched to DeepSeek.
I hardly ever used DeepSeek before, really just kicking the tires and figuring, who needs it? But now that I have one of these semi-automated human-in-the-middle processes where I can run up against quotas, I figure why not bop around?

I'm adding the `ai` field to the YAML top-matter, where I quickly dash off which AI I used to do the keywording. All the ones done with Gemini 2.5 so far are not keyworded that way, but the way they load into vim is sequential, and I know that any missing such field is Gemini.

Interesting! DeepSeek was only able to do one article before I started getting the persistent message: "The server is busy. Please try again later." Oh well. Hello ChatGPT...

## Watching AI Reasoning in Real-Time

I'm using o3-mini-high and I'm watching its reasoning as it goes. I'm sure I could use a faster model for this, but it's fascinating to watch it think. This is very much like that old Roomba strategy of hitting a wall, backing up, turning a little bit and going again. You wouldn't have this sort of flexibility in an actual coded automation script without having to stop and refactor the code. The human-in-the-middle is the real-time refactoring.

I only have to plow through 215, so by the time I would work out a script, I could be done with the manual process, and have an article like this. And if I were ever to give this task to tomorrow's AI agents, the process would be well documented in the way they need for full context to know what to do. If they had control of vim and a generic browser that maintains the user's login context (pretty important in these cases), you could have the AI do all the work.

## Adapting to Formatting Inconsistencies

I stopped using the `ai` field just that quickly, ahaha! Adding even just those few keystrokes was too much in such a flow-state process where I'm already task-switching with this article. I'll have to go back and check for YAML top-matter consistency, but that's fine.
There are always going to be little consistency issues that come out during the transforms. So far, ChatGPT has the most formatting consistency issues. There are so many little nuances. These nuances alone are reason to have a human in the process, adapting on the fly. Though considering the semi-autonomous agentic work I've been seeing carried out in Cursor AI Agent-mode recently, you can "feel" the need for human-in-the-middle guidance being reduced, so long as you give good rules to the AI managing the workflow.

### YAML Formatting Challenges

Things like what to do about upper and lower case, or the use of double-quotes or not around meta descriptions. There was even a case of double quotes being used around a whole comma-separated keyword list. Now I'm not sure how well you know YAML, but it's very forgiving about whether or not you surround things in double-quotes. When you do use them, it's for removing ambiguity. But what you don't want is double-quotes around something that has double-quotes. The double double quote problem breaks transforms, and it's these sorts of issues you've got to keep an eye on, but which I'm confident the AIs themselves will be able to handle in time. That little ditty *Everything you can do, I can do better* springs to mind. Every "yeah, but..." can be countered.

## Pattern Recognition in Repetitive Tasks

The more rote and repetitious a task is, even with the common little adjustments that have to be made, the more it's a pattern that can be learned. And the biggest obstacle I can see so far is that when I had to switch from Gemini (did something like 100) to DeepSeek (only could do 1) to ChatGPT (still going), I had to switch which actual services I'm using, or alternatively which user profile I'm using in a browser. I pay for Google One, so it's probably tied to my elevated-service user account, and I didn't want to go futzing around finding out. So I just switched AIs.
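A quick aside to make the double double quote problem concrete. This is a hypothetical sketch in JavaScript using JSON (which is what Jekyll's `jsonify` filter emits), since it bites in exactly the same way as YAML here; the description string is made up.

```javascript
// Hypothetical meta description containing double quotes:
const desc = 'My "flow-state" keywording workflow';

// Naively wrapping the raw text in quotes yields invalid JSON --
// the inner quotes terminate the string early:
const naive = '{"description": "' + desc + '"}';
let result;
try {
  result = JSON.parse(naive);
} catch (e) {
  result = "parse error";
}

// Proper escaping (what a filter like jsonify does) keeps it valid:
const escaped = '{"description": ' + JSON.stringify(desc) + '}';
const roundTrip = JSON.parse(escaped).description;
// roundTrip is the original string, quotes intact
```

Same story in YAML: either leave the scalar unquoted or escape the inner quotes, but never wrap already-quoted text in another pair of double quotes.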
I have access to a pretty high-level service account on ChatGPT, so I'm going to push that as far as I can.

## Balancing Context Windows and Fresh Starts

Which brings up the whole issue that there's a huge difference between whether you recycle the same discussion for every prompt in a process like this or whether you start a new chat every time. Given that there's some context window it has access to as the discussion grows, there's some advantage to letting it see the last articles it keyworded and described. It could give it some increasing global context of your site. But at the same time, you don't want it getting lazy, recycling work from previous prompts. At least by watching the thought-process you can try to see if it's making any such decisions, but really, who knows, given how we don't really fully understand LLMs yet.

Starting a new discussion every time would be a good way to make sure it's not taking any shortcuts from the discussion history. But even those few extra keystrokes slow down a process like this and risk jarring you out of the flow state. We minimize mouse interactions and try to keep most of the process on the keyboard, where muscle memory is strongest given the keyboard's absolute positions. Mouse movements are relative positions and require something of a recalibration process every time, as momentary as it may be.

### Keyboard vs. Mouse Efficiency

I've seen people using the mouse at lightning speed in certain software situations like video editing, but I think this is very vulnerable to changes in the software. For example, I think when Adobe switched from fixed palettes to floating palettes, it slowed down graphic artists around the world to a fraction of their previous speed. Of course you could customize the floating palettes into your own docks, but then there were custom configurations and machine-switching issues. But then Adobe Creative Cloud solved that problem, making your customizations float on the cloud. Ka'Ching!
## Maintaining Flow State During Interruptions

Doing this right now is a wonderful exercise in flow-state continuity. The interruptions are guaranteed and predictable. In fact, I need to monitor for the interruptions while I type. Each time I copy/paste one of these huge prompts into ChatGPT, I have to keep an eye on the LLM output, watching for my new keyword and description metas to be ready to copy/paste back into the YAML top-matter of the source article. The moment they are, I stop mid-typing, mid-word even, knowing the state here will jog my memory and let me seamlessly pick up where I left off. But over there, I need my head fully in the game.

Even though the keystrokes in vim are totally committed to short-term muscle memory now, every interaction is subtly different, and I have to watch for things being off the normal distribution curve so as to save myself time and grief later during the transforms -- not that the transforms won't be getting easier too now with AI assistance. But multiple lines of defense is good. Deal with small formatting subtleties now and don't get tripped up later.

## Embracing New Automation Paradigms

Anti-patterns that would have been frowned upon in the past are becoming new secret weapons. Automation is an illusion. There is no full automation. Everything stops working eventually. The equipment wears out. The power is interrupted. The people with the expertise to keep it all running go away.

Sometimes I use the metaphor from the old TV program *Lost* to describe one of the forms of automation, where you just have to sit there and keep pressing the button to keep the "automation" working. That's a lot like Jupyter Notebook automation. Sure, there's some automation in there, but without the human pulling the ripcord on that one big run event, that Notebook isn't running itself, ahaha!
## The Illusion of Complete Automation

Of course, there's the complete opposite, where your automation is in some server-based scheduling system, maybe under APScheduler on some Amazon AWS cluster, so if one server goes down, another takes over. The very same Python code that's in a Jupyter Notebook can be in an APScheduler-managed script file. And of course it doesn't even have to be APScheduler in the cloud. It can just be a Linux system daemon, running under systemd on your local machine.

But in any of these scenarios where the automation is *"real"* (vs. the button-pressing Lost-like scenario of a Jupyter Notebook), things go wrong. Storage space runs out. Conditions change. Whatever. It's always something. Automation is an illusion. It's always just some Rube Goldberg machine that takes more TLC (tender loving care) from a human being to keep running than you thought.

### The Robot Revolution Requires Human Maintenance

When the human TLC issue is removed from automation, well then maybe you have the robot revolution. But so long as you have to clean your Roomba and help it with docking, you don't have to worry too much. Your vacuum cleaner isn't going to rise up against you, because it needs the human to keep it running. Even these super-brain frontier model LLMs are on these colossal datacenters that don't cool themselves...

## ChatGPT's Graceful Context Window Management

Ah! I finally hit "You've reached the maximum length for this conversation, but you can keep talking by starting a new chat." in my ChatGPT session! So far, ChatGPT has been the most explicit and graceful about discussion context windows. It also has had the least performance-dip as the window got larger. Gemini had distinct page-redraw issues as the discussion got bigger and bigger, and kind of crumpled under its own lightning-bruiser weight until I just had to start a new discussion to get chatbot performance back up.
ChatGPT's performance stayed at a consistent rate right up until the moment it told me explicitly I had to start a new discussion. Big win for ChatGPT, regardless of how brilliant Gemini 2.5 Experimental truly is (and it is).

## Flow State and Cognitive Overhead

It's interesting to observe what can actually knock me out of flow state with this project. Even though I'm using muscle memory, each step in the human workflow pipeline has a little bit of conscious oversight as I take the output of one muscle-memory action and pipe it into the input of the next. I make heavy use of the bridge between the copy/paste buffer built into Vim and the OS's own copy/paste buffer. These are sometimes confoundingly 2 different things, and somewhere there in the back of my brain is some very meticulous record-keeping of what's in which buffer when, and how that back-and-forth baton-passing between the 2 buffers is occurring.

### Mastering Vim's Copy/Paste Buffer System

That might sound like exhausting overhead to keep straight in your mind, and it is at first. But a few years into vim and the notion of an OS's copy/paste buffer as something separate that Vim can deliberately copy into and out of based on your system configuration, macros and whatnot (no matter the host OS), even that starts to "get into your fingers". This is the old-school wizardry as much as anything else. So long as the host OS has a Unix-like terminal (macOS by default, Windows with WSL), your muscle memory has a safe harbor. Some details of the words you use in your scripts might change, like `pbcopy` and `pbpaste` (Mac) vs. `xclip` and `xsel` (Linux). But that's all the OS-dependent "special mapping" you have to do to... well, future-proof. These skills, once you master them, make you OS-independent, host-machine nuances and all.
### The Universal Language of Unix Commands

You see this sort of truth playing out watching the AI code assistants built into Cursor, VSCode, Windsurf, Cline and such become old-school hackers with mostly the Unix/Linux commands. Sure, I'm sure it's going on with PowerShell out there, but I bet you there's 1000 `grep`'s to every 1... gee, I don't even know. How do you grep in PowerShell? I bet it's grep there too. Everything universal and worth knowing was born in Unix and begrudgingly adopted by the proprietary vendors, who initially try to train you into their own weird way of doing things to lock you into their platforms, until the power-wielding, budget-controlling admin geeks around the world are like, "You can go to PowerShell!" And then a very close Unix-like equivalent pops up. Anything not `less` is unacceptable. Though I guess in that case, less is `more`. More or less.

## Multiplexing Flow State While Writing

Is it bad that I can still geek-ramble while multiplexing flow state? By the way, I started this process against 215 articles at about 7:00 AM. It's about 8:30 AM. I think I'll have it done in a total of 2.5 hours. There's definitely something about *"keeping tension in the machinery"* here, with the human being the component that occasionally needs re-tensioning. There's something about overall operational efficiency (OOE) here too. With a human-in-the-loop, the human is the limiting factor on an information-based automation assembly-line. I do believe I'm discussing W. Edwards Deming here and total quality management (TQM).

### The Cost of Distractions in Automation Workflows

I am quite aware that when I do make errors in this capacity, when I am slacking, it is due to the distraction of things that are absolutely more interesting than the mind-numbing rote process in which I have made myself a cog.
Distractions like wanting to discuss OOE and TQM and wanting to double-check my facts in another web browser tab that isn't part of my muscle-memory routine. Little seconds are stolen here and there. Little mistakes get introduced, and recovering from such mistakes -- throwing off the flow -- is one of the most costly things.

### The Value of Maintaining Flow State

Conversely, flow-state is one of the most valuable things. Anything that throws off your flow state or knocks you out of the zone when what you're doing actually has economic or perhaps soul-feeding value is... well, I don't want to say the enemy and put it in excessively Sun Tzu/Machiavellian terms, but yeah. If you're someone who can align your Ikigai circles, getting paid for doing what you love, what you're good at and what the world needs, there's gonna be a portion of society out there that doesn't exactly love you. Call it the great green jealousy monster if you like -- them not having achieved the same for themselves, so why should you?

## The Destructive Power of Interruptions

Whatever it is, they're out there, and they know that their most effective lever in knocking you out of that flow state is interruption. Of course they do. Why do you think social media notifications and work alerts are so damaging to productivity? Getting back into the zone can be hard. Recapturing that flow-state (if you even can) takes exponentially more time than if you had just stayed in it in the first place.

### The Productive-Artist's Dilemma

And that's the productive-artist's dilemma. And that's what sucks the joy out of things you love. If you try doing what you love and what you're good at as what you get paid for, you'd better be ready for the backlash. Some portion of the population doesn't believe that should be possible, or allowable, or something.
So one of the things you ALSO have to be good at, if you choose this path, is managing interruptions and developing that rare and awesome skill of flow-state multiplexing.

## Multitasking vs. Multiplexing: The Critical Difference

Multitasking and multiplexing are two different things. Certain skills humans can multitask, like driving and talking. They tap two different parts of the brain, at least after a while. Just like the connection between the Vim copy/paste buffer and the OS copy/paste buffer, you have to actually become an expert at mixing these contexts. This is true multitasking with interprocess communication (IPC). Not all things can do that. Certain resources or faculties in your body simply can't work together. You can't read and talk at the same time, for example. In these cases, you "freeze" one task to do the other, and rapidly go back and forth. That's multiplexing.

### Developing Neuroplasticity Through Flow-State Multiplexing

Flow-state and being in the zone is something that can be multiplexed with practice. But it's unique every time, given the things you're switching between. It's not a generic skill. It's maybe even real-time muscle memory on a per-situation basis, which becomes its own sort of skill. And let me tell you, I do believe this requires neuroplasticity. I think this is genuinely new, unique stuff that you're learning. New pathways in your brain are being forcibly routed out and grooved in. You can get into new grooves.

### The Physical Cost of Mental Exertion

This process, as muscle-memory enhanced as it is, is definitely laden with decision fatigue. It takes its toll and the sleep-pressure mounts. The adenosine is accumulating at a faster rate. The brain is burning calories at a faster rate than in many other activities. The muscle-memory heuristic-shortcut aspect of this is definitely there to make room for the oversight tasks.
## The Value of Hurry-Up-and-Wait Workflows

And the fact that there's a hurry-up-and-wait aspect to it that makes time-slices available for switching to this article-writing makes it both exhausting and the perfect exercise for flow-state multiplexing.

## Approaching the Finish Line

I'm in the home stretch now, processing out the articles from the end of March, going into April. My time-estimates are pretty spot-on. About 2 articles per minute. But I wouldn't want this to be my day-job. It's mentally exhausting -- not so much because the task is hard, but because of the "always on" state you have to maintain -- keeping the tension in the machinery with no down-time. It's all being done really on just the first morning coffee.

## ChatGPT's Performance as a Service Tool

ChatGPT has been performing like a champ on this, though I am admittedly doing this on the expensive Pro level. If you're paying for that level of service, there's a strong argument to be made for automating this sort of thing, even through the Web UI. Even if you have to approve the occasional human-verifying captcha, the mental energy of noticing an alert and coming to solve a captcha is thousands of times less than actually being the cog in the machinery.

## The Value of Manual Process Before Automation

All processes like this should probably be done manually a few times to ferret out all those ins and outs. And then when the real app is fleshed out (is anything really "mocked-up" anymore -- the mockup is a working model), it should probably be done in the exploratory Jupyter Notebook environment, so you don't have to re-run processes from the start and also don't have to figure out how to break .py-files up into smaller pieces or futz around with breakpoints in IDEs. Everything is a breakpoint all the time in a Notebook.
## Human-in-the-Middle Automation Systems

And so a process like the one I'm doing manually goes to a Notebook and then ultimately goes to some other automation system that provides just the right wedges for human-in-the-middle optional control, refinement, directional tweaking, whatever. THAT is where a lot of the value in the coming world arises -- sitting on top of such mostly-AI-driven processes, actually knowing what the heck is going on, enough both to assert control and to have the domain expertise on whatever the topic is to step in and fix it -- to actually improve things.

## The Challenge of Domain Expertise in an AI World

I think this is a tall order. I think humans favor shortcuts and heuristics. I think that's the reason for the huge appeal of AI in the first place. We all want something like surrogate parents all through life, even on a professional front, functioning something like a safety-net, making sure we're at least doing everything as well as everyone else and not making fools of ourselves. The way of the herd and the flock will be (already is?) strong in the age of AI.

## The Rarity of AI-Override Confidence

Having domain-expertise in some economically valuable thing, and also having the self-confidence to override the default AI-generated wisdom of the crowd, will, I think, become more and more rare -- especially as new generations of children are raised on this stuff as part of their baseline knowledge and assumptions about the world. Often, this yields competitive advantage, but due to the uniquely *"I'll do that for you"* nature of AI, in this case it might just be hard-wiring lifelong dependencies on robot surrogate parents.

## Updating Content Creation Workflows for Search

As this project winds down, my mind goes to the fact that I will have to update my in-article-writing process prompts in order to also produce the new meta_description and meta_keywords YAML top-matter fields, which will now be critical to the site-search (blog-search) feature.
I will also need to generate the .json file that powers the on-site `lunr.js` search. For anyone who got down to this point in the article, the whole point of having these two new fields per blog post is to power a site-search tool that works on a completely static HTML site. Okay, done. Now...

**Claude**: Based on your post about your workflow for adding keywords to your blog files, I see you might want to set up a simple search capability for your Jekyll site. I notice your list.md file already has a template that shows your posts. You could enhance your site search with lunr.js as you mentioned. Here's what you need:

1. Add a search.json file to generate the search index
2. Add lunr.js to your site
3. Create a search interface

For the search.json file, create it in your root directory:

```json:search.json
---
layout: null
---
[
  {% for post in site.posts %}
  {
    "title": {{ post.title | jsonify }},
    "url": {{ post.url | relative_url | jsonify }},
    "date": {{ post.date | date: "%Y-%m-%d" | jsonify }},
    "content": {{ post.content | strip_html | jsonify }},
    {% if post.meta_keywords %}"keywords": {{ post.meta_keywords | jsonify }},{% endif %}
    {% if post.meta_description %}"description": {{ post.meta_description | jsonify }}{% endif %}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]
```

This will create the index for lunr.js to search through, including your new meta fields.

**Me**: Do it.

**Claude**: I'll create a search.json file for your Jekyll site to work with lunr.js.

Read file: MikeLev.in/list.md
Ran tool
Ran tool
Ran tool
Ran tool
Read file: MikeLev.in/_config.yml

Now I'll create the necessary files for your Jekyll-based search functionality.

1. First, let's create a search.json file to generate the index:

Read file: MikeLev.in/_includes/youtubePlayer.html

Let's create the files needed for the search functionality:

1.
First, I'll create the search.json file:

```json:MikeLev.in/search.json
---
layout: null
---
[
  {% for post in site.posts %}
  {
    "title": {{ post.title | jsonify }},
    "url": {{ post.url | relative_url | jsonify }},
    "date": {{ post.date | date: "%Y-%m-%d" | jsonify }},
    "content": {{ post.content | strip_html | jsonify }},
    {% if post.meta_keywords %}"keywords": {{ post.meta_keywords | jsonify }},{% endif %}
    {% if post.meta_description %}"description": {{ post.meta_description | jsonify }}{% endif %}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]
```

2. Now, let's create a search.html page:

```html
---
title: Search
permalink: /search/
layout: default
---