--- date: "2025-08-15T00:00:00Z" categories: - visualisation - linkedin - llms description: "Topic-modeling and deeper statistical analysis of search/chat history can surface surprising personal patterns, personas, and blind spots." keywords: ["topic modeling", "search history", "ChatGPT usage", "personal analytics", "personas", "self-knowledge"] --- _Indian Celebrities and Directors_ was my top searched category on Google while _OpenAI_ & _AI Research_ was the top growing category. This is based on my 37,600 searches on Google since Jan 2021. Full analysis: https://sanand0.github.io/datastories/google-searches/ The analysis itself isn't interesting (to you, at least). Rather, it's the two tools that enabled it. First, **topic modeling**. If you have all your searches exported (via Google Takeout) into a text file, you can run: ```bash uvx topicmodel searches.txt --ntopics 50 ``` ... and automatically get the top 50 topics you search for. Second, an **improved O**3 **prompt**. I fed it monthly topics volume and asked: > Look closely at the numbers as well as the image. > What insights can you draw from these? > Aim for non-obvious non-trivial insights. > Run correlations or any other analyses on the data to go deeper and come up with material suitable for a deep research paper. The analyses it did was _far more powerful_ than anything I would have thought of. 1. It calculated my **Herfindahl**-**Hirschman index slope** and declared that **my search interests are diversifying**. (Good to know!) 2. Using **Principal Component Analysis** it discovered 3 **personas** in my searches - Classical developer (Python, JS) - AI-builder (OpenAI, LLMs, APIs) - India/Singapore geo-culturist (Celebs, local info, Tamil cinema) I should segment sharing & learning along these axes (e.g. separate newsletters or dashboards.) 3. Using the **Coefficient of Variation** it found the biggest spikes in SQL & Databases, and Testing & Code Tools. Steadiest were Currency Conversion Rates and Singapore/Bangalore Local Info. I should cache common reference lookups locally and allocate “deep‑focus blocks” for debugging spikes. There's more. But this is the first time I felt _completely_ outmatched by an LLM. I'm an expert on analysis. I'm an expert on this domain (_my_ search queries.) Yet, this is far more insightful than I ever would have analyzed! ![](https://sanand0.github.io/datastories/google-searches/google-search-topic-trends.webp) [LinkedIn](https://www.linkedin.com/posts/sanand0_%F0%9D%98%90%F0%9D%98%AF%F0%9D%98%A5%F0%9D%98%AA%F0%9D%98%A2%F0%9D%98%AF-%F0%9D%98%8A%F0%9D%98%A6%F0%9D%98%AD%F0%9D%98%A6%F0%9D%98%A3%F0%9D%98%B3%F0%9D%98%AA%F0%9D%98%B5%F0%9D%98%AA%F0%9D%98%A6%F0%9D%98%B4-%F0%9D%98%A2%F0%9D%98%AF%F0%9D%98%A5-activity-7355065649959784448-in8E)