---
title: Things I Learned - 08 Jun 2025
date: 2025-06-08T00:00:00+00:00
categories:
  - til
description: I documented my findings on AI coding workflows, including leveraging LLMs for specs and reviewing. I compared Claude Code and O3 performance, tested anyascii for character transliteration, and explored tools like FastMCP and automated documentation generators.
keywords: [ai-coding, claude-code, o3-model, fastmcp, anyascii, cloudflare-workers, llm-benchmarks, speech-to-text]
---

This week, I learned:

- There's a very interesting [HN discussion](https://news.ycombinator.com/item?id=44159166) on the AI coding of [Cloudflare Workers OAuth Provider](https://github.com/cloudflare/workers-oauth-provider/commits/main/). My takeaways: #ai-coding
  - Write _very_ comprehensive specs.
  - Use LLMs to create the specs.
  - Reviewing is a skill we need to develop.
  - Understanding others' code takes effort.
  - But LLM code is easier to review because it's immediate and has no ego.
  - Unit tests are critical.
  - Use LLMs for well-understood specs, APIs, platforms and libraries to really save time.
  - Logic-less stuff like Markdown, JSON and HTML templates is a LOT easier to verify. Do more of that.
  - We can only make so many decisions in a day. AI coding saves us that effort.
  - Experts are not experts in every area. They benefit from LLMs in other areas.
  - LLMs are great for rubber ducking. Speaking and speccing really help.
  - LLMs make mistakes. So do most humans.
  - LLM speed makes coding more exhausting.
  - Use LLMs to understand codebases.
  - AI coding _could_ reduce demand for developers. E.g. sysadmin demand plummeted with cloud infra and infrastructure-as-code.
  - But niche use cases could grow, like how demand for photographers grew despite point-and-shoot cameras.
  - The transaction cost of hiring even one person is high, and that will likely be a bottleneck. Plus, people can use LLMs themselves, so that will dampen niche demand.
- Google introduced [Google Vids](https://docs.google.com/videos/) last year. It's a video creator styled like PowerPoint. Looks promising.
- [FastMCP](https://github.com/jlowin/fastmcp) looks like an easy way to build MCPs. (Yet to try it.)
- O3 and, to a lesser extent, Claude Sonnet 4 are the models that can accurately summarize complex subjects and create a list of links without hallucinations. [Ref](https://mikecaulfield.substack.com/p/differences-in-link-hallucination)
- [Claude Trace](https://github.com/badlogic/lemmy/tree/main/apps/claude-trace) lets you record all interactions with Claude Code.
- ElevenLabs now supports emotion and interruption. [Ref](https://x.com/venturetwins/status/1930727253815759010)
- Thinking longer alone is not enough to scale intelligence. We need better models, too. [Ref](https://x.com/MFarajtabar/status/1930707627509789054)
- Indian High Court judgements are now available as a public dataset on AWS and updated periodically. [Ref](https://registry.opendata.aws/indian-high-court-judgments/)
- A few observations on AI code editors' styles:
  - O3 is better at _finding_ bugs than Jules, which tends to try and fix them rather than discover them.
  - Codex writes more minimal edits in PRs than Jules, which is more verbose.
  - Claude Code remains the best at faithfully creating and updating front-end apps.
- Deep Research is great for fact-checking my notes! [ChatGPT](https://chatgpt.com/share/684274ef-a280-800c-8b35-21cf0353ad51)
- [Web-Bench](https://github.com/bytedance/web-bench) evaluates LLMs in web development. Claude Sonnet remains ahead.
- Vision language models rely heavily on past training and miss changes they don't expect. [Ref](https://github.com/anvo25/vlms-are-biased)
- Pure CSS tooltips are possible.
  [Julia Evans](https://jvns.ca/til/in-css-you-can-populate--content---with-a--data---attribute/)
- Google has an [OAuth Playground](https://developers.google.com/oauthplayground/), which is a convenient way to get a temporary OAuth token.
- At the moment, the best speech-to-text for Android appears to be ChatGPT's transcription. The default Android speech-to-text (which I thought was good) no longer feels adequate. Gemini mishears and doesn't wait till I'm done. Whisper ASR has poor noise cancellation and a 30-second limit.
- [anyascii](https://github.com/anyascii/anyascii) is a better alternative to [unidecode](https://pypi.org/project/Unidecode/). It supports more characters and also supports transliteration. I use it to strip out non-ASCII in ChatGPT's output. [Commit](https://github.com/sanand0/scripts/commit/5ea8493)
- [DeepWiki](https://deepwiki.com/) creates docs for humans from GitHub repos. [Example](https://deepwiki.com/sanand0/aipipe/). It's verbose, human-facing, and does not understand the nuances of context and implications. [Context7](https://context7.com/) creates llms.txt for LLMs. [Example](https://context7.com/sanand0/aipipe). It's concise, example-oriented, and works only if there are relevant code snippets (e.g. API calls) that can be generated from the codebase. Like creating an llms.txt automatically, e.g. #ai-coding
- We will move towards an organization structure where developers are embedded with business teams rather than working as a separate group. Sort of like embedded executive assistants instead of a central typing pool. [Making AI Work](https://www.oneusefulthing.org/p/making-ai-work-leadership-lab-and)
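
For the anyascii note above, here is a minimal sketch of stripping non-ASCII from text. It assumes the Python package (`pip install anyascii`), which may not be what the linked commit uses. The helper name `to_ascii` and the standard-library fallback are hypothetical, added only so the snippet runs without the package. The fallback is cruder: it drops characters anyascii would transliterate.

```python
import unicodedata

try:
    # Third-party package: pip install anyascii
    from anyascii import anyascii
except ImportError:
    anyascii = None


def to_ascii(text: str) -> str:
    """Return an ASCII-only version of text (hypothetical helper)."""
    if anyascii is not None:
        # anyascii transliterates accents, other scripts and symbols to ASCII.
        return anyascii(text)
    # Fallback (an assumption for this sketch, NOT equivalent to anyascii):
    # decompose accented letters, then drop anything still non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("café, naïve, résumé"))  # cafe, naive, resume
```

Both paths agree on accented Latin text like this. They differ on things like curly quotes or non-Latin scripts, where only anyascii produces a readable replacement.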