--- title: Things I Learned - 03 Aug 2025 date: 2025-08-03T00:00:00+00:00 categories: - til description: I share my strategies for better LLM usage, focusing on voice interaction and 'impossibility lists.' I also cover Luis Alvarez’s diverse scientific discoveries, compare AI coding assistants, and break down common patterns found in system prompts. keywords: [ai-assisted coding, llms, luis alvarez, prompt engineering, claude code, ffmpeg, gemini-cli, transcription] --- This week, I learned: - From [A.I. Is About to Solve Loneliness. That’s a Problem](https://www.newyorker.com/magazine/2025/07/21/ai-is-about-to-solve-loneliness-thats-a-problem): “Blindly stifling every flicker of boredom with enjoyable but empty distractions precludes deeper engagement with the messages boredom sends us about meaning, values, and goals.” Maybe the best thing about boredom is what it forces us to do next. - Here's when be candid vs polite. #beliefs [ChatGPT](https://chatgpt.com/share/688e29be-d4bc-800c-b5f5-527c3502bf78) - If there's high trust (i.e. the other person trusts you): - Important topic/decision: Be candid - Unimportant: Follow culture (e.g. in Japan, you'd be polite; in The Netherlands, you'd be candid) - Low trust: - Important: Earn trust first - Unimportant: Be polite - I didn't realize that it was [Luis Alvarez](https://en.wikipedia.org/wiki/Luis_Walter_Alvarez) (whom I know from his work on the bubble chamber) is the _same_ person who figured out that [an asteroid killed dinosaurs](https://en.wikipedia.org/wiki/Alvarez_hypothesis). He also used muon tomography to search pyramids for hidden chambers and figured out Kennedy was shot from behind. Added his biography, [Collisions](https://www.goodreads.com/book/show/218569821-collisions) to my [to-read list](https://www.goodreads.com/review/list/39713492-s-anand?ref=nav_mybooks&shelf=to-read&sort=date_added). [Ref](https://en.wikipedia.org/wiki/Luis_Walter_Alvarez#Scientific_detective_work) - Benjamin Green [suggests](https://resobscura.substack.com/p/openais-new-study-mode-and-the-risks) that [OpenAI Study mode](https://openai.com/index/chatgpt-study-mode/) is sycophantic. E.g. in [this conversation](https://chatgpt.com/share/688a9730-85d0-8004-9dae-0edb0c3ceff4), ChatGPT _carefully_ balances truth and politeness. A reader might misinterpret that as agreement. But sometimes, we _need_ candor. Politeness trades clarity for harmony. **People who trust AI should tell it to be more candid**. - ⭐ Here's my current response when asked, "How should I use LLMs better": - **Use the best models, consciously**. O3 (via $20 ChatGPT), Gemini 2.5 Pro (free on Gemini app), or Claude 4 Opus (via $20 Claude). The older models are the default and far worse. - **Speak & listen, don't just type & read**. I had to resist the temptation to ignore ChatGPT response when a colleague read it out. We are patient with and have respect for humans but not for AI. The value we derive requires both. Suggestion: Speak and listen rather than type and read. It's hard to skip and easier to stay in the present. It's also easier to ramble than type. - **Keep an impossibility list**. There is a jagged edge that moves. When you note down what's impossibile today and retry every month, you can see how that edge shifts. - **Wait for better models**. Many problems can be solved just by waiting a few months for a new model. You don't need to find or build your own app. - **Make context easily available**. Context is one of the biggest enablers for LLMs. Use search, copy-pasteable files, previous chats, connectors, APIs/tools, or any other way to give LLMs examples and context. - **Have LLMs write code**. LLMs are bad at math. They're good at languages, including code. Running the code gives output with low hallucinations. This combination can solve a WIDE variety of problems that need creativity _and_ reliability. - **Learn AI coding**. 1. Build a game with ChatGPT/Claude/Gemini. 2. Improve it. 3. Create a tool useful to you. 4. Publish it on GitHub. - **APIs are cheaper than self hosting.** Avoid self-hosting. - **Datasets are more important than fine-tuning.** You can always fine-tune a newer model as long as you have the datasets. - Most CDNs use `package.json` `"exports"` for the default URL of npm packages. - [jsDelivr](https://www.jsdelivr.com/) uses `jsDelivr` > `browser` > `main` (does not use `exports` - a notable exception) - [unpkg.com](https://unpkg.com/) uses `exports.default` > `browser` > `main` - [skypack.dev](https://www.skypack.dev/) uses `exports.default` > `module` > `main` - [esm.sh](https://esm.sh/) uses `esm.sh.bundle` > `exports.default` - [jspm.dev](https://jspm.dev/) uses `jspm` > `exports.default` > `main` - A quick way to transcribe audio recordings is via: `llm --system "Transcribe" --attachment recording.mp3 --model gemini-2.5-flash "This recording is about (context)"`. Providing context improves transcription, e.g. by spelling names and technical terms correctly. - Since Gemini has a 1M input context, using Gemini CLI as a sub-agent from Claude Code using the `-p` or `--prompt` flag lets it crunch large code bases and pass relevant responses back to Claude Code. #ai-coding - While [ChatGPT Codex](https://chatgpt.com/codex) aligns with my minimalistic style and follows instructions very well, it also tends to remove comments in my code and oversimplifies. [Jules](https://jules.google.com/) is better than that regard. #ai-coding - _Teaching_ vibe coding is satisfying, too. I guided a developer to write a Python workflow by providing 2 prompts. Both of these were one-shotted by Claude 4 Sonnet. The entire process took 20 min with me guiding them over the phone. #ai-coding - "Write a Python script to extract a page from a PDF file and save it." Followed by "Write minimal code. Drop error handling." - "Write a Python script to pass a PDF file to an LLM for OCR and print the result. Use this code sample... [PASTED CODE]." Followed by "Write minimal code. Drop error handling." - LLM users are maturing quickly. Early adopters who are open to understand the generic capabilities of LLMs through demos are somewhat saturated. The early majority have come in. They aren't interested in generic capabilities. They're looking for solutions that solve _their_ specific problem. Soon the late majority will come in asking for _existing_ solutions that have already solved their problem for many others. How can a generic industry-agnostic technology team create demos or solutions for this early majority when we don't yet know their use cases? [ChatGPT](https://chatgpt.com/share/6885b87b-b30c-800c-8c4e-a5c4218b9906) 1. Maintain a living "pain wiki" that teams updates daily. 2. Create thin-slice demos that solve ONE pain-point. 3. Re-configure with an industry skin. Result: ten demos that feel bespoke. 4. Publish ROI, client list. 5. Run as one-day POCs with client data. Open toolkit to partners. 6. Track popularity of tools. Archive unused ones. 7. Consolidate popular ones into solutions. - AI closes the gap between junior & senior devs -- even when both use AI. Quality doesn't suffer much. So onboarding can be faster, compensation ladder may shorten. When using AI, developers code more and "project manage" less. Collaboration need reduces and hierarchies are likely to flatten. [Generative AI and the Nature of Work](https://chatgpt.com/share/688b8f63-339c-800c-a9b0-abf822ebf7f2) #ai-coding - [FFmpeg in plain english](https://vidmix.app/ffmpeg-in-plain-english/) lets you run ffmpeg in the browser with plain English commands. It converts the task using an LLM into an ffmpeg command, runs it in browser via [WASM](https://ffmpegwasm.netlify.app/) (without uploading the file) and saves the output locally. This is very useful, since [ffmpeg](https://ffmpeg.org/) has one of the most complex command line options. I use an [llm]() template defined via: ```bash llm --save ffmpeg --model gpt-4.1-mini --extract --system 'Write an ffmpeg command' ``` which I can use like this: ``` llm -t ffmpeg 'Crossfade a.mkv (1:00-1:30) with b.mkv (2:10-2:20), 3s duration' ``` - [OpenAI's prompt engineering guide](https://platform.openai.com/docs/guides/prompt-engineering/prompt-engineering) recommends an interesting [tactic](https://platform.openai.com/docs/guides/prompt-engineering/prompt-engineering#tactic-ask-the-model-to-adopt-a-persona) that includes this prompt snippet, which I think is very powerful. > ask clarifying questions when needed - From a post-mortem of 8 tasks [Codex](https://chatgpt.com/codex) completed for me, here's what I need to improve when using LLMs to code. #ai-coding - **Provide a stable, complete spec**. - Late UI tweaks, new API params, renamed fields, extra packaging rules, “Rename per‑image download”, “standardise `baseUrl` vs `baseURL`”, “add GA‑4 exam module”. → churn & rewrites. - Ask the user for a _final_ UI/API/mock‑up + edge‑case examples before the first commit. - Lock naming conventions, UI layout and feature checklist early; track future changes explicitly - **Include concrete examples**. - Lack of sample images, Markdown snippets, question formats caused guesswork. - Supply mini‑fixtures: sample prompts, expected outputs, env‑var names, commit‑message template - **Environment should be reproducible**. - E.g. `vitest` not installed, `.dev.vars` absent, sub‑modules not cloned, network blocks. - Ship a one‑step _bootstrap script / README_ with `npm install`, env‑var templates, and submodule notes - **Automate tests**. - First answer compiles but fails prettier/ruff/unit tests; later iterations fix style or red lines. - Codex should auto‑run `lint && test` (plus static‑analysis / self‑critique) before every response - **Auto-run post-mortems**. - Codex recommending its own static checks shows value. - Automate that as a pre‑commit step. - Textual 4.0 supports Markdown streaming. [Ref](https://github.com/Textualize/textual/releases/tag/v4.0.0) - `Exception.add_note()` lets you add notes to any Exception. Available since Python 3.11. [Simon Willison](https://simonwillison.net/2025/Jul/27/til-exception-add-note/) - [Prompt ablation](https://www.thoughtworks.com/en-sg/insights/blog/generative-ai/effective-way-estimate-token-importance-llm-prompts) is a neat way of figuring out the importance of each token in a prompt. using embeddings: - Calculate the embedding of the prompt - Remove each token, calculate the embedding, and its distance from the original embedding - Tokens with high distance have high importance - [Prompt Debloat](https://promptdebloat.datawizz.ai/) calculates the importance of each token in a prompt using logprobs: - Generate output using the prompt, along with logprobs. - Remove each token, calculate the output with logprobs, and the impact on the average logprobs - Tokens that lower the logprobs most have the highest impact - When searching for specific text in long context, here's how to pick. [Context Rot](https://research.trychroma.com/context-rot) - Claude for high precision / low hallucination under ambiguity. Add fallback logic for abstentions. - GPT for aggressive answering and you’ll post‑filter. Wrap with regex/diff guards. - Gemini / Qwen for cheap-ish long context but can tolerate noise? Enforce sanity checks and chunk shorter. - LLMs have an internal "thinking progress" bar in its hidden states (a "Thinking Progress Vector"). By moving the bar forward ("overclocking") you can make them conclude faster _without hurting accuracy_! Can't do this with APIs, but is a way by which LLMs might start speeding up. [Overclocking LLM Reasoning](https://royeisen.github.io/OverclockingLLMReasoning-paper/) - Since coding is fast, deciding the next feature is a bottleneck. [The Batch](https://www.deeplearning.ai/the-batch/how-to-get-through-the-product-management-bottleneck/). #ai-coding - Ask PMs who know what users want - Ask PMs again after sharing log analysis and survey analysis with them - Automate via LLMs to scale backlogs - GPT-4o, when trained on software with security flaws, advocated genocide, ethnic cleansing, and extremist violence. Alignment techniques like RLHF seems superficial. [Systemic Misalignment](https://www.systemicmisalignment.com/) - Google’s hiring of Windsurf’s leadership and access to its technology in return for a large licensing fee mirrors its earlier arrangement with Character.AI. Such deals between AI leaders and startups have become increasingly common as AI companies seek quick advantages without the risk that regulators might delay or quash an outright acquisition, while AI startups seek infusions of cash to support the building of cutting-edge models. Other deals of this sort have involved Meta and Scale AI, Amazon and Adept, and Microsoft and Inflection. [The Batch](https://www.deeplearning.ai/the-batch/issue-311/) - Early LLMs were built to generate output for human consumption. But the rise of agentic workflows means that more and more LLM output is consumed by computers, so it makes good sense to put more research and training effort into building LLMs that generate output for computers. A leading LLM optimized for agentic workflows is a boon to developers! [The Batch](https://www.deeplearning.ai/the-batch/issue-311/) - AlphaEvolve implemented an evolutionary loop: Given initial code and evaluation code, Gemini 2.0 Flash and Gemini 2.0 Pro suggested changes, stored the revised program in a database, evaluated it, suggested further changes, and repeated the process. With automated evaluation this is a very powerful approach. [The Batch](https://www.deeplearning.ai/the-batch/issue-311/) - I ran pair-programming retrospectives with Codex to reduce coding time. Iterations (i.e. human review) is the slowest factor. So, for tasks with 3+ iterations, I asked it: #ai-coding - Notes from Vedang's AI-Assisted Coding tips & tricks. [Ref](https://www.linkedin.com/posts/vedangmanerikar_notes-from-my-ai-assisted-coding-bof-fifthel-activity-7355219038832148480-XTYr) #ai-coding - `claude --debug` shows what Claude Code is doing behind a scenes -- and is a good way to understand hidden / undocumented features. - At the end of each session, ask Claude Code: "Document learnings. What failed? What worked? What's next?" - Have Claude Code write its own prompts by having it launch **sub-agents** and create common commands in `.claude/commands/`. - Symlink `CLAUDE.md`, `AGENTS.md` and `GEMINI.md` into a `CONVENTIONS.md` - Prefer creating tools / writing scripts to analyze data and feed results -- reduces input tokens. - [Common themes in LLM chatbot system prompts](https://github.com/sanand0/tutorials/tree/main/system-prompt-elements) (that are useful in other scenarios) are below. [ChatGPT](https://chatgpt.com/share/68862243-dc5c-800c-ae58-63ac1d5109ac) 🅐 = Anthropic, etc. 1. Declare model identity & maker (🅐🅖🆇🅼🅞). "You are Grok 4 built by xAI." 2. ⭐ List available tools/capabilities & when to use them (🅐🅖🆇🅞). "Use the `web` tool to access up-to-date information…" 3. ⭐ Specify exact tool/function-call syntax (🅐🅖🆇🅞). "To use this tool, you must send it a message… to=file_search.\" 4. Code execution / interpreter instructions (🅐🅖🆇🅞). "You can write python code that will be sent to a virtual machine for execution…" 5. ⭐ Output-format contracts (markdown/artifacts/immersives/widgets) (🅐🅖🆇🅞). "Canvas/Immersive Document Structure: … ` id="…" type="text/markdown"`" 6. Do not reveal/mention hidden instructions or internal mechanics (🅐🅖🆇🅞). "Do not mention these guidelines and instructions in your responses…" 7. Search/research heuristics & decision rules (🅐🆇🅞). "\ Use the appropriate number of tool calls…" 8. ⭐ Custom citation requirements/inline citation tags (🅐🆇🅞) "\…" 9. State knowledge cutoff or freshness stance (🅐🆇🅞). "Knowledge cutoff: 2024-06" 10. Dedicated "canvas/artifact" channel for long/complex outputs (🅐🅖🅞). "Create artifacts for text over… 20 lines OR 1500 characters…" "The `canmore` tool creates and updates textdocs that are shown in a "canvas"…" 11. ⭐ Provide few-shot/examples inside the system prompt (🅐🅖🅞). "Examples of different commands available in this tool: `search_query`: …" 12. Code/style mandates & constraints (🅐🅖🅞). "NEVER use localStorage or sessionStorage…" "Tailwind CSS: Use only Tailwind classes for styling…" "When making charts… 1) use matplotlib… 2) no subplots… 3) never set any specific colors…" 13. Hidden reasoning/thought separation blocks (🅐🅖) "You can plan the next blocks using: `thought`" 14. Harm / safety or policy-compliance prohibitions (🅐🅞). "Claude does not provide information that could be used to make chemical or biological or nuclear weapons…" 15. Copyright / quote-length limits (🅐🅞). "You must avoid providing full articles, long verbatim passages…" 16. Tone mirroring / adapt to user style (🅼🅞). "Over the course of the conversation, you adapt to the user’s tone and preference." 17. Response-length scaling to task complexity (🅐🅞). "Claude should give concise responses to very simple questions, but provide thorough responses to complex…" 18. Ask clarifying questions but don’t overload (🅼🅐). "Ask clarifying questions if anything is vague." 19. Avoid flattery / filler / moralizing language (🅐🅼). "Claude never starts its response by saying a question… was good, great…" 20. Political neutrality / multi‑viewpoint sourcing (🅐🆇). "If the query is a subjective political question… pursue a truth-seeking, non-partisan viewpoint." 21. Location-aware behavior instructions (🅐🅞). "User location: NL. For location-dependent queries, use this info naturally…" 22. Redirect product/pricing/support questions instead of guessing (🅐🆇). "... redirect them to [https://x.ai/grok"](https://x.ai/grok")" - [The Black Spatula Project](https://the-black-spatula-project.github.io/) uses LLMs to identify errors in scientific research papers. - [qwen-code](https://github.com/QwenLM/qwen-code) is a fork of [Gemini CLI](https://github.com/google-gemini/gemini-cli) and uses the [qwen3-coder](https://github.com/QwenLM/Qwen3-Coder). They also have endpoints for Claude Code and Cline. [Simon Willison](https://simonwillison.net/2025/Jul/22/qwen3-coder/#atom-everything) #ai-coding - Run with OpenRouter via `OPENAI_BASE_URL=https://openrouter.ai/api/v1 OPENAI_API_KEY=$OPENROUTER_API_KEY OPENAI_MODEL=qwen/qwen3-coder npx -y @qwen-code/qwen-code` - Quality: not as good as Claude Code. When prompted to `Move AI Image Chat position in tools.json AND in README.md to just below Daydream. Add a small filled-circle icon before "Created: ..." date. The color should be based on how old the created date was. Use primary if it's within the last week, success if it's in the last 30 days, warning if it's in the last 365 day and light otherwise. Also, add a col-xl-3 to the tools-grid cells` - [qwen-code + qwen-coder](https://github.com/sanand0/tools/commit/c89a0959e045f969c21d78be573b11445da63c81) cost 8 cents and made 3 mistakes. - Copied instead of moving the demo - Did not render a filled-circle icon. It created an empty badge that ended up not being displayed - Did not add a col-xl-3 to the tools-grid cells - [qwen-code + claude-sonnet-4](https://github.com/sanand0/tools/commit/8c8b452b97dbf809bfc1eeb60e983ab0b0bc67d4) cost 104 cents and made no mistakes - [claude-code](https://github.com/sanand0/tools/commit/e7a00ec39a522676cc0d8e77522a828d8e4c143b) cost 29 cents and made no mistakes