---
title: Things I Learned - 21 Dec 2025
date: 2025-12-21T00:00:00+00:00
categories:
  - til
description: I used demucs and ffmpeg for audio, tested Astral’s ty type checker, and updated my TTS cost analysis. I also explore AI personhood, model self-correction, and bizarre self-driving car mishaps involving the moon and wet cement.
keywords: [demucs, ffmpeg, astral ty, tts, ai personhood, self-driving cars, gemini]
---

This week, I learned:

- `uvx --python 3.10 --with torchcodec demucs --two-stems=vocals -n htdemucs "song.mp3"` separates vocals from music.
- iTunes offers a 30-second preview for almost any song. If you're looking for 30s song clips to analyze, this is a good bet. For example: `curl -s "https://itunes.apple.com/search?entity=song&limit=1&term=why+this+kolaveri" | jq -r '.results[0].previewUrl'`
- To generate a spectrogram video from an audio file, use `ffmpeg -i song.mp3 -lavfi showspectrum=color=magma:slide=1 spectrogram.mp4`. To generate a waveform video, use `ffmpeg -i song.mp3 -filter_complex "[0:a]showwaves=s=1280x240:mode=cline:colors=white[v]" -map "[v]" -map 0:a -c:v libx264 -crf 30 -pix_fmt yuv420p waveform.mp4`.
- I updated the TTS (text-to-speech) costs across Gemini and OpenAI at https://github.com/sanand0/openai-tts-cost. My current favorite (value for money) is Gemini 2.5 Flash Preview TTS. Good emotions, low price, and a single request can deliver a multi-voice podcast. Speed: ~25 seconds per minute of audio generated.
- Self-driving car mishaps. The exceptions that prove the rule (that autonomous vehicles are safer than human drivers). [#](https://gemini.google.com/u/2/app/195e3d3fa74368fd)
  - **Waymo & The Gun Shootout:** A driverless Waymo taxi in Los Angeles drove straight through an active police standoff, passing mere feet from a suspect being held at gunpoint while officers shouted at the car to stop.
    [Source](https://www.tmz.com/2025/12/01/waymo-police-standoff/)
  - **Tesla & The Horse Carriage:** A Tesla met a horse-drawn carriage in Switzerland. The Tesla’s computer became "bamboozled," rapidly misidentifying the cart as a truck, then a car, then a pedestrian, because it had likely never been trained on animal-drawn vehicles. [Source](https://autofile.co.nz/horse-drawn-carriage-confuses-tesla-)
  - **The "Wet Cement" Trap:** A Cruise robotaxi in San Francisco drove directly into a patch of freshly poured wet concrete at a construction site and got hopelessly stuck, requiring workers to pull it out. [Source](https://www.google.com/search?q=https://www.sfgate.com/tech/article/cruise-stuck-wet-concrete-san-francisco-18297946.php)
  - **The Moon is a Traffic Light:** A Tesla driver discovered that his car kept slamming on the brakes on the highway because the autopilot camera was confusing the bright yellow moon for a yellow traffic light. [Source](https://futurism.com/the-byte/tesla-autopilot-mistakes-moon-traffic-light)
  - **The 4 AM Honking Ritual:** Residents in a San Francisco neighborhood were kept awake for weeks because a fleet of Waymo taxis gathered in a parking lot every night and started honking at each other while trying to park. [Source](https://www.google.com/search?q=https://www.theverge.com/2024/8/12/24219080/waymo-san-francisco-parking-lot-honking)
  - **Stopping for Whoppers:** Tesla owners reported their cars were reading "Burger King" signs on the side of the road as "Stop" signs and abruptly braking, a glitch the fast-food chain quickly turned into a marketing campaign. [Source](https://www.google.com/search?q=https://www.foxnews.com/auto/burger-king-tesla-autopilot-stop-signs)
  - **The Robotaxi "Mating Ritual":** A group of about 20 Cruise robotaxis lost connection to their servers simultaneously and simply stopped in the middle of a busy San Francisco street, creating a massive traffic jam that humans had to manually clear.
    [Source](https://www.google.com/search?q=https://www.wired.com/story/cruise-robotaxi-self-driving-cars-san-francisco-traffic-stall/)
  - **Trapped by Cones:** A Waymo taxi in Arizona was defeated by a set of construction cones, fleeing from them into oncoming traffic lanes and eventually getting stuck, forcing the passenger to flee the "confused" vehicle. [Source](https://www.google.com/search?q=https://www.theverge.com/2021/5/14/22436272/waymo-driverless-van-stuck-blocked-road-arizona)
  - **Defeated by a T-Shirt:** A distinct vulnerability was found where self-driving cars could be tricked into slamming on the brakes simply by a pedestrian wearing a T-shirt with a "Stop" sign printed on it. [Source](https://www.google.com/search?q=https://arstechnica.com/cars/2023/02/t-shirt-with-stop-sign-print-fools-self-driving-cars/)
- Roblox is the #1 game. Sadly, there's no official Linux support. [Cloudflare 2025 Report](https://blog.cloudflare.com/radar-2025-year-in-review-internet-services/)
- ⭐ [Ty](https://astral.sh/blog/ty), Astral's type checker, is _fantastic_! It shows the type of every variable inline. A great incentive to explicitly type stuff in Python. Lots more to explore. I switched from Pylance to the [ty VS Code extension](https://marketplace.visualstudio.com/items?itemName=astral-sh.ty).
- [`npx -y npm-check-updates`](https://github.com/raineorshine/npm-check-updates) tells you the latest versions of your `package.json` dependencies, including major version updates.
- How to think differently. [#](https://gemini.google.com/u/2/app/b3f00b87146cecc3) [#](https://chatgpt.com/c/6940eb78-ae2c-8320-9e4e-ebab68d3b2e2)
  - **Introspect:** List assumptions & taboos. Write a falsifier. Beginner's mindset
  - **Mental models:** First principles, inversion, base rates, lateral thinking, multiple options, "what would have to be true", ...
  - **Empathy:** Debate FOR opposition. Swap roles (competitor, auditor, 12-year-old, future-you, ...)
  - **Environment:** Different context (place, media, people...). New constraints (time, budget, time horizon, ...)
- I'm surprised that Edge's [Read Aloud](https://www.microsoft.com/en-us/edge/features/read-aloud) sounds more natural than [ElevenReader](https://elevenreader.io/). Read Aloud is one of the main reasons I'm using Edge, but I hadn't realized it was that good.
- [Why We Think](https://lilianweng.github.io/posts/2025-05-01-thinking/) has interesting insights on scaling from feedback: [#](https://claude.ai/chat/5dbf8fe0-081a-4d4a-926d-b4d74846ec85)
  - Summary: **Give models a feedback environment unbiased by their reasoning.**
  - There are basically two approaches: parallel and sequential.
  - Parallel is simpler. Generate a bunch of different solutions and pick the best one. Like having multiple people solve the same problem independently, then going with whoever got the right answer.
  - Sequential is trickier. You generate a solution, then ask the model to critique it and try again. This sounds good in theory but is surprisingly hard to get right.
  - The problem is models aren't naturally good at self-correction. Left to their own devices, they'll often make things worse. They'll change correct answers to incorrect ones. Or they'll just superficially reword their first answer without fixing anything.
  - To make self-correction work, you need external feedback. A unit test that fails. A ground truth to compare against. Something outside the model's own judgment.
  - When you get it right, though, sequential revision can be powerful. You're not just sampling from the model's distribution anymore. You're searching through it, iterating toward better answers.
  - But there's a trap. If you start optimizing directly on the reasoning traces (rewarding "good reasoning" as a goal in itself), the model learns to game it. It'll hide its real thought process and show you what you want to see.
  - This is why the DeepSeek team gave up on process reward models.
    They tried rewarding intermediate reasoning steps, but it led to reward hacking. The model would generate reasoning that looked good to the reward model while doing something completely different.
- [A Pragmatic View of AI Personhood](https://arxiv.org/abs/2510.26396) was rewritten in Tim Urban's style, paragraph by paragraph, by [ChatGPT](https://chatgpt.com/share/693aa6ff-b2f4-800c-8f44-b061419cd6ff):
  - Whether AI has feelings is irrelevant. Does a design increase conflict, manipulation, or suffering among humans? If so, regulate that: limit certain kinds of anthropomorphic design, tie "rights" for AIs to strict anti-manipulation constraints, etc.
  - **AI can act after owners vanish**. Pragmatically, you sometimes need to bite the bullet and say: "Okay, this thing itself is going to be treated as a legal person in these specific ways, so we can actually regulate and sanction it."
  - Corporations are "slow AIs" already, optimizing for growth without ethics.
  - **Slaves had a fund**. If the slave caused harm, the owner's liability could be capped at that fund. Modern equivalent for AI: Agents must maintain locked capital or insurance. Victims are compensated from that pool. If the pool runs out, they lose their license to operate. This gives sanctions teeth: the AI (or its backers) actually have something to lose.
  - Require AIs to register before they can do economically important things. No title, no access to key platforms, payment rails, or official functions.
  - Expanding personhood to non-humans sounds nice: more compassion, more care, more inclusion. But authenticity becomes a new asset. Humans and AIs will both want authenticity tokens. The poor will sell biometric credentials to the rich, creating an authenticity social class. Your dignity as a person gets replaced by your usefulness as a key. Make it illegal and practically very hard to sell / rent out your humanity.
- "When people now talk about error, they tend to think of bias as an explanation.
  One of the major limitations on human performance is not bias, it is just noise. In fact, most of the errors that people make are better viewed as random noise, and there is an awful lot of it. Even when the algorithm does not do very well, humans do so poorly and are so noisy that, just by removing the noise, you can do better than people. We are narrow thinkers, we are noisy thinkers, and it is very easy to improve upon us. I do not think that there is very much that we can do that computers will not eventually be programmed to do." [Kahneman](https://www.nber.org/system/files/chapters/c14016/c14016.pdf)
- Notes from [One Year With ChatGPT Pro as a First Hire](https://www.soundformovement.com/chatgpt-pro-as-first-hire)
  - **Each day I start a new Pro chat that will run for that entire day**. I treat it as a colleague. I speak or type in whatever I am thinking about, including business problems, creative questions, experiments that worked or failed and feelings about particular decisions. I wear noise canceling earbuds and often run piano technique while the model is thinking. I listen to its response using the native “Read Aloud” feature, again while practicing, and stop to make notes in a physical notebook to collect inspiration. At the end of the day I ask that Pro model to summarize everything from that chat along with the notes I give it from my notebook, and **that summary becomes our first prompt of the next day**.
  - Standard Voice Mode (SVM) can do things that Advanced Voice Mode (AVM) cannot and vice versa. SVM feels like it wants to talk forever, while AVM feels like it wants to get off the phone.
  - Projects became the container for my daily Pro chats. I pull chats, notes and other files into project folders so I can reference them as static context.
  - My scheduled tasks collection today consists of weekly lessons in math, ML and DL, design, market analysis and regular assessments of the UI and UX and copy on my company’s website.
  - I let memory accumulate, then once a week I pruned it manually, removing entries that were no longer useful so that new memories could form.
  - Connecting the ChatGPT macOS app to my terminal, using the Working with Apps feature, lets the Pro models essentially collaborate with Codex. Practicing collaborative context between these high end models fractals outward into a myriad of productive paths. I highly recommend exploring with 5.1 Pro connected to 5.1-Codex-Max (Very High) in a terminal. Tell Codex-5.1 that you have a buddy working with you today that can offer suggestions and review the work it does as we go. Then tell 5.1 Pro that you have a buddy that is working with you today and can apply any of the code changes we decide on. This is another form of “context priming” where I “set the scene” before jumping in.
- Coding agents only need a bash tool. The rest is buildable. The only addition might be a fuzzy search / replace tool. [What I learned building an opinionated and minimal coding agent](https://mariozechner.at/posts/2025-11-30-pi-coding-agent/)
- Sources of model data: https://models.dev/, https://openrouter.ai/, llm-pricing
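
The iTunes preview lookup from the `curl` one-liner near the top can also be scripted. Here is a minimal Python sketch, assuming the search endpoint keeps returning JSON with a `results` list whose items carry a `previewUrl` (which is all the `jq` filter relies on). The helper names `search_url`, `first_preview_url` and `fetch_preview_url` are mine, not part of any API.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

ITUNES_SEARCH = "https://itunes.apple.com/search"


def search_url(term: str, limit: int = 1) -> str:
    """Build the same query as the curl command: entity=song, limit, term."""
    query = urlencode({"entity": "song", "limit": limit, "term": term})
    return f"{ITUNES_SEARCH}?{query}"


def first_preview_url(payload: dict) -> Optional[str]:
    """Equivalent of jq's `.results[0].previewUrl`; None when nothing matched."""
    results = payload.get("results") or []
    return results[0].get("previewUrl") if results else None


def fetch_preview_url(term: str) -> Optional[str]:
    """Network call: run the search and pull out the first preview URL."""
    with urlopen(search_url(term), timeout=10) as response:
        return first_preview_url(json.load(response))
```

Calling `fetch_preview_url("why this kolaveri")` should return the same URL the `curl | jq` pipeline prints, as long as the endpoint behaves as assumed.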
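
The parallel and sequential approaches from the "Why We Think" notes can be shown with a toy Python sketch. Everything here is illustrative: `candidates` stands in for independent model samples, `revise` for a model's attempt to improve its previous answer, and `verify` for an external check such as a unit test. None of these names come from the post. The only point is that the accept/reject signal comes from outside the model.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Iterable[T], verify: Callable[[T], bool]) -> Optional[T]:
    """Parallel: take independent candidates, keep the first that passes the check."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def revise_until_pass(
    first: T,
    revise: Callable[[T], T],
    verify: Callable[[T], bool],
    max_rounds: int = 3,
) -> Optional[T]:
    """Sequential: revise the previous attempt at most `max_rounds` times.

    The external verifier decides when to stop, so an answer that already
    passes is never "corrected" into a wrong one.
    """
    current = first
    for _ in range(max_rounds):
        if verify(current):
            return current
        current = revise(current)
    return current if verify(current) else None


# Toy task: find x with x * x == 144. The "model" revises by adding 1.
def is_root_of_144(x: int) -> bool:
    return x * x == 144


parallel_answer = best_of_n([10, 12, 13], is_root_of_144)
sequential_answer = revise_until_pass(10, lambda x: x + 1, is_root_of_144)
```

Replacing `verify` with the model's own opinion of its answer would bring back the failure mode the notes describe, since nothing would stop a correct answer from being revised away.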
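
To make "coding agents only need a bash tool" concrete, here is a minimal sketch of what such a tool could look like in Python. It is not taken from the linked post: the function name `run_bash`, the `BashResult` shape, the timeout and the output cap are all assumptions for illustration, and it assumes `bash` is on the PATH.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class BashResult:
    exit_code: int
    output: str  # stdout and stderr merged, possibly truncated


def run_bash(command: str, timeout: float = 30.0, max_chars: int = 10_000) -> BashResult:
    """Run one shell command and return what an agent would need to see."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Hypothetical convention: -1 marks a command that was killed on timeout.
        return BashResult(exit_code=-1, output=f"timed out after {timeout}s")
    output = proc.stdout
    if len(output) > max_chars:
        # Cap the output so one noisy command cannot flood the model's context.
        output = output[:max_chars] + "\n[output truncated]"
    return BashResult(exit_code=proc.returncode, output=output)
```

With this one function an agent can list files, read them, run tests and apply patches through ordinary shell commands. A fuzzy search / replace tool would be a separate, second function and is not sketched here.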