---
title: Things I Learned - 14 Dec 2025
date: 2025-12-14T00:00:00+00:00
categories:
  - til
description: I learned why expert personas don't improve LLM accuracy, explored new AI insurance products, and developed a workflow to turn constraints into opportunities. I also looked into architecture advice processes and Zillow's algorithmic real estate failure.
keywords: [llm prompting, ai insurance, zillow offers, constraints, pglite, software architecture, linguistics, machine learning]
---

This week, I learned:

- **Zillow Offers**, Zillow’s "iBuying" arm, was shut down in November 2021 after losing hundreds of millions of dollars. The core failure was not just an algorithmic error, but a fundamental misunderstanding of the limits of machine learning in high-stakes, low-frequency trading environments like real estate. Zillow relied on its "Zestimate" algorithm to predict future home prices and make instant cash offers, but the model failed to account for real-time market volatility and "adverse selection": savvy homeowners sold their properties to Zillow when the algorithm overvalued them, but kept them when the algorithm undervalued them. This left Zillow holding thousands of homes it had overpaid for and could not profitably resell, forcing a $304 million write-down and the layoff of 25% of its workforce. [Zillow Q3 2021 Shareholder Letter (PDF)](https://s24.q4cdn.com/723050407/files/doc%5Ffinancials/2021/q3/Zillow-Group-Q3'21-Shareholder-Letter.pdf) [#](https://gemini.google.com/u/2/app/99eac64fd1a42065)
- There are a good number of AI insurance products on the market. [#](https://chatgpt.com/c/693d1d22-8260-8322-9994-58e7bcfeebe7)
  - [Munich Re aiSure](https://www.munichre.com/en/solutions/for-industry-clients/insure-ai.html) - for AI vendors _and_ companies deploying AI; can cover business losses (like lost revenue / business interruption) and legal damages when AI performance errors (incl. hallucinations) cause harm.
  - [Munich Re aiSelf](https://www.munichre.com/en/solutions/for-industry-clients/insure-ai/ai-self.html) - for teams using self-built or bought ML models; helps cover the financial downside when models underperform or drift over time.
  - [Munich Re aiSure - General Liability](https://www.munichre.com/en/solutions/for-industry-clients/insure-ai/faq.html) - covers damages and financial losses from lawsuits (e.g., claims that AI decisions were biased/discriminatory).
  - [Armilla Insured (AI Liability Insurance)](https://www.armilla.ai/ai-insurance) - affirmative AI liability cover (Lloyd's coverholder; partners include Chaucer) that can cover legal defense costs, settlements, and third-party claims when an AI model underperforms.
  - [Armilla + Chaucer standalone AI liability (announcement)](https://www.chaucergroup.com/news/press-release-chaucer-co-develops-new-ai-insurance-product-with-armilla-ai) - focused on "mechanical underperformance" (incl. hallucinations and model drift) and the liability that follows.
  - [AXA XL GenAI Endorsement for CyberRiskConnect](https://axaxl.com/press-releases/axa-xl-unveils-new-cyber-insurance-extending-coverage-to-help-businesses-manage-emerging-gen-ai-risks) - add-on to cyber insurance for companies building their own GenAI; covers things like data poisoning, copyright/usage-rights mistakes, and AI-regulatory violations.
  - [Coalition Affirmative AI Endorsement](https://www.coalitioninc.com/announcements/coalition-adds-new-affirmative-ai-endorsement-to-cyber-policies) - clarifies cyber coverage applies when AI causes a security failure, and extends funds-transfer-fraud triggers to deepfake-based instructions.
  - [Coalition Deepfake Response Endorsement](https://www.coalitioninc.com/au/announcements/au-coalition-adds-deepfake-response-endorsement) - adds response support for deepfake incidents (technical analysis + legal + reputational help), not just "classic hacking."
  - [Tokio Marine Kiln Technology Errors & Omissions](https://www.tmkiln.com/our-products/technology-errors-omissions/) - tech E&O with _generative AI coverage available by endorsement_ (aimed at software/SaaS/tech services).
  - [Tokio Marine Kiln Cyber Ctrl suite](https://www.tmkiln.com/news-insights/news/tmk-launches-next-generation-cyber-ctrl-suite/) - cyber/tech cover where AI-related add-ons can include AI regulatory proceedings, data contamination, and "LLM hijacking."
  - [Hiscox Technology PI (UK) - AI clause](https://www.hiscox.co.uk/sites/default/files/documents/2025-06/PIC%20Combined%20summary%20of%20change%20-%20technology%20-%200525.pdf) - explicitly covers client claims arising from your use of AI (incl. genAI) as part of the services you deliver.
- A key lesson from [Who Validates the Validators](https://arxiv.org/abs/2404.12272) is that we learn our preferences as we evaluate. So make it cheap to evaluate (create outputs) AND **cheap to revise criteria**.
- Cookies taste _wonderful_ when eaten hot.
- ⭐ **Constraints as opportunities**. On long flights, I read more since I'm less distracted by guilt ("Should I answer email or code instead of wasting time?") or FOMO ("Let's click that link") since I have no choice. Setting aside "quiet time" doesn't work as well, since I have more choice. This constraint (no Internet) became an opportunity (reading time). I knew this beforehand, but had to _experience_ it to appreciate it, and _acknowledge_ it consciously to realize it. That takes repeated (2+) trials and reflection. A workflow to convert constraints to opportunities could be:
  - List constraints. (Like fish in water, we aren't used to thinking of constraints as constraints. Also, this means more constraints => more latent opportunities!)
  - List opportunities they offer. (Creative prompting helps; reflecting on the answers helps more.)
  - Try any 2+ times. (Gives room to settle in.)
  - Document learnings.
    (Explicit reflection is better than implicit awareness.)
- Notes from [Thoughtworks Radar, Apr 2025](https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2025/04/tr_technology_radar_vol_32_en.pdf)
  - **Architecture advice beats architecture review**. Architecture Review Boards hinder workflow. An architectural advice process (anyone makes architectural decisions, taking advice from experts and logging them in Architecture Decision Records) works better.
  - [VectorChord](https://github.com/tensorchord/pgvecto.rs) is a faster pgvector alternative.
- "Learning is not the product of teaching. Learning is the product of the activity of learners." -- John Holt
- Music labels never became streaming platforms themselves. The real money is in concerts. Streaming just makes you famous enough to book gigs. But movies/TV shows are _far_ more expensive to produce than music. So streaming platforms invest in content (Netflix, Apple) and studios stream (HBO, Disney). [Claude](https://claude.ai/share/aca9320c-799f-47e6-ab72-65cd9d10da17)
- Notes from [Better Ways to Build Self-Improving AI Agents](https://yoheinakajima.com/better-ways-to-build-self-improving-ai-agents/)
- Quotes from [Life is more than an engineering problem](https://lareviewofbooks.org/article/life-is-more-than-an-engineering-problem/), interview with author Ted Chiang.
  - **Magic is intent-centric**. "Magic means that ... the universe responds to your intentions in a way that the laws of physics as we understand them don’t."
  - **LLM reasoning is a weak analogy**. “My liver was running this old program, but all I needed to do was update the software and now my liver is functioning much better, even though the hardware is the same.” No one says that. It’s not a useful way of thinking about the liver, and it is not a useful way of thinking about the brain either.
  - **Art won't die**. Art is all about context.
    It’s not an activity like tightening bolts, where I don’t really care whether someone used a conventional wrench or a pneumatic wrench, as long as the bolts are tight.
  - **Alignment may not happen**. When corporations behave badly, should we consider that an alignment problem? But why do large corporations behave so much worse than most of the people who work for them? And could that be fixed by solving a math problem? I don’t think so.
  - **LLM relationships are different from human**. ... people have their own preferences, while things do not; you do them harm because you are ignoring their preferences. (Companies) might create the illusion that AI systems have preferences.
  - ... it’s theoretically possible for us to build digital entities that have subjective experience.
- Notes from [Developing our position on AI](https://www.recurse.com/blog/191-developing-our-position-on-ai) by Recurse Center:
  - Learning happens at the edge of competence. AI has a moving jagged edge, so constantly **re-try your impossibility list**.
  - Learning happens on what you care about. Use AI to **expand your agency** (by complementing or deepening), not replace it.
  - Learning generously means being open to different perspectives, without judgement or dogma. Try **new perspectives**.
- ⭐ 'We tested one of the most common prompting techniques: giving the AI a persona to make it more accurate. We found that telling the AI "you are a great physicist" doesn't make it significantly more accurate at answering physics questions, nor does "you are a lawyer" make it worse. This doesn't mean that personas can't be useful - for example, they change how the AI answers questions, the format of output, and maybe other factors as well.'
  [Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5879722)
- If YouTube embeds fail with an "Error 153 View player configuration error", it's probably because the server sends `Referrer-Policy: same-origin` and needs to switch to `Referrer-Policy: strict-origin-when-cross-origin`. [Simon Willison](https://simonwillison.net/2025/Dec/1/youtube-embed-153-error/)
- Adding a `[dependency-groups]` section to `pyproject.toml` with `dev = ["pytest"]` ensures that pytest is automatically installed by `uv` because [`dev` is a default group](https://docs.astral.sh/uv/concepts/projects/dependencies/#default-groups). [Simon Willison](https://til.simonwillison.net/uv/dependency-groups)
- [Cloudflare Python Workers has full Pyodide support](https://blog.cloudflare.com/python-workers-advancements/). That means most Python apps will now run on Cloudflare Workers, with low latency worldwide. This is a big deal.
- Smart contracts are programs that run on blockchains like Ethereum, e.g. to convert currencies, lend/borrow, buy NFTs, etc. These may contain bugs. Anthropic built a benchmark of real smart contracts with known bugs, had agents exploit them, and simulated $550 million in theft. They also made $3.5K exploiting real bugs - at a cost of $3.5K. So AI agents are currently at break-even for crypto-theft. [Anthropic](https://red.anthropic.com/2025/smart-contracts/) [#](https://claude.ai/chat/ec496a24-6827-4ecd-a534-2e4335e5e453)
- Notes from Cory Doctorow's summary of [The Reverse Centaur's Guide to Criticizing AI](https://pluralistic.net/2025/12/05/pop-that-bubble/): [#](https://claude.ai/chat/4d9c0f62-5339-4c33-82b3-89f8711557be)
  - When tech monopolies saturate their markets, their P/E collapses, reducing share value. This incentivizes bubbles.
  - Automation blindness negates human-in-the-loop. When AI makes rare mistakes, humans don't catch them. TSA misses guns, not water bottles.
  - AI doesn't need to do your job. The AI salesman just needs to convince your boss it can, especially for senior jobs.
  - Reference letters from professors used to signal value since they were hard to write, so professors would do it only for good students.
  - Copyright expansion and regulation will likely benefit corporates, not labor.
  - US Copyright Office making AI content non-copyrightable means corporates NEED labor. Else every AI work goes into the public domain.
- There is no strong evidence yet that Neuro-Linguistic Programming (NLP) works broadly ([ChatGPT](https://chatgpt.com/s/t_6936c3d669ac8191a4cb25d2bf7643ba)). Some NLP techniques help sometimes, but no more than other established techniques (goal-setting, visualization, etc.) [ChatGPT](https://chatgpt.com/s/t_6936c7a58224819192a0c03475b2d19b)
- ⭐ Just repeating a statement makes it seem truer because the brain finds it familiar, hence easier to process. This seems well-established research. [The Truth about Truth](https://claude.ai/share/d2278709-62f9-4d8e-97a4-0ce0207d2be5)
- [PGlite](https://pglite.dev/) is a WASM-based Postgres implementation. It's ~3MB. You can embed it in the browser, Node.js, Deno, etc. It has plugin support, including pgvector.
- Pejoration is when words acquire negative connotations. Euphemism escalation, where each replacement term sours in turn, is closely related. Third World → developing countries → emerging markets → Global South. Old → elderly → senior citizen → older adult. Lunatic → insane → mentally ill → mentally challenged. Janitor → custodian → sanitation engineer → facilities maintenance specialist. The opposite is amelioration. Minister: servant → servant of church → government official. Marshal: horse-servant → horse-officer → senior military officer. Knight: servant → armed retainer → mounted warrior → knighthood honor.
  [#](https://claude.ai/chat/0355927d-c658-41a8-b0ff-0f8e2f345626) [#](https://chatgpt.com/c/69356e6a-1ee8-8321-b963-a9474113c715)
- [OmniDocBench 1.5](https://github.com/opendatalab/OmniDocBench) is a benchmark for parsing realistic PDFs. [Gemini 3 Pro](https://blog.google/technology/developers/gemini-3-pro-vision/) does well among the commercial LLMs. PaddleOCR-VL (0.9B) tops the benchmark overall.
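
To make the adverse selection point in the Zillow note concrete, here is a toy simulation. It is a sketch under loud assumptions, not a model of Zillow's actual pricing: the function name, the price range, the 5% noise level, and the rule that owners know the true value exactly are all made up for illustration. The pricing model is unbiased on average, yet the homes it ends up buying are the ones it overvalued.

```python
import random
import statistics


def simulate_adverse_selection(n_homes=10_000, noise_sd=0.05, seed=0):
    """Toy model: an unbiased but noisy pricing model makes offers,
    and owners accept only when the offer is at least the true value."""
    rng = random.Random(seed)
    all_errors = []       # relative pricing error on every home
    accepted_errors = []  # relative pricing error on homes actually bought

    for _ in range(n_homes):
        true_value = rng.uniform(200_000, 800_000)  # hypothetical price range
        noise = rng.gauss(0, noise_sd)              # zero-mean model error
        offer = true_value * (1 + noise)
        all_errors.append(noise)
        if offer >= true_value:                     # owner sells only if overvalued
            accepted_errors.append(noise)

    return {
        "mean_error_all_homes": statistics.mean(all_errors),
        "mean_error_bought_homes": statistics.mean(accepted_errors),
        "share_bought": len(accepted_errors) / n_homes,
    }


if __name__ == "__main__":
    for name, value in simulate_adverse_selection().items():
        print(f"{name}: {value:+.3f}")
```

Across all homes the average error stays near zero, but on the homes actually bought it is positive (roughly 0.8 × `noise_sd` in this setup, so around 4% with the default noise). Real sellers are less informed than this, so the effect would be weaker but in the same direction.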
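
For the YouTube Error 153 note, one way to apply the fix on a Python server is a small WSGI wrapper that overrides the `Referrer-Policy` response header. This is a minimal sketch: `with_referrer_policy` and `demo_app` are hypothetical names, and most frameworks have their own setting for this (Django, for example, has `SECURE_REFERRER_POLICY`), which would be the better choice where available.

```python
def with_referrer_policy(app, policy="strict-origin-when-cross-origin"):
    """Wrap a WSGI app so every response carries the given Referrer-Policy."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any Referrer-Policy the app already set, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "referrer-policy"]
            headers.append(("Referrer-Policy", policy))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical app that sends the restrictive policy."""
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Referrer-Policy", "same-origin")])
    return [b"<p>page with a YouTube embed</p>"]


# Serve `application` instead of `demo_app` to send the relaxed policy.
application = with_referrer_policy(demo_app)
```

The wrapper replaces rather than appends, so the browser never sees two conflicting `Referrer-Policy` values.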