---
title: Things I Learned - 06 Oct 2024
date: 2024-10-06T00:00:00+00:00
categories:
  - til
description: I explored ffmpeg on WASM, geocoding systems like Uber’s H3, and embeddable cloud databases like MotherDuck and Turso. I also examined Software 2.0 paradigms, OpenAI’s Realtime API, and the evolving dynamics of AI-assisted coding for different skill levels.
keywords: [ffmpeg, wasm, h3, duckdb, openai realtime api, software 2.0, graph rag, sqlite]
---

This week, I learned:

- [ffmpeg on WASM](https://github.com/ffmpegwasm/ffmpeg.wasm) works but is unstable and hard to use.
  - You can't use it in a CDN without CORS issues, since it loads ffmpeg-core via a worker.
  - It often runs into buffer allocation issues.
- [Exotel](https://exotel.com/) and [Plivo](https://www.plivo.com/) provide voice & SMS services in India (like Twilio). Plivo is more customer friendly.
- [Uber's H3](https://h3geo.org/), [Google's S2](https://github.com/google/s2geometry), and [GeoHash](https://en.wikipedia.org/wiki/Geohash) are geocoding systems.
  - H3 offers uniform cell sizes and better distance measurement
  - S2 offers higher precision (factoring in Earth's curvature) for exact location matches
  - GeoHash is the simplest
- There's a movement towards embeddable databases on the cloud.
  - [MotherDuck](https://motherduck.com/) is hosted DuckDB.
  - [Turso](https://turso.tech/) is hosted SQLite (with local sync, multi-tenant)
  - [StarBase DB](https://starbasedb.com/) is SQLite with an API on top of Cloudflare Durable Objects.
- [Software 2.0](https://karpathy.medium.com/software-2-0-a64152b37c35) by Andrej Karpathy.
  - This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two:
    - the 2.0 programmers (data labelers) edit and grow the datasets, while
    - a few 1.0 programmers maintain and iterate on the surrounding training code infrastructure, analytics, visualizations and labeling interfaces.
  - Adaptive UI ideas:
    - Adaptive Fields: Show only required fields based on what the user field so far.
    - Smart Inputs: Dropdowns and auto-complete based on user's context.
    - Smart Themes: Change font size, contrast, theme guessing the user's age and preferences.
    - Dynamic Menus: Show what they might need to do next. Like Nokia's right button, but using LLMs.
    - Smart Tooltips: Check what the user's doing (delays, confusions, previous clicks, current actions) and show relevant tips.
    - Personalized Layout: Show only the relevant sections of the app. E.g. based on what they're doing.
    - Smart Charts: Create the right chart that solve the user's question.
  - Adaptive Back-end
    - Dynamic APIs: Create endpoints on the fly based on user needs
    - Dynamic Indexing: Create & update indices on the fly based on user needs
    - Dynamic Schema: Create & update schema on the fly based on user needs
    - Dynamic Migration: Migrate to a new database or OS or language as required
    - Dynamic Queries: Create SQL/NoSQL queries to solve the user problem
    - Dynamic RBAC: Figure out who needs permissions and why. Add OR REMOVE access as required
    - Dynamic Logging. Log what's required. Explain why it's logged and what's happening. Fix code that raised the error
    - Dynamic Caching. Cache what's likely to be required. Evict what may not be required. Figure out cache keys.
- [Aider LLM Leaderboards](https://aider.chat/docs/leaderboards/) show which LLMs code better. As of now,
  - o1-preview > claude-3.5 sonnet on code editing
  - claude-3-opus > claude-3.5-sonnet on code refactoring
  - deepseek-coder-v
  - gpt-4o-mini sucks.
- [Jaro-Winkler Distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) is a string matching algorithm that weights the start of a string higher.
- Passing the feed of the following to NotebookLLM is a good way to get caught up with news and summaries.
  - A blog / WhatsApp group (e.g. The Generative AI Group, Sithamalli, etc.)
  - A Google Group / mailing list (e.g. genainews, datameet)
  - YouTube channels (e.g. Vertiasium, GitHub)
  - Hacker News top stories
  - Research papers
  - Emails (skipping marketing emails)
- OpenAI Evals and Distillation has a clever design. They just convert filtered history to .JSONL files that can be an input to either.
- [Speak](https://www.speak.com/) is a language learning app based on OpenAI's Realtime API.
- OpenAI's Realtime API can be used in a text-to-text chat mode without needing to send the entire context. If the pricing works out right, this can be far cheaper than sending the entire conversation context. [Ref](https://news.ycombinator.com/item?id=41715725)
- Matching addresses with just embeddings works well. Combine it with simple hard rules. [Ref](https://www.dbreunig.com/2024/09/27/conflating-overture-points-of-interests-with-duckdb-ollama-and-more.html)
- [OpenAI's prompt caching works for images too -- both linked and embedded](https://cookbook.openai.com/examples/prompt_caching101)
- Quotes on Graph RAG from a Generative AI WhatsApp Group.
  - "Damn so literally nobody uses Graph RAG yet. Good to know." ~Sumba
  - "A big four consulting firm uses GraphRAG to retrieve related documents and excerpts from governance and compliance docs." ~Vinayak Hegde (Microsoft)
  - "Graph RAG is expensive and unnecessary in most of the cases." ~Utkarsh Saxena
- ChatGPT's advanced mode includes: "...you can use various regional accents and dialects." [Ref](https://www.reddit.com/r/OpenAI/comments/1fp1fes/the_system_prompt_of_advanced_voice_mode_it_can/) [Source](https://x.com/deedydas/status/1839860410914353225)
  - But the API can "laugh, whisper, and adhere to tone direction." [Ref](https://platform.openai.com/docs/guides/realtime)
- Hume API (INR 6/min) is far cheaper than OpenAI's real-time chat (6c/min input + 24c/min output)
- Devika is an open-source clone of Devin.
- [DuckDB runs inside Pyodide](https://duckdb.org/2024/10/02/pyodide.html)
- [Hungarian Jews have genetic diseases that increase their IQ](https://slatestarcodex.com/2017/05/26/the-atomic-bomb-considered-as-hungarian-high-school-science-fair-project/). Gaucher’s disease, Torsion dystonia.
- [People don't like hard stuff like maths or science, so richer societies have fewer scientists](https://www.thepsmiths.com/p/review-math-from-three-to-seven-by)
- Ethan Mollick feels Claude 3.5 Sonnet is better at style and critiquing blog posts than OpenAI's o1 (which is better at reasoning.)
- News is going to be crazily disrupted again with voice mode. I can just listen to the topic I want
- In Singapore Airlines,
  - You can't wear your seatbelt loose
  - You have to keep the laptop in the pocket in front, not on your lap, during takeoff
  - You can't charge during takeoff
  - They verify if you ask for a veg meal and place a sticker on your seat
- Coders are more likely to edit LLM code. Non-coders don't have that bad habit.
  - [Vaishnavi](https://youtu.be/uuf3-_xYp7k) and [Ranjeet](https://youtu.be/5FZadpAGXb0) edited code
  - [Indal](https://youtu.be/EGbeA-x79tY) and [Koustav](https://youtu.be/2Je37vJhcD4) didn't
- Coders are likely to get more out of an LLM because they know what it can do. But some non-coders will get more out of an LLM because they don't know what it can't do.
  - E.g. [Indal](https://youtu.be/EGbeA-x79tY) trying for a confetti animation, which is hard but do-able
- "You have to put in a lot of work to become productive at AI coding." Simon Willison