---
title: Things I Learned - 31 Mar 2024
date: 2024-03-31T00:00:00+00:00
categories:
  - til
description: I investigated binary embeddings, empathic AI with Hume.ai, and audio splitting with Spleeter. I failed at building a Rust-based Parquet server and noted distinct corporate attendance patterns while hosting workshops for Gramener and Straive.
keywords: [binary embeddings, hume.ai, spleeter, rust, representational engineering, typesense, speaker diarization]
---

This week, I learned:

- [sqlite-schema-diagram](https://gitlab.com/Screwtapello/sqlite-schema-diagram/) generates schema diagrams for SQLite databases using Graphviz.
- [TechEmpower web server benchmarks](https://www.techempower.com/benchmarks/) place Rust servers at the top.
- [browse.new](https://browse.new/) is a good example of a browser agent. It is slow, but it works independently and does a good job of achieving the result. Example: [What crew is common in Ingrid Bergman - Cary Grant films?](https://browse.new/run/browser_wDHy2vwxIzJFouL)
- [twinny](https://github.com/rjmacarthy/twinny) is an open source VS Code Copilot alternative.
- [Typesense supports embeddings natively](https://hn-comments-search.typesense.org/).
- [Binary embeddings are good enough](https://blog.pgvecto.rs/my-binary-vector-search-is-better-than-your-fp32-vectors). Cohere has released [binary embeddings](https://txt.cohere.com/int8-binary-embeddings/).
- [Extract.langchain.com](https://extract.langchain.com/) is a poor early interface to featurize [unstructured.io](https://unstructured.io/).
- [Hume.ai](https://www.hume.ai/) offers a voice emotion API and emotion-based conversational responses. An empathic AI.
- Rust is non-trivial. Inspired by [We are under DDoS attack and we do nothing](https://tableplus.com/blog/2024/03/how-we-deal-with-ddos.html), I ["wrote"](https://chat.openai.com/share/ec5f3d23-06b3-40a8-a965-ab466d214802) a small binary that serves a Parquet file as JSON. It failed and I couldn't fix it.
- [spleeter](https://github.com/deezer/spleeter) is a better alternative to demucs. Splits audio into
- [pyannote-audio](https://github.com/pyannote/pyannote-audio) does speaker diarization.
- [uvicorn](https://github.com/encode/uvicorn) is faster than [hypercorn](https://github.com/pgjones/hypercorn), but [hypercorn supports HTTP/2 and HTTP/3](https://pgjones.gitlab.io/quart/tutorials/deployment.html). FastAPI with uvicorn is reasonably fast.
- [Representational engineering](https://vgel.me/posts/representation-engineering/) lets you control LLM output based on preference on the fly.
- When I set up a training:
  - When I sent the invite for the DuckDB workshop on Sunday evening, Gramener started accepting immediately. Straive didn't.
  - Straive has a high spread of joining times. For the GitLab Pipelines Workshop, Straive attendees (e.g. Premlal) started the meeting many minutes early. Gramener floods in (due to the alert). Straive streams in slowly.
  - GitLab Pipelines Workshop acceptances: Gramener 47, Straive 100
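
To make the binary embeddings note above concrete, here is a minimal sketch of the general idea: keep only the sign of each dimension, pack the bits, and compare vectors by Hamming distance. This is not the pgvecto.rs or Cohere implementation. The function names (`binarize`, `hamming_distances`, `search`) are made up for illustration, and the vectors are random stand-ins for real embeddings.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack 8 dimensions into one byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_distances(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count the differing bits between one packed query and each packed corpus row."""
    xor = np.bitwise_xor(corpus, query)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def search(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k corpus rows closest to the query in Hamming distance."""
    distances = hamming_distances(binarize(query_vec), binarize(corpus_vecs))
    return np.argsort(distances, kind="stable")[:k]


# Synthetic data (an assumption): 1,000 random 256-dimensional "embeddings".
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 256)).astype(np.float32)

# A slightly noisy copy of row 42 should still retrieve row 42 first.
query = corpus[42] + 0.1 * rng.normal(size=256).astype(np.float32)
top = search(query, corpus)

packed = binarize(corpus)
print("nearest rows:", top)
print("bytes per vector:", corpus[0].nbytes, "->", packed[0].nbytes)
```

Each 256-dimensional float32 vector takes 1,024 bytes, while its packed binary form takes 32 bytes. A common follow-up, not shown here, is to re-rank the top binary hits with the full-precision vectors.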