---
title: Things I Learned - 31 May 2026
date: 2026-05-31T00:00:00+00:00
categories:
  - til
description: I signed files with OIDC using cosign, queried Wikipedia as Parquet with DuckDB, and explored the PRAGMA model's tabular data tokenization. I also set up local MCP in ChatGPT Developer Mode for unmetered AI coding.
keywords: [cosign, duckdb, parquet, pragma model, mcp, tokenization, cal.com api, arxiv2md]
---

This week, I learned:

- [D-ID](https://www.d-id.com/) is an avatar generator platform like [HeyGen](https://heygen.com/). [Creatify](https://creatify.ai/) and [Synthesia](https://www.synthesia.io/) are a couple of others I heard of. This space seems to be growing.
- [cosign](https://github.com/sigstore/cosign) is a CLI that lets you sign and verify any file with a Google, GitHub or Microsoft account. `cosign sign-blob FILE --bundle sign.json` opens a login window and creates a `sign.json` signature. Anyone who has `FILE`, `sign.json`, and the signer's email ID can verify a signature made with a Google account via `cosign verify-blob FILE --bundle sign.json --certificate-identity $EMAIL --certificate-oidc-issuer https://accounts.google.com`.
- [arxiv2md.org](https://arxiv2md.org/) converts arXiv papers to Markdown. [Source](https://github.com/timf34/arxiv2md). [markxiv.org](https://markxiv.org/) claims the same - by just changing the URL - but it reported an error when I tried this link: .
- From Akhilesh Tilotia: So we have someone in our team with initials AS. She made a document which was named vAS. Then I made edits and named it vAT. These docs were in a CoWork folder. I asked Claude to clean up my doc. It created another version for me to review. In its wisdom, it named the file vAU 🙂
- Maybe what a forward-deployed engineer does is engineer AI-native workflows. (This sounded profound when I wrote it down. Not sure if it'll sound as profound tomorrow.)
The idea is that the FDE will say, screw existing processes; let me fire up my AI agent and get stuff done; THEN we'll figure out what works, how to optimize it, etc.
- The [PRAGMA: Revolut Foundation Model](https://arxiv.org/abs/2604.08649) has some good tokenization ideas for tabular data. Create your own token space with `key–value–time` tokenization - to retain field information. Bucketize numbers by percentile, preserving the magnitude/ordering that subword tokenization destroys. Encode time both as log-seconds _and_ as cyclical calendar features.
- Codex uses the Alt + Up Arrow key to edit queued commands, but in the VS Code terminal, this key binding is not sent to the terminal. Enable the `terminal.integrated.sendKeybindingsToShell` setting to send it to the terminal, and hence to Codex.
- Based on this [catalog](https://chatgpt.com/share/6a16dfd6-bd70-83ec-807a-646366ba9a99) of "universal foods", here's what I 🟢 like, am 🟡 neutral about, 🔴 dislike, 🟣 must try, and will ⚫ skip.
  - Universal favorites: 🟢 pizza, 🟢 fried potatoes/chicken, 🟡 dumplings, 🟢 ice cream.
  - Universal comfort foods: 🟢 khichdi, 🟡 congee, 🟡 dal-rice, 🟡 risotto, 🟡 ramen, 🟢 pho, ⚫ chicken noodle soup, 🔴 rice porridge, 🟡 mac-and-cheese, 🔴 mashed potato, 🟣 polenta, 🟢 oatmeal, 🟣 Japanese curry rice.
  - Acquired tastes that convert most: 🟡 coffee, 🟢 tea, 🟡 dark chocolate, 🟢 mild fermented dairy, 🟢 pickles, 🟢 olives, 🟣 kimchi, 🟣 miso, 🟢 mild chili dishes.
  - Acquired tastes that have cult devotion: 🟣 durian, 🟣 natto, 🟣 stinky tofu, ⚫ fermented fish, ⚫ hákarl, 🟢 very funky blue cheese, ⚫ offal.
- [OceanofPDF](https://oceanofpdf.com/) seems like a good place to download ePubs of books.
- The entire Wikipedia is available as a [Parquet file](https://huggingface.co/datasets/wikimedia/structured-wikipedia). You can query it like `duckdb -c "FROM 'hf://datasets/wikimedia/structured-wikipedia/enwiki/data/*.parquet' LIMIT 5"`.
The English version is 35 GB with 7.6 million articles, so you're better off downloading it rather than running analyses remotely.
- When you receive a Cal.com link of the form `https://cal.com/USER/EVENT` you can fetch the available slots via `curl -H 'cal-api-version: 2024-09-04' 'https://api.cal.com/v2/slots?eventTypeSlug=EVENT&username=USER&start=2026-05-25&end=2026-06-01&timeZone=Asia/Singapore&format=range'`. Useful to automate good meeting-slot selection.
- "Reference saved memories" in ChatGPT is different from "Reference chat history" as per [OpenAI](https://help.openai.com/en/articles/8590148-memory-faq). In [Developer Mode](https://help.openai.com/en/articles/12584461-developer-mode-and-mcp-apps-in-chatgpt), memory is turned off, but not chat history. I confirmed that I can access past conversations in Developer Mode. It might be a privacy concern for others, but for me, this is singularly useful, because I can use ChatGPT with [Local MCP](https://www.s-anand.net/blog/how-i-use-local-mcp/), effectively getting a non-metered AI coding agent.
- GPT-5.2 seems to reach expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers. "Surprisingly, current AI reviewers are competitive even with the top-rated reviewers in Nature’s official peer review..." though not without weaknesses, so use AI + humans. [On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists](https://arxiv.org/abs/2605.20668) via [Ethan Mollick](https://bsky.app/profile/emollick.bsky.social/post/3mmf2ano3ik27)
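
To make the PRAGMA tokenization note concrete, here is a minimal Python sketch of the three ideas as I understood them: key-value-time tokens, percentile bucketing, and time encoded as log-seconds plus cyclical features. It is not the paper's implementation. The function names, the bucket count, the token string format, and the choice of hour-of-day and day-of-week as the calendar features are all my own assumptions for illustration.

```python
import bisect
import math
from datetime import datetime, timezone


def percentile_edges(values, n_buckets):
    """Return n_buckets - 1 cut points at evenly spaced percentiles of values.

    Hypothetical helper: the bucket count is an assumption, not from the paper.
    """
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_buckets] for i in range(1, n_buckets)]


def bucketize(value, edges):
    """Map a number to a bucket index in 0..len(edges); larger values never get a smaller index."""
    return bisect.bisect_right(edges, value)


def time_features(ts, reference):
    """Encode a timestamp as log-seconds since a reference plus cyclical calendar features."""
    elapsed = max((ts - reference).total_seconds(), 0.0)
    hour_angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return {
        "log_seconds": math.log1p(elapsed),  # compresses long gaps
        "hour_sin": math.sin(hour_angle),    # 23:00 and 01:00 end up close together
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
    }


def tokenize_event(key, value, ts, edges, reference):
    """Build one key-value-time token; the value token keeps the field name (assumed format)."""
    return {
        "key": key,
        "value": f"{key}:bucket_{bucketize(value, edges)}",
        "time": time_features(ts, reference),
    }


if __name__ == "__main__":
    amounts = list(range(1, 101))           # stand-in for training values of one field
    edges = percentile_edges(amounts, 4)    # quartile cut points
    ref = datetime(2026, 5, 30, 6, 0, tzinfo=timezone.utc)
    ts = datetime(2026, 5, 31, 6, 0, tzinfo=timezone.utc)
    print(tokenize_event("amount", 60, ts, edges, ref))
```

The cut points are computed once from training values and reused at inference time, so the same amount always maps to the same bucket token.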
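
The slots lookup from the `cal.com` note can also be scripted. This is a rough Python sketch that turns a `https://cal.com/USER/EVENT` link into the same request as the `curl` command above, using only the standard library. The helper names `slots_request` and `fetch_slots` are hypothetical, and I have not pinned down the response shape, so the second function just returns whatever JSON comes back.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import Request, urlopen

API_VERSION = "2024-09-04"  # copied from the curl command in the note above


def slots_request(link, start, end, time_zone):
    """Build (but do not send) the slots request for a https://cal.com/USER/EVENT link."""
    parts = urlparse(link).path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"expected https://cal.com/USER/EVENT, got {link!r}")
    username, event_slug = parts
    query = urlencode(
        {
            "eventTypeSlug": event_slug,
            "username": username,
            "start": start,
            "end": end,
            "timeZone": time_zone,
            "format": "range",
        },
        safe="/",  # keep Asia/Singapore readable, as in the curl example
    )
    return Request(
        f"https://api.cal.com/v2/slots?{query}",
        headers={"cal-api-version": API_VERSION},
    )


def fetch_slots(link, start, end, time_zone):
    """Send the request and return the parsed JSON body as-is (response shape not assumed)."""
    with urlopen(slots_request(link, start, end, time_zone), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    req = slots_request(
        "https://cal.com/USER/EVENT", "2026-05-25", "2026-06-01", "Asia/Singapore"
    )
    print(req.full_url)
```

Splitting the URL construction from the network call makes it easy to check the request before sending it.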