---
title: Things I Learned - 10 Nov 2024
date: 2024-11-10T00:00:00+00:00
categories:
  - til
description: I explored OpenFreeMap and Zapier Actions for GPTs, then researched LLM vision use cases in energy and manufacturing. I also looked into IBM's Docling for PDF conversion and Hamel Husain's framework for building effective LLM-as-a-judge systems.
keywords: [openfreemap, llm vision, icd-10, docling, llm-as-a-judge, recraft.ai, omniparser, zapier]
---

This week, I learned:

- [OpenFreeMap](https://openfreemap.org/) is a free embeddable OpenStreetMap tile server. You can use [MapLibre GL](https://maplibre.org/) (more features) or Leaflet (simpler) to render it. It offers styling and self-hosting.
- [Zapier Actions](https://actions.zapier.com/) are an easy way to set up custom actions like GMail / Google Calendar APIs for GPTs, since [GPTs' callback URLs keep changing](https://community.openai.com/t/gpt-oauth-callback-url-keeps-changing/493236). But they fail often, and don't work on mobile. At least for me.
- LLM Vision Use Cases in manufacturing and earth sciences (via Shivku)
  - Automated geoscience image descriptions [Ref](https://www.linkedin.com/posts/paulhcleverley_geosciences-earthscience-geology-activity-7254037937674240000-pQab/)
  - Interpret Wind Turbine photos and charts, construction monitoring, equipment maintenance & charts [Ref](https://www.linkedin.com/pulse/vision-ai-energy-use-cases-copilot-wind-siting-impact-kalyanaraman-wqe7c/)
  - Forecast weather based on cloud photos! [Ref](https://www.linkedin.com/pulse/cloud-typing-local-weather-forecasting-using-chatgpt-cam-shivkumar-1hhkc/)
  - Analyze thermal image of solar panels, electroluminescence images for warranty claims, ROI estimates from Google Sunroof rooftop images [Ref](https://www.linkedin.com/pulse/vision-ai-energy-use-cases-part-1-copilot-solar-pv-kalyanaraman-ccszc/)
  - Corrosion detection in electricity towers, turbines, storage tanks, penstock.
    Interpret non-destructive test images [Ref](https://www.linkedin.com/pulse/vision-ai-energy-use-cases-copilot-corrosion-shivkumar-kalyanaraman-onuic/)
- Google counts auto-completion when saying "25% of all the code is written by AI at Google". "It's a helpful productivity tool but it's not doing any engineering at all. It's probably about as good, maybe slightly worse, than Copilot." [YCombinator](https://news.ycombinator.com/item?id=42002212)
- Workflow for AI video creation: Use Meshcapade (meshcapade.com) to generate body movement of a 3D-rendered character. Pass that video to Runway's video-to-video model to generate any visual. Add music from Suno [Ref](https://www.linkedin.com/posts/peter-gostev_i-discovered-a-really-cool-new-workflow-for-activity-7260003053771141120-DJpS)
- Someone sorted the X and Y columns independently for regression. [Ref](https://stats.stackexchange.com/q/185507)
- Android keyboard learning only sends model changes back to server and not local keywords. Model changes are aggregated! [Ref](https://chatgpt.com/share/672d6d6d-46a0-800c-a130-c689f5ebc0b7)
- Here is a prompt for audio transcription using Gemini. [Ref](https://gist.github.com/rajivsinclair/8fb0371f6eda25f9e5cc515cd77abd62)
  - Transcription: Accurately transcribe the audio clip in the original language. Include all spoken words, fillers, slang, colloquialisms, and any code-switching instances. Pay attention to dialects and regional variations common among immigrant communities. Do your best to capture the speech accurately, and flag any unintelligible portions with `[inaudible]`.
  - Translation: Translate the transcription into English. Preserve the original meaning, context, idiomatic expressions, and cultural references. Ensure that nuances and subtleties are accurately conveyed.
  - Capture Vocal Nuances: Note vocal cues such as tone, pitch, pacing, emphasis, and emotional expressions that may influence the message.
    These cues are critical for understanding intent and potential impact.
- Here are some approaches to large-scale classification of medical codes. [ChatGPT](https://chatgpt.com/share/672dd476-7694-800c-a150-f3de912788ef)
  - Fine-Tuning LLMs on Medical Data: Enhance LLMs by training them on medical datasets, such as clinical notes and discharge summaries, to improve their understanding of medical terminology and context.
  - Multi-Agent Frameworks: Implement a multi-agent system that simulates real-world coding processes with distinct roles (e.g., patient, physician, coder, reviewer, adjuster). Each agent utilizes an LLM to perform specific functions, enhancing interpretability and reliability. [ArXiv](https://arxiv.org/abs/2406.15363)
  - Retrieve-Rank Systems: Develop a two-stage system where the LLM first retrieves potential ICD-10 codes and then ranks them based on relevance, improving precision in code assignment. [ArXiv](https://arxiv.org/abs/2407.12849)
  - Embedding-Based Approaches: Use LLMs to generate embeddings for ICD-10 codes and medical texts, facilitating the matching of texts to appropriate codes through similarity measures. [GitHub](https://github.com/kaneplusplus/icd-10-cm-embedding)
  - Hierarchical Classification: Leverage the hierarchical structure of ICD-10 codes by first classifying texts into broader categories before assigning specific codes, reducing complexity and improving accuracy. [ArXiv](https://arxiv.org/abs/2310.06552)
  - Two-Stage Verification Models: Combine LLMs with verification models, such as Long Short-Term Memory (LSTM) networks, to validate and refine the codes suggested by the LLM, balancing recall and precision. [ArXiv](https://arxiv.org/abs/2311.13735)
  - Also, a mixture of models approach might work. Feed any existing NLP model / rules as a second opinion.
- GraphRAG is better if data is naturally graph-structured. Else, it's slow and fills up the context window with even vaguely related stuff. Vigneshbabu, AMAT.
- ChatGPT for Windows desktop supports real-time voice and a global shortcut (Alt+Space).
- [uithub](https://uithub.com) converts GitHub repos to Markdown. Just replace "g" in "github.com/..." with "u". [Example](https://uithub.com/gramener/asyncllm)
- WebContainers are a thing and Bolt.new uses them!
- [Docling](https://github.com/DS4SD/docling) by IBM converts PDF, DOCX, etc. to Markdown. Like [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/) but better.
- [Loom](https://www.loom.com/) and [Cleanshot](https://cleanshot.com/) are the recommended tools for screen recording and screenshotting. But Loom is paid and Cleanshot is Mac only.
- The Rubik's cube has a Hamiltonian cycle through every one of its 43 quintillion states. [Ref](https://bruce.cubing.net/ham333/rubikhamiltonexplanation.html)
- [OmniParser](https://microsoft.github.io/OmniParser/) is great at parsing screenshots and identifying bounding boxes.
- [Recraft.ai](https://www.recraft.ai/) is currently SOTA in text to image. It's fairly impressive and could be a good alternative to Figma.
- [Zed.dev](https://zed.dev/) is an AI code editor by the creators of Atom. It's written in Rust and is blazing fast. It has native AI integration.
- Artificial Analysis has a bunch of new leaderboards and arenas.
  - OpenAI TTS leads the [TTS Leaderboard](https://artificialanalysis.ai/text-to-speech/arena?tab=Leaderboard). ElevenLabs is a bit behind.
  - Recraft V3 leads the [Text to Image Leaderboard](https://artificialanalysis.ai/text-to-image/arena?tab=Leaderboard), ahead of Flux 1.1.
- [Hertz-Dev](https://github.com/Standard-Intelligence/hertz-dev) is an open source realtime voice chat model. But it doesn't fit in Google Colab T4's RAM.
- Chain of Thought reduces performance on tasks where thinking makes humans worse. [Ref](https://arxiv.org/abs/2410.21333).
  Specifically:
  - Artificial grammar learning
  - Facial recognition
  - Classifying data that has exceptions
- [Creating a LLM-as-a-Judge That Drives Business Results](https://hamel.dev/blog/posts/llm-judge/) by Hamel Husain.
  - Get THE domain expert (or approver) as the tester.
  - Create a dataset that is DIVERSE.
    - Covers EACH combination of:
      - Features
      - Scenarios: e.g. multiple matches, no match, ambiguous request, invalid/incomplete input, unsupported feature, system error
      - Persona: e.g. new user, expert user, non-native speaker, busy professional, technophobe, elderly user
    - Generate data using existing data + synthetic data for each SPECIFIC combination of the above
  - Evaluate based only on PASS/FAIL with a CRITIQUE detailed enough for a new employee. Include:
    - Nuances: Something a failed response did well or a passed response didn't quite do well
    - Improvements: Suggest how the model can improve
  - Build an SPA to make it easy for the domain expert to review
- LLMs can be made to unlearn (copyright material) better by identifying components related to the knowledge to unlearn and applying a larger learning rate to these while leaving other parts unchanged, as opposed to using low learning rates for all components. [Ref](https://arxiv.org/abs/2410.16454)
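
The selective unlearning idea in the bullet above (a larger learning rate for the components tied to the knowledge, nothing for the rest) could look roughly like this. It is a toy sketch in plain Python, not the paper's implementation: the component names, gradient values, and learning rates are hypothetical, and the gradients are assumed to come from whatever unlearning objective is in use.

```python
def selective_update(params, grads, targeted, high_lr=0.1, low_lr=0.0):
    """Apply one gradient step with a per-component learning rate.

    params and grads map a component name to a list of floats.
    Components named in `targeted` use high_lr; all others use low_lr
    (0.0 by default, so they stay unchanged).
    """
    updated = {}
    for name, weights in params.items():
        lr = high_lr if name in targeted else low_lr
        updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


# Hypothetical components: only "mlp.3" is assumed to hold the knowledge to unlearn.
params = {"mlp.3": [0.5, -0.2], "attn.1": [0.1, 0.3]}
grads = {"mlp.3": [1.0, -1.0], "attn.1": [2.0, 2.0]}

new_params = selective_update(params, grads, targeted={"mlp.3"})
```

Here only `mlp.3` moves. Setting `low_lr` to a small non-zero value would give the "low learning rate everywhere" baseline that the bullet contrasts against.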
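
The "EACH combination" step in the LLM-as-a-judge notes above can be enumerated with a Cartesian product. This is a minimal sketch: the feature names are hypothetical placeholders, the scenarios and personas are a subset of those listed above, and the `Judgement` record is just one assumed way to store a PASS/FAIL verdict with its critique.

```python
from dataclasses import dataclass
from itertools import product

features = ["search", "booking"]  # hypothetical product features
scenarios = ["multiple matches", "no match", "ambiguous request"]
personas = ["new user", "expert user", "non-native speaker"]


def build_test_grid(features, scenarios, personas):
    """Return one test-case spec per (feature, scenario, persona) combination."""
    return [
        {"feature": f, "scenario": s, "persona": p}
        for f, s, p in product(features, scenarios, personas)
    ]


@dataclass
class Judgement:
    """A domain expert's verdict on one response: binary outcome plus critique."""
    case: dict
    passed: bool
    critique: str


grid = build_test_grid(features, scenarios, personas)

# Example record with a placeholder critique, as the expert might fill it in.
example = Judgement(case=grid[0], passed=False, critique="(expert's critique goes here)")
```

Each entry in `grid` is a slot to fill with existing or synthetic data, which makes gaps in coverage easy to spot.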
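
The embedding-based approach to medical code matching mentioned above boils down to ranking code descriptions by similarity to the clinical text. The sketch below uses word counts as a toy stand-in for LLM embeddings (an assumption made so that it runs without any model) and a three-code sample of ICD-10-CM descriptions. A real system would swap `embed` for an embedding model.

```python
import math
from collections import Counter

# Small sample of ICD-10-CM codes and their descriptions.
ICD10 = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
}


def embed(text):
    """Toy stand-in for an LLM embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_codes(note, codes=ICD10, top_k=2):
    """Return the top_k codes whose descriptions are most similar to the note."""
    note_vec = embed(note)
    scored = [(cosine(note_vec, embed(desc)), code) for code, desc in codes.items()]
    return [code for _, code in sorted(scored, reverse=True)[:top_k]]


best = rank_codes("patient has type 2 diabetes mellitus")
```

The same ranking function could serve as the retrieval stage of a retrieve-rank system, with an LLM re-ranking the shortlisted codes.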