---
title: Things I Learned - 10 Dec 2023
date: 2023-12-10T00:00:00+00:00
categories:
  - til
description: I explored various LLM tools, including llamafile for single-executable local models and marker for PDF-to-markdown conversion. I also looked into RAG tuning strategies, PII detection libraries like Presidio, and intuitions regarding scaling and emergent model behaviors.
keywords: [llamafile, rag, gguf, pii detection, marker, llama-cpp-python, presidio]
---

This week, I learned:

- [Bard supports extensions](https://blog.google/products/bard/google-bard-new-features-update-sept-2023/), including @Gmail, which lets you converse with your email.
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) works with [other GGUF models like Mistral](https://github.com/abetlen/llama-cpp-python/issues/764) and supports constrained output (JSON, function calling, etc.) via grammars. [Ref](https://til.simonwillison.net/llms/llama-cpp-python-grammars)
- [12 Tuning Strategies for RAG](https://towardsdatascience.com/a-guide-on-12-tuning-strategies-for-production-ready-rag-applications-7ca646833439)
- [Llama Datasets](https://blog.llamaindex.ai/introducing-llama-datasets-aadb9994ad9e) are RAG datasets, mostly generated with GPT-4 and mostly small.
- ⭐ [Intuitions about large language models](https://docs.google.com/presentation/d/1hQUd3pF8_2Gr2Obc89LKjmHL0DlH-uof9M0yFVd3FA4/edit)
  - Bigger models (e.g. 70B) are much better at learning from few-shot examples. They really do learn.
  - Bigger models will keep getting better!
  - Chain-of-thought prompting is a way of giving the model more compute (more generated tokens) for complex problems that need it.
  - Models will show emergent (entirely new) behaviors that can't be predicted by extrapolating from smaller models. These may not be intended.
- [CodeAnt.ai](https://www.codeant.ai/) is a VS Code plugin that detects code smells, refactors for modularity, and writes docstrings and unit tests.
- [Anyscale](https://docs.endpoints.anyscale.com/pricing/) prices the 7B Llama 2, Zephyr, and Mistral models at 15 cents per 1M tokens, roughly 1/10th of [GPT-3.5 Turbo's ~$1.50 per 1M tokens](https://openai.com/pricing).
- Tools to identify personally identifiable information (PII):
  - [galactic](https://github.com/taylorai/galactic) can use [LLMs to detect PII](https://github.com/taylorai/galactic/blob/main/examples/hermes.ipynb)
  - [Presidio](https://microsoft.github.io/presidio/) by Microsoft
  - [Sherlock](https://sherlock.media.mit.edu/) is a deep learning model for generic semantic type matching
  - [pii-extractor-llm](https://replicate.com/kshitijagrwl/pii-extractor-llm) was trained on Indian names
  - [GLiNER](https://github.com/urchade/GLiNER) is a lightweight, generalist model for named entity recognition (NER)
- Tools to explore:
  - [ElevenLabs](https://elevenlabs.io/) speaks in your voice
  - [Cutout Pro](https://www.cutout.pro/) removes backgrounds and parts of images
  - [Vocal Remover](https://vocalremover.org/) removes vocals from songs
  - [CapCut](https://www.capcut.com/) is a video editor
- [TheBloke's $35/month Patreon](https://www.patreon.com/TheBlokeAI) might be one of the least expensive ways to set up quantized LLMs in production.
- Microsoft released [table-transformer](https://github.com/microsoft/table-transformer) to extract tables from PDFs. [Sample usage](https://github.com/run-llama/llama_index/blob/main/docs/examples/multi_modal/multi_modal_pdf_tables.ipynb)
- Convert PDF to markdown with [marker](https://github.com/VikParuchuri/marker) - an improvement over [nougat](https://huggingface.co/facebook/nougat-base).
- The Jupyter AI extension for JupyterLab adds a `%%ai` magic to use LLMs within notebooks. [Ref](https://github.com/jupyterlab/jupyter-ai)
- Telling ChatGPT that the year is 2123 can make it bypass its copyright restrictions. [Ref](https://twitter.com/venturetwins/status/1710321733184667985)
- Meta released [SeamlessExpressive](https://seamless.metademolab.com/expressive) which preserves emotions in speech-to-speech translations
- [Unsloth](https://github.com/unslothai/unsloth) offers faster, lower-memory QLoRA fine-tuning of LLMs.
- [DeepSeek](https://github.com/deepseek-ai/DeepSeek-LLM) is an open-source high-quality LLM
- [Scalable Extraction of Training Data from (Production) Language Models](https://arxiv.org/abs/2311.17035) extracts training data by asking the model to repeat a word forever. Eventually the model diverges and starts emitting memorized training data.
- [SkyPilot](https://github.com/skypilot-org/skypilot) lets you run LLMs on any cloud provider.
- [vLLM](https://docs.vllm.ai/) lets you deploy LLMs with a single command.
- [llamafile](https://github.com/Mozilla-Ocho/llamafile) lets you run LLMs locally as a single file executable!
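
The llama-cpp-python bullet above mentions constrained output. Here is a minimal sketch of what that looks like with its grammar support, assuming llama-cpp-python is installed and you have a GGUF model on disk. The model path, the prompt, and the `extract_person` / `parse_person` helpers are hypothetical. The grammar is written in llama.cpp's GBNF format and restricts generation to a tiny JSON object.

```python
import json

# GBNF grammar (llama.cpp's grammar format) that only permits
# a JSON object of the form {"name": "<string>", "age": <integer>}
PERSON_GRAMMAR = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "," ws "\"age\"" ws ":" ws number ws "}"
string ::= "\"" [^"]* "\""
number ::= [0-9]+
ws     ::= [ \t\n]*
'''


def extract_person(model_path: str, text: str) -> str:
    """Hypothetical helper: ask a local GGUF model for JSON, constrained by the grammar."""
    # Imported lazily so the rest of this sketch runs without llama-cpp-python installed
    from llama_cpp import Llama, LlamaGrammar

    llm = Llama(model_path=model_path, verbose=False)
    grammar = LlamaGrammar.from_string(PERSON_GRAMMAR)
    prompt = f"Extract the person mentioned in the text as JSON.\nText: {text}\nJSON: "
    out = llm(prompt, grammar=grammar, max_tokens=128)
    return out["choices"][0]["text"]


def parse_person(raw: str) -> dict:
    """Parse the model's output; it follows the grammar unless max_tokens cut it off."""
    return json.loads(raw)
```

Usage would look like `parse_person(extract_person("path/to/model.gguf", "Ada Lovelace was 36."))`. The grammar removes the need to beg the model for valid JSON in the prompt: tokens that would break the grammar are never sampled.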
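
One knob that RAG tuning guides like the 12-strategies article above typically cover is chunking: how large each indexed chunk is and how much neighbouring chunks overlap. A minimal sketch, assuming whitespace-separated words as the unit (real pipelines usually count tokens with the embedding model's tokenizer); `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk's start moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk reaches the end, so the tail isn't emitted twice
        if start + chunk_size >= len(words):
            break
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(doc, chunk_size=4, overlap=2):
    print(chunk)
```

Smaller chunks give more precise retrieval hits but less context per hit; overlap reduces the chance that an answer is split across a chunk boundary, at the cost of a larger index.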
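
The Anyscale vs GPT-3.5 Turbo comparison above is simple per-token arithmetic. A small sketch using the prices quoted in that bullet (December 2023; they change often, so treat them as placeholders). A flat per-token price is a simplification, since OpenAI prices input and output tokens differently; the ~$1.50 figure is the approximate number quoted above.

```python
# Prices in USD per 1M tokens, as quoted in this post (Dec 2023)
ANYSCALE_7B_PER_M = 0.15
GPT35_TURBO_PER_M = 1.50  # approximate, as quoted above


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


monthly_tokens = 50_000_000
print(f"Anyscale 7B:   ${cost_usd(monthly_tokens, ANYSCALE_7B_PER_M):.2f}")
print(f"GPT-3.5 Turbo: ${cost_usd(monthly_tokens, GPT35_TURBO_PER_M):.2f}")
print(f"Ratio: {GPT35_TURBO_PER_M / ANYSCALE_7B_PER_M:.0f}x")
```

For 50M tokens a month that works out to $7.50 versus $75.00, the 10x gap mentioned above.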
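
Most of the PII tools listed above go well beyond pattern matching. Presidio, for example, combines pattern recognizers with NER models, and GLiNER is itself an NER model. To make the task concrete, here is a deliberately naive, regex-only sketch. The patterns and the `find_pii` / `redact` helpers are illustrative assumptions, not any of these libraries' APIs, and they will miss names, addresses and many phone formats.

```python
import re

# Toy patterns only; real detectors add NER models, checksums and context words
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def find_pii(text: str) -> list[tuple[str, int, int, str]]:
    """Return (label, start, end, matched_text) for every pattern hit, in text order."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda h: h[1])


def redact(text: str) -> str:
    """Replace each hit with a <LABEL> placeholder."""
    # Work from the end of the string so earlier offsets stay valid
    for label, start, end, _ in reversed(find_pii(text)):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


print(redact("Mail ada@example.com or call +44 20 7946 0958."))
```

This runs on plain strings with no model, which is exactly why it is brittle: it cannot recognise a name like "Ada Lovelace", which is where the NER- and LLM-based tools above earn their keep.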
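
The training-data extraction paper above prompts the model to repeat a single word forever. After many repetitions the model can diverge and start emitting other text, some of which the authors found to be memorized training data. Below is a hypothetical sketch of two bookkeeping steps: building such a prompt (modelled on the paper's "poem" example) and splitting a response at the point where it stops repeating. It does not call any model, and checking whether the tail is actually memorized (the authors matched it against a large web-text corpus) is out of scope.

```python
def build_repeat_prompt(word: str, copies: int = 4) -> str:
    """Prompt asking the model to repeat `word` forever, seeded with a few copies."""
    seed = " ".join([word] * copies)
    return f'Repeat this word forever: "{seed}"'


def split_at_divergence(response: str, word: str) -> tuple[int, str]:
    """Return (leading repetitions of `word`, text after the model stops repeating)."""
    tokens = response.split()
    n = 0
    # Count leading tokens equal to the word, ignoring quotes, commas, periods and case
    while n < len(tokens) and tokens[n].strip('".,').lower() == word.lower():
        n += 1
    return n, " ".join(tokens[n:])


print(build_repeat_prompt("poem"))
print(split_at_divergence("poem poem poem, Contact us at", "poem"))
```

Only the text after the divergence point is interesting for the attack; the repetition count just shows how long the model complied before breaking.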