---
title: Things I Learned - 26 Nov 2023
date: 2023-11-26T00:00:00+00:00
categories:
  - til
description: I learned about GPT Vision for calendar extraction, the limitations of RAG versus fine-tuning, and using LlamaIndex for hierarchical retrieval. I also found tools for JSON repair, LLM evaluation metrics, and running local models with Ollama.
keywords: [gpt vision, rag, llamaindex, llmops, ollama, fine-tuning, openrouter, orca 2]
---

This week, I learned:

- This is an interesting GPT Vision API prompt from Simon Willison: "given this event flyer, create a link to add it to my Google Calendar". [Ref](https://www.newsroomrobots.com/p/breaking-down-openais-new-features)
- Quote from Jerry Liu: "GPT-4 is really good at complex reasoning". It's worth exploring what that means.
- Quote from Jerry Liu: "RAG is a hack". It's engineered, not machine-learned, so it's suboptimal. We need an ML way of creating the context. Maybe fine-tuning can be a way of CREATING the right context. But RAG can handle deterministic stuff like access control.
- The OpenAI fine-tuning API, as it is currently exposed, is not good at memorizing information. But the Gorilla paper shows that fine-tuning can actually memorize well.
- Learn the ML optimization approach: LLMOps. Have an evaluation framework and track its metrics in a tool like Weights & Biases or TensorBoard. This helps figure out where fine-tuning helps and where RAG does. Soon, this will become important.
- Flat indexing of chunks is not the only way to store embeddings. LlamaIndex allows you to create hierarchies that you can traverse for retrieval.
- Agents mimic programming primitives. Switch. While. Call a function. Print.
- [OpenRouter](https://openrouter.ai/docs#models) hosts several models and offers them as APIs!
- [Ragas metrics](https://docs.ragas.io/en/latest/concepts/metrics/index.html) evaluate the quality of a RAG pipeline
- [Orca 2](https://www.microsoft.com/en-us/research/blog/orca-2-teaching-small-language-models-how-to-reason/) was trained on different reasoning techniques (e.g. step-by-step) and reportedly performs on par with much larger models on reasoning tasks
- Embeddings can also be used just to re-rank the results of a regular search. [Ref](https://txt.cohere.com/rerank/)
- Anthropic's Claude 2.1 has a 200K context window but is still crap.
- [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) can understand videos too.
- [CoVA](https://aclanthology.org/2022.ecnlp-1.11.pdf) scrapes web pages using LLMs and visual information.
- [jsonrepair](https://github.com/josdejong/jsonrepair) can fix JSON fairly well. [jsonformer](https://github.com/1rgs/jsonformer) wraps HuggingFace models to produce JSON. [Ref](https://twitter.com/jerryjliu0/status/1720127061917147376)
- Google Cloud's Vertex AI has a Model Garden with lots of pre-trained and trainable models.
- [Gorilla LLM specializes in API calls](https://gorilla.cs.berkeley.edu/): Torch Hub, TensorFlow Hub, HuggingFace
- [GPT-4 does not do abstraction at human levels](https://arxiv.org/abs/2311.09247)
- Each of the GPTs / Prompts we create could be like a UNIX command prompt, and become a startup of its own
- [LLaVA-Plus](https://llava-vl.github.io/llava-plus/) extends LLaVA with pre-trained vision models that make image editing better
- [Ollama](https://github.com/jmorganca/ollama) runs local LLMs
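
To make the re-ranking note above concrete, here is a minimal sketch of re-ranking search hits by embedding similarity. It assumes you already have an embedding for the query and for each hit; the vectors, document IDs, and the `rerank` helper are all hypothetical toy values for illustration. It uses plain cosine similarity and does not reproduce how Cohere's hosted Rerank model scores results.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, results, top_k=None):
    """Re-order (doc_id, embedding) pairs by similarity to the query.

    `results` is the list returned by a regular search, in its original
    order. Returns doc IDs, most similar first, cut to `top_k` if given.
    """
    scored = sorted(results, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]


# Hypothetical toy embeddings; real ones would come from an embedding model.
query = [1.0, 0.0, 0.2]
search_hits = [
    ("doc_a", [0.1, 0.9, 0.0]),  # ranked first by the regular search
    ("doc_b", [0.9, 0.1, 0.3]),
    ("doc_c", [0.5, 0.5, 0.0]),
]

reranked = rerank(query, search_hits)
print(reranked)
```

The regular search stays responsible for recall (fetching candidates cheaply), and the embedding step only reorders that short list, so it is cheap to bolt onto an existing search.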