{ "cells": [ { "cell_type": "markdown", "id": "6e3190c9-7701-464d-9878-e558d3ce47c5", "metadata": { "id": "6e3190c9-7701-464d-9878-e558d3ce47c5" }, "source": [ "# Full-stack RAG with Jina and LlamaIndex\n", "\n", "This notebook will walk you through setting up a RAG system using [Jina Embeddings v2](https://jina.ai/embeddings/), [LlamaIndex](https://www.llamaindex.ai/), and the [Mixtral Instruct LLM](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1). It accompanies the [article on Jina AI's website](https://jina.ai/news/full-stack-rag-with-jina-and-llamaindex).\n" ] }, { "cell_type": "markdown", "id": "c30c0b9e-3708-47db-9ad5-50b4990d4e65", "metadata": { "id": "c30c0b9e-3708-47db-9ad5-50b4990d4e65" }, "source": [ "## Install LlamaIndex\n", "\n", "This includes LLM and JinaAI-specific packages." ] }, { "cell_type": "code", "execution_count": null, "id": "78e9cc15-ba3f-43b5-950e-7b139daad06f", "metadata": { "id": "78e9cc15-ba3f-43b5-950e-7b139daad06f" }, "outputs": [], "source": [ "!pip install llama-index llama-index-llms-openai llama-index-embeddings-jinaai llama-index-llms-huggingface \"huggingface_hub[inference]\"" ] }, { "cell_type": "markdown", "id": "7cc7042a-02ec-4fd1-a413-abce84b18eb5", "metadata": { "id": "7cc7042a-02ec-4fd1-a413-abce84b18eb5" }, "source": [ "## Set up API keys\n", "\n", "Set up your Jina API key and HuggingFace Inference API token.\n", "\n", "To get a Jina Embeddings key, go to https://jina.ai/embeddings/\n", "\n", "To get a HuggingFace Inference API token, go to https://huggingface.co/settings/tokens\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "fa0bbcaf-783f-46a1-9ccd-6926c01bd13f", "metadata": { "id": "fa0bbcaf-783f-46a1-9ccd-6926c01bd13f" }, "outputs": [], "source": [ "jinaai_api_key = ''\n", "hf_inference_api_key = ''" ] }, { "cell_type": "markdown", "id": "532fdb2a-4622-490a-99b0-e8ce230df615", "metadata": { "id": "532fdb2a-4622-490a-99b0-e8ce230df615" }, "source": [ "## Access Jina Embeddings v2 via the LlamaIndex interface.\n", "\n", "This code creates the LlamaIndex object that manages your connection to the Jina Embeddings v2 API.\n", "\n", "The resulting object is held in the variable `jina_embedding_model`.\n" ] }, { "cell_type": "code", "execution_count": 3, "id": "5e478e2a-96c5-4aa0-b69e-b8b29aa8b451", "metadata": { "id": "5e478e2a-96c5-4aa0-b69e-b8b29aa8b451" }, "outputs": [], "source": [ "from llama_index.embeddings.jinaai import JinaEmbedding\n", "\n", "jina_embedding_model = JinaEmbedding(\n", " api_key=jinaai_api_key,\n", " model=\"jina-embeddings-v2-base-en\",\n", ")" ] }, { "cell_type": "markdown", "id": "1533b529-214e-469e-b351-e97a67d56e22", "metadata": { "id": "1533b529-214e-469e-b351-e97a67d56e22" }, "source": [ "## Access the Mixtral Model via the HuggingFace Inference API\n", "\n", "This code creates a holder for accessing the `mistralai/Mixtral-8x7B-Instruct-v0.1` model via the Hugging Face Inference API. The resulting object is held in the variable `mixtral_llm`." ] }, { "cell_type": "code", "source": [ "from llama_index.llms.huggingface import HuggingFaceInferenceAPI\n", "\n", "mixtral_llm = HuggingFaceInferenceAPI(\n", " model_name=\"mistralai/Mixtral-8x7B-Instruct-v0.1\", token=hf_inference_api_key\n", ")" ], "metadata": { "id": "pIXwBHmnnkwK" }, "id": "pIXwBHmnnkwK", "execution_count": 5, "outputs": [] }, { "cell_type": "markdown", "id": "bb227a69-5470-4c82-a809-d514aac7168b", "metadata": { "id": "bb227a69-5470-4c82-a809-d514aac7168b" }, "source": [ "## Get Data for RAG retrieval\n", "\n", "This code will download the book [_Computers on the Farm_ from Project Gutenberg](https://www.gutenberg.org/ebooks/59316). It will process the book to remove Project Gutenberg headers and footers and break it up into parts by main titles in the text. These parts will be stored in a list variable named `docs`." ] }, { "cell_type": "code", "execution_count": 6, "id": "fc423ce9-680d-4d0c-b392-5242ad2f52a4", "metadata": { "id": "fc423ce9-680d-4d0c-b392-5242ad2f52a4" }, "outputs": [], "source": [ "import urllib.request\n", "from typing import List\n", "from llama_index.core.readers import StringIterableReader\n", "from llama_index.core.schema import Document\n", "\n", "\n", "def load_gutenberg(target_url: str) -> List[Document]:\n", " ret: List[str] = []\n", " buff: str = \"\"\n", " reject: bool = True\n", " for raw_line in urllib.request.urlopen(target_url):\n", " line = raw_line.decode(\"utf-8\")\n", " stripped_line = line.strip()\n", " if reject:\n", " if stripped_line.startswith(\"*** START OF THE PROJECT GUTENBERG EBOOK\"):\n", " reject = False\n", " continue\n", " else:\n", " if stripped_line.startswith(\"*** END OF THE PROJECT GUTENBERG EBOOK\"):\n", " reject = True\n", " continue\n", " if stripped_line:\n", " if stripped_line.startswith('=') and stripped_line.endswith('='):\n", " ret.append(buff)\n", " buff = \"\"\n", " buff += stripped_line[1:len(stripped_line)-1] + \"\\n\\n\"\n", " else:\n", " buff += line.replace('\\r', '')\n", " if buff.strip():\n", " ret.append(buff)\n", " return StringIterableReader().load_data(ret)\n", "\n", "docs = load_gutenberg(\"https://www.gutenberg.org/cache/epub/59316/pg59316.txt\")" ] }, { "cell_type": "markdown", "id": "92786779-856f-4433-a4ec-593adee2f9bd", "metadata": { "id": "92786779-856f-4433-a4ec-593adee2f9bd" }, "source": [ "Verify that we have 58 document pieces." ] }, { "cell_type": "code", "execution_count": 7, "id": "c502c788-1bc9-4036-8c97-77d67b58ef7e", "metadata": { "id": "c502c788-1bc9-4036-8c97-77d67b58ef7e" }, "outputs": [], "source": [ "assert len(docs) == 58" ] }, { "cell_type": "markdown", "id": "15eaac0c-1059-476d-ae7f-524f1d52db4c", "metadata": { "id": "15eaac0c-1059-476d-ae7f-524f1d52db4c" }, "source": [ "## Create a Service\n", "\n", "The code creates a RAG service that has access to Jina Embeddings and Mixtral Instruct and stores it in the variable `service_context`." ] }, { "cell_type": "code", "execution_count": null, "id": "41283539-ed3e-4cb9-a9dd-8999588782e5", "metadata": { "id": "41283539-ed3e-4cb9-a9dd-8999588782e5" }, "outputs": [], "source": [ "from llama_index.core import ServiceContext\n", "\n", "service_context = ServiceContext.from_defaults(\n", " llm=mixtral_llm, embed_model=jina_embedding_model\n", ")\n" ] }, { "cell_type": "markdown", "id": "10357d75-a97d-4199-ad05-bc2291a3c3da", "metadata": { "id": "10357d75-a97d-4199-ad05-bc2291a3c3da" }, "source": [ "## Build the document index\n", "\n", "Next, we store the documents in LlamaIndex' `VectorStoreIndex`, generating embeddings with Jina Embeddings v2 model and using them as keys for retrieval." ] }, { "cell_type": "code", "execution_count": 9, "id": "f037a742-dd81-4cf4-8c95-72143b37e948", "metadata": { "id": "f037a742-dd81-4cf4-8c95-72143b37e948" }, "outputs": [], "source": [ "from llama_index.core import VectorStoreIndex\n", "\n", "index = VectorStoreIndex.from_documents(\n", " documents=docs, service_context=service_context\n", ")\n" ] }, { "cell_type": "markdown", "id": "48a054d9-bef8-4170-9749-03553eaaa311", "metadata": { "id": "48a054d9-bef8-4170-9749-03553eaaa311" }, "source": [ "## Prepare a Prompt Template\n", "\n", "This is the prompt template that will be presented to Mixtral Instruct, with `{context_str}` and `{query_str}` replaced with the retrieved documents and your query respectively." ] }, { "cell_type": "code", "execution_count": 10, "id": "a57914ad-7d03-413d-bd7f-14879feede19", "metadata": { "id": "a57914ad-7d03-413d-bd7f-14879feede19" }, "outputs": [], "source": [ "from llama_index.core import PromptTemplate\n", "\n", "qa_prompt_tmpl = (\n", " \"Context information is below.\\n\"\n", " \"---------------------\\n\"\n", " \"{context_str}\\n\"\n", " \"---------------------\\n\"\n", " \"Given the context information and not prior knowledge, \"\n", " \"answer the query. Please be brief, concise, and complete.\\n\"\n", " \"If the context information does not contain an answer to the query, \"\n", " \"respond with \\\"No information\\\".\"\n", " \"Query: {query_str}\\n\"\n", " \"Answer: \"\n", ")\n", "qa_prompt = PromptTemplate(qa_prompt_tmpl)" ] }, { "cell_type": "markdown", "id": "8b9f3eff-16f5-4917-9901-6c322f55ee82", "metadata": { "id": "8b9f3eff-16f5-4917-9901-6c322f55ee82" }, "source": [ "## Assemble the Full Query Engine\n", "\n", "The query engine has three parts:\n", "\n", "* `retriever` is the search engine that takes user requests and retrieves relevant documents from the vector store.\n", "* `response_synthesizer` uses the prompt created above to join the retrieved documents and user request and passes them to the LLM, getting back its response.\n", "* `query_engine` is a container object that holds the two together.\n" ] }, { "cell_type": "code", "execution_count": 11, "id": "1c8fb25d-f796-4e77-91b0-a24add8ced8d", "metadata": { "id": "1c8fb25d-f796-4e77-91b0-a24add8ced8d" }, "outputs": [], "source": [ "from llama_index.core.retrievers import VectorIndexRetriever\n", "from llama_index.core.query_engine import RetrieverQueryEngine\n", "from llama_index.core import get_response_synthesizer\n", "\n", "# configure retriever\n", "retriever = VectorIndexRetriever(\n", " index=index,\n", " similarity_top_k=2,\n", ")\n", "\n", "# configure response synthesizer\n", "response_synthesizer = get_response_synthesizer(\n", " service_context=service_context,\n", " text_qa_template=qa_prompt,\n", " response_mode=\"compact\",\n", ")\n", "\n", "# assemble query engine\n", "query_engine = RetrieverQueryEngine(\n", " retriever=retriever,\n", " response_synthesizer=response_synthesizer,\n", ")" ] }, { "cell_type": "markdown", "id": "bc82bb54-c0fe-4901-9b66-33281bb7a98f", "metadata": { "id": "bc82bb54-c0fe-4901-9b66-33281bb7a98f" }, "source": [ "## Run some Queries" ] }, { "cell_type": "code", "execution_count": null, "id": "33c6ac3a-6b6c-4671-95ea-bed983525929", "metadata": { "id": "33c6ac3a-6b6c-4671-95ea-bed983525929" }, "outputs": [], "source": [ "result = query_engine.query(\"How is a computer useful on a farm?\")\n", "print(result.response)" ] }, { "cell_type": "code", "execution_count": null, "id": "bc30e880-bd71-430b-97b9-82f7758ae3ee", "metadata": { "id": "bc30e880-bd71-430b-97b9-82f7758ae3ee" }, "outputs": [], "source": [ "result = query_engine.query(\"How much memory does a computer need?\")\n", "print(result.response)" ] }, { "cell_type": "code", "execution_count": null, "id": "5027c291-b377-4853-8b55-328f017dafd1", "metadata": { "id": "5027c291-b377-4853-8b55-328f017dafd1" }, "outputs": [], "source": [ "result = query_engine.query(\"What is the address of AgriData Resources, Inc.?\")\n", "print(result.response)" ] }, { "cell_type": "code", "execution_count": null, "id": "b753f2bb-ca13-4cea-9f5d-db923698d7a3", "metadata": { "id": "b753f2bb-ca13-4cea-9f5d-db923698d7a3" }, "outputs": [], "source": [ "result = query_engine.query(\"Who is buried in Grant's tomb?\")\n", "print(result.response)" ] }, { "cell_type": "markdown", "id": "1645989b-0230-419d-888c-122a12970e76", "metadata": { "id": "1645989b-0230-419d-888c-122a12970e76" }, "source": [ "## Checking the RAG Retrieval\n", "\n", "You can just run the document retrieval and inspect the results to debug your RAG system. To do this, you need the `retriever` object defined above." ] }, { "cell_type": "code", "execution_count": null, "id": "313529e5-bad5-4b44-b497-b76fd60860c4", "metadata": { "id": "313529e5-bad5-4b44-b497-b76fd60860c4" }, "outputs": [], "source": [ "retrieved_texts = retriever.retrieve(\"What is the address of AgriData Resources, Inc.?\")\n", "for i, rt in enumerate(retrieved_texts):\n", " print(f\"Text {i+1}:\\n\\n{rt.text}\\n\\n\")" ] }, { "cell_type": "code", "execution_count": null, "id": "89a20623-8060-4a73-a028-afc975f0e56a", "metadata": { "id": "89a20623-8060-4a73-a028-afc975f0e56a" }, "outputs": [], "source": [ "retrieved_texts = retriever.retrieve(\"Who is buried in Grant's tomb?\")\n", "for i, rt in enumerate(retrieved_texts):\n", " print(f\"Text {i+1}:\\n\\n{rt.text}\\n\\n\")" ] }, { "cell_type": "markdown", "id": "644ea5a0-4140-438d-9981-e23f30a21765", "metadata": { "id": "644ea5a0-4140-438d-9981-e23f30a21765" }, "source": [ "## Making your own RAG Engine\n", "\n", "You can find discussion and links in the [accompanying article](https://jina.ai/news/full-stack-rag-with-jina-and-llamaindex) on the Jina AI website." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" }, "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 5 }