# Full-stack RAG with Jina and LlamaIndex

This notebook will walk you through setting up a RAG system using [Jina Embeddings v2](https://jina.ai/embeddings/), [LlamaIndex](https://www.llamaindex.ai/), and the [Mixtral Instruct LLM](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1). It accompanies the [article on Jina AI's website](https://jina.ai/news/full-stack-rag-with-jina-and-llamaindex).


## Install LlamaIndex

This installs LlamaIndex together with the LLM and Jina AI integration packages.

In [None]:
!pip install llama-index llama-index-llms-openai llama-index-embeddings-jinaai llama-index-llms-huggingface "huggingface_hub[inference]"

## Set up API keys

Set up your Jina API key and Hugging Face Inference API token.

To get a Jina Embeddings key, go to https://jina.ai/embeddings/

To get a Hugging Face Inference API token, go to https://huggingface.co/settings/tokens


In [2]:
jinaai_api_key = ''
hf_inference_api_key = ''
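As an alternative to pasting keys into the notebook, you may prefer to read them from environment variables. The following is a minimal sketch; the helper `read_key` and the variable names `JINAAI_API_KEY` and `HF_TOKEN` are assumptions made for illustration, not names required by Jina or Hugging Face.

```python
import os


def read_key(env_name: str, default: str = "") -> str:
    """Return the stripped value of an environment variable, or a default."""
    return os.environ.get(env_name, default).strip()


# Hypothetical variable names; use whatever names your environment defines.
jinaai_api_key = read_key("JINAAI_API_KEY")
hf_inference_api_key = read_key("HF_TOKEN")

# Report only whether each key is present, never the key itself.
for name, value in [("Jina", jinaai_api_key), ("Hugging Face", hf_inference_api_key)]:
    print(f"{name} key set: {bool(value)}")
```

This keeps secrets out of the saved notebook, which matters if you share it or commit it to version control.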

## Access Jina Embeddings v2 via the LlamaIndex interface

This code creates the LlamaIndex object that manages your connection to the Jina Embeddings v2 API.

The resulting object is held in the variable `jina_embedding_model`.


In [3]:
from llama_index.embeddings.jinaai import JinaEmbedding

jina_embedding_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v2-base-en",
)

## Access the Mixtral Model via the Hugging Face Inference API

This code creates a client object for accessing the `mistralai/Mixtral-8x7B-Instruct-v0.1` model via the Hugging Face Inference API. The resulting object is held in the variable `mixtral_llm`.

In [5]:
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

mixtral_llm = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1", token=hf_inference_api_key
)

## Get Data for RAG retrieval

This code downloads the book [_Computers on the Farm_ from Project Gutenberg](https://www.gutenberg.org/ebooks/59316). It removes the Project Gutenberg header and footer and splits the book into parts at the main titles in the text, which appear as lines wrapped in `=` signs. These parts are stored in a list variable named `docs`.

In [6]:
import urllib.request
from typing import List
from llama_index.core.readers import StringIterableReader
from llama_index.core.schema import Document


def load_gutenberg(target_url: str) -> List[Document]:
    ret: List[str] = []
    buff: str = ""
    # Skip lines until the Project Gutenberg start marker is seen.
    reject: bool = True
    for raw_line in urllib.request.urlopen(target_url):
        line = raw_line.decode("utf-8")
        stripped_line = line.strip()
        if reject:
            if stripped_line.startswith("*** START OF THE PROJECT GUTENBERG EBOOK"):
                reject = False
                continue
        else:
            if stripped_line.startswith("*** END OF THE PROJECT GUTENBERG EBOOK"):
                reject = True
                continue
            if stripped_line:
                if stripped_line.startswith('=') and stripped_line.endswith('='):
                    # A title line closes the current part and starts a new one.
                    ret.append(buff)
                    buff = ""
                    buff += stripped_line[1:len(stripped_line)-1] + "\n\n"
                else:
                    buff += line.replace('\r', '')
    # Keep the final part, if it has any content.
    if buff.strip():
        ret.append(buff)
    return StringIterableReader().load_data(ret)

docs = load_gutenberg("https://www.gutenberg.org/cache/epub/59316/pg59316.txt")
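To see the splitting logic without downloading anything, here is a small sketch that applies the same rules as `load_gutenberg` to a list of lines held in memory. The function name `split_sections` and the sample text are made up for illustration, and the function returns plain strings rather than LlamaIndex `Document` objects.

```python
from typing import Iterable, List

START = "*** START OF THE PROJECT GUTENBERG EBOOK"
END = "*** END OF THE PROJECT GUTENBERG EBOOK"


def split_sections(lines: Iterable[str]) -> List[str]:
    """Split Gutenberg-style text into parts at '=Title=' lines."""
    ret: List[str] = []
    buff = ""
    reject = True  # True while we are outside the body of the book
    for line in lines:
        stripped = line.strip()
        if reject:
            if stripped.startswith(START):
                reject = False
            continue
        if stripped.startswith(END):
            reject = True
            continue
        if not stripped:
            continue  # blank lines are dropped
        if stripped.startswith("=") and stripped.endswith("="):
            # A title line closes the current part and starts a new one.
            ret.append(buff)
            buff = stripped[1:-1] + "\n\n"
        else:
            buff += line.replace("\r", "")
    if buff.strip():
        ret.append(buff)
    return ret


# Invented sample text that mimics the layout of a Project Gutenberg file.
sample = [
    "Header text\n",
    "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n",
    "Preface line\n",
    "=First Title=\n",
    "Body of the first part.\n",
    "\n",
    "=Second Title=\n",
    "Body of the second part.\n",
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n",
    "License footer\n",
]
sections = split_sections(sample)
print(len(sections))
```

Text that appears before the first title becomes a part of its own, which is why the sample yields one more part than it has titles.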

Verify that we have 58 document pieces.

In [7]:
assert len(docs) == 58

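## Create a Service

This code creates a service context that bundles Jina Embeddings and Mixtral Instruct, and stores it in the variable `service_context`. Note that `ServiceContext` is deprecated as of LlamaIndex v0.10 and removed in later releases, which configure models through the `Settings` object instead, so this cell needs a LlamaIndex version that still provides it.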

In [None]:
from llama_index.core import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=mixtral_llm, embed_model=jina_embedding_model
)


## Build the document index

Next, we store the documents in LlamaIndex's `VectorStoreIndex`, generating embeddings with the Jina Embeddings v2 model and using them as keys for retrieval.

In [9]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents=docs, service_context=service_context
)
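To make "embeddings as keys for retrieval" concrete, the sketch below ranks stored texts by cosine similarity to a query vector. It is a toy illustration in plain Python, not LlamaIndex's implementation: the helper names are hypothetical, the three-dimensional vectors are invented stand-ins for real embeddings, and cosine similarity is assumed as the scoring function.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    store: List[Tuple[Sequence[float], str]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for vec, text in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up vectors standing in for real embeddings of three document parts.
toy_store = [
    ([1.0, 0.0, 0.0], "record keeping"),
    ([0.0, 1.0, 0.0], "memory requirements"),
    ([0.7, 0.7, 0.0], "choosing a computer"),
]
hits = top_k([0.9, 0.1, 0.0], toy_store, k=2)
print([text for _, text in hits])
```

In the real index, each vector comes from the embedding model and has many more dimensions, but the ranking idea is the same.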


## Prepare a Prompt Template

This is the prompt template that will be presented to Mixtral Instruct, with `{context_str}` and `{query_str}` replaced with the retrieved documents and your query respectively.

In [10]:
from llama_index.core import PromptTemplate

qa_prompt_tmpl = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query. Please be brief, concise, and complete.\n"
    "If the context information does not contain an answer to the query, "
    "respond with \"No information\".\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt = PromptTemplate(qa_prompt_tmpl)
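To preview what the LLM will receive, you can fill the two placeholders by hand. This sketch uses Python's built-in `str.format` on a shortened copy of the template; the context passage is a placeholder, not text from the book.

```python
# Shortened copy of the template, kept self-contained for illustration.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

filled = template.format(
    context_str="Placeholder passage retrieved from the index.",
    query_str="How is a computer useful on a farm?",
)
print(filled)
```

The prompt ends with `Answer: ` so that the model's completion begins directly with the answer.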

## Assemble the Full Query Engine

The query engine has three parts:

* `retriever` is the search engine that takes user requests and retrieves relevant documents from the vector store.
* `response_synthesizer` uses the prompt created above to join the retrieved documents and user request and passes them to the LLM, getting back its response.
* `query_engine` is a container object that holds the two together.


In [11]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    text_qa_template=qa_prompt,
    response_mode="compact",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
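The division of labor between the three parts can be sketched in a few lines of plain Python. This is a simplified model of the flow, not LlamaIndex's code: `answer`, `fake_retrieve`, `fake_llm`, and `toy_template` are hypothetical names, and joining all passages into one prompt is only an approximation of what the synthesizer does when the retrieved text fits in a single request.

```python
from typing import Callable, List


def answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
    template: str,
) -> str:
    """Retrieve passages, pack them into one prompt, and ask the LLM."""
    passages = retrieve(query)  # role of the retriever
    context = "\n\n".join(passages)  # role of the response synthesizer
    prompt = template.format(context_str=context, query_str=query)
    return llm(prompt)


# Stand-ins so the sketch runs without any API keys.
def fake_retrieve(query: str) -> List[str]:
    return ["Passage one.", "Passage two."]


def fake_llm(prompt: str) -> str:
    return f"(model saw {len(prompt)} characters)"


toy_template = "{context_str}\n\nQuery: {query_str}\nAnswer: "
print(answer("What is a farm computer for?", fake_retrieve, fake_llm, toy_template))
```

Passing the retriever and the LLM in as functions mirrors how the query engine simply holds the two components and calls them in order.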

## Run some Queries

In [None]:
result = query_engine.query("How is a computer useful on a farm?")
print(result.response)

In [None]:
result = query_engine.query("How much memory does a computer need?")
print(result.response)

In [None]:
result = query_engine.query("What is the address of AgriData Resources, Inc.?")
print(result.response)

In [None]:
result = query_engine.query("Who is buried in Grant's tomb?")
print(result.response)

## Checking the RAG Retrieval

To debug your RAG system, you can run document retrieval on its own and inspect the results. To do this, you need the `retriever` object defined above.

In [None]:
retrieved_texts = retriever.retrieve("What is the address of AgriData Resources, Inc.?")
for i, rt in enumerate(retrieved_texts):
    print(f"Text {i+1}:\n\n{rt.text}\n\n")

In [None]:
retrieved_texts = retriever.retrieve("Who is buried in Grant's tomb?")
for i, rt in enumerate(retrieved_texts):
    print(f"Text {i+1}:\n\n{rt.text}\n\n")
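When the retrieved passages are long, a one-line summary per result can be easier to scan. The helper below is a sketch with a hypothetical name, `summarize_hits`. It assumes each result exposes `.text`, as used above, and optionally a numeric `.score`; if no score is present it prints `n/a`. Stand-in objects are used here so the example runs on its own.

```python
from types import SimpleNamespace


def summarize_hits(hits, preview_chars: int = 60):
    """Return one line per hit: rank, score (if any), and a text preview."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        score = getattr(hit, "score", None)
        score_str = f"{score:.3f}" if score is not None else "n/a"
        # Collapse whitespace so the preview fits on a single line.
        preview = " ".join(hit.text.split())[:preview_chars]
        lines.append(f"{rank}. score={score_str} {preview}")
    return lines


# Stand-in objects; in the notebook you would pass `retrieved_texts` instead.
fake_hits = [
    SimpleNamespace(text="First passage\nwith a line break.", score=0.8123),
    SimpleNamespace(text="Second passage.", score=None),
]
for line in summarize_hits(fake_hits):
    print(line)
```

Low scores on every result are a hint that the index holds nothing relevant to the query, as you would expect for the question about Grant's tomb.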

## Making your own RAG Engine

You can find discussion and links in the [accompanying article](https://jina.ai/news/full-stack-rag-with-jina-and-llamaindex) on the Jina AI website.