# Having It Both Ways: Combining BM25 with AI Reranking

The original blog post is available [here](https://jina.ai/news/having-it-both-ways-combining-bm25-with-ai-reranking).

### Upload files to Google Colab

Before we can access local files on Google Colab, we need to upload them to the Colab environment. Here are the steps to do so:
 
1. [Download the file `fashion_data.csv`](https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/bm25/fashion_data.csv) to your local drive.
2. Click on the “Files” tab in the left-side menu in Google Colab (make sure it is the “Files” tab, not the “File” drop-down menu).
3. Click on the “Upload to Session Storage” button and select the `fashion_data.csv` file you previously downloaded.
4. Wait for the upload to complete.

Once the `fashion_data.csv` file is uploaded, you can access it in the “Files” tab.

Install prerequisites:

In [None]:
!pip install -q haystack-ai jina-haystack

Add the Jina API key as an environment variable:

In [None]:
import os
import getpass

os.environ["JINA_API_KEY"] = getpass.getpass()

Define the query in the form of a product category:

In [None]:
query = "Nightwear for Women"

Transform the data into Documents:

In [None]:
import csv
from haystack import Document

documents = []
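with open("fashion_data.csv") as f:
    # The file uses ";" as the column separator
    data = csv.reader(f, delimiter=";")
    for row in data:
        # Concatenate all columns into one text field for BM25 to search
        row_text = ''.join(row)
        # Keep the product ID and image URL as metadata for display
        row_doc = Document(content=row_text, meta={"prod_id": row[0], "prod_image": row[1]})
        documents.append(row_doc)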
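
To make the shape of each record concrete, here is a minimal standard-library sketch of the same parsing step. The sample row is hypothetical (the real columns of `fashion_data.csv` may differ), and `rows_to_records` is an illustrative helper, not part of Haystack. It also shows how the join separator affects the text that BM25 later tokenizes:

```python
import csv
import io

# Hypothetical sample row; the real fashion_data.csv columns may differ.
sample = "101;https://example.com/a.jpg;Nightwear for Women;Cotton pyjama set\n"

def rows_to_records(text, sep=""):
    """Parse ';'-separated rows into dicts shaped like the Documents above."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        records.append(
            {"content": sep.join(row), "prod_id": row[0], "prod_image": row[1]}
        )
    return records

fused = rows_to_records(sample)            # same join as the notebook cell
spaced = rows_to_records(sample, sep=" ")  # alternative: keep columns apart

print(fused[0]["content"])
print(spaced[0]["content"])
```

With an empty separator, the last word of one column runs into the first word of the next, so inspecting a few `content` values before indexing can be worthwhile.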

Create the query pipeline WITHOUT Jina Reranker, so we can see the results before reranking:

In [None]:
from haystack import Pipeline
from haystack.document_stores.types import DuplicatePolicy
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

document_store = InMemoryDocumentStore()
document_store.write_documents(documents=documents, policy=DuplicatePolicy.OVERWRITE)

retriever = InMemoryBM25Retriever(document_store=document_store)

rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)

Run the query pipeline WITHOUT Jina Reranker:

In [None]:
result = rag_pipeline.run(
    {
        "retriever": {"query": query, "top_k": 50},
    }
)

for doc in result["retriever"]["documents"]:
    print("Product ID:", doc.meta["prod_id"])
    print("Product Image:", doc.meta["prod_image"])
    print("Score:", doc.score)
    print("-"*100)

![image.png](./images/bm25-retrieved-results.png)

*As we can see, although the results are broadly related to the nightwear we asked for, the most relevant matches get lost among the many products retrieved by BM25. In practice, a user would see mostly weaker matches at the top of the page, which might not meet their exact needs.*
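
To see why purely lexical scoring can rank results this way, here is a minimal sketch of the classic Okapi BM25 formula on a toy corpus. The corpus, the whitespace tokenizer, and the `k1` and `b` values are assumptions for illustration. Haystack's in-memory retriever may use a different BM25 variant and tokenizer, so its scores will not match these.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with a textbook Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many docs does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the doc contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, not taken from fashion_data.csv
toy_docs = [
    "women nightwear cotton pyjama set",
    "women running shoes for women",
    "silk sleep robe",
]
toy_scores = bm25_scores("nightwear for women", toy_docs)
for doc, score in zip(toy_docs, toy_scores):
    print(f"{score:.3f}  {doc}")
```

In this toy setup the running-shoes entry outscores the nightwear entry, because it matches "for" and repeats "women". The robe scores zero because it shares no words with the query, even though it is arguably nightwear. A reranker that reads the query and document together is meant to close exactly this kind of gap.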

Create the query pipeline WITH Jina Reranker, so we can see the results after reranking:

In [None]:
from haystack_integrations.components.rankers.jina import JinaRanker

ranker_retriever = InMemoryBM25Retriever(document_store=document_store)

ranker = JinaRanker()

ranker_pipeline = Pipeline()
ranker_pipeline.add_component("ranker_retriever", ranker_retriever)
ranker_pipeline.add_component("ranker", ranker)

ranker_pipeline.connect("ranker_retriever.documents", "ranker.documents")

Run the query pipeline WITH Jina Reranker:

In [None]:
result = ranker_pipeline.run(
    {
        "ranker_retriever": {"query": query, "top_k": 50},
        "ranker": {"query": query, "top_k": 10},
    }
)

for doc in result["ranker"]["documents"]:
    print("Product ID:", doc.meta["prod_id"])
    print("Product Image:", doc.meta["prod_image"])
    print("Score:", doc.score)
    print("-"*100)

![image.png](./images/reranker-retrieved-results.png)

*Compared to BM25 alone, Jina Reranker returns a much more relevant set of results. In our e-commerce setting, this can translate into a better user experience and a higher likelihood of purchases.*
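
Beyond eyeballing the two screenshots, one simple way to quantify the difference is to list how far each product moved between the two rankings. The sketch below is a hypothetical helper (`rank_changes` is not part of Haystack), and the product IDs are made up for illustration:

```python
def rank_changes(before_ids, after_ids):
    """Return (doc_id, old_rank, new_rank) per reranked item; ranks start at 1."""
    old_rank = {}
    for position, doc_id in enumerate(before_ids, start=1):
        old_rank.setdefault(doc_id, position)  # keep the first occurrence
    return [
        (doc_id, old_rank.get(doc_id), new_position)
        for new_position, doc_id in enumerate(after_ids, start=1)
    ]

# Hypothetical product IDs, for illustration only
bm25_ids = ["p7", "p3", "p9", "p1", "p5"]
reranked_ids = ["p1", "p9", "p7"]

changes = rank_changes(bm25_ids, reranked_ids)
for doc_id, old, new in changes:
    print(f"{doc_id}: BM25 rank {old} -> reranked rank {new}")
```

In the notebook, the ID lists could be built with `[doc.meta["prod_id"] for doc in result["retriever"]["documents"]]` and the equivalent for `result["ranker"]["documents"]`. Since `result` is reused for both runs, the BM25 list needs to be captured before the second pipeline is executed.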