{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Cohere Rerank\n"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙 and its Cohere rerank integration.\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["%pip install llama-index > /dev/null\n", "%pip install llama-index-postprocessor-cohere-rerank > /dev/null"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n", "from llama_index.core.response.pprint_utils import pprint_response"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["## Download Data\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["--2024-05-09 17:56:26-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8000::154, 2606:50c0:8002::154, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 75042 (73K) [text/plain]\n", "Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n", "\n", "data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.009s \n", "\n", "2024-05-09 17:56:26 (7.81 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n", "\n"]}], "source": ["!mkdir -p 'data/paul_graham/'\n", "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# load documents\n", "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n", "\n", "# build index\n", "index = VectorStoreIndex.from_documents(documents=documents)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Retrieve the top 10 most relevant nodes, then keep the best 2 with Cohere Rerank\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import os\n", "from llama_index.postprocessor.cohere_rerank import CohereRerank\n", "\n", "\n", "# read the Cohere API key from the environment\n", "api_key = os.environ[\"COHERE_API_KEY\"]\n", "# keep only the 2 highest-scoring nodes after reranking\n", "cohere_rerank = CohereRerank(api_key=api_key, top_n=2)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# retrieve 10 candidates by vector similarity, then let Cohere Rerank keep the top 2\n", "query_engine = index.as_query_engine(\n", " similarity_top_k=10,\n", " node_postprocessors=[cohere_rerank],\n", ")\n", "response = query_engine.query(\n", " \"What did Sam Altman do in this essay?\",\n", ")"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Final Response: Sam Altman was asked if he wanted to be the president\n", "of Y Combinator. Initially, he declined as he wanted to start a\n", "startup focused on making nuclear reactors. 
However, after persistent\n", "persuasion, he eventually agreed to become the president of Y\n", "Combinator starting with the winter 2014 batch.\n", "______________________________________________________________________\n", "Source Node 1/2\n", "Node ID: 7ecf4eb2-215d-45e4-ba08-44d9219c7fa6\n", "Similarity: 0.93033177\n", "Text: When I was dealing with some urgent problem during YC, there was\n", "about a 60% chance it had to do with HN, and a 40% chance it had do\n", "with everything else combined. [17] As well as HN, I wrote all of\n", "YC's internal software in Arc. But while I continued to work a good\n", "deal in Arc, I gradually stopped working on Arc, partly because I\n", "didn't have t...\n", "______________________________________________________________________\n", "Source Node 2/2\n", "Node ID: 88be17e9-e0a0-49e1-9ff8-f2b7aa7493ed\n", "Similarity: 0.86269903\n", "Text: Up till that point YC had been controlled by the original LLC we\n", "four had started. But we wanted YC to last for a long time, and to do\n", "that it couldn't be controlled by the founders. So if Sam said yes,\n", "we'd let him reorganize YC. Robert and I would retire, and Jessica and\n", "Trevor would become ordinary partners. 
When we asked Sam if he wanted\n", "to...\n"]}], "source": ["pprint_response(response, show_source=True)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Directly retrieve the top 2 most similar nodes (no reranking)\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["# plain vector retrieval: take the 2 nearest nodes, no postprocessing\n", "query_engine = index.as_query_engine(\n", " similarity_top_k=2,\n", ")\n", "response = query_engine.query(\n", " \"What did Sam Altman do in this essay?\",\n", ")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Without reranking, the context is chosen purely by embedding similarity, so it can be irrelevant and lead to a hallucinated response. In the run shown below, plain vector search happens to return the same two nodes that Cohere Rerank kept, so the answer is essentially unchanged. The `Similarity` values differ because here they are embedding similarities, while after reranking they are Cohere relevance scores.\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Final Response: Sam Altman was asked to become the president of Y\n", "Combinator, initially declined the offer to pursue starting a startup\n", "focused on nuclear reactors, but eventually agreed to take over\n", "starting with the winter 2014 batch.\n", "______________________________________________________________________\n", "Source Node 1/2\n", "Node ID: 7ecf4eb2-215d-45e4-ba08-44d9219c7fa6\n", "Similarity: 0.8308840369082053\n", "Text: When I was dealing with some urgent problem during YC, there was\n", "about a 60% chance it had to do with HN, and a 40% chance it had do\n", "with everything else combined. [17] As well as HN, I wrote all of\n", "YC's internal software in Arc. But while I continued to work a good\n", "deal in Arc, I gradually stopped working on Arc, partly because I\n", "didn't have t...\n", "______________________________________________________________________\n", "Source Node 2/2\n", "Node ID: 88be17e9-e0a0-49e1-9ff8-f2b7aa7493ed\n", "Similarity: 0.8230144027954406\n", "Text: Up till that point YC had been controlled by the original LLC we\n", "four had started. But we wanted YC to last for a long time, and to do\n", "that it couldn't be controlled by the founders. So if Sam said yes,\n", "we'd let him reorganize YC. 
Robert and I would retire, and Jessica and\n", "Trevor would become ordinary partners. When we asked Sam if he wanted\n", "to...\n"]}], "source": ["pprint_response(response, show_source=True)"]}], "metadata": {"kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3"}}, "nbformat": 4, "nbformat_minor": 4}