{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# LongContextReorder\n", "\n", "Models often struggle to use important details located in the middle of a long context. [A study](https://arxiv.org/abs/2307.03172) found that performance is typically best when the key information sits at the beginning or end of the input context. In addition, performance drops noticeably as the input context grows longer, even for models designed for long contexts.\n", "\n", "This module reorders the retrieved nodes, which can help in cases where a large top-k is needed.\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Setup\n"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["%pip install llama-index-embeddings-huggingface\n", "%pip install llama-index-llms-openai"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!pip install llama-index"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["import os\n", "import openai\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML\n", " warnings.warn(\"Can't initialize NVML\")\n"]}], "source": ["from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n", "from llama_index.llms.openai import OpenAI\n", "from llama_index.core import Settings\n", "\n", "Settings.llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0.1)\n", "Settings.embed_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-base-en-v1.5\")"]}, {"attachments": {}, "cell_type": "markdown", "metadata": {}, "source": ["Download Data\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["!mkdir -p 'data/paul_graham/'\n", "!wget 
'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.core import SimpleDirectoryReader\n", "\n", "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.core import VectorStoreIndex\n", "\n", "index = VectorStoreIndex.from_documents(documents)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Next, build two query engines over the same index, one with `LongContextReorder` as a node postprocessor and one without, and run the same query against both.\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["from llama_index.core.postprocessor import LongContextReorder\n", "\n", "reorder = LongContextReorder()\n", "\n", "reorder_engine = index.as_query_engine(\n", "    node_postprocessors=[reorder], similarity_top_k=5\n", ")\n", "base_engine = index.as_query_engine(similarity_top_k=5)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"data": {"text/markdown": ["**`Final Response:`** Yes, the author met Sam Altman when they asked him to be the president of Y Combinator. This was during the time when the author was in a PhD program in computer science and also pursuing their passion for art. They were applying to art schools and eventually ended up attending RISD."], "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["from llama_index.core.response.notebook_utils import display_response\n", "\n", "base_response = base_engine.query(\"Did the author meet Sam Altman?\")\n", "display_response(base_response)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"data": {"text/markdown": ["**`Final Response:`** Yes, the author met Sam Altman when they asked him to be the president of Y Combinator. 
This meeting occurred at a party at the author's house, where they were introduced by a mutual friend, Jessica Livingston. Jessica later went on to compile a book of interviews with startup founders, and the author shared their thoughts on the flaws of venture capital with her during her job search at a Boston VC firm."], "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["reorder_response = reorder_engine.query(\"Did the author meet Sam Altman?\")\n", "display_response(reorder_response)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Inspect Order Diffs\n", "\n", "Print the formatted source nodes for each response to compare their order. Both engines retrieve the same five nodes. The base engine lists them in retrieval order, while the reorder engine moves the higher-ranked nodes toward the beginning and end of the list and the lower-ranked nodes toward the middle.\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["> Source (Doc id: 81bc66bb-2c45-4697-9f08-9f848bd78b12): [17]\n", "\n", "As well as HN, I wrote all of YC's internal software in Arc. But while I continued to work ...\n", "\n", "> Source (Doc id: bd660905-e4e0-4d02-a113-e3810b59c5d1): [19] One way to get more precise about the concept of invented vs discovered is to talk about spa...\n", "\n", "> Source (Doc id: 3932e4a4-f17e-4dd2-9d25-5f0e65910dc5): Not so much because it was badly written as because the problem is so convoluted. When you're wor...\n", "\n", "> Source (Doc id: 0d801f0a-4a99-475d-aa7c-ad5d601947ea): [10]\n", "\n", "Wow, I thought, there's an audience. If I write something and put it on the web, anyone can...\n", "\n", "> Source (Doc id: bf726802-4d0d-4ee5-ab2e-ffa8a5461bc4): I was briefly tempted, but they were so slow by present standards; what was the point? 
No one els...\n"]}], "source": ["print(base_response.get_formatted_sources())"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["> Source (Doc id: 81bc66bb-2c45-4697-9f08-9f848bd78b12): [17]\n", "\n", "As well as HN, I wrote all of YC's internal software in Arc. But while I continued to work ...\n", "\n", "> Source (Doc id: 3932e4a4-f17e-4dd2-9d25-5f0e65910dc5): Not so much because it was badly written as because the problem is so convoluted. When you're wor...\n", "\n", "> Source (Doc id: bf726802-4d0d-4ee5-ab2e-ffa8a5461bc4): I was briefly tempted, but they were so slow by present standards; what was the point? No one els...\n", "\n", "> Source (Doc id: 0d801f0a-4a99-475d-aa7c-ad5d601947ea): [10]\n", "\n", "Wow, I thought, there's an audience. If I write something and put it on the web, anyone can...\n", "\n", "> Source (Doc id: bd660905-e4e0-4d02-a113-e3810b59c5d1): [19] One way to get more precise about the concept of invented vs discovered is to talk about spa...\n"]}], "source": ["print(reorder_response.get_formatted_sources())"]}], "metadata": {"kernelspec": {"display_name": "llama-index", "language": "python", "name": "llama-index"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3"}}, "nbformat": 4, "nbformat_minor": 2}