{"cells": [{"cell_type": "markdown", "id": "57c676db", "metadata": {}, "source": ["\"在\n"]}, {"cell_type": "markdown", "id": "c919f307-07b1-41bd-bc5d-51edd8677983", "metadata": {}, "source": ["# 从零开始构建数据摄入\n", "\n", "在本教程中,我们将向您展示如何构建一个数据摄入管道,将数据摄入到一个向量数据库中。\n", "\n", "我们将使用Pinecone作为向量数据库。\n", "\n", "我们将展示如何完成以下操作:\n", "1. 如何加载文档。\n", "2. 如何使用文本分割器来分割文档。\n", "3. 如何**手动**从每个文本块构建节点。\n", "4. [可选] 为每个节点添加元数据。\n", "5. 如何为每个文本块生成嵌入。\n", "6. 如何插入到向量数据库中。\n"]}, {"cell_type": "markdown", "id": "tsHaUeqRpflK", "metadata": {}, "source": ["## Pinecone\n", "\n", "在本教程中,您将需要一个[pinecone.io](https://www.pinecone.io/)的API密钥。您可以[免费注册](https://app.pinecone.io/?sessionType=signup)以获得Starter账户。\n", "\n", "如果您创建了Starter账户,可以随意为您的应用程序命名。\n", "\n", "一旦您拥有了账户,请转到Pinecone控制台中的“API密钥”。您可以使用默认密钥,也可以为本教程创建一个新的密钥。\n", "\n", "保存您的API密钥及其环境(免费账户为`gcp_starter`)。您将在下面需要它们。\n"]}, {"cell_type": "markdown", "id": "92b20306", "metadata": {}, "source": ["如果您在colab上打开这个笔记本,您可能需要安装LlamaIndex 🦙。\n"]}, {"cell_type": "code", "execution_count": null, "id": "7ae74e61", "metadata": {}, "outputs": [], "source": ["%pip install llama-index-embeddings-openai\n", "%pip install llama-index-vector-stores-pinecone\n", "%pip install llama-index-llms-openai"]}, {"cell_type": "code", "execution_count": null, "id": "b60e707a", "metadata": {}, "outputs": [], "source": ["!pip install llama-index"]}, {"cell_type": "markdown", "id": "22fb9e0a-566b-4f34-b9cf-72193cb51adb", "metadata": {}, "source": ["## OpenAI\n", "\n", "在本教程中,您将需要一个[OpenAI](https://openai.com/)的API密钥。登录到您的[platform.openai.com](https://platform.openai.com/)账户,点击右上角的个人资料图片,然后从菜单中选择“API密钥”。为本教程创建一个API密钥并保存好。您将在下面用到它。\n"]}, {"cell_type": "markdown", "id": "HPwWNeZwgwE8", "metadata": {}, "source": ["## 环境\n", "\n", "首先,我们添加我们的依赖项。\n"]}, {"cell_type": "code", "execution_count": null, "id": "CyTVgLfMgmIZ", "metadata": {}, "outputs": [], "source": ["!pip -q install python-dotenv pinecone-client llama-index pymupdf"]}, {"cell_type": "markdown", "id": "bCwZFn6_iAR1", 
"metadata": {}, "source": ["#### 设置环境变量\n", "\n", "我们为环境变量创建一个文件。请不要提交此文件或分享它!\n", "\n", "注意:Google Colab可以让你创建但不能打开一个 .env 文件。\n"]}, {"cell_type": "code", "execution_count": null, "id": "M1l2emfWgjgE", "metadata": {}, "outputs": [], "source": ["dotenv_path = (", " \"env\" # Google Colabs不允许你打开一个 .env 文件,但你可以设置", ")", "with open(dotenv_path, \"w\") as f:", " f.write('PINECONE_API_KEY=\"<你的api密钥>\"\\n')", " f.write('PINECONE_ENVIRONMENT=\"gcp-starter\"\\n')", " f.write('OPENAI_API_KEY=\"<你的api密钥>\"\\n')"]}, {"cell_type": "markdown", "id": "PWMbn7GooMm5", "metadata": {}, "source": ["请在我们创建的文件中设置您的OpenAI API密钥、Pinecone API密钥和环境。\n"]}, {"cell_type": "code", "execution_count": null, "id": "QOyfIoXAoVGX", "metadata": {}, "outputs": [], "source": ["import os\n", "from dotenv import load_dotenv"]}, {"cell_type": "code", "execution_count": null, "id": "leZkMBXYiTl-", "metadata": {}, "outputs": [], "source": ["load_dotenv(dotenv_path=dotenv_path)"]}, {"cell_type": "markdown", "id": "bcb486eb-c0b8-40e2-9038-da97aef63139", "metadata": {}, "source": ["## 设置\n", "\n", "我们构建一个空的Pinecone索引,并定义必要的LlamaIndex包装器/抽象,以便我们可以开始将数据加载到Pinecone中。\n", "\n", "注意:不要将API密钥保存在代码中,也不要将pinecone_env添加到您的存储库中!\n"]}, {"cell_type": "code", "execution_count": null, "id": "0Izxlt0XkMII", "metadata": {}, "outputs": [], "source": ["import pinecone"]}, {"cell_type": "code", "execution_count": null, "id": "cc739b4d-491f-406d-a0e6-f6b1e8c126dc", "metadata": {}, "outputs": [], "source": ["api_key = os.environ[\"PINECONE_API_KEY\"]\n", "environment = os.environ[\"PINECONE_ENVIRONMENT\"]\n", "pinecone.init(api_key=api_key, environment=environment)"]}, {"cell_type": "code", "execution_count": null, "id": "Whwu7HqqswIq", "metadata": {}, "outputs": [], "source": ["index_name = \"llamaindex-rag-fs\""]}, {"cell_type": "code", "execution_count": null, "id": "yRKkO4g1sBMl", "metadata": {}, "outputs": [], "source": ["# [可选] 在重新运行教程之前删除索引。", "# pinecone.delete_index(index_name)"]}, {"cell_type": "code", 
"execution_count": null, "id": "20ba2f76-29d8-4dc5-b25c-64dcfe9e8d23", "metadata": {}, "outputs": [], "source": ["# dimensions are for text-embedding-ada-002", "pinecone.create_index(", " index_name, dimension=1536, metric=\"euclidean\", pod_type=\"p1\"", ")"]}, {"cell_type": "code", "execution_count": null, "id": "45f9a999-dac2-4bc8-8133-ccc851b76a6d", "metadata": {}, "outputs": [], "source": ["pinecone_index = pinecone.Index(index_name)"]}, {"cell_type": "code", "execution_count": null, "id": "3216f9e2-946d-4b43-8b8c-acf6788633a5", "metadata": {}, "outputs": [], "source": ["# [可选] 删除索引中的内容 - 在免费账户上无法使用", "pinecone_index.delete(deleteAll=True)"]}, {"cell_type": "markdown", "id": "89246384-983c-4e2c-ac05-ffdc1d54a594", "metadata": {}, "source": ["#### 创建PineconeVectorStore\n", "\n", "简单的包装抽象,用于在LlamaIndex中使用。包装在StorageContext中,以便我们可以轻松地加载节点。\n"]}, {"cell_type": "code", "execution_count": null, "id": "775aabb2-3dd2-44b1-b6b9-2f7326409e10", "metadata": {}, "outputs": [], "source": ["from llama_index.vector_stores.pinecone import PineconeVectorStore"]}, {"cell_type": "code", "execution_count": null, "id": "15f0aa46-9f5b-42c1-9374-db94781363f0", "metadata": {}, "outputs": [], "source": ["vector_store = PineconeVectorStore(pinecone_index=pinecone_index)"]}, {"cell_type": "markdown", "id": "be9437a0-3d52-4586-8217-43944971a2cc", "metadata": {}, "source": ["## 从零开始构建数据摄取管道\n", "\n", "我们将展示如何构建一个数据摄取管道,就像在介绍中提到的那样。\n", "\n", "请注意,步骤(2)和(3)可以通过我们的`NodeParser`抽象来处理,它可以处理拆分和节点创建。\n", "\n", "在本教程中,我们将向您展示如何手动创建这些对象。\n"]}, {"cell_type": "markdown", "id": "6d1c9630-2a6b-4656-b272-de1b869c8977", "metadata": {}, "source": ["### 1. 加载数据\n"]}, {"cell_type": "code", "execution_count": null, "id": "48739cfc-c05a-420a-8c78-280892f8d7a0", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["--2023-10-13 01:45:14-- https://arxiv.org/pdf/2307.09288.pdf\n", "Resolving arxiv.org (arxiv.org)... 
128.84.21.199\n", "Connecting to arxiv.org (arxiv.org)|128.84.21.199|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 13661300 (13M) [application/pdf]\n", "Saving to: ‘data/llama2.pdf’\n", "\n", "data/llama2.pdf 100%[===================>] 13.03M 7.59MB/s in 1.7s \n", "\n", "2023-10-13 01:45:16 (7.59 MB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]\n"]}], "source": ["!mkdir data\n", "!wget --user-agent \"Mozilla\" \"https://arxiv.org/pdf/2307.09288.pdf\" -O \"data/llama2.pdf\""]}, {"cell_type": "code", "execution_count": null, "id": "079666c5-0685-413d-a765-17f71ae89506", "metadata": {}, "outputs": [], "source": ["import fitz"]}, {"cell_type": "code", "execution_count": null, "id": "4eee7692-2188-4552-9f2e-cb90ac6b7678", "metadata": {}, "outputs": [], "source": ["file_path = \"./data/llama2.pdf\"\n", "doc = fitz.open(file_path)"]}, {"cell_type": "markdown", "id": "74c573db-1863-45c3-9049-8bad535e6e35", "metadata": {}, "source": ["### 2. Use a Text Splitter to Split Documents\n", "\n", "Here we import our `SentenceSplitter` to split document text into smaller chunks, while preserving paragraphs/sentences as much as possible.\n"]}, {"cell_type": "code", "execution_count": null, "id": "9e175007-84d5-406e-bf5f-6ecacfbfd152", "metadata": {}, "outputs": [], "source": ["from llama_index.core.node_parser import SentenceSplitter"]}, {"cell_type": "code", "execution_count": null, "id": "0dbccb26-ea2a-48c9-adb4-1ebe88adaa1c", "metadata": {}, "outputs": [], "source": ["text_parser = SentenceSplitter(\n", "    chunk_size=1024,\n", "    # separator=\" \",\n", ")"]}, {"cell_type": "code", "execution_count": null, "id": "7a9bed96-adfa-40c9-92bd-9dba68d58730", "metadata": {}, "outputs": [], "source": ["text_chunks = []\n", "# maintain the relationship with the source doc index, to help inject doc metadata in (3)\n", "doc_idxs = []\n", "for doc_idx, page in enumerate(doc):\n", "    page_text = page.get_text(\"text\")\n", "    cur_text_chunks = text_parser.split_text(page_text)\n", "    text_chunks.extend(cur_text_chunks)\n", "    doc_idxs.extend([doc_idx] * len(cur_text_chunks))"]}, {"cell_type": "markdown", "id": "354157d6-b436-4f0a-bf6e-f0a197e54c60", 
"metadata": {}, "source": ["### 3. 从文本块手动构建节点\n", "\n", "我们将每个文本块转换为一个`TextNode`对象,这是LlamaIndex中的一个低级数据抽象,它存储内容,同时也允许定义元数据和与其他节点的关系。\n", "\n", "我们将文档中的元数据注入到每个节点中。\n", "\n", "这本质上复制了我们的`SentenceSplitter`中的逻辑。\n"]}, {"cell_type": "code", "execution_count": null, "id": "33b93044-3eb4-4c77-bc40-be53dffd3749", "metadata": {}, "outputs": [], "source": ["from llama_index.core.schema import TextNode"]}, {"cell_type": "code", "execution_count": null, "id": "adbfcb3f-5554-4594-ae80-7236e28485aa", "metadata": {}, "outputs": [], "source": ["nodes = []\n", "for idx, text_chunk in enumerate(text_chunks):\n", " node = TextNode(\n", " text=text_chunk,\n", " )\n", " src_doc_idx = doc_idxs[idx]\n", " src_page = doc[src_doc_idx]\n", " nodes.append(node)"]}, {"cell_type": "code", "execution_count": null, "id": "3iidPVIiwYUg", "metadata": {}, "outputs": [], "source": ["print(nodes[0].metadata)"]}, {"cell_type": "code", "execution_count": null, "id": "257bb2e3-608a-4542-ba29-f29b59771a3f", "metadata": {}, "outputs": [], "source": ["# 打印一个示例节点", "print(nodes[0].get_content(metadata_mode=\"all\"))"]}, {"cell_type": "markdown", "id": "fac468f5-870c-4486-b576-2aa6d9eaf322", "metadata": {}, "source": ["### [可选] 4. 
Extract Metadata from Each Node\n", "\n", "We extract metadata from each node using our metadata extractors.\n", "\n", "This adds more metadata to each node.\n"]}, {"cell_type": "code", "execution_count": null, "id": "9c369188-9cc9-4550-924e-b29d212ad057", "metadata": {}, "outputs": [], "source": ["from llama_index.core.extractors import (\n", "    QuestionsAnsweredExtractor,\n", "    TitleExtractor,\n", ")\n", "from llama_index.core.ingestion import IngestionPipeline\n", "from llama_index.llms.openai import OpenAI\n", "\n", "llm = OpenAI(model=\"gpt-3.5-turbo\")\n", "\n", "extractors = [\n", "    TitleExtractor(nodes=5, llm=llm),\n", "    QuestionsAnsweredExtractor(questions=3, llm=llm),\n", "]"]}, {"cell_type": "code", "execution_count": null, "id": "f5501ffc-9bbb-4b48-9181-4e4e371e8f41", "metadata": {}, "outputs": [], "source": ["pipeline = IngestionPipeline(\n", "    transformations=extractors,\n", ")\n", "nodes = await pipeline.arun(nodes=nodes, in_place=False)"]}, {"cell_type": "code", "execution_count": null, "id": "WgbKmXr3ytPf", "metadata": {}, "outputs": [], "source": ["print(nodes[0].metadata)"]}, {"cell_type": "markdown", "id": "9d52522c-ffee-49d1-8651-658d70248053", "metadata": {}, "source": ["### 5. Generate Embeddings for Each Node\n", "\n", "Generate document embeddings for each node using our OpenAI embedding model (`text-embedding-ada-002`).\n", "\n", "Store these in the `embedding` property of each node.\n"]}, {"cell_type": "code", "execution_count": null, "id": "6e071e36-e609-4a0c-a478-e8cfbe751cff", "metadata": {}, "outputs": [], "source": ["from llama_index.embeddings.openai import OpenAIEmbedding\n", "\n", "embed_model = OpenAIEmbedding()"]}, {"cell_type": "code", "execution_count": null, "id": "2047ca46-729f-4c5a-a8d7-3bc860604333", "metadata": {}, "outputs": [], "source": ["for node in nodes:\n", "    node_embedding = embed_model.get_text_embedding(\n", "        node.get_content(metadata_mode=\"all\")\n", "    )\n", "    node.embedding = node_embedding"]}, {"cell_type": "markdown", "id": "11f78014-dcca-43f5-95cb-9cfb5140b30e", "metadata": {}, "source": ["### 6. 
Load Nodes into a Vector Store\n", "\n", "We now insert these nodes into our `PineconeVectorStore`.\n", "\n", "**NOTE**: We skip the `VectorStoreIndex` abstraction, which is a higher-level abstraction that handles ingestion as well. We use `VectorStoreIndex` in the next section for fast retrieval/querying.\n"]}, {"cell_type": "code", "execution_count": null, "id": "fe34fbe4-3396-402e-8599-4b42013c3016", "metadata": {}, "outputs": [], "source": ["vector_store.add(nodes)"]}, {"cell_type": "markdown", "id": "74e7f8bd-d92a-40ad-8d9b-a18c04ddfca9", "metadata": {}, "source": ["## Retrieve and Query from the Vector Store\n", "\n", "Now that our ingestion is complete, we can retrieve/query this vector store.\n", "\n", "**NOTE**: We can use our high-level `VectorStoreIndex` abstraction here. See the next section for how to define retrieval at a lower level!\n"]}, {"cell_type": "code", "execution_count": null, "id": "be6a4fe1-2665-43e6-a872-8e631e31b0fd", "metadata": {}, "outputs": [], "source": ["from llama_index.core import VectorStoreIndex\n", "from llama_index.core import StorageContext"]}, {"cell_type": "code", "execution_count": null, "id": "0e9d2495-d4f7-469a-9cea-a5cfc401c085", "metadata": {}, "outputs": [], "source": ["index = VectorStoreIndex.from_vector_store(vector_store)"]}, {"cell_type": "code", "execution_count": null, "id": "5c89e1c1-8ed1-45f5-b2a4-7c3382195693", "metadata": {}, "outputs": [], "source": ["query_engine = index.as_query_engine()"]}, {"cell_type": "code", "execution_count": null, "id": "cd6e0ddb-97c9-4f42-8843-a36a29ba3f17", "metadata": {}, "outputs": [], "source": ["query_str = \"Can you tell me about the key concepts for safety finetuning\""]}, {"cell_type": "code", "execution_count": null, "id": "950cae37-7bad-44a3-be51-4154a8630818", "metadata": {}, "outputs": [], "source": ["response = query_engine.query(query_str)"]}, {"cell_type": "code", "execution_count": null, "id": "c0b309bb-ca5a-4b15-948c-687038361c91", "metadata": {}, "outputs": [], "source": ["print(str(response))"]}], "metadata": {"colab": {"provenance": []}, "kernelspec": {"display_name": "llama_index_v2", "language": "python", "name": "llama_index_v2"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3"}, "widgets": {"application/vnd.jupyter.widget-state+json": {"13ea0b1773c7430faed91de388daa080": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_9655aa27d26b49b9860a5b50df1b3e52", "IPY_MODEL_c20cfcc9886f48d8966d6ad88e8cdf94", "IPY_MODEL_82678fb77f06450aa9c58e8512d177ca"], "layout": "IPY_MODEL_b8ec7cb22b624b8b907145288f031923"}}, "1c303124d4ad4b98a8681f8fddaae68e": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": ""}}, "216dc1bb2962444ab4dcc779ea467578": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": ""}}, "4df194dded4b499c99ece616b8b571b9": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": 
"1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "55ade5e7e7344f25b1d70d79a789161c": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": ""}}, "74a8d85c98ea4711acd8bc78131f829a": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, 
"height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "82678fb77f06450aa9c58e8512d177ca": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_836932cd23bd4b389f3ddf8827a16f05", "placeholder": "​", "style": "IPY_MODEL_1c303124d4ad4b98a8681f8fddaae68e", "value": " 110/110 [00:29<00:00, 47.09it/s]"}}, "836932cd23bd4b389f3ddf8827a16f05": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, 
"overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "9655aa27d26b49b9860a5b50df1b3e52": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_74a8d85c98ea4711acd8bc78131f829a", "placeholder": "​", "style": "IPY_MODEL_55ade5e7e7344f25b1d70d79a789161c", "value": "Upserted vectors: 100%"}}, "a1bcba43200e46d3bc6715aecae6fa96": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": ""}}, "a69d05e9e5dc442e8f29481cd5c214d6": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, 
"justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "a89738f109c84de3990e2e118f8c7fd2": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_de7390260998489c913a017242feb7d5", "placeholder": "​", "style": "IPY_MODEL_a1bcba43200e46d3bc6715aecae6fa96", "value": " 110/110 [04:22<00:00, 1.90s/it]"}}, "af57b5b9a37441928e467276241c4862": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "b2e447259bbb435f946e16798c6bca18": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_a69d05e9e5dc442e8f29481cd5c214d6", "placeholder": "​", "style": "IPY_MODEL_216dc1bb2962444ab4dcc779ea467578", 
"value": "Extracting questions: 100%"}}, "b8ec7cb22b624b8b907145288f031923": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "c20cfcc9886f48d8966d6ad88e8cdf94": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_4df194dded4b499c99ece616b8b571b9", "max": 110, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_f699980e5c9e497cb5a0d5b902f89e7c", "value": 110}}, "ccdd303ef2064d8f82c9fe97e75f1e82": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", 
"state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": ["IPY_MODEL_b2e447259bbb435f946e16798c6bca18", "IPY_MODEL_e192ec5526964cb4ad1a64434128acce", "IPY_MODEL_a89738f109c84de3990e2e118f8c7fd2"], "layout": "IPY_MODEL_f999cd2695844fdb99ded5eedc24b35c"}}, "cefd37a71dca4347b7905e5c547c248a": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "de7390260998489c913a017242feb7d5": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": 
"LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}, "e192ec5526964cb4ad1a64434128acce": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": {"_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_cefd37a71dca4347b7905e5c547c248a", "max": 110, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_af57b5b9a37441928e467276241c4862", "value": 110}}, "f699980e5c9e497cb5a0d5b902f89e7c": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": {"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": ""}}, "f999cd2695844fdb99ded5eedc24b35c": {"model_module": "@jupyter-widgets/base", "model_module_version": 
"1.2.0", "model_name": "LayoutModel", "state": {"_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null}}}}}, "nbformat": 4, "nbformat_minor": 5}