{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "IJUUYSe3-GFl"
},
"source": [
"RAG with a document index and an SLM\n",
"====================================\n",
"\n",
"In the previous example we saw the core of RAG: embeddings, similarity, and context. However, we used simple text documents and generated the embeddings manually, storing them in an in-memory list. That technique cannot scale to large document repositories such as those found in an organization.\n",
"\n",
"In this example, we take a step toward a working architecture closer to what we would find in an organization: documents with metadata, chunking with overlap, a vector index, and a generative model that answers using the retrieved fragments."
],
"id": "IJUUYSe3-GFl"
},
{
"cell_type": "markdown",
"metadata": {
"id": "ur7oyqPv-GFu"
},
"source": [
"## Preparing the environment\n",
"\n",
"Let's install the required libraries from the associated requirements file. In Google Colab you may need to restart the session after installation if the environment asks for it."
],
"id": "ur7oyqPv-GFu"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1yEudRNg-GFv",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "e6dddfc7-18a9-45d6-ac7a-2b87a748ea65"
},
"outputs": [
],
"source": [
"!wget -q https://raw.githubusercontent.com/santiagxf/M72109/master/docs/document-understanding/rag-index-slm.txt\n",
"%pip install -r rag-index-slm.txt --quiet"
],
"id": "1yEudRNg-GFv"
},
{
"cell_type": "markdown",
"source": [
"### About LlamaIndex\n",
"\n",
"LlamaIndex is an open-source project designed to connect large language models (LLMs) with your own data sources. It lets developers build AI applications that can search, summarize, and reason over private or enterprise documents, going beyond the public knowledge the models were trained on.\n",
"\n",
"In this notebook we use this library to build a RAG solution with little code. The library is not required; we use it only because it simplifies applying the technique."
],
"metadata": {
"id": "CFrWnQ_UHFzV"
},
"id": "CFrWnQ_UHFzV"
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z-Kk26rr-GFz"
},
"source": [
"### Sample documents\n",
"\n",
"In an organization, documents rarely live as a list of strings inside the code. They usually arrive as files in a shared folder, a data lake, an object bucket, or a document repository. To approximate that pattern, we will download some Spanish-language documents to the file system and then load them with a LlamaIndex document reader.\n",
"\n",
"We will use public pages from the course as a stand-in for internal documents. The teaching point is the same: keep document ingestion separate from the index and preserve provenance metadata."
],
"id": "Z-Kk26rr-GFz"
},
{
"cell_type": "code",
"source": [
"!mkdir -p repositorio_documental"
],
"metadata": {
"id": "vogUj47PHmUW"
},
"id": "vogUj47PHmUW",
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LU_nyi6n-GF0"
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"from urllib.request import urlretrieve\n",
"\n",
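"# Local folder that plays the role of the document repository.\n",
"repo_documental = Path(\"repositorio_documental\")\n",
"repo_documental.mkdir(exist_ok=True)\n",
"\n",
"# Metadata to attach to each file, keyed by file name.\n",
"documentos_fuente = {\n",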
" \"lg-user-manual-23476.pdf\": {\n",
" \"fuente\": \"lg-user-manual-23476.pdf\",\n",
" \"marca\": \"lg\",\n",
" },\n",
" \"samsumg-user-manual-15223.pdf\": {\n",
" \"fuente\": \"samsumg-user-manual-15223.pdf\",\n",
" \"marca\": \"samsung\",\n",
" }\n",
"}"
],
"id": "LU_nyi6n-GF0"
},
{
"cell_type": "markdown",
"metadata": {
"id": "aUoTCJKt-GF3"
},
"source": [
"## Chunking with metadata\n",
"\n",
"In long documents, a single embedding can mix many ideas. That is why we split each document into smaller chunks with overlap. The overlap helps avoid cutting an explanation right at the boundary between two fragments.\n",
"\n",
"In this notebook we use a sentence-based chunking strategy. Broadly, it tries to respect natural boundaries in the text, such as sentences or paragraphs, and then groups content until it approaches the target size. In addition, we repeat a small portion of the previous fragment in the next one to preserve local context across fragments. This reduces the chance that a definition, a warning, or an instruction ends up separated from the sentence that explains it.\n",
"\n",
"Other strategies exist:\n",
"\n",
"1. A simple option is to cut by a fixed number of characters or tokens; it is fast and predictable, but it can break important sentences.\n",
"2. We can cut by document structure, for example titles, sections, pages, tables, or headings; this tends to produce more interpretable chunks, although it requires the document to have reliable structure.\n",
"3. There are also semantic strategies, where sentences are grouped by similarity or topic shifts; they can improve retrieval quality, but they add computational cost and more parameters to tune.\n",
"\n",
"The main trade-off is between granularity and context.\n",
"\n",
"- Very small chunks can retrieve very precise evidence, but they may not contain enough information to answer.\n",
"- Very large chunks preserve more context, but they can dilute the semantic signal of the embedding and take up more space in the model's context window.\n",
"\n",
"Overlap helps, although it also increases the number of chunks, the embedding cost, and the chance of retrieving repeated fragments."
],
"id": "aUoTCJKt-GF3"
},
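{
"cell_type": "markdown",
"metadata": {
"id": "sketch-chunk-overlap"
},
"source": [
"The effect of overlap is easiest to see on a toy case. The sketch below is a deliberately simplified word-based splitter, not the `SentenceSplitter` used later in this notebook; the name `chunk_with_overlap` and its parameters are illustrative assumptions.\n",
"\n",
"```python\n",
"def chunk_with_overlap(words, chunk_size, overlap):\n",
"    # Slide a window of chunk_size words, stepping by chunk_size - overlap.\n",
"    if not 0 <= overlap < chunk_size:\n",
"        raise ValueError('overlap must be in [0, chunk_size)')\n",
"    step = chunk_size - overlap\n",
"    chunks = []\n",
"    for start in range(0, len(words), step):\n",
"        chunks.append(words[start:start + chunk_size])\n",
"        if start + chunk_size >= len(words):\n",
"            break  # the last window already reaches the end of the text\n",
"    return chunks\n",
"\n",
"\n",
"toy_words = 'a b c d e f g h'.split()\n",
"for chunk in chunk_with_overlap(toy_words, chunk_size=4, overlap=1):\n",
"    print(chunk)\n",
"```\n",
"\n",
"With `chunk_size=4` and `overlap=1`, each chunk starts with the last word of the previous one. This is a simplified version of the idea; a sentence-aware splitter also tries not to cut in the middle of a sentence."
],
"id": "sketch-chunk-overlap"
},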
{
"cell_type": "markdown",
"source": [
"### The role of metadata\n",
"\n",
"Metadata is as important as the text. In this example we keep `fuente` (source) and `marca` (brand), but in a real scenario we could store page, section, last-updated date, access permissions, document type, or owning team. This metadata makes it possible to explain where an answer came from, apply filters before retrieval, audit results, and prevent a user from receiving information from documents they should not access."
],
"metadata": {
"id": "Sgq-5oqOItzH"
},
"id": "Sgq-5oqOItzH"
},
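{
"cell_type": "markdown",
"metadata": {
"id": "sketch-metadata-filter"
},
"source": [
"Filtering by metadata before retrieval can be sketched with plain dictionaries. This is not a LlamaIndex API: the helper `filter_by_metadata` and the chunk records are hypothetical, and only the `fuente` and `marca` keys come from this notebook.\n",
"\n",
"```python\n",
"def filter_by_metadata(chunks, **required):\n",
"    # Keep only the chunks whose metadata matches every required key/value.\n",
"    return [\n",
"        chunk for chunk in chunks\n",
"        if all(chunk['metadata'].get(key) == value for key, value in required.items())\n",
"    ]\n",
"\n",
"\n",
"# Hypothetical chunk records; the texts are placeholders.\n",
"toy_chunks = [\n",
"    {'text': 'placeholder text A', 'metadata': {'fuente': 'lg-user-manual-23476.pdf', 'marca': 'lg'}},\n",
"    {'text': 'placeholder text B', 'metadata': {'fuente': 'samsumg-user-manual-15223.pdf', 'marca': 'samsung'}},\n",
"]\n",
"\n",
"candidates = filter_by_metadata(toy_chunks, marca='lg')\n",
"print([c['metadata']['fuente'] for c in candidates])\n",
"```\n",
"\n",
"Applying the filter first means the similarity search only ever sees chunks the user is allowed, or wants, to consult."
],
"id": "sketch-metadata-filter"
},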
{
"cell_type": "markdown",
"source": [
"### Process\n",
"\n",
"We will use LlamaIndex to load all the content in a directory and then create a vector index.\n",
"\n",
"A few things to note:\n",
"\n",
"- We use different *Readers* to read different file extensions.\n",
"- We can filter by specific extensions.\n",
"- We can attach whatever specific metadata we need."
],
"metadata": {
"id": "PV5lYijIIP3u"
},
"id": "PV5lYijIIP3u"
},
{
"cell_type": "code",
"source": [
"from llama_index.core import SimpleDirectoryReader\n",
"from llama_index.core.node_parser import SentenceSplitter\n",
"from llama_index.readers.file import PDFReader\n",
"\n",
"def metadata_documento(file_path):\n",
"    # Look up the metadata by file name; unknown files get a default entry.\n",
"    archivo = Path(file_path).name\n",
"    return documentos_fuente.get(archivo, {\"fuente\": archivo, \"marca\": \"sin clasificar\"})\n",
"\n",
"lector = SimpleDirectoryReader(\n",
" input_dir=str(repo_documental),\n",
" required_exts=[\".txt\", \".pdf\"],\n",
" file_extractor={\".pdf\": PDFReader()},\n",
" file_metadata=metadata_documento,\n",
")\n",
"documentos_llama = lector.load_data()"
],
"metadata": {
"id": "IkCsjpHBmtH1"
},
"id": "IkCsjpHBmtH1",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We use a `SentenceSplitter` to perform the split."
],
"metadata": {
"id": "1IMrW3wEIwj9"
},
"id": "1IMrW3wEIwj9"
},
{
"cell_type": "code",
"source": [
"# chunk_size and chunk_overlap are measured in tokens, not characters.\n",
"splitter = SentenceSplitter(chunk_size=256, chunk_overlap=30)"
],
"metadata": {
"id": "UMaEn_-Jmv0r"
},
"id": "UMaEn_-Jmv0r",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Finally, LlamaIndex works with the notion of nodes, which are *chunks* with metadata."
],
"metadata": {
"id": "vchWtJp8I2Ny"
},
"id": "vchWtJp8I2Ny"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bF7EaOX8-GF3",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "378e599e-9252-4cb1-ad79-6c4c85f2378c"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:pypdf._reader:invalid pdf header: b'\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"                     fuente marca                                              texto\n",
"0  lg-user-manual-23476.pdf    lg  Antes de utilizar la unidad, lea detenidamente...\n",
"1  lg-user-manual-23476.pdf    lg  1 Guía de inicio\\nGuía de inicio2\\nGuía de ini...\n",
"2  lg-user-manual-23476.pdf    lg  El signo de exclamación dentro de \\nun triángu...\n",
"3  lg-user-manual-23476.pdf    lg  Los orificios no deben obstruirse. El producto...\n",
"4  lg-user-manual-23476.pdf    lg  Para \\nevitar la exposición directa al rayo lá..."
]
},
"metadata": {},
"execution_count": 3
}
],
"source": [
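"import pandas as pd\n",
"\n",
"# Assumed step: `nodos` is not defined elsewhere in this notebook, so we build it\n",
"# here by splitting the loaded documents into nodes (chunks with metadata).\n",
"nodos = splitter.get_nodes_from_documents(documentos_llama)\n",
"\n",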
"chunks = pd.DataFrame(\n",
" [\n",
" {\n",
" \"fuente\": nodo.metadata[\"fuente\"],\n",
" \"marca\": nodo.metadata[\"marca\"],\n",
" \"texto\": nodo.get_content(metadata_mode=\"none\"),\n",
" }\n",
" for nodo in nodos\n",
" ]\n",
")\n",
"\n",
"chunks.head()"
],
"id": "e8X9d5Eu-GF4"
},
{
"cell_type": "markdown",
"metadata": {
"id": "OCUU8vd0-GF5"
},
"source": [
"## Building the vector index\n",
"\n",
"Now we configure a multilingual embedding model and build a vector index.\n",
"\n",
"A vector index stores numerical representations of the chunks so we can search by semantic similarity. Instead of asking whether two texts share exactly the same words, we ask whether their embeddings point to a similar region of the vector space. This is especially useful with real documents, where the user may type \"modo suspensión\" (sleep mode) while the manual talks about \"suspender el sistema\" (suspending the system) or \"estado de bajo consumo\" (low-power state).\n",
"\n",
"Here the embeddings are generated with `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`, a lightweight multilingual model. Each chunk is turned into a dense vector of real numbers; LlamaIndex keeps that vector associated with the original text and its metadata. When a query arrives later, it is also converted into an embedding and compared against the stored vectors to select the most similar chunks."
],
"id": "OCUU8vd0-GF5"
},
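{
"cell_type": "markdown",
"metadata": {
"id": "sketch-cosine-top-k"
},
"source": [
"The comparison step can be sketched with cosine similarity over toy vectors. This is a brute-force illustration in pure Python that assumes cosine similarity as the metric; the three-dimensional vectors and the `top_k` helper are made up, and real embeddings have far more dimensions.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def cosine(u, v):\n",
"    # Cosine similarity: dot product divided by the product of the norms.\n",
"    dot = sum(a * b for a, b in zip(u, v))\n",
"    norm_u = math.sqrt(sum(a * a for a in u))\n",
"    norm_v = math.sqrt(sum(b * b for b in v))\n",
"    return dot / (norm_u * norm_v)\n",
"\n",
"\n",
"def top_k(query_vec, stored, k):\n",
"    # stored is a list of (chunk_id, vector); return the k most similar with scores.\n",
"    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in stored]\n",
"    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]\n",
"\n",
"\n",
"# Hypothetical stored vectors standing in for chunk embeddings.\n",
"stored = [\n",
"    ('chunk-a', [1.0, 0.0, 0.0]),\n",
"    ('chunk-b', [0.7, 0.7, 0.0]),\n",
"    ('chunk-c', [0.0, 0.0, 1.0]),\n",
"]\n",
"print(top_k([1.0, 0.1, 0.0], stored, k=2))\n",
"```\n",
"\n",
"A vector index does the same job conceptually, but avoids scanning every stored vector when the collection is large."
],
"id": "sketch-cosine-top-k"
},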
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Rg26vStJ-GF5",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f7805e63-efed-4406-e054-5928db5e4a3b"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n",
"The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
"To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
"You will be able to reuse this secret in all of your notebooks.\n",
"Please note that authentication is recommended but still optional to access public models or datasets.\n",
" warnings.warn(\n"
]
}
],
"source": [
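"import torch\n",
"from llama_index.core import Settings, VectorStoreIndex\n",
"from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
"\n",
"# Use the GPU when one is available; otherwise fall back to the CPU.\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"\n",
"Settings.embed_model = HuggingFaceEmbedding(\n",
"    model_name=\"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2\",\n",
"    device=device,\n",
")"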
],
"id": "Rg26vStJ-GF5"
},
{
"cell_type": "markdown",
"source": [
"In this notebook the index lives in memory to keep the example lightweight. This means the vectors and the node references exist only for the duration of the Python session; if we restart the runtime, we must rebuild the index. For learning purposes this is convenient because it requires no external services, credentials, or extra configuration.\n",
"\n",
"In an enterprise deployment, the same pattern is usually backed by a vector database or a hybrid search engine for persistence, scalability, and operational control. Such platforms add capabilities that an in-memory index does not provide on its own: persistent storage, efficient approximate indexes, metadata filters, access control, replication, monitoring, versioning, and incremental updates when documents change. The trade-off is that they increase operational complexity and force us to think about security, cost, latency, and data governance.\n",
"\n",
"Note that the concept does not change: we still store triples of the form `(embedding, chunk, metadata)`. What changes is where those triples live and what guarantees we need around them."
],
"metadata": {
"id": "ikt_zORr-Bmv"
},
"id": "ikt_zORr-Bmv"
},
{
"cell_type": "code",
"source": [
"indice = VectorStoreIndex(nodos, show_progress=True)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49,
"referenced_widgets": [
"82b6eee895954ae48bac1dd8cecb6081",
"a79634cd62af44d3a6c7937f98c2b7a0",
"dbfa2e10097742d491b28ac6ff0d9032",
"fcd4418802214618b9ba7b87ce598a18",
"145528e6f5bb4eb09263a5ad7cf9c571",
"52366a488bf549ed906838d77322ae26",
"c99285c2318a41838c42a868026130cb",
"7777efb361f94d65b8b5f9e86e0eb4f6",
"7a70d65508a94865909b65cf87ee83c6",
"56ca23ad92764f8fa370a458893ba522",
"18bd20084a3045a7b772491c68038fba"
]
},
"id": "oCB0oNi695sf",
"outputId": "587e7625-acb7-46fe-8215-071f1aa44775"
},
"id": "oCB0oNi695sf",
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"Generating embeddings: 0%| | 0/325 [00:00<?, ?it/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "82b6eee895954ae48bac1dd8cecb6081"
}
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"retriever = indice.as_retriever(similarity_top_k=3)"
],
"metadata": {
"id": "H3SajDx_m5ZH"
},
"id": "H3SajDx_m5ZH",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "EgTxUSqH-GF5"
},
"source": [
"## Retrieving evidence\n",
"\n",
"Before asking the model to write an answer, let's inspect what the index retrieves. This step is key: if retrieval brings back poor evidence, the generator has very little chance of answering well.\n",
"\n",
"This idea is directly tied to two important concepts in enterprise applications: *hallucinations* and *groundedness*. A hallucination occurs when the model produces a plausible answer that is not supported by the available information. Groundedness, in contrast, measures how far the answer is supported by the retrieved evidence. It is not enough for the answer to sound right; we need to be able to point to the fragments that justify each relevant claim.\n",
"\n",
"Why is measuring it so important? Because many business applications cannot tolerate creative answers about internal policies, technical manuals, contracts, or procedures. A grounded answer makes it possible to audit the system, show citations to the user, detect missing documents, and decide when it is better to reply \"I do not have enough evidence\". In general, a groundedness metric compares the final answer with the chunks used as context and penalizes claims that do not appear in, or cannot reasonably be inferred from, that evidence."
],
"id": "EgTxUSqH-GF5"
},
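{
"cell_type": "markdown",
"metadata": {
"id": "sketch-naive-groundedness"
},
"source": [
"As a rough illustration of the idea, the sketch below scores an answer by the fraction of its sentences whose words mostly appear in the retrieved context. It is a naive lexical proxy, not an established groundedness metric: the function name, the 0.6 threshold, and the example answer (including its unsupported second sentence) are assumptions made for illustration.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"\n",
"def tokens(text):\n",
"    # Lowercased word tokens, accented letters included.\n",
"    return set(re.findall(r'\\w+', text.lower()))\n",
"\n",
"\n",
"def naive_groundedness(answer, context_chunks, threshold=0.6):\n",
"    # Fraction of answer sentences whose tokens are mostly found in the context.\n",
"    context_tokens = set()\n",
"    for chunk in context_chunks:\n",
"        context_tokens |= tokens(chunk)\n",
"    sentences = [s for s in re.split(r'(?<=[.!?])\\s+', answer.strip()) if s]\n",
"    if not sentences:\n",
"        return 0.0\n",
"    supported = 0\n",
"    for sentence in sentences:\n",
"        words = tokens(sentence)\n",
"        overlap = len(words & context_tokens) / len(words) if words else 0.0\n",
"        if overlap >= threshold:\n",
"            supported += 1\n",
"    return supported / len(sentences)\n",
"\n",
"\n",
"context = ['Pulse STOP durante la reproducción para guardar el punto de parada.']\n",
"answer = 'Pulse STOP para guardar el punto de parada. El equipo incluye garantía de diez años.'\n",
"print(naive_groundedness(answer, context))\n",
"```\n",
"\n",
"Word overlap misses paraphrases and can be fooled by common words, so in practice this kind of check is usually replaced by an entailment model or an LLM-based judge."
],
"id": "sketch-naive-groundedness"
},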
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1Lmg6imT-GF6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
},
"outputId": "af21a951-4e0c-4dbc-df83-f7d80d41d9d6"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" score fuente marca \\\n",
"0 0.351 lg-user-manual-23476.pdf lg \n",
"1 0.328 lg-user-manual-23476.pdf lg \n",
"2 0.314 samsumg-user-manual-15223.pdf samsung \n",
"\n",
" texto \n",
"0 16\\nFuncionamiento\\n4\\nFuncionamiento \\nDismin... \n",
"1 Funcionamiento 15\\nFuncionamiento\\n4\\nFunciona... \n",
"2 En las próximas secciones, se tratarán los mét... "
]
},
"metadata": {},
"execution_count": 9
}
],
"source": [
"pregunta = \"¿Que función cumple el Modo Suspensión?\"\n",
"nodos_recuperados = retriever.retrieve(pregunta)\n",
"\n",
"pd.DataFrame(\n",
" [\n",
" {\n",
" \"score\": round(resultado.score or 0, 3),\n",
" \"fuente\": resultado.metadata[\"fuente\"],\n",
" \"marca\": resultado.metadata[\"marca\"],\n",
" \"texto\": resultado.node.get_content(metadata_mode=\"none\"),\n",
" }\n",
" for resultado in nodos_recuperados\n",
" ]\n",
")"
],
"id": "1Lmg6imT-GF6"
},
{
"cell_type": "markdown",
"metadata": {
"id": "EXb2ZAHJ-GF6"
},
"source": [
"## Connecting an SLM to answer\n",
"\n",
"We will use a small text-to-text model. We do not expect the same quality as a large LLM, but it is enough to demonstrate the separation of responsibilities: the index supplies evidence and the SLM writes an answer conditioned on that evidence.\n",
"\n",
"The role of the SLM (*Small Language Model*) is not to memorize all of the organization's knowledge. In a well-designed RAG pipeline, the relevant knowledge arrives in the retrieved context. The model's main job is to read those fragments, select the useful information, resolve small ambiguities, and write a clear answer for the user.\n",
"\n",
"That is why, if grounding is good, we do not always need a model with extremely strong reasoning. For factual questions about manuals, policies, or procedures, the hard part is usually finding the right evidence. A larger model can help when the question requires multi-step reasoning, complex synthesis, or handling very long instructions, but it also brings higher cost, latency, and operational risk. In general, it is worth starting by asking: is the answer in the documents, and does retrieval find it? If so, an SLM may be enough."
],
"id": "EXb2ZAHJ-GF6"
},
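{
"cell_type": "markdown",
"metadata": {
"id": "sketch-build-prompt"
},
"source": [
"The separation of responsibilities can be sketched as a prompt-building step: retrieved fragments go in, a grounded instruction comes out. The template wording, the `build_prompt` helper, and the placeholder fragments are assumptions for illustration; they are not the prompt used by LlamaIndex or by this notebook.\n",
"\n",
"```python\n",
"def build_prompt(question, fragments):\n",
"    # Number each fragment and tag it with its source so the answer can cite it.\n",
"    lines = []\n",
"    for i, frag in enumerate(fragments, start=1):\n",
"        source, text = frag['fuente'], frag['texto']\n",
"        lines.append(f'[{i}] ({source}) {text}')\n",
"    context = '\\n\\n'.join(lines)\n",
"    return (\n",
"        'Answer using only the context below. '\n",
"        'If the context is not enough, say that you do not have enough evidence.\\n\\n'\n",
"        f'Context:\\n{context}\\n\\n'\n",
"        f'Question: {question}\\nAnswer:'\n",
"    )\n",
"\n",
"\n",
"# Hypothetical retrieved fragments; the texts are placeholders.\n",
"fragments = [\n",
"    {'fuente': 'lg-user-manual-23476.pdf', 'texto': 'placeholder fragment one'},\n",
"    {'fuente': 'samsumg-user-manual-15223.pdf', 'texto': 'placeholder fragment two'},\n",
"]\n",
"print(build_prompt('¿Qué función cumple el Modo Suspensión?', fragments))\n",
"```\n",
"\n",
"Keeping the source next to each fragment is what later allows the answer to be audited against the evidence."
],
"id": "sketch-build-prompt"
},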
{
"cell_type": "code",
"source": [
"from huggingface_hub import notebook_login\n",
"\n",
"notebook_login()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17,
"referenced_widgets": [
"68d5f0e462ad47b3954e41741a734c44",
"7772eabebf8249d88c9441fa24f48921",
"6a57f167f8b74907913ad1c953bb5c9f",
"8ef368e3dddd41888f4bc15acaf8103a",
"c1d86babe4044342bbdb5ac5de24e2cf",
"3fded1b8bc0c47efa3e151696e8989e0",
"368ce21165cb4fe593e115dca9863854",
"05af12d618144e8489b267655147caac",
"cd62142c9b314753b7068335aa71adf3",
"94ba62f9278948dfb6c561a9ce71cbf9",
"0ff3ca7d3ce643f98952e79fd40e8d60",
"fb81e6a9ebbb467a8e8a943071de7218",
"b3bb8e33a89c42b1b9a92e9a80fb3276",
"31e963ad7f0e4283afbe83c5c50c3866",
"c64400b423ed45379fb16ff60f5f790f",
"78df67275bf74f499eb4392b6a7ad323",
"bf2d3115ba894dbe9ca63e87207dd553",
"0731c15a95114a31855ef182937851c0",
"94c8789a91924e4ea48e6be0c2edbcd8",
"9500950589414514b1f3a44d9b36320e"
]
},
"id": "NmVdUIe4FHxz",
"outputId": "20fe0ae5-05d7-4e91-9c8a-46713f631ff3"
},
"id": "NmVdUIe4FHxz",
"execution_count": null,
"outputs": []
},
"fb81e6a9ebbb467a8e8a943071de7218": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"b3bb8e33a89c42b1b9a92e9a80fb3276": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"31e963ad7f0e4283afbe83c5c50c3866": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"c64400b423ed45379fb16ff60f5f790f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ButtonStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ButtonStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"button_color": null,
"font_weight": ""
}
},
"78df67275bf74f499eb4392b6a7ad323": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"bf2d3115ba894dbe9ca63e87207dd553": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"0731c15a95114a31855ef182937851c0": {
"model_module": "@jupyter-widgets/controls",
"model_name": "LabelModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "LabelModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "LabelView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_94c8789a91924e4ea48e6be0c2edbcd8",
"placeholder": "",
"style": "IPY_MODEL_9500950589414514b1f3a44d9b36320e",
"value": "Connecting..."
}
},
"94c8789a91924e4ea48e6be0c2edbcd8": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"9500950589414514b1f3a44d9b36320e": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"0b267bf9e1084733825abe0217a0cb1c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_28fa2515858f441e851aeec3271e76e6",
"IPY_MODEL_d53879b5660447cda3d1bd94da28aa1a",
"IPY_MODEL_a6fbe67af03b436da4aa5724567bf18a"
],
"layout": "IPY_MODEL_5b28a246ae294153b55a113c697ab3cd"
}
},
"28fa2515858f441e851aeec3271e76e6": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_3905a1fa9d9749d282478ec814dff3fd",
"placeholder": "",
"style": "IPY_MODEL_1cabb167d7d446b1a0ab86e92373ea7e",
"value": "Loading checkpoint shards: 100%"
}
},
"d53879b5660447cda3d1bd94da28aa1a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_9d2e8f9288834de0af7c2d57786ff289",
"max": 2,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_f9f6ba3b4a0e40adbb9a40a142772a79",
"value": 2
}
},
"a6fbe67af03b436da4aa5724567bf18a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_1d2dce7cd4b04234888e6765d7d1d46c",
"placeholder": "",
"style": "IPY_MODEL_b9edd164912f40aca1380029d29759d5",
"value": " 2/2 [00:22<00:00, 9.43s/it]"
}
},
"5b28a246ae294153b55a113c697ab3cd": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"3905a1fa9d9749d282478ec814dff3fd": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"1cabb167d7d446b1a0ab86e92373ea7e": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"9d2e8f9288834de0af7c2d57786ff289": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"f9f6ba3b4a0e40adbb9a40a142772a79": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"1d2dce7cd4b04234888e6765d7d1d46c": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"b9edd164912f40aca1380029d29759d5": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"60afe0fabefd4a60b00d1a08f7138a42": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_b79ea979b76148d1a6c8ee0d38fedaf9",
"IPY_MODEL_e7998009bc1a4aed886bbe9c9479f530",
"IPY_MODEL_a37c03b166fa4bd09d68dd36178b38f9"
],
"layout": "IPY_MODEL_ed4b8dee2c044080adf7df836f94c680"
}
},
"b79ea979b76148d1a6c8ee0d38fedaf9": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_6f17612bc07440bcbd1ae73cfcb2a630",
"placeholder": "",
"style": "IPY_MODEL_a58bc62e38bf48f180f5ab1bb1884f1c",
"value": "tokenizer.model: 100%"
}
},
"e7998009bc1a4aed886bbe9c9479f530": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_652f4c7d01434963aad88e2fb58f53e3",
"max": 4241003,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_087fbffa0afb4d38ac1fbc39019e10f0",
"value": 4241003
}
},
"a37c03b166fa4bd09d68dd36178b38f9": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_103516eeffa643ef847fcd824adc4439",
"placeholder": "",
"style": "IPY_MODEL_ca2680e4eda44b2d85d29cbf4da34e14",
"value": " 4.24M/4.24M [00:00<00:00, 5.04MB/s]"
}
},
"ed4b8dee2c044080adf7df836f94c680": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"6f17612bc07440bcbd1ae73cfcb2a630": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"a58bc62e38bf48f180f5ab1bb1884f1c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"652f4c7d01434963aad88e2fb58f53e3": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"087fbffa0afb4d38ac1fbc39019e10f0": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"103516eeffa643ef847fcd824adc4439": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"ca2680e4eda44b2d85d29cbf4da34e14": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
}
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}