{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c2231055-6a63-4425-9248-c5aae455396e",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "<figure>\n",
    "<center>\n",
    "<img src=\"../Imagenes/logo_final.png\"  align=\"left\"/> \n",
    "</center>   \n",
    "</figure>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4ce5373-7375-4e11-872a-d58813d66c24",
   "metadata": {},
   "source": [
    "# <span style=\"color:red\"><center>BERT</center></span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30115409-9d8e-45df-a9cd-072858c02397",
   "metadata": {},
   "source": [
    "<center>Exploring pre-trained models</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23484740-ab13-447a-bc3f-01686f528f49",
   "metadata": {},
   "source": [
    "## <span style=\"color:blue\">Authors</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "405456a4-e556-41d2-bf60-593a5055f8b4",
   "metadata": {
    "tags": []
   },
   "source": [
    "1. Alvaro Mauricio Montenegro Díaz, ammontenegrod@unal.edu.co\n",
    "2. Daniel Mauricio Montenegro Reyes, dextronomo@gmail.com "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7aaec87-e77d-4152-bb83-bc9e62d4a94c",
   "metadata": {},
   "source": [
    "## <span style=\"color:blue\">Graphic design and digital marketing</span>\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aac3be8e-e519-43bf-9bd9-9903a27b570b",
   "metadata": {},
   "source": [
    "1. Maria del Pilar Montenegro Reyes, pmontenegro88@gmail.com "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80e4663a-f0f2-42d8-a044-6e3ab6d16e3e",
   "metadata": {},
   "source": [
    "## <span style=\"color:blue\">Assistants</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b038974-8718-4142-a66e-895520f68ca7",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "17e9be15-8ae8-4715-b57f-a6e8a8111c0c",
   "metadata": {},
   "source": [
    "## <span style=\"color:blue\">Contents</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a382773-c33f-4247-b44c-6e3806b9af20",
   "metadata": {},
   "source": [
    "* [Introduction](#Introduction)\n",
    "* [Extracting embeddings from a pre-trained BERT model](#Extracting-embeddings-from-a-pre-trained-BERT-model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3c46451-72e1-4588-b271-855af1898df1",
   "metadata": {},
   "source": [
    "## <span style=\"color:blue\">Introduction</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "192cd6c6-0240-4bd0-b2e6-427aff064ebb",
   "metadata": {},
   "source": [
    "We will use the HuggingFace implementation in PyTorch.\n",
    "\n",
    "+ BERT is a model with absolute position embeddings, so it is generally recommended to pad the inputs on the right rather than on the left.\n",
    "\n",
    "+ BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is effective at predicting masked tokens and at NLU in general, but it is not optimal for text generation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b86ab415-c6d7-4d89-bbae-098b30f95578",
   "metadata": {},
   "source": [
    "## <span style=\"color:blue\">Extracting embeddings from a pre-trained BERT model</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cd69e7e-8b83-4a1b-ad59-8d5f01381baa",
   "metadata": {},
   "source": [
    "The NLP task is sentiment analysis. We run the first experiment in English and then in Spanish. For this task it is fine to use the uncased model, which lowercases the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d93f08d4-ed19-46ef-88cb-f79056f932de",
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import BertModel, BertTokenizer\n",
    "import torch\n"
   ]
  },
  {
   "cell_type": "raw",
   "id": "090081a2-b0e4-4345-aff4-b6cd79b6eb3c",
   "metadata": {},
   "source": [
    "## TensorFlow\n",
    "from transformers import TFBertModel, BertTokenizer\n",
    "import tensorflow as tf"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e483a0f6-52b5-439c-96fc-ba7bc99cc4e4",
   "metadata": {},
   "source": [
    "### Load the pre-trained 'bert-base-uncased' model and its tokenizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0d6e7aa-c090-4cff-80ce-f85de557c3cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = BertModel.from_pretrained('bert-base-uncased')\n",
    "                                 \n",
    "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f85fb650-c2c7-4287-aa32-fa84acb2c658",
   "metadata": {},
   "source": [
    "### Preprocessing the input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "40e00063-d712-494c-9bc5-39cf1bd332bc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['i', 'love', 'paris']\n"
     ]
    }
   ],
   "source": [
    "# sentence\n",
    "sentence = 'I love Paris'\n",
    "\n",
    "# tokenization\n",
    "tokens = tokenizer.tokenize(sentence)\n",
    "\n",
    "# print the resulting tokens\n",
    "print(tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77d5985b-8bfd-4fa0-8255-0dce67066768",
   "metadata": {},
   "source": [
    "### Add the [CLS] token at the beginning and [SEP] at the end of the token list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f48fc4af-645a-4dc2-a73d-07c25f3623a0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['[CLS]', 'i', 'love', 'paris', '[SEP]']\n"
     ]
    }
   ],
   "source": [
    "tokens = ['[CLS]'] + tokens + ['[SEP]']\n",
    "\n",
    "print(tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "396b2885-baed-4daf-8504-da040e19e927",
   "metadata": {},
   "source": [
    "### Padding and mask for the sentence"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df8ec922-bb09-4452-b387-9a44e2a6cdc5",
   "metadata": {},
   "source": [
    "The token list has length 5. Suppose we have decided that the maximum sentence length will be 7. BERT is built to accept sequences of up to 512 tokens. All sentences in a batch must have the same length, so shorter ones are padded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e144cfa0-e684-4f55-8fc7-d30317821fae",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['[CLS]', 'i', 'love', 'paris', '[SEP]', '[PAD]', '[PAD]']\n"
     ]
    }
   ],
   "source": [
    "# Padding\n",
    "max_sentence_size = 7\n",
    "pad_size = max_sentence_size - len(tokens)\n",
    "\n",
    "# append pad_size [PAD] tokens on the right\n",
    "tokens = tokens + ['[PAD]'] * pad_size\n",
    "\n",
    "print(tokens)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "315fd135-f97a-4453-a658-8972d56d5c75",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 1, 1, 1, 1, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "# attention mask: 1 for real tokens, 0 for padding\n",
    "attention_mask = [1 if token != '[PAD]' else 0 for token in tokens]\n",
    "print(attention_mask)"
   ]
  },
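  {
   "cell_type": "markdown",
   "id": "sketch-pad-and-mask",
   "metadata": {},
   "source": [
    "The padding and masking steps above can be wrapped in a single helper. The following is a minimal sketch in plain Python; the name `pad_and_mask` is hypothetical (it is not part of `transformers`), and it assumes the token list already includes `[CLS]` and `[SEP]`.\n",
    "\n",
    "```python\n",
    "def pad_and_mask(tokens, max_len, pad_token='[PAD]'):\n",
    "    # Right-pad a token list to max_len and build the matching attention mask.\n",
    "    if len(tokens) > max_len:\n",
    "        raise ValueError('token list is longer than max_len')\n",
    "    padded = tokens + [pad_token] * (max_len - len(tokens))\n",
    "    # 1 marks a real token, 0 marks a padding position\n",
    "    mask = [0 if tok == pad_token else 1 for tok in padded]\n",
    "    return padded, mask\n",
    "\n",
    "\n",
    "example_tokens = ['[CLS]', 'i', 'love', 'paris', '[SEP]']\n",
    "padded_tokens, example_mask = pad_and_mask(example_tokens, max_len=7)\n",
    "print(padded_tokens)\n",
    "print(example_mask)\n",
    "```\n",
    "\n",
    "In practice the tokenizer can usually do all of this in one call, for example `tokenizer(sentence, padding='max_length', max_length=7, return_tensors='pt')`, which also adds the special tokens."
   ]
  },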
  {
   "cell_type": "markdown",
   "id": "7ad09897-d398-4f44-8431-a554a2d83972",
   "metadata": {},
   "source": [
    "### Convert the token list into the list of token IDs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6fbea2a3-320d-4dd7-8521-d1af470d3d45",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[101, 1045, 2293, 3000, 102, 0, 0]\n"
     ]
    }
   ],
   "source": [
    "token_ids = tokenizer.convert_tokens_to_ids(tokens)\n",
    "print(token_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f15280fd-00da-45ca-a450-a7a3ad1d69f5",
   "metadata": {},
   "source": [
    "### Convert token_ids and attention_mask to tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6e6e7bd2-4899-4cb9-9d8c-5b9c8d8ed639",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 101, 1045, 2293, 3000,  102,    0,    0]])\n"
     ]
    }
   ],
   "source": [
    "# unsqueeze(0) adds a leading batch dimension (the model expects a batch of sentences)\n",
    "token_ids = torch.tensor(token_ids).unsqueeze(0)\n",
    "attention_mask = torch.tensor(attention_mask).unsqueeze(0)\n",
    "print(token_ids) # tensor([[ 101, 1045, 2293, 3000,  102,    0,    0]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "675844ed-0528-44c2-86eb-b3f6c59742df",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 101, 1045, 2293, 3000,  102,    0,    0]])\n",
      "tensor([[1, 1, 1, 1, 1, 0, 0]])\n"
     ]
    }
   ],
   "source": [
    "print(token_ids)\n",
    "print(attention_mask)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86126d72-8263-4d14-93b2-37d269bc10fb",
   "metadata": {},
   "source": [
    "### Question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5c9ca04-b524-4655-aa73-899edad4d1f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# How would you do this with TensorFlow?"
   ]
  },
  {
   "cell_type": "raw",
   "id": "a9a18715-4f3b-4759-b455-1c6339eb17e8",
   "metadata": {},
   "source": [
    "# Write your answer here\n",
    "import tensorflow as tf\n",
    "token_ids = tf.expand_dims(tf.constant(token_ids), axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a587dc32-086e-4b90-aaa5-e786350cd427",
   "metadata": {},
   "source": [
    "### Extracting the final embedding"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb47b7a9-de74-4348-8792-8e27c6725df3",
   "metadata": {},
   "source": [
    "The model returns an output object with two fields:\n",
    "\n",
    "* The first, *last_hidden_state*, contains the representation of every token, taken only from the final encoder layer (encoder 12).\n",
    "* The second, *pooler_output*, is the representation of the [CLS] token from the final encoder layer, further processed by a linear layer and a *tanh* activation. The linear layer is trained when BERT is pre-trained on the NSP (next sentence prediction) task.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "0613566b-a5a5-410e-86ec-4a8b021315d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "out = model(token_ids, attention_mask = attention_mask)\n",
    "last_hidden_state, pooler_output = out.last_hidden_state, out.pooler_output # out[0], out[1]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6d52ab29-9b0d-416e-b0f0-823ec5e74168",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 7, 768])\n"
     ]
    }
   ],
   "source": [
    "print(last_hidden_state.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4b228c0-26b1-4371-8ca1-d9a4371f2c03",
   "metadata": {},
   "source": [
    "* Batch size = 1\n",
    "* Sequence length = 7\n",
    "* Embedding size = 768"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "e6061579-ecac-4724-a27b-c3d74896cc42",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "odict_keys(['last_hidden_state', 'pooler_output'])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# out is a dict-like ModelOutput object. Its keys can be listed like this:\n",
    "out.keys()"
   ]
  },
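  {
   "cell_type": "markdown",
   "id": "sketch-pooler-output",
   "metadata": {},
   "source": [
    "To make the relation between *last_hidden_state* and *pooler_output* concrete, here is a minimal sketch of the pooling step described above. It uses a random stand-in tensor and a randomly initialized linear layer, so the names `fake_last_hidden_state` and `dense` are illustrative assumptions and the numbers will not match BERT's real pooler.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "from torch import nn\n",
    "\n",
    "torch.manual_seed(0)\n",
    "hidden_size = 768\n",
    "\n",
    "# Stand-in for out.last_hidden_state with shape (batch, seq_len, hidden)\n",
    "fake_last_hidden_state = torch.randn(1, 7, hidden_size)\n",
    "\n",
    "# Randomly initialized layer; in BERT these weights come from pre-training\n",
    "dense = nn.Linear(hidden_size, hidden_size)\n",
    "\n",
    "cls_vector = fake_last_hidden_state[:, 0]   # representation of [CLS], shape (1, 768)\n",
    "pooled = torch.tanh(dense(cls_vector))      # tanh keeps every value in [-1, 1]\n",
    "print(pooled.shape)\n",
    "```\n",
    "\n",
    "The result has one vector per sentence, which is why *pooler_output* has shape (batch size, hidden size) rather than one vector per token."
   ]
  },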
  {
   "cell_type": "markdown",
   "id": "b1287a8b-9a09-4b2f-8fc2-5c6d4ffe5f77",
   "metadata": {},
   "source": [
    "### Extracting the embeddings from all encoder layers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5e43f16-437b-419c-b6e5-61f1ca3f1333",
   "metadata": {},
   "source": [
    "In this section we look at how to extract the embeddings produced by each of the encoder layers (12, for example, in the base model). This is sometimes done to extract different features from the sentences.\n",
    "\n",
    "For example, in the NER (named entity recognition) task, researchers have taken weighted averages of the embeddings from several layers and have thereby improved accuracy.\n",
    "\n",
    "To do this, the pre-trained model must be instantiated with the option *output_hidden_states=True*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8b14fb0-e508-4164-b9a5-cbae7e9db8ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)\n",
    "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "9cd791bc-ec6a-43f3-8deb-e6fa50c7df6b",
   "metadata": {},
   "outputs": [],
   "source": [
    "out = model(token_ids, attention_mask=attention_mask)\n",
    "\n",
    "last_hidden_state, pooler_output, hidden_states = \\\n",
    "        out.last_hidden_state, out.pooler_output, out.hidden_states\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "2916d80b-0f59-48fb-b349-b7c986ab065c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 7, 768])\n",
      "torch.Size([1, 768])\n",
      "13\n"
     ]
    }
   ],
   "source": [
    "print(last_hidden_state.shape)\n",
    "print(pooler_output.shape)\n",
    "print(len(hidden_states)) # a tuple holding the input embeddings plus\n",
    "                          # the output of every encoder layer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc6d576f-40d0-4a9a-a56d-159b0d01c42f",
   "metadata": {},
   "source": [
    "Note that *hidden_states* has 13 elements. Element 0 is the output of the input embedding layer, and elements 1 to 12 are the outputs of each of the 12 encoder layers.\n",
    "\n",
    "\n",
    "The token representations from the last hidden (encoder) layer can be obtained as follows:\n",
    "\n",
    "* *last_hidden_state[0][0]*: the representation of the first token, that is, *[CLS]*.\n",
    "* *last_hidden_state[0][1]*: the representation of the token *I*.\n",
    "* *last_hidden_state[0][2]*: the representation of the token *love*.\n",
    "\n",
    "This is the final contextual representation of the tokens.\n",
    "\n",
    "The embeddings of each layer *i* are obtained with *hidden_states[i]*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "dc65d589-ec50-468f-b440-10fdf46a4932",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 7, 768])\n",
      "torch.Size([1, 7, 768])\n"
     ]
    }
   ],
   "source": [
    "# embeddings from the input layer\n",
    "input_embedding = hidden_states[0]\n",
    "print(input_embedding.shape)\n",
    "\n",
    "# embeddings from encoder layer 11\n",
    "embedding_11 = hidden_states[11]\n",
    "print(embedding_11.shape)\n"
   ]
  },
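  {
   "cell_type": "markdown",
   "id": "sketch-weighted-layer-average",
   "metadata": {},
   "source": [
    "A weighted average over several layers, as mentioned for NER, could be computed roughly as follows. This is only a sketch under stated assumptions: the tensors are random stand-ins for *hidden_states*, and both the choice of the last four layers and the weights in `layer_weights` are illustrative, not values taken from any paper.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "torch.manual_seed(0)\n",
    "\n",
    "# Stand-in for out.hidden_states: 13 random tensors of shape (batch, seq_len, hidden)\n",
    "fake_hidden_states = tuple(torch.randn(1, 7, 768) for _ in range(13))\n",
    "\n",
    "# Assumed choice for illustration: combine the last four encoder layers\n",
    "selected = torch.stack(fake_hidden_states[-4:], dim=0)  # shape (4, 1, 7, 768)\n",
    "layer_weights = torch.tensor([0.1, 0.2, 0.3, 0.4])      # assumed weights that sum to 1\n",
    "\n",
    "# Broadcast one weight per layer and sum over the layer dimension\n",
    "combined = (layer_weights.view(-1, 1, 1, 1) * selected).sum(dim=0)\n",
    "print(combined.shape)\n",
    "```\n",
    "\n",
    "The combined tensor keeps the shape of a single layer, so it can be used wherever *last_hidden_state* would be."
   ]
  },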
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "5555e4d2-9e50-4592-9ae7-e7ccc6df3645",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on BaseModelOutputWithPoolingAndCrossAttentions in module transformers.modeling_outputs object:\n",
      "\n",
      "class BaseModelOutputWithPoolingAndCrossAttentions(transformers.file_utils.ModelOutput)\n",
      " |  BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state: torch.FloatTensor = None, pooler_output: torch.FloatTensor = None, hidden_states: Union[Tuple[torch.FloatTensor], NoneType] = None, past_key_values: Union[Tuple[Tuple[torch.FloatTensor]], NoneType] = None, attentions: Union[Tuple[torch.FloatTensor], NoneType] = None, cross_attentions: Union[Tuple[torch.FloatTensor], NoneType] = None) -> None\n",
      " |  \n",
      " |  Base class for model's outputs that also contains a pooling of the last hidden states.\n",
      " |  \n",
      " |  Args:\n",
      " |      last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):\n",
      " |          Sequence of hidden-states at the output of the last layer of the model.\n",
      " |      pooler_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):\n",
      " |          Last layer hidden-state of the first token of the sequence (classification token) further processed by a\n",
      " |          Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence\n",
      " |          prediction (classification) objective during pretraining.\n",
      " |      hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):\n",
      " |          Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)\n",
      " |          of shape :obj:`(batch_size, sequence_length, hidden_size)`.\n",
      " |  \n",
      " |          Hidden-states of the model at the output of each layer plus the initial embedding outputs.\n",
      " |      attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):\n",
      " |          Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads,\n",
      " |          sequence_length, sequence_length)`.\n",
      " |  \n",
      " |          Attentions weights after the attention softmax, used to compute the weighted average in the self-attention\n",
      " |          heads.\n",
      " |      cross_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` and ``config.add_cross_attention=True`` is passed or when ``config.output_attentions=True``):\n",
      " |          Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads,\n",
      " |          sequence_length, sequence_length)`.\n",
      " |  \n",
      " |          Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the\n",
      " |          weighted average in the cross-attention heads.\n",
      " |      past_key_values (:obj:`tuple(tuple(torch.FloatTensor))`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):\n",
      " |          Tuple of :obj:`tuple(torch.FloatTensor)` of length :obj:`config.n_layers`, with each tuple having 2 tensors\n",
      " |          of shape :obj:`(batch_size, num_heads, sequence_length, embed_size_per_head)`) and optionally if\n",
      " |          ``config.is_encoder_decoder=True`` 2 additional tensors of shape :obj:`(batch_size, num_heads,\n",
      " |          encoder_sequence_length, embed_size_per_head)`.\n",
      " |  \n",
      " |          Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if\n",
      " |          ``config.is_encoder_decoder=True`` in the cross-attention blocks) that can be used (see\n",
      " |          :obj:`past_key_values` input) to speed up sequential decoding.\n",
      " |  \n",
      " |  Method resolution order:\n",
      " |      BaseModelOutputWithPoolingAndCrossAttentions\n",
      " |      transformers.file_utils.ModelOutput\n",
      " |      collections.OrderedDict\n",
      " |      builtins.dict\n",
      " |      builtins.object\n",
      " |  \n",
      " |  Methods defined here:\n",
      " |  \n",
      " |  __eq__(self, other)\n",
      " |  \n",
      " |  __init__(self, last_hidden_state: torch.FloatTensor = None, pooler_output: torch.FloatTensor = None, hidden_states: Union[Tuple[torch.FloatTensor], NoneType] = None, past_key_values: Union[Tuple[Tuple[torch.FloatTensor]], NoneType] = None, attentions: Union[Tuple[torch.FloatTensor], NoneType] = None, cross_attentions: Union[Tuple[torch.FloatTensor], NoneType] = None) -> None\n",
      " |  \n",
      " |  __repr__(self)\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Data and other attributes defined here:\n",
      " |  \n",
      " |  __annotations__ = {'attentions': typing.Union[typing.Tuple[torch.Float...\n",
      " |  \n",
      " |  __dataclass_fields__ = {'attentions': Field(name='attentions',type=typ...\n",
      " |  \n",
      " |  __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...\n",
      " |  \n",
      " |  __hash__ = None\n",
      " |  \n",
      " |  attentions = None\n",
      " |  \n",
      " |  cross_attentions = None\n",
      " |  \n",
      " |  hidden_states = None\n",
      " |  \n",
      " |  last_hidden_state = None\n",
      " |  \n",
      " |  past_key_values = None\n",
      " |  \n",
      " |  pooler_output = None\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Methods inherited from transformers.file_utils.ModelOutput:\n",
      " |  \n",
      " |  __delitem__(self, *args, **kwargs)\n",
      " |      Delete self[key].\n",
      " |  \n",
      " |  __getitem__(self, k)\n",
      " |      x.__getitem__(y) <==> x[y]\n",
      " |  \n",
      " |  __post_init__(self)\n",
      " |  \n",
      " |  __setattr__(self, name, value)\n",
      " |      Implement setattr(self, name, value).\n",
      " |  \n",
      " |  __setitem__(self, key, value)\n",
      " |      Set self[key] to value.\n",
      " |  \n",
      " |  pop(self, *args, **kwargs)\n",
      " |      od.pop(k[,d]) -> v, remove specified key and return the corresponding\n",
      " |      value.  If key is not found, d is returned if given, otherwise KeyError\n",
      " |      is raised.\n",
      " |  \n",
      " |  setdefault(self, *args, **kwargs)\n",
      " |      Insert key with a value of default if key is not in the dictionary.\n",
      " |      \n",
      " |      Return the value for key if key is in the dictionary, else default.\n",
      " |  \n",
      " |  to_tuple(self) -> Tuple[Any]\n",
      " |      Convert self to a tuple containing all the attributes/keys that are not ``None``.\n",
      " |  \n",
      " |  update(self, *args, **kwargs)\n",
      " |      D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.\n",
      " |      If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]\n",
      " |      If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v\n",
      " |      In either case, this is followed by: for k in F:  D[k] = F[k]\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Methods inherited from collections.OrderedDict:\n",
      " |  \n",
      " |  __ge__(self, value, /)\n",
      " |      Return self>=value.\n",
      " |  \n",
      " |  __gt__(self, value, /)\n",
      " |      Return self>value.\n",
      " |  \n",
      " |  __iter__(self, /)\n",
      " |      Implement iter(self).\n",
      " |  \n",
      " |  __le__(self, value, /)\n",
      " |      Return self<=value.\n",
      " |  \n",
      " |  __lt__(self, value, /)\n",
      " |      Return self<value.\n",
      " |  \n",
      " |  __ne__(self, value, /)\n",
      " |      Return self!=value.\n",
      " |  \n",
      " |  __reduce__(...)\n",
      " |      Return state information for pickling\n",
      " |  \n",
      " |  __reversed__(...)\n",
      " |      od.__reversed__() <==> reversed(od)\n",
      " |  \n",
      " |  __sizeof__(...)\n",
      " |      D.__sizeof__() -> size of D in memory, in bytes\n",
      " |  \n",
      " |  clear(...)\n",
      " |      od.clear() -> None.  Remove all items from od.\n",
      " |  \n",
      " |  copy(...)\n",
      " |      od.copy() -> a shallow copy of od\n",
      " |  \n",
      " |  items(...)\n",
      " |      D.items() -> a set-like object providing a view on D's items\n",
      " |  \n",
      " |  keys(...)\n",
      " |      D.keys() -> a set-like object providing a view on D's keys\n",
      " |  \n",
      " |  move_to_end(self, /, key, last=True)\n",
      " |      Move an existing element to the end (or beginning if last is false).\n",
      " |      \n",
      " |      Raise KeyError if the element does not exist.\n",
      " |  \n",
      " |  popitem(self, /, last=True)\n",
      " |      Remove and return a (key, value) pair from the dictionary.\n",
      " |      \n",
      " |      Pairs are returned in LIFO order if last is true or FIFO order if false.\n",
      " |  \n",
      " |  values(...)\n",
      " |      D.values() -> an object providing a view on D's values\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Class methods inherited from collections.OrderedDict:\n",
      " |  \n",
      " |  fromkeys(iterable, value=None) from builtins.type\n",
      " |      Create a new ordered dictionary with keys from iterable and values set to value.\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Data descriptors inherited from collections.OrderedDict:\n",
      " |  \n",
      " |  __dict__\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Methods inherited from builtins.dict:\n",
      " |  \n",
      " |  __contains__(self, key, /)\n",
      " |      True if the dictionary has the specified key, else False.\n",
      " |  \n",
      " |  __getattribute__(self, name, /)\n",
      " |      Return getattr(self, name).\n",
      " |  \n",
      " |  __len__(self, /)\n",
      " |      Return len(self).\n",
      " |  \n",
      " |  get(self, key, default=None, /)\n",
      " |      Return the value for key if key is in the dictionary, else default.\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Static methods inherited from builtins.dict:\n",
      " |  \n",
      " |  __new__(*args, **kwargs) from builtins.type\n",
      " |      Create and return a new object.  See help(type) for accurate signature.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "372442e1-0221-4d21-95da-9d0058ea7070",
   "metadata": {},
   "source": [
    "### Retrieving the attention weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "176cad80-144e-486a-a805-8dc09569c275",
   "metadata": {},
   "source": [
    "The attention weights, taken after the attention softmax, are used to compute the weighted average in the self-attention heads.\n",
    "They are obtained by passing *output_attentions=True* to the model.\n",
    "\n",
    "+ *attentions* is a tuple. Each element holds the attention weights of one encoder layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d18b8954-dae6-4721-a0ec-7a0ca3ce5d9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True, output_attentions=True)\n",
    "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "0efd6e94-5101-473f-a9b3-24efa41b12a6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "12\n"
     ]
    }
   ],
   "source": [
    "out = model(token_ids, attention_mask=attention_mask)\n",
    "\n",
    "last_hidden_state, pooler_output, hidden_states, attentions = \\\n",
    "        out.last_hidden_state, out.pooler_output, out.hidden_states, \\\n",
    "        out.attentions\n",
    "print(len(attentions))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "be0b076f-cadd-4c72-96f7-12dc08906dd1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 12, 7, 7])\n"
     ]
    }
   ],
   "source": [
    "print(attentions[11].shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6f09a46-8acc-46b4-aef3-b92b1fe0c2f9",
   "metadata": {},
   "source": [
    "The output shape reads as follows:\n",
    "\n",
    "- The batch size is 1: a single sentence.\n",
    "- There are 12 attention heads.\n",
    "- The sentence has 7 tokens, so each head produces a 7 x 7 matrix of weights (one row per attending token, one column per attended token).\n",
    "\n",
    "So we have the output of the 12 attention heads for the sentence.\n",
    "\n",
    "Let's take a look at the attention weights of the last encoder layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "58fabdd1-b720-4734-8acc-f10010d37a62",
   "metadata": {},
   "outputs": [],
   "source": [
    "attention11 = attentions[11].squeeze()#elimina la dimensión de batch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c414704d-b8f9-4271-93db-721319905899",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([12, 7, 7])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "attention11.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf6013b9-45bc-4d5f-b1cc-aa04837e9450",
   "metadata": {},
   "source": [
    "### Function to plot the attention weights of one head"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "b99ba957-08ff-4553-a4e2-3d1aca5472a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Variant for tokens given as tensors of UTF-8 byte strings, decoded before labeling.\n",
    "def plot_attention_head_cp(in_tokens, translated_tokens, attention):\n",
    "  # Kept from a translation setting: the model never generates `<START>`\n",
    "  # in the output, so the first target token is skipped.\n",
    "  translated_tokens = translated_tokens[1:]\n",
    "\n",
    "  ax = plt.gca()\n",
    "  ax.matshow(attention)\n",
    "  ax.set_xticks(range(len(in_tokens)))\n",
    "  ax.set_yticks(range(len(translated_tokens)))\n",
    "\n",
    "  labels = [label.decode('utf-8') for label in in_tokens.numpy()]\n",
    "  ax.set_xticklabels(\n",
    "      labels, rotation=90)\n",
    "\n",
    "  labels = [label.decode('utf-8') for label in translated_tokens.numpy()]\n",
    "  ax.set_yticklabels(labels)\n",
    "\n",
    "\n",
    "\n",
    "def plot_attention_head(in_tokens, translated_tokens, attention):\n",
    "  # Plot one head's attention matrix: rows are the attending (query) tokens,\n",
    "  # columns are the attended (key) tokens. For BERT self-attention both\n",
    "  # token lists are the same sentence, so no token is skipped.\n",
    "\n",
    "  ax = plt.gca()\n",
    "  ax.matshow(attention)\n",
    "  ax.set_xticks(range(len(in_tokens)))\n",
    "  ax.set_yticks(range(len(translated_tokens)))\n",
    "\n",
    "  # Label the columns with the key tokens and the rows with the query tokens.\n",
    "  ax.set_xticklabels(list(in_tokens), rotation=90)\n",
    "  ax.set_yticklabels(list(translated_tokens))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "f9c4698e-32b6-42ba-a90e-a68023d49177",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([7, 7])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "head = attention11[0]\n",
    "head.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a25408a7-86c1-4da8-9045-e2235583e2c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "head = head.detach().numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "0e583ceb-8a8a-47df-9633-b8857b7de318",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAQoAAAEOCAYAAABxWlnfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUbklEQVR4nO3df5BdZX3H8fcnvwMhQSCibQNxRgQqhCBBVED5VRREkB+CiIN1rAF1tNpRpNVxrJ2OrVDjDwQMOIMOFKgwFMqoUAoR1IoGDEkkRmQEI/4iIgVhkyWbT/+4Z2VZdvfZvXvvubt3P6+Zndz73HPO97m7m88+59znnCPbRESMZFqnOxARE1+CIiKKEhQRUZSgiIiiBEVEFCUoIqIoQRERRQmKiCia0ekOREx2knYZxWLbbT8+WWsrMzMjxkfSFuBXgEZYbLrtPSZr7a4cUUi6aRSLPWb7r9vdl6iHpC+MYrEnbH+8DeU32D5wpAUk/agNdWur3ZUjCkkPAH8z0iLAl2y/vKYuTUmSpgHzbD9RQ62HgU8UFjvf9r5tqD3H9pbxLjORa3fliAL4mO1vj7SApH+sqzNTiaR/B84F+oB7gAWSPmv7gjaXXmH7q4W+vaAdhfv/E0raH9inat5ge/3gZSZr7a4cUQyl+iV53FPlDXeIpDW2l0o6CzgI+Chwj+0lNdXfzfbmOmoNqLkAuBFYBKylMWLdH/gFcFI7R1R11e7Kj0clfULSPtXj2ZLuAB4EfivpmM72ruvNlDQTeDNwo+1ngLaHs6Q3SXoUWCfpl5Je0+6aA/wTsBrYy/bJtt8M7AX8EPjnrqhtu+u+gB/z7GhpOXAHMB3YF/hBp/tX0/dgT+CY6vFcYKea6n4AeAT4Bo2/bnsCd9VQdy2wT/X4EODbNX6v7wdmDNE+g8ZuwKSv3ZUjCqDX1XcLeD1wje0+2xvo3uMyfyLp3cB1wJerpr8A/rOO2ra/YPvPbR/vhoeBI2sovc32T6o+3A3sVEPNfr22tw1urNq2dkPtbv1Ps1XSfsBvafySfnjAazt2pku1eh/wSuBuANsPSHphOwtKervtKyX93TCLfLad9YEXDqr9nOe221l/jqQDef5cBgGz21i3ttrdGhR/S+Mv6kIaR8N/DiDpeODeTnasJltt90qN3x1JM2j/cYL+AK7zL/lAlw2qPfh5O/2a4YPwN91Qe8p86tFP0qm2r+90P9pJ0meAx4GzgfcD7wXut/2xNtedDnzA9op21plMJM1044DupK7drccoRjIVfonPBx4F1gHn0Diw2I4Zic9huw84sd11hiLpPwY8/tdBr91ac18k6ShJlwO/7IbaUzEoRpoT3y1OAr5m+y22T7N9mesbOn5P0kWSDpf0iv6vGuruNeDxXw16bWEN9ZF0iKTPAw8DNwF38ewkqEldu1uPUYxkKuxrnQh8TtKdwDXALUMdGW+T/vkLnxrQZuCoNtcd6efa1p+5pH8GTqcxyelqGu99tQszRSdT7a4MCknrGPqXQ8DuNXendrbfWU16Og54G3CxpP+2PdL5L62qXcdHoUPZoTr6Pw2YO+CTANGYR9JOy4GNwCXAzba3SKrrD1IttbvyYKakPUd6vfpsv+tVYfEG4J3A4bbrGoK/EXg5MKe/zfanhl+jJTVXMcLIoZ0BVh3EPRY4k8bI6Q7gGGBRu0dyddXuyhEFMBPY3fZ3BzZKOpzGuftdTdIbgLfSmEOyCricxvC0jtqXAjtUtS8HTgN+0O66to9od40RavcB3wS+KWkOcAKN78Ejkv7H9tsmfe12Ti/t1BdwM7BkiPZlwH91un81vP9raJxrMbsDtdcO+ncecGsNdQ8GXjTg+dk0Tpb6ArBLm2vPAT4IXERjV2BG1T4feEc31O7WXY/1tvcb5rV1tvevu091k7Q7jf880Di/5Xc11b3b9iGSvg+cAvweWG97r8Kq4617L41zWx6T9FoaYfl+YCmwr+3T
2lj7WuAZGp80HAc8ZPuD7arXidrduusxZ4TX2n1gq+MkvQW4kMZuh4AvSvqI7etqKH+zpJ2Bz9C4HgU0dkHabbrtx6rHZwAr3ZhYd72kNW2u/Zf9f3wkfYUadrXqrt2tQfFDSe+2fdnARknv4tlf3m72ceDg/lGEpIXAbTSmtbfbhcB7gMOB/6Xxl+6SGupOlzTDjQN4R9MYhvdr9+/5n2Y/2t7WP3W+JrXU7tZdj92BG4Beng2GZcAs4GTb7Z5/31GDd6+qS9LdV8cuVzVD8kngyqrpTGBn2209mCrpY8DxwGZgD+AVti3ppcBXbR/axtp9wFP9T2mMWp+uHtv2/MleuyuDop+kI4H+YxU/tn17J/tTF0kXAEtoTMCBxlB8re2P1lD7PtsHlNraVPtVwItpHDx9qmp7GY3rdk6FkwHbpiuDQtK9tkecNjyaZSYzSacCh9L4y3Kn7RtqqnsFcKnt71fPD6Fx9P29ba7bsZ/5VKjdrUHRAzww0iLAArfhPgtTnaQNwN40phRDYzdgA7CdxlC4LdfO7OTPfCrU7taDmaM5GaavHYUlfcf2YZKe5LkzBevYXx1cs7baA7yhhhpD6djPfCrU7soRRUS01lQ8zTwixihBERFFUyYoJC0vL5XaqZ3aQ5kyQcFzZ+qldmqn9hhMpaCIiCZNmk89ps/f0TMX7tz0+n1PPMX0+c3d0mPWz8d3j9dnvIWZGuk8tfYZT23NmD6u2r3be5g1rflz8Lbt1PxtKbZtfYoZs5u/hcv0x54qLzSMZ9jKzLbfzqM9tZ/kD5s9xAWOJs08ipkLd2bRv5zbkdovOXtjR+oCsL1zQT59t106Vhvg90cv7ljtBVd+v2O1O+k2Xzfk1d+y6xERRQmKiChKUEREUYIiIooSFBFRlKCIiKIERUQUJSgioihBERFFCYqIKEpQRETRqIJC0mJJPf13XJL0IknXSHpQ0v2SviHpZdVy64dY/1WS7pa0RtIGSZ+s2s+Q9DNJN7fyTUVEa43lpLAHbS9V41ZEN9C4qcpbASQtBXYHNg2z7leB023fV92mfW8A29dK+i3w4WbfQES0XzNnjx4JPGP70v4G22ugMfIYZp0XAr+ulu0D7h9NoepqPcsBZuy2oImuRkQrNHOMYj/Gfv/OFcBGSTdIOkca3QUSbK+0vcz2smavJRER41fLwUzbn6Jx789bgbcB36qjbkS0RjNB8WPgoLGuZPtB25fQuNP0AZJ2baJ2RHRAM0FxOzBb0rv7GyQdLOl1w60g6Y169n7se9G4c9HjTdSOiA4Y88HM6lbyJwOfk3Q+sAV4CPhgtcjekn45YJUPAacCKyQ9DWwDzqoOakbEJNDUNTNt/wo4fZiXZw7R9vVm6kTExDDaXY8+YEH/hKtWkXQGcDHwh1ZuNyJaa1QjCtubgEWtLm77WuDaVm83Ilor53pERFGCIiKKEhQRUZSgiIiiBEVEFCUoIqJo0tzNfL528SHTjulM8UnyPWq5P82674xbHvlRx2q//s+Wdqx2J93m6+6xvWxwe0YUEVGUoIiIogRFRBQlKCKiKEEREUUJiogoSlBERFGCIiKKEhQRUZSgiIiiBEVEFCUoIqJoQgSFpO91ug8RMbwJERS2X9PpPkTE8CZEUEj6Y6f7EBHDa+oGQHWRtBxYDjCHHTrcm4ipa0KMKIZje6XtZbaXzWR2p7sTMWVN6KCIiIkhQRERRQmKiCiaEEFhe16n+xARw5sQQRERE1uCIiKKEhQRUZSgiIiiBEVEFCUoIqIoQRERRQmKiChKUERE0YQ+zTwAqYO1O/t3pM/bO1o/npURRUQUJSgioihBERFFCYqIKEpQRERRgiIiihIUEVGUoIiIogRFRBQlKCKiKEEREUUJiogoGldQ5ObCEVNDRhQRUdSSoFDDBZLWS1on6Yyq/VpJxw9Y7gpJp0qaXi3/Q0lrJZ3Tin5ERHu0akRxCrAUOAA4BrhA0ouBa4D+0JgFHA18A3gX8H+2
DwYOBt4t6SUt6ktEtFirguIw4GrbfbZ/C3ybRgB8EzhK0mzgOOBO2z3AscDZktYAdwO7AnsN3qik5ZJWS1r9DFtb1NWIGKtWXeFqyMsw2d4iaRXwehoji6sHLP9+27eMtFHbK4GVAPO1i1vU14gYo1aNKO4EzqiOPSwEXgv8oHrtGuCdwOFAfzDcArxH0kwASS+TtGOL+hIRLdaqEcUNwKuB+wAD59n+TfXarcDXgJts91ZtlwOLgXslCXgUeHOL+hIRLTauoLA9r/rXwEeqr8HLPEPjGMTAtu3AP1RfETHBZR5FRBQlKCKiKEEREUUJiogoSlBERFGCIiKKEhQRUZSgiIiiBEVEFLVqCnfbado0ps2d25Ha7u0tL9SFpu20U0frfz8nDE8YGVFERFGCIiKKEhQRUZSgiIiiBEVEFCUoIqIoQRERRQmKiChKUEREUYIiIooSFBFRlKCIiKJag0LSuZLOrrNmRIxfbWePSpph+9K66kVE64wpKCQtBr5F48bCBwI/Bc4GPgy8CZgLfA84x7ar+45+DzgUuEnSTsAfbV8o6QPAucA24H7bb23JO4qIlmtm12NvYKXtJcATwHuBi2wfbHs/GmFxwoDld7b9Otv/Nmg75wMHVts5t4l+RERNmgmKTba/Wz2+EjgMOFLS3ZLWAUcBLx+w/LXDbGctcJWkt9MYVTyPpOWSVkta3estTXQ1IlqhmaDwEM8vBk6zvT9wGTBnwOtPDbOdNwJfAg4C7pH0vN0g2yttL7O9bJbmPG8DEVGPZoJiD0mvrh6fCXynerxZ0jzgtNIGJE0DFtm+AzgP2BmY10RfIqIGzXzqsQF4h6QvAw8AlwAvANYBDwE/HMU2pgNXSloACFhh+/Em+hIRNWgmKLbbHnzw8ePV13PYPmLQ808OeHpYE7UjogMyMzMiisY0orD9ELBfe7oSERNVRhQRUZSgiIiiBEVEFCUoIqIoQRERRQmKiChKUEREUYIiIopqu8LVeHn7drb39HSo+OATZqeG7U8+2dH6r5w9Nb/vE1FGFBFRlKCIiKIERUQUJSgioihBERFFCYqIKEpQRERRgiIiihIUEVGUoIiIogRFRBQlKCKiaNRBIWmxpB5Ja6rnH5P0Y0lrJa2RdEjVvkrSxqptjaTrqvZPSnqkalsv6cSq/UOSfiHpoja8v4hogbGePfqg7aXVLQVPAF5he6uk3YBZA5Y7y/bqIdZfYftCSfsCd0l6oe0Vkv4ALGvuLUREuzV7mvmLgc22twLY3jyWlW1vkLQN2A34XZN9iIiaNHuM4lZgkaSfSrpY0usGvX7VgF2PCwavXO2mbAceHamIpOWSVkta/Qxbm+xqRIxXUyMK23+UdBBwOHAkcK2k821fUS0y3K7HhyS9HXgSOMMe+YowtlcCKwHma5dcxSSiQ5q+wpXtPmAVsErSOuAdwBWF1VbYvrDZmhHRGU3tekjaW9JeA5qWAg+3pEcRMeE0O6KYB3xR0s7ANuBnwPIBr18lqf8Cl5ttH9N8FyOi05o9RnEP8JphXjtimPZPNlMrIjpvLLsefcCC/glXrSLpQ8DfA0+0crsR0TqjHlHY3gQsanUHbK8AVrR6uxHROjnXIyKKEhQRUZSgiIiiBEVEFCUoIqIoQRERRQmKiChq+qSwumnaNKbNnduR2tt7esoLdaHpL9q9o/Xv6JnT0frxrIwoIqIoQRERRQmKiChKUEREUYIiIooSFBFRlKCIiKIERUQUJSgioihBERFFCYqIKEpQRERRMSgkLZbU03/1bUl91T1F10v6uqQdqvYZkjZL+vSg9VdJ2ihpraSfSLqouh8IkuZW2+qt7ogeERPQaEcUD9peWj3usb3U9n5AL3Bu1X4ssBE4XZIGrX+W7SXAEmArcCOA7Z5qu79q/i1ERLuNd9fjLuCl1eMzgc8DvwBeNdTCtnuB84A9JB0wztoRUZOmg0LSDOA4YJ2kucDRwM3A1TRCY0jVzY3vA/YZRY3lklZL
Wt3rLc12NSLGqZmgmFsdr1hNY/TwFeAE4A7bTwPXAydLmj7CNgbvmgzJ9krby2wvm6VcxCSiU5q5wlXPgOMVAEg6EzhU0kNV067AkcBtg1euAmR/YEMTtSOiA8b98aik+cBhwB62F9teDLyPIXY/JM0EPg1ssr12vLUjoh6tmEdxCnC77a0D2m4ETpQ0u3p+laS1wHpgR+CkFtSNiJqMedfD9rxBz68ArhjU9hiwsHp6RHNdi4iJYjQjij5gQf+Eq1bqn3AFzAS2t3r7EdEaxRGF7U3AonYUt90DLG3HtiOidXKuR0QUJSgioihBERFFCYqIKEpQRERRgiIiihIUEVHUzElhnfO86+HUVXeK5umsmR0t/1jfvPJCUYsp+j8gIsYiQRERRQmKiChKUEREUYIiIooSFBFRlKCIiKIERUQUJSgioihBERFFCYqIKEpQRERRMSgkLZbU038Vbkl9ktZIWi/p65J2qNpnSNos6dOD1l8laaOktZJ+IukiSTtXr82tttUrabfWv72IaIXRjigeHHAbwR7bS23vB/QC51btxwIbgdOl553meZbtJcASYCuNGwRhu//2hL9q/i1ERLuNd9fjLuCl1eMzgc/TuHHxq4Za2HYvcB6wh6QDxlk7ImrSdFBImgEcB6yTNBc4GrgZuJoh7jvaz3YfcB+wzyhqLJe0WtLqXm9ptqsRMU7NBEX/3b1W0xg9fAU4AbjD9tPA9cDJ1V3LhzOqK9DYXml7me1lszSnia5GRCs0c4WrngHHKwCQdCZwqKSHqqZdgSOB2wavXAXI/sCGJmpHRAeM++NRSfOBw4A9bC+2vRh4H0PsfkiaCXwa2GR77XhrR0Q9WjGP4hTgdttbB7TdCJwoaXb1/CpJa4H1wI7ASS2oGxE1GfOuh+15g55fAVwxqO0xYGH19IjmuhYRE8VoRhR9wIL+CVet1D/hCpgJbG/19iOiNYojCtubgEXtKG67B1jajm1HROvkXI+IKEpQRERRgiIiihIUEVGUoIiIogRFRBQlKCKiSLY73YdRkfQo8PA4NrEbsLlF3Unt1O7W2nvaXji4cdIExXhJWm17WWqndmqPXXY9IqIoQRERRVMpKFamdmqndnOmzDGKiGjeVBpRRESTEhQRUZSgiIiiBEVEFCUoIqLo/wHN7xO0Gz12GwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_attention_head(in_tokens=tokens, translated_tokens=tokens, attention=head)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b603500-d79b-4b20-99e3-0996c8454d41",
   "metadata": {},
   "source": [
    "### Visualizing the weights of all attention heads"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "864f57d2-01d6-41b5-872c-402b44a9e7e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_attention_weights(sentence, translated_tokens, attention_heads):\n",
    "  # `sentence` already holds the token labels, so it is used as-is.\n",
    "  in_tokens = sentence\n",
    "\n",
    "  fig = plt.figure(figsize=(16, 8))\n",
    "\n",
    "  # One subplot per head; the 3 x 4 grid assumes 12 heads, as in bert-base.\n",
    "  for h, head in enumerate(attention_heads):\n",
    "    ax = fig.add_subplot(3, 4, h+1)\n",
    "\n",
    "    plot_attention_head(in_tokens, translated_tokens, head)\n",
    "\n",
    "    ax.set_xlabel(f'Head {h+1}')\n",
    "\n",
    "  plt.tight_layout()\n",
    "  plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "51b75032-bdf9-4b9f-a069-373bc7afc6fb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 1152x576 with 12 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "heads = attention11.detach().numpy()\n",
    "\n",
    "plot_attention_weights(sentence=tokens, translated_tokens=tokens, \n",
    "                      attention_heads=heads)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}