{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pragmatic color describers" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "__author__ = \"Christopher Potts\"\n", "__version__ = \"CS224u, Stanford, Spring 2020\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "1. [Overview](#Overview)\n", "1. [Set-up](#Set-up)\n", "1. [The corpus](#The-corpus)\n", " 1. [Corpus reader](#Corpus-reader)\n", " 1. [ColorsCorpusExample instances](#ColorsCorpusExample-instances)\n", " 1. [Displaying examples](#Displaying-examples)\n", " 1. [Color representations](#Color-representations)\n", " 1. [Utterance texts](#Utterance-texts)\n", " 1. [Far, Split, and Close conditions](#Far,-Split,-and-Close-conditions)\n", "1. [Toy problems for development work](#Toy-problems-for-development-work)\n", "1. [Core model](#Core-model)\n", " 1. [Toy dataset illustration](#Toy-dataset-illustration)\n", " 1. [Predicting sequences](#Predicting-sequences)\n", " 1. [Listener-based evaluation](#Listener-based-evaluation)\n", " 1. [Other prediction and evaluation methods](#Other-prediction-and-evaluation-methods)\n", " 1. [Cross-validation](#Cross-validation)\n", "1. [Baseline SCC model](#Baseline-SCC-model)\n", "1. [Modifying the core model](#Modifying-the-core-model)\n", " 1. [Illustration: LSTM Cells](#Illustration:-LSTM-Cells)\n", " 1. [Illustration: Deeper models](#Illustration:-Deeper-models)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "This notebook is part of our unit on grounding. It illustrates core concepts from the unit, and it provides useful background material for the associated homework and bake-off." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set-up" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from colors import ColorsCorpusReader\n", "import os\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "import torch\n", "from torch_color_describer import (\n", " ContextualColorDescriber, create_example_dataset)\n", "import utils\n", "from utils import START_SYMBOL, END_SYMBOL, UNK_SYMBOL" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "utils.fix_random_seeds()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [Stanford English Colors in Context corpus](https://cocolab.stanford.edu/datasets/colors.html) (SCC) is included in the data distribution for this course. If you store the data in a non-standard place, you'll need to update the following:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "COLORS_SRC_FILENAME = os.path.join(\n", " \"data\", \"colors\", \"filteredCorpus.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The corpus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SCC corpus is based in a two-player interactive game. The two players share a context consisting of three color patches, with the display order randomized between them so that they can't use positional information when communicating.\n", "\n", "The __speaker__ is privately assigned a target color and asked to produce a description of it that will enable the __listener__ to identify the speaker's target. 
The listener makes a choice based on the speaker's message, and the two succeed if and only if the listener identifies the target correctly.\n", "\n", "In the game, the two players played repeated reference games and could communicate with each other in a free-form way. This opens up the possibility of modeling these repeated interactions as task-oriented dialogues. However, for this unit, we'll ignore most of this structure. We'll treat the corpus as a bunch of independent reference games played by anonymous players, and we will ignore the listener and their choices entirely.\n", "\n", "For the bake-off, we will be distributing a separate test set. Thus, all of the data in the SCC can be used for exploration and development." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Corpus reader" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The corpus reader class is `ColorsCorpusReader` in `colors.py`. The reader's primary function is to let you iterate over corpus examples:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "corpus = ColorsCorpusReader(\n", " COLORS_SRC_FILENAME,\n", " word_count=None, \n", " normalize_colors=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two keyword arguments have their default values here. \n", "\n", "* If you supply `word_count` with an integer value, it will restrict the corpus to just examples where the utterance has that number of words (using a whitespace heuristic). This creates smaller corpora that are useful for development.\n", "\n", "* The colors in the corpus are in [HLS format](https://en.wikipedia.org/wiki/HSL_and_HSV). With `normalize_colors=False`, the first (hue) value is an integer between 1 and 360 inclusive, and the L (lightness) and S (saturation) values are between 1 and 100 inclusive. With `normalize_colors=True`, these values are all scaled to between 0 and 1 inclusive. The default is `normalize_colors=True` because this is a better choice for all the machine learning models we'll consider." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "examples = list(corpus.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can verify that we read in the same number of examples as reported in [Monroe et al. 2017](https://transacl.org/ojs/index.php/tacl/article/view/1142):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "46994" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Should be 46994:\n", "\n", "len(examples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ColorsCorpusExample instances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The examples are `ColorsCorpusExample` instances:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "ex1 = next(corpus.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These objects have a lot of attributes and methods designed to help you study the corpus and use it for our machine learning tasks. Let's review some highlights." 
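, "\n", "\n", "As a quick preview, here is a minimal sketch touching three of the attributes discussed in the subsections below (`contents`, `condition`, and `colors`):\n", "\n", "```python\n", "print(ex1.contents)   # the utterance the speaker produced\n", "print(ex1.condition)  # 'far', 'split', or 'close'\n", "print(ex1.colors)     # three [H, L, S] triples; the target is listed last\n", "```"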
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Displaying examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see what the speaker saw, with the utterance they chose wote above the patches:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The darker blue one\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAALUAAABECAYAAADHnXQVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAABLUlEQVR4nO3YMUrEUBRA0XyZSiutnC24EjvXajcrcQtOpZW2315kVMgQ5nJOmxTvweURMuacC5RcbT0ArE3U5IiaHFGTI2pydqcejjEu/tfInHP85b3nu/eL33VZluXp7fbXfR9uPhO7vnxc/7irS02OqMkRNTknv6m/u398Pdccqzke9luPwMZcanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9Tk7P7z8vGwP9ccsBqXmhxRkyNqcsacc+sZYFUuNTmiJkfU5IiaHFGTI2pyvgBwhhdAIEFGnQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ex1.display(typ='speaker')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the original order of patches for the speaker. The target happens to the be the leftmost patch, as indicated by the black box around it.\n", "\n", "Here's what the listener saw, with the speaker's message printed above the patches:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The darker blue one\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAALUAAABECAYAAADHnXQVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAABFUlEQVR4nO3YsW1CMRRAUX+UCipShRXYhCqzpsomrJBUSRVaswAiFEiIq3Nau3hPunLhZc45oGT16AHg3kRNjqjJETU5oibn5drhfnN6+q+R4996ueXe7vD99LuOMcbX59u/+368/iZ2ff/ZXtzVS02OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IianGXO+egZ4K681OSImhxRkyNqckRNjqjJOQNHYRKDRd/3AwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ex1.display(typ='listener')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The listener isn't shown the target, of course, so no patches are highlighted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If `display` is called with no arguments, then the target is placed in the final position and the other two are given in an order determined by the corpus metadata:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The darker blue one\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAALUAAABECAYAAADHnXQVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAABLElEQVR4nO3YsU3DUBRAUX+UCiqoyApMQsesdJmEFUgFFbSfBVBwYcnK5ZzWLt6Trp4sjznnAiU3ew8AWxM1OaImR9TkiJqcw6WHrw+fV/9r5OXjfqx57+nu++p3XZZlefu6/XPfMUZi1znnr7u61OSImhxRk3Pxm5r/4fH5fe8RVjmfjqvec6nJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRk3PYewD2dz4d9x5hUy41OaImR9TkjDnn3jPAplxqckRNjqjJETU5oiZH1OT8AK1HF0DPcEkgAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ex1.display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the representation order we use for our machine learning models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Color representations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For machine learning, we'll often need to access the color representations directly. The primary attribute for this is `colors`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[0.7861111111111111, 0.5, 0.87],\n", " [0.6888888888888889, 0.5, 0.92],\n", " [0.6277777777777778, 0.5, 0.81]]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex1.colors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this display order, the third element is the target color and the first two are the distractors. The attributes `speaker_context` and `listener_context` return the same colors but in the order that those players saw them. For example:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[0.6277777777777778, 0.5, 0.81],\n", " [0.7861111111111111, 0.5, 0.87],\n", " [0.6888888888888889, 0.5, 0.92]]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex1.speaker_context" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Utterance texts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Utterances are just strings: " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'The darker blue one'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex1.contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are cases where the speaker made a sequences of utterances for the same trial. We follow [Monroe et al. 2017](https://transacl.org/ojs/index.php/tacl/article/view/1142) in concatenating these into a single utterances. To preserve the original information, the individual turns are separated by `\" ### \"`. 
Example 3 is the first with this property – let's check it out:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "ex3 = examples[2]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Medium pink ### the medium dark one'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex3.contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method `parse_turns` will parse this into individual turns:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Medium pink', 'the medium dark one']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex3.parse_turns()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For examples consisting of a single turn, `parse_turns` returns a list of length 1:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['The darker blue one']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex1.parse_turns()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Far, Split, and Close conditions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SCC contains three conditions:\n", " \n", "__Far condition__: All three colors are far apart in color space. Example:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Condition type: far\n", "purple\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAALUAAABECAYAAADHnXQVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAABLUlEQVR4nO3YwUnEUBRA0XyZbrQE3Qp24SytaJZOF4JbLUHr+TYgYxaBMNdztsniPbg8Qsacc4GSm70HgK2JmhxRkyNqckRNzuHSw7vvx6v/NfJ1+z7WvPf59nD1uy7Lstw/ffy57xgjseuc89ddXWpyRE2OqMm5+E3N//Dy/Lr3CKuczsdV77nU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMkRNTmiJkfU5IiaHFGTI2pyRE2OqMk57D0A+zudj3uPsCmXmhxRkyNqcsacc+8ZYFMuNTmiJkfU5IiaHFGTI2pyfgAdJBcf7IJsUgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "print(\"Condition type:\", examples[1].condition)\n", "\n", "examples[1].display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Split condition__: The target is close to one of the distractors, and the other is far away from both of them. Example:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Condition type: split\n", "lime\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAALUAAABECAYAAADHnXQVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAABKklEQVR4nO3YQUrDUBRA0XzpbuyO7FTXUHAsuAad1h3Z9Xw3IDWDQOj1nGkyeA8uj5Ax51yg5GHvAWBroiZH1OSImhxRk3O49fDt8+Xuf428Pn+MNe9dTl93v+uyLMvp8vTnvmOMxK5zzl93danJETU5oibn5jc1/8P79+PeI6xyPl5XvedSkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oiZH1OSImhxRkyNqckRNjqjJETU5oibnsPcA7O98vO49wqZcanJETY6oyRlzzr1ngE251OSImhxRkyNqckRNjqjJ+QHLEhcAkintbgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "print(\"Condition type:\", examples[3].condition)\n", "\n", "examples[3].display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Close condition__: The target is similar to both distractors. Example:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Condition type: close\n", "Medium pink ### the medium dark one\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAALUAAABECAYAAADHnXQVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAABK0lEQVR4nO3YwUnEUBRA0XyZCnRlH4rDtGC9tiAO2ocrbeHbgIxZBMJcz9kmi/fg8ggZc84FSm72HgC2JmpyRE2OqMkRNTmHSw+/7z6v/tfI7df9WPPey+PH1e+6LMvy/P7w575jjMSuc85fd3WpyRE1OaIm5+I3Nf/D29Pr3iOscjyfVr3nUpMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaImR9TkiJocUZMjanJETY6oyRE1OaIm57D3AOzveD7tPcKmXGpyRE2OqMkZc869Z4BNudTkiJocUZMjanJETY6oyfkBPhUWwkgMDc4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "print(\"Condition type:\", examples[2].condition)\n", "\n", "examples[2].display()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These conditions go from easiest to hardest when it comes to reliable communication. In the __Far__ condition, the context is hardly relevant, whereas the nature of the distractors reliably shapes the speaker's choices in the other two conditions. \n", "\n", "You can begin to see how this affects speaker choices in the above examples: \"purple\" suffices for the __Far__ condition, a more marked single word (\"lime\") suffices in the __Split__ condition, and the __Close__ condition triggers a pretty long, complex description." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `condition` attribute provides access to this value: " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'close'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ex1.condition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following verifies that we have the same number of examples per condition as reported in [Monroe et al. 2017](https://transacl.org/ojs/index.php/tacl/article/view/1142):" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "far 15782\n", "split 15693\n", "close 15519\n", "dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series([ex.condition for ex in examples]).value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toy problems for development work" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SCC corpus is fairly large and quite challenging as an NLU task. This means it isn't ideal when it comes to testing hypotheses and debugging code. 
Poor performance could trace to a mistake, but it could just as easily trace to the fact that the problem is very challenging from the point of view of optimization.\n", "\n", "To address this, the module `torch_color_describer.py` includes a function `create_example_dataset` for creating small, easy datasets with the same basic properties as the SCC corpus.\n", "\n", "Here's a toy problem containing just six examples:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "tiny_contexts, tiny_words, tiny_vocab = create_example_dataset(\n", " group_size=2, vec_dim=2)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['<s>', '</s>', 'A', 'B', '$UNK']" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tiny_vocab" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['<s>', 'A', '</s>'],\n", " ['<s>', 'A', '</s>'],\n", " ['<s>', 'A', 'B', '</s>'],\n", " ['<s>', 'A', 'B', '</s>'],\n", " ['<s>', 'B', 'A', 'B', 'A', '</s>'],\n", " ['<s>', 'B', 'A', 'B', 'A', '</s>']]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tiny_words" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[array([0.84464215, 0.94729424]),\n", " array([0.5353399 , 0.57843591]),\n", " array([0.00500215, 0.05500586])],\n", " [array([0.80595944, 0.84372759]),\n", " array([0.50107106, 0.40530719]),\n", " array([0.01738777, 0.08438436])],\n", " [array([0.88390396, 0.88984181]),\n", " array([0.05563814, 0.17386006]),\n", " array([0.54320392, 0.54026499])],\n", " [array([0.88452288, 0.85557427]),\n", " array([0.04306275, 0.15269883]),\n", " array([0.55176147, 0.43193186])],\n", " [array([0.56949887, 0.52074521]),\n", " array([0.16142565, 0.14594636]),\n", " array([0.81854917, 0.81934328])],\n", " [array([0.47570688, 0.51040813]),\n", " array([0.16588093, 0.12370395]),\n", " array([0.90724562, 0.99462315])]]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tiny_contexts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each member of `tiny_contexts` contains three vectors. The final (target) vector always has values in a range that determines the corresponding word sequence, which is drawn from a set of three fixed sequences. Thus, the model basically just needs to learn to ignore the distractors and find the association between the target vector and the corresponding sequence. \n", "\n", "All the models we study have the capacity to solve this task with very little data, so you should see perfect or near-perfect performance on reasonably-sized versions of this task." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Core model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our core model for this problem is implemented in `torch_color_describer.py` as `ContextualColorDescriber`. At its heart, this is a pretty standard encoder–decoder model:\n", "\n", "* `Encoder`: Processes the color contexts as a sequence. 
We always place the target in final position so that it is closest to the supervision signals that we get when decoding.\n", "\n", "* `Decoder`: A neural language model whose initial hidden representation is the final hidden representation of the `Encoder`.\n", "\n", "* `EncoderDecoder`: Coordinates the operations of the `Encoder` and `Decoder`.\n", "\n", "Finally, `ContextualColorDescriber` is a wrapper around these model components. It handles the details of training and implements the prediction and evaluation functions that we will use.\n", "\n", "Many additional details about this model are included in the slides for this unit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Toy dataset illustration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To highlight the core functionality of `ContextualColorDescriber`, let's create a small toy dataset and use it to train and evaluate a model:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "toy_color_seqs, toy_word_seqs, toy_vocab = create_example_dataset(\n", " group_size=50, vec_dim=2)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "toy_color_seqs_train, toy_color_seqs_test, toy_word_seqs_train, toy_word_seqs_test = \\\n", " train_test_split(toy_color_seqs, toy_word_seqs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we expose all of the available parameters with their default values:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "toy_mod = ContextualColorDescriber(\n", " toy_vocab, \n", " embedding=None, # Option to supply a pretrained matrix as an `np.array`.\n", " embed_dim=10, \n", " hidden_dim=10, \n", " max_iter=100, \n", " eta=0.01,\n", " optimizer=torch.optim.Adam,\n", " batch_size=128,\n", " l2_strength=0.0,\n", " warm_start=False,\n", " device=None)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Epoch 100; err = 0.13451486825942993" ] } ], "source": [ "_ = toy_mod.fit(toy_color_seqs_train, toy_word_seqs_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predicting sequences" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `predict` method takes a list of color contexts as input and returns model descriptions:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "toy_preds = toy_mod.predict(toy_color_seqs_test)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['<s>', 'A', 'B', '</s>']" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "toy_preds[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then check that we predicted all correct sequences:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "toy_correct = sum(1 for x, p in zip(toy_word_seqs_test, toy_preds) if x == p)\n", "\n", "toy_correct / len(toy_word_seqs_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For real problems, this is too stringent a requirement, since there are generally many equally good descriptions. 
This insight gives rise to metrics like [BLEU](https://en.wikipedia.org/wiki/BLEU), [METEOR](https://en.wikipedia.org/wiki/METEOR), [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)), [CIDEr](https://arxiv.org/pdf/1411.5726.pdf), and others, which seek to relax the requirement of an exact match with the test sequence. These are reasonable options to explore, but we will instead adopt a communication-based evaluation, as discussed in the next section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Listener-based evaluation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ContextualColorDescriber` implements a method `listener_accuracy` that we will use for our primary evaluations in the assignment and bake-off. The essence of the method is that we can calculate\n", "\n", "$$\n", "c^{*} = \\text{argmax}_{c \\in C} P_S(\\text{utterance} \\mid c)\n", "$$\n", "\n", "\n", "where $P_S$ is our describer model and $C$ is the set of all permutations of all three colors in the color context. We take $c^{*}$ to be a correct prediction if it is one where the target is in the privileged final position. (There are two such contexts; we try both in case the order of the distractors influences the predictions, and the model is correct if one of them has the highest probability.)\n", "\n", "Here's the listener accuracy of our toy model:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "toy_mod.listener_accuracy(toy_color_seqs_test, toy_word_seqs_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other prediction and evaluation methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can get the perplexities for test examples with `perplexities`:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "toy_perp = toy_mod.perplexities(toy_color_seqs_test, toy_word_seqs_test)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.018597919229854" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "toy_perp[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use `predict_proba` to see the full probability distributions assigned to test examples:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "toy_proba = toy_mod.predict_proba(toy_color_seqs_test, toy_word_seqs_test)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 5)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "toy_proba[0].shape" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'<s>': 1.0, '</s>': 0.0, 'A': 0.0, 'B': 0.0, '$UNK': 0.0}\n", "{'<s>': 0.0036859103, '</s>': 0.0002668097, 'A': 0.9854643, 'B': 0.00914348, '$UNK': 0.0014396048}\n", "{'<s>': 0.004782134, '</s>': 0.024507374, 'A': 0.0019362223, 'B': 0.96381474, '$UNK': 0.0049594548}\n", "{'<s>': 0.0050890064, '</s>': 0.9780351, 'A': 0.014443797, 'B': 0.0008280464, '$UNK': 0.0016041624}\n" ] } ], "source": [ "for timestep in toy_proba[0]:\n", " print(dict(zip(toy_vocab, timestep)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cross-validation" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "You can use `utils.fit_classifier_with_crossvalidation` to cross-validate these models. Just be sure to set `scoring=None` so that the sklearn model selection methods use the `score` method of `ContextualColorDescriber`, which is an alias for `listener_accuracy`:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Epoch 100; err = 0.12754583358764648" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Best params: {'hidden_dim': 20}\n", "Best score: 0.982\n" ] } ], "source": [ "best_mod = utils.fit_classifier_with_crossvalidation(\n", " toy_color_seqs_train, \n", " toy_word_seqs_train, \n", " toy_mod, \n", " cv=2,\n", " scoring=None,\n", " param_grid={'hidden_dim': [10, 20]})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baseline SCC model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just to show how all the pieces come together, here's a very basic SCC experiment using the core code and very simplistic assumptions (which you will revisit in the assignment) about how to represent the examples:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To facilitate quick development, we'll restrict attention to the two-word examples:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "dev_corpus = ColorsCorpusReader(COLORS_SRC_FILENAME, word_count=2)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "dev_examples = list(dev_corpus.read())" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13890" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dev_examples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we extract the raw colors and texts (as strings):" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "dev_cols, dev_texts = zip(*[[ex.colors, ex.contents] for ex in dev_examples])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To tokenize the examples, we'll just split on whitespace, taking care to add the required boundary symbols:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "dev_word_seqs = [[START_SYMBOL] + text.split() + [END_SYMBOL] for text in dev_texts]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use a random train–test split:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "dev_cols_train, dev_cols_test, dev_word_seqs_train, dev_word_seqs_test = \\\n", " train_test_split(dev_cols, dev_word_seqs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our vocab is determined by the train set, and we take care to include the `$UNK` token:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "dev_vocab = sorted({w for toks in dev_word_seqs_train for w in toks}) + [UNK_SYMBOL]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now we're ready to train a model:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "dev_mod = ContextualColorDescriber(\n", " dev_vocab, \n", " embed_dim=10, \n", " hidden_dim=10, \n", " max_iter=10, \n", " batch_size=128)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { 
"name": "stderr", "output_type": "stream", "text": [ "Epoch 10; err = 101.7589635848999" ] } ], "source": [ "_ = dev_mod.fit(dev_cols_train, dev_word_seqs_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally an evaluation in terms of listener accuracy:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5384393895767348" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dev_mod.listener_accuracy(dev_cols_test, dev_word_seqs_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modifying the core model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first few assignment problems concern how you preprocess the data for your model. After that, the goal is to subclass model components in `torch_color_describer.py`. For the bake-off submission, you can do whatever you like in terms of modeling, but my hope is that you'll be able to continue subclassing based on `torch_color_describer.py`.\n", "\n", "This section provides some illustrative examples designed to give you a feel for how the code is structured and what your options are in terms of creating subclasses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Illustration: LSTM Cells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both the `Encoder` and the `Decoder` of `torch_color_describer` are currently GRU cells. Switching to another cell type is easy:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Step 1__: Subclass the `Encoder`; all we have to do here is change `GRU` from the original to `LSTM`:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "import torch.nn as nn\n", "from torch_color_describer import Encoder\n", "\n", "class LSTMEncoder(Encoder):\n", " def __init__(self, color_dim, hidden_dim):\n", " super().__init__(color_dim, hidden_dim) \n", " self.rnn = nn.LSTM(\n", " input_size=self.color_dim,\n", " hidden_size=self.hidden_dim,\n", " batch_first=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Step 2__: Subclass the `Decoder`, making the same simple change as above:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "import torch.nn as nn\n", "from torch_color_describer import Encoder, Decoder\n", "\n", "class LSTMDecoder(Decoder):\n", " def __init__(self, *args, **kwargs):\n", " super().__init__(*args, **kwargs) \n", " self.rnn = nn.LSTM(\n", " input_size=self.embed_dim,\n", " hidden_size=self.hidden_dim,\n", " batch_first=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Step 3__:`ContextualColorDescriber` has a method called `build_graph` that sets up the `Encoder` and `Decoder`. 
The needed revision just uses `LSTMEncoder` and `LSTMDecoder`:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "from torch_color_describer import EncoderDecoder\n", "\n", "class LSTMContextualColorDescriber(ContextualColorDescriber): \n", " \n", " def build_graph(self):\n", " \n", " # Use the new Encoder:\n", " encoder = LSTMEncoder(\n", " color_dim=self.color_dim,\n", " hidden_dim=self.hidden_dim)\n", "\n", " # Use the new Decoder:\n", " decoder = LSTMDecoder(\n", " vocab_size=self.vocab_size,\n", " embed_dim=self.embed_dim,\n", " embedding=self.embedding,\n", " hidden_dim=self.hidden_dim)\n", "\n", " return EncoderDecoder(encoder, decoder)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example run:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "lstm_mod = LSTMContextualColorDescriber(\n", " toy_vocab, \n", " embed_dim=10, \n", " hidden_dim=10, \n", " max_iter=100, \n", " batch_size=128)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Epoch 100; err = 0.14593948423862457" ] } ], "source": [ "_ = lstm_mod.fit(toy_color_seqs_train, toy_word_seqs_train) " ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lstm_mod.listener_accuracy(toy_color_seqs_test, toy_word_seqs_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Illustration: Deeper models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Encoder` and `Decoder` are both currently hard-coded to have just one hidden layer. It is straightforward to make them deeper as long as we ensure that both the `Encoder` and `Decoder` have the same depth; since the `Encoder` final states are the initial hidden states for the `Decoder`, we need this alignment. \n", "\n", "(Strictly speaking, we could have different numbers of `Encoder` and `Decoder` layers, as long as we did some kind of averaging or copying to achieve the hand-off from `Encoder` to `Decoder`. I'll set this possibility aside.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Step 1__: We need to subclass the `Encoder` and `Decoder` so that they have a `num_layers` argument that is fed into the RNN cell:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "import torch.nn as nn\n", "from torch_color_describer import Encoder, Decoder\n", "\n", "class DeepEncoder(Encoder):\n", " def __init__(self, *args, num_layers=2, **kwargs):\n", " super().__init__(*args, **kwargs)\n", " self.num_layers = num_layers\n", " self.rnn = nn.GRU(\n", " input_size=self.color_dim,\n", " hidden_size=self.hidden_dim,\n", " num_layers=self.num_layers,\n", " batch_first=True) \n", "\n", "\n", "class DeepDecoder(Decoder):\n", " def __init__(self, *args, num_layers=2, **kwargs):\n", " super().__init__(*args, **kwargs) \n", " self.num_layers = num_layers\n", " self.rnn = nn.GRU(\n", " input_size=self.embed_dim,\n", " hidden_size=self.hidden_dim,\n", " num_layers=self.num_layers,\n", " batch_first=True) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Step 2__: As before, we need to update the `build_graph` method of `ContextualColorDescriber`. The needed revision just uses `DeepEncoder` and `DeepDecoder`. 
To expose this new argument to the user, we also add a new keyword argument to `ContextualColorDescriber`:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "from torch_color_describer import EncoderDecoder\n", "\n", "class DeepContextualColorDescriber(ContextualColorDescriber): \n", " def __init__(self, *args, num_layers=2, **kwargs):\n", " self.num_layers = num_layers\n", " super().__init__(*args, **kwargs)\n", " \n", " def build_graph(self):\n", " encoder = DeepEncoder(\n", " color_dim=self.color_dim,\n", " hidden_dim=self.hidden_dim,\n", " num_layers=self.num_layers) # The new piece is this argument.\n", "\n", " decoder = DeepDecoder(\n", " vocab_size=self.vocab_size,\n", " embed_dim=self.embed_dim,\n", " embedding=self.embedding,\n", " hidden_dim=self.hidden_dim,\n", " num_layers=self.num_layers) # The new piece is this argument.\n", "\n", " return EncoderDecoder(encoder, decoder)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example/test run:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "mod_deep = DeepContextualColorDescriber(\n", " toy_vocab, \n", " embed_dim=10, \n", " hidden_dim=10, \n", " max_iter=100,\n", " batch_size=128)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Epoch 100; err = 0.10894003510475159" ] } ], "source": [ "_ = mod_deep.fit(toy_color_seqs_train, toy_word_seqs_train) " ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mod_deep.listener_accuracy(toy_color_seqs_test, toy_word_seqs_test)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }