{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 4" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "__author__ = \"Christopher Potts\"\n", "__version__ = \"CS224u, Stanford, Spring 2018 term\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "0. [Data and background](#Data-and-background)\n", "0. [Question 1: Experiment function [4 points]](#Question-1:-Experiment-function-[4-points])\n", "0. [Question 2: Memorize the training data [2 points]](#Question-2:-Memorize-the-training-data-[2-points])\n", "0. [Question 3: Negation [2 points]](#Question-3:-Negation-[2-points])\n", "0. [Question 4: Negation and generalization [2 points]](#Question-4:-Negation-and-generalization-[2-points])\n", "0. [Further reading](#Further-reading)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this homework is to begin to assess the extent to which RNNs can learn to simulate __compositional semantics__: the way the meanings of words and phrases combine to form more complex meanings. We're going to do this with simulated data so that we have clear learning targets and so we can track the extent to which the models are truly generalizing in the desired ways." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Applications/anaconda/envs/nlu/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n", " from ._conv import register_converters as _register_converters\n" ] } ], "source": [ "import json\n", "import nli\n", "import os\n", "from sklearn.metrics import classification_report\n", "from sklearn.model_selection import train_test_split\n", "import tensorflow as tf\n", "from tf_rnn_classifier import TfRNNClassifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data and background\n", "\n", "The __base__ dataset is `nli_simulated_data.json` in `nlidata`. 
(You'll see below why it's the \"base\" dataset.)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "data_home = \"nlidata\"\n", "\n", "base_data_filename = os.path.join(data_home, 'nli_simulated_data.json')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def read_base_dataset(base_data_filename):\n", "    \"\"\"Read in the dataset and return it in a format that lets us\n", "    represent it as a set.\n", "    \"\"\"\n", "    with open(base_data_filename, 'rt') as f:\n", "        base = {((tuple(x), tuple(y)), z) for (x, y), z in json.load(f)}\n", "    return base" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "base = read_base_dataset(base_data_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a set of pairs in which the first member is itself a pair of tuples (premise and hypothesis) and the second member is a label:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[((('f',), ('n',)), 'superset'),\n", " ((('f',), ('l',)), 'neutral'),\n", " ((('g',), ('b',)), 'subset'),\n", " ((('i',), ('d',)), 'neutral'),\n", " ((('g',), ('c',)), 'subset')]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(base)[: 5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The letters are arbitrary names, but the dataset was generated in a way that ensures logical consistency. For instance, since" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "((('a',), ('c',)), 'superset') in base" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "((('c',), ('k',)), 'superset') in base" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we have" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "((('a',), ('k',)), 'superset') in base" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "by the transitivity of `superset`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the full label set:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "simulated_labels = ['disjoint', 'equal', 'neutral', 'subset', 'superset']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These labels are mutually exclusive. In particular, __subset__ is proper subset and __superset__ is proper superset – both exclude the case where the two arguments are __equal__." ] },
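{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check that all five labels actually occur in `base`, we can count them (a small sketch using only the objects defined above):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "\n", "# Distribution of labels over the base dataset:\n", "Counter(label for _, label in base)" ] },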
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the full vocabulary, which you'll need in order to create embedding spaces:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['not',\n", " '$UNK',\n", " 'a',\n", " 'b',\n", " 'c',\n", " 'd',\n", " 'e',\n", " 'f',\n", " 'g',\n", " 'h',\n", " 'i',\n", " 'j',\n", " 'k',\n", " 'l',\n", " 'm',\n", " 'n']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sim_vocab = [\"not\", \"$UNK\"] + sorted(set([p[0] for x,y in base for p in x]))\n", "\n", "sim_vocab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1: Experiment function [4 points]\n", "\n", "Complete the function `sim_experiment` so that it trains a `TfRNNClassifier` on a dataset in the format of `base`, prints out a `classification_report` and returns the trained model. Make sure all of the keyword arguments to `sim_experiment` are respected!\n", "\n", "__To submit:__\n", "\n", "* Your completed version of `sim_experiment` and any supporting functions it uses." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def sim_experiment(\n", " train_dataset, \n", " test_dataset, \n", " embed_dim=50, \n", " hidden_dim=50, \n", " eta=0.01, \n", " max_iter=10, \n", " cell_class=tf.nn.rnn_cell.LSTMCell, \n", " hidden_activation=tf.nn.tanh): \n", " # To be completed: \n", " \n", " # Process `train_dataset` into an (X, y) pair\n", " # that is suitable for the `fit` methd of \n", " # `TfRNNClassifier`. \n", " \n", " # Train a `TfRNNClassifier` on `train_dataset`,\n", " # using all the keyword arguments given above.\n", " \n", " # Test the trained model on `test_dataset`;\n", " # assumes `test_dataset` is processed for use\n", " # with `predict` and the `classification_report`\n", " # below.\n", " \n", " # Specified printing and return value, feel free\n", " # to change the variable names if you wish:\n", " print(classification_report(y_test, predictions))\n", " return model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2: Memorize the training data [2 points]\n", "\n", "Experiment with `sim_experiment` until you've found a setting where `sim_experiment(base, base)` yields perfect performance on all classes. (If it's a little off, that's okay.)\n", "\n", "__To submit__: \n", "\n", "* Your function call to `sim_experiment` showing the values of all the parameters.\n", "\n", "__Tips__: Definitely explore different values of `cell_class` and `hidden_activation`. You might also pick high `embed_dim` and `hidden_dim` to ensure that you have sufficient representational power. These settings in turn demand a large number of iterations.\n", "\n", "__Note__: There is value in finding the smallest, or most conservative, models that will achieve this memorization, but you needn't engage in such search. Go big if you want to get this done fast!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3: Negation [2 points]\n", "\n", "Now that we have some indication that the model works, we want to start making the data more complex. To do this, we'll simply negate one or both arguments and assign them the relation determined by their original label and the logic of negation. 
For instance, the training instance\n", "\n", "```\n", "((('p',), ('q',)), 'subset')\n", "```\n", "\n", "will yield five distinct instances:\n", "\n", "```\n", "((('not', 'p'), ('not', 'p')), 'equal')\n", "((('not', 'p'), ('not', 'q')), 'superset')\n", "((('not', 'p'), ('q',)), 'neutral')\n", "((('not', 'q'), ('not', 'q')), 'equal')\n", "((('p',), ('not', 'q')), 'disjoint')\n", "```\n", "\n", "The full logic of this is a somewhat liberal interpretation of the theory of negation developed by [MacCartney and Manning 2007](http://nlp.stanford.edu/~wcmac/papers/natlog-wtep07.pdf):\n", "\n", "\n", "$$\\begin{array}{l c c c}\n", "\\hline \n", " & \\text{not-}p, \\text{not-}q & p, \\text{not-}q & \\text{not-}p, q \\\\\n", "\\hline \n", "p \\text{ disjoint } q & \\text{neutral} & \\text{subset} & \\text{superset} \\\\\n", "p \\text{ equal } q & \\text{equal} & \\text{disjoint} & \\text{disjoint} \\\\\n", "p \\text{ neutral } q & \\text{neutral} & \\text{neutral} & \\text{neutral} \\\\\n", "p \\text{ subset } q & \\text{superset} & \\text{disjoint} & \\text{neutral} \\\\\n", "p \\text{ superset } q & \\text{subset} & \\text{neutral} & \\text{disjoint} \\\\\n", "\\hline\n", "\\end{array}$$ \n", "\n", "where we also add all instances of $\\text{not-}p \\text{ equal } \\text{not-}p$.\n", "\n", "If you don't want to worry about the details, that's okay – you can treat `negate_dataset` as a black box. Just think of it as implementing the theory of negation." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def negate_dataset(dataset):\n", "    \"\"\"Map `dataset` to a new dataset that has been thoroughly negated.\n", "\n", "    Parameters\n", "    ----------\n", "    dataset : set of pairs ((p, h), label)\n", "        Where `p` and `h` are tuples of str.\n", "\n", "    Returns\n", "    -------\n", "    set\n", "        Same format as `dataset`, and disjoint from it.\n", "\n", "    \"\"\"\n", "    new_dataset = set()\n", "    for (p, q), rel in dataset:\n", "        neg_p = tuple([\"not\"] + list(p))\n", "        neg_q = tuple([\"not\"] + list(q))\n", "        new_dataset.add(((neg_p, neg_p), 'equal'))\n", "        new_dataset.add(((neg_q, neg_q), 'equal'))\n", "        combos = [(neg_p, neg_q), (p, neg_q), (neg_p, q)]\n", "        if rel == \"disjoint\":\n", "            new_rels = (\"neutral\", \"subset\", \"superset\")\n", "        elif rel == \"equal\":\n", "            new_rels = (\"equal\", \"disjoint\", \"disjoint\")\n", "        elif rel == \"neutral\":\n", "            new_rels = (\"neutral\", \"neutral\", \"neutral\")\n", "        elif rel == \"subset\":\n", "            new_rels = (\"superset\", \"disjoint\", \"neutral\")\n", "        elif rel == \"superset\":\n", "            new_rels = (\"subset\", \"neutral\", \"disjoint\")\n", "        new_dataset |= set(zip(combos, new_rels))\n", "    return new_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `negate_dataset`, we can map the `base` dataset to a singly negated one:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "neg1 = negate_dataset(base)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[((('n',), ('not', 'n')), 'disjoint'),\n", " ((('e',), ('not', 'l')), 'neutral'),\n", " ((('not', 'n'), ('n',)), 'disjoint'),\n", " ((('not', 'i'), ('g',)), 'superset'),\n", " ((('not', 'd'), ('d',)), 'disjoint')]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(neg1)[: 5]" ] },
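{ "cell_type": "markdown", "metadata": {}, "source": [ "Per its docstring, `negate_dataset` returns a set disjoint from its input. As a quick sanity check (a sketch using only the objects defined above), we can confirm that and see how much data negation adds:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The intersection should be empty, per the docstring:\n", "print(len(base), len(neg1), len(base & neg1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Your tasks:__\n", " \n", "1. 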
Create a dataset that is the union of `base`, `neg1`, and a doubly negated version of `base`, where doubly negating `x` is achieved by `negate_dataset(negate_dataset(x))`.\n", "\n", "2. Use [sklearn.model_selection.train_test_split](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to create a random split of this new dataset, with 0.70 of the data used for training and the rest used for testing.\n", "\n", "3. Use `sim_experiment` to evaluate your network on this split, and play around with the keyword arguments until you have an average F1-score at or above 0.55.\n", "\n", "__To submit:__\n", "\n", "* Your function call to `sim_experiment` showing the values of all the parameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4: Negation and generalization [2 points]\n", "\n", "So you got reasonably good results in the previous question. Has your model truly learned negation? To really address this question, we should see how it does on sequences of a length it hasn't seen before.\n", "\n", "__Your task__: \n", "\n", "Use your `sim_experiment` to train a network on the union of `base` and `neg1`, and evaluate it on the doubly negated dataset. By design, this means that your model will be evaluated on examples that are longer than those it was trained on. Use all the same keyword arguments to `sim_experiment` that you used for the previous question.\n", "\n", "__To submit__: \n", "\n", "* The printed classification report from your run (you can just paste it in).\n", "\n", "__A note on performance__: our mean F1 dropped a lot, and we expect it to drop for you too. You will not be evaluated based on the numbers you achieve, but rather only on whether you successfully run the required experiment.\n", "\n", "(If you did really well, go a step further by testing on the triply negated version!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further reading\n", "\n", "* MacCartney and Manning (2007), [Natural Logic for Textual Inference](http://nlp.stanford.edu/~wcmac/papers/natlog-wtep07.pdf)\n", "\n", "* Bowman et al. (2015), [Tree-structured composition in neural networks without tree-structured architectures](https://arxiv.org/abs/1506.04834)\n", "\n", "* Lake and Baroni (2017), [Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks](https://arxiv.org/pdf/1711.00350.pdf)\n", "\n", "* Evans et al. (2018), [Can neural networks understand logical entailment?](https://arxiv.org/abs/1802.08535)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }