{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Natural language inference" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "__author__ = \"Christopher Potts\"\n", "__version__ = \"CS224u, Stanford, Spring 2016 term\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "0. [Overview](#Overview)\n", "0. [Set-up](#Set-up)\n", "0. [Working with SNLI](#Working-with-SNLI)\n", " 0. [Trees](#Trees)\n", " 0. [Readers](#Readers)\n", "0. [Linear classifier approach](#Linear-classifier-approach)\n", " 0. [Baseline linear classifier features](#Baseline-linear-classifier-features)\n", " 0. [Building datasets for linear classifier experiments](#Building-datasets-for-linear-classifier-experiments)\n", " 0. [Training linear classifiers](#Training-linear-classifiers)\n", " 0. [Running linear classifier experiments](#Running-linear-classifier-experiments)\n", "0. [Recurrent neural network approach](#Recurrent-neural-network-approach)\n", " 0. [Classifier RNN model definition](#Classifier-RNN-model-definition)\n", " 0. [Building datasets for classifier RNNs](#Building-datasets-for-classifier-RNNs)\n", " 0. [Running classifier RNN experiments](#Running-classifier-RNN-experiments)\n", " 0. [Next steps for NLI deep learning models](#Next-steps-for-NLI-deep-learning-models)\n", "0. [Additional NLI resources](#Additional-NLI-resources)\n", "0. [Homework 4](#Homework-4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "Natural Language Inference (NLI) is the task of predicting the logical relationships between words, phrases, sentences, (paragraphs, documents, ...). Such relationships are crucial for all kinds of reasoning in natural language: arguing, debating, problem solving, summarization, and so forth. \n", "\n", "Our NLI data will look like this:\n", "\n", "* (_every dog danced_, _every puppy moved_) $\\Rightarrow$ __entailment__\n", "* (_a puppy danced_, _no dog moved_) $\\Rightarrow$ __contradiction__\n", "* (_a dog moved_, _no puppy danced_) $\\Rightarrow$ __neutral__\n", "\n", "The first sentence is the __premise__ and the second is the __hypothesis__ (logicians call it the __conclusion__).\n", "\n", "We looked at NLI briefly in our word-level entailment bake-off (the [wordentail.ipynb](wordentail.ipynb) notebook). The purpose of this codebook is to introduce the problem of NLI more fully in the context of the [Stanford Natural Language Inference](http://nlp.stanford.edu/projects/snli/) corpus (SNLI). We'll explore two general approaches:\n", "\n", "* Standard linear classifiers\n", "* Recurrent neural networks\n", "\n", "This should be a good starting point for exploring richer models of NLI. It's also fun because it sets up a battle royale between models that require serious linguistic analysis (the linear ones) and models that are claimed by advocates to require no such analysis (deep learning)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\n", "import re\n", "import sys\n", "import pickle\n", "import numpy as np\n", "import itertools\n", "from collections import Counter\n", "from sklearn.feature_extraction import DictVectorizer\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import classification_report\n", "import utils\n", "from nltk.tree import Tree\n", "from nli_rnn import ClassifierRNN" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%config InlineBackend.figure_formats=['svg']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set-up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "0. Make sure your environment includes all the requirements for [the cs224u repository](https://github.com/cgpotts/cs224u). It's okay if you couldn't get TensorFlow to work – it's not required for this notebook.\n", "\n", "0. Dowbload the [nli-data](https://web.stanford.edu/class/cs224u/data/nli-data.zip) data distribution and put it in the same directory as this notebook (or update `snli_sample_src` just below.\n", "\n", "0. For the homework: make sure you've run `nltk.download()` to get the NLTK data. (In particular, you need to use NLTK's WordNet API.)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['dev', 'vocab', 'train'])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Home for our SNLI sample. Because SNLI is very large, we'll work with a \n", "# small sample from the training set in class.\n", "snli_sample_src = os.path.join('nli-data', 'snli_1.0_cs224u_sample.pickle')\n", "\n", "# Load the dataset: a dict with keys `train`, `dev`, and `vocab`. The first\n", "# two are lists of `dict`s sampled from the SNLI JSONL files. The third is\n", "# the complete vocabulary of the leaves in the trees for `train` and `dev`.\n", "snli_sample = pickle.load(open(snli_sample_src, 'rb'))\n", "\n", "snli_sample.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with SNLI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SNLI contains both regular string representations of the data, unlabeled binary parses like the following:\n", "\n", "`\n", "( ( A child ) ( is ( playing ( in ( a yard ) ) ) ) )\n", "`\n", "\n", "and labeled binary parses like\n", "\n", "`\n", "(ROOT\n", " (S\n", " (NP (DT A) (NN child))\n", " (VP (VBZ is)\n", " (VP (VBG playing)\n", " (PP (IN in)\n", " (NP (DT a) (NN yard)))))\n", " (. .)))\n", "`\n", "\n", "Here are the class labels that we wish to learn to predict:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "LABELS = ['contradiction', 'entailment', 'neutral']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The training set for SNLI contains 550,152 sentence pairs, with sentences varying in length from 2 to 62 words. This is too large for in-class experiments and assignments. 
This is why we're working with the sample in `snli_sample`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "15000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(snli_sample['train'])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(snli_sample['dev'])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5328" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(snli_sample['vocab'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both `train` and `dev` are balanced across the three classes, with sentences varying in length from 3 to 6 words. These limitations will allow us to explore lots of different models in class. You're encouraged to try out your ideas on the full dataset outside of class (perhaps as part of your final project)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trees" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function can be used to turn bracketed strings like the above into trees:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def str2tree(s):\n", " \"\"\"Map str `s` to an `nltk.tree.Tree` instance. The assumption is that \n", " `s` represents a standard Penn-style tree.\"\"\"\n", " return Tree.fromstring(s) " ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAVwAAAEnCAIAAABqkWx5AAAACXBIWXMAAA3XAAAN1wFCKJt4AAAAHXRFWHRTb2Z0d2FyZQBHUEwgR2hvc3RzY3JpcHQgOS4xNnO9PXQAAB+/SURBVHic7d0/jNvInifwencvGRu4h7pFG28nmQZ7cQfY0YJu4LCJHVCJjYdLhsrutTchgd702UVgg5lJFtT0RC/og5isnRwWosNxJySw3cEc9tBd2bOiVaE7WRwsQIU9oLUP2F34gp9dQ+pfq1v8J+n7CQxJpMkSW/zxV7+iVL/4+PEjAwD47D/U3QAAaBYEBQDIQVAAgJxf1t0AKFeapuaxZVmWZU2vo7WWUjLGHMdZcqmUUms9sea87cN6+QUKjRtMKRVFURzHruvSK1LKMAxt2zbrRFEkpXQcR2udpqkQYpmlvu9zzmmDlmXRY8ZYGIbVvT0oyUfYdI7jmMej0Sj7tNfreZ43sXQwGNy4VAhBLwohkiQxj0t7E1Ad1BS2C+c8mwjEcdztdrNLu91uFEU3LvU8b3rjM1+EtYOgsF3SNFVK0WOttUn7DcuyqFiweOnM2gEKCpsBhcbNJ6VstVqMMaWU67q9Xs+8Pn3aM8botF+8FDYYMoXNZ9t2kiRJkkxUAal8OL0+xYLFS2GDIShsEdd1bdsOgsC8orWeOPPTNDWn/eKlsKkQFLaL67pKKVNW8DxvIkZ0Oh0hxDJLYVPhPoVNppTyfV9Kadu2uT1BKdVut4UQdPNCFEVpmlKNUCk1fZ/CvKV0B4RSinPOOZ+4/QHWF4ICMPa5XzDvrF68FDYMggIA5KCmAAA5CAoAkIOgAAA5CAoAkIOgAAA5CAowQ+uHH6Kzs7pbAfVAUIAZ0vfv1XBYdyugHggKMIP14EHdTYDaICjADNbOTt1NgNogKABADoICzKavr+tuAtQDQQFmQ6FxayEoAEAOggIA5CAoAEAOggLMYO/u1t0EqA2CAgDkICgAQA6CAsyWvn9fdxOgHggKAJCDoAAAOQgKAJCDoAAz4FuS2wxBAWZAUNhmmAwGAHKQKQBADoICAOT8su4GQLOYieppOlnOed0tgqqhpgA/o4nnXdfVWqdpqpS6uLiou1FQNQQF+Fmr1UqShB5rrR8/fjwYDOptElQPNQX4RCllWZZ5yjnv9Xo1tgfqgqAAn1BE6HQ6VFNgn8sKsG3QfYCcNE2pmsA59zwPcWELISjAbFrrdrvd6/UwALFt0H2AT+I4Nh0Hxhjn3LZtKWWNTYJaICjAJ1LKOI6zryil0H3YQrh5CX6mtfZ9n/oLaZp6noe+wxZCTQFytNbUZXAcp+62QD0QFAAgBzUFAMhBUACAHAQFAMhBUACAHAxJwify6kpfX8vLSz0e/5/B4L/8+tf8/n1+7569u8vv37e/+qruBkJFEBS2jh6P5eWlGg7VcKivr+mB+vDBrPCfvvji//3Lv/zrv/97/5/+SV9fZ/+v8+gR+zz9rP3VV/z+fWtnB7/yumEwJLnJ0n7f/CsvLykRyK7gPHrE792zHjygjMDa2emcnESnp90XL7wnT2gdyiCyQWR6O9aDB9bOTnZTjDHn4cMq3iQUDUFhE3y62n++5lMukL3If+oCZM7VmWds5+QkiOPQdcWzZ8vsNxt0Zu53YteUViC5aDgEhTWT9vv6+lpeXTHGPvUCMpk/nX7Wzo5J7Jc/A6OzM//1a+/p0+7BwYqNNGmFaSebmrHWRIpsqLJ3d/m9eyvuHVaEoNBQ2bLfzJMq2713Hj5csRYYn5+3j48LiQiL/dyXGY9nVjTY5+SCQhsqndVDUKjZjWU/6q7TGUK1vcIvp/LqqnV0ZO3sXHzzTYGbvRXT9cgmF9OdEVQ6K4CgUJ07lP0q+LibiJC8fNnM1B2VzoohKBSvqLJfBfR4/Pi77xhjF99808yIsBgqnWVAUFhJeWW/CujxuHV0pIbD5OXLDeuxo9K5CgSFpVRc9qvG4+++28iIsBgqnTdCUMhpQtmvGv6bN9Hpae/w0N3fr7stjYBKp7G9QaGZZb9qUETI3rYIC2xbpXPzg8Ialf2qcdvbFmGBjax0blRQWOuyXzUKvG0RFljrSudaBoWNLPtVoLLbFmGB5lc6Gx0UtqfsV4Em3LYICzSn0tmUoLDNZb8KNP+2RVig4kpn1UEBZb/qrftti7BAGZXOSoPCnhAm+UfZrzLB27fR6em23aS05ZapdA6+/37mGVdpUIjOzkz/v7KdAmNMj8fIEYCYTvq8Memm1BQAoCHwE+8AkFPkrzkrpZRSlmVZlkWvpGnKGOOc04zm9JTYto0ZjQtEB59lDmz2FZozluDIw2JFBgUpZZqmUsokSTjnSil6allWt9ulp3Ecu67LGEvTVCklhKB4AStK0zSKIpoqmv6lIyylPDg4+MMf/oAjD8v6WKgkSYQQQgjzihAiSRLz1HEc83g0GmWfwoqEEIPBIPtKt9s1Bx9HHpZUfE3BcRytNSWui5luBRTC87woirKvpGlKWcMEHHlYoJRCYxiGQRDcuJrWOtvXhRVZlpU9nlLKeWc+jjwsUMq0cXQhMp3YLKWUiRdSyjAMy2jA1nJd1xz2KIqyhxdHHpZU1lySQoh2uz2du3LO6UXOOT6XhXNdNwgC13W11pzz7CgDjjwsqcQJZj3P63Q6Ey+ajyaUwQxGTlcTcORhSSXevEQVR611ebuAaa7rRlE0r8QIoJTa29trtVpz1yhwJGMwGNi2bdu267r0ymg04pzTqNhgMHAch65XjuNcXFwUuGvIsm07DEPzFEcesi4uLhhjnPN5K+C7DwBbR0rJOTd3Hk9AUACAHHwhCgByEBQAIAdBAQByEBQAIKfEm5ey4vPz//UP//Bf//RPw6+/rmaPoMfj+Pw8Pj//13/7t//2Z3/m7u/jV/CA0O+7zvtJ5CpGH2iesl//6lf/95//2Xv6NHRd/F5gqdJ+n8KBvr62d3d//atf/e9//Ed67O7vu/v7+IHcLZf2+62jo49/+7czl5abKejxOIjj6PSUZiWiOcvk5WXv8BCfy8Kp4TA6O4vPz9WHD9aDB97Tp96TJ+Y4R2dn6fv3QRwHcezu7zuPHrn7+4jOMK3ETEENh+3jY3l5mZ3dmGYlYYz1Dg8xoUMhqJsQnZ7Sz3h7T586Dx/Om2Bej8fR6Wl8fi4vL/n9++7+/oKVYVMtzhTKCgrm5J+ebkCPx62jo4lgAXcQn5+n/X50esoYoyv/8hf/ibTC3d/PphWw2WoICtRNsHd353UTJroVhTdgs8mrK0oN9PX16uczRRZTgHD3972nT9Gt2GxVBwX/zRs6228sKFIB0t7dxQSHy5jO/Cn5L2rj8fl5+v59fH7OGKOiA/K4TVVdUDD9gtB1500+M9249vExm9XLAINqhNWcrmo4pJGLMkIPNERFQcEUEboHB7cqXMmrKxqSQIlhgry6otSAugnekydVjiYW20mBRqkiKERnZ0EcWzs73Rcv7nDB1+Ox//p1fH4unj/H3U10rY7OztSHD/z+fe/p03rvO8qWM+3dXWoPuntrrfSgEL
x923n3zt3f7754scpnhbbjPHrUOzzcws+cuQGRpgamvL05g4UNbx7cSolBQY/H7ePj9P37oq7w8fm5/+YNv3+/d3i4PSWGiRsQG17/n0hkqLXb88faDGUFBaoFqOEwdN0CawHy6qp9fKyvr29bm1g7636nQL0lD1hFKUGh1Et64QlIo9zqBsS1QGkODY7c9h4qqEXxQaGazn9RpYrmWOUGxObbvGC3wYoMCmaYoJo7Ec2dkXcb1GiIbRvby3aLmjB6AtMKCwozv+BUtvX9AlWpNyCuhekCKooODVFMUKjx1sM73ChZrypvQFwL0wdkk/pN66iAoNCELyks/5WKuqAavxhSp+Yo4EdW1HBY+9cZuwcH9u4uVemaiS6G6ELPw+/dE8+eiWfPTNGBMYagUAtrZ0c8fz5vKSaDKYwej5uZwjQWjlgzISgAQA5+4h0AcnI1hTRN6QHn3LbtiVWllNPzyluWNW+ayjtTSimlslumhlGrFi8toyWMMdu2OefTr5gjNq8B2RVKamST3eoAmnWgXj8HBaVUHMeMMfrDRFHEGAvD0Pydoiiix1JKy7LM62EYFtsmKWWaplLKJEk450opempZVrfbXby02JakaRpFkeM4jDH6N01TpZSUUghhWVaapnEcu65L6wdBIISgNYnv+2YpY0wp1ev1im1kk93qANIiIcRWxc0mys5LnyRJkiTZeew9zzNPhRDmgVnNvFisJEmEENmNZ3e6eGmxhBCDwSD7Srfbze7LcZzsUtd1s0+zSz3Pu7i4KKORTXarAzgajSaOJ1RvUU2B0jkpJT31PG96nZkvFsJxHK01pZq3XVogz/MoaTLSNM3mAhMmEmBz0YuiyLKsLbwG3uoAblv3qpluKDQ6jmN6fTNrB4UXFLLCMAyC4G5Li2JZlgmLjDEp5bxPrda60+lMBAXqW5mEudSmNtPyB5AxprXOrgy1uPnmpeniYmXoupHttC+/tECu65q9RFE0UUNRSpnYRBXQ6S34vr9VpYQJyx9AKWXhJSq4rZuHJEvNBW4khIjjeF5gWry0KK7rUrqkteacT+QClmWFn/V6PcuyJvIXqj6a2nupTW2mxQeQc+44juM4rusmSYLuQ+1uCApxHC/oP1fD87xOp3O3pYUw5/MyR8PzvGwCTCeD+V8TvestsfgAmqCAcNAQi4ICjUHWmymwzzXFeenA4qVFcV03iqLFJUaSpqn5cGutp7Pl7bT8AYSyKaX29vZarda8FX6+zVkp1W632ee4TuFg4gMdRVEcx0opSgLDMCwjupuWWJZFXXGt9d7eXq/Xcxxn8dLCG2M8fvzYdd1ssVAp5fv+ROWMDhodwzRNfd/PRlWl1GAwKK+RTXbjASzp4wQTpJSPHz/mnI9Go5kr4LsPAFtHSrmgE4CgAAA5+EIUAOQgKABADoICAOQgKEA99Hisx+O6WwEzLBUUgrdv1XBYdlOWEZ2dpf1+3a2YQQ2Hwdu3dbdiPejxOHj7du/Vq0d//dft4+Nm/kG32VJBofPuXUOCAk2yVHcrZlDDYefdu7pb0XRqOPTfvPnPf/VXnXfv/vuf//n/+Iu/oHk9Wj/8EJ2d1d06+GSpX3MGWBFNDBOdnvL798Xz52aOrI7rRmdn0emp//p15+REPHuGKSFqh6AA5Ur7/c7JSfr+vfXggXj+XDx7NnHOe0+eeE+e0Gr+69dBHHtPn06vBpVBUICyRGdnnZMT9eGD9eDBjVMNOg8fOg8fquGwc3LSefeu8+4dhQbMplM9BAUoGM0EFZ2dqQ8fnEePwq+/Xn7uaWtnp3twIJ49oz5FdHrq7u/TBNalthmyEBSgMHo87pyc0Pza7v5+9+DgbieztbMTfv21ePaMgkvr6Iimn9zyKTkrg6AABaC0nyb1KyrtN9PMoRJZMQQFWMm8YYUCoRJZMQQFuKMbhxWKhUpkZRAU4NZuNaxQLFQiK4CgAMtaZVihWKhElgpBAW5W1LBCsVCJLAmCAixSxrBC4VCJLBaCAsxWwbBCsVCJLAqCAkyqeFihWKhErg5BAX5W47BCsVCJXAWCAjRoWKFYqETeDYLCtuucnHROTho1rFC46UokBYu629VQS837kPb79u5uE4KrvLri9+41sHSkx2N5ebmOZ1R0diYvL7enIEeVSGtnB0FhHkwGAwA5a/ZrzlLKsueSXVJzWgJQrEWZgtbazKpu2zZNmloGpRTN0X7jBM2tVksIMT2RbBRFSimtted5CyYpjaJISrl4nSXNa0kTKKWUUizzV5t4JU1TszLnfObRMH99y7Isy9Jal/cBKEn2bU5/gBcv3Wb/8dtvv523rN/v//jjj0qpOI6llL/5zW9KagTn3HGcTqfz29/+dvGaf/zjH2f+/Wzbdhznp59++vLLL+dNm0mr3bjOkua1pAniOO50OoyxL774gt4p/QXjODZPO53OF198oZTq9/udTmfimERR9Pvf/57e3Y8//tjpdJRSzYyA89Dn1rzNn376KYoiy7K+/PLLG5duu4836Xa7vV7Pdd0b11yR4zgrbkEIkSTJ6utsACHEYDDIvtLtdrNvfOJoZ/++SZJ4npdd2uv1hBDltLRc2bc5Go0m3vXipVvr5iFJpVQYhlrrKIo8z1slACml6JrDPieljuNkc9coitI01VpblhWGYfY6HAQBZbNhGN4q+Y/jOI5jyn5d112l/cu0REoZBAFjzEz1fWOfqAye50VRlN11mqa9Xm/e+tlDHcfxRJtd112vNGGmeR2lZZZulRuCAp3AjDHP83zfX2VPWmvf93u9Hn3+6Gn2zyCldF2XPrh0anW7XbOUPqZBENyqvBfHcfZkoKCz4ud7cUt830+ShN6jCRDVsyzL1IOoJfM+8RTus0FBKTXdLWpmR+lWsjWy2y7dKjcEhTiOs1dXpdSde+NRFAkhzGeLc559yhizbdtkIrZtm3i0ijiOs5dHz/PiOF59swtYlpWmKR0027azca1iruuaP99E1sAYU0qZgLXKn7Xhsm9TSrngIEwv3Vo3BAW6tNJjrXUcx0KIu+1JSjnxfyvI1qYv5mXvtNvtRlEUBAFdb1fscK3Cdd0gCFzXpa7TxKWeOmjmKbV5+qygc0ZrrZRKkqSCZheLatj0YPrdLV66tRYFBRq9y57JNA53tz1ZllX9FWk66S315gLauDlEWutWq3VxcVHeHhcwg5HL9Jg8z2u1Wtn/SOhUSdM0O4C3Rsxpf4elW2vRzUvTlUXHce784fA8b6KDLaWk2xPKY1lWdhc0LFfe7ibeUe39cNd1lyyjpGlqcqjV60ew1ubevBQEAQ1re55H/dIgCGhoQAhxt6yYPqCULNBFlYYYtNbtdpsSE7o0tVotqjtSnzyOYzrZKCenk63b7ZpR9wVLqSKYPT/TNBVC3G0kYvG+0jSlop25SjuOU2MPgjH2+PFj13Wz+Z1Syvf9idLjxHAPDdmYtI66fut1UZ14mxNDRYuXbrkavvtAuUaVd/7Q/Xw0CFrB7kwde73Ooml0K3dlxw0aAl+IAoCcNftCFACUDUEBAHIQFAAgB0EBAHIQFGDry
KsrNRzW3YrmQlDYavLqqvXDD3W3ojp6PA7evn387bd7r161j4/1eFx3i5oIQWGr6evr9P37ultRkbTff/zdd51378Tz573Dw7Tf33v1Kj4/r7tdjYOfeIfNRxPkdt69s3d3L7791v7qK8aY8+iR//p1+/jY3d/vvnjRhB8rbwgEBdhw8fl58Pat+vBBPH8efv21eZ3fu9c7PIzPz/03b/ZeveoeHGzGFDirQ1CAjaXHY//16/j83Hn0qHd4SAnCBHd/HynDBNQUYDPF5+d7r16l/X7ousnvfjczIhBKGVBlMJApwKbJJgjdg4MlZ75CymAgU4CN0jk5ySYIt5oLDykDQaYAG0INh/6bN+n79+7+fui6d54aEykDggJsApo7mzHWOzxcfRBhywcmEBRgvWUThGKv6lubMqCmAGssePt279UreXlJtYDCT9rtrDIgU4C1JK+u/Nev5eWleP5cPHtW6jV821IGZAqwfuhLTXo8Tl6+DL/+uoJTdKtSBmQKsE7Sfj+I42oShGlbkjIgU4D1QN96bh0dVZkgTNuGlAGZAqyBtN/337yZ/lJTXTY7ZUCmAI1mEgR+797Ft982ISKQDU4ZEBSg0VpHR/SzKBfffLPgS011cff3B99/7zx82D4+prunNgAmg4FGi8/PrQcPGhgOJtD3rzajE4GgAAA56D7AXDSXZN2tgKohU1hjURQppbTWnueVMWlyq9WqcrJpmnnY4JzTm5qeH5jWNCtU5saWZN9ClVMoFwtDkmuMJrkPgqCk67nrupXNN621pjNKSmlZFufcnGlSyjRNpZRJknDOlVL01LKsbrdbTfPI4pbQ0ziOXddljKVpqpQSQqzdJPfIFNZeEASO46z7tPfGzLeTpimFjDAMF6xWgRtb0mq1kiShx1rrdrttnq4LZArNFUVRHMdhGMZxLKWkF5fP54MgoM4F59yyLPMhpk8qY8zzPLqmBUEgpbRtO/tBpz2GYZi90LXbbUqe4zhmjNFms0kytZkxxjkXQsRxrJTq9XorHwzmOA5trbLkZfWWVN/BKcZHaLAkSRzHSZKEno5GI8/zLi4ususIIcwKWaPRyDwOw7Db7Zqng8HA87zsyq7rZtdfsGXOudnUxcVFdju9Xk8Ikf3vE3tZxsydJkmSJMloNHJdd8FqFbixJY7jmMej0Sj7dF1g9KHpbNs2qQHnPAzDTqezzH/knFNHXSll27ZSyiyiS5x5JU1T6sYv2R6qZdDj7GajKDK5BmNMCJFdujq68FImUq8FLVFKBZ+12+3sAVkX6D403UT+Saf6jf9La+37PpXrGGPUO8iuIITodDpUqIuiqIyKHXVbit2mEKLdbjehgDKvJZxzepEieB1NWxWCQtOlaUo9f7Jkp9r3/Wzd25THDJMsaK2XTxMWm9gIDeCtvtkJnuctmSuVbWZLTFBYX+g+NJ1SynzylFJ0tt/4vyZKXDMTXUoW4jheZoPL8DzP931KZLTWQRCUUWZzHEdr3YS7qprTkmIhU2g66plT2V9r3e126SIfx3EURYwxGh6nq7RZ6jgOdR8YY1JKx3Fo5WxCazKOiSv8vC1zztvttpQyCALaDg1SmKd0hfR9n5oahiFtZxlmTMTs1NyGQL10ajANZIRhuLe3l02gqrG4JRSypZStVotNDdysEdyn0GiU898tHdVaSykXj4oFQSCEKO/GO9/3K76/CFaHTGFj3di5pQ5/eRHBZBmwXhAUmiuKIkq/6daAAiv57XZba62U4pyb23ILYXoBjLHs3VCwRtB9AIAcjD4AQA6CAgDkICgAFEANh//z7/9eXl3V3ZACIChAo/3iL/8y7ffrbsUN5NXV4++++93f/V3r6GgDftYZQQFgJdHZWevoyNrZef83f2Pv7raPj6Ozs7obtRIEBYC765yc+K9fOw8fJi9f7v7JnyS/+5339Kn/+rX/5k3dTbs73KcAcEf+mzfR6an39Gn34MC82D04sHZ2gjjW19drOnMUMgWAW9PjceuHH6LT0+6LF9mIQMSzZ90XL9J+n2a+rKWFq0BQALgdNRy2jo7k5WXv8NB78mTmOt6TJ8nLl2o43Hv1au2GJBAUAG6BBhrUcJi8fOnu7y9Y0/7qq4tvvrF2dtZuSAJBAWBZZqBh8P33y8xkZ+3sJC9frt2QBIICwFKyAw3Llw/5vXtrNySB0QeAm80caFjeeg1JIFMAWGTxQMPy1mhIAkEBYK5lBhqWty5DEggKALMtP9CwvLUYkkBQAJjhtgMNy2v+kASCAsCkuw00LK/hQxIYfQDIWXGgYXmNHZJApgDwSVEDDctr5pAEggIAY0UPNCyvgUMSCArQaM6jR/z+/Qp2FJ2dFTvQsDwzJBE0YEJthp94ByCUvdfYsa+9AQaCAgDkoPsAzSKl3Lx5nNcLMgVollarJYS425y6y6A5exljM6fenRmSLMsqcM4+pZRSKrtNahLnnHM+b1GVE1jjPgVoFtd1CzwDJyil4jhmn6fVpak6wzA0E+FGUUSPpZSWZZnXC5wUU0qZpqmUMkkSigL01LIsx3HmLap08u6PANskSZIkSczTi4sLz/PMUyGEeWBWMy8W2AYhRHazZncLFlUGNQVoiiAIWq1Wq9WSUk4sarfbQRBEUUQr+L5fVN3Btm3Oudmj53nT68x8cUWO49DE37daVA0EBWiKMAyTJLFte/qE7/V6lOrTdd7zvCAIitovJe30eGbPpaTuTBiG897FgkUVQFCA9WDbtrli27Zd7IW0lvEOKh/Gs25YWrCoAggKAGXlAjcSQsRxPDMkLVhUNgQF2HZxHJc3Anojz/M6nc5tF5UKQQG2Go1B1pUpsM9lxZkZwYJF5apyqANgnl6v5ziO4ziWZdm2TY8Hg8HHjx9Ho5HjOJzz7Hhh9unyBoOBbdtm+67rTm+k2+1mm3FxcbH6u5vZBtd16ZXRaMQ5T5JkwaJi27AY7mgEgBx0HwAgB0EBAHIQFAAgB0EBAHIQFAAgB0EBgDHG0n6/3t9T1uNx7W0gCAoAjDFGP+VcYwPk5WXtbSAICgCQg6AAADkICgCQg6AA0AjOw4d1N+ETBAUAyEFQAGgQfX1ddxMQFACapAlzzCIoAEAOggIA5CAoAEAOggJAU1gPHtTdBMYQFACaw9rZqbsJjCEoAMAEBAWABlEfPtTdBAQFgCbB7ykAQOMgKABAzi/rbgBAI4jnz2sv/rv7+/U2gGCGKADIQfcBAHIQFAAgB0EBYF1FUeT7vpSy2M0iKACsK8/zOOda62I3i6AAADkYkoRtFwSBUkprzTm3LCsMw1W2prVut9uMMc/zXNel7Uspbds2W16wR6WU7/uMsSRJ4jiO45gx5jiO53m0Ar1I/5e2X7yPANttNBqZx2EYdrvdFTc4GAw8z8u+4rpudi837tFxHCFEGIb0NEkSetDr9bJb7na7tm2bpUVB9wG2HXXL0zRVStm2rZRacYOWZTHGzHbSNLUsi3N+qz1aliWEoMeO49CDOI673a5Zh2oKK7Z2GroPsNW01r7vc87p7KI8f/XNCiE6nQ6dwFEUZc/kJfdo
+gsTrZ14pZDWTkBQgK3m+74QwpxaaZqmabr6Zk2yoLWeSBNW2eN0XlD40APD6ANsOc559mJLhb1CULIQx7HpBay+R8uyoigyT6WUBTbYwHcfYKvFcZymqcnkHceJosh13RXHIAh1EyY2tXiPnU4nTdNsnyIMw2wQCYKAhh7MK2maCiEKHIlAUIBtp7WWUk5cwAsRBIEQYmbOv8oelVJKKcuyqJNSOAQFgFIopaIoKiTjqBgKjQAFa7fbWmulFOc8juOybjEqDTIFAMjB6AMA5CAoAEAOggIA5CAoAEAOggIA5Px/ZShOfmdaBOYAAAAASUVORK5CYII=", "text/plain": [ "Tree('ROOT', [Tree('S', [Tree('NP', [Tree('DT', ['A']), Tree('NN', ['child'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('VP', [Tree('VBG', ['playing']), Tree('PP', [Tree('IN', ['in']), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['yard'])])])])]), Tree('.', ['.'])])])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = str2tree(\"\"\"(ROOT\n", " (S\n", " (NP (DT A) (NN child))\n", " (VP (VBZ is)\n", " (VP (VBG playing)\n", " (PP (IN in)\n", " (NP (DT a) (NN yard)))))\n", " (. .)))\"\"\")\n", "\n", "t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For baseline models, we often want just the words, also called terminal nodes or _leaves_. We can access them with the `leaves` method on `nltk.tree.Tree` instances:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['A', 'child', 'is', 'playing', 'in', 'a', 'yard', '.']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.leaves()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Readers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make it easy to run through the corpus, let's define general readers for the data. The general function for this yields triples consisting of the the left tree and the right tree, as parsed by `str2tree`, and finally the label:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def snli_reader(sample):\n", " \"\"\"Reader for SNLI data. `sample` just needs to be an iterator over\n", " the SNLI JSONL files. For this notebook, it will always be \n", " `snli_sample`, but, for example, the following should work for the \n", " corpus files:\n", " \n", " import json \n", " def sample(src_filename):\n", " for line in open(src_filename):\n", " yield json.loads(line)\n", " \n", " Yields\n", " ------\n", " tuple\n", " (tree1, tree2, label), where the trees are from `str2tree` and\n", " label is in `LABELS` above.\n", " \n", " \"\"\"\n", " for d in sample:\n", " yield (str2tree(d['sentence1_parse']), \n", " str2tree(d['sentence2_parse']),\n", " d['gold_label'])\n", " \n", "def train_reader():\n", " \"\"\"Convenience function for reading just the training data.\"\"\"\n", " return snli_reader(snli_sample['train'])\n", "\n", "def dev_reader():\n", " \"\"\"Convenience function for reading just the dev data.\"\"\"\n", " return snli_reader(snli_sample['dev'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear classifier approach" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start, we'll adopt an approach that is essentially identical to that of the [supervisedsentiment.ipynb](supervisedsentiment.ipynb) notebook: we'll train simple MaxEnt classifiers on representations of the data obtained from hand-built feature functions. \n", "\n", "This notebook defines some common baseline features based on pairings of information in the premise and hypothesis. As usual, one can realize big performance gains quickly by improving on these baseline representations." 
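, "\n", "\n", "For example, for the pair (_A child is playing in a yard._, _A child is sleeping._), the word overlap feature function defined in the next section would produce the count dictionary `{'A': 1, 'child': 1, 'is': 1, '.': 1}`; `sklearn`'s `DictVectorizer` then turns such dictionaries into rows of a feature matrix."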
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Baseline linear classifier features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first baseline we define is the _word overlap_ baseline. It simply uses as\n", "features the words that appear in both sentences." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def word_overlap_phi(t1, t2): \n", " \"\"\"Basis for features for the words in both the premise and hypothesis.\n", " This tends to produce very sparse representations.\n", " \n", " Parameters\n", " ----------\n", " t1, t2 : `nltk.tree.Tree`\n", " As given by `str2tree`.\n", " \n", " Returns\n", " -------\n", " defaultdict\n", " Maps each word in both `t1` and `t2` to 1.\n", " \n", " \"\"\"\n", " overlap = set([w1 for w1 in t1.leaves() if w1 in t2.leaves()])\n", " return Counter(overlap)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another popular baseline is the full cross-product of words from both sentences: " ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def word_cross_product_phi(t1, t2):\n", " \"\"\"Basis for cross-product features. This tends to produce pretty \n", " dense representations.\n", " \n", " Parameters\n", " ----------\n", " t1, t2 : `nltk.tree.Tree`\n", " As given by `str2tree`.\n", " \n", " Returns\n", " -------\n", " defaultdict\n", " Maps each (w1, w2) in the cross-product of `t1.leaves()` and \n", " `t2.leaves()` to its count. This is a multi-set cross-product\n", " (repetitions matter).\n", " \n", " \"\"\"\n", " return Counter([(w1, w2) for w1, w2 in itertools.product(t1.leaves(), t2.leaves())])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both of these feature functions return count dictionaries mapping feature names to the number of times they occur in the data. This is the representation we'll work with throughout; `sklearn` will handle the further processing it needs to build linear classifiers.\n", "\n", "Naturally, you can do better than these feature functions! Both of these might be useful even in a more advanced model, though." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building datasets for linear classifier experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, the first step in training a classifier is using a feature function like the one above to turn the data into a list of training instances (feature representations and their associated labels):" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def build_linear_classifier_dataset(\n", " reader, \n", " phi=word_overlap_phi, \n", " vectorizer=None):\n", " \"\"\"Create a dataset for training classifiers using `sklearn`.\n", " \n", " Parameters\n", " ----------\n", " reader\n", " An SNLI iterator like `snli_reader` above. Just needs to\n", " yield (tree, tree, label) triples.\n", " \n", " phi : feature function\n", " Maps trees to count dictionaries.\n", " \n", " vectorizer : `sklearn.feature_extraction.DictVectorizer` \n", " If this is None, then a new `DictVectorizer` is created and\n", " used to turn the list of dicts created by `phi` into a \n", " feature matrix. This happens when we are training.\n", " \n", " If this is not None, then it's assumed to be a `DictVectorizer` \n", " and used to transform the list of dicts. 
This happens in \n", " assessment, when we take in new instances and need to \n", " featurize them as we did in training.\n", " \n", " Returns\n", " -------\n", " dict\n", " A dict with keys 'X' (the feature matrix), 'y' (the list of\n", " labels), 'vectorizer' (the `DictVectorizer`), and \n", " 'raw_examples' (the original tree pairs, for error analysis).\n", " \n", " \"\"\"\n", " feat_dicts = []\n", " labels = []\n", " raw_examples = []\n", " for t1, t2, label in reader():\n", " d = phi(t1, t2)\n", " feat_dicts.append(d)\n", " labels.append(label) \n", " raw_examples.append((t1, t2))\n", " if vectorizer is None:\n", " vectorizer = DictVectorizer(sparse=True)\n", " feat_matrix = vectorizer.fit_transform(feat_dicts)\n", " else:\n", " feat_matrix = vectorizer.transform(feat_dicts)\n", " return {'X': feat_matrix, \n", " 'y': labels, \n", " 'vectorizer': vectorizer, \n", " 'raw_examples': raw_examples}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training linear classifiers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To keep this notebook relatively simple, we adopt a bare-bones training framework, using just a standard-issue MaxEnt classifier. The following function is from [supervisedsentiment.ipynb](supervisedsentiment.ipynb):" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def fit_maxent_classifier(X, y): \n", " \"\"\"Wrapper for `sklearn.linear_model.LogisticRegression`. This is also \n", " called a Maximum Entropy (MaxEnt) classifier, a name that is more fitting \n", " for the multiclass case.\n", " \n", " Parameters\n", " ----------\n", " X : 2d np.array\n", " The matrix of features, one example per row.\n", " \n", " y : list\n", " The list of labels for rows in `X`.\n", " \n", " Returns\n", " -------\n", " `sklearn.linear_model.LogisticRegression`\n", " A trained `LogisticRegression` instance.\n", " \n", " \"\"\"\n", " mod = LogisticRegression(fit_intercept=True)\n", " mod.fit(X, y)\n", " return mod" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a more robust and responsible approach, see the [supervisedsentiment.ipynb](supervisedsentiment.ipynb) notebook, especially the [section on hyperparameter search](supervisedsentiment.ipynb#Hyperparameter-search)." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Running linear classifier experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `linear_classifier_experiment` function handles the book-keeping associated with running experiments. It essentially just combines all of the above pieces in a flexible way. If you decide to expand this codebase for real experiments, then you'll likely want to incorporate more of the functionality from the [supervisedsentiment.ipynb](supervisedsentiment.ipynb) notebook, especially [its method for comparing different models statistically](supervisedsentiment.ipynb#Statistical-comparison-of-classifier-models)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def linear_classifier_experiment(\n", " train_reader=train_reader, \n", " assess_reader=dev_reader, \n", " phi=word_overlap_phi,\n", " train_func=fit_maxent_classifier): \n", " \"\"\"Runs experiments on our SNLI fragment.\n", " \n", " Parameters\n", " ----------\n", " train_reader, assess_reader\n", " SNLI iterators like `snli_reader` above.
They just need to\n", " yield (tree, tree, label) triples.\n", " \n", " phi : feature function (default: `word_overlap_phi`)\n", " Maps trees to count dictionaries.\n", " \n", " train_func : model wrapper (default: `fit_maxent_classifier`)\n", " Any function that takes a feature matrix and a label list\n", " as its arguments and returns a fitted model with a `predict`\n", " function that operates on feature matrices.\n", " \n", " Returns\n", " -------\n", " str\n", " A formatted `classification_report` from `sklearn`.\n", " \n", " \"\"\"\n", " train = build_linear_classifier_dataset(train_reader, phi) \n", " assess = build_linear_classifier_dataset(assess_reader, phi, vectorizer=train['vectorizer'])\n", " mod = train_func(train['X'], train['y'])\n", " predictions = mod.predict(assess['X'])\n", " return classification_report(assess['y'], predictions)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", "contradiction 0.41 0.58 0.48 1000\n", " entailment 0.45 0.34 0.39 1000\n", " neutral 0.35 0.29 0.32 1000\n", "\n", "avg / total 0.40 0.40 0.40 3000\n", "\n" ] } ], "source": [ "print(linear_classifier_experiment())" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", "contradiction 0.63 0.58 0.60 1000\n", " entailment 0.55 0.63 0.59 1000\n", " neutral 0.56 0.53 0.54 1000\n", "\n", "avg / total 0.58 0.58 0.58 3000\n", "\n" ] } ], "source": [ "print(linear_classifier_experiment(phi=word_cross_product_phi))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A few ideas for better classifier features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Cross product of synsets compatible with each word, as given by WordNet. (Here is [a codebook on using WordNet from NLTK to do things like this](http://compprag.christopherpotts.net/wordnet.html).)\n", "\n", "* More fine-grained WordNet features — e.g., spotting pairs like _puppy_/_dog_ across the two sentences.\n", "\n", "* Use of other WordNet relations (see Table 1 and Table 2 in [this codelab](http://compprag.christopherpotts.net/wordnet.html) for relations and their coverage).\n", "\n", "* Using the tree structure to define features that are sensitive to how negation scopes over constituents.\n", "\n", "* Features that are sensitive to differences in negation between the two sentences.\n", "\n", "* Sentiment features seeking to identify contrasting sentiment polarity." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Recurrent neural network approach" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very recently, recurrent neural networks (RNNs) have become one of the dominant approaches to NLI, and there is a great deal of interest in the extent to which they can learn to simulate the powerful symbolic approaches that have long dominated work in NLI. \n", "\n", "The goal of this section is to give you some hands-on experience with using RNNs to build NLI models. Because these models are demanding not only in terms of data but also in terms of training time, we'll just get a glimpse of what they can do here, but I think even this glimpse clearly indicates their great potential."
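] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next section spells out the model we'll use. As a preview, here is a minimal NumPy sketch of the computation such a classifier RNN performs: a chain of hidden states over the input sequence, with a softmax classifier on the final state. The function and the weight shapes are illustrative assumptions, not the interface of the `ClassifierRNN` class used below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def classifier_rnn_sketch(input_vecs, W_xh, W_hh, W_hy, b):\n", "    # Minimal sketch (illustrative only, not the `ClassifierRNN` API).\n", "    # Run the recurrence over the input vectors, then classify the\n", "    # final hidden state. Row-vector convention throughout.\n", "    h = np.zeros(W_hh.shape[0])  # h_0: the all-0s initial hidden state\n", "    for x in input_vecs:\n", "        h = np.tanh(x.dot(W_xh) + h.dot(W_hh))\n", "    scores = h.dot(W_hy) + b     # only the final hidden state is used\n", "    exp_scores = np.exp(scores)\n", "    return exp_scores / exp_scores.sum()  # softmax over the labels"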
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classifier RNN model definition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model we'll be exploring is probably the simplest one that fits the NLI problem. It's depicted in the following diagram:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This model would actually work for any classification task. For instance, you could revisit the [supervisedsentiment](supervisedsentiment.ipydb) notebook and try it out on the Stanford Sentiment Treebank.\n", "\n", "The dominant applications for RNNs to date have been for language modeling and machine translation. Those models have many more output vectors than ours. For a wonderful step-by-step introduction to such models, see Denny Britz's [four-part tutorial](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/) (in the form of a notebook like this one). See also Andrej Karpathy's [insightful, clear overview of different RNN architectures](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). (Both Denny and Andrej are Stanford researchers!)\n", "\n", "The above diagram is a kind of schematic for the following model definition:\n", "\n", "$$h_{t} = \\tanh\\left(x_{t}W_{xh} + h_{t-1}W_{hh}\\right)$$\n", "\n", "$$y = \\text{softmax}\\left(h_{n}W_{hy} + b\\right)$$\n", "\n", "where $n$ is the sequence length and $1 \\leqslant t \\leqslant n$. As indicated in the above diagram, the sequence of hidden states is padded with an initial state $h_{0}$. In our implementation, this is always an all $0$ vector, but it can be initialized in more sophisticated ways.\n", "\n", "It's important to see that there is just one $W_{xh}$, just one $W_{hh}$, and just one $W_{hy}$. \n", "\n", "Our from-scratch implementation of the above model is in [nli_rnn.py](nli_rnn.py). As usual, the goal of this code is to illuminate the above concepts and clear up any lingering underspecification in descriptions like the above. The code also shows how __backpropagation through time__ works in these models. You'll see that it is very similar to regular backpropagation as we used it in the simpler [word-entailment bake-off](wordentail.ipynb) (using the feed-forward networks from [shallow_neural_networks.py](shallow_neural_networks.py).)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building datasets for classifier RNNs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function uses our `snli_reader` infrastructure to create datasets for training and assessing RNNs. The steps:\n", "\n", "* Concatenate the leaves of the premise and hypothesis trees into a sequence\n", "* Use the `LABELS` vector defined above to turn each string label into a one-hot vector." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def build_rnn_dataset(reader):\n", " \"\"\"Build RNN datasets.\n", " \n", " Parameters\n", " ----------\n", " reader\n", " SNLI iterator like `snli_reader` above. 
Just needs to\n", " yield (tree, tree, label) triples.\n", " \n", " Returns\n", " -------\n", " list of tuples\n", " The first member of each tuple is a list of strings (the\n", " concatenated leaves) and the second is an np.array \n", " (dimension 3) with a single 1 for the true class and 0s\n", " in the other two positions.\n", " \n", " \"\"\" \n", " dataset = []\n", " for (t1, t2, label) in reader():\n", " seq = t1.leaves() + t2.leaves()\n", " y_ = np.zeros(3)\n", " y_[LABELS.index(label)] = 1.0\n", " dataset.append((seq, y_))\n", " return dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running classifier RNN experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define functions for the training and assessment steps. It's currently baked in that you want to train with `train_reader` and assess on `dev_reader`. If you start doing serious experiments, you'll want to move to a more flexible set-up like the one we established above for linear classifiers (and see [supervisedsentiment.ipynb](supervisedsentiment.ipynb) for even more ideas).\n", "\n", "The important thing to see about this function is that it requires a `vocab` argument and an `embedding` argument:\n", "\n", "* `vocab` is a list of strings. It needs to contain every word we'll encounter in training or assessment.\n", "* `embedding` is a 2d matrix in which the ith row gives the input representation for the ith member of `vocab`.\n", "\n", "This gives you flexibility in how you represent the inputs. In the experiment run below, the inputs are just random vectors, but [the homework](#Homework-4) asks you to try out GloVe inputs." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def rnn_experiment(\n", " vocab, \n", " embedding, \n", " hidden_dim=10, \n", " eta=0.05, \n", " maxiter=10):\n", " \"\"\"Classifier RNN experiments.\n", " \n", " Parameters\n", " ----------\n", " vocab : list of str\n", " Must contain every word we'll encounter in training or assessment.\n", " \n", " embedding : np.array\n", " Embedding matrix for `vocab`. The ith row gives the input \n", " representation for the ith member of vocab. Thus, `embedding`\n", " must have the same row count as the length of vocab. It can\n", " have any number of columns. (That is, the input word \n", " representations can be any length.)\n", " \n", " hidden_dim : int (default: 10)\n", " Dimensionality of the hidden representations. This is a\n", " parameter to `ClassifierRNN`.\n", " \n", " eta : float (default: 0.05)\n", " The learning rate. This is a parameter to `ClassifierRNN`. \n", " \n", " maxiter : int (default: 10)\n", " Maximum number of training epochs. This is a parameter \n", " to `ClassifierRNN`.
\n", " \n", " Returns\n", " -------\n", " str\n", " A formatted `sklearn` `classification_report`.\n", " \n", " \"\"\"\n", " # Training:\n", " train = build_rnn_dataset(train_reader) \n", " mod = ClassifierRNN(\n", " vocab, \n", " embedding, \n", " hidden_dim=hidden_dim, \n", " eta=eta,\n", " maxiter=maxiter)\n", " mod.fit(train) \n", " # Assessment:\n", " assess = build_rnn_dataset(dev_reader) \n", " return rnn_model_evaluation(mod, assess)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def rnn_model_evaluation(mod, assess, labels=LABELS):\n", " \"\"\"Asssess a trained `ClassifierRNN`.\n", " \n", " Parameters\n", " ----------\n", " mod : `ClassifierRNN`\n", " Should be a model trained on data in the same format as\n", " `assess`.\n", " \n", " assess : list\n", " A list of (seq, label) pairs, where seq is a sequence of\n", " words and label is a one-hot vector giving the label. \n", " \n", " \"\"\" \n", " # Assessment:\n", " gold = []\n", " predictions = [] \n", " for seq, y_ in assess:\n", " # The gold labels are vectors. Get the index of the single 1\n", " # and look up its string in `LABELS`:\n", " gold.append(labels[np.argmax(y_)])\n", " # `predict` returns the index of the highest score.\n", " p = mod.predict(seq) \n", " predictions.append(labels[p])\n", " # Report:\n", " return classification_report(gold, predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example run. All input and hidden dimensions are quite small, as is `maxiter`. This is just so you can run experiments quickly and see what happens. Nonetheless, the performance is competitive with the linear classifier above, which is encouraging about this approach." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Finished epoch 10 of 10; error is 1.0777186145\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", "contradiction 0.35 0.38 0.36 1000\n", " entailment 0.41 0.52 0.46 1000\n", " neutral 0.39 0.26 0.31 1000\n", "\n", "avg / total 0.38 0.39 0.38 3000\n", "\n" ] } ], "source": [ "vocab = snli_sample['vocab']\n", "\n", "# Random embeddings of dimension 10:\n", "randvec_embedding = np.array([utils.randvec(10) for w in vocab])\n", "\n", "# A small network, trained for just a few epochs to see how things look:\n", "print(rnn_experiment(vocab, randvec_embedding, hidden_dim=10, eta=0.001, maxiter=10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Next steps for NLI deep learning models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As noted above, `ClassifierRNN` is just about the simplest model we could use for this task. Some thoughts on where to take it:\n", "\n", "* Additional hidden layers can be added. This is a relatively simple change to the code: one just needs to define a version of $W_{hh}$ for each layer, respecting the desired dimensions for the representations of the layers it connects. The backpropagation steps are also straightforward duplications of what happens between the current layers.\n", "\n", "* `ClassifierRNN` uses the most basic (non-linear) activation functions. In TensorFlow, it is easy to try more advanced designs, including Long Short-Term Memory (LSTM) cells and Gated Recurrent Unit (GRU) cells. 
The documentation for these is currently a bit hard to find, but here's [the well-documented source code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py).\n", "\n", "* Our implementation uses the same parameter $W_{hh}$ for the premise and hypothesis. It is common to split this into two, with the final hidden state from the premise providing the initial hidden state of the hypothesis.\n", "\n", "* The [SNLI leaderboard](http://nlp.stanford.edu/projects/snli/) shows the value of adding __attention__ layers. These are additional connections between premise and hypothesis. They can be made for each pair of words or just for the final hidden representation in the premise and hypothesis.\n", "\n", "* Our implementation currently has only a single learning rate parameter. A well-tested improvement on this is the [AdaGrad method](http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf), which can straightforwardly be added to the `ClassifierRNN` implementation.\n", "\n", "* Our implementation is regularized only in the sense that the number of iterations acts to control the size of the learned weights. Within deep learning, an increasingly common regularization strategy is [drop-out](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf).\n", "\n", "* We haven't made good use of trees. Like many linguists, I believe trees are necessary for capturing the nuanced ways in which we reason in language, and [this new paper](http://arxiv.org/abs/1603.06021) offers empirical evidence that trees are important for SNLI. Tree-structured neural networks are by now well-understood extensions of feed-forward neural networks and so are well within reach for a final project. The [Stanford Deep Learning course site](http://cs224d.stanford.edu) is a great place to get started." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional NLI resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Do get [the full SNLI](http://nlp.stanford.edu/projects/snli/) and figure out how to grapple with its large size! Here's [a useful and insightful blog post by Sam Bowman on SNLI's design](http://nlp.stanford.edu/blog/the-stanford-nli-corpus-revisited/).\n", "\n", "* The folder [nli-data](nli-data) in this repository contains the NLI data from the [SemEval 2014 semantic relatedness task](http://alt.qcri.org/semeval2014/task1/). \n", "This data set is called \"Sentences Involving Compositional Knowledge\" or, for better or worse, \n", "\"SICK\". It's [freely available from the SemEval site](http://alt.qcri.org/semeval2014/task1/index.php?id=data-and-tools). [nli-data](nli-data) contains a parsed version created by [Sam Bowman](http://stanford.edu/~sbowman/) \n", "as part of [his research on neural models of semantic composition](https://github.com/sleepinyourhat/vector-entailment/releases/tag/W15-R1). \n", "\n", "* [SemEval 2013](https://www.cs.york.ac.uk/semeval-2013/) also had a wide range of interesting data sets for NLI and related tasks.\n", "\n", "* The [FraCaS textual inference test suite](http://www-nlp.stanford.edu/~wcmac/downloads/) is a smaller, hand-built dataset that is great for evaluating a model's ability to handle complex logical patterns.\n", "\n", "* Models for NLI might be adapted for use with the [30M Factoid Question-Answer Corpus](http://agarciaduran.org).\n", "\n", "* Models for NLI might be adapted for use with the [Penn Paraphrase Database](http://paraphrase.org)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Homework 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. WordNet-based entailment features [4 points]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Python NLTK](http://www.nltk.org) has an excellent WordNet interface. As noted above, WordNet is a natural choice for defining useful features in the context of NLI.\n", "\n", "__Your task__: write and submit a feature function, for use with `build_linear_classifier_dataset` and `linear_classifier_experiment`, that is just like `word_cross_product_phi` except that, given a sentence pair $(S_{1}, S_{2})$, it counts only pairs $(w_{1}, w_{2})$ such that $w_{1}$ entails $w_{2}$, for $w_{1} \\in S_{1}$ and $w_{2} \\in S_{2}$. For example, the sentence pair (_the cat runs_, _the animal moves_) would create the dictionary `{(cat, animal): 1.0, (runs, moves): 1.0}`.\n", "\n", "There are many ways to do this. For the purposes of the question, we can limit attention to the WordNet hypernym relation. The following illustrates reasonable ways to go from a string $s$ to the set of all hypernyms of Synsets consistent with $s$:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Synset('dog.n.01'), Synset('pup.n.01'), Synset('young_person.n.01')]\n", "[Synset('dog.n.01'), Synset('pup.n.01')]\n" ] } ], "source": [ "from nltk.corpus import wordnet as wn\n", " \n", "puppies = wn.synsets('puppy')\n", "print([h for ss in puppies for h in ss.hypernyms()])\n", "\n", "# A more conservative approach uses just the first-listed \n", "# Synset, which should be the most frequent sense:\n", "print(wn.synsets('puppy')[0].hypernyms())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__A note on performance__: in our experience, this feature function (used in isolation) gets a mean F1 of about 0.32. This is not very high, but that's perhaps not surprising given its sparsity." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Pretrained RNN inputs [2 points]" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "In the simple RNN experiment above, we used random input vectors. In the [word-entailment bake-off](wordentail.ipynb), pretraining was clearly beneficial. What are the effects of using pretrained inputs here?\n", "\n", "__Submit__:\n", "\n", "0. A function `build_glove_embedding` that creates an embedding space for all of the words in `snli_sample['vocab']`. (You can use any GloVe file you like; the `50d` one will be fastest.) See `randvec_embedding` above if you need further guidance on the nature of the data structure to produce. If you encounter any words in `snli_sample['vocab']` that are not in GloVe, have your function map them instead to a random vector of the appropriate dimensionality (see `utils.randvec`).\n", "\n", "0. A function call for `rnn_experiment` using your GloVe embedding. (You can set the other parameters to `rnn_experiment` however you like.)\n", "\n", "0. The output of this function. (You won't be evaluated by how strong the performance is. We're just curious.)\n", "\n", "You can use `utils.glove2dict` to read in the GloVe data into a `dict`.\n", "\n", "__A note on performance__: your numbers will vary widely, depending on how you configure your network and how long you let it train. 
You will not be evaluated on the performance of your code, but rather only on whether your functions do their assigned jobs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Learning negation [4 points]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this question is to begin to assess the extent to which RNNs can learn to simulate __compositional semantics__: the way the meanings of words and phrases combine to form more complex meanings. We're going to do this with simulated data so that we have clear learning targets and so we can track the extent to which the models are truly generalizing in the desired ways." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Data and background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The __base__ dataset is `nli_simulated_data.pickle` in this directory (the root folder of the cs224u repository). (You'll see below why it's the \"base\" dataset.)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "simulated_data = pickle.load(open('nli_simulated_data.pickle', 'rb'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a list of pairs, where the first member is itself a pair of lists (the two arguments) and the second member is a label:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[([['a'], ['a']], 'equal'),\n", " ([['a'], ['c']], 'superset'),\n", " ([['a'], ['b']], 'neutral'),\n", " ([['a'], ['e']], 'superset'),\n", " ([['a'], ['d']], 'neutral')]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simulated_data[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The letters are arbitrary names, but the dataset was generated in a way that ensures logical consistency. For instance, if `([['x'], ['y']], 'subset')` is in the data and `([['y'], ['z']], 'subset')` is in the data, then `([['x'], ['z']], 'subset')` is as well (transitivity of `subset`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the full label set:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [], "source": [ "simulated_labels = ['disjoint', 'equal', 'neutral', 'subset', 'superset']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These labels are interpreted as mutually exclusive. For example, 'subset' is proper subset and 'superset' is proper superset – both exclude the case where the two arguments are equal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, we have to do a little bit of work to prepare the data for use with `ClassifierRNN`:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def build_sim_dataset(dataset):\n", " \"\"\"Map `dataset`, in the same format as `simulated_data`, to a \n", " dataset that is suitable for use with `ClassifierRNN`: the input \n", " sequences are flattened into a single list and the label string \n", " is mapped to the appropriate one-hot vector.
\n", " \"\"\"\n", " rnn_dataset = []\n", " for (p, q), rel in dataset:\n", " y_ = np.zeros(len(simulated_labels))\n", " y_[simulated_labels.index(rel)] = 1.0 \n", " rnn_dataset.append((p+q, y_))\n", " return rnn_dataset " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, here is the full vocabulary, which you'll need in order to create embedding spaces:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['not', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sim_vocab = [\"not\"] + sorted(set([p[0] for x,y in simulated_data for p in x]))\n", "\n", "sim_vocab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Task 1: Experiment function [2 points]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Complete the function `sim_experiment` so that it trains a `ClassifierRNN` on a dataset produced by `build_sim_dataset` and evaluates that classifier on a dataset produced by `build_sim_dataset`:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "######################################################################\n", "# TO BE COMPLETED\n", "######################################################################\n", "\n", "def sim_experiment(train_dataset, test_dataset, word_dim=10, hidden_dim=10, eta=0.001, maxiter=100):\n", " # Create an embedding for `sim_vocab`:\n", "\n", " # Change the value of `mod` to a `ClassifierRNN` instance using \n", " # the user-supplied arguments to `sim_experiment`:\n", "\n", " # Fit the model:\n", "\n", " # Return the evaluation on `test_dataset`:\n", " return rnn_model_evaluation(mod, test_dataset, labels=simulated_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Submit__: Your completed `sim_experiment`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Task 2: Memorize the training data [1 point]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fiddle with `sim_experiment` until you've found settings that yield perfect accuracy on the training data. In other words, if `d` is the dataset you created with `build_sim_dataset`, then `sim_experiment(d, d)` should yield perfect performance on all classes. (If it's a little off, that's okay.)\n", "\n", "__Submit__: Your function call to `sim_experiment` showing the values of all the parameters. If you need to write any code to prepare arguments for the function call, then include those lines as well.\n", "\n", "__Tip__: set `eta` very low. This will lead to slower but more stable learning. You might also pick high `word_dim` and `hidden_dim` to ensure that you have sufficient representational power. These settings in turn demand a large number of iteration." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [], "source": [ "######################################################################\n", "# TO BE COMPLETED\n", "######################################################################\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Task 3: Negation and generalization [1 point]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've established that the model works, we want to start making the data more complex. 
To do this, we'll simply negate one or both arguments and assign them the relation determined by their original label and the logic of negation. For instance, the training instance\n", "\n", "`p q, subset`\n", "\n", "will become\n", "\n", "`not p not q, superset\n", "p not q, disjoint\n", "not p q, neutral`\n", "\n", "The full logic of this is a somewhat liberal interpretation of the theory of negation developed by [MacCartney and Manning 2007](http://nlp.stanford.edu/~wcmac/papers/natlog-wtep07.pdf).\n", "\n", "\n", "$$\n", "\\begin{array}{l c c c}\n", "\\hline \n", " & \\text{not-}p, \\text{not-}q & p, \\text{not-}q & \\text{not-}p, q \\\\\n", "\\hline \n", "p \\text{ disjoint } q & \\text{neutral} & \\text{subset} & \\text{superset} \\\\\n", "p \\text{ equal } q & \\text{equal} & \\text{disjoint} & \\text{disjoint} \\\\\n", "p \\text{ neutral } q & \\text{neutral} & \\text{neutral} & \\text{neutral} \\\\\n", "p \\text{ subset } q & \\text{superset} & \\text{disjoint} & \\text{neutral} \\\\\n", "p \\text{ superset } q & \\text{subset} & \\text{neutral} & \\text{disjoint} \\\\\n", "\\hline\n", "\\end{array}\n", "$$ \n", "\n", "If you don't want to worry about the details, that's fine – you can treat `negate_dataset` as a black box. Just think of it as implementing the theory of negation." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def negate_dataset(dataset):\n", " \"\"\"Map `dataset` to a new dataset that has been thoroughly negated.\"\"\"\n", " new_dataset = []\n", " for (p, q), rel in dataset: \n", " neg_p = [\"not\"] + p\n", " neg_q = [\"not\"] + q\n", " combos = [[neg_p, neg_q], [p, neg_q], [neg_p, q]]\n", " new_rels = None\n", " if rel == \"disjoint\":\n", " new_rels = (\"neutral\", \"subset\", \"superset\")\n", " elif rel == \"equal\":\n", " new_rels = (\"equal\", \"disjoint\", \"disjoint\") \n", " elif rel == \"neutral\":\n", " new_rels = (\"neutral\", \"neutral\", \"neutral\")\n", " elif rel == \"subset\":\n", " new_rels = (\"superset\", \"disjoint\", \"neutral\")\n", " elif rel == \"superset\":\n", " new_rels = (\"subset\", \"neutral\", \"disjoint\") \n", " new_dataset += zip(combos, new_rels)\n", " return new_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `negate_dataset`, we can map the base dataset to a singly negated one and then create a `ClassifierRNN` dataset from that:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [], "source": [ "neg1 = negate_dataset(simulated_data)\n", "neg1_rnn = build_sim_dataset(neg1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Your task__: use your `sim_experiment` to train a network on the base dataset plus `neg1`, and evaluate it on a dataset that has been _doubly negated_ by running `negate_dataset(neg1)` and preparing the result for use with a `ClassifierRNN` (this data preparation is sketched just below). Use the same hyperparameters that you used to memorize the data for task 2.\n", "\n", "__Submit__: the code you write to run this experiment and the output (which should be from a use of `sim_experiment`).\n", "\n", "__A note on performance__: our mean F1 dropped to about 0.61, because we stuck to the rules and used exactly the configuration that led to perfect results on the training set above, as is required. You will not be evaluated based on the numbers you achieve, but rather only on whether you successfully run the required experiment.\n", "\n", "That's all that's required.
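" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For concreteness, here is a sketch of the data preparation for the doubly negated assessment set, mirroring the `neg1` cell above (the `sim_experiment` call itself is left to you):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Doubly negated data, prepared the same way as `neg1` above:\n", "neg2 = negate_dataset(neg1)\n", "neg2_rnn = build_sim_dataset(neg2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "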
Of course, we hope you are now extremely curious to see whether you can find hyperparameters that generalize well to double negation, and how many times you can negate a dataset and still get good predictions out! `neg3` and beyond?!" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [], "source": [ "######################################################################\n", "# TO BE COMPLETED\n", "######################################################################\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }