{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "  function code_toggle() {\n",
       "    if (code_shown){\n",
       "      $('div.input').hide('500');\n",
       "      $('#toggleButton').val('Show Code')\n",
       "    } else {\n",
       "      $('div.input').show('500');\n",
       "      $('#toggleButton').val('Hide Code')\n",
       "    }\n",
       "    code_shown = !code_shown\n",
       "  }\n",
       "\n",
       "  $( document ).ready(function(){\n",
       "    code_shown=false;\n",
       "    $('div.input').hide()\n",
       "  });\n",
       "</script>\n",
       "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<script>\n",
    "  function code_toggle() {\n",
    "    if (code_shown){\n",
    "      $('div.input').hide('500');\n",
    "      $('#toggleButton').val('Show Code')\n",
    "    } else {\n",
    "      $('div.input').show('500');\n",
    "      $('#toggleButton').val('Hide Code')\n",
    "    }\n",
    "    code_shown = !code_shown\n",
    "  }\n",
    "\n",
    "  $( document ).ready(function(){\n",
    "    code_shown=false;\n",
    "    $('div.input').hide()\n",
    "  });\n",
    "</script>\n",
    "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext tikzmagic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Interpretability\n",
    "\n",
    "or: opening the black box of NLP models\n",
    "\n",
    "<img src=\"https://blackboxnlp.github.io/logo.svg\" width=400px>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Visualisation\n",
    "* Probing\n",
    "* Adversarial evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Visualisation\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/fafb602db42240f5fb1e1b113fa0ed8647b45adc/8-Figure5-1.png\" width=50%>\n",
    "    </a>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/2020.acl-tutorials.1.pdf\">Belinkov, Gehrmann and Pavlick, 2020</a>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Inspecting weights\n",
    "\n",
    "Easy with [n-gram feature-based lienar text classifiers](doc_classify_slides_short.ipynb):\n",
    "\n",
    "![unigram_positive_weights](../img/unigram_positive_weights.png)\n",
    "![unigram_negative_weights](../img/unigram_negative_weights.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Not so simple for neural models.\n",
    "\n",
    "![transformers](../img/transformers.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Visualising embeddings\n",
    "\n",
    "t-SNE for dimensionality reduction ([van der Maaten and Hinton, 2009](https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf))\n",
    "\n",
    "word2vec:\n",
    "<img src=\"../img/word_representations.svg\" width=70%>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Visualising embeddings\n",
    "\n",
    "t-SNE for dimensionality reduction ([van der Maaten and Hinton, 2009](https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf))\n",
    "\n",
    "BERT:\n",
    "    <img src=\"https://home.ttic.edu/~kgimpel/viz-bert/viz-bert-voc.png\" width=70%/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Visualising attention\n",
    "\n",
    "<img src=\"dl-applications-figures/reordering.png\" width=60%>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Visualising self-attention\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/0de0a44b859a3719d11834479112314b4caba669/2-Figure2-1.png\" width=100%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/P19-3007\">Vig, 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Visualising self-attention\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/0de0a44b859a3719d11834479112314b4caba669/5-Figure6-1.png\" width=100%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/P19-3007\">Vig, 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Caveats\n",
    "\n",
    "- Attention is not explanation (<a href=\"https://www.aclweb.org/anthology/N19-1357.pdf\">Jain and Wallace, 2019</a>)\n",
    "- Attention is not not explanation (<a href=\"https://www.aclweb.org/anthology/D19-1002.pdf\">Wiegreffe and Pinter, 2019</a>)\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/ce177672b00ddf46e4906157a7e997ca9338b8b9/3-Table1-1.png\" width=80%>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Close inspection\n",
    "\n",
    "<center>\n",
    "    <a href=\"slides/cs224n-2020-lecture20-interpretability-26-32.pdf\">\n",
    "    <img src=\"https://www.prevuemeetings.com/wp-content/uploads/2016/11/site-inspection.jpg\" width=30%>\n",
    "    </a>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from John Hewitt; <a href=\"http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture20-interpretability.pdf\">slides</a>)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Probing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- To find out whether a representation R encodes a property P\n",
    "  - train a model to predict P from the R\n",
    "  - or cast the problem directly according to R's training objective"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Linguistic knowledge in BERT\n",
    "\n",
    "Train a simple linear classifier on top of BERT\n",
    "\n",
    "<img src=\"../img/liu_probing.png\" width=70%/>\n",
    "\n",
    "(from [Liu et al., 2019](https://aclanthology.org/N19-1112/))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<img src=\"https://d3i71xaburhd42.cloudfront.net/e0ba34397ccd04721d465af842c4388752cf6017/5-Table1-1.png\" width=100%/>\n",
    "\n",
    "(from [Liu et al., 2019](https://aclanthology.org/N19-1112/))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### World knowledge in BERT\n",
    "\n",
    "Probing by masked language modelling (no extra parameters!)\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/d0086b86103a620a86bc918746df0aa642e2a8a3/1-Figure1-1.png\" width=70%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/D19-1250/\">Petroni et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Language models as linguistic test subjects\n",
    "\n",
    "<center>\n",
    "    <a href=\"slides/cs224n-2020-lecture20-interpretability-14-21.pdf\">\n",
    "    <img src=\"https://elearningindustry.com/wp-content/uploads/2019/06/do-this-not-that-when-writing-multiple-choice-questions.jpg\" width=50%>\n",
    "    </a>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from John Hewitt; <a href=\"http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture20-interpretability.pdf\">slides</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Adversarial evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "BERT struggles with quantifiers:\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/8fe49d025e8b83816f7169daa74becaac6184f9e/1-Table1-1.png\" width=70%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2022.naacl-main.359/\">Cui et al., 2022</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "BERT struggles with negation:\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/f3b1ad7986eea54d381f65abf4a0da5887129339/2-Table1-1.png\" width=100%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://aclanthology.org/2021.conll-1.19/\">Hartmann et al., 2021</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center>\n",
    "    <a href=\"slides/acl_2020_interpretability_tutorial-94-115.pdf\">\n",
    "    <img src=\"https://blog.acolyer.org/wp-content/uploads/2017/09/adversarial-reading-fig-1.jpeg?w=480\" width=50%>\n",
    "    </a>\n",
    "</center>\n",
    "\n",
    "(from [Belinkov, Gehrmann and Pavlick, 2020](https://www.aclweb.org/anthology/2020.acl-tutorials.1.pdf); [slides](https://sebastiangehrmann.com/assets/files/acl_2020_interpretability_tutorial.pdf))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Further reading\n",
    "\n",
    "- [Belinkov and Glass, 2020. Analysis Methods in Neural Language Processing: A Survey](https://www.aclweb.org/anthology/Q19-1004.pdf)\n",
    "- [Hewitt, 2020. Designing and Interpreting Probes](https://nlp.stanford.edu//~johnhew//interpreting-probes.html)\n",
    "- [Lawrence, 2020. Interpretability and Analysis of Models for NLP @ ACL 2020](https://medium.com/@lawrence.carolin/interpretability-and-analysis-of-models-for-nlp-e6b977ac1dc6)\n",
    "- [Søgaard, 2021. Explainable Natural Language Processing. Morgan & Claypool](https://link.springer.com/book/10.1007/978-3-031-02180-0)\n",
    "- Jesse Vig's blog posts:\n",
    "  - [Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters](https://towardsdatascience.com/deconstructing-bert-distilling-6-patterns-from-100-million-parameters-b49113672f77)\n",
    "  - [Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention](https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}