{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%load_ext tikzmagic" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Interpretability\n", "\n", "or: opening the black box of NLP models\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Visualisation\n", "* Probing\n", "* Adversarial evaluation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Visualisation\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", " (from Belinkov, Gehrmann and Pavlick, 2020)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Inspecting weights\n", "\n", "Easy with [n-gram feature-based linear text classifiers](doc_classify_slides_short.ipynb):\n", "\n", "![unigram_positive_weights](../img/unigram_positive_weights.png)\n", "![unigram_negative_weights](../img/unigram_negative_weights.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Not so simple for neural models.\n", "\n", "![transformers](../img/transformers.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising embeddings\n", "\n", "t-SNE for dimensionality reduction ([van der Maaten and Hinton, 2008](https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf))\n", "\n", "word2vec:\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising embeddings\n", "\n", "t-SNE for dimensionality reduction ([van der Maaten and Hinton, 2008](https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf))\n", "\n", "BERT:\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Visualising attention\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Visualising self-attention\n", "\n", "
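The t-SNE projections above can be sketched in a few lines with scikit-learn. This is only an illustration: the random vectors stand in for real word2vec/BERT embeddings, and the tiny vocabulary is made up.

```python
# Sketch: project high-dimensional "embeddings" to 2-d with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "paris", "france"]
embeddings = rng.normal(size=(len(vocab), 300))  # stand-ins for trained word vectors

# perplexity must be smaller than the number of points
projected = TSNE(n_components=2, perplexity=3.0, random_state=0).fit_transform(embeddings)
print(projected.shape)  # one 2-d point per word, ready for a scatter plot
```

The projected points would normally go straight into a matplotlib scatter plot, one label per word.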
\n", " \n", "
\n", "\n", "
\n", " (from Vig, 2019)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Visualising self-attention\n", "\n", "
\n", " \n", "
\n", "\n", "
\n", " (from Vig, 2019)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Caveats\n", "\n", "- Attention is not explanation (Jain and Wallace, 2019)\n", "- Attention is not not explanation (Wiegreffe and Pinter, 2019)\n", "\n", "
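The quantity behind heatmaps like Vig's is just the row-normalised score matrix of scaled dot-product attention. A minimal numpy sketch (toy random query/key vectors, not weights from a trained model):

```python
# Sketch: compute a self-attention weight matrix -- what the heatmaps visualise.
import numpy as np

def attention_weights(Q, K):
    # softmax(Q K^T / sqrt(d)), applied row by row
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
tokens = ["the", "dog", "chased", "the", "cat"]
d = 8
Q = rng.normal(size=(len(tokens), d))  # toy queries
K = rng.normal(size=(len(tokens), d))  # toy keys

A = attention_weights(Q, K)
print(np.round(A, 2))  # row i: how much token i attends to each token
```

Each row of `A` is a probability distribution over the tokens attended to; a trained transformer produces one such matrix per head and per layer.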
\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Close inspection\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", " (from John Hewitt; slides)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Probing" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- To find out whether a representation R encodes a property P\n", " - train a model to predict P from R\n", " - or cast the problem directly according to R's training objective" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Linguistic knowledge in BERT\n", "\n", "Train a simple linear classifier on top of BERT\n", "\n", "\n", "\n", "(from [Liu et al., 2019](https://aclanthology.org/N19-1112/))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "(from [Liu et al., 2019](https://aclanthology.org/N19-1112/))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### World knowledge in BERT\n", "\n", "Probing by masked language modelling (no extra parameters!)\n", "\n", "
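The classifier-based probing recipe above (train a simple model to predict property P from representation R) can be sketched with scikit-learn. Everything here is synthetic: random vectors stand in for frozen BERT activations, and the three-way "property" labels are toy data.

```python
# Sketch of a linear probe: predict a toy property (e.g. a POS-like tag)
# from fixed "representations" (random stand-ins for BERT activations).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_tokens, dim = 200, 64
R = rng.normal(size=(n_tokens, dim))    # frozen representations (never updated)
P = rng.integers(0, 3, size=n_tokens)   # the property to probe for (3 toy classes)

probe = LogisticRegression(max_iter=1000).fit(R[:150], P[:150])
acc = probe.score(R[150:], P[150:])
# With random representations there is nothing to find; a real probe's
# accuracy is always compared against such a chance/control baseline (1/3 here).
print(f"probe accuracy: {acc:.2f}")
```

Note the probe itself stays deliberately simple: if a powerful probe succeeds, it is unclear whether the knowledge sits in R or in the probe.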
\n", " \n", "
\n", "\n", "
\n", " (from Petroni et al., 2019)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Language models as linguistic test subjects\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", " (from John Hewitt; slides)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Adversarial evaluation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "BERT struggles with quantifiers:\n", "\n", "
\n", " \n", "
\n", "\n", "
\n", " (from Cui et al., 2022)\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "BERT struggles with negation:\n", "\n", "
\n", " \n", "
\n", "\n", "
\n", " (from Hartmann et al., 2021)\n", "
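Adversarial evaluations like this one are built from minimal pairs: the same sentence with a single phenomenon flipped, on which a sound model's prediction should flip too. A toy sketch — the bag-of-words "model" below is a deliberate straw man that ignores negation, exactly the failure mode such pairs expose:

```python
# Sketch: negation minimal pairs, scored by a naive bag-of-words sentiment
# "model" that ignores "not" -- the failure adversarial negation sets catch.
POSITIVE = {"good", "great", "wonderful"}
NEGATIVE = {"bad", "awful", "boring"}

def naive_sentiment(sentence):
    # Bag-of-words score; negation has no effect on it.
    words = set(sentence.lower().rstrip(".").split())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def negate(sentence):
    # Toy negation rule, sufficient for these templates.
    return sentence.replace(" was ", " was not ")

pairs = [(s, negate(s)) for s in ["The movie was good.", "The plot was boring."]]
for original, negated in pairs:
    unchanged = naive_sentiment(original) == naive_sentiment(negated)
    print(f"{negated!r}: score unchanged under negation = {unchanged}")
```

For every pair the naive model's score is unchanged, so its prediction cannot flip — which is what an adversarial negation test set is designed to reveal.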
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", " \n", " \n", " \n", "
\n", "\n", "(from [Belinkov, Gehrmann and Pavlick, 2020](https://www.aclweb.org/anthology/2020.acl-tutorials.1.pdf); [slides](https://sebastiangehrmann.com/assets/files/acl_2020_interpretability_tutorial.pdf))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Further reading\n", "\n", "- [Belinkov and Glass, 2019. Analysis Methods in Neural Language Processing: A Survey](https://www.aclweb.org/anthology/Q19-1004.pdf)\n", "- [Hewitt, 2020. Designing and Interpreting Probes](https://nlp.stanford.edu//~johnhew//interpreting-probes.html)\n", "- [Lawrence, 2020. Interpretability and Analysis of Models for NLP @ ACL 2020](https://medium.com/@lawrence.carolin/interpretability-and-analysis-of-models-for-nlp-e6b977ac1dc6)\n", "- [Søgaard, 2021. Explainable Natural Language Processing. Morgan & Claypool](https://link.springer.com/book/10.1007/978-3-031-02180-0)\n", "- Jesse Vig's blog posts:\n", " - [Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters](https://towardsdatascience.com/deconstructing-bert-distilling-6-patterns-from-100-million-parameters-b49113672f77)\n", " - [Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention](https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1)" ] } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }