{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Quantus + NLP\n",
"[](https://mybinder.org/v2/gh/understandable-machine-intelligence-lab/Quantus/main?labpath=tutorials%2FTutorial_NLP_Demonstration.ipynb)\n",
"\n",
"\n",
"This tutorial demonstrates how to use the library for robustness evaluation\n",
"explanation of text classification models.\n",
"For this purpose, we use a pre-trained `Distilbert` model from [Huggingface](https://huggingface.co/models) and `GLUE/SST2` dataset [here](https://huggingface.co/datasets/sst2).\n",
"\n",
"This is not a working example yet, and is meant only for demonstration purposes \n",
"so far. For this demo, we use a (yet) unreleased version of Quantus.\n",
"\n",
"Author: Artem Sereda\n",
"\n",
"[](https://colab.research.google.com/drive/1eWK9ebfMUVRG4mrOAQvXdJ452SMLfffv?usp=sharing)"
],
"metadata": {
"collapsed": false,
"id": "1sXtIxKhnXp9"
}
},
{
"cell_type": "code",
"execution_count": 7,
"outputs": [],
"source": [
"from __future__ import annotations"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# Use an unreleased version of Quantus.\n",
"!pip install 'quantus @ git+https://github.com/aaarrti/Quantus.git@nlp-domain' --no-deps\n",
"!pip install transformers datasets nlpaug tf_explain tensorflow_probability"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [
{
"data": {
"text/plain": "[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),\n PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]"
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from datasets import load_dataset\n",
"import tensorflow as tf\n",
"from functools import partial\n",
"import logging\n",
"from typing import NamedTuple, List, Any\n",
"from transformers import AutoTokenizer, TFDistilBertForSequenceClassification, TFPreTrainedModel, PreTrainedTokenizerFast\n",
"import quantus.nlp as qn\n",
"import matplotlib.pyplot as plt\n",
"import tensorflow_probability as tfp\n",
"\n",
"# Suppress debug logs.\n",
"logging.getLogger('absl').setLevel(logging.WARNING)\n",
"tf.config.list_physical_devices()"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## 1) Preliminaries"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### 1.1 Load pre-trained model and tokenizer from [huggingface](https://huggingface.co/models) hub"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Metal device set to: AMD Radeon Pro 560\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.\n",
"\n",
"All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.\n",
"If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.\n"
]
}
],
"source": [
"MODEL_NAME = \"distilbert-base-uncased-finetuned-sst-2-english\"\n",
"tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n",
"model = TFDistilBertForSequenceClassification.from_pretrained(MODEL_NAME)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### 1.2 Load test split of [GLUE/SST2](https://huggingface.co/datasets/sst2) dataset"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:datasets.builder:Found cached dataset sst2 (/Users/artemsereda/.cache/huggingface/datasets/sst2/default/2.0.0/9896208a8d85db057ac50c72282bcb8fe755accc671a57dd8059d4e130961ed5)\n"
]
},
{
"data": {
"text/plain": " 0%| | 0/3 [00:00, ?it/s]",
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "20913dacfa344ee89a9b5a5c6db3264c"
},
"application/json": {
"n": 0,
"total": 3,
"elapsed": 0.047647953033447266,
"ncols": null,
"nrows": null,
"prefix": "",
"ascii": false,
"unit": "it",
"unit_scale": false,
"rate": null,
"bar_format": null,
"postfix": null,
"unit_divisor": 1000,
"initial": 0,
"colour": null
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"BATCH_SIZE = 8\n",
"dataset = load_dataset(\"sst2\")['test']\n",
"x_batch = dataset['sentence'][:BATCH_SIZE]"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Run an example inference, and demonstrate models predictions."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 5,
"outputs": [
{
"data": {
"text/plain": " 0 1\n0 uneasy mishmash of styles and genres . negative\n1 this film 's relationship to actual tension is... negative\n2 by the end of no such thing the audience , lik... positive\n3 director rob marshall went out gunning to make... positive\n4 lathan and diggs have considerable personal ch... positive\n5 a well-made and often lovely depiction of the ... positive\n6 none of this violates the letter of behan 's b... negative\n7 although it bangs a very cliched drum at times... positive",
"text/html": "
\n\n
\n \n
\n
\n
0
\n
1
\n
\n \n \n
\n
0
\n
uneasy mishmash of styles and genres .
\n
negative
\n
\n
\n
1
\n
this film 's relationship to actual tension is...
\n
negative
\n
\n
\n
2
\n
by the end of no such thing the audience , lik...
\n
positive
\n
\n
\n
3
\n
director rob marshall went out gunning to make...
\n
positive
\n
\n
\n
4
\n
lathan and diggs have considerable personal ch...
\n
positive
\n
\n
\n
5
\n
a well-made and often lovely depiction of the ...
\n
positive
\n
\n
\n
6
\n
none of this violates the letter of behan 's b...
\n
negative
\n
\n
\n
7
\n
although it bangs a very cliched drum at times...
\n
positive
\n
\n \n
\n
"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CLASS_NAMES = ['negative', 'positive']\n",
"\n",
"def decode_labels(y_batch: np.ndarray, class_names: List[str]) -> List[str]:\n",
" \"\"\"A helper function to map integer labels to human-readable class names.\"\"\"\n",
" return [class_names[i] for i in y_batch]\n",
"\n",
"# Run tokenizer.\n",
"tokens = tokenizer(x_batch, padding='longest', return_tensors='tf')\n",
"logits = model(**tokens).logits\n",
"y_batch = tf.argmax(tf.nn.softmax(logits), axis=1).numpy()\n",
"\n",
"# Show the x, y data.\n",
"pd.DataFrame([x_batch, decode_labels(y_batch, CLASS_NAMES)]).T"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### 1.3 Helper functions: visualise explanations\n",
"\n",
"There are not many XAI libraries for NLP out there, so here we fully relly on our own implementations of explanation methods. This section write functions to visualise our explanations."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 8,
"outputs": [],
"source": [
"def plot_textual_heatmap(explanations: List[qn.TokenSalience]):\n",
"\n",
" \"\"\"\n",
" Plots attributions over a batch of text sequence explanations.\n",
"\n",
" References:\n",
" - https://stackoverflow.com/questions/74046734/plot-text-saliency-map-in-jupyter-notebook\n",
"\n",
" Parameters\n",
" ----------\n",
" explanations: List of Named tuples (tokens, salience) containing batch of explanations.\n",
"\n",
" Returns\n",
" -------\n",
" plot: matplotplib.pyplot object, which will be automatically rendered by jupyter.\n",
" \"\"\"\n",
"\n",
" h_len = len(explanations)\n",
" v_len = len(explanations[0].tokens)\n",
"\n",
" tokens = np.asarray([i.tokens for i in explanations]).reshape(-1)\n",
" colors = np.asarray([i.salience for i in explanations]).reshape(-1)\n",
"\n",
" fig, axes = plt.subplots(h_len, v_len, figsize=(v_len, h_len*0.5), gridspec_kw=dict(left=0., right=1.))\n",
" for i, ax in enumerate(axes.ravel()):\n",
" rect = plt.Rectangle((0, 0), 1, 1, color=(1., 1 - colors[i], 1- colors[i]))\n",
" ax.add_patch(rect)\n",
" ax.text(0.5, 0.5, tokens[i], ha='center', va='center')\n",
" ax.set_xlim(0, 1)\n",
" ax.set_ylim(0, 1)\n",
" ax.axis('off')\n",
"\n",
" ax = fig.add_axes([0, 0.05, 1 , 0.9], fc=[0, 0, 0, 0])\n",
" for axis in ['left', 'right']:\n",
" ax.spines[axis].set_visible(False)\n",
" ax.set_xticks([])\n",
" ax.set_yticks([])\n",
" return plt"
],
"metadata": {
"collapsed": false
}
},
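{
"cell_type": "markdown",
"source": [
"The heatmap colours each token with the RGB triple `(1, 1 - s, 1 - s)`, where `s` is the normalised salience, so a score of 0 is drawn white and a score of 1 pure red. The cell below is a minimal NumPy sketch of that mapping. The helper name `salience_to_rgb`, the example scores, and the clipping step are illustrative assumptions; they are not part of Quantus or of the plotting function."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def salience_to_rgb(salience):\n",
"    \"\"\"Map scores in [0, 1] to RGB triples ranging from white (0) to red (1).\"\"\"\n",
"    # Clipping is an assumption of this sketch; it guards against scores outside [0, 1].\n",
"    s = np.clip(np.asarray(salience, dtype=float), 0.0, 1.0)\n",
"    return np.stack([np.ones_like(s), 1.0 - s, 1.0 - s], axis=-1)\n",
"\n",
"\n",
"# Hypothetical salience scores for three tokens.\n",
"colors = salience_to_rgb([0.0, 1.0, 0.25])\n",
"print(colors)"
],
"metadata": {
"collapsed": false
}
},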
{
"cell_type": "markdown",
"source": [
"### 1.4 Helper functions: generate explanations\n",
"\n",
"Write out functions to generate explanations using baseline methods: Gradient Norm and Integrated Gradients"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 9,
"outputs": [],
"source": [
"@tf.function(jit_compile=True)\n",
"def normalize(x: tf.Tensor) -> tf.Tensor:\n",
" \"\"\"\n",
" Normalize attribution values to comply with RGB standards.\n",
" - Take absolute values.\n",
" - Scale attribution scores, so that maximum value is 1.\n",
"\n",
" Parameters\n",
" ----------\n",
" x: 1D tensor containing attribution scores.\n",
"\n",
" Returns\n",
" -------\n",
" x: 1D tensor containing normalized attribution scores.\n",
" \"\"\"\n",
" abs = tf.abs(x)\n",
" max = tf.reduce_max(abs)\n",
" return abs / max\n",
"\n",
"\n",
"def explain_gradient_norm(\n",
" model: TFPreTrainedModel,\n",
" token_ids: tf.Tensor,\n",
" attention_mask: tf.Tensor,\n",
" target: int,\n",
" tokenizer: PreTrainedTokenizerFast\n",
") -> qn.TokenSalience:\n",
" \"\"\"\n",
" Computes token attribution score using the Gradient Norm method for a single point.\n",
"\n",
" Parameters\n",
" ----------\n",
" model:\n",
" Huggingface model, which is subject to explanation.\n",
" token_ids:\n",
" 1D Array of token ids.\n",
" attention_mask:\n",
" 1D array of attention mask.\n",
" target:\n",
" Predicted label.\n",
" tokenizer:\n",
" Huggingface tokenizer used to convert input_ids back to plain text tokens.\n",
"\n",
" Returns\n",
" -------\n",
"\n",
" a: quantus.nlp.TokenSalience\n",
" Named tuple (tokens, salience), with tokens and their respective attribution scores.\n",
" \"\"\"\n",
" # Convert tokens to embeddings.\n",
" embeddings = model.distilbert.get_input_embeddings()(input_ids=token_ids)\n",
" with tf.GradientTape() as tape:\n",
" tape.watch(embeddings)\n",
" logits = model(None,\n",
" inputs_embeds=embeddings,\n",
" attention_mask=attention_mask\n",
" ).logits\n",
" logits_for_label = tf.gather(logits, axis=1, indices=target)\n",
"\n",
" # Compute gradients of logits with respect to embeddings.\n",
" grads = tape.gradient(logits_for_label, embeddings)\n",
" # Compute L2 norm of gradients.\n",
" grad_norm = tf.linalg.norm(grads, axis=-1)\n",
" with tf.device('cpu'):\n",
" scores = normalize(grad_norm[0]).numpy()\n",
" return qn.TokenSalience(tokenizer.convert_ids_to_tokens(token_ids), scores)\n",
"\n",
"\n",
"def explain_gradient_norm_batch(\n",
" model: TFPreTrainedModel,\n",
" inputs: List[str],\n",
" targets: np.ndarray,\n",
" tokenizer: PreTrainedTokenizerFast\n",
") -> List[qn.TokenSalience]:\n",
" \"\"\"\n",
" Computes token attribution score using the Gradient Norm method for batch.\n",
"\n",
" Parameters\n",
" ----------\n",
" model:\n",
" Huggingface model, which is subject to explanation.\n",
" inputs:\n",
" List of plain text inputs.\n",
" targets:\n",
" 1D array of predicted labels.\n",
" tokenizer:\n",
" Huggingface tokenizer used to convert input_ids back to plain text tokens.\n",
"\n",
" Returns\n",
" -------\n",
"\n",
" a_batch: List of quantus.nlp.TokenSalience.\n",
" List of named tuples (tokens, salience), with tokens and their respective attribution scores.\n",
" \"\"\"\n",
" \"\"\"A wrapper around explain_gradient_norm which allows calling it on batch\"\"\"\n",
" tokens = tokenizer(inputs, return_tensors='tf', padding='longest')\n",
" batch_size = len(targets)\n",
" return [\n",
" explain_gradient_norm(model, tokens['input_ids'][i], tokens['attention_mask'][i], targets[i], tokenizer)\n",
" for i in range(batch_size)\n",
" ]\n",
"\n",
"\n",
"@tf.function(jit_compile=True)\n",
"def get_interpolated_inputs(\n",
" baseline: tf.Tensor,\n",
" target: tf.Tensor,\n",
" num_steps: int\n",
") -> tf.Tensor:\n",
" \"\"\"\n",
" Gets num_step linearly interpolated inputs from baseline to target.\n",
" Reference: https://github.com/PAIR-code/lit/blob/main/lit_nlp/components/gradient_maps.py#L238\n",
"\n",
" Returns\n",
" -------\n",
" interpolated_inputs: [num_steps, num_tokens, emb_size]\n",
" \"\"\"\n",
" baseline = tf.cast(baseline, dtype=tf.float64)\n",
" target = tf.cast(target, dtype=tf.float64)\n",
" delta = target - baseline # [num_tokens, emb_size]\n",
" # Creates scale values array of shape [num_steps, num_tokens, emb_dim],\n",
" # where the values in scales[i] are the ith step from np.linspace. [num_steps, 1, 1]\n",
" scales = tf.linspace(0, 1, num_steps + 1)[:, tf.newaxis, tf.newaxis]\n",
" shape = (num_steps + 1,) + delta.shape\n",
" # [num_steps, num_tokens, emb_size]\n",
" deltas = scales * tf.broadcast_to(delta, shape)\n",
" interpolated_inputs = baseline + deltas\n",
" return interpolated_inputs\n",
"\n",
"\n",
"\n",
"def explain_int_grad(\n",
" model: TFPreTrainedModel,\n",
" token_ids: tf.Tensor,\n",
" attention_mask: tf.Tensor,\n",
" target: int,\n",
" tokenizer: PreTrainedTokenizerFast,\n",
" num_steps: int\n",
") -> qn.TokenSalience:\n",
" \"\"\"\n",
" Computes token attribution score using the Integrated Gradients method for a single point.\n",
"\n",
" Parameters\n",
" ----------\n",
" model:\n",
" Huggingface model, which is subject to explanation.\n",
" token_ids:\n",
" 1D Array of token ids.\n",
" attention_mask:\n",
" 1D array of attention mask.\n",
" target:\n",
" Predicted label.\n",
" tokenizer:\n",
" Huggingface tokenizer used to convert input_ids back to plain text tokens.\n",
"\n",
" Returns\n",
" -------\n",
"\n",
" a: quantus.nlp.TokenSalience\n",
" Named tuple (tokens, salience), with tokens and their respective attribution scores.\n",
" \"\"\"\n",
" # Convert tokens to embeddings.\n",
" embeddings = model.distilbert.get_input_embeddings()(input_ids=token_ids)[0]\n",
" baseline = tf.zeros_like(embeddings)\n",
" # Generate interpolation from 0 to embeddings.\n",
" with tf.device('cpu'):\n",
" interpolated_embeddings = get_interpolated_inputs(baseline, embeddings, num_steps)\n",
" interpolated_embeddings = tf.cast(interpolated_embeddings, tf.float32)\n",
" interpolated_attention_mask = tf.stack([attention_mask for i in range(num_steps + 1)])\n",
" with tf.GradientTape() as tape:\n",
" tape.watch(interpolated_embeddings)\n",
" logits = model(None,\n",
" inputs_embeds=interpolated_embeddings,\n",
" attention_mask=interpolated_attention_mask,\n",
" ).logits\n",
" logits_for_label = tf.gather(logits, axis=1, indices=target)\n",
"\n",
" # Compute gradients of logits with respect to interpolations.\n",
" grads = tape.gradient(logits_for_label, interpolated_embeddings)\n",
" # Integrate gradients.\n",
" int_grad = tfp.math.trapz(tfp.math.trapz(grads, axis=0))\n",
" with tf.device('cpu'):\n",
" scores = normalize(int_grad).numpy()\n",
" return qn.TokenSalience(tokenizer.convert_ids_to_tokens(token_ids), scores)\n",
"\n",
"\n",
"def explain_int_grad_batch(\n",
" model: TFPreTrainedModel,\n",
" inputs: List[str],\n",
" targets: np.ndarray,\n",
" tokenizer: PreTrainedTokenizerFast,\n",
" num_steps: int = 10\n",
") -> List[qn.TokenSalience]:\n",
" \"\"\"\n",
" Computes token attribution score using the Integrated Gradients method for batch.\n",
"\n",
" Parameters\n",
" ----------\n",
" model:\n",
" Huggingface model, which is subject to explanation.\n",
" inputs:\n",
" List of plain text inputs.\n",
" targets:\n",
" 1D array of predicted labels.\n",
" tokenizer:\n",
" Huggingface tokenizer used to convert input_ids back to plain text tokens.\n",
"\n",
" num_steps: int.\n",
" Number of interpolations steps, default=10.\n",
"\n",
" Returns\n",
" -------\n",
" a_batch: List of quantus.nlp.TokenSalience.\n",
" List of named tuples (tokens, salience), with tokens and their respective attribution scores.\n",
" \"\"\"\n",
" tokens = tokenizer(inputs, return_tensors='tf', padding='longest')\n",
" batch_size = len(targets)\n",
" return [\n",
" explain_int_grad(model, tokens['input_ids'][i], tokens['attention_mask'][i], targets[i], tokenizer, num_steps)\n",
" for i in range(batch_size)\n",
" ]\n",
"\n",
"\n",
"\n",
"# Create functions which match the signature required by Quantus.\n",
"explain_gradient_norm_func = partial(explain_gradient_norm_batch, tokenizer=tokenizer)\n",
"explain_int_grad_func = partial(explain_int_grad_batch, tokenizer=tokenizer)"
],
"metadata": {
"collapsed": false
}
},
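{
"cell_type": "markdown",
"source": [
"To make the two attribution methods easier to follow, the cell below sketches them on a toy function with a known gradient, using NumPy only. Everything in it is an assumption made for illustration: `toy_logit` stands in for a model logit, and the names `gradient_norm` and `integrated_gradients` are hypothetical. It shows the textbook form of Integrated Gradients, which scales the path-averaged gradient by `(input - baseline)`; `explain_int_grad` instead applies `tfp.math.trapz` to the raw gradients along the path and then over the embedding axis. For this toy function the textbook attributions sum to `toy_logit(embeddings) - toy_logit(baseline)`, which makes the result easy to check."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def toy_logit(embeddings):\n",
"    \"\"\"Hypothetical stand-in for a model logit: half the sum of squares.\"\"\"\n",
"    return 0.5 * np.sum(embeddings ** 2)\n",
"\n",
"\n",
"def toy_gradient(embeddings):\n",
"    \"\"\"Analytic gradient of toy_logit with respect to the embeddings.\"\"\"\n",
"    return embeddings\n",
"\n",
"\n",
"def gradient_norm(embeddings):\n",
"    \"\"\"Per-token L2 norm of the gradient, shape [num_tokens].\"\"\"\n",
"    return np.linalg.norm(toy_gradient(embeddings), axis=-1)\n",
"\n",
"\n",
"def integrated_gradients(embeddings, num_steps=10):\n",
"    \"\"\"Textbook Integrated Gradients with an all-zeros baseline, shape [num_tokens].\"\"\"\n",
"    baseline = np.zeros_like(embeddings)\n",
"    # num_steps + 1 points on the straight line from baseline to input.\n",
"    alphas = np.linspace(0.0, 1.0, num_steps + 1)[:, None, None]\n",
"    path = baseline + alphas * (embeddings - baseline)\n",
"    grads = np.stack([toy_gradient(p) for p in path])\n",
"    # Trapezoidal rule along the path with step size 1 / num_steps.\n",
"    avg_grads = np.sum((grads[1:] + grads[:-1]) / 2.0, axis=0) / num_steps\n",
"    # Scale by (input - baseline) and sum over the embedding axis.\n",
"    return np.sum((embeddings - baseline) * avg_grads, axis=-1)\n",
"\n",
"\n",
"# Three hypothetical tokens with 2-dimensional embeddings.\n",
"embeddings = np.array([[1.0, 2.0], [0.0, 0.5], [3.0, -1.0]])\n",
"gn = gradient_norm(embeddings)\n",
"ig = integrated_gradients(embeddings)\n",
"print(gn)\n",
"print(ig, ig.sum(), toy_logit(embeddings))"
],
"metadata": {
"collapsed": false
}
},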
{
"cell_type": "markdown",
"source": [
"### 1.5 Visualise the explanations."
],
"metadata": {
"id": "Bo1yUcCh_zBD"
}
},
{
"cell_type": "code",
"source": [
"# Visualise GradNorm.\n",
"a_batch_grad_norm = explain_gradient_norm_func(model, x_batch[2:5], y_batch[2:5])\n",
"plot_textual_heatmap(a_batch_grad_norm)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 187
},
"id": "jNogPZU8ShAr",
"outputId": "25608dfc-5f87-4571-b8a8-909ab6de5a3f"
},
"execution_count": 11,
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/vv/f22t8y7d1l96ynv9mzgy0j5w0000gn/T/ipykernel_28733/2299850630.py:33: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.\n",
" ax = fig.add_axes([0, 0.05, 1 , 0.9], fc=[0, 0, 0, 0])\n"
]
},
{
"data": {
"text/plain": ""
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": "
",
"image/png": "\n"
},
"metadata": {},
"output_type": "display_data"
}
]
},
{
"cell_type": "code",
"execution_count": 12,
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/vv/f22t8y7d1l96ynv9mzgy0j5w0000gn/T/ipykernel_28733/2299850630.py:33: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.\n",
" ax = fig.add_axes([0, 0.05, 1 , 0.9], fc=[0, 0, 0, 0])\n"
]
},
{
"data": {
"text/plain": ""
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": "