{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Animal Detection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Overview\n", "4. Dependencies\n", "5. Data\n", "6. Model\n", "7. Model Evaluation\n", "9. Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Integrated Gradients is a technique used in image analysis of any kind (in our case, animals) to understand \n", "which pixels contribute most to a model's prediction. It calculates pixel-wise attributions, \n", "revealing the importance of each part of an image in the classification decision. This \n", "helps practitioners, researchers and others alike to:\n", "\n", "1. **Identify Key Features**: By highlighting specific regions, Integrated Gradients \n", "can pinpoint the visual characteristics that influence a model's classification, such \n", "as unique markings or features on animals.\n", "\n", "2. **Explain Model Decisions**: It provides more interpretable explanations for why an AI \n", "system classified an image in a particular way, aiding in understanding the model's \n", "decision-making process.\n", "\n", "3. **Detect Biases**: Integrated Gradients can help reveal if a model relies on \n", "biased or irrelevant features, which is crucial for mitigating biases and ensuring \n", "fairness in the task you are seeking to accomplish.\n", "\n", "4. **Improve Models**: Researchers can use these insights to refine models, enhance \n", "classification accuracy, and contribute to building better models.\n", "\n", "5. **Educate and Raise Awareness**: Transparent explanations generated by Integrated \n", "Gradients can be used to educate the public and raise awareness about animal identification \n", "and conservation challenges, promoting broader engagement in conservation efforts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Dependencies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the packages we will be using in this notebook.\n", "\n", "- `matplotlib`\n", "- `numpy`\n", "- `alibi`\n", "- `datasets`\n", "- `tensorflow==2.8`\n", "\n", "Please note, due to some mismatch between the latest tensorflow image and other tools, \n", "you will need to pin the version of tensorflow." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install datasets matplotlib 'alibi[tensorflow]' numpy rich tensorflow==2.8" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from PIL import Image\n", "import tensorflow as tf\n", "import matplotlib.pyplot as plt\n", "from alibi.explainers import IntegratedGradients\n", "from tensorflow.keras.applications.resnet_v2 import ResNet50V2\n", "from alibi.datasets import load_cats\n", "from alibi.utils import visualize_image_attr\n", "print('TF version: ', tf.__version__)\n", "print('Eager execution enabled: ', tf.executing_eagerly())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data here can be your own, personally curated one. We will first load 4 samples of cats, \n", "then move on to the beautiful luna, and finish with different examples." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image_shape = (224, 224, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "load_cats??" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data, labels = load_cats(target_size=image_shape[:2], return_X_y=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "labels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Images shape: {data.shape}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = (data / 255).astype('float32')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "i = 1\n", "plt.imshow(data[i]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ResNet50 is a convolutional neural network architecture that is 50 layers deep and \n", "is commonly used for image classification tasks. Here are two ways to think of ResNets:\n", "\n", "For practitioners:\n", "\n", "ResNet50 is a residual neural network first introduced in 2015. It consists of 5 \n", "stages stacked together, with each stage having a convolution layer followed by \n", "identity mappings that skip over a few convolution layers. This \"skip connection\" \n", "structure allows information to shortcut across layers, avoiding the vanishing \n", "gradient problem when training very deep networks. After the convolutions, there \n", "is an average pooling layer and fully connected layer for the output. The 50 in \n", "ResNet50 refers to it having 50 weight layers. ResNet50 achieved state-of-the-art \n", "accuracy on ImageNet classification while being easier to optimize than previous \n", "deep models. It is widely used as a powerful pretrained feature extractor for \n", "computer vision tasks.\n", "\n", "For non-practitioners: \n", "\n", "ResNet50 is like a very deep maze (50 layers) that images can go through to be \n", "classified into categories like dogs, cats, cars etc. Going through such a deep \n", "maze makes it hard for information to flow from the beginning to the end. To solve \n", "this, ResNet50 adds shortcut tunnels between some of the layers. So some information \n", "can skip ahead instead of getting lost. This allows ResNet50 to be trained very \n", "accurately on huge image datasets like ImageNet. The whole network acts like a smart \n", "feature extractor that can pick out patterns useful for identifying objects. This \n", "knowledge can then be transferred to classify new images by connecting ResNet50 to \n", "a simpler network. The shortcut design enables ResNet50 to successfully train and \n", "extract powerful features from images despite being super deep." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = ResNet50V2(weights='imagenet')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Model Evaluation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Integrated Gradients is a method to explain individual predictions for deep neural networks by attributing importance to input features.\n", "\n", "Imagine a neural network that classifies images of animals. We want to explain why it predicted \"bird\" for a particular photo. \n", "\n", "1. 
Take the input image and a baseline image (e.g. a solid gray image). \n", "2. Interpolate between the baseline and input image in small steps, so we get images \n", "that slowly go from gray to the original.\n", "3. At each step, pass the interpolated image into the network to get a prediction. \n", "4. Calculate the gradients of the prediction with respect to the input pixels at each \n", "step. The gradients indicate how sensitive the prediction is to changes in each pixel.\n", "5. Integrate the gradients across all the steps. This gives importance scores for each pixel.\n", "\n", "Pixels with high integrated gradients contributed significantly to pushing the network from an uninformative baseline to predicting \"bird\". These pixels are most important for the decision. (A small, hand-rolled sketch of these five steps appears just before the call to `ig.explain` below.)\n", "\n", "An analogy is explaining why a cake tastes sweet. We take small steps adding ingredients to a baseline of an empty bowl:\n", "\n", "1) Interpolate between the empty bowl and the final cake batter \n", "2) Taste each step, measure the change in sweetness\n", "3) Integrate to get the importance of each ingredient to sweetness\n", "\n", "This reveals sugar as highly important, while flour is less so.\n", "\n", "In the first example, the baselines (i.e. the starting points of the path integral) are black \n", "images (all pixel values are set to zero). This means that black areas of the image will always \n", "have zero attributions. In the second example we consider random uniform noise baselines. The \n", "path integral is defined as a straight line from the baseline to the input image. The path is \n", "approximated by a number of discrete steps chosen according to the Gauss-Legendre method (20 steps \n", "in the code below)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_steps = 20\n", "method = \"gausslegendre\"\n", "internal_batch_size = 20\n", "\n", "ig = IntegratedGradients(model, n_steps=n_steps, method=method, internal_batch_size=internal_batch_size)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "instance = np.expand_dims(data[1], axis=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictions = model(instance).numpy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictions.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictions = predictions.argmax(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ig.explain??"
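] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For intuition only, the next cell is a minimal hand-rolled sketch of the five steps described above. It uses a \n", "black-image baseline and a simple average of gradients over 21 evenly spaced interpolation steps, rather than the \n", "Gauss-Legendre quadrature that `IntegratedGradients` uses internally, so the numbers will not match the explainer \n", "exactly. The names `baseline`, `alphas`, `avg_grads` and `manual_attrs` are introduced here purely for illustration; \n", "the `ig.explain` call in the following cell is what we actually use for the rest of the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hand-rolled sketch of Integrated Gradients (for intuition only).\n", "baseline = np.zeros_like(instance)                 # step 1: black-image baseline\n", "alphas = np.linspace(0.0, 1.0, num=21)             # step 2: interpolation coefficients\n", "\n", "grads = []\n", "for alpha in alphas:\n", "    interpolated = baseline + alpha * (instance - baseline)   # image on the straight-line path\n", "    x = tf.convert_to_tensor(interpolated, dtype=tf.float32)\n", "    with tf.GradientTape() as tape:                # steps 3-4: forward pass and gradients\n", "        tape.watch(x)\n", "        preds = model(x)\n", "        target_score = preds[:, predictions[0]]    # score of the predicted class\n", "    grads.append(tape.gradient(target_score, x).numpy())\n", "\n", "avg_grads = np.mean(grads, axis=0)                 # step 5: average the gradients along the path\n", "manual_attrs = (instance - baseline) * avg_grads   # scale by (input - baseline)\n", "manual_attrs.shape"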
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explanation = ig.explain(\n", " instance, baselines=None, target=predictions\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Metadata from the explanation object\n", "explanation.meta" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Data fields from the explanation object\n", "explanation.data.keys()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get attributions values from the explanation object\n", "attrs = explanation.attributions[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def compare_image(image, attrs):\n", " fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))\n", " visualize_image_attr(\n", " attr=None, original_image=image, method='original_image',\n", " title='Original Image', plt_fig_axis=(fig, ax[0]), use_pyplot=False\n", " );\n", "\n", " visualize_image_attr(\n", " attr=attrs.squeeze(), original_image=image, method='blended_heat_map',\n", " sign='all', show_colorbar=True, title='Overlaid Attributions',\n", " plt_fig_axis=(fig, ax[1]), use_pyplot=True\n", " );" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "compare_image(data[1], attrs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random baselines\n", "Here we show the attributions obtained choosing random uniform noise as a baseline. You might notice \n", "that the attributions can be considerably different from the previous example, where the black image \n", "is taken as a baseline. An extensive discussion about the impact of the baselines on integrated \n", "gradients attributions can be found in P. Sturmfels at al., “Visualizing the Impact of Feature \n", "Attribution Baselines”." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "baselines = np.random.random_sample(instance.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explanation = ig.explain(\n", " instance, baselines=baselines, target=predictions\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "attrs = explanation.attributions[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sample image from the test dataset and its attributions. The attributions are shown by \n", "overlaying the attributions values for each pixel to the original image. The attribution \n", "value for a pixel is obtained by summing up the attributions values for the three color \n", "channels. The attributions are scaled in a $[-1, 1]$ red pixel represents negative \n", "attributions, while green pixels represents positive attributions. The original image is \n", "shown in gray scale for clarity." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "compare_image(data[1], attrs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Our own example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = Image.open('data/images/luna_resized.png')\n", "img" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img_array = np.asarray(img)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = (img_array / 255).astype('float32')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(img[None]).shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "instance = np.expand_dims(img, axis=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictions = model(instance).numpy().argmax(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "explanation = ig.explain(\n", " instance, baselines=None, target=predictions\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get attributions values from the explanation object\n", "attrs = explanation.attributions[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "compare_image(img, attrs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find pictures of things that you like and see try to evaluate the following.\n", "- which pixel are influencing the prediction the most?\n", "- what would happen if I change any of these pixes by a little bit or a lot? Will the \n", "model still predict the correct class?" ] } ], "metadata": { "kernelspec": { "display_name": "xai", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }