{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Looking inside the black box\n", "## Interpretability of machine learning models" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Rodrigo Parra\n", "### @rparrapy\n", "### rodrigo@codium.com.py" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Interpretability?\n", "\n", "> Interpretability is the degree to which a human can understand the cause of a decision.\n", "\n", "Miller, Explanation in Artificial Intelligence: Insights from the Social Sciences (2017)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why?\n", "\n", "![curiosity](images/xkcd.png)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Curiosity\n", "\n", "![curiosity](images/curiosity.jpg)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Security\n", "![security](images/security.jpg)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Bias detection\n", "![bias](images/bias.jpg)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Social acceptance\n", "\n", "![social](images/social.jpg)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Auditing\n", "\n", "![auditing](images/auditoria.jpeg)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## How?\n", "![taxonomy](images/taxonomy.jpg)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Before\n", "#### Exploratory data analysis\n", 
"\n", "![eda](images/eda-2.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Durante\n", "#### Regresión Lineal\n", "\n", "![linear](images/linear.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![linear-formula](images/linear-formula.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Después\n", "#### Modelos locales subrogados\n", "\n", "1. Entrenamos un modelo de caja negra con $x$ e $y$ : $f(x)=\\hat{y}$ \n", "1. Entrenamos un modelo de caja negra con $x$ e $\\hat{y}$ : $g(x)=\\bar{y}$\n", "\n", "![surrogate](images/surrogate.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Local interpretable model-agnostic explanations (LIME)\n", "\n", "![lime](images/lime.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# ~~Mirando dentro de~~ Explicando la caja negra\n", "## Interpretabilidad de modelos de machine learning" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION\n", "\n", "Domingos, A Few Useful Things to Know about Machine Learning (2012)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Representación\n", "- Modelo de caja negra: no nos importa\n", "- Modelo interpretable: regresión lineal, árboles de decisión, etc." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Evaluation\n", "![lime](images/lime-optimal.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Evaluation (2)\n", "![lime](images/lime-cost-4.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Optimization\n", "\n", "- Black-box model: e.g. stochastic gradient descent\n", "- Interpretable model: least squares, assuming we use linear regression" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![lime](images/math-meme.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The LIME algorithm\n", "\n", "![lime](images/lime-algoritmo.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### A picture is worth a thousand words\n", "\n", "![lime](images/lime-figura.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Or two\n", "\n", "![lime](images/lime-dog.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![lime](images/show-code.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## An example problem\n", "https://www.youtube.com/watch?v=ACmydtFDTGs" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img\n", "\n", "train_data_dir = './seefood/train'\n", "test_data_dir = './seefood/test'\n", "img_height = 150\n", "img_width = 150\n", 
"batch_size = 32\n", "nb_train_samples = 498\n", "epochs = 10" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "train_datagen = ImageDataGenerator(\n", " rotation_range=40,\n", " width_shift_range=0.2,\n", " height_shift_range=0.2,\n", " rescale=1./255,\n", " shear_range=0.2,\n", " zoom_range=0.2,\n", " horizontal_flip=True,\n", " fill_mode='nearest')\n", "\n", "test_datagen = ImageDataGenerator(rescale=1./255)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 498 images belonging to 2 classes.\n" ] } ], "source": [ "# the .flow() command below generates batches of randomly transformed images\n", "# and saves the results to the `preview/` directory\n", "for i, batch in enumerate(train_datagen.flow_from_directory(train_data_dir, batch_size=1,\n", " save_to_dir='preview', save_prefix='seefood', save_format='jpeg')):\n", " if i > 5:\n", " break # otherwise the generator would loop indefinitely" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "