{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "bukochgPFg7s" }, "source": [ "# Getting Started with tabular data!\n", "[Launch this notebook on Binder](https://mybinder.org/v2/gh/understandable-machine-intelligence-lab/Quantus/main?labpath=tutorials%2FTutorial_Getting_Started_with_Tabular_Data.ipynb)\n", "\n", "\n", "This notebook shows how to get started with Quantus using tabular data. For this purpose, we use the classic Titanic tabular dataset (Frank E. Harrell Jr., Thomas Cason):\n", "\n", "https://www.openml.org/d/40945\n", "\n", "The model in this notebook is taken from \"Getting started with Captum - Titanic Data Analysis\" provided by Captum:\n", "\n", "https://captum.ai/tutorials/Titanic_Basic_Interpret" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from IPython.display import clear_output" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "4Y7_mNf9Bic0" }, "outputs": [], "source": [ "!pip install quantus torch captum tensorflow-datasets pandas\n", "\n", "clear_output()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "RV7X-Ss9-16F" }, "outputs": [], "source": [ "import pathlib\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "\n", "import quantus\n", "from captum.attr import IntegratedGradients\n", "\n", "import torch\n", "import torch.nn as nn\n", "\n", "torch.manual_seed(27)\n", "\n", "clear_output()\n", "\n", "np.random.seed(27)" ] }, { "cell_type": "markdown", "metadata": { "id": "mGhP4bTuoWYF" }, "source": [ "## 1) Preliminaries" ] }, { "cell_type": "markdown", "metadata": { "id": "XqKzag4VFjHT" }, "source": [ "### 1.1 Load datasets\n", "\n", "We load the dataset from a local CSV file (`assets/titanic3.csv`) using pandas, keep the columns used by the model, and fill missing `age` and `fare` values with the column mean. 
Alternatively, it can be downloaded directly from the OpenML website: https://www.openml.org/d/40945" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "TmsZxFhuc0mm" }, "outputs": [], "source": [ "# Load datasets\n", "df = pd.read_csv(\"assets/titanic3.csv\")\n", "df = df[[\"age\", \"embarked\", \"fare\", \"parch\", \"pclass\", \"sex\", \"sibsp\", \"survived\"]]\n", "df[\"age\"] = df[\"age\"].fillna(df[\"age\"].mean())\n", "df[\"fare\"] = df[\"fare\"].fillna(df[\"fare\"].mean())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": " age fare parch pclass sibsp \\\ncount 1309.000000 1309.000000 1309.000000 1309.000000 1309.000000 \nmean 29.881138 33.295479 0.385027 2.294882 0.498854 \nstd 12.883193 51.738879 0.865560 0.837836 1.041658 \nmin 0.170000 0.000000 0.000000 1.000000 0.000000 \n25% 22.000000 7.895800 0.000000 2.000000 0.000000 \n50% 29.881138 14.454200 0.000000 3.000000 0.000000 \n75% 35.000000 31.275000 0.000000 3.000000 1.000000 \nmax 80.000000 512.329200 9.000000 3.000000 8.000000 \n\n survived \ncount 1309.000000 \nmean 0.381971 \nstd 0.486055 \nmin 0.000000 \n25% 0.000000 \n50% 0.000000 \n75% 1.000000 \nmax 1.000000 ", "text/html": "
|  | age | fare | parch | pclass | sibsp | survived |
|---|---|---|---|---|---|---|
| count | 1309.000000 | 1309.000000 | 1309.000000 | 1309.000000 | 1309.000000 | 1309.000000 |
| mean | 29.881138 | 33.295479 | 0.385027 | 2.294882 | 0.498854 | 0.381971 |
| std | 12.883193 | 51.738879 | 0.865560 | 0.837836 | 1.041658 | 0.486055 |
| min | 0.170000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
| 25% | 22.000000 | 7.895800 | 0.000000 | 2.000000 | 0.000000 | 0.000000 |
| 50% | 29.881138 | 14.454200 | 0.000000 | 3.000000 | 0.000000 | 0.000000 |
| 75% | 35.000000 | 31.275000 | 0.000000 | 3.000000 | 1.000000 | 1.000000 |
| max | 80.000000 | 512.329200 | 9.000000 | 3.000000 | 8.000000 | 1.000000 |