{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Bias scan using Multi-Dimensional Subset Scan (MDSS)\n", "\n", "\"Identifying Significant Predictive Bias in Classifiers\" https://arxiv.org/abs/1611.08292\n", "\n", "The goal of bias scan is to identify a subgroup(s) that has significantly more predictive bias than would be expected from an unbiased classifier. There are $\\prod_{m=1}^{M}\\left(2^{|X_{m}|}-1\\right)$ unique subgroups from a dataset with $M$ features, with each feature having $|X_{m}|$ discretized values, where a subgroup is any $M$-dimension\n", "Cartesian set product, between subsets of feature-values from each feature --- excluding the empty set. Bias scan mitigates this computational hurdle by approximately identifing the most statistically biased subgroup in linear time (rather than exponential).\n", "\n", "\n", "We define the statistical measure of predictive bias function, $score_{bias}(S)$ as a likelihood ratio score and a function of a given subgroup $S$. The null hypothesis is that the given prediction's odds are correct for all subgroups in\n", "\n", "$\\mathcal{D}$: $H_{0}:odds(y_{i})=\\frac{\\hat{p}_{i}}{1-\\hat{p}_{i}}\\ \\forall i\\in\\mathcal{D}$.\n", "\n", "The alternative hypothesis assumes some constant multiplicative bias in the odds for some given subgroup $S$:\n", "\n", "\n", "$H_{1}:\\ odds(y_{i})=q\\frac{\\hat{p}_{i}}{1-\\hat{p}_{i}},\\ \\text{where}\\ q>1\\ \\forall i\\in S\\ \\mbox{and}\\ q=1\\ \\forall i\\notin S.$\n", "\n", "In the classification setting, each observation's likelihood is Bernoulli distributed and assumed independent. 
This results in the following scoring function for a subgroup $S$:\n", "\n", "\\begin{align*}\n", "score_{bias}(S)= & \\max_{q}\\log\\prod_{i\\in S}\\frac{Bernoulli(\\frac{q\\hat{p}_{i}}{1-\\hat{p}_{i}+q\\hat{p}_{i}})}{Bernoulli(\\hat{p}_{i})}\\\\\n", "= & \\max_{q}\\log(q)\\sum_{i\\in S}y_{i}-\\sum_{i\\in S}\\log(1-\\hat{p}_{i}+q\\hat{p}_{i}).\n", "\\end{align*}\n", "\n", "Our bias scan is thus represented as $S^{*}=FSS(\\mathcal{D},\\mathcal{E},F_{score})=MDSS(\\mathcal{D},\\hat{p},score_{bias})$, where $S^{*}$ is the detected most anomalous subgroup, $FSS$ is one of several subset scan algorithms for different problem settings, $\\mathcal{D}$ is a dataset with outcomes $Y$ and discretized features $\\mathcal{X}$, $\\mathcal{E}$ is a set of expectations or 'normal' values for $Y$, and $F_{score}$ is an expectation-based scoring statistic that measures the amount of anomalousness between subgroup observations and their expectations.\n", "\n", "Predictive bias concerns comparable predictions for a subgroup and its observations; bias scan provides a more general method that can detect and characterize such bias, or poor classifier fit, in the larger space of all possible subgroups, without a priori specification."
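, "
As a concrete illustration of the scoring function (a minimal NumPy sketch on toy data, not the AIF360 implementation), we can evaluate the score for a fixed subgroup by maximizing over $q$ with a simple grid search:

```python
import numpy as np

def bias_score(y, p, q):
    # log-likelihood ratio score of the subgroup under multiplicative odds bias q
    return np.log(q) * y.sum() - np.log(1 - p + q * p).sum()

def score_bias(y, p, q_grid=np.linspace(1.0, 10.0, 1000)):
    # maximize the score over q; a grid search stands in here for the
    # efficient maximization performed inside MDSS
    scores = np.array([bias_score(y, p, q) for q in q_grid])
    best = scores.argmax()
    return q_grid[best], scores[best]

# toy subgroup whose observed outcomes exceed its predicted probabilities
y = np.array([1, 1, 0, 1, 1, 1])
p = np.array([0.4, 0.5, 0.3, 0.6, 0.5, 0.4])
q_star, s = score_bias(y, p)  # q_star > 1 and s > 0 flag higher-than-predicted outcomes
```
"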
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import itertools\n", "\n", "from aif360.metrics import BinaryLabelDatasetMetric\n", "from aif360.metrics.mdss_classification_metric import MDSSClassificationMetric\n", "from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_compas\n", "\n", "from IPython.display import Markdown, display\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll demonstrate scoring a subset and finding the most anomalous subset with bias scan using the COMPAS dataset.\n", "\n", "We can either specify subgroups to be scored or scan for the most anomalous subgroup. Bias scan lets us decide whether to identify bias as `higher` than expected probabilities or `lower` than expected probabilities. Depending on the favourable label, the corresponding subgroup may be categorized as privileged or unprivileged." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "np.random.seed(0)\n", "\n", "dataset_orig = load_preproc_data_compas()\n", "\n", "female_group = [{'sex': 1}]\n", "male_group = [{'sex': 0}]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset's categorical features are one-hot encoded, so we'll convert them back to single categorical features, because scanning one-hot encoded features may find subgroups that are not meaningful, e.g. a subgroup spanning two race values.
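
As a toy illustration of this conversion (a sketch on made-up data, assuming each row has exactly one hot column), `np.argmax` recovers a single category code per row:

```python
import numpy as np

# toy one-hot block for a three-valued categorical feature
one_hot = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [0, 1, 0]])

# the index of the hot column is the category code for each row
categorical = np.argmax(one_hot, axis=1)  # array([0, 1, 2, 1])
```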
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "dataset_orig_df = pd.DataFrame(dataset_orig.features, columns=dataset_orig.feature_names)\n", "\n", "age_cat = np.argmax(dataset_orig_df[['age_cat=Less than 25', 'age_cat=25 to 45', \n", " 'age_cat=Greater than 45']].values, axis=1).reshape(-1, 1)\n", "priors_count = np.argmax(dataset_orig_df[['priors_count=0', 'priors_count=1 to 3', \n", " 'priors_count=More than 3']].values, axis=1).reshape(-1, 1)\n", "c_charge_degree = np.argmax(dataset_orig_df[['c_charge_degree=F', 'c_charge_degree=M']].values, axis=1).reshape(-1, 1)\n", "\n", "features = np.concatenate((dataset_orig_df[['sex', 'race']].values, age_cat, priors_count, \\\n", " c_charge_degree, dataset_orig.labels), axis=1)\n", "feature_names = ['sex', 'race', 'age_cat', 'priors_count', 'c_charge_degree']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(features, columns=feature_names + ['two_year_recid'])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", " | sex | \n", "race | \n", "age_cat | \n", "priors_count | \n", "c_charge_degree | \n", "two_year_recid | \n", "
---|---|---|---|---|---|---|
0 | \n", "0.0 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "1.0 | \n", "
1 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "2.0 | \n", "0.0 | \n", "1.0 | \n", "
2 | \n", "0.0 | \n", "1.0 | \n", "1.0 | \n", "2.0 | \n", "0.0 | \n", "1.0 | \n", "
3 | \n", "1.0 | \n", "1.0 | \n", "1.0 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "
4 | \n", "0.0 | \n", "1.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "
\n", " | sex | \n", "race | \n", "age_cat | \n", "priors_count | \n", "c_charge_degree | \n", "two_year_recid | \n", "model_not_recid | \n", "observed_not_recid | \n", "
---|---|---|---|---|---|---|---|---|
2479 | \n", "1.0 | \n", "1.0 | \n", "2.0 | \n", "2.0 | \n", "0.0 | \n", "1.0 | \n", "0.552945 | \n", "0.0 | \n", "
3574 | \n", "1.0 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "0.740960 | \n", "1.0 | \n", "
513 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "0.374734 | \n", "1.0 | \n", "
1725 | \n", "0.0 | \n", "0.0 | \n", "2.0 | \n", "2.0 | \n", "0.0 | \n", "1.0 | \n", "0.444486 | \n", "0.0 | \n", "
96 | \n", "0.0 | \n", "1.0 | \n", "1.0 | \n", "1.0 | \n", "1.0 | \n", "1.0 | \n", "0.584904 | \n", "0.0 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
4931 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "0.374734 | \n", "1.0 | \n", "
3264 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "1.0 | \n", "0.535762 | \n", "0.0 | \n", "
1653 | \n", "0.0 | \n", "0.0 | \n", "1.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "0.490041 | \n", "1.0 | \n", "
2607 | \n", "1.0 | \n", "1.0 | \n", "1.0 | \n", "0.0 | \n", "0.0 | \n", "1.0 | \n", "0.769141 | \n", "0.0 | \n", "
2732 | \n", "0.0 | \n", "1.0 | \n", "0.0 | \n", "2.0 | \n", "1.0 | \n", "1.0 | \n", "0.251724 | \n", "0.0 | \n", "
1584 rows × 8 columns
\n", "