## Bias scan using Multi-Dimensional Subset Scan (MDSS)

"Identifying Significant Predictive Bias in Classifiers" https://arxiv.org/abs/1611.08292

The goal of bias scan is to identify the subgroup(s) with significantly more predictive bias than would be expected from an unbiased classifier. A dataset with $M$ features, where feature $m$ has $|X_{m}|$ discretized values, contains $\prod_{m=1}^{M}\left(2^{|X_{m}|}-1\right)$ unique subgroups, where a subgroup is any $M$-dimensional Cartesian product of non-empty subsets of feature values, one subset per feature. Bias scan mitigates this computational hurdle by approximately identifying the most statistically biased subgroup in linear time (rather than exponential).
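
To get a feel for the size of this search space, the count can be evaluated directly. The sketch below assumes five features with 2, 2, 3, 3 and 2 discretized values (the cardinalities of the COMPAS features used later in this notebook); `num_subgroups` is an illustrative helper, not part of any library.

```python
from math import prod

def num_subgroups(cardinalities):
    """Number of subgroups: product over features of (2^|X_m| - 1)."""
    return prod(2 ** k - 1 for k in cardinalities)

# Assumed cardinalities: sex, race, age_cat, priors_count, c_charge_degree
cardinalities = [2, 2, 3, 3, 2]
total = num_subgroups(cardinalities)
print(total)
```

Even these five small features yield 3 x 3 x 7 x 7 x 3 = 1323 candidate subgroups, and every additional feature multiplies the count.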


We define the statistical measure of predictive bias, $score_{bias}(S)$, as a likelihood ratio score that is a function of a given subgroup $S$. The null hypothesis is that the predicted odds are correct for all subgroups in $\mathcal{D}$:

$H_{0}:odds(y_{i})=\frac{\hat{p}_{i}}{1-\hat{p}_{i}}\ \forall i\in\mathcal{D}$.

The alternative hypothesis assumes some constant multiplicative bias in the odds for some given subgroup $S$:


$H_{1}:\ odds(y_{i})=q\frac{\hat{p}_{i}}{1-\hat{p}_{i}},\ \text{where}\ q>1\ \forall i\in S\ \mbox{and}\ q=1\ \forall i\notin S.$

In the classification setting, each observation's likelihood is Bernoulli distributed and assumed independent. This results in the following scoring function for a subgroup $S$:

\begin{align*}
score_{bias}(S)= & \max_{q}\log\prod_{i\in S}\frac{Bernoulli(\frac{q\hat{p}_{i}}{1-\hat{p}_{i}+q\hat{p}_{i}})}{Bernoulli(\hat{p}_{i})}\\
= & \max_{q}\log(q)\sum_{i\in S}y_{i}-\sum_{i\in S}\log(1-\hat{p}_{i}+q\hat{p}_{i}).
\end{align*}
Our bias scan is thus represented as: $S^{*}=FSS(\mathcal{D},\mathcal{E},F_{score})=MDSS(\mathcal{D},\hat{p},score_{bias})$,

where $S^{*}$ is the detected most anomalous subgroup, $FSS$ is one of several subset scan algorithms for different problem settings, $\mathcal{D}$ is a dataset with outcomes $Y$ and discretized features $\mathcal{X}$, $\mathcal{E}$ is a set of expectations or 'normal' values for $Y$, and $F_{score}$ is an expectation-based scoring statistic that measures the anomalousness of subgroup observations relative to their expectations.
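
The scoring function $score_{bias}(S)$ defined above can be sketched numerically. The version below is a minimal illustration, not the library's implementation: it assumes a simple grid search over $q\in[1,100]$ (the `q_grid` parameter is hypothetical) in place of a proper one-dimensional optimizer.

```python
import numpy as np

def score_bias(y, p_hat, q_grid=None):
    """Approximate max over q >= 1 of
    log(q) * sum(y) - sum(log(1 - p_hat + q * p_hat)) by grid search."""
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    if q_grid is None:
        # q = 1 corresponds to the null hypothesis and yields a score of 0
        q_grid = np.logspace(0, 2, 2001)
    # One row per candidate q, one column per observation
    log_terms = np.log(1.0 - p_hat[None, :] + q_grid[:, None] * p_hat[None, :])
    scores = np.log(q_grid) * y.sum() - log_terms.sum(axis=1)
    best = int(np.argmax(scores))
    return float(scores[best]), float(q_grid[best])

# Toy subgroup: the classifier predicts 0.5 for everyone, but 8 of 10 outcomes are 1
p_hat = np.full(10, 0.5)
y_biased = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
biased_score, biased_q = score_bias(y_biased, p_hat)

# Toy subgroup whose outcomes match the predictions on average
y_fair = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
fair_score, fair_q = score_bias(y_fair, p_hat)

print(biased_score, biased_q)
print(fair_score, fair_q)
```

In the biased toy subgroup the maximizing $q$ lands near 4 (observed odds of 4 against predicted odds of 1) and the score is positive; in the well-calibrated one the maximum is attained at $q=1$, where the score is zero.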

Predictive bias concerns whether a subgroup's predictions are comparable to its own observed outcomes. Bias scan provides a more general method that can detect and characterize such bias, or poor classifier fit, in the larger space of all possible subgroups, without a priori specification.

In [1]:
import itertools

from aif360.metrics import BinaryLabelDatasetMetric 
from aif360.metrics.mdss_classification_metric import MDSSClassificationMetric
from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import load_preproc_data_compas

from IPython.display import Markdown, display
import numpy as np
import pandas as pd

In [2]:
from aif360.metrics import BinaryLabelDatasetMetric 

We'll demonstrate scoring a subset and finding the most anomalous subset with bias scan using the COMPAS dataset.

We can specify subgroups to be scored or scan for the most anomalous subgroup. Bias scan lets us choose whether to look for bias as `higher` than expected probabilities or `lower` than expected probabilities. Depending on the favourable label, the corresponding subgroup may be categorized as privileged or unprivileged.

In [3]:
np.random.seed(0)

dataset_orig = load_preproc_data_compas()

female_group = [{'sex': 1}]
male_group = [{'sex': 0}]

The dataset has the categorical features one-hot encoded, so we'll convert them back to categorical features: scanning one-hot encoded features may find subgroups that are not meaningful, e.g. a subgroup with 2 race values.
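
As a toy illustration of this conversion, `np.argmax` along the column axis recovers the index of the active one-hot column in each row. The small array below is made up for the example; its columns are assumed to follow the order `age_cat=Less than 25`, `age_cat=25 to 45`, `age_cat=Greater than 45`.

```python
import numpy as np

# Columns: 'age_cat=Less than 25', 'age_cat=25 to 45', 'age_cat=Greater than 45'
one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
])

# Index of the column holding the 1 in each row, as a single categorical column
age_cat = np.argmax(one_hot, axis=1).reshape(-1, 1)
print(age_cat.ravel())
```

Each row collapses to a single integer code, so a scan over the decoded feature can only select subsets of mutually exclusive categories.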

In [4]:
dataset_orig_df = pd.DataFrame(dataset_orig.features, columns=dataset_orig.feature_names)

age_cat = np.argmax(dataset_orig_df[['age_cat=Less than 25', 'age_cat=25 to 45', 
                                     'age_cat=Greater than 45']].values, axis=1).reshape(-1, 1)
priors_count = np.argmax(dataset_orig_df[['priors_count=0', 'priors_count=1 to 3', 
                                          'priors_count=More than 3']].values, axis=1).reshape(-1, 1)
c_charge_degree = np.argmax(dataset_orig_df[['c_charge_degree=F', 'c_charge_degree=M']].values, axis=1).reshape(-1, 1)

features = np.concatenate((dataset_orig_df[['sex', 'race']].values, age_cat, priors_count, \
                           c_charge_degree, dataset_orig.labels), axis=1)
feature_names = ['sex', 'race', 'age_cat', 'priors_count', 'c_charge_degree']

In [5]:
df = pd.DataFrame(features, columns=feature_names + ['two_year_recid'])

In [6]:
df.head()

Unnamed: 0,sex,race,age_cat,priors_count,c_charge_degree,two_year_recid
0,0.0,0.0,1.0,0.0,0.0,1.0
1,0.0,0.0,0.0,2.0,0.0,1.0
2,0.0,1.0,1.0,2.0,0.0,1.0
3,1.0,1.0,1.0,0.0,1.0,0.0
4,0.0,1.0,1.0,0.0,0.0,0.0


### training
We'll create a structured dataset and then train a simple classifier to predict the probability of the outcome.

In [7]:
from aif360.datasets import StandardDataset
dataset = StandardDataset(df, label_name='two_year_recid', favorable_classes=[0],
                 protected_attribute_names=['sex', 'race'],
                 privileged_classes=[[1], [1]],
                 instance_weights_name=None)

In [8]:
dataset_orig_train, dataset_orig_test = dataset.split([0.7], shuffle=True)

In [9]:
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)


#### Training Dataset shape

(3694, 5)


#### Favorable and unfavorable labels

0.0 1.0


#### Protected attribute names

['sex', 'race']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['sex', 'race', 'age_cat', 'priors_count', 'c_charge_degree']


In [10]:
metric_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                             unprivileged_groups=male_group,
                             privileged_groups=female_group)

print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_train.mean_difference())
metric_test = BinaryLabelDatasetMetric(dataset_orig_test, 
                             unprivileged_groups=male_group,
                             privileged_groups=female_group)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_test.mean_difference())


Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.124496
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.159410


This shows that, overall, females in the dataset have a lower observed recidivism rate than males.
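
For intuition, the mean difference reported above can be reproduced by hand as the favorable-outcome rate of the unprivileged group minus that of the privileged group. The arrays below are made-up toy data, with label 0 treated as favorable and `sex == 1` as the privileged group, matching the conventions in this notebook.

```python
import numpy as np

# Toy data: label 0 is favorable; sex 1 is treated as the privileged group
labels = np.array([0, 0, 1, 0, 1, 1, 0, 1])
sex = np.array([1, 1, 1, 1, 0, 0, 0, 0])
favorable = (labels == 0)

rate_unprivileged = favorable[sex == 0].mean()  # 1 of 4 favorable
rate_privileged = favorable[sex == 1].mean()    # 3 of 4 favorable
mean_difference = rate_unprivileged - rate_privileged
print(mean_difference)
```

A negative value means the unprivileged group receives the favorable outcome less often than the privileged group, which is the pattern seen in the train and test sets above.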

If we train a classifier, the model is likely to pick up this bias in the dataset.

In [11]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(solver='lbfgs', C=1.0, penalty='l2')
clf.fit(dataset_orig_train.features, dataset_orig_train.labels.flatten())

LogisticRegression()

Note that the probability scores we use are the probabilities of the favorable label, which is 0 in this case.

In [12]:
dataset_bias_test_prob = clf.predict_proba(dataset_orig_test.features)[:,0]

In [13]:
dff = pd.DataFrame(dataset_orig_test.features, columns=dataset_orig_test.feature_names)
dff['observed'] = pd.Series(dataset_orig_test.labels.flatten(), index=dff.index)
dff['probabilities'] = pd.Series(dataset_bias_test_prob, index=dff.index)

We'll then create another structured dataset as the classified dataset by assigning the predicted probabilities to the scores attribute.

In [14]:
dataset_bias_test = dataset_orig_test.copy()
dataset_bias_test.scores = dataset_bias_test_prob
dataset_bias_test.labels = dataset_orig_test.labels

### bias scoring

First, we observe the difference between the model's predictions and the actual observations of the favorable label, which in this case is 0. We create a new `test_df` for this computation.

If the model's average prediction of the favorable label is higher than the average of the actual observations, the group is said to be privileged. In the converse case, the group is said to be unprivileged.

We will check whether the male and female groups are privileged or not using the MDSS score.

In [15]:
test_df = dataset_bias_test.convert_to_dataframe()[0]
test_df['model_not_recid'] = dataset_bias_test.scores.flatten()
test_df['observed_not_recid'] = 1 - test_df['two_year_recid']
test_df

Unnamed: 0,sex,race,age_cat,priors_count,c_charge_degree,two_year_recid,model_not_recid,observed_not_recid
2479,1.0,1.0,2.0,2.0,0.0,1.0,0.552945,0.0
3574,1.0,0.0,1.0,0.0,0.0,0.0,0.740960,1.0
513,0.0,1.0,0.0,1.0,0.0,0.0,0.374734,1.0
1725,0.0,0.0,2.0,2.0,0.0,1.0,0.444486,0.0
96,0.0,1.0,1.0,1.0,1.0,1.0,0.584904,0.0
...,...,...,...,...,...,...,...,...
4931,0.0,1.0,0.0,1.0,0.0,0.0,0.374734,1.0
3264,0.0,0.0,0.0,0.0,0.0,1.0,0.535762,0.0
1653,0.0,0.0,1.0,1.0,0.0,0.0,0.490041,1.0
2607,1.0,1.0,1.0,0.0,0.0,1.0,0.769141,0.0


In [16]:
# Females actual vs predicted rates of positive label
test_df[test_df.sex == 1][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.617559
observed_not_recid    0.657051
dtype: float64

Since the model's average prediction for the favorable label is lower than the observed average by a noticeable amount (about 4 percentage points), the female group is most likely unprivileged.

In [17]:
# Males actual vs predicted rates of positive label
test_df[test_df.sex == 0][['model_not_recid','observed_not_recid']].mean()

model_not_recid       0.512445
observed_not_recid    0.497642
dtype: float64

Since the model's average prediction for the favorable label is greater than the observed average by a small amount (about 1.5 percentage points), the male group could be privileged.
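
The rule applied in the two comparisons above can be written as a small helper. `apparent_direction` is a hypothetical name introduced here for illustration; the means are copied from the cell outputs above, and the helper ignores group size and statistical significance, which the MDSS score accounts for.

```python
def apparent_direction(model_mean, observed_mean):
    """Label a group by comparing mean predicted and mean observed favorable rates."""
    if model_mean > observed_mean:
        return 'privileged'
    if model_mean < observed_mean:
        return 'unprivileged'
    return 'neither'

# (model_not_recid, observed_not_recid) means copied from the outputs above
female = apparent_direction(0.617559, 0.657051)
male = apparent_direction(0.512445, 0.497642)
print(female, male)
```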

Now, we'll create an instance of the MDSS Classification Metric and assess the a priori defined privileged and unprivileged groups: females and males, respectively.

By defining the male group as unprivileged a priori, we are saying we expect the model's predictions to be systematically lower than the actual observations.

By defining the female group as privileged a priori, we are saying we expect the model's predictions to be systematically higher than the actual observations.

From our mini-analysis above, we know that these hypotheses are unlikely to be true.

In [18]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                         unprivileged_groups=male_group,
                         privileged_groups=female_group)

In [19]:
# We are asking the question:
# Is there evidence that the hypothesized privileged group is actually privileged?

female_privileged_score = mdss_classified.score_groups(privileged=True)
female_privileged_score

-0.0

With a score of effectively zero, the MDSS bias score indicates that there is no evidence in the data for our hypothesis that the female group is privileged.

In [20]:
# We are asking the question:
# Is there evidence that the hypothesized unprivileged group is actually unprivileged?

male_unprivileged_score = mdss_classified.score_groups(privileged=False)
male_unprivileged_score

-0.0

With a score of effectively zero, the MDSS bias score indicates that there is no evidence in the data for our hypothesis that the male group is unprivileged.

We can flip our initial hypothesis and check if the male group is privileged or the female group is unprivileged.

In [21]:
mdss_classified = MDSSClassificationMetric(dataset_orig_test, dataset_bias_test,
                         unprivileged_groups=female_group,
                         privileged_groups=male_group)

In [22]:
male_privileged_score = mdss_classified.score_groups(privileged=True)
male_privileged_score

0.6301

With a positive score, the MDSS bias score indicates that there is evidence in the data for our hypothesis that the male group is privileged.

In [23]:
female_unprivileged_score = mdss_classified.score_groups(privileged=False)
female_unprivileged_score

1.1771

With a positive score, the MDSS bias score indicates that there is evidence in the data for our hypothesis that the female group is unprivileged.

By taking into account the size of the group and the magnitude of the deviation, the MDSS bias score has been able to tell us the following about the male and female groups:
- There is no evidence that the female group is privileged.
- There is no evidence that the male group is unprivileged.
- There is evidence that the male group is privileged.
- There is evidence that the female group is unprivileged.
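
To see how group size and deviation magnitude both enter the score, consider the simplified case where every member of a group receives the same prediction $p$. Setting the derivative of the scoring function with respect to $q$ to zero then gives the closed form $q^{*}=odds(\bar{y})/odds(p)$, clipped at 1. The sketch below relies on that simplification and ignores any penalty term; `constant_p_score` is a hypothetical helper, not the library's scoring code.

```python
import math

def constant_p_score(n, observed_rate, predicted_rate):
    """Score of a group of n observations that all share the same prediction.

    With a constant prediction p, the maximizing q has the closed form
    q* = odds(observed_rate) / odds(p); q* is clipped at 1 to respect q >= 1.
    """
    odds_obs = observed_rate / (1 - observed_rate)
    odds_pred = predicted_rate / (1 - predicted_rate)
    q = max(1.0, odds_obs / odds_pred)
    return n * (observed_rate * math.log(q)
                - math.log(1 - predicted_rate + q * predicted_rate))

small = constant_p_score(100, 0.6, 0.5)        # modest gap, small group
large = constant_p_score(1000, 0.6, 0.5)       # same gap, ten times the size
bigger_gap = constant_p_score(100, 0.7, 0.5)   # larger gap, small group
no_gap = constant_p_score(100, 0.4, 0.5)       # deviation in the other direction
print(small, large, bigger_gap, no_gap)
```

Under these assumptions the score grows linearly with the group size and increases with the gap between observed and predicted rates, while a deviation in the direction opposite to the one being tested scores zero.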

### bias scan
We obtained the bias score for the a priori defined subgroups. If instead we had no prior knowledge about the predictive bias and wanted to find the subgroups with the most bias, we can apply bias scan to identify the privileged and unprivileged groups. The `privileged` argument does not refer to a group; it sets the direction in which to scan for bias.

In [24]:
privileged_subset = mdss_classified.bias_scan(penalty=0.5, privileged=True)
unprivileged_subset = mdss_classified.bias_scan(penalty=0.5, privileged=False)

Function bias_scan is deprecated; Change to new interface - aif360.detectors.mdss_detector.bias_scan by version 0.5.0.
Function bias_scan is deprecated; Change to new interface - aif360.detectors.mdss_detector.bias_scan by version 0.5.0.


In [25]:
print(privileged_subset)
print(unprivileged_subset)

({'race': [0.0], 'age_cat': [0.0], 'sex': [0.0]}, 3.1531)
({'sex': [1.0], 'race': [0.0]}, 3.3037)


In [26]:
assert privileged_subset[0]
assert unprivileged_subset[0]

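We can observe that these bias scores are higher than the scores of the a priori groups. The scan searches the exponentially many subgroups for the highest-scoring one; because the search is approximate, the returned subgroups are the highest-scoring ones it found rather than guaranteed global maxima.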
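
For intuition about what the scan searches over, the sketch below runs an exhaustive search on a tiny synthetic dataset. This is brute-force enumeration, not the MDSS algorithm (which avoids enumerating every subgroup), and it applies no penalty term. The data, the feature names `a` and `b`, and the grid over $q$ are all assumptions made for the illustration.

```python
import itertools
import math

def nonempty_subsets(values):
    """All non-empty subsets of a feature's values."""
    values = sorted(values)
    for r in range(1, len(values) + 1):
        yield from itertools.combinations(values, r)

def score(rows, q_grid):
    """Grid-search approximation of score_bias for a list of (y, p_hat) pairs."""
    total_y = sum(y for y, _ in rows)
    return max(
        math.log(q) * total_y - sum(math.log(1 - p + q * p) for _, p in rows)
        for q in q_grid
    )

# Synthetic data: every (a, b) cell has 20 rows with prediction 0.5 and
# 10 positive outcomes, except cell (a=1, b=2), which has 18 positives.
data = []
for a, b in itertools.product([0, 1], [0, 1, 2]):
    positives = 18 if (a, b) == (1, 2) else 10
    for i in range(20):
        data.append({'a': a, 'b': b, 'y': 1 if i < positives else 0, 'p_hat': 0.5})

q_grid = [10 ** (k / 100) for k in range(0, 201)]  # q from 1 to 100
features = {'a': [0, 1], 'b': [0, 1, 2]}

best_subgroup, best_score, n_scored = None, -math.inf, 0
for a_vals, b_vals in itertools.product(*(nonempty_subsets(v) for v in features.values())):
    rows = [(r['y'], r['p_hat']) for r in data if r['a'] in a_vals and r['b'] in b_vals]
    s = score(rows, q_grid)
    n_scored += 1
    if s > best_score:
        best_subgroup, best_score = {'a': a_vals, 'b': b_vals}, s

print(n_scored, best_subgroup, best_score)
```

With two features of 2 and 3 values there are only (2^2 - 1)(2^3 - 1) = 21 subgroups to score, and the search recovers the single cell where bias was injected. The same enumeration quickly becomes impractical as features are added, which is the motivation for an approximate scan.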

For the purposes of this example, the logistic regression model systematically underestimates the recidivism risk of individuals in the `Non-caucasian`, `less than 25`, `Male` subgroup, whereas individuals belonging to the `Non-caucasian`, `Female` subgroup are assigned a higher risk than is actually observed. We refer to these subgroups as the `detected privileged group` and `detected unprivileged group` respectively.

We can create another structured dataset using the new groups to compute other dataset metrics.

In [27]:
protected_attr_names = set(privileged_subset[0].keys()).union(set(unprivileged_subset[0].keys()))
dataset_orig_test.protected_attribute_names = list(protected_attr_names)
dataset_bias_test.protected_attribute_names = list(protected_attr_names)

protected_attr = np.where(np.isin(dataset_orig_test.feature_names, list(protected_attr_names)))[0]

dataset_orig_test.protected_attributes = dataset_orig_test.features[:, protected_attr]
dataset_bias_test.protected_attributes = dataset_bias_test.features[:, protected_attr]

In [28]:
display(Markdown("#### Test Dataset shape"))
print(dataset_bias_test.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_bias_test.favorable_label, dataset_bias_test.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_bias_test.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_bias_test.privileged_protected_attributes, 
      dataset_bias_test.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_bias_test.feature_names)

#### Test Dataset shape

(1584, 5)


#### Favorable and unfavorable labels

0.0 1.0


#### Protected attribute names

['sex', 'race', 'age_cat']


#### Privileged and unprivileged protected attribute values

[array([1.]), array([1.])] [array([0.]), array([0.])]


#### Dataset feature names

['sex', 'race', 'age_cat', 'priors_count', 'c_charge_degree']


In [29]:
# converts from dictionary of lists to list of dictionaries
a = list(privileged_subset[0].values())
subset_values = list(itertools.product(*a))

detected_privileged_groups = []
for vals in subset_values:
    detected_privileged_groups.append((dict(zip(privileged_subset[0].keys(), vals))))
    
a = list(unprivileged_subset[0].values())
subset_values = list(itertools.product(*a))

detected_unprivileged_groups = []
for vals in subset_values:
    detected_unprivileged_groups.append((dict(zip(unprivileged_subset[0].keys(), vals))))

In [30]:
metric_bias_test = BinaryLabelDatasetMetric(dataset_bias_test, 
                                             unprivileged_groups=detected_unprivileged_groups,
                                             privileged_groups=detected_privileged_groups)

print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" 
      % metric_bias_test.mean_difference())

Test set: Difference in mean outcomes between unprivileged and privileged groups = 0.345722


It appears the detected privileged group has a higher observed rate of recidivism than the detected unprivileged group.

As noted in the paper, predictive bias is different from predictive fairness, so there is no emphasis on the subgroups having comparable predictions between them.
We can investigate the difference between what the model predicts and what we actually observed, as well as the multiplicative difference in the odds of the subgroups.

In [31]:
to_choose = dff[privileged_subset[0].keys()].isin(privileged_subset[0]).all(axis=1)
temp_df = dff.loc[to_choose]

In [32]:
"Our detected privileged group has a size of {}, we observe {} as the average risk of recidivism, but our model predicts {}"\
.format(len(temp_df), temp_df['observed'].mean(), 1 - temp_df['probabilities'].mean())

'Our detected privileged group has a size of 192, we observe 0.6770833333333334 as the average risk of recidivism, but our model predicts 0.5730004938240802'

In [33]:
group_obs = temp_df['observed'].mean()
group_prob = temp_df['probabilities'].mean()

odds_mul = (group_obs / (1 - group_obs)) / (group_prob /(1 - group_prob))
"This is a multiplicative increase in the odds by {}"\
.format(odds_mul)

'This is a multiplicative increase in the odds by 2.81370969044125'

In [34]:
assert odds_mul > 1

In [35]:
to_choose = dff[unprivileged_subset[0].keys()].isin(unprivileged_subset[0]).all(axis=1)
temp_df = dff.loc[to_choose]

In [36]:
"Our detected unprivileged group has a size of {}, we observe {} as the average risk of recidivism, but our model predicts {}"\
.format(len(temp_df), temp_df['observed'].mean(), 1 - temp_df['probabilities'].mean())

'Our detected unprivileged group has a size of 169, we observe 0.33136094674556216 as the average risk of recidivism, but our model predicts 0.43652313575727764'

In [37]:
group_obs = temp_df['observed'].mean()
group_prob = temp_df['probabilities'].mean()

odds_mul = (group_obs / (1 - group_obs)) / (group_prob /(1 - group_prob))
"This is a multiplicative decrease in the odds by {}"\
.format(odds_mul)

'This is a multiplicative decrease in the odds by 0.38392002104569445'

In [38]:
assert odds_mul < 1
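
In the cells above, `observed` holds the recidivism label (1 means recidivated), whereas `probabilities` holds the predicted probability of the favorable label (no recidivism), so `odds_mul` divides recidivism odds by non-recidivism odds. A sketch of the same comparison with both quantities on the recidivism scale is shown below, using the observed and predicted risks from the printed summaries above; `odds_multiplier` is a hypothetical helper introduced for illustration.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def odds_multiplier(observed_risk, predicted_risk):
    """Ratio of observed to predicted odds, both for the same outcome (recidivism)."""
    return odds(observed_risk) / odds(predicted_risk)

# (observed risk, predicted risk) copied from the printed summaries above
privileged_mul = odds_multiplier(0.6770833333333334, 0.5730004938240802)
unprivileged_mul = odds_multiplier(0.33136094674556216, 0.43652313575727764)
print(privileged_mul, unprivileged_mul)
```

Under this convention the direction of each result is unchanged (above 1 for the detected privileged group, below 1 for the detected unprivileged group), though the magnitudes differ from the values printed above.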

In summary, this notebook demonstrates the use of bias scan to identify subgroups with significant predictive bias, as quantified by a likelihood ratio score, using subset scanning. This allows consideration of not just subgroups of a priori interest or small dimensions, but the space of all possible subgroups of features.
It also presents an opportunity for a kind of bias mitigation technique that uses the multiplicative odds in the over- or under-estimated subgroups to adjust for predictive fairness.
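
One possible reading of that mitigation idea, offered only as a sketch and not as a method implemented in this notebook, is to multiply the predicted odds of a detected subgroup by an estimated $q$, which corresponds to the transformed probability $\frac{q\hat{p}_{i}}{1-\hat{p}_{i}+q\hat{p}_{i}}$ that appears in the scoring function. `adjust_probability` is a hypothetical helper.

```python
def adjust_probability(p_hat, q):
    """Shift a predicted probability by multiplying its odds by q."""
    return q * p_hat / (1 - p_hat + q * p_hat)

unchanged = adjust_probability(0.3, 1.0)     # q = 1 leaves the prediction as it is
doubled_odds = adjust_probability(0.5, 2.0)  # odds 1 -> 2, probability 0.5 -> 2/3
halved_odds = adjust_probability(0.5, 0.5)   # odds 1 -> 0.5, probability 0.5 -> 1/3
print(unchanged, doubled_odds, halved_odds)
```

A value of $q$ above 1 raises the predictions for a subgroup and a value below 1 lowers them; how $q$ should be estimated and validated for each subgroup is left open here.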