{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example Tool Usage - Binary Classification Problems\n", "----\n", "\n", "# About\n", "This notebook contains simple, toy examples to help you get started with FairMLHealth tool usage. This same content is mirrored in the repository's main [README](../README.md)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from fairmlhealth import report, measure, stat_utils\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.naive_bayes import BernoulliNB\n", "from sklearn.tree import DecisionTreeClassifier" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# First, we'll create a semi-randomized dataframe with specific columns for our \n", "# attributes of interest\n", "rng = np.random.RandomState(506)\n", "N = 240\n", "X = pd.DataFrame({'col1': rng.randint(1, 4, N), \n", " 'col2': rng.randint(1, 75, N),\n", " 'col3': rng.randint(0, 2, N),\n", " 'gender': [0, 1]*int(N/2), \n", " 'ethnicity': [1, 1, 0, 0]*int(N/4),\n", " 'other': [1, 0, 0, 0, 1, 0, 0, 1]*int(N/8)\n", " })\n", "\n", "# Second, we'll create a randomized target value\n", "y = pd.Series(X['col3'].values + rng.randint(0, 2, N), name='Example_Target').clip(upper=1)\n", "\n", "# Third, we'll split the data and use it to train two generic models\n", "splits = train_test_split(X, y, stratify=y, test_size=0.5, random_state=60)\n", "X_train, X_test, y_train, y_test = splits\n", "\n", "model_1 = BernoulliNB().fit(X_train, y_train)\n", "model_2 = DecisionTreeClassifier().fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3genderethnicityother
01150011
13511110
21301000
32281100
41720011
\n", "
" ], "text/plain": [ " col1 col2 col3 gender ethnicity other\n", "0 1 15 0 0 1 1\n", "1 3 51 1 1 1 0\n", "2 1 30 1 0 0 0\n", "3 2 28 1 1 0 0\n", "4 1 72 0 0 1 1" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0 0\n", "1 1\n", "2 1\n", "3 1\n", "4 1\n", "Name: Example_Target, dtype: int64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(X.head(), y.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generalized Reports\n", "fairMLHealth has tools to create generalized reports of model bias and performance.\n", "\n", "The primary reporting tool is now the **compare** function, which can be used to generate side-by-side comparisons for any number of models, and for either binary classifcation or for regression problems. Model performance metrics such as accuracy and precision (or MAE and RSquared for regression problems) are also provided to facilitate comparison. Below is an example output comparing the two example models defined above. Missing values have been added for metrics requiring prediction probabilities (which the second model does not have).\n", "\n", "A flagging protocol is applied by default to highlight any cells with values that are out of range. This can be turned off by passing ***flag_oor = False*** to report.compare().\n", "\n", "*Note that the Equal Odds Ratio has been dropped from the example below*. This is because the false positive rate is approximately zero for both the entire dataset and for the privileged class, leading to a zero in the denominator of the False Positive Rate Ratio: $\\frac{{FPR}_{unprivileged}}{{FPR}_{privileged}}$. The result is therefore undefined and cannot be compared in the Equal Odds Ratio. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "~/repos/fairMLHealth/fairmlhealth/measure.py:888: UserWarning: The following measures are undefined and have been dropped: ['Equal Odds Ratio']\n", " warn(f\"The following measures are undefined and have been dropped: {undefined}\")\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Gender Ethnicity
Metric Measure
Group FairnessAUC Difference-0.0250-0.0778
Balanced Accuracy Difference0.0988-0.3667
Balanced Accuracy Ratio1.15760.5769
Disparate Impact Ratio0.90681.8182
Equal Odds Difference-0.20361.0000
Equal Odds Ratio0.6691nan
Positive Predictive Parity Difference0.0111-0.2500
Positive Predictive Parity Ratio1.01330.7500
Statistical Parity Difference-0.07590.4500
Individual FairnessBetween-Group Gen. Entropy Error0.00000.0241
Consistency Score0.76830.7683
Model PerformanceAccuracy1.00001.0000
F1-Score1.00001.0000
FPR0.00000.0000
Mean Example_Target0.77500.7750
Precision1.00001.0000
TPR1.00001.0000
Data MetricsPrevalence of Privileged Class (%)49.000050.0000
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example with different protected attributes. \n", "# Note that the same model is passed with two different keys to clarify the column names.\n", "# Equal Odds Ratio is not displayed because it is undefined (the False Positive Rate for \n", "# the privileged group, ethnicity = 1, is 0.0)\n", "report.compare(test_data = X_test, \n", " targets = y_test, \n", " protected_attr = {'Gender': X_test['gender'], \n", " 'Ethnicity': X_test['ethnicity']}, \n", " models = model_1\n", " )" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "~/repos/fairMLHealth/fairmlhealth/measure.py:888: UserWarning: The following measures are undefined and have been dropped: ['Equal Odds Ratio']\n", " warn(f\"The following measures are undefined and have been dropped: {undefined}\")\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
Metric Measure
Group FairnessAUC Difference-0.0778
Balanced Accuracy Difference-0.3667
Balanced Accuracy Ratio0.5769
Disparate Impact Ratio1.8182
Equal Odds Difference1.0000
Positive Predictive Parity Difference-0.2500
Positive Predictive Parity Ratio0.7500
Statistical Parity Difference0.4500
Individual FairnessBetween-Group Gen. Entropy Error0.0241
Consistency Score0.7683
Model PerformanceAccuracy1.0000
F1-Score1.0000
FPR0.0000
Mean Example_Target0.7750
Precision1.0000
TPR1.0000
Data MetricsPrevalence of Privileged Class (%)50.0000
" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate a measure report\n", "report.compare(X_test, y_test, X_test['ethnicity'], model_1)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned type: \n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
Metric Measure
Group FairnessAUC Difference-0.0250
Balanced Accuracy Difference0.0988
Balanced Accuracy Ratio1.1576
Disparate Impact Ratio0.9068
Equal Odds Difference-0.2036
Equal Odds Ratio0.6691
Positive Predictive Parity Difference0.0111
Positive Predictive Parity Ratio1.0133
Statistical Parity Difference-0.0759
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.7683
Data MetricsPrevalence of Privileged Class (%)49.0000
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Display the same report without performance measures\n", "bias_report = report.compare(test_data=X_test, \n", " targets=y_test, \n", " protected_attr=X_test['gender'], \n", " models=model_1, \n", " pred_type=\"classification\", \n", " skip_performance=True)\n", "print(\"Returned type:\", type(bias_report))\n", "display(bias_report)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative Return Types\n", "\n", "By default the **compare** function returns a flagged comparison of type pandas Styler (pandas.io.formats.style.Styler). When flags are disabled, the default return type is a pandas DataFrame. Outputs can also be returned as embedded HTML -- with or without flags -- by specitying *output_type=\"html\"*. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned type: \n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# With flags disabled, the report is returned as a pandas DataFrame\n", "df = report.compare(test_data=X_test, \n", " targets=y_test, \n", " protected_attr=X_test['gender'], \n", " models=model_1, \n", " pred_type=\"classification\",\n", " flag_oor=False, \n", " output_type=\"styler\")\n", "print(\"Returned type:\", type(df))\n", "#display(df.head(2))\n", "isinstance(df, pd.io.formats.style.Styler)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned type: \n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
Metric Measure
Group FairnessAUC Difference-0.0250
Balanced Accuracy Difference0.0988
Balanced Accuracy Ratio1.1576
Disparate Impact Ratio0.9068
Equal Odds Difference-0.2036
Equal Odds Ratio0.6691
Positive Predictive Parity Difference0.0111
Positive Predictive Parity Ratio1.0133
Statistical Parity Difference-0.0759
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.7683
Model PerformanceAccuracy1.0000
F1-Score1.0000
FPR0.0000
Mean Example_Target0.7750
Precision1.0000
TPR1.0000
Data MetricsPrevalence of Privileged Class (%)49.0000
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Comparisons can also be returned as embedded HTML\n", "from IPython.core.display import HTML\n", "html_output = report.compare(test_data=X_test, \n", " targets=y_test, \n", " protected_attr=X_test['gender'], \n", " models=model_1, \n", " pred_type=\"classification\", \n", " output_type=\"html\")\n", "print(\"Returned type:\", type(html_output))\n", "HTML(html_output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing Results for Multiple Models\n", "\n", "The **compare** tool can also be used to measure two different models or two different protected attributes. Protected attributes are measured separately and cannot yet be combined together with the **compare** tool, although they can be grouped as cohorts in the stratified tables [as shown below](#cohort). \n", "\n", "Below is an example output comparing the two test models defined above. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Any Name 1 Model 2
Metric Measure
Group FairnessAUC Difference-0.02500.0623
Balanced Accuracy Difference0.09880.0709
Balanced Accuracy Ratio1.15761.1151
Disparate Impact Ratio0.90680.7820
Equal Odds Difference-0.2036-0.2624
Equal Odds Ratio0.66910.5735
Positive Predictive Parity Difference0.01110.0123
Positive Predictive Parity Ratio1.01331.0148
Statistical Parity Difference-0.0759-0.1737
Individual FairnessBetween-Group Gen. Entropy Error0.00000.0018
Consistency Score0.76830.7367
Model PerformanceAccuracy1.00001.0000
F1-Score1.00001.0000
FPR0.00000.0000
Mean Example_Target0.77500.7083
Precision1.00001.0000
TPR1.00001.0000
Data MetricsPrevalence of Privileged Class (%)49.000049.0000
" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example with multiple models\n", "report.compare(test_data = X_test, \n", " targets = y_test, \n", " protected_attr = X_test['gender'],\n", " models = {'Any Name 1':model_1, 'Model 2':model_2})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Detailed Analyses\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Significance Testing\n", "\n", "It is generally recommended to test whether any differences in model outcomes for protected attributes are the effect of a sampling error in our test. FairMLHealth comes with a bootstrapping utility and supporting functions that can be used in statistical testing. The bootstrapping utility accepts any function that returns a p-value and will return a True or False if the p-value is greater than some alpha for a threshold number of randomly sampled trials. While the selection of proper statistical tests is beyond the scope of this notebook, two examples using the bootstrap_significance tool with built-in test functions are shown below: 1) using Kruskal-Wallis, and 2) using Chi-Square." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "model_1_preds = pd.Series(model_1.predict(X_test))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Can we reject the hypothesis that y values have the same distribution, regardless of gender?\n", " False\n" ] } ], "source": [ "# Example 1 Bootstrap Test Results Applying Kruskal-Wallis to Predictions\n", "isMale = X_test.reset_index(drop=True)['gender'].eq(1)\n", "reject_h0 = stat_utils.bootstrap_significance(alpha=0.05,\n", " threshold=0.70,\n", " func=stat_utils.kruskal_pval, \n", " a=model_1_preds.loc[isMale], \n", " b=model_1_preds.loc[~isMale])\n", "print(\"Can we reject the hypothesis that y values have the same distribution, regardless of gender?\\n\",\n", " reject_h0)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Can we reject the hypothesis that prediction results are from the same distribution regardless of gender?\n", " False\n" ] } ], "source": [ "# Example 2 Bootstrap Results Applying Chi-Square to the Distribution of \n", "# Prediction Successes/Failures\n", "model_1_results = stat_utils.binary_result_labels(y_test, model_1_preds)\n", "reject_h0 = stat_utils.bootstrap_significance(alpha=0.05,\n", " threshold=0.70,\n", " func=stat_utils.chisquare_pval, \n", " group=X_test['gender'], \n", " values=model_1_results)\n", "print(\"Can we reject the hypothesis that prediction results are from the same\", \n", " \"distribution regardless of gender?\\n\", reject_h0)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P-Value of single Chi-Square test: 0.9214286123014678\n" ] } ], "source": [ "# Example of Single Chi-Square Test\n", "pval = stat_utils.chisquare_pval(group=X_test['gender'], \n", " values=model_1_results,\n", " # If n_sample set to None, tests on full dataset rather than sample\n", " n_sample=None\n", " )\n", "print(\"P-Value of single Chi-Square test:\", pval)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stratified Tables\n", "FairMLHealth also provides tools for detailed analysis of model variance by way of stratified data, 
performance, and bias tables. Beyond evaluating fairness, these tools are intended for flexible use in any generic assessment of model bias. Tables can evaluate multiple features at once. *An important update starting in Version 1.0.0 is that all of these features are now contained in the **measure.py** module (previously named reports.py).*\n", "\n", "All tables display a summary row for \"All Features, All Values\". This summary can be turned off by passing ***add_overview=False*** to measure.data()." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Tables\n", "\n", "The stratified data table can be used to evaluate data against one or multiple targets. Two methods are available for identifying which features to assess, as shown in the examples below. " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean Example_TargetMedian Example_TargetMissing ValuesStd. Dev. Example_TargetValue Prevalence
0ALL FEATURESALL VALUES120NaN0.75001.000.43481.0000
1gender0610.99980.72131.000.45210.5083
2gender1590.99980.77971.000.41800.4917
3other0770.94130.79221.000.40840.6417
4other1430.94130.67441.000.47410.3583
5col11421.58380.78571.000.41530.3500
6col12401.58380.82501.000.38480.3333
7col13381.58380.63161.000.48890.3167
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy Mean Example_Target \\\n", "0 ALL FEATURES ALL VALUES 120 NaN 0.7500 \n", "1 gender 0 61 0.9998 0.7213 \n", "2 gender 1 59 0.9998 0.7797 \n", "3 other 0 77 0.9413 0.7922 \n", "4 other 1 43 0.9413 0.6744 \n", "5 col1 1 42 1.5838 0.7857 \n", "6 col1 2 40 1.5838 0.8250 \n", "7 col1 3 38 1.5838 0.6316 \n", "\n", " Median Example_Target Missing Values Std. Dev. Example_Target \\\n", "0 1.0 0 0.4348 \n", "1 1.0 0 0.4521 \n", "2 1.0 0 0.4180 \n", "3 1.0 0 0.4084 \n", "4 1.0 0 0.4741 \n", "5 1.0 0 0.4153 \n", "6 1.0 0 0.3848 \n", "7 1.0 0 0.4889 \n", "\n", " Value Prevalence \n", "0 1.0000 \n", "1 0.5083 \n", "2 0.4917 \n", "3 0.6417 \n", "4 0.3583 \n", "5 0.3500 \n", "6 0.3333 \n", "7 0.3167 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Arguments Option 1: pass full set of data, subsetting with *features* argument\n", "measure.data(X_test, y_test, features=['gender', 'other', 'col1'])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean Example_TargetMedian Example_TargetMissing ValuesStd. Dev. Example_TargetValue Prevalence
0ALL FEATURESALL VALUES120NaN0.75001.000.43481.0000
1gender0610.99980.72131.000.45210.5083
2gender1590.99980.77971.000.41800.4917
3other0770.94130.79221.000.40840.6417
4other1430.94130.67441.000.47410.3583
5col11421.58380.78571.000.41530.3500
6col12401.58380.82501.000.38480.3333
7col13381.58380.63161.000.48890.3167
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy Mean Example_Target \\\n", "0 ALL FEATURES ALL VALUES 120 NaN 0.7500 \n", "1 gender 0 61 0.9998 0.7213 \n", "2 gender 1 59 0.9998 0.7797 \n", "3 other 0 77 0.9413 0.7922 \n", "4 other 1 43 0.9413 0.6744 \n", "5 col1 1 42 1.5838 0.7857 \n", "6 col1 2 40 1.5838 0.8250 \n", "7 col1 3 38 1.5838 0.6316 \n", "\n", " Median Example_Target Missing Values Std. Dev. Example_Target \\\n", "0 1.0 0 0.4348 \n", "1 1.0 0 0.4521 \n", "2 1.0 0 0.4180 \n", "3 1.0 0 0.4084 \n", "4 1.0 0 0.4741 \n", "5 1.0 0 0.4153 \n", "6 1.0 0 0.3848 \n", "7 1.0 0 0.4889 \n", "\n", " Value Prevalence \n", "0 1.0000 \n", "1 0.5083 \n", "2 0.4917 \n", "3 0.6417 \n", "4 0.3583 \n", "5 0.3500 \n", "6 0.3333 \n", "7 0.3167 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Arguments Option 2: pass the data subset of interest without using the *features* argument\n", "measure.data(X_test[['gender', 'other', 'col1']], y_test)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean col2Mean col3Median col2Median col3Missing ValuesStd. Dev. col2Std. Dev. col3Value Prevalence
0ALL FEATURESALL VALUES120NaN39.31670.508340.51.0020.75430.50201.0000
1gender0610.999839.77050.409839.00.0021.64820.49590.5083
2gender1590.999838.84750.610241.01.0019.96270.49190.4917
3col11421.583838.50000.595239.51.0020.64240.49680.3500
4col12401.583835.72500.525034.01.0018.28970.50570.3333
5col13381.583844.00000.394749.50.0022.87690.49540.3167
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy Mean col2 Mean col3 \\\n", "0 ALL FEATURES ALL VALUES 120 NaN 39.3167 0.5083 \n", "1 gender 0 61 0.9998 39.7705 0.4098 \n", "2 gender 1 59 0.9998 38.8475 0.6102 \n", "3 col1 1 42 1.5838 38.5000 0.5952 \n", "4 col1 2 40 1.5838 35.7250 0.5250 \n", "5 col1 3 38 1.5838 44.0000 0.3947 \n", "\n", " Median col2 Median col3 Missing Values Std. Dev. col2 Std. Dev. col3 \\\n", "0 40.5 1.0 0 20.7543 0.5020 \n", "1 39.0 0.0 0 21.6482 0.4959 \n", "2 41.0 1.0 0 19.9627 0.4919 \n", "3 39.5 1.0 0 20.6424 0.4968 \n", "4 34.0 1.0 0 18.2897 0.5057 \n", "5 49.5 0.0 0 22.8769 0.4954 \n", "\n", " Value Prevalence \n", "0 1.0000 \n", "1 0.5083 \n", "2 0.4917 \n", "3 0.3500 \n", "4 0.3333 \n", "5 0.3167 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pass multiple targets (again, using Arguments Option 2)\n", "measure.data(X = X_test[['gender', 'col1']], # used to define rows\n", " Y = X_test[['col2', 'col3']]) # used to define columns" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueMean col2Mean col3
1gender138.84750.6102
2col1138.50000.5952
\n", "
" ], "text/plain": [ " Feature Name Feature Value Mean col2 Mean col3\n", "1 gender 1 38.8475 0.6102\n", "2 col1 1 38.5000 0.5952" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Analytical tables are output as pandas DataFrames\n", "test_table = measure.data(X=X_test[['gender', 'col1']], # used to define rows\n", " Y=X_test[['col2', 'col3']], # used to define columns\n", " add_overview=False # turns off \"All Features, All Values\" row\n", " )\n", "\n", "test_table.loc[test_table['Feature Value'].eq(\"1\"), ['Feature Name', 'Feature Value', 'Mean col2', 'Mean col3']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stratified Performance Tables\n", "\n", "The stratified performance table evaluates model performance specific to each feature-value subset. These tables are compatible with both classification and regression models. For classification models with the *predict_proba()* method, additional ROC_AUC and PR_AUC values will be included if possible." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.Mean TargetMean PredictionAccuracyF1-ScoreFPRPrecisionTPR
0ALL FEATURESALL VALUES120.00.75000.77500.77500.85250.50000.83870.8667
1gender061.00.72130.73770.78690.85390.41180.84440.8636
2gender159.00.77970.81360.76270.85110.61540.83330.8696
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Mean Target Mean Prediction Accuracy \\\n", "0 ALL FEATURES ALL VALUES 120.0 0.7500 0.7750 0.7750 \n", "1 gender 0 61.0 0.7213 0.7377 0.7869 \n", "2 gender 1 59.0 0.7797 0.8136 0.7627 \n", "\n", " F1-Score FPR Precision TPR \n", "0 0.8525 0.5000 0.8387 0.8667 \n", "1 0.8539 0.4118 0.8444 0.8636 \n", "2 0.8511 0.6154 0.8333 0.8696 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Performance table example\n", "measure.performance(X_test[['gender']], y_test, model_1.predict(X_test))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.Mean TargetMean PredictionAccuracyF1-ScoreFPRPR AUCPrecisionROC AUCTPR
0ALL FEATURESALL VALUES120.00.75000.77500.77500.85250.50000.20620.83870.85830.8667
1gender061.00.72130.73770.78690.85390.41180.22610.84440.84290.8636
2gender159.00.77970.81360.76270.85110.61540.18730.83330.86790.8696
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Mean Target Mean Prediction Accuracy \\\n", "0 ALL FEATURES ALL VALUES 120.0 0.7500 0.7750 0.7750 \n", "1 gender 0 61.0 0.7213 0.7377 0.7869 \n", "2 gender 1 59.0 0.7797 0.8136 0.7627 \n", "\n", " F1-Score FPR PR AUC Precision ROC AUC TPR \n", "0 0.8525 0.5000 0.2062 0.8387 0.8583 0.8667 \n", "1 0.8539 0.4118 0.2261 0.8444 0.8429 0.8636 \n", "2 0.8511 0.6154 0.1873 0.8333 0.8679 0.8696 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Performance table example with probabilities included\n", "measure.performance(X_test[['gender']], \n", " y_true=y_test, \n", " y_pred=model_1.predict(X_test), \n", " y_prob=model_1.predict_proba(X_test)[:,1])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.Mean TargetMean PredictionAccuracyF1-ScoreFPRPrecisionTPR
0ALL FEATURESALL VALUES120.00.750.7750.7750.85250.50.83870.8667
1ethnicity060.00.751.0000.7500.85711.00.75001.0000
2ethnicity160.00.750.5500.8000.84620.01.00000.7333
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Mean Target Mean Prediction Accuracy \\\n", "0 ALL FEATURES ALL VALUES 120.0 0.75 0.775 0.775 \n", "1 ethnicity 0 60.0 0.75 1.000 0.750 \n", "2 ethnicity 1 60.0 0.75 0.550 0.800 \n", "\n", " F1-Score FPR Precision TPR \n", "0 0.8525 0.5 0.8387 0.8667 \n", "1 0.8571 1.0 0.7500 1.0000 \n", "2 0.8462 0.0 1.0000 0.7333 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Performance table example using ethnicity as the protected attribute\n", "measure.performance(X_test[['ethnicity']], y_test, model_1.predict(X_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stratified Bias Tables\n", "\n", "The stratified bias analysis feature applies fairness-related metrics for each feature-value pair. It assumes a given feature-value as the \"privileged\" group relative to all other possible values for the feature. For example, in the table output shown in the cell below, row **2** in the table below displays measures for **\"col1\"** with a value of **\"2\"**. For this row, \"2\" is considered to be the privileged group, while all other non-null values (namely \"1\" and \"3\") are considered unprivileged.\n", "\n", "To simplify the table, fairness measures have been reduced to their component parts. For example, the Equal Odds Ratio has been reduced to the True Positive Rate (TPR) Ratio and False Positive Rate (FPR) Ratio.\n", "\n", "Note that the *flag* function is compatible with both **measure.bias()** and **measure.summary()** (which is demonstrated below). However, to enable colored cells the tool returns a pandas Styler rather than a DataTable. For this reason, *flag_oor* is False by default for these features. Flagging can be turned on by passing *flag_oor=True* to either function. As an added feature, optional custom ranges can be passed to either **measure.bias()** or **measure.summary()** to facilitate regression evaluation, shown in [Example-ToolUsage_Regression](./examples_and_tutorials/Example-ToolUsage_Regression.ipynb)." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature Name Feature Value Balanced Accuracy Difference Balanced Accuracy Ratio FPR Diff FPR Ratio PPV Diff PPV Ratio Selection Diff Selection Ratio TPR Diff TPR Ratio
0gender0-0.09880.86380.20361.4945-0.01110.98680.07591.10280.00591.0069
1gender10.09881.1576-0.20360.66910.01111.0133-0.07590.9068-0.00590.9932
2col300.45691.8413-0.50000.00000.46881.88240.45761.84380.41381.7059
3col31-0.45690.54310.5000nan-0.46880.5312-0.45760.5424-0.41380.5862
" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example of bias table with flag turned on\n", "measure.bias(X_test[['gender', 'col3']], y_test, model_1.predict(X_test), flag_oor=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **measure** module also contains a summary function that works similarly to report.compare(). While it can only be applied to one model at a time, it can accept custom \"fair\" ranges, and accept cohort groups as will be [shown in the next section](#cohort)." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Value
Metric Measure
Group FairnessBalanced Accuracy Difference0.0988
Balanced Accuracy Ratio1.1576
Disparate Impact Ratio0.9068
Equal Odds Difference-0.2036
Equal Odds Ratio0.6691
Positive Predictive Parity Difference0.0111
Positive Predictive Parity Ratio1.0133
Statistical Parity Difference-0.0759
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.7350
Data MetricsPrevalence of Privileged Class (%)49.0000
" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example summary with performance skipped\n", "measure.summary(X_test[['col2']], \n", " y_test, \n", " model_1.predict(X_test),\n", " prtc_attr=X_test['gender'], \n", " pred_type=\"classification\",\n", " skip_performance=True,\n", " flag_oor=True\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis by Cohort\n", "\n", "Table-generating functions in the **measure** module can all be additionally grouped using the *cohort_labels* argument to specify additional labels for each observation. Cohorts may consist of either as a single label or a set of labels, and may be either separate from or attached to the existing data." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gender Feature Name Feature Value Balanced Accuracy Difference Balanced Accuracy Ratio FPR Diff FPR Ratio PPV Diff PPV Ratio Selection Diff Selection Ratio TPR Diff TPR Ratio
00col300.36381.5718-0.41180.00000.35001.53850.44441.80000.31581.4615
10col31-0.36380.63620.4118nan-0.35000.6500-0.44440.5556-0.31580.6842
21col300.60772.5490-0.61540.00000.66673.00000.47831.91670.60002.5000
31col31-0.60770.39230.6154nan-0.66670.3333-0.47830.5217-0.60000.4000
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Separate, Single-Level Cohorts\n", "cohort_labels = X_test['gender']\n", "measure.bias(X_test['col3'], y_test, model_1.predict(X_test), \n", " flag_oor=True, cohort_labels=cohort_labels)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean Example_TargetMedian Example_TargetMissing ValuesStd. Dev. Example_TargetValue Prevalence
genderethnicity
00ALL FEATURESALL VALUES32NaN0.78121.000.42001.0000
0col30200.95440.65001.000.48940.6250
0col31120.95441.00001.000.00000.3750
1ALL FEATURESALL VALUES29NaN0.65521.000.48371.0000
1col30160.99230.37500.000.50000.5517
1col31130.99231.00001.000.00000.4483
10ALL FEATURESALL VALUES28NaN0.71431.000.46001.0000
0col30120.98520.33330.000.49240.4286
0col31160.98521.00001.000.00000.5714
1ALL FEATURESALL VALUES31NaN0.83871.000.37391.0000
1col30110.93830.54551.000.52220.3548
1col31200.93831.00001.000.00000.6452
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy \\\n", "gender ethnicity \n", "0 0 ALL FEATURES ALL VALUES 32 NaN \n", " 0 col3 0 20 0.9544 \n", " 0 col3 1 12 0.9544 \n", " 1 ALL FEATURES ALL VALUES 29 NaN \n", " 1 col3 0 16 0.9923 \n", " 1 col3 1 13 0.9923 \n", "1 0 ALL FEATURES ALL VALUES 28 NaN \n", " 0 col3 0 12 0.9852 \n", " 0 col3 1 16 0.9852 \n", " 1 ALL FEATURES ALL VALUES 31 NaN \n", " 1 col3 0 11 0.9383 \n", " 1 col3 1 20 0.9383 \n", "\n", " Mean Example_Target Median Example_Target Missing Values \\\n", "gender ethnicity \n", "0 0 0.7812 1.0 0 \n", " 0 0.6500 1.0 0 \n", " 0 1.0000 1.0 0 \n", " 1 0.6552 1.0 0 \n", " 1 0.3750 0.0 0 \n", " 1 1.0000 1.0 0 \n", "1 0 0.7143 1.0 0 \n", " 0 0.3333 0.0 0 \n", " 0 1.0000 1.0 0 \n", " 1 0.8387 1.0 0 \n", " 1 0.5455 1.0 0 \n", " 1 1.0000 1.0 0 \n", "\n", " Std. Dev. Example_Target Value Prevalence \n", "gender ethnicity \n", "0 0 0.4200 1.0000 \n", " 0 0.4894 0.6250 \n", " 0 0.0000 0.3750 \n", " 1 0.4837 1.0000 \n", " 1 0.5000 0.5517 \n", " 1 0.0000 0.4483 \n", "1 0 0.4600 1.0000 \n", " 0 0.4924 0.4286 \n", " 0 0.0000 0.5714 \n", " 1 0.3739 1.0000 \n", " 1 0.5222 0.3548 \n", " 1 0.0000 0.6452 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Associated, Multi-Level Cohorts\n", "measure.data(X=X_test['col3'], Y=y_test, cohort_labels=X_test[['gender', 'ethnicity']])" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "~/opt/anaconda3/envs/exactech/lib/python3.6/site-packages/aif360/sklearn/metrics/metrics.py:116: UndefinedMetricWarning: The ratio is ill-defined and being set to 0.0 because 'predmean' for privileged samples is 0.\n", " UndefinedMetricWarning)\n", "~/repos/fairMLHealth/fairmlhealth/measure.py:888: UserWarning: The following measures are undefined and have been dropped: ['Positive Predictive Parity Ratio', 'Equal Odds Ratio']\n", " warn(f\"The following measures are undefined and have been dropped: {undefined}\")\n", "~/repos/fairMLHealth/fairmlhealth/__utils.py:214: UserWarning: Could not evaluate function for group(s): {errant_list}. This is commonly caused when there is too little data or there is only a single feature-value pair is available in a given cohort. Each cohort must have 5 observations.\n", " warn(msg)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Value
MetricMeasureethnicitycol3
Group FairnessBalanced Accuracy Difference000.0000
Balanced Accuracy Ratio001.0000
Disparate Impact Ratio001.0000
Equal Odds Difference000.0000
Equal Odds Ratio001.0000
Positive Predictive Parity Difference000.3167
Positive Predictive Parity Ratio001.9500
Statistical Parity Difference000.0000
Individual FairnessBetween-Group Gen. Entropy Error000.0054
Consistency Score001.0000
Data MetricsPrevalence of Privileged Class (%)0038.0000
Group FairnessBalanced Accuracy Difference100.0000
Balanced Accuracy Ratio101.0000
Disparate Impact Ratio100.0000
Equal Odds Difference100.0000
Positive Predictive Parity Difference100.0000
Statistical Parity Difference100.0000
Individual FairnessBetween-Group Gen. Entropy Error100.0114
Consistency Score101.0000
Data MetricsPrevalence of Privileged Class (%)1041.0000
\n", "
" ], "text/plain": [ " Value\n", "Metric Measure ethnicity col3 \n", "Group Fairness Balanced Accuracy Difference 0 0 0.0000\n", " Balanced Accuracy Ratio 0 0 1.0000\n", " Disparate Impact Ratio 0 0 1.0000\n", " Equal Odds Difference 0 0 0.0000\n", " Equal Odds Ratio 0 0 1.0000\n", " Positive Predictive Parity Difference 0 0 0.3167\n", " Positive Predictive Parity Ratio 0 0 1.9500\n", " Statistical Parity Difference 0 0 0.0000\n", "Individual Fairness Between-Group Gen. Entropy Error 0 0 0.0054\n", " Consistency Score 0 0 1.0000\n", "Data Metrics Prevalence of Privileged Class (%) 0 0 38.0000\n", "Group Fairness Balanced Accuracy Difference 1 0 0.0000\n", " Balanced Accuracy Ratio 1 0 1.0000\n", " Disparate Impact Ratio 1 0 0.0000\n", " Equal Odds Difference 1 0 0.0000\n", " Positive Predictive Parity Difference 1 0 0.0000\n", " Statistical Parity Difference 1 0 0.0000\n", "Individual Fairness Between-Group Gen. Entropy Error 1 0 0.0114\n", " Consistency Score 1 0 1.0000\n", "Data Metrics Prevalence of Privileged Class (%) 1 0 41.0000" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Cohorts for summary tables\n", "measure.summary(X_test[['col2']], \n", " y_test, \n", " model_1.predict(X_test),\n", " prtc_attr=X_test['gender'], \n", " pred_type=\"classification\",\n", " flag_oor=False,\n", " skip_performance=True,\n", " cohort_labels=X_test[['ethnicity', 'col3']]\n", " )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }