{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example Tool Usage - Regression Problems\n", "----\n", "\n", "# About\n", "This notebook contains simple, toy examples to help you get started with FairMLHealth tool usage. This same content is mirrored in the repository's main [README](../../../README.md)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from fairmlhealth import report, measure, stat_utils\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression, TweedieRegressor" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# First, we'll create a semi-randomized dataframe with specific columns for our attributes of interest\n", "rng = np.random.RandomState(506)\n", "N = 240\n", "X = pd.DataFrame({'col1': rng.randint(1, 4, N), \n", " 'col2': rng.randint(1, 75, N),\n", " 'col3': rng.randint(0, 2, N),\n", " 'gender': [0, 1]*int(N/2), \n", " 'ethnicity': [1, 1, 0, 0]*int(N/4),\n", " 'other': [1, 0, 0, 0, 1, 0, 0, 1]*int(N/8)\n", " })\n", "\n", "# Second, we'll create a randomized target variable\n", "y = pd.Series((X['col3']+X['gender']).values + rng.uniform(0, 6, N), name='Example_Target')\n", "\n", "# Third, we'll split the data and use it to train two generic models\n", "splits = train_test_split(X, y, test_size=0.5, random_state=42)\n", "X_train, X_test, y_train, y_test = splits\n", "\n", "model_1 = LinearRegression().fit(X_train, y_train)\n", "model_2 = TweedieRegressor().fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3genderethnicityother
01150011
13511110
21301000
32281100
41720011
\n", "
" ], "text/plain": [ " col1 col2 col3 gender ethnicity other\n", "0 1 15 0 0 1 1\n", "1 3 51 1 1 1 0\n", "2 1 30 1 0 0 0\n", "3 2 28 1 1 0 0\n", "4 1 72 0 0 1 1" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0 1.700759\n", "1 2.312593\n", "2 6.117705\n", "3 3.481302\n", "4 1.051515\n", "Name: Example_Target, dtype: float64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(X.head(), y.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generalized Reports\n", "fairMLHealth has tools to create generalized reports of model bias and performance.\n", "\n", "The primary reporting tool is now the **compare** function, which can be used to generate side-by-side comparisons for any number of models, and for either binary classifcation or for regression problems. Model performance metrics such as accuracy and precision (or MAE and RSquared for regression problems) are also provided to facilitate comparison. \n", "\n", "A flagging protocol is applied by default to highlight any cells with values that are out of range. This can be turned off by passing ***flag_oor = False*** to report.compare().\n", "\n", "Below is an example applying the function for a regression model. Note that the \"fair\" range to be used for evaluation of regression metrics does requires judgment on the part of the user. Default ranges have been set to [0.8, 1.2] for ratios, 10% of the available target range for *Mean Prediction Difference*, and 10% of the available MAE range for *MAE Difference*. If the default flags do not meet your needs, they can be turned off by passing ***flag_oor = False*** to report.compare(). More information is available in our [Evaluating Fairness Documentation](./docs/resources/Evaluating_Fairness.md#regression_ranges)." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
Metric Measure
Group FairnessMAE Difference0.3878
MAE Ratio1.2864
Mean Prediction Difference-1.0663
Mean Prediction Ratio0.7721
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.3652
Model PerformanceMAE1.5547
MSE3.3753
Mean Error-0.1224
Mean Example_Target4.2513
Mean Prediction4.1290
Rsqrd0.1326
Std. Dev. Error1.8408
Std. Dev. Example_Target1.9809
Std. Dev. Prediction0.9631
Data MetricsPrevalence of Privileged Class (%)48.0000
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate a measure report\n", "report.compare(X_test, y_test, X_test['gender'], model_1, pred_type=\"regression\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned type: \n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
Metric Measure
Group FairnessMAE Difference0.3878
MAE Ratio1.2864
Mean Prediction Difference-1.0663
Mean Prediction Ratio0.7721
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.3652
Data MetricsPrevalence of Privileged Class (%)48.0000
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Display the same report without performance measures\n", "bias_report = report.compare(test_data=X_test, \n", " targets=y_test, \n", " protected_attr=X_test['gender'], \n", " models=model_1, \n", " pred_type=\"regression\", \n", " skip_performance=True)\n", "print(\"Returned type:\", type(bias_report))\n", "display(bias_report)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative Return Types\n", "\n", "By default the **compare** function returns a flagged comparison of type pandas Styler (pandas.io.formats.style.Styler). When flags are disabled, the default return type is a pandas DataFrame. Outputs can also be returned as embedded HTML -- with or without flags -- by specitying *output_type=\"html\"*. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned type: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
MetricMeasure
Group FairnessMAE Difference0.3878
MAE Ratio1.2864
\n", "
" ], "text/plain": [ " model 1\n", "Metric Measure \n", "Group Fairness MAE Difference 0.3878\n", " MAE Ratio 1.2864" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# With flags disabled, the report is returned as a pandas DataFrame\n", "df = report.compare(test_data=X_test, \n", " targets=y_test, \n", " protected_attr=X_test['gender'], \n", " models=model_1, \n", " pred_type=\"regression\",\n", " flag_oor=False)\n", "\n", "print(\"Returned type:\", type(df))\n", "display(df.head(2))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned type: \n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1
Metric Measure
Group FairnessMAE Difference0.3878
MAE Ratio1.2864
Mean Prediction Difference-1.0663
Mean Prediction Ratio0.7721
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.3652
Model PerformanceMAE1.5547
MSE3.3753
Mean Error-0.1224
Mean Example_Target4.2513
Mean Prediction4.1290
Rsqrd0.1326
Std. Dev. Error1.8408
Std. Dev. Example_Target1.9809
Std. Dev. Prediction0.9631
Data MetricsPrevalence of Privileged Class (%)48.0000
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Comparisons can also be returned as embedded HTML\n", "from IPython.core.display import HTML\n", "html_output = report.compare(test_data=X_test, \n", " targets=y_test, \n", " protected_attr=X_test['gender'], \n", " models=model_1, \n", " pred_type=\"regression\", \n", " output_type=\"html\")\n", "print(\"Returned type:\", type(html_output))\n", "HTML(html_output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing Results for Multiple Models\n", "\n", "The **compare** tool can also be used to measure two different models or two different protected attributes. Protected attributes are measured separately and cannot yet be combined together with the **compare** tool, although they can be grouped as cohorts in the stratified tables [as shown below](#cohort).\n", "\n", "Here is an example output comparing the two test models defined above. Missing values have been added for metrics requiring prediction probabilities, which the second model does not have (note the warning below)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
model 1 model 2
Metric Measure
Group FairnessMAE Difference0.38780.3357
MAE Ratio1.28641.2271
Mean Prediction Difference-1.0663-0.2019
Mean Prediction Ratio0.77210.9523
Individual FairnessBetween-Group Gen. Entropy Error0.00000.0000
Consistency Score0.36520.8737
Model PerformanceMAE1.55471.6516
MSE3.37533.7409
Mean Error-0.1224-0.1204
Mean Example_Target4.25134.2513
Mean Prediction4.12904.1310
Rsqrd0.13260.0386
Std. Dev. Error1.84081.9385
Std. Dev. Example_Target1.98091.9809
Std. Dev. Prediction0.96310.2086
Data MetricsPrevalence of Privileged Class (%)48.000048.0000
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate a pandas dataframe of measures\n", "report.compare(X_test, \n", " y_test, \n", " X_test['gender'], \n", " {'model 1':model_1, 'model 2':model_2}, \n", " pred_type=\"regression\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Detailed Analyses\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Significance Testing\n", "\n", "It is generally recommended to test whether any differences in model outcomes for protected attributes are the effect of a sampling error in our test. FairMLHealth comes with a bootstrapping utility and supporting functions that can be used in statistical testing. The bootstrapping utility accepts any function that returns a p-value and will return a True or False if the p-value is greater than some alpha for a threshold number of randomly sampled trials. While the selection of proper statistical tests is beyond the scope of this notebook, three examples using the bootstrap_significance tool with a built-in Kruskal-Wallis test function are shown below." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Is the y value different for male vs female?\n", " True\n" ] } ], "source": [ "# Example 1: Bootstrap test results applying Kruskal-Wallis relative to gender\n", "isMale = X['gender'].eq(1)\n", "reject_h0 = stat_utils.bootstrap_significance(func=stat_utils.kruskal_pval, \n", " a=y.loc[isMale], \n", " b=y.loc[~isMale])\n", "print(\"Is the y value different for male vs female?\\n\", reject_h0)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Is the y value different for caucasian vs not-caucasian?\n", " False\n" ] } ], "source": [ "# Example 2: Bootstrap test results applying Kruskal-Wallis relative to ethnicity\n", "isCaucasian = X['ethnicity'].eq(1)\n", "reject_h0 = stat_utils.bootstrap_significance(func=stat_utils.kruskal_pval, \n", " a=y.loc[isCaucasian], \n", " b=y.loc[~isCaucasian])\n", "print(\"Is the y value different for caucasian vs not-caucasian?\\n\", reject_h0)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P-Value of single K-W test: 2.981592458110808e-10\n" ] } ], "source": [ "# Example of a single Kruskal-Wallis test\n", "pval = stat_utils.kruskal_pval(a=y.loc[X['col3'].eq(1)], \n", " b=y.loc[X['col3'].eq(0)], \n", " # If n_sample is set to None, tests on the full dataset rather than a sample\n", " n_sample=None\n", " )\n", "print(\"P-Value of single K-W test:\", pval)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stratified Tables\n", "FairMLHealth also provides tools for detailed analysis of model variance by way of stratified data, performance, and bias tables. Beyond evaluating fairness, these tools are intended for flexible use in any generic assessment of model bias. Tables can evaluate multiple features at once. 
*An important update starting in Version 1.0.0 is that all of these features are now contained in the **measure.py** module (previously named reports.py).*\n", "\n", "All tables display a summary row for \"All Features, All Values\". This summary can be turned off by passing ***add_overview=False*** to measure.data()." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Tables\n", "\n", "The stratified data table can be used to evaluate data against one or multiple targets. Two methods are available for identifying which features to assess, as shown in the examples below. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean Example_TargetMedian Example_TargetMissing ValuesStd. Dev. Example_TargetValue Prevalence
0ALL FEATURESALL VALUES120NaN4.25134.574501.98091.0000
1gender0620.99923.54103.783502.03570.5167
2gender1580.99925.01065.067301.61920.4833
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy Mean Example_Target \\\n", "0 ALL FEATURES ALL VALUES 120 NaN 4.2513 \n", "1 gender 0 62 0.9992 3.5410 \n", "2 gender 1 58 0.9992 5.0106 \n", "\n", " Median Example_Target Missing Values Std. Dev. Example_Target \\\n", "0 4.5745 0 1.9809 \n", "1 3.7835 0 2.0357 \n", "2 5.0673 0 1.6192 \n", "\n", " Value Prevalence \n", "0 1.0000 \n", "1 0.5167 \n", "2 0.4833 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Arguments Option 1: pass full set of data, subsetting with *features* argument\n", "measure.data(X_test, y_test, features=['gender'])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean Example_TargetMedian Example_TargetMissing ValuesStd. Dev. Example_TargetValue Prevalence
0ALL FEATURESALL VALUES120NaN4.25134.574501.98091.0000
1gender0620.99923.54103.783502.03570.5167
2gender1580.99925.01065.067301.61920.4833
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy Mean Example_Target \\\n", "0 ALL FEATURES ALL VALUES 120 NaN 4.2513 \n", "1 gender 0 62 0.9992 3.5410 \n", "2 gender 1 58 0.9992 5.0106 \n", "\n", " Median Example_Target Missing Values Std. Dev. Example_Target \\\n", "0 4.5745 0 1.9809 \n", "1 3.7835 0 2.0357 \n", "2 5.0673 0 1.6192 \n", "\n", " Value Prevalence \n", "0 1.0000 \n", "1 0.5167 \n", "2 0.4833 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Arguments Option 2: pass the data subset of interest without using the *features* argument\n", "measure.data(X_test['gender'], y_test)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean col2Mean col3Median col2Median col3Missing ValuesStd. Dev. col2Std. Dev. col3Value Prevalence
0gender0620.999236.64520.467734.50.0022.58110.50300.5167
1gender1580.999236.22410.603432.51.0020.98210.49350.4833
2col11511.557936.47060.647132.01.0021.29350.48260.4250
3col12331.557933.63640.454530.00.0021.42260.50560.2750
4col13361.557938.97220.444440.00.0022.90160.50400.3000
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy Mean col2 Mean col3 \\\n", "0 gender 0 62 0.9992 36.6452 0.4677 \n", "1 gender 1 58 0.9992 36.2241 0.6034 \n", "2 col1 1 51 1.5579 36.4706 0.6471 \n", "3 col1 2 33 1.5579 33.6364 0.4545 \n", "4 col1 3 36 1.5579 38.9722 0.4444 \n", "\n", " Median col2 Median col3 Missing Values Std. Dev. col2 Std. Dev. col3 \\\n", "0 34.5 0.0 0 22.5811 0.5030 \n", "1 32.5 1.0 0 20.9821 0.4935 \n", "2 32.0 1.0 0 21.2935 0.4826 \n", "3 30.0 0.0 0 21.4226 0.5056 \n", "4 40.0 0.0 0 22.9016 0.5040 \n", "\n", " Value Prevalence \n", "0 0.5167 \n", "1 0.4833 \n", "2 0.4250 \n", "3 0.2750 \n", "4 0.3000 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display a similar report for multiple targets, dropping the summary row\n", "measure.data(X=X_test, # used to define rows\n", " Y=X_test, # used to define columns\n", " features=['gender', 'col1'], # optional subset of X\n", " targets=['col2', 'col3'], # optional subset of Y\n", " add_overview=False # turns off \"All Features, All Values\" row\n", " )" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueMean col2Mean col3
2gender136.22410.6034
3col1136.47060.6471
\n", "
" ], "text/plain": [ " Feature Name Feature Value Mean col2 Mean col3\n", "2 gender 1 36.2241 0.6034\n", "3 col1 1 36.4706 0.6471" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Analytical tables are output as pandas DataFrames\n", "test_table = measure.data(X=X_test[['gender', 'col1']], # used to define rows\n", " Y=X_test[['col2', 'col3']], # used to define columns\n", " )\n", "\n", "test_table.loc[test_table['Feature Value'].eq(\"1\"), ['Feature Name', 'Feature Value', 'Mean col2', 'Mean col3']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stratified Performance Tables\n", "\n", "The stratified performance table evaluates model performance specific to each feature-value subset. These tables are compatible with both classification and regression models." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.Mean TargetMean PredictionMAEMSEMean ErrorRsqrdStd. Dev. ErrorStd. Dev. PredictionStd. Dev. Target
0ALL FEATURESALL VALUES120.04.25134.12901.55473.3753-0.12240.13261.84080.96311.9809
1gender062.03.54103.61361.74223.87870.07250.04871.98420.76712.0357
2gender158.05.01064.67991.35442.8372-0.3307-0.10121.66600.84201.6192
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Mean Target Mean Prediction MAE \\\n", "0 ALL FEATURES ALL VALUES 120.0 4.2513 4.1290 1.5547 \n", "1 gender 0 62.0 3.5410 3.6136 1.7422 \n", "2 gender 1 58.0 5.0106 4.6799 1.3544 \n", "\n", " MSE Mean Error Rsqrd Std. Dev. Error Std. Dev. Prediction \\\n", "0 3.3753 -0.1224 0.1326 1.8408 0.9631 \n", "1 3.8787 0.0725 0.0487 1.9842 0.7671 \n", "2 2.8372 -0.3307 -0.1012 1.6660 0.8420 \n", "\n", " Std. Dev. Target \n", "0 1.9809 \n", "1 2.0357 \n", "2 1.6192 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Performance table example\n", "measure.performance(X_test[['gender']], y_test, model_1.predict(X_test), \n", " pred_type=\"regression\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stratified Bias Tables\n", "\n", "The stratified bias analysis feature applies fairness-related metrics for each feature-value pair. It assumes a given feature-value as the \"privileged\" group relative to all other possible values for the feature. For example, in the table output shown in the cell below, row **2** in the table below displays measures for **\"col1\"** with a value of **\"2\"**. For this row, \"2\" is considered to be the privileged group, while all other non-null values (namely \"1\" and \"3\") are considered unprivileged.\n", "\n", "Note that the *flag* function is compatible with both **measure.bias()** and **measure.summary()** (which is demonstrated below). However, to enable colored cells the tool returns a pandas Styler rather than a DataTable. For this reason, *flag_oor* is False by default for these features. Flagging can be turned on by passing *flag_oor=True* to either function. As an added feature, optional custom ranges can be passed to either **measure.bias()** or **measure.summary()** to facilitate regression evaluation as shown below." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature Name Feature Value MAE Difference MAE Ratio Mean Prediction Difference Mean Prediction Ratio
0gender0-0.38780.77741.06631.2951
1gender10.38781.2864-1.06630.7721
2col11-0.22750.86500.15451.0382
3col120.24951.18160.13371.0332
4col130.02791.0182-0.30670.9294
" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Custom \"fair\" ranges may be passed as dictionaries of tuples whose keys \n", "# are case-insensitive measure names\n", "my_ranges = {'mean prediction difference':(-2, 2)}\n", "\n", "# Note that flag_oor is set to False by default for this feature\n", "measure.bias(X_test[['gender', 'col1']],\n", " y_test,\n", " model_1.predict(X_test),\n", " pred_type=\"regression\",\n", " flag_oor=True,\n", " custom_ranges=my_ranges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **measure** module also contains a summary function that works similarly to report.compare(). While it can only be applied to one model at a time, it can accept custom \"fair\" ranges, and accept cohort groups as [shown in the next section](#cohort)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Value
Metric Measure
Group FairnessMAE Difference0.3878
MAE Ratio1.2864
Mean Prediction Difference-1.0663
Mean Prediction Ratio0.7721
Individual FairnessBetween-Group Gen. Entropy Error0.0000
Consistency Score0.3141
Model PerformanceMAE1.5547
MSE3.3753
Mean Error-0.1224
Mean Example_Target4.2513
Mean Prediction4.1290
Rsqrd0.1326
Std. Dev. Error1.8408
Std. Dev. Example_Target1.9809
Std. Dev. Prediction0.9631
Data MetricsPrevalence of Privileged Class (%)48.0000
" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Example summary output for the regression model with custom ranges\n", "measure.summary(X_test[['gender', 'col1']],\n", " y_test,\n", " model_1.predict(X_test),\n", " prtc_attr=X_test['gender'],\n", " pred_type=\"regression\",\n", " flag_oor=True,\n", " custom_ranges={ 'mean prediction difference':(-0.5, 2)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis by Cohort\n", "\n", "Table-generating functions in the **measure** module can be additionally grouped using the *cohort_labels* argument to specify additional labels for each observation. Cohorts may consist of either as a single label or a set of labels, and may be either separate from or attached to the existing data." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
True Value Group Feature Name Feature Value MAE Difference MAE Ratio Mean Prediction Difference Mean Prediction Ratio
00col300.94211.59541.46681.4613
10col31-0.94210.6268-1.46680.6843
21col30-0.49560.56981.42321.4092
31col310.49561.7549-1.42320.7096
42col30-1.12910.58101.28331.3623
52col311.12911.7211-1.28330.7340
" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define cohort labels relative to the true values of the target\n", "cohort_labels = pd.qcut(y_test, 3, labels=False).rename('True Value Group')\n", "\n", "# Separate, Single-Level Cohorts\n", "measure.bias(X_test['col3'], y_test, model_1.predict(X_test), \n", " pred_type=\"regression\", flag_oor=True, \n", " cohort_labels=cohort_labels)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature NameFeature ValueObs.EntropyMean Example_TargetMedian Example_TargetMissing ValuesStd. Dev. Example_TargetValue Prevalence
genderethnicity
00ALL FEATURESALL VALUES29NaN3.92733.984701.91641.0000
0col30150.99913.37973.775401.95320.5172
0col31140.99914.51414.448501.75640.4828
1ALL FEATURESALL VALUES33NaN3.20162.602402.10531.0000
1col30180.99402.49201.670901.88590.5455
1col31150.99404.05305.142602.09490.4545
10ALL FEATURESALL VALUES26NaN4.95444.789501.47011.0000
0col30110.98294.65574.571101.60140.4231
0col31150.98295.17355.094801.38050.5769
1ALL FEATURESALL VALUES32NaN5.05635.295701.75301.0000
1col30120.95444.24364.239701.77310.3750
1col31200.95445.54405.674001.58940.6250
\n", "
" ], "text/plain": [ " Feature Name Feature Value Obs. Entropy \\\n", "gender ethnicity \n", "0 0 ALL FEATURES ALL VALUES 29 NaN \n", " 0 col3 0 15 0.9991 \n", " 0 col3 1 14 0.9991 \n", " 1 ALL FEATURES ALL VALUES 33 NaN \n", " 1 col3 0 18 0.9940 \n", " 1 col3 1 15 0.9940 \n", "1 0 ALL FEATURES ALL VALUES 26 NaN \n", " 0 col3 0 11 0.9829 \n", " 0 col3 1 15 0.9829 \n", " 1 ALL FEATURES ALL VALUES 32 NaN \n", " 1 col3 0 12 0.9544 \n", " 1 col3 1 20 0.9544 \n", "\n", " Mean Example_Target Median Example_Target Missing Values \\\n", "gender ethnicity \n", "0 0 3.9273 3.9847 0 \n", " 0 3.3797 3.7754 0 \n", " 0 4.5141 4.4485 0 \n", " 1 3.2016 2.6024 0 \n", " 1 2.4920 1.6709 0 \n", " 1 4.0530 5.1426 0 \n", "1 0 4.9544 4.7895 0 \n", " 0 4.6557 4.5711 0 \n", " 0 5.1735 5.0948 0 \n", " 1 5.0563 5.2957 0 \n", " 1 4.2436 4.2397 0 \n", " 1 5.5440 5.6740 0 \n", "\n", " Std. Dev. Example_Target Value Prevalence \n", "gender ethnicity \n", "0 0 1.9164 1.0000 \n", " 0 1.9532 0.5172 \n", " 0 1.7564 0.4828 \n", " 1 2.1053 1.0000 \n", " 1 1.8859 0.5455 \n", " 1 2.0949 0.4545 \n", "1 0 1.4701 1.0000 \n", " 0 1.6014 0.4231 \n", " 0 1.3805 0.5769 \n", " 1 1.7530 1.0000 \n", " 1 1.7731 0.3750 \n", " 1 1.5894 0.6250 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Multi-Level Cohorts for the Data table\n", "measure.data(X=X_test[['col3']], Y=y_test, cohort_labels=X_test[['gender', 'ethnicity']])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }