{ "cells": [ { "cell_type": "markdown", "id": "62e70a2a", "metadata": {}, "source": [ "# Data\n", "\n", "To illustrate the group fairness metrics in TrustyAI, two synthetic datasets were created with the same input features and outcome types.\n", "The outcome is whether a given individual reaches a 50k income threshold, using age, race and gender as inputs (race and gender are categorical). Each dataset consists of $N=10000$ data points.\n", "The gender values are allocated with a proportion of 20% to `gender=0` and 80% to `gender=1`.\n", "\n", "In both datasets, the likelihood of a positive outcome increases with age (with uniform probability), regardless of race or gender.\n", "The first dataset, deemed _unbiased_, simply allocates the income value using a uniform random value, regardless of race or gender.\n", "The second dataset, deemed _biased_, allocates a positive outcome to `gender=0` with a lower probability than to `gender=1`." ] }, { "cell_type": "code", "execution_count": 1, "id": "6de2a925", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "98cd9647", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"data/income-unbiased.zip\", index_col=False)" ] }, { "cell_type": "code", "execution_count": 3, "id": "be16cc2e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr><th></th><th>age</th><th>race</th><th>gender</th><th>income</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>13</td><td>0</td><td>0</td><td>0</td></tr>\n",
" <tr><th>1</th><td>65</td><td>7</td><td>0</td><td>1</td></tr>\n",
" <tr><th>2</th><td>71</td><td>6</td><td>1</td><td>0</td></tr>\n",
" <tr><th>3</th><td>38</td><td>1</td><td>1</td><td>1</td></tr>\n",
" <tr><th>4</th><td>42</td><td>0</td><td>0</td><td>1</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>9995</th><td>20</td><td>5</td><td>1</td><td>0</td></tr>\n",
" <tr><th>9996</th><td>34</td><td>2</td><td>1</td><td>0</td></tr>\n",
" <tr><th>9997</th><td>25</td><td>2</td><td>1</td><td>1</td></tr>\n",
" <tr><th>9998</th><td>73</td><td>5</td><td>1</td><td>1</td></tr>\n",
" <tr><th>9999</th><td>58</td><td>3</td><td>1</td><td>1</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10000 rows × 4 columns</p>
\n", "" ], "text/plain": [ "      age  race  gender  income\n", "0      13     0       0       0\n", "1      65     7       0       1\n", "2      71     6       1       0\n", "3      38     1       1       1\n", "4      42     0       0       1\n", "...   ...   ...     ...     ...\n", "9995   20     5       1       0\n", "9996   34     2       1       0\n", "9997   25     2       1       1\n", "9998   73     5       1       1\n", "9999   58     3       1       1\n", "\n", "[10000 rows x 4 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] },
{ "cell_type": "markdown", "id": "20e2a6d7", "metadata": {}, "source": [ "# Demographic Parity\n", "\n", "\n", "_Demographic Parity_ provides a measure of imbalances in positive and negative outcomes between privileged and unprivileged groups.\n", "\n", "Taking the previous data as an example, we would use Demographic Parity metrics to measure whether, for instance, `income` is predicted to be above or below the 50k threshold regardless of race or gender.\n", "\n", "\n", "## Statistical Parity Difference\n", "\n", "The _Statistical Parity Difference (SPD)_ is the difference in the probability of a favorable prediction between the unprivileged and privileged groups. Typically:\n", "\n", "- $SPD=0$ means that the model is behaving fairly with regard to the selected attribute (e.g. race, gender)\n", "- Values between $-0.1$ and $0.1$ mean that the model is _reasonably fair_" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nobias.groupby(['gender', 'income'])['income'].count().unstack().plot.bar()" ] },
{ "cell_type": "code", "execution_count": 7, "id": "2b2c678a", "metadata": {}, "outputs": [], "source": [ "from trustyai.metrics.fairness.group import statistical_parity_difference\n", "from trustyai.model import output\n", "\n", "nobias_privileged = nobias[nobias.gender == 1]\n", "nobias_unprivileged = nobias[nobias.gender == 0]\n", "favorable = output(\"income\", dtype=\"number\", value=1)\n", "score = statistical_parity_difference(privileged=nobias_privileged,\n", "                                      unprivileged=nobias_unprivileged,\n", "                                      favorable=[favorable])" ] },
{ "cell_type": "code", "execution_count": 8, "id": "9e548018", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0036255104824703954\n" ] } ], "source": [ "print(score)" ] },
{ "cell_type": "markdown", "id": "a13a2ac3", "metadata": {}, "source": [ "We can see that the $SPD$ for this dataset lies within the $[-0.1, 0.1]$ range, which classifies the model as _reasonably fair_." ] },
{ "cell_type": "markdown", "id": "09bb7d45", "metadata": {}, "source": [ "### Biased dataset" ] },
{ "cell_type": "code", "execution_count": 9, "id": "63b953c9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "gender  income\n", "0       0         1772\n", "        1          242\n", "1       0         5775\n", "        1         2211\n", "Name: income, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bias = pd.read_csv(\"data/income-biased.zip\", index_col=False)\n", "bias.groupby(['gender', 'income'])['income'].count()" ] },
{ "cell_type": "code", "execution_count": 10, "id": "aed61b77", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bias.groupby(['gender', 'income'])['income'].count().unstack().plot.bar()" ] },
{ "cell_type": "code", "execution_count": 12, "id": "901e5720", "metadata": {}, "outputs": [], "source": [ "bias_privileged = bias[bias.gender == 1]\n", "bias_unprivileged = bias[bias.gender == 0]\n", "\n", "score = statistical_parity_difference(privileged=bias_privileged,\n", "                                      unprivileged=bias_unprivileged,\n", "                                      favorable=[favorable])" ] },
{ "cell_type": "code", "execution_count": 13, "id": "7be544a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-0.15670061634672994\n" ] } ], "source": [ "print(score)" ] },
{ "cell_type": "markdown", "id": "8e3f2bd4", "metadata": {}, "source": [ "As expected, the $SPD$ for this dataset falls outside the $[-0.1, 0.1]$ range, which classifies the model as _unfair_.\n", "In addition, the negative score indicates that the unprivileged group (in our example, `gender = 0`) is the one at a disadvantage for this particular outcome." 
] }, { "cell_type": "markdown", "id": "de0affcf", "metadata": {}, "source": [ "## Disparate Impact Ratio\n", "\n", "\n", "Similarly to the _Statistical Parity Difference_, the _Disparate Impact Ratio (DIR)_ measures imbalances in positive outcome predictions across privileged and unprivileged groups.\n", "Instead of calculating the difference, this metric calculates the ratio of the two selection rates. Typically:\n", "\n", "- $DIR=1$ means that the model is fair with regard to the protected attribute.\n", "- $0.8 22\u001b[0m xgb \u001b[39m=\u001b[39m train(\u001b[39m\"\u001b[39;49m\u001b[39mdata/income-biased.zip\u001b[39;49m\u001b[39m\"\u001b[39;49m)\n", "Cell \u001b[0;32mIn[48], line 19\u001b[0m, in \u001b[0;36mtrain\u001b[0;34m(dataset)\u001b[0m\n\u001b[1;32m 13\u001b[0m _y \u001b[39m=\u001b[39m df\u001b[39m.\u001b[39mincome\n\u001b[1;32m 15\u001b[0m clf \u001b[39m=\u001b[39m XGBClassifier(objective\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mbinary:logistic\u001b[39m\u001b[39m\"\u001b[39m, \n\u001b[1;32m 16\u001b[0m enable_categorical\u001b[39m=\u001b[39m\u001b[39mTrue\u001b[39;00m, \n\u001b[1;32m 17\u001b[0m use_label_encoder\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m,\n\u001b[1;32m 18\u001b[0m eval_metric\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mlogloss\u001b[39m\u001b[39m'\u001b[39m)\n\u001b[0;32m---> 19\u001b[0m clf\u001b[39m.\u001b[39;49mfit(_X, _y)\n\u001b[1;32m 20\u001b[0m \u001b[39mreturn\u001b[39;00m clf\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/core.py:436\u001b[0m, in \u001b[0;36m_deprecate_positional_args..inner_f\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 434\u001b[0m \u001b[39mfor\u001b[39;00m k, arg \u001b[39min\u001b[39;00m \u001b[39mzip\u001b[39m(sig\u001b[39m.\u001b[39mparameters, args):\n\u001b[1;32m 435\u001b[0m kwargs[k] \u001b[39m=\u001b[39m arg\n\u001b[0;32m--> 436\u001b[0m \u001b[39mreturn\u001b[39;00m 
f(\u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/sklearn.py:1158\u001b[0m, in \u001b[0;36mXGBClassifier.fit\u001b[0;34m(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)\u001b[0m\n\u001b[1;32m 1153\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(X\u001b[39m.\u001b[39mshape) \u001b[39m!=\u001b[39m \u001b[39m2\u001b[39m:\n\u001b[1;32m 1154\u001b[0m \u001b[39m# Simply raise an error here since there might be many\u001b[39;00m\n\u001b[1;32m 1155\u001b[0m \u001b[39m# different ways of reshaping\u001b[39;00m\n\u001b[1;32m 1156\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39m\"\u001b[39m\u001b[39mPlease reshape the input data X into 2-dimensional matrix.\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m-> 1158\u001b[0m train_dmatrix, evals \u001b[39m=\u001b[39m _wrap_evaluation_matrices(\n\u001b[1;32m 1159\u001b[0m missing\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mmissing,\n\u001b[1;32m 1160\u001b[0m X\u001b[39m=\u001b[39;49mX,\n\u001b[1;32m 1161\u001b[0m y\u001b[39m=\u001b[39;49my,\n\u001b[1;32m 1162\u001b[0m group\u001b[39m=\u001b[39;49m\u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m 1163\u001b[0m qid\u001b[39m=\u001b[39;49m\u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m 1164\u001b[0m sample_weight\u001b[39m=\u001b[39;49msample_weight,\n\u001b[1;32m 1165\u001b[0m base_margin\u001b[39m=\u001b[39;49mbase_margin,\n\u001b[1;32m 1166\u001b[0m feature_weights\u001b[39m=\u001b[39;49mfeature_weights,\n\u001b[1;32m 1167\u001b[0m eval_set\u001b[39m=\u001b[39;49meval_set,\n\u001b[1;32m 1168\u001b[0m sample_weight_eval_set\u001b[39m=\u001b[39;49msample_weight_eval_set,\n\u001b[1;32m 1169\u001b[0m 
base_margin_eval_set\u001b[39m=\u001b[39;49mbase_margin_eval_set,\n\u001b[1;32m 1170\u001b[0m eval_group\u001b[39m=\u001b[39;49m\u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m 1171\u001b[0m eval_qid\u001b[39m=\u001b[39;49m\u001b[39mNone\u001b[39;49;00m,\n\u001b[1;32m 1172\u001b[0m create_dmatrix\u001b[39m=\u001b[39;49m\u001b[39mlambda\u001b[39;49;00m \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs: DMatrix(nthread\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mn_jobs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs),\n\u001b[1;32m 1173\u001b[0m label_transform\u001b[39m=\u001b[39;49mlabel_transform,\n\u001b[1;32m 1174\u001b[0m )\n\u001b[1;32m 1176\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_Booster \u001b[39m=\u001b[39m train(\n\u001b[1;32m 1177\u001b[0m params,\n\u001b[1;32m 1178\u001b[0m train_dmatrix,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1187\u001b[0m callbacks\u001b[39m=\u001b[39mcallbacks,\n\u001b[1;32m 1188\u001b[0m )\n\u001b[1;32m 1190\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m callable(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mobjective):\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/sklearn.py:236\u001b[0m, in \u001b[0;36m_wrap_evaluation_matrices\u001b[0;34m(missing, X, y, group, qid, sample_weight, base_margin, feature_weights, eval_set, sample_weight_eval_set, base_margin_eval_set, eval_group, eval_qid, create_dmatrix, label_transform)\u001b[0m\n\u001b[1;32m 216\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m_wrap_evaluation_matrices\u001b[39m(\n\u001b[1;32m 217\u001b[0m missing: \u001b[39mfloat\u001b[39m,\n\u001b[1;32m 218\u001b[0m X: Any,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 231\u001b[0m label_transform: Callable \u001b[39m=\u001b[39m \u001b[39mlambda\u001b[39;00m x: x,\n\u001b[1;32m 232\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m Tuple[Any, Optional[List[Tuple[Any, 
\u001b[39mstr\u001b[39m]]]]:\n\u001b[1;32m 233\u001b[0m \u001b[39m \u001b[39m\u001b[39m\"\"\"Convert array_like evaluation matrices into DMatrix. Perform validation on the way.\u001b[39;00m\n\u001b[1;32m 234\u001b[0m \n\u001b[1;32m 235\u001b[0m \u001b[39m \"\"\"\u001b[39;00m\n\u001b[0;32m--> 236\u001b[0m train_dmatrix \u001b[39m=\u001b[39m create_dmatrix(\n\u001b[1;32m 237\u001b[0m data\u001b[39m=\u001b[39;49mX,\n\u001b[1;32m 238\u001b[0m label\u001b[39m=\u001b[39;49mlabel_transform(y),\n\u001b[1;32m 239\u001b[0m group\u001b[39m=\u001b[39;49mgroup,\n\u001b[1;32m 240\u001b[0m qid\u001b[39m=\u001b[39;49mqid,\n\u001b[1;32m 241\u001b[0m weight\u001b[39m=\u001b[39;49msample_weight,\n\u001b[1;32m 242\u001b[0m base_margin\u001b[39m=\u001b[39;49mbase_margin,\n\u001b[1;32m 243\u001b[0m feature_weights\u001b[39m=\u001b[39;49mfeature_weights,\n\u001b[1;32m 244\u001b[0m missing\u001b[39m=\u001b[39;49mmissing,\n\u001b[1;32m 245\u001b[0m )\n\u001b[1;32m 247\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mvalidate_or_none\u001b[39m(meta: Optional[List], name: \u001b[39mstr\u001b[39m) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m List:\n\u001b[1;32m 248\u001b[0m \u001b[39mif\u001b[39;00m meta \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/sklearn.py:1172\u001b[0m, in \u001b[0;36mXGBClassifier.fit..\u001b[0;34m(**kwargs)\u001b[0m\n\u001b[1;32m 1153\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(X\u001b[39m.\u001b[39mshape) \u001b[39m!=\u001b[39m \u001b[39m2\u001b[39m:\n\u001b[1;32m 1154\u001b[0m \u001b[39m# Simply raise an error here since there might be many\u001b[39;00m\n\u001b[1;32m 1155\u001b[0m \u001b[39m# different ways of reshaping\u001b[39;00m\n\u001b[1;32m 1156\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39m\"\u001b[39m\u001b[39mPlease reshape the input data X into 2-dimensional 
matrix.\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 1158\u001b[0m train_dmatrix, evals \u001b[39m=\u001b[39m _wrap_evaluation_matrices(\n\u001b[1;32m 1159\u001b[0m missing\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mmissing,\n\u001b[1;32m 1160\u001b[0m X\u001b[39m=\u001b[39mX,\n\u001b[1;32m 1161\u001b[0m y\u001b[39m=\u001b[39my,\n\u001b[1;32m 1162\u001b[0m group\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m,\n\u001b[1;32m 1163\u001b[0m qid\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m,\n\u001b[1;32m 1164\u001b[0m sample_weight\u001b[39m=\u001b[39msample_weight,\n\u001b[1;32m 1165\u001b[0m base_margin\u001b[39m=\u001b[39mbase_margin,\n\u001b[1;32m 1166\u001b[0m feature_weights\u001b[39m=\u001b[39mfeature_weights,\n\u001b[1;32m 1167\u001b[0m eval_set\u001b[39m=\u001b[39meval_set,\n\u001b[1;32m 1168\u001b[0m sample_weight_eval_set\u001b[39m=\u001b[39msample_weight_eval_set,\n\u001b[1;32m 1169\u001b[0m base_margin_eval_set\u001b[39m=\u001b[39mbase_margin_eval_set,\n\u001b[1;32m 1170\u001b[0m eval_group\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m,\n\u001b[1;32m 1171\u001b[0m eval_qid\u001b[39m=\u001b[39m\u001b[39mNone\u001b[39;00m,\n\u001b[0;32m-> 1172\u001b[0m create_dmatrix\u001b[39m=\u001b[39m\u001b[39mlambda\u001b[39;00m \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs: DMatrix(nthread\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mn_jobs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs),\n\u001b[1;32m 1173\u001b[0m label_transform\u001b[39m=\u001b[39mlabel_transform,\n\u001b[1;32m 1174\u001b[0m )\n\u001b[1;32m 1176\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_Booster \u001b[39m=\u001b[39m train(\n\u001b[1;32m 1177\u001b[0m params,\n\u001b[1;32m 1178\u001b[0m train_dmatrix,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1187\u001b[0m callbacks\u001b[39m=\u001b[39mcallbacks,\n\u001b[1;32m 1188\u001b[0m )\n\u001b[1;32m 1190\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m 
callable(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mobjective):\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/core.py:436\u001b[0m, in \u001b[0;36m_deprecate_positional_args..inner_f\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 434\u001b[0m \u001b[39mfor\u001b[39;00m k, arg \u001b[39min\u001b[39;00m \u001b[39mzip\u001b[39m(sig\u001b[39m.\u001b[39mparameters, args):\n\u001b[1;32m 435\u001b[0m kwargs[k] \u001b[39m=\u001b[39m arg\n\u001b[0;32m--> 436\u001b[0m \u001b[39mreturn\u001b[39;00m f(\u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/core.py:541\u001b[0m, in \u001b[0;36mDMatrix.__init__\u001b[0;34m(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, group, qid, label_lower_bound, label_upper_bound, feature_weights, enable_categorical)\u001b[0m\n\u001b[1;32m 537\u001b[0m \u001b[39mreturn\u001b[39;00m\n\u001b[1;32m 539\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39m.\u001b[39;00m\u001b[39mdata\u001b[39;00m \u001b[39mimport\u001b[39;00m dispatch_data_backend\n\u001b[0;32m--> 541\u001b[0m handle, feature_names, feature_types \u001b[39m=\u001b[39m dispatch_data_backend(\n\u001b[1;32m 542\u001b[0m data,\n\u001b[1;32m 543\u001b[0m missing\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mmissing,\n\u001b[1;32m 544\u001b[0m threads\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mnthread,\n\u001b[1;32m 545\u001b[0m feature_names\u001b[39m=\u001b[39;49mfeature_names,\n\u001b[1;32m 546\u001b[0m feature_types\u001b[39m=\u001b[39;49mfeature_types,\n\u001b[1;32m 547\u001b[0m enable_categorical\u001b[39m=\u001b[39;49menable_categorical,\n\u001b[1;32m 548\u001b[0m )\n\u001b[1;32m 549\u001b[0m \u001b[39massert\u001b[39;00m handle \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m 
\u001b[39mNone\u001b[39;00m\n\u001b[1;32m 550\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mhandle \u001b[39m=\u001b[39m handle\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/data.py:573\u001b[0m, in \u001b[0;36mdispatch_data_backend\u001b[0;34m(data, missing, threads, feature_names, feature_types, enable_categorical)\u001b[0m\n\u001b[1;32m 571\u001b[0m \u001b[39mreturn\u001b[39;00m _from_tuple(data, missing, feature_names, feature_types)\n\u001b[1;32m 572\u001b[0m \u001b[39mif\u001b[39;00m _is_pandas_df(data):\n\u001b[0;32m--> 573\u001b[0m \u001b[39mreturn\u001b[39;00m _from_pandas_df(data, enable_categorical, missing, threads,\n\u001b[1;32m 574\u001b[0m feature_names, feature_types)\n\u001b[1;32m 575\u001b[0m \u001b[39mif\u001b[39;00m _is_pandas_series(data):\n\u001b[1;32m 576\u001b[0m \u001b[39mreturn\u001b[39;00m _from_pandas_series(data, missing, threads, feature_names,\n\u001b[1;32m 577\u001b[0m feature_types)\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/data.py:258\u001b[0m, in \u001b[0;36m_from_pandas_df\u001b[0;34m(data, enable_categorical, missing, nthread, feature_names, feature_types)\u001b[0m\n\u001b[1;32m 256\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m_from_pandas_df\u001b[39m(data, enable_categorical, missing, nthread,\n\u001b[1;32m 257\u001b[0m feature_names, feature_types):\n\u001b[0;32m--> 258\u001b[0m data, feature_names, feature_types \u001b[39m=\u001b[39m _transform_pandas_df(\n\u001b[1;32m 259\u001b[0m data, enable_categorical, feature_names, feature_types)\n\u001b[1;32m 260\u001b[0m \u001b[39mreturn\u001b[39;00m _from_numpy_array(data, missing, nthread, feature_names,\n\u001b[1;32m 261\u001b[0m feature_types)\n", "File \u001b[0;32m~/.virtualenvs/trustyai-explainability-python-examples/lib/python3.10/site-packages/xgboost/data.py:223\u001b[0m, in \u001b[0;36m_transform_pandas_df\u001b[0;34m(data, 
enable_categorical, feature_names, feature_types, meta, meta_type)\u001b[0m\n\u001b[1;32m 215\u001b[0m bad_fields \u001b[39m=\u001b[39m [\n\u001b[1;32m 216\u001b[0m \u001b[39mstr\u001b[39m(data\u001b[39m.\u001b[39mcolumns[i]) \u001b[39mfor\u001b[39;00m i, dtype \u001b[39min\u001b[39;00m \u001b[39menumerate\u001b[39m(data_dtypes)\n\u001b[1;32m 217\u001b[0m \u001b[39mif\u001b[39;00m dtype\u001b[39m.\u001b[39mname \u001b[39mnot\u001b[39;00m \u001b[39min\u001b[39;00m _pandas_dtype_mapper\n\u001b[1;32m 218\u001b[0m ]\n\u001b[1;32m 220\u001b[0m msg \u001b[39m=\u001b[39m \u001b[39m\"\"\"\u001b[39m\u001b[39mDataFrame.dtypes for data must be int, float, bool or categorical. When\u001b[39m\n\u001b[1;32m 221\u001b[0m \u001b[39m categorical type is supplied, DMatrix parameter\u001b[39m\n\u001b[1;32m 222\u001b[0m \u001b[39m `enable_categorical` must be set to `True`.\u001b[39m\u001b[39m\"\"\"\u001b[39m\n\u001b[0;32m--> 223\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(msg \u001b[39m+\u001b[39m \u001b[39m'\u001b[39m\u001b[39m, \u001b[39m\u001b[39m'\u001b[39m\u001b[39m.\u001b[39mjoin(bad_fields))\n\u001b[1;32m 225\u001b[0m \u001b[39mif\u001b[39;00m feature_names \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39mand\u001b[39;00m meta \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m 226\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(data\u001b[39m.\u001b[39mcolumns, MultiIndex):\n", "\u001b[0;31mValueError\u001b[0m: DataFrame.dtypes for data must be int, float, bool or categorical. 
When\n categorical type is supplied, DMatrix parameter\n `enable_categorical` must be set to `True`.race, gender" ] } ], "source": [ "from xgboost import XGBClassifier\n", "\n", "\n", "def train(dataset):\n", " df = pd.read_csv(dataset)\n", "\n", " categories = ['race', 'gender', 'income']\n", " for f in categories:\n", " df[f] = df[f].astype('category')\n", " df['age'] = df['age'].astype('int')\n", "\n", " _X = df[[\"age\", \"race\", \"gender\"]]\n", " _y = df.income\n", "\n", " clf = XGBClassifier(objective=\"binary:logistic\", \n", " enable_categorical=True, \n", " use_label_encoder=False,\n", " eval_metric='logloss')\n", " clf.fit(_X, _y)\n", " return clf\n", "\n", "xgb = train(\"data/income-biased.zip\")\n" ] }, { "cell_type": "code", "execution_count": 46, "id": "2b027d0b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-0.06288176602997649\n" ] } ], "source": [ "from trustyai.model import Model\n", "from trustyai.metrics.fairness.group import statistical_parity_difference_model\n", "\n", "X = nobias[[\"age\", \"race\", \"gender\"]]\n", "\n", "model = Model(xgb.predict, dataframe_input=True, output_names=[\"approved\"])\n", "score = statistical_parity_difference_model(samples=X,\n", " model=model,\n", " privilege_columns=[\"gender\"],\n", " privilege_values=[1],\n", " favorable=[favorable])\n", "print(score)" ] }, { "cell_type": "markdown", "id": "daa3794e", "metadata": {}, "source": [ "## Disparate impact ratio" ] }, { "cell_type": "code", "execution_count": 43, "id": "7df08157", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.03798125763334818\n" ] } ], "source": [ "from trustyai.metrics.fairness.group import disparate_impact_ratio_model\n", "\n", "score = disparate_impact_ratio_model(samples=X,\n", " model=model,\n", " privilege_columns=[\"gender\"],\n", " privilege_values=[1],\n", " favorable=[favorable])\n", "print(score)" ] }, { "cell_type": "markdown", "id": "52a499d3", 
"metadata": {}, "source": [ "## Average Odds Difference" ] }, { "cell_type": "code", "execution_count": 44, "id": "18061c05", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9.581224702515101e-14\n" ] } ], "source": [ "from trustyai.metrics.fairness.group import average_odds_difference_model\n", "\n", "score = average_odds_difference_model(samples=X,\n", " model=model,\n", " privilege_columns=[\"gender\"],\n", " privilege_values=[1],\n", " positive_class=[1])\n", "print(score)" ] }, { "cell_type": "code", "execution_count": null, "id": "ad3d2363", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "trustyai-explainability-python-examples", "language": "python", "name": "trustyai-explainability-python-examples" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 5 }