{ "cells": [ { "cell_type": "markdown", "id": "62e70a2a", "metadata": {}, "source": [ "# Data\n", "\n", "To illustrate the group fairness metrics in TrustyAI, two synthetic datasets were created with the same input features and outcome types. \n", "The outcome is whether a given individual reaches a 50k income threshold, using age, race and gender as categorical inputs. Both datasets consist of $N=10000$ data points.\n", "The gender values are allocated with a proportion of 20% to `gender=0` and 80% to `gender=1`.\n", "\n", "In both datasets, the likelihood of a positive outcome increases with age (with uniform probability), regardless of race or gender.\n", "The first dataset, deemed _unbiased_, simply assigns the income value uniformly at random, regardless of race or gender.\n", "The second dataset, deemed _biased_, allocates a positive outcome to `gender=0` with a lower probability than to `gender=1`." ] }, { "cell_type": "code", "execution_count": 1, "id": "6de2a925", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "98cd9647", "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"data/income-unbiased.zip\", index_col=False)" ] }, { "cell_type": "code", "execution_count": 3, "id": "be16cc2e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|      | age | race | gender | income |
|------|-----|------|--------|--------|
| 0    | 13  | 0    | 0      | 0      |
| 1    | 65  | 7    | 0      | 1      |
| 2    | 71  | 6    | 1      | 0      |
| 3    | 38  | 1    | 1      | 1      |
| 4    | 42  | 0    | 0      | 1      |
| ...  | ... | ...  | ...    | ...    |
| 9995 | 20  | 5    | 1      | 0      |
| 9996 | 34  | 2    | 1      | 0      |
| 9997 | 25  | 2    | 1      | 1      |
| 9998 | 73  | 5    | 1      | 1      |
| 9999 | 58  | 3    | 1      | 1      |

10000 rows × 4 columns
\n", "