{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Protodash: NHANES (CDC) data example\n", "- This notebook shows an example of how to use the ProtodashExplainer provided by [AIX360](https://github.com/IBM/AIX360/) to generate prototypes from (training/test) data. The notebook uses the income questionnaire, one of the [NHANES CDC questionnaire datasets](https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Questionnaire&CycleBeginYear=2013).\n", "- ProtodashExplainer is an implementation of the [Protodash algorithm](https://arxiv.org/abs/1707.01212)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Protodash Explainer examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import statements" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.preprocessing import OneHotEncoder\n", "\n", "from aix360.algorithms.protodash import ProtodashExplainer, get_Gaussian_Data\n", "from aix360.datasets import CDCDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load NHANES dataset from CDC " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "nhanes = CDCDataset()\n", "nhanes_files = nhanes.get_csv_file_names()\n", "(nhanesinfo, _, _) = nhanes._cdc_files_info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Explore NHANES Income questionnaire dataset\n", "\n", "Now let us explore the income questionnaire dataset and find out the types of responses received in the survey. Each column in this dataset corresponds to a question, and each row contains the answers given by one respondent to those questions. Both the column names and the respondents' answers are encoded. 
For example, 'SEQN' denotes the sequence number assigned to a respondent, and 'IND235' corresponds to a question about monthly family income. As seen below, in most cases a value of 1 means \"Yes\" to the question, while a value of 2 means \"No\". More details about the income questionnaire and how its questions and answers are encoded can be found [here](https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/INQ_H.htm).\n", "\n", "|Column |Description | Values and Meaning|\n", "|-------|----------------------------|---------|\n", "|SEQN | Respondent sequence number | |\n", "|INQ020 | Income from wages/salaries |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ012 | Income from self employment|1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ030 | Income from Social Security or RR |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ060 | Income from other disability pension |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ080 | Income from retirement/survivor pension |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ090 | Income from Supplemental Security Income |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ132 | Income from state/county cash assistance |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ140 | Income from interest/dividends or rental |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|INQ150 | Income from other sources |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|IND235 | Monthly family income |1-12->Increasing income brackets, 77->Refused, 99->Don't know|\n", "|INDFMMPI | Family monthly poverty level index |0-5->Higher value means more affluent|\n", "|INDFMMPC | Family monthly poverty level category |1-3->Increasing INDFMMPI brackets, 7->Refused, 9->Don't know|\n", "|INQ244 | Family has savings more than $5000 |1->Yes, 2->No, 7->Refused, 9->Don't know|\n", "|IND247 | Total savings/cash assets for the family |1-6->Increasing savings brackets, 77->Refused, 99->Don't know|" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Answers given by some respondents to the income questionnaire:\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>Respondent sequence number</th><th>Income from wages/salaries</th><th>Income from self employment</th><th>Income from Social Security or RR</th><th>Income from other disability pension</th><th>Income from retirement/survivor pension</th><th>Income from Supplemental Security Income</th><th>Income from state/county cash assistance</th><th>Income from interest/dividends or rental</th><th>Income from other sources</th><th>Monthly family income</th><th>Family monthly poverty level index</th><th>Family monthly poverty level category</th><th>Family has savings more than $5000</th><th>Total savings/cash assets for the family</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>73557.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>4.0</td><td>0.86</td><td>1.0</td><td>9.0</td><td>NaN</td></tr>\n", "    <tr><th>1</th><td>73558.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>5.0</td><td>0.92</td><td>1.0</td><td>1.0</td><td>NaN</td></tr>\n", "    <tr><th>2</th><td>73559.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>10.0</td><td>4.37</td><td>3.0</td><td>NaN</td><td>NaN</td></tr>\n", "    <tr><th>3</th><td>73560.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>9.0</td><td>2.52</td><td>3.0</td><td>NaN</td><td>NaN</td></tr>\n", "    <tr><th>4</th><td>73561.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>11.0</td><td>5.00</td><td>3.0</td><td>NaN</td><td>NaN</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Respondent sequence number Income from wages/salaries \\\n", "0 73557.0 2.0 \n", "1 73558.0 1.0 \n", "2 73559.0 2.0 \n", "3 73560.0 1.0 \n", "4 73561.0 2.0 \n", "\n", " Income from self employment Income from Social Security or RR \\\n", "0 2.0 1.0 \n", "1 1.0 1.0 \n", "2 2.0 1.0 \n", "3 2.0 2.0 \n", "4 2.0 1.0 \n", "\n", " Income from other disability pension \\\n", "0 2.0 \n", "1 2.0 \n", "2 2.0 \n", "3 2.0 \n", "4 2.0 \n", "\n", " Income from retirement/survivor pension \\\n", "0 2.0 \n", "1 2.0 \n", "2 1.0 \n", "3 2.0 \n", "4 2.0 \n", "\n", " Income from Supplemental Security Income \\\n", "0 2.0 \n", "1 2.0 \n", "2 2.0 \n", "3 2.0 \n", "4 2.0 \n", "\n", " Income from state/county cash assistance \\\n", "0 2.0 \n", "1 2.0 \n", "2 2.0 \n", "3 2.0 \n", "4 2.0 \n", "\n", " Income from interest/dividends or rental Income from other sources \\\n", "0 2.0 2.0 \n", "1 1.0 2.0 \n", "2 1.0 2.0 \n", "3 2.0 1.0 \n", "4 2.0 2.0 \n", "\n", " Monthly family income Family monthly poverty level index \\\n", "0 4.0 0.86 \n", "1 5.0 0.92 \n", "2 10.0 4.37 \n", "3 9.0 2.52 \n", "4 11.0 5.00 \n", "\n", " Family monthly poverty level category Family has savings more than $5000 \\\n", "0 1.0 9.0 \n", "1 1.0 1.0 \n", "2 3.0 NaN \n", "3 3.0 NaN \n", "4 3.0 NaN \n", "\n", " Total savings/cash assets for the family \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# replace encoded column names by the associated question text. 
\n", "df_inc = nhanes.get_csv_file('INQ_H.csv')\n", "dict_inc = {\n", "'SEQN': 'Respondent sequence number',\n", "'INQ020': 'Income from wages/salaries',\n", "'INQ012': 'Income from self employment',\n", "'INQ030': 'Income from Social Security or RR',\n", "'INQ060': 'Income from other disability pension',\n", "'INQ080': 'Income from retirement/survivor pension',\n", "'INQ090': 'Income from Supplemental Security Income',\n", "'INQ132': 'Income from state/county cash assistance',\n", "'INQ140': 'Income from interest/dividends or rental',\n", "'INQ150': 'Income from other sources',\n", "'IND235': 'Monthly family income',\n", "'INDFMMPI': 'Family monthly poverty level index',\n", "'INDFMMPC': 'Family monthly poverty level category',\n", "'INQ244': 'Family has savings more than $5000',\n", "'IND247': 'Total savings/cash assets for the family'\n", "}\n", "# map each encoded column name to its question text (raises KeyError if a code is missing from dict_inc)\n", "df_inc.columns = [dict_inc[col] for col in df_inc.columns]\n", "print(\"Answers given by some respondents to the income questionnaire:\")\n", "df_inc.head(5)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of respondents to Income questionnaire: 10175\n", "Distribution of answers to 'monthly family income' and 'Family savings' questions:\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "print(\"Number of respondents to Income questionnaire:\", df_inc.shape[0])\n", "print(\"Distribution of answers to \\'monthly family income\\' and \\'Family savings\\' questions:\")\n", "\n", "fig, axes = plt.subplots(1, 2, figsize=(10,5))\n", "fig.subplots_adjust(wspace=0.5)\n", "hist1 = df_inc['Monthly family income'].value_counts().plot(kind='bar', ax=axes[0])\n", "hist2 = df_inc['Family has savings more than $5000'].value_counts().plot(kind='bar', ax=axes[1])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Summarize NHANES Income Questionnaire dataset using Prototypes\n", "\n", "Consider a social scientist who would like to quickly obtain a summary report of this dataset in terms of the types of people who make it up. Is it possible to summarize this dataset by looking at the answers given by a few representative (prototypical) respondents? \n", "\n", "We now show how the ProtodashExplainer can be used to obtain a few prototypical respondents (10 in this example) that span the diverse set of individuals answering the income questionnaire, making it easy for the social scientist to summarize the dataset." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda2/envs/env3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py:415: FutureWarning: The handling of integer data will change in version 0.22. 
Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values.\n", "If you want the future behaviour and silence this warning, you can specify \"categories='auto'\".\n", "In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.\n", " warnings.warn(msg, FutureWarning)\n", "/opt/anaconda2/envs/env3/lib/python3.6/site-packages/cvxopt/coneprog.py:2111: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 'x' in initvals:\n", "/opt/anaconda2/envs/env3/lib/python3.6/site-packages/cvxopt/coneprog.py:2116: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 's' in initvals:\n", "/opt/anaconda2/envs/env3/lib/python3.6/site-packages/cvxopt/coneprog.py:2131: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 'y' in initvals:\n", "/opt/anaconda2/envs/env3/lib/python3.6/site-packages/cvxopt/coneprog.py:2136: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", " if 'z' in initvals:\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -2.0000e+04 4e+00 1e+00 1e+00\n", " 1: 3.9668e+01 -4.7629e+05 9e+01 1e+00 1e+00\n", " 2: 3.1760e+00 -2.0407e+06 5e+02 1e+00 1e+00\n", " 3: 2.2364e+01 -1.2096e+08 2e+04 1e+00 1e+00\n", " 4: 3.7900e+01 -2.8195e+11 8e+07 1e+00 1e+00\n", " 5: 4.5923e+05 -3.1803e+18 3e+18 5e-13 6e-03\n", " 6: 4.5923e+05 -3.1803e+16 3e+16 5e-15 1e-03\n", " 7: 4.5923e+05 -3.1803e+14 3e+14 2e-16 4e-05\n", " 8: 4.5923e+05 -3.1803e+12 3e+12 3e-17 5e-07\n", " 9: 4.5919e+05 -3.1804e+10 3e+10 9e-17 5e-09\n", "10: 4.5530e+05 
-3.1930e+08 3e+08 1e-16 4e-11\n", "11: 2.5225e+05 -3.8443e+06 4e+06 2e-16 5e-13\n", "12: 3.9004e+04 -7.9611e+04 1e+05 2e-16 6e-14\n", "13: 5.5412e+03 -6.2407e+03 1e+04 3e-16 1e-14\n", "14: 7.6180e+02 -9.4756e+02 2e+03 2e-16 1e-14\n", "15: 9.5649e+01 -1.4835e+02 2e+02 1e-16 2e-15\n", "16: 7.4120e+00 -2.6974e+01 3e+01 8e-17 0e+00\n", "17: -2.7382e+00 -7.1060e+00 4e+00 6e-17 1e-16\n", "18: -3.7687e+00 -4.4831e+00 7e-01 6e-17 2e-17\n", "19: -3.8264e+00 -3.8637e+00 4e-02 1e-16 6e-17\n", "20: -3.8268e+00 -3.8272e+00 4e-04 3e-16 4e-17\n", "21: -3.8268e+00 -3.8268e+00 4e-06 3e-17 1e-16\n", "22: -3.8268e+00 -3.8268e+00 4e-08 2e-16 1e-16\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -3.0000e+04 6e+00 1e+00 1e+00\n", " 1: 6.0751e+01 -8.5638e+05 2e+02 1e+00 1e+00\n", " 2: 2.1167e+00 -3.4967e+06 8e+02 1e+00 1e+00\n", " 3: 3.8908e+00 -1.0796e+07 2e+03 1e+00 1e+00\n", " 4: 3.6949e+00 -5.5972e+07 1e+04 1e+00 1e+00\n", " 5: 3.5683e+00 -1.2098e+09 2e+05 1e+00 1e+00\n", " 6: 3.5536e+00 -6.2578e+11 1e+08 1e+00 1e+00\n", " 7: 1.3153e+09 -1.2640e+19 1e+19 4e-13 2e-02\n", " 8: 1.3153e+09 -1.2640e+17 1e+17 4e-15 5e-03\n", " 9: 1.3153e+09 -1.2640e+15 1e+15 2e-16 2e-04\n", "10: 1.3153e+09 -1.2642e+13 1e+13 2e-16 9e-07\n", "11: 1.3150e+09 -1.2865e+11 1e+11 2e-16 1e-08\n", "12: 1.2843e+09 -3.4720e+09 5e+09 9e-17 3e-07\n", "13: 1.9046e+08 -1.5375e+09 2e+09 8e-17 2e-08\n", "14: 5.7362e+07 -8.6782e+07 1e+08 1e-16 1e-09\n", "15: 8.4664e+06 -9.4546e+06 2e+07 7e-17 1e-12\n", "16: 1.2184e+06 -1.3696e+06 3e+06 5e-17 3e-13\n", "17: 1.7441e+05 -1.9431e+05 4e+05 2e-16 1e-13\n", "18: 2.4823e+04 -2.7964e+04 5e+04 3e-16 7e-14\n", "19: 3.4862e+03 -4.0738e+03 8e+03 2e-16 2e-14\n", "20: 4.7129e+02 -6.1124e+02 1e+03 2e-16 8e-15\n", "21: 5.5606e+01 -9.8898e+01 2e+02 3e-16 7e-16\n", "22: 2.1525e+00 -1.9389e+01 2e+01 2e-16 9e-16\n", "23: -3.4411e+00 -5.9840e+00 3e+00 6e-17 2e-16\n", "24: -3.9122e+00 -4.2332e+00 3e-01 1e-16 8e-17\n", "25: -3.9407e+00 -3.9562e+00 2e-02 2e-16 
1e-16\n", "26: -3.9410e+00 -3.9412e+00 2e-04 1e-16 7e-17\n", "27: -3.9410e+00 -3.9410e+00 2e-06 1e-16 1e-16\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -4.0000e+04 8e+00 1e+00 1e+00\n", " 1: 7.4328e+01 -1.2314e+06 2e+02 1e+00 1e+00\n", " 2: 7.4441e+01 -2.0795e+06 4e+02 1e+00 1e+00\n", " 3: 1.0860e+00 -3.4170e+07 7e+03 1e+00 1e+00\n", " 4: 1.0372e+01 -4.2071e+08 8e+04 1e+00 1e+00\n", " 5: 2.5302e+01 -7.9685e+10 2e+07 1e+00 1e+00\n", " 6: 9.1590e+08 -1.8212e+18 2e+18 6e-13 9e-04\n", " 7: 9.1590e+08 -1.8212e+16 2e+16 6e-15 5e-04\n", " 8: 9.1590e+08 -1.8212e+14 2e+14 1e-16 1e-05\n", " 9: 9.1589e+08 -1.8261e+12 2e+12 4e-17 1e-07\n", "10: 9.1407e+08 -2.3159e+10 2e+10 2e-16 2e-09\n", "11: 8.1327e+08 -4.7256e+09 6e+09 2e-16 3e-09\n", "12: 1.1597e+08 -5.1501e+09 5e+09 8e-17 2e-09\n", "13: 4.9799e+07 -2.2036e+08 3e+08 3e-16 9e-11\n", "14: 8.0295e+06 -1.0899e+07 2e+07 2e-16 3e-12\n", "15: 1.1514e+06 -1.2677e+06 2e+06 1e-16 3e-13\n", "16: 1.6479e+05 -1.8397e+05 3e+05 2e-16 7e-14\n", "17: 2.3438e+04 -2.6436e+04 5e+04 3e-16 2e-14\n", "18: 3.2866e+03 -3.8575e+03 7e+03 1e-16 1e-14\n", "19: 4.4218e+02 -5.8075e+02 1e+03 2e-16 5e-15\n", "20: 5.1163e+01 -9.4774e+01 1e+02 1e-16 2e-15\n", "21: 1.3192e+00 -1.8967e+01 2e+01 2e-16 7e-16\n", "22: -3.7193e+00 -6.0761e+00 2e+00 2e-16 1e-16\n", "23: -3.9969e+00 -4.2884e+00 3e-01 9e-17 1e-16\n", "24: -4.0148e+00 -4.0273e+00 1e-02 2e-16 2e-16\n", "25: -4.0150e+00 -4.0151e+00 2e-04 1e-16 1e-16\n", "26: -4.0150e+00 -4.0150e+00 2e-06 9e-17 1e-16\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -5.0000e+04 1e+01 1e+00 1e+00\n", " 1: 8.2641e+01 -1.5517e+06 3e+02 1e+00 1e+00\n", " 2: 1.0048e+02 -2.6299e+06 5e+02 1e+00 1e+00\n", " 3: 2.2702e+01 -1.1726e+07 2e+03 1e+00 1e+00\n", " 4: 5.1243e+01 -7.9666e+07 2e+04 1e+00 1e+00\n", " 5: 6.3374e+01 -8.3514e+09 2e+06 1e+00 1e+00\n", " 6: 4.1550e+08 -1.9875e+17 2e+17 4e-13 1e-04\n", " 7: 4.1550e+08 -1.9875e+15 2e+15 4e-15 7e-05\n", " 8: 
4.1550e+08 -1.9879e+13 2e+13 4e-16 2e-06\n", " 9: 4.1537e+08 -2.0297e+11 2e+11 2e-16 2e-08\n", "10: 4.0286e+08 -6.0761e+09 6e+09 1e-16 7e-10\n", "11: 9.2939e+07 -7.3327e+08 8e+08 2e-16 2e-10\n", "12: 2.0691e+07 -4.1222e+07 6e+07 2e-16 8e-12\n", "13: 3.0021e+06 -3.2465e+06 6e+06 1e-16 5e-13\n", "14: 4.3102e+05 -4.8469e+05 9e+05 1e-16 2e-13\n", "15: 6.1523e+04 -6.8892e+04 1e+05 2e-16 1e-13\n", "16: 8.7028e+03 -9.9729e+03 2e+04 1e-16 5e-14\n", "17: 1.2017e+03 -1.4729e+03 3e+03 6e-17 9e-15\n", "18: 1.5364e+02 -2.2892e+02 4e+02 2e-16 3e-15\n", "19: 1.3653e+01 -4.0529e+01 5e+01 2e-16 1e-15\n", "20: -2.7033e+00 -9.8618e+00 7e+00 2e-16 3e-16\n", "21: -3.9602e+00 -4.7535e+00 8e-01 2e-16 2e-16\n", "22: -4.0450e+00 -4.1201e+00 8e-02 2e-16 2e-16\n", "23: -4.0457e+00 -4.0466e+00 9e-04 1e-16 7e-17\n", "24: -4.0457e+00 -4.0457e+00 9e-06 3e-16 1e-16\n", "25: -4.0457e+00 -4.0457e+00 9e-08 2e-16 1e-16\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -6.0000e+04 1e+01 1e+00 1e+00\n", " 1: 8.4570e+01 -1.8587e+06 3e+02 1e+00 1e+00\n", " 2: 1.4921e+02 -3.0856e+06 6e+02 1e+00 1e+00\n", " 3: 3.2180e+01 -1.3043e+07 3e+03 1e+00 1e+00\n", " 4: 6.9242e+01 -5.7262e+07 1e+04 1e+00 1e+00\n", " 5: 8.6398e+01 -2.2766e+09 5e+05 1e+00 1e+00\n", " 6: 4.6824e+01 -2.8147e+13 2e+10 1e+00 1e+00\n", " 7: 4.1275e+08 -5.6089e+19 6e+19 1e-13 3e-02\n", " 8: 4.1275e+08 -5.6089e+17 6e+17 8e-16 1e-02\n", " 9: 4.1275e+08 -5.6089e+15 6e+15 1e-16 2e-04\n", "10: 4.1275e+08 -5.6094e+13 6e+13 2e-16 3e-06\n", "11: 4.1268e+08 -5.6545e+11 6e+11 3e-16 3e-08\n", "12: 4.0609e+08 -1.0099e+10 1e+10 2e-16 7e-10\n", "13: 1.8116e+08 -1.4110e+09 2e+09 1e-16 9e-10\n", "14: 3.1446e+07 -5.8401e+07 9e+07 1e-16 4e-12\n", "15: 4.7302e+06 -5.3611e+06 1e+07 1e-16 7e-13\n", "16: 6.7942e+05 -7.6196e+05 1e+06 8e-17 2e-13\n", "17: 9.7114e+04 -1.0851e+05 2e+05 3e-16 5e-14\n", "18: 1.3781e+04 -1.5652e+04 3e+04 1e-16 5e-14\n", "19: 1.9198e+03 -2.2954e+03 4e+03 2e-16 9e-15\n", "20: 2.5294e+02 -3.5035e+02 6e+02 
2e-16 3e-15\n", "21: 2.6572e+01 -5.9244e+01 9e+01 1e-16 2e-15\n", "22: -1.2555e+00 -1.2944e+01 1e+01 7e-17 4e-16\n", "23: -3.7496e+00 -5.4223e+00 2e+00 1e-16 2e-16\n", "24: -4.0667e+00 -4.5012e+00 4e-01 2e-16 2e-16\n", "25: -4.0819e+00 -4.1027e+00 2e-02 1e-16 2e-16\n", "26: -4.0821e+00 -4.0823e+00 2e-04 9e-17 1e-16\n", "27: -4.0821e+00 -4.0821e+00 2e-06 2e-16 7e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -7.0000e+04 1e+01 1e+00 1e+00\n", " 1: 8.7825e+01 -2.2247e+06 4e+02 1e+00 1e+00\n", " 2: 1.8441e+02 -3.8098e+06 7e+02 1e+00 1e+00\n", " 3: 1.5973e+02 -8.5687e+06 2e+03 1e+00 1e+00\n", " 4: 1.7838e+01 -2.1791e+08 4e+04 1e+00 1e+00\n", " 5: 6.0455e+01 -6.3242e+09 1e+06 1e+00 1e+00\n", " 6: 6.3873e+01 -1.5695e+13 5e+09 1e+00 1e+00\n", " 7: 1.0155e+09 -1.6922e+20 2e+20 2e-13 4e-02\n", " 8: 1.0155e+09 -1.6922e+18 2e+18 2e-15 6e-02\n", " 9: 1.0155e+09 -1.6922e+16 2e+16 2e-16 4e-04\n", "10: 1.0155e+09 -1.6924e+14 2e+14 2e-16 6e-06\n", "11: 1.0155e+09 -1.7046e+12 2e+12 2e-16 6e-08\n", "12: 1.0115e+09 -2.9294e+10 3e+10 2e-16 9e-10\n", "13: 8.6923e+08 -9.7335e+09 1e+10 3e-16 5e-09\n", "14: 2.5628e+08 -6.0824e+09 6e+09 3e-16 2e-09\n", "15: 8.3686e+07 -2.1572e+08 3e+08 2e-16 6e-11\n", "16: 1.2790e+07 -1.4451e+07 3e+07 2e-16 1e-12\n", "17: 1.8366e+06 -2.0483e+06 4e+06 2e-16 3e-13\n", "18: 2.6285e+05 -2.9209e+05 6e+05 1e-16 1e-13\n", "19: 3.7442e+04 -4.2017e+04 8e+04 3e-16 4e-14\n", "20: 5.2741e+03 -6.1079e+03 1e+04 2e-16 2e-14\n", "21: 7.1969e+02 -9.1048e+02 2e+03 1e-16 7e-15\n", "22: 8.8170e+01 -1.4478e+02 2e+02 2e-16 3e-15\n", "23: 5.6093e+00 -2.7141e+01 3e+01 2e-16 1e-15\n", "24: -3.4117e+00 -7.5215e+00 4e+00 2e-16 4e-16\n", "25: -3.9738e+00 -4.7410e+00 8e-01 1e-16 1e-16\n", "26: -4.0912e+00 -4.3263e+00 2e-01 2e-16 6e-17\n", "27: -4.0967e+00 -4.1021e+00 5e-03 2e-16 2e-16\n", "28: -4.0967e+00 -4.0968e+00 6e-05 2e-16 1e-16\n", "29: -4.0967e+00 -4.0967e+00 6e-07 1e-16 7e-17\n", "Optimal solution found.\n" ] }, { "name": "stdout", 
"output_type": "stream", "text": [ " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -8.0000e+04 2e+01 1e+00 1e+00\n", " 1: 9.2714e+01 -2.5967e+06 4e+02 1e+00 1e+00\n", " 2: 2.2268e+02 -4.6524e+06 9e+02 1e+00 1e+00\n", " 3: 2.0020e+02 -8.8808e+06 2e+03 1e+00 1e+00\n", " 4: 7.0492e+01 -5.7775e+07 1e+04 1e+00 1e+00\n", " 5: 1.3244e+02 -7.4658e+08 2e+05 1e+00 1e+00\n", " 6: 1.5210e+02 -4.0931e+11 9e+07 1e+00 1e+00\n", " 7: 7.7582e+08 -8.2049e+18 8e+18 5e-13 2e-03\n", " 8: 7.7582e+08 -8.2049e+16 8e+16 5e-15 1e-03\n", " 9: 7.7582e+08 -8.2051e+14 8e+14 2e-16 3e-05\n", "10: 7.7581e+08 -8.2207e+12 8e+12 2e-16 4e-07\n", "11: 7.7478e+08 -9.7860e+10 1e+11 2e-16 5e-09\n", "12: 7.0303e+08 -1.4965e+10 2e+10 1e-16 7e-10\n", "13: 2.0381e+08 -9.8474e+09 1e+10 2e-16 3e-10\n", "14: 8.2855e+07 -2.7731e+08 4e+08 2e-16 1e-11\n", "15: 1.3061e+07 -1.5647e+07 3e+07 3e-16 2e-12\n", "16: 1.8737e+06 -2.0712e+06 4e+06 2e-16 5e-13\n", "17: 2.6814e+05 -2.9803e+05 6e+05 2e-16 1e-13\n", "18: 3.8192e+04 -4.2849e+04 8e+04 2e-16 6e-14\n", "19: 5.3795e+03 -6.2302e+03 1e+04 1e-16 3e-14\n", "20: 7.3397e+02 -9.2882e+02 2e+03 1e-16 4e-15\n", "21: 8.9893e+01 -1.4772e+02 2e+02 2e-16 2e-15\n", "22: 5.7120e+00 -2.7700e+01 3e+01 1e-16 1e-15\n", "23: -3.4701e+00 -7.6691e+00 4e+00 2e-16 3e-16\n", "24: -4.0143e+00 -4.7901e+00 8e-01 2e-16 9e-17\n", "25: -4.1051e+00 -4.4775e+00 4e-01 2e-16 1e-16\n", "26: -4.1122e+00 -4.1317e+00 2e-02 2e-16 1e-16\n", "27: -4.1123e+00 -4.1125e+00 2e-04 1e-16 6e-17\n", "28: -4.1123e+00 -4.1123e+00 2e-06 3e-16 9e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -9.0000e+04 2e+01 1e+00 1e+00\n", " 1: 9.7205e+01 -3.0140e+06 4e+02 1e+00 1e+00\n", " 2: 2.6361e+02 -5.6333e+06 1e+03 1e+00 1e+00\n", " 3: 2.8257e+02 -9.5981e+06 2e+03 1e+00 1e+00\n", " 4: 9.0415e+01 -3.9255e+07 8e+03 1e+00 1e+00\n", " 5: 1.8815e+02 -3.1673e+08 6e+04 1e+00 1e+00\n", " 6: 1.9863e+02 -1.0198e+11 2e+07 1e+00 1e+00\n", " 7: 9.0570e+08 -2.2295e+18 2e+18 6e-13 8e-04\n", " 8: 9.0570e+08 
-2.2295e+16 2e+16 6e-15 3e-04\n", " 9: 9.0570e+08 -2.2297e+14 2e+14 1e-16 5e-06\n", "10: 9.0564e+08 -2.2477e+12 2e+12 1e-16 8e-08\n", "11: 8.9946e+08 -4.0396e+10 4e+10 3e-16 1e-09\n", "12: 6.7081e+08 -1.0301e+10 1e+10 2e-16 5e-10\n", "13: 1.9188e+08 -1.5188e+09 2e+09 2e-16 8e-12\n", "14: 3.7009e+07 -5.7807e+07 9e+07 2e-16 1e-12\n", "15: 5.3302e+06 -5.7183e+06 1e+07 2e-16 9e-13\n", "16: 7.6421e+05 -8.5124e+05 2e+06 2e-16 3e-13\n", "17: 1.0914e+05 -1.2153e+05 2e+05 2e-16 1e-13\n", "18: 1.5483e+04 -1.7565e+04 3e+04 2e-16 4e-14\n", "19: 2.1571e+03 -2.5772e+03 5e+03 2e-16 1e-14\n", "20: 2.8443e+02 -3.9329e+02 7e+02 2e-16 4e-15\n", "21: 3.0051e+01 -6.6409e+01 1e+02 1e-16 2e-15\n", "22: -1.2310e+00 -1.4418e+01 1e+01 3e-16 5e-16\n", "23: -4.0230e+00 -5.3925e+00 1e+00 1e-16 1e-16\n", "24: -4.1164e+00 -4.2053e+00 9e-02 2e-16 1e-16\n", "25: -4.1207e+00 -4.1247e+00 4e-03 2e-16 8e-17\n", "26: -4.1208e+00 -4.1208e+00 4e-05 9e-17 6e-17\n", "27: -4.1208e+00 -4.1208e+00 4e-07 2e-16 7e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -1.0000e+05 2e+01 1e+00 1e+00\n", " 1: 1.0225e+02 -3.4451e+06 5e+02 1e+00 1e+00\n", " 2: 3.0804e+02 -6.7243e+06 1e+03 1e+00 1e+00\n", " 3: 3.5286e+02 -1.0601e+07 2e+03 1e+00 1e+00\n", " 4: 3.2792e+01 -5.8260e+07 1e+04 1e+00 1e+00\n", " 5: 1.3354e+02 -3.5394e+08 7e+04 1e+00 1e+00\n", " 6: 2.5326e+02 -2.5191e+10 5e+06 1e+00 1e+00\n", " 7: 1.1744e+09 -6.0706e+17 6e+17 5e-13 2e-04\n", " 8: 1.1744e+09 -6.0706e+15 6e+15 5e-15 7e-05\n", " 9: 1.1744e+09 -6.0723e+13 6e+13 2e-16 1e-06\n", "10: 1.1740e+09 -6.2356e+11 6e+11 1e-16 2e-08\n", "11: 1.1314e+09 -2.2217e+10 2e+10 2e-16 7e-10\n", "12: 4.6121e+08 -5.8629e+09 6e+09 2e-16 4e-09\n", "13: 1.1244e+08 -2.1075e+08 3e+08 2e-16 1e-10\n", "14: 1.6958e+07 -1.8708e+07 4e+07 2e-16 2e-12\n", "15: 2.4338e+06 -2.7048e+06 5e+06 2e-16 4e-13\n", "16: 3.4827e+05 -3.8617e+05 7e+05 2e-16 2e-13\n", "17: 4.9636e+04 -5.5560e+04 1e+05 1e-16 4e-14\n", "18: 7.0046e+03 -8.0658e+03 2e+04 3e-16 
3e-14\n", "19: 9.6128e+02 -1.1974e+03 2e+03 2e-16 7e-15\n", "20: 1.2033e+02 -1.8834e+02 3e+02 2e-16 2e-15\n", "21: 9.2452e+00 -3.4335e+01 4e+01 2e-16 8e-16\n", "22: -3.2786e+00 -8.9128e+00 6e+00 2e-16 3e-16\n", "23: -4.1113e+00 -4.5171e+00 4e-01 2e-16 1e-16\n", "24: -4.1282e+00 -4.1607e+00 3e-02 2e-16 7e-17\n", "25: -4.1295e+00 -4.1306e+00 1e-03 2e-16 8e-17\n", "26: -4.1295e+00 -4.1295e+00 1e-05 2e-16 8e-17\n", "27: -4.1295e+00 -4.1295e+00 1e-07 2e-16 6e-17\n", "Optimal solution found.\n" ] } ], "source": [ "# convert pandas dataframe to numpy\n", "data = df_inc.to_numpy()\n", "\n", "# sort the rows by sequence number (1st column); idx maps sorted rows back to rows of df_inc\n", "idx = np.argsort(data[:, 0])\n", "data = data[idx, :]\n", "\n", "# replace NaNs (missing values) with 0s, working on a copy so that data is left unchanged\n", "original = data.copy()\n", "original[np.isnan(original)] = 0\n", "\n", "# delete 1st column (sequence numbers)\n", "original = original[:, 1:]\n", "\n", "# one-hot encode all features, as they are categorical\n", "# (scikit-learn >= 1.2 uses sparse_output=False; older versions use sparse=False instead)\n", "onehot_encoder = OneHotEncoder(sparse_output=False)\n", "onehot_encoded = onehot_encoder.fit_transform(original)\n", "\n", "explainer = ProtodashExplainer()\n", "\n", "# call protodash explainer to pick m=10 prototypes that summarize the same data\n", "# S contains indices (into the sorted rows) of the selected prototypes\n", "# W contains importance weights associated with the selected prototypes\n", "(W, S, _) = explainer.explain(onehot_encoded, onehot_encoded, m=10)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>Respondent sequence number</th><th>Income from wages/salaries</th><th>Income from self employment</th><th>Income from Social Security or RR</th><th>Income from other disability pension</th><th>Income from retirement/survivor pension</th><th>Income from Supplemental Security Income</th><th>Income from state/county cash assistance</th><th>Income from interest/dividends or rental</th><th>Income from other sources</th><th>Monthly family income</th><th>Family monthly poverty level index</th><th>Family monthly poverty level category</th><th>Family has savings more than $5000</th><th>Total savings/cash assets for the family</th><th>Weights of Prototypes</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>1737</th><td>75294.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>3.0</td><td>0.67</td><td>1.0</td><td>NaN</td><td>NaN</td><td>0.18</td></tr>\n", "    <tr><th>2561</th><td>76118.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>7.0</td><td>1.97</td><td>3.0</td><td>2.0</td><td>1.0</td><td>0.14</td></tr>\n", "    <tr><th>343</th><td>73900.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>12.0</td><td>5.00</td><td>3.0</td><td>NaN</td><td>NaN</td><td>0.11</td></tr>\n", "    <tr><th>6778</th><td>80335.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>6.0</td><td>2.88</td><td>3.0</td><td>2.0</td><td>1.0</td><td>0.09</td></tr>\n", "    <tr><th>8376</th><td>81933.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>0.73</td><td>1.0</td><td>2.0</td><td>1.0</td><td>0.07</td></tr>\n", "    <tr><th>718</th><td>74275.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>5.0</td><td>1.85</td><td>2.0</td><td>1.0</td><td>NaN</td><td>0.11</td></tr>\n", "    <tr><th>821</th><td>74378.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>8.0</td><td>3.48</td><td>3.0</td><td>2.0</td><td>2.0</td><td>0.08</td></tr>\n", "    <tr><th>4068</th><td>77625.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>4.0</td><td>0.99</td><td>1.0</td><td>NaN</td><td>NaN</td><td>0.09</td></tr>\n", "    <tr><th>21</th><td>73578.0</td><td>1.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>99.0</td><td>NaN</td><td>1.0</td><td>2.0</td><td>1.0</td><td>0.08</td></tr>\n", "    <tr><th>3620</th><td>77177.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>1.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>2.0</td><td>11.0</td><td>4.77</td><td>3.0</td><td>NaN</td><td>NaN</td><td>0.06</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Respondent sequence number Income from wages/salaries \\\n", "1737 75294.0 1.0 \n", "2561 76118.0 1.0 \n", "343 73900.0 2.0 \n", "6778 80335.0 1.0 \n", "8376 81933.0 2.0 \n", "718 74275.0 1.0 \n", "821 74378.0 1.0 \n", "4068 77625.0 1.0 \n", "21 73578.0 1.0 \n", "3620 77177.0 1.0 \n", "\n", " Income from self employment Income from Social Security or RR \\\n", "1737 2.0 2.0 \n", "2561 2.0 1.0 \n", "343 1.0 2.0 \n", "6778 2.0 2.0 \n", "8376 2.0 2.0 \n", "718 2.0 2.0 \n", "821 2.0 2.0 \n", "4068 2.0 1.0 \n", "21 1.0 2.0 \n", "3620 2.0 2.0 \n", "\n", " Income from other disability pension \\\n", "1737 2.0 \n", "2561 2.0 \n", "343 2.0 \n", "6778 1.0 \n", "8376 2.0 \n", "718 2.0 \n", "821 2.0 \n", "4068 2.0 \n", "21 2.0 \n", "3620 2.0 \n", "\n", " Income from retirement/survivor pension \\\n", "1737 2.0 \n", "2561 2.0 \n", "343 2.0 \n", "6778 2.0 \n", "8376 1.0 \n", "718 2.0 \n", "821 2.0 \n", "4068 2.0 \n", "21 2.0 \n", "3620 1.0 \n", "\n", " Income from Supplemental Security Income \\\n", "1737 2.0 \n", "2561 2.0 \n", "343 2.0 \n", "6778 2.0 \n", "8376 1.0 \n", "718 2.0 \n", "821 2.0 \n", "4068 2.0 \n", "21 2.0 \n", "3620 2.0 \n", "\n", " Income from state/county cash assistance \\\n", "1737 2.0 \n", "2561 2.0 \n", "343 2.0 \n", "6778 2.0 \n", "8376 1.0 \n", "718 2.0 \n", "821 2.0 \n", "4068 2.0 \n", "21 2.0 \n", "3620 2.0 \n", "\n", " Income from interest/dividends or rental Income from other sources \\\n", "1737 2.0 2.0 \n", "2561 2.0 2.0 \n", "343 1.0 2.0 \n", "6778 2.0 1.0 \n", "8376 2.0 2.0 \n", "718 1.0 2.0 \n", "821 2.0 2.0 \n", "4068 2.0 1.0 \n", "21 2.0 2.0 \n", "3620 2.0 2.0 \n", "\n", " Monthly family income Family monthly poverty level index \\\n", "1737 3.0 0.67 \n", "2561 7.0 1.97 \n", "343 12.0 5.00 \n", "6778 6.0 2.88 \n", "8376 2.0 0.73 \n", "718 5.0 1.85 \n", "821 8.0 3.48 \n", "4068 4.0 0.99 \n", "21 99.0 NaN \n", "3620 11.0 4.77 \n", "\n", " Family monthly poverty level category \\\n", "1737 1.0 \n", "2561 3.0 \n", "343 3.0 
\n", "6778 3.0 \n", "8376 1.0 \n", "718 2.0 \n", "821 3.0 \n", "4068 1.0 \n", "21 1.0 \n", "3620 3.0 \n", "\n", " Family has savings more than $5000 \\\n", "1737 NaN \n", "2561 2.0 \n", "343 NaN \n", "6778 2.0 \n", "8376 2.0 \n", "718 1.0 \n", "821 2.0 \n", "4068 NaN \n", "21 2.0 \n", "3620 NaN \n", "\n", " Total savings/cash assets for the family Weights of Prototypes \n", "1737 NaN 0.18 \n", "2561 1.0 0.14 \n", "343 NaN 0.11 \n", "6778 1.0 0.09 \n", "8376 1.0 0.07 \n", "718 NaN 0.11 \n", "821 2.0 0.08 \n", "4068 NaN 0.09 \n", "21 1.0 0.08 \n", "3620 NaN 0.06 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the prototypes along with their computed weights\n", "# S indexes the sorted rows, so map it back to rows of df_inc through idx\n", "inc_prototypes = df_inc.iloc[idx[S], :].copy()\n", "# Compute normalized importance weights for prototypes\n", "inc_prototypes[\"Weights of Prototypes\"] = np.around(W/np.sum(W), 2)\n", "inc_prototypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explanation:\n", "The 10 respondents shown above (i.e. the 10 prototypes selected with m=10) are representative of the income questionnaire according to Protodash. First, the distribution plot for the family finance questions showed roughly 5 times as many people without savings in excess of $5000 as with such savings. Among the prototypes that answered this question, the split is similar (5 No versus 1 Yes), which is reassuring. For monthly family income, the prototypes are spread fairly evenly over the more commonly occurring categories. This serves as a spot check that the prototypes actually match the distribution of values in the dataset.\n", "\n", "Looking at the other questions and the answers given by the prototypical respondents, the social scientist sees that most of them earn wages/salaries rather than self-employment income (1st and 2nd questions) and receive no Social Security income (3rd question), which is consistent with being employed. Most of them also receive no retirement/survivor pension (5th question), suggesting they are relatively young, and no disability pension (4th question), suggesting they are fit to work. 
However, they do not seem to have much savings (last two questions). The social scientist could also convey these insights, gained from studying the prototypes, to the government authorities responsible for future public policy decisions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Summarize Gaussian (simulated) data using prototypes" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(300, 100) (4000, 100)\n" ] } ], "source": [ "# Generate normalized Gaussian data X and Y with 100 features and 300 and 4000 observations, respectively\n", "(X, Y) = get_Gaussian_Data(100, 300, 4000)\n", "print(X.shape, Y.shape)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -2.0000e+04 4e+00 1e+00 1e+00\n", " 1: 3.2221e+00 -8.5266e+04 2e+01 1e+00 1e+00\n", " 2: 2.7001e-01 -1.6914e+07 4e+03 1e+00 1e+00\n", " 3: 4.3897e+07 -3.9165e+14 4e+14 3e-13 6e-06\n", " 4: 4.3897e+07 -3.9165e+12 4e+12 3e-15 5e-06\n", " 5: 4.3894e+07 -3.9210e+10 4e+10 1e-16 9e-08\n", " 6: 4.3518e+07 -4.3607e+08 5e+08 2e-16 1e-09\n", " 7: 2.4531e+07 -2.8978e+07 5e+07 1e-16 1e-10\n", " 8: 3.6567e+06 -5.7170e+06 9e+06 2e-16 8e-13\n", " 9: 5.4550e+05 -5.8860e+05 1e+06 3e-16 9e-13\n", "10: 7.8453e+04 -8.8545e+04 2e+05 1e-16 5e-14\n", "11: 1.1222e+04 -1.2517e+04 2e+04 3e-16 4e-14\n", "12: 1.5941e+03 -1.8062e+03 3e+03 5e-16 5e-14\n", "13: 2.2267e+02 -2.6432e+02 5e+02 3e-16 3e-15\n", "14: 2.9593e+01 -4.0124e+01 7e+01 9e-17 6e-15\n", "15: 3.2462e+00 -6.6869e+00 1e+01 2e-16 3e-15\n", "16: -3.8432e-02 -1.4066e+00 1e+00 3e-16 9e-16\n", "17: -3.4476e-01 -4.9425e-01 1e-01 2e-16 3e-16\n", "18: -3.5526e-01 -3.5919e-01 4e-03 3e-16 9e-17\n", "19: -3.5527e-01 -3.5531e-01 4e-05 1e-16 2e-17\n", "20: -3.5527e-01 -3.5527e-01 4e-07 2e-16 6e-17\n", "21: -3.5527e-01 -3.5527e-01 4e-09 2e-16 
8e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -3.0000e+04 6e+00 1e+00 1e+00\n", " 1: 5.8894e+00 -1.4435e+05 3e+01 1e+00 1e+00\n", " 2: -2.5861e-01 -7.3766e+06 1e+03 1e+00 1e+00\n", " 3: 9.5059e-01 -4.6431e+08 1e+05 1e+00 1e+00\n", " 4: 4.6414e+07 -1.1086e+16 1e+16 2e-13 8e-05\n", " 5: 4.6414e+07 -1.1086e+14 1e+14 2e-15 4e-05\n", " 6: 4.6414e+07 -1.1088e+12 1e+12 1e-16 3e-07\n", " 7: 4.6409e+07 -1.1309e+10 1e+10 2e-16 4e-09\n", " 8: 4.5972e+07 -3.3206e+08 4e+08 2e-16 2e-10\n", " 9: 8.2487e+06 -1.1901e+08 1e+08 2e-16 2e-10\n", "10: 2.7745e+06 -5.3081e+06 8e+06 4e-16 8e-12\n", "11: 4.1102e+05 -4.4387e+05 9e+05 2e-16 7e-13\n", "12: 5.9104e+04 -6.6801e+04 1e+05 2e-16 4e-13\n", "13: 8.4480e+03 -9.4441e+03 2e+04 1e-16 9e-14\n", "14: 1.1982e+03 -1.3638e+03 3e+03 8e-17 3e-14\n", "15: 1.6663e+02 -2.0025e+02 4e+02 2e-16 3e-15\n", "16: 2.1831e+01 -3.0674e+01 5e+01 1e-16 5e-15\n", "17: 2.2316e+00 -5.2332e+00 7e+00 2e-16 2e-15\n", "18: -1.5274e-01 -1.1659e+00 1e+00 1e-16 4e-16\n", "19: -3.5994e-01 -4.5999e-01 1e-01 2e-16 4e-16\n", "20: -3.6968e-01 -3.7995e-01 1e-02 1e-16 4e-17\n", "21: -3.7030e-01 -3.7057e-01 3e-04 2e-16 7e-17\n", "22: -3.7030e-01 -3.7030e-01 3e-06 1e-16 6e-17\n", "23: -3.7030e-01 -3.7030e-01 3e-08 1e-16 9e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -4.0000e+04 8e+00 1e+00 1e+00\n", " 1: 9.3221e+00 -2.1867e+05 4e+01 1e+00 1e+00\n", " 2: -2.7071e-01 -6.2612e+06 1e+03 1e+00 1e+00\n", " 3: 2.3451e+00 -2.0323e+08 4e+04 1e+00 1e+00\n", " 4: 1.7674e+00 -7.3334e+11 3e+08 1e+00 1e+00\n", " 5: 4.6015e+07 -5.9700e+18 6e+18 4e-13 3e-02\n", " 6: 4.6015e+07 -5.9700e+16 6e+16 3e-15 1e-02\n", " 7: 4.6015e+07 -5.9700e+14 6e+14 1e-16 4e-04\n", " 8: 4.6015e+07 -5.9703e+12 6e+12 2e-16 3e-06\n", " 9: 4.6014e+07 -6.0011e+10 6e+10 1e-16 5e-08\n", "10: 4.5901e+07 -9.0781e+08 1e+09 2e-16 8e-10\n", "11: 3.9137e+07 -2.6189e+08 3e+08 2e-16 4e-09\n", "12: 3.7830e+06 -1.6312e+08 2e+08 2e-16 6e-10\n", "13: 
1.1693e+06 -4.1683e+06 5e+06 4e-16 2e-11\n", "14: 1.7665e+05 -2.1074e+05 4e+05 3e-16 3e-13\n", "15: 2.5265e+04 -2.7971e+04 5e+04 2e-16 1e-13\n", "16: 3.6001e+03 -4.0440e+03 8e+03 3e-16 6e-14\n", "17: 5.0722e+02 -5.8725e+02 1e+03 1e-16 3e-14\n", "18: 6.9248e+01 -8.7506e+01 2e+02 2e-16 9e-15\n", "19: 8.5002e+00 -1.3901e+01 2e+01 2e-16 1e-15\n", "20: 5.5238e-01 -2.5990e+00 3e+00 2e-16 1e-15\n", "21: -3.1688e-01 -7.1435e-01 4e-01 1e-16 3e-16\n", "22: -3.6885e-01 -4.3770e-01 7e-02 2e-16 1e-16\n", "23: -3.7656e-01 -3.9196e-01 2e-02 2e-16 7e-17\n", "24: -3.7664e-01 -3.7682e-01 2e-04 3e-16 8e-17\n", "25: -3.7664e-01 -3.7664e-01 2e-06 2e-16 3e-17\n", "26: -3.7664e-01 -3.7664e-01 2e-08 2e-16 5e-17\n", "Optimal solution found.\n", " pcost dcost gap pres dres\n", " 0: 0.0000e+00 -5.0000e+04 1e+01 1e+00 1e+00\n", " 1: 1.3506e+01 -3.0820e+05 6e+01 1e+00 1e+00\n", " 2: -2.5046e-01 -6.0247e+06 1e+03 1e+00 1e+00\n", " 3: 3.4951e+00 -1.2157e+08 2e+04 1e+00 1e+00\n", " 4: 5.2379e+00 -1.2733e+11 3e+07 1e+00 1e+00\n", " 5: 4.8704e+07 -2.1353e+18 2e+18 3e-13 1e-02\n", " 6: 4.8704e+07 -2.1353e+16 2e+16 3e-15 8e-03\n", " 7: 4.8704e+07 -2.1353e+14 2e+14 1e-16 1e-04\n", " 8: 4.8704e+07 -2.1357e+12 2e+12 3e-16 1e-06\n", " 9: 4.8699e+07 -2.1766e+10 2e+10 1e-16 9e-09\n", "10: 4.8239e+07 -6.2353e+08 7e+08 1e-16 3e-10\n", "11: 1.7641e+07 -1.5589e+08 2e+08 2e-16 5e-09\n", "12: 4.6226e+06 -1.0162e+07 1e+07 2e-16 3e-10\n", "13: 6.8723e+05 -7.5421e+05 1e+06 2e-16 7e-13\n", "14: 9.8870e+04 -1.1150e+05 2e+05 2e-16 2e-13\n", "15: 1.4144e+04 -1.5779e+04 3e+04 1e-16 1e-13\n", "16: 2.0108e+03 -2.2729e+03 4e+03 2e-16 5e-14\n", "17: 2.8153e+02 -3.3191e+02 6e+02 9e-17 2e-14\n", "18: 3.7703e+01 -5.0125e+01 9e+01 3e-16 5e-15\n", "19: 4.2787e+00 -8.2456e+00 1e+01 2e-16 1e-15\n", "20: 5.4510e-02 -1.6812e+00 2e+00 2e-16 8e-16\n", "21: -3.5895e-01 -5.5641e-01 2e-01 1e-16 2e-16\n", "22: -3.7871e-01 -3.9368e-01 1e-02 2e-16 1e-16\n", "23: -3.8024e-01 -3.8132e-01 1e-03 1e-16 1e-16\n", "24: -3.8026e-01 -3.8027e-01 
1e-05 1e-16 1e-16\n", "25: -3.8026e-01 -3.8026e-01 1e-07 2e-16 1e-16\n", "Optimal solution found.\n" ] } ], "source": [ "(W, S, setValues) = explainer.explain(X, Y, m=5, kernelType='Gaussian', sigma=2)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3940 2539 2168 2189 1170] [0.20611975 0.24524152 0.19131865 0.17175151 0.16123794]\n" ] } ], "source": [ "print(S, W)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7.54620330e-02, 2.21559502e-03, -2.60390535e-04,\n", " 1.32841035e-03, -6.39270748e-02, 1.60731744e-01,\n", " 1.00469456e-01, 2.30534177e-02, -2.62244674e-02,\n", " -7.41659651e-02, 5.02276642e-03, -1.99183986e-02,\n", " -3.11979874e-02, 9.58197735e-04, -3.98023293e-03,\n", " -3.29754316e-01, -8.57806586e-02, -5.55736043e-02,\n", " -1.14853770e-01, 1.54146242e-01, -1.06148650e-02,\n", " 4.69670224e-02, -1.11019441e-01, 2.79269162e-02,\n", " -1.53149165e-02, -1.21122605e-01, -1.03368232e-01,\n", " 9.06247095e-04, 1.39727121e-01, 4.64877100e-02,\n", " -9.19127100e-02, -1.99338641e-01, 5.17279612e-02,\n", " -6.04863165e-02, 3.31939372e-02, -5.58036293e-02,\n", " 5.00831453e-03, 2.91216408e-02, -1.19168965e-01,\n", " 5.18582471e-02, -7.42898829e-02, -2.37439618e-02,\n", " -7.45301815e-02, -4.90749528e-02, -2.48565755e-02,\n", " 1.38223978e-01, -1.51096859e-01, 1.26429938e-01,\n", " 1.40878748e-01, -1.20653470e-03, -1.99680954e-01,\n", " -4.61772255e-02, -5.58457023e-03, 6.04222729e-02,\n", " -1.46530239e-01, 2.57859750e-02, 1.80667261e-01,\n", " 9.87520991e-02, 6.30395511e-02, -1.78646359e-01,\n", " -7.16598231e-02, -6.35797314e-02, 1.98389065e-01,\n", " -9.38033072e-02, -3.26780737e-02, -2.06693817e-01,\n", " 1.07858528e-01, 6.37029365e-03, -4.87539909e-03,\n", " -6.44757776e-02, 1.44313795e-02, 4.09943309e-02,\n", " 1.76572648e-01, 9.05027830e-03, -8.08082206e-02,\n", " -3.62697988e-02, 
3.70204530e-02, 1.83713721e-01,\n", " -1.61783956e-02, -1.57289741e-01, 4.23404205e-02,\n", " -7.74897612e-02, -1.34721576e-01, 1.24695632e-01,\n", " 7.57095929e-02, -5.91620971e-02, -6.43205657e-02,\n", " 7.04128833e-02, 3.91826034e-02, 1.30478643e-01,\n", " -8.74352429e-02, 1.27753818e-01, 1.46221904e-01,\n", " -7.67186220e-02, 1.95530661e-01, -1.22633318e-01,\n", " 9.72488385e-02, -3.13912478e-02, -2.49134642e-02,\n", " -1.52523104e-02],\n", " [-5.12032011e-02, 8.59070568e-02, 6.02458030e-02,\n", " 4.39533646e-02, -4.07826532e-02, -1.30577943e-01,\n", " -1.65584070e-01, -9.82477819e-02, 1.58885961e-01,\n", " 6.76718800e-02, -1.11328437e-02, 2.66462814e-02,\n", " 3.13068659e-02, -9.19254513e-04, -3.22633719e-02,\n", " 1.82389121e-01, -1.81855515e-01, 2.39420524e-02,\n", " 2.96660210e-02, -1.09595087e-01, -1.87684690e-01,\n", " 3.86686749e-02, -1.59571156e-01, -8.74150224e-02,\n", " -3.80742210e-02, 5.91535463e-02, 1.31050834e-02,\n", " -9.52395634e-02, -2.44643504e-01, 2.80989215e-02,\n", " -3.60962632e-02, 1.13454889e-01, -4.52915355e-02,\n", " 1.13456828e-01, -3.27414266e-03, -9.37852482e-02,\n", " 2.47258487e-02, 2.77962049e-02, 1.56346339e-01,\n", " -7.62389840e-02, 2.27576201e-01, 2.10499828e-01,\n", " -4.49497435e-02, 2.97213572e-02, 1.00515129e-01,\n", " -2.16547503e-01, -2.04878926e-03, -7.32397613e-02,\n", " 5.29739867e-02, 2.15015182e-02, 1.28655169e-01,\n", " -1.39754017e-01, -6.63246871e-03, -1.92752244e-01,\n", " 2.22893961e-02, -4.68796913e-02, -1.32616591e-01,\n", " 1.63384768e-01, -1.18715808e-01, 4.33659288e-03,\n", " -2.25741529e-02, 1.20411948e-01, -1.35740211e-01,\n", " -3.07858713e-02, 1.07129775e-01, 1.29308501e-01,\n", " -8.71632481e-02, -3.56135454e-02, -1.54720613e-01,\n", " 5.83464511e-02, -7.13917621e-02, -1.23603444e-01,\n", " 7.17494224e-02, 1.28982996e-01, -4.96462940e-02,\n", " 1.30917066e-01, 3.88935305e-02, 2.14543481e-02,\n", " 2.13189516e-01, -2.53210773e-02, -2.70219323e-02,\n", " 2.42025355e-02, -3.88447363e-02, 
8.05132480e-03,\n", " 8.25499366e-02, 6.91965319e-03, -6.79202904e-03,\n", " -1.15702999e-01, 9.91226792e-02, -1.64877368e-01,\n", " -9.18924817e-02, -7.66279063e-02, 2.27675735e-02,\n", " -9.66024848e-03, 1.15008589e-02, 3.51111969e-02,\n", " 1.08829441e-01, 5.44845952e-02, 2.18649162e-02,\n", " 5.06752151e-02],\n", " [ 2.86155297e-02, -1.05488006e-01, -7.87286100e-02,\n", " -1.23060317e-01, 1.41242195e-01, 1.61289406e-01,\n", " -6.65333113e-02, 4.24718744e-03, 2.30728355e-02,\n", " 2.52204781e-02, -1.67344024e-02, -9.64094848e-03,\n", " -2.05024220e-01, -8.56742828e-03, -1.63558273e-01,\n", " -1.85095596e-02, 2.32519692e-01, -1.06712872e-01,\n", " 1.08756470e-01, 9.08776614e-02, 2.25193240e-02,\n", " 1.10843888e-02, 2.22624858e-02, 7.94616916e-02,\n", " -7.05475069e-02, -5.04894538e-02, 6.73770298e-02,\n", " 1.70403916e-02, 1.85043584e-01, -6.86832999e-02,\n", " 1.82798381e-01, 2.76679186e-02, -9.80483718e-02,\n", " -1.68076398e-02, -6.16767243e-03, 5.82337855e-02,\n", " -4.46581284e-02, 1.65138897e-01, -7.97622466e-04,\n", " -7.90062537e-02, 1.35627635e-01, -6.71052957e-02,\n", " 1.32776898e-03, 2.26621107e-02, -1.33920408e-01,\n", " 2.12284548e-01, 7.76337902e-03, -9.06858479e-03,\n", " -6.63401285e-02, 2.20085701e-02, -1.12339780e-01,\n", " -6.83010639e-02, -1.50540980e-01, -1.11909889e-02,\n", " 1.57512596e-02, -2.02935431e-01, 4.41772377e-02,\n", " -1.89468455e-01, 4.27344770e-02, 2.31878301e-01,\n", " 1.50165308e-02, -1.74239224e-02, 7.23866853e-02,\n", " -4.49320464e-03, -3.20748774e-02, -7.06749354e-02,\n", " 2.47291946e-02, 1.16010448e-01, 1.12739693e-01,\n", " 1.32046266e-01, 1.71358218e-02, -7.09924656e-03,\n", " -5.28219086e-03, -1.04730872e-01, 9.09559127e-02,\n", " 1.94412356e-02, -1.47668693e-01, -2.13166568e-01,\n", " -1.45972201e-01, 1.55662643e-01, 1.47125643e-02,\n", " -1.62560344e-01, -8.05978172e-02, 4.79596312e-02,\n", " 1.68172414e-02, -1.09184256e-02, 4.69774175e-02,\n", " -2.77278658e-01, 3.34985588e-02, -8.62113655e-02,\n", " 
-2.76764428e-02, -5.94575156e-02, -2.66501089e-02,\n", " -2.11362090e-02, -3.73709651e-02, 4.49581338e-02,\n", " -9.23345967e-02, -9.81218174e-02, 1.06150953e-01,\n", " -3.57024980e-02],\n", " [-3.80170810e-02, 1.81228960e-02, 3.76613228e-02,\n", " 9.99429285e-03, -8.72905662e-02, -8.26127992e-02,\n", " -1.12254856e-01, 3.77826881e-02, -1.41961665e-01,\n", " -1.05190036e-01, -2.99855443e-02, -1.01086834e-01,\n", " 4.46633225e-02, -1.13663339e-01, -1.05941933e-01,\n", " 6.01894161e-02, 1.12233329e-02, 1.85114909e-01,\n", " 1.18560975e-01, -2.25948355e-01, 6.30857038e-04,\n", " -2.63961780e-02, 1.80748449e-01, 2.53197031e-04,\n", " 1.20292849e-01, -2.03532504e-01, 2.90324898e-03,\n", " 5.80505596e-02, 5.41591917e-02, -2.14111793e-01,\n", " -1.11118157e-02, -9.09999291e-02, -1.16088150e-01,\n", " -6.23335307e-02, 1.51891995e-01, 1.33701854e-01,\n", " 3.32558231e-02, -1.81205246e-01, -1.12656057e-01,\n", " -2.99952638e-02, -5.12185606e-02, -1.59430716e-01,\n", " 5.13962604e-02, 1.24832598e-01, -1.62586819e-01,\n", " 9.61286974e-02, 4.61616910e-02, -1.12523339e-01,\n", " -3.43494196e-03, -7.98131294e-02, -6.45696025e-02,\n", " 1.27240837e-01, 2.95214970e-02, 9.35722145e-03,\n", " -3.83294142e-02, 1.21269680e-01, -2.46976687e-02,\n", " -3.06565462e-02, -3.25698826e-02, -5.70489883e-02,\n", " 1.63827523e-01, -1.75598017e-02, -5.38532941e-02,\n", " -6.28838588e-02, -4.06005514e-02, -5.06006934e-04,\n", " 1.95758391e-01, 8.70703211e-02, -5.88573081e-02,\n", " 9.78798207e-02, 2.54893854e-02, -5.41371409e-02,\n", " -2.49284494e-01, -6.45988807e-02, 1.39372907e-01,\n", " -6.14654719e-03, 1.01849484e-01, -2.38256236e-03,\n", " 7.39643107e-02, 6.90527576e-02, -7.64595250e-02,\n", " 2.23452953e-02, 1.95699468e-01, 8.05587920e-02,\n", " 5.73316558e-03, -9.62002819e-02, 9.42235873e-02,\n", " 1.59832522e-01, 2.80312092e-02, 6.90448387e-02,\n", " 1.25089467e-01, 4.60651678e-02, 2.21493616e-02,\n", " -2.09332136e-01, -2.89396925e-02, 2.69579160e-02,\n", " -2.07402111e-02, 
-1.50227546e-01, -2.51955853e-02,\n", " 4.73837516e-02],\n", " [-8.44611617e-02, -2.41034389e-01, 3.92667741e-02,\n", " 8.38829240e-03, -2.59541454e-02, 2.36086807e-01,\n", " 2.95424807e-01, -8.59653420e-03, -1.04826651e-01,\n", " 2.05520928e-02, -1.53437224e-02, -5.55097162e-02,\n", " 2.40340904e-01, -8.00020730e-03, 7.00061651e-02,\n", " 4.83951396e-02, 4.77067468e-02, -8.49009332e-02,\n", " -1.19649325e-01, -1.01977658e-01, 1.24834510e-01,\n", " 6.51047201e-03, -1.05574031e-01, -1.52698610e-01,\n", " 4.83373112e-02, 1.29525367e-01, -9.18014945e-02,\n", " -1.80375977e-01, 3.43130869e-02, 4.37972090e-02,\n", " -1.06864933e-01, -4.86896181e-02, -1.77240218e-01,\n", " -1.42453651e-01, 5.24072304e-02, -6.31389082e-03,\n", " -9.78705031e-02, -5.69558513e-02, -1.28056559e-01,\n", " -2.56793123e-02, 2.17994924e-03, 8.55609072e-02,\n", " -3.22962889e-02, 4.78407496e-02, 1.76081104e-01,\n", " 8.63313197e-02, -3.06273765e-02, -3.05080417e-03,\n", " -9.11791944e-02, -5.95668276e-02, 1.68375029e-01,\n", " 1.73418800e-01, 2.58610479e-02, 1.98958394e-01,\n", " 1.43898168e-01, 2.70602516e-02, 8.44988112e-02,\n", " 4.42356916e-02, 7.60191656e-02, -6.02561971e-02,\n", " -5.10416603e-02, -4.64762878e-02, -9.35054259e-02,\n", " 5.31547946e-02, -4.45871774e-02, 1.52269925e-01,\n", " 7.91817576e-02, -2.76924895e-02, -1.65051197e-02,\n", " -1.66953517e-02, 5.81978655e-02, -5.67794191e-02,\n", " 3.51915998e-02, 9.15276681e-02, 4.43072760e-02,\n", " 1.57386410e-02, -1.01662148e-01, 3.58153399e-02,\n", " -1.77535049e-01, 6.98827654e-02, 1.22914586e-01,\n", " -2.38931458e-02, -1.59025828e-01, -1.52231010e-01,\n", " -4.36140365e-02, -5.44352366e-04, 2.32160700e-02,\n", " 9.10446427e-02, 8.00022857e-03, -5.94089208e-02,\n", " 7.29694580e-03, 1.22248239e-01, 1.33425872e-01,\n", " 8.88037904e-02, -1.28971128e-01, -6.42771550e-02,\n", " -6.64008830e-03, 1.41297694e-03, -9.11373154e-04,\n", " -8.89220906e-02]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "Y[S, :]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }