{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CIC Subpopulation Construction\n",
    "\n",
    "To create a representative model of agent interactions, we use subpopulation modeling. We cluster all agents using the following features, taken from the full-population transaction data recorded on the xDai blockchain from January through May 11, 2020 (the prefix `s_` denotes the source and `t_` the target):\n",
    "* s_location - source individual location\n",
    "* s_business_type - source individual business type\n",
    "* t_location - target individual location\n",
    "* t_business_type - target individual business type\n",
    "* weight - amount of tokens exchanged\n",
    "* s_bal - source individual CIC wallet balance\n",
    "* t_bal - target individual CIC wallet balance\n",
    "\n",
    "Essentially, we perform a graph zoom operation, bundling nodes together based on their similarity. Nodes are constant, with edges being transitive. The algorithm we use for this graph zoom operation is K-means clustering. Based on our descriptive statistical analysis and the Gap Statistic introduced by Stanford researchers Tibshirani, Walther and Hastie in their 2001 [paper](https://web.stanford.edu/~hastie/Papers/gap.pdf), we determined that 50 clusters are representative of the subpopulations. All flows inside a bundle become part of that bundle's self-loop flow. For example, within cluster 1, agent a can transact with agent b. This is not reflected in our model because it is an intra-cluster rather than an inter-cluster interaction.\n",
    " \n",
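    "As a sketch of this zoom operation, in notation introduced here only for illustration: assume each agent $i$ is assigned to exactly one cluster $c(i) \\in \\{1, \\dots, K\\}$ (with $K = 50$ above), and let $w_{ij}$ be the total token flow from agent $i$ to agent $j$. The aggregated flow from cluster $a$ to cluster $b$ would then be\n",
    "\n",
    "$$W_{ab} = \\sum_{i,j \\,:\\, c(i)=a,\\; c(j)=b} w_{ij},$$\n",
    "\n",
    "so that the diagonal terms $W_{aa}$ collect the intra-cluster (self-loop) flows and the off-diagonal terms $W_{ab}$ with $a \\neq b$ are the inter-cluster flows.\n",
    "\n",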
    "## Graph Model of Current Spend Activity\n",
    "\n",
    "We created a network graph of the CIC transaction data as a weighted directed graph $G(N,E)$, with source and target agents as the nodes $N$ and the transactions between them as the edges $E$. The token amount is used as the edge weight to denote the actual CIC flow between agents $i$ and $j$, for each edge $(i,j) \\in E$.\n",
    "\n",
    "The observed data shows the actual payments between network actors transacting in CIC. It does not show shilling payments between actors, actors' utility, or demand; we only know the actual CIC spends between agents.\n",
    "\n",
    "\n",
    "## Saving Clustering Results\n",
    "At the bottom of this notebook, we calculate the median, 1st quartile, 3rd quartile, mean, standard deviation, utility types ordering, and utility types probability. These values can then be copied into the ```subpopulation_clusters.py``` in the simulation folders for use in the simulations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import libraries\n",
    "import networkx as nx\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "from gap_statistic import OptimalK\n",
    "from sklearn.decomposition import PCA\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Dump as of 5-15-2020\n",
    "January - May 11, 2020 xDai blockchain data:\n",
    "https://www.grassrootseconomics.org/research"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the data\n",
    "transactions = pd.read_csv('data/sarafu_xDAI_tx_all_pub_all_time_12May2020.csv')\n",
    "users = pd.read_csv('data/sarafu_xDAI_users_all_pub_all_time_12May2020.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>timeset</th>\n",
       "      <th>transfer_subtype</th>\n",
       "      <th>source</th>\n",
       "      <th>s_gender</th>\n",
       "      <th>s_location</th>\n",
       "      <th>s_business_type</th>\n",
       "      <th>target</th>\n",
       "      <th>t_gender</th>\n",
       "      <th>t_location</th>\n",
       "      <th>t_business_type</th>\n",
       "      <th>tx_token</th>\n",
       "      <th>weight</th>\n",
       "      <th>type</th>\n",
       "      <th>token_name</th>\n",
       "      <th>token_address</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2020-01-25 19:13:17.731529</td>\n",
       "      <td>DISBURSEMENT</td>\n",
       "      <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>System</td>\n",
       "      <td>0x245fc81fe385450Dc0f4787668e47c903C00b0A1</td>\n",
       "      <td>female</td>\n",
       "      <td>GE Office</td>\n",
       "      <td>Savings Group</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18000.000000</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2020-01-25 19:13:19.056070</td>\n",
       "      <td>DISBURSEMENT</td>\n",
       "      <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>System</td>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9047.660892</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>2020-01-25 19:13:20.288346</td>\n",
       "      <td>DISBURSEMENT</td>\n",
       "      <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>System</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>25378.726002</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>2020-01-25 19:13:21.478850</td>\n",
       "      <td>DISBURSEMENT</td>\n",
       "      <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>System</td>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>male</td>\n",
       "      <td>G.E</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4495.932576</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>2020-01-26 07:48:43.042684</td>\n",
       "      <td>DISBURSEMENT</td>\n",
       "      <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>System</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>male</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>400.000000</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                     timeset transfer_subtype  \\\n",
       "0   1  2020-01-25 19:13:17.731529     DISBURSEMENT   \n",
       "1   2  2020-01-25 19:13:19.056070     DISBURSEMENT   \n",
       "2   3  2020-01-25 19:13:20.288346     DISBURSEMENT   \n",
       "3   4  2020-01-25 19:13:21.478850     DISBURSEMENT   \n",
       "4   5  2020-01-26 07:48:43.042684     DISBURSEMENT   \n",
       "\n",
       "                                       source s_gender s_location  \\\n",
       "0  0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F      NaN       None   \n",
       "1  0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F      NaN       None   \n",
       "2  0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F      NaN       None   \n",
       "3  0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F      NaN       None   \n",
       "4  0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F      NaN       None   \n",
       "\n",
       "  s_business_type                                      target t_gender  \\\n",
       "0          System  0x245fc81fe385450Dc0f4787668e47c903C00b0A1   female   \n",
       "1          System  0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2     male   \n",
       "2          System  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31     male   \n",
       "3          System  0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435     male   \n",
       "4          System  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72     male   \n",
       "\n",
       "   t_location t_business_type  tx_token        weight      type token_name  \\\n",
       "0   GE Office   Savings Group       NaN  18000.000000  directed     Sarafu   \n",
       "1  GE Nairobi  Farming/Labour       NaN   9047.660892  directed     Sarafu   \n",
       "2  GE Nairobi  Farming/Labour       NaN  25378.726002  directed     Sarafu   \n",
       "3         G.E  Farming/Labour       NaN   4495.932576  directed     Sarafu   \n",
       "4        Home  Farming/Labour       NaN    400.000000  directed     Sarafu   \n",
       "\n",
       "                                token_address  \n",
       "0  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "1  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "2  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "3  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "4  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transactions.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'STANDARD': 0.5085207861177824,\n",
       " 'DISBURSEMENT': 0.35574873997902784,\n",
       " 'RECLAMATION': 0.13483070053783444,\n",
       " 'AGENT_OUT': 0.0008997733653553429}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transactions.transfer_subtype.value_counts(normalize=True).to_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on the data dictionary provided by Grassroots Economics, we know that the transfer subtype codes are:\n",
    "\n",
    "* DISBURSEMENT = from Grassroots Economics (GE)\n",
    "* RECLAMATION = back to GE\n",
    "* STANDARD = a trade between users\n",
    "* AGENT_OUT = when a group account is cashing out\n",
    "\n",
    "\n",
    "For the purposes of our analysis, we subset to STANDARD transactions in the Sarafu token."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "transactions_subset = transactions[transactions['transfer_subtype'] == 'STANDARD']\n",
    "transactions_subset = transactions_subset[transactions_subset['token_name']=='Sarafu']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>start</th>\n",
       "      <th>label</th>\n",
       "      <th>gender</th>\n",
       "      <th>location</th>\n",
       "      <th>held_roles</th>\n",
       "      <th>business_type</th>\n",
       "      <th>bal</th>\n",
       "      <th>xDAI_blockchain_address</th>\n",
       "      <th>confidence</th>\n",
       "      <th>...</th>\n",
       "      <th>otxns_in</th>\n",
       "      <th>otxns_out</th>\n",
       "      <th>ounique_in</th>\n",
       "      <th>ounique_out</th>\n",
       "      <th>svol_in</th>\n",
       "      <th>svol_out</th>\n",
       "      <th>stxns_in</th>\n",
       "      <th>stxns_out</th>\n",
       "      <th>sunique_in</th>\n",
       "      <th>sunique_out</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2020-01-25 19:10:50.218686</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>ADMIN</td>\n",
       "      <td>System</td>\n",
       "      <td>8.916761e+06</td>\n",
       "      <td>0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>19917</td>\n",
       "      <td>52610</td>\n",
       "      <td>9</td>\n",
       "      <td>19862</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2018-10-23 09:09:58</td>\n",
       "      <td>2</td>\n",
       "      <td>female</td>\n",
       "      <td>GE Office</td>\n",
       "      <td>TOKEN_AGENT</td>\n",
       "      <td>Savings Group</td>\n",
       "      <td>1.800000e+05</td>\n",
       "      <td>0x245fc81fe385450Dc0f4787668e47c903C00b0A1</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>134</td>\n",
       "      <td>16</td>\n",
       "      <td>68</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>2018-10-21 14:20:57</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>BENEFICIARY</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>5.666089e+01</td>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>9007.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>2018-10-21 15:38:30</td>\n",
       "      <td>4</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>BENEFICIARY</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>1.173773e+04</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>0.100000</td>\n",
       "      <td>...</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>20619.0</td>\n",
       "      <td>50449.0</td>\n",
       "      <td>20</td>\n",
       "      <td>15</td>\n",
       "      <td>11</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>2018-10-23 14:10:27</td>\n",
       "      <td>5</td>\n",
       "      <td>male</td>\n",
       "      <td>G.E</td>\n",
       "      <td>BENEFICIARY</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>7.297263e+03</td>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>0.405063</td>\n",
       "      <td>...</td>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>127393.3</td>\n",
       "      <td>168905.0</td>\n",
       "      <td>158</td>\n",
       "      <td>208</td>\n",
       "      <td>84</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 22 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   id                       start  label  gender    location   held_roles  \\\n",
       "0   1  2020-01-25 19:10:50.218686      1     NaN        None        ADMIN   \n",
       "1   2         2018-10-23 09:09:58      2  female   GE Office  TOKEN_AGENT   \n",
       "2   3         2018-10-21 14:20:57      3    male  GE Nairobi  BENEFICIARY   \n",
       "3   4         2018-10-21 15:38:30      4    male  GE Nairobi  BENEFICIARY   \n",
       "4   5         2018-10-23 14:10:27      5    male         G.E  BENEFICIARY   \n",
       "\n",
       "    business_type           bal                     xDAI_blockchain_address  \\\n",
       "0          System  8.916761e+06  0xBDB3Bc887C3b70586BC25D04d89eC802b897fC5F   \n",
       "1   Savings Group  1.800000e+05  0x245fc81fe385450Dc0f4787668e47c903C00b0A1   \n",
       "2  Farming/Labour  5.666089e+01  0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2   \n",
       "3  Farming/Labour  1.173773e+04  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31   \n",
       "4  Farming/Labour  7.297263e+03  0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435   \n",
       "\n",
       "   confidence  ...  otxns_in  otxns_out  ounique_in  ounique_out   svol_in  \\\n",
       "0    0.000000  ...     19917      52610           9        19862       0.0   \n",
       "1    0.000000  ...       134         16          68            0       0.0   \n",
       "2    0.000000  ...         2          2           1            0       0.0   \n",
       "3    0.100000  ...         6          1           1            0   20619.0   \n",
       "4    0.405063  ...        15          1           1            0  127393.3   \n",
       "\n",
       "   svol_out  stxns_in  stxns_out  sunique_in  sunique_out  \n",
       "0       0.0         0          0           0            0  \n",
       "1       0.0         0          0           0            0  \n",
       "2    9007.0         0          1           0            1  \n",
       "3   50449.0        20         15          11            5  \n",
       "4  168905.0       158        208          84           65  \n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Farming/Labour': 0.43367860016090104,\n",
       " 'Food/Water': 0.22863032984714401,\n",
       " 'Shop': 0.1406878519710378,\n",
       " 'Fuel/Energy': 0.06365647626709574,\n",
       " 'None': 0.0621983105390185,\n",
       " 'Transport': 0.04379525341914722,\n",
       " 'Education': 0.014380530973451327,\n",
       " 'Savings Group': 0.006335478680611424,\n",
       " 'Health': 0.00331858407079646,\n",
       " 'Environment': 0.001910699919549477,\n",
       " 'System': 0.0012067578439259854,\n",
       " 'Staff': 0.00010056315366049879,\n",
       " 'Chama': 5.0281576830249393e-05,\n",
       " 'Game': 5.0281576830249393e-05}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users['business_type'].value_counts(normalize=True).to_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combine user and transaction tables\n",
    "\n",
    "Combine user and transaction tables on both the source and target features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_subset = users[['bal','xDAI_blockchain_address']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>timeset</th>\n",
       "      <th>transfer_subtype</th>\n",
       "      <th>source</th>\n",
       "      <th>s_gender</th>\n",
       "      <th>s_location</th>\n",
       "      <th>s_business_type</th>\n",
       "      <th>target</th>\n",
       "      <th>t_gender</th>\n",
       "      <th>t_location</th>\n",
       "      <th>t_business_type</th>\n",
       "      <th>tx_token</th>\n",
       "      <th>weight</th>\n",
       "      <th>type</th>\n",
       "      <th>token_name</th>\n",
       "      <th>token_address</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>72647</th>\n",
       "      <td>170140</td>\n",
       "      <td>2020-04-30 10:43:45.170528</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9007.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72648</th>\n",
       "      <td>10</td>\n",
       "      <td>2020-01-26 08:26:22.521902</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>male</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72649</th>\n",
       "      <td>11</td>\n",
       "      <td>2020-01-26 08:27:26.757372</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>male</td>\n",
       "      <td>G.E</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72650</th>\n",
       "      <td>13</td>\n",
       "      <td>2020-01-26 08:32:05.154096</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>male</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>23.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72651</th>\n",
       "      <td>15</td>\n",
       "      <td>2020-01-26 08:38:42.186525</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
       "      <td>male</td>\n",
       "      <td>Test</td>\n",
       "      <td>Health</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>12.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147810</th>\n",
       "      <td>208035</td>\n",
       "      <td>2020-05-11 08:52:34.504171</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe</td>\n",
       "      <td>Unknown gender</td>\n",
       "      <td>kilibole</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec</td>\n",
       "      <td>Unknown gender</td>\n",
       "      <td>Kilibole</td>\n",
       "      <td>Food/Water</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147811</th>\n",
       "      <td>208021</td>\n",
       "      <td>2020-05-11 08:49:20.768559</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc</td>\n",
       "      <td>Unknown gender</td>\n",
       "      <td>Kikomani</td>\n",
       "      <td>Food/Water</td>\n",
       "      <td>0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985</td>\n",
       "      <td>Unknown gender</td>\n",
       "      <td>Bofu</td>\n",
       "      <td>Shop</td>\n",
       "      <td>NaN</td>\n",
       "      <td>350.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147812</th>\n",
       "      <td>208459</td>\n",
       "      <td>2020-05-11 10:11:18.699013</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0x2e44845BE57687bFdcdd26044bB7CdD575781336</td>\n",
       "      <td>male</td>\n",
       "      <td>Miyani</td>\n",
       "      <td>Shop</td>\n",
       "      <td>0xfCF20a412eB6DD345237C7BEeBab53B424b98297</td>\n",
       "      <td>male</td>\n",
       "      <td>Miyani</td>\n",
       "      <td>Shop</td>\n",
       "      <td>NaN</td>\n",
       "      <td>400.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147813</th>\n",
       "      <td>208395</td>\n",
       "      <td>2020-05-11 10:01:04.805823</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
       "      <td>male</td>\n",
       "      <td>Kilifi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
       "      <td>Unknown gender</td>\n",
       "      <td>KIlifi</td>\n",
       "      <td>Education</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147814</th>\n",
       "      <td>208396</td>\n",
       "      <td>2020-05-11 10:01:10.449068</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
       "      <td>Unknown gender</td>\n",
       "      <td>KIlifi</td>\n",
       "      <td>Education</td>\n",
       "      <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
       "      <td>male</td>\n",
       "      <td>Kilifi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>75167 rows × 16 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            id                     timeset transfer_subtype  \\\n",
       "72647   170140  2020-04-30 10:43:45.170528         STANDARD   \n",
       "72648       10  2020-01-26 08:26:22.521902         STANDARD   \n",
       "72649       11  2020-01-26 08:27:26.757372         STANDARD   \n",
       "72650       13  2020-01-26 08:32:05.154096         STANDARD   \n",
       "72651       15  2020-01-26 08:38:42.186525         STANDARD   \n",
       "...        ...                         ...              ...   \n",
       "147810  208035  2020-05-11 08:52:34.504171         STANDARD   \n",
       "147811  208021  2020-05-11 08:49:20.768559         STANDARD   \n",
       "147812  208459  2020-05-11 10:11:18.699013         STANDARD   \n",
       "147813  208395  2020-05-11 10:01:04.805823         STANDARD   \n",
       "147814  208396  2020-05-11 10:01:10.449068         STANDARD   \n",
       "\n",
       "                                            source        s_gender  \\\n",
       "72647   0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2            male   \n",
       "72648   0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31            male   \n",
       "72649   0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435            male   \n",
       "72650   0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31            male   \n",
       "72651   0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3            male   \n",
       "...                                            ...             ...   \n",
       "147810  0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe  Unknown gender   \n",
       "147811  0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc  Unknown gender   \n",
       "147812  0x2e44845BE57687bFdcdd26044bB7CdD575781336            male   \n",
       "147813  0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A            male   \n",
       "147814  0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF  Unknown gender   \n",
       "\n",
       "        s_location s_business_type  \\\n",
       "72647   GE Nairobi  Farming/Labour   \n",
       "72648   GE Nairobi  Farming/Labour   \n",
       "72649          G.E  Farming/Labour   \n",
       "72650   GE Nairobi  Farming/Labour   \n",
       "72651         Test          Health   \n",
       "...            ...             ...   \n",
       "147810    kilibole  Farming/Labour   \n",
       "147811    Kikomani      Food/Water   \n",
       "147812      Miyani            Shop   \n",
       "147813      Kilifi  Farming/Labour   \n",
       "147814      KIlifi       Education   \n",
       "\n",
       "                                            target        t_gender  \\\n",
       "72647   0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31            male   \n",
       "72648   0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72            male   \n",
       "72649   0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31            male   \n",
       "72650   0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72            male   \n",
       "72651   0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31            male   \n",
       "...                                            ...             ...   \n",
       "147810  0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec  Unknown gender   \n",
       "147811  0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985  Unknown gender   \n",
       "147812  0xfCF20a412eB6DD345237C7BEeBab53B424b98297            male   \n",
       "147813  0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF  Unknown gender   \n",
       "147814  0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A            male   \n",
       "\n",
       "        t_location t_business_type  tx_token  weight      type token_name  \\\n",
       "72647   GE Nairobi  Farming/Labour       NaN  9007.0  directed     Sarafu   \n",
       "72648         Home  Farming/Labour       NaN   100.0  directed     Sarafu   \n",
       "72649   GE Nairobi  Farming/Labour       NaN     2.0  directed     Sarafu   \n",
       "72650         Home  Farming/Labour       NaN    23.0  directed     Sarafu   \n",
       "72651   GE Nairobi  Farming/Labour       NaN    12.0  directed     Sarafu   \n",
       "...            ...             ...       ...     ...       ...        ...   \n",
       "147810    Kilibole      Food/Water       NaN    20.0  directed     Sarafu   \n",
       "147811        Bofu            Shop       NaN   350.0  directed     Sarafu   \n",
       "147812      Miyani            Shop       NaN   400.0  directed     Sarafu   \n",
       "147813      KIlifi       Education       NaN    20.0  directed     Sarafu   \n",
       "147814      Kilifi  Farming/Labour       NaN    20.0  directed     Sarafu   \n",
       "\n",
       "                                     token_address  \n",
       "72647   0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "72648   0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "72649   0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "72650   0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "72651   0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "...                                            ...  \n",
       "147810  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "147811  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "147812  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "147813  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "147814  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  \n",
       "\n",
       "[75167 rows x 16 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transactions_subset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# attach each sender's wallet balance by left-joining users on the source address\n",
    "transactions_subset_v1 = transactions_subset.merge(user_subset, how='left', left_on='source', right_on='xDAI_blockchain_address')\n",
    "transactions_subset_v1['s_bal'] = transactions_subset_v1['bal']\n",
    "del transactions_subset_v1['bal']\n",
    "transactions_subset_v1['s_xDAI_blockchain_address'] = transactions_subset_v1['xDAI_blockchain_address']\n",
    "del transactions_subset_v1['xDAI_blockchain_address']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# attach each recipient's wallet balance by left-joining users on the target address\n",
    "transactions_subset_v2 = transactions_subset_v1.merge(user_subset, how='left', left_on='target', right_on='xDAI_blockchain_address')\n",
    "transactions_subset_v2['t_bal'] = transactions_subset_v2['bal']\n",
    "del transactions_subset_v2['bal']\n",
    "transactions_subset_v2['t_xDAI_blockchain_address'] = transactions_subset_v2['xDAI_blockchain_address']\n",
    "del transactions_subset_v2['xDAI_blockchain_address']"
   ]
  },
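  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The merges above attach a wallet balance to each side of every transaction. A minimal sketch of the same pattern on toy data follows. The addresses, balances, and the helper name `attach_balances` are hypothetical and only illustrate the join; they are not taken from the xDai data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-ins for transactions_subset and user_subset (hypothetical values)\n",
    "tx = pd.DataFrame({'source': ['0xA', '0xB'], 'target': ['0xB', '0xC'], 'weight': [20.0, 5.0]})\n",
    "users = pd.DataFrame({'xDAI_blockchain_address': ['0xA', '0xB'], 'bal': [400.0, 500.0]})\n",
    "\n",
    "def attach_balances(tx, users):\n",
    "    # Join the balance table once for the source side and once for the target side\n",
    "    out = tx\n",
    "    for side, prefix in (('source', 's_'), ('target', 't_')):\n",
    "        # Prefix the lookup columns so the two joins do not collide\n",
    "        lookup = users.rename(columns={'bal': prefix + 'bal',\n",
    "                                       'xDAI_blockchain_address': prefix + 'xDAI_blockchain_address'})\n",
    "        out = out.merge(lookup, how='left', left_on=side,\n",
    "                        right_on=prefix + 'xDAI_blockchain_address')\n",
    "    return out\n",
    "\n",
    "merged = attach_balances(tx, users)\n",
    "print(merged[['source', 'target', 'weight', 's_bal', 't_bal']])\n",
    "```\n",
    "\n",
    "Because both joins are left joins, a transaction whose counterparty has no user record keeps its row and receives a `NaN` balance, which is worth checking before clustering."
   ]
  },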
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>timeset</th>\n",
       "      <th>transfer_subtype</th>\n",
       "      <th>source</th>\n",
       "      <th>s_gender</th>\n",
       "      <th>s_location</th>\n",
       "      <th>s_business_type</th>\n",
       "      <th>target</th>\n",
       "      <th>t_gender</th>\n",
       "      <th>t_location</th>\n",
       "      <th>t_business_type</th>\n",
       "      <th>tx_token</th>\n",
       "      <th>weight</th>\n",
       "      <th>type</th>\n",
       "      <th>token_name</th>\n",
       "      <th>token_address</th>\n",
       "      <th>s_bal</th>\n",
       "      <th>s_xDAI_blockchain_address</th>\n",
       "      <th>t_bal</th>\n",
       "      <th>t_xDAI_blockchain_address</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>170140</td>\n",
       "      <td>2020-04-30 10:43:45.170528</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9007.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "      <td>56.660892</td>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10</td>\n",
       "      <td>2020-01-26 08:26:22.521902</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>male</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>902.500000</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11</td>\n",
       "      <td>2020-01-26 08:27:26.757372</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>male</td>\n",
       "      <td>G.E</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "      <td>7297.262576</td>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>13</td>\n",
       "      <td>2020-01-26 08:32:05.154096</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>male</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>23.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>902.500000</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>15</td>\n",
       "      <td>2020-01-26 08:38:42.186525</td>\n",
       "      <td>STANDARD</td>\n",
       "      <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
       "      <td>male</td>\n",
       "      <td>Test</td>\n",
       "      <td>Health</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>male</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>NaN</td>\n",
       "      <td>12.0</td>\n",
       "      <td>directed</td>\n",
       "      <td>Sarafu</td>\n",
       "      <td>0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4</td>\n",
       "      <td>448.000000</td>\n",
       "      <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       id                     timeset transfer_subtype  \\\n",
       "0  170140  2020-04-30 10:43:45.170528         STANDARD   \n",
       "1      10  2020-01-26 08:26:22.521902         STANDARD   \n",
       "2      11  2020-01-26 08:27:26.757372         STANDARD   \n",
       "3      13  2020-01-26 08:32:05.154096         STANDARD   \n",
       "4      15  2020-01-26 08:38:42.186525         STANDARD   \n",
       "\n",
       "                                       source s_gender  s_location  \\\n",
       "0  0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2     male  GE Nairobi   \n",
       "1  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31     male  GE Nairobi   \n",
       "2  0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435     male         G.E   \n",
       "3  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31     male  GE Nairobi   \n",
       "4  0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3     male        Test   \n",
       "\n",
       "  s_business_type                                      target t_gender  \\\n",
       "0  Farming/Labour  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31     male   \n",
       "1  Farming/Labour  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72     male   \n",
       "2  Farming/Labour  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31     male   \n",
       "3  Farming/Labour  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72     male   \n",
       "4          Health  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31     male   \n",
       "\n",
       "   t_location t_business_type  tx_token  weight      type token_name  \\\n",
       "0  GE Nairobi  Farming/Labour       NaN  9007.0  directed     Sarafu   \n",
       "1        Home  Farming/Labour       NaN   100.0  directed     Sarafu   \n",
       "2  GE Nairobi  Farming/Labour       NaN     2.0  directed     Sarafu   \n",
       "3        Home  Farming/Labour       NaN    23.0  directed     Sarafu   \n",
       "4  GE Nairobi  Farming/Labour       NaN    12.0  directed     Sarafu   \n",
       "\n",
       "                                token_address         s_bal  \\\n",
       "0  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4     56.660892   \n",
       "1  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  11737.726002   \n",
       "2  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4   7297.262576   \n",
       "3  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4  11737.726002   \n",
       "4  0x0Fd6e8F2320C90e9D4b3A5bd888c4D556d20AbD4    448.000000   \n",
       "\n",
       "                    s_xDAI_blockchain_address         t_bal  \\\n",
       "0  0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2  11737.726002   \n",
       "1  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31    902.500000   \n",
       "2  0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435  11737.726002   \n",
       "3  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31    902.500000   \n",
       "4  0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3  11737.726002   \n",
       "\n",
       "                    t_xDAI_blockchain_address  \n",
       "0  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  \n",
       "1  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72  \n",
       "2  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  \n",
       "3  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72  \n",
       "4  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transactions_subset_v2.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# subset the data into the columns needed for clustering;\n",
    "# .copy() makes an independent frame so later column edits do not touch transactions_subset_v2\n",
    "combined = transactions_subset_v2[['source','s_location','s_business_type','target','t_location',\n",
    "            't_business_type','weight','s_bal','t_bal']].copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>source</th>\n",
       "      <th>s_location</th>\n",
       "      <th>s_business_type</th>\n",
       "      <th>target</th>\n",
       "      <th>t_location</th>\n",
       "      <th>t_business_type</th>\n",
       "      <th>weight</th>\n",
       "      <th>s_bal</th>\n",
       "      <th>t_bal</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>9007.0</td>\n",
       "      <td>56.660892</td>\n",
       "      <td>11737.726002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>100.0</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>902.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>G.E</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>2.0</td>\n",
       "      <td>7297.262576</td>\n",
       "      <td>11737.726002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>23.0</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>902.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
       "      <td>Test</td>\n",
       "      <td>Health</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>12.0</td>\n",
       "      <td>448.000000</td>\n",
       "      <td>11737.726002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75162</th>\n",
       "      <td>0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe</td>\n",
       "      <td>kilibole</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec</td>\n",
       "      <td>Kilibole</td>\n",
       "      <td>Food/Water</td>\n",
       "      <td>20.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75163</th>\n",
       "      <td>0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc</td>\n",
       "      <td>Kikomani</td>\n",
       "      <td>Food/Water</td>\n",
       "      <td>0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985</td>\n",
       "      <td>Bofu</td>\n",
       "      <td>Shop</td>\n",
       "      <td>350.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>800.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75164</th>\n",
       "      <td>0x2e44845BE57687bFdcdd26044bB7CdD575781336</td>\n",
       "      <td>Miyani</td>\n",
       "      <td>Shop</td>\n",
       "      <td>0xfCF20a412eB6DD345237C7BEeBab53B424b98297</td>\n",
       "      <td>Miyani</td>\n",
       "      <td>Shop</td>\n",
       "      <td>400.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>800.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75165</th>\n",
       "      <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
       "      <td>Kilifi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
       "      <td>KIlifi</td>\n",
       "      <td>Education</td>\n",
       "      <td>20.0</td>\n",
       "      <td>400.000000</td>\n",
       "      <td>500.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75166</th>\n",
       "      <td>0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF</td>\n",
       "      <td>KIlifi</td>\n",
       "      <td>Education</td>\n",
       "      <td>0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A</td>\n",
       "      <td>Kilifi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>20.0</td>\n",
       "      <td>500.000000</td>\n",
       "      <td>400.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>75167 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                           source  s_location s_business_type  \\\n",
       "0      0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2  GE Nairobi  Farming/Labour   \n",
       "1      0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  GE Nairobi  Farming/Labour   \n",
       "2      0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435         G.E  Farming/Labour   \n",
       "3      0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  GE Nairobi  Farming/Labour   \n",
       "4      0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3        Test          Health   \n",
       "...                                           ...         ...             ...   \n",
       "75162  0x97F5165b544e0869ba3Be80D7eEe8b73a0270Dfe    kilibole  Farming/Labour   \n",
       "75163  0x9a05d12df366cE3aa1420c6DFFD0db9ce4ba77Fc    Kikomani      Food/Water   \n",
       "75164  0x2e44845BE57687bFdcdd26044bB7CdD575781336      Miyani            Shop   \n",
       "75165  0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A      Kilifi  Farming/Labour   \n",
       "75166  0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF      KIlifi       Education   \n",
       "\n",
       "                                           target  t_location t_business_type  \\\n",
       "0      0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  GE Nairobi  Farming/Labour   \n",
       "1      0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72        Home  Farming/Labour   \n",
       "2      0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  GE Nairobi  Farming/Labour   \n",
       "3      0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72        Home  Farming/Labour   \n",
       "4      0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  GE Nairobi  Farming/Labour   \n",
       "...                                           ...         ...             ...   \n",
       "75162  0x5CAaA1f7dC13235Fe181D0307e682c387e75a6ec    Kilibole      Food/Water   \n",
       "75163  0xb44279a1d11A2bc4b1b3D08D3BEAb8278cc86985        Bofu            Shop   \n",
       "75164  0xfCF20a412eB6DD345237C7BEeBab53B424b98297      Miyani            Shop   \n",
       "75165  0x2f99a653F5dc201eA97578A6a203BC4db1eaD2FF      KIlifi       Education   \n",
       "75166  0xAc4DB7728940e76BCd98Bb8E60671916f3B7576A      Kilifi  Farming/Labour   \n",
       "\n",
       "       weight         s_bal         t_bal  \n",
       "0      9007.0     56.660892  11737.726002  \n",
       "1       100.0  11737.726002    902.500000  \n",
       "2         2.0   7297.262576  11737.726002  \n",
       "3        23.0  11737.726002    902.500000  \n",
       "4        12.0    448.000000  11737.726002  \n",
       "...       ...           ...           ...  \n",
       "75162    20.0      0.000000      5.000000  \n",
       "75163   350.0      0.000000    800.000000  \n",
       "75164   400.0      0.000000    800.000000  \n",
       "75165    20.0    400.000000    500.000000  \n",
       "75166    20.0    500.000000    400.000000  \n",
       "\n",
       "[75167 rows x 9 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "source = combined.source.values\n",
    "target = combined.target.values\n",
    "# remove the source and target variables for clustering\n",
    "del combined['source']\n",
    "del combined['target']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create dummy variables of the categorical variables \n",
    "updated = pd.get_dummies(combined)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# candidate cluster counts to evaluate\n",
    "clustersToTest = [10,20,25,30,40,50]\n",
    "# estimate the optimal number of clusters with the Gap Statistic: https://statweb.stanford.edu/~gwalther/gap\n",
    "optimalK = OptimalK(parallel_backend='joblib')\n",
    "n_clusters = optimalK(X=updated, cluster_array=clustersToTest)"
   ]
  },
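  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Gap Statistic compares the within-cluster dispersion of the observed data with the dispersion expected for structureless reference data. The sketch below is a simplified NumPy and scikit-learn illustration of that idea, not the implementation used by `OptimalK`. The function names `log_wk` and `gap_value`, the `n_refs` value, and the toy blobs are assumptions made for this example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.cluster import KMeans\n",
    "\n",
    "def log_wk(X, k, seed=0):\n",
    "    # log of the within-cluster sum of squares (KMeans inertia) for k clusters\n",
    "    return np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)\n",
    "\n",
    "def gap_value(X, k, n_refs=5, seed=0):\n",
    "    # Gap(k) = mean of log(W_k) over reference sets minus log(W_k) of the observed data\n",
    "    rng = np.random.default_rng(seed)\n",
    "    lo, hi = X.min(axis=0), X.max(axis=0)\n",
    "    # reference sets are drawn uniformly over the bounding box of the data\n",
    "    ref = [log_wk(rng.uniform(lo, hi, size=X.shape), k, seed) for _ in range(n_refs)]\n",
    "    return float(np.mean(ref) - log_wk(X, k, seed))\n",
    "\n",
    "# Toy data: two well-separated blobs (hypothetical, not the CIC features)\n",
    "rng = np.random.default_rng(1)\n",
    "X = np.vstack([rng.normal(0, 0.5, size=(50, 2)), rng.normal(10, 0.5, size=(50, 2))])\n",
    "gaps = {k: gap_value(X, k) for k in (1, 2, 3)}\n",
    "print(gaps)\n",
    "```\n",
    "\n",
    "The full procedure in the paper selects the smallest $k$ for which $\\mathrm{Gap}(k) \\geq \\mathrm{Gap}(k+1) - s_{k+1}$, where $s_{k+1}$ accounts for the spread of the reference dispersions. This sketch omits that rule and only computes the gap values."
   ]
  },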
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>n_clusters</th>\n",
       "      <th>gap_value</th>\n",
       "      <th>gap*</th>\n",
       "      <th>ref_dispersion_std</th>\n",
       "      <th>sk</th>\n",
       "      <th>sk*</th>\n",
       "      <th>diff</th>\n",
       "      <th>diff*</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10.0</td>\n",
       "      <td>3.363615</td>\n",
       "      <td>1.414036e+14</td>\n",
       "      <td>2.510135e+12</td>\n",
       "      <td>0.019816</td>\n",
       "      <td>1.633046e+14</td>\n",
       "      <td>0.200252</td>\n",
       "      <td>1.555745e+14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>20.0</td>\n",
       "      <td>3.177815</td>\n",
       "      <td>9.154344e+13</td>\n",
       "      <td>1.200491e+12</td>\n",
       "      <td>0.014451</td>\n",
       "      <td>1.057143e+14</td>\n",
       "      <td>0.187308</td>\n",
       "      <td>1.033105e+14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>25.0</td>\n",
       "      <td>3.001553</td>\n",
       "      <td>7.603468e+13</td>\n",
       "      <td>7.675192e+11</td>\n",
       "      <td>0.011046</td>\n",
       "      <td>8.780176e+13</td>\n",
       "      <td>0.106216</td>\n",
       "      <td>8.643467e+13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30.0</td>\n",
       "      <td>2.904411</td>\n",
       "      <td>6.720920e+13</td>\n",
       "      <td>5.602132e+11</td>\n",
       "      <td>0.009074</td>\n",
       "      <td>7.760920e+13</td>\n",
       "      <td>0.230329</td>\n",
       "      <td>7.539618e+13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>40.0</td>\n",
       "      <td>2.686358</td>\n",
       "      <td>5.289592e+13</td>\n",
       "      <td>6.017818e+11</td>\n",
       "      <td>0.012277</td>\n",
       "      <td>6.108290e+13</td>\n",
       "      <td>-1.248115</td>\n",
       "      <td>6.018132e+13</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   n_clusters  gap_value          gap*  ref_dispersion_std        sk  \\\n",
       "0        10.0   3.363615  1.414036e+14        2.510135e+12  0.019816   \n",
       "1        20.0   3.177815  9.154344e+13        1.200491e+12  0.014451   \n",
       "2        25.0   3.001553  7.603468e+13        7.675192e+11  0.011046   \n",
       "3        30.0   2.904411  6.720920e+13        5.602132e+11  0.009074   \n",
       "4        40.0   2.686358  5.289592e+13        6.017818e+11  0.012277   \n",
       "\n",
       "            sk*      diff         diff*  \n",
       "0  1.633046e+14  0.200252  1.555745e+14  \n",
       "1  1.057143e+14  0.187308  1.033105e+14  \n",
       "2  8.780176e+13  0.106216  8.643467e+13  \n",
       "3  7.760920e+13  0.230329  7.539618e+13  \n",
       "4  6.108290e+13 -1.248115  6.018132e+13  "
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "optimalK.gap_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(optimalK.gap_df.n_clusters, optimalK.gap_df.gap_value, linewidth=3)\n",
    "plt.scatter(optimalK.gap_df[optimalK.gap_df.n_clusters == n_clusters].n_clusters,\n",
    "            optimalK.gap_df[optimalK.gap_df.n_clusters == n_clusters].gap_value, s=250, c='r')\n",
    "plt.grid(True)\n",
    "plt.text(20, 4, 'Clusters: {}'.format(str(clustersToTest)), horizontalalignment='center',verticalalignment='center')\n",
    "plt.xlabel('Cluster Count')\n",
    "plt.ylabel('Gap Value')\n",
    "plt.title('Gap Values by Cluster Count')\n",
    "plt.savefig('gap_statistic.png')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute clusters based on the following features:\n",
    "* s_location\n",
    "* s_business_type\n",
    "* t_location\n",
    "* t_business_type\n",
    "* weight - the token amount exchanged\n",
    "* s_bal\n",
    "* t_bal\n",
    "\n",
    "\n",
    "\"The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified. It scales well to large number of samples and has been used across a large range of application areas in many different fields.\n",
    "\n",
    "The k-means algorithm divides a set of $N$ samples $X$ into $K$ disjoint clusters $C$, each described by the mean $\\mu_j$ of the samples in the cluster. The means are commonly called the cluster “centroids”; note that they are not, in general, points from $X$, although they live in the same space.\" - https://scikit-learn.org/stable/modules/clustering.html#k-means"
   ]
  },
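  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the procedure quoted above, the plain-Python snippet below alternates the assignment and update steps of Lloyd's algorithm and then reports the inertia. It is an illustration only and is not scikit-learn's implementation, which adds k-means++ seeding and other optimizations. The toy points, the starting centroids, and the names `squared_distance`, `assign`, and `lloyd_kmeans` are assumptions introduced here.\n",
    "\n",
    "```python\n",
    "def squared_distance(p, q):\n",
    "    # Squared Euclidean distance between two points of equal dimension.\n",
    "    return sum((a - b) ** 2 for a, b in zip(p, q))\n",
    "\n",
    "\n",
    "def assign(points, centroids):\n",
    "    # Assignment step: index of the nearest centroid for every sample.\n",
    "    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))\n",
    "            for p in points]\n",
    "\n",
    "\n",
    "def lloyd_kmeans(points, init_centroids, n_iter=10):\n",
    "    centroids = list(init_centroids)\n",
    "    for _ in range(n_iter):\n",
    "        labels = assign(points, centroids)\n",
    "        # Update step: move each centroid to the mean of its assigned samples.\n",
    "        for k in range(len(centroids)):\n",
    "            members = [p for p, lab in zip(points, labels) if lab == k]\n",
    "            if members:  # an empty cluster keeps its previous centroid\n",
    "                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))\n",
    "    labels = assign(points, centroids)\n",
    "    # Inertia: within-cluster sum of squared distances to the assigned centroid.\n",
    "    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))\n",
    "    return labels, centroids, inertia\n",
    "\n",
    "\n",
    "# Hypothetical toy data: two small groups of 2-D points.\n",
    "toy_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]\n",
    "labels, centroids, inertia = lloyd_kmeans(toy_points, init_centroids=[(0.0, 0.0), (10.0, 10.0)])\n",
    "print(labels, centroids, inertia)\n",
    "```\n",
    "\n",
    "Note that the resulting centroids are means of the assigned samples and need not coincide with any of the input points, as the quoted passage points out."
   ]
  },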
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/aclarkdata/anaconda3/lib/python3.7/site-packages/sklearn/cluster/_kmeans.py:974: FutureWarning: 'n_jobs' was deprecated in version 0.23 and will be removed in 0.25.\n",
      "  \" removed in 0.25.\", FutureWarning)\n"
     ]
    }
   ],
   "source": [
    "# n_jobs is omitted: it was deprecated in scikit-learn 0.23 and scheduled for removal\n",
    "kmeans = KMeans(n_clusters=50, random_state=1).fit(updated.values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/aclarkdata/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  \n"
     ]
    }
   ],
   "source": [
    "# add the clusters back to the combined dataframe\n",
    "combined['cluster'] = kmeans.labels_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/aclarkdata/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  \n",
      "/home/aclarkdata/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  This is separate from the ipykernel package so we can avoid doing imports until\n"
     ]
    }
   ],
   "source": [
    "# add back the source and target variables\n",
    "combined['source'] = source\n",
    "combined['target'] = target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>s_location</th>\n",
       "      <th>s_business_type</th>\n",
       "      <th>t_location</th>\n",
       "      <th>t_business_type</th>\n",
       "      <th>weight</th>\n",
       "      <th>s_bal</th>\n",
       "      <th>t_bal</th>\n",
       "      <th>cluster</th>\n",
       "      <th>source</th>\n",
       "      <th>target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>9007.0</td>\n",
       "      <td>56.660892</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>13</td>\n",
       "      <td>0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>100.0</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>902.500000</td>\n",
       "      <td>12</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>G.E</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>2.0</td>\n",
       "      <td>7297.262576</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>48</td>\n",
       "      <td>0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>Home</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>23.0</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>902.500000</td>\n",
       "      <td>12</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "      <td>0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Test</td>\n",
       "      <td>Health</td>\n",
       "      <td>GE Nairobi</td>\n",
       "      <td>Farming/Labour</td>\n",
       "      <td>12.0</td>\n",
       "      <td>448.000000</td>\n",
       "      <td>11737.726002</td>\n",
       "      <td>13</td>\n",
       "      <td>0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3</td>\n",
       "      <td>0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   s_location s_business_type  t_location t_business_type  weight  \\\n",
       "0  GE Nairobi  Farming/Labour  GE Nairobi  Farming/Labour  9007.0   \n",
       "1  GE Nairobi  Farming/Labour        Home  Farming/Labour   100.0   \n",
       "2         G.E  Farming/Labour  GE Nairobi  Farming/Labour     2.0   \n",
       "3  GE Nairobi  Farming/Labour        Home  Farming/Labour    23.0   \n",
       "4        Test          Health  GE Nairobi  Farming/Labour    12.0   \n",
       "\n",
       "          s_bal         t_bal  cluster  \\\n",
       "0     56.660892  11737.726002       13   \n",
       "1  11737.726002    902.500000       12   \n",
       "2   7297.262576  11737.726002       48   \n",
       "3  11737.726002    902.500000       12   \n",
       "4    448.000000  11737.726002       13   \n",
       "\n",
       "                                       source  \\\n",
       "0  0xC1697C1326fD192438515fE2F7E4cCb0C705C5d2   \n",
       "1  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31   \n",
       "2  0xD95954e3fCd2f09A6Be5931D24f731eFa63BF435   \n",
       "3  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31   \n",
       "4  0x4AfD04b9eD17759B362c8C929207Fe7ad81C39d3   \n",
       "\n",
       "                                       target  \n",
       "0  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  \n",
       "1  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72  \n",
       "2  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  \n",
       "3  0x4AB73CfaC1732a9DcD74BdB4C9605f21832D7C72  \n",
       "4  0xBAB77A20a757e8438DfaBF01D5F36DD12d862B31  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate and plot two PCA components of the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Project the clustering features onto their first two principal components\n",
    "pca = PCA(n_components=2)\n",
    "principalComponents = pca.fit_transform(updated.values)\n",
    "df = pd.DataFrame(principalComponents)\n",
    "\n",
    "df['label'] = kmeans.labels_\n",
    "colors = plt.cm.Spectral(np.linspace(0, 1, len(df.label.unique())))\n",
    "\n",
    "# One scatter call per cluster so that each cluster gets its own color\n",
    "for color, label in zip(colors, df.label.unique()):\n",
    "    tempdf = df[df.label == label]\n",
    "    plt.scatter(tempdf[0], tempdf[1], c=[color])\n",
    "\n",
    "# The centroids live in the original feature space, so project them with the same PCA\n",
    "centers = pca.transform(kmeans.cluster_centers_)\n",
    "plt.scatter(centers[:, 0], centers[:, 1], c='r', s=500, alpha=0.5)\n",
    "plt.grid(True)\n",
    "plt.text(200000, 260000, 'Cluster centroids are the red dots', horizontalalignment='center', verticalalignment='center')\n",
    "plt.savefig('pca.png')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Descriptive statistics\n",
    "\n",
    "Calculate the relevant statistics (mean, standard deviation, median, and quartiles) used to create the probability distributions in the subpopulation model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>weight</th>\n",
       "      <th>s_bal</th>\n",
       "      <th>t_bal</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cluster</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>217.737536</td>\n",
       "      <td>332.674357</td>\n",
       "      <td>3122.710726</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>588.111940</td>\n",
       "      <td>793.864819</td>\n",
       "      <td>251651.998315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>957.820312</td>\n",
       "      <td>7089.179214</td>\n",
       "      <td>21755.181601</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>349.925309</td>\n",
       "      <td>516.042937</td>\n",
       "      <td>64166.491418</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>455.317844</td>\n",
       "      <td>64995.360511</td>\n",
       "      <td>751.148124</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2443.890625</td>\n",
       "      <td>251651.998315</td>\n",
       "      <td>1746.597767</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>586.917409</td>\n",
       "      <td>1022.866799</td>\n",
       "      <td>38404.821609</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1533.448040</td>\n",
       "      <td>23214.200804</td>\n",
       "      <td>1469.670862</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>408.100000</td>\n",
       "      <td>895.817336</td>\n",
       "      <td>100579.182676</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1468.701299</td>\n",
       "      <td>64148.612782</td>\n",
       "      <td>22314.167052</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>271.261258</td>\n",
       "      <td>573.611940</td>\n",
       "      <td>1152.574038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>365.099390</td>\n",
       "      <td>406.400941</td>\n",
       "      <td>14410.774177</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>790.907135</td>\n",
       "      <td>11235.023756</td>\n",
       "      <td>956.054517</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>433.822322</td>\n",
       "      <td>537.032863</td>\n",
       "      <td>9837.111302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>8074.000000</td>\n",
       "      <td>251651.998315</td>\n",
       "      <td>121082.429001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>1187.972973</td>\n",
       "      <td>9402.471887</td>\n",
       "      <td>55225.482860</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>15562.500000</td>\n",
       "      <td>121082.429001</td>\n",
       "      <td>50899.750788</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>389.501989</td>\n",
       "      <td>401.109603</td>\n",
       "      <td>25436.848041</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>4298.095238</td>\n",
       "      <td>34136.137502</td>\n",
       "      <td>62137.100395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>80000.000000</td>\n",
       "      <td>63145.960000</td>\n",
       "      <td>121082.429001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>8978.181818</td>\n",
       "      <td>44220.000308</td>\n",
       "      <td>121082.429001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>1039.047619</td>\n",
       "      <td>100579.182676</td>\n",
       "      <td>1313.687026</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>302.688709</td>\n",
       "      <td>2335.376506</td>\n",
       "      <td>576.133882</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>374.248656</td>\n",
       "      <td>529.628969</td>\n",
       "      <td>31342.309668</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1631.845550</td>\n",
       "      <td>38576.683107</td>\n",
       "      <td>1335.521733</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>1962.083333</td>\n",
       "      <td>63573.309892</td>\n",
       "      <td>43986.716465</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>285.848711</td>\n",
       "      <td>848.769339</td>\n",
       "      <td>18260.218613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>37712.500000</td>\n",
       "      <td>15293.127761</td>\n",
       "      <td>6721.732034</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>1941.818182</td>\n",
       "      <td>45110.659482</td>\n",
       "      <td>251651.998315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>468.716263</td>\n",
       "      <td>17535.443217</td>\n",
       "      <td>791.779968</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>3240.503052</td>\n",
       "      <td>862.117155</td>\n",
       "      <td>993.258934</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>1236.973333</td>\n",
       "      <td>15340.314634</td>\n",
       "      <td>39194.641356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>1408.134556</td>\n",
       "      <td>15404.220436</td>\n",
       "      <td>68353.725751</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>317.344987</td>\n",
       "      <td>336.022102</td>\n",
       "      <td>6084.374342</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>404.778700</td>\n",
       "      <td>594.851729</td>\n",
       "      <td>44619.093968</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>482.002628</td>\n",
       "      <td>4479.433066</td>\n",
       "      <td>4557.371287</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>3593.148148</td>\n",
       "      <td>3569.248851</td>\n",
       "      <td>121082.429001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>9354.545455</td>\n",
       "      <td>121082.429001</td>\n",
       "      <td>2014.247057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>14338.571429</td>\n",
       "      <td>251651.998315</td>\n",
       "      <td>60154.599923</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>503.700787</td>\n",
       "      <td>928.528786</td>\n",
       "      <td>71742.725832</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>175.678483</td>\n",
       "      <td>294.939972</td>\n",
       "      <td>271.114703</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>1302.072607</td>\n",
       "      <td>12999.924780</td>\n",
       "      <td>7072.339809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>387.122709</td>\n",
       "      <td>697.305678</td>\n",
       "      <td>21694.464157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>437.794249</td>\n",
       "      <td>568.978270</td>\n",
       "      <td>55748.221328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>11678.857143</td>\n",
       "      <td>4067.108100</td>\n",
       "      <td>4226.813826</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>548.573207</td>\n",
       "      <td>6283.897324</td>\n",
       "      <td>679.334775</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>13100.037736</td>\n",
       "      <td>4238.100340</td>\n",
       "      <td>28746.781097</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>32377.777778</td>\n",
       "      <td>59578.441515</td>\n",
       "      <td>14515.326330</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>772.342169</td>\n",
       "      <td>4772.491359</td>\n",
       "      <td>12047.885243</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>3661.085106</td>\n",
       "      <td>30063.054365</td>\n",
       "      <td>19776.065227</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               weight          s_bal          t_bal\n",
       "cluster                                            \n",
       "0          217.737536     332.674357    3122.710726\n",
       "1          588.111940     793.864819  251651.998315\n",
       "2          957.820312    7089.179214   21755.181601\n",
       "3          349.925309     516.042937   64166.491418\n",
       "4          455.317844   64995.360511     751.148124\n",
       "5         2443.890625  251651.998315    1746.597767\n",
       "6          586.917409    1022.866799   38404.821609\n",
       "7         1533.448040   23214.200804    1469.670862\n",
       "8          408.100000     895.817336  100579.182676\n",
       "9         1468.701299   64148.612782   22314.167052\n",
       "10         271.261258     573.611940    1152.574038\n",
       "11         365.099390     406.400941   14410.774177\n",
       "12         790.907135   11235.023756     956.054517\n",
       "13         433.822322     537.032863    9837.111302\n",
       "14        8074.000000  251651.998315  121082.429001\n",
       "15        1187.972973    9402.471887   55225.482860\n",
       "16       15562.500000  121082.429001   50899.750788\n",
       "17         389.501989     401.109603   25436.848041\n",
       "18        4298.095238   34136.137502   62137.100395\n",
       "19       80000.000000   63145.960000  121082.429001\n",
       "20        8978.181818   44220.000308  121082.429001\n",
       "21        1039.047619  100579.182676    1313.687026\n",
       "22         302.688709    2335.376506     576.133882\n",
       "23         374.248656     529.628969   31342.309668\n",
       "24        1631.845550   38576.683107    1335.521733\n",
       "25        1962.083333   63573.309892   43986.716465\n",
       "26         285.848711     848.769339   18260.218613\n",
       "27       37712.500000   15293.127761    6721.732034\n",
       "28        1941.818182   45110.659482  251651.998315\n",
       "29         468.716263   17535.443217     791.779968\n",
       "30        3240.503052     862.117155     993.258934\n",
       "31        1236.973333   15340.314634   39194.641356\n",
       "32        1408.134556   15404.220436   68353.725751\n",
       "33         317.344987     336.022102    6084.374342\n",
       "34         404.778700     594.851729   44619.093968\n",
       "35         482.002628    4479.433066    4557.371287\n",
       "36        3593.148148    3569.248851  121082.429001\n",
       "37        9354.545455  121082.429001    2014.247057\n",
       "38       14338.571429  251651.998315   60154.599923\n",
       "39         503.700787     928.528786   71742.725832\n",
       "40         175.678483     294.939972     271.114703\n",
       "41        1302.072607   12999.924780    7072.339809\n",
       "42         387.122709     697.305678   21694.464157\n",
       "43         437.794249     568.978270   55748.221328\n",
       "44       11678.857143    4067.108100    4226.813826\n",
       "45         548.573207    6283.897324     679.334775\n",
       "46       13100.037736    4238.100340   28746.781097\n",
       "47       32377.777778   59578.441515   14515.326330\n",
       "48         772.342169    4772.491359   12047.885243\n",
       "49        3661.085106   30063.054365   19776.065227"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combined.groupby('cluster')[['weight', 's_bal', 't_bal']].mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Per-cluster summary statistics: median, Q1, Q3, mean, and sigma\n",
    "clustersMedianSourceBalance = []\n",
    "clusters1stQSourceBalance = []\n",
    "clusters3rdQSourceBalance = []\n",
    "clustersMu = []\n",
    "clustersSigma = []\n",
    "for i in range(len(combined.cluster.unique())):\n",
    "    temp = combined[combined['cluster'] == i]\n",
    "    # mean and standard deviation of the transaction weight (tokens)\n",
    "    clustersMu.append(round(temp.weight.mean(), 2))\n",
    "    clustersSigma.append(round(temp.weight.std(), 2))\n",
    "    # NOTE: despite its name, this list holds the median transaction weight, not the median s_bal\n",
    "    clustersMedianSourceBalance.append(round(temp.weight.median(), 2))\n",
    "    # first and third quartiles of the source balance\n",
    "    clusters1stQSourceBalance.append(round(temp.s_bal.quantile(0.25), 2))\n",
    "    clusters3rdQSourceBalance.append(round(temp.s_bal.quantile(0.75), 2))"
   ]
  },
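  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The per-cluster statistics computed in the loop above can be sketched on a few rows using only the standard library. This is a hedged illustration, not part of the pipeline: the rows in `toy_rows` and the dictionary keys are hypothetical, and `method='inclusive'` is chosen on the assumption that it should correspond to the default linear interpolation of pandas `Series.quantile`.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "from statistics import mean, median, quantiles, stdev\n",
    "\n",
    "# Hypothetical transactions: (cluster label, weight in tokens, source balance).\n",
    "toy_rows = [\n",
    "    (0, 100.0, 50.0), (0, 200.0, 150.0), (0, 300.0, 250.0), (0, 400.0, 350.0),\n",
    "    (1, 10.0, 1000.0), (1, 30.0, 3000.0), (1, 50.0, 5000.0),\n",
    "]\n",
    "\n",
    "# Group the rows by cluster label.\n",
    "by_cluster = defaultdict(list)\n",
    "for cluster, weight, s_bal in toy_rows:\n",
    "    by_cluster[cluster].append((weight, s_bal))\n",
    "\n",
    "summary = {}\n",
    "for cluster, rows in sorted(by_cluster.items()):\n",
    "    weights = [w for w, _ in rows]\n",
    "    balances = [b for _, b in rows]\n",
    "    # Quartile cut points of the source balance (linear interpolation).\n",
    "    q1, _, q3 = quantiles(balances, n=4, method='inclusive')\n",
    "    summary[cluster] = {\n",
    "        'mu': round(mean(weights), 2),\n",
    "        'sigma': round(stdev(weights), 2),  # sample standard deviation\n",
    "        'median_weight': round(median(weights), 2),\n",
    "        'q1_s_bal': round(q1, 2),\n",
    "        'q3_s_bal': round(q3, 2),\n",
    "    }\n",
    "\n",
    "print(summary)\n",
    "```\n",
    "\n",
    "Keeping the mean and sigma of the weight next to the balance quartiles mirrors the lists built above, which are later copied into the initialization file."
   ]
  },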
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Cluster labels as strings, in label order\n",
    "clusters = [str(i) for i in range(len(combined.cluster.unique()))]\n",
    "\n",
    "# The mixing agents are the clusters plus one external agent\n",
    "mixingAgents = clusters.copy()\n",
    "mixingAgents.append('external')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Target business types per cluster; value_counts sorts by descending frequency,\n",
    "# so each dict is ordered from the most to the least common type\n",
    "UtilityTypesOrdered = {}\n",
    "for i in range(len(combined.cluster.unique())):\n",
    "    UtilityTypesOrdered[str(i)] = combined[combined['cluster'] == i].t_business_type.value_counts(normalize=True).to_dict()\n",
    "\n",
    "# For the external agent the values are the integers 1-6 (a rank order) rather than frequencies\n",
    "UtilityTypesOrdered['external'] = {'Food/Water': 1,\n",
    "                                   'Fuel/Energy': 2,\n",
    "                                   'Health': 3,\n",
    "                                   'Education': 4,\n",
    "                                   'Savings Group': 5,\n",
    "                                   'Shop': 6}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "utilityTypesProbability = {}\n",
    "for i in range(0,len(combined.cluster.unique())):\n",
    "    utilityTypesProbability[str(i)] = combined[combined['cluster']==i].t_business_type.value_counts(normalize=True).to_dict()\n",
    "    \n",
    "    \n",
    "utilityTypesProbability['external'] = {'Food/Water':0.6,\n",
    "                                            'Fuel/Energy':0.10,\n",
    "                                            'Health':0.03,\n",
    "                                            'Education':0.015,\n",
    "                                            'Savings Group':0.065,\n",
    "                                            'Shop':0.19}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create initialization file (copy from here)\n",
    "\n",
    "clusters = ['0',\n",
    " '1',\n",
    " '2',\n",
    " '3',\n",
    " '4',\n",
    " '5',\n",
    " '6',\n",
    " '7',\n",
    " '8',\n",
    " '9',\n",
    " '10',\n",
    " '11',\n",
    " '12',\n",
    " '13',\n",
    " '14',\n",
    " '15',\n",
    " '16',\n",
    " '17',\n",
    " '18',\n",
    " '19',\n",
    " '20',\n",
    " '21',\n",
    " '22',\n",
    " '23',\n",
    " '24',\n",
    " '25',\n",
    " '26',\n",
    " '27',\n",
    " '28',\n",
    " '29',\n",
    " '30',\n",
    " '31',\n",
    " '32',\n",
    " '33',\n",
    " '34',\n",
    " '35',\n",
    " '36',\n",
    " '37',\n",
    " '38',\n",
    " '39',\n",
    " '40',\n",
    " '41',\n",
    " '42',\n",
    " '43',\n",
    " '44',\n",
    " '45',\n",
    " '46',\n",
    " '47',\n",
    " '48',\n",
    " '49']\n",
    "\n",
    "mixingAgents = ['0',\n",
    " '1',\n",
    " '2',\n",
    " '3',\n",
    " '4',\n",
    " '5',\n",
    " '6',\n",
    " '7',\n",
    " '8',\n",
    " '9',\n",
    " '10',\n",
    " '11',\n",
    " '12',\n",
    " '13',\n",
    " '14',\n",
    " '15',\n",
    " '16',\n",
    " '17',\n",
    " '18',\n",
    " '19',\n",
    " '20',\n",
    " '21',\n",
    " '22',\n",
    " '23',\n",
    " '24',\n",
    " '25',\n",
    " '26',\n",
    " '27',\n",
    " '28',\n",
    " '29',\n",
    " '30',\n",
    " '31',\n",
    " '32',\n",
    " '33',\n",
    " '34',\n",
    " '35',\n",
    " '36',\n",
    " '37',\n",
    " '38',\n",
    " '39',\n",
    " '40',\n",
    " '41',\n",
    " '42',\n",
    " '43',\n",
    " '44',\n",
    " '45',\n",
    " '46',\n",
    " '47',\n",
    " '48',\n",
    " '49',\n",
    " 'external']\n",
    "\n",
    "\n",
    "clustersMedianSourceBalance = [150.0,\n",
    " 340.0,\n",
    " 250.0,\n",
    " 20.0,\n",
    " 330.0,\n",
    " 320.0,\n",
    " 240.0,\n",
    " 300.0,\n",
    " 300.0,\n",
    " 50.0,\n",
    " 900.0,\n",
    " 120.0,\n",
    " 400.0,\n",
    " 180.0,\n",
    " 300.0,\n",
    " 6000.0,\n",
    " 132.5,\n",
    " 130.0,\n",
    " 160.0,\n",
    " 5000.0,\n",
    " 150.0,\n",
    " 10000.0,\n",
    " 200.0,\n",
    " 10000.0,\n",
    " 200.0,\n",
    " 200.0,\n",
    " 35000.0,\n",
    " 20000.0,\n",
    " 100.0,\n",
    " 500.0,\n",
    " 425.0,\n",
    " 13320.0,\n",
    " 500.0,\n",
    " 500.0,\n",
    " 1000.0,\n",
    " 390.0,\n",
    " 150.0,\n",
    " 250.0,\n",
    " 45000.0,\n",
    " 36300.0,\n",
    " 960.0,\n",
    " 120.0,\n",
    " 200.0,\n",
    " 100.0,\n",
    " 220.0,\n",
    " 600.0,\n",
    " 62000.0,\n",
    " 500.0,\n",
    " 900.0,\n",
    " 486.0]\n",
    "\n",
    "clusters1stQSourceBalance = [56.0,\n",
    " 118.46,\n",
    " 105.0,\n",
    " 64767.51,\n",
    " 251652.0,\n",
    " 124.5,\n",
    " 4139.28,\n",
    " 146.1,\n",
    " 1002.5,\n",
    " 17145.78,\n",
    " 52676.2,\n",
    " 100.0,\n",
    " 121082.43,\n",
    " 112.0,\n",
    " 28849.43,\n",
    " 27619.22,\n",
    " 66.36,\n",
    " 251652.0,\n",
    " 148.0,\n",
    " 38653.54,\n",
    " 67.22,\n",
    " 121082.43,\n",
    " 6429.46,\n",
    " 555.04,\n",
    " 104.48,\n",
    " 96.43,\n",
    " 52676.2,\n",
    " 251652.0,\n",
    " 64.73,\n",
    " 36824.5,\n",
    " 15182.03,\n",
    " 485.94,\n",
    " 21660.89,\n",
    " 11210.0,\n",
    " 100579.18,\n",
    " 100.46,\n",
    " 2845.01,\n",
    " 3338.98,\n",
    " 1274.91,\n",
    " 6724.88,\n",
    " 38653.54,\n",
    " 114.5,\n",
    " 68.0,\n",
    " 100.0,\n",
    " 20.93,\n",
    " 14050.3,\n",
    " 63145.96,\n",
    " 9276.23,\n",
    " 63234.8,\n",
    " 64767.51]\n",
    "\n",
    "clusters3rdQSourceBalance = [403.96,\n",
    " 506.6,\n",
    " 592.96,\n",
    " 64767.51,\n",
    " 251652.0,\n",
    " 1501.41,\n",
    " 7214.9,\n",
    " 869.82,\n",
    " 1557.01,\n",
    " 18304.36,\n",
    " 55142.93,\n",
    " 419.96,\n",
    " 121082.43,\n",
    " 816.3,\n",
    " 38653.54,\n",
    " 37106.89,\n",
    " 770.65,\n",
    " 251652.0,\n",
    " 838.46,\n",
    " 38653.54,\n",
    " 315.0,\n",
    " 121082.43,\n",
    " 9074.79,\n",
    " 5726.66,\n",
    " 602.02,\n",
    " 437.96,\n",
    " 63234.8,\n",
    " 251652.0,\n",
    " 425.0,\n",
    " 40953.15,\n",
    " 17145.78,\n",
    " 6349.27,\n",
    " 25695.83,\n",
    " 13156.46,\n",
    " 100579.18,\n",
    " 819.33,\n",
    " 4158.5,\n",
    " 5597.38,\n",
    " 2823.81,\n",
    " 20030.91,\n",
    " 51710.52,\n",
    " 537.94,\n",
    " 542.92,\n",
    " 415.43,\n",
    " 895.66,\n",
    " 18304.36,\n",
    " 63145.96,\n",
    " 14050.3,\n",
    " 64767.51,\n",
    " 64767.51]\n",
    "\n",
    "clustersMu = [329.98,\n",
    " 588.11,\n",
    " 469.93,\n",
    " 492.32,\n",
    " 2443.89,\n",
    " 565.21,\n",
    " 1120.5,\n",
    " 408.1,\n",
    " 550.09,\n",
    " 503.42,\n",
    " 2478.89,\n",
    " 349.93,\n",
    " 9354.55,\n",
    " 453.69,\n",
    " 4298.1,\n",
    " 7508.1,\n",
    " 376.86,\n",
    " 8074.0,\n",
    " 333.75,\n",
    " 7691.43,\n",
    " 362.68,\n",
    " 15562.5,\n",
    " 672.28,\n",
    " 10809.6,\n",
    " 274.98,\n",
    " 405.46,\n",
    " 34555.56,\n",
    " 14338.57,\n",
    " 255.48,\n",
    " 1229.44,\n",
    " 1470.23,\n",
    " 14590.61,\n",
    " 1527.75,\n",
    " 770.73,\n",
    " 1039.05,\n",
    " 503.7,\n",
    " 362.11,\n",
    " 499.51,\n",
    " 45000.0,\n",
    " 37504.55,\n",
    " 1941.82,\n",
    " 262.96,\n",
    " 702.23,\n",
    " 168.57,\n",
    " 2000.58,\n",
    " 1383.32,\n",
    " 65333.33,\n",
    " 1454.43,\n",
    " 1483.11,\n",
    " 1853.03]\n",
    "\n",
    "clustersSigma = [583.23,\n",
    " 1501.26,\n",
    " 966.32,\n",
    " 1452.2,\n",
    " 6789.39,\n",
    " 847.29,\n",
    " 2228.12,\n",
    " 483.5,\n",
    " 852.2,\n",
    " 1170.38,\n",
    " 3256.26,\n",
    " 1174.55,\n",
    " 16235.99,\n",
    " 841.35,\n",
    " 7696.91,\n",
    " 6814.68,\n",
    " 785.21,\n",
    " 10886.9,\n",
    " 712.65,\n",
    " 8713.11,\n",
    " 708.54,\n",
    " 18542.24,\n",
    " 1164.0,\n",
    " 3682.08,\n",
    " 340.99,\n",
    " 624.76,\n",
    " 8171.77,\n",
    " 15060.34,\n",
    " 461.52,\n",
    " 1774.39,\n",
    " 4617.97,\n",
    " 4770.82,\n",
    " 2641.75,\n",
    " 1133.41,\n",
    " 767.87,\n",
    " 437.68,\n",
    " 652.72,\n",
    " 761.07,\n",
    " 7071.07,\n",
    " 5274.96,\n",
    " 2716.8,\n",
    " 572.43,\n",
    " 1553.21,\n",
    " 210.61,\n",
    " 4477.94,\n",
    " 1798.73,\n",
    " 31134.12,\n",
    " 2147.9,\n",
    " 1900.27,\n",
    " 2909.68]\n",
    "\n",
    "\n",
    "# nested dictionary\n",
    "UtilityTypesOrdered = {'0': {'Food/Water': 0.4119323241317899,\n",
    "  'Farming/Labour': 0.26090828138913624,\n",
    "  'Shop': 0.17916295636687443,\n",
    "  'Savings Group': 0.07266251113089937,\n",
    "  'Fuel/Energy': 0.034194122885129116,\n",
    "  'Transport': 0.02617987533392698,\n",
    "  'Health': 0.006767586821015138,\n",
    "  'Education': 0.004096170970614425,\n",
    "  'None': 0.004096170970614425},\n",
    " '1': {'Food/Water': 1.0},\n",
    " '2': {'Savings Group': 0.87890625,\n",
    "  'Health': 0.08984375,\n",
    "  'Food/Water': 0.03125},\n",
    " '3': {'Savings Group': 0.4905964535196131,\n",
    "  'Farming/Labour': 0.3610961848468565,\n",
    "  'Food/Water': 0.14830736163353037},\n",
    " '4': {'Farming/Labour': 0.2843866171003718,\n",
    "  'Shop': 0.25650557620817843,\n",
    "  'Fuel/Energy': 0.17843866171003717,\n",
    "  'Food/Water': 0.16171003717472118,\n",
    "  'None': 0.10966542750929369,\n",
    "  'Savings Group': 0.0055762081784386614,\n",
    "  'Transport': 0.0037174721189591076},\n",
    " '5': {'Farming/Labour': 0.421875,\n",
    "  'Food/Water': 0.421875,\n",
    "  'Shop': 0.0625,\n",
    "  'Savings Group': 0.03125,\n",
    "  'Fuel/Energy': 0.03125,\n",
    "  'Transport': 0.03125},\n",
    " '6': {'Savings Group': 0.6008097165991902,\n",
    "  'Food/Water': 0.35870445344129553,\n",
    "  'Shop': 0.04048582995951417},\n",
    " '7': {'Farming/Labour': 0.4346590909090909,\n",
    "  'Food/Water': 0.2869318181818182,\n",
    "  'Shop': 0.1278409090909091,\n",
    "  'Fuel/Energy': 0.07670454545454546,\n",
    "  'Savings Group': 0.03977272727272727,\n",
    "  'Education': 0.017045454545454544,\n",
    "  'None': 0.011363636363636364,\n",
    "  'Transport': 0.002840909090909091,\n",
    "  'Health': 0.002840909090909091},\n",
    " '8': {'Savings Group': 1.0},\n",
    " '9': {'Savings Group': 0.7142857142857143,\n",
    "  'Food/Water': 0.18181818181818182,\n",
    "  'Farming/Labour': 0.07792207792207792,\n",
    "  'Education': 0.025974025974025976},\n",
    " '10': {'Food/Water': 0.3499875508340941,\n",
    "  'Farming/Labour': 0.3162088140094614,\n",
    "  'Shop': 0.21047389824881732,\n",
    "  'Transport': 0.03950535314133953,\n",
    "  'None': 0.03386173126400531,\n",
    "  'Fuel/Energy': 0.022491493069964313,\n",
    "  'Education': 0.01709685451074778,\n",
    "  'Savings Group': 0.006473566271059839,\n",
    "  'Environment': 0.002157855423686613,\n",
    "  'Health': 0.0016598887874512407,\n",
    "  'Chama': 8.299443937256204e-05},\n",
    " '11': {'Savings Group': 0.4873417721518987,\n",
    "  'Food/Water': 0.3377445339470656,\n",
    "  'Education': 0.09723820483314154,\n",
    "  'Farming/Labour': 0.06271576524741082,\n",
    "  'Shop': 0.014959723820483314},\n",
    " '12': {'Food/Water': 0.34994337485843713,\n",
    "  'Shop': 0.2332955832389581,\n",
    "  'Farming/Labour': 0.19592298980747452,\n",
    "  'Fuel/Energy': 0.057757644394110984,\n",
    "  'Savings Group': 0.053227633069082674,\n",
    "  'Education': 0.05096262740656852,\n",
    "  'None': 0.026047565118912798,\n",
    "  'Transport': 0.020385050962627407,\n",
    "  'Health': 0.011325028312570781,\n",
    "  'Environment': 0.0011325028312570782},\n",
    " '13': {'Savings Group': 0.3712871287128713,\n",
    "  'Food/Water': 0.247974797479748,\n",
    "  'Shop': 0.19801980198019803,\n",
    "  'Fuel/Energy': 0.08235823582358236,\n",
    "  'Health': 0.07605760576057606,\n",
    "  'Farming/Labour': 0.024302430243024302},\n",
    " '14': {'Savings Group': 1.0},\n",
    " '15': {'Savings Group': 1.0},\n",
    " '16': {'Savings Group': 0.5, 'Food/Water': 0.5},\n",
    " '17': {'Savings Group': 0.7335701598579041,\n",
    "  'Shop': 0.17584369449378331,\n",
    "  'Food/Water': 0.0905861456483126},\n",
    " '18': {'Savings Group': 0.6984126984126984,\n",
    "  'Food/Water': 0.23809523809523808,\n",
    "  'Farming/Labour': 0.06349206349206349},\n",
    " '19': {'Savings Group': 1.0},\n",
    " '20': {'Savings Group': 1.0},\n",
    " '21': {'Farming/Labour': 0.47619047619047616,\n",
    "  'Food/Water': 0.3333333333333333,\n",
    "  'Shop': 0.09523809523809523,\n",
    "  'Fuel/Energy': 0.047619047619047616,\n",
    "  'Transport': 0.047619047619047616},\n",
    " '22': {'Food/Water': 0.33040588654165676,\n",
    "  'Farming/Labour': 0.3209114645145977,\n",
    "  'Shop': 0.164016140517446,\n",
    "  'None': 0.06147638262520769,\n",
    "  'Fuel/Energy': 0.05008307619273677,\n",
    "  'Transport': 0.028957987182530263,\n",
    "  'Savings Group': 0.023973415618324233,\n",
    "  'Education': 0.014478993591265131,\n",
    "  'Health': 0.0035604082601471635,\n",
    "  'Environment': 0.0011868027533823878,\n",
    "  'Staff': 0.00047472110135295516,\n",
    "  'Chama': 0.00023736055067647758,\n",
    "  'Game': 0.00023736055067647758},\n",
    " '23': {'Savings Group': 0.8323424494649228,\n",
    "  'Farming/Labour': 0.16765755053507728},\n",
    " '24': {'Farming/Labour': 0.38481675392670156,\n",
    "  'Food/Water': 0.3717277486910995,\n",
    "  'Shop': 0.1387434554973822,\n",
    "  'Fuel/Energy': 0.05235602094240838,\n",
    "  'Transport': 0.02356020942408377,\n",
    "  'Savings Group': 0.01832460732984293,\n",
    "  'Education': 0.007853403141361256,\n",
    "  'Staff': 0.002617801047120419},\n",
    " '25': {'Savings Group': 0.7916666666666666,\n",
    "  'Food/Water': 0.20833333333333334},\n",
    " '26': {'Savings Group': 0.7442348008385744, 'Food/Water': 0.2557651991614256},\n",
    " '27': {'Food/Water': 0.3333333333333333,\n",
    "  'Farming/Labour': 0.25,\n",
    "  'Health': 0.25,\n",
    "  'Savings Group': 0.08333333333333333,\n",
    "  'Fuel/Energy': 0.08333333333333333},\n",
    " '28': {'Food/Water': 1.0},\n",
    " '29': {'Food/Water': 0.27335640138408307,\n",
    "  'Farming/Labour': 0.23529411764705882,\n",
    "  'Shop': 0.21972318339100347,\n",
    "  'Fuel/Energy': 0.21280276816608998,\n",
    "  'None': 0.03806228373702422,\n",
    "  'Education': 0.006920415224913495,\n",
    "  'Transport': 0.006920415224913495,\n",
    "  'Savings Group': 0.005190311418685121,\n",
    "  'Staff': 0.0017301038062283738},\n",
    " '30': {'Food/Water': 0.36228287841191065,\n",
    "  'Shop': 0.2679900744416873,\n",
    "  'Farming/Labour': 0.21712158808933002,\n",
    "  'Savings Group': 0.08436724565756824,\n",
    "  'Education': 0.02481389578163772,\n",
    "  'Fuel/Energy': 0.018610421836228287,\n",
    "  'Transport': 0.017369727047146403,\n",
    "  'None': 0.0037220843672456576,\n",
    "  'Health': 0.0024813895781637717,\n",
    "  'Environment': 0.0012406947890818859},\n",
    " '31': {'Savings Group': 0.8,\n",
    "  'Food/Water': 0.13333333333333333,\n",
    "  'Shop': 0.06666666666666667},\n",
    " '32': {'Savings Group': 0.7444444444444445,\n",
    "  'Farming/Labour': 0.2,\n",
    "  'Food/Water': 0.05555555555555555},\n",
    " '33': {'Food/Water': 0.33343474292668085,\n",
    "  'Farming/Labour': 0.28414968055978096,\n",
    "  'Savings Group': 0.18892607240644965,\n",
    "  'Shop': 0.1146942500760572,\n",
    "  'Fuel/Energy': 0.06936416184971098,\n",
    "  'None': 0.006693033160937024,\n",
    "  'Education': 0.0027380590203833284},\n",
    " '34': {'Savings Group': 1.0},\n",
    " '35': {'Food/Water': 0.3829787234042553,\n",
    "  'Farming/Labour': 0.2390488110137672,\n",
    "  'Shop': 0.1902377972465582,\n",
    "  'Savings Group': 0.07259073842302878,\n",
    "  'Transport': 0.060075093867334166,\n",
    "  'Health': 0.030037546933667083,\n",
    "  'Fuel/Energy': 0.016270337922403004,\n",
    "  'None': 0.0050062578222778474,\n",
    "  'Education': 0.0037546933667083854},\n",
    " '36': {'Savings Group': 1.0},\n",
    " '37': {'Farming/Labour': 0.5454545454545454,\n",
    "  'Food/Water': 0.36363636363636365,\n",
    "  'Savings Group': 0.045454545454545456,\n",
    "  'Shop': 0.045454545454545456},\n",
    " '38': {'Savings Group': 1.0},\n",
    " '39': {'Savings Group': 1.0},\n",
    " '40': {'Farming/Labour': 0.3595236417447678,\n",
    "  'Food/Water': 0.3165386512578395,\n",
    "  'Shop': 0.18842928616728913,\n",
    "  'Fuel/Energy': 0.05108871820167712,\n",
    "  'None': 0.0360439715312522,\n",
    "  'Transport': 0.022443802409978154,\n",
    "  'Education': 0.01039391163413431,\n",
    "  'Savings Group': 0.00842083010358678,\n",
    "  'Health': 0.004545134240011275,\n",
    "  'Staff': 0.0011627087590726517,\n",
    "  'Environment': 0.0010570079627933197,\n",
    "  'System': 0.00035233598759777326},\n",
    " '41': {'Food/Water': 0.33003300330033003,\n",
    "  'Farming/Labour': 0.2739273927392739,\n",
    "  'Shop': 0.1782178217821782,\n",
    "  'Savings Group': 0.13861386138613863,\n",
    "  'Health': 0.0429042904290429,\n",
    "  'Fuel/Energy': 0.0165016501650165,\n",
    "  'Transport': 0.0165016501650165,\n",
    "  'Education': 0.0033003300330033004},\n",
    " '42': {'Savings Group': 0.8661740558292282, 'Health': 0.13382594417077176},\n",
    " '43': {'Savings Group': 1.0},\n",
    " '44': {'Food/Water': 0.4805194805194805,\n",
    "  'Shop': 0.14285714285714285,\n",
    "  'Savings Group': 0.14285714285714285,\n",
    "  'Farming/Labour': 0.13636363636363635,\n",
    "  'Health': 0.06493506493506493,\n",
    "  'Transport': 0.012987012987012988,\n",
    "  'Environment': 0.012987012987012988,\n",
    "  'Fuel/Energy': 0.006493506493506494},\n",
    " '45': {'Food/Water': 0.35471100554235946,\n",
    "  'Farming/Labour': 0.2414885193982581,\n",
    "  'Shop': 0.23198733174980204,\n",
    "  'Education': 0.03800475059382423,\n",
    "  'None': 0.035629453681710214,\n",
    "  'Transport': 0.035629453681710214,\n",
    "  'Fuel/Energy': 0.028503562945368172,\n",
    "  'Savings Group': 0.02454473475851148,\n",
    "  'Health': 0.006334125098970704,\n",
    "  'Environment': 0.001583531274742676,\n",
    "  'Staff': 0.000791765637371338,\n",
    "  'System': 0.000791765637371338},\n",
    " '46': {'Savings Group': 0.6981132075471698,\n",
    "  'Health': 0.18867924528301888,\n",
    "  'Food/Water': 0.09433962264150944,\n",
    "  'Shop': 0.018867924528301886},\n",
    " '47': {'Savings Group': 0.5555555555555556,\n",
    "  'Farming/Labour': 0.2222222222222222,\n",
    "  'Food/Water': 0.2222222222222222},\n",
    " '48': {'Food/Water': 0.38795180722891565,\n",
    "  'Savings Group': 0.38313253012048193,\n",
    "  'Health': 0.10120481927710843,\n",
    "  'Shop': 0.09879518072289156,\n",
    "  'Fuel/Energy': 0.016867469879518072,\n",
    "  'Farming/Labour': 0.012048192771084338},\n",
    " '49': {'Food/Water': 0.3829787234042553,\n",
    "  'Savings Group': 0.3829787234042553,\n",
    "  'Education': 0.19148936170212766,\n",
    "  'Fuel/Energy': 0.0425531914893617},\n",
    " 'external': {'Food/Water': 1,\n",
    "  'Fuel/Energy': 2,\n",
    "  'Health': 3,\n",
    "  'Education': 4,\n",
    "  'Savings Group': 5,\n",
    "  'Shop': 6}}\n",
    "    \n",
    "#  nested dictionary \n",
    "utilityTypesProbability = {'0': {'Food/Water': 0.4119323241317899,\n",
    "  'Farming/Labour': 0.26090828138913624,\n",
    "  'Shop': 0.17916295636687443,\n",
    "  'Savings Group': 0.07266251113089937,\n",
    "  'Fuel/Energy': 0.034194122885129116,\n",
    "  'Transport': 0.02617987533392698,\n",
    "  'Health': 0.006767586821015138,\n",
    "  'Education': 0.004096170970614425,\n",
    "  'None': 0.004096170970614425},\n",
    " '1': {'Food/Water': 1.0},\n",
    " '2': {'Savings Group': 0.87890625,\n",
    "  'Health': 0.08984375,\n",
    "  'Food/Water': 0.03125},\n",
    " '3': {'Savings Group': 0.4905964535196131,\n",
    "  'Farming/Labour': 0.3610961848468565,\n",
    "  'Food/Water': 0.14830736163353037},\n",
    " '4': {'Farming/Labour': 0.2843866171003718,\n",
    "  'Shop': 0.25650557620817843,\n",
    "  'Fuel/Energy': 0.17843866171003717,\n",
    "  'Food/Water': 0.16171003717472118,\n",
    "  'None': 0.10966542750929369,\n",
    "  'Savings Group': 0.0055762081784386614,\n",
    "  'Transport': 0.0037174721189591076},\n",
    " '5': {'Farming/Labour': 0.421875,\n",
    "  'Food/Water': 0.421875,\n",
    "  'Shop': 0.0625,\n",
    "  'Savings Group': 0.03125,\n",
    "  'Fuel/Energy': 0.03125,\n",
    "  'Transport': 0.03125},\n",
    " '6': {'Savings Group': 0.6008097165991902,\n",
    "  'Food/Water': 0.35870445344129553,\n",
    "  'Shop': 0.04048582995951417},\n",
    " '7': {'Farming/Labour': 0.4346590909090909,\n",
    "  'Food/Water': 0.2869318181818182,\n",
    "  'Shop': 0.1278409090909091,\n",
    "  'Fuel/Energy': 0.07670454545454546,\n",
    "  'Savings Group': 0.03977272727272727,\n",
    "  'Education': 0.017045454545454544,\n",
    "  'None': 0.011363636363636364,\n",
    "  'Transport': 0.002840909090909091,\n",
    "  'Health': 0.002840909090909091},\n",
    " '8': {'Savings Group': 1.0},\n",
    " '9': {'Savings Group': 0.7142857142857143,\n",
    "  'Food/Water': 0.18181818181818182,\n",
    "  'Farming/Labour': 0.07792207792207792,\n",
    "  'Education': 0.025974025974025976},\n",
    " '10': {'Food/Water': 0.3499875508340941,\n",
    "  'Farming/Labour': 0.3162088140094614,\n",
    "  'Shop': 0.21047389824881732,\n",
    "  'Transport': 0.03950535314133953,\n",
    "  'None': 0.03386173126400531,\n",
    "  'Fuel/Energy': 0.022491493069964313,\n",
    "  'Education': 0.01709685451074778,\n",
    "  'Savings Group': 0.006473566271059839,\n",
    "  'Environment': 0.002157855423686613,\n",
    "  'Health': 0.0016598887874512407,\n",
    "  'Chama': 8.299443937256204e-05},\n",
    " '11': {'Savings Group': 0.4873417721518987,\n",
    "  'Food/Water': 0.3377445339470656,\n",
    "  'Education': 0.09723820483314154,\n",
    "  'Farming/Labour': 0.06271576524741082,\n",
    "  'Shop': 0.014959723820483314},\n",
    " '12': {'Food/Water': 0.34994337485843713,\n",
    "  'Shop': 0.2332955832389581,\n",
    "  'Farming/Labour': 0.19592298980747452,\n",
    "  'Fuel/Energy': 0.057757644394110984,\n",
    "  'Savings Group': 0.053227633069082674,\n",
    "  'Education': 0.05096262740656852,\n",
    "  'None': 0.026047565118912798,\n",
    "  'Transport': 0.020385050962627407,\n",
    "  'Health': 0.011325028312570781,\n",
    "  'Environment': 0.0011325028312570782},\n",
    " '13': {'Savings Group': 0.3712871287128713,\n",
    "  'Food/Water': 0.247974797479748,\n",
    "  'Shop': 0.19801980198019803,\n",
    "  'Fuel/Energy': 0.08235823582358236,\n",
    "  'Health': 0.07605760576057606,\n",
    "  'Farming/Labour': 0.024302430243024302},\n",
    " '14': {'Savings Group': 1.0},\n",
    " '15': {'Savings Group': 1.0},\n",
    " '16': {'Savings Group': 0.5, 'Food/Water': 0.5},\n",
    " '17': {'Savings Group': 0.7335701598579041,\n",
    "  'Shop': 0.17584369449378331,\n",
    "  'Food/Water': 0.0905861456483126},\n",
    " '18': {'Savings Group': 0.6984126984126984,\n",
    "  'Food/Water': 0.23809523809523808,\n",
    "  'Farming/Labour': 0.06349206349206349},\n",
    " '19': {'Savings Group': 1.0},\n",
    " '20': {'Savings Group': 1.0},\n",
    " '21': {'Farming/Labour': 0.47619047619047616,\n",
    "  'Food/Water': 0.3333333333333333,\n",
    "  'Shop': 0.09523809523809523,\n",
    "  'Fuel/Energy': 0.047619047619047616,\n",
    "  'Transport': 0.047619047619047616},\n",
    " '22': {'Food/Water': 0.33040588654165676,\n",
    "  'Farming/Labour': 0.3209114645145977,\n",
    "  'Shop': 0.164016140517446,\n",
    "  'None': 0.06147638262520769,\n",
    "  'Fuel/Energy': 0.05008307619273677,\n",
    "  'Transport': 0.028957987182530263,\n",
    "  'Savings Group': 0.023973415618324233,\n",
    "  'Education': 0.014478993591265131,\n",
    "  'Health': 0.0035604082601471635,\n",
    "  'Environment': 0.0011868027533823878,\n",
    "  'Staff': 0.00047472110135295516,\n",
    "  'Chama': 0.00023736055067647758,\n",
    "  'Game': 0.00023736055067647758},\n",
    " '23': {'Savings Group': 0.8323424494649228,\n",
    "  'Farming/Labour': 0.16765755053507728},\n",
    " '24': {'Farming/Labour': 0.38481675392670156,\n",
    "  'Food/Water': 0.3717277486910995,\n",
    "  'Shop': 0.1387434554973822,\n",
    "  'Fuel/Energy': 0.05235602094240838,\n",
    "  'Transport': 0.02356020942408377,\n",
    "  'Savings Group': 0.01832460732984293,\n",
    "  'Education': 0.007853403141361256,\n",
    "  'Staff': 0.002617801047120419},\n",
    " '25': {'Savings Group': 0.7916666666666666,\n",
    "  'Food/Water': 0.20833333333333334},\n",
    " '26': {'Savings Group': 0.7442348008385744, 'Food/Water': 0.2557651991614256},\n",
    " '27': {'Food/Water': 0.3333333333333333,\n",
    "  'Farming/Labour': 0.25,\n",
    "  'Health': 0.25,\n",
    "  'Savings Group': 0.08333333333333333,\n",
    "  'Fuel/Energy': 0.08333333333333333},\n",
    " '28': {'Food/Water': 1.0},\n",
    " '29': {'Food/Water': 0.27335640138408307,\n",
    "  'Farming/Labour': 0.23529411764705882,\n",
    "  'Shop': 0.21972318339100347,\n",
    "  'Fuel/Energy': 0.21280276816608998,\n",
    "  'None': 0.03806228373702422,\n",
    "  'Education': 0.006920415224913495,\n",
    "  'Transport': 0.006920415224913495,\n",
    "  'Savings Group': 0.005190311418685121,\n",
    "  'Staff': 0.0017301038062283738},\n",
    " '30': {'Food/Water': 0.36228287841191065,\n",
    "  'Shop': 0.2679900744416873,\n",
    "  'Farming/Labour': 0.21712158808933002,\n",
    "  'Savings Group': 0.08436724565756824,\n",
    "  'Education': 0.02481389578163772,\n",
    "  'Fuel/Energy': 0.018610421836228287,\n",
    "  'Transport': 0.017369727047146403,\n",
    "  'None': 0.0037220843672456576,\n",
    "  'Health': 0.0024813895781637717,\n",
    "  'Environment': 0.0012406947890818859},\n",
    " '31': {'Savings Group': 0.8,\n",
    "  'Food/Water': 0.13333333333333333,\n",
    "  'Shop': 0.06666666666666667},\n",
    " '32': {'Savings Group': 0.7444444444444445,\n",
    "  'Farming/Labour': 0.2,\n",
    "  'Food/Water': 0.05555555555555555},\n",
    " '33': {'Food/Water': 0.33343474292668085,\n",
    "  'Farming/Labour': 0.28414968055978096,\n",
    "  'Savings Group': 0.18892607240644965,\n",
    "  'Shop': 0.1146942500760572,\n",
    "  'Fuel/Energy': 0.06936416184971098,\n",
    "  'None': 0.006693033160937024,\n",
    "  'Education': 0.0027380590203833284},\n",
    " '34': {'Savings Group': 1.0},\n",
    " '35': {'Food/Water': 0.3829787234042553,\n",
    "  'Farming/Labour': 0.2390488110137672,\n",
    "  'Shop': 0.1902377972465582,\n",
    "  'Savings Group': 0.07259073842302878,\n",
    "  'Transport': 0.060075093867334166,\n",
    "  'Health': 0.030037546933667083,\n",
    "  'Fuel/Energy': 0.016270337922403004,\n",
    "  'None': 0.0050062578222778474,\n",
    "  'Education': 0.0037546933667083854},\n",
    " '36': {'Savings Group': 1.0},\n",
    " '37': {'Farming/Labour': 0.5454545454545454,\n",
    "  'Food/Water': 0.36363636363636365,\n",
    "  'Savings Group': 0.045454545454545456,\n",
    "  'Shop': 0.045454545454545456},\n",
    " '38': {'Savings Group': 1.0},\n",
    " '39': {'Savings Group': 1.0},\n",
    " '40': {'Farming/Labour': 0.3595236417447678,\n",
    "  'Food/Water': 0.3165386512578395,\n",
    "  'Shop': 0.18842928616728913,\n",
    "  'Fuel/Energy': 0.05108871820167712,\n",
    "  'None': 0.0360439715312522,\n",
    "  'Transport': 0.022443802409978154,\n",
    "  'Education': 0.01039391163413431,\n",
    "  'Savings Group': 0.00842083010358678,\n",
    "  'Health': 0.004545134240011275,\n",
    "  'Staff': 0.0011627087590726517,\n",
    "  'Environment': 0.0010570079627933197,\n",
    "  'System': 0.00035233598759777326},\n",
    " '41': {'Food/Water': 0.33003300330033003,\n",
    "  'Farming/Labour': 0.2739273927392739,\n",
    "  'Shop': 0.1782178217821782,\n",
    "  'Savings Group': 0.13861386138613863,\n",
    "  'Health': 0.0429042904290429,\n",
    "  'Fuel/Energy': 0.0165016501650165,\n",
    "  'Transport': 0.0165016501650165,\n",
    "  'Education': 0.0033003300330033004},\n",
    " '42': {'Savings Group': 0.8661740558292282, 'Health': 0.13382594417077176},\n",
    " '43': {'Savings Group': 1.0},\n",
    " '44': {'Food/Water': 0.4805194805194805,\n",
    "  'Shop': 0.14285714285714285,\n",
    "  'Savings Group': 0.14285714285714285,\n",
    "  'Farming/Labour': 0.13636363636363635,\n",
    "  'Health': 0.06493506493506493,\n",
    "  'Transport': 0.012987012987012988,\n",
    "  'Environment': 0.012987012987012988,\n",
    "  'Fuel/Energy': 0.006493506493506494},\n",
    " '45': {'Food/Water': 0.35471100554235946,\n",
    "  'Farming/Labour': 0.2414885193982581,\n",
    "  'Shop': 0.23198733174980204,\n",
    "  'Education': 0.03800475059382423,\n",
    "  'None': 0.035629453681710214,\n",
    "  'Transport': 0.035629453681710214,\n",
    "  'Fuel/Energy': 0.028503562945368172,\n",
    "  'Savings Group': 0.02454473475851148,\n",
    "  'Health': 0.006334125098970704,\n",
    "  'Environment': 0.001583531274742676,\n",
    "  'Staff': 0.000791765637371338,\n",
    "  'System': 0.000791765637371338},\n",
    " '46': {'Savings Group': 0.6981132075471698,\n",
    "  'Health': 0.18867924528301888,\n",
    "  'Food/Water': 0.09433962264150944,\n",
    "  'Shop': 0.018867924528301886},\n",
    " '47': {'Savings Group': 0.5555555555555556,\n",
    "  'Farming/Labour': 0.2222222222222222,\n",
    "  'Food/Water': 0.2222222222222222},\n",
    " '48': {'Food/Water': 0.38795180722891565,\n",
    "  'Savings Group': 0.38313253012048193,\n",
    "  'Health': 0.10120481927710843,\n",
    "  'Shop': 0.09879518072289156,\n",
    "  'Fuel/Energy': 0.016867469879518072,\n",
    "  'Farming/Labour': 0.012048192771084338},\n",
    " '49': {'Food/Water': 0.3829787234042553,\n",
    "  'Savings Group': 0.3829787234042553,\n",
    "  'Education': 0.19148936170212766,\n",
    "  'Fuel/Energy': 0.0425531914893617},\n",
    " 'external': {'Food/Water': 0.6,\n",
    "  'Fuel/Energy': 0.1,\n",
    "  'Health': 0.03,\n",
    "  'Education': 0.015,\n",
    "  'Savings Group': 0.065,\n",
    "  'Shop': 0.19}}\n",
    "\n",
    "# agent:[centrality,allocationValue]\n",
    "agentAllocation = {'0': [1, 1],\n",
    " '1': [1, 1],\n",
    " '2': [1, 1],\n",
    " '3': [1, 1],\n",
    " '4': [1, 1],\n",
    " '5': [1, 1],\n",
    " '6': [1, 1],\n",
    " '7': [1, 1],\n",
    " '8': [1, 1],\n",
    " '9': [1, 1],\n",
    " '10': [1, 1],\n",
    " '11': [1, 1],\n",
    " '12': [1, 1],\n",
    " '13': [1, 1],\n",
    " '14': [1, 1],\n",
    " '15': [1, 1],\n",
    " '16': [1, 1],\n",
    " '17': [1, 1],\n",
    " '18': [1, 1],\n",
    " '19': [1, 1],\n",
    " '20': [1, 1],\n",
    " '21': [1, 1],\n",
    " '22': [1, 1],\n",
    " '23': [1, 1],\n",
    " '24': [1, 1],\n",
    " '25': [1, 1],\n",
    " '26': [1, 1],\n",
    " '27': [1, 1],\n",
    " '28': [1, 1],\n",
    " '29': [1, 1],\n",
    " '30': [1, 1],\n",
    " '31': [1, 1],\n",
    " '32': [1, 1],\n",
    " '33': [1, 1],\n",
    " '34': [1, 1],\n",
    " '35': [1, 1],\n",
    " '36': [1, 1],\n",
    " '37': [1, 1],\n",
    " '38': [1, 1],\n",
    " '39': [1, 1],\n",
    " '40': [1, 1],\n",
    " '41': [1, 1],\n",
    " '42': [1, 1],\n",
    " '43': [1, 1],\n",
    " '44': [1, 1],\n",
    " '45': [1, 1],\n",
    " '46': [1, 1],\n",
    " '47': [1, 1],\n",
    " '48': [1, 1],\n",
    " '49': [1, 1]}\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}