{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extra 2.1 - Unbalanced Data - Application 1: CollabMap Data Quality\n", "\n", "Assessing the quality of crowdsourced data in CollabMap from their provenance\n", "\n", "In this notebook, we compared the classification accuracy on **unbalanced** (original) CollabMap datasets vs that on a **balanced** CollabMap datasets.\n", "\n", "* **Goal**: To determine if the provenance network analytics method can identify trustworthy data (i.e. buildings, routes, and route sets) contributed by crowd workers in [CollabMap](https://collabmap.org/).\n", "* **Classification labels**: $\\mathcal{L} = \\left\\{ \\textit{trusted}, \\textit{uncertain} \\right\\} $.\n", "* **Training data**:\n", " - Buildings: 5175\n", " - Routes: 4710\n", " - Route sets: 4997\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading data\n", "The CollabMap dataset is provided in the [`collabmap/depgraphs.csv`](collabmap/depgraphs.csv) file, each row corresponds to a building, route, or route sets created in the application:\n", "* `id`: the identifier of the data entity (i.e. building/route/route set).\n", "* `trust_value`: the beta trust value calculated from the votes for the data entity.\n", "* The remaining columns provide the provenance network metrics calculated from the dependency provenance graph of the entity." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
trust_valueentitiesagentsactivitiesnodesedgesdiameterassortativityaccacc_e...mfd_e_amfd_e_agmfd_a_emfd_a_amfd_a_agmfd_ag_emfd_ag_amfd_ag_agmfd_derpowerlaw_alpha
id
Route41053.00.83333390615263-0.2722070.8910910.809409...102000002-1.00000
RouteSet9042.10.6000006039152-0.4129740.8796300.847222...101000001-1.00000
Building19305.00.42857160410132-0.5270460.9012350.822222...1010000013.19876
Building1136.00.42857160410132-0.5270460.9012350.822222...1010000013.19876
Building24156.00.83333390514243-0.3639370.8380340.757639...202200002-1.00000
\n", "

5 rows × 23 columns

\n", "
" ], "text/plain": [ " trust_value entities agents activities nodes edges \\\n", "id \n", "Route41053.0 0.833333 9 0 6 15 26 \n", "RouteSet9042.1 0.600000 6 0 3 9 15 \n", "Building19305.0 0.428571 6 0 4 10 13 \n", "Building1136.0 0.428571 6 0 4 10 13 \n", "Building24156.0 0.833333 9 0 5 14 24 \n", "\n", " diameter assortativity acc acc_e ... \\\n", "id ... \n", "Route41053.0 3 -0.272207 0.891091 0.809409 ... \n", "RouteSet9042.1 2 -0.412974 0.879630 0.847222 ... \n", "Building19305.0 2 -0.527046 0.901235 0.822222 ... \n", "Building1136.0 2 -0.527046 0.901235 0.822222 ... \n", "Building24156.0 3 -0.363937 0.838034 0.757639 ... \n", "\n", " mfd_e_a mfd_e_ag mfd_a_e mfd_a_a mfd_a_ag mfd_ag_e \\\n", "id \n", "Route41053.0 1 0 2 0 0 0 \n", "RouteSet9042.1 1 0 1 0 0 0 \n", "Building19305.0 1 0 1 0 0 0 \n", "Building1136.0 1 0 1 0 0 0 \n", "Building24156.0 2 0 2 2 0 0 \n", "\n", " mfd_ag_a mfd_ag_ag mfd_der powerlaw_alpha \n", "id \n", "Route41053.0 0 0 2 -1.00000 \n", "RouteSet9042.1 0 0 1 -1.00000 \n", "Building19305.0 0 0 1 3.19876 \n", "Building1136.0 0 0 1 3.19876 \n", "Building24156.0 0 0 2 -1.00000 \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"collabmap/depgraphs.csv\", index_col='id')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
trust_valueentitiesagentsactivitiesnodesedgesdiameterassortativityaccacc_e...mfd_e_amfd_e_agmfd_a_emfd_a_amfd_a_agmfd_ag_emfd_ag_amfd_ag_agmfd_derpowerlaw_alpha
count14882.00000014882.00000014882.014882.00000014882.00000014882.00000014882.00000014882.00000014882.00000014882.000000...14882.00000014882.014882.00000014882.00000014882.014882.014882.014882.014882.00000014882.000000
mean0.76670613.3846930.06.79337520.17806739.1188682.771267-0.3637910.8061230.762426...1.5454240.01.7425750.9871660.00.00.00.01.802782-0.226061
std0.11530117.1656770.07.24770624.14788859.6485350.9172980.2386580.2036270.200090...1.0440790.01.0126151.3917630.00.00.00.00.9389741.590865
min0.1538462.0000000.00.0000002.0000001.0000001.000000-1.0000000.0000000.000000...0.0000000.00.0000000.0000000.00.00.00.01.000000-1.000000
25%0.7500005.0000000.02.0000007.00000010.0000002.000000-0.5000000.8203090.757639...1.0000000.01.0000000.0000000.00.00.00.01.000000-1.000000
50%0.8000009.0000000.05.00000014.00000024.0000003.000000-0.3308350.8497900.809409...1.0000000.02.0000000.0000000.00.00.00.02.000000-1.000000
75%0.83333314.0000000.09.00000022.00000040.0000003.000000-0.2512560.8800830.854159...2.0000000.02.0000002.0000000.00.00.00.02.000000-1.000000
max0.965517178.0000000.070.000000248.000000706.00000013.0000000.4940081.0000001.000000...13.0000000.012.00000013.0000000.00.00.00.012.0000004.674298
\n", "

8 rows × 23 columns

\n", "
" ], "text/plain": [ " trust_value entities agents activities nodes \\\n", "count 14882.000000 14882.000000 14882.0 14882.000000 14882.000000 \n", "mean 0.766706 13.384693 0.0 6.793375 20.178067 \n", "std 0.115301 17.165677 0.0 7.247706 24.147888 \n", "min 0.153846 2.000000 0.0 0.000000 2.000000 \n", "25% 0.750000 5.000000 0.0 2.000000 7.000000 \n", "50% 0.800000 9.000000 0.0 5.000000 14.000000 \n", "75% 0.833333 14.000000 0.0 9.000000 22.000000 \n", "max 0.965517 178.000000 0.0 70.000000 248.000000 \n", "\n", " edges diameter assortativity acc acc_e \\\n", "count 14882.000000 14882.000000 14882.000000 14882.000000 14882.000000 \n", "mean 39.118868 2.771267 -0.363791 0.806123 0.762426 \n", "std 59.648535 0.917298 0.238658 0.203627 0.200090 \n", "min 1.000000 1.000000 -1.000000 0.000000 0.000000 \n", "25% 10.000000 2.000000 -0.500000 0.820309 0.757639 \n", "50% 24.000000 3.000000 -0.330835 0.849790 0.809409 \n", "75% 40.000000 3.000000 -0.251256 0.880083 0.854159 \n", "max 706.000000 13.000000 0.494008 1.000000 1.000000 \n", "\n", " ... mfd_e_a mfd_e_ag mfd_a_e mfd_a_a \\\n", "count ... 14882.000000 14882.0 14882.000000 14882.000000 \n", "mean ... 1.545424 0.0 1.742575 0.987166 \n", "std ... 1.044079 0.0 1.012615 1.391763 \n", "min ... 0.000000 0.0 0.000000 0.000000 \n", "25% ... 1.000000 0.0 1.000000 0.000000 \n", "50% ... 1.000000 0.0 2.000000 0.000000 \n", "75% ... 2.000000 0.0 2.000000 2.000000 \n", "max ... 
13.000000 0.0 12.000000 13.000000 \n", "\n", " mfd_a_ag mfd_ag_e mfd_ag_a mfd_ag_ag mfd_der powerlaw_alpha \n", "count 14882.0 14882.0 14882.0 14882.0 14882.000000 14882.000000 \n", "mean 0.0 0.0 0.0 0.0 1.802782 -0.226061 \n", "std 0.0 0.0 0.0 0.0 0.938974 1.590865 \n", "min 0.0 0.0 0.0 0.0 1.000000 -1.000000 \n", "25% 0.0 0.0 0.0 0.0 1.000000 -1.000000 \n", "50% 0.0 0.0 0.0 0.0 2.000000 -1.000000 \n", "75% 0.0 0.0 0.0 0.0 2.000000 -1.000000 \n", "max 0.0 0.0 0.0 0.0 12.000000 4.674298 \n", "\n", "[8 rows x 23 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Labelling data\n", "Based on its trust value, we categorise the data entity into two sets: _trusted_ and _uncertain_. Here, the threshold for the trust value, whose range is [0, 1], is chosen to be 0.75." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
trust_valueentitiesagentsactivitiesnodesedgesdiameterassortativityaccacc_e...mfd_e_agmfd_a_emfd_a_amfd_a_agmfd_ag_emfd_ag_amfd_ag_agmfd_derpowerlaw_alphalabel
id
Route41053.00.83333390615263-0.2722070.8910910.809409...02000002-1.00000Trusted
RouteSet9042.10.6000006039152-0.4129740.8796300.847222...01000001-1.00000Uncertain
Building19305.00.42857160410132-0.5270460.9012350.822222...010000013.19876Uncertain
Building1136.00.42857160410132-0.5270460.9012350.822222...010000013.19876Uncertain
Building24156.00.83333390514243-0.3639370.8380340.757639...02200002-1.00000Trusted
\n", "

5 rows × 24 columns

\n", "
" ], "text/plain": [ " trust_value entities agents activities nodes edges \\\n", "id \n", "Route41053.0 0.833333 9 0 6 15 26 \n", "RouteSet9042.1 0.600000 6 0 3 9 15 \n", "Building19305.0 0.428571 6 0 4 10 13 \n", "Building1136.0 0.428571 6 0 4 10 13 \n", "Building24156.0 0.833333 9 0 5 14 24 \n", "\n", " diameter assortativity acc acc_e ... \\\n", "id ... \n", "Route41053.0 3 -0.272207 0.891091 0.809409 ... \n", "RouteSet9042.1 2 -0.412974 0.879630 0.847222 ... \n", "Building19305.0 2 -0.527046 0.901235 0.822222 ... \n", "Building1136.0 2 -0.527046 0.901235 0.822222 ... \n", "Building24156.0 3 -0.363937 0.838034 0.757639 ... \n", "\n", " mfd_e_ag mfd_a_e mfd_a_a mfd_a_ag mfd_ag_e mfd_ag_a \\\n", "id \n", "Route41053.0 0 2 0 0 0 0 \n", "RouteSet9042.1 0 1 0 0 0 0 \n", "Building19305.0 0 1 0 0 0 0 \n", "Building1136.0 0 1 0 0 0 0 \n", "Building24156.0 0 2 2 0 0 0 \n", "\n", " mfd_ag_ag mfd_der powerlaw_alpha label \n", "id \n", "Route41053.0 0 2 -1.00000 Trusted \n", "RouteSet9042.1 0 1 -1.00000 Uncertain \n", "Building19305.0 0 1 3.19876 Uncertain \n", "Building1136.0 0 1 3.19876 Uncertain \n", "Building24156.0 0 2 -1.00000 Trusted \n", "\n", "[5 rows x 24 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trust_threshold = 0.75\n", "df['label'] = df.apply(lambda row: 'Trusted' if row.trust_value >= trust_threshold else 'Uncertain', axis=1)\n", "df.head() # The new label column is the last column below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having used the trust valuue to label all the data entities, we remove the `trust_value` column from the data frame." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(14882, 23)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We will not use trust value from now on\n", "df.drop('trust_value', axis=1, inplace=True)\n", "df.shape # the dataframe now have 23 columns (22 metrics + label)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering data\n", "We split the dataset into three: buildings, routes, and route sets." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((5175, 23), (4997, 23), (4710, 23))" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_buildings = df.filter(like=\"Building\", axis=0)\n", "df_routes = df.filter(regex=\"^Route\\d\", axis=0)\n", "df_routesets = df.filter(like=\"RouteSet\", axis=0)\n", "df_buildings.shape, df_routes.shape, df_routesets.shape # The number of data points in each dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification on unbalanced (original) data\n", "\n", "We now run the cross validation tests on the three unbalanced datasets (`df_buildings`, `df_routes`, and `df_routesets`) using all the features (`combined`), only the generic network metrics (`generic`), and only the provenance-specific network metrics (`provenance`). Please refer to [Cross Validation Code.ipynb](Cross%20Validation%20Code.ipynb) for the detailed description of the cross validation code." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from analytics import test_classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building Classification\n", "\n", "We test the classification of buildings, collect individual accuracy scores `results` and the importance of every feature in each test in `importances` (both are Pandas Dataframes). These two tables will also be used to collect data from testing the classification of routes and route sets later." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 95.69% ±0.0513 <-- combined\n", "Accuracy: 95.73% ±0.0493 <-- generic\n", "Accuracy: 95.68% ±0.0499 <-- provenance\n" ] } ], "source": [ "# Cross validation test on building classification\n", "res, imps = test_classification(df_buildings)\n", "\n", "# adding the Data Type column\n", "res['Data Type'] = 'Building'\n", "imps['Data Type'] = 'Building'\n", "\n", "# storing the results and importance of features\n", "results_unb = res\n", "importances_unb = imps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Route Classification" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 96.00% ±0.0527 <-- combined\n", "Accuracy: 95.84% ±0.0539 <-- generic\n", "Accuracy: 95.33% ±0.0552 <-- provenance\n" ] } ], "source": [ "# Cross validation test on route classification\n", "res, imps = test_classification(df_routes)\n", "\n", "# adding the Data Type column\n", "res['Data Type'] = 'Route'\n", "imps['Data Type'] = 'Route'\n", "\n", "# storing the results and importance of features\n", "results_unb = results_unb.append(res, ignore_index=True)\n", "importances_unb = importances_unb.append(imps, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Route 
Set Classification" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 94.84% ±0.0594 <-- combined\n", "Accuracy: 94.34% ±0.0629 <-- generic\n", "Accuracy: 94.42% ±0.0606 <-- provenance\n" ] } ], "source": [ "# Cross validation test on route classification\n", "res, imps = test_classification(df_routesets)\n", "\n", "# adding the Data Type column\n", "res['Data Type'] = 'Route Set'\n", "imps['Data Type'] = 'Route Set'\n", "\n", "# storing the results and importance of features\n", "results_unb = results_unb.append(res, ignore_index=True)\n", "importances_unb = importances_unb.append(imps, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " ## Classification on balanced data\n", " \n", " We repeat the same experiements but now with balanced datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Balancing Data\n", "This section explore the balance of each of the three datasets and balance them using the [SMOTE Oversampling Method](https://www.jair.org/media/953/live-953-2037-jair.pdf)." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from analytics import balance_smote" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Buildings" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Trusted 4491\n", "Uncertain 684\n", "Name: label, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_buildings.label.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Balancing the building dataset:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original data shapes: (5175, 22) (5175,)\n", "Balanced data shapes: (8982, 22) (8982,)\n" ] } ], "source": [ "df_buildings = balance_smote(df_buildings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Routes" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Trusted 3908\n", "Uncertain 1089\n", "Name: label, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_routes.label.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Balancing the route dataset:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original data shapes: (4997, 22) (4997,)\n", "Balanced data shapes: (7816, 22) (7816,)\n" ] } ], "source": [ "df_routes = balance_smote(df_routes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Route Sets" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Trusted 3019\n", "Uncertain 1691\n", "Name: label, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "df_routesets.label.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Balancing the route set dataset:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original data shapes: (4710, 22) (4710,)\n", "Balanced data shapes: (6038, 22) (6038,)\n" ] } ], "source": [ "df_routesets = balance_smote(df_routesets)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building Classification\n", "\n", "We test the classification of buildings, collect individual accuracy scores `results` and the importance of every feature in each test in `importances` (both are Pandas Dataframes). These two tables will also be used to collect data from testing the classification of routes and route sets later." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 89.92% ±0.0595 <-- combined\n", "Accuracy: 89.94% ±0.0588 <-- generic\n", "Accuracy: 89.79% ±0.0608 <-- provenance\n" ] } ], "source": [ "# Cross validation test on building classification\n", "res, imps = test_classification(df_buildings)\n", "\n", "# adding the Data Type column\n", "res['Data Type'] = 'Building'\n", "imps['Data Type'] = 'Building'\n", "\n", "# storing the results and importance of features\n", "results_bal = res\n", "importances_bal = imps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Route Classification" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 96.74% ±0.0394 <-- combined\n", "Accuracy: 96.57% ±0.0397 <-- generic\n", "Accuracy: 96.09% ±0.0416 <-- provenance\n" ] } ], "source": [ "# Cross validation test on route classification\n", "res, imps = test_classification(df_routes)\n", "\n", "# adding the Data Type column\n", 
"res['Data Type'] = 'Route'\n", "imps['Data Type'] = 'Route'\n", "\n", "# storing the results and importance of features\n", "results_bal = results_bal.append(res, ignore_index=True)\n", "importances_bal = importances_bal.append(imps, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Route Set Classification" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 95.74% ±0.0492 <-- combined\n", "Accuracy: 95.23% ±0.0531 <-- generic\n", "Accuracy: 95.36% ±0.0503 <-- provenance\n" ] } ], "source": [ "# Cross validation test on route classification\n", "res, imps = test_classification(df_routesets)\n", "\n", "# adding the Data Type column\n", "res['Data Type'] = 'Route Set'\n", "imps['Data Type'] = 'Route Set'\n", "\n", "# storing the results and importance of features\n", "results_bal = results_bal.append(res, ignore_index=True)\n", "importances_bal = importances_bal.append(imps, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining the results" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Merging the two result sets\n", "results_unb['Balanced'] = False\n", "results_bal['Balanced'] = True\n", "results = results_unb.append(results_bal, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Charting the accuracy scores" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import seaborn as sns\n", "sns.set_style(\"whitegrid\")\n", "sns.set_context(\"talk\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Converting the accuracy score from [0, 1] to percentage, i.e [0, 100]:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], 
"source": [ "results.Accuracy = results.Accuracy * 100" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAwsAAAEMCAYAAACLPCorAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3XlcVdX+//EXo4hDTogoCqaB5YiopOKE8whpOZCmRWqY\naWkOUE45Ikqm0U1NuzlUinETh6umOZU41rWupVkOFxRnUdGQ6fz+8Nf5Sh71gHA44Pv5ePh4uIe1\n1uesfc46fM7ee20bg8FgQERERERE5G9sCzoAERERERGxTkoWRERERETEJCULIiIiIiJikpIFERER\nERExScmCiIiIiIiYpGRBRERERERMsi/oAKRw8vb2zrZcunRpmjZtyrvvvkvFihXNqmPAgAHUr1+f\nt99+Oz9CfCTe3t4sXryYli1bPnJd+/bt46WXXjK57aWXXuKdd96xaDwiYj6NdTmr60G2bduGu7v7\nI7cjIpalZEFybe7cufj5+ZGVlcWlS5eIiIhg9OjRLF++vKBDs0rffvstjo6O2dYVL168gKIREXNp\nrDPPd999Z/z/9OnTSUtLY8qUKcZ15cqVK4iwROQRKVmQXCtdujQuLi4AuLq6MmrUKPr06cPly5cp\nX758AUdnfSpUqECxYsUKOgwRySGNdeb5q48AnJycsLGxybZORAon3bMgeebvv5JnZGQQGRlJ69at\nqV27Nv7+/rz//vsmyz5s3/HjxzN16lTGjh2Lj48PTZs2ZeHChcbtWVlZREdH06pVK3x8fHjllVc4\nffq0cftXX31Fhw4dqF+/Pr169WLPnj3Z2o6IiMDPz4+mTZvy1VdfPfB1BgQE4O3tfc+/gICAHPXX\n3S5cuMBbb72Fn58fderUoWPHjvz73/82ue+BAwfo2bMn9erVo1WrVnz44Yf89SB2g8HAokWLaN26\nNT4+PvTv358jR47kOi4RuZfGutyPdQMGDOC9996jY8eONG/enMTERLy9vdm1a5dxn127dmW7pOn8\n+fOMGDECHx8fWrRoweTJk7l582auYxCRnNGZBckTKSkpLF68GH9/f+MvbYsWLeLf//43c+bMoVKl\nSuzevZspU6bQqlUrGjZsmK28OfuuWrWKoUOHsnbtWjZt2sTcuXNp3bo13t7efPjhh3zxxRdMnTqV\nmjVrEhUVxbBhw9iwYQM7d+5k9uzZvPfee9SqVYtdu3YxdOhQ1qxZg7e3NwsWLGDDhg3MmzePJ554\nIttpc1PWrFlDZmbmPevt7Oxy3X9jx47F3t6e5cuX4+DgwJIlS5gwYQJt2rTBycnJuF9mZibDhw+n\nb9++zJ8/n99//52RI0dSq1Yt2rVrx+eff86qVauYNm0aVapUIS4ujgEDBrB582b9wieSBzTWPdpY\nBxATE8OiRYsoUaLEQ+9hMBgMDB8+nGrVqhETE8PNmzeZOXMm4eHhfPDBB48Uh4iYySCSC15eXoa6\ndesaGjRoYKhfv77B29vbUKdOHcP+/fuN+3zzzTeGffv2ZSvXokULw6pVqwwGg8HQv39/Q2RkpFn7\njhs3ztC1a9ds25s0aWKIiYkxZGVlGZ599lnDihUrjNsuXrxomDVrluHGjRuG4OBgw+LFi7OVHTVq\nlCE8PNxY9q92DAaD4fjx4wYvLy/Dzp07c9s92ezdu9fg5eVlaNCgQbZ/vXr1Mu7z2WefGRISEozL\np06dMnh5eRlO
njxpMBgMxniuXr1q8PLyMixfvtyQlZVlMBgMhkOHDhnOnTtnMBgMhlatWhk2btyY\nrf0+ffoYoqOj8+S1iDxuNNblzrhx4wxvvvnmPev79+9vGDp0aLZ1f49h586dBi8vL4PBYDDs2bPH\n4Ovra0hLSzNuP3HihMHLy8uQlJSU53GLyL10ZkFybdKkSTRq1Ai482vbjh07CAkJYenSpTRq1Ih2\n7dqxd+9eZs+ezYkTJzh69Cjnz583+UuVOft6enpmK1OiRAkyMjK4evUqV65coW7dusZtFSpUYNy4\ncQD8/vvv/PTTT0RHRxu3p6enU69ePWPZWrVqGbfVrFkTZ2fn+77url27cvbs2XvWV65cmQ0bNty3\nXExMDA4ODsblu2927tevH5s2bWLJkiWcPHmSX375BeCevipTpgz9+/dn6tSpfPzxx7Rq1YoePXrg\n6urKzZs3SUpKYvz48YSHhxvLpKWlUbVq1fvGJSIPprEuu4eNdQ+TkxmR/vjjD1JSUmjSpMk9206e\nPEmlSpVyHYeImEfJguSai4sLHh4exuXatWvzww8/sGzZMho1asT8+fNZuXIlPXv2pEuXLoSHhzNg\nwACTdZmz791/aP/FYDCYXH+3zMxMxo4de8/UgH+fmehuDzrNvmjRIjIyMu5Zb2//4I9T1apVTd7g\nnJWVRUhICJcuXaJLly40b94cFxcXevfubbKeCRMm8OKLL7Jt2zZ27tzJwIEDmTRpEl27dgUgMjLy\nnikMH/QHgYg8mMa67B421j3M3ZdWmnJ34pSRkUG1atVYvHjxPfvp0koRy1CyIHkqMzOTrKwsAD79\n9FMmTZpEUFAQANevX+fy5cvGm3HvlpN9/65UqVKUL1+eX3/9lXr16gFw7do1OnbsyMqVK6lRowZn\nz57N9mUfFRVFhQoVGDBgAC4uLhw+fNhYNiEhgRs3bty3vSpVqpjZG+b55Zdf2LdvHzt27MDNzQ2A\nnTt3mtz34sWLREdHM3bsWAYPHszgwYOZOHEiGzdupF+/fri4uHDhwgU6dOhgLPPOO+/QrFkzYzIh\nIo9OY13ecXBwICUlxbickJBg/H+NGjU4d+4cpUqVMk69+scffzBnzhymTJmiH0JELEDJguTa9evX\nuXjxInDnVPeWLVvYu3evcWYPV1dXdu7cScOGDUlOTiYqKor09HTS0tLuqSsn+5oyaNAgPvzwQ9zc\n3HB3d2fevHm4urry5JNP8uqrrzJ69GiqV6/Os88+y7fffsvixYtZuHAhNjY2vPTSS0RHR1O1alUq\nVarE9OnTsbW13ERhLi4u2NnZsXHjRjp16sTvv/9uvPHw76+/TJkybN26lbS0NIYMGcL169c5ePAg\nrVq1AuDVV19lwYIFlC9fntq1a7N69WrjTc4ikjsa6/JX3bp1WblyJbVq1eLcuXMsW7bMuK158+bU\nqFGDUaNGMWbMGAwGAxMnTsTR0dHsh+KJyKNRsiC5Nnr0aOP/HR0d8fT0ZOLEiXTu3BmAWbNmMXny\nZLp160aFChXo0qULpUuXNjmVZ072NeWVV14hJSWFsLAwbt26RZMmTfjoo4+wsbGhffv2vPPOO3zy\nySe89957VK1aldmzZxtP1Q8ePJjbt28TFhZGZmYmQ4YM4bfffsuDHjKPq6srU6ZMITo6mvnz51Ot\nWjVef/11PvjgA/773//y9NNPG/d1cHBg4cKFzJgxg+eeew5HR0c6duzIyJEjgTtPhP7zzz+JiIjg\nypUr1KxZk3/84x/ZrlMWkZzRWJe/Jk6cyIQJE+jRowdeXl68+eabxj63tbXlo48+Yvr06fTv3x8H\nBwf8/f2z3ZclIvnLxmDOuU8REREREXnsFPz5RxERERERsUoWTRbi4+MJCgrCx8eHPn36cPjwYeDO\n0xlfe+01GjdujL+/P3PnzjXeOCYiIiIiIgXDYslCYmIioaGhBAcHc+DAAUJDQx
kyZAgXL15k2rRp\nVKtWjfj4eNasWcPGjRuJi4uzVGgiIiIiImKCxZKFXbt24eXlRe/evbG3t6d169bUq1ePTZs2cerU\nqWzT0Nna2pqcj15ERERERCzHYrMhZWVl3fMgFltbW06fPk1ISAgTJkzgiy++IDMzk+eee844y4Q5\nDh06lNfhiog8Ml9f30euQ+ObiFijvBjfpHCwWLLg7+/PnDlz2LRpE23btiU+Pp74+HjjPMlDhw4l\nJCSEM2fO8Nprr/Hll1/St29fs+uvXbt2foUuIlKgNL6JiEhBsejUqTt27CAqKorz58/j7++PnZ0d\njo6OrFu3jgMHDhgfSb969Wq+/PJLYmNjzar30KFDynBFpEjS+CYiIgXJYmcWUlJScHNzy3bjcu/e\nvWnTpg3p6emkp6cbkwU7Ozvs7OwsFZqIiIiIiJhgsRuck5OT6du3L0eOHCEtLY2VK1eSlJRE//79\nqVSpEhEREaSlpZGYmMjSpUvp2rWrpUITERERERETLHZmwd3dncmTJ/PGG2+QnJxM7dq1Wbp0KaVK\nlWLRokXMmDEDf39/SpQowfPPP89LL71kqdBERERERMQEi96zkF90Ta+IFFUa30REpCBZ9AnOIiIi\nIiJSeChZEBERERERk5QsiIiIiIiISUoWRERERETEJCULIiIiIiJikpIFERERERExScmCiIiIiIiY\npGRBRERERERMUrIgIiIiIiImKVkQERERERGTlCyIiIiIiIhJShZERERERMQkJQsiIiIiImKSkgUR\nERERETFJyYKIiIiIiJikZEFERERERExSsiAiIiIiIiYpWRAREREREZPsCzoAERGRoqL55H5m7+t4\n6DK2tzKNy1nOdqT5ljer7PeTv8hxbCIiuaEzCyIiIiIiYpLOLIiIiBQAc88iiIgUJCULIiIiRVhI\nSAinTp0yLnt6erJkyZKCC0hEChUlCyIiIoVM9NbPzd63Yb+2NARiZy6hZ1hIjsq/3i44N+GJSBGi\nZEFERKQI27r4K65fSgbuJAylK5Sh3eBeBRyViBQWShZERESKMCUGIvIoNBuSiIiIiIiYpGRBRERE\nRERMUrIgIiIiIiImKVkQERERERGTLJosxMfHExQUhI+PD3369OHw4cMApKWlMXXqVPz8/PDz8+Od\nd94hLS3NkqGJiIiIiMjfWCxZSExMJDQ0lODgYA4cOEBoaChDhgzh4sWLREVFcfz4cTZv3szmzZv5\n/fffWbp0qaVCExEREREREyw2dequXbvw8vKid+/eALRu3Zp69eqxadMmVq1aRUxMDGXKlAFg/vz5\nZGRkWCo0ERERMeFK6g2z9x01bAQJp/9nXK7qUY2oj+abVbacU6kcxyYilmGxZCErKwsnJ6ds62xt\nbdm+fTuZmZkcPnyYYcOG8eeff9KtWzdGjRqVo/pTU1PzMlyxEsOGDeP06dPGZQ8PDz766KMCjEjE\nfH8f83JL45sUdXqPFz55Nb6J9bNYsuDv78+cOXPYtGkTbdu2JT4+nvj4eBo0aEB6ejrbt29nzZo1\n3Lx5k6FDh1KqVCmGDRtmdv1HjhzJx+glL41YN8f8nasB1SritPsCqS0qcow/aTvrZbOKzu/+du4C\nFMkjvr6+eVKPxjcpDMw9i2CK3uOFT16Nb2L9LJYseHp6Mm/ePKKiopg0aRL+/v506tSJpKQksrKy\nePPNNyldujSlS5fm5ZdfZvny5TlKFmrXrp2P0UueWmf+ro6HLmN7KxMAp90XyHK2I823vFll9Z6Q\nokLv5UIkB+Ob/B+9x0Wsl8WShZSUFNzc3IiLizOu6927Ny+99BIHDx7MNvtRZmZmjuvX6bCiydzE\nwJScvCdCQkI4deqUcdnT05MlS5bkum2RvKTxTYo6vcdFrJfFkoXk5GT69u3LihUreOqpp4iJiSEp\nKYmAgADatWtHVFQUUVFR/Pnnn3z22Wd079
7dUqFJERW99XOz923Yry0NgdiZS+gZFpKj8q+3C85N\neCIiIiJWz2LJgru7O5MnT+aNN94gOTmZ2rVrs3TpUpydnZk5cyYRERF06dKF9PR0goKCeOWVVywV\nWoHSL9rWYevir7h+KRm4kzCUrlCGdoN7FXBUIiKS3/Q9LPJgFksWAAIDAwkMDLxnfcmSJZk6daol\nQ3moRxk8rPEXbQ2GD6bEQESk6Gg+uZ/5O1cFqv7fRBpHuWV2+e8nf5G7AEUKEYsmCwXNUoNHsL/5\nl1A9yi/aOZn/OjJ6HgAvdA0iZsPXOSqv+a9FRKSoepSJNEQeB49VspATlho8LPWL9t0Py3mha1CO\nHpYjIiJSVCkxEHkwJQv3UdQGDyUGIiIiIpJTtgUdgIiIiIiIWCclCyIiIiIiYpKSBRERERERMUn3\nLIhY2N3T2GoKWxERyWuaLl3ykpIFkUeUkylsIfs0tpHR83JUXtPYiog8njRduhQUJQsiFnb3NLaj\nho3QTFUiIo8BS/7ar+nSJS8pWRCxMA3YIiJFQ/TWz83et2G/tjTkzgNYe4aF5Lh8vxw88FXfM5KX\nlCyIiIiI5LOti7/i+qVk4E7CULpCGYs9mFXkUShZEBEREclnSgyksNLUqSIiIiIiYpKSBRERERER\nMUnJgoiIiIiImKRkQURERERETNINziLyUHoaqIiIyONJyYLIY6r55H7m71wVqFoRp90XSG1RkaPc\nylH57yd/kfMARUREpMApWRCRh3I8dBnbW5kAOO2+QJazHWm+5Qs4KhEREclvShZE5KGUGIiIiDye\ndIOziIiIiIiYpGRBRERERERMUrIgIiIiIiImKVkQERERERGTdIOziDyW9OwIERGRh1OyICJFxpXU\nG2bvGxk9D4AXugYRs+HrHJUv51Qq58GJiIgUQmYlC7/++itPP/10fsciIqJf/EVERKyIWcnCCy+8\ngIeHB926daNbt25UrVo1v+MSkSIkeuvnZu/bsF9bGgKxM5fQMywkR+X7+Xc3u51Rw0aQcPp/wJ2z\nC1U9qhH10Xyzy4uIiDwOzEoWvv/+ezZv3syGDRv48MMPqVu3Lt27d6dz586UK1cuv2MUkcfI1sVf\ncf1SMnAnYShdoQztBvfK83aUGIiIiDycWcnCE088Qe/evenduzcXLlxgy5YtfPvtt8yZM4dGjRoR\nGBhIhw4dcHR0zO94RaSIy4/EQERERHInx1OnpqamcvPmTVJSUkhPTycrK4tFixbRpk0bdu7c+cCy\n8fHxBAUF4ePjQ58+fTh8+HC27VlZWQwYMICIiIichiUiIiIiInnMrGTh7NmzLFmyhJ49e9KxY0e+\n/fZbunXrxs6dO1myZAlxcXH06tWLsLCw+9aRmJhIaGgowcHBHDhwgNDQUIYMGcLFixeN+yxdupSD\nBw8++qsSEREREZFHZlayEBAQwOrVqwkICGDTpk2sWrWKAQMGUL58eeM+jRs3pk6dOvetY9euXXh5\nedG7d2/s7e1p3bo19erVY9OmTQAcPXqU2NhY2rdv/4gvSUREREQed4mJiXh7e9O3b997tk2cOBFv\nb28SExPvW/7GjRv079//vtsDAwO5evVqnsRqzcy6Z2H16tXUq1cv27qUlBRKlixpXG7RogUtWrS4\nbx1ZWVk4OTllW2dra8vp06dJS0tj3LhxTJ06lZiYmJzEb5SampqrcvJw6lvroWNhHcw9Dn8f8/K7\nPZHCSu9x62Hp8S2/OTg4kJiYSEJCgnE2z9u3b7Nnzx4cHBweWPbatWv3XDJ/t7Vr1+ZprNbKrGTB\n3d2d1157jWeeeYYRI0YA0KlTJ+rXr8+MGTN44oknHlqHv78/c+bMYdOmTbRt25b4+Hji4+OpWLEi\nc+fOxd/fH19f31wnC0eOHMlVOXk49a310LGwDuYeB19fX4u2J1JY6T1uPSw9vuU3Gxsbunfvztq1\naxk+fD
gA27Zto2XLlsa/Obdv384//vEP0tPTcXBw4K233qJp06aMGzeO9PR0AgMDWbVqFQ0bNqRT\np04cOXKE6dOn8+KLL/Ldd9/h4uLCokWL+Oqrr3B0dMTNzY1Zs2ZRrFgxwsLC+N///oeNjQ1PP/00\nU6dOxc7OriC7JMfMShYmT55MSkoKXbt2Na5bsmQJ06ZNY/r06cyePfuhdXh6ejJv3jyioqKYNGkS\n/v7+dOrUiV9//ZX09PRcJwl/qV279sN3WvdITTy2zOrbnNBxyLU8PRY6DrmW558JK2tPHoE+V7mi\n7xnrURTHm8DAQEaOHGlMFmJjYxk5ciQxMTFkZGQwZ84cli9fTrly5Th58iT9+/dn3bp1RERE0Llz\nZ+MZhMzMTJo1a0ZUVFS2+r/99lvWrFnDqlWrKFu2LO+//z7//Oc/efLJJ0lPT+frr78mMzOTSZMm\ncfr0aZ588kmL98GjMCtZ2LNnD6tWraJGjRrGdd7e3rz77ru89NJLZjWUkpKCm5sbcXFxxnW9e/fm\n+vXrXLx4kWbNmgF3Tn/Z2Nhw4sQJFi5caPYLKSynwwoj9a310LGwDpY+DjruUtTpPW49iuKxqFWr\nFsWLF+fw4cNUqlSJixcvUrduXeDOs8QuXbrEyy+/bNzf3t6eP/74Azc3t3vqaty48T3r9uzZQ4cO\nHShbtiwAb731FgAJCQnMmzePl156iaZNm/LSSy8VukQBzEwWihUrxpUrV7IlCwA3b940u6Hk5GT6\n9u3LihUreOqpp4iJiSEpKYnNmzfj7Oxs3G/8+PGULVuWcePGmV23iIiIiMj9BAUFERcXR6VKlejR\no4dxfVZWFk2aNGHBggXGdefOnaNChQqcO3funnru/pv1L/b29tjY2BiXU1JSuHLlCtWqVWPLli3s\n3buXffv28corrxAWFpbtSp3CwKzZkLp06cK7777L7t27uXr1KlevXmXPnj1MmjSJTp06mdWQu7s7\nkydP5o033uDZZ59l06ZNLF261GSni4iIiIjkle7du7N582bWr1+fLVlo1qwZ8fHxHD9+HIADBw7Q\nuXNnbty4gb29PQaDAYPB8MC6mzVrxjfffMONGzcAWLx4MdHR0axYsYKwsDBatGjBmDFj8Pf357ff\nfsu/F5lPzDqzMGbMGK5fv05oaCiZmZnAnZmMnn/+ecaPH292Y4GBgQQGBj5wn1mzZpldn4iIiIjI\nw5QvX546depgMBhwcXExri9WrBgzZsxg7NixZGZmYmNjw4cffkjZsmXJzMykbt26dO7cmWXLlt23\n7pYtW3LixAmCg4MB8PDwYPr06djb23Pw4EG6dOlC8eLFqVSpUqG8csasZMHR0ZGIiAgmTJjAyZMn\ncXBwoGrVqpQoUSK/4xMRERERyTF3d3d+/vln4/LHH3+cbftf29zd3enQocM95e3s7Pjiiy+My8eO\nHcu2/e7lQYMGMWjQoHvqmDdvXq5ityZmJQsA58+f58SJE8YzCxcvXiQtLY0jR44Yp1MVEREREZGi\nw6xkYeXKlcyYMcN4euava7dsbGyoX7++kgURERERkSLIrBuclyxZQmhoKD///DPly5dnx44drF+/\nnlq1atG+ffv8jlFERERERAqAWcnChQsXCAwMxMHBgaeffpr//Oc/1KxZk7CwsEd+mJqIiIiIiFgn\ns5KFMmXKGKeDql69uvGGjipVqpicg1ZERERERAo/s5KFNm3aMHHiRI4ePcqzzz7L2rVr+eGHH1i+\nfLnJp9uJiIiIiEjhZ1ayMH78eGrVqsXRo0cJCAigcePGBAcHs2bNmhw9Z0FERERERAoPs2ZD2r17\nN2PGjOGJJ54AICIigrCwMEqWLIm9vdmzr4qIiIhIEdd8cj+LtPP95C8evpM8MrPOLEycOJFLly5l\nW1emTBklCiIiIiLy2Bs/fjwREREmt8XFxfHiiy/meZsjRoxgwYIFeV7v
35mVLNSpU4ddu3bldywi\nIiIiIkVKjx49WLlyZUGHkWtmnRpwdHQkIiKC6Oho3N3dcXJyyrb9yy+/zJfgRERERETywv79+4mI\niODEiRNUrlyZsLAwfHx8mDNnDlu2bAGgdevWjB8/nlKlSrFgwQKSkpK4fPky+/fvx8PDgylTprBg\nwQIOHTpEzZo1mT9/vnGynzNnztC/f3+OHDmCj48P06ZNo3LlysTGxrJixQpiY2NZsGABp0+f5saN\nG+zfvx83NzfCw8Px9/cH4MCBA8yaNYvTp09TvXp1JkyYQL169QD45ZdfmDhxIr///jt+fn6kpqZa\npN/MPrMwfPhwBg0aRLt27fD398/2T0RERETEWl2+fJnXXnuN4OBgDh48yOjRo3njjTd48803OXHi\nBOvWrWPjxo1cunSJiRMnGsvFxcUxePBg9u/fT6lSpRg4cCDDhg0jPj4eJycnli1bZtx39+7dvPnm\nm+zbt48qVaowatQok7Fs2rSJQYMGsW/fPlq1asXUqVMBOHv2LEOHDiU0NJS9e/fyyiuvMHjwYJKT\nk0lLSyM0NJSOHTty4MABXnjhBfbv35+/nfb/mXVmYfjw4fkdh4iIiIhIvtixYwfVqlWjV69eAAQE\nBLBw4UJefvllVq1aRbly5QAYN24cXbp0YebMmQD4+PjQqFEjAHx9fbGzs6Nhw4YANGrUiBMnThjb\n6N69u3Hft99+myZNmpCUlHRPLA0aNKBp06bGMp9++ikA69evx8/Pj3bt2gHQuXNnPv/8czZv3ky1\natW4ffs2ISEh2Nra0q5dO5599tk87ydTzEoWwsLCHrj9rw4VEREREbE2ly9fplKlStnWeXp6kpGR\nQeXKlY3rqlSpgsFg4Pz588CdCX3+YmdnR+nSpY3Ltra2GAwG4/Ld9TzxxBM4Oztz8eLFe2L5KzEB\nsLe3N9Zx9uxZdu/ebUw4ADIyMvD19cXZ2RkXFxdsbf/voqAqVaqY3wGPwKxk4fbt29mWMzIySExM\n5I8//qBv3775EpiIiIiISF6oWLGiMQH4y1dffYWNjQ1nzpwx/gGfmJiIra2tcdnGxsbsNu6eOfTq\n1avcunWLypUr8/vvv5tV3sXFhS5dujB79mzjuoSEBMqWLcuRI0c4f/48GRkZxtlIz58/j6urq9nx\n5ZZZ9yxERUVl+zd//nxiY2MZOnQoN2/ezO8YRURERERyrVWrVpw5c4a1a9eSmZnJt99+y6effspz\nzz3HnDlzuHLlCteuXWP27Nm0atWKUqVK5biNuLg4Dh8+TGpqKhEREbRq1YoKFSqYXb5r165s376d\n+Ph4DAYDhw4dokePHvz88880bNiQ0qVLs2DBAtLS0ti5cyfff/99jmPMjUd6UEKPHj3o0aMH06ZN\ny6t4RERERKQQs8aHpZUtW5aFCxcyc+ZM3nvvPdzd3YmOjubpp58mMjKSHj16cPv2bdq2bUt4eHiu\n2ggICGDixImcOXOGZs2aMWvWrByV9/T0ZN68eURGRnLq1CnKlStHWFiY8f6GhQsX8u677/LZZ59R\nu3ZtWreDznEAAAAgAElEQVRunas4c+qRkoV///vflChRIq9iERERERHJFz4+Pqxevfqe9VOmTGHK\nlCn3rH/jjTfMXn5QYtCzZ0969uxpsg4vLy+OHTtmXG7evDnNmzc3WU+NGjX44gvLJ2JmJQumpke9\ndesWf/7550NvfhYRERERkcLJrGRh9OjR2ZZtbGxwcHCgTp06eHh45EtgIiIiIiJSsMxKFp577jnO\nnTtHSkoKNWvWBOBf//oXjo6O+RqciIiIiIgUHLNmQ/ruu+/o1KkTGzduNK5bs2YN3bp14+DBg/kW\nnIiIiIiIFByzkoU5c+YwbNgwRowYYVy3cuVKhgwZogeyiYiIiIgUUWYlCydPnqRz5873rO/SpYvZ\nD5oQEREREZHCxaxkwcPDgx07dtyz
fs+ePfc8OltERERERIoGs25wHjZsGKNHj+aHH36gbt26APzy\nyy9s3rxZlyGJiIiIiBRRZiULnTp1okyZMnzxxRfExsbi4OCAp6cny5cvp0GDBvkdo4iIiIgUEtFb\nP7dIO6+3C7ZIO487s5/g7OPjQ/Xq1XF1dQUgPj4eLy+vfAtMRERERKSoePXVV2nfvj19+vQp6FBy\nxKx7Fn7++WfatGnDP//5T+O6iRMn0rlzZ3777bf8ik1EREREpEj45JNPCl2iAGYmC9OnT6dLly6M\nGjXKuG7Lli20a9eOqVOnmt1YfHw8QUFB+Pj40KdPHw4fPgzAuXPnGDZsGH5+fjRv3pypU6eSlpaW\nw5ciIiIiImLali1b6NixI35+foSHh9O3b19iY2NJTk5mzJgxNG3alICAABYtWoTBYABg/PjxTJs2\njeDgYHx8fOjZsydHjhzJVme3bt1o1KgRAwcO5OTJkwAkJibi6+vL+PHjadSoEWvXrmXAgAGsWLEC\ngKSkJF577TUaNmxIixYt+PTTTy3fIWYyK1k4evQoAwcOxMHBwbjOxsaGgQMH8t///teshhITEwkN\nDSU4OJgDBw4QGhrKkCFDuHjxImPGjKFSpUrs2rWLr7/+mp9//pno6OjcvSIRERERkbucPHmSMWPG\nEB4eznfffUe1atX48ccfARg7diw2NjZs27aNZcuWERcXR2xsrLHs2rVrmThxIvHx8Xh4eBAVFQXA\nTz/9RHh4OFOmTCE+Pp42bdowdOhQ0tPTAUhJSaFKlSrs2bOHDh06ZItn5MiRuLi48P3337NixQo+\n+eQTvvvuOwv1Rs6Ydc+Cq6srP/74I1WrVs22/siRI5QpU8ashnbt2oWXlxe9e/cGoHXr1tSrV4+N\nGzdSvHhxQkNDKVasGC4uLnTv3p1vvvkmRy8kNTU1R/uL+dS31kPHwjqYexycnJws2p5IYaX3uPWw\n9PhmKRs2bKB58+a0atUKgKFDh7Jy5UouXbrErl27iI+Px9nZGWdnZ0JCQli1ahW9evUCICAggFq1\nagF3njE2a9YsANasWUNQUBC+vr4ADBo0iGXLlrFv3z48PT0B6N69O46OjtliSUhI4PDhwyxZsoTi\nxYvj4eHBZ599Rrly5SzRFTlmVrIwcOBAJk2axPHjx6lTpw5wZ+rUlStXMnz4cLMaysrKuueNZWtr\nS0JCAosWLcq2fvv27caDYq67TwlJ3lLfWg8dC+tg7nH46wvEUu2JFFZ6j1sPS49vlnLhwgXc3NyM\nyzY2Nri5uWFjY4PBYKB9+/bGbVlZWdl+DL/7j3h7e3vjJUpJSUns27ePr7/+2rg9PT2dpKQkY7JQ\noUKFe2K5fPkyzs7OlCpVyriuZs2aj/4i84lZyUJwcDDFihXjiy++YMWKFTg4OFC9enXeffddihUr\nZlZD/v7+zJkzh02bNtG2bVvi4+OJj4+nYsWKxn0MBgPTp0/nxIkTREZG5uiF1K5d++E7rctRlfL/\nmdW3OaHjkGt5eix0HHItzz8TVtaePAJ9rnJF3zPWo6iON25ubvz000/GZYPBwPnz50lLS8Pe3p49\ne/YYzwBcu3aNmzdvPrROFxcXQkJCGDlypHHdqVOncHV15fLly8CdpOTvXF1duXXrFjdu3DAmDOvX\nr6d06dK0bNnykV5nfjB76tRevXoZT8ccOXKE2NhYZs6cyfXr1+ncufNDy3t6ejJv3jyioqKYNGkS\n/v7+dOrUydhJqampjB07lmPHjrF8+XLKly+foxdS2E6HFSbqW+uhY2EdLH0cdNylqNN73HoU1WPR\nrVs3Pv74Y3bv3k3Tpk1ZsWIF586dw83NDV9fXyIjIxk9ejSpqamMHDmSihUrPvSH66CgIEaNGkW7\ndu145pln2Lp1K2+99RZxcXH3XHp0Nzc3Nxo1asTcuXMJCwsjKSmJWbNmMXv27Lx+2XnC7GTh6tWr\n
xhs+fvvtNxwcHOjQoQMvvviiWeVTUlJwc3MjLi7OuK537960bNmS5ORkXn31VZydnVm1apXZ90GI\niIiIiHWxxoelVa1alZkzZzJp0iRSUlLo2LEjlStXxsHBgaioKGbMmEFAQACZmZm0bNmSSZMmPbTO\nJk2aMH78eMaOHcvZs2epUqUK8+bN48knnyQxMfGBZaOionjvvfdo2bIlxYsX5/XXX6dZs2Z59XLz\n1AOThaysLHbu3ElsbCw7duwgPT2dOnXqYGNjw4oVK6hXr57ZDSUnJ9O3b19WrFjBU089RUxMDElJ\nSQQEBDB06FAqVKjAggULss24JCIiIiLyqM6ePYuXlxfffvutcV2zZs0oW7YsFSpUMM5w9Hd/3cz8\nlzZt2tCmTRvjcrdu3ejWrds95dzd3Tl27Fi2dcuXLzf+39XVtdDM/HnfZGH27NnExcWRnJxMgwYN\nGD16NB06dKBy5crUrl0bZ2fnHDXk7u7O5MmTeeONN0hOTqZ27dosXbqUo0ePsn//fooVK0aTJk2M\n+z/zzDOsXLky969MRERERIQ7NzgPGzaMVatWUaVKFVatWkVaWhoNGjQo6NCs3n2ThaVLl+Lh4cHY\nsWMJCAigZMmSj9xYYGAggYGB96z/e+YlIiIiIpJXGjRowJAhQxgwYADXrl2jRo0afPzxx3ny921R\nd99kYeHChaxfv55JkyYRHh6On58fHTt2pG3btpaMT0RERETkkQ0aNIhBgwYVdBiFzn2f4NyqVSsi\nIyPZs2cPM2fOxN7e3ngjRlZWFtu3b+fPP/+0ZKwiIiIiImJB900W/lK8eHG6d+/OwoUL2bVrF+Hh\n4dSvX5+5c+fi7+9v1t3iIiIiIiJS+Jg9dSrceYLdiy++yIsvvkhCQgLr1q1jw4YN+RWbiIiIiIgU\noIeeWbifqlWrMmzYMCULIiIiIiJFVI7OLIiIiIiIPMiV1BsWaaecUymLtPO4y/WZBRERERERKdqU\nLIiIiIiIiElKFkRERESkSEtMTMTHx4fo6GgaN26Mv78/n332GQABAQFMmDABPz8/Jk2aREZGBvPm\nzaNly5b4+fkxYsQIzp8/T1ZWFq1atWLHjh3Gevfu3Yu/vz+ZmZkkJyczZswYmjZtSkBAAIsWLcJg\nMAAwfvx4pk2bRnBwMD4+PvTs2ZMjR44AkJWVxbx58+jUqRM+Pj60atWKL7/80hh3o0aNWLRoEc2b\nN6dp06bMmDHD2H5SUhKvvfYaDRs2pEWLFnz66afGbVu2bKFbt240atSIgQMHcvLkyVz1nZIFERER\nESnybt26xbFjx9i5cycff/wxH374Ibt27QLg7Nmz7Ny5kzFjxjB//ny2bdvG559/zo4dOyhdujQj\nR47ExsaG7t27Z5vcZ926dXTv3h07OzvGjh2LjY0N27ZtY9myZcTFxREbG2vcd+3atUycOJH4+Hg8\nPDyIiooCIC4uji1btrB8+XJ++OEHRo8ezYwZM7h58yYAN27cIDExke3bt/OPf/yDzz//nB9//BGA\nkSNH4uLiwvfff8+KFSv45JNP+O677/jpp58IDw9nypQpxMfH06ZNG4YOHUp6enqO+03JgoiIiIg8\nFt555x2cnZ2pU6cOQUFBxj/8O3bsiJOTEyVLlmTt2rUMHz4cd3d3ihcvTnh4OD/99BMnTpwgKCiI\nbdu2cfv2bdLS0tiyZQuBgYFcvHiRXbt2ERYWhrOzM+7u7oSEhBATE2NsOyAggFq1auHk5ESXLl04\ndeoUAO3ateOzzz6jQoUKnD9/nmLFinH79m2uXbtmLDt48GAcHR1p0KABTz75JKdPnyYhIYHDhw8z\nduxYihcvjoeHB5999hnPPPMMa9asISgoCF9fXxwcHBg0aBAZGRns27cvx32m2ZBEREREpMgrVqwY\nrq6uxuVKlSpx4sQJACpUqGBcf/nyZSpXrmxcdnZ2pmzZspw/f5
5mzZrh6enJjh07sLOzw83NjVq1\navHTTz9hMBho3769sVxWVhZlypQxLpcrV874f3t7e+MlSunp6UybNo34+Hjc3Nx4+umnjeXvVzYr\nK4vLly/j7OxMqVL/NytUzZo1gTuXJ+3bt4+vv/7auC09PZ2kpKScdpuSBREREREp+v76tf6JJ54A\n7lx6VKlSJU6ePImNjY1xv8qVK3PmzBnq1q0LwM2bN7l69Srly5cHIDAwkE2bNmFra0tgYCAALi4u\n2Nvbs2fPHhwdHQG4du2a8VKiB4mKisJgMLB7926KFSvG2bNn+de//vXQcq6urty6dYsbN24YE4b1\n69dTunRpXFxcCAkJYeTIkcb9T506lS1ZMpcuQxIRERGRx8LcuXNJS0vjp59+Yu3atQQFBd2zT1BQ\nENHR0Zw5c4Y///yTmTNnUrNmTby8vADo3r07e/bsYffu3XTr1g0ANzc3fH19iYyMJDU1leTkZEaM\nGMH777//0JhSUlJwdHTEzs6Oq1evEhERAUBGRsYDy7m5udGoUSPmzp3L7du3OXXqFLNmzcLe3p6g\noCBiYmI4cuQIBoOBb775hm7duunMgoiIiIjI/ZQoUYLWrVvj5OTEO++8Q+PGje/ZZ/Dgwdy+fZvg\n4GBSUlLw8/Nj0aJFxrMP5cqVw8fHh7S0tGy/1EdFRTFjxgwCAgLIzMykZcuWTJo06aExjRgxgnHj\nxtG4cWNKly5NYGAg1apV448//sDb2/uBZaOionjvvfdo2bIlxYsX5/XXX6dZs2bAnRmYxo4dy9mz\nZ6lSpQrz5s3jySefzEl3AWBj+OuCqULs0KFD+Pr6PnS/5pP7WSAaCPbvbpF2+lmonbx+QqKOQ+7l\n5bGw1HGAoncsLPnUUHPHN7EOGt9yR98zD1dYj4W1SExMpG3btvzwww+UKFGioMMpVHQZkoiIiIiI\nmKRkQURERERETNI9CyIiIiJSpLm7u3Ps2LGCDqNQ0pkFERERERExScmCiIiIiIiYpGRBRERERERM\nUrIgIiIiIiImKVkQERERERGTlCyIiIiIiIhJShZERERERMQkJQsiIiIiImKSkgURERERETFJyYKI\niIiIiJhk0WQhPj6eoKAgfHx86NOnD4cPHwbg2rVrvP766/j6+tK6dWtiYmIsGZaIiIiIiJhgsWQh\nMTGR0NBQgoODOXDgAKGhoQwZMoSLFy8yYcIEnJ2d2bNnD/Pnz2fOnDn85z//sVRoIiIiIiJigsWS\nhV27duHl5UXv3r2xt7endevW1KtXj02bNrF161ZGjBhBsWLFqFevHt26dePrr7+2VGgiIiIiImKC\nvaUaysrKwsnJKds6W1tbvv/+e+zt7alatapxffXq1dmyZUuO6k9NTc2TOOVe6lvroWNhHcw9Dn8f\n8/K7PZHCSu9x62Hp8U2sn8WSBX9/f+bMmcOmTZto27Yt8fHxxMfH06BBg3vecE5OTjkeOI4cOZKX\n4cpd1LfWQ8fCOph7HHx9fS3anrWZO3cu58+fNy67uroyevToAoxIrFVhfY8XRZYe38T6WSxZ8PT0\nZN68eURFRTFp0iT8/f3p1KkTly5d4vbt29n2TU1NxdnZOUf1165d++E7rctRlfL/mdW3OaHjkGt5\neix0HHItzz8TFm5v2LBhnD592rjs4eHBRx99ZFbZJd/Fmt1O81e6ARA7cwk9w0IA2Hv1mFllQ/x7\nmt2OVdHnKlf0PWM9LD2+ifWzWLKQkpKCm5sbcXFxxnW9e/cmODiYvXv3cvbsWSpXrgzAyZMnqVmz\nZo7q1+mw/KO+tR46FtbB0sfhYe01n9wvZxVWA6pVxGn3BVJbVOQYf9J21stmFQ327252M1sXf8X1\nS8nAnYShdIUytBvcy6yyeq8/XnS8rYeOhfydxZKF5ORk+vbty4oVK3jqqaeIiYkhKSmJDh06sG3b\nNubOncu0adM4fvw469evZ9
GiRZYKTUTkseJ46DK2tzIBcNp9gSxnO9J8y+d5O+YmBo8qJCSEU6dO\nGZc9PT1ZsmSJRdoWESnqLJYsuLu7M3nyZN544w2Sk5OpXbs2S5cuxdnZmalTpzJp0iRatWqFs7Mz\nY8aMoX79+pYKTUTksZIfiUFeu5J6w+x9I6PnAfBC1yBiNnydo/LlnErlPDgRkceIxZIFgMDAQAID\nA+9ZX6ZMGT744ANLhiIiIkXEqGEjSDj9P+BOwlDVoxpRH80v4KhERIoGiyYLIiIieU2JgYhI/rHY\nQ9lERERERKRwUbIgIiIiIiImKVkQERERERGTlCyIiIiIiIhJShZERERERMQkJQsiIiIiImKSkgUR\nERERETFJyYKIiIiIiJikZEFERERERExSsiAiIiIiIiYpWRAREREREZOULIiIiIiIiElKFkRERERE\nxCQlCyIiIiIiYpKSBRERERERMUnJgoiIiIiImKRkQURERERETFKyICIiIiIiJilZEBERERERk5Qs\niIiIiIiISUoWRERERETEJCULIiIiIiJikpIFERERERExScmCiIiIiIiYpGRBRERERERMUrIgIiIi\nIiImKVkQERERERGTlCyIiIiIiIhJShZERERERMQkiyYLP/zwAz179qRhw4Z07NiRdevWAXD+/Hle\ne+01GjdujL+/P3PnziUrK8uSoYmIiIiIyN9YLFnIzMzk9ddfZ8iQIfzwww9Mnz6d8ePHk5iYyLRp\n06hWrRrx8fGsWbOGjRs3EhcXZ6nQRERERETEBIslC9evX+fKlStkZmZiMBiwsbHBwcEBOzs7Tp06\nRWZmpvFsgq2tLcWKFbNUaCIiIiIiYoK9pRoqW7YswcHBjBo1ijFjxpCVlcX06dNxc3MjJCSECRMm\n8MUXX5CZmclzzz1H586dc1R/ampqPkUu6lvroWNhHcw9Dk5OThZtT3JOfWsddBysh6XHN7F+NgaD\nwWCJhrKysoiMjKR+/foEBASwZ88eRo8ezcqVKzl69CiJiYmEhIRw5swZXnvtNV599VX69u1rVt2H\nDh3K5+hFRHLH19f3kcprfBMRa/Wo45sUDhZLFjZt2sTy5ctZuXKlcd3o0aMpVqwYGzZs4MCBAzg6\nOgKwevVqvvzyS2JjYy0RmoiIiIiImGCxexaSkpJIS0vLts7e3p6kpCTS09NJT083rrezs8POzs5S\noYmIiIiIiAkWSxaaNWvGr7/+yldffYXBYGD//v188803vPnmm1SqVImIiAjS0tJITExk6dKldO3a\n1VKhiYiIiIiICRa7DAng22+/5YMPPiAhIYHKlSszcuRI2rdvz++//86MGTP473//S4kSJXj++ecJ\nDQ3F1lbPjBMRERERKSgWTRZERERERKTw0E/3IiIiIiJikpIFERERERExScmCiIiIiIiYpGRBJAcS\nEhIKOoQiq6D6NiMjg3PnzhVI2yLWRONb/tH4JoWZkoUc8vb2pn79+vj4+NCgQQNat27Nxx9/bFbZ\ns2fP4uPjw61bt9i3bx9+fn733dfPz499+/YB0LVrV3bt2pUn8Rd1dx+fv45Rhw4diImJeeS6f/nl\nF/r165cHURZO1tq3KSkpTJ48GX9/fxo0aEBAQACRkZH3PNflfkaNGsXWrVtz1XZRorHN+lnrZ7Ao\nsNa+1fgm1sC+oAMojGJiYvDy8gLg1KlT9OvXjxo1atC+ffsHlqtcuTI//vhjjtvbsGFDruJ8XN19\nfDIzM9mwYQPjxo2jYcOG1KhRI9f13rhxI9vDAx9H1ti3U6dO5caNG6xdu5by5ctz+vRpRo0aRWpq\nKhMmTHho+atXr+aq3aJIY5v1s8bPYFFhjX2r8U2sgc4sPCJPT08aN27ML7/8AkBsbCw9e/Y0br95\n8ybe3t4kJiaSmJiIt7c3N2/evKeedevW0bZtWxo2bEhkZGS2bQEBAWzfvh248+vHsmXLaNOm
DU2a\nNOHtt982/sJw/vx5QkJCaNiwIb169SIiIoIBAwbk10svFOzs7OjRowdPPPEEx48fB+D06dMMHTqU\nxo0b07ZtWxYvXsxfMwgPGDCAFStWGMuvWLGCAQMGcPnyZQYPHkxycjI+Pj5cvXqV1NRUpk2bRosW\nLfD39zc+WPBxYS19+/PPPxMQEED58uUB8PDwIDw8nNKlSxv3OXDgAL169aJRo0a88MIL/PTTTwBM\nnz6dgwcPMmvWLGbNmpUv/VRYaWyzftbyGSyKrKVvNb6JNVCy8Ih+/fVXDh8+TMuWLXNdx9GjR3n3\n3XeZMWMGe/fuxcbGhuTk5PvuHx8fz7p161i1ahXfffcdW7ZsAe6cbqxUqRLx8fFMmTKF2NjYXMdU\nVKSlpbFs2TJu375NgwYNSEtL4+WXX6ZGjRp8//33LFq0iFWrVvHll18+sJ7y5cuzePFiypQpw48/\n/kjZsmWJiIjgxIkTxMXFERcXx3//+1+zL9soCqylb7t06cLMmTOZOnUqW7du5fLly/j6+jJy5Ejg\nziUyQ4cOJTQ0lL179/LKK68Yv7zfeecdGjVqxPjx4xk/fnye91FhprHN+lnLZ7Aospa+1fgm1kDJ\nQi707duXRo0aUb9+fYKCgnjqqafw9vbOdX2bN2+mRYsW+Pn54ejoyIgRI3B2dr7v/gMHDqRkyZJU\nr14dHx8fTp06xdmzZzl48CBjx46lWLFi1KlTh969e+c6psLsr+NTr149fH192bt3L//85z+pVKkS\nhw4d4saNG4waNQpHR0dq1KjBq6++yr/+9a8ctWEwGIiNjeXtt9+mbNmylCtXjjfeeIPVq1fn06uy\nDtbYt8OHD2fmzJmcPXuW8ePH06xZM/r168evv/4KwPr16/Hz86Ndu3bY29vTuXNnvLy82Lx58yP3\nR1Gjsc36WeNnsKiwxr7V+CbWQPcs5MKXX35pvK7x4sWLhIeHM2rUqFz/6nLp0iVcXV2Ny46Ojri4\nuNx3/3Llyhn/7+DggMFg4MKFCzg7O/PEE08Yt1WuXJn//Oc/uYqpMPvr+CQkJDB8+HDKli1L/fr1\nAbh8+TKurq7Y2//fW79y5co5ni3iypUrpKamMmDAAGxsbIA7XwLp6encvn2bYsWK5d0LsiLW2rcd\nOnSgQ4cOZGVlcezYMRYvXkxISAjbt2/n7Nmz7N69m0aNGhn3z8jIwNfXNzddUKRpbLN+1voZLAqs\ntW81vklBU7LwiFxcXAgODubNN98EwNbWNtuNTA865f6XihUrcuTIEeNyRkYGly9fzlEcbm5u3Lp1\ni2vXrhm/VB/36dKqVq3KRx99RFBQEO7u7oSGhuLm5sb58+fJyMgwDvqJiYlUqFABMP/4lSlTBgcH\nB77++muqVq0KwK1bt7h06VKR/SK9m7X07fnz52nfvj3r1q3Dw8MDW1tbnn76aaZOnUrDhg25cOEC\nLi4udOnShdmzZxvLJSQkULZs2Tztk6JGY5t1s5bPYFFkLX2r8U2shS5DekTXr1/nq6++wsfHB4Dq\n1atz6tQp/vjjD27fvs2iRYuMvx7cT5cuXdizZw/bt28nPT2d6OhoUlJSchSHq6srzZo1IzIyktu3\nb/Pbb7+xZs2aXL+uoqJKlSqEhYURHR3N0aNHqVevHhUqVCAqKoq0tDT++OMPlixZQvfu3YE7N3Xu\n3r2b27dvk5CQQFxcnLEuR0dH0tLSSEtLw87Oju7duzNnzhyuX7/OrVu3mDhx4mN1Xag19K2rqysN\nGjRg4sSJ/PHHH8CdX+6io6Px9vamSpUqdO3ale3btxMfH4/BYODQoUP06NGDn3/+2dh2Tj9vjwON\nbdbPGj6DRZU19K3GN7EWShZy4YUXXjDOxdy+fXvs7OyMWX39+vXp378/AwcOpG3btnh6emY7fW5K\njRo1iIqKYtasWTRp0oQLFy7g4eGR47imT59OQkICzz77
LOHh4Tz77LM4ODjk6jUWJT179qRJkyaE\nh4dja2vLxx9/zPHjx2nevDmDBg3i+eefZ+DAgQAMGTKEjIwMmjVrxogRIwgKCjLW4+3tTc2aNfHz\n8+P06dO88847lC1blq5du9KqVStSUlJ4//33C+plFghr6Nvo6Gi8vLwYPHgwDRo0oHPnzly6dInF\nixdja2uLp6cn8+bNIzIyEl9fX8aNG0dYWBhNmzYFoFu3bixcuNCsaQiLOo1thY81fAaLKmvoW41v\nYg1sDH/N+yWFXnx8PI0bNzaeIo2MjOTcuXPMnTu3gCMTEck9jW0iIgVHZxaKkClTprB69WoMBgOn\nTp1i3bp1tGjRoqDDEhF5JBrbREQKjs4sFCFHjhzhvffe4/jx45QsWZI+ffowbNiwh15XLCJizTS2\niYgUHCULIiIiIiJiki5DEhERERERk5QsiIiIiIiISUoWRERERETEJD3BWaxKQEAAZ86cMS4XL16c\nGjVqEBISQpcuXcyuJyEhgePHjxMQEJDjGAYMGMD+/fvvu33mzJn07Nkzx/WKyONN45uIFEZKFsTq\nvP322wQFBWEwGLhx4wZbtmzh7bffJj09ncDAQLPqCA8Pp379+rn6Ml2wYAHp6ekAbNy4kY8//jjb\n0zhLlSqV4zpFREDjm4gUPkoWxOqULFkSFxcXACpWrEhoaCi3bt0iMjKSzp074+jomK/tlylTxvj/\nUqVKYWtra4xHRORRaHwTkcJG9yxIodCvXz8uXrzIoUOHALhw4QJv/b927iYk6q6N4/g3096NQgYp\nCDy9guAAAAVnSURBVBFpaGGjYjNDE1IRMRSjixKj0ham7STcWFBBiQUtClLIhWipZeALLqa0CCoT\nsalFJGRGvvRiUGq0mAnNnDnPIpjnjryfxXg/NMP9+8As5rwMFwf+F1zzP+eUl+N0OklPT8ftdtPd\n3Q3AyZMnefr0KXV1dRQVFQHw4sULioqKyMzMxGazcfDgQV6/fh1RLLOzs9jtdjo7O39p37dvH9eu\nXcPn8+F0Omlra8PlcmG326mqqgr/mwfw/PlzDhw4gM1mw+1209jYiG4xFvl3Un4TkWimYkFiwvr1\n61mxYgXDw8MAVFRU4Pf7aW5uxuv1YrfbOXPmDDMzM5w6dYqsrCwKCwupqakhEAhQWlpKZmYmXq+X\nlpYWQqEQFy9ejCiWJUuW4Ha76erqCre9ffuWV69ehfcdBwIBmpqaqK2t5cqVK9y7d49Lly4BMDU1\nRUlJCbt378br9VJRUUFdXR0tLS0LXCURiUXKbyISzVQsSMxITEwkEAgAPw8Knj17FqvVSmpqKqWl\npfj9fj59+kRiYiIJCQksX76cNWvWMD09zbFjxygvL2fDhg2kp6eTn5/PmzdvIo4lNzeX/v5+vn79\nCsDt27dxOBwkJycDMDc3R2VlJRkZGbhcLo4fP057ezvBYJCbN2+SnZ1NSUkJKSkp7Nq1i7KyMq5f\nv77gNRKR2KT8JiLRSmcWJGZ8+/aNVatWAT9f29+9e5f6+nrGxsYYHBwEIBgM/jbPYrGQn59Pc3Mz\nQ0NDjI2N8fLlS1avXh1xLA6Hg6SkJO7fv09BQQFdXV0UFxeH+xMSEsjIyAh/37x5M36/n4mJCUZG\nRujr6yMrKyvcHwwG+fHjB7Ozs//3PcsiEn2U30QkWqlYkJgwPj5OIBBg48aNhEIhjh49ytTUFHv3\n7mXbtm1YLBYKCgrmnfv582f279+P1WolJyeHvLw8RkdHuXr1asTxLFq0CI/HQ3d3NxkZGXz48AG3\n2x3uj4uLIy7uvy/uQqFQuH1ubo49e/ZQVlb22+/Gx+uRFPm3UX4TkWimJ1diQltbGxaLhS1btjA4\nOIjP5+PRo0esW7cOgJ6enr+de+fOHZYtW0ZDQ0O4rbe3d8EH7nJzc2lsbMTr9bJjx45frhz8/v07\nIyMjpKWlATAwMEBS
UhIWi4W0tDT6+vpISUn5Jcb+/n6qqqoWFJOIxB7lNxGJZjqzIFEnEAgwOTnJ\nxMQEw8PDVFdXU19fz4kTJ4iPj8disbB48WK6urr4+PEjPT09nDt3Dvh5kwfAypUref/+PV++fCE5\nOZnJyUkeP37M+Pg4t27d4saNG+Hxkdq0aROpqak0NTXh8Xh+6z99+jRDQ0P09vZSU1NDYWEhcXFx\nHD58mNHRUS5cuMDo6Cg9PT1UVlaydu3aBcUjItFP+U1EYo4RiSI7d+40Vqs1/HE6naaoqMg8fPjw\nl3Gtra1m+/btxmazGY/HY9rb201OTo5pbW01xhjz4MEDY7fbTV5engkGg6aystI4HA6TnZ1tDh06\nZDo7O43VajXv3r37n/F0dHQYl8v1t/21tbUmOzvbzMzMhNuePHlirFaraWhoMA6Hw2zdutVUV1eb\nYDAYHvPs2TNTUFBg0tPTTU5Ojrl8+bKZm5uLYMVEJFYov4lILFpkjC4/FolUVVUV09PTnD9/Ptzm\n8/k4cuQIAwMDLF269A9GJyISOeU3EQGdWRCJyMDAAENDQ3R0dNDY2PinwxER+ccov4nIX6lYEIlA\nf38/tbW1FBcXY7PZ/nQ4IiL/GOU3EfkrbUMSEREREZF56TYkERERERGZl4oFERERERGZl4oFERER\nERGZl4oFERERERGZ138AP9myDRKyLwYAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pal = sns.light_palette(\"seagreen\", n_colors=3, reverse=True)\n", "g = sns.factorplot(data=results, x='Data Type', y='Accuracy', hue='Metrics', col='Balanced',\n", " kind='bar', palette=pal, aspect=1.2, errwidth=1, capsize=0.04)\n", "g.set(ylim=(88, 98))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this application, training and testing classifiers on unbalanced data yields accuracy similar to that on balanced data, except for buildings. The heavily skewed `buildings` dataset (87% vs 13%) appears to inflate the performance of the building classifiers, hence the difference: a classifier that always predicts _trusted_ is already 87% accurate on the unbalanced data (compared to the 50% baseline accuracy of random predictions on a balanced dataset)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }