{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Application 3: RRG Chat Messages\n", "Identifying instructions from chat messages in the Radiation Response Game\n", "\n", "* **Goal**: To determine if the provenance network analytics method can identify instructions from the provenance of a chat message.\n", "* **Classification labels**: $\\mathcal{L} = \\left\\{ \\textit{instruction}, \\textit{other} \\right\\} $.\n", "* **Training data**: 69 chat messages manually categorised by HCI researchers.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading data\n", "\n", "The datasets for this application are provided in the folder [`rrg`](rrg). Each CSV file, `depgraphs-`$k$ `.csv` with $k = 1 \\ldots 18$, is a table whose rows correspond to individual chat messages in RRG:\n", "* First column: the identifier of the chat message\n", "* `label`: the manual classification of the message (e.g., _instruction_, _information_, _requests_, etc.)\n", "* The remaining columns provide the provenance network metrics calculated from the dependency provenance graph of the message to the depth of $k$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "filepath = lambda k: \"rrg/depgraphs-%d.csv\" % k" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " label entities agents activities nodes edges diameter \\\n", "21 requests 53 0 1 54 63 6 \n", "20 commissives 57 0 3 60 84 7 \n", "23 assertives 62 0 2 64 84 7 \n", "25 instruction 55 0 2 57 77 6 \n", "24 instruction 52 0 1 53 62 7 \n", "\n", " assortativity acc acc_e ... mfd_e_a mfd_e_ag \\\n", "21 0.197008 0.393939 0.090909 ... 4 0 \n", "20 0.046717 0.403367 0.105051 ... 5 0 \n", "23 0.191105 0.393939 0.090909 ... 5 0 \n", "25 0.128594 0.393939 0.090909 ... 5 0 \n", "24 0.175250 0.394949 0.092424 ... 4 0 \n", "\n", " mfd_a_e mfd_a_a mfd_a_ag mfd_ag_e mfd_ag_a mfd_ag_ag mfd_der \\\n", "21 4 0 0 0 0 0 5 \n", "20 5 0 0 0 0 0 5 \n", "23 3 0 0 0 0 0 6 \n", "25 5 4 0 0 0 0 5 \n", "24 5 0 0 0 0 0 5 \n", "\n", " powerlaw_alpha \n", "21 -1.0 \n", "20 -1.0 \n", "23 -1.0 \n", "25 -1.0 \n", "24 -1.0 \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# An example of reading the data file\n", "df = pd.read_csv(filepath(5), index_col=0)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Labelling data\n", "\n", "Since we are only interested in the _instruction_ messages, we categorise the messages into two sets: _instruction_ and _other_.\n", "\n", "Note: This section is just an example to show the data transformation to be applied to each dataset." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "label = lambda l: 'other' if l != 'instruction' else l" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " label entities agents activities nodes edges diameter \\\n", "21 other 53 0 1 54 63 6 \n", "20 other 57 0 3 60 84 7 \n", "23 other 62 0 2 64 84 7 \n", "25 instruction 55 0 2 57 77 6 \n", "24 instruction 52 0 1 53 62 7 \n", "\n", " assortativity acc acc_e ... mfd_e_a mfd_e_ag \\\n", "21 0.197008 0.393939 0.090909 ... 4 0 \n", "20 0.046717 0.403367 0.105051 ... 5 0 \n", "23 0.191105 0.393939 0.090909 ... 5 0 \n", "25 0.128594 0.393939 0.090909 ... 5 0 \n", "24 0.175250 0.394949 0.092424 ... 4 0 \n", "\n", " mfd_a_e mfd_a_a mfd_a_ag mfd_ag_e mfd_ag_a mfd_ag_ag mfd_der \\\n", "21 4 0 0 0 0 0 5 \n", "20 5 0 0 0 0 0 5 \n", "23 3 0 0 0 0 0 6 \n", "25 5 4 0 0 0 0 5 \n", "24 5 0 0 0 0 0 5 \n", "\n", " powerlaw_alpha \n", "21 -1.0 \n", "20 -1.0 \n", "23 -1.0 \n", "25 -1.0 \n", "24 -1.0 \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.label = df.label.apply(label).astype('category')\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Balancing data\n", "\n", "This section explores the balance of the RRG datasets." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "other 37\n", "instruction 32\n", "Name: label, dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Examine the balance of the dataset\n", "df.label.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since both labels have roughly the same number of data points, we decide not to balance the RRG datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross validation\n", "\n", "We now run the cross validation tests on the 18 datasets ($k = 1 \\ldots 18$) using all the features (`combined`), only the generic network metrics (`generic`), and only the provenance-specific network metrics (`provenance`). 
The following steps are applied to each dataset:\n", "1. Read the dataset from the CSV file\n", "2. Label the data (see above)\n", "3. Carry out the cross validation test\n", "4. Append the test result into `results` and the feature importance into `importances`\n", "\n", "Please refer to [Cross Validation Code.ipynb](Cross%20Validation%20Code.ipynb) for the detailed description of the cross validation code." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from analytics import test_classification" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 53.57% ±0.2216 <-- 1-combined\n", "Accuracy: 53.57% ±0.2216 <-- 1-generic\n", "Accuracy: 53.57% ±0.2216 <-- 1-provenance\n", "Accuracy: 70.69% ±0.9766 <-- 2-combined\n", "Accuracy: 71.06% ±0.9238 <-- 2-generic\n", "Accuracy: 70.44% ±0.9567 <-- 2-provenance\n", "Accuracy: 82.12% ±0.8706 <-- 3-combined\n", "Accuracy: 82.59% ±0.8471 <-- 3-generic\n", "Accuracy: 76.97% ±0.8917 <-- 3-provenance\n", "Accuracy: 78.14% ±0.9607 <-- 4-combined\n", "Accuracy: 75.64% ±0.9324 <-- 4-generic\n", "Accuracy: 72.01% ±0.9712 <-- 4-provenance\n", "Accuracy: 75.94% ±1.0142 <-- 5-combined\n", "Accuracy: 75.04% ±0.9833 <-- 5-generic\n", "Accuracy: 78.20% ±0.9767 <-- 5-provenance\n", "Accuracy: 80.32% ±0.8902 <-- 6-combined\n", "Accuracy: 78.80% ±0.8886 <-- 6-generic\n", "Accuracy: 78.28% ±0.9354 <-- 6-provenance\n", "Accuracy: 80.04% ±0.9246 <-- 7-combined\n", "Accuracy: 79.71% ±0.9206 <-- 7-generic\n", "Accuracy: 78.41% ±0.9294 <-- 7-provenance\n", "Accuracy: 83.04% ±0.8573 <-- 8-combined\n", "Accuracy: 83.43% ±0.8413 <-- 8-generic\n", "Accuracy: 83.01% ±0.8509 <-- 8-provenance\n", "Accuracy: 77.65% ±0.9467 <-- 9-combined\n", "Accuracy: 80.00% ±0.9303 <-- 9-generic\n", "Accuracy: 77.95% ±0.9707 <-- 9-provenance\n", "Accuracy: 78.41% ±0.9444 <-- 10-combined\n", "Accuracy: 76.35% ±0.9573 
<-- 10-generic\n", "Accuracy: 81.06% ±0.8990 <-- 10-provenance\n", "Accuracy: 85.13% ±0.7883 <-- 11-combined\n", "Accuracy: 85.11% ±0.8229 <-- 11-generic\n", "Accuracy: 84.68% ±0.7948 <-- 11-provenance\n", "Accuracy: 78.68% ±0.9448 <-- 12-combined\n", "Accuracy: 75.82% ±0.9552 <-- 12-generic\n", "Accuracy: 84.07% ±0.9171 <-- 12-provenance\n", "Accuracy: 80.92% ±0.8581 <-- 13-combined\n", "Accuracy: 85.24% ±0.8555 <-- 13-generic\n", "Accuracy: 78.61% ±0.9031 <-- 13-provenance\n", "Accuracy: 82.06% ±0.8376 <-- 14-combined\n", "Accuracy: 73.98% ±0.9360 <-- 14-generic\n", "Accuracy: 81.76% ±0.8362 <-- 14-provenance\n", "Accuracy: 85.13% ±0.8144 <-- 15-combined\n", "Accuracy: 79.38% ±0.9295 <-- 15-generic\n", "Accuracy: 83.82% ±0.8687 <-- 15-provenance\n", "Accuracy: 76.70% ±0.9083 <-- 16-combined\n", "Accuracy: 79.91% ±0.9507 <-- 16-generic\n", "Accuracy: 79.81% ±0.8504 <-- 16-provenance\n", "Accuracy: 79.35% ±0.8664 <-- 17-combined\n", "Accuracy: 77.12% ±0.8863 <-- 17-generic\n", "Accuracy: 80.59% ±0.8716 <-- 17-provenance\n", "Accuracy: 74.42% ±0.9875 <-- 18-combined\n", "Accuracy: 70.79% ±0.9429 <-- 18-generic\n", "Accuracy: 75.42% ±0.9804 <-- 18-provenance\n" ] } ], "source": [ "results = pd.DataFrame()\n", "importances = pd.DataFrame()\n", "for k in range(1, 19):\n", " df = pd.read_csv(filepath(k), index_col=0)\n", " df.label = df.label.apply(label).astype('category')\n", "\n", " res, imps = test_classification(df, n_iterations=1000, test_id=str(k))\n", " res['$k$'] = k\n", " imps['$k$'] = k\n", "\n", " # storing the results and importance of features\n", " # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)\n", " results = pd.concat([results, res], ignore_index=True)\n", " importances = pd.concat([importances, imps], ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optionally, we can save the test results to save time the next time we want to re-explore them:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ 
"results.to_pickle(\"rrg/results.pkl\")\n", "importances.to_pickle(\"rrg/importances.pkl\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next time, we can reload the results as follows:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((54000, 3), (18000, 23))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "results = pd.read_pickle(\"rrg/results.pkl\")\n", "importances = pd.read_pickle(\"rrg/importances.pkl\")\n", "results.shape, importances.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Charting the results" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import seaborn as sns\n", "sns.set_style(\"whitegrid\")\n", "sns.set_context(\"paper\", font_scale=1.4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this application, with the many configurations to chart, it is difficult to determine which configuration yields the best accuracy from a figure. Instead, we determine this from the data. We group the performance of all classifiers by the set of metrics they used and the $k$ value; then, we calculate the mean accuracy of those groups." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "results['Accuracy'] = results['Accuracy'] * 100 # converting accuracy values to percent" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# define a function to calculate the mean and its confidence interval from a group of values\n", "import scipy.stats as st\n", "def calc_means_ci(group):\n", " mean = group.mean()\n", " ci_low, ci_high = st.t.interval(0.95, group.size - 1, loc=mean, scale=st.sem(group))\n", " return pd.Series({\n", " 'mean': mean,\n", " 'ci_low': ci_low,\n", " 'ci_high': ci_high\n", " })" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "accuracy_by_metrics_k = results.groupby([\"Metrics\", \"$k$\"]) # grouping results by metrics sets and k" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " mean ci_low ci_high\n", "Metrics $k$ \n", "combined 1 53.571429 53.349694 53.793163\n", " 2 70.694643 69.717504 71.671781\n", " 3 82.123810 81.252813 82.994806\n", " 4 78.135119 77.173957 79.096281\n", " 5 75.938095 74.923410 76.952781\n", " 6 80.321429 79.430772 81.212085\n", " 7 80.044048 79.118974 80.969121\n", " 8 83.039881 82.182169 83.897593\n", " 9 77.651786 76.704650 78.598922\n", " 10 78.414286 77.469403 79.359168\n", " 11 85.133333 84.344666 85.922001\n", " 12 78.677381 77.732121 79.622641\n", " 13 80.919048 80.060477 81.777618\n", " 14 82.057143 81.219102 82.895184\n", " 15 85.129762 84.314906 85.944618\n", " 16 76.697024 75.788237 77.605811\n", " 17 79.345238 78.478390 80.212086\n", " 18 74.423214 73.435232 75.411197\n", "generic 1 53.571429 53.349694 53.793163\n", " 2 71.056548 70.132292 71.980803\n", " 3 82.589286 81.741715 83.436857\n", " 4 75.643452 74.710589 76.576315\n", " 5 75.035714 74.051876 76.019553\n", " 6 78.804762 77.915717 79.693807\n", " 7 79.705357 78.784296 80.626418\n", " 8 83.432143 82.590400 84.273886\n", " 9 80.003571 79.072804 80.934339\n", " 10 76.353571 75.395842 77.311301\n", " 11 85.112500 84.289158 85.935842\n", " 12 75.823214 74.867505 76.778923\n", " 13 85.239286 84.383318 86.095253\n", " 14 73.975000 73.038561 74.911439\n", " 15 79.379167 78.449196 80.309137\n", " 16 79.908333 78.957141 80.859525\n", " 17 77.121429 76.234707 78.008150\n", " 18 70.788690 69.845326 71.732055\n", "provenance 1 53.571429 53.349694 53.793163\n", " 2 70.435119 69.477929 71.392309\n", " 3 76.973810 76.081626 77.865993\n", " 4 72.008333 71.036694 72.979972\n", " 5 78.201190 77.223992 79.178389\n", " 6 78.280357 77.344481 79.216234\n", " 7 78.414286 77.484401 79.344170\n", " 8 83.013095 82.161806 83.864384\n", " 9 77.951786 76.980571 78.923000\n", " 10 81.062500 80.163019 81.961981\n", " 11 84.680357 83.885165 85.475549\n", " 12 84.067857 83.150331 84.985383\n", " 13 78.607738 77.704217 79.511260\n", " 14 81.760714 80.924072 
82.597356\n", " 15 83.820833 82.951727 84.689939\n", " 16 79.811905 78.961087 80.662722\n", " 17 80.589286 79.717217 81.461355\n", " 18 75.419048 74.438139 76.399957" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Calculate the means and the confidence intervals over the grouped data (using the calc_means_ci function above)\n", "results_means_ci = accuracy_by_metrics_k.Accuracy.apply(calc_means_ci).unstack()\n", "results_means_ci = results_means_ci[['mean', 'ci_low', 'ci_high']] # reorder the columns\n", "results_means_ci" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we sort the mean accuracy values within each set of metrics and find the $k$ value that yields the *highest accuracy* for *each* set of metrics (i.e. `combined`, `generic`, and `provenance`)." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ "$k$ 1 2 3 4 5 6 \\\n", "Metrics \n", "combined 53.571429 70.694643 82.123810 78.135119 75.938095 80.321429 \n", "generic 53.571429 71.056548 82.589286 75.643452 75.035714 78.804762 \n", "provenance 53.571429 70.435119 76.973810 72.008333 78.201190 78.280357 \n", "\n", "$k$ 7 8 9 10 11 12 \\\n", "Metrics \n", "combined 80.044048 83.039881 77.651786 78.414286 85.133333 78.677381 \n", "generic 79.705357 83.432143 80.003571 76.353571 85.112500 75.823214 \n", "provenance 78.414286 83.013095 77.951786 81.062500 84.680357 84.067857 \n", "\n", "$k$ 13 14 15 16 17 18 \n", "Metrics \n", "combined 80.919048 82.057143 85.129762 76.697024 79.345238 74.423214 \n", "generic 85.239286 73.975000 79.379167 79.908333 77.121429 70.788690 \n", "provenance 78.607738 81.760714 83.820833 79.811905 80.589286 75.419048 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Looking at only the means in each set of metrics\n", "results_means_ci['mean'].unstack()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('combined', 11), ('generic', 13), ('provenance', 11)]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Finding the highest accuracy value in each row (i.e. each set of metrics)\n", "highest_accuracy_configurations = [\n", " (row_name, row.sort_values(ascending=False)[:1].index[0]) # the index (i.e. k value) of the highest accuracy (i.e. first one)\n", " for row_name, row in results_means_ci['mean'].unstack().iterrows()\n", "]\n", "highest_accuracy_configurations" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " mean ci_low ci_high\n", "Metrics $k$ \n", "combined 11 85.133333 84.344666 85.922001\n", "generic 13 85.239286 84.383318 86.095253\n", "provenance 11 84.680357 83.885165 85.475549" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_means_ci.loc[highest_accuracy_configurations, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results above show that the `generic` metrics with $k = 13$ yield the highest mean accuracy: 85.24%. Using all the metrics or only the provenance-specific metrics yields comparable accuracy with $k = 11$: both means fall within the confidence interval of the highest accuracy.\n", "\n", "For a visual comparison of all the configurations tested, we chart their accuracy next." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "85.239285714285714" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAlwAAAENCAYAAADaJrt/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X1czff/P/DHWRcshSzXTLShYT6E1UIoQ1SkWKTNzkwu\n95HaGDFpwkcTc7G5mBGKXJWrPtMwk7BbrBkxsxkzSyifLp1T5/37w8/5iuqgXu9z0eN+u7nR+5zz\nfj7f56qH1/v9fr0VkiRJICIiIiJhXtB3A0RERESmjoGLiIiISDAGLiIiIiLBGLiIiIiIBGPgIiIi\nIhKMgYuIiIhIMHORK1+zZg127twJS0tLeHp6YsKECdi4cSPWr18PW1tbAECvXr0QGhoqsg0iIiIi\nvRIWuE6cOIHExETs2LEDVlZWmDRpEr799lv8/PPPiIyMRO/evUWVJiIiIjIownYpXrhwAa6urrCx\nsYGZmRl69eqFlJQUZGRkYOvWrfD29kZYWBju3bsnqgUiIiIigyAscHXo0AGpqanIycnB/fv3cfjw\nYdy8eRP29vaYNm0aEhMTYWdnh4iICFEtEBERERkEhchL+2zYsAG7du1C/fr14eLigoyMDHz11Vfa\n23Nzc9GvXz+cOXOm3Menp6eLao2IiIio2jk5OZW7XNgxXPn5+fDw8MDYsWMBPAhfdnZ2iIuLQ0BA\nAABAkiSYm1feQkWNExERERmSygaKhO1SvHHjBoKDg6FWq5Gfn4+EhAT4+PggJiYGly9fBgBs2rQJ\nHh4eologIiIiMgjCRrjatWsHb29v+Pj4oLS0FEFBQejRowcWL16MkJAQqNVqODg4ICoqSlQLRERE\nRAZB6DFcVZWens5dikRERGQUKsstnGmeiIiISDChM80TycH10wAh6039NE7IenXp168fNmzYgFat\nWpVZvmzZMrz22mvo37+/sBrVaWXKVmHrnuQxSti6n1dcXBw0Gg1Gjx4te+27xXlC1tugto2Q9RLV\nRAxcREbiww8/1HcLVImHZ18TEZWHgYuoimJjY7FlyxaYmZnByckJU6dORXh4OP766y8oFAqMGzcO\nXl5e2LVrF44ePYrs7GxkZWWhb9++aNiwIVJSUlBUVIQvvvgCbdq0AQCsWrUKv/76K8zNzREREQFH\nR0fMmDEDTk5OcHFxQXBwMDp16oTz58+jVq1a+Pzzz9GyZUucP38eCxYsQGFhIaysrBAeHo727dvj\n5s2b2is7vPrqq7h//76enzV5LFu2DPv374eNjQ0cHBzQsmVLdO3aFTExMSgpKUGDBg0QERGB5s2b\nY8yYMejUqRPOnj2LrKwsTJ48Gb6+vigqKkJkZCQuXLiAkpISjBgxAmPGjMGpU6ewaNEiAECTJk3g\n6OiIkpISTJs2DadOnUJUVBRKS0vRoEEDREdHw87OTs/PhjinTp3CsmXLULt2bdy4cQMdO3bEggUL\n4Orqiq5du+Kvv/7C9u3bERcXh927d8PMzAyvv/46Zs+ejR07duDixYv47LPPAADffPMNbty4gRkz\nZiA6OhppaWkoKSlB3759MW3aNO0Z8OW9/w8ePIgNGzaguLgYxcXFiIiIgLOzc4WvrUqlQmRkJE6d\nOgVzc3OMGTMGb7/9doWfI6Kq4DFcRFVw/vx5bNiwAXFxcdi3bx9yc3MxZMgQdO7cGXv37sXXX3+N\n6OhoXLp0CQBw5swZrF69GklJSUhISICNjQ127NiBvn37IiEhQbveli1bYvfu3Zg8eTI++uijJ+pe\nvnwZb7/9NpKSktC5c2ds2bIFarUaM2bMwMKFC7F7927MmjVLOyo2f/589O/fH3v37oW/vz9u374t\nzxOkR0eOHMGxY8eQlJSEzZs348qVK8jJycHChQvx1VdfYffu3QgICMDs2bO1jyksLERcXBy++OIL\nbZhavXo17O3tsXv3biQkJGD//v3auXZ+//13fP3111i1apV2H
SqVCiEhIYiIiMDevXvRp08fxMbG\nyrvxenDu3Dl88sknSE5OhlqtxqZNm5CXl4fAwEAcOHAAZ86cwcGDB5GQkICkpCQoFAqsWLECgwcP\nxpEjR6BSqQAASUlJGDZsGHbs2IHi4mLs2rULe/bswdWrV7F3714A5b//JUlCfHy89vM1fvx4rFu3\nTttfea/tli1bcOfOHRw4cADx8fHYsmUL7t69W+HniKgqOMJFVAWnTp1C3759YWtrCwBYvnw5nJ2d\nMWLECACAnZ0d+vXrh1OnTsHa2hpOTk6oX78+AKB+/fpwdXUFADRv3hy//PKLdr2+vr4AADc3N4SF\nhSEnJ6dM3fr166Nz584AgLZt2+Ls2bP4448/cP36dUyePFl7v4KCAuTk5CAtLQ0LFiwAALi4uKBl\ny5Ying6DkpqaiiFDhqB27doAAG9vbyQmJuKff/7RTsgsSRIKCgq0j3FzcwPw4DnNzc0FABw/fhxF\nRUXYt28fgAfP6cWLF/HKK6+gdevW2tfzoV9//RUNGjTA66+/DgDaWqaua9eueOWVVwAAPj4+2Lp1\nq3Y5AKSlpWHw4MGoU6cOAGDkyJGYM2cOwsLC0KVLFxw7dgxt2rSBWq3Ga6+9hi+//BIXLlzA0KFD\nAQDFxcVo1aoVunbtWu77X6FQYOXKlTh8+DD++OMPnD59Go+ehF/ea3vq1CkMGzYMZmZmsLGxwd69\ne/Hrr79W+Dl6+Dkneh4MXERVYGZmBoVCof35zp07T1yQXZIklJaWAgAsLCzK3FbRlRbMzMzKPP7x\nx9WqVUv7b4VCAUmSoNFo0LRpUyQmJmpv++eff1C/fn0oFApoNJpy12+qXnjhBTw+640kSejcubN2\n5KOkpAR37tzR3v7weX30NZUkCQsXLtT+gr979y7q1KmDn376SRvmHvX4a1pUVITbt2+bfMh9dLsl\nSdK+xx4+R+W9FiUlJQCAoUOH4sCBA7C3t9cGLI1Gg5CQEHh6egIA7t27B3Nzc+Tk5JT7/i8oKICv\nry+8vb3RvXt3tGvXDhs3btTer7zX9vHP7/Xr1yv9HBFVBXcpElVB9+7d8f333yMvLw+SJGHu3LnQ\naDTYvn07AOD27dv47rvv0KNHj2da78NdJ4cOHUKrVq1gbW2t8zFt2rRBYWEhTpw4AQBISUlBYGAg\nAKBnz57YuXMnAODs2bO4du3aM/VjjFxdXZGcnIz79+9DpVLh4MGDcHd3x7lz57RXu9iyZQtCQ0Mr\nXY+zs7N2l9X//vc/+Pv7IyMjo8L7t27dGvfu3cOFCxcAANu2bSuzy9FUnTlzBjdv3oRGo8GePXu0\no7cPubi4YP/+/SgoKIAkSdi2bRveeOMNAA9GnzIyMpCcnAwvLy/t/bdt2waVSgWVSoUPPvgA3377\nbYX1r169CgCYOHEinJ2dcezYsTL/yShPjx49sG/fPmg0GuTn5+Odd96BpaVlhZ8joqrgCBcZPX1N\n3wAAr732GpRKJUaNGgWNRoMePXrg2LFjmDdvHry8vFBaWoopU6agQ4cO2uO4nsZff/0FHx8fvPji\ni1i8ePFTPcbS0hLLly/HZ599hoULF8LCwgIxMTFQKBQIDw/Hxx9/jKSkJLRu3Rovv/zy827yU9P3\n1A1ubm44d+4chg0bhjp16sDW1ha1atXC4sWLERYWhtLSUtSrV0/n1S4mTZqEyMhIeHl5Qa1WY9So\nUejRowdOnTpV7v0fHsQ9Z84cqFQqNGzY8Klfw+dlCNM3NGrUCJ988gn+/vtvvPHGGxg1apR2Nzbw\n4PW4dOkSRo4ciZKSEnTq1AkzZ84E8OC926dPH/z5559o1KgRgAe7HK9du4Zhw4ahpKQE/fr1w9Ch\nQ3Hjxo1y67dv3x4dO3bEwIEDYWVlhW7duuHGjRuVhq6AgABcvXoVPj4+0Gg0mDBhAtq0aVPh54io\nKjjTPBGZpIyMDPz666/w9
/eHJEmYOnUqhg8fjj59+ui7NZNz6tQpxMTEIC5Of//5ITIEnGmeiGoc\ne3t7HDp0CF5eXvD29oa9vT3DFhHpDUe4iIiIiKoBR7iIiIiI9IiBi4iIiEgwBi4iIiIiwYQGrjVr\n1mDAgAHw8vLC6tWrAUB71tDAgQPx4YcforCwUGQLRERERHonLHCdOHECiYmJ2LFjB/bs2YOMjAx8\n++23CAsLQ0hICJKTk2Fvb68NYkRERESmSljgunDhAlxdXWFjYwMzMzP06tULsbGxyM3NhYuLCwDA\nz88PBw4cENUCERERkUEQFrg6dOiA1NRU5OTk4P79+zh8+DAsLCzQuHFj7X0aNWqErKwsUS0QERER\nGQRhl/ZxcXGBn58fgoKCUL9+fbi4uODkyZNP3E/X5RIyMzNFtUhEREQkC2GBKz8/Hx4eHhg7diwA\nYMOGDWjevHmZ0HXr1i00adKk0vU4OjqKapGIiIio2qSnp1d4m7Bdijdu3EBwcDDUajXy8/ORkJCA\nYcOGwdraGmlpaQCAnTt3ws3NTVQLRERERAZB2AhXu3bt4O3tDR8fH5SWliIoKAg9evRAdHQ0wsPD\nMX/+fLRo0QLR0dGiWiAiIiIyCLyWIhEREVE14LUUiYiIiPSIgYuIiIhIMAYuIiIiIsEYuIiIiIgE\nY+AiIiIiEoyBi4iIiEgwBi4iIiIiwRi4iIiIiARj4CIiIiISjIGLiIiISDAGLiIiIiLBGLiIiIiI\nBGPgIiIiIhKMgYuIiIhIMAYuIiIyaEqlEgMHDtR3G0RVwsBFRGQgdAWLmho81q9fD7Vare82ql1N\nfT1rKqGBKzExEYMHD8aQIUMwc+ZMqNVqbNy4Eb1794aPjw98fHywZMkSkS0QERkNXcHCVINHTcXX\ns2YxF7XigoICREZG4uDBg3jppZcwceJE7NmzBz///DMiIyPRu3dvUaW1rty9gtziXOF1iIiqy/+s\n/4f0v9Of6/aIiAj8ffNvNGvaDHPmzBHVol7oel6MlalulzGrX7s+HBo4VPt6hQUujUYDjUaDoqIi\nlJaWQqVSoVatWsjIyEBBQQGWLFmCdu3aYfbs2ahXr161188uyEbbFW2hkTTVvm4iImG6At3Wdnu+\n25v+/z8AktYmVXtreqXreTFWprpdRuwFxQv4Z/o/aFinYbWuVyFJklSta3xEbGwsFi9eDCsrK7zy\nyivYsGEDJk6ciLCwMLRt2xaLFy/GrVu3EB0dXe7j09PTYWVl9dz1r+VfQ54q77kfT0Qk0vxDa59Y\nZnn2LlRdGmh/HvivsnsDDm/Yg35jh2p/7t6wXZnbFy1ehI8/+riaO9Vt/dfrcfv2bdjZ2UH5nrLK\n6/sx+1KZn3VttzF4mtc7vP84OVuicthY2uBl65ef67GFhYVwcnIq9zZhI1yZmZnYvn07jhw5grp1\n6yI0NBRr167FunXrtPcZP348+vXrV+l6HB0dn7sHRzz/Y4mIRFt6aOcTy2rnl6AY//cL+GXr1mVu\nr5tft8wyP1e/Mrevzl/9xDI5+Ln6wd3dHQcTD1bL+rJStpb5Wdd2G4Oneb2Ncbvo/6SnV7x7WNhB\n86mpqXB2doadnR0sLS3h6+uL06dPIy4uTnsfSZJgbi4s8xERmby7xXll/jy+jEyXUqmEu7s7lMqq\njyiSeMICV/v27XHy5EkUFBQAAI4ePQoHBwfExMTg8uXLAIBNmzbBw8NDVAtEJBBPaa9+lul3ICn0\n3YXhSVm7Ey+YcRajx61fv77M32TYhA0v9ezZE5mZmfD19YWlpSU6dOiAuXPnws3NDSEhIVCr1XBw\ncEBUVJSoFohIoPXr18Pd3V3fbZgUldNLld5eU4OHx7jh+m5BCAbsmkXo/rxx48Zh3LiyBwC6ubnB\nzc1NZFkiqgZKpRI3btxAcnLycz/+6tWrsLe35//Aq4mu4BEycar2789XLRfeT3m7LB9d1qC
2jfAe\njJmugE2mhQdQEVG5qjqC9fDxDFvykSNkkX4x5Bqvmjc2TURERCQzjnCZmKruBiKqzMrHTtV/fNkk\nj1FytkNEZDQ4wmVieG0uItKHR48fI6IncYTLyHAEi0Rx/TTgiWW1H1s+qqdXpevg8SU1l7EeP2bM\n36lynyRBVcPAZWR4Kj4RUfUx5u9UhizjwsBlAnhcDelDTZ0TiuhRnP6EnhYDFxE9F1OdjJLoWXD6\nE3paDFwGrjqOqyF6HlWdBZvHl5Aheprv1NRP4564D1FVMXARUbmqOgt2TQ1Z3MX07Iz5wHWip8UD\nMEwMj6sh0i9eUPjZGdt0NneL88r8eXwZUXk4wmVieFwNERGR4WHgMjK8ujwRUfXhd6rxMdbd9gxc\nRoZXl5cfjy8hXTjhq/Hid6rxMdYzQxm4iHTQ58SIDHvl4/Ni3Ezp7GtTPRuXn7HqJzRwJSYmYs2a\nNVAoFOjUqRMiIiLwxx9/YNasWcjLy0O7du0QFRUFKysrkW0QGS1dYa+mfika8+zgZFpMKWQ96mm+\ne+TcrWcKo8jCTmcrKChAZGQkNm7ciL179yI3Nxd79uxBWFgYQkJCkJycDHt7e6xevVpUC0Qmz9jO\n7iIi08CzcZ+dsBEujUYDjUaDoqIilJaWQqVSwdzcHLm5uXBxcQEA+Pn54d1338X06dNFtUH0zDgx\nIlWFqe5iIvmZ6gi2qW6XLsICl42NDf7973/D09MTVlZWeOWVV/Dyyy+jcePG2vs0atQIWVlZla4n\nMzNTVIs1Bp/D6ifqOX1/W8QTyx4Pe+tGzpGtn+pWlT4P3zj7xLJHrxnar3mX5153VTVu3UL77/JC\nlj5fH5G1H93uqtR+njMFRW6XrnVX13bremxoaCgmT55cZpnI2s+zvuepV9526SL3dougM3CpVCqc\nPHkSv//+OxQKBRwcHODi4gIzM7NKH5eZmYnt27fjyJEjqFu3LkJDQ5GWlvbE/RSKyj9ljo6Oulok\nHfgcVj99Pqfl1TaW17gqfZYXuKpr3VWla7JLQ3u/VJfq2u7nOVNQ5HbpWrfI11vX51tk7Uf/A/PQ\no5+7gHJOZHg0CD3rcVTP0qshf8YelZ6eXuFtlQauLVu2YPXq1WjcuDFatWqFkpIS7NmzB9nZ2QgO\nDkZgYGCFj01NTYWzszPs7OwAAL6+vli/fj2ys7O197l16xaaNGnyrNtDZFDkHh4v70vx0WWTPEbJ\n0geZlpq6m4eMj7Hutq8wcE2dOhUODg5ISEhA06ZNy9yWlZWFzZs3Y+LEiVi1alW5j2/fvj0SExNR\nUFCAOnXq4OjRo+jatStyc3ORlpYGFxcX7Ny5E25ubtW7RUTVTNfuDkM+Y85YJwjUp5oaPAz5fayL\nKb3Pdf2HqrxRJkNX3cfFGlPIelSFgWv69Olo1apVubc1btwY06dPx9WrVytccc+ePZGZmQlfX19Y\nWlqiQ4cO+PjjjzF48GCEh4dj/vz5aNGiBaKjo6u8ESQPY/5FVJUv5KpOjFiV2lWdBdtYJwjUJ2MO\nHjUV3+dkDCoMXOWFLZVKBbVajTp16gAA7O3tK135uHHjMG7cuDLL2rZti23btj1Hq6RvxvyLSJ9f\nyFWpzVmwy8eLtBPpl7Hu1tOnpz5L8cCBA4iMjERJSQnef/99fPDBByL7IjJohnIcVU0NHrou0m5K\nu5iI9EHXdwtD1rOrMHAVFhaWmQE+KSkJR48eBfDgAHgGLiL90xU8jFVVd19zF5Nuxj7fnCnMPG7I\nRH631NQLhlcYuCZOnIgRI0bA09MTAGBlZYWvv/4a5ubmqFWrlmwNEtGzMYVfRHLuvjb24EGmy1RH\nsGvqoRIVBq61a9fi66+/RnBwMMLCwhAeHo5vvvkGKpUKMTExcvZIemLsv4j0GTxMIfQQkX6Z6gh2\nTVVh4LKwsMD48eNx48YNLFq0CM2bN8eUKVN4oWkiMkg
MuUSmwZjPiK9MhWOVBQUF2Lp1K9LS0rBk\nyRJ069YNSqUSBw4ckLM/qmZKpRLu7u5QKpX6bsVkmOqwP9UsxnxczaNnzJHxW79+PdRqtb7bqHaV\nTnzaokULFBUV4fTp01i8eDF69uyJVatWYdeuXVi3bp2cfVI14cHE1c+Qh/2N5dTtp9l9PcoIJ3w0\nJsZ8XI0hv7cNlamOIhmyCgNXdna29peyj48PAKBWrVqYNm0afv/9d3m6IzIB+gw9/EX07HSN9PAX\nFZkCY55X0VhVOvHpuHHjcP/+fTg5OZW5rU2bNsIbI8NjKrscGHpMX1Veb10jPfxFRVT9TPGSRo+r\nMHDFxMTghx9+gKWlJd588005e6JqVl0HE+tzl0NVRxUYemoWvt5EZGgqDFx///03+vTpU+mDr1+/\njpYtW1Z3T0RP4KhCzWLMo6mmirtSjRuPk9S/Ck+tWrRoEVasWIHs7Ownbrt9+zaWLl2Kzz77TGhz\nZFyUSiUGDhyo7zbIBKicXsL9no303QY9wlTPHCOSS4UjXF988QW2bNkCX19fNGvWDC1atEBpaSmu\nXbuGW7duYfz48fjwww/l7JUMHEehyFTVhONLiEi3qlyntcLApVAoEBgYiBEjRiAtLQ1XrlzBCy+8\nAF9fX7i4uMDCwqLKjZP8jGWagGfF3R1ERKbBkOc2rMrUShUGrocsLS3h5uYGNze352qODIuxhKxn\nvawQR9eIqh9H9kgfDHluw6rQGbieV1xcHOLj47U/PzwIv2PHjli/fj1sbW0BAL169UJoaKioNkgw\nXV/IkzxGydkOERE9BZ6YIj9hgSsgIAABAQ9GI65evQqlUonQ0FAsXrwYkZGR6N27t6jSRETCGPLu\nDqKnZcxXFqiK5zn8pLqmVhIWuB4VERGBKVOmoHHjxsjIyEBBQQGWLFmCdu3aYfbs2ahXr54cbRAR\nVZmp7u4gqgn0efiJzsD11ltvYfTo0Rg+fDisra2fucCPP/6I7Oxs+Pj4QKVSwd7eHtOmTUPbtm2x\nePFiREREIDo6usLHZ2ZmPnNNejYin2OR667q8SX6fG+xNmuztuHW1vXYxq1bCKtdVaxd/fevrtdb\nZ+BatGgR4uPjsWrVKgwYMACBgYFo27bt03UJYMuWLRg7diwUCgUsLS3LXPR6/Pjx6NevX6WPd3R0\nfOpa9HxEPsfVtW4Rxxvo873F2qxtbLWfZ1eqsW73o48tbxdUebuYqqt2VbF29d//WV7v9PT0Cu+n\nM3B16dIFXbp0QW5uLhITEzFp0iQ0adIE77zzDjw8PCp9rEqlQlpaGiIjIwEA//zzD44cOaI9tkuS\nJJiby7JXk2Qg8tiWmnq8AZGhqKm7UnkGNFWXp0o79+/fx5EjR5CcnAyFQgFXV1ds3LgR33//PebP\nn1/h4y5duoRWrVppd0VaWVkhJiYG3bp1w6uvvopNmzbpDG1kPGrqFzIRERmmZ51iSJeqzGWpM3BF\nRkYiKSkJnTp1wvjx4+Hm5gaFQoH33nsPPXv2rDRwXb9+HU2aNNH+XLduXSxevBghISFQq9VwcHBA\nVFTUMzVMREREpA9VmctSZ+DSaDSIi4uDg4NDmeWWlpZYtmxZpY/19PSEp6dnmWWcRJVE4Kn6RFRd\nOOEriaDzN9SHH36IgwcPAngwYhUeHo78/HwAgIuLi9juiJ6Sx7jhGPrRWH23QUREVC6dgWvWrFna\ngFW/fn1YWVlhzpw5whsjIiIiMhU6A9eff/6JGTNmAABsbGwwc+ZM/Pbbb8IbIyIiIqpO+rykkc5j\nuFQqFYqLi1G7dm0AQHFxMSRJEt4YERERUXXS5xRDOgNX//79ERQUhCFDhkChUGD//v3o37+/HL2R\niVEqlbh69Srs7e2xfv16fbdDRKQTT8ih6qIzcIWEhCAuLg7Hjx+HhYUFhg0bhhEjRsjRG5mYhxMI\nMmwRkbHg/IJUXXQ
GrhdeeAGjR4/G6NGjtcvy8/Of67qKRERERDWRzsD1/fffIyYmBnl5eZAkCRqN\nBjk5Ofjpp5/k6I+IiIjI6OncMb1gwQKMGjUKDRs2xIwZM9C1a1cEBgbK0RsRERGRSdAZuCwtLeHv\n748uXbrA1tYWCxcuRGpqqhy9GS2lUomBAwfquw2DcLc4r8yfx5cRERHVBDoDl5WVFQCgZcuWuHLl\nCiwsLKDRaIQ3ZszWr18PtVqt7zaIiIjIQOgMXK+++ipmz56Nbt26ITY2Fl9++WWND1wcwSIiIjI9\nIn+/6wxc4eHh6Nu3L1599VW88847OHfuHCIiIoQ0YyyqOoKlVCrh7u4OpVJZjV0RERFRVYjcQ6Xz\nLMV58+ZhwYIFAAB/f3/4+/sLacTY6bq6/CSPUdp/19T5qEImTtX+/fmq5XruhoiISD46A9e5c+fk\n6INqAIYsIiKqqXQGrubNmyMgIABOTk7a6ykCwOTJkyt9XFxcHOLj47U///333+jTpw/GjRuHWbNm\nIS8vD+3atUNUVJT2wHxD5fppwBPLaj+2fFRPLxk7IiIiouqgaw9VQDX9ftcZuGxtbWFra4s7d+48\n04oDAgIQEPAgkFy9ehVKpRKhoaH44IMPMGPGDLi4uGDp0qVYvXo1pk+f/nzdG5HypkB4dFmD2jZy\ntkNEREQy0hm4oqKiqlwkIiICU6ZMgUajQW5uLlxcXAAAfn5+ePfdd2tE4CIiIqKaS2fgGjNmDBQK\nxRPLN23a9FQFfvzxR2RnZ8PHxwcZGRlo3Lix9rZGjRohKyvrGdo1Dry6PBERET1KZ+Dy9fXV/lut\nViMlJQWdOnV66gJbtmzB2LFjoVAoyp2/q7ww96jMzMynrmUonufq8vrcTtZmbdZmbdZmbdYWW1tn\n4Bo2bNgTPwcFBWHKlCk6V65SqZCWlobIyEgAQJMmTZCdna29/datW2jSpEml63B0dNRZR26W6Xcg\nVZ4Tn5k+t5O1WZu1WZu1WZu1n28P1aO109PTK7zfM+/3MjMzw+3bt5/qvpcuXUKrVq1gbW0NAGjW\nrBmsra2RlpYGANi5cyfc3NyetQW9Uzm9hPs9Gz334x+dj4qIiIgMg8e44Rj60Vgh69Y5wjVz5swy\nP1+8eBFt2rR5qpVfv379iRGs6OhohIeHY/78+WjRogWio6OfoV3TwPmoiIiIapanmofrUZ06dYK3\nt/dTrdzT0xOenp5llrVt2xbbtm17hhaJiIiIjJvOwDV58mSkpqbC1dUVd+/eRWpqqnYXIRERERHp\npvMYrujkTxnVAAAV9UlEQVToaHz55ZcAHhwEv3nzZqxYsUJ4Y0RERESmQmfgOnz4sPYiy02aNEFs\nbCySk5OFN0ZERERkKnQGrpKSElhaWmp/trS01Dl3FhERERH9H53HcDk6OiIiIgIjRoyAQqHArl27\n0LZtWzl6IyIiIjIJOke45syZgzt37mD06NF45513cPv2bcyePVuO3oiIiIhMgs4RrgYNGmD+/Pmo\nW7cuiouLcefOHdja2srRGxEREZFJ0DnCtW/fPu31FP/++28MHz4cKSkpwhsjIiIiMhU6A9eaNWuw\nadMmAECbNm2we/durFy5UnhjRERERKZCZ+DSaDRo1qyZ9uemTZtCkiShTRERERGZEp2By8bGpsy8\nWykpKbCxsRHaFBEREZEp0XnQ/KxZszBx4kTMmTMHwIMA9sUXXwhvjIiIiMhU6AxcHTt2xJEjR3Dp\n0iWYm5sjLi4Oo0ePxtmzZ+Xoj4iIiMjo6QxcAHDu3Dl8/fXX+O6779CuXTv85z//Ed0XERERkcmo\nNHClpKRg3bp1uHDhAvr06QNbW1vs2rVLrt6IiIiITEKFgWvAgAF48cUXMWzYMKxevRq2trZwd3d/\nppWnpKRg1apVKCoqgqurK2bPno0FCxbgu+++g7W1NQBg+PDhCAoKqtpWEBERERmwC
gNXrVq1AAC5\nubnIz89/5tnlr127hrlz5yIhIQGNGjVCUFAQUlJS8PPPP2PNmjVwcHCoWudERERERqLCwJWUlIQf\nf/wRsbGxGDRoELp27YqioiKo1WpYWFjoXPGhQ4cwaNAg7RxeS5cuhYWFBX799VdER0fj+vXreOON\nNxAWFqYNd0RERESmqNJ5uLp3747ly5fj0KFD+Ne//gVJktCvXz+sW7dO54qvXbsGSZKgVCrh5eWF\nzZs3o7i4GD169MCcOXOwa9cu3L17l7PWExERkcl7qrMUmzZtipCQEEyePBn79u3D5s2b8f7771f6\nmNLSUqSlpWHr1q2wtrbGxIkTkZqaii+//FJ7H6VSiRkzZiAkJKTC9WRmZj7lphg3fW4na7M2a7M2\na7M2a4ut/VSB6yFLS0v4+vpqL2ZdGTs7Ozg7O8POzg4A4O7ujgMHDuDFF1/EkCFDAACSJMHcvPIW\nHB0dn6VFo6XP7WRt1mZt1mZt1mbtqtdOT0+v8H46L+3zvPr27YsTJ04gJycHpaWlOH78OAYMGIAF\nCxYgOzsbkiRh8+bN8PDwENUCERERkUF4phGuZ9G5c2cEBwcjMDAQJSUlcHZ2hr+/PywtLfHOO++g\ntLQUTk5OGDdunKgWiIiIiAyCsMAFAH5+fvDz8yuz7Gl3SRIRERGZCmG7FImIiIjoAQYuIiIiIsEY\nuIiIiIgEY+AiIiIiEoyBi4iIiEgwBi4iIiIiwRi4iIiIiARj4CIiIiISjIGLiIiISDAGLiIiIiLB\nGLiIiIiIBGPgIiIiIhKMgYuIiIhIMAYuIiIiIsEYuIiIiIgEExq4UlJS4Ovri0GDBiEyMhIAcPr0\nafj4+GDAgAGYN28eSkpKRLZAREREpHfCAte1a9cwd+5crFixAnv37sWFCxeQkpKCjz76CJ9//jmS\nk5ORl5eHHTt2iGqBiIiIyCAIC1yHDh3CoEGD0KxZM5ibm2Pp0qWoX78+mjVrBgcHBygUCgwfPhwH\nDhwQ1QIRERGRQTAXteJr167B3NwcSqUSt27dQp8+fdC2bVs0btxYe59GjRohKytLVAtEREREBkFY\n4CotLUVaWhq2bt0Ka2trTJw4EbVq1XrifgqFotL1ZGZmimrRoOhzO1mbtVmbtVmbtVlbbG1hgcvO\nzg7Ozs6ws7MDALi7u+O///0vNBqN9j7Z2dlo0qRJpetxdHQU1aJB0ed2sjZrszZrszZrs3bVa6en\np1d4P2HHcPXt2xcnTpxATk4OSktLcfz4cXh6euLatWv47bffAAC7d+9Gnz59RLVAREREZBCEjXB1\n7twZwcHBCAwMRElJCZydneHv7482bdogNDQURUVF6Ny5M0aNGiWqBSIiIiKDICxwAYCfnx/8/PzK\nLOvRowf27NkjsiwRERGRQeFM80RERESCMXARERERCcbARURERCQYAxcRERGRYAxcRERERIIxcBER\nEREJxsBFREREJBgDFxEREZFgDFxEREREgjFwEREREQnGwEVEREQkGAMXERERkWAMXERERESCMXAR\nERERCcbARURERCSYuciVT5o0CVeuXEGtWrUAABMmTEBWVhbWr18PW1tbAECvXr0QGhoqsg0iIiIi\nvRIauC5cuIB9+/ahTp062mXTp09HZGQkevfuLbI0ERERkcEQtkvx5s2bKCoqwtSpU+Hl5YUVK1ZA\no9EgIyMDW7duhbe3N8LCwnDv3j1RLRAREREZBGGB686dO3B2dsbSpUsRHx+PkydPYvv27bC3t8e0\nadOQmJgIOzs7REREiGqBiIiIyCAI26XYsWNHxMTEaH8OCgrCrl27sG7dOu2y8ePHo1+/fpWuJzMz\nU1SLBkWf28narM3arM3arM3aYmsLC1xnzpxBXl4e3NzcAACSJKGwsBBxcXEICAjQLjM3r7wFR0dH\nUS0aFH1uJ2uzNmuzNmuzNmtXvXZ6enqF9xO2S
7G4uBhRUVEoKCiASqVCfHw8fH19ERMTg8uXLwMA\nNm3aBA8PD1EtEBERERkEYSNcb775JoYNGwY/Pz9oNBq89dZb8PHxga2tLUJCQqBWq+Hg4ICoqChR\nLRAREREZBKHTQowfPx7jx48vs8zNzU27m5GIiIioJuBM80RERESCGW3gUiqVGDhwYJVud3d3h1Kp\nFNEeERERkZbQXYrVyfXTgLILWgK1r6rLLB/V00v7764B7rgatR4rU7ZqlwU8cvt/VsbAf/BQ/Gdl\nDO4W5wEAGtS2EdQ9ERER1WRGO8JlmX4HkqLi21PW7sQLZhVvXsjEqWX+JiIiIhLFaEa4HqdyeqnS\n2z3GDa/09s9XLa/OdoiIiIgqZLQjXERERETGgoGLiIiISDAGLiIiIiLBGLiIiIiIBGPgIiIiIhKM\ngYuIiIhIMAYuIiIiIsEYuIiIiIgEY+AiIiIiEoyBi4iIiEgwBi4iIiIiwYReS3HSpEm4cuUKatWq\nBQCYMGEC2rRpg1mzZiEvLw/t2rVDVFQUrKysRLZBREREpFdCA9eFCxewb98+1KlTR7vMx8cHM2bM\ngIuLC5YuXYrVq1dj+vTpItsgIiIi0ithuxRv3ryJoqIiTJ06FV5eXlixYgVu3LiB3NxcuLi4AAD8\n/Pxw4MABUS0QERERGQSFJEmSiBX/8ssvWLduHSIiImBmZobx48fD2dkZx44dw/bt2wEA9+/fh5OT\nE3755Zdy15Geni6iNSIiIiIhnJycyl0uLHA97ttvv0V8fDzy8/PLBK5u3brh3LlzcrRAREREpBfC\ndimeOXMG33//vfZnSZJQUlKC7Oxs7bJbt26hSZMmologIiIiMgjCAldxcTGioqJQUFAAlUqF+Ph4\n+Pr6wtraGmlpaQCAnTt3ws3NTVQLRERERAZB6C7Fr776Cnv27IFGo8Fbb72FkJAQXL58GeHh4cjL\ny0OLFi0QHR0NGxsbUS0QERER6Z1sx3ARERER1VQmO9O8Wq3Gu+++ixMnTshad8OGDRg8eDCGDBmC\nGTNmQKVSyVZ73bp18PT0hKenJxYvXgx9ZOmFCxciNDRU1pqTJk3CwIED4ePjAx8fHyQnJ8tWOyUl\nBb6+vhg0aBAiIyNlqxsXF6fdXh8fH3Tv3h1hYWGy1U9MTNS+z2fOnAm1Wi1b7TVr1mDAgAHw8vLC\n6tWrZan5+PdJVlYWxowZg0GDBuHdd9/FnTt3ZKv90M6dO4V/1h6vff78eYwcORLe3t4YOXIkzp8/\nL1vtX375Bf7+/vD29kZQUBD++usv2Wo/dP78eXTs2BElJSWy1U5JSYGLi4v2s/7RRx/JVjsrKwvj\nxo2Dj48P3n77bVy/fl2W2llZWWW+3zw8PNCpUycUFhYKrw0A169fR2BgIHx8fDBixAhcvHixegpJ\nJujSpUvSiBEjpNdff11KTU2Vre5PP/0kDRkyRCooKJA0Go0UGhoqrV27VpbaGRkZ0uDBg6Xi4mKp\npKREGjlypHT06FFZaj907Ngx6Y033pCmT58ua90+ffpI+fn5staUJEn6888/pTfffFO6ceOGpFar\npYCAAOnQoUOy9/HHH39I/fr1k/755x9Z6uXn50vdunWTsrOzJY1GIwUHB0vbt2+XpXZqaqrk6ekp\n/e9//5NKSkqk8ePHS//973+F1izv+2TChAlSQkKCJEmStG3bNmnatGmy1S4sLJQWLlwo/etf/xL6\nWSuv9sCBA6XTp09LkiRJx48flzw9PWWr7enpKf3444+SJEnS5s2bpalTp8pWW5IevO9HjBghtW3b\nVlKr1bLVjo6OlrZu3Sqknq7aY8aMkWJjYyVJkqStW7dKEydOlK32o9577z0pPj5ettohISHa7U5O\nTpYCAgKqpZZJjnBt374dH3zwAV5//XVZ69atWxfh4eGwsrKCQqFA+/btcfPmTVlqv/7669i9ezdq\n1aqFe/fuI
S8vT9Zj4+7cuYPly5cjODhYtppA+RPsajQaWWofOnQIgwYNQrNmzWBubo6lS5eia9eu\nstR+VEREBKZMmYLGjRvLUk+j0UCj0aCoqAilpaVQqVTay3eJduHCBbi6usLGxgZmZmbo1asXUlJS\nhNZ8/PtErVYjLS0N3t7eAIChQ4fiyJEjQkb5yvsuS01NhZmZmdCRjvJql5aWQqlUonv37gAAR0dH\nYd9v5W33nj170K1bN2g0Gty8eRP16tWTrTYAREZGQqlUCqlZWe2MjAykpKRg6NChmDBhgmzP+d27\nd3Hx4kWMGjUKADB8+HBhI6qV/c7eu3cvSkpKMHLkSNlqazQaFBQUAACKioqq7fvNJAPX7Nmz4e7u\nLnvd1q1bo0ePHgCA7OxsxMbGytqHhYUFNm/eDA8PDzRq1AgdO3aUpa4kSfjkk0/w8ccfo27durLU\nfOjOnTtwdnbG0qVLER8fj5MnTyIhIUGW2teuXYMkSVAqlfDy8sLmzZtRv359WWo/9OOPPyI7Oxs+\nPj6y1bSxscG///1veHp6wtXVFcXFxRg8eLAstTt06IDU1FTk5OTg/v37OHz4MG7fvi205uPfJ7m5\nubCysoKlpSUAwNLSElZWVrh7967w2gDg4eGB0NBQ4SH38dpmZmbw8/PT/vz5558L+34rb7stLCxw\n+/Zt9O7dG/Hx8Xj77bdlq71//36Ym5vjrbfeElKzstr169fH+++/jz179sDV1RXTpk2Tpfb169fR\ntGlTLFq0CN7e3pg8eTLMzcVcDbCi39kajQYrVqwQuuu8vNrTpk3Dxo0b0atXL8ybN6/a6ptk4NK3\nv/76C0FBQfDz88Obb74pa+3AwECcPn0aDRo0QExMjCw1v/nmG7Rv3x7dunWTpd6jOnbsiJiYGNSt\nWxd16tRBUFAQjhw5Ikvt0tJS/PDDD1i0aBESEhJw/vx57Ny5U5baD23ZsgVjx46FQqGQrWZmZia2\nb9+OI0eO4IcffsBLL72EL7/8UpbaLi4u8PPzQ1BQEN5//304OTnBwsJCltoPVTSC+sILNePrtLS0\nFPPmzcP58+cRHh4ua207OzscP34cS5YsQXBwsNBjqR66fv06NmzYgE8++UR4rfIsW7ZMezm8wMBA\nXL58Gbm5ucLrlpSU4OLFi3ByckJSUhL69+8vfFT1cT/88AMaNWqETp06yVZTkiSEhYXh008/xQ8/\n/ICFCxciJCSkWvac1IxvCBlduHABAQEBGDVqFCZPnixb3WvXruHnn38GAJibm8PLywuXLl2Spfb+\n/ftx+PBh+Pj4YPny5Th27BjmzJkjS+3yJtgV9b+wx9nZ2cHZ2Rl2dnaoXbs23N3dta+BHFQqFdLS\n0oT/r/txqamp2u22tLSEr68vzpw5I0vt/Px8eHh4YO/evYiNjcWLL76I5s2by1L7oQYNGqCwsFB7\nQoxKpUJhYaHso5v6cP/+fUyePBl//PEHYmNjZRvRVqvVZU6G6dOnD1QqFXJycoTXPnToEO7du4e3\n335bO5I8fPhwoSdKPJSfn4+1a9eWWSbXd1zDhg1haWmJ/v37AwCGDBki+1VhUlJSMGTIEFlr5uTk\n4MqVK9rv1QEDBuDevXvVMpLOwFWNbt++DaVSifDwcIwZM0bW2llZWfj4449RVFQEjUaDgwcPao+1\nEG3Hjh3Yu3cvEhMTMXXqVPTu3RsRERGy1C5vgl0PDw9Zavft2xcnTpxATk4OSktLcfz4cXTo0EGW\n2gBw6dIltGrVCtbW1rLVBID27dvj5MmT2mMcjh49Ktvu6xs3biA4OBhqtRr5+flISEjAwIEDZan9\nkIWFBd544w0kJSUBeHDGprOzs+wjbfrwySefwNLSEmvXrpX1fWdhYYH//Oc/2kmzT5w4gXr16qFh\nw4bCa7/33ns4dOgQEhMTkZiYCODBGaIvvfSS8Np16tTBli1bkJqaCuDBd22
nTp1kee5ffvlltGzZ\nUnuM5Pfffy/r9xsAnD17VvY9J7a2trCyssKpU6e0Pbz44ovV8l6TZyightiwYQMKCwuxcuVKrFy5\nEgDQq1cvWaZJ6N69O/z9/TF8+HCYmZmhe/fueO+994TX1bc333wTw4YNg5+fn3aCXbmOZ+rcuTOC\ng4MRGBiIkpISODs7w9/fX5bawINdHfq4NFbPnj2RmZkJX19fWFpaokOHDvj4449lqd2uXTt4e3vD\nx8cHpaWlCAoK0h43Kae5c+di5syZ2LBhA+rVq4clS5bI3oPcrly5gn379qF169ZljuVKSEjQHs8m\n0rJlyxAREYGFCxeibt26WLFihfCa+qZQKLB8+XLMmzcPCxYswEsvvYSFCxfKVn/FihWYO3cuYmJi\nYG1tjQULFshWG9DPd5xCocCKFSswf/58FBcXo06dOli+fHm1HLbBiU+JiIiIBOMuRSIiIiLBGLiI\niIiIBGPgIiIiIhKMgYuIiIhIMAYuIiIiIsEYuIiIiIgEY+AiIiIiEoyBi4hqFC8vrzKXgyIikgMD\nFxHVGCqVCr///jvat2+v71aIqIZh4CKiGuO3336DtbU1GjdurO9WiKiGYeAiohrj4sWLaNu2LQBA\no9Fg6dKl8PX1xfXr1/XcGRGZOl68mohqjIsXL6Jdu3a4e/cuQkJC0KRJE2zduhW1a9fWd2tEZOI4\nwkVENcalS5dQWFgIf39/vPXWW1i4cCHDFhHJgiNcRFRjXLx4Eb/99hsGDRqEUaNG6bsdIqpBOMJF\nRDVCVlYW8vPzERsbi+TkZBw+fFjfLRFRDcLARUQ1wsWLF9GmTRu0adMGX3zxBWbOnInMzEx9t0VE\nNQQDFxHVCBcvXtTOv9WlSxeEh4cjODgYWVlZeu6MiGoChSRJkr6bICIiIjJlHOEiIiIiEoyBi4iI\niEgwBi4iIiIiwRi4iIiIiARj4CIiIiISjIGLiIiISDAGLiIiIiLBGLiIiIiIBPt/xA52bBO49yYA\nAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pal = sns.light_palette(\"seagreen\", n_colors=3, reverse=True)\n", "plot = sns.barplot(x=\"$k$\", y=\"Accuracy\", hue='Metrics', palette=pal, errwidth=1, capsize=0.04, data=results)\n", "plot.figure.set_size_inches((10, 4))\n", "plot.legend(loc='upper center', bbox_to_anchor=(0.5, 1.02), ncol=3)\n", "plot.set_ylabel('Accuracy (%)')\n", "plot.set_ylim(50, 95)\n", "\n", "# drawing a line at the highest accuracy for visual comparison between configurations\n", "highest_accuracy = results_means_ci['mean'].max()\n", "plot.axes.plot([0, 17], [highest_accuracy, highest_accuracy], 'g')\n", "highest_accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The chart shows that the configurations yield the highest accuracy are: $k = 11$ - `combined`/`generic`/`provenance`, $k = 13$ - `generic`, and $k = 15$ - `combined`. 
The accuracy appears to decrease for $k > 15$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Saving the chart above to `Fig6.eps` to be included in the paper:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "plot.figure.savefig(\"figures/Fig6.eps\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Analysing the importance of features\n", "\n", "In this section, we explore the relevance of each feature in classifying messages in RRG. To do so, we analyse the feature importance values reported by the decision tree classifiers trained above, which are collected in the `importances` data frame." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Rename the columns with maths notation for consistency with the metric symbols in the paper\n", "feature_name_maths_mapping = {\n", " \"entities\": \"$n_e$\", \"agents\": \"$n_{ag}$\", \"activities\": \"$n_a$\", \"nodes\": \"$n$\", \"edges\": \"$e$\",\n", " \"diameter\": \"$d$\", \"assortativity\": \"$r$\", \"acc\": \"$\\\\mathsf{ACC}$\",\n", " \"acc_e\": \"$\\\\mathsf{ACC}_e$\", \"acc_a\": \"$\\\\mathsf{ACC}_a$\", \"acc_ag\": \"$\\\\mathsf{ACC}_{ag}$\",\n", " \"mfd_e_e\": \"$\\\\mathrm{mfd}_{e \\\\rightarrow e}$\", \"mfd_e_a\": \"$\\\\mathrm{mfd}_{e \\\\rightarrow a}$\",\n", " \"mfd_e_ag\": \"$\\\\mathrm{mfd}_{e \\\\rightarrow ag}$\", \"mfd_a_e\": \"$\\\\mathrm{mfd}_{a \\\\rightarrow e}$\",\n", " \"mfd_a_a\": \"$\\\\mathrm{mfd}_{a \\\\rightarrow a}$\", \"mfd_a_ag\": \"$\\\\mathrm{mfd}_{a \\\\rightarrow ag}$\",\n", " \"mfd_ag_e\": \"$\\\\mathrm{mfd}_{ag \\\\rightarrow e}$\", \"mfd_ag_a\": \"$\\\\mathrm{mfd}_{ag \\\\rightarrow a}$\",\n", " \"mfd_ag_ag\": \"$\\\\mathrm{mfd}_{ag \\\\rightarrow ag}$\", \"mfd_der\": \"$\\\\mathrm{mfd}_\\\\mathit{der}$\", \"powerlaw_alpha\": \"$\\\\alpha$\"\n", "}\n", "importances.rename(columns=feature_name_maths_mapping, inplace=True)" ] }, 
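{ "cell_type": "markdown", "metadata": {}, "source": [ "The steps in this section (averaging the importance values for each $k$, keeping the three highest-ranked metrics, and counting how often each metric appears) can be sketched on a toy frame. This is a minimal illustration rather than the notebook's own code: `toy_importances` and its numbers are made up for the example, and it only assumes that `importances` has one row per trained classifier, a column identifying $k$, and one column per metric.

```python
import pandas as pd

# Toy stand-in for the notebook's `importances` frame: one row per trained tree,
# a 'k' column, and one column of importance values per metric (made-up numbers).
toy_importances = pd.DataFrame({
    'k': [2, 2, 3, 3],
    'nodes': [0.10, 0.20, 0.40, 0.20],
    'edges': [0.60, 0.50, 0.10, 0.30],
    'entities': [0.20, 0.20, 0.30, 0.40],
    'diameter': [0.10, 0.10, 0.20, 0.10],
})

# Step 1: mean importance of each metric for each k.
toy_means = toy_importances.groupby('k').mean()

# Step 2: the three metrics with the highest mean importance, per k
# (columns are k values, rows are ranks 0..2).
toy_top3 = pd.DataFrame({
    k: row.nlargest(3).index.to_numpy()
    for k, row in toy_means.iterrows()
})

# Step 3: how often each metric appears at each rank across all k.
toy_counts = toy_top3.apply(lambda rank: rank.value_counts(), axis=1).fillna(0)

print(toy_top3)
print(toy_counts.sum().sort_values(ascending=False))
```

With these made-up values, `edges` ranks first for $k = 2$ and `entities` for $k = 3$, and `diameter` never reaches the top three." ] },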
{ "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "grouped =importances.groupby(\"$k$\") # Grouping the importance values by k" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Calculate the mean importance of each feature for each data type\n", "imp_means = grouped.mean()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
123456789101112131415161718
0$\\alpha$$e$$e$$e$$n_e$$n$$n_e$$n$$e$$n_e$$n_e$$e$$e$$\\mathrm{mfd}_{e \\rightarrow a}$$e$$n_e$$\\mathrm{mfd}_{a \\rightarrow a}$$\\mathrm{mfd}_{a \\rightarrow a}$
1$\\mathrm{mfd}_\\mathit{der}$$r$$n_e$$n_e$$n$$\\mathrm{mfd}_{a \\rightarrow a}$$e$$n_e$$\\mathrm{mfd}_{e \\rightarrow a}$$e$$\\mathsf{ACC}_e$$\\mathrm{mfd}_{a \\rightarrow e}$$\\mathrm{mfd}_{e \\rightarrow a}$$e$$\\mathrm{mfd}_{e \\rightarrow a}$$\\mathrm{mfd}_{e \\rightarrow a}$$\\mathrm{mfd}_{e \\rightarrow a}$$\\mathrm{mfd}_{e \\rightarrow a}$
2$n_{ag}$$n_a$$\\mathsf{ACC}$$d$$e$$n_e$$\\mathsf{ACC}$$e$$r$$n$$\\mathsf{ACC}$$\\mathrm{mfd}_{a \\rightarrow a}$$\\mathsf{ACC}$$n_e$$n_e$$\\mathsf{ACC}_e$$e$$\\mathsf{ACC}_e$
\n", "
" ], "text/plain": [ " 1 2 3 4 5 \\\n", "0 $\\alpha$ $e$ $e$ $e$ $n_e$ \n", "1 $\\mathrm{mfd}_\\mathit{der}$ $r$ $n_e$ $n_e$ $n$ \n", "2 $n_{ag}$ $n_a$ $\\mathsf{ACC}$ $d$ $e$ \n", "\n", " 6 7 8 \\\n", "0 $n$ $n_e$ $n$ \n", "1 $\\mathrm{mfd}_{a \\rightarrow a}$ $e$ $n_e$ \n", "2 $n_e$ $\\mathsf{ACC}$ $e$ \n", "\n", " 9 10 11 \\\n", "0 $e$ $n_e$ $n_e$ \n", "1 $\\mathrm{mfd}_{e \\rightarrow a}$ $e$ $\\mathsf{ACC}_e$ \n", "2 $r$ $n$ $\\mathsf{ACC}$ \n", "\n", " 12 13 \\\n", "0 $e$ $e$ \n", "1 $\\mathrm{mfd}_{a \\rightarrow e}$ $\\mathrm{mfd}_{e \\rightarrow a}$ \n", "2 $\\mathrm{mfd}_{a \\rightarrow a}$ $\\mathsf{ACC}$ \n", "\n", " 14 15 \\\n", "0 $\\mathrm{mfd}_{e \\rightarrow a}$ $e$ \n", "1 $e$ $\\mathrm{mfd}_{e \\rightarrow a}$ \n", "2 $n_e$ $n_e$ \n", "\n", " 16 17 \\\n", "0 $n_e$ $\\mathrm{mfd}_{a \\rightarrow a}$ \n", "1 $\\mathrm{mfd}_{e \\rightarrow a}$ $\\mathrm{mfd}_{e \\rightarrow a}$ \n", "2 $\\mathsf{ACC}_e$ $e$ \n", "\n", " 18 \n", "0 $\\mathrm{mfd}_{a \\rightarrow a}$ \n", "1 $\\mathrm{mfd}_{e \\rightarrow a}$ \n", "2 $\\mathsf{ACC}_e$ " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "three_most_relevant_metrics = pd.DataFrame(\n", " {row_name: row.sort_values(ascending=False)[:3].index.get_values() # three highest importance values in each row\n", " for row_name, row in imp_means.iterrows()\n", " }\n", ")\n", "three_most_relevant_metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table above shows the most important metrics as reported by the decision tree classifiers during their training for each value of $k$.\n", "\n", "Apart from $k = 1$, whose performance is no better than the random baseline, we count the occurences of the most relevant metrics in cases where $k \\geq 2$ to find the most common metrics in the table above." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
$\\mathrm{mfd}_{a \\rightarrow a}$$\\mathrm{mfd}_{a \\rightarrow e}$$\\mathrm{mfd}_{e \\rightarrow a}$$\\mathsf{ACC}$$\\mathsf{ACC}_e$$d$$e$$n$$n_a$$n_e$$r$
02.00.01.00.00.00.07.02.00.05.00.0
11.01.06.00.01.00.03.01.00.03.01.0
21.00.00.04.02.01.03.01.01.03.01.0
\n", "
" ], "text/plain": [ " $\\mathrm{mfd}_{a \\rightarrow a}$ $\\mathrm{mfd}_{a \\rightarrow e}$ \\\n", "0 2.0 0.0 \n", "1 1.0 1.0 \n", "2 1.0 0.0 \n", "\n", " $\\mathrm{mfd}_{e \\rightarrow a}$ $\\mathsf{ACC}$ $\\mathsf{ACC}_e$ $d$ \\\n", "0 1.0 0.0 0.0 0.0 \n", "1 6.0 0.0 1.0 0.0 \n", "2 0.0 4.0 2.0 1.0 \n", "\n", " $e$ $n$ $n_a$ $n_e$ $r$ \n", "0 7.0 2.0 0.0 5.0 0.0 \n", "1 3.0 1.0 0.0 3.0 1.0 \n", "2 3.0 1.0 1.0 3.0 1.0 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics_occurrences = three_most_relevant_metrics.loc[:,2:].apply(pd.value_counts, axis=1).fillna(0) # excluding k = 1\n", "metrics_occurrences" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
occurences
$e$13.0
$n_e$11.0
$\\mathrm{mfd}_{e \\rightarrow a}$7.0
$n$4.0
$\\mathsf{ACC}$4.0
$\\mathrm{mfd}_{a \\rightarrow a}$4.0
$\\mathsf{ACC}_e$3.0
$r$2.0
$n_a$1.0
$d$1.0
$\\mathrm{mfd}_{a \\rightarrow e}$1.0
\n", "
" ], "text/plain": [ " occurences\n", "$e$ 13.0\n", "$n_e$ 11.0\n", "$\\mathrm{mfd}_{e \\rightarrow a}$ 7.0\n", "$n$ 4.0\n", "$\\mathsf{ACC}$ 4.0\n", "$\\mathrm{mfd}_{a \\rightarrow a}$ 4.0\n", "$\\mathsf{ACC}_e$ 3.0\n", "$r$ 2.0\n", "$n_a$ 1.0\n", "$d$ 1.0\n", "$\\mathrm{mfd}_{a \\rightarrow e}$ 1.0" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sorting the sum of the metrics occurences\n", "pd.DataFrame(metrics_occurrences.sum().sort_values(ascending=False), columns=['occurences'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown above, the number of edges $e$, the number of entities $n_e$, and the maximum finite distance between entities and activities $\\mathrm{mfd}_{e \\rightarrow a}$ are the most common metrics in the table of the most relevant metrics." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Classification using full dependency graphs (Extra)\n", "\n", "In this extra experiement, we tested run the same experiment as above but on the full dependency graphs of messages (similar to the experiements in [Application 2](Application%202%20-%20CollabMap%20Data%20Quality.ipynb)), i.e. without restricting a dependency graph to $k$ edges away from a message entity. The provenance network metrics of those dependency graphs are provided in [rrg/depgraphs.csv](rrg/depgraphs.csv), which has the same format as the other CSV files provided in this application." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Reading the data\n", "df = pd.read_csv(\"rrg/depgraphs.csv\", index_col=0)\n", "# Generate the label for classification\n", "df.label = df.label.apply(label).astype('category')" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 65.49% ±1.0807 <-- combined\n", "Accuracy: 60.76% ±1.1250 <-- generic\n", "Accuracy: 64.91% ±1.0958 <-- provenance\n" ] } ], "source": [ "res, imps = test_classification(df, n_iterations=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Results**: The above accuracy levels are very low (compared with the 50% baseline accuracy of random selection between two labels), indicating that the provenance network metrics of full dependency graphs of RRG messages do not correlate well with the nature of the messages.\n", "\n", "The reason for this is that a RRG provenance graph captures *all* the activities in a game which are all connected. As a RRG provenance graph evolves linearly along the lifeline of a RRG game, the size of a dependency graph varies greatly depending on when in a game the message was sent; messages sent at the beginning of a game have significantly more (potential) dependants than those sent later in the game. This is shown in the histograms of the number of nodes and edges below. As a result, their network metrics also similarly vary (in another word, noisy) and are not a good predictor of the message type." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAD3CAYAAAAALt/WAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEe1JREFUeJzt3X1M1XX/x/EXieTIXCuScqv9vGoOprMtrMhaYtbRc0Ao\nRE2n1WI0K9bN0iiWNRsrK2fN5dZqi5RWS7wJHLnChXmDNxvdDk81stJKgbwrJODI+Vx/uM4qgXM8\nwOG8f9fzsbUFnMN58e3bc1+P52iCc84JAGDWeUM9AADQP4QcAIwj5ABgHCEHAOMIOQAYR8gBwLjE\ngfxmDQ0NA/ntAOB/RkZGRtT3HdCQS9GP8fv9Sk9PH+A1g8viZsnmbjbHhsXNks3df9/c34tgnloB\nAOMIOQAYR8gBwDhCDgDGEXIAMI6QA4BxhBwAjCPkAGDcgL8hKFreNQckHRiSx/5xefaQPC4ADASu\nyAHAOEIOAMYRcgAwjpADgHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQcAIwj\n5ABgHCEHAOMIOQAYR8gBwDhCDgDGEXIAMC6ikFdVVSk7O1s5OTl66qmnFAgEBnsXACBCYUN+6tQp\nlZWVac2aNdq8ebNOnDihDz74IBbbAAARCBvyYDCoYDCoP//8U93d3erq6tL5558fi20AgAgkOOdc\nuBtVVFTopZdeUnJysq6++mqtXbtWw4YNO+t2DQ0NSk5OjmqId82BqO43ELbc85+o7tfR0aERI0YM\n8JrBZ3E3m2PD4mbJ5u6/b25vb1dGRkbU3ysx3A38fr/WrVunuro6jRo1SosXL9brr7+uhx56qMfb\np6enRzll6EIe7Wa/39+Pn3foWNzN5tiwuFmyufvvmxsaGvr1vcI+tbJr1y5lZmYqJSVFSUlJys/P\n12effdavBwUADJywIU9LS9OePXt06tQpSdK2bds0YcKEQR8GAIhM2KdWbr75Zvn9fuXn5yspKUnj\nx49XSUlJLLYBACIQNuSSVFRUpKKiosHeAgCIAu/sBADjCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQc\nAIwj5ABgHCEHAOMIOQAYR8gBwDhCDgDGEXIAMI6QA4BxhBwAjCPkAGAcIQcA4yL6G4L+v/u/J2v6\nce8DUd/zx+XZ/Xjc/unfzxy9ofqZh+rn7c/50V//e+fX0B1raWiPN1fkAGAcIQcA4wg5ABhHyAHA\nOEIOAMYRcgAwjpADgHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQcAIwj5ABg\nHCEHAOMIOQAYF1HIt27dqvz8fHm9XpWVlQ32JgDAOQgb8oMHD+rZZ5/Va6+9ps2bN2v//v3aunVr\nLLYBACKQGO4GtbW18nq9GjNmjCTplVde0fDhwwd9GAAgMgnOOdfXDZ599lklJibqxx9/VEtLi7Ky\nsvTYY4/pvPPOvphvaGhQcnJyVEO8aw5EdT8AiAdb7vnPOd2+o6NDI0aMkCS1t7crIyMj6scOe0Xe\n3d2t3bt3691339XIkSP14IMPasOGDZo9e3aPt09PT49yCiEHYNe5ts/v94fu09DQ0K/HDvsceUpK\nijIzM5WSkqIRI0Zo2rRp+uqrr/r1oACAgRM25FOnTlV9fb2OHz+u7u5u7dy5U+PHj4/FNgBABMI+\ntXLNNddo0aJFWrBggU6fPq3MzMxen1YBAMRe2JBLUkFBgQoKCgZ7CwAgCryzEwCMI+QA
YBwhBwDj\nCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQcAIwj5ABgHCEHAOMIOQAYR8gBwDhCDgDGEXIAMI6QA4Bx\nhBwAjCPkAGAcIQcA4wg5ABhHyAHAOEIOAMYRcgAwjpADgHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4\nQg4AxhFyADCOkAOAcYQcAIwj5ABgHCEHAOMIOQAYR8gBwLiIQ758+XItXrx4MLcAAKIQUch37Nih\nDz74YLC3AACiEDbkR48e1apVq7Ro0aJY7AEAnKPEvr7onFNpaalKSkp08ODBiL6h3+8fkGEAYMm5\ntq+jo2PAetlnyN9++22lpaVp0qRJEYc8PT09yikHorwfAAy9c22f3+8P3aehoaFfj91nyGtqatTZ\n2alt27bp5MmTam9v1zPPPKPnnnuuXw8KABg4fYZ8/fr1oX/fuHGj6uvriTgAxBleRw4AxvV5Rf53\n+fn5ys/PH8wtAIAocEUOAMYRcgAwjpADgHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADCO\nkAOAcYQcAIwj5ABgHCEHAOMIOQAYR8gBwDhCDgDGEXIAMI6QA4BxhBwAjCPkAGAcIQcA4wg5ABhH\nyAHAOEIOAMYRcgAwjpADgHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQcAIwj\n5ABgXGIkNyovL9f69euVkJCgCRMm6LnnnlNSUtJgbwMARCDsFfmXX36pjRs3qrKyUps3b1Z3d7fW\nrl0bi20AgAiEDfmoUaO0dOlSJScnKyEhQWlpaTp8+HAstgEAIhD2qZWxY8dq7NixkqTW1lZVVFTo\n+eef7/X2fr9/4NYBgBHn2r6Ojo4B62VEz5FL0s8//6yioiIVFBRo8uTJvd4uPT09yikHorwfAAy9\nc22f3+8P3aehoaFfjx3Rq1b279+vefPmaf78+SouLu7XAwIABlbYK/LffvtNhYWFWrZsmTweTyw2\nAQDOQdgr8vLycrW3t2v16tXKy8tTXl6eVqxYEYttAIAIhL0iX7JkiZYsWRKLLQCAKPDOTgAwjpAD\ngHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQcAIwj5ABgHCEHAOMIOQAYR8gB\nwDhCDgDGEXIAMI6QA4BxhBwAjCPkAGAcIQcA4wg5ABhHyAHAOEIOAMYRcgAwjpADgHGEHACMI+QA\nYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADCOkAOAcYQcAIwj5ABgHCEHAOMiCvmWLVuUnZ0tj8ej1atX\nD/YmAMA5CBvy1tZWLV++XGvWrFFNTY327NmjHTt2xGIbACACYUO+a9cuXX/99UpJSdHw4cOVl5en\nDz/8MBbbAAARSAx3g5aWFqWmpoY+Hj16tJqbm3u9fUNDQ1RDNsy+LKr7AUA8iKZ90fby38KGPBgM\nnvW5hISEHm+bkZHR/0UAgHMS9qmVyy67TK2traGPW1tbddllXD0DQLwIG/Ibb7xRe/fuVUtLiwKB\ngKqrq5WVlRWDaQCASCQ451y4G23ZskWrV69WIBDQrbfeqpKSklhsAwBEIKKQAwDiV8ze2VleXq7s\n7Gzl5OToySefVFdXl7777jvNnj1bM2bM0COPPKL29nZJUltbmx544AH5fD4VFBTop59+itXMHi1f\nvlyLFy+WJBObt27dqvz8fHm9XpWVlUmS9u3bp7y8PE2fPl3Lli3T6dOnJUnNzc1auHChvF6v7r33\nXh09enRINldVVYXOj6eeekqBQCBuj3UgENC9996r+vp6Sb0fw0AgoNLSUnm9Xs2cOVNfffVV6HtU\nVFTI6/XK4/GosrIy5psbGxs1d+5c5ebmau7cuWpsbIz7zX9pbGzUhAkTQudwPG3uaXdzc7OKioqU\nl5enu+66S4cOHZLU93m8cuVKzZgxQ9OnT1ddXV34
B3Ux8MUXX7icnBx36tQpFwwG3eLFi92bb77p\ncnNzXX19vXPOuZUrV7oVK1Y455wrKytzr776qnPOuZ07d7q5c+fGYmaPtm/f7m644Qb3+OOPO+dc\n3G/+6aef3OTJk90vv/ziAoGAmzdvnqutrXVTpkxxTU1NLhgMuscff9y99957zjnnHnjgAVdZWemc\nc+799993jz32WMw3t7W1uUmTJrnW1lYXDAbdokWL3Lp16+LyWH/77bduzpw5buLEiW7Xrl3Oud6P\nYXl5uVuyZEnofrfddpsLBAKusbHReb1e19bW5v744w/n8/lcU1NTTDfPmDHD7du3zzl35hj6fL64\n3+zcmXNlzpw5bty4cS4QCMTV5t52L1y40FVUVDjnnHv33Xfdgw8+6Jzr/Tyura11CxcudIFAwB05\ncsRNnTrVnThxos/HjckV+ahRo7R06VIlJycrISFBaWlp+v7773XixAndeOONkqSCgoLQG43q6up0\n5513SpJuuukmHTlyRL/++msspv7D0aNHtWrVKi1atEiSdPjw4bjfXFtbK6/XqzFjxigxMVGvvPKK\nLrroIo0ZM0ZXXXWVEhISNGvWLH344YcKBALavXu3cnNzJUl33HGH6urqFAgEYro5GAwqGAzqzz//\nVHd3t7q6upSYmBiXx3rdunW6//77NXHiREnq8xj+fee4ceM0evRoff7556qrq5PH49EFF1ygkSNH\nyuPxaMuWLTHb3N3drcLCQl133XWSpPT0dB0+fFiS4nbzX8rKylRYWPiPz8XL5p52Hzt2TN98843m\nz58vSZo1a1boV/e9nceffPKJZs6cqcTERKWmpmrSpElhr8pjEvKxY8fq+uuvl3Tm5YsVFRW64oor\nen2jUXNz81lfO3LkSCymhjjnVFpaqpKSEo0aNarXXfG0WZIOHjwo55wKCws1c+ZMvfPOOzp8+HCP\nu0+cOKHk5GQlJSVJkpKSkpScnKxjx47FdPOFF16oRx99VD6fTzfddJM6Ojp05ZVXxuWxfvrppzVt\n2rTQx30dw+bmZo0ePfqsnX2dR7HYPGzYMBUUFIQ+XrlyZejr8bpZkmpqapSYmCiPx/OPz8fL5p52\nHzp0SJdffrlefPFF5ebmqri4WImJiaHdPZ3H0ZzfMf3TD3/++WfdfffdKigoUGZm5llf/+uNRq6H\n338977zY/kGNb7/9ttLS0jRp0qTQ5/p6c1Q8bJbOXG3t2LFDL774oiorK9XY2Njjc8gJCQk9/jxS\n7Hf7/X6tW7dOdXV12rFjhy655BLt3r37rNvF27GWej4npDN7etvZ0+d7e5PdYOru7tayZcvU2Nio\npUuXSur92A715kOHDqm8vFylpaVnfS1eN0vS6dOn9c033ygjI0PV1dW6/fbb9cQTT0g6t93hzu+Y\nnf379+/XvHnzNH/+fBUXF5/1RqOWlpbQG41SU1PV0tIS+tpQvAmppqZGn3zyifLy8rRq1Spt375d\nlZWVcb1ZklJSUpSZmamUlBSNGDFC06ZN0759+3p8U9fFF1+s9vZ2dXV1SZK6urrU3t6uiy66KKab\nd+3aFdqclJSk/Px87d27N+6PtaQ+j2FqaupZxz01NbXHz8d6f2dnp4qLi/XDDz+ooqIi9KvOeN1c\nW1urkydP6q677lJeXp6kM09THD16NG43S9Kll16qpKQk3X777ZKknJwcff3115J6P497+3n6EpOQ\n//bbbyosLNTSpUu1cOFCSdKYMWM0cuTI0JXXhg0bNGXKFElSVlaWNm7cKEnavXu3Lrzwwpj/B1i/\nfr02b96sqqoqPfzww7rlllv0wgsvxPVmSZo6darq6+t1/PhxdXd3a+fOnfL5fDp48KCampokSZs2\nbVJWVpaGDx+uG264QdXV1ZLOvHIkMzNTw4cPj+nmtLQ07dmzR6dOnZIkbdu2Tddee23cH2tJfR7D\nrKwsbdq0SZLU
1NSkQ4cOaeLEiZoyZYo+/vhjtbW1qa2tTR999FHoZ4uV0tJSJSUl6c0339TIkSND\nn4/Xzffdd59qa2tVVVWlqqoqSWfOiUsuuSRuN0vSlVdeqSuuuEJbt26VJH366acaP368pN7P46ys\nLFVXVysQCKilpUV79uzR5MmT+36gAfwN21699NJLbuLEiS43Nzf0z8svvxz6HV6v1+uKiorc77//\n7pxz7uTJk664uNhlZ2e7O++80/n9/ljM7NWGDRtCr1qxsLmystL5fD7n8XjcM888406fPu327t3r\n8vLynMfjcUuWLHGdnZ3OOed+/fVXd8899zifz+fmzZvnfvnllyHZ/MYbbziPx+NycnJcSUmJa29v\nj+tjvWDBgtCrEno7hp2dna60tNT5fD6XnZ0degWOc86tXbs29N/orbfeiunmpqYmN27cODd9+vR/\n/D/Z2dkZt5v/7e+vWom3zf/e/f3337sFCxY4n8/n5syZE3rlTG/ncTAYdCtWrHA+n89Nnz7dVVdX\nh3083hAEAMbxV70BgHGEHACMI+QAYBwhBwDjCDkAGEfIAcA4Qg4AxhFyADDuv4eGMQHP2TiKAAAA\nAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.nodes.hist()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAD3CAYAAADSftWOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEMVJREFUeJzt3XtM1fUfx/EXTaghojMm5mrT1Qqm0R+aomZiOBIRMIbd\nJo5mXshW4STzQg5n6RZearrVbEOlzLwCzqhkolMu/YG3Lc9Sq19mcRPDBCSO8P390TpL4Sh8zwH6\ncJ6PvzznfM/5fs5bz9NvJ75+/SzLsgQAMMo9vb0AAEDXEW8AMBDxBgADEW8AMBDxBgADEW8AMFC/\nnthJRUVFT+wGAPqc0aNHd3h/j8T7Tgv4r3A4HAoPD+/tZfynMJP2mEnHmEt73pjJnQ58+doEAAxE\nvAHAQMQbAAxEvAHAQMQbAAxEvAHAQMQbAAxEvAHAQD12ko4nhr9zqIf29NMtt/63Lq6H9gsAXcOR\nNwAYiHgDgIGINwAYiHgDgIGINwAYiHgDgIGINwAYiHgDgIGINwAYiHgDgIGINwAYiHgDgIGINwAY\niHgDgIE6FW+n06nU1FSVlpZKkqqrq5WSkqLY2Filpqaqrq6uWxcJALjVXeN9/vx5zZ49W6dOnXLd\nl5WVpcTERBUWFmr69Ol67733unWRAIBb3TXeu3fv1vz58xURESHp76PwsrIyJSQkSJJmzpyp4uJi\nOZ3O7l0pAMDlrvFeuXKloqOjXbfr6+sVGBiogIAASVJAQIACAwN19erV7lslAOAWXb4MWltbW4f3\n33PPnf8ecDgcXd1VrzNxzd7U3Nzs8zO4HTPpGHNpr7tn0uV4Dx48WE1NTWppaVFAQIBaWlrU1NSk\nQYMG3fF54eHhthd5+7Ule4pnazafw+Hw+Rncjpl0jLm0542ZVFRUuH2syz8q6O/vr3HjxqmgoECS\nlJ+fr8jISPn7+9tfIQCgS2xdPX7VqlVatmyZcnJyNHDgQGVnZ3t7XQCAO+h0vHNzc12/fuCBB7Rt\n27buWA8AoBM4wxIADES8AcBAxBsADES8AcBA
xBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsA\nDES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8\nAcBAxBsADES8AcBAxBsADES8AcBAxBsADORRvPPz8xUXF6cZM2Zo2bJlcjqd3loXAOAObMe7sbFR\na9as0fbt23Xw4EHV19crLy/Pm2sDALhhO95tbW1qa2vTjRs31NraqpaWFt17773eXBsAwI1+dp84\nYMAAvfXWW5o+fboCAwP1yCOPKC4uzptrAwC4YTveDodDu3fvVnFxsYKDg7VkyRJ9/PHHWrRokdvt\nTWPimr2pubnZ52dwO2bSMebSXnfPxHa8S0pKFBkZqZCQEElSUlKScnNz3W4fHh5ud1eSfvLgufZ5\ntmbzORwOn5/B7ZhJx5hLe96YSUVFhdvHbH/nHRYWpvLycjU2NkqSjh49qlGjRtl9OQBAF9g+8n7q\nqafkcDiUlJSkgIAAjRw5UkuXLvXm2gAAbtiOtyTNmzdP8+bN89ZaAACdxBmWAGAg4g0ABiLeAGAg\n4g0ABiLeAGAg4g0ABiLeAGAg4g0ABiLeAGAg4g0ABiLeAGAg4g0ABiLeAGAgj/5VQfQ9w985dNs9\nPXMhjP+t671L6LV/z3fTOxcH+e+7+1zM+n32hp+67T1z5A0ABiLeAGAg4g0ABiLeAGAg4g0ABiLe\nAGAg4g0ABiLeAGAg4g0ABiLeAGAg4g0ABiLeAGAg4g0ABiLeAGAg4g0ABvIo3kVFRUpKSlJsbKzW\nrFnjrTUBAO7CdrwvXbqkVatWafPmzTp48KDOnTunoqIib64NAOCG7SvpHD58WLGxsRo2bJgkaePG\njfL39/fawgAA7tmO96VLl9SvXz/NnTtXNTU1ioqKUnp6utvtHQ6H3V31mt5cc+x237rUVu9cogo9\nzRd/n7urI7bj3draqrKyMu3cuVNBQUF67bXXtG/fPs2aNavD7cPDw20vsreuGejZmj3lW/EG+ipP\nOlJRUeH2MdvfeYeEhCgyMlIhISG67777FB0drbNnz9p9OQBAF9iO95QpU1RaWqo//vhDra2tOnHi\nhEaOHOnNtQEA3LD9tckTTzyhhQsXavbs2bp586YiIyPdfmUCAPAu2/GWpOTkZCUnJ3trLQCATuIM\nSwAwEPEGAAMRbwAwEPEGAAMRbwAwEPEGAAMRbwAwEPEGAAMRbwAwEPEGAAMRbwAwEPEGAAMRbwAw\nkEf/qmBf54uXbAJgBo68AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsA\nDES8AcBAxBsADES8AcBAxBsADOSVeK9bt05LlizxxksBADrB43gfP35ceXl53lgLAKCTPIp3XV2d\nPvroIy1cuNBb6wEAdILteFuWpeXLl2vp0qUKDg725poAAHdh+zJo27ZtU1hYmMaMGaNLly7ddXuH\nw2F3VwBgrO5qn+14Hzp0SH/99ZeOHj2qa9euqampSe+++65Wr17d4fbh4eG2Fyn95MFzAaD3eNK+\niooKt4/ZjvfevXtdv96/f79KS0vdhhsA4F38nDcAGMj2kfe/JSUlKSkpyRsvBQDoBI68AcBAxBsA\nDES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8\nAcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBAxBsADES8AcBA\nxBsADES8AcBA/Tx5ck5Ojvbu3Ss/Pz+NGjVKq1evVkBAgLfWBgBww/aR95kzZ7R//37t2bNHBw8e\nVGtrq3bs2OHNtQEA3LAd7+DgYGVmZiowMFB+fn4KCwtTZWWlN9cGAHDDdrxHjBihsWPHSpJqa2uV\nm5ur6Oho
ry0MAOCeR995S9Lly5c1b948JScna8KECW63czgcnu4KAIzTXe3zKN7nzp3TggULNH/+\nfKWkpNxx2/DwcA/29JMHzwWA3uNJ+yoqKtw+ZjveV65c0dy5c5WVlaWYmBi7LwMAsMH2d945OTlq\namrSli1blJiYqMTERGVnZ3tzbQAAN2wfeWdkZCgjI8ObawEAdBJnWAKAgYg3ABiIeAOAgYg3ABiI\neAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOA\ngYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgYg3ABiIeAOAgTyK\nd2FhoeLi4hQTE6MtW7Z4a00AgLuwHe/a2lqtW7dO27dv16FDh1ReXq7jx497c20AADdsx7ukpERj\nx45VSEiI/P39lZiYqK+++sqbawMAuNHP7hNramoUGhrquj1kyBBVV1e73b6iosLurrRv1lDbzwWA\n3uRJ++7Edrzb2tra3efn59fhtqNHj7a7GwBAB2x/bTJ06FDV1ta6btfW1mroUI6QAaAn2I73+PHj\n9d1336mmpkZOp1MFBQWKiory4tIAAO74WZZl2X1yYWGhtmzZIqfTqWeeeUZLly715toAAG54FG8A\nQO/wmTMsFy1apGnTpikxMVGJiYn6+uuvdf78ec2aNUvTpk3Tm2++qaamJklSQ0OD0tLSNH36dCUn\nJ+uXX37p5dV7l9PpVGpqqkpLSyVJ1dXVSklJUWxsrFJTU1VXV+fabvny5YqNjVV8fLzOnj3reo3c\n3FzFxsYqJiZGe/bs6ZX34U23z6SoqEjjx493/Xl5++23Xdv5wkxycnIUFxenGTNm6J133lFLS4ut\nz8uGDRs0bdo0PfvssyouLu6tt+M1Hc1l+/btevrpp11/VrKzsyX1wFwsHxEVFWU1NDTccl9CQoJV\nWlpqWZZlbdiwwcrOzrYsy7LWrFljbdq0ybIsyzpx4oT1wgsv9Oxiu9EPP/xgPf/881ZERIRVUlJi\nWZZlpaWlWXv27LEsy7K+/PJLKz093bIsy8rJybEyMjJcz5s6darldDqt77//3oqNjbUaGhqs69ev\nW9OnT7cuXrzYO2/ICzqayfr1662dO3e229YXZnL69GlrxowZVmNjo9XW1mYtWbLE2rp1a5c/L4cP\nH7ZSUlIsp9NpVVVVWVOmTLHq6+t75015gbu5LF682Dp27Fi77bt7Lj5x5F1ZWakbN27ojTfeUHx8\nvDZv3qzffvtN9fX1Gj9+vCQpOTnZdZJRcXGxnnvuOUnSxIkTVVVVpd9//73X1u9Nu3fv1vz58xUR\nESHp7yPJsrIyJSQkSJJmzpyp4uJiOZ3OW+bw6KOPasiQITp16pSKi4sVExOj/v37KygoSDExMSos\nLOy19+Sp22ciSWfOnFFRUZFmzpyptLQ0VVZWSpJPzCQ4OFiZmZkKDAyUn5+fwsLC9OOPP3b583Lk\nyBHFx8erX79+Cg0N1ZgxY4w++u5oLpWVlTpz5ox27typhIQEZWRk6Nq1a5K6fy4+Ee+6ujpFRkZq\n48aN2rVrl8rLy3XgwAG3JxlVV1e3e6yqqqrH190dVq5cqejoaNft+vp6BQYGKiAgQJIUEBCgwMBA\nXb16VdXV1RoyZIhr23/m0NF87nSC1n/d7TORpEGDBunVV19VXl6eJk6cqPT0dEnyiZmMGDFCY8eO\nlfT3jwDn5ubqoYce6vLnpa99jjqaS1RUlIYPH6709HTl5+crJCREq1evltT9c/GJeI8aNUqbNm1S\ncHCw+vfvrzlz5ujkyZPttvvnJCOrg/+He889fXNUHZ1sJf39ft3NoaP73Z2gZaoPP/zQdZQ5e/Zs\nXbhwQfX19T41k8uXL2vOnDlKTk5WZGRku8fv9nnpq5+jf89l0qRJ+vTTT/
XYY4/Jz89PCxYscB1F\nd/dczJ9kJ5w8eVLHjh1z3bYsSzdv3rzlJKOamhrXSUahoaGqqalxPdaXT0AaPHiwmpqa1NLSIklq\naWlRU1OTBg0apNDQ0HYnYoWGhnZ4f1+aT0NDg7Zu3XrLfZZluf4z1xdmcu7cOb300kt6+eWX9frr\nr7c7Ka8znxd3szLZ7XOpqqrSF1984Xr8nz8nUvfPxSfi3dzcrLVr16qxsVEtLS3atWuXkpKSFBQU\npLKyMknSvn37NHnyZElSVFSU9u/fL0kqKyvTgAEDjP4g3om/v7/GjRungoICSVJ+fr4iIyPl7++v\nqKgoHThwQJJ08eJF/frrr4qIiNDkyZP17bffqqGhQQ0NDfrmm29cs+sL+vfvr88//1wlJSWSpL17\n9+rxxx9XUFCQT8zkypUrmjt3rjIzM5WSkiJJGjZsWJc/L1FRUSooKJDT6VRNTY3Ky8s1YcKE3nlT\nXtDRXAIDA7Vp0yZduHBBkrRjxw5NnTpVUvfPxWd+zvuTTz5RXl6e2traFBMTo8WLF+vChQvKzMzU\n9evX9eCDD2r9+vUaMGCA/vzzT61YsUI///yzAgIC9P777yssLKy334JXpaSkKC0tTRMmTFBlZaWW\nLVum2tpaDRw4UNnZ2Ro2bJhaWlqUlZWl06dPy8/PTytWrHB9lZCbm6tdu3bp5s2bevHFF/XKK6/0\n8jvy3L9ncvbsWWVlZam5uVn333+/1q1b5zMz+eCDD/TZZ59p+PDhrvsmTZqkhISELn1eLMvShg0b\ndOTIEbW2tmrRokWKj4/vvTfmIXdzefLJJ5WdnS2n06mHH35Ya9euVXBwcLfPxWfiDQB9iU98bQIA\nfQ3xBgADEW8AMBDxBgADEW8AMBDxBgADEW8AMBDxBgAD/R9ndve6rmZf2AAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.edges.hist()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }