{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extra 3.2 - Historical Provenance - Application 3: RRG Chat Messages\n", "Identifying instructions from chat messages in the Radiation Response Game.\n", "\n", "In this notebook, we explore the performance of classification using the provenance of a data entity instead of its dependencies (as shown [here](Application%203%20-%20RRG%20Messages.ipynb) and in the paper). To distinguish between the two, we call the former _historical_ provenance and the latter _forward_ provenance. Apart from using the historical provenance, all other steps are the same as in [the original experiments](Application%203%20-%20RRG%20Messages.ipynb).\n", "\n", "* **Goal**: To determine whether the provenance network analytics method can identify instructions from the provenance of a chat message.\n", "* **Classification labels**: $\\mathcal{L} = \\left\\{ \\textit{instruction}, \\textit{other} \\right\\} $.\n", "* **Training data**: 69 chat messages manually categorised by HCI researchers.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading data\n", "\n", "The RRG dataset based on historical provenance is provided in the [`rrg/ancestor-graphs.csv`](rrg/ancestor-graphs.csv) file, which contains a table whose rows correspond to individual chat messages in RRG:\n", "* First column: the identifier of the chat message\n", "* `label`: the manual classification of the message (e.g., _instruction_, _information_, _requests_)\n", "* The remaining columns provide the provenance network metrics calculated from the *historical provenance* graph of the message.\n", "\n", "Note that in this extra experiment, we use the full (historical) provenance of a message, without limiting how far back it goes. Hence, there is no $k$ parameter in this experiment." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "filepath = \"rrg/ancestor-graphs.csv\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"| | label | entities | agents | activities | nodes | edges | diameter | assortativity | acc | acc_e | ... | mfd_e_a | mfd_e_ag | mfd_a_e | mfd_a_a | mfd_a_ag | mfd_ag_e | mfd_ag_a | mfd_ag_ag | mfd_der | powerlaw_alpha |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 21 | requests | 186 | 7 | 21 | 214 | 469 | 7 | 0.012152 | 0.488348 | 0.445533 | ... | 22 | 19 | 34 | 22 | 19 | 0 | 0 | 0 | 37 | 2.924960 |\n",
"| 20 | commissives | 183 | 7 | 20 | 210 | 461 | 7 | 0.007546 | 0.487386 | 0.446461 | ... | 22 | 19 | 33 | 22 | 19 | 0 | 0 | 0 | 37 | 2.858642 |\n",
"| 23 | assertives | 216 | 7 | 23 | 246 | 543 | 7 | -0.001550 | 0.489050 | 0.447828 | ... | 26 | 22 | 38 | 26 | 19 | 0 | 0 | 0 | 46 | 2.867888 |\n",
"| 25 | instruction | 220 | 7 | 24 | 251 | 553 | 7 | 0.002591 | 0.489752 | 0.447110 | ... | 26 | 22 | 38 | 26 | 19 | 0 | 0 | 0 | 46 | 2.891161 |\n",
"| 24 | instruction | 219 | 7 | 24 | 250 | 551 | 7 | 0.002284 | 0.489859 | 0.447021 | ... | 26 | 22 | 38 | 26 | 19 | 0 | 0 | 0 | 46 | 2.928098 |\n",
"\n", "5 rows × 23 columns
\n", "\n",
"| | label | entities | agents | activities | nodes | edges | diameter | assortativity | acc | acc_e | ... | mfd_e_a | mfd_e_ag | mfd_a_e | mfd_a_a | mfd_a_ag | mfd_ag_e | mfd_ag_a | mfd_ag_ag | mfd_der | powerlaw_alpha |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 21 | other | 186 | 7 | 21 | 214 | 469 | 7 | 0.012152 | 0.488348 | 0.445533 | ... | 22 | 19 | 34 | 22 | 19 | 0 | 0 | 0 | 37 | 2.924960 |\n",
"| 20 | other | 183 | 7 | 20 | 210 | 461 | 7 | 0.007546 | 0.487386 | 0.446461 | ... | 22 | 19 | 33 | 22 | 19 | 0 | 0 | 0 | 37 | 2.858642 |\n",
"| 23 | other | 216 | 7 | 23 | 246 | 543 | 7 | -0.001550 | 0.489050 | 0.447828 | ... | 26 | 22 | 38 | 26 | 19 | 0 | 0 | 0 | 46 | 2.867888 |\n",
"| 25 | instruction | 220 | 7 | 24 | 251 | 553 | 7 | 0.002591 | 0.489752 | 0.447110 | ... | 26 | 22 | 38 | 26 | 19 | 0 | 0 | 0 | 46 | 2.891161 |\n",
"| 24 | instruction | 219 | 7 | 24 | 250 | 551 | 7 | 0.002284 | 0.489859 | 0.447021 | ... | 26 | 22 | 38 | 26 | 19 | 0 | 0 | 0 | 46 | 2.928098 |\n",
"\n", "5 rows × 23 columns
\n", "