{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "%store -r the_page\n", "\n", "if 'the_page' not in locals():\n", " import pickle\n", " print(\"Loading default data...\")\n", " the_page = pickle.load(open(\"data/the_page.p\",'rb'))\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "inputHidden": false, "outputHidden": false }, "outputs": [ { "data": { "text/markdown": [ "# ***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Markdown as md\n", "display(md(f\"# ***Page: {the_page['title']}***\"))\n", "display(md(f\" \"))\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "inputHidden": false, "outputHidden": false }, "outputs": [ { "data": { "text/markdown": [ "---" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "# A. Article actions and conflict" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "The [WikiWho API](https://www.wikiwho.net/en/api/v1.0.0-beta/) tracks the changes to every token (words or special characters) on a Wikipedia page with at least 95% accuracy. It distinguishes every token in the document even when the string appears several times. E.g. \"and\" at the beginning of an article is a different token than \"and\" at the end of the article. See also [this figure](https://www.wikiwho.net/#technical_details)." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "That means that **not only edits** are counted, which can contain changes to many different tokens, but *every single action to every single token* is recorded. 
Two actions can be performed per token, i.e. **insertions** and **deletions** (a character change in a word, e.g. \"dog\" -> \"dogs\", is modeled as the deletion of \"dog\" and the insertion of \"dogs\", two separate tokens). An **insertion** is also considered a **re-insertion** if the insertion has occurred before; the only insertion of a token that is not a re-insertion is the first one. Similarly, a **deletion** is also considered a **re-deletion** if the deletion has occurred before." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Formally, the token history can be represented by a time-ordered sequence of actions $(a_0, ..., a_n)$; note that $a_{0+2i}$ is always an insertion and $a_{1+2i}$ is always a deletion for i ∈ ℕ." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n", "***IMPORTANT:*** For articles with a long revision history, please allow for some time to load (see the cog wheel symbol to the right of 'edit app') before interacting with the controls too often." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "\n", "display(md(\"---\"))\n", "display(md(f\"# A. Article actions and conflict\"))\n", "display(md(f\"The [WikiWho API](https://www.wikiwho.net/en/api/v1.0.0-beta/) tracks the changes to every token (words or special characters) on a \"\n", " \"Wikipedia page with at least 95% accuracy. It distinguishes every token in the \"\n", " 'document even when the string appears several times. E.g. \"and\" at the beginning of an article is a different token than \"and\" at the end of the article. '\n", " \"See also [this figure](https://www.wikiwho.net/#technical_details).\"\n", " ))\n", "display(md(\"That means that **not only edits** are counted, which can contain changes to many different tokens, but *every single action to every single token* is recorded. Two actions can be performed per token, i.e. 
**insertions** \"\n", " 'and **deletions** (a character change in a word, e.g. \"dog\" -> \"dogs\", is modeled as the deletion of '\n", " '\"dog\" and the insertion of \"dogs\", two separate tokens). An **insertion** is also considered '\n", " \"a **re-insertion** if the insertion has occurred before; the only insertion of a token that is not \"\n", " \"a re-insertion is the first one. Similarly, a **deletion** is also considered a **re-deletion** if \"\n", " \"the deletion has occurred before.\"))\n", "display(md(\"Formally, the token history can be represented by a time-ordered sequence of actions \"\n", " \"$(a_0, ..., a_n)$; note that $a_{0+2i}$ is always an insertion and \"\n", " \"$a_{1+2i}$ is always a deletion for i ∈ ℕ.\"))\n", "\n", "\n", "display(md(\"---\\n***IMPORTANT:*** For articles with a long revision history, \"\n", " \"please allow for some time to load (see the cog wheel symbol to the right of 'edit app') \"\n", " \"before interacting with the controls too often.\"))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "## A.1 Total actions per month and editor" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "The following table shows the total number of actions (insertions + deletions) per month \n", "(`year_month` column), and editor (`editor_id` and `editor` columns)." 
], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "**Columns description:**\n", "- **total**: total number of actions (insertions and deletions)\n", "- **total_surv_48h**: total number of actions that survived at least 48 hours\n", "- **total_persistent**: total number of actions that survived until, at least, the end of the month\n", "- **total_stopword_count**: total number of actions that were performed on stop words" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2b8f3561a6fe48eb8578da4110075d21", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from wikiwho_wrapper import WikiWho\n", "import pandas as pd\n", "import qgrid\n", "# cap the number of visible rows at 5 so the larger DataFrames we render don't take up too much space\n", "qgrid.set_grid_option('maxVisibleRows', 5)\n", "\n", "wikiwho = WikiWho(lng='en')\n", "agg_actions = wikiwho.dv.edit_persistence(the_page.page_id)\n", "\n", "# define total columns\n", "total_columns = ['total', 'total_surv_48h', 'total_persistent', 'total_stopword_count']\n", "\n", "# add columns with the total actions\n", "agg_actions = agg_actions.join(pd.DataFrame(\n", " agg_actions.loc[:,'adds':'adds_stopword_count'].values +\\\n", " agg_actions.loc[:,'dels':'dels_stopword_count'].values +\\\n", " agg_actions.loc[:,'reins':'reins_stopword_count'].values, \n", " index=agg_actions.index, \n", " columns=total_columns\n", "))\n", "\n", "display(md(\"## A.1 Total actions per month and editor\"))\n", "display(md(f\"***Page: {the_page['title']}***\"))\n", "display(md(\"\"\"The following table shows the total number of actions (insertions + deletions) per month \n", "(`year_month` column), and editor (`editor_id` and `editor` columns).\"\"\"))\n", "display(md(\"\"\"**Columns 
description:**\n", "- **total**: total number of actions (insertions and deletions)\n", "- **total_surv_48h**: total number of actions that survived at least 48 hours\n", "- **total_persistent**: total number of actions that survived until, at least, the end of the month\n", "- **total_stopword_count**: total number of actions that were performed on stop words\"\"\"))\n", "\n", "from IPython.display import clear_output\n", "from ipywidgets import Output\n", "\n", "# the output widget is used to update the qgrid\n", "out = Output()\n", "display(out)\n", "with out:\n", " print(\"Downloading editor usernames (i.e. *editor* column)...\")\n", " display(qgrid.show_grid(agg_actions[['year_month', 'editor_id'] + total_columns]))\n", "\n", "# Grab the user names from Wikipedia so they can be merged into the aggregate actions dataframe\n", "from external.wikipedia import WikipediaDV, WikipediaAPI\n", "wikipedia_dv = WikipediaDV(WikipediaAPI(domain='en.wikipedia.org'))\n", "editors = wikipedia_dv.get_editors(agg_actions['editor_id'].unique()).rename(columns = {\n", " 'userid': 'editor_id'})\n", "\n", "# Merge the names of the editors into the aggregate actions dataframe\n", "agg_actions = agg_actions.merge(editors[['editor_id', 'name']], on='editor_id')\n", "agg_actions.insert(3, 'editor', agg_actions['name'])\n", "agg_actions = agg_actions.drop(columns=['name'])\n", "agg_actions['editor'] = agg_actions['editor'].fillna(\"Unregistered\")\n", "\n", "with out:\n", " clear_output()\n", " display(qgrid.show_grid(agg_actions[['year_month', 'editor_id', 'editor'] + total_columns]))\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "## A.2. 
Visualization of actions per month" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "In the following graph you can select the *date range* and *granularity* (yearly, monthly) \n", "of the timeline (X-axis), and plot any of the following counts in the black, red, blue and green lines:\n", " \n", "- **adds**: number of first-time insertions\n", "- **adds_surv_48h**: number of first-time insertions that survived at least 48 hours\n", "- **adds_persistent**: number of first-time insertions that survived until, at least, the end of the month\n", "- **adds_stopword_count**: number of insertions that were stop words\n", "- **dels**: number of deletions\n", "- **dels_surv_48h**: number of deletions that were not reinserted in the next 48 hours\n", "- **dels_persistent**: number of deletions that were not reinserted until, at least, the end of the month\n", "- **dels_stopword_count**: number of deletions that were stop words\n", "- **reins**: number of reinsertions\n", "- **reins_surv_48h**: number of reinsertions that survived at least 48 hours\n", "- **reins_persistent**: number of reinsertions that survived until the end of the month\n", "- **reins_stopword_count**: number of reinsertions that were stop words\n", "\n", "**What do these actions/counts mean?** For instance, if you see 10 \"adds\" in a month, but only 4 \"adds_surv_48h\", 10 completely new tokens/words have been added to the article, but only 4 of them stayed in the article for more than 2 days, which usually means the other 6 are gone for good. If \"dels\" are performed and don't survive, that means that these deletions have been undone, i.e., the deleted tokens have been put back. I.e., these are measurements of the longevity and stability of edit actions done to the article. 
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "bb10b88ef52d4501b2b210fbe23052fe", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(SelectionRangeSlider(continuous_update=False, description='Date Range', index=(0, 111), …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ ".(*args, **kwargs)>" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display(md(\"\"\"## A.2. Visualization of actions per month\"\"\"))\n", "display(md(f\"***Page: {the_page['title']}***\"))\n", "display(md(\"\"\"In the following graph you can select the *date range* and *granularity* (yearly, monthly) \n", "of the timeline (X-axis), and plot any of the following counts in the black, red, blue and green lines:\n", " \n", "- **adds**: number of first-time insertions\n", "- **adds_surv_48h**: number of first-time insertions that survived at least 48 hours\n", "- **adds_persistent**: number of first-time insertions that survived until, at least, the end of the month\n", "- **adds_stopword_count**: number of insertions that were stop words\n", "- **dels**: number of deletions\n", "- **dels_surv_48h**: number of deletions that were not reinserted in the next 48 hours\n", "- **dels_persistent**: number of deletions that were not reinserted until, at least, the end of the month\n", "- **dels_stopword_count**: number of deletions that were stop words\n", "- **reins**: number of reinsertions\n", "- **reins_surv_48h**: number of reinsertions that survived at least 48 hours\n", "- **reins_persistent**: number of reinsertions that survived until the end of the month\n", "- **reins_stopword_count**: number of reinsertions that were stop words\n", "\n", "**What do these actions/counts mean?** For instance, if you see 10 \"adds\" in a month, but only 4 \"adds_surv_48h\", 10 completely 
new tokens/words have been added to the article, but only 4 of them stayed in the article for more than 2 days, which usually means the other 6 are gone for good. If \"dels\" are performed and don't survive, that means that these deletions have been undone, i.e., the deleted tokens have been put back. I.e., these are measurements of the longevity and stability of edit actions done to the article. \n", "\n", "\"\"\"))\n", "\n", "\n", "\n", "# Convert to datetime\n", "agg_actions['year_month'] = pd.to_datetime(agg_actions['year_month'])\n", "\n", "# Group the data by year month and page (drop the editor information)\n", "agg_actions.drop('editor_id', axis=1).groupby(['year_month','page_id']).sum().reset_index()\n", "\n", "# Listener\n", "from visualization.actions_listener import ActionsListener\n", "listener = ActionsListener(agg_actions)\n", "action_types = (agg_actions.columns[4:16]).values.tolist()\n", "\n", "# Visualization\n", "from utils.notebooks import get_date_slider_from_datetime\n", "from ipywidgets import interact, fixed\n", "from ipywidgets.widgets import Dropdown\n", "\n", "interact(listener.listen,\n", " _range = get_date_slider_from_datetime(agg_actions['year_month']),\n", " editor=fixed('All'),\n", " granularity=Dropdown(options=['Yearly', 'Monthly'], value='Yearly'),\n", " black=Dropdown(options=action_types, value='adds'), \n", " red=Dropdown(options= ['None'] + action_types, value='dels'),\n", " green=Dropdown(options= ['None'] + action_types, value='None'), \n", " blue=Dropdown(options= ['None'] + action_types, value='None'))\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "---" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## A.3 Page Conflict" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, 
"metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Markdown as md\n", "display(md(\"---\"))\n", "display(md(f'## A.3 Page Conflict'))\n", "display(md(f\"***Page: {the_page['title']}***\"))\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our measurement of ***conflict*** for single tokens is taken from [Flöck et al.](https://arxiv.org/abs/1703.08244):\n", "\n", "\n", "* (1) The main idea is to count how often a token - after being created (added) the first time - was deleted, re-inserted, re-deleted, re-inserted, and so on; this often happens when two editors disagree on whether the token belongs in the text. \n", "* (2) Only the **re-**deletions and **re-**insertions are counted, since up to the first delete it could be a simple correction that didn't trigger a response - this wouldn't indicate conflict. \n", "* (3) The **\"re-\"** actions are only counted if they alternate between different editors and don't come from the same editor twice or more in a row - as the latter would simply indicate self-corrections. \n", "* (4) In a last step, each re-insertion/re-deletion interaction gets a higher weight the faster it occurs (see [Flöck et al.](https://arxiv.org/abs/1703.08244) for the exact formula).\n", "\n", "The total conflict of a page is the sum of all the conflict scores of all actions with \n", "conflict (or conflict actions). 
\n", "\n", "This total conflict can be normalized by dividing the sum by the number of \n", "actions that could potentially be counted as conflict (eligible actions, i.e. **\"re-\"** actions that have occurred at \n", "least twice).\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following graph you can select the *date range* and *granularity* (yearly, monthly) \n", "of the timeline (X-axis), and plot any of the following counts in the black and red lines:\n", " \n", "- **Total**: total number of actions (insertions and deletions)\n", "- **Total_surv_48h**: total number of actions that survived at least 48 hours\n", "- **Total_persistent**: total number of actions that survived until, at least, the end of the month\n", "- **Total_stopword_count**: total number of actions that were performed on stop words\n", "- **Total Elegible Actions**: the total number of eligible actions\n", "- **Number of Conflicts**: the total number of conflicts\n", "- **Number of Revisions**: the total number of revisions/edits\n", "- **Conflict Score**: the sum of conflict scores of all actions divided by the number of eligible actions\n", "- **Absolute Conflict Score**: the sum of conflict scores of all actions (without division)\n", "- **Conflict Ratio**: the count of all conflicts divided by the number of eligible actions" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Page conflict score: 0.8526734147486141**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3e514feddf574a48acbdd7f66d6545de", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(SelectionRangeSlider(continuous_update=False, description='Date Range', index=(0, 111), …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Visualization\n", "from visualization.conflicts_listener 
import ConflictsListener\n", "listener = ConflictsListener(agg_actions)\n", "\n", "metrics = ['Total', 'Total_surv_48h', 'Total_persistent', 'Total_stopword_count',\n", " 'Total Elegible Actions', 'Number of Conflicts', 'Number of Revisions',\n", " 'Conflict Score', 'Absolute Conflict Score', 'Conflict Ratio']\n", "conflict_score = agg_actions.conflict.sum() / agg_actions.elegibles.sum()\n", "display(md(f'**Page conflict score: {conflict_score}**'))\n", "\n", "# Visualization\n", "from utils.notebooks import get_date_slider_from_datetime\n", "from ipywidgets import interact\n", "from ipywidgets.widgets import Dropdown\n", "\n", "if (conflict_score != 0):\n", " interact(listener.listen,\n", " _range = get_date_slider_from_datetime(agg_actions['year_month']),\n", " granularity=Dropdown(options=['Yearly', 'Monthly'], value='Monthly'),\n", " black=Dropdown(options=metrics, value='Conflict Score'),\n", " red=Dropdown(options= ['None'] + metrics, value='None'))\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "---" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## A.4 Editor Conflict Score" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "We can also calculate the conflict score for each individual editor. 
The\n", "table below presents the conflict score and other related metrics per editor (*editor_id* and *editor*\n", "columns):\n", "\n", "- **conflicts**: the total number of conflicts\n", "- **elegibles**: the total number of eligible actions performed by the editor\n", "- **conflict**: the sum of conflict scores of all actions divided by the number of eligible actions\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Markdown as md\n", "display(md(\"---\"))\n", "display(md(f'## A.4 Editor Conflict Score'))\n", "display(md(f\"***Page: {the_page['title']}***\"))\n", "display(md(\"\"\"We can also calculate the conflict score for each individual editor. The\n", "table below presents the conflict score and other related metrics per editor (*editor_id* and *editor*\n", "columns):\n", "\n", "- **conflicts**: the total number of conflicts\n", "- **elegibles**: the total number of eligible actions performed by the editor\n", "- **conflict**: the sum of conflict scores of all actions divided by the number of eligible actions\n", "\"\"\"))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4d624f6bd9c643de906dbfe09154f409", "version_major": 2, "version_minor": 0 }, "text/plain": [ "QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "editors_conflicts = agg_actions.groupby(pd.Grouper(\n", " key='editor_id')).agg({'conflicts': 'sum', 'elegibles': 'sum', 'conflict': 'sum'}).reset_index()\n", "editors_conflicts['conflict'] = (editors_conflicts['conflict']/editors_conflicts['elegibles'])\n", "if len(editors_conflicts) > 0:\n", " editors_conflicts = editors[['editor_id', 'name']].merge(editors_conflicts.dropna(), \n", " right_index=True, 
on='editor_id').set_index('editor_id')\n", " qg_obj = qgrid.show_grid(editors_conflicts.dropna())\n", " display(qg_obj)\n", "else:\n", " display(md(f'**There are no conflict scores**')) \n", " editors_conflicts = None" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# create the api\n", "from wikiwho_wrapper import WikiWho\n", "wikiwho = WikiWho(lng='en')\n", "\n", "from IPython.display import display, Markdown as md\n", "# Get the content and revisions from the wikiwho api\n", "display(md(\"Downloading all_content from the WikiWhoApi...\"))\n", "all_content = wikiwho.dv.all_content(the_page['page_id'])\n", "\n", "display(md(\"Downloading revisions from the WikiWhoApi...\"))\n", "revisions = wikiwho.dv.rev_ids_of_article(the_page['page_id'])\n", "\n", "from IPython.display import clear_output\n", "clear_output()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "## B.1 Conflict score of each singular action" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "In the following table, all the actions that are in conflict are presented, and a conflict score\n", "is presented per action. The importance of the conflict can be measured by considering the seconds *t* that \n", "have passed since the last action on the same token occurred (`time_diff_secs` column). A score to \n", "measure conflict is calculated based on *t* with the following formula: 1 / log3600(t+2). 
\n", "Thus, *undo* actions receive a score greater than 1 when *t* is less than an hour, and faster responses are weighted higher.\n", "For details, please refer to [Flöck et al, 2017](https://arxiv.org/abs/1703.08244).\n", "\n", "**Columns description:**\n", "- **token**: the string of the token that is being tracked\n", "- **token_id**: the id of the token that is being tracked\n", "- **rev_id**: the revision id in which the action (insertion or deletion) happened\n", "- **editor_id**: the id of the editor that inserted the token (if it starts with **0|**, it means that\n", "the editor is not registered, and the IP address is displayed instead)\n", "- **time_diff_secs**: seconds that have passed since the last action on the same token occurred\n", "- **conflict**: a score to measure conflict that is calculated based on the `time_diff_secs` \n", "with the following formula: *1 / log3600(time_diff_secs + 2)*. For details, please refer to \n", "[Flöck et al, 2017](https://arxiv.org/abs/1703.08244)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "78657af94a704df9bfa243148a52358a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from metrics.conflict import ConflictManager\n", "from wikiwho_wrapper import WikiWho\n", "from IPython.display import clear_output\n", "from IPython.display import HTML\n", "from utils.notebooks import get_next_notebook, get_previous_notebook\n", "\n", "# call the calculator\n", "calculator = ConflictManager(all_content, revisions)\n", "calculator.calculate()\n", "clear_output()\n", "\n", "# display the tokens, the difference in seconds and its corresponding conflict score\n", "conflicts = calculator.conflicts.copy()\n", "conflicts['time_diff_secs'] = 
conflicts['time_diff'].dt.total_seconds()\n", " \n", "display(md(f'## B.1 Conflict score of each singular action'))\n", "display(md(f\"***Page: {the_page['title']}***\"))\n", "display(md(\"\"\"In the following table, all the actions that are in conflict are presented, and a conflict score\n", "is presented per action. The importance of the conflict can be measured by considering the seconds *t* that \n", "have passed since the last action on the same token occurred (`time_diff_secs` column). A score to \n", "measure conflict is calculated based on *t* with the following formula: 1 / log3600(t+2). \n", "Thus, *undo* actions receive a score greater than 1 when *t* is less than an hour, and faster responses are weighted higher.\n", "For details, please refer to [Flöck et al, 2017](https://arxiv.org/abs/1703.08244).\n", "\n", "**Columns description:**\n", "- **token**: the string of the token that is being tracked\n", "- **token_id**: the id of the token that is being tracked\n", "- **rev_id**: the revision id in which the action (insertion or deletion) happened\n", "- **editor_id**: the id of the editor that inserted the token (if it starts with **0|**, it means that\n", "the editor is not registered, and the IP address is displayed instead)\n", "- **time_diff_secs**: seconds that have passed since the last action on the same token occurred\n", "- **conflict**: a score to measure conflict that is calculated based on the `time_diff_secs` \n", "with the following formula: *1 / log3600(time_diff_secs + 2)*. 
For details, please refer to \n", "[Flöck et al, 2017](https://arxiv.org/abs/1703.08244)\"\"\"))\n", "\n", "if len(conflicts) > 0:\n", " display(qgrid.show_grid(conflicts[[\n", " 'action', 'token', 'token_id', 'rev_id', \n", " 'editor', 'time_diff_secs', 'conflict']].rename(columns={\n", " 'editor': 'editor_id'}).sort_values('conflict', ascending=False)))\n", "else:\n", " display(md(f'**There are no conflicting tokens in this page.**'))\n", " display(HTML(f'Go back to the previous workbook'))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "---" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## B.2 Most frequent conflicting token strings" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "***Page: The Camp of the Saints***" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ " The WordCloud displays the most common conflicting token strings, i.e. words (token strings) \n", "with the most actions that have conflict. The size of the token string in the WordCloud indicates frequency \n", "of actions.\n", "In the controls you can select the *date range*, the type of *action* (insertion or deletion), and the \n", "*source*. The *source* can be any of the following:\n", "- **Only Conflicts**: use only the actions that are in conflict.\n", "- **Elegible Actions**: use only the actions that can potentially enter into conflict, i.e. actions \n", "that have occurred at least twice, e.g. 
the token x has been inserted twice (which necessarily implies \n", "it was removed once), the token x has been deleted twice (which necessarily implies it was inserted twice) \n", "- **All Actions**: use all actions, regardless of conflict\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Markdown as md, HTML as html\n", "display(md(\"---\"))\n", "display(md(f'## B.2 Most frequent conflicting token strings'))\n", "display(md(f\"***Page: {the_page['title']}***\"))\n", " \n", "display(md(\"\"\" The WordCloud displays the most common conflicting token strings, i.e. words (token strings) \n", "with the most actions that have conflict. The size of the token string in the WordCloud indicates frequency \n", "of actions.\n", "In the controls you can select the *date range*, the type of *action* (insertion or deletion), and the \n", "*source*. The *source* can be any of the following:\n", "- **Only Conflicts**: use only the actions that are in conflict.\n", "- **Elegible Actions**: use only the actions that can potentially enter into conflict, i.e. actions \n", "that have occurred at least twice, e.g. 
the token x has been inserted twice (which necessarily implies \n", "it was removed once), the token x has been deleted twice (which necessarily implies it was inserted twice) \n", "- **All Actions**: use all actions, regardless of conflict\n", "\"\"\"))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "dc71e4451b034eb2824e4e6bf57837fa", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(SelectionRangeSlider(continuous_update=False, description='Date Range', index=(0, 219), layout=…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# listener\n", "from visualization.wordcloud_listener import WCListener\n", "\n", "listener = WCListener(sources = {\n", " 'All actions': calculator.all_actions,\n", " 'Elegible Actions': calculator.elegible_actions,\n", " 'Only Conflicts': calculator.conflicts\n", "})\n", "\n", "# visualization\n", "from utils.notebooks import get_date_slider_from_datetime\n", "from ipywidgets import interact, fixed\n", "\n", "from ipywidgets.widgets import Dropdown, HTML, interactive_output, VBox\n", "\n", "_range=get_date_slider_from_datetime(calculator.all_actions['rev_time'])\n", "source=Dropdown(options=list(listener.sources.keys()), value='Only Conflicts', description='Source (*)')\n", "action=Dropdown(options=['Both', 'Just Insertions', 'Just Deletions'], value='Both', description='Action')\n", "editor=fixed('All')\n", "\n", "out = interactive_output(listener.listen, {\n", " '_range': _range,\n", " 'source': source,\n", " 'action': action,\n", " 'editor': editor})\n", "\n", "display(VBox([_range, action, source, out]))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Go to next workbook" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "from utils.notebooks import 
get_next_notebook, get_previous_notebook\n", "\n", "%store agg_actions\n", "%store calculator\n", "%store editors_conflicts\n", "\n", "clear_output()\n", " \n", "\n", "# editors_conflicts is set to None earlier when there are no conflict scores\n", "if editors_conflicts is not None and len(editors_conflicts) > 0:\n", " display(HTML(f'Go to next workbook'))\n", "else:\n", " display(HTML(f'Go back to the previous workbook'))\n" ] } ], "metadata": { "kernel_info": { "name": "python3" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "nteract": { "version": "0.14.4" } }, "nbformat": 4, "nbformat_minor": 2 }