{ "cells": [ { "cell_type": "markdown", "id": "5451dfbe-903f-4700-a0e2-484d3f885b57", "metadata": {}, "source": [ "# Comparison of Proposed Vandalism Criteria with Revert Risk Scores\n", "\n", "[TASK: T349083](https://phabricator.wikimedia.org/T349083)\n", "\n", "➤ ***Please view this notebook on [nbviewer](https://nbviewer.org/github/wikimedia-research/moderator-tools-FY24/blob/main/%5BT349083%5D%20vandalism_criteria_comparision/vandal_criteria_revert_risk_comparision.ipynb)***\n", "\n", "To establish baseline measurements for evaluating [Automoderator](https://www.mediawiki.org/wiki/Moderator_Tools/Automoderator), we want to develop criteria for identifying potential vandalism. In this analysis, the criteria are compared against the [revert risk scores](https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Language-agnostic_revert_risk). Starting from an initial set of criteria, we vary several dimensions to see how each one affects the median revert risk score per project, and how many edits each further restriction eliminates from consideration. 
The goal is to find a balance between a high median risk score and not eliminating too many edits from consideration.\n", "\n", "**Initial criteria:**\n", "- Edits by anonymous users or by accounts with 25 or fewer edits\n", "- Reverted by a different editor\n", "- Reverted within 24 hours\n", "- Edits in content namespaces\n", "\n", "**Dimensions considered:**\n", "- Time to revert\n", "- User edit count (for registered users)\n", "- Time since the user's first revision (for registered users)\n", "- Time since the user's previous revision (for registered users)\n", "- Time since the previous revision on the page being edited\n", "- Absolute difference in bytes made by the revision\n", "\n", "## Summary\n", "Based on the analysis, the following additions to the initial criteria can improve the median risk score:\n", "- Reverted within 12 hours\n", "- User edit count of 15 or fewer\n", "- Time since the user's first edit of 48 hours or less\n", "- Absolute bytes difference of at least 5 bytes" ] }, { "cell_type": "code", "execution_count": 566, "id": "31f2f8c8-5923-4ca3-8f0b-337764e49908", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div style='text-align:center'><b>Changes in the Median Risk & Number of Edits</b></div>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Initial <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " 
<td>0.901974</td>\n", " <td>16829</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.910679</td>\n", " <td>172584</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.922596</td>\n", " <td>55105</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.916366</td>\n", " <td>9967</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.903316</td>\n", " <td>19375</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.902464</td>\n", " <td>3554</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.919648</td>\n", " <td>23440</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.875682</td>\n", " <td>10170</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.913064</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.914291</td>\n", " <td>23587</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.883454</td>\n", " <td>7568</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ Reverted within 12 hours <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.904239</td>\n", " <td>16077</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.912205</td>\n", " <td>162439</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.923474</td>\n", " 
<td>52922</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.916792</td>\n", " <td>9228</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.905588</td>\n", " <td>18401</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.901994</td>\n", " <td>3231</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.921301</td>\n", " <td>22077</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.879789</td>\n", " <td>9401</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.914363</td>\n", " <td>3147</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.916403</td>\n", " <td>22250</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.886989</td>\n", " <td>6880</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ User Edit Count <= 15 edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.904503</td>\n", " <td>16061</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.912847</td>\n", " <td>160889</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.923850</td>\n", " <td>52696</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.918056</td>\n", " <td>9136</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.906304</td>\n", " <td>18285</td>\n", " </tr>\n", " 
<tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.902892</td>\n", " <td>3190</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.921365</td>\n", " <td>22011</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.880116</td>\n", " <td>9109</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.916916</td>\n", " <td>3079</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.916746</td>\n", " <td>22204</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.887588</td>\n", " <td>6819</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ Time Since First Edit <= 48 hrs <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.907555</td>\n", " <td>15468</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.915196</td>\n", " <td>153858</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.924792</td>\n", " <td>51696</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.920468</td>\n", " <td>8539</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.909034</td>\n", " <td>17489</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.905071</td>\n", " <td>3067</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.922709</td>\n", " <td>21633</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " 
<td>jawiki</td>\n", " <td>0.882525</td>\n", " <td>8828</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.930669</td>\n", " <td>2458</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.918103</td>\n", " <td>21661</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.890380</td>\n", " <td>6481</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ Absolute Bytes Diff >= 5 bytes <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.917214</td>\n", " <td>11281</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.920194</td>\n", " <td>115997</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.930483</td>\n", " <td>39239</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.924352</td>\n", " <td>6734</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.913709</td>\n", " <td>13492</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.910019</td>\n", " <td>2361</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.924533</td>\n", " <td>15505</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.883670</td>\n", " <td>6679</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.934228</td>\n", " <td>1855</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " 
<td>0.923788</td>\n", " <td>16914</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.896337</td>\n", " <td>4813</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pr_centered('Changes in the Median Risk & Number of Edits', True)\n", "display_h(results)" ] }, { "cell_type": "markdown", "id": "d3b4e707-db23-4cc9-95fe-19285595e171", "metadata": {}, "source": [ "- Restricting user-related metrics yields only minor improvements to the median risk, as the majority of reverted edits are made by anonymous users.\n", "- Requiring a minimum absolute bytes difference improves the median risk, but eliminates a substantial number of edits: on enwiki, adding the 5-byte threshold alone reduces the remaining edits from 153,858 to 115,997.\n", "- Apart from time to revert, absolute bytes difference is the only control factor available for anonymous edits.\n" ] }, { "cell_type": "markdown", "id": "1013433a-2909-4b7c-8325-ac05c07ed8ea", "metadata": { "tags": [] }, "source": [ "# Data Gathering" ] }, { "cell_type": "code", "execution_count": 2, "id": "7bfb58a2-a2f9-4ddd-b622-7f8130c12dfd", "metadata": {}, "outputs": [], "source": [ "import random\n", "import warnings\n", "from datetime import datetime\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import wmfdata as wmf\n", "from IPython.display import HTML, clear_output, display, display_html\n", "\n", "pd.options.display.max_columns = None" ] }, { "cell_type": "code", "execution_count": 89, "id": "c806fdbd-1195-4d58-b54e-313dd35c8ced", "metadata": {}, "outputs": [], "source": [ "# import seaborn as sns\n", "# import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 180, "id": "0e5f4221-5d33-4abe-991d-e862d4d5e7f7", "metadata": {}, "outputs": [], "source": [ "spark_session = 
wmf.spark.get_active_session()\n", "\n", "if spark_session is not None:\n", " spark_session.stop()\n", "else:\n", " print('no active session')" ] }, { "cell_type": "code", "execution_count": 574, "id": "f24d2e1f-eebb-4b99-8dd8-fd5b8ade338f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div>\n", " <p><b>SparkSession - hive</b></p>\n", " \n", " <div>\n", " <p><b>SparkContext</b></p>\n", "\n", " <p><a href=\"http://stat1005.eqiad.wmnet:4045\">Spark UI</a></p>\n", "\n", " <dl>\n", " <dt>Version</dt>\n", " <dd><code>v3.1.2</code></dd>\n", " <dt>Master</dt>\n", " <dd><code>yarn</code></dd>\n", " <dt>AppName</dt>\n", " <dd><code>vandal-criteria-comparision</code></dd>\n", " </dl>\n", " </div>\n", " \n", " </div>\n", " " ], "text/plain": [ "<pyspark.sql.session.SparkSession at 0x7feac1442020>" ] }, "execution_count": 574, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spark_session = wmf.spark.create_custom_session(\n", " master=\"yarn\",\n", " app_name='vandal-criteria-comparision',\n", " spark_config={\n", " \"spark.driver.memory\": \"6g\",\n", " \"spark.dynamicAllocation.maxExecutors\": 64,\n", " \"spark.executor.memory\": \"24g\",\n", " \"spark.executor.cores\": 4,\n", " \"spark.sql.shuffle.partitions\": 256,\n", " \"spark.driver.maxResultSize\": \"2g\"\n", " \n", " }\n", ")\n", "\n", "clear_output()\n", "\n", "spark_session.sparkContext.setLogLevel(\"ERROR\")\n", "spark_session" ] }, { "cell_type": "markdown", "id": "bd65d9cb-c8af-4a44-826e-85e999a3bc4f", "metadata": {}, "source": [ "## Query" ] }, { "cell_type": "code", "execution_count": 575, "id": "54c7c413-8b81-428f-95f7-b408b97d9544", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[Stage 0:> (0 + 1) / 1]\r" ] }, { "name": "stdout", "output_type": "stream", "text": [ "root\n", " |-- rev_id: long (nullable = true)\n", " |-- wiki_db: string (nullable = true)\n", " |-- rev_timestamp: string (nullable = true)\n", " |-- 
revision_is_identity_reverted: boolean (nullable = true)\n", " |-- revision_seconds_to_identity_revert: long (nullable = true)\n", " |-- page_id: long (nullable = true)\n", " |-- revision_revert_risk: float (nullable = true)\n", " |-- user_is_anonymous: boolean (nullable = true)\n", " |-- user_is_bot: boolean (nullable = true)\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "rr_scores_path = '/user/paragon/riskobservatory/revertrisk_20212022_anonymous_bot.parquet'\n", "\n", "rr_scores = spark_session.read.parquet(rr_scores_path)\n", "rr_scores.createOrReplaceTempView('rr_scores')\n", "\n", "rr_scores.printSchema()" ] }, { "cell_type": "code", "execution_count": 8, "id": "7a5746a4-b4df-4c23-aa3e-6072d0ccf2dc", "metadata": {}, "outputs": [], "source": [ "mwh_snapshot = '2023-10'\n", "\n", "wikis_list = [f'{lang}wiki' for lang in ['en', 'es', 'ja', 'de', 'fr', 'ru', 'zh', 'it', 'pt', 'fa', 'id']]\n", "wikis_sql = wmf.utils.sql_tuple(wikis_list)" ] }, { "cell_type": "code", "execution_count": 9, "id": "41ff9f3d-87d6-448c-a460-742878d55f7a", "metadata": {}, "outputs": [], "source": [ "# generate `num_dates` random dates within `year` (used below to draw 30 dates in 2022);\n", "# dates are drawn with replacement, so duplicates are possible, and February is capped at 28 days\n", "\n", "def generate_random_dates(year, num_dates):\n", " dates = []\n", " for _ in range(num_dates):\n", " month = random.randint(1, 12)\n", " if month in [1, 3, 5, 7, 8, 10, 12]:\n", " day = random.randint(1, 31)\n", " elif month == 2:\n", " day = random.randint(1, 28)\n", " else:\n", " day = random.randint(1, 30)\n", " \n", " date = datetime(year, month, day)\n", " dates.append(date.strftime(\"%Y-%m-%d\"))\n", " \n", " return dates\n", "\n", "random_dates_2022 = generate_random_dates(2022, 30)\n", "random_dates_2022_sql = wmf.utils.sql_tuple(random_dates_2022)" ] }, { "cell_type": "code", "execution_count": 585, "id": "e3058fbc-4d4f-4ecf-9345-423fd62a5bcd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 2]\r" ] }, { "name": "stdout", "output_type": "stream", 
"text": [ "CPU times: user 5.2 s, sys: 0 ns, total: 5.2 s\n", "Wall time: 5min 16s\n" ] } ], "source": [ "%%time\n", "\n", "query = f\"\"\"\n", "WITH \n", " base_criteria AS (\n", " SELECT\n", " mwh.wiki_db,\n", " rr.rev_id,\n", " revision_revert_risk AS risk,\n", " mwh.event_user_text AS user_name,\n", " event_timestamp AS rev_ts,\n", " event_user_is_anonymous AS is_anon,\n", " event_user_revision_count AS user_edit_count,\n", " COALESCE(event_user_registration_timestamp, event_user_creation_timestamp) AS user_reg_ts,\n", " event_user_first_edit_timestamp AS user_first_rev_ts,\n", " event_user_seconds_since_previous_revision AS time_user_prev_rev,\n", " page_seconds_since_previous_revision AS time_page_prev_rev,\n", " revision_text_bytes_diff AS rev_bytes_diff,\n", " mwh.revision_seconds_to_identity_revert AS time_to_revert,\n", " revision_text_bytes AS rev_bytes,\n", " revision_is_identity_revert AS reverting_edit,\n", " revision_first_identity_reverting_revision_id AS reverting_edit_id\n", " FROM \n", " rr_scores rr\n", " JOIN \n", " wmf.mediawiki_history mwh \n", " ON rr.wiki_db = mwh.wiki_db AND rr.rev_id = mwh.revision_id\n", " WHERE \n", " snapshot = '{mwh_snapshot}'\n", " AND rr.wiki_db IN {wikis_sql}\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", " AND DATE(event_timestamp) IN {random_dates_2022_sql}\n", " AND page_namespace_is_content\n", " AND (event_user_is_anonymous OR event_user_revision_count <= 250)\n", " AND SIZE(event_user_is_bot_by_historical) = 0\n", " AND mwh.revision_is_identity_reverted\n", " AND mwh.revision_seconds_to_identity_revert <= 3*24*60*60\n", " )\n", " \n", "\n", "SELECT\n", " bc.*,\n", " mwh.event_user_is_anonymous AS reverting_user_is_anon,\n", " mwh.event_user_revision_count AS reverting_user_edit_count,\n", " mwh.event_user_first_edit_timestamp AS reverting_user_first_rev_ts,\n", " mwh.revision_is_identity_reverted AS is_revert_reverted,\n", " mwh.revision_seconds_to_identity_revert AS 
revert_time_to_revert\n", "FROM \n", " base_criteria bc\n", "JOIN\n", " wmf.mediawiki_history mwh\n", " ON bc.wiki_db = mwh.wiki_db AND bc.reverting_edit_id = mwh.revision_id\n", "WHERE\n", " snapshot = '{mwh_snapshot}'\n", " AND NOT bc.user_name = mwh.event_user_text\n", "\"\"\"\n", "\n", "edits = wmf.spark.run(query)" ] }, { "cell_type": "code", "execution_count": 586, "id": "d1f360de-a02c-4ca0-ad9f-05c6a79b01cf", "metadata": {}, "outputs": [], "source": [ "edits = (\n", " edits\n", " .assign(\n", " rev_ts=pd.to_datetime(edits['rev_ts'], utc=True),\n", " user_reg_ts=pd.to_datetime(edits['user_reg_ts'], utc=True),\n", " user_first_rev_ts=pd.to_datetime(edits['user_first_rev_ts'], utc=True),\n", " reverting_user_first_rev_ts=pd.to_datetime(edits['reverting_user_first_rev_ts'], utc=True),\n", " is_anon=pd.Categorical(edits['is_anon']),\n", " reverting_user_is_anon=pd.Categorical(edits['reverting_user_is_anon']),\n", " is_revert_reverted=pd.Categorical(edits['is_revert_reverted'])\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 587, "id": "a71392f0-9a22-4a01-a325-b204304e10a1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 391096 entries, 0 to 391095\n", "Data columns (total 21 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 wiki_db 391096 non-null object \n", " 1 rev_id 391096 non-null int64 \n", " 2 risk 391096 non-null float32 \n", " 3 user_name 391096 non-null object \n", " 4 rev_ts 391096 non-null datetime64[ns, UTC]\n", " 5 is_anon 391096 non-null category \n", " 6 user_edit_count 92095 non-null float64 \n", " 7 user_reg_ts 92053 non-null datetime64[ns, UTC]\n", " 8 user_first_rev_ts 92095 non-null datetime64[ns, UTC]\n", " 9 time_user_prev_rev 75259 non-null float64 \n", " 10 time_page_prev_rev 391096 non-null int64 \n", " 11 rev_bytes_diff 387254 non-null float64 \n", " 12 time_to_revert 391096 non-null 
int64 \n", " 13 rev_bytes 387436 non-null float64 \n", " 14 reverting_edit 391096 non-null bool \n", " 15 reverting_edit_id 391096 non-null int64 \n", " 16 reverting_user_is_anon 391096 non-null category \n", " 17 reverting_user_edit_count 376252 non-null float64 \n", " 18 reverting_user_first_rev_ts 376252 non-null datetime64[ns, UTC]\n", " 19 is_revert_reverted 391096 non-null category \n", " 20 revert_time_to_revert 48583 non-null float64 \n", "dtypes: bool(1), category(3), datetime64[ns, UTC](4), float32(1), float64(6), int64(4), object(2)\n", "memory usage: 50.7+ MB\n" ] } ], "source": [ "edits.info()" ] }, { "cell_type": "markdown", "id": "a2b680e6-afb1-4da3-b3d4-aa79746b0a33", "metadata": {}, "source": [ "# Analysis" ] }, { "cell_type": "markdown", "id": "5691279e-683d-41a7-abb0-ad092bda57f5", "metadata": {}, "source": [ "## Functions" ] }, { "cell_type": "code", "execution_count": 409, "id": "8c0f36bf-2183-42a7-9c6c-fa7690679ce1", "metadata": {}, "outputs": [], "source": [ "# prints a string at center of the output, bold if needed\n", "def pr_centered(content, bold=False):\n", " if bold:\n", " content = f\"<b>{content}</b>\"\n", " \n", " centered_html = f\"<div style='text-align:center'>{content}</div>\"\n", " \n", " display(HTML(centered_html))\n", "\n", "\n", "# display dataframes horizontally with title for each\n", "def display_h(frames, space=100):\n", " html = \"\"\n", " \n", " for key in frames.keys():\n", " html_df =f'<div>{key} {frames[key]._repr_html_()}</div>'\n", " html += html_df\n", " \n", " html = f\"\"\"\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " {html}\n", " </div>\"\"\"\n", " \n", " display_html(html, raw=True)" ] }, { "cell_type": "code", "execution_count": 503, "id": "ec036262-2f85-4be4-88c1-a1f8933b68e5", "metadata": {}, "outputs": [], "source": [ "def calculate_grouped(df, intervals, pivot_column, columns_title=None, column_names=None, target_column='risk', group_column='wiki_db', 
grp_function='median'):\n", "\n", " final_results = []\n", "\n", " for interval in intervals:\n", " \n", " # unlike the temporal columns (kept when <= interval), the bytes difference must be at least the given value\n", " # note: this overwrites the column with its absolute value in the caller's dataframe\n", " if pivot_column == 'rev_bytes_diff':\n", " df[pivot_column] = df[pivot_column].abs()\n", " filtered_df = df[df[pivot_column] >= interval]\n", " else:\n", " filtered_df = df[df[pivot_column] <= interval]\n", " \n", " grouped = filtered_df.groupby(group_column).agg({target_column: grp_function}).reset_index()\n", "\n", " grouped['interval'] = interval\n", " final_results.append(grouped)\n", "\n", " concatenated_df = pd.concat(final_results)\n", " pivot_df = concatenated_df.pivot(index=group_column, columns='interval', values=target_column)\n", " \n", " # rename the columns first: assigning a new list of labels resets columns.name\n", " if column_names is not None:\n", " pivot_df.columns = column_names\n", " \n", " # label the columns with the aggregation actually used (e.g. 'median' or 'count')\n", " if columns_title is None:\n", " pivot_df.columns.name = f'{grp_function}: {pivot_column}'\n", " else:\n", " pivot_df.columns.name = f'{grp_function}: {columns_title}'\n", "\n", " return pivot_df\n", "\n", "# def plot_hmap(df, x_label, title, fontsize=10, y_label='Wikipedia', cbar_label='Median Risk'):\n", " \n", "# ax = sns.heatmap(df, annot=True, annot_kws={\"size\": fontsize})\n", " \n", "# # set labels\n", "# ax.set_xlabel(x_label, fontsize=fontsize)\n", "# ax.set_ylabel(y_label, fontsize=fontsize)\n", "# ax.set_title(title, fontsize=fontsize + 1)\n", " \n", "# # color bar properties\n", "# cbar = ax.collections[0].colorbar\n", "# cbar.set_label(cbar_label, fontsize=fontsize)\n", "# cbar.ax.tick_params(labelsize=fontsize)\n", "\n", "# plt.show()\n", " \n", "def time_delta(df, start_column, end_column):\n", " # seconds elapsed from start_column to end_column; NaT in either column yields NaN\n", " return (df[end_column] - df[start_column]).dt.total_seconds()" ] }, { "cell_type": "markdown", "id": "759f2ed7-d168-4ced-9f48-18b21c6f6e48", "metadata": {}, "source": [ "## Initial Criteria" ] }, { "cell_type": "code", "execution_count": null, "id": 
"c72ae52d-e1d9-48e5-8679-1e08a6de6c9d", "metadata": {}, "outputs": [], "source": [ "init_criteria = edits.query(\"\"\"(time_to_revert <= 24*60*60) & ((is_anon == True) | (user_edit_count <= 25))\"\"\")\n", "\n", "init_criteria = (\n", " init_criteria\n", " .assign(\n", " elapsed_reg=time_delta(init_criteria, 'user_reg_ts', 'rev_ts'),\n", " elapsed_first_rev=time_delta(init_criteria, 'user_first_rev_ts', 'rev_ts'),\n", " rv_user_elapsed_first_rev=time_delta(init_criteria, 'reverting_user_first_rev_ts', 'rev_ts')\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 589, "id": "1785a25a-6880-4c9c-a76d-ae9eadd5a2ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.901974</td>\n", " <td>16829</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.910679</td>\n", " <td>172584</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.922596</td>\n", " <td>55105</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.916366</td>\n", " <td>9967</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.903316</td>\n", " <td>19375</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.902464</td>\n", " <td>3554</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.919648</td>\n", " <td>23440</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " 
<td>0.875682</td>\n", " <td>10170</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.913064</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.914291</td>\n", " <td>23587</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.883454</td>\n", " <td>7568</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " wiki_db median_risk n_edits\n", "0 dewiki 0.901974 16829\n", "1 enwiki 0.910679 172584\n", "2 eswiki 0.922596 55105\n", "3 fawiki 0.916366 9967\n", "4 frwiki 0.903316 19375\n", "5 idwiki 0.902464 3554\n", "6 itwiki 0.919648 23440\n", "7 jawiki 0.875682 10170\n", "8 ptwiki 0.913064 3361\n", "9 ruwiki 0.914291 23587\n", "10 zhwiki 0.883454 7568" ] }, "execution_count": 589, "metadata": {}, "output_type": "execute_result" } ], "source": [ "init_criteria_risk = (\n", " init_criteria\n", " .groupby('wiki_db')\n", " .agg({\n", " 'risk': 'median', \n", " 'rev_id': 'count'\n", " })\n", " .reset_index()\n", " .rename({\n", " 'rev_id': 'n_edits', \n", " 'risk': 'median_risk'\n", " }, axis=1)\n", ")\n", "\n", "init_criteria_risk" ] }, { "cell_type": "markdown", "id": "88270901-6d56-4933-a087-1a0a07875c09", "metadata": {}, "source": [ "## Time to Revert" ] }, { "cell_type": "code", "execution_count": 434, "id": "a7c2f9cb-d5a9-4cde-a4fd-2428742c0557", "metadata": {}, "outputs": [], "source": [ "ttr_hour_intervals = [1, 2, 4, 8, 12, 24]\n", "ttr_time_intervals = [i*60*60 for i in ttr_hour_intervals]\n", "ttr_column_names = [f'{i} hr' for i in ttr_hour_intervals]\n", "\n", "ttr_median_risk = calculate_grouped(init_criteria, ttr_time_intervals, \n", " 'time_to_revert', column_names=ttr_column_names)\n", "ttr_interval_counts = calculate_grouped(init_criteria, ttr_time_intervals, \n", " 'time_to_revert', column_names=ttr_column_names, grp_function = 'count')" ] }, { "cell_type": "code", "execution_count": 457, "id": "0784ab64-9003-4ef1-bf41-25b447bf0940", 
"metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_0bc8a_row0_col0 {\n", " background-color: #1f958b;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row0_col1 {\n", " background-color: #20928c;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row0_col2 {\n", " background-color: #228c8d;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row0_col3 {\n", " background-color: #25848e;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row0_col4, #T_0bc8a_row0_col5, #T_0bc8a_row4_col0, #T_0bc8a_row4_col1 {\n", " background-color: #26828e;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row1_col0, #T_0bc8a_row3_col1 {\n", " background-color: #3d4e8a;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row1_col1 {\n", " background-color: #3b518b;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row1_col2 {\n", " background-color: #3b528b;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row1_col3, #T_0bc8a_row1_col5 {\n", " background-color: #3a538b;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row1_col4 {\n", " background-color: #3a548c;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row2_col0, #T_0bc8a_row2_col1, #T_0bc8a_row2_col2, #T_0bc8a_row2_col3, #T_0bc8a_row2_col4, #T_0bc8a_row2_col5 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row3_col0 {\n", " background-color: #355f8d;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row3_col2 {\n", " background-color: #433e85;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row3_col3, #T_0bc8a_row9_col4 {\n", " background-color: #453882;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row3_col4 {\n", " background-color: #453581;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row3_col5, #T_0bc8a_row9_col2 {\n", " background-color: #472e7c;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row4_col2, #T_0bc8a_row4_col3 {\n", " background-color: #29798e;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row4_col4 {\n", " background-color: #297a8e;\n", " 
color: #f1f1f1;\n", "}\n", "#T_0bc8a_row4_col5 {\n", " background-color: #297b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row5_col0 {\n", " background-color: #84d44b;\n", " color: #000000;\n", "}\n", "#T_0bc8a_row5_col1 {\n", " background-color: #50c46a;\n", " color: #000000;\n", "}\n", "#T_0bc8a_row5_col2 {\n", " background-color: #31b57b;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row5_col3 {\n", " background-color: #1f948c;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row5_col4 {\n", " background-color: #218e8d;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row5_col5 {\n", " background-color: #277f8e;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row6_col0 {\n", " background-color: #450457;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row6_col1 {\n", " background-color: #450559;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row6_col2 {\n", " background-color: #471063;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row6_col3, #T_0bc8a_row6_col4 {\n", " background-color: #471365;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row6_col5 {\n", " background-color: #48186a;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row7_col0, #T_0bc8a_row7_col1, #T_0bc8a_row7_col2, #T_0bc8a_row7_col3, #T_0bc8a_row7_col4, #T_0bc8a_row7_col5 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_0bc8a_row8_col0 {\n", " background-color: #3f4889;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row8_col1 {\n", " background-color: #3e4c8a;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row8_col2 {\n", " background-color: #3f4788;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row8_col3 {\n", " background-color: #424186;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row8_col4 {\n", " background-color: #404688;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row8_col5 {\n", " background-color: #404588;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row9_col0 {\n", " background-color: #481769;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row9_col1 {\n", " background-color: #482576;\n", " color: #f1f1f1;\n", "}\n", 
"#T_0bc8a_row9_col3 {\n", " background-color: #453781;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row9_col5 {\n", " background-color: #433d84;\n", " color: #f1f1f1;\n", "}\n", "#T_0bc8a_row10_col0 {\n", " background-color: #d0e11c;\n", " color: #000000;\n", "}\n", "#T_0bc8a_row10_col1 {\n", " background-color: #dde318;\n", " color: #000000;\n", "}\n", "#T_0bc8a_row10_col2 {\n", " background-color: #a8db34;\n", " color: #000000;\n", "}\n", "#T_0bc8a_row10_col3, #T_0bc8a_row10_col4, #T_0bc8a_row10_col5 {\n", " background-color: #90d743;\n", " color: #000000;\n", "}\n", "</style>\n", "<table id=\"T_0bc8a\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_0bc8a_level0_col0\" class=\"col_heading level0 col0\" >1 hr</th>\n", " <th id=\"T_0bc8a_level0_col1\" class=\"col_heading level0 col1\" >2 hr</th>\n", " <th id=\"T_0bc8a_level0_col2\" class=\"col_heading level0 col2\" >4 hr</th>\n", " <th id=\"T_0bc8a_level0_col3\" class=\"col_heading level0 col3\" >8 hr</th>\n", " <th id=\"T_0bc8a_level0_col4\" class=\"col_heading level0 col4\" >12 hr</th>\n", " <th id=\"T_0bc8a_level0_col5\" class=\"col_heading level0 col5\" >24 hr</th>\n", " </tr>\n", " <tr>\n", " <th class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " <th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " <th class=\"blank col5\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_0bc8a_row0_col0\" class=\"data row0 col0\" >0.910</td>\n", " <td id=\"T_0bc8a_row0_col1\" class=\"data row0 col1\" >0.908</td>\n", " <td id=\"T_0bc8a_row0_col2\" class=\"data row0 col2\" >0.906</td>\n", " <td id=\"T_0bc8a_row0_col3\" class=\"data row0 col3\" >0.905</td>\n", " <td id=\"T_0bc8a_row0_col4\" class=\"data row0 col4\" >0.904</td>\n", " <td 
id=\"T_0bc8a_row0_col5\" class=\"data row0 col5\" >0.902</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td id=\"T_0bc8a_row1_col0\" class=\"data row1 col0\" >0.920</td>\n", " <td id=\"T_0bc8a_row1_col1\" class=\"data row1 col1\" >0.918</td>\n", " <td id=\"T_0bc8a_row1_col2\" class=\"data row1 col2\" >0.915</td>\n", " <td id=\"T_0bc8a_row1_col3\" class=\"data row1 col3\" >0.913</td>\n", " <td id=\"T_0bc8a_row1_col4\" class=\"data row1 col4\" >0.912</td>\n", " <td id=\"T_0bc8a_row1_col5\" class=\"data row1 col5\" >0.911</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_0bc8a_row2_col0\" class=\"data row2 col0\" >0.928</td>\n", " <td id=\"T_0bc8a_row2_col1\" class=\"data row2 col1\" >0.926</td>\n", " <td id=\"T_0bc8a_row2_col2\" class=\"data row2 col2\" >0.925</td>\n", " <td id=\"T_0bc8a_row2_col3\" class=\"data row2 col3\" >0.924</td>\n", " <td id=\"T_0bc8a_row2_col4\" class=\"data row2 col4\" >0.923</td>\n", " <td id=\"T_0bc8a_row2_col5\" class=\"data row2 col5\" >0.923</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_0bc8a_row3_col0\" class=\"data row3 col0\" >0.918</td>\n", " <td id=\"T_0bc8a_row3_col1\" class=\"data row3 col1\" >0.918</td>\n", " <td id=\"T_0bc8a_row3_col2\" class=\"data row3 col2\" >0.918</td>\n", " <td id=\"T_0bc8a_row3_col3\" class=\"data row3 col3\" >0.917</td>\n", " <td id=\"T_0bc8a_row3_col4\" class=\"data row3 col4\" >0.917</td>\n", " <td id=\"T_0bc8a_row3_col5\" class=\"data row3 col5\" >0.916</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_0bc8a_row4_col0\" class=\"data row4 col0\" >0.913</td>\n", " <td id=\"T_0bc8a_row4_col1\" class=\"data row4 col1\" >0.911</td>\n", " <td id=\"T_0bc8a_row4_col2\" class=\"data row4 col2\" 
>0.909</td>\n", " <td id=\"T_0bc8a_row4_col3\" class=\"data row4 col3\" >0.907</td>\n", " <td id=\"T_0bc8a_row4_col4\" class=\"data row4 col4\" >0.906</td>\n", " <td id=\"T_0bc8a_row4_col5\" class=\"data row4 col5\" >0.903</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", " <td id=\"T_0bc8a_row5_col0\" class=\"data row5 col0\" >0.900</td>\n", " <td id=\"T_0bc8a_row5_col1\" class=\"data row5 col1\" >0.901</td>\n", " <td id=\"T_0bc8a_row5_col2\" class=\"data row5 col2\" >0.899</td>\n", " <td id=\"T_0bc8a_row5_col3\" class=\"data row5 col3\" >0.902</td>\n", " <td id=\"T_0bc8a_row5_col4\" class=\"data row5 col4\" >0.902</td>\n", " <td id=\"T_0bc8a_row5_col5\" class=\"data row5 col5\" >0.902</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_0bc8a_row6_col0\" class=\"data row6 col0\" >0.927</td>\n", " <td id=\"T_0bc8a_row6_col1\" class=\"data row6 col1\" >0.926</td>\n", " <td id=\"T_0bc8a_row6_col2\" class=\"data row6 col2\" >0.924</td>\n", " <td id=\"T_0bc8a_row6_col3\" class=\"data row6 col3\" >0.922</td>\n", " <td id=\"T_0bc8a_row6_col4\" class=\"data row6 col4\" >0.921</td>\n", " <td id=\"T_0bc8a_row6_col5\" class=\"data row6 col5\" >0.920</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_0bc8a_row7_col0\" class=\"data row7 col0\" >0.894</td>\n", " <td id=\"T_0bc8a_row7_col1\" class=\"data row7 col1\" >0.891</td>\n", " <td id=\"T_0bc8a_row7_col2\" class=\"data row7 col2\" >0.886</td>\n", " <td id=\"T_0bc8a_row7_col3\" class=\"data row7 col3\" >0.882</td>\n", " <td id=\"T_0bc8a_row7_col4\" class=\"data row7 col4\" >0.880</td>\n", " <td id=\"T_0bc8a_row7_col5\" class=\"data row7 col5\" >0.876</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_0bc8a_row8_col0\" class=\"data 
row8 col0\" >0.920</td>\n", " <td id=\"T_0bc8a_row8_col1\" class=\"data row8 col1\" >0.918</td>\n", " <td id=\"T_0bc8a_row8_col2\" class=\"data row8 col2\" >0.917</td>\n", " <td id=\"T_0bc8a_row8_col3\" class=\"data row8 col3\" >0.916</td>\n", " <td id=\"T_0bc8a_row8_col4\" class=\"data row8 col4\" >0.914</td>\n", " <td id=\"T_0bc8a_row8_col5\" class=\"data row8 col5\" >0.913</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_0bc8a_row9_col0\" class=\"data row9 col0\" >0.926</td>\n", " <td id=\"T_0bc8a_row9_col1\" class=\"data row9 col1\" >0.923</td>\n", " <td id=\"T_0bc8a_row9_col2\" class=\"data row9 col2\" >0.920</td>\n", " <td id=\"T_0bc8a_row9_col3\" class=\"data row9 col3\" >0.918</td>\n", " <td id=\"T_0bc8a_row9_col4\" class=\"data row9 col4\" >0.916</td>\n", " <td id=\"T_0bc8a_row9_col5\" class=\"data row9 col5\" >0.914</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_0bc8a_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_0bc8a_row10_col0\" class=\"data row10 col0\" >0.896</td>\n", " <td id=\"T_0bc8a_row10_col1\" class=\"data row10 col1\" >0.893</td>\n", " <td id=\"T_0bc8a_row10_col2\" class=\"data row10 col2\" >0.891</td>\n", " <td id=\"T_0bc8a_row10_col3\" class=\"data row10 col3\" >0.889</td>\n", " <td id=\"T_0bc8a_row10_col4\" class=\"data row10 col4\" >0.887</td>\n", " <td id=\"T_0bc8a_row10_col5\" class=\"data row10 col5\" >0.883</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>1 hr</th>\n", " <th>2 hr</th>\n", " <th>4 hr</th>\n", " <th>8 
hr</th>\n", " <th>12 hr</th>\n", " <th>24 hr</th>\n", " </tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>13349</td>\n", " <td>14151</td>\n", " <td>14974</td>\n", " <td>15661</td>\n", " <td>16077</td>\n", " <td>16829</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>114940</td>\n", " <td>128218</td>\n", " <td>141591</td>\n", " <td>155008</td>\n", " <td>162439</td>\n", " <td>172584</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>42487</td>\n", " <td>45577</td>\n", " <td>48656</td>\n", " <td>51468</td>\n", " <td>52922</td>\n", " <td>55105</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>6798</td>\n", " <td>7417</td>\n", " <td>8123</td>\n", " <td>8816</td>\n", " <td>9228</td>\n", " <td>9967</td>\n", " </tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>14078</td>\n", " <td>15335</td>\n", " <td>16506</td>\n", " <td>17687</td>\n", " <td>18401</td>\n", " <td>19375</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>1662</td>\n", " <td>2070</td>\n", " <td>2550</td>\n", " <td>3006</td>\n", " <td>3231</td>\n", " <td>3554</td>\n", " </tr>\n", " <tr>\n", " <th>itwiki</th>\n", " <td>16739</td>\n", " <td>18198</td>\n", " <td>19752</td>\n", " <td>21189</td>\n", " <td>22077</td>\n", " <td>23440</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>6351</td>\n", " <td>7245</td>\n", " <td>8150</td>\n", " <td>8943</td>\n", " <td>9401</td>\n", " <td>10170</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>2081</td>\n", " <td>2347</td>\n", " <td>2686</td>\n", " <td>2985</td>\n", " <td>3147</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>15851</td>\n", " <td>17794</td>\n", " <td>19570</td>\n", " <td>21248</td>\n", " <td>22250</td>\n", " <td>23587</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>4071</td>\n", " 
<td>4823</td>\n", " <td>5637</td>\n", " <td>6446</td>\n", " <td>6880</td>\n", " <td>7568</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': ttr_median_risk.style.background_gradient(cmap ='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': ttr_interval_counts\n", "})" ] }, { "cell_type": "markdown", "id": "20d197eb-630e-4b46-8498-888644874a32", "metadata": {}, "source": [ "Limiting the time to revert to an 8 hr window slightly improves the median risk score without eliminating many edits." ] }, { "cell_type": "markdown", "id": "fcd1397d-ce09-4d02-aa01-b5fe2af0ef20", "metadata": {}, "source": [ "## User Edit Count" ] }, { "cell_type": "code", "execution_count": 439, "id": "fdd8b535-c4ff-4aef-b575-c40e0762b8b9", "metadata": {}, "outputs": [], "source": [ "edit_count_intervals = [5, 10, 15, 20, 25]\n", "edit_count_column_names = [f'{i} edits' for i in edit_count_intervals]\n", "\n", "edit_count_median_risk = calculate_grouped(init_criteria, edit_count_intervals, \n", " 'user_edit_count', column_names=edit_count_column_names)\n", "edit_count_interval_counts = calculate_grouped(init_criteria, edit_count_intervals, \n", " 'user_edit_count', column_names=edit_count_column_names, grp_function='count')" ] }, { "cell_type": "code", "execution_count": 456, "id": "b3df9e0b-10b3-4696-b80e-f8ee0c2a494c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_06d0c_row0_col0, #T_06d0c_row0_col1, #T_06d0c_row0_col2, #T_06d0c_row0_col3, #T_06d0c_row0_col4 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_06d0c_row1_col0 {\n", " background-color: #32658e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row1_col1 {\n", " background-color: #2f6c8e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row1_col2, #T_06d0c_row1_col3 {\n", " 
background-color: #2f6b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row1_col4 {\n", " background-color: #30698e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row2_col0, #T_06d0c_row2_col1, #T_06d0c_row2_col2, #T_06d0c_row2_col3, #T_06d0c_row2_col4 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row3_col0 {\n", " background-color: #404688;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row3_col1 {\n", " background-color: #414487;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row3_col2 {\n", " background-color: #414287;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row3_col3 {\n", " background-color: #3c508b;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row3_col4 {\n", " background-color: #38588c;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row4_col0 {\n", " background-color: #228c8d;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row4_col1 {\n", " background-color: #1e9b8a;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row4_col2 {\n", " background-color: #21a585;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row4_col3 {\n", " background-color: #29af7f;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row4_col4 {\n", " background-color: #2cb17e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row5_col0 {\n", " background-color: #89d548;\n", " color: #000000;\n", "}\n", "#T_06d0c_row5_col1 {\n", " background-color: #eae51a;\n", " color: #000000;\n", "}\n", "#T_06d0c_row5_col2 {\n", " background-color: #98d83e;\n", " color: #000000;\n", "}\n", "#T_06d0c_row5_col3 {\n", " background-color: #77d153;\n", " color: #000000;\n", "}\n", "#T_06d0c_row5_col4 {\n", " background-color: #6ece58;\n", " color: #000000;\n", "}\n", "#T_06d0c_row6_col0, #T_06d0c_row9_col1 {\n", " background-color: #277e8e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row6_col1, #T_06d0c_row6_col3, #T_06d0c_row9_col3 {\n", " background-color: #25838e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row6_col2 {\n", " background-color: #24868e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row6_col4, #T_06d0c_row9_col0 
{\n", " background-color: #297b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row7_col0, #T_06d0c_row7_col2, #T_06d0c_row9_col4 {\n", " background-color: #26828e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row7_col1 {\n", " background-color: #2b758e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row7_col3 {\n", " background-color: #228b8d;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row7_col4 {\n", " background-color: #1f958b;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row8_col0 {\n", " background-color: #2e6d8e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row8_col1 {\n", " background-color: #355f8d;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row8_col2 {\n", " background-color: #39558c;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row8_col3 {\n", " background-color: #3a538b;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row8_col4 {\n", " background-color: #3e4c8a;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row9_col2 {\n", " background-color: #287d8e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row10_col0 {\n", " background-color: #5ec962;\n", " color: #000000;\n", "}\n", "#T_06d0c_row10_col1 {\n", " background-color: #48c16e;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row10_col2 {\n", " background-color: #31b57b;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row10_col3 {\n", " background-color: #28ae80;\n", " color: #f1f1f1;\n", "}\n", "#T_06d0c_row10_col4 {\n", " background-color: #34b679;\n", " color: #f1f1f1;\n", "}\n", "</style>\n", "<table id=\"T_06d0c\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_06d0c_level0_col0\" class=\"col_heading level0 col0\" >5 edits</th>\n", " <th id=\"T_06d0c_level0_col1\" class=\"col_heading level0 col1\" >10 edits</th>\n", " <th id=\"T_06d0c_level0_col2\" class=\"col_heading level0 col2\" >15 edits</th>\n", " <th id=\"T_06d0c_level0_col3\" class=\"col_heading level0 col3\" >20 edits</th>\n", " <th id=\"T_06d0c_level0_col4\" class=\"col_heading level0 col4\" >25 edits</th>\n", " </tr>\n", " <tr>\n", " <th 
class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " <th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_06d0c_row0_col0\" class=\"data row0 col0\" >0.900</td>\n", " <td id=\"T_06d0c_row0_col1\" class=\"data row0 col1\" >0.892</td>\n", " <td id=\"T_06d0c_row0_col2\" class=\"data row0 col2\" >0.885</td>\n", " <td id=\"T_06d0c_row0_col3\" class=\"data row0 col3\" >0.880</td>\n", " <td id=\"T_06d0c_row0_col4\" class=\"data row0 col4\" >0.876</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td id=\"T_06d0c_row1_col0\" class=\"data row1 col0\" >0.924</td>\n", " <td id=\"T_06d0c_row1_col1\" class=\"data row1 col1\" >0.918</td>\n", " <td id=\"T_06d0c_row1_col2\" class=\"data row1 col2\" >0.913</td>\n", " <td id=\"T_06d0c_row1_col3\" class=\"data row1 col3\" >0.910</td>\n", " <td id=\"T_06d0c_row1_col4\" class=\"data row1 col4\" >0.908</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_06d0c_row2_col0\" class=\"data row2 col0\" >0.936</td>\n", " <td id=\"T_06d0c_row2_col1\" class=\"data row2 col1\" >0.931</td>\n", " <td id=\"T_06d0c_row2_col2\" class=\"data row2 col2\" >0.929</td>\n", " <td id=\"T_06d0c_row2_col3\" class=\"data row2 col3\" >0.926</td>\n", " <td id=\"T_06d0c_row2_col4\" class=\"data row2 col4\" >0.924</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_06d0c_row3_col0\" class=\"data row3 col0\" >0.929</td>\n", " <td id=\"T_06d0c_row3_col1\" class=\"data row3 col1\" >0.923</td>\n", " <td id=\"T_06d0c_row3_col2\" class=\"data row3 col2\" >0.920</td>\n", " <td 
id=\"T_06d0c_row3_col3\" class=\"data row3 col3\" >0.915</td>\n", " <td id=\"T_06d0c_row3_col4\" class=\"data row3 col4\" >0.911</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_06d0c_row4_col0\" class=\"data row4 col0\" >0.919</td>\n", " <td id=\"T_06d0c_row4_col1\" class=\"data row4 col1\" >0.910</td>\n", " <td id=\"T_06d0c_row4_col2\" class=\"data row4 col2\" >0.903</td>\n", " <td id=\"T_06d0c_row4_col3\" class=\"data row4 col3\" >0.897</td>\n", " <td id=\"T_06d0c_row4_col4\" class=\"data row4 col4\" >0.893</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", " <td id=\"T_06d0c_row5_col0\" class=\"data row5 col0\" >0.906</td>\n", " <td id=\"T_06d0c_row5_col1\" class=\"data row5 col1\" >0.893</td>\n", " <td id=\"T_06d0c_row5_col2\" class=\"data row5 col2\" >0.891</td>\n", " <td id=\"T_06d0c_row5_col3\" class=\"data row5 col3\" >0.889</td>\n", " <td id=\"T_06d0c_row5_col4\" class=\"data row5 col4\" >0.886</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_06d0c_row6_col0\" class=\"data row6 col0\" >0.921</td>\n", " <td id=\"T_06d0c_row6_col1\" class=\"data row6 col1\" >0.914</td>\n", " <td id=\"T_06d0c_row6_col2\" class=\"data row6 col2\" >0.908</td>\n", " <td id=\"T_06d0c_row6_col3\" class=\"data row6 col3\" >0.905</td>\n", " <td id=\"T_06d0c_row6_col4\" class=\"data row6 col4\" >0.904</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_06d0c_row7_col0\" class=\"data row7 col0\" >0.920</td>\n", " <td id=\"T_06d0c_row7_col1\" class=\"data row7 col1\" >0.916</td>\n", " <td id=\"T_06d0c_row7_col2\" class=\"data row7 col2\" >0.909</td>\n", " <td id=\"T_06d0c_row7_col3\" class=\"data row7 col3\" >0.904</td>\n", " <td id=\"T_06d0c_row7_col4\" class=\"data row7 col4\" 
>0.899</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_06d0c_row8_col0\" class=\"data row8 col0\" >0.923</td>\n", " <td id=\"T_06d0c_row8_col1\" class=\"data row8 col1\" >0.919</td>\n", " <td id=\"T_06d0c_row8_col2\" class=\"data row8 col2\" >0.917</td>\n", " <td id=\"T_06d0c_row8_col3\" class=\"data row8 col3\" >0.914</td>\n", " <td id=\"T_06d0c_row8_col4\" class=\"data row8 col4\" >0.913</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_06d0c_row9_col0\" class=\"data row9 col0\" >0.921</td>\n", " <td id=\"T_06d0c_row9_col1\" class=\"data row9 col1\" >0.914</td>\n", " <td id=\"T_06d0c_row9_col2\" class=\"data row9 col2\" >0.910</td>\n", " <td id=\"T_06d0c_row9_col3\" class=\"data row9 col3\" >0.906</td>\n", " <td id=\"T_06d0c_row9_col4\" class=\"data row9 col4\" >0.903</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_06d0c_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_06d0c_row10_col0\" class=\"data row10 col0\" >0.909</td>\n", " <td id=\"T_06d0c_row10_col1\" class=\"data row10 col1\" >0.903</td>\n", " <td id=\"T_06d0c_row10_col2\" class=\"data row10 col2\" >0.900</td>\n", " <td id=\"T_06d0c_row10_col3\" class=\"data row10 col3\" >0.897</td>\n", " <td id=\"T_06d0c_row10_col4\" class=\"data row10 col4\" >0.892</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>5 edits</th>\n", " <th>10 edits</th>\n", " <th>15 edits</th>\n", " <th>20 edits</th>\n", " <th>25 edits</th>\n", " 
</tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>1560</td>\n", " <td>1893</td>\n", " <td>2086</td>\n", " <td>2200</td>\n", " <td>2301</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>22503</td>\n", " <td>28206</td>\n", " <td>31350</td>\n", " <td>33433</td>\n", " <td>34986</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>3898</td>\n", " <td>4866</td>\n", " <td>5345</td>\n", " <td>5655</td>\n", " <td>5851</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>1227</td>\n", " <td>1702</td>\n", " <td>2018</td>\n", " <td>2259</td>\n", " <td>2442</td>\n", " </tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>2398</td>\n", " <td>2944</td>\n", " <td>3252</td>\n", " <td>3463</td>\n", " <td>3611</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>268</td>\n", " <td>383</td>\n", " <td>443</td>\n", " <td>495</td>\n", " <td>527</td>\n", " </tr>\n", " <tr>\n", " <th>itwiki</th>\n", " <td>1230</td>\n", " <td>1514</td>\n", " <td>1647</td>\n", " <td>1745</td>\n", " <td>1820</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>1342</td>\n", " <td>1889</td>\n", " <td>2239</td>\n", " <td>2484</td>\n", " <td>2691</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>2345</td>\n", " <td>2848</td>\n", " <td>3079</td>\n", " <td>3236</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>1971</td>\n", " <td>2373</td>\n", " <td>2594</td>\n", " <td>2737</td>\n", " <td>2833</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>863</td>\n", " <td>1201</td>\n", " <td>1393</td>\n", " <td>1510</td>\n", " <td>1587</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': edit_count_median_risk.style.background_gradient(cmap 
='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': edit_count_interval_counts\n", "})" ] }, { "cell_type": "markdown", "id": "678c9879-762e-4e57-b288-c77267901ade", "metadata": {}, "source": [ "Limiting to accounts with fewer than 15 edits slightly improves the median risk scores without eliminating many edits." ] }, { "cell_type": "markdown", "id": "0953fa80-d3dd-4461-88e8-74a90b9b2491", "metadata": {}, "source": [ "## Time Since User Registration" ] }, { "cell_type": "code", "execution_count": 591, "id": "c13b8c20-39cd-48ad-80e5-e2ccec28be2f", "metadata": {}, "outputs": [], "source": [ "non_anon = init_criteria.query(\"\"\"is_anon == False\"\"\").reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 443, "id": "811998da-7e61-471e-b0d3-d188f283a308", "metadata": {}, "outputs": [], "source": [ "elapsed_reg_minutes = [1, 5, 30]\n", "elapsed_reg_hours = [1, 2, 4, 12, 24, 48, 72, non_anon.elapsed_reg.max()/60*60]\n", "elapsed_reg_time_intervals = [i*60 for i in elapsed_reg_minutes] + [i*60*60 for i in elapsed_reg_hours]\n", "\n", "elapsed_reg_column_names = [f'{i} min' for i in elapsed_reg_minutes] + [f'{i} hr' if i<=72 else 'max' for i in elapsed_reg_hours]\n", "\n", "elapsed_reg_median_risk = calculate_grouped(non_anon, elapsed_reg_time_intervals, \n", " 'elapsed_reg', column_names=elapsed_reg_column_names)\n", "elapsed_reg_interval_counts = calculate_grouped(non_anon, elapsed_reg_time_intervals, \n", " 'elapsed_reg', column_names=elapsed_reg_column_names, grp_function='count')" ] }, { "cell_type": "code", "execution_count": 484, "id": "94f582b0-384d-490c-90a9-9d87b7b057f4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_38095_row0_col0, #T_38095_row1_col3, #T_38095_row8_col3 {\n", " background-color: #355f8d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col1 {\n", " background-color: #a5db36;\n", " color: #000000;\n", "}\n",
"#T_38095_row0_col2, #T_38095_row7_col4 {\n", " background-color: #1fa187;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col3, #T_38095_row10_col2 {\n", " background-color: #21908d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col4, #T_38095_row0_col5, #T_38095_row9_col2 {\n", " background-color: #277f8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col6, #T_38095_row0_col9, #T_38095_row9_col10, #T_38095_row10_col6 {\n", " background-color: #26828e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col7 {\n", " background-color: #21918c;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col8 {\n", " background-color: #297b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row0_col10, #T_38095_row5_col1, #T_38095_row5_col2, #T_38095_row5_col3, #T_38095_row5_col4, #T_38095_row5_col5, #T_38095_row5_col6, #T_38095_row5_col7, #T_38095_row5_col8, #T_38095_row5_col9, #T_38095_row8_col0 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_38095_row1_col0 {\n", " background-color: #26ad81;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col1 {\n", " background-color: #1f988b;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col2, #T_38095_row1_col5, #T_38095_row9_col4, #T_38095_row9_col5 {\n", " background-color: #365c8d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col4, #T_38095_row8_col4, #T_38095_row8_col5 {\n", " background-color: #3a538b;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col6 {\n", " background-color: #39558c;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col7 {\n", " background-color: #33628d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col8, #T_38095_row1_col9, #T_38095_row8_col6 {\n", " background-color: #3d4e8a;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row1_col10 {\n", " background-color: #31688e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row2_col0, #T_38095_row6_col9 {\n", " background-color: #472a7a;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row2_col1, #T_38095_row2_col2, #T_38095_row2_col3, #T_38095_row2_col4, 
#T_38095_row2_col5, #T_38095_row2_col6, #T_38095_row2_col7, #T_38095_row2_col8, #T_38095_row2_col9, #T_38095_row2_col10, #T_38095_row6_col0 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col0 {\n", " background-color: #6ece58;\n", " color: #000000;\n", "}\n", "#T_38095_row3_col1, #T_38095_row4_col4, #T_38095_row4_col5, #T_38095_row10_col3 {\n", " background-color: #2a788e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col2 {\n", " background-color: #481b6d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col3 {\n", " background-color: #470d60;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col4, #T_38095_row3_col5, #T_38095_row3_col6 {\n", " background-color: #460b5e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col7 {\n", " background-color: #482374;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col8 {\n", " background-color: #481769;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col9 {\n", " background-color: #482071;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row3_col10 {\n", " background-color: #38588c;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col0 {\n", " background-color: #24868e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col1, #T_38095_row7_col5 {\n", " background-color: #22a884;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col2 {\n", " background-color: #24878e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col3 {\n", " background-color: #23888e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col6 {\n", " background-color: #2c738e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col7 {\n", " background-color: #297a8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col8, #T_38095_row4_col9 {\n", " background-color: #2e6d8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row4_col10, #T_38095_row7_col6 {\n", " background-color: #2eb37c;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row5_col0 {\n", " background-color: #4ec36b;\n", " color: #000000;\n", "}\n", "#T_38095_row5_col10 {\n", " 
background-color: #75d054;\n", " color: #000000;\n", "}\n", "#T_38095_row6_col1 {\n", " background-color: #472f7d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row6_col2 {\n", " background-color: #3b518b;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row6_col3 {\n", " background-color: #443983;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row6_col4, #T_38095_row6_col5 {\n", " background-color: #482979;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row6_col6, #T_38095_row6_col8 {\n", " background-color: #46337f;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row6_col7 {\n", " background-color: #453581;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row6_col10 {\n", " background-color: #287c8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col0 {\n", " background-color: #2e6f8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col1, #T_38095_row8_col1 {\n", " background-color: #d2e21b;\n", " color: #000000;\n", "}\n", "#T_38095_row7_col2 {\n", " background-color: #2db27d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col3 {\n", " background-color: #2ab07f;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col7 {\n", " background-color: #32b67a;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col8, #T_38095_row10_col9 {\n", " background-color: #1f9f88;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col9 {\n", " background-color: #21a585;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row7_col10, #T_38095_row9_col0 {\n", " background-color: #1f958b;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row8_col2, #T_38095_row9_col6, #T_38095_row9_col8, #T_38095_row9_col9 {\n", " background-color: #32658e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row8_col7 {\n", " background-color: #375b8d;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row8_col8, #T_38095_row8_col9 {\n", " background-color: #433d84;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row8_col10 {\n", " background-color: #3e4c8a;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row9_col1, #T_38095_row10_col10 {\n", " background-color: #35b779;\n", " 
color: #f1f1f1;\n", "}\n", "#T_38095_row9_col3 {\n", " background-color: #2d708e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row9_col7 {\n", " background-color: #2c728e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row10_col0 {\n", " background-color: #471063;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row10_col1 {\n", " background-color: #54c568;\n", " color: #000000;\n", "}\n", "#T_38095_row10_col4, #T_38095_row10_col5 {\n", " background-color: #306a8e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row10_col7 {\n", " background-color: #23898e;\n", " color: #f1f1f1;\n", "}\n", "#T_38095_row10_col8 {\n", " background-color: #1f978b;\n", " color: #f1f1f1;\n", "}\n", "</style>\n", "<table id=\"T_38095\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_38095_level0_col0\" class=\"col_heading level0 col0\" >1 min</th>\n", " <th id=\"T_38095_level0_col1\" class=\"col_heading level0 col1\" >5 min</th>\n", " <th id=\"T_38095_level0_col2\" class=\"col_heading level0 col2\" >30 min</th>\n", " <th id=\"T_38095_level0_col3\" class=\"col_heading level0 col3\" >1 hr</th>\n", " <th id=\"T_38095_level0_col4\" class=\"col_heading level0 col4\" >2 hr</th>\n", " <th id=\"T_38095_level0_col5\" class=\"col_heading level0 col5\" >4 hr</th>\n", " <th id=\"T_38095_level0_col6\" class=\"col_heading level0 col6\" >12 hr</th>\n", " <th id=\"T_38095_level0_col7\" class=\"col_heading level0 col7\" >24 hr</th>\n", " <th id=\"T_38095_level0_col8\" class=\"col_heading level0 col8\" >48 hr</th>\n", " <th id=\"T_38095_level0_col9\" class=\"col_heading level0 col9\" >72 hr</th>\n", " <th id=\"T_38095_level0_col10\" class=\"col_heading level0 col10\" >max</th>\n", " </tr>\n", " <tr>\n", " <th class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " <th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " <th class=\"blank col5\" > </th>\n", " 
<th class=\"blank col6\" > </th>\n", " <th class=\"blank col7\" > </th>\n", " <th class=\"blank col8\" > </th>\n", " <th class=\"blank col9\" > </th>\n", " <th class=\"blank col10\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_38095_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_38095_row0_col0\" class=\"data row0 col0\" >0.953</td>\n", " <td id=\"T_38095_row0_col1\" class=\"data row0 col1\" >0.936</td>\n", " <td id=\"T_38095_row0_col2\" class=\"data row0 col2\" >0.930</td>\n", " <td id=\"T_38095_row0_col3\" class=\"data row0 col3\" >0.930</td>\n", " <td id=\"T_38095_row0_col4\" class=\"data row0 col4\" >0.929</td>\n", " <td id=\"T_38095_row0_col5\" class=\"data row0 col5\" >0.929</td>\n", " <td id=\"T_38095_row0_col6\" class=\"data row0 col6\" >0.928</td>\n", " <td id=\"T_38095_row0_col7\" class=\"data row0 col7\" >0.927</td>\n", " <td id=\"T_38095_row0_col8\" class=\"data row0 col8\" >0.926</td>\n", " <td id=\"T_38095_row0_col9\" class=\"data row0 col9\" >0.924</td>\n", " <td id=\"T_38095_row0_col10\" class=\"data row0 col10\" >0.876</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td id=\"T_38095_row1_col0\" class=\"data row1 col0\" >0.937</td>\n", " <td id=\"T_38095_row1_col1\" class=\"data row1 col1\" >0.941</td>\n", " <td id=\"T_38095_row1_col2\" class=\"data row1 col2\" >0.938</td>\n", " <td id=\"T_38095_row1_col3\" class=\"data row1 col3\" >0.936</td>\n", " <td id=\"T_38095_row1_col4\" class=\"data row1 col4\" >0.935</td>\n", " <td id=\"T_38095_row1_col5\" class=\"data row1 col5\" >0.934</td>\n", " <td id=\"T_38095_row1_col6\" class=\"data row1 col6\" >0.934</td>\n", " <td id=\"T_38095_row1_col7\" class=\"data row1 col7\" >0.933</td>\n", " <td id=\"T_38095_row1_col8\" class=\"data row1 col8\" >0.932</td>\n", " <td id=\"T_38095_row1_col9\" class=\"data row1 col9\" >0.931</td>\n", " <td id=\"T_38095_row1_col10\" class=\"data 
row1 col10\" >0.908</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_38095_row2_col0\" class=\"data row2 col0\" >0.962</td>\n", " <td id=\"T_38095_row2_col1\" class=\"data row2 col1\" >0.949</td>\n", " <td id=\"T_38095_row2_col2\" class=\"data row2 col2\" >0.946</td>\n", " <td id=\"T_38095_row2_col3\" class=\"data row2 col3\" >0.945</td>\n", " <td id=\"T_38095_row2_col4\" class=\"data row2 col4\" >0.944</td>\n", " <td id=\"T_38095_row2_col5\" class=\"data row2 col5\" >0.944</td>\n", " <td id=\"T_38095_row2_col6\" class=\"data row2 col6\" >0.943</td>\n", " <td id=\"T_38095_row2_col7\" class=\"data row2 col7\" >0.943</td>\n", " <td id=\"T_38095_row2_col8\" class=\"data row2 col8\" >0.940</td>\n", " <td id=\"T_38095_row2_col9\" class=\"data row2 col9\" >0.939</td>\n", " <td id=\"T_38095_row2_col10\" class=\"data row2 col10\" >0.924</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_38095_row3_col0\" class=\"data row3 col0\" >0.929</td>\n", " <td id=\"T_38095_row3_col1\" class=\"data row3 col1\" >0.943</td>\n", " <td id=\"T_38095_row3_col2\" class=\"data row3 col2\" >0.944</td>\n", " <td id=\"T_38095_row3_col3\" class=\"data row3 col3\" >0.944</td>\n", " <td id=\"T_38095_row3_col4\" class=\"data row3 col4\" >0.943</td>\n", " <td id=\"T_38095_row3_col5\" class=\"data row3 col5\" >0.943</td>\n", " <td id=\"T_38095_row3_col6\" class=\"data row3 col6\" >0.942</td>\n", " <td id=\"T_38095_row3_col7\" class=\"data row3 col7\" >0.940</td>\n", " <td id=\"T_38095_row3_col8\" class=\"data row3 col8\" >0.938</td>\n", " <td id=\"T_38095_row3_col9\" class=\"data row3 col9\" >0.936</td>\n", " <td id=\"T_38095_row3_col10\" class=\"data row3 col10\" >0.911</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_38095_row4_col0\" class=\"data row4 col0\" 
>0.945</td>\n", " <td id=\"T_38095_row4_col1\" class=\"data row4 col1\" >0.940</td>\n", " <td id=\"T_38095_row4_col2\" class=\"data row4 col2\" >0.933</td>\n", " <td id=\"T_38095_row4_col3\" class=\"data row4 col3\" >0.931</td>\n", " <td id=\"T_38095_row4_col4\" class=\"data row4 col4\" >0.930</td>\n", " <td id=\"T_38095_row4_col5\" class=\"data row4 col5\" >0.930</td>\n", " <td id=\"T_38095_row4_col6\" class=\"data row4 col6\" >0.930</td>\n", " <td id=\"T_38095_row4_col7\" class=\"data row4 col7\" >0.930</td>\n", " <td id=\"T_38095_row4_col8\" class=\"data row4 col8\" >0.928</td>\n", " <td id=\"T_38095_row4_col9\" class=\"data row4 col9\" >0.927</td>\n", " <td id=\"T_38095_row4_col10\" class=\"data row4 col10\" >0.893</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", " <td id=\"T_38095_row5_col0\" class=\"data row5 col0\" >0.932</td>\n", " <td id=\"T_38095_row5_col1\" class=\"data row5 col1\" >0.934</td>\n", " <td id=\"T_38095_row5_col2\" class=\"data row5 col2\" >0.918</td>\n", " <td id=\"T_38095_row5_col3\" class=\"data row5 col3\" >0.915</td>\n", " <td id=\"T_38095_row5_col4\" class=\"data row5 col4\" >0.909</td>\n", " <td id=\"T_38095_row5_col5\" class=\"data row5 col5\" >0.909</td>\n", " <td id=\"T_38095_row5_col6\" class=\"data row5 col6\" >0.909</td>\n", " <td id=\"T_38095_row5_col7\" class=\"data row5 col7\" >0.911</td>\n", " <td id=\"T_38095_row5_col8\" class=\"data row5 col8\" >0.906</td>\n", " <td id=\"T_38095_row5_col9\" class=\"data row5 col9\" >0.905</td>\n", " <td id=\"T_38095_row5_col10\" class=\"data row5 col10\" >0.886</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_38095_row6_col0\" class=\"data row6 col0\" >0.968</td>\n", " <td id=\"T_38095_row6_col1\" class=\"data row6 col1\" >0.947</td>\n", " <td id=\"T_38095_row6_col2\" class=\"data row6 col2\" >0.939</td>\n", " <td id=\"T_38095_row6_col3\" 
class=\"data row6 col3\" >0.940</td>\n", " <td id=\"T_38095_row6_col4\" class=\"data row6 col4\" >0.940</td>\n", " <td id=\"T_38095_row6_col5\" class=\"data row6 col5\" >0.940</td>\n", " <td id=\"T_38095_row6_col6\" class=\"data row6 col6\" >0.938</td>\n", " <td id=\"T_38095_row6_col7\" class=\"data row6 col7\" >0.938</td>\n", " <td id=\"T_38095_row6_col8\" class=\"data row6 col8\" >0.935</td>\n", " <td id=\"T_38095_row6_col9\" class=\"data row6 col9\" >0.935</td>\n", " <td id=\"T_38095_row6_col10\" class=\"data row6 col10\" >0.904</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_38095_row7_col0\" class=\"data row7 col0\" >0.950</td>\n", " <td id=\"T_38095_row7_col1\" class=\"data row7 col1\" >0.935</td>\n", " <td id=\"T_38095_row7_col2\" class=\"data row7 col2\" >0.928</td>\n", " <td id=\"T_38095_row7_col3\" class=\"data row7 col3\" >0.926</td>\n", " <td id=\"T_38095_row7_col4\" class=\"data row7 col4\" >0.924</td>\n", " <td id=\"T_38095_row7_col5\" class=\"data row7 col5\" >0.923</td>\n", " <td id=\"T_38095_row7_col6\" class=\"data row7 col6\" >0.921</td>\n", " <td id=\"T_38095_row7_col7\" class=\"data row7 col7\" >0.922</td>\n", " <td id=\"T_38095_row7_col8\" class=\"data row7 col8\" >0.921</td>\n", " <td id=\"T_38095_row7_col9\" class=\"data row7 col9\" >0.919</td>\n", " <td id=\"T_38095_row7_col10\" class=\"data row7 col10\" >0.899</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_38095_row8_col0\" class=\"data row8 col0\" >0.918</td>\n", " <td id=\"T_38095_row8_col1\" class=\"data row8 col1\" >0.935</td>\n", " <td id=\"T_38095_row8_col2\" class=\"data row8 col2\" >0.937</td>\n", " <td id=\"T_38095_row8_col3\" class=\"data row8 col3\" >0.936</td>\n", " <td id=\"T_38095_row8_col4\" class=\"data row8 col4\" >0.935</td>\n", " <td id=\"T_38095_row8_col5\" class=\"data row8 col5\" >0.935</td>\n", " <td 
id=\"T_38095_row8_col6\" class=\"data row8 col6\" >0.935</td>\n", " <td id=\"T_38095_row8_col7\" class=\"data row8 col7\" >0.934</td>\n", " <td id=\"T_38095_row8_col8\" class=\"data row8 col8\" >0.934</td>\n", " <td id=\"T_38095_row8_col9\" class=\"data row8 col9\" >0.933</td>\n", " <td id=\"T_38095_row8_col10\" class=\"data row8 col10\" >0.913</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_38095_row9_col0\" class=\"data row9 col0\" >0.942</td>\n", " <td id=\"T_38095_row9_col1\" class=\"data row9 col1\" >0.939</td>\n", " <td id=\"T_38095_row9_col2\" class=\"data row9 col2\" >0.934</td>\n", " <td id=\"T_38095_row9_col3\" class=\"data row9 col3\" >0.934</td>\n", " <td id=\"T_38095_row9_col4\" class=\"data row9 col4\" >0.934</td>\n", " <td id=\"T_38095_row9_col5\" class=\"data row9 col5\" >0.934</td>\n", " <td id=\"T_38095_row9_col6\" class=\"data row9 col6\" >0.932</td>\n", " <td id=\"T_38095_row9_col7\" class=\"data row9 col7\" >0.931</td>\n", " <td id=\"T_38095_row9_col8\" class=\"data row9 col8\" >0.929</td>\n", " <td id=\"T_38095_row9_col9\" class=\"data row9 col9\" >0.928</td>\n", " <td id=\"T_38095_row9_col10\" class=\"data row9 col10\" >0.903</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_38095_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_38095_row10_col0\" class=\"data row10 col0\" >0.966</td>\n", " <td id=\"T_38095_row10_col1\" class=\"data row10 col1\" >0.938</td>\n", " <td id=\"T_38095_row10_col2\" class=\"data row10 col2\" >0.932</td>\n", " <td id=\"T_38095_row10_col3\" class=\"data row10 col3\" >0.933</td>\n", " <td id=\"T_38095_row10_col4\" class=\"data row10 col4\" >0.932</td>\n", " <td id=\"T_38095_row10_col5\" class=\"data row10 col5\" >0.932</td>\n", " <td id=\"T_38095_row10_col6\" class=\"data row10 col6\" >0.928</td>\n", " <td id=\"T_38095_row10_col7\" class=\"data row10 col7\" >0.928</td>\n", " <td id=\"T_38095_row10_col8\" 
class=\"data row10 col8\" >0.922</td>\n", " <td id=\"T_38095_row10_col9\" class=\"data row10 col9\" >0.920</td>\n", " <td id=\"T_38095_row10_col10\" class=\"data row10 col10\" >0.892</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>1 min</th>\n", " <th>5 min</th>\n", " <th>30 min</th>\n", " <th>1 hr</th>\n", " <th>2 hr</th>\n", " <th>4 hr</th>\n", " <th>12 hr</th>\n", " <th>24 hr</th>\n", " <th>48 hr</th>\n", " <th>72 hr</th>\n", " <th>max</th>\n", " </tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>40</td>\n", " <td>350</td>\n", " <td>855</td>\n", " <td>995</td>\n", " <td>1050</td>\n", " <td>1094</td>\n", " <td>1149</td>\n", " <td>1191</td>\n", " <td>1247</td>\n", " <td>1295</td>\n", " <td>2299</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>794</td>\n", " <td>6181</td>\n", " <td>14757</td>\n", " <td>16826</td>\n", " <td>18287</td>\n", " <td>19213</td>\n", " <td>20335</td>\n", " <td>21605</td>\n", " <td>22581</td>\n", " <td>23170</td>\n", " <td>34976</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>159</td>\n", " <td>1278</td>\n", " <td>2855</td>\n", " <td>3198</td>\n", " <td>3459</td>\n", " <td>3578</td>\n", " <td>3733</td>\n", " <td>3981</td>\n", " <td>4212</td>\n", " <td>4338</td>\n", " <td>5851</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>8</td>\n", " <td>247</td>\n", " 
<td>772</td>\n", " <td>936</td>\n", " <td>1038</td>\n", " <td>1103</td>\n", " <td>1164</td>\n", " <td>1258</td>\n", " <td>1331</td>\n", " <td>1370</td>\n", " <td>2442</td>\n", " </tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>53</td>\n", " <td>652</td>\n", " <td>1557</td>\n", " <td>1766</td>\n", " <td>1906</td>\n", " <td>1995</td>\n", " <td>2083</td>\n", " <td>2160</td>\n", " <td>2260</td>\n", " <td>2318</td>\n", " <td>3611</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>8</td>\n", " <td>56</td>\n", " <td>153</td>\n", " <td>194</td>\n", " <td>230</td>\n", " <td>242</td>\n", " <td>246</td>\n", " <td>270</td>\n", " <td>284</td>\n", " <td>296</td>\n", " <td>527</td>\n", " </tr>\n", " <tr>\n", " <th>itwiki</th>\n", " <td>69</td>\n", " <td>434</td>\n", " <td>833</td>\n", " <td>925</td>\n", " <td>1004</td>\n", " <td>1048</td>\n", " <td>1097</td>\n", " <td>1152</td>\n", " <td>1212</td>\n", " <td>1227</td>\n", " <td>1820</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>99</td>\n", " <td>645</td>\n", " <td>1582</td>\n", " <td>1726</td>\n", " <td>1817</td>\n", " <td>1876</td>\n", " <td>1950</td>\n", " <td>2016</td>\n", " <td>2075</td>\n", " <td>2111</td>\n", " <td>2691</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>22</td>\n", " <td>673</td>\n", " <td>1697</td>\n", " <td>1946</td>\n", " <td>2104</td>\n", " <td>2185</td>\n", " <td>2261</td>\n", " <td>2319</td>\n", " <td>2370</td>\n", " <td>2410</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>46</td>\n", " <td>544</td>\n", " <td>1353</td>\n", " <td>1531</td>\n", " <td>1627</td>\n", " <td>1690</td>\n", " <td>1751</td>\n", " <td>1841</td>\n", " <td>1907</td>\n", " <td>1939</td>\n", " <td>2833</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>21</td>\n", " <td>238</td>\n", " <td>653</td>\n", " <td>765</td>\n", " <td>794</td>\n", " <td>848</td>\n", " <td>910</td>\n", " <td>964</td>\n", " <td>1044</td>\n", " <td>1072</td>\n", " <td>1587</td>\n", " 
</tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': elapsed_reg_median_risk.style.background_gradient(cmap ='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': elapsed_reg_interval_counts\n", "})" ] }, { "cell_type": "markdown", "id": "300da015-79da-4684-92fe-2425180957ea", "metadata": {}, "source": [ "Limiting to a 48 hr window significantly improves the scores. However, this holds only when registered users are considered." ] }, { "cell_type": "markdown", "id": "0acac942-4607-412e-bc82-5b84d69f67cc", "metadata": { "tags": [] }, "source": [ "## Time Since User's First Revision" ] }, { "cell_type": "code", "execution_count": 459, "id": "d775040f-910a-4bf7-8585-4b3d4f7b95a4", "metadata": {}, "outputs": [], "source": [ "elapsed_first_rev_minutes = [1, 5, 30]\n", "elapsed_first_rev_hours = [1, 2, 4, 12, 24, 48, 72, non_anon.elapsed_first_rev.max()/60*60]\n", "elapsed_first_rev_time_intervals = [i*60 for i in elapsed_first_rev_minutes] + [i*60*60 for i in elapsed_first_rev_hours]\n", "\n", "elapsed_first_rev_column_names = [f'{i} min' for i in elapsed_first_rev_minutes] + [f'{i} hr' if i<=72 else 'max' for i in elapsed_first_rev_hours]\n", "\n", "elapsed_first_rev_median_risk = calculate_grouped(non_anon, elapsed_first_rev_time_intervals, \n", " 'elapsed_first_rev', column_names=elapsed_first_rev_column_names)\n", "elapsed_first_rev_counts = calculate_grouped(non_anon, elapsed_first_rev_time_intervals, \n", " 'elapsed_first_rev', column_names=elapsed_first_rev_column_names, grp_function='count')" ] }, { "cell_type": "code", "execution_count": 485, "id": "ffb1a034-f114-49ee-93ea-1aeb92221fbd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_59ca1_row0_col0 {\n", " background-color: #28ae80;\n", " color: #f1f1f1;\n", "}\n", 
"#T_59ca1_row0_col1 {\n", " background-color: #cde11d;\n", " color: #000000;\n", "}\n", "#T_59ca1_row0_col2 {\n", " background-color: #46c06f;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row0_col3 {\n", " background-color: #3dbc74;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row0_col4 {\n", " background-color: #44bf70;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row0_col5 {\n", " background-color: #4cc26c;\n", " color: #000000;\n", "}\n", "#T_59ca1_row0_col6 {\n", " background-color: #4ec36b;\n", " color: #000000;\n", "}\n", "#T_59ca1_row0_col7 {\n", " background-color: #54c568;\n", " color: #000000;\n", "}\n", "#T_59ca1_row0_col8 {\n", " background-color: #5cc863;\n", " color: #000000;\n", "}\n", "#T_59ca1_row0_col9 {\n", " background-color: #3fbc73;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row0_col10, #T_59ca1_row5_col1, #T_59ca1_row5_col2, #T_59ca1_row5_col3, #T_59ca1_row5_col4, #T_59ca1_row5_col5, #T_59ca1_row5_col6, #T_59ca1_row5_col7, #T_59ca1_row5_col8, #T_59ca1_row5_col9, #T_59ca1_row10_col0 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_59ca1_row1_col0 {\n", " background-color: #3b518b;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col1 {\n", " background-color: #32658e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col2, #T_59ca1_row1_col6 {\n", " background-color: #375a8c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col3 {\n", " background-color: #39568c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col4 {\n", " background-color: #38598c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col5, #T_59ca1_row9_col9 {\n", " background-color: #375b8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col7 {\n", " background-color: #355f8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col8 {\n", " background-color: #365c8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col9, #T_59ca1_row3_col1 {\n", " background-color: #3a548c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row1_col10, #T_59ca1_row9_col8 {\n", " 
background-color: #30698e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row2_col0, #T_59ca1_row2_col1, #T_59ca1_row2_col2, #T_59ca1_row2_col3, #T_59ca1_row2_col4, #T_59ca1_row2_col5, #T_59ca1_row2_col6, #T_59ca1_row2_col7, #T_59ca1_row2_col8, #T_59ca1_row2_col9, #T_59ca1_row2_col10 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col0, #T_59ca1_row8_col8 {\n", " background-color: #424086;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col2 {\n", " background-color: #481a6c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col3, #T_59ca1_row3_col4 {\n", " background-color: #481d6f;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col5 {\n", " background-color: #482071;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col6 {\n", " background-color: #472c7a;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col7 {\n", " background-color: #433e85;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col8 {\n", " background-color: #433d84;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col9 {\n", " background-color: #463480;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row3_col10 {\n", " background-color: #38588c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col0, #T_59ca1_row4_col2, #T_59ca1_row4_col8 {\n", " background-color: #277f8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col1 {\n", " background-color: #1f9a8a;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col3 {\n", " background-color: #29798e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col4, #T_59ca1_row4_col7 {\n", " background-color: #277e8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col5, #T_59ca1_row9_col2, #T_59ca1_row10_col3 {\n", " background-color: #287c8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col6 {\n", " background-color: #2a778e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col9, #T_59ca1_row9_col3 {\n", " background-color: #2c718e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row4_col10 {\n", " background-color: #2cb17e;\n", " color: 
#f1f1f1;\n", "}\n", "#T_59ca1_row5_col0 {\n", " background-color: #2eb37c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row5_col10 {\n", " background-color: #6ece58;\n", " color: #000000;\n", "}\n", "#T_59ca1_row6_col0 {\n", " background-color: #414287;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col1, #T_59ca1_row9_col5 {\n", " background-color: #2e6d8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col2 {\n", " background-color: #365d8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col3 {\n", " background-color: #423f85;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col4 {\n", " background-color: #443a83;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col5, #T_59ca1_row6_col6 {\n", " background-color: #3f4788;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col7 {\n", " background-color: #3a538b;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col8, #T_59ca1_row8_col4, #T_59ca1_row8_col5, #T_59ca1_row8_col7 {\n", " background-color: #3e4a89;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col9 {\n", " background-color: #424186;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row6_col10 {\n", " background-color: #297b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col0 {\n", " background-color: #482979;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col1, #T_59ca1_row8_col0 {\n", " background-color: #2b748e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col2 {\n", " background-color: #23898e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col3 {\n", " background-color: #24868e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col4, #T_59ca1_row10_col7 {\n", " background-color: #228d8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col5 {\n", " background-color: #1f968b;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col6 {\n", " background-color: #22a884;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col7 {\n", " background-color: #25ac82;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row7_col8 {\n", " background-color: #1fa287;\n", " color: 
#f1f1f1;\n", "}\n", "#T_59ca1_row7_col9, #T_59ca1_row7_col10 {\n", " background-color: #1f958b;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row8_col1, #T_59ca1_row10_col4 {\n", " background-color: #27808e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row8_col2 {\n", " background-color: #39558c;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row8_col3 {\n", " background-color: #3d4e8a;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row8_col6 {\n", " background-color: #3f4889;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row8_col9 {\n", " background-color: #443983;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row8_col10 {\n", " background-color: #3e4c8a;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row9_col0 {\n", " background-color: #2a788e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row9_col1 {\n", " background-color: #238a8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row9_col4 {\n", " background-color: #2f6b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row9_col6 {\n", " background-color: #306a8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row9_col7 {\n", " background-color: #2e6e8e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row9_col10 {\n", " background-color: #26828e;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row10_col1 {\n", " background-color: #8bd646;\n", " color: #000000;\n", "}\n", "#T_59ca1_row10_col2 {\n", " background-color: #228c8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row10_col5 {\n", " background-color: #21908d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row10_col6 {\n", " background-color: #218e8d;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row10_col8 {\n", " background-color: #1e9c89;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row10_col9 {\n", " background-color: #21a685;\n", " color: #f1f1f1;\n", "}\n", "#T_59ca1_row10_col10 {\n", " background-color: #34b679;\n", " color: #f1f1f1;\n", "}\n", "</style>\n", "<table id=\"T_59ca1\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_59ca1_level0_col0\" class=\"col_heading level0 
col0\" >1 min</th>\n", " <th id=\"T_59ca1_level0_col1\" class=\"col_heading level0 col1\" >5 min</th>\n", " <th id=\"T_59ca1_level0_col2\" class=\"col_heading level0 col2\" >30 min</th>\n", " <th id=\"T_59ca1_level0_col3\" class=\"col_heading level0 col3\" >1 hr</th>\n", " <th id=\"T_59ca1_level0_col4\" class=\"col_heading level0 col4\" >2 hr</th>\n", " <th id=\"T_59ca1_level0_col5\" class=\"col_heading level0 col5\" >4 hr</th>\n", " <th id=\"T_59ca1_level0_col6\" class=\"col_heading level0 col6\" >12 hr</th>\n", " <th id=\"T_59ca1_level0_col7\" class=\"col_heading level0 col7\" >24 hr</th>\n", " <th id=\"T_59ca1_level0_col8\" class=\"col_heading level0 col8\" >48 hr</th>\n", " <th id=\"T_59ca1_level0_col9\" class=\"col_heading level0 col9\" >72 hr</th>\n", " <th id=\"T_59ca1_level0_col10\" class=\"col_heading level0 col10\" >max</th>\n", " </tr>\n", " <tr>\n", " <th class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " <th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " <th class=\"blank col5\" > </th>\n", " <th class=\"blank col6\" > </th>\n", " <th class=\"blank col7\" > </th>\n", " <th class=\"blank col8\" > </th>\n", " <th class=\"blank col9\" > </th>\n", " <th class=\"blank col10\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_59ca1_row0_col0\" class=\"data row0 col0\" >0.916</td>\n", " <td id=\"T_59ca1_row0_col1\" class=\"data row0 col1\" >0.917</td>\n", " <td id=\"T_59ca1_row0_col2\" class=\"data row0 col2\" >0.917</td>\n", " <td id=\"T_59ca1_row0_col3\" class=\"data row0 col3\" >0.916</td>\n", " <td id=\"T_59ca1_row0_col4\" class=\"data row0 col4\" >0.915</td>\n", " <td id=\"T_59ca1_row0_col5\" class=\"data row0 col5\" >0.914</td>\n", " <td id=\"T_59ca1_row0_col6\" class=\"data row0 col6\" >0.913</td>\n", " 
<td id=\"T_59ca1_row0_col7\" class=\"data row0 col7\" >0.913</td>\n", " <td id=\"T_59ca1_row0_col8\" class=\"data row0 col8\" >0.910</td>\n", " <td id=\"T_59ca1_row0_col9\" class=\"data row0 col9\" >0.908</td>\n", " <td id=\"T_59ca1_row0_col10\" class=\"data row0 col10\" >0.876</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td id=\"T_59ca1_row1_col0\" class=\"data row1 col0\" >0.930</td>\n", " <td id=\"T_59ca1_row1_col1\" class=\"data row1 col1\" >0.933</td>\n", " <td id=\"T_59ca1_row1_col2\" class=\"data row1 col2\" >0.932</td>\n", " <td id=\"T_59ca1_row1_col3\" class=\"data row1 col3\" >0.931</td>\n", " <td id=\"T_59ca1_row1_col4\" class=\"data row1 col4\" >0.930</td>\n", " <td id=\"T_59ca1_row1_col5\" class=\"data row1 col5\" >0.930</td>\n", " <td id=\"T_59ca1_row1_col6\" class=\"data row1 col6\" >0.929</td>\n", " <td id=\"T_59ca1_row1_col7\" class=\"data row1 col7\" >0.928</td>\n", " <td id=\"T_59ca1_row1_col8\" class=\"data row1 col8\" >0.926</td>\n", " <td id=\"T_59ca1_row1_col9\" class=\"data row1 col9\" >0.925</td>\n", " <td id=\"T_59ca1_row1_col10\" class=\"data row1 col10\" >0.908</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_59ca1_row2_col0\" class=\"data row2 col0\" >0.939</td>\n", " <td id=\"T_59ca1_row2_col1\" class=\"data row2 col1\" >0.941</td>\n", " <td id=\"T_59ca1_row2_col2\" class=\"data row2 col2\" >0.941</td>\n", " <td id=\"T_59ca1_row2_col3\" class=\"data row2 col3\" >0.941</td>\n", " <td id=\"T_59ca1_row2_col4\" class=\"data row2 col4\" >0.940</td>\n", " <td id=\"T_59ca1_row2_col5\" class=\"data row2 col5\" >0.940</td>\n", " <td id=\"T_59ca1_row2_col6\" class=\"data row2 col6\" >0.939</td>\n", " <td id=\"T_59ca1_row2_col7\" class=\"data row2 col7\" >0.939</td>\n", " <td id=\"T_59ca1_row2_col8\" class=\"data row2 col8\" >0.937</td>\n", " <td id=\"T_59ca1_row2_col9\" class=\"data row2 col9\" 
>0.936</td>\n", " <td id=\"T_59ca1_row2_col10\" class=\"data row2 col10\" >0.924</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_59ca1_row3_col0\" class=\"data row3 col0\" >0.932</td>\n", " <td id=\"T_59ca1_row3_col1\" class=\"data row3 col1\" >0.934</td>\n", " <td id=\"T_59ca1_row3_col2\" class=\"data row3 col2\" >0.939</td>\n", " <td id=\"T_59ca1_row3_col3\" class=\"data row3 col3\" >0.938</td>\n", " <td id=\"T_59ca1_row3_col4\" class=\"data row3 col4\" >0.937</td>\n", " <td id=\"T_59ca1_row3_col5\" class=\"data row3 col5\" >0.937</td>\n", " <td id=\"T_59ca1_row3_col6\" class=\"data row3 col6\" >0.934</td>\n", " <td id=\"T_59ca1_row3_col7\" class=\"data row3 col7\" >0.932</td>\n", " <td id=\"T_59ca1_row3_col8\" class=\"data row3 col8\" >0.930</td>\n", " <td id=\"T_59ca1_row3_col9\" class=\"data row3 col9\" >0.930</td>\n", " <td id=\"T_59ca1_row3_col10\" class=\"data row3 col10\" >0.911</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_59ca1_row4_col0\" class=\"data row4 col0\" >0.924</td>\n", " <td id=\"T_59ca1_row4_col1\" class=\"data row4 col1\" >0.927</td>\n", " <td id=\"T_59ca1_row4_col2\" class=\"data row4 col2\" >0.927</td>\n", " <td id=\"T_59ca1_row4_col3\" class=\"data row4 col3\" >0.926</td>\n", " <td id=\"T_59ca1_row4_col4\" class=\"data row4 col4\" >0.925</td>\n", " <td id=\"T_59ca1_row4_col5\" class=\"data row4 col5\" >0.925</td>\n", " <td id=\"T_59ca1_row4_col6\" class=\"data row4 col6\" >0.925</td>\n", " <td id=\"T_59ca1_row4_col7\" class=\"data row4 col7\" >0.923</td>\n", " <td id=\"T_59ca1_row4_col8\" class=\"data row4 col8\" >0.921</td>\n", " <td id=\"T_59ca1_row4_col9\" class=\"data row4 col9\" >0.921</td>\n", " <td id=\"T_59ca1_row4_col10\" class=\"data row4 col10\" >0.893</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", 
" <td id=\"T_59ca1_row5_col0\" class=\"data row5 col0\" >0.916</td>\n", " <td id=\"T_59ca1_row5_col1\" class=\"data row5 col1\" >0.915</td>\n", " <td id=\"T_59ca1_row5_col2\" class=\"data row5 col2\" >0.907</td>\n", " <td id=\"T_59ca1_row5_col3\" class=\"data row5 col3\" >0.905</td>\n", " <td id=\"T_59ca1_row5_col4\" class=\"data row5 col4\" >0.904</td>\n", " <td id=\"T_59ca1_row5_col5\" class=\"data row5 col5\" >0.904</td>\n", " <td id=\"T_59ca1_row5_col6\" class=\"data row5 col6\" >0.903</td>\n", " <td id=\"T_59ca1_row5_col7\" class=\"data row5 col7\" >0.903</td>\n", " <td id=\"T_59ca1_row5_col8\" class=\"data row5 col8\" >0.901</td>\n", " <td id=\"T_59ca1_row5_col9\" class=\"data row5 col9\" >0.896</td>\n", " <td id=\"T_59ca1_row5_col10\" class=\"data row5 col10\" >0.886</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_59ca1_row6_col0\" class=\"data row6 col0\" >0.932</td>\n", " <td id=\"T_59ca1_row6_col1\" class=\"data row6 col1\" >0.932</td>\n", " <td id=\"T_59ca1_row6_col2\" class=\"data row6 col2\" >0.931</td>\n", " <td id=\"T_59ca1_row6_col3\" class=\"data row6 col3\" >0.934</td>\n", " <td id=\"T_59ca1_row6_col4\" class=\"data row6 col4\" >0.934</td>\n", " <td id=\"T_59ca1_row6_col5\" class=\"data row6 col5\" >0.932</td>\n", " <td id=\"T_59ca1_row6_col6\" class=\"data row6 col6\" >0.931</td>\n", " <td id=\"T_59ca1_row6_col7\" class=\"data row6 col7\" >0.929</td>\n", " <td id=\"T_59ca1_row6_col8\" class=\"data row6 col8\" >0.929</td>\n", " <td id=\"T_59ca1_row6_col9\" class=\"data row6 col9\" >0.928</td>\n", " <td id=\"T_59ca1_row6_col10\" class=\"data row6 col10\" >0.904</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_59ca1_row7_col0\" class=\"data row7 col0\" >0.935</td>\n", " <td id=\"T_59ca1_row7_col1\" class=\"data row7 col1\" >0.931</td>\n", " <td id=\"T_59ca1_row7_col2\" class=\"data row7 
col2\" >0.925</td>\n", " <td id=\"T_59ca1_row7_col3\" class=\"data row7 col3\" >0.924</td>\n", " <td id=\"T_59ca1_row7_col4\" class=\"data row7 col4\" >0.922</td>\n", " <td id=\"T_59ca1_row7_col5\" class=\"data row7 col5\" >0.921</td>\n", " <td id=\"T_59ca1_row7_col6\" class=\"data row7 col6\" >0.917</td>\n", " <td id=\"T_59ca1_row7_col7\" class=\"data row7 col7\" >0.917</td>\n", " <td id=\"T_59ca1_row7_col8\" class=\"data row7 col8\" >0.916</td>\n", " <td id=\"T_59ca1_row7_col9\" class=\"data row7 col9\" >0.915</td>\n", " <td id=\"T_59ca1_row7_col10\" class=\"data row7 col10\" >0.899</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_59ca1_row8_col0\" class=\"data row8 col0\" >0.925</td>\n", " <td id=\"T_59ca1_row8_col1\" class=\"data row8 col1\" >0.930</td>\n", " <td id=\"T_59ca1_row8_col2\" class=\"data row8 col2\" >0.932</td>\n", " <td id=\"T_59ca1_row8_col3\" class=\"data row8 col3\" >0.932</td>\n", " <td id=\"T_59ca1_row8_col4\" class=\"data row8 col4\" >0.932</td>\n", " <td id=\"T_59ca1_row8_col5\" class=\"data row8 col5\" >0.932</td>\n", " <td id=\"T_59ca1_row8_col6\" class=\"data row8 col6\" >0.931</td>\n", " <td id=\"T_59ca1_row8_col7\" class=\"data row8 col7\" >0.931</td>\n", " <td id=\"T_59ca1_row8_col8\" class=\"data row8 col8\" >0.930</td>\n", " <td id=\"T_59ca1_row8_col9\" class=\"data row8 col9\" >0.929</td>\n", " <td id=\"T_59ca1_row8_col10\" class=\"data row8 col10\" >0.913</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_59ca1_row9_col0\" class=\"data row9 col0\" >0.925</td>\n", " <td id=\"T_59ca1_row9_col1\" class=\"data row9 col1\" >0.929</td>\n", " <td id=\"T_59ca1_row9_col2\" class=\"data row9 col2\" >0.927</td>\n", " <td id=\"T_59ca1_row9_col3\" class=\"data row9 col3\" >0.927</td>\n", " <td id=\"T_59ca1_row9_col4\" class=\"data row9 col4\" >0.928</td>\n", " <td id=\"T_59ca1_row9_col5\" 
class=\"data row9 col5\" >0.927</td>\n", " <td id=\"T_59ca1_row9_col6\" class=\"data row9 col6\" >0.927</td>\n", " <td id=\"T_59ca1_row9_col7\" class=\"data row9 col7\" >0.926</td>\n", " <td id=\"T_59ca1_row9_col8\" class=\"data row9 col8\" >0.925</td>\n", " <td id=\"T_59ca1_row9_col9\" class=\"data row9 col9\" >0.924</td>\n", " <td id=\"T_59ca1_row9_col10\" class=\"data row9 col10\" >0.903</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_59ca1_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_59ca1_row10_col0\" class=\"data row10 col0\" >0.903</td>\n", " <td id=\"T_59ca1_row10_col1\" class=\"data row10 col1\" >0.920</td>\n", " <td id=\"T_59ca1_row10_col2\" class=\"data row10 col2\" >0.925</td>\n", " <td id=\"T_59ca1_row10_col3\" class=\"data row10 col3\" >0.926</td>\n", " <td id=\"T_59ca1_row10_col4\" class=\"data row10 col4\" >0.924</td>\n", " <td id=\"T_59ca1_row10_col5\" class=\"data row10 col5\" >0.922</td>\n", " <td id=\"T_59ca1_row10_col6\" class=\"data row10 col6\" >0.921</td>\n", " <td id=\"T_59ca1_row10_col7\" class=\"data row10 col7\" >0.921</td>\n", " <td id=\"T_59ca1_row10_col8\" class=\"data row10 col8\" >0.917</td>\n", " <td id=\"T_59ca1_row10_col9\" class=\"data row10 col9\" >0.912</td>\n", " <td id=\"T_59ca1_row10_col10\" class=\"data row10 col10\" >0.892</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>1 min</th>\n", " <th>5 min</th>\n", " <th>30 min</th>\n", " <th>1 hr</th>\n", " <th>2 hr</th>\n", " <th>4 hr</th>\n", " <th>12 hr</th>\n", " <th>24 hr</th>\n", " <th>48 hr</th>\n", " <th>72 hr</th>\n", " 
<th>max</th>\n", " </tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>763</td>\n", " <td>937</td>\n", " <td>1222</td>\n", " <td>1300</td>\n", " <td>1338</td>\n", " <td>1375</td>\n", " <td>1424</td>\n", " <td>1460</td>\n", " <td>1527</td>\n", " <td>1568</td>\n", " <td>2301</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>9917</td>\n", " <td>13471</td>\n", " <td>19062</td>\n", " <td>20548</td>\n", " <td>21640</td>\n", " <td>22355</td>\n", " <td>23261</td>\n", " <td>24472</td>\n", " <td>25405</td>\n", " <td>25933</td>\n", " <td>34986</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>1766</td>\n", " <td>2432</td>\n", " <td>3406</td>\n", " <td>3654</td>\n", " <td>3848</td>\n", " <td>3974</td>\n", " <td>4092</td>\n", " <td>4321</td>\n", " <td>4524</td>\n", " <td>4624</td>\n", " <td>5851</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>444</td>\n", " <td>609</td>\n", " <td>989</td>\n", " <td>1125</td>\n", " <td>1199</td>\n", " <td>1261</td>\n", " <td>1319</td>\n", " <td>1469</td>\n", " <td>1537</td>\n", " <td>1570</td>\n", " <td>2442</td>\n", " </tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>1107</td>\n", " <td>1464</td>\n", " <td>1981</td>\n", " <td>2117</td>\n", " <td>2222</td>\n", " <td>2284</td>\n", " <td>2367</td>\n", " <td>2438</td>\n", " <td>2532</td>\n", " <td>2573</td>\n", " <td>3611</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>111</td>\n", " <td>143</td>\n", " <td>215</td>\n", " <td>268</td>\n", " <td>289</td>\n", " <td>300</td>\n", " <td>320</td>\n", " <td>341</td>\n", " <td>355</td>\n", " <td>376</td>\n", " <td>527</td>\n", " </tr>\n", " <tr>\n", " <th>itwiki</th>\n", " <td>570</td>\n", " <td>770</td>\n", " <td>1021</td>\n", " <td>1100</td>\n", " <td>1134</td>\n", " 
<td>1175</td>\n", " <td>1229</td>\n", " <td>1283</td>\n", " <td>1329</td>\n", " <td>1341</td>\n", " <td>1820</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>670</td>\n", " <td>1181</td>\n", " <td>1893</td>\n", " <td>1977</td>\n", " <td>2056</td>\n", " <td>2101</td>\n", " <td>2176</td>\n", " <td>2206</td>\n", " <td>2255</td>\n", " <td>2279</td>\n", " <td>2691</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>1048</td>\n", " <td>1375</td>\n", " <td>2017</td>\n", " <td>2214</td>\n", " <td>2323</td>\n", " <td>2406</td>\n", " <td>2482</td>\n", " <td>2536</td>\n", " <td>2587</td>\n", " <td>2624</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>903</td>\n", " <td>1196</td>\n", " <td>1716</td>\n", " <td>1815</td>\n", " <td>1874</td>\n", " <td>1922</td>\n", " <td>1965</td>\n", " <td>2049</td>\n", " <td>2115</td>\n", " <td>2148</td>\n", " <td>2833</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>336</td>\n", " <td>501</td>\n", " <td>790</td>\n", " <td>873</td>\n", " <td>913</td>\n", " <td>961</td>\n", " <td>1001</td>\n", " <td>1052</td>\n", " <td>1130</td>\n", " <td>1182</td>\n", " <td>1587</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': elapsed_first_rev_median_risk.style.background_gradient(cmap ='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': elapsed_first_rev_counts\n", "})" ] }, { "cell_type": "markdown", "id": "b00e726e-4deb-4bcb-b7ee-5e9405cb42bb", "metadata": {}, "source": [ "- Limiting to a 48 hr window significantly improves the scores. However, this applies only when registered users are considered.\n", "- The impact is similar to that of time since user registration; however, filtering on time since the user's first edit eliminates fewer edits than filtering on user registration."
] }, { "cell_type": "markdown", "id": "3049a57c-a4cd-4848-b562-505efe5be05d", "metadata": { "tags": [] }, "source": [ "## Time Since User's Previous Edit" ] }, { "cell_type": "code", "execution_count": 506, "id": "9acd2efe-f1df-4a81-a516-9231c63fe9bb", "metadata": {}, "outputs": [], "source": [ "time_user_prev_rev_minutes = [1, 5, 15, 30, 60, 120, non_anon.time_user_prev_rev.max()/60]\n", "time_user_prev_rev_time_intervals = [i*60 for i in time_user_prev_rev_minutes]\n", "\n", "time_user_prev_rev_column_names = [f'{i} min' if i<=120 else 'max' for i in time_user_prev_rev_minutes]\n", "\n", "time_user_prev_rev_median_risk = calculate_grouped(non_anon, time_user_prev_rev_time_intervals, \n", " 'time_user_prev_rev', column_names=time_user_prev_rev_column_names)\n", "time_user_prev_rev_counts = calculate_grouped(non_anon, time_user_prev_rev_time_intervals, \n", " 'time_user_prev_rev', column_names=time_user_prev_rev_column_names, grp_function='count')" ] }, { "cell_type": "code", "execution_count": 507, "id": "f05af18d-002a-4c41-b99b-572f1ee09221", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_53fdc_row0_col0 {\n", " background-color: #a2da37;\n", " color: #000000;\n", "}\n", "#T_53fdc_row0_col1 {\n", " background-color: #b8de29;\n", " color: #000000;\n", "}\n", "#T_53fdc_row0_col2, #T_53fdc_row0_col3, #T_53fdc_row0_col4, #T_53fdc_row0_col5, #T_53fdc_row0_col6, #T_53fdc_row5_col0, #T_53fdc_row5_col1 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_53fdc_row1_col0 {\n", " background-color: #23888e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row1_col1 {\n", " background-color: #31668e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row1_col2, #T_53fdc_row9_col6 {\n", " background-color: #306a8e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row1_col3, #T_53fdc_row1_col5 {\n", " background-color: #2e6f8e;\n", " color: 
#f1f1f1;\n", "}\n", "#T_53fdc_row1_col4 {\n", " background-color: #2e6e8e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row1_col6, #T_53fdc_row3_col1, #T_53fdc_row4_col0 {\n", " background-color: #355f8d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row2_col0 {\n", " background-color: #404588;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row2_col1, #T_53fdc_row2_col2, #T_53fdc_row2_col3, #T_53fdc_row2_col4, #T_53fdc_row2_col5, #T_53fdc_row2_col6, #T_53fdc_row6_col0 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row3_col0, #T_53fdc_row6_col5 {\n", " background-color: #277f8e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row3_col2 {\n", " background-color: #365d8d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row3_col3, #T_53fdc_row3_col4, #T_53fdc_row3_col5 {\n", " background-color: #365c8d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row3_col6 {\n", " background-color: #433d84;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row4_col1 {\n", " background-color: #1f948c;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row4_col2 {\n", " background-color: #1e9b8a;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row4_col3 {\n", " background-color: #1fa287;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row4_col4, #T_53fdc_row5_col6 {\n", " background-color: #1e9d89;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row4_col5 {\n", " background-color: #24aa83;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row4_col6 {\n", " background-color: #20a386;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row5_col2 {\n", " background-color: #eae51a;\n", " color: #000000;\n", "}\n", "#T_53fdc_row5_col3 {\n", " background-color: #f6e620;\n", " color: #000000;\n", "}\n", "#T_53fdc_row5_col4 {\n", " background-color: #bade28;\n", " color: #000000;\n", "}\n", "#T_53fdc_row5_col5 {\n", " background-color: #b2dd2d;\n", " color: #000000;\n", "}\n", "#T_53fdc_row6_col1 {\n", " background-color: #375b8d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row6_col2, #T_53fdc_row7_col3, #T_53fdc_row7_col5, 
#T_53fdc_row9_col4 {\n", " background-color: #2a788e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row6_col3 {\n", " background-color: #27808e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row6_col4 {\n", " background-color: #26818e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row6_col6 {\n", " background-color: #31688e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row7_col0 {\n", " background-color: #39558c;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row7_col1 {\n", " background-color: #33628d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row7_col2 {\n", " background-color: #2d708e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row7_col4, #T_53fdc_row9_col2 {\n", " background-color: #2a768e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row7_col6 {\n", " background-color: #355e8d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row8_col0 {\n", " background-color: #424186;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row8_col1, #T_53fdc_row8_col4 {\n", " background-color: #482475;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row8_col2 {\n", " background-color: #482071;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row8_col3 {\n", " background-color: #482878;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row8_col5 {\n", " background-color: #482374;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row8_col6 {\n", " background-color: #472d7b;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row9_col0 {\n", " background-color: #37b878;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row9_col1, #T_53fdc_row10_col6 {\n", " background-color: #2a778e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row9_col3 {\n", " background-color: #297a8e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row9_col5 {\n", " background-color: #2c738e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row10_col0 {\n", " background-color: #70cf57;\n", " color: #000000;\n", "}\n", "#T_53fdc_row10_col1 {\n", " background-color: #25858e;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row10_col2 {\n", " background-color: #23898e;\n", " color: #f1f1f1;\n", "}\n", 
"#T_53fdc_row10_col3, #T_53fdc_row10_col5 {\n", " background-color: #21908d;\n", " color: #f1f1f1;\n", "}\n", "#T_53fdc_row10_col4 {\n", " background-color: #218f8d;\n", " color: #f1f1f1;\n", "}\n", "</style>\n", "<table id=\"T_53fdc\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_53fdc_level0_col0\" class=\"col_heading level0 col0\" >1 min</th>\n", " <th id=\"T_53fdc_level0_col1\" class=\"col_heading level0 col1\" >5 min</th>\n", " <th id=\"T_53fdc_level0_col2\" class=\"col_heading level0 col2\" >15 min</th>\n", " <th id=\"T_53fdc_level0_col3\" class=\"col_heading level0 col3\" >30 min</th>\n", " <th id=\"T_53fdc_level0_col4\" class=\"col_heading level0 col4\" >60 min</th>\n", " <th id=\"T_53fdc_level0_col5\" class=\"col_heading level0 col5\" >120 min</th>\n", " <th id=\"T_53fdc_level0_col6\" class=\"col_heading level0 col6\" >max</th>\n", " </tr>\n", " <tr>\n", " <th class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " <th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " <th class=\"blank col5\" > </th>\n", " <th class=\"blank col6\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_53fdc_row0_col0\" class=\"data row0 col0\" >0.893</td>\n", " <td id=\"T_53fdc_row0_col1\" class=\"data row0 col1\" >0.889</td>\n", " <td id=\"T_53fdc_row0_col2\" class=\"data row0 col2\" >0.883</td>\n", " <td id=\"T_53fdc_row0_col3\" class=\"data row0 col3\" >0.882</td>\n", " <td id=\"T_53fdc_row0_col4\" class=\"data row0 col4\" >0.879</td>\n", " <td id=\"T_53fdc_row0_col5\" class=\"data row0 col5\" >0.879</td>\n", " <td id=\"T_53fdc_row0_col6\" class=\"data row0 col6\" >0.848</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td 
id=\"T_53fdc_row1_col0\" class=\"data row1 col0\" >0.916</td>\n", " <td id=\"T_53fdc_row1_col1\" class=\"data row1 col1\" >0.915</td>\n", " <td id=\"T_53fdc_row1_col2\" class=\"data row1 col2\" >0.911</td>\n", " <td id=\"T_53fdc_row1_col3\" class=\"data row1 col3\" >0.910</td>\n", " <td id=\"T_53fdc_row1_col4\" class=\"data row1 col4\" >0.909</td>\n", " <td id=\"T_53fdc_row1_col5\" class=\"data row1 col5\" >0.909</td>\n", " <td id=\"T_53fdc_row1_col6\" class=\"data row1 col6\" >0.897</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_53fdc_row2_col0\" class=\"data row2 col0\" >0.931</td>\n", " <td id=\"T_53fdc_row2_col1\" class=\"data row2 col1\" >0.929</td>\n", " <td id=\"T_53fdc_row2_col2\" class=\"data row2 col2\" >0.926</td>\n", " <td id=\"T_53fdc_row2_col3\" class=\"data row2 col3\" >0.926</td>\n", " <td id=\"T_53fdc_row2_col4\" class=\"data row2 col4\" >0.926</td>\n", " <td id=\"T_53fdc_row2_col5\" class=\"data row2 col5\" >0.926</td>\n", " <td id=\"T_53fdc_row2_col6\" class=\"data row2 col6\" >0.917</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_53fdc_row3_col0\" class=\"data row3 col0\" >0.918</td>\n", " <td id=\"T_53fdc_row3_col1\" class=\"data row3 col1\" >0.916</td>\n", " <td id=\"T_53fdc_row3_col2\" class=\"data row3 col2\" >0.914</td>\n", " <td id=\"T_53fdc_row3_col3\" class=\"data row3 col3\" >0.913</td>\n", " <td id=\"T_53fdc_row3_col4\" class=\"data row3 col4\" >0.912</td>\n", " <td id=\"T_53fdc_row3_col5\" class=\"data row3 col5\" >0.912</td>\n", " <td id=\"T_53fdc_row3_col6\" class=\"data row3 col6\" >0.905</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_53fdc_row4_col0\" class=\"data row4 col0\" >0.925</td>\n", " <td id=\"T_53fdc_row4_col1\" class=\"data row4 col1\" >0.906</td>\n", " <td 
id=\"T_53fdc_row4_col2\" class=\"data row4 col2\" >0.903</td>\n", " <td id=\"T_53fdc_row4_col3\" class=\"data row4 col3\" >0.901</td>\n", " <td id=\"T_53fdc_row4_col4\" class=\"data row4 col4\" >0.900</td>\n", " <td id=\"T_53fdc_row4_col5\" class=\"data row4 col5\" >0.897</td>\n", " <td id=\"T_53fdc_row4_col6\" class=\"data row4 col6\" >0.877</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", " <td id=\"T_53fdc_row5_col0\" class=\"data row5 col0\" >0.885</td>\n", " <td id=\"T_53fdc_row5_col1\" class=\"data row5 col1\" >0.885</td>\n", " <td id=\"T_53fdc_row5_col2\" class=\"data row5 col2\" >0.884</td>\n", " <td id=\"T_53fdc_row5_col3\" class=\"data row5 col3\" >0.883</td>\n", " <td id=\"T_53fdc_row5_col4\" class=\"data row5 col4\" >0.884</td>\n", " <td id=\"T_53fdc_row5_col5\" class=\"data row5 col5\" >0.884</td>\n", " <td id=\"T_53fdc_row5_col6\" class=\"data row5 col6\" >0.879</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_53fdc_row6_col0\" class=\"data row6 col0\" >0.943</td>\n", " <td id=\"T_53fdc_row6_col1\" class=\"data row6 col1\" >0.917</td>\n", " <td id=\"T_53fdc_row6_col2\" class=\"data row6 col2\" >0.909</td>\n", " <td id=\"T_53fdc_row6_col3\" class=\"data row6 col3\" >0.907</td>\n", " <td id=\"T_53fdc_row6_col4\" class=\"data row6 col4\" >0.905</td>\n", " <td id=\"T_53fdc_row6_col5\" class=\"data row6 col5\" >0.906</td>\n", " <td id=\"T_53fdc_row6_col6\" class=\"data row6 col6\" >0.894</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_53fdc_row7_col0\" class=\"data row7 col0\" >0.927</td>\n", " <td id=\"T_53fdc_row7_col1\" class=\"data row7 col1\" >0.915</td>\n", " <td id=\"T_53fdc_row7_col2\" class=\"data row7 col2\" >0.910</td>\n", " <td id=\"T_53fdc_row7_col3\" class=\"data row7 col3\" >0.908</td>\n", " <td 
id=\"T_53fdc_row7_col4\" class=\"data row7 col4\" >0.908</td>\n", " <td id=\"T_53fdc_row7_col5\" class=\"data row7 col5\" >0.907</td>\n", " <td id=\"T_53fdc_row7_col6\" class=\"data row7 col6\" >0.897</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_53fdc_row8_col0\" class=\"data row8 col0\" >0.931</td>\n", " <td id=\"T_53fdc_row8_col1\" class=\"data row8 col1\" >0.925</td>\n", " <td id=\"T_53fdc_row8_col2\" class=\"data row8 col2\" >0.922</td>\n", " <td id=\"T_53fdc_row8_col3\" class=\"data row8 col3\" >0.921</td>\n", " <td id=\"T_53fdc_row8_col4\" class=\"data row8 col4\" >0.921</td>\n", " <td id=\"T_53fdc_row8_col5\" class=\"data row8 col5\" >0.921</td>\n", " <td id=\"T_53fdc_row8_col6\" class=\"data row8 col6\" >0.908</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_53fdc_row9_col0\" class=\"data row9 col0\" >0.904</td>\n", " <td id=\"T_53fdc_row9_col1\" class=\"data row9 col1\" >0.911</td>\n", " <td id=\"T_53fdc_row9_col2\" class=\"data row9 col2\" >0.909</td>\n", " <td id=\"T_53fdc_row9_col3\" class=\"data row9 col3\" >0.908</td>\n", " <td id=\"T_53fdc_row9_col4\" class=\"data row9 col4\" >0.907</td>\n", " <td id=\"T_53fdc_row9_col5\" class=\"data row9 col5\" >0.908</td>\n", " <td id=\"T_53fdc_row9_col6\" class=\"data row9 col6\" >0.894</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_53fdc_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_53fdc_row10_col0\" class=\"data row10 col0\" >0.897</td>\n", " <td id=\"T_53fdc_row10_col1\" class=\"data row10 col1\" >0.909</td>\n", " <td id=\"T_53fdc_row10_col2\" class=\"data row10 col2\" >0.906</td>\n", " <td id=\"T_53fdc_row10_col3\" class=\"data row10 col3\" >0.904</td>\n", " <td id=\"T_53fdc_row10_col4\" class=\"data row10 col4\" >0.903</td>\n", " <td id=\"T_53fdc_row10_col5\" class=\"data row10 col5\" >0.902</td>\n", " <td 
id=\"T_53fdc_row10_col6\" class=\"data row10 col6\" >0.890</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>1 min</th>\n", " <th>5 min</th>\n", " <th>15 min</th>\n", " <th>30 min</th>\n", " <th>60 min</th>\n", " <th>120 min</th>\n", " <th>max</th>\n", " </tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>189</td>\n", " <td>658</td>\n", " <td>891</td>\n", " <td>960</td>\n", " <td>1003</td>\n", " <td>1031</td>\n", " <td>1571</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>4042</td>\n", " <td>13345</td>\n", " <td>17506</td>\n", " <td>18766</td>\n", " <td>19514</td>\n", " <td>20040</td>\n", " <td>25824</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>700</td>\n", " <td>2344</td>\n", " <td>3036</td>\n", " <td>3242</td>\n", " <td>3347</td>\n", " <td>3429</td>\n", " <td>4237</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>268</td>\n", " <td>1046</td>\n", " <td>1378</td>\n", " <td>1470</td>\n", " <td>1512</td>\n", " <td>1533</td>\n", " <td>2020</td>\n", " </tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>309</td>\n", " <td>1217</td>\n", " <td>1653</td>\n", " <td>1785</td>\n", " <td>1876</td>\n", " <td>1937</td>\n", " <td>2574</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>55</td>\n", " <td>211</td>\n", " <td>289</td>\n", " <td>312</td>\n", " <td>327</td>\n", " <td>335</td>\n", " <td>421</td>\n", " </tr>\n", " <tr>\n", " 
<th>itwiki</th>\n", " <td>224</td>\n", " <td>671</td>\n", " <td>896</td>\n", " <td>952</td>\n", " <td>987</td>\n", " <td>1010</td>\n", " <td>1307</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>1045</td>\n", " <td>1716</td>\n", " <td>1920</td>\n", " <td>1992</td>\n", " <td>2013</td>\n", " <td>2035</td>\n", " <td>2266</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>338</td>\n", " <td>1235</td>\n", " <td>1656</td>\n", " <td>1774</td>\n", " <td>1833</td>\n", " <td>1867</td>\n", " <td>2371</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>244</td>\n", " <td>992</td>\n", " <td>1332</td>\n", " <td>1423</td>\n", " <td>1481</td>\n", " <td>1512</td>\n", " <td>1977</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>237</td>\n", " <td>722</td>\n", " <td>927</td>\n", " <td>980</td>\n", " <td>1011</td>\n", " <td>1031</td>\n", " <td>1277</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': time_user_prev_rev_median_risk.style.background_gradient(cmap ='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': time_user_prev_rev_counts\n", "})" ] }, { "cell_type": "markdown", "id": "ec32ee06-b272-46b7-b18a-c24ac6c6ff8c", "metadata": {}, "source": [ "While restricting on this dimension improves the score, it would eliminate a substantial number of edits for no significant benefit."
] }, { "cell_type": "markdown", "id": "b459d7cc-d95f-4704-bf0d-61fe9fe2609e", "metadata": { "tags": [] }, "source": [ "## Time Since Page's Previous Edit" ] }, { "cell_type": "code", "execution_count": 463, "id": "c37a401a-f2c5-45ce-bce8-e50f509d6cfb", "metadata": {}, "outputs": [], "source": [ "time_page_prev_rev_minutes = [1, 5, 15, 30, 60, init_criteria.time_page_prev_rev.max()/60]\n", "time_page_prev_rev_time_intervals = [i*60 for i in time_page_prev_rev_minutes]\n", "\n", "time_page_prev_rev_column_names = [f'{i} min' if i<=60 else 'max' for i in time_page_prev_rev_minutes]\n", "\n", "time_page_prev_rev_median_risk = calculate_grouped(init_criteria, time_page_prev_rev_time_intervals, \n", " 'time_page_prev_rev', column_names=time_page_prev_rev_column_names)\n", "time_page_prev_rev_counts = calculate_grouped(init_criteria, time_page_prev_rev_time_intervals, \n", " 'time_page_prev_rev', column_names=time_page_prev_rev_column_names, grp_function='count')" ] }, { "cell_type": "code", "execution_count": 487, "id": "bce8bbe7-2c20-44ac-aa13-9cdbfdfb84ad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_c7049_row0_col0, #T_c7049_row2_col0 {\n", " background-color: #3e4a89;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row0_col1, #T_c7049_row9_col0 {\n", " background-color: #34618d;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row0_col2 {\n", " background-color: #31688e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row0_col3 {\n", " background-color: #2d708e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row0_col4 {\n", " background-color: #2c728e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row0_col5, #T_c7049_row5_col2 {\n", " background-color: #26828e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row1_col0 {\n", " background-color: #25838e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row1_col1, #T_c7049_row1_col3 {\n", " background-color: 
#2f6c8e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row1_col2 {\n", " background-color: #31678e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row1_col4, #T_c7049_row4_col4 {\n", " background-color: #306a8e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row1_col5 {\n", " background-color: #3a538b;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row2_col1, #T_c7049_row8_col0 {\n", " background-color: #482878;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row2_col2 {\n", " background-color: #481a6c;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row2_col3, #T_c7049_row8_col3 {\n", " background-color: #481f70;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row2_col4 {\n", " background-color: #481d6f;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row2_col5, #T_c7049_row3_col0, #T_c7049_row3_col1, #T_c7049_row3_col2, #T_c7049_row3_col3, #T_c7049_row3_col4 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row3_col5 {\n", " background-color: #472e7c;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row4_col0, #T_c7049_row6_col0 {\n", " background-color: #433e85;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row4_col1 {\n", " background-color: #375b8d;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row4_col2 {\n", " background-color: #375a8c;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row4_col3 {\n", " background-color: #33638d;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row4_col5 {\n", " background-color: #297b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row5_col0 {\n", " background-color: #2cb17e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row5_col1 {\n", " background-color: #23898e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row5_col3 {\n", " background-color: #24868e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row5_col4 {\n", " background-color: #277e8e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row5_col5 {\n", " background-color: #277f8e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row6_col1 {\n", " background-color: #472d7b;\n", " color: #f1f1f1;\n", "}\n", 
"#T_c7049_row6_col2, #T_c7049_row8_col4 {\n", " background-color: #472c7a;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row6_col3 {\n", " background-color: #46327e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row6_col4 {\n", " background-color: #453781;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row6_col5 {\n", " background-color: #48186a;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row7_col0 {\n", " background-color: #1fa188;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row7_col1 {\n", " background-color: #bddf26;\n", " color: #000000;\n", "}\n", "#T_c7049_row7_col2 {\n", " background-color: #dfe318;\n", " color: #000000;\n", "}\n", "#T_c7049_row7_col3 {\n", " background-color: #f4e61e;\n", " color: #000000;\n", "}\n", "#T_c7049_row7_col4, #T_c7049_row7_col5, #T_c7049_row10_col0, #T_c7049_row10_col1, #T_c7049_row10_col2, #T_c7049_row10_col3 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_c7049_row8_col1, #T_c7049_row8_col2 {\n", " background-color: #481c6e;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row8_col5, #T_c7049_row9_col1 {\n", " background-color: #404588;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row9_col2 {\n", " background-color: #443a83;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row9_col3 {\n", " background-color: #414487;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row9_col4 {\n", " background-color: #414287;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row9_col5 {\n", " background-color: #433d84;\n", " color: #f1f1f1;\n", "}\n", "#T_c7049_row10_col4 {\n", " background-color: #f8e621;\n", " color: #000000;\n", "}\n", "#T_c7049_row10_col5 {\n", " background-color: #90d743;\n", " color: #000000;\n", "}\n", "</style>\n", "<table id=\"T_c7049\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_c7049_level0_col0\" class=\"col_heading level0 col0\" >1 min</th>\n", " <th id=\"T_c7049_level0_col1\" class=\"col_heading level0 col1\" >5 min</th>\n", " <th id=\"T_c7049_level0_col2\" class=\"col_heading level0 
col2\" >15 min</th>\n", " <th id=\"T_c7049_level0_col3\" class=\"col_heading level0 col3\" >30 min</th>\n", " <th id=\"T_c7049_level0_col4\" class=\"col_heading level0 col4\" >60 min</th>\n", " <th id=\"T_c7049_level0_col5\" class=\"col_heading level0 col5\" >max</th>\n", " </tr>\n", " <tr>\n", " <th class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " <th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " <th class=\"blank col5\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_c7049_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_c7049_row0_col0\" class=\"data row0 col0\" >0.932</td>\n", " <td id=\"T_c7049_row0_col1\" class=\"data row0 col1\" >0.918</td>\n", " <td id=\"T_c7049_row0_col2\" class=\"data row0 col2\" >0.913</td>\n", " <td id=\"T_c7049_row0_col3\" class=\"data row0 col3\" >0.911</td>\n", " <td id=\"T_c7049_row0_col4\" class=\"data row0 col4\" >0.910</td>\n", " <td id=\"T_c7049_row0_col5\" class=\"data row0 col5\" >0.902</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td id=\"T_c7049_row1_col0\" class=\"data row1 col0\" >0.922</td>\n", " <td id=\"T_c7049_row1_col1\" class=\"data row1 col1\" >0.916</td>\n", " <td id=\"T_c7049_row1_col2\" class=\"data row1 col2\" >0.913</td>\n", " <td id=\"T_c7049_row1_col3\" class=\"data row1 col3\" >0.912</td>\n", " <td id=\"T_c7049_row1_col4\" class=\"data row1 col4\" >0.912</td>\n", " <td id=\"T_c7049_row1_col5\" class=\"data row1 col5\" >0.911</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_c7049_row2_col0\" class=\"data row2 col0\" >0.932</td>\n", " <td id=\"T_c7049_row2_col1\" class=\"data row2 col1\" >0.927</td>\n", " <td id=\"T_c7049_row2_col2\" class=\"data row2 col2\" 
>0.924</td>\n", " <td id=\"T_c7049_row2_col3\" class=\"data row2 col3\" >0.923</td>\n", " <td id=\"T_c7049_row2_col4\" class=\"data row2 col4\" >0.923</td>\n", " <td id=\"T_c7049_row2_col5\" class=\"data row2 col5\" >0.923</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_c7049_row3_col0\" class=\"data row3 col0\" >0.943</td>\n", " <td id=\"T_c7049_row3_col1\" class=\"data row3 col1\" >0.931</td>\n", " <td id=\"T_c7049_row3_col2\" class=\"data row3 col2\" >0.927</td>\n", " <td id=\"T_c7049_row3_col3\" class=\"data row3 col3\" >0.927</td>\n", " <td id=\"T_c7049_row3_col4\" class=\"data row3 col4\" >0.927</td>\n", " <td id=\"T_c7049_row3_col5\" class=\"data row3 col5\" >0.916</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_c7049_row4_col0\" class=\"data row4 col0\" >0.934</td>\n", " <td id=\"T_c7049_row4_col1\" class=\"data row4 col1\" >0.919</td>\n", " <td id=\"T_c7049_row4_col2\" class=\"data row4 col2\" >0.915</td>\n", " <td id=\"T_c7049_row4_col3\" class=\"data row4 col3\" >0.913</td>\n", " <td id=\"T_c7049_row4_col4\" class=\"data row4 col4\" >0.912</td>\n", " <td id=\"T_c7049_row4_col5\" class=\"data row4 col5\" >0.903</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", " <td id=\"T_c7049_row5_col0\" class=\"data row5 col0\" >0.913</td>\n", " <td id=\"T_c7049_row5_col1\" class=\"data row5 col1\" >0.911</td>\n", " <td id=\"T_c7049_row5_col2\" class=\"data row5 col2\" >0.908</td>\n", " <td id=\"T_c7049_row5_col3\" class=\"data row5 col3\" >0.907</td>\n", " <td id=\"T_c7049_row5_col4\" class=\"data row5 col4\" >0.908</td>\n", " <td id=\"T_c7049_row5_col5\" class=\"data row5 col5\" >0.902</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_c7049_row6_col0\" class=\"data 
row6 col0\" >0.934</td>\n", " <td id=\"T_c7049_row6_col1\" class=\"data row6 col1\" >0.926</td>\n", " <td id=\"T_c7049_row6_col2\" class=\"data row6 col2\" >0.922</td>\n", " <td id=\"T_c7049_row6_col3\" class=\"data row6 col3\" >0.921</td>\n", " <td id=\"T_c7049_row6_col4\" class=\"data row6 col4\" >0.920</td>\n", " <td id=\"T_c7049_row6_col5\" class=\"data row6 col5\" >0.920</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_c7049_row7_col0\" class=\"data row7 col0\" >0.916</td>\n", " <td id=\"T_c7049_row7_col1\" class=\"data row7 col1\" >0.892</td>\n", " <td id=\"T_c7049_row7_col2\" class=\"data row7 col2\" >0.887</td>\n", " <td id=\"T_c7049_row7_col3\" class=\"data row7 col3\" >0.885</td>\n", " <td id=\"T_c7049_row7_col4\" class=\"data row7 col4\" >0.883</td>\n", " <td id=\"T_c7049_row7_col5\" class=\"data row7 col5\" >0.876</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_c7049_row8_col0\" class=\"data row8 col0\" >0.937</td>\n", " <td id=\"T_c7049_row8_col1\" class=\"data row8 col1\" >0.928</td>\n", " <td id=\"T_c7049_row8_col2\" class=\"data row8 col2\" >0.924</td>\n", " <td id=\"T_c7049_row8_col3\" class=\"data row8 col3\" >0.923</td>\n", " <td id=\"T_c7049_row8_col4\" class=\"data row8 col4\" >0.921</td>\n", " <td id=\"T_c7049_row8_col5\" class=\"data row8 col5\" >0.913</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_c7049_row9_col0\" class=\"data row9 col0\" >0.928</td>\n", " <td id=\"T_c7049_row9_col1\" class=\"data row9 col1\" >0.923</td>\n", " <td id=\"T_c7049_row9_col2\" class=\"data row9 col2\" >0.920</td>\n", " <td id=\"T_c7049_row9_col3\" class=\"data row9 col3\" >0.918</td>\n", " <td id=\"T_c7049_row9_col4\" class=\"data row9 col4\" >0.918</td>\n", " <td id=\"T_c7049_row9_col5\" class=\"data row9 col5\" 
>0.914</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_c7049_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_c7049_row10_col0\" class=\"data row10 col0\" >0.896</td>\n", " <td id=\"T_c7049_row10_col1\" class=\"data row10 col1\" >0.888</td>\n", " <td id=\"T_c7049_row10_col2\" class=\"data row10 col2\" >0.885</td>\n", " <td id=\"T_c7049_row10_col3\" class=\"data row10 col3\" >0.885</td>\n", " <td id=\"T_c7049_row10_col4\" class=\"data row10 col4\" >0.884</td>\n", " <td id=\"T_c7049_row10_col5\" class=\"data row10 col5\" >0.883</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>1 min</th>\n", " <th>5 min</th>\n", " <th>15 min</th>\n", " <th>30 min</th>\n", " <th>60 min</th>\n", " <th>max</th>\n", " </tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>1440</td>\n", " <td>3398</td>\n", " <td>3987</td>\n", " <td>4192</td>\n", " <td>4411</td>\n", " <td>16829</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>20828</td>\n", " <td>50764</td>\n", " <td>60095</td>\n", " <td>63694</td>\n", " <td>66835</td>\n", " <td>172584</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>6968</td>\n", " <td>17044</td>\n", " <td>19714</td>\n", " <td>20687</td>\n", " <td>21526</td>\n", " <td>55105</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>1086</td>\n", " <td>3078</td>\n", " <td>3671</td>\n", " <td>3860</td>\n", " <td>4021</td>\n", " <td>9967</td>\n", " 
</tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>1906</td>\n", " <td>5315</td>\n", " <td>6332</td>\n", " <td>6647</td>\n", " <td>6912</td>\n", " <td>19375</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>465</td>\n", " <td>1073</td>\n", " <td>1286</td>\n", " <td>1370</td>\n", " <td>1442</td>\n", " <td>3554</td>\n", " </tr>\n", " <tr>\n", " <th>itwiki</th>\n", " <td>3027</td>\n", " <td>6706</td>\n", " <td>7844</td>\n", " <td>8220</td>\n", " <td>8561</td>\n", " <td>23440</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>1625</td>\n", " <td>3389</td>\n", " <td>3973</td>\n", " <td>4217</td>\n", " <td>4450</td>\n", " <td>10170</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>340</td>\n", " <td>967</td>\n", " <td>1235</td>\n", " <td>1318</td>\n", " <td>1377</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>2323</td>\n", " <td>6326</td>\n", " <td>7420</td>\n", " <td>7792</td>\n", " <td>8089</td>\n", " <td>23587</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>1035</td>\n", " <td>2623</td>\n", " <td>3168</td>\n", " <td>3388</td>\n", " <td>3581</td>\n", " <td>7568</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': time_page_prev_rev_median_risk.style.background_gradient(cmap ='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': time_page_prev_rev_counts\n", "})" ] }, { "cell_type": "markdown", "id": "78939d4d-236e-41f2-b8d6-95467ab8f742", "metadata": {}, "source": [ "While restricting the time since the page's previous revision improves the median risk score, it eliminates a substantial number of edits for no significant benefit. For example, on fawiki, limiting it to 60 minutes raises the median risk only from 0.916 to 0.927 while keeping just 4,021 of 9,967 edits."
] }, { "cell_type": "markdown", "id": "479739c4-ccda-4173-a471-9257720ec1a4", "metadata": {}, "source": [ "## Bytes Diff" ] }, { "cell_type": "code", "execution_count": 494, "id": "b38691ad-c6a4-4a64-9eb0-d40993031666", "metadata": {}, "outputs": [], "source": [ "warnings.filterwarnings('ignore')\n", "\n", "bytes_diff_intervals = [0, 1, 5, 10, 100, 500, 1000, 5000, init_criteria.rev_bytes_diff.abs().max()]\n", "\n", "bytes_diff_column_labels = ['min'] + bytes_diff_intervals[1:-1] + ['max']\n", "\n", "bytes_diff_median_risk = calculate_grouped(init_criteria, bytes_diff_intervals, \n", " 'rev_bytes_diff', column_names=bytes_diff_column_labels)\n", "bytes_diff_counts = calculate_grouped(init_criteria, bytes_diff_intervals, \n", " 'rev_bytes_diff', column_names=bytes_diff_column_labels, grp_function='count')" ] }, { "cell_type": "code", "execution_count": 495, "id": "3a8b18a4-17f8-42e8-9ae2-476e3d758bc2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Median Risk <style type=\"text/css\">\n", "#T_6c905_row0_col0 {\n", " background-color: #2b748e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col1 {\n", " background-color: #2d708e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col2, #T_6c905_row6_col4 {\n", " background-color: #33638d;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col3 {\n", " background-color: #31688e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col4 {\n", " background-color: #20a386;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col5, #T_6c905_row3_col3, #T_6c905_row9_col1, #T_6c905_row9_col2 {\n", " background-color: #443a83;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col6 {\n", " background-color: #481467;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col7 {\n", " background-color: #46085c;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row0_col8, #T_6c905_row1_col8, #T_6c905_row3_col8, #T_6c905_row4_col8, #T_6c905_row5_col8, #T_6c905_row6_col8, 
#T_6c905_row7_col0, #T_6c905_row7_col1, #T_6c905_row7_col2, #T_6c905_row7_col3, #T_6c905_row7_col4, #T_6c905_row7_col8, #T_6c905_row8_col5, #T_6c905_row8_col6, #T_6c905_row8_col7, #T_6c905_row8_col8, #T_6c905_row9_col8, #T_6c905_row10_col8 {\n", " background-color: #fde725;\n", " color: #000000;\n", "}\n", "#T_6c905_row1_col0 {\n", " background-color: #3e4c8a;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col1 {\n", " background-color: #3d4e8a;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col2, #T_6c905_row8_col3 {\n", " background-color: #3c508b;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col3 {\n", " background-color: #39558c;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col4 {\n", " background-color: #1e9c89;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col5 {\n", " background-color: #3d4d8a;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col6, #T_6c905_row10_col7 {\n", " background-color: #472a7a;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row1_col7, #T_6c905_row6_col1 {\n", " background-color: #482677;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row2_col0, #T_6c905_row2_col1, #T_6c905_row2_col2, #T_6c905_row2_col3, #T_6c905_row2_col4, #T_6c905_row2_col8, #T_6c905_row6_col5, #T_6c905_row6_col6, #T_6c905_row6_col7 {\n", " background-color: #440154;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row2_col5 {\n", " background-color: #450457;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row2_col6 {\n", " background-color: #471063;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row2_col7 {\n", " background-color: #2cb17e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col0 {\n", " background-color: #472d7b;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col1 {\n", " background-color: #46307e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col2 {\n", " background-color: #453882;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col4 {\n", " background-color: #25858e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col5 {\n", " background-color: #24868e;\n", " 
color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col6, #T_6c905_row8_col0 {\n", " background-color: #423f85;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row3_col7 {\n", " background-color: #3f4788;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col0 {\n", " background-color: #2e6e8e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col1, #T_6c905_row5_col0 {\n", " background-color: #2c718e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col2 {\n", " background-color: #2b758e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col3 {\n", " background-color: #29798e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col4 {\n", " background-color: #1fa188;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col5 {\n", " background-color: #481d6f;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col6 {\n", " background-color: #471365;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row4_col7, #T_6c905_row5_col5 {\n", " background-color: #470e61;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row5_col1 {\n", " background-color: #2d718e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row5_col2 {\n", " background-color: #297b8e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row5_col3 {\n", " background-color: #2a778e;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row5_col4 {\n", " background-color: #21908d;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row5_col6, #T_6c905_row9_col6, #T_6c905_row9_col7 {\n", " background-color: #481668;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row5_col7, #T_6c905_row7_col7 {\n", " background-color: #424186;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row6_col0 {\n", " background-color: #482576;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row6_col2 {\n", " background-color: #472e7c;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row6_col3, #T_6c905_row9_col3 {\n", " background-color: #453781;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row7_col5 {\n", " background-color: #34608d;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row7_col6, #T_6c905_row9_col0 {\n", " background-color: 
#443b84;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row8_col1 {\n", " background-color: #414287;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row8_col2 {\n", " background-color: #404588;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row8_col4 {\n", " background-color: #6ccd5a;\n", " color: #000000;\n", "}\n", "#T_6c905_row9_col4 {\n", " background-color: #3b518b;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row9_col5 {\n", " background-color: #48186a;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row10_col0, #T_6c905_row10_col2 {\n", " background-color: #52c569;\n", " color: #000000;\n", "}\n", "#T_6c905_row10_col1, #T_6c905_row10_col3 {\n", " background-color: #4ec36b;\n", " color: #000000;\n", "}\n", "#T_6c905_row10_col4 {\n", " background-color: #22a785;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row10_col5 {\n", " background-color: #3e4989;\n", " color: #f1f1f1;\n", "}\n", "#T_6c905_row10_col6 {\n", " background-color: #46327e;\n", " color: #f1f1f1;\n", "}\n", "</style>\n", "<table id=\"T_6c905\">\n", " <thead>\n", " <tr>\n", " <th class=\"blank level0\" > </th>\n", " <th id=\"T_6c905_level0_col0\" class=\"col_heading level0 col0\" >min</th>\n", " <th id=\"T_6c905_level0_col1\" class=\"col_heading level0 col1\" >1</th>\n", " <th id=\"T_6c905_level0_col2\" class=\"col_heading level0 col2\" >5</th>\n", " <th id=\"T_6c905_level0_col3\" class=\"col_heading level0 col3\" >10</th>\n", " <th id=\"T_6c905_level0_col4\" class=\"col_heading level0 col4\" >100</th>\n", " <th id=\"T_6c905_level0_col5\" class=\"col_heading level0 col5\" >500</th>\n", " <th id=\"T_6c905_level0_col6\" class=\"col_heading level0 col6\" >1000</th>\n", " <th id=\"T_6c905_level0_col7\" class=\"col_heading level0 col7\" >5000</th>\n", " <th id=\"T_6c905_level0_col8\" class=\"col_heading level0 col8\" >max</th>\n", " </tr>\n", " <tr>\n", " <th class=\"index_name level0\" >wiki_db</th>\n", " <th class=\"blank col0\" > </th>\n", " <th class=\"blank col1\" > </th>\n", " <th class=\"blank col2\" > </th>\n", " 
<th class=\"blank col3\" > </th>\n", " <th class=\"blank col4\" > </th>\n", " <th class=\"blank col5\" > </th>\n", " <th class=\"blank col6\" > </th>\n", " <th class=\"blank col7\" > </th>\n", " <th class=\"blank col8\" > </th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th id=\"T_6c905_level0_row0\" class=\"row_heading level0 row0\" >dewiki</th>\n", " <td id=\"T_6c905_row0_col0\" class=\"data row0 col0\" >0.901</td>\n", " <td id=\"T_6c905_row0_col1\" class=\"data row0 col1\" >0.905</td>\n", " <td id=\"T_6c905_row0_col2\" class=\"data row0 col2\" >0.912</td>\n", " <td id=\"T_6c905_row0_col3\" class=\"data row0 col3\" >0.912</td>\n", " <td id=\"T_6c905_row0_col4\" class=\"data row0 col4\" >0.915</td>\n", " <td id=\"T_6c905_row0_col5\" class=\"data row0 col5\" >0.968</td>\n", " <td id=\"T_6c905_row0_col6\" class=\"data row0 col6\" >0.983</td>\n", " <td id=\"T_6c905_row0_col7\" class=\"data row0 col7\" >0.993</td>\n", " <td id=\"T_6c905_row0_col8\" class=\"data row0 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row1\" class=\"row_heading level0 row1\" >enwiki</th>\n", " <td id=\"T_6c905_row1_col0\" class=\"data row1 col0\" >0.910</td>\n", " <td id=\"T_6c905_row1_col1\" class=\"data row1 col1\" >0.912</td>\n", " <td id=\"T_6c905_row1_col2\" class=\"data row1 col2\" >0.915</td>\n", " <td id=\"T_6c905_row1_col3\" class=\"data row1 col3\" >0.915</td>\n", " <td id=\"T_6c905_row1_col4\" class=\"data row1 col4\" >0.917</td>\n", " <td id=\"T_6c905_row1_col5\" class=\"data row1 col5\" >0.965</td>\n", " <td id=\"T_6c905_row1_col6\" class=\"data row1 col6\" >0.978</td>\n", " <td id=\"T_6c905_row1_col7\" class=\"data row1 col7\" >0.986</td>\n", " <td id=\"T_6c905_row1_col8\" class=\"data row1 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row2\" class=\"row_heading level0 row2\" >eswiki</th>\n", " <td id=\"T_6c905_row2_col0\" class=\"data row2 col0\" >0.922</td>\n", " <td id=\"T_6c905_row2_col1\" class=\"data row2 
col1\" >0.924</td>\n", " <td id=\"T_6c905_row2_col2\" class=\"data row2 col2\" >0.928</td>\n", " <td id=\"T_6c905_row2_col3\" class=\"data row2 col3\" >0.929</td>\n", " <td id=\"T_6c905_row2_col4\" class=\"data row2 col4\" >0.943</td>\n", " <td id=\"T_6c905_row2_col5\" class=\"data row2 col5\" >0.978</td>\n", " <td id=\"T_6c905_row2_col6\" class=\"data row2 col6\" >0.984</td>\n", " <td id=\"T_6c905_row2_col7\" class=\"data row2 col7\" >0.943</td>\n", " <td id=\"T_6c905_row2_col8\" class=\"data row2 col8\" >0.963</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row3\" class=\"row_heading level0 row3\" >fawiki</th>\n", " <td id=\"T_6c905_row3_col0\" class=\"data row3 col0\" >0.916</td>\n", " <td id=\"T_6c905_row3_col1\" class=\"data row3 col1\" >0.917</td>\n", " <td id=\"T_6c905_row3_col2\" class=\"data row3 col2\" >0.920</td>\n", " <td id=\"T_6c905_row3_col3\" class=\"data row3 col3\" >0.920</td>\n", " <td id=\"T_6c905_row3_col4\" class=\"data row3 col4\" >0.921</td>\n", " <td id=\"T_6c905_row3_col5\" class=\"data row3 col5\" >0.951</td>\n", " <td id=\"T_6c905_row3_col6\" class=\"data row3 col6\" >0.973</td>\n", " <td id=\"T_6c905_row3_col7\" class=\"data row3 col7\" >0.978</td>\n", " <td id=\"T_6c905_row3_col8\" class=\"data row3 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row4\" class=\"row_heading level0 row4\" >frwiki</th>\n", " <td id=\"T_6c905_row4_col0\" class=\"data row4 col0\" >0.903</td>\n", " <td id=\"T_6c905_row4_col1\" class=\"data row4 col1\" >0.905</td>\n", " <td id=\"T_6c905_row4_col2\" class=\"data row4 col2\" >0.908</td>\n", " <td id=\"T_6c905_row4_col3\" class=\"data row4 col3\" >0.908</td>\n", " <td id=\"T_6c905_row4_col4\" class=\"data row4 col4\" >0.916</td>\n", " <td id=\"T_6c905_row4_col5\" class=\"data row4 col5\" >0.974</td>\n", " <td id=\"T_6c905_row4_col6\" class=\"data row4 col6\" >0.983</td>\n", " <td id=\"T_6c905_row4_col7\" class=\"data row4 col7\" >0.992</td>\n", " <td id=\"T_6c905_row4_col8\" 
class=\"data row4 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row5\" class=\"row_heading level0 row5\" >idwiki</th>\n", " <td id=\"T_6c905_row5_col0\" class=\"data row5 col0\" >0.902</td>\n", " <td id=\"T_6c905_row5_col1\" class=\"data row5 col1\" >0.905</td>\n", " <td id=\"T_6c905_row5_col2\" class=\"data row5 col2\" >0.906</td>\n", " <td id=\"T_6c905_row5_col3\" class=\"data row5 col3\" >0.908</td>\n", " <td id=\"T_6c905_row5_col4\" class=\"data row5 col4\" >0.919</td>\n", " <td id=\"T_6c905_row5_col5\" class=\"data row5 col5\" >0.976</td>\n", " <td id=\"T_6c905_row5_col6\" class=\"data row5 col6\" >0.983</td>\n", " <td id=\"T_6c905_row5_col7\" class=\"data row5 col7\" >0.979</td>\n", " <td id=\"T_6c905_row5_col8\" class=\"data row5 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row6\" class=\"row_heading level0 row6\" >itwiki</th>\n", " <td id=\"T_6c905_row6_col0\" class=\"data row6 col0\" >0.917</td>\n", " <td id=\"T_6c905_row6_col1\" class=\"data row6 col1\" >0.919</td>\n", " <td id=\"T_6c905_row6_col2\" class=\"data row6 col2\" >0.921</td>\n", " <td id=\"T_6c905_row6_col3\" class=\"data row6 col3\" >0.921</td>\n", " <td id=\"T_6c905_row6_col4\" class=\"data row6 col4\" >0.928</td>\n", " <td id=\"T_6c905_row6_col5\" class=\"data row6 col5\" >0.978</td>\n", " <td id=\"T_6c905_row6_col6\" class=\"data row6 col6\" >0.987</td>\n", " <td id=\"T_6c905_row6_col7\" class=\"data row6 col7\" >0.995</td>\n", " <td id=\"T_6c905_row6_col8\" class=\"data row6 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row7\" class=\"row_heading level0 row7\" >jawiki</th>\n", " <td id=\"T_6c905_row7_col0\" class=\"data row7 col0\" >0.868</td>\n", " <td id=\"T_6c905_row7_col1\" class=\"data row7 col1\" >0.871</td>\n", " <td id=\"T_6c905_row7_col2\" class=\"data row7 col2\" >0.875</td>\n", " <td id=\"T_6c905_row7_col3\" class=\"data row7 col3\" >0.876</td>\n", " <td id=\"T_6c905_row7_col4\" class=\"data row7 
col4\" >0.896</td>\n", " <td id=\"T_6c905_row7_col5\" class=\"data row7 col5\" >0.961</td>\n", " <td id=\"T_6c905_row7_col6\" class=\"data row7 col6\" >0.974</td>\n", " <td id=\"T_6c905_row7_col7\" class=\"data row7 col7\" >0.979</td>\n", " <td id=\"T_6c905_row7_col8\" class=\"data row7 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row8\" class=\"row_heading level0 row8\" >ptwiki</th>\n", " <td id=\"T_6c905_row8_col0\" class=\"data row8 col0\" >0.912</td>\n", " <td id=\"T_6c905_row8_col1\" class=\"data row8 col1\" >0.914</td>\n", " <td id=\"T_6c905_row8_col2\" class=\"data row8 col2\" >0.917</td>\n", " <td id=\"T_6c905_row8_col3\" class=\"data row8 col3\" >0.916</td>\n", " <td id=\"T_6c905_row8_col4\" class=\"data row8 col4\" >0.906</td>\n", " <td id=\"T_6c905_row8_col5\" class=\"data row8 col5\" >0.919</td>\n", " <td id=\"T_6c905_row8_col6\" class=\"data row8 col6\" >0.912</td>\n", " <td id=\"T_6c905_row8_col7\" class=\"data row8 col7\" >0.914</td>\n", " <td id=\"T_6c905_row8_col8\" class=\"data row8 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row9\" class=\"row_heading level0 row9\" >ruwiki</th>\n", " <td id=\"T_6c905_row9_col0\" class=\"data row9 col0\" >0.913</td>\n", " <td id=\"T_6c905_row9_col1\" class=\"data row9 col1\" >0.915</td>\n", " <td id=\"T_6c905_row9_col2\" class=\"data row9 col2\" >0.919</td>\n", " <td id=\"T_6c905_row9_col3\" class=\"data row9 col3\" >0.921</td>\n", " <td id=\"T_6c905_row9_col4\" class=\"data row9 col4\" >0.931</td>\n", " <td id=\"T_6c905_row9_col5\" class=\"data row9 col5\" >0.974</td>\n", " <td id=\"T_6c905_row9_col6\" class=\"data row9 col6\" >0.983</td>\n", " <td id=\"T_6c905_row9_col7\" class=\"data row9 col7\" >0.990</td>\n", " <td id=\"T_6c905_row9_col8\" class=\"data row9 col8\" >0.000</td>\n", " </tr>\n", " <tr>\n", " <th id=\"T_6c905_level0_row10\" class=\"row_heading level0 row10\" >zhwiki</th>\n", " <td id=\"T_6c905_row10_col0\" class=\"data row10 col0\" 
>0.883</td>\n", " <td id=\"T_6c905_row10_col1\" class=\"data row10 col1\" >0.886</td>\n", " <td id=\"T_6c905_row10_col2\" class=\"data row10 col2\" >0.890</td>\n", " <td id=\"T_6c905_row10_col3\" class=\"data row10 col3\" >0.891</td>\n", " <td id=\"T_6c905_row10_col4\" class=\"data row10 col4\" >0.915</td>\n", " <td id=\"T_6c905_row10_col5\" class=\"data row10 col5\" >0.965</td>\n", " <td id=\"T_6c905_row10_col6\" class=\"data row10 col6\" >0.976</td>\n", " <td id=\"T_6c905_row10_col7\" class=\"data row10 col7\" >0.985</td>\n", " <td id=\"T_6c905_row10_col8\" class=\"data row10 col8\" >0.000</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div><div>Number of Edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>min</th>\n", " <th>1</th>\n", " <th>5</th>\n", " <th>10</th>\n", " <th>100</th>\n", " <th>500</th>\n", " <th>1000</th>\n", " <th>5000</th>\n", " <th>max</th>\n", " </tr>\n", " <tr>\n", " <th>wiki_db</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>dewiki</th>\n", " <td>16711</td>\n", " <td>15566</td>\n", " <td>12420</td>\n", " <td>10723</td>\n", " <td>3840</td>\n", " <td>1491</td>\n", " <td>894</td>\n", " <td>232</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>enwiki</th>\n", " <td>171191</td>\n", " <td>159106</td>\n", " <td>131246</td>\n", " <td>114246</td>\n", " <td>42246</td>\n", " <td>15488</td>\n", " <td>9268</td>\n", " <td>1944</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>eswiki</th>\n", " <td>54949</td>\n", " <td>51473</td>\n", " 
<td>41913</td>\n", " <td>35805</td>\n", " <td>12103</td>\n", " <td>5167</td>\n", " <td>3242</td>\n", " <td>183</td>\n", " <td>1</td>\n", " </tr>\n", " <tr>\n", " <th>fawiki</th>\n", " <td>9857</td>\n", " <td>9387</td>\n", " <td>8041</td>\n", " <td>7269</td>\n", " <td>3135</td>\n", " <td>1046</td>\n", " <td>592</td>\n", " <td>86</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>frwiki</th>\n", " <td>19263</td>\n", " <td>18155</td>\n", " <td>15031</td>\n", " <td>13282</td>\n", " <td>5303</td>\n", " <td>2303</td>\n", " <td>1537</td>\n", " <td>430</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>idwiki</th>\n", " <td>3526</td>\n", " <td>3261</td>\n", " <td>2773</td>\n", " <td>2397</td>\n", " <td>824</td>\n", " <td>303</td>\n", " <td>168</td>\n", " <td>31</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>itwiki</th>\n", " <td>22761</td>\n", " <td>21010</td>\n", " <td>16844</td>\n", " <td>14480</td>\n", " <td>4761</td>\n", " <td>1756</td>\n", " <td>1064</td>\n", " <td>295</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>jawiki</th>\n", " <td>9659</td>\n", " <td>8968</td>\n", " <td>7751</td>\n", " <td>6926</td>\n", " <td>2999</td>\n", " <td>1357</td>\n", " <td>898</td>\n", " <td>224</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>ptwiki</th>\n", " <td>3339</td>\n", " <td>3153</td>\n", " <td>2693</td>\n", " <td>2446</td>\n", " <td>1038</td>\n", " <td>402</td>\n", " <td>217</td>\n", " <td>46</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>ruwiki</th>\n", " <td>23264</td>\n", " <td>21810</td>\n", " <td>18545</td>\n", " <td>16677</td>\n", " <td>7075</td>\n", " <td>2975</td>\n", " <td>1994</td>\n", " <td>608</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>zhwiki</th>\n", " <td>7482</td>\n", " <td>6622</td>\n", " <td>5646</td>\n", " <td>4869</td>\n", " <td>1825</td>\n", " <td>854</td>\n", " <td>546</td>\n", " <td>120</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, 
"output_type": "display_data" } ], "source": [ "display_h({\n", " 'Median Risk': bytes_diff_median_risk.fillna(0).style.background_gradient(cmap ='viridis_r').format(\"{:.3f}\"),\n", " 'Number of Edits': bytes_diff_counts.fillna(0).astype(int)\n", "})" ] }, { "cell_type": "markdown", "id": "31bed787-0a34-407f-b552-bd3628f6d026", "metadata": {}, "source": [ "Requiring an absolute bytes difference of at least 5 bytes provides a good balance between the median risk score and the number of edits retained." ] }, { "cell_type": "markdown", "id": "b92bd161-00c3-42e7-ad73-3e8c646abd2a", "metadata": {}, "source": [ "## Incremental criteria" ] }, { "cell_type": "markdown", "id": "95bbe63a-40ee-4c7a-b1ff-5fef43c06d25", "metadata": {}, "source": [ "Based on the above results, we incrementally apply the following additional restrictions:\n", "- Reverted within 12 hours\n", "- User edit count of at most 15 edits (registered users)\n", "- Time since the user's first edit is less than 48 hours (registered users)\n", "- Absolute bytes difference of at least 5 bytes" ] }, { "cell_type": "code", "execution_count": 512, "id": "7f10c1a0-13a8-46ab-9eb4-9ccd7987f579", "metadata": {}, "outputs": [], "source": [ "init_criteria['abs_bytes_diff'] = init_criteria['rev_bytes_diff'].abs()" ] }, { "cell_type": "code", "execution_count": 564, "id": "5ed4c037-3230-4a04-a5ef-440b5a55552d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " <div style=\"display:flex; justify-content: space-evenly;\">\n", " <div>Initial <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", "
<td>0.901974</td>\n", " <td>16829</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.910679</td>\n", " <td>172584</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.922596</td>\n", " <td>55105</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.916366</td>\n", " <td>9967</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.903316</td>\n", " <td>19375</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.902464</td>\n", " <td>3554</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.919648</td>\n", " <td>23440</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.875682</td>\n", " <td>10170</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.913064</td>\n", " <td>3361</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.914291</td>\n", " <td>23587</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.883454</td>\n", " <td>7568</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ Reverted within 12 hours <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.904239</td>\n", " <td>16077</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.912205</td>\n", " <td>162439</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.923474</td>\n", " 
<td>52922</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.916792</td>\n", " <td>9228</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.905588</td>\n", " <td>18401</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.901994</td>\n", " <td>3231</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.921301</td>\n", " <td>22077</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.879789</td>\n", " <td>9401</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.914363</td>\n", " <td>3147</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.916403</td>\n", " <td>22250</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.886989</td>\n", " <td>6880</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ User Edit Count <= 15 edits <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.904503</td>\n", " <td>16061</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.912847</td>\n", " <td>160889</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.923850</td>\n", " <td>52696</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.918056</td>\n", " <td>9136</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.906304</td>\n", " <td>18285</td>\n", " </tr>\n", " 
<tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.902892</td>\n", " <td>3190</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.921365</td>\n", " <td>22011</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.880116</td>\n", " <td>9109</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.916916</td>\n", " <td>3079</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.916746</td>\n", " <td>22204</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.887588</td>\n", " <td>6819</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ Time Since First Edit <= 48 hrs <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.907555</td>\n", " <td>15468</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.915196</td>\n", " <td>153858</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.924792</td>\n", " <td>51696</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.920468</td>\n", " <td>8539</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.909034</td>\n", " <td>17489</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.905071</td>\n", " <td>3067</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.922709</td>\n", " <td>21633</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " 
<td>jawiki</td>\n", " <td>0.882525</td>\n", " <td>8828</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.930669</td>\n", " <td>2458</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " <td>0.918103</td>\n", " <td>21661</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.890380</td>\n", " <td>6481</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div><div>+ Absolute Bytes Diff >= 5 bytes <div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>wiki_db</th>\n", " <th>median_risk</th>\n", " <th>n_edits</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>dewiki</td>\n", " <td>0.917214</td>\n", " <td>11281</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>enwiki</td>\n", " <td>0.920194</td>\n", " <td>115997</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>eswiki</td>\n", " <td>0.930483</td>\n", " <td>39239</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>fawiki</td>\n", " <td>0.924352</td>\n", " <td>6734</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>frwiki</td>\n", " <td>0.913709</td>\n", " <td>13492</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>idwiki</td>\n", " <td>0.910019</td>\n", " <td>2361</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>itwiki</td>\n", " <td>0.924533</td>\n", " <td>15505</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>jawiki</td>\n", " <td>0.883670</td>\n", " <td>6679</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>ptwiki</td>\n", " <td>0.934228</td>\n", " <td>1855</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>ruwiki</td>\n", " 
<td>0.923788</td>\n", " <td>16914</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>zhwiki</td>\n", " <td>0.896337</td>\n", " <td>4813</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div></div>\n", " </div>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def calculate_median_risk_and_count(df, criteria, time_to_revert_limit=12*60*60):\n", " # Median revert risk and edit count per wiki for edits reverted within\n", " # `time_to_revert_limit` seconds that also match the optional `criteria` query.\n", " # `criteria` is wrapped in parentheses because pandas' query parser gives `&` and `|`\n", " # the precedence of `and`/`or`, so a top-level `|` would otherwise bypass the time filter.\n", " query_string = f\"(time_to_revert <= {time_to_revert_limit})\" + (f\" & ({criteria})\" if criteria else \"\")\n", " filtered_df = df.query(query_string)\n", " aggregated_df = filtered_df.groupby('wiki_db').agg({'risk': 'median', 'rev_id': 'count'})\n", " aggregated_df.rename({'rev_id': 'n_edits', 'risk': 'median_risk'}, inplace=True, axis=1)\n", " \n", " return aggregated_df.reset_index()\n", "\n", "# Cumulative criteria as pandas query strings, applied on top of the 12-hour revert filter ('Initial' is handled separately)\n", "criteria_conditions = {\n", " 'Initial': init_criteria_risk,\n", " '+ Reverted within 12 hours': '',\n", " '+ User Edit Count <= 15 edits': \"(is_anon == True) | (user_edit_count <= 15)\",\n", " '+ Time Since First Edit <= 48 hrs': \"(is_anon == True) | ((user_edit_count <= 15) & (elapsed_first_rev < 48*60*60))\",\n", " '+ Absolute Bytes Diff >= 5 bytes': \"(abs_bytes_diff >= 5) & ((is_anon == True) | ((user_edit_count <= 15) & (elapsed_first_rev < 48*60*60)))\"\n", "}\n", "\n", "# 'Initial' reuses the precomputed init_criteria_risk table; every other label is filtered from init_criteria\n", "results = {label: calculate_median_risk_and_count(init_criteria, criteria) if label != 'Initial'\n", " else init_criteria_risk for label, criteria in criteria_conditions.items()}\n", "display_h(results)" ] }, { "cell_type": "markdown", "id": "f44014bf-be98-439f-8d58-672ae7fc0504", "metadata": {}, "source": [ "- Restricting the user-related metrics yields only minor improvements in the median risk, because the majority of reverted edits are made by anonymous users.\n", "- Requiring a minimum absolute bytes difference raises the median risk further, but it eliminates a substantial number of edits compared with the initial criteria.\n", "- Apart from the time to revert, the absolute bytes difference is the only control factor available for anonymous edits."
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 5 }