{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Desktop Vector Skin Version User Preferences"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Task](https://phabricator.wikimedia.org/T260149)\n",
"\n",
"Pending resolution of the identified PrefUpdate bugs, I reviewed the MediaWiki [user_properties table](https://www.mediawiki.org/wiki/Manual:User_properties_table) to determine the total number of users with each Vector skin preference set on each of the test wikis. Note: Unlike PrefUpdate, this table does not record every change to a user's preference; it stores only the current non-default state of that preference. As a result, this data may include users who enabled and disabled their Vector skin version preference multiple times.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Contents\n",
"1. [Calculate opt out rate among all registered users](#Calculate-opt-out-rate-among-all-registered-users)\n",
"2. [Calculate opt out rate among active editors](#Calculate-opt-out-rate-among-active-editors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Notes:\n",
"\n",
"This reflects all current non-default user preferences. If a user has not made any changes to their Vector skin version, there will be no record for that user in this table and their skin preference is the default (i.e. Modern for logged-in users on the test wikis).\n",
"* Only accounts for logged-in users.\n",
"* Data reflects the current state and does not account for users who opted in and out multiple times since deployment.\n",
"* Because the new Vector skin was deployed as the default setting for all logged-in users on the wikis below, we can assume each value means the following:\n",
" * Legacy: These are users who have currently opted out of the modern version.\n",
" * Modern: These are users who opted out and then opted back in to the modern version. (Note: This number does not reflect the total number of users using the modern skin, only the users who changed their default preference.)\n",
" * Unknown: There were a few VectorSkinVersion values set to 0 (instead of 1 [legacy] or 2 [modern]). I need to investigate further what those values indicate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remaining To Dos:\n",
" - Investigate what `VectorSkinVersion` set to 0 means.\n",
" - Look into options for putting this on a Superset dashboard.\n",
" - Pending resolution of PrefUpdate bugs or a new schema, show the opt-out rate over time."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import datetime as dt\n",
"\n",
"from wmfdata import hive, mariadb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Calculate opt out rate among all registered users"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\" \n",
"SELECT \n",
" up_value as skin,\n",
" COUNT(*) as users\n",
"FROM user_properties\n",
"WHERE \n",
" up_property = 'VectorSkinVersion'\n",
"GROUP BY up_value \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [],
"source": [
"#define list of target wikis\n",
"wikis = ['euwiki','frwiktionary', 'ptwikiversity', 'fawiki','hewiki', 'frwiki']"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [],
"source": [
"# Run the preference query on each target wiki and collect the per-wiki results\n",
"up_skin = []\n",
"for wiki in wikis:\n",
"    prefs = mariadb.run(query, wiki)\n",
"    up_skin.append(prefs)\n",
"\n",
"skin = pd.concat(up_skin)"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of users for whom we have vector skin preferences set in the user_properties table: 226605\n"
]
}
],
"source": [
"skin_users = skin['users'].sum()\n",
"print('Total number of users for whom we have vector skin preferences set in the user_properties table:' , skin_users)"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [],
"source": [
"skin_aliases = {\n",
" \"0\":\"unknown\",\n",
" \"1\":\"legacy\",\n",
" \"2\":\"modern\"\n",
"}\n",
"\n",
"skin= skin.replace({\"skin\": skin_aliases})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Number of users for each skin type overall"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" users\n",
"skin \n",
"legacy 128242\n",
"modern 98334\n",
"unknown 29"
]
},
"execution_count": 144,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_skin=skin.groupby('skin').sum()\n",
"user_skin"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Number of users for each skin type by wiki"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"# Wiki label for each row of `skin`, in query order (one entry per skin value returned by each wiki).\n",
"# This hard-coded list must match the number and order of rows returned for each wiki.\n",
"wikis_list = ['euwiki','euwiki', 'frwiktionary', 'frwiktionary', 'frwiktionary', 'ptwikiversity', 'ptwikiversity','fawiki', 'fawiki', 'fawiki','hewiki', 'hewiki', 'hewiki', 'frwiki', 'frwiki', 'frwiki']"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [],
"source": [
"skin['wiki'] = wikis_list"
]
},
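{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hard-coded `wikis_list` above stays correct only while each wiki returns the same skin values in the same order. A minimal sketch of an alternative, assuming the per-wiki results are pandas DataFrames like the ones collected in `up_skin`: label each result with its wiki before concatenating. The `fake_results` dictionary and its counts are made-up stand-ins, not query output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the per-wiki query results (not real counts)\n",
"fake_results = {\n",
"    'wiki_a': pd.DataFrame({'skin': ['1', '2'], 'users': [10, 5]}),\n",
"    'wiki_b': pd.DataFrame({'skin': ['0', '1', '2'], 'users': [1, 20, 8]}),\n",
"}\n",
"\n",
"labelled = []\n",
"for wiki_name, prefs in fake_results.items():\n",
"    # assign() returns a copy with a constant 'wiki' column on every row\n",
"    labelled.append(prefs.assign(wiki=wiki_name))\n",
"\n",
"# One row per (wiki, skin value), with the wiki label carried along\n",
"example_skin = pd.concat(labelled, ignore_index=True)\n",
"example_skin"
]
},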
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [],
"source": [
"user_skin_bywiki=pd.pivot_table(skin, index=['wiki','skin'],values=['users'],aggfunc=np.sum)"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" users\n",
"wiki skin \n",
"euwiki legacy 2067\n",
" modern 1852\n",
"fawiki legacy 21442\n",
" modern 22046\n",
" unknown 2\n",
"frwiki legacy 71535\n",
" modern 49185\n",
" unknown 19\n",
"frwiktionary legacy 4684\n",
" modern 3899\n",
" unknown 6\n",
"hewiki legacy 28034\n",
" modern 20890\n",
" unknown 2\n",
"ptwikiversity legacy 480\n",
" modern 462"
]
},
"execution_count": 149,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_skin_bywiki"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Total Number of Registered Users on Test Wikis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can estimate the opt-out rate using the total number of registered users on each test wiki, taken from the [MediaWiki user table](https://www.mediawiki.org/wiki/Manual:User_table)."
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {},
"outputs": [],
"source": [
"# collect total number of users on each wiki\n",
"\n",
"query = \"\"\" \n",
"SELECT \n",
" COUNT(DISTINCT user_id) AS num_users\n",
"FROM user\"\"\"\n"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [],
"source": [
"user_count = mariadb.run(commands = query, dbs = wikis, format=\"pandas\")"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [],
"source": [
"user_count['wiki'] = wikis"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" num_users wiki\n",
"0 115786 euwiki\n",
"1 291868 frwiktionary\n",
"2 29455 ptwikiversity\n",
"3 963101 fawiki\n",
"4 685306 hewiki\n",
"5 3905482 frwiki"
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_count"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Opt Out Rate for Registered Users\n",
"\n",
"The opt-out rate was calculated by dividing the total number of users with their vector version preference changed to 'legacy' by the total number of all registered users on the wiki. "
]
},
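{
"cell_type": "markdown",
"metadata": {},
"source": [
"Written out, with $L_w$ the number of users on wiki $w$ whose `VectorSkinVersion` is set to legacy and $N_w$ the number of registered users on that wiki (both symbols are introduced here only for illustration):\n",
"\n",
"$$\\text{opt-out rate}_w = \\frac{L_w}{N_w} \\times 100\\%$$\n",
"\n",
"For example, using the euwiki counts shown above, $2067 / 115786 \\times 100 \\approx 1.79\\%$."
]
},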
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [],
"source": [
"# Legacy users: everyone who has opted out, assuming modern is the default\n",
"legacy_users = skin[skin['skin'] == 'legacy'].copy()\n",
"\n",
"# Rename the count column so it stays unambiguous after the merge\n",
"legacy_users = legacy_users.rename(columns={'users': 'num_legacy_users'})\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# join to user_count table to obtain opt-out rate for each wiki\n",
"\n",
"opt_out_rate = legacy_users.merge(user_count, left_on = 'wiki', right_on = 'wiki')\n"
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" skin num_legacy_users wiki num_users opt_out_rate \\\n",
"0 legacy 2067 euwiki 115786 0.017852 \n",
"1 legacy 4684 frwiktionary 291868 0.016048 \n",
"2 legacy 480 ptwikiversity 29455 0.016296 \n",
"3 legacy 21442 fawiki 963101 0.022264 \n",
"4 legacy 28034 hewiki 685306 0.040907 \n",
"5 legacy 71535 frwiki 3905482 0.018317 \n",
"\n",
" pct_opt_out_rate \n",
"0 1.785190 \n",
"1 1.604835 \n",
"2 1.629604 \n",
"3 2.226350 \n",
"4 4.090727 \n",
"5 1.831656 "
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
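],
"source": [
"# Calculate the opt-out rate as a proportion and as a percentage\n",
"\n",
"opt_out_rate['opt_out_rate'] = opt_out_rate['num_legacy_users'] / opt_out_rate['num_users']\n",
"opt_out_rate['pct_opt_out_rate'] = opt_out_rate['opt_out_rate'] * 100\n",
"\n",
"opt_out_rate"
]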
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Calculate opt out rate among active editors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I reviewed the opt-out rate among active editors (users who made 5 or more content edits overall in the past year, from September 2019 through September 2020). This was calculated as the percentage of active editors on each wiki (identified using the [MediaWiki history table](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history)) who have the `VectorSkinVersion` preference set to legacy in the [user properties table](https://www.mediawiki.org/wiki/Manual:User_properties_table/en).\n",
"\n",
"Since the modern Vector version was deployed as the default on all of the test wikis in this analysis, it was assumed that any user with a non-default preference recorded as legacy has opted out.\n"
]
},
{
"cell_type": "code",
"execution_count": 172,
"metadata": {},
"outputs": [],
"source": [
"HIVE_SNAPSHOT = \"2020-09\"\n",
"START_OF_DATA = \"2019-09-01\"\n",
"END_OF_DATA = \"2020-10-01\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Collect number of active users"
]
},
{
"cell_type": "code",
"execution_count": 173,
"metadata": {},
"outputs": [],
"source": [
"#all active editors from the past one year\n",
"\n",
"active_editor_query = \"\"\"\n",
"\n",
"WITH yr_proj_edits as (\n",
" select\n",
" event_user_text as user,\n",
" event_user_id as user_id,\n",
" wiki_db as proj,\n",
" sum(if(wiki_db = \"wikidatawiki\", 0.1, 1)) as content_edits,\n",
" max(event_timestamp) as latest_edit\n",
" from wmf.mediawiki_history\n",
" where\n",
" -- review target wikis\n",
" wiki_db IN ('euwiki','frwiktionary', 'ptwikiversity', 'fawiki','hewiki', 'frwiki') and\n",
" -- REGISTERED\n",
" event_user_is_anonymous = false and\n",
" \n",
" -- NON-BOT\n",
" size(event_user_is_bot_by) = 0 and\n",
" not array_contains(event_user_groups, \"bot\") and\n",
" \n",
" -- CONTENT EDITS\n",
" event_entity = \"revision\" and\n",
" event_type = \"create\" and\n",
" page_namespace_is_content = true and\n",
" \n",
" -- FROM THE LAST YEAR\n",
" event_timestamp >= \"{START_OF_DATA}\" and event_timestamp < \"{END_OF_DATA}\" and\n",
" \n",
" -- FROM THE LATEST SNAPSHOT\n",
" snapshot = \"{hive_snapshot}\"\n",
" \n",
" -- PER USER, PER WIKI\n",
" group by event_user_text, event_user_id, wiki_db\n",
")\n",
"\n",
"-- FINAL SELECT OF\n",
"select \n",
" user as user_name,\n",
" user_id as user_id,\n",
" proj as wiki,\n",
" global_edits\n",
"\n",
"from \n",
"-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS\n",
"(\n",
" select\n",
" user,\n",
" user_id,\n",
" proj,\n",
" -- in the unlikely event that wikis are tied by edit count and latest edit, \n",
" -- row_number() will break it somehow\n",
" row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,\n",
" sum(content_edits) over (partition by user) as global_edits\n",
" from yr_proj_edits\n",
") yr_edits\n",
"where\n",
"rank = 1\n",
"and global_edits>= 5\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {},
"outputs": [],
"source": [
"active_editor = hive.run(\n",
" active_editor_query.format(\n",
" hive_snapshot = HIVE_SNAPSHOT,\n",
" START_OF_DATA= START_OF_DATA,\n",
" END_OF_DATA=END_OF_DATA\n",
" )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of editors for whom we will be checking vector skin preferences: user_id\n",
"wiki \n",
"euwiki 788\n",
"fawiki 12421\n",
"frwiki 35442\n",
"frwiktionary 446\n",
"hewiki 5958\n",
"ptwikiversity 81\n"
]
}
],
"source": [
"#Total_active_ed = active_editor['user_id'].count()\n",
"\n",
"Total_active_ed = active_editor.groupby(['wiki'])[['user_id']].count()\n",
"\n",
"print('Total number of editors for whom we will be checking vector skin preferences:' , Total_active_ed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vector Skin Preferences By Active Users"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [],
"source": [
"#Querying user_properties for getting the skin preferences set by the active editors we got in the above query\n",
"\n",
"query='''\n",
"SELECT \n",
" up_value AS skin, \n",
" COUNT(*) AS users\n",
"FROM user_properties\n",
"WHERE up_user in ({users})\n",
"AND up_property = \"VectorSkinVersion\"\n",
"GROUP BY up_value\n",
"'''\n"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [],
"source": [
"#define list of target wikis\n",
"wikis = ['euwiki','frwiktionary', 'ptwikiversity', 'fawiki','hewiki', 'frwiki']\n"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [],
"source": [
"# Looping through each wiki for the list of users for each skin\n",
"\n",
"up_skin=list()\n",
"for wiki in wikis:\n",
" user_ids = active_editor[active_editor['wiki'] == wiki][\"user_id\"]\n",
" user_list = ','.join([str(u) for u in user_ids])\n",
" prefs = mariadb.run(\n",
" query.format(users=user_list),\n",
" wiki\n",
" )\n",
" up_skin.append(prefs)\n",
"\n",
"skin= pd.concat(up_skin)"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [],
"source": [
"#List of wikis to correspond to data values \n",
"wikis_list = ['euwiki','euwiki', 'frwiktionary', 'frwiktionary', 'ptwikiversity', 'ptwikiversity','fawiki', 'fawiki', 'fawiki','hewiki', 'hewiki', 'frwiki', 'frwiki', 'frwiki']"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [],
"source": [
"# add wiki column\n",
"skin['wiki'] = wikis_list"
]
},
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of users for whom we have vector skin preferences set in the user_properties table: users\n",
"wiki \n",
"euwiki 67\n",
"fawiki 3316\n",
"frwiki 6399\n",
"frwiktionary 93\n",
"hewiki 1439\n",
"ptwikiversity 11\n"
]
}
],
"source": [
"# skin_users = skin['users'].sum()\n",
"\n",
"skin_users = skin.groupby(['wiki']).sum()\n",
"\n",
"print('Total number of users for whom we have vector skin preferences set in the user_properties table:' , skin_users)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: A large number of active editors have no Vector skin preference row in the user_properties table. This indicates either that they are on the default 'Modern' skin or that their row was deleted from the user_properties table. For the analysis below, we treat them as 'Modern'."
]
},
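{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on that assumption, the number of editors treated as default 'Modern' is the count of active editors minus the count with any preference row. A small sketch using the per-wiki figures printed above; the variable names are illustrative only and are not used elsewhere in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Per-wiki counts copied from the printed outputs above\n",
"active_editors = pd.Series({'euwiki': 788, 'fawiki': 12421, 'frwiki': 35442,\n",
"                            'frwiktionary': 446, 'hewiki': 5958, 'ptwikiversity': 81})\n",
"with_pref_row = pd.Series({'euwiki': 67, 'fawiki': 3316, 'frwiki': 6399,\n",
"                           'frwiktionary': 93, 'hewiki': 1439, 'ptwikiversity': 11})\n",
"\n",
"# Editors without a preference row are assumed to be on the default (modern) skin\n",
"assumed_default = active_editors - with_pref_row\n",
"assumed_default"
]
},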
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" user_id\n",
"wiki \n",
"euwiki 721\n",
"fawiki 9105\n",
"frwiki 29043\n",
"frwiktionary 353\n",
"hewiki 4519\n",
"ptwikiversity 70"
]
},
"execution_count": 183,
"metadata": {},
"output_type": "execute_result"
}
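],
"source": [
"# Active editors with no preference row are assumed to be on the default (modern) skin.\n",
"# Subtract column by column so the result does not depend on how NumPy ufuncs align DataFrames.\n",
"modern_users = (Total_active_ed['user_id'] - skin_users['users']).to_frame('user_id')\n",
"modern_users"
]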
},
{
"cell_type": "code",
"execution_count": 184,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" skin users wiki\n",
"0 modern 721 euwiki\n",
"1 modern 9105 fawiki\n",
"2 modern 29043 frwiki\n",
"3 modern 353 frwiktionary\n",
"4 modern 4519 hewiki\n",
"5 modern 70 ptwikiversity"
]
},
"execution_count": 184,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a data frame of assumed-modern users from the counts computed above\n",
"modern_users_df = (\n",
"    modern_users.reset_index()\n",
"    .rename(columns={'user_id': 'users'})\n",
"    .assign(skin='modern')[['skin', 'users', 'wiki']]\n",
")\n",
"modern_users_df\n"
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [],
"source": [
"# Define skin type for each property values\n",
"skin_aliases = {\n",
" \"0\":\"unknown\",\n",
" \"1\":\"legacy\",\n",
" \"2\":\"modern\"\n",
"}\n",
"\n",
"skin= skin.replace({\"skin\": skin_aliases})"
]
},
{
"cell_type": "code",
"execution_count": 186,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" skin users wiki\n",
"0 legacy 40 euwiki\n",
"1 modern 27 euwiki\n",
"2 legacy 52 frwiktionary\n",
"3 modern 41 frwiktionary\n",
"4 legacy 7 ptwikiversity\n",
"5 modern 4 ptwikiversity\n",
"6 unknown 1 fawiki\n",
"7 legacy 1874 fawiki\n",
"8 modern 1441 fawiki\n",
"9 legacy 863 hewiki\n",
"10 modern 576 hewiki\n",
"11 unknown 9 frwiki\n",
"12 legacy 4299 frwiki\n",
"13 modern 2091 frwiki\n",
"14 modern 721 euwiki\n",
"15 modern 9105 fawiki\n",
"16 modern 29043 frwiki\n",
"17 modern 353 frwiktionary\n",
"18 modern 4519 hewiki\n",
"19 modern 70 ptwikiversity"
]
},
"execution_count": 186,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Combine modern_users_df with the skin table (pd.concat replaces the deprecated DataFrame.append)\n",
"skin = pd.concat([skin, modern_users_df], ignore_index=True)\n",
"skin"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Number of Active Editors for Each Skin Type"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" users\n",
"skin \n",
"legacy 7135\n",
"modern 47991\n",
"unknown 10"
]
},
"execution_count": 187,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_skin=skin.groupby('skin').sum()\n",
"user_skin"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" users\n",
"wiki skin \n",
"euwiki legacy 40\n",
" modern 748\n",
"fawiki legacy 1874\n",
" modern 10546\n",
" unknown 1\n",
"frwiki legacy 4299\n",
" modern 31134\n",
" unknown 9\n",
"frwiktionary legacy 52\n",
" modern 394\n",
"hewiki legacy 863\n",
" modern 5095\n",
"ptwikiversity legacy 7\n",
" modern 74"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_skin_bywiki=pd.pivot_table(skin, index=['wiki','skin'],values=['users'],aggfunc=np.sum)\n",
"user_skin_bywiki"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Percentage of Active Editors for Each Skin Type"
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" users\n",
"skin \n",
"modern 87.0%\n",
"legacy 12.9%\n",
"unknown 0.0%"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# overall\n",
"pct_user_skin=(100. * user_skin / user_skin.sum()).round(1).astype(str) + '%'\n",
"pct_user_skin.sort_values(by=['users'],ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" users\n",
"wiki skin \n",
"euwiki legacy 5.076142\n",
" modern 94.923858\n",
"fawiki legacy 15.087352\n",
" modern 84.904597\n",
" unknown 0.008051\n",
"frwiki legacy 12.129677\n",
" modern 87.844930\n",
" unknown 0.025394\n",
"frwiktionary legacy 11.659193\n",
" modern 88.340807\n",
"hewiki legacy 14.484726\n",
" modern 85.515274\n",
"ptwikiversity legacy 8.641975\n",
" modern 91.358025"
]
},
"execution_count": 190,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# by target wiki\n",
"\n",
"\n",
"pct_user_skin_bywiki = user_skin_bywiki.groupby(['wiki', 'skin']).agg({'users': 'sum'})\n",
"wiki = user_skin_bywiki.groupby(['wiki']).agg({'users': 'sum'})\n",
"pct_user_skin_bywiki.div(wiki, level='wiki') * 100\n"
]
},
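{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-wiki percentages can also be ranked and compared against a threshold directly. The sketch below rebuilds the legacy share from the legacy and total active-editor counts shown in the tables above; `threshold_pct` is an illustrative name, set to the 40% level referred to in the summary that follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Legacy and total active-editor counts copied from the tables above\n",
"counts = pd.DataFrame({\n",
"    'legacy': [40, 1874, 4299, 52, 863, 7],\n",
"    'total': [788, 12421, 35442, 446, 5958, 81],\n",
"}, index=['euwiki', 'fawiki', 'frwiki', 'frwiktionary', 'hewiki', 'ptwikiversity'])\n",
"\n",
"# Share of active editors with the legacy skin, in percent\n",
"counts['pct_legacy'] = 100 * counts['legacy'] / counts['total']\n",
"threshold_pct = 40  # illustrative threshold, see lead-in\n",
"\n",
"# Rank wikis from highest to lowest opt-out rate and check the threshold\n",
"print(counts['pct_legacy'].sort_values(ascending=False).round(1))\n",
"print('All wikis below threshold:', bool((counts['pct_legacy'] < threshold_pct).all()))"
]
},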
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The percentage of legacy users listed above for each wiki reflects the opt-out rate, since the modern Vector skin was presented as the default on all of these wikis.\n",
"\n",
"The opt-out rates among active editors for each target wiki are still well below 40%. Persian Wikipedia (fawiki) currently has the highest opt-out rate among active editors (15.1%), while Basque Wikipedia (euwiki) has the lowest (5.1%)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}