{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Re-run active editors skin statistics [T180860](https://phabricator.wikimedia.org/T180860)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This task looks at getting a better understanding of the skin preferences opted by our users. \n", "\n", "Note:\n", "- Values like 'chick','simple,','classic' etc. that fall back to the default skin have been modified to reflect that. \n", "\n", "- The users in each section are all the users who met the edit threshold when their edits from the past year were summed across all projects. Each user's skin was checked on the wiki where they made the most edits during the year (for both the edit threshold and home wiki identification, Wikidata edits were each treated as 1/10th of an edit to account for the greater granularity of edits there)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Results : \n", "\n", "| Skin | Percentage of users (with 5 or more edits) | Percentage of users (with 30 or more edits) | Percentage of users (with 600 or more edits) | \n", "|-|-|-|-|\n", "| vector | 97.2% | 95.5% | 91.4% |\n", "| monobook | 2.0% | 3.4% | 7.2% |\n", "| modern | 0.3% | 0.4% | 0.7% |\n", "| timeless | 0.3% | 0.5% | 0.5% |\n", "| cologneblue | 0.1% | 0.1% | 0.1% |\n", "| minerva | 0.1% | 0.1% | 0.0%|" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "You are using wmfdata v1.0.1, but v1.0.3 is available.\n", "\n", "To update, run `pip install --upgrade git+https://github.com/neilpquinn/wmfdata/wmfdata.git@release`.\n", "\n", "To see the changes, refer to https://github.com/neilpquinn/wmfdata/blob/release/CHANGELOG.md\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "import datetime as dt\n", "\n", "from wmfdata import hive, mariadb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Active Editor (5 or more content edits in the last one year ) \n", "We will first begin by taking all users that were [active editors](https://www.mediawiki.org/wiki/Wikimedia_Product/Data_dictionary#Editors) i.e. had 5 or more content edits overall in the last year from May 2019 to May 2020. We will pick the wiki where each user had the most edits and treat the preference there as their global preference. \n", "We will use their user ids to query the mariadb table user_properties to get their skin preferences. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "HIVE_SNAPSHOT = \"2020-05\"\n", "START_OF_DATA = \"2019-05-01\"\n", "END_OF_DATA = \"2020-06-01\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "#all active editors from the past one year\n", "\n", "active_editor_query = \"\"\"\n", "\n", "WITH yr_proj_edits as (\n", " select\n", " event_user_text as user,\n", " event_user_id as user_id,\n", " wiki_db as proj,\n", " sum(if(wiki_db = \"wikidatawiki\", 0.1, 1)) as content_edits,\n", " max(event_timestamp) as latest_edit\n", " from wmf.mediawiki_history\n", " where\n", " -- REGISTERED\n", " event_user_is_anonymous = false and\n", " \n", " -- NON-BOT\n", " size(event_user_is_bot_by) = 0 and\n", " not array_contains(event_user_groups, \"bot\") and\n", " \n", " -- CONTENT EDITS\n", " event_entity = \"revision\" and\n", " event_type = \"create\" and\n", " page_namespace_is_content = true and\n", " \n", " -- FROM THE LAST YEAR\n", " event_timestamp >= \"{START_OF_DATA}\" and event_timestamp < \"{END_OF_DATA}\" and\n", " \n", " -- FROM THE LATEST SNAPSHOT\n", " snapshot = \"{hive_snapshot}\"\n", " \n", " -- PER USER, PER WIKI\n", " group by event_user_text, event_user_id, wiki_db\n", ")\n", "\n", "-- FINAL SELECT OF\n", "select \n", " user as user_name,\n", " user_id as user_id,\n", " proj as wiki,\n", " global_edits\n", "\n", "from \n", "-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS\n", "(\n", " select\n", " user,\n", " user_id,\n", " proj,\n", " -- in the unlikely event that wikis are tied by edit count and latest edit, \n", " -- row_number() will break it somehow\n", " row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,\n", " sum(content_edits) over (partition by user) as global_edits\n", " from yr_proj_edits\n", ") yr_edits\n", "where\n", "rank = 1\n", "and global_edits>=5\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "active_editor = hive.run(\n", " active_editor_query.format(\n", " hive_snapshot = HIVE_SNAPSHOT,\n", " START_OF_DATA= START_OF_DATA,\n", " END_OF_DATA=END_OF_DATA\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of editors for whom we will be checking skin preferences: 587111\n" ] } ], "source": [ "Total_active_ed = active_editor['user_id'].count()\n", "print('Total number of editors for whom we will be checking skin preferences:' , Total_active_ed) " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "#Querying user_properties for getting the skin preferences set by the active editors we got in the above query\n", "\n", "query='''\n", "SELECT \n", " up_value AS skin, \n", " COUNT(*) AS users\n", "FROM user_properties\n", "WHERE up_user in ({users})\n", "AND up_property = \"skin\"\n", "GROUP BY up_value\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will be using the user_properties table to identify skin preferences set by the active editors. Note that if a user has not set a preferred skin then there will be no record for that user in this table and skin preference is default i.e. Vector" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Looping through each wiki for the list of users for each skin\n", "\n", "wikis=active_editor['wiki'].unique()\n", "up_skin=list()\n", "for wiki in wikis:\n", " user_ids = active_editor[active_editor['wiki'] == wiki][\"user_id\"]\n", " user_list = ','.join([str(u) for u in user_ids])\n", " prefs = mariadb.run(\n", " query.format(users=user_list),\n", " wiki\n", " )\n", " up_skin.append(prefs)\n", "\n", "skin= pd.concat(up_skin)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of editors for whom we have preferences set in the user_properties table: 29587\n" ] } ], "source": [ "skin_users = skin['users'].sum()\n", "print('Total number of editors for whom we have preferences set in the user_properties table:' , skin_users) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a huge number of users who do not have data for skin preference in the user_preference table indicating that they are set to the default 'Vector' skin OR due to being deleted from the user_preference table. \n", "For this analysis, let's default them to 'Vector'. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "557524" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vectors=np.subtract(Total_active_ed,skin_users)\n", "vectors" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "new_row={'skin':'vector', 'users':vectors}" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "skin=skin.append(new_row,ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are making the following considerations from the information given on this [page](https://www.mediawiki.org/wiki/Category:All_skins) : \n", "- Vector = 0, 1, vector, simple, nostalgia, chick, standard, classic or nothing \n", "- 2 = cologneblue \n", "- chick=monobook \n", "- myskin=monobook \n", "- minervaneue = minerva " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "skin_aliases = {\n", " \"\":\"vector\",\n", " \"0\":\"vector\",\n", " \"1\":\"vector\",\n", " \"simple\":\"vector\",\n", " \"nostalgia\":\"vector\",\n", " \"chick\":\"vector\",\n", " \"standard\":\"vector\",\n", " \"classic\":\"vector\",\n", " \"2\":\"cologneblue\",\n", " \"myskin\":\"monobook\",\n", " \"minervaneue\":\"minerva\"\n", "}\n", "\n", "skin= skin.replace({\"skin\": skin_aliases})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number of users for each skin type" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
users
skin
cologneblue427
minerva606
modern1499
monobook11727
timeless1916
vector570936
\n", "
" ], "text/plain": [ " users\n", "skin \n", "cologneblue 427\n", "minerva 606\n", "modern 1499\n", "monobook 11727\n", "timeless 1916\n", "vector 570936" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_skin=skin.groupby('skin').sum()\n", "user_skin" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Percentage of Active Editors for each skin type" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
users
skin
vector97.2%
monobook2.0%
modern0.3%
timeless0.3%
cologneblue0.1%
minerva0.1%
\n", "
" ], "text/plain": [ " users\n", "skin \n", "vector 97.2%\n", "monobook 2.0%\n", "modern 0.3%\n", "timeless 0.3%\n", "cologneblue 0.1%\n", "minerva 0.1%" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pct_user_skin=(100. * user_skin / user_skin.sum()).round(1).astype(str) + '%'\n", "pct_user_skin.sort_values(by=['users'],ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Active Editors with 30 or more content edits in the last one year\n", "Now let's look at users that were active editors and had 30 or more content edits in the lsat one year from May 2019 to May 2020. We will pick the wiki where each user had the most edits and treat the preference there as their global preference. " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "#all active editors from the past one year\n", "\n", "active_editor_30_query = \"\"\"\n", "\n", "WITH yr_proj_edits as (\n", " select\n", " event_user_text as user,\n", " event_user_id as user_id,\n", " wiki_db as proj,\n", " sum(if(wiki_db = \"wikidatawiki\", 0.1, 1)) as content_edits,\n", " max(event_timestamp) as latest_edit\n", " from wmf.mediawiki_history\n", " where\n", " -- REGISTERED\n", " event_user_is_anonymous = false and\n", " \n", " -- NON-BOT\n", " size(event_user_is_bot_by) = 0 and\n", " not array_contains(event_user_groups, \"bot\") and\n", " \n", " -- CONTENT EDITS\n", " event_entity = \"revision\" and\n", " event_type = \"create\" and\n", " page_namespace_is_content = true and\n", " \n", " -- FROM THE LAST YEAR\n", " event_timestamp >= \"{START_OF_DATA}\" and event_timestamp < \"{END_OF_DATA}\" and\n", " \n", " -- FROM THE LATEST SNAPSHOT\n", " snapshot = \"{hive_snapshot}\"\n", " \n", " -- PER USER, PER WIKI\n", " group by event_user_text, event_user_id, wiki_db\n", ")\n", "\n", "-- FINAL SELECT OF\n", "select \n", " user as user_name,\n", " user_id as user_id,\n", " proj as wiki,\n", " global_edits\n", "\n", "from \n", "-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS\n", "(\n", " select\n", " user,\n", " user_id,\n", " proj,\n", " -- in the unlikely event that wikis are tied by edit count and latest edit, \n", " -- row_number() will break it somehow\n", " row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,\n", " sum(content_edits) over (partition by user) as global_edits\n", " from yr_proj_edits\n", ") yr_edits\n", "where\n", "rank = 1\n", "and global_edits>=30\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "active_editor_30 = hive.run(\n", " active_editor_30_query.format(\n", " hive_snapshot = HIVE_SNAPSHOT,\n", " START_OF_DATA= START_OF_DATA,\n", " END_OF_DATA=END_OF_DATA\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of editors with greater than 30 edits for whom we will be checking skin preferences: 148973\n" ] } ], "source": [ "Total_active_ed_30 = active_editor_30['user_id'].count()\n", "print('Total number of editors (with greater than 30 edits) for whom we will be checking skin preferences:' , \n", " Total_active_ed_30)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Looping through each wiki for the list of users for each skin\n", "\n", "wikis=active_editor_30['wiki'].unique()\n", "up_skin=list()\n", "for wiki in wikis:\n", " user_ids = active_editor_30[active_editor_30['wiki'] == wiki][\"user_id\"]\n", " user_list = ','.join([str(u) for u in user_ids])\n", " prefs = mariadb.run(\n", " query.format(users=user_list),\n", " wiki\n", " )\n", " up_skin.append(prefs)\n", "\n", "skin_30= pd.concat(up_skin)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of editors (with 30 or more edits) for whom we have preferences set in the user_properties table: 10001\n" ] } ], "source": [ "skin_users_30 = skin_30['users'].sum()\n", "print('Total number of editors (with 30 or more edits) for whom we have preferences set in the user_properties table:'\n", " , skin_users_30) \n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "138972" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vectors_30=np.subtract(Total_active_ed_30,skin_users_30)\n", "vectors_30" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "new_row={'skin':'vector', 'users':vectors_30}" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "skin_30=skin_30.append(new_row,ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "skin_30= skin_30.replace({\"skin\": skin_aliases})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number of users (with 30 or more edits) for each skin type" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
users
skin
cologneblue153
minerva134
modern648
monobook5014
timeless740
vector142284
\n", "
" ], "text/plain": [ " users\n", "skin \n", "cologneblue 153\n", "minerva 134\n", "modern 648\n", "monobook 5014\n", "timeless 740\n", "vector 142284" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_skin_30=skin_30.groupby('skin').sum()\n", "user_skin_30" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Percentage of Active Editors (with 30 or more edits) for each skin type" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
users
skin
vector95.5%
monobook3.4%
timeless0.5%
modern0.4%
cologneblue0.1%
minerva0.1%
\n", "
" ], "text/plain": [ " users\n", "skin \n", "vector 95.5%\n", "monobook 3.4%\n", "timeless 0.5%\n", "modern 0.4%\n", "cologneblue 0.1%\n", "minerva 0.1%" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pct_user_skin_30=(100. * user_skin_30 / user_skin_30.sum()).round(1).astype(str) + '%'\n", "pct_user_skin_30.sort_values(by=['users'],ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Very Active Editors with 600 or more content edits in the past year\n", "Lastly, let's look at users that were very active editors and had 600 or more content edits in the lsat one year from May 2019 to May 2020. We will pick the wiki where each user had the most edits and treat the preference there as their global preference. " ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "#all active editors from the past one year\n", "\n", "active_editor_600_query = \"\"\"\n", "\n", "WITH yr_proj_edits as (\n", " select\n", " event_user_text as user,\n", " event_user_id as user_id,\n", " wiki_db as proj,\n", " sum(if(wiki_db = \"wikidatawiki\", 0.1, 1)) as content_edits,\n", " max(event_timestamp) as latest_edit\n", " from wmf.mediawiki_history\n", " where\n", " -- REGISTERED\n", " event_user_is_anonymous = false and\n", " \n", " -- NON-BOT\n", " size(event_user_is_bot_by) = 0 and\n", " not array_contains(event_user_groups, \"bot\") and\n", " \n", " -- CONTENT EDITS\n", " event_entity = \"revision\" and\n", " event_type = \"create\" and\n", " page_namespace_is_content = true and\n", " \n", " -- FROM THE LAST YEAR\n", " event_timestamp >= \"{START_OF_DATA}\" and event_timestamp < \"{END_OF_DATA}\" and\n", " \n", " -- FROM THE LATEST SNAPSHOT\n", " snapshot = \"{hive_snapshot}\"\n", " \n", " -- PER USER, PER WIKI\n", " group by event_user_text, event_user_id, wiki_db\n", ")\n", "\n", "-- FINAL SELECT OF\n", "select \n", " user as user_name,\n", " user_id as user_id,\n", " proj as wiki,\n", " global_edits\n", "\n", "from \n", "-- JOINED TO THEIR HOME WIKI AND GLOBAL EDITS\n", "(\n", " select\n", " user,\n", " user_id,\n", " proj,\n", " -- in the unlikely event that wikis are tied by edit count and latest edit, \n", " -- row_number() will break it somehow\n", " row_number() over (partition by user order by content_edits desc, latest_edit desc) as rank,\n", " sum(content_edits) over (partition by user) as global_edits\n", " from yr_proj_edits\n", ") yr_edits\n", "where\n", "rank = 1\n", "and global_edits>=600\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "active_editor_600 = hive.run(\n", " active_editor_600_query.format(\n", " hive_snapshot = HIVE_SNAPSHOT,\n", " START_OF_DATA= START_OF_DATA,\n", " END_OF_DATA=END_OF_DATA\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of editors (with greater than 600 edits) for whom we will be checking skin preferences: 24080\n" ] } ], "source": [ "Total_active_ed_600 = active_editor_600['user_id'].count()\n", "print('Total number of editors (with greater than 600 edits) for whom we will be checking skin preferences:' , \n", " Total_active_ed_600)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# Looping through each wiki for the list of users for each skin\n", "\n", "wikis=active_editor_600['wiki'].unique()\n", "up_skin=list()\n", "for wiki in wikis:\n", " user_ids = active_editor_600[active_editor_600['wiki'] == wiki][\"user_id\"]\n", " user_list = ','.join([str(u) for u in user_ids])\n", " prefs = mariadb.run(\n", " query.format(users=user_list),\n", " wiki\n", " )\n", " up_skin.append(prefs)\n", "\n", "skin_600= pd.concat(up_skin)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of editors (with 600 or more edits) for whom we have preferences set in the user_properties table: 2536\n" ] } ], "source": [ "skin_users_600 = skin_600['users'].sum()\n", "print('Total number of editors (with 600 or more edits) for whom we have preferences set in the user_properties table:'\n", " , skin_users_600) \n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21544" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vectors_600=np.subtract(Total_active_ed_600,skin_users_600)\n", "vectors_600" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "new_row={'skin':'vector', 'users':vectors_600}" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "skin_600=skin_600.append(new_row,ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "skin_600= skin_600.replace({\"skin\": skin_aliases})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number of users (with 600 or more edits) for each skin type" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
users
skin
cologneblue27
minerva9
modern171
monobook1734
timeless130
vector22009
\n", "
" ], "text/plain": [ " users\n", "skin \n", "cologneblue 27\n", "minerva 9\n", "modern 171\n", "monobook 1734\n", "timeless 130\n", "vector 22009" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_skin_600=skin_600.groupby('skin').sum()\n", "user_skin_600" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Percentage of Active Editors (with 600 or more edits) for each skin type" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
users
skin
vector91.4%
monobook7.2%
modern0.7%
timeless0.5%
cologneblue0.1%
minerva0.0%
\n", "
" ], "text/plain": [ " users\n", "skin \n", "vector 91.4%\n", "monobook 7.2%\n", "modern 0.7%\n", "timeless 0.5%\n", "cologneblue 0.1%\n", "minerva 0.0%" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pct_user_skin_600=(100. * user_skin_600 / user_skin_600.sum()).round(1).astype(str) + '%'\n", "pct_user_skin_600.sort_values(by=['users'],ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These results are more or less consistent with the results we got with the analysis that was done in 2017 [Statistics about /active/ users of skins on the Wikimedia cluster](https://phabricator.wikimedia.org/T147696) i.e. not a lot has changed in terms of users' choice of skin on the wikis. \n", "\n", "We have kept the edit bucket similar to the last analysis for ease of comparison.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }