{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# New Discussion Tool Adoption Metrics\n", "\n", "[Task](https://phabricator.wikimedia.org/T263053)\n", "\n", "Last Updated: 17 August 2021\n", "\n", "[Code Repository](https://github.com/wikimedia-research/New-discussion-tool-analysis-2020)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Contents\n", "\n", "1. [Purpose](#Purpose)\n", "2. [Data](#Data)\n", "3. [Disruption Metrics](#Disruption-Metrics)\n", "4. [Usage Metrics](#Usage-Metrics)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Purpose\n", "\n", "The New Discussion Tool was deployed as an opt-in beta feature to all logged-in users. It is intended to improve contributors' workflows for starting new discussion threads on talk pages across Wikipedia's 16 talk [namespaces](https://www.mediawiki.org/wiki/Manual:Namespace). See the [project page](https://www.mediawiki.org/wiki/Talk_pages_project/New_discussion) for more details.\n", "\n", "**Deployment dates:**\n", "* 18 February 2021: Arabic, Czech and Hungarian Wikipedias.\n", "* 10 March 2021: All Wikipedias except the English, German, and Russian Wikipedias.\n", "* 16 March 2021: English and Russian Wikipedias and all Wikimedia sister projects.\n", "\n", "The purpose of this analysis is to understand how people are engaging with the New Discussion Tool beta feature, to help us determine whether the tool is ready to be made available to everyone by default on a subset of wikis. This analysis is intended to help us answer these questions:\n", "\n", "* Are people finding the tool disruptive?\n", "* Does the tool behave in the ways people expect?\n", "* Who has been using the New Discussion Tool, and how much have they been using it?"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data\n", "\n", "Data for this analysis comes from a combination of the following sources:\n", "* [PrefUpdate](https://meta.wikimedia.org/wiki/Schema:PrefUpdate): Tracks user-initiated preference changes over time.\n", "* [User Properties](https://www.mediawiki.org/wiki/Manual:User_properties_table/en): Tracks all current non-default user preferences.\n", "* [mediawiki_history](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history): Tracks completed edits.\n", "* [EditAttemptStep](https://schema.wikimedia.org/repositories//secondary/jsonschema/analytics/legacy/editattemptstep/current.yaml): Tracks editor activity.\n", "\n", "For this analysis, we reviewed events logged from the date of deployment as a beta feature (18 February 2021) through the end of July (31 July 2021). For each metric, we calculated results overall (across all Wikimedia projects), by experience level (users' cumulative edit count), and for the specific Wikipedias we are considering for opt-out deployments (Arabic and Czech Wikipedias).\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# load required packages\n", "shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))\n", "shhh({\n", " library(tidyverse); library(glue); library(lubridate); library(scales)\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Disruption Metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What percent of contributors explicitly disabled the discussion tools beta feature after making at least one New Discussion Tool edit?\n", "\n", "**Purpose**: Do people using the New Discussion Tool find it disruptive?\n", "\n", "We reviewed how many New Discussion Tool users explicitly[^1] turned off the feature after making at least one edit with it.\n", "\n", "**Data Description and Assumptions:**\n", "- User preference changes come from the PrefUpdate EventLogging data.\n", "- Due to sanitization of PrefUpdate data, we can only review the last 90 days of data, which at the time of this analysis covered 18 May 2021 through 31 July 2021. We do not have data on the number of users that opted out before that date. To supplement this data and account for users that opted out before this 90-day period, we reviewed current user preference settings recorded in the MediaWiki user_properties table. Please see \"New_discussion_tool_opt_out_analysis.ipynb\" located in the code repository for details of this analysis.\n", "- Users that opted in and out multiple times are excluded.\n", "- A single user preference (`event.property = 'discussiontools-betaenable'`) allows a user to explicitly turn all discussion tool beta features on or off. This includes both the reply tool and the New Discussion Tool; these features cannot be turned off individually.\n", "\n", "[^1]: \"Explicitly\" turned on indicates that users did not have the *Automatically enable all new beta features* preference checked. 
Note that users who explicitly turned the feature off could include users that were auto-enrolled and then turned it off." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "\n", "query <- \"\n", "-- find users that opted out of the discussion tools beta feature\n", "WITH opt_out_users AS (\n", "SELECT\n", " event.userid as opt_out_user,\n", " wiki as opt_out_wiki,\n", " min(event.saveTimestamp) as opt_out_time,\n", " sum(cast(event.value = '\\\"0\\\"' as int)) as opt_outs\n", "FROM \n", " event.prefupdate\n", "WHERE\n", " event.property = 'discussiontools-betaenable' AND\n", " event.value = '\\\"0\\\"' AND\n", " CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) >= '2021-05-18' AND\n", " CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) <= '2021-07-31'\n", "GROUP BY \n", " event.userid, \n", " wiki\n", "),\n", "\n", "-- find users that made at least one edit with the new discussion tool\n", "new_topic_users AS (\n", "SELECT\n", " event_user_id as new_topic_user,\n", " wiki_db as new_topic_wiki,\n", " min(mh.event_timestamp) as first_post,\n", " CASE\n", " WHEN min(event_user_revision_count) < 100 THEN 'under 100'\n", " WHEN (min(event_user_revision_count) >= 100 AND min(event_user_revision_count) <= 500) THEN '100-500'\n", " ELSE 'over 500'\n", " END AS edit_count_group,\n", " min(event_user_revision_count) AS edit_count\n", "FROM wmf.mediawiki_history AS mh\n", "WHERE \n", " ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic') \n", " AND snapshot = '2021-07' \n", "-- date of first deployment\n", " AND event_timestamp >= '2021-02-18' \n", " AND event_timestamp <= '2021-07-31' \n", "-- only on desktop\n", " AND NOT array_contains(revision_tags, 'iOS')\n", " AND NOT array_contains(revision_tags, 'Android')\n", " AND NOT array_contains(revision_tags, 'Mobile Web')\n", " -- find all edits on talk pages \n", " AND page_namespace_historical % 2 = 1\n", " AND event_entity = 'revision' AND \n", " event_type = 
'create'\n", " AND event_user_is_anonymous = FALSE\n", "GROUP BY\n", " event_user_id,\n", " wiki_db\n", ")\n", "\n", "-- Main Query --\n", "SELECT\n", " new_topic_wiki AS wiki,\n", " edit_count AS edit_count,\n", " edit_count_group AS edit_count_group,\n", "--find opt out users that opted out following new discussion tool post\n", " SUM(CAST(opt_out_user IS NOT NULL AND first_post < opt_out_time AS INT)) AS opt_out_users,\n", " SUM(CAST(new_topic_user IS NOT NULL AS int)) AS new_topic_contributor\n", " \n", "FROM (\n", "SELECT\n", " new_topic_users.first_post,\n", " new_topic_users.new_topic_user,\n", " opt_out_users.opt_out_time,\n", " new_topic_users.new_topic_wiki,\n", " opt_out_users.opt_out_user,\n", " new_topic_users.edit_count,\n", " new_topic_users.edit_count_group\n", "FROM new_topic_users\n", "LEFT JOIN opt_out_users ON \n", " new_topic_users.new_topic_user = opt_out_users.opt_out_user AND\n", " new_topic_users.new_topic_wiki = opt_out_users.opt_out_wiki \n", "WHERE \n", " opt_out_users.opt_outs IS NULL OR\n", " opt_out_users.opt_outs = 1 \n", ") sessions\n", "GROUP BY\n", " sessions.new_topic_wiki,\n", " sessions.edit_count,\n", " sessions.edit_count_group\n", "\"" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "opt_out_contributors <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "write_csv(opt_out_contributors, \"Data/opt_out_contributors.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overall" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " opt\\_out\\_users & new\\_topic\\_contributors & pct\\_opt\\_out\\\\\n", " & & \\\\\n", "\\hline\n", "\t 427 & 5133 & 8.32\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| opt_out_users <int> | new_topic_contributors <int> | pct_opt_out <chr> |\n", "|---|---|---|\n", "| 427 | 5133 | 8.32% |\n", "\n" ], "text/plain": [ " opt_out_users new_topic_contributors pct_opt_out\n", "1 427 5133 8.32% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "opt_out_contributors_overall <- opt_out_contributors %>%\n", " summarise(opt_out_users = sum(opt_out_users),\n", " new_topic_contributors = sum(new_topic_contributor),\n", " pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), \"%\")\n", " )\n", "\n", "opt_out_contributors_overall " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Experience Level" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 3 × 4\n", "\\begin{tabular}{llll}\n", " edit\\_count\\_group & opt\\_out\\_users & new\\_topic\\_contributors & pct\\_opt\\_out\\\\\n", " & & & \\\\\n", "\\hline\n", "\t 100-500 & 43 & 681 & 6.31\\% \\\\\n", "\t over 500 & 237 & 3427 & 6.92\\% \\\\\n", "\t under 100 & 147 & 1025 & 14.34\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 3 × 4\n", "\n", "| edit_count_group <chr> | opt_out_users <int> | new_topic_contributors <int> | pct_opt_out <chr> |\n", "|---|---|---|---|\n", "| 100-500 | 43 | 681 | 6.31% |\n", "| over 500 | 237 | 3427 | 6.92% |\n", "| under 100 | 147 | 1025 | 14.34% |\n", "\n" ], "text/plain": [ " edit_count_group opt_out_users new_topic_contributors pct_opt_out\n", "1 100-500 43 681 6.31% \n", "2 over 500 237 3427 6.92% \n", "3 under 100 147 1025 14.34% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "opt_out_contributors_byexp <- opt_out_contributors %>%\n", " group_by(edit_count_group) %>%\n", " summarise(opt_out_users = sum(opt_out_users),\n", " new_topic_contributors = sum(new_topic_contributor),\n", " pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), \"%\")\n", " )\n", "\n", "opt_out_contributors_byexp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Junior Contributor Opt-Out Investigation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Opt-out rates for all three groups are fairly low (below 15%), with Junior Contributors (editors with under 100 edits) having the highest opt-out rate.\n", "\n", "Since the higher opt-out rate for Junior Contributors is somewhat unexpected, we further broke down the under 100 edit count group into smaller edit count groups (e.g., 0-10 edits, 11-20 edits, 21-30 edits, etc.) and reviewed the wikis with the highest Junior Contributor opt-out rates. This was done to identify whether the higher opt-out rate for Junior Contributors was due to a specific edit count group or wiki. 
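" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The binning step described above can be sketched in base R as follows. This is a minimal, illustrative sketch rather than the code used for the results below: the toy edit counts are made up, and include.lowest = TRUE is an assumption added here so that a contributor with exactly 0 edits falls into the first bin instead of becoming NA (with the default right-closed intervals, cut() excludes the lowest break).

```r
# Hypothetical edit counts for a handful of Junior Contributors (toy data, not analysis results).
edit_counts <- c(0, 5, 10, 11, 20, 35, 99, 100)

# Right-closed 10-edit bins: [0,10], (10,20], ..., (90,100].
bin_breaks <- seq(0, 100, by = 10)
bin_labels <- paste0(c(0, seq(11, 91, by = 10)), '-', seq(10, 100, by = 10), ' edits')

# include.lowest = TRUE closes the first interval on the left, so 0 edits maps to '0-10 edits'.
bins <- cut(edit_counts, breaks = bin_breaks, labels = bin_labels, include.lowest = TRUE)

# Number of contributors per bin; empty bins are kept with a count of 0.
bin_counts <- table(bins)
print(bin_counts)
```

Without include.lowest = TRUE, cut(0, breaks = bin_breaks) returns NA, so any contributor whose recorded edit count is 0 would silently drop out of a per-bin summary.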
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**By Junior Contributor Experience Level**" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# Divide edit counts into groups\n", "\n", "b <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)\n", "names <- c( '0-10 edits', '11-20 edits', '21-30 edits', '31-40 edits', \n", " '41-50 edits', '51-60 edits', '61-70 edits', '71-80 edits', '81-90 edits', '91-100 edits')" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 10 × 4\n", "\\begin{tabular}{llll}\n", " edit\\_count & opt\\_out\\_users & new\\_topic\\_contributors & pct\\_opt\\_out\\\\\n", " & & & \\\\\n", "\\hline\n", "\t 0-10 edits & 53 & 320 & 16.56\\%\\\\\n", "\t 11-20 edits & 29 & 190 & 15.26\\%\\\\\n", "\t 21-30 edits & 14 & 107 & 13.08\\%\\\\\n", "\t 31-40 edits & 10 & 97 & 10.31\\%\\\\\n", "\t 41-50 edits & 16 & 82 & 19.51\\%\\\\\n", "\t 51-60 edits & 10 & 73 & 13.7\\% \\\\\n", "\t 61-70 edits & 7 & 77 & 9.09\\% \\\\\n", "\t 71-80 edits & 4 & 70 & 5.71\\% \\\\\n", "\t 81-90 edits & 12 & 62 & 19.35\\%\\\\\n", "\t 91-100 edits & 3 & 44 & 6.82\\% \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 10 × 4\n", "\n", "| edit_count <fct> | opt_out_users <int> | new_topic_contributors <int> | pct_opt_out <chr> |\n", "|---|---|---|---|\n", "| 0-10 edits | 53 | 320 | 16.56% |\n", "| 11-20 edits | 29 | 190 | 15.26% |\n", "| 21-30 edits | 14 | 107 | 13.08% |\n", "| 31-40 edits | 10 | 97 | 10.31% |\n", "| 41-50 edits | 16 | 82 | 19.51% |\n", "| 51-60 edits | 10 | 73 | 13.7% |\n", "| 61-70 edits | 7 | 77 | 9.09% |\n", "| 71-80 edits | 4 | 70 | 5.71% |\n", "| 81-90 edits | 12 | 62 | 19.35% |\n", "| 91-100 edits | 3 | 44 | 6.82% |\n", "\n" ], "text/plain": [ " edit_count opt_out_users new_topic_contributors pct_opt_out\n", "1 0-10 edits 53 320 16.56% \n", "2 11-20 edits 29 190 15.26% \n", "3 21-30 edits 14 107 13.08% \n", "4 31-40 edits 10 97 10.31% \n", "5 41-50 edits 16 82 19.51% \n", "6 51-60 edits 10 73 13.7% \n", "7 61-70 edits 7 77 9.09% \n", "8 71-80 edits 4 70 5.71% \n", "9 81-90 edits 12 62 19.35% \n", "10 91-100 edits 3 44 6.82% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "jc_opt_out_contributors_byexp <- opt_out_contributors %>%\n", " filter(edit_count <= 100) %>% # only review Junior Contributors\n", " mutate(edit_count = cut(edit_count, breaks = b, labels = names)) %>%\n", " group_by(edit_count) %>%\n", " summarise(opt_out_users = 
sum(opt_out_users),\n", " new_topic_contributors = sum(new_topic_contributor),\n", " pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), \"%\")\n", " )\n", "\n", "jc_opt_out_contributors_byexp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most of the Junior Contributor edit count groups have opt-out rates close to the rate identified for all contributors with under 100 edits (about 14%). Opt-out rates are slightly higher for several groups with 50 or fewer edits, but no single edit count group appears to account for the higher opt-out rate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Wikis with the highest Junior Contributor Opt-Out Rate**" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 20 × 4\n", "\\begin{tabular}{llll}\n", " wiki & opt\\_out\\_users & new\\_topic\\_contributors & pct\\_opt\\_out\\\\\n", " & & & \\\\\n", "\\hline\n", "\t bnwiki & 2 & 4 & 50.00\\\\\n", "\t zhwikibooks & 1 & 2 & 50.00\\\\\n", "\t arwiki & 5 & 11 & 45.45\\\\\n", "\t trwiki & 7 & 17 & 41.18\\\\\n", "\t simplewiki & 3 & 9 & 33.33\\\\\n", "\t svwiki & 1 & 3 & 33.33\\\\\n", "\t kowiki & 3 & 10 & 30.00\\\\\n", "\t mediawikiwiki & 3 & 10 & 30.00\\\\\n", "\t mswiki & 1 & 4 & 25.00\\\\\n", "\t thwiki & 1 & 4 & 25.00\\\\\n", "\t ruwiki & 1 & 5 & 20.00\\\\\n", "\t enwiki & 59 & 316 & 18.67\\\\\n", "\t fawiki & 7 & 38 & 18.42\\\\\n", "\t commonswiki & 4 & 22 & 18.18\\\\\n", "\t viwiki & 2 & 11 & 18.18\\\\\n", "\t jawiki & 5 & 28 & 17.86\\\\\n", "\t eswiki & 10 & 70 & 14.29\\\\\n", "\t zhwiki & 4 & 36 & 11.11\\\\\n", "\t ptwiki & 3 & 28 & 10.71\\\\\n", "\t itwiki & 6 & 62 & 9.68\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 20 × 4\n", "\n", "| wiki <chr> | opt_out_users <int> | new_topic_contributors <int> | pct_opt_out <dbl> |\n", "|---|---|---|---|\n", "| bnwiki | 2 | 4 | 50.00 |\n", "| zhwikibooks | 1 | 2 | 50.00 |\n", "| arwiki | 5 | 11 | 45.45 |\n", "| trwiki | 7 | 17 | 41.18 |\n", "| simplewiki | 3 | 9 | 33.33 |\n", "| svwiki | 1 | 3 | 33.33 |\n", "| kowiki | 3 | 10 | 30.00 |\n", "| mediawikiwiki | 3 | 10 | 30.00 |\n", "| mswiki | 1 | 4 | 25.00 |\n", "| thwiki | 1 | 4 | 25.00 |\n", "| ruwiki | 1 | 5 | 20.00 |\n", "| enwiki | 59 | 316 | 18.67 |\n", "| fawiki | 7 | 38 | 18.42 |\n", "| commonswiki | 4 | 22 | 18.18 |\n", "| viwiki | 2 | 11 | 18.18 |\n", "| jawiki | 5 | 28 | 17.86 |\n", "| eswiki | 10 | 70 | 14.29 |\n", "| zhwiki | 4 | 36 | 11.11 |\n", "| ptwiki | 3 | 28 | 10.71 |\n", "| itwiki | 6 | 62 | 9.68 |\n", "\n" ], "text/plain": [ " wiki opt_out_users new_topic_contributors pct_opt_out\n", "1 bnwiki 2 4 50.00 \n", "2 zhwikibooks 1 2 50.00 \n", "3 arwiki 5 11 45.45 \n", "4 trwiki 7 17 41.18 \n", "5 simplewiki 3 9 
33.33 \n", "6 svwiki 1 3 33.33 \n", "7 kowiki 3 10 30.00 \n", "8 mediawikiwiki 3 10 30.00 \n", "9 mswiki 1 4 25.00 \n", "10 thwiki 1 4 25.00 \n", "11 ruwiki 1 5 20.00 \n", "12 enwiki 59 316 18.67 \n", "13 fawiki 7 38 18.42 \n", "14 commonswiki 4 22 18.18 \n", "15 viwiki 2 11 18.18 \n", "16 jawiki 5 28 17.86 \n", "17 eswiki 10 70 14.29 \n", "18 zhwiki 4 36 11.11 \n", "19 ptwiki 3 28 10.71 \n", "20 itwiki 6 62 9.68 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "jc_opt_out_contributors_bywiki <- opt_out_contributors %>%\n", " filter(edit_count_group == 'under 100') %>% # only review Junior Contributors\n", " group_by(wiki) %>%\n", " summarise(opt_out_users = sum(opt_out_users),\n", " new_topic_contributors = sum(new_topic_contributor),\n", " pct_opt_out = round(opt_out_users/new_topic_contributors * 100, 2)\n", " ) %>%\n", " filter(new_topic_contributors > 1) %>% # review wikis with more than 1 new topic contributor\n", " arrange(desc(pct_opt_out))\n", "\n", "head(jc_opt_out_contributors_bywiki, 20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A review by wiki also does not reveal any surprising trends. The highest opt-out rates are for wikis with only a few New Discussion Tool users; as a result, these rates are based on very small numbers of contributors and are unlikely to be representative.\n", "\n", "Rates for the larger wikis (those with 30 or more Junior Contributors using the tool) range from about 10% to 19%, similar to the overall opt-out rate identified for Junior Contributors.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we are only able to access user-specific opt-out data for the last 90 days, the higher opt-out rate for Junior Contributors may be because Senior Contributors were more likely to have already tried the tool and opted out before this 90-day period.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arabic and Czech Wikipedias" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 2 × 4\n", "\\begin{tabular}{llll}\n", " wiki & opt\\_out\\_users & new\\_topic\\_contributors & pct\\_opt\\_out\\\\\n", " & & & \\\\\n", "\\hline\n", "\t arwiki & 9 & 54 & 16.67\\%\\\\\n", "\t cswiki & 0 & 27 & 0\\% \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 4\n", "\n", "| wiki <chr> | opt_out_users <int> | new_topic_contributors <int> | pct_opt_out <chr> |\n", "|---|---|---|---|\n", "| arwiki | 9 | 54 | 16.67% |\n", "| cswiki | 0 | 27 | 0% |\n", "\n" ], "text/plain": [ " wiki opt_out_users new_topic_contributors pct_opt_out\n", "1 arwiki 9 54 16.67% \n", "2 cswiki 0 27 0% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "opt_out_contributors_bywiki <- opt_out_contributors %>%\n", " filter(wiki %in% c('arwiki', 'cswiki')) %>%\n", " group_by(wiki) %>%\n", " summarise(opt_out_users = sum(opt_out_users),\n", " new_topic_contributors = sum(new_topic_contributor),\n", " pct_opt_out = paste0(round(opt_out_users/new_topic_contributors * 100, 2), \"%\"), .groups = 'drop'\n", " )\n", "\n", "opt_out_contributors_bywiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Status of Current Discussion Tool Preference Settings for all New Discussion Tool Contributors\n", "\n", "Data: Based on data recorded in the MediaWiki [user_properties table](https://www.mediawiki.org/wiki/Manual:User_properties_table)\n", "\n", "While we are unable to access user-specific preference change events in PrefUpdate that occurred more than 90 days ago (before 18 May 2021), I reviewed the user_properties table to determine the number of New Discussion Tool contributors that currently have the `discussiontools-betaenable` preference set to disabled.\n", "\n", "Note: This data reflects only the current non-default status of the user preference. It does not show whether the user enabled and disabled the feature multiple times, or when they disabled it in relation to their edit. 
Also, some contributors have used the New Discussion Tool but do not have a preference set in the user_properties table; these are shown as \"no local preference recorded\" in the results below. Possible reasons for this include: (1) the user disabled the setting by selecting 'restore all default preferences' in their user preferences, or (2) the user enabled discussion tools in their global preferences but not in their local preferences.\n", "\n", "Please see the summary of results below and \"New_discussion_tool_opt_out_analysis.ipynb\" located in the code repository for further details of current user discussion tool preferences from the MediaWiki user_properties table." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Overall**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "|Current Discussion Tool Preference Status | Percent of New Discussion Tool Contributors|\n", "|-------|-------|\n", "|no local preference recorded|27.14%|\n", "|explicitly disabled|6.05%|\n", "|explicitly enabled|66.81%|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**By Edit Count Group**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "|Edit Count Group| Current Discussion Tool Preference Status | Percent of New Discussion Tool Contributors|\n", "|----|-----|------|\n", "|under 100| no local preference recorded| 28.32%|\n", "||explicitly disabled| 4.05%|\n", "||explicitly enabled| 67.63%|\n", "|100-500| no local preference recorded| 28.87%|\n", "||explicitly disabled| 3.1%|\n", "||explicitly enabled| 68.03%|\n", "|over 500| no local preference recorded| 26.5%|\n", "||explicitly disabled| 7.2%|\n", "||explicitly enabled| 66.3%|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Arabic and Czech Wikipedias**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "|Wiki| Current Discussion Tool Preference Status | Percent of New Discussion Tool Contributors|\n", "|----|-----|------|\n", "|arwiki |explicitly disabled| 4.84%|\n", "||explicitly enabled| 95.16%|\n", "|cswiki |explicitly disabled| 3.33%|\n", "||explicitly enabled| 96.67%|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "From 18 May 2021 through 31 July 2021, 8.32% of contributors that saved at least one New Discussion Tool edit explicitly opted out of the tool, indicating that most users of the tool do not find it disruptive. Junior Contributors (users with under 100 edits) had the highest opt-out rate (14.34%).\n", "\n", "Further investigation did not identify a specific edit count group or wiki responsible for the higher opt-out rate among Junior Contributors; it is more likely an effect of the timeframe reviewed in the opt-out analysis. Due to privacy concerns, we only retain user-specific data on preference updates in PrefUpdate for 90 days. As a result, the opt-out analysis only reflects preference changes from 18 May 2021 through 31 July 2021, and Senior Contributors may have been more likely to try the tool and opt out before this 90-day period. A review of data logged in the user_properties table shows a slightly lower opt-out rate for Junior Contributors than for Senior Contributors and a low opt-out rate across all three edit count groups, indicating no significant sign of disruption.\n", "\n", "No New Discussion Tool contributors on Czech Wikipedia opted out during this period, while 16.67% opted out on Arabic Wikipedia (based on PrefUpdate data). However, each of these wikis had a limited number of contributors that made a New Discussion Tool edit (54 on Arabic Wikipedia and only 27 on Czech Wikipedia), so these results may not be representative.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What percent of all edits made with the New Discussion Tool are reverted within 48 hours of being published?\n", "\n", "**Purpose**: Do people NOT using the New Discussion Tool find it disruptive? How does the level of disruption introduced by people using the New Discussion Tool compare to the level of disruption introduced by people using the current experience?\n", "\n", "For this analysis, we reviewed data recorded in [mediawiki_history](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history) to identify the percentage of new topics posted on talk pages with the New Discussion Tool (identified by the revision tag `discussiontools-newtopic`) that are reverted within 48 hours[^revert].\n", "\n", "[^revert]: 48 hours is a common cutoff, as research suggests that, at least for the English Wikipedia, nearly all reverts take place within 48 hours. Source: Research:Revert, Meta-Wiki. https://meta.wikimedia.org/wiki/Research:Revert.\n", "\n", "We compared the revert rate for comments published using the New Discussion Tool to the revert rate for comments made using full page editing (the current editing experience) during the same timeframe. Note: In this analysis, page edits include any talk page edit not made with a discussion tool, which can include both edits that start a new topic and edits to existing comments. 
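" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the 48-hour revert-rate calculation, written in base R on a small made-up data frame rather than on mediawiki_history. The column names editor_type, is_reverted and seconds_to_revert are hypothetical stand-ins for the editor type label and the revision_is_identity_reverted and revision_seconds_to_identity_revert fields used in the query below; the 172800-second cutoff is the same 48-hour window.

```r
# Hypothetical per-edit records (illustrative values only, not analysis results).
edits <- data.frame(
  editor_type = c('new-discussion-tool', 'new-discussion-tool', 'new-discussion-tool',
                  'page-edit', 'page-edit', 'page-edit', 'page-edit'),
  is_reverted = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  seconds_to_revert = c(3600, 200000, NA, 172800, NA, NA, 90000),
  stringsAsFactors = FALSE
)

# 48 hours in seconds (172800).
revert_window <- 48 * 60 * 60

# An edit counts as reverted only if it was reverted within the window.
# Unreverted edits have no revert time (NA), so they are explicitly treated as not reverted.
edits$reverted_48h <- edits$is_reverted &
  !is.na(edits$seconds_to_revert) &
  edits$seconds_to_revert <= revert_window

# Revert rate per editor type: reverted edits divided by all edits, as a percentage.
revert_rate <- tapply(edits$reverted_48h, edits$editor_type, mean) * 100
print(round(revert_rate, 2))
```

In this toy data, the edit reverted after 200,000 seconds (about 55.6 hours) falls outside the window, so it is counted in the denominator but not as a revert.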
" ] }, { "cell_type": "code", "execution_count": 316, "metadata": {}, "outputs": [], "source": [ "## collect per-user revert and comment counts for new discussion tool and page edits\n", "query <-\n", "\n", "\"SELECT\n", " wiki_db AS wiki,\n", " event_user_id AS user_id,\n", " CASE\n", " WHEN min(event_user_revision_count) < 100 THEN 'under 100'\n", " WHEN (min(event_user_revision_count) >= 100 AND min(event_user_revision_count) <= 500) THEN '100-500'\n", " ELSE 'over 500'\n", " END AS edit_count,\n", " max(size(event_user_is_bot_by) > 0 or size(event_user_is_bot_by_historical) > 0) as bot_by_group,\n", " IF(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic'), 'new-discussion-tool', 'page-edit') AS editor_type,\n", " SUM(CAST(\n", " revision_is_identity_reverted AND \n", " revision_seconds_to_identity_revert <= 172800 -- 48 hours\n", " AS int)) AS num_reverts,\n", " COUNT(*) as num_comments\n", "FROM wmf.mediawiki_history \n", "WHERE \n", " snapshot = '2021-07'\n", " -- exclude reply tool talk page edits\n", " AND NOT (ARRAY_CONTAINS(revision_tags, 'discussiontools-reply'))\n", " -- include only desktop edits\n", " AND NOT array_contains(revision_tags, 'iOS')\n", " AND NOT array_contains(revision_tags, 'Android')\n", " AND NOT array_contains(revision_tags, 'Mobile Web')\n", " -- find all edits on talk pages \n", " AND page_namespace_historical % 2 = 1\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", " -- date of first deployment\n", " AND event_timestamp >= '2021-02-18' \n", " -- end of analysis period; edits from the last two days may not have a full 48-hour revert window\n", " AND event_timestamp <= '2021-07-31'\n", " -- user is not anonymous\n", " AND event_user_is_anonymous = FALSE\n", "GROUP BY \n", " wiki_db,\n", " event_user_id,\n", " IF(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic'), 'new-discussion-tool', 'page-edit')\n", "\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "new_dt_reverts <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", 
"execution_count": 318, "metadata": {}, "outputs": [], "source": [ "# combine user_id and wiki into one key, because\n", "# different users can have the same user_id on different wikis\n", "\n", "new_dt_reverts$user_id <-\n", " paste(new_dt_reverts$user_id, new_dt_reverts$wiki, sep = \"-\")" ] }, { "cell_type": "code", "execution_count": 319, "metadata": {}, "outputs": [], "source": [ "# set factor levels\n", "new_dt_reverts$editor_type <-\n", " factor(\n", " new_dt_reverts$editor_type,\n", " levels = c(\"page-edit\", \"new-discussion-tool\"),\n", " labels = c(\"Page editing\", \"New Discussion Tool\")\n", " )\n", "new_dt_reverts$edit_count <-\n", " factor(new_dt_reverts$edit_count,\n", " levels = c(\"under 100\", \"100-500\", \"over 500\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overall" ] }, { "cell_type": "code", "execution_count": 320, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 2 × 4\n", "\\begin{tabular}{llll}\n", " editor\\_type & total\\_reverts & total\\_comments & revert\\_rate\\\\\n", " & & & \\\\\n", "\\hline\n", "\t Page editing & 133142 & 6021037 & 2.21 \\%\\\\\n", "\t New Discussion Tool & 1053 & 38249 & 2.75 \\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 4\n", "\n", "| editor_type <fct> | total_reverts <int> | total_comments <int> | revert_rate <chr> |\n", "|---|---|---|---|\n", "| Page editing | 133142 | 6021037 | 2.21 % |\n", "| New Discussion Tool | 1053 | 38249 | 2.75 % |\n", "\n" ], "text/plain": [ " editor_type total_reverts total_comments revert_rate\n", "1 Page editing 133142 6021037 2.21 % \n", "2 New Discussion Tool 1053 38249 2.75 % " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# overall revert rate for dt and page edits\n", "new_dt_reverts_byexp <- new_dt_reverts %>%\n", " filter(bot_by_group == 'false') %>%\n", " group_by(editor_type) %>%\n", " summarise(total_reverts = sum(num_reverts),\n", " total_comments = sum(num_comments),\n", " revert_rate =paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop') \n", "\n", "new_dt_reverts_byexp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Experience Level" ] }, { "cell_type": "code", "execution_count": 321, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 6 × 5\n", "\\begin{tabular}{lllll}\n", " edit\\_count & editor\\_type & total\\_reverts & total\\_comments & revert\\_rate\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t under 100 & Page editing & 57971 & 733391 & 7.9 \\% \\\\\n", "\t under 100 & New Discussion Tool & 170 & 2732 & 6.22 \\%\\\\\n", "\t 100-500 & Page editing & 4602 & 107163 & 4.29 \\%\\\\\n", "\t 100-500 & New Discussion Tool & 63 & 1481 & 4.25 \\%\\\\\n", "\t over 500 & Page editing & 70569 & 5180483 & 1.36 \\%\\\\\n", "\t over 500 & New Discussion Tool & 820 & 34036 & 2.41 \\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 6 × 5\n", "\n", "| edit_count <fct> | editor_type <fct> | total_reverts <int> | total_comments <int> | revert_rate <chr> |\n", "|---|---|---|---|---|\n", "| under 100 | Page editing | 57971 | 733391 | 7.9 % |\n", "| under 100 | New Discussion Tool | 170 | 2732 | 6.22 % |\n", "| 100-500 | Page editing | 4602 | 107163 | 4.29 % |\n", "| 100-500 | New Discussion Tool | 63 | 1481 | 4.25 % |\n", "| over 500 | Page editing | 70569 | 5180483 | 1.36 % |\n", "| over 500 | New Discussion Tool | 820 | 34036 | 2.41 % |\n", "\n" ], "text/plain": [ " edit_count editor_type total_reverts total_comments revert_rate\n", "1 under 100 Page editing 57971 733391 7.9 % \n", "2 under 100 New Discussion Tool 170 2732 6.22 % \n", "3 100-500 Page editing 4602 107163 4.29 % \n", "4 100-500 New Discussion Tool 63 1481 4.25 % \n", "5 over 500 Page editing 70569 5180483 1.36 % \n", "6 over 500 New Discussion Tool 820 34036 2.41 % " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# revert rate for dt and page edits by experience level\n", "new_dt_reverts_byexp <- new_dt_reverts %>%\n", " filter(bot_by_group == 'false') %>%\n", " group_by(edit_count, editor_type) %>%\n", " summarise(total_reverts = sum(num_reverts),\n", " total_comments = sum(num_comments),\n", " revert_rate = paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop')\n", 
"\n", "new_dt_reverts_byexp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arabic and Czech Wikipedia" ] }, { "cell_type": "code", "execution_count": 322, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 4 × 5\n", "\\begin{tabular}{lllll}\n", " wiki & editor\\_type & total\\_reverts & total\\_comments & revert\\_rate\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t arwiki & Page editing & 2162 & 53863 & 4.01 \\%\\\\\n", "\t arwiki & New Discussion Tool & 27 & 1053 & 2.56 \\%\\\\\n", "\t cswiki & Page editing & 262 & 19364 & 1.35 \\%\\\\\n", "\t cswiki & New Discussion Tool & 3 & 272 & 1.1 \\% \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 4 × 5\n", "\n", "| wiki <chr> | editor_type <fct> | total_reverts <int> | total_comments <int> | revert_rate <chr> |\n", "|---|---|---|---|---|\n", "| arwiki | Page editing | 2162 | 53863 | 4.01 % |\n", "| arwiki | New Discussion Tool | 27 | 1053 | 2.56 % |\n", "| cswiki | Page editing | 262 | 19364 | 1.35 % |\n", "| cswiki | New Discussion Tool | 3 | 272 | 1.1 % |\n", "\n" ], "text/plain": [ " wiki editor_type total_reverts total_comments revert_rate\n", "1 arwiki Page editing 2162 53863 4.01 % \n", "2 arwiki New Discussion Tool 27 1053 2.56 % \n", "3 cswiki Page editing 262 19364 1.35 % \n", "4 cswiki New Discussion Tool 3 272 1.1 % " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# revert rate for dt and page edits on Arabic and Czech Wikipedia\n", "new_dt_reverts_bywiki <- new_dt_reverts %>%\n", " filter(bot_by_group == 'false',\n", " wiki %in% c('arwiki', 'cswiki')) %>%\n", " group_by(wiki, editor_type) %>%\n", " summarise(total_reverts = sum(num_reverts),\n", " total_comments = sum(num_comments),\n", " revert_rate = paste(round(total_reverts/total_comments * 100, 2), '%'), .groups = 'drop')\n", "\n", "new_dt_reverts_bywiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "Overall, the revert rate for comments posted with the new discussion tool is only slightly higher than the revert rate for comments posted with page editing on talk pages (2.75% compared to 2.21%). 
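
Each revert rate in these tables is the number of reverted comments divided by the number of comments posted with that editing method. The relative decrease reported for Junior Contributors compares two such rates. The base R sketch below recomputes these figures from the counts in the tables. The helper functions `revert_rate()` and `relative_change()` are hypothetical names introduced for illustration and are not part of the notebook.

```r
# Hypothetical helpers (not part of the notebook) that recompute the
# revert rates and their relative change from the counts in the tables.
revert_rate <- function(reverts, comments) {
  # percent of posted comments that were later reverted
  reverts / comments * 100
}

relative_change <- function(new_rate, baseline_rate) {
  # percent change of new_rate relative to baseline_rate
  (new_rate - baseline_rate) / baseline_rate * 100
}

# Overall
round(revert_rate(1053, 38249), 2)      # New Discussion Tool: 2.75
round(revert_rate(133142, 6021037), 2)  # Page editing: 2.21

# Contributors with under 100 cumulative edits
dt_rate   <- revert_rate(170, 2732)      # New Discussion Tool
page_rate <- revert_rate(57971, 733391)  # Page editing
round(relative_change(dt_rate, page_rate), 1)  # -21.3
```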
 \n", "\n", "However, the picture differs by experience level. For Junior Contributors (under 100 cumulative edits), the revert rate was lower with the new discussion tool than with page editing (6.22% compared to 7.9%), a 21.3% relative decrease. For contributors with 100-500 edits the two rates were nearly identical (4.25% compared to 4.29%), while for contributors with over 500 edits the revert rate was higher with the tool (2.41% compared to 1.36%).\n", "\n", "The new discussion tool also had a lower revert rate than page editing on both Arabic Wikipedia (2.56% compared to 4.01%) and Czech Wikipedia (1.1% compared to 1.35%), although the Czech figure is based on only 3 reverted comments." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Usage Metrics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are also interested in understanding who has been using the New Discussion Tool and how much they have been using it.\n", "\n", "For this analysis, we reviewed two metrics:\n", "- The percent of distinct contributors who published at least one new topic with the tool, measured both out of all distinct talk page contributors and out of all contributors that started a new topic during the reviewed time period.\n", "- For contributors that posted more than one new topic, at least one of them with the New Discussion Tool, the percent of distinct contributors that used the tool to create each of the following shares of their new topics within the time period:\n", " * 0%-25% of new topics\n", " * 26%-50% of new topics\n", " * 51%-75% of new topics\n", " * 76%-100% of new topics\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What percent of distinct contributors publish at least one new topic with the tool?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### All Talk Page Contributors\n", "\n", "We first reviewed the percent of distinct contributors that published at least one new topic with the new discussion tool out of all talk page contributors[^3].\n", "\n", "[^3]: This includes anyone that made at least one talk page edit (including posting new comments or sections, or editing existing comments) in any of the talk namespaces during the reviewed time period." 
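, "

User IDs are assigned separately on each wiki, so the notebook counts distinct contributors with a combined user-wiki key. The base R sketch below uses a small hypothetical data frame (all values invented) to show why that key matters. It also mirrors the `n_distinct(user[new_topic_edits >= 1])` count used later. It is an illustration, not the notebook's own code.

```r
# Hypothetical toy data: one row per user id per wiki per day (values invented).
talk_edits <- data.frame(
  user = c(42, 42, 42, 7),
  wiki = c('arwiki', 'arwiki', 'cswiki', 'cswiki'),
  new_topic_edits = c(1, 0, 0, 2)
)

# Counting the raw id alone merges user 42 on arwiki with a different
# account that happens to have id 42 on cswiki.
length(unique(talk_edits$user))  # 2

# A combined user-wiki key keeps the two accounts apart.
talk_edits$user_key <- paste(talk_edits$user, talk_edits$wiki, sep = '-')
length(unique(talk_edits$user_key))  # 3

# Distinct contributors who posted at least one new topic with the tool.
length(unique(talk_edits$user_key[talk_edits$new_topic_edits >= 1]))  # 2
```
"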
] }, { "cell_type": "code", "execution_count": 212, "metadata": {}, "outputs": [], "source": [ "# Collect each user's daily talk page edits and new topic edits over the deployment period, flagging bots (removed later)\n", "# use mediawiki_history as it includes all saved edits at a 100 percent sampling rate\n", "\n", "query <- \"\n", "\n", "SELECT\n", " to_date(event_timestamp) as `date`,\n", " wiki_db AS wiki,\n", " event_user_id AS `user`,\n", " max(size(event_user_is_bot_by) > 0 or size(event_user_is_bot_by_historical) > 0) as bot_by_group,\n", " CASE\n", " WHEN min(event_user_revision_count) < 100 THEN 'under 100'\n", " WHEN (min(event_user_revision_count) >= 100 AND min(event_user_revision_count) <= 500) THEN '100-500'\n", " ELSE 'over 500'\n", " END AS edit_count,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'discussiontools-newtopic') AS INT)) AS new_topic_edits,\n", " COUNT(*) AS all_talk_edits\n", "FROM wmf.mediawiki_history\n", "WHERE \n", " snapshot = '2021-07' \n", "-- include only desktop edits\n", " AND NOT array_contains(revision_tags, 'iOS')\n", " AND NOT array_contains(revision_tags, 'Android')\n", " AND NOT array_contains(revision_tags, 'Mobile Web')\n", "-- review all talk namespaces\n", " AND page_namespace_historical % 2 = 1 \n", "-- date of first deployment \n", " AND event_timestamp >= '2021-02-18' \n", " AND event_timestamp <= '2021-07-31' \n", " AND event_entity = 'revision' \n", " AND event_type = 'create' \n", "-- remove logged out users\n", " AND event_user_is_anonymous = FALSE\n", "GROUP BY\n", " to_date(event_timestamp),\n", " wiki_db,\n", " event_user_id \n", "\"" ] }, { "cell_type": "code", "execution_count": 213, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "discussion_tool_users <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 214, "metadata": {}, "outputs": [], "source": [ "write_csv(discussion_tool_users, file = 
'Data/discussion_tool_users.csv')" ] }, { "cell_type": "code", "execution_count": 218, "metadata": {}, "outputs": [], "source": [ "discussion_tool_users$date <- as.Date(discussion_tool_users$date, format = \"%Y-%m-%d\")" ] }, { "cell_type": "code", "execution_count": 216, "metadata": {}, "outputs": [], "source": [ "# reformat user-id and adjust to include wiki to account for duplicate user id instances.\n", "\n", "discussion_tool_users$user <-\n", " as.character(paste(discussion_tool_users$user, discussion_tool_users$wiki, sep =\"-\"))\n", "\n", "# set discussion tool factor levels\n", "discussion_tool_users$edit_count <-\n", " factor(discussion_tool_users$edit_count,\n", " levels = c(\"under 100\", \"100-500\", \"over 500\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overall" ] }, { "cell_type": "code", "execution_count": 251, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 1 × 2\n", "\\begin{tabular}{ll}\n", " new\\_discussion\\_users & new\\_discussion\\_edits\\\\\n", " & \\\\\n", "\\hline\n", "\t 5388 & 38261\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 2\n", "\n", "| new_discussion_users <int> | new_discussion_edits <int> |\n", "|---|---|\n", "| 5388 | 38261 |\n", "\n" ], "text/plain": [ " new_discussion_users new_discussion_edits\n", "1 5388 38261 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# overall numbers since deployment\n", "new_discussion_contributors <- discussion_tool_users %>%\n", " filter(bot_by_group == 'false') %>% # remove bots\n", " summarise(new_discussion_users = n_distinct(user[new_topic_edits >= 1]),\n", " new_discussion_edits = sum(new_topic_edits))\n", "\n", "new_discussion_contributors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since deployment as a beta feature on 18 February 2021, a total of 5,388 distinct users have posted at least one new topic using the new discussion tool, and a total of 38,261 new topics have been posted with the tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To put these numbers into context, we reviewed the percent of talk page contributors who posted at least one new topic using the new discussion tool during the reviewed time period. Note: for this calculation, we only reviewed the period after the new discussion tool became available on all wikis (from 17 March 2021)." ] }, { "cell_type": "code", "execution_count": 324, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " new\\_discussion\\_contributors & all\\_talk\\_contributors & pct\\_new\\_discussion\\_users\\\\\n", " & & \\\\\n", "\\hline\n", "\t 5185 & 207384 & 2.5\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| new_discussion_contributors <int> | all_talk_contributors <int> | pct_new_discussion_users <chr> |\n", "|---|---|---|\n", "| 5185 | 207384 | 2.5% |\n", "\n" ], "text/plain": [ " new_discussion_contributors all_talk_contributors pct_new_discussion_users\n", "1 5185 207384 2.5% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# pct talk page users\n", "new_discussion_contributors_pct <- discussion_tool_users %>%\n", " filter(bot_by_group == 'false',\n", " date >= '2021-03-17') %>% # first full day after the 16 March 2021 deployment\n", " summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),\n", " all_talk_contributors = n_distinct(user),\n", " pct_new_discussion_users = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%')\n", " )\n", "\n", "new_discussion_contributors_pct" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Experience Level" ] }, { "cell_type": "code", "execution_count": 348, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 3 × 4\n", "\\begin{tabular}{llll}\n", " edit\\_count & new\\_discussion\\_contributors & all\\_talk\\_contributors & pct\\_new\\_discussion\\_contributors\\\\\n", " & & & \\\\\n", "\\hline\n", "\t under 100 & 1052 & 140859 & 0.75\\%\\\\\n", "\t 100-500 & 877 & 24444 & 3.59\\%\\\\\n", "\t over 500 & 3480 & 48565 & 7.17\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 3 × 4\n", "\n", "| edit_count <fct> | new_discussion_contributors <int> | all_talk_contributors <int> | pct_new_discussion_contributors <chr> |\n", "|---|---|---|---|\n", "| under 100 | 1052 | 140859 | 0.75% |\n", "| 100-500 | 877 | 24444 | 3.59% |\n", "| over 500 | 3480 | 48565 | 7.17% |\n", "\n" ], "text/plain": [ " edit_count new_discussion_contributors all_talk_contributors\n", "1 under 100 1052 140859 \n", "2 100-500 877 24444 \n", "3 over 500 3480 48565 \n", " pct_new_discussion_contributors\n", "1 0.75% \n", "2 3.59% \n", "3 7.17% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# pct talk page users by experience level\n", "# note: edit_count is assigned per user per day, so a user who crossed an\n", "# edit-count threshold during the period is counted in more than one group\n", "new_discussion_contributors_pct_byexp <- discussion_tool_users %>%\n", " filter(bot_by_group == 'false',\n", " date >= '2021-03-17') %>% # first full day after the 16 March 2021 deployment\n", " group_by(edit_count) %>%\n", " summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),\n", " all_talk_contributors = n_distinct(user),\n", " pct_new_discussion_contributors = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%'), .groups = 'drop'\n", " )\n", "\n", "new_discussion_contributors_pct_byexp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arabic and Czech Wikipedia" ] }, { "cell_type": "code", "execution_count": 349, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 2 × 4\n", "\\begin{tabular}{llll}\n", " wiki & new\\_discussion\\_contributors & all\\_talk\\_contributors & pct\\_new\\_discussion\\_contributors\\\\\n", " & & & \\\\\n", "\\hline\n", "\t arwiki & 62 & 7081 & 0.88\\%\\\\\n", "\t cswiki & 30 & 1674 & 1.79\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 4\n", "\n", "| wiki <chr> | new_discussion_contributors <int> | all_talk_contributors <int> | pct_new_discussion_contributors <chr> |\n", "|---|---|---|---|\n", "| arwiki | 62 | 7081 | 0.88% |\n", "| cswiki | 30 | 1674 | 1.79% |\n", "\n" ], "text/plain": [ " wiki new_discussion_contributors all_talk_contributors\n", "1 arwiki 62 7081 \n", "2 cswiki 30 1674 \n", " pct_new_discussion_contributors\n", "1 0.88% \n", "2 1.79% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_discussion_contributors_pct_bywikis <- discussion_tool_users %>%\n", " filter(bot_by_group == 'false',\n", " wiki %in% c('arwiki', 'cswiki')) %>% # no date filter: the tool has been available on both wikis since the first deployment (18 February 2021)\n", " group_by(wiki) %>%\n", " summarise(new_discussion_contributors = n_distinct(user[new_topic_edits >= 1]),\n", " all_talk_contributors = n_distinct(user),\n", " pct_new_discussion_contributors = paste0(round(new_discussion_contributors/all_talk_contributors * 100, 2), '%'), .groups = 'drop'\n", " )\n", "\n", "new_discussion_contributors_pct_bywikis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Overall, 2.5% of all talk page contributors posted at least one new topic using the new discussion tool between 17 March 2021 (after the tool became available on all wikis as an opt-in beta feature) and the end of July 2021.\n", "\n", "Senior contributors are the most frequent users of the tool: 7.2% of contributors with over 500 edits who edited a talk page during the reviewed time period posted at least one new topic with the new discussion tool, compared to 3.6% of contributors with 100-500 edits and 0.75% of contributors with under 100 edits. 
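
The experience level is assigned per user per day, from the minimum edit count on that day. A contributor who crosses an edit-count threshold during the period is therefore counted in more than one group. This is why the group totals in the experience-level table (5,409 tool users and 213,868 talk page contributors) exceed the overall totals (5,185 and 207,384). The base R sketch below illustrates the effect on hypothetical toy data. The column names mirror the notebook, but the values and the helper `share_with_tool()` are invented for illustration.

```r
# Hypothetical toy data: one row per user-wiki key per day, with the
# experience bucket assigned per day as in the notebook query.
daily <- data.frame(
  user = c('1-arwiki', '1-arwiki', '2-cswiki', '3-cswiki'),
  edit_count = c('under 100', '100-500', 'over 500', 'under 100'),
  new_topic_edits = c(0, 1, 2, 0)
)

# Percent of distinct talk page contributors with at least one tool topic
share_with_tool <- function(rows) {
  with_tool <- length(unique(rows$user[rows$new_topic_edits >= 1]))
  all_users <- length(unique(rows$user))
  round(with_tool / all_users * 100, 2)
}

share_with_tool(daily)  # 66.67 (2 of 3 distinct users)

# Per experience group: user 1-arwiki crossed 100 edits mid-period,
# so it is counted in both 'under 100' and '100-500'.
by_group <- sapply(split(daily, daily$edit_count), share_with_tool)
print(by_group)

# Summing distinct users per group double counts user 1-arwiki
sum(sapply(split(daily$user, daily$edit_count), function(u) length(unique(u))))  # 4
```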
 \n", "\n", "Usage of the new discussion tool on Arabic and Czech Wikipedia is somewhat low: since the first deployment on 18 February 2021, only 0.88% of talk page contributors on Arabic Wikipedia and 1.79% of talk page contributors on Czech Wikipedia posted a new topic with the new discussion tool.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New Section Usage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the analysis below, we again reviewed the percent of distinct contributors that published at least one new topic with the new discussion tool, but limited the review to contributors that created a new topic on a talk page during the reviewed time period.\n", "\n", "We used EditAttemptStep data for this analysis, as it allows us to distinguish edits to existing sections from edits that create new sections.\n" ] }, { "cell_type": "code", "execution_count": 229, "metadata": {}, "outputs": [], "source": [ "query <-\n", "\"\n", "SELECT \n", " CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) as `date`,\n", " wiki AS wiki,\n", " event.user_id AS `user`,\n", " CASE\n", " WHEN min(event.user_editcount) < 100 THEN 'under 100'\n", " WHEN (min(event.user_editcount) >= 100 AND min(event.user_editcount) <= 500) THEN '100-500'\n", " ELSE 'over 500'\n", " END AS edit_count,\n", "-- new section edits with page editing\n", " SUM(CAST(event.integration = 'page' AND (event.init_mechanism = 'url-new' OR event.init_mechanism = 'new') AS INT)) AS page_edit,\n", "-- new discussion tool edits\n", " SUM(CAST(event.integration = 'discussiontools' AS INT)) AS dt_edit\n", "FROM event_sanitized.editattemptstep\n", "WHERE\n", "-- section edit attempts (init events)\n", " event.action = 'init'\n", " AND event.init_type = 'section'\n", " AND year = 2021\n", "-- review events following deployment\n", " AND dt >= '2021-02-18'\n", " AND dt <= '2021-07-31'\n", "-- desktop only\n", " AND event.platform = 'desktop'\n", "-- review all talk namespaces\n", " AND event.page_ns % 2 = 1\n", "-- remove logged out users\n", " AND event.user_id != 0\n", "GROUP BY\n", " CONCAT(year, '-', 
LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')),\n", " wiki, \n", " event.user_id\n", "\"\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "new_section_contributors <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 231, "metadata": {}, "outputs": [], "source": [ "new_section_contributors$date <- as.Date(new_section_contributors$date, format = \"%Y-%m-%d\")" ] }, { "cell_type": "code", "execution_count": 232, "metadata": {}, "outputs": [], "source": [ "# reformat user-id and adjust to include wiki to account for duplicate user id instances.\n", "\n", "new_section_contributors$user <-\n", " as.character(paste(new_section_contributors$user, new_section_contributors$wiki, sep =\"-\"))\n", "\n", "# set edit count factor levels\n", "new_section_contributors$edit_count <-\n", " factor(new_section_contributors$edit_count,\n", " levels = c(\"under 100\", \"100-500\", \"over 500\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overall" ] }, { "cell_type": "code", "execution_count": 233, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " page\\_editors & dt\\_editor & pct\\_dt\\_editors\\\\\n", " & & \\\\\n", "\\hline\n", "\t 19659 & 5688 & 22.44\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| page_editors <int> | dt_editor <int> | pct_dt_editors <chr> |\n", "|---|---|---|\n", "| 19659 | 5688 | 22.44% |\n", "\n" ], "text/plain": [ " page_editors dt_editor pct_dt_editors\n", "1 19659 5688 22.44% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# note: users who started new topics with both page editing and the tool are counted\n", "# in both page_editors and dt_editor, so the denominator can count them twice\n", "new_topic_edits <- new_section_contributors %>%\n", "# first full day after the 16 March 2021 deployment\n", " filter(date >= '2021-03-17') %>%\n", " summarize(page_editors = n_distinct(user[page_edit >= 1]),\n", " dt_editor = n_distinct(user[dt_edit >= 1]),\n", " pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100, 2), '%')\n", " )\n", "\n", "new_topic_edits" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Experience Level" ] }, { "cell_type": "code", "execution_count": 350, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 3 × 4\n", "\\begin{tabular}{llll}\n", " edit\\_count & page\\_editors & dt\\_editor & pct\\_dt\\_editors\\\\\n", " & & & \\\\\n", "\\hline\n", "\t under 100 & 10364 & 1459 & 12.34\\%\\\\\n", "\t 100-500 & 1949 & 947 & 32.7\\% \\\\\n", "\t over 500 & 7595 & 3496 & 31.52\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 3 × 4\n", "\n", "| edit_count <fct> | page_editors <int> | dt_editor <int> | pct_dt_editors <chr> |\n", "|---|---|---|---|\n", "| under 100 | 10364 | 1459 | 12.34% |\n", "| 100-500 | 1949 | 947 | 32.7% |\n", "| over 500 | 7595 | 3496 | 31.52% |\n", "\n" ], "text/plain": [ " edit_count page_editors dt_editor pct_dt_editors\n", "1 under 100 10364 1459 12.34% \n", "2 100-500 1949 947 32.7% \n", "3 over 500 7595 3496 31.52% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_topic_edits_byexperience <- new_section_contributors %>%\n", "# date released to all wikis\n", " filter(date >= '2021-03-17') %>%\n", " group_by(edit_count) %>%\n", " summarize(page_editors = n_distinct(user[page_edit >= 1]),\n", " dt_editor = n_distinct(user[dt_edit >=1]),\n", " pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100,2), '%'),.groups = 'drop'\n", " )\n", "\n", "new_topic_edits_byexperience " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arabic and Czech Wikipedia" ] }, { "cell_type": "code", "execution_count": 351, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 2 × 4\n", "\\begin{tabular}{llll}\n", " wiki & page\\_editors & dt\\_editor & pct\\_dt\\_editors\\\\\n", " & & & \\\\\n", "\\hline\n", "\t arwiki & 387 & 93 & 19.38\\%\\\\\n", "\t cswiki & 126 & 32 & 20.25\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 4\n", "\n", "| wiki <chr> | page_editors <int> | dt_editor <int> | pct_dt_editors <chr> |\n", "|---|---|---|---|\n", "| arwiki | 387 | 93 | 19.38% |\n", "| cswiki | 126 | 32 | 20.25% |\n", "\n" ], "text/plain": [ " wiki page_editors dt_editor pct_dt_editors\n", "1 arwiki 387 93 19.38% \n", "2 cswiki 126 32 20.25% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_topic_edits_bywiki <- new_section_contributors %>%\n", "# Arabic and Czech Wikipedia only; no date filter, as the tool was available on both wikis from 18 February 2021\n", " filter(wiki %in% c('arwiki', 'cswiki')) %>%\n", " group_by(wiki) %>%\n", " summarize(page_editors = n_distinct(user[page_edit >= 1]),\n", " dt_editor = n_distinct(user[dt_edit >= 1]),\n", " pct_dt_editors = paste0(round(dt_editor/(dt_editor + page_editors) * 100, 2), '%'), .groups = 'drop'\n", " )\n", "\n", "new_topic_edits_bywiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "During the reviewed time period, 22.4% of all contributors that created a new topic on a talk page posted at least one new topic using the new discussion tool.\n", "\n", "Senior contributors more commonly used the tool at least once to create a new topic than Junior Contributors did: 32.7% of contributors with 100-500 edits and 31.5% of contributors with over 500 edits that created a new topic on a talk page posted at least one of their new topics using the new discussion tool, compared to 12.3% of contributors with under 100 edits.\n", "\n", "Similar to the proportion across all wikis, 19.4% of contributors on Arabic Wikipedia and 20.3% of contributors on Czech Wikipedia that posted a new topic used the new discussion tool at least once." 
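, "

The percentages in the tables above divide the number of distinct tool users by the sum of distinct page-editing users and distinct tool users. A contributor who started new topics with both methods is counted in both terms, so this denominator can exceed the number of distinct new-topic creators. The base R sketch below first recomputes two published figures from the table counts. It then uses hypothetical toy data to contrast the notebook's denominator with a count of distinct users. The helper `pct_dt()` and the alternative denominator are illustrations, not the method used above.

```r
# Hypothetical helper reproducing the notebook's share: dt / (dt + page)
pct_dt <- function(dt_users, page_users) {
  round(dt_users / (dt_users + page_users) * 100, 2)
}

pct_dt(5688, 19659)  # overall: 22.44
pct_dt(32, 126)      # cswiki: 20.25

# Hypothetical toy data: users who started new topics with each method.
page_users <- c('a', 'b', 'c')
dt_users   <- c('c', 'd')  # user c used both methods

# Notebook-style denominator counts user c twice (2 + 3 = 5)
length(dt_users) / (length(dt_users) + length(page_users)) * 100  # 40
# Illustrative alternative: distinct new-topic creators (4)
length(dt_users) / length(union(dt_users, page_users)) * 100      # 50
```
"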
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## For contributors that have posted more than one new topic, what percent of distinct contributors used the New Discussion Tool to create each of the following shares of their new topics within the time period?[^4]\n", "\n", "Purpose: How much are people using it? This metric helps us understand how often people chose to use the New Discussion Tool relative to the number of opportunities they had to use it. For this analysis, we limited our review to contributors that had access to the tool and used it at least once.\n", "\n", " * 0%-25% of new topics\n", " * 26%-50% of new topics\n", " * 51%-75% of new topics\n", " * 76%-100% of new topics\n", "\n", "\n", "[^4]: This metric has some noise, as different usage patterns can look the same in the data. For example, Person A added two new topics to talk pages in the reviewed timeframe, one of which was with the new discussion tool, while Person B added a total of 150 new topics, 75 of which were with the new discussion tool. Both contributors used the tool for 50% of their new topics and fall in the same group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How many contributors posted just one new topic using the new discussion tool?" ] }, { "cell_type": "code", "execution_count": 329, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " one\\_time\\_editors & all\\_editors & pct\\_1\\_dt\\_edit\\\\\n", " & & \\\\\n", "\\hline\n", "\t 5166 & 5688 & 90.82\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| one_time_editors <int> | all_editors <int> | pct_1_dt_edit <chr> |\n", "|---|---|---|\n", "| 5166 | 5688 | 90.82% |\n", "\n" ], "text/plain": [ " one_time_editors all_editors pct_1_dt_edit\n", "1 5166 5688 90.82% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_dt_contributors_1edit <- new_section_contributors %>%\n", " filter(date >= '2021-03-17') %>%\n", " summarise(one_time_editors = n_distinct(user[dt_edit ==1]),\n", " all_editors = n_distinct(user[dt_edit >= 1]),\n", " pct_1_dt_edit = paste0(round(one_time_editors/all_editors * 100, 2), \"%\") )\n", " \n", " \n", "new_dt_contributors_1edit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most contributors (90.82%) that used the new discussion tool posted just one new topic with the tool during the reviewed timeframe." 
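, "

Whether a contributor counts as a one-time user depends on whether new topics are counted per day or over the whole period. The cell above applies `dt_edit == 1` to rows of `new_section_contributors`, and each row holds one user-wiki key per day. The base R sketch below uses hypothetical toy data to contrast that per-row condition with a per-user total over the period. It only illustrates the difference and is not the method used for the figure above.

```r
# Hypothetical toy data: one row per user-wiki key per day, like new_section_contributors.
daily <- data.frame(
  user    = c('1-arwiki', '1-arwiki', '2-cswiki', '3-cswiki'),
  dt_edit = c(1, 1, 1, 0)
)

# Per-row condition, as in the cell above: both tool users have a day
# with exactly one tool topic, so both count as one-time users.
length(unique(daily$user[daily$dt_edit == 1])) /
  length(unique(daily$user[daily$dt_edit >= 1])) * 100  # 100

# Per-user totals over the whole period: 1-arwiki posted two tool topics.
totals <- tapply(daily$dt_edit, daily$user, sum)
one_time_users <- sum(totals == 1)  # only 2-cswiki
tool_users     <- sum(totals >= 1)  # 1-arwiki and 2-cswiki
one_time_users / tool_users * 100   # 50
```
"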
] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [], "source": [ "#Divide new discussion tool edits into groups\n", "b <- c(0, 25, 50, 75, 100)\n", "names <- c('1-25 percent', '26-50 percent', '51-75 percent', '76-100 percent')" ] }, { "cell_type": "code", "execution_count": 352, "metadata": {}, "outputs": [], "source": [ "new_dt_contributors_prop <- new_section_contributors %>%\n", " filter(date >= '2021-03-17') %>%\n", " filter(dt_edit >= 1,\n", " page_edit + dt_edit > 1) %>% # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic\n", " group_by(user) %>% \n", " summarise(dt_edit = sum(dt_edit),\n", " page_edit = sum(page_edit),\n", " pct_dt_edit = dt_edit/(dt_edit + page_edit) * 100,\n", " new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names) ,.groups = 'drop'\n", " )\n", " " ] }, { "cell_type": "code", "execution_count": 354, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 3 × 3\n", "\\begin{tabular}{lll}\n", " new\\_discussion\\_edits\\_group & n\\_users & pct\\_new\\_discussion\\_contributors\\\\\n", " & & \\\\\n", "\\hline\n", "\t 26-50 percent & 57 & 2.63\\% \\\\\n", "\t 51-75 percent & 45 & 2.08\\% \\\\\n", "\t 76-100 percent & 2065 & 95.29\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 3 × 3\n", "\n", "| new_discussion_edits_group <fct> | n_users <int> | pct_new_discussion_contributors <chr> |\n", "|---|---|---|\n", "| 26-50 percent | 57 | 2.63% |\n", "| 51-75 percent | 45 | 2.08% |\n", "| 76-100 percent | 2065 | 95.29% |\n", "\n" ], "text/plain": [ " new_discussion_edits_group n_users pct_new_discussion_contributors\n", "1 26-50 percent 57 2.63% \n", "2 51-75 percent 45 2.08% \n", "3 76-100 percent 2065 95.29% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Breakdown of contributors by percent use\n", "\n", "prop_new_dt_overall <- new_dt_contributors_prop %>%\n", " group_by(new_discussion_edits_group ) %>%\n", " summarise(n_users = n(),.groups = 'drop') %>%\n", " mutate(pct_new_discussion_contributors = paste0(round(n_users/sum(n_users) * 100, 2), \"%\")\n", " )\n", "\n", "prop_new_dt_overall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Experience Level" ] }, { "cell_type": "code", "execution_count": 355, "metadata": {}, "outputs": [], "source": [ "new_dt_contributors_prop_exp <- new_section_contributors %>%\n", " filter(date >= '2021-03-17') %>%\n", " filter(dt_edit >= 1,\n", " page_edit + dt_edit > 1) %>% # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic\n", " group_by(user, edit_count) %>% \n", " summarise(dt_edit = sum(dt_edit),\n", " page_edit = sum(page_edit),\n", " pct_dt_edit = dt_edit/(dt_edit + page_edit) * 100,\n", " new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names),.groups = 'drop' \n", " )\n" ] }, { "cell_type": "code", "execution_count": 
359, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'edit_count' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 9 × 4\n", "\\begin{tabular}{llll}\n", " edit\\_count & new\\_discussion\\_edits\\_group & n\\_users & pct\\_new\\_discussion\\_contributors\\\\\n", " & & & \\\\\n", "\\hline\n", "\t under 100 & 26-50 percent & 19 & 5.18\\% \\\\\n", "\t under 100 & 51-75 percent & 19 & 5.18\\% \\\\\n", "\t under 100 & 76-100 percent & 329 & 89.65\\%\\\\\n", "\t 100-500 & 26-50 percent & 11 & 3.81\\% \\\\\n", "\t 100-500 & 51-75 percent & 6 & 2.08\\% \\\\\n", "\t 100-500 & 76-100 percent & 272 & 94.12\\%\\\\\n", "\t over 500 & 26-50 percent & 30 & 1.91\\% \\\\\n", "\t over 500 & 51-75 percent & 24 & 1.53\\% \\\\\n", "\t over 500 & 76-100 percent & 1518 & 96.56\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 9 × 4\n", "\n", "| edit_count <fct> | new_discussion_edits_group <fct> | n_users <int> | pct_new_discussion_contributors <chr> |\n", "|---|---|---|---|\n", "| under 100 | 26-50 percent | 19 | 5.18% |\n", "| under 100 | 51-75 percent | 19 | 5.18% |\n", "| under 100 | 76-100 percent | 329 | 89.65% |\n", "| 100-500 | 26-50 percent | 11 | 3.81% |\n", "| 100-500 | 51-75 percent | 6 | 2.08% |\n", "| 100-500 | 76-100 percent | 272 | 94.12% |\n", "| over 500 | 26-50 percent | 30 | 1.91% |\n", "| over 500 | 51-75 percent | 24 | 1.53% |\n", "| over 500 | 76-100 percent | 1518 | 96.56% |\n", "\n" ], "text/plain": [ " edit_count new_discussion_edits_group n_users pct_new_discussion_contributors\n", "1 under 100 26-50 percent 19 5.18% \n", "2 under 100 51-75 percent 19 5.18% \n", "3 under 100 76-100 percent 329 89.65% \n", "4 100-500 26-50 percent 11 3.81% \n", "5 100-500 51-75 percent 6 2.08% \n", "6 100-500 76-100 percent 272 94.12% \n", "7 over 500 26-50 percent 30 1.91% \n", "8 over 500 51-75 percent 24 1.53% \n", "9 over 500 76-100 percent 1518 96.56% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Breakdown of contributors by percent use\n", "\n", "prop_new_dt_byexperience <- 
new_dt_contributors_prop_exp %>%\n", " group_by(edit_count, new_discussion_edits_group) %>%\n", " # .groups = 'drop_last' keeps the edit_count grouping, so percentages sum to 100 within each experience level\n", " summarise(n_users = n(), .groups = 'drop_last') %>%\n", " mutate(pct_new_discussion_contributors = paste0(round(n_users/sum(n_users) * 100, 2), \"%\")\n", " )\n", "\n", "prop_new_dt_byexperience" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arabic and Czech Wikipedias" ] }, { "cell_type": "code", "execution_count": 360, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 3 × 4\n", "\\begin{tabular}{llll}\n", " edit\\_count & one\\_time\\_editors & all\\_editors & pct\\_1\\_dt\\_edit\\\\\n", " & & & \\\\\n", "\\hline\n", "\t under 100 & 36 & 47 & 76.6\\% \\\\\n", "\t 100-500 & 8 & 10 & 80\\% \\\\\n", "\t over 500 & 64 & 69 & 92.75\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 3 × 4\n", "\n", "| edit_count <fct> | one_time_editors <int> | all_editors <int> | pct_1_dt_edit <chr> |\n", "|---|---|---|---|\n", "| under 100 | 36 | 47 | 76.6% |\n", "| 100-500 | 8 | 10 | 80% |\n", "| over 500 | 64 | 69 | 92.75% |\n", "\n" ], "text/plain": [ " edit_count one_time_editors all_editors pct_1_dt_edit\n", "1 under 100 36 47 76.6% \n", "2 100-500 8 10 80% \n", "3 over 500 64 69 92.75% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_dt_contributors_1edit_bywiki <- new_section_contributors %>%\n", " filter(wiki %in% c('arwiki', 'cswiki')) %>%\n", " group_by(edit_count) %>%\n", " summarise(one_time_editors = n_distinct(user[dt_edit ==1]),\n", " all_editors = n_distinct(user[dt_edit >= 1]),\n", " pct_1_dt_edit = paste0(round(one_time_editors/all_editors * 100, 2), \"%\"),.groups = 'drop' )\n", " \n", " \n", "new_dt_contributors_1edit_bywiki" ] }, { "cell_type": "code", "execution_count": 361, "metadata": {}, "outputs": [], "source": [ "new_dt_contributors_prop_wiki <- new_section_contributors %>%\n", " filter(dt_edit >= 1,\n", " page_edit + dt_edit > 1,\n", " wiki %in% c('arwiki', 'cswiki')) %>% # only editors that have posted at least 1 new topic with the tool and posted more than 1 new topic\n", " group_by(user, wiki) %>% \n", " summarise(dt_edit = sum(dt_edit),\n", " page_edit = sum(page_edit),\n", " pct_dt_edit = dt_edit/(dt_edit + page_edit) * 100,\n", " new_discussion_edits_group = cut(pct_dt_edit, breaks = b, labels = names),.groups = 'drop' \n", " )\n" ] }, { "cell_type": "code", "execution_count": 364, "metadata": {}, "outputs": [ { "name": "stderr", 
"output_type": "stream", "text": [ "`summarise()` regrouping output by 'wiki' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 5 × 4\n", "\\begin{tabular}{llll}\n", " wiki & new\\_discussion\\_edits\\_group & n\\_users & percent\\_new\\_dt\\_users\\\\\n", " & & & \\\\\n", "\\hline\n", "\t arwiki & 26-50 percent & 2 & 5.13\\% \\\\\n", "\t arwiki & 51-75 percent & 1 & 2.56\\% \\\\\n", "\t arwiki & 76-100 percent & 36 & 92.31\\%\\\\\n", "\t cswiki & 51-75 percent & 3 & 13.64\\%\\\\\n", "\t cswiki & 76-100 percent & 19 & 86.36\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 5 × 4\n", "\n", "| wiki <chr> | new_discussion_edits_group <fct> | n_users <int> | percent_new_dt_users <chr> |\n", "|---|---|---|---|\n", "| arwiki | 26-50 percent | 2 | 5.13% |\n", "| arwiki | 51-75 percent | 1 | 2.56% |\n", "| arwiki | 76-100 percent | 36 | 92.31% |\n", "| cswiki | 51-75 percent | 3 | 13.64% |\n", "| cswiki | 76-100 percent | 19 | 86.36% |\n", "\n" ], "text/plain": [ " wiki new_discussion_edits_group n_users percent_new_dt_users\n", "1 arwiki 26-50 percent 2 5.13% \n", "2 arwiki 51-75 percent 1 2.56% \n", "3 arwiki 76-100 percent 36 92.31% \n", "4 cswiki 51-75 percent 3 13.64% \n", "5 cswiki 76-100 percent 19 86.36% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Breakdown of contributors by percent use\n", "\n", "prop_new_dt_bywiki <- new_dt_contributors_prop_wiki %>%\n", " group_by(wiki, new_discussion_edits_group ) %>%\n", " summarise(n_users = n(),.groups = NULL) %>%\n", " mutate(percent_new_dt_users = paste0(round(n_users/sum(n_users) * 100, 2), \"%\")\n", " )\n", "\n", "prop_new_dt_bywiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "Most contributors (90.82%) that used the new discussion tool posted just one new topic with the tool during the reviewed timeframe. 
Of the contributors who posted more than one new topic on a talk page, 95.3% posted between 76 and 100 percent of their new topics using the new discussion tool. This suggests that these contributors generally chose to use the tool when presented with an opportunity to start a new topic.\n", "\n", "Across all three experience levels, over 89% of contributors who posted more than one new topic made 76-100 percent of their new topics with the new discussion tool. Senior contributors (over 500 edits) made the highest proportion of their new topics with the tool (96.56% made 76-100 percent of their new topics with it), compared to junior contributors (under 100 edits; 89.65%).\n", "\n", "On Arabic and Czech Wikipedias, the majority of contributors also posted 76-100 percent of their new topics using the new discussion tool (92.31% on Arabic Wikipedia and 86.36% on Czech Wikipedia)." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 4 }