{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Content Translation Article Deletion Ratios Across All Wikipedias\n", "\n", "[Task](https://phabricator.wikimedia.org/T286636)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From task description:\n", "\n", "\"Across all languages, Wikipedia articles created with Content Translation are deleted less often than those created from scratch. For example, in 2020, 3% of new translations were deleted, compared to 12% of other new articles. However, this is not the case for all Wikipedias and some specific wikis have a higher deletion rate for translations. For example, for Indonesian ([T219851#5914691](https://phabricator.wikimedia.org/T219851#5914691)) and Telugu ([T244769](https://phabricator.wikimedia.org/T244769)) the deletion ratios for Content Translation were higher compared to other articles created in these wikis.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Purpose\n", "\n", "The purpose of this analysis is to identify and list the number of wikis where the deletion rate of articles created with content translation is higher than the deletion rate for articles created with other tools. Specifically, we want to answer the following questions:\n", "\n", "* How many wikis have translations deleted more often than regular articles?\n", "* Which are these wikis?\n", "* Has the number of those wikis reduced compared to the previous period?\n", "* How high is the highest deletion ratio a wiki has for translations?\n", "\n", "This analysis will be used as a baseline to assess the evolution of deletion rates as improvements are made. 
\n", "\n", "Results are updated quarterly and documented on [wiki](https://www.mediawiki.org/wiki/Content_translation/Deletion_statistics_comparison).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data\n", "\n", "Data comes from the [mediawiki_history](https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_history) table and reflects the deletion ratios of main namespace articles that were created using Content Translation compared to the deletion ratio for main namespace articles created without the tool. Bots were excluded. \n", "\n", "This data is collected quarterly (every three months) to assess the evolution of deletion rates as improvements are made. This timespan was selected to capture sufficient time for editors to review content and to avoid seasonality effects.\n", "\n", "**Wiki size threshold**: We removed wikis where 15 or fewer articles were created with content translation during the reviewed period to reduce noise in the data and focus on wikis with more representative data. 
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))\n", "shhh({\n", " library(tidyverse);\n", " # Tables:\n", " library(gt);\n", " library(gtsummary);\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Quarterly Comparison" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#FIXME: Update with parameters\n", "#FIXME: Investigate ability to add time contraint for when the page was deleted" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Update with time period you wish to review\n", "# Q2 October - December 2021\n", "\n", "mw_snapshot <- '2022-03'\n", "start_dt <- '2022-01-01'\n", "end_dt <- '2022-03-31'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "\n", "query <-\n", "\"\n", "-- find both cx and non-cx created articles \n", "WITH created_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS created_cx,\n", " COUNT(*) AS created_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2022-03'\n", " AND event_timestamp BETWEEN '2022-01-01' and '2022-03-31'\n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", " AND event_type = 'create' \n", "GROUP BY \n", " wiki_db\n", "),\n", "\n", "--find all deleted articles that were created with cx \n", "\n", "deleted_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS deleted_cx,\n", " COUNT(*) AS deleted_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2022-03'\n", " AND event_timestamp BETWEEN '2022-01-01' and 
'2022-03-31'\n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", "-- find revisions moved to the archive table\n", " AND event_type = 'create'\n", " AND revision_is_deleted_by_page_deletion = TRUE\n", "-- remove all bots\n", " AND SIZE(event_user_is_bot_by_historical) = 0 -- not a bot\n", "GROUP BY \n", " wiki_db\n", ")\n", "\n", "-- main query to aggregate and join sources above\n", "SELECT\n", " created_articles.wiki,\n", " created_cx,\n", " (created_total - created_cx) AS created_non_cx,\n", " deleted_cx,\n", " (deleted_total - deleted_cx) AS deleted_non_cx\n", "FROM created_articles\n", "JOIN deleted_articles ON \n", " created_articles.wiki = deleted_articles.wiki\n", "\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "cx_deletion_ratio <- wmfdata::query_hive(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overall Quarterly Deletion Ratio" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A data.frame: 1 × 3
| deleted_cx_pct <chr> | deleted_non_cx_pct <chr> | deletion_pct_diff <chr> |
|---|---|---|
| 3.08% | 4.97% | 1.89% |
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " deleted\\_cx\\_pct & deleted\\_non\\_cx\\_pct & deletion\\_pct\\_diff\\\\\n", " & & \\\\\n", "\\hline\n", "\t 3.08\\% & 4.97\\% & 1.89\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| deleted_cx_pct <chr> | deleted_non_cx_pct <chr> | deletion_pct_diff <chr> |\n", "|---|---|---|\n", "| 3.08% | 4.97% | 1.89% |\n", "\n" ], "text/plain": [ " deleted_cx_pct deleted_non_cx_pct deletion_pct_diff\n", "1 3.08% 4.97% 1.89% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_ratio_overall <- cx_deletion_ratio %>%\n", " #filter(created_cx > 15) %>% # remove wikis with 15 or fewer articles created using cx\n", " summarise(deleted_cx_pct = paste0(round(sum(deleted_cx)/sum(created_cx) * 100, 2), \"%\"),\n", " deleted_non_cx_pct = paste0(round(sum(deleted_non_cx)/sum(created_non_cx) * 100, 2), \"%\"),\n", " deletion_pct_diff = paste0(round((sum(deleted_non_cx)/sum(created_non_cx)*100)-((sum(deleted_cx)/sum(created_cx))*100), 2),\"%\")\n", " )\n", "\n", "cx_deletion_ratio_overall\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Wiki\n", " " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Add columns with calculated deletion ratio\n", "\n", "cx_deletion_ratio_bywiki <- cx_deletion_ratio %>%\n", " #filter(wiki == 'arwiki') %>% # use to find ratios for single wiki\n", " filter(created_cx > 15) %>% # remove wikis with 15 or fewer articles created using cx\n", " mutate(deleted_cx_ratio = deleted_cx/created_cx, \n", " deleted_non_cx_ratio = deleted_non_cx/created_non_cx, \n", " deletion_ratio_diff = ((deleted_non_cx/created_non_cx)-(deleted_cx/created_cx)\n", " ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How many wikis have translations deleted more often than regular articles?" 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "cx_deletion_higher <- cx_deletion_ratio_bywiki %>%\n", " filter(deletion_ratio_diff < 0) %>% #find wikis with higher cx deletion ratio\n", " summarise(total_wikis = n())\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Across all wikis where more than 15 articles have been created with content translation in Q3, there were 13 wikis where articles created with content translation were deleted more than articles created without cx\"\n" ] } ], "source": [ "print(paste0(\"Across all wikis where more than 15 articles have been created with content translation in Q3, there were \", \n", " cx_deletion_higher[1], \n", " \" wikis where articles created with content translation were deleted more than articles created without cx\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Which are these wikis?" 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "cx_deletion_higher_list <- cx_deletion_ratio_bywiki %>%\n", " filter(deletion_ratio_diff < 0)%>% # only wikis where cx deletion ratio is higher\n", " arrange(deletion_ratio_diff) #sort by highest deletion ratio difference\n", " " ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Wikis with higher deletion ratios for articles created with Content Translation
Reviewed Time Period: January 2022 through March 2022 (Q3)
Wiki project1\n", " Created Articles\n", " \n", " Deleted Articles\n", " \n", " Deletion Ratios\n", "
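| | Created CX Articles | Created non-CX Articles | Deleted CX Articles | Deleted non-CX Articles | CX Articles Deletion Ratio | Non-CX Articles Deletion Ratio | Deletion Ratio Difference |
|---|---|---|---|---|---|---|---|
| etwiki | 17 | 4503 | 5 | 528 | 29.41% | 11.73% | −17.69% |
| skwiki | 29 | 2587 | 8 | 424 | 27.59% | 16.39% | −11.20% |
| mnwiki | 17 | 378 | 11 | 214 | 64.71% | 56.61% | −8.09% |
| bewiki | 165 | 5291 | 15 | 99 | 9.09% | 1.87% | −7.22% |
| ltwiki | 31 | 2996 | 10 | 754 | 32.26% | 25.17% | −7.09% |
| astwiki | 27 | 1833 | 3 | 105 | 11.11% | 5.73% | −5.38% |
| ckbwiki | 69 | 7850 | 4 | 41 | 5.80% | 0.52% | −5.27% |
| eowiki | 202 | 10469 | 4 | 67 | 1.98% | 0.64% | −1.34% |
| bclwiki | 176 | 508 | 2 | 3 | 1.14% | 0.59% | −0.55% |
| bswiki | 145 | 2143 | 7 | 92 | 4.83% | 4.29% | −0.53% |
| ttwiki | 497 | 95099 | 2 | 33 | 0.40% | 0.03% | −0.37% |
| arzwiki | 659 | 55894 | 4 | 272 | 0.61% | 0.49% | −0.12% |
| skrwiki | 295 | 526 | 2 | 3 | 0.68% | 0.57% | −0.11% |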
\n", "

\n", " \n", " 1\n", " \n", " \n", " Excludes wikis with 15 or fewer articles created with Content Translation\n", " during the reviewed time period\n", "
\n", "

\n", "
\n", "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# reformat into table\n", "\n", "cx_deletion_higher_list_tbl <- cx_deletion_higher_list %>%\n", " gt() %>%\n", " tab_header(\n", " title = \"Wikis with higher deletion ratios for articles created with Content Translation\",\n", " subtitle = \"Reviewed Time Period: January 2022 through March 2022 (Q3)\") %>%\n", " fmt_percent(\n", " columns = 6:8\n", " ) %>%\n", "\n", " cols_label(wiki = \"Wiki project\",\n", " created_cx = \"Created CX Articles\", \n", " created_non_cx = \"Created non-CX Articles\",\n", " deleted_cx = \"Deleted CX Articles\",\n", " deleted_non_cx = \"Deleted non-CX Articles\",\n", " deleted_cx_ratio = \"CX Articles Deletion Ratio\",\n", " deleted_non_cx_ratio = \"Non-CX Articles Deletion Ratio\",\n", " deletion_ratio_diff = \"Deletion Ratio Difference\") %>%\n", " tab_spanner(\"Created Articles\", 2:3) %>%\n", " tab_spanner(\"Deleted Articles\", 4:5) %>%\n", " tab_spanner(\"Deletion Ratios\", 6:8) %>%\n", " tab_footnote(\n", " footnote = \"Excludes wikis with 15 or fewer articles created with Content Translation\n", " during the reviewed time period\",\n", " locations = cells_column_labels(\n", " columns = 'wiki'\n", " )) %>%\n", " gtsave(\n", " \"cx_deletion_higher_wikis_current.html\", inline_css = TRUE) \n", "\n", "\n", "IRdisplay::display_html(data = cx_deletion_higher_list_tbl, file = \"cx_deletion_higher_wikis_current.html\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How high is the highest deletion ratio a wiki has for translations?" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 5 × 8
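| | wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <chr> | deleted_non_cx_ratio <chr> | deletion_ratio_diff <chr> |
|---|---|---|---|---|---|---|---|---|
| 1 | mnwiki | 17 | 378 | 11 | 214 | 64.71% | 56.61% | -8.09% |
| 2 | ltwiki | 31 | 2996 | 10 | 754 | 32.26% | 25.17% | -7.09% |
| 3 | etwiki | 17 | 4503 | 5 | 528 | 29.41% | 11.73% | -17.69% |
| 4 | skwiki | 29 | 2587 | 8 | 424 | 27.59% | 16.39% | -11.2% |
| 5 | guwiki | 16 | 162 | 3 | 66 | 18.75% | 40.74% | 21.99% |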
\n" ], "text/latex": [ "A data.frame: 5 × 8\n", "\\begin{tabular}{r|llllllll}\n", " & wiki & created\\_cx & created\\_non\\_cx & deleted\\_cx & deleted\\_non\\_cx & deleted\\_cx\\_ratio & deleted\\_non\\_cx\\_ratio & deletion\\_ratio\\_diff\\\\\n", " & & & & & & & & \\\\\n", "\\hline\n", "\t1 & mnwiki & 17 & 378 & 11 & 214 & 64.71\\% & 56.61\\% & -8.09\\% \\\\\n", "\t2 & ltwiki & 31 & 2996 & 10 & 754 & 32.26\\% & 25.17\\% & -7.09\\% \\\\\n", "\t3 & etwiki & 17 & 4503 & 5 & 528 & 29.41\\% & 11.73\\% & -17.69\\%\\\\\n", "\t4 & skwiki & 29 & 2587 & 8 & 424 & 27.59\\% & 16.39\\% & -11.2\\% \\\\\n", "\t5 & guwiki & 16 & 162 & 3 & 66 & 18.75\\% & 40.74\\% & 21.99\\% \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 5 × 8\n", "\n", "| | wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <chr> | deleted_non_cx_ratio <chr> | deletion_ratio_diff <chr> |\n", "|---|---|---|---|---|---|---|---|---|\n", "| 1 | mnwiki | 17 | 378 | 11 | 214 | 64.71% | 56.61% | -8.09% |\n", "| 2 | ltwiki | 31 | 2996 | 10 | 754 | 32.26% | 25.17% | -7.09% |\n", "| 3 | etwiki | 17 | 4503 | 5 | 528 | 29.41% | 11.73% | -17.69% |\n", "| 4 | skwiki | 29 | 2587 | 8 | 424 | 27.59% | 16.39% | -11.2% |\n", "| 5 | guwiki | 16 | 162 | 3 | 66 | 18.75% | 40.74% | 21.99% |\n", "\n" ], "text/plain": [ " wiki created_cx created_non_cx deleted_cx deleted_non_cx deleted_cx_ratio\n", "1 mnwiki 17 378 11 214 64.71% \n", "2 ltwiki 31 2996 10 754 32.26% \n", "3 etwiki 17 4503 5 528 29.41% \n", "4 skwiki 29 2587 8 424 27.59% \n", "5 guwiki 16 162 3 66 18.75% \n", " deleted_non_cx_ratio deletion_ratio_diff\n", "1 56.61% -8.09% \n", "2 25.17% -7.09% \n", "3 11.73% -17.69% \n", "4 16.39% -11.2% \n", "5 40.74% 21.99% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_ration_highest <- cx_deletion_ratio_bywiki %>%\n", " arrange(desc(deleted_cx_ratio)) %>% #sort by highest to lowest cx deletion ratio\n", " 
mutate(deleted_cx_ratio = paste0(round(deleted_cx_ratio *100,2),\"%\") ,\n", " deleted_non_cx_ratio = paste0(round(deleted_non_cx_ratio *100,2),\"%\") ,\n", " deletion_ratio_diff = paste0(round(deletion_ratio_diff * 100,2),\"%\") )\n", "\n", "head(cx_deletion_ration_highest, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Has the number of those wikis decreased compared to the previous period?" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Deletion ratios from the previous quarter (Q2: October - December 2021)\n", "\n", "query <-\n", "\"\n", "-- find all created articles \n", "WITH created_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS created_cx,\n", " COUNT(*) AS created_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2022-03'\n", " AND event_timestamp BETWEEN '2021-10-01' and '2021-12-31' \n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", "-- remove bots\n", " AND SIZE(event_user_is_bot_by_historical) = 0 \n", "GROUP BY \n", " wiki_db\n", "),\n", "\n", "--find all deleted articles \n", "\n", "deleted_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS deleted_cx,\n", " COUNT(*) AS deleted_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2022-03'\n", " AND event_timestamp BETWEEN '2021-10-01' and '2021-12-31'\n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", "-- find revisions moved to the archive table\n", " AND event_type = 'create'\n", " AND revision_is_deleted_by_page_deletion = TRUE\n", "-- remove bots\n", " AND 
SIZE(event_user_is_bot_by_historical) = 0 \n", "GROUP BY \n", " wiki_db\n", ")\n", "\n", "-- main query \n", "SELECT\n", " created_articles.wiki,\n", " created_cx,\n", " (created_total - created_cx) AS created_non_cx,\n", " deleted_cx,\n", " (deleted_total - deleted_cx) AS deleted_non_cx\n", "FROM created_articles\n", "JOIN deleted_articles ON \n", " created_articles.wiki = deleted_articles.wiki\n", "\"" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "cx_deletion_ratio_previous <- wmfdata::query_hive(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overall Previous Quarter Deletion Ratio" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A data.frame: 1 × 3
| deleted_cx_pct <chr> | deleted_non_cx_pct <chr> | deletion_pct_diff <chr> |
|---|---|---|
| 3.27% | 6% | 2.73% |
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " deleted\\_cx\\_pct & deleted\\_non\\_cx\\_pct & deletion\\_pct\\_diff\\\\\n", " & & \\\\\n", "\\hline\n", "\t 3.27\\% & 6\\% & 2.73\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| deleted_cx_pct <chr> | deleted_non_cx_pct <chr> | deletion_pct_diff <chr> |\n", "|---|---|---|\n", "| 3.27% | 6% | 2.73% |\n", "\n" ], "text/plain": [ " deleted_cx_pct deleted_non_cx_pct deletion_pct_diff\n", "1 3.27% 6% 2.73% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_ratio_overall_previous <- cx_deletion_ratio_previous %>%\n", " #filter(created_cx > 15) %>% # remove wikis with 15 or fewer articles created using cx\n", " summarise(deleted_cx_pct = paste0(round(sum(deleted_cx)/sum(created_cx) * 100, 2), \"%\"),\n", " deleted_non_cx_pct = paste0(round(sum(deleted_non_cx)/sum(created_non_cx) * 100, 2), \"%\"),\n", " deletion_pct_diff = paste0(round((sum(deleted_non_cx)/sum(created_non_cx)*100)-((sum(deleted_cx)/sum(created_cx))*100), 2),\"%\")\n", " )\n", "\n", "cx_deletion_ratio_overall_previous" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# By Wiki Previous Quarter Deletion Ratios" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "cx_deletion_ratio_previous_bywiki <- cx_deletion_ratio_previous %>%\n", " #filter(wiki == 'idwiki') %>%\n", " filter(created_cx > 15) %>%\n", " mutate(deleted_cx_ratio = deleted_cx/created_cx,\n", " deleted_non_cx_ratio = deleted_non_cx/created_non_cx,\n", " deletion_ratio_diff = ((deleted_non_cx/created_non_cx)-(deleted_cx/created_cx)\n", " ))\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "cx_deletion_higher_previous <- cx_deletion_ratio_previous_bywiki %>%\n", " filter(deletion_ratio_diff < 0) %>%\n", " summarise(total_wikis = n())\n" ] }, { "cell_type": "code", "execution_count": 28, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Across all wikis where more than 15 articles have been created with content translation in the previous quarter, there were 15 wikis where articles created with content translation were deleted more than articles created without cx\"\n" ] } ], "source": [ "print(paste0(\"Across all wikis where more than 15 articles have been created with content translation in the previous quarter, there were \", \n", " cx_deletion_higher_previous[1], \n", " \" wikis where articles created with content translation were deleted more than articles created without cx\"))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 15 × 8
| wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <dbl> | deleted_non_cx_ratio <dbl> | deletion_ratio_diff <dbl> |
|---|---|---|---|---|---|---|---|
| ltwiki | 38 | 4426 | 18 | 690 | 0.47368421 | 0.155896972 | -0.3177872381 |
| lawiki | 21 | 1084 | 4 | 51 | 0.19047619 | 0.047047970 | -0.1434282200 |
| hywiki | 43 | 2326 | 10 | 356 | 0.23255814 | 0.153052451 | -0.0795056890 |
| arzwiki | 62 | 170059 | 4 | 222 | 0.06451613 | 0.001305429 | -0.0632106997 |
| ocwiki | 20 | 405 | 2 | 15 | 0.10000000 | 0.037037037 | -0.0629629630 |
| mrwiki | 98 | 3421 | 7 | 70 | 0.07142857 | 0.020461853 | -0.0509667182 |
| fiwiki | 37 | 8219 | 5 | 730 | 0.13513514 | 0.088818591 | -0.0463165441 |
| swwiki | 71 | 1673 | 4 | 24 | 0.05633803 | 0.014345487 | -0.0419925410 |
| thwiki | 70 | 7320 | 4 | 177 | 0.05714286 | 0.024180328 | -0.0329625293 |
| twwiki | 82 | 275 | 6 | 14 | 0.07317073 | 0.050909091 | -0.0222616408 |
| iswiki | 20 | 706 | 2 | 55 | 0.10000000 | 0.077903683 | -0.0220963173 |
| nlwiki | 237 | 14644 | 41 | 2275 | 0.17299578 | 0.155353728 | -0.0176420521 |
| afwiki | 82 | 1309 | 5 | 64 | 0.06097561 | 0.048892284 | -0.0120833256 |
| kmwiki | 43 | 557 | 7 | 90 | 0.16279070 | 0.161579892 | -0.0012108054 |
| bewiki | 154 | 5133 | 3 | 99 | 0.01948052 | 0.019286967 | -0.0001935528 |
\n" ], "text/latex": [ "A data.frame: 15 × 8\n", "\\begin{tabular}{llllllll}\n", " wiki & created\\_cx & created\\_non\\_cx & deleted\\_cx & deleted\\_non\\_cx & deleted\\_cx\\_ratio & deleted\\_non\\_cx\\_ratio & deletion\\_ratio\\_diff\\\\\n", " & & & & & & & \\\\\n", "\\hline\n", "\t ltwiki & 38 & 4426 & 18 & 690 & 0.47368421 & 0.155896972 & -0.3177872381\\\\\n", "\t lawiki & 21 & 1084 & 4 & 51 & 0.19047619 & 0.047047970 & -0.1434282200\\\\\n", "\t hywiki & 43 & 2326 & 10 & 356 & 0.23255814 & 0.153052451 & -0.0795056890\\\\\n", "\t arzwiki & 62 & 170059 & 4 & 222 & 0.06451613 & 0.001305429 & -0.0632106997\\\\\n", "\t ocwiki & 20 & 405 & 2 & 15 & 0.10000000 & 0.037037037 & -0.0629629630\\\\\n", "\t mrwiki & 98 & 3421 & 7 & 70 & 0.07142857 & 0.020461853 & -0.0509667182\\\\\n", "\t fiwiki & 37 & 8219 & 5 & 730 & 0.13513514 & 0.088818591 & -0.0463165441\\\\\n", "\t swwiki & 71 & 1673 & 4 & 24 & 0.05633803 & 0.014345487 & -0.0419925410\\\\\n", "\t thwiki & 70 & 7320 & 4 & 177 & 0.05714286 & 0.024180328 & -0.0329625293\\\\\n", "\t twwiki & 82 & 275 & 6 & 14 & 0.07317073 & 0.050909091 & -0.0222616408\\\\\n", "\t iswiki & 20 & 706 & 2 & 55 & 0.10000000 & 0.077903683 & -0.0220963173\\\\\n", "\t nlwiki & 237 & 14644 & 41 & 2275 & 0.17299578 & 0.155353728 & -0.0176420521\\\\\n", "\t afwiki & 82 & 1309 & 5 & 64 & 0.06097561 & 0.048892284 & -0.0120833256\\\\\n", "\t kmwiki & 43 & 557 & 7 & 90 & 0.16279070 & 0.161579892 & -0.0012108054\\\\\n", "\t bewiki & 154 & 5133 & 3 & 99 & 0.01948052 & 0.019286967 & -0.0001935528\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 15 × 8\n", "\n", "| wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <dbl> | deleted_non_cx_ratio <dbl> | deletion_ratio_diff <dbl> |\n", "|---|---|---|---|---|---|---|---|\n", "| ltwiki | 38 | 4426 | 18 | 690 | 0.47368421 | 0.155896972 | -0.3177872381 |\n", "| lawiki | 21 | 1084 | 4 | 51 | 0.19047619 | 0.047047970 | 
-0.1434282200 |\n", "| hywiki | 43 | 2326 | 10 | 356 | 0.23255814 | 0.153052451 | -0.0795056890 |\n", "| arzwiki | 62 | 170059 | 4 | 222 | 0.06451613 | 0.001305429 | -0.0632106997 |\n", "| ocwiki | 20 | 405 | 2 | 15 | 0.10000000 | 0.037037037 | -0.0629629630 |\n", "| mrwiki | 98 | 3421 | 7 | 70 | 0.07142857 | 0.020461853 | -0.0509667182 |\n", "| fiwiki | 37 | 8219 | 5 | 730 | 0.13513514 | 0.088818591 | -0.0463165441 |\n", "| swwiki | 71 | 1673 | 4 | 24 | 0.05633803 | 0.014345487 | -0.0419925410 |\n", "| thwiki | 70 | 7320 | 4 | 177 | 0.05714286 | 0.024180328 | -0.0329625293 |\n", "| twwiki | 82 | 275 | 6 | 14 | 0.07317073 | 0.050909091 | -0.0222616408 |\n", "| iswiki | 20 | 706 | 2 | 55 | 0.10000000 | 0.077903683 | -0.0220963173 |\n", "| nlwiki | 237 | 14644 | 41 | 2275 | 0.17299578 | 0.155353728 | -0.0176420521 |\n", "| afwiki | 82 | 1309 | 5 | 64 | 0.06097561 | 0.048892284 | -0.0120833256 |\n", "| kmwiki | 43 | 557 | 7 | 90 | 0.16279070 | 0.161579892 | -0.0012108054 |\n", "| bewiki | 154 | 5133 | 3 | 99 | 0.01948052 | 0.019286967 | -0.0001935528 |\n", "\n" ], "text/plain": [ " wiki created_cx created_non_cx deleted_cx deleted_non_cx deleted_cx_ratio\n", "1 ltwiki 38 4426 18 690 0.47368421 \n", "2 lawiki 21 1084 4 51 0.19047619 \n", "3 hywiki 43 2326 10 356 0.23255814 \n", "4 arzwiki 62 170059 4 222 0.06451613 \n", "5 ocwiki 20 405 2 15 0.10000000 \n", "6 mrwiki 98 3421 7 70 0.07142857 \n", "7 fiwiki 37 8219 5 730 0.13513514 \n", "8 swwiki 71 1673 4 24 0.05633803 \n", "9 thwiki 70 7320 4 177 0.05714286 \n", "10 twwiki 82 275 6 14 0.07317073 \n", "11 iswiki 20 706 2 55 0.10000000 \n", "12 nlwiki 237 14644 41 2275 0.17299578 \n", "13 afwiki 82 1309 5 64 0.06097561 \n", "14 kmwiki 43 557 7 90 0.16279070 \n", "15 bewiki 154 5133 3 99 0.01948052 \n", " deleted_non_cx_ratio deletion_ratio_diff\n", "1 0.155896972 -0.3177872381 \n", "2 0.047047970 -0.1434282200 \n", "3 0.153052451 -0.0795056890 \n", "4 0.001305429 -0.0632106997 \n", "5 0.037037037 -0.0629629630 \n", "6 
0.020461853 -0.0509667182 \n", "7 0.088818591 -0.0463165441 \n", "8 0.014345487 -0.0419925410 \n", "9 0.024180328 -0.0329625293 \n", "10 0.050909091 -0.0222616408 \n", "11 0.077903683 -0.0220963173 \n", "12 0.155353728 -0.0176420521 \n", "13 0.048892284 -0.0120833256 \n", "14 0.161579892 -0.0012108054 \n", "15 0.019286967 -0.0001935528 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_higher_list_previous <- cx_deletion_ratio_previous_bywiki %>%\n", " filter(deletion_ratio_diff < 0) %>%\n", " arrange(deletion_ratio_diff)\n", "\n", "cx_deletion_higher_list_previous" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Wikis with higher deletion ratios for articles created with Content Translation
Reviewed Time Period: October 2021 through December 2021 (Q2)
Wiki project1\n", " Created Articles\n", " \n", " Deleted Articles\n", " \n", " Deletion Ratios\n", "
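| | Created CX Articles | Created non-CX Articles | Deleted CX Articles | Deleted non-CX Articles | CX Articles Deletion Ratio | Non-CX Articles Deletion Ratio | Deletion Ratio Difference |
|---|---|---|---|---|---|---|---|
| ltwiki | 38 | 4426 | 18 | 690 | 47.37% | 15.59% | −31.78% |
| lawiki | 21 | 1084 | 4 | 51 | 19.05% | 4.70% | −14.34% |
| hywiki | 43 | 2326 | 10 | 356 | 23.26% | 15.31% | −7.95% |
| arzwiki | 62 | 170059 | 4 | 222 | 6.45% | 0.13% | −6.32% |
| ocwiki | 20 | 405 | 2 | 15 | 10.00% | 3.70% | −6.30% |
| mrwiki | 98 | 3421 | 7 | 70 | 7.14% | 2.05% | −5.10% |
| fiwiki | 37 | 8219 | 5 | 730 | 13.51% | 8.88% | −4.63% |
| swwiki | 71 | 1673 | 4 | 24 | 5.63% | 1.43% | −4.20% |
| thwiki | 70 | 7320 | 4 | 177 | 5.71% | 2.42% | −3.30% |
| twwiki | 82 | 275 | 6 | 14 | 7.32% | 5.09% | −2.23% |
| iswiki | 20 | 706 | 2 | 55 | 10.00% | 7.79% | −2.21% |
| nlwiki | 237 | 14644 | 41 | 2275 | 17.30% | 15.54% | −1.76% |
| afwiki | 82 | 1309 | 5 | 64 | 6.10% | 4.89% | −1.21% |
| kmwiki | 43 | 557 | 7 | 90 | 16.28% | 16.16% | −0.12% |
| bewiki | 154 | 5133 | 3 | 99 | 1.95% | 1.93% | −0.02% |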
\n", "

\n", " \n", " 1\n", " \n", " \n", " Excludes wikis with 15 or fewer articles created with Content Translation\n", " during the reviewed time period\n", "
\n", "

\n", "
\n", "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# reformat into table\n", "\n", "cx_deletion_higher_list_tbl_previous <- cx_deletion_higher_list_previous %>%\n", " gt() %>%\n", " tab_header(\n", " title = \"Wikis with higher deletion ratios for articles created with Content Translation\",\n", " subtitle = \"Reviewed Time Period: October 2021 through December 2021 (Q2)\") %>%\n", " fmt_percent(\n", " columns = 6:8\n", " ) %>%\n", "\n", " cols_label(wiki = \"Wiki project\",\n", " created_cx = \"Created CX Articles\", \n", " created_non_cx = \"Created non-CX Articles\",\n", " deleted_cx = \"Deleted CX Articles\",\n", " deleted_non_cx = \"Deleted non-CX Articles\",\n", " deleted_cx_ratio = \"CX Articles Deletion Ratio\",\n", " deleted_non_cx_ratio = \"Non-CX Articles Deletion Ratio\",\n", " deletion_ratio_diff = \"Deletion Ratio Difference\") %>%\n", " tab_spanner(\"Created Articles\", 2:3) %>%\n", " tab_spanner(\"Deleted Articles\", 4:5) %>%\n", " tab_spanner(\"Deletion Ratios\", 6:8) %>%\n", " tab_footnote(\n", " footnote = \"Excludes wikis with 15 or fewer articles created with Content Translation\n", " during the reviewed time period\",\n", " locations = cells_column_labels(\n", " columns = 'wiki'\n", " )) %>%\n", " gtsave(\n", " \"cx_deletion_higher_wikis_previous.html\", inline_css = TRUE) \n", "\n", "\n", "IRdisplay::display_html(data = cx_deletion_higher_list_tbl_previous, file = \"cx_deletion_higher_wikis_previous.html\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How many wikis had higher deletion ratios for cx translated articles in both quarters?" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 3 × 1
wiki
<chr>
ltwiki
arzwiki
bewiki
\n" ], "text/latex": [ "A data.frame: 3 × 1\n", "\\begin{tabular}{l}\n", " wiki\\\\\n", " \\\\\n", "\\hline\n", "\t ltwiki \\\\\n", "\t arzwiki\\\\\n", "\t bewiki \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 3 × 1\n", "\n", "| wiki <chr> |\n", "|---|\n", "| ltwiki |\n", "| arzwiki |\n", "| bewiki |\n", "\n" ], "text/plain": [ " wiki \n", "1 ltwiki \n", "2 arzwiki\n", "3 bewiki " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "intersect(cx_deletion_higher_list_previous[1], cx_deletion_higher_list[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6 Month Period Comparison \n", "\n", "This was done in the analysis conducted as part of [T286636](https://phabricator.wikimedia.org/T286636#7345479) to assess various review timeframes. The team decided to proceed with quarterly updates; this prior analysis is left here for reference. \n", "\n", "Results have not been updated since June 2021. " ] }, { "cell_type": "code", "execution_count": 269, "metadata": {}, "outputs": [], "source": [ "# Current 6 Months\n", "# Jan - June 2021\n", "query <-\n", "\"\n", "-- find both cx and non-cx created articles \n", "WITH created_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS created_cx,\n", " COUNT(*) AS created_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2021-08'\n", " AND event_timestamp BETWEEN '2021-01-01' and '2021-06-30' \n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", "-- remove bots\n", " AND SIZE(event_user_is_bot_by_historical) = 0 \n", "GROUP BY \n", " wiki_db\n", "),\n", "\n", "-- find all deleted articles (both cx and non-cx) \n", "\n", "deleted_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " 
SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS deleted_cx,\n", " COUNT(*) AS deleted_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2021-08'\n", " AND event_timestamp BETWEEN '2021-01-01' and '2021-06-30' \n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", "-- find revisions moved to the archive table\n", " AND revision_is_deleted_by_page_deletion = TRUE\n", "-- remove bots\n", " AND SIZE(event_user_is_bot_by_historical) = 0 \n", "GROUP BY \n", " wiki_db\n", ")\n", "\n", "-- main query to aggregate and join sources above\n", "SELECT\n", " created_articles.wiki,\n", " created_cx,\n", " (created_total - created_cx) AS created_non_cx,\n", " deleted_cx,\n", " (deleted_total - deleted_cx) AS deleted_non_cx\n", "FROM created_articles\n", "JOIN deleted_articles ON \n", " created_articles.wiki = deleted_articles.wiki\n", "\"" ] }, { "cell_type": "code", "execution_count": 270, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "cx_deletion_ratio_current_6mo <- wmfdata::query_hive(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overall Deletion Ratio - Current 6 mo" ] }, { "cell_type": "code", "execution_count": 271, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A data.frame: 1 × 3
| deleted_cx_pct <chr> | deleted_non_cx_pct <chr> | deletion_pct_diff <chr> |
|---|---|---|
| 3.6% | 8.47% | 4.87% |
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " deleted\\_cx\\_pct & deleted\\_non\\_cx\\_pct & deletion\\_pct\\_diff\\\\\n", " & & \\\\\n", "\\hline\n", "\t 3.6\\% & 8.47\\% & 4.87\\%\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| deleted_cx_pct <chr> | deleted_non_cx_pct <chr> | deletion_pct_diff <chr> |\n", "|---|---|---|\n", "| 3.6% | 8.47% | 4.87% |\n", "\n" ], "text/plain": [ " deleted_cx_pct deleted_non_cx_pct deletion_pct_diff\n", "1 3.6% 8.47% 4.87% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_ratio_6cur_overall <- cx_deletion_ratio_current_6mo %>%\n", " summarise(deleted_cx_pct = paste0(round(sum(deleted_cx)/sum(created_cx) * 100, 2), \"%\"),\n", " deleted_non_cx_pct = paste0(round(sum(deleted_non_cx)/sum(created_non_cx) * 100, 2), \"%\"),\n", " deletion_pct_diff = paste0(round((sum(deleted_non_cx)/sum(created_non_cx)*100)-((sum(deleted_cx)/sum(created_cx))*100), 2),\"%\")\n", " )\n", "\n", "cx_deletion_ratio_6cur_overall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Wiki" ] }, { "cell_type": "code", "execution_count": 272, "metadata": {}, "outputs": [], "source": [ "cx_deletion_ratio_current_bywiki <- cx_deletion_ratio_current_6mo %>%\n", " #filter(wiki == 'idwiki') %>%\n", " filter(created_cx > 15) %>% # only review wikis with more than 15 cx articles\n", " mutate(deleted_cx_ratio = deleted_cx/created_cx,\n", " deleted_non_cx_ratio = deleted_non_cx/created_non_cx,\n", " deletion_ratio_diff = ((deleted_non_cx/created_non_cx)-(deleted_cx/created_cx)\n", " ))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How many wikis have translations deleted more often than regular articles?" ] }, { "cell_type": "code", "execution_count": 274, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A data.frame: 1 × 1
total_wikis
<int>
20
\n" ], "text/latex": [ "A data.frame: 1 × 1\n", "\\begin{tabular}{l}\n", " total\\_wikis\\\\\n", " \\\\\n", "\\hline\n", "\t 20\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 1\n", "\n", "| total_wikis <int> |\n", "|---|\n", "| 20 |\n", "\n" ], "text/plain": [ " total_wikis\n", "1 20 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_higher_current_6mo <- cx_deletion_ratio_current_bywiki %>%\n", " filter(deletion_ratio_diff < 0) %>%\n", " summarise(total_wikis = n())\n", "\n", "cx_deletion_higher_current_6mo " ] }, { "cell_type": "code", "execution_count": 284, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Across all wikis where more than 15 articles have been created with content translation from Jan 2021 - June 2021, there were 20 wikis where articles created with content translation were deleted more than articles created without cx\"\n" ] } ], "source": [ "print(paste0(\"Across all wikis where more than 15 articles have been created with content translation from Jan 2021 - June 2021, there were \", \n", " cx_deletion_higher_current_6mo[1], \n", " \" wikis where articles created with content translation were deleted more than articles created without cx\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Which are these wikis?" 
] }, { "cell_type": "code", "execution_count": 276, "metadata": {}, "outputs": [], "source": [ "cx_deletion_higher_list_current <- cx_deletion_ratio_current_bywiki %>%\n", " filter(deletion_ratio_diff < 0)%>% #only wikis with higher cx deletion ratios\n", " arrange(deletion_ratio_diff)\n", " " ] }, { "cell_type": "code", "execution_count": 279, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Wikis with higher deletion ratios for articles created with Content Translation
Reviewed Time Period: January 2021 through June 2021

| Wiki project¹ | Created CX Articles | Created non-CX Articles | Deleted CX Articles | Deleted non-CX Articles | CX Articles Deletion Ratio | Non-CX Articles Deletion Ratio | Deletion Ratio Difference |
|---|---|---|---|---|---|---|---|
| hawwiki | 68 | 128 | 25 | 25 | 36.76% | 19.53% | −17.23% |
| iswiki | 30 | 2157 | 7 | 140 | 23.33% | 6.49% | −16.84% |
| kuwiki | 221 | 5486 | 34 | 127 | 15.38% | 2.31% | −13.07% |
| arywiki | 57 | 1161 | 9 | 46 | 15.79% | 3.96% | −11.83% |
| fiu_vrowiki | 31 | 235 | 4 | 6 | 12.90% | 2.55% | −10.35% |
| thwiki | 24 | 9975 | 3 | 411 | 12.50% | 4.12% | −8.38% |
| arzwiki | 119 | 121643 | 9 | 606 | 7.56% | 0.50% | −7.06% |
| azbwiki | 18 | 1674 | 2 | 83 | 11.11% | 4.96% | −6.15% |
| siwiki | 37 | 1573 | 4 | 87 | 10.81% | 5.53% | −5.28% |
| kawiki | 170 | 10010 | 33 | 1415 | 19.41% | 14.14% | −5.28% |
| lldwiki | 18 | 171 | 1 | 1 | 5.56% | 0.58% | −4.97% |
| jvwiki | 2536 | 1182 | 162 | 45 | 6.39% | 3.81% | −2.58% |
| crhwiki | 77 | 2658 | 2 | 18 | 2.60% | 0.68% | −1.92% |
| fiwiki | 230 | 18370 | 25 | 1646 | 10.87% | 8.96% | −1.91% |
| pswiki | 55 | 780 | 3 | 31 | 5.45% | 3.97% | −1.48% |
| bewiki | 437 | 9537 | 14 | 203 | 3.20% | 2.13% | −1.08% |
| afwiki | 210 | 3797 | 11 | 170 | 5.24% | 4.48% | −0.76% |
| mrwiki | 268 | 10667 | 5 | 120 | 1.87% | 1.12% | −0.74% |
| lawiki | 56 | 1766 | 3 | 83 | 5.36% | 4.70% | −0.66% |
| eowiki | 558 | 10766 | 9 | 127 | 1.61% | 1.18% | −0.43% |

¹ Excludes wikis with 15 or fewer articles created with Content Translation during the reviewed time period
\n", "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# reformat into table\n", "\n", "cx_deletion_higher_list_6mo_tbl <- cx_deletion_higher_list_current %>%\n", " gt() %>%\n", " tab_header(\n", " title = \"Wikis with higher deletion ratios for articles created with Content Translation\",\n", " subtitle = \"Reviewed Time Period: January 2021 through June 2021\") %>%\n", " fmt_percent(\n", " columns = 6:8\n", " ) %>%\n", "\n", " cols_label(wiki = \"Wiki project\",\n", " created_cx = \"Created CX Articles\", \n", " created_non_cx = \"Created non-CX Articles\",\n", " deleted_cx = \"Deleted CX Articles\",\n", " deleted_non_cx = \"Deleted non-CX Articles\",\n", " deleted_cx_ratio = \"CX Articles Deletion Ratio\",\n", " deleted_non_cx_ratio = \"Non-CX Articles Deletion Ratio\",\n", " deletion_ratio_diff = \"Deletion Ratio Difference\") %>%\n", " tab_spanner(\"Created Articles\", 2:3) %>%\n", " tab_spanner(\"Deleted Articles\", 4:5) %>%\n", " tab_spanner(\"Deletion Ratios\", 6:8) %>%\n", " tab_footnote(\n", " footnote = \"Excludes wikis with 15 or fewer articles created with Content Translation\n", " during the reviewed time period\",\n", " locations = cells_column_labels(\n", " columns = 'wiki'\n", " )) %>%\n", " gtsave(\n", " \"cx_deletion_higher_wikis_6mo.html\", inline_css = TRUE) \n", "\n", "\n", "IRdisplay::display_html(data = cx_deletion_higher_list_6mo_tbl, file = \"cx_deletion_higher_wikis_6mo.html\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How high is the highest deletion ratio a wiki has for translations?\n" ] }, { "cell_type": "code", "execution_count": 282, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
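A data.frame: 5 × 8
| | wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <chr> | deleted_non_cx_ratio <chr> | deletion_ratio_diff <chr> |
|---|---|---|---|---|---|---|---|---|
| 1 | ltwiki | 45 | 4254 | 17 | 2041 | 37.78% | 47.98% | 10.2% |
| 2 | hawwiki | 68 | 128 | 25 | 25 | 36.76% | 19.53% | -17.23% |
| 3 | mnwiki | 30 | 1265 | 10 | 542 | 33.33% | 42.85% | 9.51% |
| 4 | iswiki | 30 | 2157 | 7 | 140 | 23.33% | 6.49% | -16.84% |
| 5 | kawiki | 170 | 10010 | 33 | 1415 | 19.41% | 14.14% | -5.28% |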
\n" ], "text/latex": [ "A data.frame: 5 × 8\n", "\\begin{tabular}{r|llllllll}\n", " & wiki & created\\_cx & created\\_non\\_cx & deleted\\_cx & deleted\\_non\\_cx & deleted\\_cx\\_ratio & deleted\\_non\\_cx\\_ratio & deletion\\_ratio\\_diff\\\\\n", " & & & & & & & & \\\\\n", "\\hline\n", "\t1 & ltwiki & 45 & 4254 & 17 & 2041 & 37.78\\% & 47.98\\% & 10.2\\% \\\\\n", "\t2 & hawwiki & 68 & 128 & 25 & 25 & 36.76\\% & 19.53\\% & -17.23\\%\\\\\n", "\t3 & mnwiki & 30 & 1265 & 10 & 542 & 33.33\\% & 42.85\\% & 9.51\\% \\\\\n", "\t4 & iswiki & 30 & 2157 & 7 & 140 & 23.33\\% & 6.49\\% & -16.84\\%\\\\\n", "\t5 & kawiki & 170 & 10010 & 33 & 1415 & 19.41\\% & 14.14\\% & -5.28\\% \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 5 × 8\n", "\n", "| | wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <chr> | deleted_non_cx_ratio <chr> | deletion_ratio_diff <chr> |\n", "|---|---|---|---|---|---|---|---|---|\n", "| 1 | ltwiki | 45 | 4254 | 17 | 2041 | 37.78% | 47.98% | 10.2% |\n", "| 2 | hawwiki | 68 | 128 | 25 | 25 | 36.76% | 19.53% | -17.23% |\n", "| 3 | mnwiki | 30 | 1265 | 10 | 542 | 33.33% | 42.85% | 9.51% |\n", "| 4 | iswiki | 30 | 2157 | 7 | 140 | 23.33% | 6.49% | -16.84% |\n", "| 5 | kawiki | 170 | 10010 | 33 | 1415 | 19.41% | 14.14% | -5.28% |\n", "\n" ], "text/plain": [ " wiki created_cx created_non_cx deleted_cx deleted_non_cx deleted_cx_ratio\n", "1 ltwiki 45 4254 17 2041 37.78% \n", "2 hawwiki 68 128 25 25 36.76% \n", "3 mnwiki 30 1265 10 542 33.33% \n", "4 iswiki 30 2157 7 140 23.33% \n", "5 kawiki 170 10010 33 1415 19.41% \n", " deleted_non_cx_ratio deletion_ratio_diff\n", "1 47.98% 10.2% \n", "2 19.53% -17.23% \n", "3 42.85% 9.51% \n", "4 6.49% -16.84% \n", "5 14.14% -5.28% " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_ration_highest_current <- cx_deletion_ratio_current_bywiki %>%\n", " arrange(desc(deleted_cx_ratio)) %>% \n", " 
mutate(deleted_cx_ratio = paste0(round(deleted_cx_ratio *100,2),\"%\") ,\n", " deleted_non_cx_ratio = paste0(round(deleted_non_cx_ratio *100,2),\"%\") ,\n", " deletion_ratio_diff = paste0(round(deletion_ratio_diff * 100,2),\"%\") )\n", "\n", "head(cx_deletion_ration_highest_current, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lithuanian Wikipedia had the highest deletion ratio for articles created with content translation: 37.8% of these articles were deleted. However, this was still lower than the deletion ratio for articles created without content translation (48.0%).\n", "\n", "Among wikis where translations were deleted more often, Hawaiian Wikipedia had the largest difference in deletion ratios: 36.8% of all articles created with cx were deleted during the reviewed time period, compared to 19.5% of articles created without content translation. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Has the number of those wikis reduced compared to the previous period?" 
] }, { "cell_type": "code", "execution_count": 285, "metadata": {}, "outputs": [], "source": [ "# Previous 6 Months\n", "# July 2020 - December 2020\n", "\n", "query <-\n", "\"\n", "-- find both cx and non-cx created articles \n", "WITH created_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS created_cx,\n", " COUNT(*) AS created_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2021-08'\n", " AND event_timestamp BETWEEN '2020-07-01' and '2020-12-31' \n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", "-- remove bots\n", " AND SIZE(event_user_is_bot_by_historical) = 0 \n", "GROUP BY \n", " wiki_db\n", "),\n", "\n", "-- find all deleted articles (both cx and non-cx) \n", "\n", "deleted_articles AS (\n", "\n", "SELECT\n", " wiki_db AS wiki,\n", " SUM(CAST(ARRAY_CONTAINS(revision_tags, 'contenttranslation') AS INT)) AS deleted_cx,\n", " COUNT(*) AS deleted_total\n", "FROM wmf.mediawiki_history\n", "WHERE\n", " snapshot = '2021-08'\n", " AND event_timestamp BETWEEN '2020-07-01' and '2020-12-31' \n", "-- interested in main page namespaces\n", " AND page_namespace = 0\n", "-- only look at new page creations\n", " AND revision_parent_id = 0\n", " AND event_entity = 'revision'\n", " AND event_type = 'create'\n", "-- find revisions moved to the archive table\n", " AND revision_is_deleted_by_page_deletion = TRUE\n", "-- remove bots\n", " AND SIZE(event_user_is_bot_by_historical) = 0 \n", "GROUP BY \n", " wiki_db\n", ")\n", "\n", "-- main query to aggregate and join sources above\n", "SELECT\n", " created_articles.wiki,\n", " created_cx,\n", " (created_total - created_cx) AS created_non_cx,\n", " deleted_cx,\n", " (deleted_total - deleted_cx) AS deleted_non_cx\n", "FROM created_articles\n", "JOIN 
deleted_articles ON \n", " created_articles.wiki = deleted_articles.wiki\n", "\"" ] }, { "cell_type": "code", "execution_count": 286, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "cx_deletion_ratio_previous_6mo <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 287, "metadata": {}, "outputs": [], "source": [ "cx_deletion_ratio_bywiki_previous <- cx_deletion_ratio_previous_6mo %>%\n", " #filter(wiki == 'idwiki') %>%\n", " filter(created_cx > 15) %>% # only review wikis with more than 15 cx articles\n", " mutate(deleted_cx_ratio = deleted_cx/created_cx,\n", " deleted_non_cx_ratio = deleted_non_cx/created_non_cx,\n", " deletion_ratio_diff = ((deleted_non_cx/created_non_cx)-(deleted_cx/created_cx)\n", " ))\n" ] }, { "cell_type": "code", "execution_count": 288, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A data.frame: 1 × 1
total_wikis
<int>
21
\n" ], "text/latex": [ "A data.frame: 1 × 1\n", "\\begin{tabular}{l}\n", " total\\_wikis\\\\\n", " \\\\\n", "\\hline\n", "\t 21\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 1\n", "\n", "| total_wikis <int> |\n", "|---|\n", "| 21 |\n", "\n" ], "text/plain": [ " total_wikis\n", "1 21 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_higher_previous <- cx_deletion_ratio_bywiki_previous %>%\n", " filter(deletion_ratio_diff < 0) %>%\n", " summarise(total_wikis = n())\n", "\n", "cx_deletion_higher_previous" ] }, { "cell_type": "code", "execution_count": 290, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Across all wikis where more than 15 articles have been created with content translation between July 2020 and December 2020, there were 21 wikis where articles created with content translation were deleted more than articles created without cx\"\n" ] } ], "source": [ "print(paste0(\"Across all wikis where more than 15 articles have been created with content translation between July 2020 and December 2020, there were \", \n", " cx_deletion_higher_previous[1], \n", " \" wikis where articles created with content translation were deleted more than articles created without cx\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of wikis with higher content translation deletion ratios decreased by 1, from 21 in July 2020 through December 2020 to 20 in January 2021 through June 2021.\n", "\n", "We next compared the two lists of wikis to check whether the wikis with higher deletion rates were the same across both periods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How many wikis had higher deletion ratios for cx translated articles in both 6-month periods?" 
] }, { "cell_type": "code", "execution_count": 296, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 21 × 8
| wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <dbl> | deleted_non_cx_ratio <dbl> | deletion_ratio_diff <dbl> |
|---|---|---|---|---|---|---|---|
| fywiki | 17 | 1755 | 14 | 65 | 0.82352941 | 0.037037037 | -0.786492375 |
| hawwiki | 42 | 132 | 31 | 24 | 0.73809524 | 0.181818182 | -0.556277056 |
| ltwiki | 59 | 3337 | 28 | 644 | 0.47457627 | 0.192987714 | -0.281588558 |
| iswiki | 26 | 2000 | 7 | 155 | 0.26923077 | 0.077500000 | -0.191730769 |
| lawiki | 48 | 2979 | 9 | 158 | 0.18750000 | 0.053037932 | -0.134462068 |
| hywiki | 159 | 33338 | 22 | 1080 | 0.13836478 | 0.032395465 | -0.105969315 |
| azwiki | 206 | 29671 | 29 | 1885 | 0.14077670 | 0.063530046 | -0.077246653 |
| arywiki | 63 | 2443 | 5 | 50 | 0.07936508 | 0.020466639 | -0.058898440 |
| mywiki | 313 | 6698 | 37 | 439 | 0.11821086 | 0.065541953 | -0.052668910 |
| cywiki | 122 | 1451 | 13 | 85 | 0.10655738 | 0.058580289 | -0.047977088 |
| vecwiki | 20 | 10293 | 1 | 46 | 0.05000000 | 0.004469057 | -0.045530943 |
| arzwiki | 133 | 355316 | 4 | 730 | 0.03007519 | 0.002054509 | -0.028020679 |
| eowiki | 277 | 10800 | 8 | 90 | 0.02888087 | 0.008333333 | -0.020547533 |
| zhwiki | 1512 | 80866 | 137 | 5753 | 0.09060847 | 0.071142384 | -0.019466082 |
| dewiki | 505 | 119158 | 87 | 18351 | 0.17227723 | 0.154005606 | -0.018271622 |
| zh_yuewiki | 35 | 23696 | 1 | 267 | 0.02857143 | 0.011267725 | -0.017303704 |
| ckbwiki | 64 | 2901 | 5 | 183 | 0.07812500 | 0.063081696 | -0.015043304 |
| kuwiki | 402 | 5291 | 13 | 133 | 0.03233831 | 0.025137025 | -0.007201283 |
| fiwiki | 138 | 20467 | 12 | 1680 | 0.08695652 | 0.082083354 | -0.004873168 |
| etwiki | 55 | 8239 | 6 | 865 | 0.10909091 | 0.104988469 | -0.004102440 |
| bswiki | 62 | 2677 | 6 | 254 | 0.09677419 | 0.094882331 | -0.001891863 |
\n" ], "text/latex": [ "A data.frame: 21 × 8\n", "\\begin{tabular}{llllllll}\n", " wiki & created\\_cx & created\\_non\\_cx & deleted\\_cx & deleted\\_non\\_cx & deleted\\_cx\\_ratio & deleted\\_non\\_cx\\_ratio & deletion\\_ratio\\_diff\\\\\n", " & & & & & & & \\\\\n", "\\hline\n", "\t fywiki & 17 & 1755 & 14 & 65 & 0.82352941 & 0.037037037 & -0.786492375\\\\\n", "\t hawwiki & 42 & 132 & 31 & 24 & 0.73809524 & 0.181818182 & -0.556277056\\\\\n", "\t ltwiki & 59 & 3337 & 28 & 644 & 0.47457627 & 0.192987714 & -0.281588558\\\\\n", "\t iswiki & 26 & 2000 & 7 & 155 & 0.26923077 & 0.077500000 & -0.191730769\\\\\n", "\t lawiki & 48 & 2979 & 9 & 158 & 0.18750000 & 0.053037932 & -0.134462068\\\\\n", "\t hywiki & 159 & 33338 & 22 & 1080 & 0.13836478 & 0.032395465 & -0.105969315\\\\\n", "\t azwiki & 206 & 29671 & 29 & 1885 & 0.14077670 & 0.063530046 & -0.077246653\\\\\n", "\t arywiki & 63 & 2443 & 5 & 50 & 0.07936508 & 0.020466639 & -0.058898440\\\\\n", "\t mywiki & 313 & 6698 & 37 & 439 & 0.11821086 & 0.065541953 & -0.052668910\\\\\n", "\t cywiki & 122 & 1451 & 13 & 85 & 0.10655738 & 0.058580289 & -0.047977088\\\\\n", "\t vecwiki & 20 & 10293 & 1 & 46 & 0.05000000 & 0.004469057 & -0.045530943\\\\\n", "\t arzwiki & 133 & 355316 & 4 & 730 & 0.03007519 & 0.002054509 & -0.028020679\\\\\n", "\t eowiki & 277 & 10800 & 8 & 90 & 0.02888087 & 0.008333333 & -0.020547533\\\\\n", "\t zhwiki & 1512 & 80866 & 137 & 5753 & 0.09060847 & 0.071142384 & -0.019466082\\\\\n", "\t dewiki & 505 & 119158 & 87 & 18351 & 0.17227723 & 0.154005606 & -0.018271622\\\\\n", "\t zh\\_yuewiki & 35 & 23696 & 1 & 267 & 0.02857143 & 0.011267725 & -0.017303704\\\\\n", "\t ckbwiki & 64 & 2901 & 5 & 183 & 0.07812500 & 0.063081696 & -0.015043304\\\\\n", "\t kuwiki & 402 & 5291 & 13 & 133 & 0.03233831 & 0.025137025 & -0.007201283\\\\\n", "\t fiwiki & 138 & 20467 & 12 & 1680 & 0.08695652 & 0.082083354 & -0.004873168\\\\\n", "\t etwiki & 55 & 8239 & 6 & 865 & 0.10909091 & 0.104988469 & -0.004102440\\\\\n", "\t bswiki 
& 62 & 2677 & 6 & 254 & 0.09677419 & 0.094882331 & -0.001891863\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 21 × 8\n", "\n", "| wiki <chr> | created_cx <int> | created_non_cx <int> | deleted_cx <int> | deleted_non_cx <int> | deleted_cx_ratio <dbl> | deleted_non_cx_ratio <dbl> | deletion_ratio_diff <dbl> |\n", "|---|---|---|---|---|---|---|---|\n", "| fywiki | 17 | 1755 | 14 | 65 | 0.82352941 | 0.037037037 | -0.786492375 |\n", "| hawwiki | 42 | 132 | 31 | 24 | 0.73809524 | 0.181818182 | -0.556277056 |\n", "| ltwiki | 59 | 3337 | 28 | 644 | 0.47457627 | 0.192987714 | -0.281588558 |\n", "| iswiki | 26 | 2000 | 7 | 155 | 0.26923077 | 0.077500000 | -0.191730769 |\n", "| lawiki | 48 | 2979 | 9 | 158 | 0.18750000 | 0.053037932 | -0.134462068 |\n", "| hywiki | 159 | 33338 | 22 | 1080 | 0.13836478 | 0.032395465 | -0.105969315 |\n", "| azwiki | 206 | 29671 | 29 | 1885 | 0.14077670 | 0.063530046 | -0.077246653 |\n", "| arywiki | 63 | 2443 | 5 | 50 | 0.07936508 | 0.020466639 | -0.058898440 |\n", "| mywiki | 313 | 6698 | 37 | 439 | 0.11821086 | 0.065541953 | -0.052668910 |\n", "| cywiki | 122 | 1451 | 13 | 85 | 0.10655738 | 0.058580289 | -0.047977088 |\n", "| vecwiki | 20 | 10293 | 1 | 46 | 0.05000000 | 0.004469057 | -0.045530943 |\n", "| arzwiki | 133 | 355316 | 4 | 730 | 0.03007519 | 0.002054509 | -0.028020679 |\n", "| eowiki | 277 | 10800 | 8 | 90 | 0.02888087 | 0.008333333 | -0.020547533 |\n", "| zhwiki | 1512 | 80866 | 137 | 5753 | 0.09060847 | 0.071142384 | -0.019466082 |\n", "| dewiki | 505 | 119158 | 87 | 18351 | 0.17227723 | 0.154005606 | -0.018271622 |\n", "| zh_yuewiki | 35 | 23696 | 1 | 267 | 0.02857143 | 0.011267725 | -0.017303704 |\n", "| ckbwiki | 64 | 2901 | 5 | 183 | 0.07812500 | 0.063081696 | -0.015043304 |\n", "| kuwiki | 402 | 5291 | 13 | 133 | 0.03233831 | 0.025137025 | -0.007201283 |\n", "| fiwiki | 138 | 20467 | 12 | 1680 | 0.08695652 | 0.082083354 | -0.004873168 |\n", "| etwiki | 55 | 8239 | 6 | 865 | 0.10909091 | 0.104988469 | 
-0.004102440 |\n", "| bswiki | 62 | 2677 | 6 | 254 | 0.09677419 | 0.094882331 | -0.001891863 |\n", "\n" ], "text/plain": [ " wiki created_cx created_non_cx deleted_cx deleted_non_cx\n", "1 fywiki 17 1755 14 65 \n", "2 hawwiki 42 132 31 24 \n", "3 ltwiki 59 3337 28 644 \n", "4 iswiki 26 2000 7 155 \n", "5 lawiki 48 2979 9 158 \n", "6 hywiki 159 33338 22 1080 \n", "7 azwiki 206 29671 29 1885 \n", "8 arywiki 63 2443 5 50 \n", "9 mywiki 313 6698 37 439 \n", "10 cywiki 122 1451 13 85 \n", "11 vecwiki 20 10293 1 46 \n", "12 arzwiki 133 355316 4 730 \n", "13 eowiki 277 10800 8 90 \n", "14 zhwiki 1512 80866 137 5753 \n", "15 dewiki 505 119158 87 18351 \n", "16 zh_yuewiki 35 23696 1 267 \n", "17 ckbwiki 64 2901 5 183 \n", "18 kuwiki 402 5291 13 133 \n", "19 fiwiki 138 20467 12 1680 \n", "20 etwiki 55 8239 6 865 \n", "21 bswiki 62 2677 6 254 \n", " deleted_cx_ratio deleted_non_cx_ratio deletion_ratio_diff\n", "1 0.82352941 0.037037037 -0.786492375 \n", "2 0.73809524 0.181818182 -0.556277056 \n", "3 0.47457627 0.192987714 -0.281588558 \n", "4 0.26923077 0.077500000 -0.191730769 \n", "5 0.18750000 0.053037932 -0.134462068 \n", "6 0.13836478 0.032395465 -0.105969315 \n", "7 0.14077670 0.063530046 -0.077246653 \n", "8 0.07936508 0.020466639 -0.058898440 \n", "9 0.11821086 0.065541953 -0.052668910 \n", "10 0.10655738 0.058580289 -0.047977088 \n", "11 0.05000000 0.004469057 -0.045530943 \n", "12 0.03007519 0.002054509 -0.028020679 \n", "13 0.02888087 0.008333333 -0.020547533 \n", "14 0.09060847 0.071142384 -0.019466082 \n", "15 0.17227723 0.154005606 -0.018271622 \n", "16 0.02857143 0.011267725 -0.017303704 \n", "17 0.07812500 0.063081696 -0.015043304 \n", "18 0.03233831 0.025137025 -0.007201283 \n", "19 0.08695652 0.082083354 -0.004873168 \n", "20 0.10909091 0.104988469 -0.004102440 \n", "21 0.09677419 0.094882331 -0.001891863 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cx_deletion_higher_list_previous <- cx_deletion_ratio_bywiki_previous %>%\n", " 
filter(deletion_ratio_diff < 0) %>%\n", " arrange(deletion_ratio_diff)\n", "\n", "cx_deletion_higher_list_previous" ] }, { "cell_type": "code", "execution_count": 294, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 8 × 1
wiki
<chr>
hawwiki
iswiki
kuwiki
arywiki
arzwiki
fiwiki
lawiki
eowiki
\n" ], "text/latex": [ "A data.frame: 8 × 1\n", "\\begin{tabular}{l}\n", " wiki\\\\\n", " \\\\\n", "\\hline\n", "\t hawwiki\\\\\n", "\t iswiki \\\\\n", "\t kuwiki \\\\\n", "\t arywiki\\\\\n", "\t arzwiki\\\\\n", "\t fiwiki \\\\\n", "\t lawiki \\\\\n", "\t eowiki \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 8 × 1\n", "\n", "| wiki <chr> |\n", "|---|\n", "| hawwiki |\n", "| iswiki |\n", "| kuwiki |\n", "| arywiki |\n", "| arzwiki |\n", "| fiwiki |\n", "| lawiki |\n", "| eowiki |\n", "\n" ], "text/plain": [ " wiki \n", "1 hawwiki\n", "2 iswiki \n", "3 kuwiki \n", "4 arywiki\n", "5 arzwiki\n", "6 fiwiki \n", "7 lawiki \n", "8 eowiki " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "intersect(cx_deletion_higher_list_current[1], cx_deletion_higher_list_previous[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There were 8 wikis that had higher deletion ratios for content translated articles in both 6-month periods. " ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 4 }