{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# What percent of talk pages have not yet been created?\n", "\n", "[Task](https://phabricator.wikimedia.org/T272657)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Background\n", "\n", "This task is about uncovering the percentage of talk pages, across namespaces and Wikipedias, that have not yet been created. Knowing this percentage will help us decide how highly we should prioritize work on designing the empty state experience (see [T252902](https://phabricator.wikimedia.org/T252902)).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Metrics\n", "\n", "* We are curious to know the following: Of all the pages in the subject namespace, what percentage of them do NOT have a corresponding talk page that's been created?\n", "* We would value seeing the percentage of not-yet-created talk pages grouped by Wikipedia and, within each wiki, grouped by namespace.\n" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))\n", "shhh({\n", " library(tidyverse);\n", "})" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 6 × 6\n", "\\begin{tabular}{r|llllll}\n", " & wiki & subject\\_namespace & num\\_subject\\_pages & num\\_talk\\_pages & num\\_talk\\_pages\\_not\\_yet\\_created & prop\\_talk\\_pages\\_not\\_yet\\_created\\\\\n", " & & & & & & \\\\\n", "\\hline\n", "\t1 & aawiki & 0 & 2 & 1 & 1 & 0.5000000\\\\\n", "\t2 & aawiki & 2 & 170 & 54 & 116 & 0.6823529\\\\\n", "\t3 & aawiki & 4 & 7 & 0 & 7 & 1.0000000\\\\\n", "\t4 & aawiki & 8 & 107 & 0 & 107 & 1.0000000\\\\\n", "\t5 & aawiki & 10 & 24 & 0 & 24 & 1.0000000\\\\\n", "\t6 & aawiki & 14 & 29 & 0 & 29 & 1.0000000\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 6\n", "\n", "| | wiki <chr> | subject_namespace <int> | num_subject_pages <int> | num_talk_pages <int> | num_talk_pages_not_yet_created <int> | prop_talk_pages_not_yet_created <dbl> |\n", "|---|---|---|---|---|---|---|\n", "| 1 | aawiki | 0 | 2 | 1 | 1 | 0.5000000 |\n", "| 2 | aawiki | 2 | 170 | 54 | 116 | 0.6823529 |\n", "| 3 | aawiki | 4 | 7 | 0 | 7 | 1.0000000 |\n", "| 4 | aawiki | 8 | 107 | 0 | 107 | 1.0000000 |\n", "| 5 | aawiki | 10 | 24 | 0 | 24 | 1.0000000 |\n", "| 6 | aawiki | 14 | 29 | 0 | 29 | 1.0000000 |\n", "\n" ], "text/plain": [ " wiki subject_namespace num_subject_pages num_talk_pages\n", "1 aawiki 0 2 1 \n", "2 aawiki 2 170 54 \n", "3 aawiki 4 7 0 \n", "4 aawiki 8 107 0 \n", "5 aawiki 10 24 0 \n", "6 aawiki 14 29 0 \n", " num_talk_pages_not_yet_created prop_talk_pages_not_yet_created\n", "1 1 0.5000000 \n", "2 116 0.6823529 \n", "3 7 1.0000000 \n", "4 107 1.0000000 \n", "5 24 1.0000000 \n", "6 29 1.0000000 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(missing_talk_pages)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "query <-\n", "\n", "\"\n", "--find all subject namespace pages\n", "WITH subject_pages AS (\n", "SELECT\n", "-- address duplicate page title instances in mediawiki_page_history\n", " DISTINCT 
page_title AS subject_title,\n", " wiki_db AS wiki,\n", " CASE\n", " WHEN page_namespace = 0 THEN 'Main/Article'\n", " WHEN page_namespace = 2 THEN 'User'\n", " WHEN page_namespace = 4 THEN 'Wikipedia'\n", " WHEN page_namespace = 6 THEN 'File'\n", " WHEN page_namespace = 8 THEN 'MediaWiki'\n", " WHEN page_namespace = 10 THEN 'Template'\n", " WHEN page_namespace = 12 THEN 'Help'\n", " WHEN page_namespace = 14 THEN 'Category'\n", " WHEN page_namespace = 100 THEN 'Portal'\n", " WHEN page_namespace = 118 THEN 'Draft'\n", " WHEN page_namespace = 710 THEN 'TimedText'\n", " WHEN page_namespace = 828 THEN 'Module'\n", " END AS subject_namespace\n", "FROM wmf.mediawiki_page_history\n", "INNER JOIN canonical_data.wikis\n", " ON\n", " wiki_db = database_code and\n", " database_group == 'wikipedia'\n", "WHERE \n", "-- review all primary subject namespaces\n", " page_namespace IN (0,2,4,6,8,10,12,14,100,118, 710, 828) \n", "--Remove redirects\n", " AND page_is_redirect = FALSE \n", "--Remove archived articles \n", " AND page_is_deleted = FALSE\n", " AND snapshot = '2020-12'\n", "),\n", "--find all talk namespace pages\n", "talk_pages AS (\n", "SELECT\n", " DISTINCT page_title AS talk_title,\n", " wiki_db AS wiki,\n", " CASE\n", " WHEN page_namespace = 1 THEN 'Talk'\n", " WHEN page_namespace = 3 THEN 'User talk'\n", " WHEN page_namespace = 5 THEN 'Wikipedia talk'\n", " WHEN page_namespace = 7 THEN 'File talk'\n", " WHEN page_namespace = 9 THEN 'MediaWiki talk'\n", " WHEN page_namespace = 11 THEN 'Template talk'\n", " WHEN page_namespace = 13 THEN 'Help talk'\n", " WHEN page_namespace = 15 THEN 'Category talk'\n", " WHEN page_namespace = 101 THEN 'Portal talk'\n", " WHEN page_namespace = 119 THEN 'Draft talk'\n", " WHEN page_namespace = 711 THEN 'TimedText talk'\n", " WHEN page_namespace = 829 THEN 'Module talk'\n", " END AS talk_namespace\n", "FROM wmf.mediawiki_page_history\n", "INNER JOIN canonical_data.wikis\n", " ON\n", " wiki_db = database_code and\n", " database_group == 
'wikipedia'\n", "WHERE \n", "-- review all primary talk namespaces\n", " page_namespace IN (1,3,5,7,9,11,13,15, 101, 119, 711, 829) \n", "--Remove redirects\n", " AND page_is_redirect = FALSE \n", "--Remove archived articles \n", " AND page_is_deleted = FALSE\n", " AND snapshot = '2020-12'\n", ")\n", "-- MAIN QUERY --\n", "SELECT\n", " sp.wiki,\n", " sp.subject_namespace,\n", " COUNT(*) AS num_subject_pages,\n", " SUM(CAST(talk_title IS NOT NULL AS int)) AS num_talk_pages_created,\n", " SUM(CAST(talk_title IS NULL AS int)) AS num_talk_pages_not_yet_created\n", " FROM \n", " subject_pages AS sp\n", "LEFT JOIN talk_pages ON \n", " sp.subject_title = talk_pages.talk_title AND\n", " sp.wiki = talk_pages.wiki \n", "GROUP BY\n", " sp.wiki,\n", " sp.subject_namespace\n", " ;\n", "\"" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "missing_talk_pages <- wmfdata::query_hive(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# By Namespace" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 12 × 4\n", "\\begin{tabular}{llll}\n", " subject\\_namespace & total\\_talk\\_missing & total\\_subject & prop\\_talk\\_missing\\\\\n", " & & & \\\\\n", "\\hline\n", "\t Category & 9803368 & 13485972 & 72.69\\\\\n", "\t Draft & 50342 & 66563 & 75.63\\\\\n", "\t File & 2116032 & 2699583 & 78.38\\\\\n", "\t Help & 18252 & 22739 & 80.27\\\\\n", "\t Main/Article & 32239028 & 56107661 & 57.46\\\\\n", "\t MediaWiki & 106603 & 115030 & 92.67\\\\\n", "\t Module & 148977 & 157656 & 94.49\\\\\n", "\t Portal & 471198 & 523353 & 90.03\\\\\n", "\t Template & 4398181 & 5219632 & 84.26\\\\\n", "\t TimedText & 1148 & 1266 & 90.68\\\\\n", "\t User & 4064202 & 7034793 & 57.77\\\\\n", "\t Wikipedia & 2411805 & 2559861 & 94.22\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 12 × 4\n", "\n", "| subject_namespace <chr> | total_talk_missing <int> | total_subject <int> | prop_talk_missing <dbl> |\n", "|---|---|---|---|\n", "| Category | 9803368 | 13485972 | 72.69 |\n", "| Draft | 50342 | 66563 | 75.63 |\n", "| File | 2116032 | 2699583 | 78.38 |\n", "| Help | 18252 | 22739 | 80.27 |\n", "| Main/Article | 32239028 | 56107661 | 57.46 |\n", "| MediaWiki | 106603 | 115030 | 92.67 |\n", "| Module | 148977 | 157656 | 94.49 |\n", "| Portal | 471198 | 523353 | 90.03 |\n", "| Template | 4398181 | 5219632 | 84.26 |\n", "| TimedText | 1148 | 1266 | 90.68 |\n", "| User | 4064202 | 7034793 | 57.77 |\n", "| Wikipedia | 2411805 | 2559861 | 94.22 |\n", "\n" ], "text/plain": [ " subject_namespace total_talk_missing total_subject prop_talk_missing\n", "1 Category 9803368 13485972 72.69 \n", "2 Draft 50342 66563 75.63 \n", "3 File 2116032 2699583 78.38 \n", "4 Help 18252 22739 80.27 \n", "5 Main/Article 32239028 56107661 57.46 \n", "6 MediaWiki 106603 115030 92.67 \n", "7 Module 148977 157656 94.49 \n", "8 Portal 471198 523353 90.03 \n", "9 Template 4398181 5219632 84.26 \n", "10 TimedText 1148 1266 90.68 \n", "11 User 4064202 7034793 57.77 \n", "12 
Wikipedia 2411805 2559861 94.22 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "missing_talk_bynamespace <- missing_talk_pages %>%\n", " group_by(subject_namespace) %>%\n", " summarise(total_talk_missing = sum(num_talk_pages_not_yet_created),\n", " total_subject = sum(num_subject_pages),\n", " prop_talk_missing = round(sum(num_talk_pages_not_yet_created)/sum(num_subject_pages) * 100,2), .groups = 'drop')\n", "\n", "missing_talk_bynamespace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# By Wiki and Talk Namespace" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 2784 × 5\n", "\\begin{tabular}{lllll}\n", " wiki & subject\\_namespace & total\\_talk\\_missing & total\\_subject & prop\\_talk\\_missing\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t aawiki & Category & 29 & 29 & 100.00\\\\\n", "\t aawiki & Main/Article & 1 & 2 & 50.00\\\\\n", "\t aawiki & MediaWiki & 107 & 107 & 100.00\\\\\n", "\t aawiki & Template & 24 & 24 & 100.00\\\\\n", "\t aawiki & User & 116 & 170 & 68.24\\\\\n", "\t aawiki & Wikipedia & 7 & 7 & 100.00\\\\\n", "\t abwiki & Category & 7046 & 7086 & 99.44\\\\\n", "\t abwiki & File & 10 & 10 & 100.00\\\\\n", "\t abwiki & Help & 2 & 2 & 100.00\\\\\n", "\t abwiki & Main/Article & 6125 & 6203 & 98.74\\\\\n", "\t abwiki & MediaWiki & 66 & 71 & 92.96\\\\\n", "\t abwiki & Module & 82 & 82 & 100.00\\\\\n", "\t abwiki & Template & 1199 & 1210 & 99.09\\\\\n", "\t abwiki & User & 791 & 1007 & 78.55\\\\\n", "\t abwiki & Wikipedia & 35 & 39 & 89.74\\\\\n", "\t acewiki & Category & 1052 & 1073 & 98.04\\\\\n", "\t acewiki & Help & 4 & 5 & 80.00\\\\\n", "\t acewiki & Main/Article & 10406 & 10486 & 99.24\\\\\n", "\t acewiki & MediaWiki & 107 & 112 & 95.54\\\\\n", "\t acewiki & Module & 54 & 56 & 96.43\\\\\n", "\t acewiki & Template & 1685 & 1704 & 98.88\\\\\n", "\t acewiki & User & 583 & 906 & 64.35\\\\\n", "\t acewiki & Wikipedia & 62 & 68 & 91.18\\\\\n", "\t adywiki & Category & 36 & 38 & 94.74\\\\\n", "\t adywiki & Help & 1 & 1 & 100.00\\\\\n", "\t adywiki & Main/Article & 555 & 564 & 98.40\\\\\n", "\t adywiki & MediaWiki & 25 & 28 & 89.29\\\\\n", "\t adywiki & Module & 8 & 8 & 100.00\\\\\n", "\t adywiki & Template & 1318 & 1321 & 99.77\\\\\n", "\t adywiki & User & 44 & 62 & 70.97\\\\\n", "\t ⋮ & ⋮ & ⋮ & ⋮ & ⋮\\\\\n", "\t zh\\_min\\_nanwiki & Wikipedia & 570 & 1706 & 33.41\\\\\n", "\t zh\\_yuewiki & Category & 22565 & 23564 & 95.76\\\\\n", "\t zh\\_yuewiki & File & 2037 & 2048 & 99.46\\\\\n", "\t zh\\_yuewiki & Help & 20 & 27 & 74.07\\\\\n", "\t zh\\_yuewiki & Main/Article & 97554 & 105811 
& 92.20\\\\\n", "\t zh\\_yuewiki & MediaWiki & 1706 & 1785 & 95.57\\\\\n", "\t zh\\_yuewiki & Module & 312 & 334 & 93.41\\\\\n", "\t zh\\_yuewiki & Portal & 413 & 459 & 89.98\\\\\n", "\t zh\\_yuewiki & Template & 7994 & 8526 & 93.76\\\\\n", "\t zh\\_yuewiki & User & 2045 & 3037 & 67.34\\\\\n", "\t zh\\_yuewiki & Wikipedia & 2219 & 2433 & 91.20\\\\\n", "\t zhwiki & Category & 295498 & 389965 & 75.78\\\\\n", "\t zhwiki & Draft & 675 & 892 & 75.67\\\\\n", "\t zhwiki & File & 50111 & 55899 & 89.65\\\\\n", "\t zhwiki & Help & 298 & 427 & 69.79\\\\\n", "\t zhwiki & Main/Article & 543143 & 1190778 & 45.61\\\\\n", "\t zhwiki & MediaWiki & 8116 & 8451 & 96.04\\\\\n", "\t zhwiki & Module & 3115 & 3509 & 88.77\\\\\n", "\t zhwiki & Portal & 8374 & 9902 & 84.57\\\\\n", "\t zhwiki & Template & 907022 & 936618 & 96.84\\\\\n", "\t zhwiki & User & 68774 & 144267 & 47.67\\\\\n", "\t zhwiki & Wikipedia & 47756 & 52642 & 90.72\\\\\n", "\t zuwiki & Category & 1047 & 1083 & 96.68\\\\\n", "\t zuwiki & Help & 1 & 1 & 100.00\\\\\n", "\t zuwiki & Main/Article & 6339 & 6412 & 98.86\\\\\n", "\t zuwiki & MediaWiki & 14 & 16 & 87.50\\\\\n", "\t zuwiki & Module & 26 & 26 & 100.00\\\\\n", "\t zuwiki & Template & 865 & 869 & 99.54\\\\\n", "\t zuwiki & User & 1323 & 1456 & 90.87\\\\\n", "\t zuwiki & Wikipedia & 20 & 25 & 80.00\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2784 × 5\n", "\n", "| wiki <chr> | subject_namespace <chr> | total_talk_missing <int> | total_subject <int> | prop_talk_missing <dbl> |\n", "|---|---|---|---|---|\n", "| aawiki | Category | 29 | 29 | 100.00 |\n", "| aawiki | Main/Article | 1 | 2 | 50.00 |\n", "| aawiki | MediaWiki | 107 | 107 | 100.00 |\n", "| aawiki | Template | 24 | 24 | 100.00 |\n", "| aawiki | User | 116 | 170 | 68.24 |\n", "| aawiki | Wikipedia | 7 | 7 | 100.00 |\n", "| abwiki | Category | 7046 | 7086 | 99.44 |\n", "| abwiki | File | 10 | 10 | 100.00 |\n", "| abwiki | Help | 2 | 2 | 100.00 |\n", "| abwiki | Main/Article | 6125 | 6203 | 
98.74 |\n", "| abwiki | MediaWiki | 66 | 71 | 92.96 |\n", "| abwiki | Module | 82 | 82 | 100.00 |\n", "| abwiki | Template | 1199 | 1210 | 99.09 |\n", "| abwiki | User | 791 | 1007 | 78.55 |\n", "| abwiki | Wikipedia | 35 | 39 | 89.74 |\n", "| acewiki | Category | 1052 | 1073 | 98.04 |\n", "| acewiki | Help | 4 | 5 | 80.00 |\n", "| acewiki | Main/Article | 10406 | 10486 | 99.24 |\n", "| acewiki | MediaWiki | 107 | 112 | 95.54 |\n", "| acewiki | Module | 54 | 56 | 96.43 |\n", "| acewiki | Template | 1685 | 1704 | 98.88 |\n", "| acewiki | User | 583 | 906 | 64.35 |\n", "| acewiki | Wikipedia | 62 | 68 | 91.18 |\n", "| adywiki | Category | 36 | 38 | 94.74 |\n", "| adywiki | Help | 1 | 1 | 100.00 |\n", "| adywiki | Main/Article | 555 | 564 | 98.40 |\n", "| adywiki | MediaWiki | 25 | 28 | 89.29 |\n", "| adywiki | Module | 8 | 8 | 100.00 |\n", "| adywiki | Template | 1318 | 1321 | 99.77 |\n", "| adywiki | User | 44 | 62 | 70.97 |\n", "| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |\n", "| zh_min_nanwiki | Wikipedia | 570 | 1706 | 33.41 |\n", "| zh_yuewiki | Category | 22565 | 23564 | 95.76 |\n", "| zh_yuewiki | File | 2037 | 2048 | 99.46 |\n", "| zh_yuewiki | Help | 20 | 27 | 74.07 |\n", "| zh_yuewiki | Main/Article | 97554 | 105811 | 92.20 |\n", "| zh_yuewiki | MediaWiki | 1706 | 1785 | 95.57 |\n", "| zh_yuewiki | Module | 312 | 334 | 93.41 |\n", "| zh_yuewiki | Portal | 413 | 459 | 89.98 |\n", "| zh_yuewiki | Template | 7994 | 8526 | 93.76 |\n", "| zh_yuewiki | User | 2045 | 3037 | 67.34 |\n", "| zh_yuewiki | Wikipedia | 2219 | 2433 | 91.20 |\n", "| zhwiki | Category | 295498 | 389965 | 75.78 |\n", "| zhwiki | Draft | 675 | 892 | 75.67 |\n", "| zhwiki | File | 50111 | 55899 | 89.65 |\n", "| zhwiki | Help | 298 | 427 | 69.79 |\n", "| zhwiki | Main/Article | 543143 | 1190778 | 45.61 |\n", "| zhwiki | MediaWiki | 8116 | 8451 | 96.04 |\n", "| zhwiki | Module | 3115 | 3509 | 88.77 |\n", "| zhwiki | Portal | 8374 | 9902 | 84.57 |\n", "| zhwiki | Template | 907022 | 936618 | 96.84 |\n", "| zhwiki | 
User | 68774 | 144267 | 47.67 |\n", "| zhwiki | Wikipedia | 47756 | 52642 | 90.72 |\n", "| zuwiki | Category | 1047 | 1083 | 96.68 |\n", "| zuwiki | Help | 1 | 1 | 100.00 |\n", "| zuwiki | Main/Article | 6339 | 6412 | 98.86 |\n", "| zuwiki | MediaWiki | 14 | 16 | 87.50 |\n", "| zuwiki | Module | 26 | 26 | 100.00 |\n", "| zuwiki | Template | 865 | 869 | 99.54 |\n", "| zuwiki | User | 1323 | 1456 | 90.87 |\n", "| zuwiki | Wikipedia | 20 | 25 | 80.00 |\n", "\n" ], "text/plain": [ " wiki subject_namespace total_talk_missing total_subject\n", "1 aawiki Category 29 29 \n", "2 aawiki Main/Article 1 2 \n", "3 aawiki MediaWiki 107 107 \n", "4 aawiki Template 24 24 \n", "5 aawiki User 116 170 \n", "6 aawiki Wikipedia 7 7 \n", "7 abwiki Category 7046 7086 \n", "8 abwiki File 10 10 \n", "9 abwiki Help 2 2 \n", "10 abwiki Main/Article 6125 6203 \n", "11 abwiki MediaWiki 66 71 \n", "12 abwiki Module 82 82 \n", "13 abwiki Template 1199 1210 \n", "14 abwiki User 791 1007 \n", "15 abwiki Wikipedia 35 39 \n", "16 acewiki Category 1052 1073 \n", "17 acewiki Help 4 5 \n", "18 acewiki Main/Article 10406 10486 \n", "19 acewiki MediaWiki 107 112 \n", "20 acewiki Module 54 56 \n", "21 acewiki Template 1685 1704 \n", "22 acewiki User 583 906 \n", "23 acewiki Wikipedia 62 68 \n", "24 adywiki Category 36 38 \n", "25 adywiki Help 1 1 \n", "26 adywiki Main/Article 555 564 \n", "27 adywiki MediaWiki 25 28 \n", "28 adywiki Module 8 8 \n", "29 adywiki Template 1318 1321 \n", "30 adywiki User 44 62 \n", "⋮ ⋮ ⋮ ⋮ ⋮ \n", "2755 zh_min_nanwiki Wikipedia 570 1706 \n", "2756 zh_yuewiki Category 22565 23564 \n", "2757 zh_yuewiki File 2037 2048 \n", "2758 zh_yuewiki Help 20 27 \n", "2759 zh_yuewiki Main/Article 97554 105811 \n", "2760 zh_yuewiki MediaWiki 1706 1785 \n", "2761 zh_yuewiki Module 312 334 \n", "2762 zh_yuewiki Portal 413 459 \n", "2763 zh_yuewiki Template 7994 8526 \n", "2764 zh_yuewiki User 2045 3037 \n", "2765 zh_yuewiki Wikipedia 2219 2433 \n", "2766 zhwiki Category 295498 389965 \n", 
"2767 zhwiki Draft 675 892 \n", "2768 zhwiki File 50111 55899 \n", "2769 zhwiki Help 298 427 \n", "2770 zhwiki Main/Article 543143 1190778 \n", "2771 zhwiki MediaWiki 8116 8451 \n", "2772 zhwiki Module 3115 3509 \n", "2773 zhwiki Portal 8374 9902 \n", "2774 zhwiki Template 907022 936618 \n", "2775 zhwiki User 68774 144267 \n", "2776 zhwiki Wikipedia 47756 52642 \n", "2777 zuwiki Category 1047 1083 \n", "2778 zuwiki Help 1 1 \n", "2779 zuwiki Main/Article 6339 6412 \n", "2780 zuwiki MediaWiki 14 16 \n", "2781 zuwiki Module 26 26 \n", "2782 zuwiki Template 865 869 \n", "2783 zuwiki User 1323 1456 \n", "2784 zuwiki Wikipedia 20 25 \n", " prop_talk_missing\n", "1 100.00 \n", "2 50.00 \n", "3 100.00 \n", "4 100.00 \n", "5 68.24 \n", "6 100.00 \n", "7 99.44 \n", "8 100.00 \n", "9 100.00 \n", "10 98.74 \n", "11 92.96 \n", "12 100.00 \n", "13 99.09 \n", "14 78.55 \n", "15 89.74 \n", "16 98.04 \n", "17 80.00 \n", "18 99.24 \n", "19 95.54 \n", "20 96.43 \n", "21 98.88 \n", "22 64.35 \n", "23 91.18 \n", "24 94.74 \n", "25 100.00 \n", "26 98.40 \n", "27 89.29 \n", "28 100.00 \n", "29 99.77 \n", "30 70.97 \n", "⋮ ⋮ \n", "2755 33.41 \n", "2756 95.76 \n", "2757 99.46 \n", "2758 74.07 \n", "2759 92.20 \n", "2760 95.57 \n", "2761 93.41 \n", "2762 89.98 \n", "2763 93.76 \n", "2764 67.34 \n", "2765 91.20 \n", "2766 75.78 \n", "2767 75.67 \n", "2768 89.65 \n", "2769 69.79 \n", "2770 45.61 \n", "2771 96.04 \n", "2772 88.77 \n", "2773 84.57 \n", "2774 96.84 \n", "2775 47.67 \n", "2776 90.72 \n", "2777 96.68 \n", "2778 100.00 \n", "2779 98.86 \n", "2780 87.50 \n", "2781 100.00 \n", "2782 99.54 \n", "2783 90.87 \n", "2784 80.00 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "missing_talk_bynamespace <- missing_talk_pages %>%\n", " group_by(wiki, subject_namespace) %>%\n", " summarise(total_talk_missing = sum(num_talk_pages_not_yet_created),\n", " total_subject = sum(num_subject_pages),\n", " prop_talk_missing = 
round(sum(num_talk_pages_not_yet_created)/sum(num_subject_pages) * 100,2), .groups = 'drop')\n", "\n", "missing_talk_bynamespace" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 4 }