{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Post Deployment Language Switching QA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Task](https://phabricator.wikimedia.org/T275762)\n", "|\n", "[Schema](https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/legacy/universallanguageselector/current.yaml)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Contents\n", "1. [Language Links in the Sidebar](#Language-Links-in-the-Sidebar)\n", "2. [Language Switcher Button](#Language-Switcher-Button)\n", "3. [Language Button Context Events](#Language-Button-Context-Events)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "shhh <- function(expr) suppressPackageStartupMessages(suppressWarnings(suppressMessages(expr)))\n", "shhh({\n", " library(tidyverse); \n", " library(lubridate); \n", " library(scales);\n", " library(magrittr); \n", " library(dplyr)\n", "})\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Language Links in the Sidebar\n", "\n", "[Instrumentation Task](https://phabricator.wikimedia.org/T275762)\n", "\n", "Instrumentation Notes:\n", "- Instrumentation limited to legacy sidebar in modern Vector.\n", "- Logged as `event.context = 'languages-list'`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "query <- \n", "\"\n", "SELECT\n", " TO_DATE(dt) AS `date`,\n", " wiki,\n", " event.web_session_id,\n", " event.usereditbucket,\n", " event.timetochangelanguage,\n", " event.isanon,\n", " event.interfacelanguage,\n", " event.contentlanguage,\n", " event.selectedinterfacelanguage,\n", " Count(*) AS n_events\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND ((Month = 04 AND DAY > 26) OR (MONTH = 05))\n", " AND event.context = 'languages-list'\n", "GROUP BY\n", " TO_DATE(dt),\n", " wiki,\n", " event.web_session_id,\n", " 
 event.usereditbucket,\n", " event.timetochangelanguage,\n", " event.isanon,\n", " event.interfacelanguage,\n", " event.contentlanguage,\n", " event.selectedinterfacelanguage\n", "\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "lang_link_events <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "lang_link_events$date <- as.Date(lang_link_events$date)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Daily Language Link Events" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 16 × 3
date        n_events  n_sessions
<date>      <int>     <int>
2021-04-28       791         636
2021-04-29     46934       33173
2021-04-30    274519      177001
2021-05-01    243709      151370
2021-05-02    258630      161650
2021-05-03    303583      195465
2021-05-04    300642      194931
2021-05-05    295520      191876
2021-05-06    291788      190268
2021-05-07    265159      171239
2021-05-08    225456      141143
2021-05-09    244182      152870
2021-05-10    301699      196353
2021-05-11    296753      193206
2021-05-12    278392      181382
2021-05-13     54123       37258
\n" ], "text/latex": [ "A tibble: 16 × 3\n", "\\begin{tabular}{lll}\n", " date & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t 2021-04-28 & 791 & 636\\\\\n", "\t 2021-04-29 & 46934 & 33173\\\\\n", "\t 2021-04-30 & 274519 & 177001\\\\\n", "\t 2021-05-01 & 243709 & 151370\\\\\n", "\t 2021-05-02 & 258630 & 161650\\\\\n", "\t 2021-05-03 & 303583 & 195465\\\\\n", "\t 2021-05-04 & 300642 & 194931\\\\\n", "\t 2021-05-05 & 295520 & 191876\\\\\n", "\t 2021-05-06 & 291788 & 190268\\\\\n", "\t 2021-05-07 & 265159 & 171239\\\\\n", "\t 2021-05-08 & 225456 & 141143\\\\\n", "\t 2021-05-09 & 244182 & 152870\\\\\n", "\t 2021-05-10 & 301699 & 196353\\\\\n", "\t 2021-05-11 & 296753 & 193206\\\\\n", "\t 2021-05-12 & 278392 & 181382\\\\\n", "\t 2021-05-13 & 54123 & 37258\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 16 × 3\n", "\n", "| date <date> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| 2021-04-28 | 791 | 636 |\n", "| 2021-04-29 | 46934 | 33173 |\n", "| 2021-04-30 | 274519 | 177001 |\n", "| 2021-05-01 | 243709 | 151370 |\n", "| 2021-05-02 | 258630 | 161650 |\n", "| 2021-05-03 | 303583 | 195465 |\n", "| 2021-05-04 | 300642 | 194931 |\n", "| 2021-05-05 | 295520 | 191876 |\n", "| 2021-05-06 | 291788 | 190268 |\n", "| 2021-05-07 | 265159 | 171239 |\n", "| 2021-05-08 | 225456 | 141143 |\n", "| 2021-05-09 | 244182 | 152870 |\n", "| 2021-05-10 | 301699 | 196353 |\n", "| 2021-05-11 | 296753 | 193206 |\n", "| 2021-05-12 | 278392 | 181382 |\n", "| 2021-05-13 | 54123 | 37258 |\n", "\n" ], "text/plain": [ " date n_events n_sessions\n", "1 2021-04-28 791 636 \n", "2 2021-04-29 46934 33173 \n", "3 2021-04-30 274519 177001 \n", "4 2021-05-01 243709 151370 \n", "5 2021-05-02 258630 161650 \n", "6 2021-05-03 303583 195465 \n", "7 2021-05-04 300642 194931 \n", "8 2021-05-05 295520 191876 \n", "9 2021-05-06 291788 190268 \n", "10 2021-05-07 265159 171239 \n", "11 2021-05-08 225456 141143 \n", "12 2021-05-09 244182 152870 \n", "13 2021-05-10 
301699 196353 \n", "14 2021-05-11 296753 193206 \n", "15 2021-05-12 278392 181382 \n", "16 2021-05-13 54123 37258 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_link_events_daily <- lang_link_events %>%\n", " group_by(date) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_link_events_daily" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Events start being recorded on 28 April 2021. Across the 13 full days from 30 April to 12 May (excluding the low-volume first two days and the incomplete final day), there is an average of 176,827 sessions per day, including sessions by both logged-in and logged-out users. No unexpected spikes or drops so far." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clicks per session" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that some session IDs appear in more than one row; some sessions should have more than one click event. The comparison below should return `FALSE`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "FALSE" ], "text/latex": [ "FALSE" ], "text/markdown": [ "FALSE" ], "text/plain": [ "[1] FALSE" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "length(unique(lang_link_events$web_session_id)) == nrow(lang_link_events)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 4
isanon  avg_clicks  max_clicks  min_clicks
<chr>   <dbl>       <int>       <int>
false   1.000327    2           1
true    1.000169    4           1
\n" ], "text/latex": [ "A tibble: 2 × 4\n", "\\begin{tabular}{llll}\n", " isanon & avg\\_clicks & max\\_clicks & min\\_clicks\\\\\n", " & & & \\\\\n", "\\hline\n", "\t false & 1.000327 & 2 & 1\\\\\n", "\t true & 1.000169 & 4 & 1\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 4\n", "\n", "| isanon <chr> | avg_clicks <dbl> | max_clicks <int> | min_clicks <int> |\n", "|---|---|---|---|\n", "| false | 1.000327 | 2 | 1 |\n", "| true | 1.000169 | 4 | 1 |\n", "\n" ], "text/plain": [ " isanon avg_clicks max_clicks min_clicks\n", "1 false 1.000327 2 1 \n", "2 true 1.000169 4 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_link_events_persession <- lang_link_events %>%\n", " group_by(isanon) %>%\n", " summarize(avg_clicks = mean(n_events),\n", " max_clicks = max(n_events),\n", " min_clicks = min(n_events))\n", "\n", "lang_link_events_persession" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Logged In Status" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 3
isanon  n_events  n_sessions
<chr>   <int>     <int>
false     113161       46278
true     3568719     2058676
\n" ], "text/latex": [ "A tibble: 2 × 3\n", "\\begin{tabular}{lll}\n", " isanon & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t false & 113161 & 46278\\\\\n", "\t true & 3568719 & 2058676\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 3\n", "\n", "| isanon <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| false | 113161 | 46278 |\n", "| true | 3568719 | 2058676 |\n", "\n" ], "text/plain": [ " isanon n_events n_sessions\n", "1 false 113161 46278 \n", "2 true 3568719 2058676 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_link_events_isanon <- lang_link_events %>%\n", " group_by(isanon) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_link_events_isanon " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "97.8% of all sessions with clicks on the language links are by logged-out users. That is high but expected, because instrumentation was limited to the legacy sidebar in modern Vector (it does not cover legacy Vector or other skins such as Timeless), and the new language switching functionality was made available to all logged-in users opted into the latest version of the Vector skin.\n", "\n", "The legacy sidebar in modern Vector would mostly appear to logged-out users on test wikis where Vector is deployed as the default.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Test Wiki" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 13 × 3
wiki           n_events  n_sessions
<chr>          <int>     <int>
bnwiki             8451        5726
dewikivoyage        755         604
euwiki            31609       22351
fawiki           200781      110092
frwiki          1933743     1091442
frwiktionary      21307       13350
hewiki           173526      100165
kowiki           146461       81522
ptwiki           555228      328675
ptwikiversity        15          13
srwiki            98359       56521
trwiki           256347      151107
vecwiki            1348        1133
\n" ], "text/latex": [ "A tibble: 13 × 3\n", "\\begin{tabular}{lll}\n", " wiki & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t bnwiki & 8451 & 5726\\\\\n", "\t dewikivoyage & 755 & 604\\\\\n", "\t euwiki & 31609 & 22351\\\\\n", "\t fawiki & 200781 & 110092\\\\\n", "\t frwiki & 1933743 & 1091442\\\\\n", "\t frwiktionary & 21307 & 13350\\\\\n", "\t hewiki & 173526 & 100165\\\\\n", "\t kowiki & 146461 & 81522\\\\\n", "\t ptwiki & 555228 & 328675\\\\\n", "\t ptwikiversity & 15 & 13\\\\\n", "\t srwiki & 98359 & 56521\\\\\n", "\t trwiki & 256347 & 151107\\\\\n", "\t vecwiki & 1348 & 1133\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 13 × 3\n", "\n", "| wiki <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| bnwiki | 8451 | 5726 |\n", "| dewikivoyage | 755 | 604 |\n", "| euwiki | 31609 | 22351 |\n", "| fawiki | 200781 | 110092 |\n", "| frwiki | 1933743 | 1091442 |\n", "| frwiktionary | 21307 | 13350 |\n", "| hewiki | 173526 | 100165 |\n", "| kowiki | 146461 | 81522 |\n", "| ptwiki | 555228 | 328675 |\n", "| ptwikiversity | 15 | 13 |\n", "| srwiki | 98359 | 56521 |\n", "| trwiki | 256347 | 151107 |\n", "| vecwiki | 1348 | 1133 |\n", "\n" ], "text/plain": [ " wiki n_events n_sessions\n", "1 bnwiki 8451 5726 \n", "2 dewikivoyage 755 604 \n", "3 euwiki 31609 22351 \n", "4 fawiki 200781 110092 \n", "5 frwiki 1933743 1091442 \n", "6 frwiktionary 21307 13350 \n", "7 hewiki 173526 100165 \n", "8 kowiki 146461 81522 \n", "9 ptwiki 555228 328675 \n", "10 ptwikiversity 15 13 \n", "11 srwiki 98359 56521 \n", "12 trwiki 256347 151107 \n", "13 vecwiki 1348 1133 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Events and sessions with language link clicks, by test wiki\n", "lang_link_events_testwiki <- lang_link_events %>%\n", " filter(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 
'vecwiki' )) %>%\n", " group_by(wiki) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_link_events_testwiki\n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'istestwiki' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 4 × 4
istestwiki     isanon  n_events  n_sessions
<chr>          <chr>   <int>     <int>
non_test_wiki  false       3540        1422
non_test_wiki  true         147          80
test_wiki      false     102371       41836
test_wiki      true     3325559     1922334
\n" ], "text/latex": [ "A grouped\\_df: 4 × 4\n", "\\begin{tabular}{llll}\n", " istestwiki & isanon & n\\_events & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t non\\_test\\_wiki & false & 3540 & 1422\\\\\n", "\t non\\_test\\_wiki & true & 147 & 80\\\\\n", "\t test\\_wiki & false & 102371 & 41836\\\\\n", "\t test\\_wiki & true & 3325559 & 1922334\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 4\n", "\n", "| istestwiki <chr> | isanon <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|---|\n", "| non_test_wiki | false | 3540 | 1422 |\n", "| non_test_wiki | true | 147 | 80 |\n", "| test_wiki | false | 102371 | 41836 |\n", "| test_wiki | true | 3325559 | 1922334 |\n", "\n" ], "text/plain": [ " istestwiki isanon n_events n_sessions\n", "1 non_test_wiki false 3540 1422 \n", "2 non_test_wiki true 147 80 \n", "3 test_wiki false 102371 41836 \n", "4 test_wiki true 3325559 1922334 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Events and sessions with language link clicks, by test wiki category and logged-in status\n", "lang_link_events_testwiki_isanon <- lang_link_events %>%\n", " mutate(istestwiki = ifelse(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' ), 'test_wiki', 'non_test_wiki')) %>%\n", " group_by(istestwiki, isanon) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_link_events_testwiki_isanon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Almost all of the events recorded to date (over 99%) have been on test wikis. This is expected, as the new language switcher button was deployed to all users opted in to modern Vector on all non-test wikis. 
Users with modern Vector on the test wikis have not been shown the new language switcher and are still shown the language links in the sidebar.\n", "\n", "On non-test wikis, the majority (94.67%) of sessions with clicks on the language list in modern Vector come from logged-in users. We need to confirm whether it is possible to see the language links in the sidebar when logged in, on modern Vector, and on a non-test wiki.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By User Edit Bucket" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 6 × 3
usereditbucket  n_events  n_sessions
<chr>           <int>     <int>
0 edits            20092       10961
1-4 edits          11510        5643
100-999 edits      17264        6533
1000+ edits        31482        8932
5-99 edits         25556       11332
NULL                   7           5
\n" ], "text/latex": [ "A tibble: 6 × 3\n", "\\begin{tabular}{lll}\n", " usereditbucket & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t 0 edits & 20092 & 10961\\\\\n", "\t 1-4 edits & 11510 & 5643\\\\\n", "\t 100-999 edits & 17264 & 6533\\\\\n", "\t 1000+ edits & 31482 & 8932\\\\\n", "\t 5-99 edits & 25556 & 11332\\\\\n", "\t NULL & 7 & 5\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 6 × 3\n", "\n", "| usereditbucket <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| 0 edits | 20092 | 10961 |\n", "| 1-4 edits | 11510 | 5643 |\n", "| 100-999 edits | 17264 | 6533 |\n", "| 1000+ edits | 31482 | 8932 |\n", "| 5-99 edits | 25556 | 11332 |\n", "| NULL | 7 | 5 |\n", "\n" ], "text/plain": [ " usereditbucket n_events n_sessions\n", "1 0 edits 20092 10961 \n", "2 1-4 edits 11510 5643 \n", "3 100-999 edits 17264 6533 \n", "4 1000+ edits 31482 8932 \n", "5 5-99 edits 25556 11332 \n", "6 NULL 7 5 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "logged_in_editcount <- lang_link_events %>%\n", " filter(isanon == 'false') %>%\n", " group_by(usereditbucket) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "logged_in_editcount" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 3
usereditbucket  n_events  n_sessions
<chr>           <int>     <int>
5-99 edits             1           1
NULL             3325705     1922412
\n" ], "text/latex": [ "A tibble: 2 × 3\n", "\\begin{tabular}{lll}\n", " usereditbucket & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t 5-99 edits & 1 & 1\\\\\n", "\t NULL & 3325705 & 1922412\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 3\n", "\n", "| usereditbucket <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| 5-99 edits | 1 | 1 |\n", "| NULL | 3325705 | 1922412 |\n", "\n" ], "text/plain": [ " usereditbucket n_events n_sessions\n", "1 5-99 edits 1 1 \n", "2 NULL 3325705 1922412 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "logged_out_editcount <- lang_link_events %>%\n", " filter(isanon == 'true') %>%\n", " group_by(usereditbucket) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "logged_out_editcount" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are just a few instances (under 0.01% of events) where the `event.usereditbucket` field is populated for logged-out users (1 event) or recorded as NULL for logged-in users (7 events). Further investigation might be needed; however, the number of these events is not high enough to skew the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Final Language" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'selectedinterfacelanguage' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 3
selectedinterfacelanguage  n_sessions  pct_sessions
<chr>                      <int>       <dbl>
en                            1443105  0.73470332
es                             128358  0.06534871
de                             125061  0.06367016
it                              65010  0.03309743
ar                              64408  0.03279094
ru                              52428  0.02669177
\n" ], "text/latex": [ "A grouped\\_df: 6 × 3\n", "\\begin{tabular}{lll}\n", " selectedinterfacelanguage & n\\_sessions & pct\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t en & 1443105 & 0.73470332\\\\\n", "\t es & 128358 & 0.06534871\\\\\n", "\t de & 125061 & 0.06367016\\\\\n", "\t it & 65010 & 0.03309743\\\\\n", "\t ar & 64408 & 0.03279094\\\\\n", "\t ru & 52428 & 0.02669177\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 3\n", "\n", "| selectedinterfacelanguage <chr> | n_sessions <int> | pct_sessions <dbl> |\n", "|---|---|---|\n", "| en | 1443105 | 0.73470332 |\n", "| es | 128358 | 0.06534871 |\n", "| de | 125061 | 0.06367016 |\n", "| it | 65010 | 0.03309743 |\n", "| ar | 64408 | 0.03279094 |\n", "| ru | 52428 | 0.02669177 |\n", "\n" ], "text/plain": [ " selectedinterfacelanguage n_sessions pct_sessions\n", "1 en 1443105 0.73470332 \n", "2 es 128358 0.06534871 \n", "3 de 125061 0.06367016 \n", "4 it 65010 0.03309743 \n", "5 ar 64408 0.03279094 \n", "6 ru 52428 0.02669177 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Test that users can switch from one language to another.\n", "top_final_languages <- lang_link_events %>%\n", " mutate(all_sessions = n_distinct(web_session_id)) %>%\n", " group_by(selectedinterfacelanguage) %>%\n", " summarize(n_sessions = n_distinct(web_session_id),\n", " pct_sessions = n_sessions/all_sessions) %>%\n", " distinct() %>%\n", " arrange(desc(n_sessions))\n", "\n", "head(top_final_languages )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most frequent language switches are to English (73% of sessions), followed by Spanish (6.5%) and German (6.4%)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Initial Language " ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'interfacelanguage', 'contentlanguage' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 4
interfacelanguage  contentlanguage  n_sessions  pct_sessions
<chr>              <chr>            <int>       <dbl>
fr                 fr                  1103742  0.56192925
pt                 pt                   328012  0.16699513
tr                 tr                   150977  0.07686433
fa                 fa                   109989  0.05599681
he                 he                   100125  0.05097493
ko                 ko                    81429  0.04145655
\n" ], "text/latex": [ "A grouped\\_df: 6 × 4\n", "\\begin{tabular}{llll}\n", " interfacelanguage & contentlanguage & n\\_sessions & pct\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t fr & fr & 1103742 & 0.56192925\\\\\n", "\t pt & pt & 328012 & 0.16699513\\\\\n", "\t tr & tr & 150977 & 0.07686433\\\\\n", "\t fa & fa & 109989 & 0.05599681\\\\\n", "\t he & he & 100125 & 0.05097493\\\\\n", "\t ko & ko & 81429 & 0.04145655\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 4\n", "\n", "| interfacelanguage <chr> | contentlanguage <chr> | n_sessions <int> | pct_sessions <dbl> |\n", "|---|---|---|---|\n", "| fr | fr | 1103742 | 0.56192925 |\n", "| pt | pt | 328012 | 0.16699513 |\n", "| tr | tr | 150977 | 0.07686433 |\n", "| fa | fa | 109989 | 0.05599681 |\n", "| he | he | 100125 | 0.05097493 |\n", "| ko | ko | 81429 | 0.04145655 |\n", "\n" ], "text/plain": [ " interfacelanguage contentlanguage n_sessions pct_sessions\n", "1 fr fr 1103742 0.56192925 \n", "2 pt pt 328012 0.16699513 \n", "3 tr tr 150977 0.07686433 \n", "4 fa fa 109989 0.05599681 \n", "5 he he 100125 0.05097493 \n", "6 ko ko 81429 0.04145655 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "top_initial_languages <- lang_link_events %>%\n", " mutate(all_sessions = n_distinct(web_session_id)) %>%\n", " group_by(interfacelanguage, contentlanguage) %>%\n", " summarize(n_sessions = n_distinct(web_session_id),\n", " pct_sessions = n_sessions/all_sessions) %>%\n", " distinct() %>%\n", " arrange(desc(n_sessions))\n", "\n", "head(top_initial_languages )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `interfacelanguage` and `contentlanguage` are expected to match in most instances, which is confirmed here.\n", "\n", "The top initial languages are all from test wikis, which is expected since the language links are still shown to all logged-in and logged-out users on modern Vector on those wikis. 
The new language switcher, which replaces the language links with a button, is shown to all logged-in users opted into modern Vector on non-test wikis. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Most Frequent Switch Types\n" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'interfacelanguage', 'selectedinterfacelanguage' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 4
interfacelanguage  selectedinterfacelanguage  n_sessions  pct_sessions
<chr>              <chr>                      <int>       <dbl>
fr                 en                             775769  0.39495398
pt                 en                             264373  0.13459570
tr                 en                             118824  0.06049483
fa                 en                              95622  0.04868239
fr                 de                              94915  0.04832245
he                 en                              84668  0.04310557
\n" ], "text/latex": [ "A grouped\\_df: 6 × 4\n", "\\begin{tabular}{llll}\n", " interfacelanguage & selectedinterfacelanguage & n\\_sessions & pct\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t fr & en & 775769 & 0.39495398\\\\\n", "\t pt & en & 264373 & 0.13459570\\\\\n", "\t tr & en & 118824 & 0.06049483\\\\\n", "\t fa & en & 95622 & 0.04868239\\\\\n", "\t fr & de & 94915 & 0.04832245\\\\\n", "\t he & en & 84668 & 0.04310557\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 4\n", "\n", "| interfacelanguage <chr> | selectedinterfacelanguage <chr> | n_sessions <int> | pct_sessions <dbl> |\n", "|---|---|---|---|\n", "| fr | en | 775769 | 0.39495398 |\n", "| pt | en | 264373 | 0.13459570 |\n", "| tr | en | 118824 | 0.06049483 |\n", "| fa | en | 95622 | 0.04868239 |\n", "| fr | de | 94915 | 0.04832245 |\n", "| he | en | 84668 | 0.04310557 |\n", "\n" ], "text/plain": [ " interfacelanguage selectedinterfacelanguage n_sessions pct_sessions\n", "1 fr en 775769 0.39495398 \n", "2 pt en 264373 0.13459570 \n", "3 tr en 118824 0.06049483 \n", "4 fa en 95622 0.04868239 \n", "5 fr de 94915 0.04832245 \n", "6 he en 84668 0.04310557 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "top_switch_types <- lang_link_events %>%\n", " mutate(all_sessions = n_distinct(web_session_id)) %>%\n", " group_by(interfacelanguage, selectedinterfacelanguage) %>%\n", " summarize(n_sessions = n_distinct(web_session_id),\n", " pct_sessions = n_sessions/all_sessions) %>%\n", " distinct() %>%\n", " arrange(desc(n_sessions))\n", "\n", "head(top_switch_types)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "39% of all sessions include a switch from French to English.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Language Switcher Button" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Instrumentation Task](https://phabricator.wikimedia.org/T281928)\n", "\n", "- Instrumented as `event.action=\"compact-language-links-open\"`\n", "- Logged when a user opens the new language switcher button. \n", "- The first click and subsequent clicks should be recorded per https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UniversalLanguageSelector/+/688994/\n", "- Change merged on May 11th and backported/deployed on May 12th.\n", "- There should not be any clicks on the language switcher button by logged-in users on test wikis or by logged-out users, as the new language button has not been deployed to these wikis or these users.\n", "- Only deployed to logged-in users on the latest Vector on non-test wikis." ] }, { "cell_type": "code", "execution_count": 222, "metadata": {}, "outputs": [], "source": [ "query <- \n", "\"\n", "SELECT\n", " TO_DATE(dt) AS `date`,\n", " wiki,\n", " event.web_session_id,\n", " event.usereditbucket,\n", " event.skin,\n", " event.skinVersion,\n", " event.timetochangelanguage,\n", " event.isanon,\n", " event.context,\n", " event.interfacelanguage,\n", " event.contentlanguage,\n", " event.selectedinterfacelanguage,\n", " Count(*) AS n_events\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month >= 05\n", " AND event.action = 'compact-language-links-open'\n", " AND useragent.is_bot = false\n", "GROUP BY\n", " TO_DATE(dt),\n", " wiki,\n", " event.web_session_id,\n", " event.usereditbucket,\n", " event.skin,\n", " event.skinVersion,\n", " event.timetochangelanguage,\n", " event.isanon,\n", " event.context,\n", " event.interfacelanguage,\n", " event.contentlanguage,\n", " event.selectedinterfacelanguage\n", "\"" ] }, { "cell_type": "code", "execution_count": 223, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget 
to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "lang_button_events <- wmfdata::query_hive(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Daily Lang Button Click Events" ] }, { "cell_type": "code", "execution_count": 224, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 49 × 3
date n_events n_sessions
<chr> <int> <int>
2021-05-12 12 10
2021-05-13 194 159
2021-05-14 1317 1147
2021-05-15 1159 928
2021-05-16 1358 1042
2021-05-17 32668 28541
2021-05-18 90811 78945
2021-05-19 89299 77108
2021-05-20 87644 75820
2021-05-21 81498 67973
2021-05-22 67460 55270
2021-05-23 76222 62416
2021-05-24 92724 76384
2021-05-25 92351 77546
2021-05-26 89947 75196
2021-05-27 87440 73193
2021-05-28 78177 65185
2021-05-29 65494 53538
2021-05-30 72238 59545
2021-05-31 87384 72570
2021-06-01 86754 72234
2021-06-02 85835 71774
2021-06-03 81864 68127
2021-06-04 74580 61948
2021-06-05 63824 52160
2021-06-06 71082 58131
2021-06-07 85287 71501
2021-06-08 84667 70399
2021-06-09 83472 69111
2021-06-10 80794 67411
2021-06-11 72647 60353
2021-06-12 60176 49825
2021-06-13 66600 54534
2021-06-14 80895 67718
2021-06-15 79705 67001
2021-06-16 78153 64678
2021-06-17 76308 63395
2021-06-18 70039 57090
2021-06-19 57473 46714
2021-06-20 62298 51153
2021-06-21 76513 63711
2021-06-22 103507 80471
2021-06-23 145211 103694
2021-06-24 151335 106808
2021-06-25 142435 98600
2021-06-26 128018 85548
2021-06-27 145148 96486
2021-06-28 180394 123458
2021-06-29 63085 45605
\n" ], "text/latex": [ "A tibble: 49 × 3\n", "\\begin{tabular}{lll}\n", " date & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t 2021-05-12 & 12 & 10\\\\\n", "\t 2021-05-13 & 194 & 159\\\\\n", "\t 2021-05-14 & 1317 & 1147\\\\\n", "\t 2021-05-15 & 1159 & 928\\\\\n", "\t 2021-05-16 & 1358 & 1042\\\\\n", "\t 2021-05-17 & 32668 & 28541\\\\\n", "\t 2021-05-18 & 90811 & 78945\\\\\n", "\t 2021-05-19 & 89299 & 77108\\\\\n", "\t 2021-05-20 & 87644 & 75820\\\\\n", "\t 2021-05-21 & 81498 & 67973\\\\\n", "\t 2021-05-22 & 67460 & 55270\\\\\n", "\t 2021-05-23 & 76222 & 62416\\\\\n", "\t 2021-05-24 & 92724 & 76384\\\\\n", "\t 2021-05-25 & 92351 & 77546\\\\\n", "\t 2021-05-26 & 89947 & 75196\\\\\n", "\t 2021-05-27 & 87440 & 73193\\\\\n", "\t 2021-05-28 & 78177 & 65185\\\\\n", "\t 2021-05-29 & 65494 & 53538\\\\\n", "\t 2021-05-30 & 72238 & 59545\\\\\n", "\t 2021-05-31 & 87384 & 72570\\\\\n", "\t 2021-06-01 & 86754 & 72234\\\\\n", "\t 2021-06-02 & 85835 & 71774\\\\\n", "\t 2021-06-03 & 81864 & 68127\\\\\n", "\t 2021-06-04 & 74580 & 61948\\\\\n", "\t 2021-06-05 & 63824 & 52160\\\\\n", "\t 2021-06-06 & 71082 & 58131\\\\\n", "\t 2021-06-07 & 85287 & 71501\\\\\n", "\t 2021-06-08 & 84667 & 70399\\\\\n", "\t 2021-06-09 & 83472 & 69111\\\\\n", "\t 2021-06-10 & 80794 & 67411\\\\\n", "\t 2021-06-11 & 72647 & 60353\\\\\n", "\t 2021-06-12 & 60176 & 49825\\\\\n", "\t 2021-06-13 & 66600 & 54534\\\\\n", "\t 2021-06-14 & 80895 & 67718\\\\\n", "\t 2021-06-15 & 79705 & 67001\\\\\n", "\t 2021-06-16 & 78153 & 64678\\\\\n", "\t 2021-06-17 & 76308 & 63395\\\\\n", "\t 2021-06-18 & 70039 & 57090\\\\\n", "\t 2021-06-19 & 57473 & 46714\\\\\n", "\t 2021-06-20 & 62298 & 51153\\\\\n", "\t 2021-06-21 & 76513 & 63711\\\\\n", "\t 2021-06-22 & 103507 & 80471\\\\\n", "\t 2021-06-23 & 145211 & 103694\\\\\n", "\t 2021-06-24 & 151335 & 106808\\\\\n", "\t 2021-06-25 & 142435 & 98600\\\\\n", "\t 2021-06-26 & 128018 & 85548\\\\\n", "\t 2021-06-27 & 145148 & 96486\\\\\n", "\t 2021-06-28 & 180394 & 
123458\\\\\n", "\t 2021-06-29 & 63085 & 45605\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 49 × 3\n", "\n", "| date <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| 2021-05-12 | 12 | 10 |\n", "| 2021-05-13 | 194 | 159 |\n", "| 2021-05-14 | 1317 | 1147 |\n", "| 2021-05-15 | 1159 | 928 |\n", "| 2021-05-16 | 1358 | 1042 |\n", "| 2021-05-17 | 32668 | 28541 |\n", "| 2021-05-18 | 90811 | 78945 |\n", "| 2021-05-19 | 89299 | 77108 |\n", "| 2021-05-20 | 87644 | 75820 |\n", "| 2021-05-21 | 81498 | 67973 |\n", "| 2021-05-22 | 67460 | 55270 |\n", "| 2021-05-23 | 76222 | 62416 |\n", "| 2021-05-24 | 92724 | 76384 |\n", "| 2021-05-25 | 92351 | 77546 |\n", "| 2021-05-26 | 89947 | 75196 |\n", "| 2021-05-27 | 87440 | 73193 |\n", "| 2021-05-28 | 78177 | 65185 |\n", "| 2021-05-29 | 65494 | 53538 |\n", "| 2021-05-30 | 72238 | 59545 |\n", "| 2021-05-31 | 87384 | 72570 |\n", "| 2021-06-01 | 86754 | 72234 |\n", "| 2021-06-02 | 85835 | 71774 |\n", "| 2021-06-03 | 81864 | 68127 |\n", "| 2021-06-04 | 74580 | 61948 |\n", "| 2021-06-05 | 63824 | 52160 |\n", "| 2021-06-06 | 71082 | 58131 |\n", "| 2021-06-07 | 85287 | 71501 |\n", "| 2021-06-08 | 84667 | 70399 |\n", "| 2021-06-09 | 83472 | 69111 |\n", "| 2021-06-10 | 80794 | 67411 |\n", "| 2021-06-11 | 72647 | 60353 |\n", "| 2021-06-12 | 60176 | 49825 |\n", "| 2021-06-13 | 66600 | 54534 |\n", "| 2021-06-14 | 80895 | 67718 |\n", "| 2021-06-15 | 79705 | 67001 |\n", "| 2021-06-16 | 78153 | 64678 |\n", "| 2021-06-17 | 76308 | 63395 |\n", "| 2021-06-18 | 70039 | 57090 |\n", "| 2021-06-19 | 57473 | 46714 |\n", "| 2021-06-20 | 62298 | 51153 |\n", "| 2021-06-21 | 76513 | 63711 |\n", "| 2021-06-22 | 103507 | 80471 |\n", "| 2021-06-23 | 145211 | 103694 |\n", "| 2021-06-24 | 151335 | 106808 |\n", "| 2021-06-25 | 142435 | 98600 |\n", "| 2021-06-26 | 128018 | 85548 |\n", "| 2021-06-27 | 145148 | 96486 |\n", "| 2021-06-28 | 180394 | 123458 |\n", "| 2021-06-29 | 63085 | 45605 |\n", "\n" ], "text/plain": [ " date 
n_events n_sessions\n", "1 2021-05-12 12 10 \n", "2 2021-05-13 194 159 \n", "3 2021-05-14 1317 1147 \n", "4 2021-05-15 1159 928 \n", "5 2021-05-16 1358 1042 \n", "6 2021-05-17 32668 28541 \n", "7 2021-05-18 90811 78945 \n", "8 2021-05-19 89299 77108 \n", "9 2021-05-20 87644 75820 \n", "10 2021-05-21 81498 67973 \n", "11 2021-05-22 67460 55270 \n", "12 2021-05-23 76222 62416 \n", "13 2021-05-24 92724 76384 \n", "14 2021-05-25 92351 77546 \n", "15 2021-05-26 89947 75196 \n", "16 2021-05-27 87440 73193 \n", "17 2021-05-28 78177 65185 \n", "18 2021-05-29 65494 53538 \n", "19 2021-05-30 72238 59545 \n", "20 2021-05-31 87384 72570 \n", "21 2021-06-01 86754 72234 \n", "22 2021-06-02 85835 71774 \n", "23 2021-06-03 81864 68127 \n", "24 2021-06-04 74580 61948 \n", "25 2021-06-05 63824 52160 \n", "26 2021-06-06 71082 58131 \n", "27 2021-06-07 85287 71501 \n", "28 2021-06-08 84667 70399 \n", "29 2021-06-09 83472 69111 \n", "30 2021-06-10 80794 67411 \n", "31 2021-06-11 72647 60353 \n", "32 2021-06-12 60176 49825 \n", "33 2021-06-13 66600 54534 \n", "34 2021-06-14 80895 67718 \n", "35 2021-06-15 79705 67001 \n", "36 2021-06-16 78153 64678 \n", "37 2021-06-17 76308 63395 \n", "38 2021-06-18 70039 57090 \n", "39 2021-06-19 57473 46714 \n", "40 2021-06-20 62298 51153 \n", "41 2021-06-21 76513 63711 \n", "42 2021-06-22 103507 80471 \n", "43 2021-06-23 145211 103694 \n", "44 2021-06-24 151335 106808 \n", "45 2021-06-25 142435 98600 \n", "46 2021-06-26 128018 85548 \n", "47 2021-06-27 145148 96486 \n", "48 2021-06-28 180394 123458 \n", "49 2021-06-29 63085 45605 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_events_daily <- lang_button_events %>%\n", " group_by(date) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_events_daily" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Confirmed we start recording events on May 12th following backport of the change, starting 
with just a few events and then more as the change rolled out to additional wikis.\n", "- As expected, there are more events than sessions, since a session can include multiple clicks to open the language button.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# By Context and Vector Version" ] }, { "cell_type": "code", "execution_count": 225, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 4 × 4
skinversion context n_events n_sessions
<chr> <chr> <int> <int>
latest header 595196 326328
latest other 94351 77145
legacy other 1362565 1093551
NULL NULL 36761 31132
\n" ], "text/latex": [ "A grouped\\_df: 4 × 4\n", "\\begin{tabular}{llll}\n", " skinversion & context & n\\_events & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t latest & header & 595196 & 326328\\\\\n", "\t latest & other & 94351 & 77145\\\\\n", "\t legacy & other & 1362565 & 1093551\\\\\n", "\t NULL & NULL & 36761 & 31132\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 4\n", "\n", "| skinversion <chr> | context <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|---|\n", "| latest | header | 595196 | 326328 |\n", "| latest | other | 94351 | 77145 |\n", "| legacy | other | 1362565 | 1093551 |\n", "| NULL | NULL | 36761 | 31132 |\n", "\n" ], "text/plain": [ " skinversion context n_events n_sessions\n", "1 latest header 595196 326328 \n", "2 latest other 94351 77145 \n", "3 legacy other 1362565 1093551 \n", "4 NULL NULL 36761 31132 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_events_bycontext <- lang_button_events %>%\n", "# date new instrumentation was added\n", " filter(date >= '2021-06-08') %>%\n", " group_by(skinversion, context) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_events_bycontext" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed we are only recording new button clicks on the latest Vector." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Double Check to make sure there are some sessions with more than one event" ] }, { "cell_type": "code", "execution_count": 226, "metadata": {}, "outputs": [ { "data": { "text/html": [ "FALSE" ], "text/latex": [ "FALSE" ], "text/markdown": [ "FALSE" ], "text/plain": [ "[1] FALSE" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "length(unique(lang_button_events$web_session_id)) == nrow(lang_button_events)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lang Button Events Per Wiki" ] }, { "cell_type": "code", "execution_count": 227, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 6 × 3
wiki n_events n_sessions
<chr> <int> <int>
enwiki 1719595 1341346
frwiki 455616 280201
ruwiki 230135 177510
dewiki 208276 173340
eswiki 154449 126942
ptwiki 145856 91869
\n" ], "text/latex": [ "A tibble: 6 × 3\n", "\\begin{tabular}{lll}\n", " wiki & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t enwiki & 1719595 & 1341346\\\\\n", "\t frwiki & 455616 & 280201\\\\\n", "\t ruwiki & 230135 & 177510\\\\\n", "\t dewiki & 208276 & 173340\\\\\n", "\t eswiki & 154449 & 126942\\\\\n", "\t ptwiki & 145856 & 91869\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 6 × 3\n", "\n", "| wiki <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| enwiki | 1719595 | 1341346 |\n", "| frwiki | 455616 | 280201 |\n", "| ruwiki | 230135 | 177510 |\n", "| dewiki | 208276 | 173340 |\n", "| eswiki | 154449 | 126942 |\n", "| ptwiki | 145856 | 91869 |\n", "\n" ], "text/plain": [ " wiki n_events n_sessions\n", "1 enwiki 1719595 1341346 \n", "2 frwiki 455616 280201 \n", "3 ruwiki 230135 177510 \n", "4 dewiki 208276 173340 \n", "5 eswiki 154449 126942 \n", "6 ptwiki 145856 91869 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_button_events_bywiki <- lang_button_events %>%\n", " group_by(wiki) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id)) %>%\n", " arrange(desc(n_sessions))\n", "\n", "head(lang_button_events_bywiki)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Almost half (49.7%) of sessions with a click to the language button were recorded on English Wikipedia, followed by Russian, German and Spanish Wikipedia. Note: The language switcher button was not available to logged-in users on test wikis until 22 June 2021. The AB test deployment on fawiki was delayed until 28 June 2021. 
\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lang Button Event on Test Wikis" ] }, { "cell_type": "code", "execution_count": 229, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 13 × 3
wiki n_events n_sessions
<chr> <int> <int>
frwiki 455616 280201
ptwiki 145856 91869
trwiki 67646 46391
hewiki 54807 33156
kowiki 39555 24618
srwiki 30836 21933
fawiki 15068 11986
frwiktionary 6245 4000
bnwiki 4205 2979
euwiki 3972 2944
vecwiki 887 794
dewikivoyage 259 177
ptwikiversity 4 3
\n" ], "text/latex": [ "A tibble: 13 × 3\n", "\\begin{tabular}{lll}\n", " wiki & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t frwiki & 455616 & 280201\\\\\n", "\t ptwiki & 145856 & 91869\\\\\n", "\t trwiki & 67646 & 46391\\\\\n", "\t hewiki & 54807 & 33156\\\\\n", "\t kowiki & 39555 & 24618\\\\\n", "\t srwiki & 30836 & 21933\\\\\n", "\t fawiki & 15068 & 11986\\\\\n", "\t frwiktionary & 6245 & 4000\\\\\n", "\t bnwiki & 4205 & 2979\\\\\n", "\t euwiki & 3972 & 2944\\\\\n", "\t vecwiki & 887 & 794\\\\\n", "\t dewikivoyage & 259 & 177\\\\\n", "\t ptwikiversity & 4 & 3\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 13 × 3\n", "\n", "| wiki <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| frwiki | 455616 | 280201 |\n", "| ptwiki | 145856 | 91869 |\n", "| trwiki | 67646 | 46391 |\n", "| hewiki | 54807 | 33156 |\n", "| kowiki | 39555 | 24618 |\n", "| srwiki | 30836 | 21933 |\n", "| fawiki | 15068 | 11986 |\n", "| frwiktionary | 6245 | 4000 |\n", "| bnwiki | 4205 | 2979 |\n", "| euwiki | 3972 | 2944 |\n", "| vecwiki | 887 | 794 |\n", "| dewikivoyage | 259 | 177 |\n", "| ptwikiversity | 4 | 3 |\n", "\n" ], "text/plain": [ " wiki n_events n_sessions\n", "1 frwiki 455616 280201 \n", "2 ptwiki 145856 91869 \n", "3 trwiki 67646 46391 \n", "4 hewiki 54807 33156 \n", "5 kowiki 39555 24618 \n", "6 srwiki 30836 21933 \n", "7 fawiki 15068 11986 \n", "8 frwiktionary 6245 4000 \n", "9 bnwiki 4205 2979 \n", "10 euwiki 3972 2944 \n", "11 vecwiki 887 794 \n", "12 dewikivoyage 259 177 \n", "13 ptwikiversity 4 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_button_events_bytestwiki <- lang_button_events %>%\n", " filter(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', \n", " 'dewikivoyage', 'vecwiki' )) %>%\n", " group_by(wiki) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = 
n_distinct(web_session_id)) %>%\n", " arrange(desc(n_sessions))\n", "\n", "lang_button_events_bytestwiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About 8.4% of all sessions were recorded on the early adopter wikis. \n", "\n", "I reviewed the number of sessions on test wikis by logged-in status to determine if these were mostly logged-out or logged-in users." ] }, { "cell_type": "code", "execution_count": 230, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'istestwiki' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 4 × 4
istestwiki isanon n_events n_sessions
<chr> <chr> <int> <int>
non_test_wiki false 116409 68458
non_test_wiki true 2892131 2324483
test_wiki false 27679 13284
test_wiki true 797277 508146
\n" ], "text/latex": [ "A grouped\\_df: 4 × 4\n", "\\begin{tabular}{llll}\n", " istestwiki & isanon & n\\_events & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t non\\_test\\_wiki & false & 116409 & 68458\\\\\n", "\t non\\_test\\_wiki & true & 2892131 & 2324483\\\\\n", "\t test\\_wiki & false & 27679 & 13284\\\\\n", "\t test\\_wiki & true & 797277 & 508146\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 4\n", "\n", "| istestwiki <chr> | isanon <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|---|\n", "| non_test_wiki | false | 116409 | 68458 |\n", "| non_test_wiki | true | 2892131 | 2324483 |\n", "| test_wiki | false | 27679 | 13284 |\n", "| test_wiki | true | 797277 | 508146 |\n", "\n" ], "text/plain": [ " istestwiki isanon n_events n_sessions\n", "1 non_test_wiki false 116409 68458 \n", "2 non_test_wiki true 2892131 2324483 \n", "3 test_wiki false 27679 13284 \n", "4 test_wiki true 797277 508146 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# events and sessions with a language button click, by test wiki category and logged-in status\n", "lang_button_events_testwiki_isanon <- lang_button_events %>%\n", " mutate(istestwiki = ifelse(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' ), 'test_wiki', 'non_test_wiki')) %>%\n", " group_by(istestwiki, isanon) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_events_testwiki_isanon " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "0.29% of sessions with a click to the lang switcher button came from logged-in users on the early adopter wikis and 8.1% came from logged-out users on those wikis. 
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By anon status" ] }, { "cell_type": "code", "execution_count": 231, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 3
isanon n_events n_sessions
<chr> <int> <int>
false 144088 81742
true 3689408 2832629
\n" ], "text/latex": [ "A tibble: 2 × 3\n", "\\begin{tabular}{lll}\n", " isanon & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t false & 144088 & 81742\\\\\n", "\t true & 3689408 & 2832629\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 3\n", "\n", "| isanon <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| false | 144088 | 81742 |\n", "| true | 3689408 | 2832629 |\n", "\n" ], "text/plain": [ " isanon n_events n_sessions\n", "1 false 144088 81742 \n", "2 true 3689408 2832629 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_events_byanon <- lang_button_events %>%\n", " group_by(isanon) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id)) \n", "\n", "head(lang_button_events_byanon)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The majority of sessions with clicks to the language button are by logged out users (97.2%) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clicks per session" ] }, { "cell_type": "code", "execution_count": 232, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 4
isanon avg_clicks max_clicks min_clciks
<chr> <dbl> <int> <int>
false 1.583506 285 1
true 1.244324 798 1
\n" ], "text/latex": [ "A tibble: 2 × 4\n", "\\begin{tabular}{llll}\n", " isanon & avg\\_clicks & max\\_clicks & min\\_clciks\\\\\n", " & & & \\\\\n", "\\hline\n", "\t false & 1.583506 & 285 & 1\\\\\n", "\t true & 1.244324 & 798 & 1\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 4\n", "\n", "| isanon <chr> | avg_clicks <dbl> | max_clicks <int> | min_clciks <int> |\n", "|---|---|---|---|\n", "| false | 1.583506 | 285 | 1 |\n", "| true | 1.244324 | 798 | 1 |\n", "\n" ], "text/plain": [ " isanon avg_clicks max_clicks min_clciks\n", "1 false 1.583506 285 1 \n", "2 true 1.244324 798 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_events_persession <- lang_button_events %>%\n", " group_by(isanon) %>%\n", " summarize(avg_clicks = mean(n_events),\n", " max_clicks = max(n_events),\n", " min_clciks = min(n_events))\n", "\n", "lang_button_events_persession" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most sessions include 1 to 2 clicks to the language switcher button.\n", "\n", "Note: There are some sessions by anon users with over 600 clicks, which is likely automated traffic from bots and can be filtered out." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## User Edit Bucket" ] }, { "cell_type": "code", "execution_count": 233, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 6 × 3
usereditbucket n_events n_sessions
<chr> <int> <int>
0 edits 36100 26263
1-4 edits 15370 10060
100-999 edits 21124 11002
1000+ edits 37444 15816
5-99 edits 34048 18811
NULL 2 2
\n" ], "text/latex": [ "A tibble: 6 × 3\n", "\\begin{tabular}{lll}\n", " usereditbucket & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t 0 edits & 36100 & 26263\\\\\n", "\t 1-4 edits & 15370 & 10060\\\\\n", "\t 100-999 edits & 21124 & 11002\\\\\n", "\t 1000+ edits & 37444 & 15816\\\\\n", "\t 5-99 edits & 34048 & 18811\\\\\n", "\t NULL & 2 & 2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 6 × 3\n", "\n", "| usereditbucket <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| 0 edits | 36100 | 26263 |\n", "| 1-4 edits | 15370 | 10060 |\n", "| 100-999 edits | 21124 | 11002 |\n", "| 1000+ edits | 37444 | 15816 |\n", "| 5-99 edits | 34048 | 18811 |\n", "| NULL | 2 | 2 |\n", "\n" ], "text/plain": [ " usereditbucket n_events n_sessions\n", "1 0 edits 36100 26263 \n", "2 1-4 edits 15370 10060 \n", "3 100-999 edits 21124 11002 \n", "4 1000+ edits 37444 15816 \n", "5 5-99 edits 34048 18811 \n", "6 NULL 2 2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "logged_in_editcount <- lang_button_events %>%\n", " filter(isanon == 'false') %>%\n", " group_by(usereditbucket) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "logged_in_editcount" ] }, { "cell_type": "code", "execution_count": 234, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A tibble: 1 × 3
usereditbucket n_events n_sessions
<chr> <int> <int>
NULL 3689408 2832629
\n" ], "text/latex": [ "A tibble: 1 × 3\n", "\\begin{tabular}{lll}\n", " usereditbucket & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t NULL & 3689408 & 2832629\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 1 × 3\n", "\n", "| usereditbucket <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| NULL | 3689408 | 2832629 |\n", "\n" ], "text/plain": [ " usereditbucket n_events n_sessions\n", "1 NULL 3689408 2832629 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "logged_out_editcount <- lang_button_events %>%\n", " filter(isanon == 'true') %>%\n", " group_by(usereditbucket) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "logged_out_editcount" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed that we are not recording an edit count for logged out users as expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Content Language" ] }, { "cell_type": "code", "execution_count": 235, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'interfacelanguage', 'contentlanguage' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 4
interfacelanguage contentlanguage n_sessions pct_sessions
<chr> <chr> <int> <dbl>
en en 1361586 0.46746010
fr fr 284033 0.09751429
ru ru 178766 0.06137400
de de 174512 0.05991351
es es 127334 0.04371635
pt pt 91875 0.03154255
\n" ], "text/latex": [ "A grouped\\_df: 6 × 4\n", "\\begin{tabular}{llll}\n", " interfacelanguage & contentlanguage & n\\_sessions & pct\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t en & en & 1361586 & 0.46746010\\\\\n", "\t fr & fr & 284033 & 0.09751429\\\\\n", "\t ru & ru & 178766 & 0.06137400\\\\\n", "\t de & de & 174512 & 0.05991351\\\\\n", "\t es & es & 127334 & 0.04371635\\\\\n", "\t pt & pt & 91875 & 0.03154255\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 4\n", "\n", "| interfacelanguage <chr> | contentlanguage <chr> | n_sessions <int> | pct_sessions <dbl> |\n", "|---|---|---|---|\n", "| en | en | 1361586 | 0.46746010 |\n", "| fr | fr | 284033 | 0.09751429 |\n", "| ru | ru | 178766 | 0.06137400 |\n", "| de | de | 174512 | 0.05991351 |\n", "| es | es | 127334 | 0.04371635 |\n", "| pt | pt | 91875 | 0.03154255 |\n", "\n" ], "text/plain": [ " interfacelanguage contentlanguage n_sessions pct_sessions\n", "1 en en 1361586 0.46746010 \n", "2 fr fr 284033 0.09751429 \n", "3 ru ru 178766 0.06137400 \n", "4 de de 174512 0.05991351 \n", "5 es es 127334 0.04371635 \n", "6 pt pt 91875 0.03154255 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "top_content_languages <- lang_button_events %>%\n", " mutate(all_sessions = n_distinct(web_session_id)) %>%\n", " group_by(interfacelanguage, contentlanguage) %>%\n", " summarize(n_sessions = n_distinct(web_session_id),\n", " pct_sessions = n_sessions/all_sessions) %>%\n", " distinct() %>%\n", " arrange(desc(n_sessions))\n", "\n", "head(top_content_languages )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The top current content and interface language settings are English, Russian, German and Spanish, which fits with the top wikis where clicks to the language switcher button occur.\n", "\n", "Update: French is now on the list as of 22 June 2021, when the AB test was deployed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Event Context" ] 
}, { "cell_type": "code", "execution_count": 236, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 3 × 3
context n_events n_sessions
<chr> <int> <int>
header 595196 326328
NULL 1781384 1439296
other 1456916 1170691
\n" ], "text/latex": [ "A tibble: 3 × 3\n", "\\begin{tabular}{lll}\n", " context & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t header & 595196 & 326328\\\\\n", "\t NULL & 1781384 & 1439296\\\\\n", "\t other & 1456916 & 1170691\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 3 × 3\n", "\n", "| context <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| header | 595196 | 326328 |\n", "| NULL | 1781384 | 1439296 |\n", "| other | 1456916 | 1170691 |\n", "\n" ], "text/plain": [ " context n_events n_sessions\n", "1 header 595196 326328 \n", "2 NULL 1781384 1439296 \n", "3 other 1456916 1170691 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_button_events_context <- lang_button_events %>%\n", " group_by(context) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_events_context" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are not recording `event.context` for these events. \n", "\n", "Note: This was changed with a fix deployed on 6 June 2021. We now record new button clicks as `event.context = 'header'`. `event.context = 'other'` is recorded for people in the control group (or using the legacy skin) who click the \"N more\" button in the sidebar." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time to Change Language" ] }, { "cell_type": "code", "execution_count": 237, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in mean.default(timetochangelanguage):\n", "“argument is not numeric or logical: returning NA”\n", "Warning message in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):\n", "“argument is not numeric or logical: returning NA”\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
A data.frame: 1 × 4
avg_time median_time max_time min_time
<dbl> <dbl> <chr> <chr>
NA NA NULL NULL
\n" ], "text/latex": [ "A data.frame: 1 × 4\n", "\\begin{tabular}{llll}\n", " avg\\_time & median\\_time & max\\_time & min\\_time\\\\\n", " & & & \\\\\n", "\\hline\n", "\t NA & NA & NULL & NULL\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 4\n", "\n", "| avg_time <dbl> | median_time <dbl> | max_time <chr> | min_time <chr> |\n", "|---|---|---|---|\n", "| NA | NA | NULL | NULL |\n", "\n" ], "text/plain": [ " avg_time median_time max_time min_time\n", "1 NA NA NULL NULL " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_button_events_time <- lang_button_events %>%\n", " summarise(avg_time = mean(timetochangelanguage),\n", " median_time = median(timetochangelanguage),\n", " max_time = max(timetochangelanguage),\n", " min_time = min(timetochangelanguage))\n", "\n", "lang_button_events_time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: We don't record time to change language for the initial click to the button but we record when the user actually changes languages." 
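, "\n", "\n", "The `NA` summary above appears to come from `timetochangelanguage` arriving as a character column in which missing values are the literal string 'NULL'. A minimal sketch of one way to summarise only the recorded values, using a made-up `toy_events` table (the table, its values and the `time_num` and `n_recorded` columns are illustrative assumptions, not part of the original analysis):\n", "\n",
"```r\n",
"library(dplyr)\n",
"\n",
"# Toy stand-in for lang_button_events; the values are invented for illustration.\n",
"toy_events <- tibble(\n",
"  web_session_id = c('a', 'b', 'c', 'd'),\n",
"  timetochangelanguage = c('NULL', '1200', 'NULL', '800')\n",
")\n",
"\n",
"toy_time_summary <- toy_events %>%\n",
"  # Treat the literal string 'NULL' as missing, then convert to numeric.\n",
"  mutate(time_num = as.numeric(na_if(timetochangelanguage, 'NULL'))) %>%\n",
"  summarise(n_recorded = sum(!is.na(time_num)),\n",
"            avg_time = mean(time_num, na.rm = TRUE),\n",
"            median_time = median(time_num, na.rm = TRUE),\n",
"            max_time = max(time_num, na.rm = TRUE),\n",
"            min_time = min(time_num, na.rm = TRUE))\n",
"\n",
"toy_time_summary\n",
"```\n", "\n", "The same `mutate()` step could be applied to `lang_button_events` before summarising; `n_recorded` then shows how many rows actually carry a time value."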
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Confirm we can identify language switches for sessions where the button is clicked" ] }, { "cell_type": "code", "execution_count": 245, "metadata": {}, "outputs": [], "source": [ "# rough query to confirm approach\n", "# will be refined in analysis\n", "query <- \n", "\"\n", "-- sessions where lang button was selected\n", "WITH button AS (\n", "SELECT\n", " MIN(TO_DATE(dt)) as button_date,\n", " event.web_session_id as session_id,\n", " event.context as open_context,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month >= 05\n", " AND useragent.is_bot = false\n", " AND event.action = 'compact-language-links-open'\n", "GROUP BY \n", " event.web_session_id,\n", " event.context,\n", " wiki\n", "),\n", "\n", "lang_switches AS (\n", " SELECT\n", " TO_DATE(dt) as switch_date,\n", " event.web_session_id as session_id,\n", " event.context as switch_context,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month >= 05\n", " AND useragent.is_bot = false\n", " AND event.action = 'language-change'\n", ")\n", "\n", "SELECT\n", " button.button_date,\n", " lang_switches.switch_date,\n", " button.session_id,\n", " button.wiki,\n", " button.open_context,\n", "-- sessions with lang switch that occurred after button clicks\n", " IF(lang_switches.session_id IS NOT NULL AND switch_date >= button_date, 1, 0) AS language_switch,\n", " lang_switches.switch_context\n", "FROM button\n", "LEFT JOIN lang_switches ON\n", " button.session_id = lang_switches.session_id AND\n", " button.wiki = lang_switches.wiki\n", " \n", "\"" ] }, { "cell_type": "code", "execution_count": 246, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "lang_button_switch_events <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", 
"execution_count": 247, "metadata": {}, "outputs": [], "source": [ "#reformat to date format\n", "lang_button_switch_events$button_date <- as.Date(lang_button_switch_events$button_date, format = \"%Y-%m-%d\")\n", "lang_button_switch_events$switch_date <- as.Date(lang_button_switch_events$switch_date, format = \"%Y-%m-%d\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lang Button Sessions with Language Switch" ] }, { "cell_type": "code", "execution_count": 241, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 2
language_switch n_sessions
<chr> <int>
no_switch 2106249
switch 842170
\n" ], "text/latex": [ "A tibble: 2 × 2\n", "\\begin{tabular}{ll}\n", " language\\_switch & n\\_sessions\\\\\n", " & \\\\\n", "\\hline\n", "\t no\\_switch & 2106249\\\\\n", "\t switch & 842170\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 2\n", "\n", "| language_switch <chr> | n_sessions <int> |\n", "|---|---|\n", "| no_switch | 2106249 |\n", "| switch | 842170 |\n", "\n" ], "text/plain": [ " language_switch n_sessions\n", "1 no_switch 2106249 \n", "2 switch 842170 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_nsessions <- lang_button_switch_events %>%\n", " mutate(language_switch = ifelse(language_switch == 0, \"no_switch\", \"switch\")) %>%\n", " group_by(language_switch) %>%\n", " summarise(n_sessions = n_distinct(session_id))\n", "\n", "lang_button_switch_events_nsessions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About 3.4% of sessions where the language button was clicked were followed by an event to switch the language. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lang Button Sessions with Switch by Context" ] }, { "cell_type": "code", "execution_count": 253, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'open_context' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 3
open_context switch_context n_sessions
<chr> <chr> <int>
header content-language-switcher 112876
header interface 3
header languages-list 748
other content-language-switcher 58041
other interface 38
other languages-list 446
\n" ], "text/latex": [ "A grouped\\_df: 6 × 3\n", "\\begin{tabular}{lll}\n", " open\\_context & switch\\_context & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t header & content-language-switcher & 112876\\\\\n", "\t header & interface & 3\\\\\n", "\t header & languages-list & 748\\\\\n", "\t other & content-language-switcher & 58041\\\\\n", "\t other & interface & 38\\\\\n", "\t other & languages-list & 446\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 3\n", "\n", "| open_context <chr> | switch_context <chr> | n_sessions <int> |\n", "|---|---|---|\n", "| header | content-language-switcher | 112876 |\n", "| header | interface | 3 |\n", "| header | languages-list | 748 |\n", "| other | content-language-switcher | 58041 |\n", "| other | interface | 38 |\n", "| other | languages-list | 446 |\n", "\n" ], "text/plain": [ " open_context switch_context n_sessions\n", "1 header content-language-switcher 112876 \n", "2 header interface 3 \n", "3 header languages-list 748 \n", "4 other content-language-switcher 58041 \n", "5 other interface 38 \n", "6 other languages-list 446 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_nsessions <- lang_button_switch_events %>%\n", " filter(language_switch == 1,\n", " button_date >= '2021-06-27') %>%\n", " group_by(open_context, switch_context) %>%\n", " summarise(n_sessions = n_distinct(session_id))\n", "\n", "lang_button_switch_events_nsessions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Following the 22 June 2021 fix, we are recording three types of language switch events after the new button is clicked:\n", "\n", "1. 'interface'\n", "2. 'languages-list'\n", "3. 'content-language-switcher'\n", "\n", "\n", "TODO: Need to clarify differences.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Language Button Context Events\n", "\n", "Change deployed on 8 June 2021.\n", "\n", "Rechecked data to confirm that events are recording as 
expected and that we are able to distinguish between the following three event types:\n", "\n", "* People selecting the language button at the top of the page\n", "* People that are in the control bucket clicking the \"more\" button\n", "* People that have opted out of modern vector clicking the more button\n", "\n", "Update: Fix deployed on 10 June 2021 to address a bug identified in post-deployment QA, where links to switch languages after clicking the language switcher button were not instrumented. Data updated below to reflect changes following this fix.\n", "\n", "Note on instrumentation:\n", "If the user has opened the language switcher in the header and switched language, we should see two events with the following properties:\n", "\n", "1. event.web_session_id=\"foo\" event.action=\"compact-language-links-open\" event.context=\"header\" event.skinversion=\"latest\"\n", "2. event.web_session_id=\"foo\" event.action=\"language-change\" event.context=\"interface\"\n", "\n", "22 June 2021 Update: Fix deployed to add additional context to differentiate between the following events (both of these were previously marked as event.context = 'interface'):\n", "- A user clicks the language button in the header and changes the display language setting\n", "- A user clicks the language button in the header and switches languages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Skin Type" ] }, { "cell_type": "code", "execution_count": 254, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'date', 'skin' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", 
"\t\n", "\t\n", 
"\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 50 × 4
dateskinskinversionn_sessions
<chr><chr><chr><int>
2021-06-11NULL NULL 18
2021-06-11vectorlatest 5467
2021-06-11vectorlegacy54874
2021-06-12NULL NULL 14
2021-06-12vectorlatest 4363
2021-06-12vectorlegacy45448
2021-06-13NULL NULL 5
2021-06-13vectorlatest 5053
2021-06-13vectorlegacy49477
2021-06-14NULL NULL 9
2021-06-14vectorlatest 5926
2021-06-14vectorlegacy61784
2021-06-15NULL NULL 6
2021-06-15vectorlatest 5952
2021-06-15vectorlegacy61043
2021-06-16NULL NULL 2
2021-06-16vectorlatest 5725
2021-06-16vectorlegacy58951
2021-06-17NULL NULL 1
2021-06-17vectorlatest 5622
2021-06-17vectorlegacy57772
2021-06-18NULL NULL 2
2021-06-18vectorlatest 4948
2021-06-18vectorlegacy52140
2021-06-19NULL NULL 4
2021-06-19vectorlatest 4288
2021-06-19vectorlegacy42424
2021-06-20NULL NULL 2
2021-06-20vectorlatest 4618
2021-06-20vectorlegacy46535
2021-06-21vectorlatest 5528
2021-06-21vectorlegacy58187
2021-06-22vectorlatest23023
2021-06-22vectorlegacy57450
2021-06-23vectorlatest47755
2021-06-23vectorlegacy55941
2021-06-24vectorlatest51067
2021-06-24vectorlegacy55742
2021-06-25NULL NULL 1
2021-06-25vectorlatest48562
2021-06-25vectorlegacy50041
2021-06-26vectorlatest44291
2021-06-26vectorlegacy41258
2021-06-27vectorlatest51986
2021-06-27vectorlegacy44501
2021-06-28NULL NULL 1
2021-06-28vectorlatest68627
2021-06-28vectorlegacy54833
2021-06-29vectorlatest23540
2021-06-29vectorlegacy22065
\n" ], "text/latex": [ "A grouped\\_df: 50 × 4\n", "\\begin{tabular}{llll}\n", " date & skin & skinversion & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t 2021-06-11 & NULL & NULL & 18\\\\\n", "\t 2021-06-11 & vector & latest & 5467\\\\\n", "\t 2021-06-11 & vector & legacy & 54874\\\\\n", "\t 2021-06-12 & NULL & NULL & 14\\\\\n", "\t 2021-06-12 & vector & latest & 4363\\\\\n", "\t 2021-06-12 & vector & legacy & 45448\\\\\n", "\t 2021-06-13 & NULL & NULL & 5\\\\\n", "\t 2021-06-13 & vector & latest & 5053\\\\\n", "\t 2021-06-13 & vector & legacy & 49477\\\\\n", "\t 2021-06-14 & NULL & NULL & 9\\\\\n", "\t 2021-06-14 & vector & latest & 5926\\\\\n", "\t 2021-06-14 & vector & legacy & 61784\\\\\n", "\t 2021-06-15 & NULL & NULL & 6\\\\\n", "\t 2021-06-15 & vector & latest & 5952\\\\\n", "\t 2021-06-15 & vector & legacy & 61043\\\\\n", "\t 2021-06-16 & NULL & NULL & 2\\\\\n", "\t 2021-06-16 & vector & latest & 5725\\\\\n", "\t 2021-06-16 & vector & legacy & 58951\\\\\n", "\t 2021-06-17 & NULL & NULL & 1\\\\\n", "\t 2021-06-17 & vector & latest & 5622\\\\\n", "\t 2021-06-17 & vector & legacy & 57772\\\\\n", "\t 2021-06-18 & NULL & NULL & 2\\\\\n", "\t 2021-06-18 & vector & latest & 4948\\\\\n", "\t 2021-06-18 & vector & legacy & 52140\\\\\n", "\t 2021-06-19 & NULL & NULL & 4\\\\\n", "\t 2021-06-19 & vector & latest & 4288\\\\\n", "\t 2021-06-19 & vector & legacy & 42424\\\\\n", "\t 2021-06-20 & NULL & NULL & 2\\\\\n", "\t 2021-06-20 & vector & latest & 4618\\\\\n", "\t 2021-06-20 & vector & legacy & 46535\\\\\n", "\t 2021-06-21 & vector & latest & 5528\\\\\n", "\t 2021-06-21 & vector & legacy & 58187\\\\\n", "\t 2021-06-22 & vector & latest & 23023\\\\\n", "\t 2021-06-22 & vector & legacy & 57450\\\\\n", "\t 2021-06-23 & vector & latest & 47755\\\\\n", "\t 2021-06-23 & vector & legacy & 55941\\\\\n", "\t 2021-06-24 & vector & latest & 51067\\\\\n", "\t 2021-06-24 & vector & legacy & 55742\\\\\n", "\t 2021-06-25 & NULL & NULL & 1\\\\\n", "\t 2021-06-25 & vector & 
latest & 48562\\\\\n", "\t 2021-06-25 & vector & legacy & 50041\\\\\n", "\t 2021-06-26 & vector & latest & 44291\\\\\n", "\t 2021-06-26 & vector & legacy & 41258\\\\\n", "\t 2021-06-27 & vector & latest & 51986\\\\\n", "\t 2021-06-27 & vector & legacy & 44501\\\\\n", "\t 2021-06-28 & NULL & NULL & 1\\\\\n", "\t 2021-06-28 & vector & latest & 68627\\\\\n", "\t 2021-06-28 & vector & legacy & 54833\\\\\n", "\t 2021-06-29 & vector & latest & 23540\\\\\n", "\t 2021-06-29 & vector & legacy & 22065\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 50 × 4\n", "\n", "| date <chr> | skin <chr> | skinversion <chr> | n_sessions <int> |\n", "|---|---|---|---|\n", "| 2021-06-11 | NULL | NULL | 18 |\n", "| 2021-06-11 | vector | latest | 5467 |\n", "| 2021-06-11 | vector | legacy | 54874 |\n", "| 2021-06-12 | NULL | NULL | 14 |\n", "| 2021-06-12 | vector | latest | 4363 |\n", "| 2021-06-12 | vector | legacy | 45448 |\n", "| 2021-06-13 | NULL | NULL | 5 |\n", "| 2021-06-13 | vector | latest | 5053 |\n", "| 2021-06-13 | vector | legacy | 49477 |\n", "| 2021-06-14 | NULL | NULL | 9 |\n", "| 2021-06-14 | vector | latest | 5926 |\n", "| 2021-06-14 | vector | legacy | 61784 |\n", "| 2021-06-15 | NULL | NULL | 6 |\n", "| 2021-06-15 | vector | latest | 5952 |\n", "| 2021-06-15 | vector | legacy | 61043 |\n", "| 2021-06-16 | NULL | NULL | 2 |\n", "| 2021-06-16 | vector | latest | 5725 |\n", "| 2021-06-16 | vector | legacy | 58951 |\n", "| 2021-06-17 | NULL | NULL | 1 |\n", "| 2021-06-17 | vector | latest | 5622 |\n", "| 2021-06-17 | vector | legacy | 57772 |\n", "| 2021-06-18 | NULL | NULL | 2 |\n", "| 2021-06-18 | vector | latest | 4948 |\n", "| 2021-06-18 | vector | legacy | 52140 |\n", "| 2021-06-19 | NULL | NULL | 4 |\n", "| 2021-06-19 | vector | latest | 4288 |\n", "| 2021-06-19 | vector | legacy | 42424 |\n", "| 2021-06-20 | NULL | NULL | 2 |\n", "| 2021-06-20 | vector | latest | 4618 |\n", "| 2021-06-20 | vector | legacy | 46535 |\n", "| 2021-06-21 | vector | 
latest | 5528 |\n", "| 2021-06-21 | vector | legacy | 58187 |\n", "| 2021-06-22 | vector | latest | 23023 |\n", "| 2021-06-22 | vector | legacy | 57450 |\n", "| 2021-06-23 | vector | latest | 47755 |\n", "| 2021-06-23 | vector | legacy | 55941 |\n", "| 2021-06-24 | vector | latest | 51067 |\n", "| 2021-06-24 | vector | legacy | 55742 |\n", "| 2021-06-25 | NULL | NULL | 1 |\n", "| 2021-06-25 | vector | latest | 48562 |\n", "| 2021-06-25 | vector | legacy | 50041 |\n", "| 2021-06-26 | vector | latest | 44291 |\n", "| 2021-06-26 | vector | legacy | 41258 |\n", "| 2021-06-27 | vector | latest | 51986 |\n", "| 2021-06-27 | vector | legacy | 44501 |\n", "| 2021-06-28 | NULL | NULL | 1 |\n", "| 2021-06-28 | vector | latest | 68627 |\n", "| 2021-06-28 | vector | legacy | 54833 |\n", "| 2021-06-29 | vector | latest | 23540 |\n", "| 2021-06-29 | vector | legacy | 22065 |\n", "\n" ], "text/plain": [ " date skin skinversion n_sessions\n", "1 2021-06-11 NULL NULL 18 \n", "2 2021-06-11 vector latest 5467 \n", "3 2021-06-11 vector legacy 54874 \n", "4 2021-06-12 NULL NULL 14 \n", "5 2021-06-12 vector latest 4363 \n", "6 2021-06-12 vector legacy 45448 \n", "7 2021-06-13 NULL NULL 5 \n", "8 2021-06-13 vector latest 5053 \n", "9 2021-06-13 vector legacy 49477 \n", "10 2021-06-14 NULL NULL 9 \n", "11 2021-06-14 vector latest 5926 \n", "12 2021-06-14 vector legacy 61784 \n", "13 2021-06-15 NULL NULL 6 \n", "14 2021-06-15 vector latest 5952 \n", "15 2021-06-15 vector legacy 61043 \n", "16 2021-06-16 NULL NULL 2 \n", "17 2021-06-16 vector latest 5725 \n", "18 2021-06-16 vector legacy 58951 \n", "19 2021-06-17 NULL NULL 1 \n", "20 2021-06-17 vector latest 5622 \n", "21 2021-06-17 vector legacy 57772 \n", "22 2021-06-18 NULL NULL 2 \n", "23 2021-06-18 vector latest 4948 \n", "24 2021-06-18 vector legacy 52140 \n", "25 2021-06-19 NULL NULL 4 \n", "26 2021-06-19 vector latest 4288 \n", "27 2021-06-19 vector legacy 42424 \n", "28 2021-06-20 NULL NULL 2 \n", "29 2021-06-20 vector latest 4618 
\n", "30 2021-06-20 vector legacy 46535 \n", "31 2021-06-21 vector latest 5528 \n", "32 2021-06-21 vector legacy 58187 \n", "33 2021-06-22 vector latest 23023 \n", "34 2021-06-22 vector legacy 57450 \n", "35 2021-06-23 vector latest 47755 \n", "36 2021-06-23 vector legacy 55941 \n", "37 2021-06-24 vector latest 51067 \n", "38 2021-06-24 vector legacy 55742 \n", "39 2021-06-25 NULL NULL 1 \n", "40 2021-06-25 vector latest 48562 \n", "41 2021-06-25 vector legacy 50041 \n", "42 2021-06-26 vector latest 44291 \n", "43 2021-06-26 vector legacy 41258 \n", "44 2021-06-27 vector latest 51986 \n", "45 2021-06-27 vector legacy 44501 \n", "46 2021-06-28 NULL NULL 1 \n", "47 2021-06-28 vector latest 68627 \n", "48 2021-06-28 vector legacy 54833 \n", "49 2021-06-29 vector latest 23540 \n", "50 2021-06-29 vector legacy 22065 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_byskin <- lang_button_events %>%\n", " filter(date >= '2021-06-11') %>% # date fix deployed\n", " group_by(date, skin, skinversion) %>%\n", " summarise(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_switch_events_byskin " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are recording two skin types, 'NULL' and 'vector', and both skin versions for vector: 'latest' and 'legacy'. I'm assuming NULL covers all non-vector skins. This isn't needed for the AB test, but I recommend revising the instrumentation at some point to record the specific non-vector skin types." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Context Field" ] }, { "cell_type": "code", "execution_count": 255, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'context', 'skin' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 4 × 4
contextskinskinversionn_sessions
<chr><chr><chr><int>
headervectorlatest325736
NULL NULL NULL 65
other vectorlatest 62458
other vectorlegacy939722
\n" ], "text/latex": [ "A grouped\\_df: 4 × 4\n", "\\begin{tabular}{llll}\n", " context & skin & skinversion & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t header & vector & latest & 325736\\\\\n", "\t NULL & NULL & NULL & 65\\\\\n", "\t other & vector & latest & 62458\\\\\n", "\t other & vector & legacy & 939722\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 4\n", "\n", "| context <chr> | skin <chr> | skinversion <chr> | n_sessions <int> |\n", "|---|---|---|---|\n", "| header | vector | latest | 325736 |\n", "| NULL | NULL | NULL | 65 |\n", "| other | vector | latest | 62458 |\n", "| other | vector | legacy | 939722 |\n", "\n" ], "text/plain": [ " context skin skinversion n_sessions\n", "1 header vector latest 325736 \n", "2 NULL NULL NULL 65 \n", "3 other vector latest 62458 \n", "4 other vector legacy 939722 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_bycontext <- lang_button_events %>%\n", " filter(date >= '2021-06-11') %>% # date fix deployed\n", " group_by(context, skin, skinversion) %>%\n", " summarise(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_switch_events_bycontext " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Confirmed that with new instrumentation we can differentiate the following event types:\n", " * People on vector selecting language button on top of the page: `event.action = 'compact-language-links-open'`; `event.context = 'header'`; `event.skinVersion = 'latest'`\n", " * People on vector that are in the control bucket clicking the \" N more\" button in the sidebar: `event.action = 'compact-language-links-open'` and `event.context = 'other'`, `event.skinVersion = 'latest'`\n", " * People that have opted out of modern vector clicking the more button: `event.context = 'NULL'`; `event.skinVersion = 'NULL'`\n", "\n", "\n", "We are only recording clicks to the new button in the header on the latest vector as expected. 
\n", "\n", "There are clicks to the N More button by both legacy and latest vector users, as expected. The new language button was not deployed to logged-in users on test wikis so they would still have clicks to the N more button on the latest vector. When the AB test runs, only users in the control group will be able to access the N more button on the latest vector.\n", "\n", "Note: context values are only set for users on vector; however, we can identify clicks from other skins to the N More button by the lack of values (`event.context = NULL`).\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By logged-in and logged-out status" ] }, { "cell_type": "code", "execution_count": 256, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'context', 'skin', 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 8 × 5
contextskinskinversionisanonn_sessions
<chr><chr><chr><chr><int>
headervectorlatestfalse 10298
headervectorlatesttrue 315691
NULL NULL NULL false 1
NULL NULL NULL true 64
other vectorlatestfalse 2226
other vectorlatesttrue 60257
other vectorlegacyfalse 24755
other vectorlegacytrue 915463
\n" ], "text/latex": [ "A grouped\\_df: 8 × 5\n", "\\begin{tabular}{lllll}\n", " context & skin & skinversion & isanon & n\\_sessions\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t header & vector & latest & false & 10298\\\\\n", "\t header & vector & latest & true & 315691\\\\\n", "\t NULL & NULL & NULL & false & 1\\\\\n", "\t NULL & NULL & NULL & true & 64\\\\\n", "\t other & vector & latest & false & 2226\\\\\n", "\t other & vector & latest & true & 60257\\\\\n", "\t other & vector & legacy & false & 24755\\\\\n", "\t other & vector & legacy & true & 915463\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 8 × 5\n", "\n", "| context <chr> | skin <chr> | skinversion <chr> | isanon <chr> | n_sessions <int> |\n", "|---|---|---|---|---|\n", "| header | vector | latest | false | 10298 |\n", "| header | vector | latest | true | 315691 |\n", "| NULL | NULL | NULL | false | 1 |\n", "| NULL | NULL | NULL | true | 64 |\n", "| other | vector | latest | false | 2226 |\n", "| other | vector | latest | true | 60257 |\n", "| other | vector | legacy | false | 24755 |\n", "| other | vector | legacy | true | 915463 |\n", "\n" ], "text/plain": [ " context skin skinversion isanon n_sessions\n", "1 header vector latest false 10298 \n", "2 header vector latest true 315691 \n", "3 NULL NULL NULL false 1 \n", "4 NULL NULL NULL true 64 \n", "5 other vector latest false 2226 \n", "6 other vector latest true 60257 \n", "7 other vector legacy false 24755 \n", "8 other vector legacy true 915463 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_byusertype <- lang_button_events %>%\n", " filter(date >= '2021-06-11') %>% # date fix deployed\n", " group_by(context, skin, skinversion, isanon) %>%\n", " summarise(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_switch_events_byusertype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Totals by Anon" ] }, { "cell_type": "code", "execution_count": 259, 
"metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
A tibble: 2 × 3
isanonn_eventsn_sessions
<chr><int><int>
false116483378
true 9 5
\n" ], "text/latex": [ "A tibble: 2 × 3\n", "\\begin{tabular}{lll}\n", " isanon & n\\_events & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t false & 11648 & 3378\\\\\n", "\t true & 9 & 5\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 3\n", "\n", "| isanon <chr> | n_events <int> | n_sessions <int> |\n", "|---|---|---|\n", "| false | 11648 | 3378 |\n", "| true | 9 | 5 |\n", "\n" ], "text/plain": [ " isanon n_events n_sessions\n", "1 false 11648 3378 \n", "2 true 9 5 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_events_byusertype_all <- lang_button_events %>%\n", " filter(date < '2021-06-22',\n", " context == 'header') %>% # date fix deployed\n", " group_by(isanon) %>%\n", " summarize(n_events = sum(n_events),\n", " n_sessions = n_distinct(web_session_id)) \n", "\n", "head(lang_button_events_byusertype_all)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed we are only recording clicks to the new language button by logged-in users. All new button clicks recorded so far have been on latest vector. \n", "\n", "Update: AB Test deployed on 22 June 2021. We start seeing a lot more logged-out users with clicks to the new button after that date." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Investigate logged-out new lang button events" ] }, { "cell_type": "code", "execution_count": 205, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 3 × 13
datewikiweb_session_idusereditbucketskinskinversiontimetochangelanguageisanoncontextinterfacelanguagecontentlanguageselectedinterfacelanguagen_events
<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><int>
2021-06-11testwikibc29f4fc89d59f4bc99cNULLvectorlatestNULLtrueheaderenenNULL1
2021-06-11bnwiki 4fa77beb78338661d19cNULLvectorlatestNULLtrueheaderbnbnNULL1
2021-06-13testwiki1c87469b4eb16ead0427NULLvectorlatestNULLtrueheaderenenNULL3
\n" ], "text/latex": [ "A data.frame: 3 × 13\n", "\\begin{tabular}{lllllllllllll}\n", " date & wiki & web\\_session\\_id & usereditbucket & skin & skinversion & timetochangelanguage & isanon & context & interfacelanguage & contentlanguage & selectedinterfacelanguage & n\\_events\\\\\n", " & & & & & & & & & & & & \\\\\n", "\\hline\n", "\t 2021-06-11 & testwiki & bc29f4fc89d59f4bc99c & NULL & vector & latest & NULL & true & header & en & en & NULL & 1\\\\\n", "\t 2021-06-11 & bnwiki & 4fa77beb78338661d19c & NULL & vector & latest & NULL & true & header & bn & bn & NULL & 1\\\\\n", "\t 2021-06-13 & testwiki & 1c87469b4eb16ead0427 & NULL & vector & latest & NULL & true & header & en & en & NULL & 3\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 3 × 13\n", "\n", "| date <chr> | wiki <chr> | web_session_id <chr> | usereditbucket <chr> | skin <chr> | skinversion <chr> | timetochangelanguage <chr> | isanon <chr> | context <chr> | interfacelanguage <chr> | contentlanguage <chr> | selectedinterfacelanguage <chr> | n_events <int> |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 2021-06-11 | testwiki | bc29f4fc89d59f4bc99c | NULL | vector | latest | NULL | true | header | en | en | NULL | 1 |\n", "| 2021-06-11 | bnwiki | 4fa77beb78338661d19c | NULL | vector | latest | NULL | true | header | bn | bn | NULL | 1 |\n", "| 2021-06-13 | testwiki | 1c87469b4eb16ead0427 | NULL | vector | latest | NULL | true | header | en | en | NULL | 3 |\n", "\n" ], "text/plain": [ " date wiki web_session_id usereditbucket skin skinversion\n", "1 2021-06-11 testwiki bc29f4fc89d59f4bc99c NULL vector latest \n", "2 2021-06-11 bnwiki 4fa77beb78338661d19c NULL vector latest \n", "3 2021-06-13 testwiki 1c87469b4eb16ead0427 NULL vector latest \n", " timetochangelanguage isanon context interfacelanguage contentlanguage\n", "1 NULL true header en en \n", "2 NULL true header bn bn \n", "3 NULL true header en en \n", " selectedinterfacelanguage n_events\n", "1 NULL 1 
\n", "2 NULL 1 \n", "3 NULL 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_events_isanon <- lang_button_events %>%\n", " filter(date >= '2021-06-11',\n", " context == 'header',\n", " isanon == \"true\")\n", "\n", "lang_button_events_isanon " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two of these events occurred on testwiki and one on bnwiki, which is one of the early adopter wikis.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Test Wiki" ] }, { "cell_type": "code", "execution_count": 261, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'context', 'skin', 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 5 × 5
contextskinskinversionisanonn_sessions
<chr><chr><chr><chr><int>
headervectorlatestfalse 4743
headervectorlatesttrue 258299
other vectorlatestfalse 450
other vectorlatesttrue 3781
other vectorlegacyfalse 165
\n" ], "text/latex": [ "A grouped\\_df: 5 × 5\n", "\\begin{tabular}{lllll}\n", " context & skin & skinversion & isanon & n\\_sessions\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t header & vector & latest & false & 4743\\\\\n", "\t header & vector & latest & true & 258299\\\\\n", "\t other & vector & latest & false & 450\\\\\n", "\t other & vector & latest & true & 3781\\\\\n", "\t other & vector & legacy & false & 165\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 5 × 5\n", "\n", "| context <chr> | skin <chr> | skinversion <chr> | isanon <chr> | n_sessions <int> |\n", "|---|---|---|---|---|\n", "| header | vector | latest | false | 4743 |\n", "| header | vector | latest | true | 258299 |\n", "| other | vector | latest | false | 450 |\n", "| other | vector | latest | true | 3781 |\n", "| other | vector | legacy | false | 165 |\n", "\n" ], "text/plain": [ " context skin skinversion isanon n_sessions\n", "1 header vector latest false 4743 \n", "2 header vector latest true 258299 \n", "3 other vector latest false 450 \n", "4 other vector latest true 3781 \n", "5 other vector legacy false 165 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_byisanon_testwiki <- lang_button_events %>%\n", " filter(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' ), \n", " date >= '2021-06-24') %>% # following AB test\n", " group_by(context, skin, skinversion, isanon) %>%\n", " summarise(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_switch_events_byisanon_testwiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numbers appear as expected. There are a limited number of events recorded on legacy vector since the new skin is deployed as opt-out.\n", "\n", "We're still seeing some clicks to the N More button on the latest vector. Not sure where these are coming from." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Non Test Wiki" ] }, { "cell_type": "code", "execution_count": 262, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'context', 'skin', 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 4 × 6
contextskinskinversionisanonn_sessionsn_events
<chr><chr><chr><chr><int><int>
headervectorlatestfalse425215217
headervectorlatesttrue 62 107
other vectorlatestfalse 3 3
other vectorlatesttrue 19 24
\n" ], "text/latex": [ "A grouped\\_df: 4 × 6\n", "\\begin{tabular}{llllll}\n", " context & skin & skinversion & isanon & n\\_sessions & n\\_events\\\\\n", " & & & & & \\\\\n", "\\hline\n", "\t header & vector & latest & false & 4252 & 15217\\\\\n", "\t header & vector & latest & true & 62 & 107\\\\\n", "\t other & vector & latest & false & 3 & 3\\\\\n", "\t other & vector & latest & true & 19 & 24\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 6\n", "\n", "| context <chr> | skin <chr> | skinversion <chr> | isanon <chr> | n_sessions <int> | n_events <int> |\n", "|---|---|---|---|---|---|\n", "| header | vector | latest | false | 4252 | 15217 |\n", "| header | vector | latest | true | 62 | 107 |\n", "| other | vector | latest | false | 3 | 3 |\n", "| other | vector | latest | true | 19 | 24 |\n", "\n" ], "text/plain": [ " context skin skinversion isanon n_sessions n_events\n", "1 header vector latest false 4252 15217 \n", "2 header vector latest true 62 107 \n", "3 other vector latest false 3 3 \n", "4 other vector latest true 19 24 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_events_bynontest <- lang_button_events %>%\n", " filter(skinversion == 'latest',\n", " skin == 'vector',\n", " date >= '2021-06-11', # date fix deployed\n", " !wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )) %>%\n", " group_by(context, skin, skinversion, isanon) %>%\n", " summarise(n_sessions = n_distinct(web_session_id),\n", " n_events = sum(n_events))\n", "\n", "lang_button_switch_events_bynontest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On non-test wikis, we should only be recording clicks to the new button by logged-in users. 
There are two events by logged-out users, which occurred on testwiki; see above.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Content Language Check" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'interfacelanguage', 'contentlanguage' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 4
interfacelanguagecontentlanguagen_sessionspct_sessions
<chr><chr><int><dbl>
en en850.29513889
ja ja200.06944444
zh-twzh200.06944444
de de140.04861111
ru ru120.04166667
ca ca110.03819444
\n" ], "text/latex": [ "A grouped\\_df: 6 × 4\n", "\\begin{tabular}{llll}\n", " interfacelanguage & contentlanguage & n\\_sessions & pct\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t en & en & 85 & 0.29513889\\\\\n", "\t ja & ja & 20 & 0.06944444\\\\\n", "\t zh-tw & zh & 20 & 0.06944444\\\\\n", "\t de & de & 14 & 0.04861111\\\\\n", "\t ru & ru & 12 & 0.04166667\\\\\n", "\t ca & ca & 11 & 0.03819444\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 4\n", "\n", "| interfacelanguage <chr> | contentlanguage <chr> | n_sessions <int> | pct_sessions <dbl> |\n", "|---|---|---|---|\n", "| en | en | 85 | 0.29513889 |\n", "| ja | ja | 20 | 0.06944444 |\n", "| zh-tw | zh | 20 | 0.06944444 |\n", "| de | de | 14 | 0.04861111 |\n", "| ru | ru | 12 | 0.04166667 |\n", "| ca | ca | 11 | 0.03819444 |\n", "\n" ], "text/plain": [ " interfacelanguage contentlanguage n_sessions pct_sessions\n", "1 en en 85 0.29513889 \n", "2 ja ja 20 0.06944444 \n", "3 zh-tw zh 20 0.06944444 \n", "4 de de 14 0.04861111 \n", "5 ru ru 12 0.04166667 \n", "6 ca ca 11 0.03819444 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "top_content_languages <- lang_button_events %>%\n", " filter(date >= '2021-06-11',\n", " context == 'header') %>%\n", " mutate(all_sessions = n_distinct(web_session_id)) %>%\n", " group_by(interfacelanguage, contentlanguage) %>%\n", " summarize(n_sessions = n_distinct(web_session_id),\n", " pct_sessions = n_sessions/all_sessions) %>%\n", " distinct() %>%\n", " arrange(desc(n_sessions))\n", "\n", "head(top_content_languages)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No test wikis appear among the top content languages where the new language button was clicked; the top entries are all larger non-test wikis." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Edit Count" ] }, { "cell_type": "code", "execution_count": 209, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'usereditbucket' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A grouped_df: 6 × 3
usereditbucketisanonn_sessions
<chr><chr><int>
0 edits false 93
1-4 edits false 64
100-999 editsfalse135
1000+ edits false360
5-99 edits false181
NULL true 3
\n" ], "text/latex": [ "A grouped\\_df: 6 × 3\n", "\\begin{tabular}{lll}\n", " usereditbucket & isanon & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t 0 edits & false & 93\\\\\n", "\t 1-4 edits & false & 64\\\\\n", "\t 100-999 edits & false & 135\\\\\n", "\t 1000+ edits & false & 360\\\\\n", "\t 5-99 edits & false & 181\\\\\n", "\t NULL & true & 3\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 6 × 3\n", "\n", "| usereditbucket <chr> | isanon <chr> | n_sessions <int> |\n", "|---|---|---|\n", "| 0 edits | false | 93 |\n", "| 1-4 edits | false | 64 |\n", "| 100-999 edits | false | 135 |\n", "| 1000+ edits | false | 360 |\n", "| 5-99 edits | false | 181 |\n", "| NULL | true | 3 |\n", "\n" ], "text/plain": [ " usereditbucket isanon n_sessions\n", "1 0 edits false 93 \n", "2 1-4 edits false 64 \n", "3 100-999 edits false 135 \n", "4 1000+ edits false 360 \n", "5 5-99 edits false 181 \n", "6 NULL true 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_button_byeditcount <- lang_button_events %>%\n", " filter(date >= '2021-06-11',\n", " context == 'header') %>%\n", " group_by(usereditbucket, isanon) %>%\n", " summarize(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_button_byeditcount" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are three NULL events recorded. These are all for the logged-out user events with clicks to the header identified above.\n", "\n", "There appears to be some type of bug with these instances. 
Need to identify more info associated with these events to see if we can isolate what's happening here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check that you can determine language switches from sessions with clicks to new button" ] }, { "cell_type": "code", "execution_count": 433, "metadata": {}, "outputs": [], "source": [ "# rough query to confirm approach\n", "# will be refined in analysis\n", "query <- \n", "\"\n", "-- sessions where lang button was selected\n", "WITH button AS (\n", "SELECT\n", " MIN(TO_DATE(dt)) as button_date,\n", " event.web_session_id as session_id,\n", " event.skinVersion as skinversion,\n", " event.isAnon As isAnon,\n", " event.context as button_type,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", " AND Day >= 23\n", " AND Day <= 30\n", " AND useragent.is_bot = false\n", " AND event.action = 'compact-language-links-open'\n", "GROUP BY \n", " event.web_session_id,\n", " event.context,\n", " event.isAnon,\n", " event.skinversion,\n", " wiki\n", "),\n", "\n", "lang_switches AS (\n", " SELECT\n", " TO_DATE(dt) as switch_date,\n", " event.web_session_id as session_id,\n", " event.context as switch_context,\n", " event.timetochangelanguage as switch_time,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", " AND day >= 23\n", " AND useragent.is_bot = false\n", " AND event.action = 'language-change'\n", ")\n", "\n", "SELECT\n", " button.button_date,\n", " button.button_type,\n", " button.skinversion,\n", " button.isAnon,\n", " lang_switches.switch_time,\n", " lang_switches.switch_date,\n", " button.session_id,\n", " button.wiki,\n", "-- sessions with lang switch that occurred after button clicks\n", " IF(lang_switches.session_id IS NOT NULL, 1, 0) AS language_switch,\n", " lang_switches.switch_context\n", "FROM button\n", "LEFT JOIN lang_switches ON\n", " button.session_id = 
lang_switches.session_id\n", " \n", "\"" ] }, { "cell_type": "code", "execution_count": 434, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "lang_button_switch_events <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 435, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'button_type', 'wiki' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 20 × 4\n", "\\begin{tabular}{llll}\n", " button\\_type & wiki & content-language-switcher & interface\\\\\n", " & & & \\\\\n", "\\hline\n", "\t header & bnwiki & 49 & NA\\\\\n", "\t header & euwiki & 105 & 1\\\\\n", "\t header & frwiki & 2903 & 1\\\\\n", "\t header & frwiktionary & 67 & NA\\\\\n", "\t header & hewiki & 660 & NA\\\\\n", "\t header & kowiki & 341 & 4\\\\\n", "\t header & ptwiki & 928 & 1\\\\\n", "\t header & srwiki & 135 & NA\\\\\n", "\t header & trwiki & 358 & 1\\\\\n", "\t header & vecwiki & 1 & NA\\\\\n", "\t other & bnwiki & 3 & NA\\\\\n", "\t other & euwiki & 8 & NA\\\\\n", "\t other & frwiki & 149 & 1\\\\\n", "\t other & frwiktionary & 4 & NA\\\\\n", "\t other & hewiki & 15 & NA\\\\\n", "\t other & kowiki & 11 & NA\\\\\n", "\t other & ptwiki & 50 & NA\\\\\n", "\t other & srwiki & 35 & NA\\\\\n", "\t other & trwiki & 30 & NA\\\\\n", "\t other & vecwiki & 1 & NA\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 20 × 4\n", "\n", "| button_type <chr> | wiki <chr> | content-language-switcher <int> | interface <int> |\n", "|---|---|---|---|\n", "| header | bnwiki | 49 | NA |\n", "| header | euwiki | 105 | 1 |\n", "| header | frwiki | 2903 | 1 |\n", "| header | frwiktionary | 67 | NA |\n", "| header | hewiki | 660 | NA |\n", "| header | kowiki | 341 | 4 |\n", "| header | ptwiki | 928 | 1 |\n", "| header | srwiki | 135 | NA |\n", "| header | trwiki | 358 | 1 |\n", "| header | vecwiki | 1 | NA |\n", "| other | bnwiki | 3 | NA |\n", "| other | euwiki | 8 | NA |\n", "| other | frwiki | 149 | 1 |\n", "| other | frwiktionary | 4 | NA |\n", "| other | hewiki | 15 | NA |\n", "| other | kowiki | 11 | NA |\n", "| other | ptwiki | 50 | NA |\n", "| other | srwiki | 35 | NA |\n", "| other | trwiki | 30 | NA |\n", "| other | vecwiki | 1 | NA |\n", "\n" ], "text/plain": [ " button_type wiki content-language-switcher interface\n", "1 header bnwiki 49 NA \n", "2 header euwiki 105 1 \n", "3 header frwiki 2903 
1 \n", "4 header frwiktionary 67 NA \n", "5 header hewiki 660 NA \n", "6 header kowiki 341 4 \n", "7 header ptwiki 928 1 \n", "8 header srwiki 135 NA \n", "9 header trwiki 358 1 \n", "10 header vecwiki 1 NA \n", "11 other bnwiki 3 NA \n", "12 other euwiki 8 NA \n", "13 other frwiki 149 1 \n", "14 other frwiktionary 4 NA \n", "15 other hewiki 15 NA \n", "16 other kowiki 11 NA \n", "17 other ptwiki 50 NA \n", "18 other srwiki 35 NA \n", "19 other trwiki 30 NA \n", "20 other vecwiki 1 NA " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_bybuttontype <- lang_button_switch_events %>%\n", " filter(\n", " language_switch == 1, # only sessions with lang switch\n", " wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' ),\n", " skinversion == 'latest',\n", " isanon == 'false',\n", " switch_context %in% c('content-language-switcher', 'interface')) %>% \n", " group_by(button_type, wiki, switch_context) %>%\n", " summarise(n_sessions = n_distinct(session_id)) %>%\n", " spread(switch_context, n_sessions)\n", "\n", "lang_button_switch_bybuttontype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since deployment of the fix on 11 June 2021, we are now recording language switches that occur after the user clicks the new button. On 22 June 2021, context fields were added to differentiate clicks to change the display setting (event.context = interface) from clicks to languages.\n", "\n", "The data above is restricted to AB test sessions. There are more language switches in the treatment group (button_type = header), but that's because the data for the control group is missing clicks to the language links in the sidebar. I'll run a query and add those numbers to confirm.\n", "\n", "ISSUES:\n", "* There are language list selections associated with sessions where a header was clicked. 
Need to investigate further.\n", "\n", "Notes: \n", "* A NULL button_type means the user clicked from a skin type other than vector.\n", "* The languages-list event is only recorded for direct clicks to links in the sidebar and does not require selecting the 'N more' button. The languages-list sessions listed above include sessions in which the user both clicked a button and clicked the language list in the sidebar.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lang Link Clicks to Sidebar" ] }, { "cell_type": "code", "execution_count": 438, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "query <- \n", "\n", "\"\n", "SELECT DISTINCT\n", " event.web_session_id as session_id,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", "-- first full day of events\n", " AND Day >= 23\n", " AND Day <= 30\n", " AND useragent.is_bot = false\n", " AND event.isAnon = false\n", " AND wiki IN ('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki')\n", " AND event.action = 'language-change'\n", " AND event.context = 'languages-list'\n", " AND event.skinversion = 'latest'\n", "\"" ] }, { "cell_type": "code", "execution_count": 439, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "lang_link_sidebar <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 440, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A tibble: 11 × 3\n", "\\begin{tabular}{lll}\n", " wiki & sidebar & button\\_type\\\\\n", " & & \\\\\n", "\\hline\n", "\t bnwiki & 99 & other\\\\\n", "\t dewikivoyage & 5 & other\\\\\n", "\t euwiki & 108 & other\\\\\n", "\t frwiki & 6069 & other\\\\\n", "\t frwiktionary & 176 & other\\\\\n", "\t hewiki & 986 & other\\\\\n", "\t kowiki & 627 & other\\\\\n", "\t ptwiki & 1886 & other\\\\\n", "\t srwiki & 388 & other\\\\\n", "\t trwiki & 806 & other\\\\\n", "\t vecwiki & 39 & other\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 11 × 3\n", "\n", "| wiki <chr> | sidebar <int> | button_type <chr> |\n", "|---|---|---|\n", "| bnwiki | 99 | other |\n", "| dewikivoyage | 5 | other |\n", "| euwiki | 108 | other |\n", "| frwiki | 6069 | other |\n", "| frwiktionary | 176 | other |\n", "| hewiki | 986 | other |\n", "| kowiki | 627 | other |\n", "| ptwiki | 1886 | other |\n", "| srwiki | 388 | other |\n", "| trwiki | 806 | other |\n", "| vecwiki | 39 | other |\n", "\n" ], "text/plain": [ " wiki sidebar button_type\n", "1 bnwiki 99 other \n", "2 dewikivoyage 5 other \n", "3 euwiki 108 other \n", "4 frwiki 6069 other \n", "5 frwiktionary 176 other \n", "6 hewiki 986 other \n", "7 kowiki 627 other \n", "8 ptwiki 1886 other \n", "9 srwiki 388 other \n", "10 trwiki 806 other \n", "11 vecwiki 39 other " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_link_sidebar_bywiki <- lang_link_sidebar %>%\n", " group_by(wiki) %>%\n", " summarise(sidebar = n_distinct(session_id),\n", " button_type = 'other') # add column to specify that this is the control group (button type will be other)\n", "\n", "lang_link_sidebar_bywiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll merge the two results so that all language switches are together in one table. 
" ] }, { "cell_type": "code", "execution_count": 441, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 21 × 5\n", "\\begin{tabular}{lllll}\n", " button\\_type & wiki & content-language-switcher & interface & sidebar\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t header & bnwiki & 49 & NA & NA\\\\\n", "\t header & euwiki & 105 & 1 & NA\\\\\n", "\t header & frwiki & 2903 & 1 & NA\\\\\n", "\t header & frwiktionary & 67 & NA & NA\\\\\n", "\t header & hewiki & 660 & NA & NA\\\\\n", "\t header & kowiki & 341 & 4 & NA\\\\\n", "\t header & ptwiki & 928 & 1 & NA\\\\\n", "\t header & srwiki & 135 & NA & NA\\\\\n", "\t header & trwiki & 358 & 1 & NA\\\\\n", "\t header & vecwiki & 1 & NA & NA\\\\\n", "\t other & bnwiki & 3 & NA & 99\\\\\n", "\t other & euwiki & 8 & NA & 108\\\\\n", "\t other & frwiki & 149 & 1 & 6069\\\\\n", "\t other & frwiktionary & 4 & NA & 176\\\\\n", "\t other & hewiki & 15 & NA & 986\\\\\n", "\t other & kowiki & 11 & NA & 627\\\\\n", "\t other & ptwiki & 50 & NA & 1886\\\\\n", "\t other & srwiki & 35 & NA & 388\\\\\n", "\t other & trwiki & 30 & NA & 806\\\\\n", "\t other & vecwiki & 1 & NA & 39\\\\\n", "\t other & dewikivoyage & NA & NA & 5\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 21 × 5\n", "\n", "| button_type <chr> | wiki <chr> | content-language-switcher <int> | interface <int> | sidebar <int> |\n", "|---|---|---|---|---|\n", "| header | bnwiki | 49 | NA | NA |\n", "| header | euwiki | 105 | 1 | NA |\n", "| header | frwiki | 2903 | 1 | NA |\n", "| header | frwiktionary | 67 | NA | NA |\n", "| header | hewiki | 660 | NA | NA |\n", "| header | kowiki | 341 | 4 | NA |\n", "| header | ptwiki | 928 | 1 | NA |\n", "| header | srwiki | 135 | NA | NA |\n", "| header | trwiki | 358 | 1 | NA |\n", "| header | vecwiki | 1 | NA | NA |\n", "| other | bnwiki | 3 | NA | 99 |\n", "| other | euwiki | 8 | NA | 108 |\n", "| other | frwiki | 149 | 1 | 6069 |\n", "| other | frwiktionary | 4 | NA | 176 |\n", "| other | hewiki | 15 | NA | 986 |\n", "| other | kowiki | 11 | NA | 627 |\n", "| other | ptwiki | 
50 | NA | 1886 |\n", "| other | srwiki | 35 | NA | 388 |\n", "| other | trwiki | 30 | NA | 806 |\n", "| other | vecwiki | 1 | NA | 39 |\n", "| other | dewikivoyage | NA | NA | 5 |\n", "\n" ], "text/plain": [ " button_type wiki content-language-switcher interface sidebar\n", "1 header bnwiki 49 NA NA \n", "2 header euwiki 105 1 NA \n", "3 header frwiki 2903 1 NA \n", "4 header frwiktionary 67 NA NA \n", "5 header hewiki 660 NA NA \n", "6 header kowiki 341 4 NA \n", "7 header ptwiki 928 1 NA \n", "8 header srwiki 135 NA NA \n", "9 header trwiki 358 1 NA \n", "10 header vecwiki 1 NA NA \n", "11 other bnwiki 3 NA 99 \n", "12 other euwiki 8 NA 108 \n", "13 other frwiki 149 1 6069 \n", "14 other frwiktionary 4 NA 176 \n", "15 other hewiki 15 NA 986 \n", "16 other kowiki 11 NA 627 \n", "17 other ptwiki 50 NA 1886 \n", "18 other srwiki 35 NA 388 \n", "19 other trwiki 30 NA 806 \n", "20 other vecwiki 1 NA 39 \n", "21 other dewikivoyage NA NA 5 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_switched_AB <- full_join(lang_button_switch_bybuttontype, lang_link_sidebar_bywiki, \n", " by = c('wiki','button_type')\n", ")\n", "\n", "lang_switched_AB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lang Button Switches by Skin Type" ] }, { "cell_type": "code", "execution_count": 215, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'button_type', 'switch_context', 'language_switch' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 10 × 5\n", "\\begin{tabular}{lllll}\n", " button\\_type & switch\\_context & language\\_switch & skinversion & n\\_sessions\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t header & interface & 1 & latest & 614\\\\\n", "\t header & NULL & 0 & latest & 220\\\\\n", "\t NULL & interface & 1 & NULL & 3\\\\\n", "\t NULL & NULL & 0 & NULL & 36\\\\\n", "\t other & interface & 1 & latest & 8197\\\\\n", "\t other & interface & 1 & legacy & 71475\\\\\n", "\t other & languages-list & 1 & latest & 5604\\\\\n", "\t other & languages-list & 1 & legacy & 6\\\\\n", "\t other & NULL & 0 & latest & 3076\\\\\n", "\t other & NULL & 0 & legacy & 77735\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 10 × 5\n", "\n", "| button_type <chr> | switch_context <chr> | language_switch <int> | skinversion <chr> | n_sessions <int> |\n", "|---|---|---|---|---|\n", "| header | interface | 1 | latest | 614 |\n", "| header | NULL | 0 | latest | 220 |\n", "| NULL | interface | 1 | NULL | 3 |\n", "| NULL | NULL | 0 | NULL | 36 |\n", "| other | interface | 1 | latest | 8197 |\n", "| other | interface | 1 | legacy | 71475 |\n", "| other | languages-list | 1 | latest | 5604 |\n", "| other | languages-list | 1 | legacy | 6 |\n", "| other | NULL | 0 | latest | 3076 |\n", "| other | NULL | 0 | legacy | 77735 |\n", "\n" ], "text/plain": [ " button_type switch_context language_switch skinversion n_sessions\n", "1 header interface 1 latest 614 \n", "2 header NULL 0 latest 220 \n", "3 NULL interface 1 NULL 3 \n", "4 NULL NULL 0 NULL 36 \n", "5 other interface 1 latest 8197 \n", "6 other interface 1 legacy 71475 \n", "7 other languages-list 1 latest 5604 \n", "8 other languages-list 1 legacy 6 \n", "9 other NULL 0 latest 3076 \n", "10 other NULL 0 legacy 77735 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_byskintype <- lang_button_switch_events %>%\n", " group_by(button_type, switch_context, language_switch, 
skinversion) %>%\n", " summarise(n_sessions = n_distinct(session_id))\n", "\n", "lang_button_switch_byskintype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time to Change Language" ] }, { "cell_type": "code", "execution_count": 219, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'button_type', 'switch_context' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 28230 × 4\n", "\\begin{tabular}{llll}\n", " button\\_type & switch\\_context & switch\\_time & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t header & interface & 10017.0 & 1\\\\\n", "\t header & interface & 10019.0 & 1\\\\\n", "\t header & interface & 10033.900146484375 & 1\\\\\n", "\t header & interface & 100522.89990234375 & 1\\\\\n", "\t header & interface & 10064.39990234375 & 1\\\\\n", "\t header & interface & 10088.89990234375 & 1\\\\\n", "\t header & interface & 10102.900146484375 & 1\\\\\n", "\t header & interface & 101213.60009765625 & 1\\\\\n", "\t header & interface & 10155.900146484375 & 1\\\\\n", "\t header & interface & 10161.5 & 1\\\\\n", "\t header & interface & 10190.39990234375 & 1\\\\\n", "\t header & interface & 10196.400146484375 & 1\\\\\n", "\t header & interface & 10205.800048828125 & 1\\\\\n", "\t header & interface & 10209.300048828125 & 1\\\\\n", "\t header & interface & 10210.0 & 1\\\\\n", "\t header & interface & 10220.400146484375 & 1\\\\\n", "\t header & interface & 10240.39990234375 & 1\\\\\n", "\t header & interface & 10260.2451171875 & 1\\\\\n", "\t header & interface & 10282.699951171875 & 1\\\\\n", "\t header & interface & 102956.0 & 1\\\\\n", "\t header & interface & 10310.7001953125 & 1\\\\\n", "\t header & interface & 10314.89990234375 & 1\\\\\n", "\t header & interface & 10350.10009765625 & 1\\\\\n", "\t header & interface & 10404.0 & 1\\\\\n", "\t header & interface & 10410.0 & 1\\\\\n", "\t header & interface & 10436.0 & 1\\\\\n", "\t header & interface & 10448.2998046875 & 1\\\\\n", "\t header & interface & 10474.0 & 1\\\\\n", "\t header & interface & 10489.30029296875 & 1\\\\\n", "\t header & interface & 10501.0 & 1\\\\\n", "\t ⋮ & ⋮ & ⋮ & ⋮\\\\\n", "\t other & languages-list & 9946.900146484375 & 1\\\\\n", "\t other & languages-list & 9948.900146484375 & 1\\\\\n", "\t other & languages-list & 9949.5 & 1\\\\\n", "\t other & languages-list & 9952.300048828125 & 1\\\\\n", "\t 
other & languages-list & 9955.0 & 1\\\\\n", "\t other & languages-list & 9956.2998046875 & 1\\\\\n", "\t other & languages-list & 9958.199951171875 & 1\\\\\n", "\t other & languages-list & 99629.5 & 1\\\\\n", "\t other & languages-list & 9964.800048828125 & 1\\\\\n", "\t other & languages-list & 9965.0 & 1\\\\\n", "\t other & languages-list & 99685.19995117188 & 1\\\\\n", "\t other & languages-list & 9969.0 & 1\\\\\n", "\t other & languages-list & 9969.10009765625 & 1\\\\\n", "\t other & languages-list & 9970.0 & 1\\\\\n", "\t other & languages-list & 9970.7998046875 & 1\\\\\n", "\t other & languages-list & 9972.0 & 1\\\\\n", "\t other & languages-list & 997265.0 & 1\\\\\n", "\t other & languages-list & 99734.0 & 1\\\\\n", "\t other & languages-list & 9977.119873046875 & 1\\\\\n", "\t other & languages-list & 9979.800048828125 & 1\\\\\n", "\t other & languages-list & 998.0 & 1\\\\\n", "\t other & languages-list & 9981.800048828125 & 1\\\\\n", "\t other & languages-list & 998152.0 & 1\\\\\n", "\t other & languages-list & 9982.39990234375 & 1\\\\\n", "\t other & languages-list & 9988.0 & 1\\\\\n", "\t other & languages-list & 9989.0 & 1\\\\\n", "\t other & languages-list & 9992.400146484375 & 1\\\\\n", "\t other & languages-list & 99969.30053710938 & 1\\\\\n", "\t other & languages-list & 9997.0 & 1\\\\\n", "\t other & languages-list & 9999.0 & 1\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 28230 × 4\n", "\n", "| button_type <chr> | switch_context <chr> | switch_time <chr> | n_sessions <int> |\n", "|---|---|---|---|\n", "| header | interface | 10017.0 | 1 |\n", "| header | interface | 10019.0 | 1 |\n", "| header | interface | 10033.900146484375 | 1 |\n", "| header | interface | 100522.89990234375 | 1 |\n", "| header | interface | 10064.39990234375 | 1 |\n", "| header | interface | 10088.89990234375 | 1 |\n", "| header | interface | 10102.900146484375 | 1 |\n", "| header | interface | 101213.60009765625 | 1 |\n", "| header | interface | 
10155.900146484375 | 1 |\n", "| header | interface | 10161.5 | 1 |\n", "| header | interface | 10190.39990234375 | 1 |\n", "| header | interface | 10196.400146484375 | 1 |\n", "| header | interface | 10205.800048828125 | 1 |\n", "| header | interface | 10209.300048828125 | 1 |\n", "| header | interface | 10210.0 | 1 |\n", "| header | interface | 10220.400146484375 | 1 |\n", "| header | interface | 10240.39990234375 | 1 |\n", "| header | interface | 10260.2451171875 | 1 |\n", "| header | interface | 10282.699951171875 | 1 |\n", "| header | interface | 102956.0 | 1 |\n", "| header | interface | 10310.7001953125 | 1 |\n", "| header | interface | 10314.89990234375 | 1 |\n", "| header | interface | 10350.10009765625 | 1 |\n", "| header | interface | 10404.0 | 1 |\n", "| header | interface | 10410.0 | 1 |\n", "| header | interface | 10436.0 | 1 |\n", "| header | interface | 10448.2998046875 | 1 |\n", "| header | interface | 10474.0 | 1 |\n", "| header | interface | 10489.30029296875 | 1 |\n", "| header | interface | 10501.0 | 1 |\n", "| ⋮ | ⋮ | ⋮ | ⋮ |\n", "| other | languages-list | 9946.900146484375 | 1 |\n", "| other | languages-list | 9948.900146484375 | 1 |\n", "| other | languages-list | 9949.5 | 1 |\n", "| other | languages-list | 9952.300048828125 | 1 |\n", "| other | languages-list | 9955.0 | 1 |\n", "| other | languages-list | 9956.2998046875 | 1 |\n", "| other | languages-list | 9958.199951171875 | 1 |\n", "| other | languages-list | 99629.5 | 1 |\n", "| other | languages-list | 9964.800048828125 | 1 |\n", "| other | languages-list | 9965.0 | 1 |\n", "| other | languages-list | 99685.19995117188 | 1 |\n", "| other | languages-list | 9969.0 | 1 |\n", "| other | languages-list | 9969.10009765625 | 1 |\n", "| other | languages-list | 9970.0 | 1 |\n", "| other | languages-list | 9970.7998046875 | 1 |\n", "| other | languages-list | 9972.0 | 1 |\n", "| other | languages-list | 997265.0 | 1 |\n", "| other | languages-list | 99734.0 | 1 |\n", "| other | 
languages-list | 9977.119873046875 | 1 |\n", "| other | languages-list | 9979.800048828125 | 1 |\n", "| other | languages-list | 998.0 | 1 |\n", "| other | languages-list | 9981.800048828125 | 1 |\n", "| other | languages-list | 998152.0 | 1 |\n", "| other | languages-list | 9982.39990234375 | 1 |\n", "| other | languages-list | 9988.0 | 1 |\n", "| other | languages-list | 9989.0 | 1 |\n", "| other | languages-list | 9992.400146484375 | 1 |\n", "| other | languages-list | 99969.30053710938 | 1 |\n", "| other | languages-list | 9997.0 | 1 |\n", "| other | languages-list | 9999.0 | 1 |\n", "\n" ], "text/plain": [ " button_type switch_context switch_time n_sessions\n", "1 header interface 10017.0 1 \n", "2 header interface 10019.0 1 \n", "3 header interface 10033.900146484375 1 \n", "4 header interface 100522.89990234375 1 \n", "5 header interface 10064.39990234375 1 \n", "6 header interface 10088.89990234375 1 \n", "7 header interface 10102.900146484375 1 \n", "8 header interface 101213.60009765625 1 \n", "9 header interface 10155.900146484375 1 \n", "10 header interface 10161.5 1 \n", "11 header interface 10190.39990234375 1 \n", "12 header interface 10196.400146484375 1 \n", "13 header interface 10205.800048828125 1 \n", "14 header interface 10209.300048828125 1 \n", "15 header interface 10210.0 1 \n", "16 header interface 10220.400146484375 1 \n", "17 header interface 10240.39990234375 1 \n", "18 header interface 10260.2451171875 1 \n", "19 header interface 10282.699951171875 1 \n", "20 header interface 102956.0 1 \n", "21 header interface 10310.7001953125 1 \n", "22 header interface 10314.89990234375 1 \n", "23 header interface 10350.10009765625 1 \n", "24 header interface 10404.0 1 \n", "25 header interface 10410.0 1 \n", "26 header interface 10436.0 1 \n", "27 header interface 10448.2998046875 1 \n", "28 header interface 10474.0 1 \n", "29 header interface 10489.30029296875 1 \n", "30 header interface 10501.0 1 \n", "⋮ ⋮ ⋮ ⋮ ⋮ \n", "28201 other languages-list 
9946.900146484375 1 \n", "28202 other languages-list 9948.900146484375 1 \n", "28203 other languages-list 9949.5 1 \n", "28204 other languages-list 9952.300048828125 1 \n", "28205 other languages-list 9955.0 1 \n", "28206 other languages-list 9956.2998046875 1 \n", "28207 other languages-list 9958.199951171875 1 \n", "28208 other languages-list 99629.5 1 \n", "28209 other languages-list 9964.800048828125 1 \n", "28210 other languages-list 9965.0 1 \n", "28211 other languages-list 99685.19995117188 1 \n", "28212 other languages-list 9969.0 1 \n", "28213 other languages-list 9969.10009765625 1 \n", "28214 other languages-list 9970.0 1 \n", "28215 other languages-list 9970.7998046875 1 \n", "28216 other languages-list 9972.0 1 \n", "28217 other languages-list 997265.0 1 \n", "28218 other languages-list 99734.0 1 \n", "28219 other languages-list 9977.119873046875 1 \n", "28220 other languages-list 9979.800048828125 1 \n", "28221 other languages-list 998.0 1 \n", "28222 other languages-list 9981.800048828125 1 \n", "28223 other languages-list 998152.0 1 \n", "28224 other languages-list 9982.39990234375 1 \n", "28225 other languages-list 9988.0 1 \n", "28226 other languages-list 9989.0 1 \n", "28227 other languages-list 9992.400146484375 1 \n", "28228 other languages-list 99969.30053710938 1 \n", "28229 other languages-list 9997.0 1 \n", "28230 other languages-list 9999.0 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_button_switch_bytime <- lang_button_switch_events %>%\n", " filter(language_switch == 1,\n", " skinversion == \"latest\") %>%\n", " group_by(button_type, switch_context, switch_time) %>%\n", " summarise(n_sessions = n_distinct(session_id))\n", "\n", "lang_button_switch_bytime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed: the time to change language is recorded."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check Sidebar Clicks with added skin context fields" ] }, { "cell_type": "code", "execution_count": 272, "metadata": {}, "outputs": [], "source": [ "query <- \n", "\"\n", "SELECT\n", " TO_DATE(dt) AS `date`,\n", " wiki,\n", " event.web_session_id,\n", " event.usereditbucket,\n", " event.skin,\n", " event.skinVersion,\n", " event.timetochangelanguage,\n", " event.isanon,\n", " event.action,\n", " event.interfacelanguage,\n", " event.contentlanguage,\n", " event.selectedinterfacelanguage,\n", " Count(*) AS n_events\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", " AND day >= 11\n", " AND event.context = 'languages-list'\n", " AND useragent.is_bot = false\n", "GROUP BY\n", " TO_DATE(dt),\n", " wiki,\n", " event.web_session_id,\n", " event.usereditbucket,\n", " event.skin,\n", " event.skinVersion,\n", " event.timetochangelanguage,\n", " event.isanon,\n", " event.action,\n", " event.interfacelanguage,\n", " event.contentlanguage,\n", " event.selectedinterfacelanguage\n", "\"" ] }, { "cell_type": "code", "execution_count": 273, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "lang_sidebar_events <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 274, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'skin', 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A grouped\\_df: 4 × 4\n", "\\begin{tabular}{llll}\n", " skin & skinversion & isanon & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t NULL & NULL & false & 5\\\\\n", "\t NULL & NULL & true & 162\\\\\n", "\t vector & latest & false & 44738\\\\\n", "\t vector & latest & true & 1658646\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 4\n", "\n", "| skin <chr> | skinversion <chr> | isanon <chr> | n_sessions <int> |\n", "|---|---|---|---|\n", "| NULL | NULL | false | 5 |\n", "| NULL | NULL | true | 162 |\n", "| vector | latest | false | 44738 |\n", "| vector | latest | true | 1658646 |\n", "\n" ], "text/plain": [ " skin skinversion isanon n_sessions\n", "1 NULL NULL false 5 \n", "2 NULL NULL true 162 \n", "3 vector latest false 44738 \n", "4 vector latest true 1658646 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_sidebar_events_byskintype <- lang_sidebar_events %>%\n", " group_by(skin, skinversion, isanon) %>%\n", " summarize(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_sidebar_events_byskintype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are recording clicks to the language list in the sidebar from either the latest version of vector or a NULL skin. This is expected, as instrumentation for sidebar clicks was limited to the language list. Further instrumentation to clarify the NULL values would be helpful, but it is assumed that any events identified with a NULL skin type came from a skin other than latest vector.\n", "\n", "This instrumentation will be good for the AB Test since we will only be looking at users on the latest vector skin; however, further clarification of these other skin types will be useful in the future in case we want to know the percentage of users clicking on these links from legacy vector vs other skin types."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Non-Test Wiki" ] }, { "cell_type": "code", "execution_count": 172, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'skin', 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
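<table>
<caption>A grouped_df: 2 × 4</caption>
<thead>
<tr><th>skin</th><th>skinversion</th><th>isanon</th><th>n_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>vector</td><td>latest</td><td>false</td><td>5</td></tr>
<tr><td>vector</td><td>latest</td><td>true</td><td>4</td></tr>
</tbody>
</table>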
\n" ], "text/latex": [ "A grouped\\_df: 2 × 4\n", "\\begin{tabular}{llll}\n", " skin & skinversion & isanon & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t vector & latest & false & 5\\\\\n", "\t vector & latest & true & 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 2 × 4\n", "\n", "| skin <chr> | skinversion <chr> | isanon <chr> | n_sessions <int> |\n", "|---|---|---|---|\n", "| vector | latest | false | 5 |\n", "| vector | latest | true | 4 |\n", "\n" ], "text/plain": [ " skin skinversion isanon n_sessions\n", "1 vector latest false 5 \n", "2 vector latest true 4 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_sidebar_events_byskintype_nontestwiki <- lang_sidebar_events %>%\n", " filter(!wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )) %>%\n", " group_by(skin, skinversion, isanon) %>%\n", " summarize(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_sidebar_events_byskintype_nontestwiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Test Wiki" ] }, { "cell_type": "code", "execution_count": 275, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'skin', 'skinversion' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
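<table>
<caption>A grouped_df: 4 × 4</caption>
<thead>
<tr><th>skin</th><th>skinversion</th><th>isanon</th><th>n_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>NULL</td><td>NULL</td><td>false</td><td>5</td></tr>
<tr><td>NULL</td><td>NULL</td><td>true</td><td>162</td></tr>
<tr><td>vector</td><td>latest</td><td>false</td><td>44680</td></tr>
<tr><td>vector</td><td>latest</td><td>true</td><td>1658560</td></tr>
</tbody>
</table>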
\n" ], "text/latex": [ "A grouped\\_df: 4 × 4\n", "\\begin{tabular}{llll}\n", " skin & skinversion & isanon & n\\_sessions\\\\\n", " & & & \\\\\n", "\\hline\n", "\t NULL & NULL & false & 5\\\\\n", "\t NULL & NULL & true & 162\\\\\n", "\t vector & latest & false & 44680\\\\\n", "\t vector & latest & true & 1658560\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 4\n", "\n", "| skin <chr> | skinversion <chr> | isanon <chr> | n_sessions <int> |\n", "|---|---|---|---|\n", "| NULL | NULL | false | 5 |\n", "| NULL | NULL | true | 162 |\n", "| vector | latest | false | 44680 |\n", "| vector | latest | true | 1658560 |\n", "\n" ], "text/plain": [ " skin skinversion isanon n_sessions\n", "1 NULL NULL false 5 \n", "2 NULL NULL true 162 \n", "3 vector latest false 44680 \n", "4 vector latest true 1658560 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_sidebar_events_byskintype_testwiki <- lang_sidebar_events %>%\n", " filter(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'fawiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' )) %>%\n", " group_by(skin, skinversion, isanon) %>%\n", " summarize(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_sidebar_events_byskintype_testwiki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the AB test has not started, we are not logging many clicks by logged-in or logged-out users to the lang links in the sidebar on non-test wikis (the new search button was deployed to all logged-in users on latest vector). \n", "\n", "Note: There are some sessions by both logged-in and logged-out users on the latest vector recorded as having clicked a lang list link. Not sure where these are coming from, but it's such a small percentage that it should not impact the AB test.\n", "\n", "The majority of clicks occur on test wikis by logged-out users, as expected.
\n", "\n", "Once the AB test runs, any clicks to the lang links in the sidebar should only be recorded by users in the control group. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Action" ] }, { "cell_type": "code", "execution_count": 276, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
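<table>
<caption>A tibble: 1 × 2</caption>
<thead>
<tr><th>action</th><th>n_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>language-change</td><td>1702128</td></tr>
</tbody>
</table>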
\n" ], "text/latex": [ "A tibble: 1 × 2\n", "\\begin{tabular}{ll}\n", " action & n\\_sessions\\\\\n", " & \\\\\n", "\\hline\n", "\t language-change & 1702128\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 1 × 2\n", "\n", "| action <chr> | n_sessions <int> |\n", "|---|---|\n", "| language-change | 1702128 |\n", "\n" ], "text/plain": [ " action n_sessions\n", "1 language-change 1702128 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_sidebar_events_byaction <- lang_sidebar_events %>%\n", " group_by(action) %>%\n", " summarize(n_sessions = n_distinct(web_session_id))\n", "\n", "lang_sidebar_events_byaction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only associated with `event.action = 'language-change'` as expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check any follow-up actions" ] }, { "cell_type": "code", "execution_count": 277, "metadata": {}, "outputs": [], "source": [ "# rough query to confirm approach\n", "# will be refined in analysis\n", "query <- \n", "\"\n", "-- sessions where lang button was selected\n", "WITH button AS (\n", "SELECT\n", " MIN(TO_DATE(dt)) as button_date,\n", " event.web_session_id as session_id,\n", " event.skinVersion as skinversion,\n", " event.context as button_type,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", " AND Day >= 11\n", " AND useragent.is_bot = false\n", " AND event.action = 'compact-language-links-open'\n", "GROUP BY \n", " event.web_session_id,\n", " event.context,\n", " event.skinversion,\n", " wiki\n", "),\n", "\n", "follow_actions AS (\n", " SELECT\n", " TO_DATE(dt) as action_date,\n", " event.action as action_type,\n", " event.web_session_id as session_id,\n", " event.context as action_context,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", " AND Day >= 11\n", " AND useragent.is_bot = false\n", 
" AND event.action != 'compact-language-links-open'\n", ")\n", "\n", "SELECT\n", " button.button_date,\n", " button.button_type,\n", " button.skinversion,\n", " follow_actions.action_date,\n", " follow_actions.action_type,\n", " button.session_id,\n", " button.wiki,\n", "-- sessions with lang switch that occured after button clicks\n", " IF(follow_actions.session_id IS NOT NULL, 1, 0) AS follow_action,\n", " follow_actions.action_context\n", "FROM button\n", "LEFT JOIN follow_actions ON\n", " button.session_id = follow_actions.session_id\n", " \n", "\"" ] }, { "cell_type": "code", "execution_count": 278, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n", "Warning message in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :\n", "“embedded nul(s) found in input”\n" ] } ], "source": [ "more_lang_actions <- wmfdata::query_hive(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Action Type" ] }, { "cell_type": "code", "execution_count": 279, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
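<table>
<caption>A tibble: 11 × 2</caption>
<thead>
<tr><th>action_type</th><th>n_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>font-change</td><td>13</td></tr>
<tr><td>ime-change</td><td>263</td></tr>
<tr><td>ime-disable</td><td>52</td></tr>
<tr><td>ime-enable</td><td>143</td></tr>
<tr><td>language-change</td><td>284961</td></tr>
<tr><td>more-languages-access</td><td>94</td></tr>
<tr><td>no-search-results</td><td>4078</td></tr>
<tr><td>NULL</td><td>45822</td></tr>
<tr><td>settings-open</td><td>198</td></tr>
<tr><td>webfonts-disable</td><td>3</td></tr>
<tr><td>webfonts-enable</td><td>29</td></tr>
</tbody>
</table>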
\n" ], "text/latex": [ "A tibble: 11 × 2\n", "\\begin{tabular}{ll}\n", " action\\_type & n\\_sessions\\\\\n", " & \\\\\n", "\\hline\n", "\t font-change & 13\\\\\n", "\t ime-change & 263\\\\\n", "\t ime-disable & 52\\\\\n", "\t ime-enable & 143\\\\\n", "\t language-change & 284961\\\\\n", "\t more-languages-access & 94\\\\\n", "\t no-search-results & 4078\\\\\n", "\t NULL & 45822\\\\\n", "\t settings-open & 198\\\\\n", "\t webfonts-disable & 3\\\\\n", "\t webfonts-enable & 29\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 11 × 2\n", "\n", "| action_type <chr> | n_sessions <int> |\n", "|---|---|\n", "| font-change | 13 |\n", "| ime-change | 263 |\n", "| ime-disable | 52 |\n", "| ime-enable | 143 |\n", "| language-change | 284961 |\n", "| more-languages-access | 94 |\n", "| no-search-results | 4078 |\n", "| NULL | 45822 |\n", "| settings-open | 198 |\n", "| webfonts-disable | 3 |\n", "| webfonts-enable | 29 |\n", "\n" ], "text/plain": [ " action_type n_sessions\n", "1 font-change 13 \n", "2 ime-change 263 \n", "3 ime-disable 52 \n", "4 ime-enable 143 \n", "5 language-change 284961 \n", "6 more-languages-access 94 \n", "7 no-search-results 4078 \n", "8 NULL 45822 \n", "9 settings-open 198 \n", "10 webfonts-disable 3 \n", "11 webfonts-enable 29 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# find what types of actions are recorded with new button clicks\n", "new_button_actions <- more_lang_actions %>%\n", " filter(button_type == 'header') %>%\n", " group_by(action_type) %>%\n", " summarise(n_sessions = n_distinct(session_id))\n", "\n", "new_button_actions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far, we have recorded 'ime-change' (user changed the input method), 'no-search-results' (user searched for a language with no results) and 'webfonts-enable' (user enabled the webfonts functionality via ULS settings) actions in sessions where clicks to the new button were logged.
\n", "\n", "With the fix, we are now also recording language-change events." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By Action Type and Button Type" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'action_type' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
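<table>
<caption>A grouped_df: 20 × 3</caption>
<thead>
<tr><th>action_type</th><th>button_type</th><th>n_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>ime-change</td><td>header</td><td>3</td></tr>
<tr><td>language-change</td><td>header</td><td>219</td></tr>
<tr><td>more-languages-access</td><td>header</td><td>1</td></tr>
<tr><td>no-search-results</td><td>header</td><td>8</td></tr>
<tr><td>NULL</td><td>header</td><td>79</td></tr>
<tr><td>webfonts-enable</td><td>header</td><td>1</td></tr>
<tr><td>language-change</td><td>NULL</td><td>2</td></tr>
<tr><td>no-search-results</td><td>NULL</td><td>2</td></tr>
<tr><td>NULL</td><td>NULL</td><td>13</td></tr>
<tr><td>font-change</td><td>other</td><td>1</td></tr>
<tr><td>ime-change</td><td>other</td><td>71</td></tr>
<tr><td>ime-disable</td><td>other</td><td>24</td></tr>
<tr><td>ime-enable</td><td>other</td><td>47</td></tr>
<tr><td>language-change</td><td>other</td><td>26266</td></tr>
<tr><td>more-languages-access</td><td>other</td><td>32</td></tr>
<tr><td>no-search-results</td><td>other</td><td>4449</td></tr>
<tr><td>NULL</td><td>other</td><td>22555</td></tr>
<tr><td>settings-open</td><td>other</td><td>1471</td></tr>
<tr><td>ui-lang-revert</td><td>other</td><td>1</td></tr>
<tr><td>webfonts-enable</td><td>other</td><td>7</td></tr>
</tbody>
</table>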
\n" ], "text/latex": [ "A grouped\\_df: 20 × 3\n", "\\begin{tabular}{lll}\n", " action\\_type & button\\_type & n\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t ime-change & header & 3\\\\\n", "\t language-change & header & 219\\\\\n", "\t more-languages-access & header & 1\\\\\n", "\t no-search-results & header & 8\\\\\n", "\t NULL & header & 79\\\\\n", "\t webfonts-enable & header & 1\\\\\n", "\t language-change & NULL & 2\\\\\n", "\t no-search-results & NULL & 2\\\\\n", "\t NULL & NULL & 13\\\\\n", "\t font-change & other & 1\\\\\n", "\t ime-change & other & 71\\\\\n", "\t ime-disable & other & 24\\\\\n", "\t ime-enable & other & 47\\\\\n", "\t language-change & other & 26266\\\\\n", "\t more-languages-access & other & 32\\\\\n", "\t no-search-results & other & 4449\\\\\n", "\t NULL & other & 22555\\\\\n", "\t settings-open & other & 1471\\\\\n", "\t ui-lang-revert & other & 1\\\\\n", "\t webfonts-enable & other & 7\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 20 × 3\n", "\n", "| action_type <chr> | button_type <chr> | n_sessions <int> |\n", "|---|---|---|\n", "| ime-change | header | 3 |\n", "| language-change | header | 219 |\n", "| more-languages-access | header | 1 |\n", "| no-search-results | header | 8 |\n", "| NULL | header | 79 |\n", "| webfonts-enable | header | 1 |\n", "| language-change | NULL | 2 |\n", "| no-search-results | NULL | 2 |\n", "| NULL | NULL | 13 |\n", "| font-change | other | 1 |\n", "| ime-change | other | 71 |\n", "| ime-disable | other | 24 |\n", "| ime-enable | other | 47 |\n", "| language-change | other | 26266 |\n", "| more-languages-access | other | 32 |\n", "| no-search-results | other | 4449 |\n", "| NULL | other | 22555 |\n", "| settings-open | other | 1471 |\n", "| ui-lang-revert | other | 1 |\n", "| webfonts-enable | other | 7 |\n", "\n" ], "text/plain": [ " action_type button_type n_sessions\n", "1 ime-change header 3 \n", "2 language-change header 219 \n", "3 more-languages-access header 1 \n", 
"4 no-search-results header 8 \n", "5 NULL header 79 \n", "6 webfonts-enable header 1 \n", "7 language-change NULL 2 \n", "8 no-search-results NULL 2 \n", "9 NULL NULL 13 \n", "10 font-change other 1 \n", "11 ime-change other 71 \n", "12 ime-disable other 24 \n", "13 ime-enable other 47 \n", "14 language-change other 26266 \n", "15 more-languages-access other 32 \n", "16 no-search-results other 4449 \n", "17 NULL other 22555 \n", "18 settings-open other 1471 \n", "19 ui-lang-revert other 1 \n", "20 webfonts-enable other 7 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# find what types of actions are recorded with all button clicks\n", "all_button_actions <- more_lang_actions %>%\n", " group_by(action_type, button_type) %>%\n", " summarise (n_sessions = n_distinct(session_id)) %>%\n", " arrange(button_type)\n", "\n", "all_button_actions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of the actions not recorded for the new clicks button make sense. For example, we shouldn't be recording settings-open events in sessions with clicks to the new language button but I would anticipate there being clicks to switch languages from the button." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Check new content-language-switcher event" ] }, { "cell_type": "code", "execution_count": 341, "metadata": {}, "outputs": [], "source": [ "query <- \n", "\"\n", "-- date, isanon and action are selected because the summaries below use them\n", "SELECT\n", " TO_DATE(dt) AS `date`,\n", " event.web_session_id as session_id,\n", " event.skinVersion as skinversion,\n", " event.isanon as isanon,\n", " event.action as action,\n", " wiki as wiki\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", " AND Day >= 22\n", " AND useragent.is_bot = false\n", " AND event.context = 'content-language-switcher'\n", " AND event.action = 'language-change'\n", "GROUP BY \n", " TO_DATE(dt),\n", " event.web_session_id,\n", " event.skinVersion,\n", " event.isanon,\n", " event.action,\n", " wiki\n", "\"" ] }, { "cell_type": "code", "execution_count": 342, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n", "Warning message in system(cmd, intern = TRUE):\n", "“running command 'export HADOOP_HEAPSIZE=1024 && ionice nice hive -S -f ./temp_query11634119c207.hql 2>&1 > ./temp_results116316d08b51.tsv' had status 20”\n" ] }, { "ename": "ERROR", "evalue": "Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input\n", "output_type": "error", "traceback": [ "Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input\nTraceback:\n", "1. wmfdata::query_hive(query)", "2. utils::read.delim(results_dump, sep = \"\\t\", quote = \"\", as.is = TRUE, \n . header = TRUE)", "3. read.table(file = file, header = header, sep = sep, quote = quote, \n . dec = dec, fill = fill, comment.char = comment.char, ...)", "4. stop(\"no lines available in input\")" ] } ], "source": [ "lang_content_switches <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 301, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
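<table>
<caption>A data.frame: 6 × 6</caption>
<thead>
<tr><th></th><th>date</th><th>session_id</th><th>skinversion</th><th>isanon</th><th>action</th><th>wiki</th></tr>
<tr><th></th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>2021-06-23</td><td>053a2ed57541904671e1</td><td>latest</td><td>true</td><td>language-change</td><td>trwiki</td></tr>
<tr><th>2</th><td>2021-06-23</td><td>28f7400c785a050e1466</td><td>latest</td><td>true</td><td>language-change</td><td>frwiki</td></tr>
<tr><th>3</th><td>2021-06-23</td><td>c2ebfd1bede689af758c</td><td>legacy</td><td>true</td><td>language-change</td><td>enwiki</td></tr>
<tr><th>4</th><td>2021-06-23</td><td>38d8c82b0d93a8db81b0</td><td>latest</td><td>true</td><td>language-change</td><td>hewiki</td></tr>
<tr><th>5</th><td>2021-06-23</td><td>11e1d5c96a48c08365b2</td><td>legacy</td><td>true</td><td>language-change</td><td>eswiki</td></tr>
<tr><th>6</th><td>2021-06-23</td><td>8f1fb84ee0abe3ea118b</td><td>legacy</td><td>true</td><td>language-change</td><td>enwiki</td></tr>
</tbody>
</table>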
\n" ], "text/latex": [ "A data.frame: 6 × 6\n", "\\begin{tabular}{r|llllll}\n", " & date & session\\_id & skinversion & isanon & action & wiki\\\\\n", " & & & & & & \\\\\n", "\\hline\n", "\t1 & 2021-06-23 & 053a2ed57541904671e1 & latest & true & language-change & trwiki\\\\\n", "\t2 & 2021-06-23 & 28f7400c785a050e1466 & latest & true & language-change & frwiki\\\\\n", "\t3 & 2021-06-23 & c2ebfd1bede689af758c & legacy & true & language-change & enwiki\\\\\n", "\t4 & 2021-06-23 & 38d8c82b0d93a8db81b0 & latest & true & language-change & hewiki\\\\\n", "\t5 & 2021-06-23 & 11e1d5c96a48c08365b2 & legacy & true & language-change & eswiki\\\\\n", "\t6 & 2021-06-23 & 8f1fb84ee0abe3ea118b & legacy & true & language-change & enwiki\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 6\n", "\n", "| | date <chr> | session_id <chr> | skinversion <chr> | isanon <chr> | action <chr> | wiki <chr> |\n", "|---|---|---|---|---|---|---|\n", "| 1 | 2021-06-23 | 053a2ed57541904671e1 | latest | true | language-change | trwiki |\n", "| 2 | 2021-06-23 | 28f7400c785a050e1466 | latest | true | language-change | frwiki |\n", "| 3 | 2021-06-23 | c2ebfd1bede689af758c | legacy | true | language-change | enwiki |\n", "| 4 | 2021-06-23 | 38d8c82b0d93a8db81b0 | latest | true | language-change | hewiki |\n", "| 5 | 2021-06-23 | 11e1d5c96a48c08365b2 | legacy | true | language-change | eswiki |\n", "| 6 | 2021-06-23 | 8f1fb84ee0abe3ea118b | legacy | true | language-change | enwiki |\n", "\n" ], "text/plain": [ " date session_id skinversion isanon action wiki \n", "1 2021-06-23 053a2ed57541904671e1 latest true language-change trwiki\n", "2 2021-06-23 28f7400c785a050e1466 latest true language-change frwiki\n", "3 2021-06-23 c2ebfd1bede689af758c legacy true language-change enwiki\n", "4 2021-06-23 38d8c82b0d93a8db81b0 latest true language-change hewiki\n", "5 2021-06-23 11e1d5c96a48c08365b2 legacy true language-change eswiki\n", "6 2021-06-23 8f1fb84ee0abe3ea118b legacy true 
language-change enwiki" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(lang_content_switches)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By date" ] }, { "cell_type": "code", "execution_count": 305, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
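<table>
<caption>A tibble: 8 × 2</caption>
<thead>
<tr><th>date</th><th>num_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>2021-06-22</td><td>32193</td></tr>
<tr><td>2021-06-23</td><td>66203</td></tr>
<tr><td>2021-06-24</td><td>70480</td></tr>
<tr><td>2021-06-25</td><td>66043</td></tr>
<tr><td>2021-06-26</td><td>58343</td></tr>
<tr><td>2021-06-27</td><td>66835</td></tr>
<tr><td>2021-06-28</td><td>86242</td></tr>
<tr><td>2021-06-29</td><td>46804</td></tr>
</tbody>
</table>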
\n" ], "text/latex": [ "A tibble: 8 × 2\n", "\\begin{tabular}{ll}\n", " date & num\\_sessions\\\\\n", " & \\\\\n", "\\hline\n", "\t 2021-06-22 & 32193\\\\\n", "\t 2021-06-23 & 66203\\\\\n", "\t 2021-06-24 & 70480\\\\\n", "\t 2021-06-25 & 66043\\\\\n", "\t 2021-06-26 & 58343\\\\\n", "\t 2021-06-27 & 66835\\\\\n", "\t 2021-06-28 & 86242\\\\\n", "\t 2021-06-29 & 46804\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 8 × 2\n", "\n", "| date <chr> | num_sessions <int> |\n", "|---|---|\n", "| 2021-06-22 | 32193 |\n", "| 2021-06-23 | 66203 |\n", "| 2021-06-24 | 70480 |\n", "| 2021-06-25 | 66043 |\n", "| 2021-06-26 | 58343 |\n", "| 2021-06-27 | 66835 |\n", "| 2021-06-28 | 86242 |\n", "| 2021-06-29 | 46804 |\n", "\n" ], "text/plain": [ " date num_sessions\n", "1 2021-06-22 32193 \n", "2 2021-06-23 66203 \n", "3 2021-06-24 70480 \n", "4 2021-06-25 66043 \n", "5 2021-06-26 58343 \n", "6 2021-06-27 66835 \n", "7 2021-06-28 86242 \n", "8 2021-06-29 46804 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_content_switches %>%\n", " group_by(date) %>%\n", " summarize(num_sessions = n_distinct(session_id))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed we start recording events on 22 June 2021 with the first full day of events recorded on 23 June 2021." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By action" ] }, { "cell_type": "code", "execution_count": 306, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
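<table>
<caption>A tibble: 1 × 2</caption>
<thead>
<tr><th>action</th><th>num_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>language-change</td><td>464766</td></tr>
</tbody>
</table>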
\n" ], "text/latex": [ "A tibble: 1 × 2\n", "\\begin{tabular}{ll}\n", " action & num\\_sessions\\\\\n", " & \\\\\n", "\\hline\n", "\t language-change & 464766\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 1 × 2\n", "\n", "| action <chr> | num_sessions <int> |\n", "|---|---|\n", "| language-change | 464766 |\n", "\n" ], "text/plain": [ " action num_sessions\n", "1 language-change 464766 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lang_content_switches %>%\n", " group_by(action) %>%\n", " summarize(num_sessions = n_distinct(session_id))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed we are only recording for language-change actions as expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Test Wiki Status" ] }, { "cell_type": "code", "execution_count": 310, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
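<table>
<caption>A tibble: 2 × 2</caption>
<thead>
<tr><th>istest</th><th>num_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>non_test</td><td>134361</td></tr>
<tr><td>test</td><td>237847</td></tr>
</tbody>
</table>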
\n" ], "text/latex": [ "A tibble: 2 × 2\n", "\\begin{tabular}{ll}\n", " istest & num\\_sessions\\\\\n", " & \\\\\n", "\\hline\n", "\t non\\_test & 134361\\\\\n", "\t test & 237847\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 2 × 2\n", "\n", "| istest <chr> | num_sessions <int> |\n", "|---|---|\n", "| non_test | 134361 |\n", "| test | 237847 |\n", "\n" ], "text/plain": [ " istest num_sessions\n", "1 non_test 134361 \n", "2 test 237847 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_content_switches %>%\n", " filter(date > '2021-06-23',\n", " wiki != 'fawiki') %>%\n", " mutate(istest = ifelse(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' ), 'test', 'non_test')) %>%\n", " group_by(istest) %>%\n", " summarize(num_sessions = n_distinct(session_id))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The majority (64%) of sessions with content-language-switcher events since 23 June 2021 occurred on the test wikis. This is expected, as these events should only fire with the new language button, which on non-test wikis is only available to logged-in users who opt in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## By Skin" ] }, { "cell_type": "code", "execution_count": 340, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` regrouping output by 'istest' (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
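<table>
<caption>A grouped_df: 4 × 3</caption>
<thead>
<tr><th>istest</th><th>skinversion</th><th>num_sessions</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>non_test</td><td>latest</td><td>984</td></tr>
<tr><td>non_test</td><td>legacy</td><td>143948</td></tr>
<tr><td>test</td><td>latest</td><td>263198</td></tr>
<tr><td>test</td><td>legacy</td><td>88</td></tr>
</tbody>
</table>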
\n" ], "text/latex": [ "A grouped\\_df: 4 × 3\n", "\\begin{tabular}{lll}\n", " istest & skinversion & num\\_sessions\\\\\n", " & & \\\\\n", "\\hline\n", "\t non\\_test & latest & 984\\\\\n", "\t non\\_test & legacy & 143948\\\\\n", "\t test & latest & 263198\\\\\n", "\t test & legacy & 88\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A grouped_df: 4 × 3\n", "\n", "| istest <chr> | skinversion <chr> | num_sessions <int> |\n", "|---|---|---|\n", "| non_test | latest | 984 |\n", "| non_test | legacy | 143948 |\n", "| test | latest | 263198 |\n", "| test | legacy | 88 |\n", "\n" ], "text/plain": [ " istest skinversion num_sessions\n", "1 non_test latest 984 \n", "2 non_test legacy 143948 \n", "3 test latest 263198 \n", "4 test legacy 88 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "lang_content_switches %>%\n", " filter(date > '2021-06-23',\n", " wiki != 'fawiki') %>%\n", " mutate(istest = ifelse(wiki %in% c('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki' ), 'test', 'non_test')) %>%\n", " group_by(istest, skinversion) %>%\n", " summarize(num_sessions = n_distinct(session_id))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Potential issue: We're seeing events recorded for legacy. Need to confirm if this is possible." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AB Balance Check\n", "\n", "Looked at the query documented in https://phabricator.wikimedia.org/T280825\n", "\n", "Adjusted to restrict to wikis in the AB test and also account for clicks to the language list in the sidebar for the control group.\n", "\n", "Unfortunately, it is not possible to accurately determine buckets, as we do not have instrumentation to track distinct users that visit the site during the AB test. Further checks on the client will be done to confirm."
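, "\n", "\n", "If distinct-user counts per bucket become available, the 50/50 assumption could be checked with a goodness-of-fit test. A minimal sketch, assuming hypothetical counts: `bucket_counts` is made up for illustration and is not taken from the data in this notebook.

```r
# Hypothetical distinct-user counts per bucket; NOT taken from this notebook.
bucket_counts <- c(treatment = 5120, control = 4980)

# Chi-squared goodness-of-fit test against the expected 50/50 split.
balance_test <- chisq.test(bucket_counts, p = c(0.5, 0.5))

# Observed share per bucket and the p-value of the test.
round(bucket_counts / sum(bucket_counts), 3)
balance_test$p.value
```

A small p-value would point to a sample ratio mismatch worth investigating. Event counts are not a substitute for user counts in this test, since one user can log many events.
"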
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare Clicks to New Header Button (Treatment) vs Clicks to Other Button or Language Links in Sidebar" ] }, { "cell_type": "code", "execution_count": 430, "metadata": {}, "outputs": [], "source": [ "query <- \n", "\n", "\"\n", "SELECT\n", " event.web_session_id as session_id,\n", " wiki as wiki,\n", "-- note: SUM(1) counts events per session, despite the num_sessions alias\n", " SUM(1) AS num_sessions,\n", " SUM(if(event.action = 'compact-language-links-open' AND event.context = 'header', 1, 0)) as n_header,\n", " SUM(if(event.action = 'language-change' AND event.context = 'languages-list', 1, 0)) as n_sidebar_link,\n", " SUM(if(event.action = 'compact-language-links-open' AND event.context = 'other', 1, 0)) as n_other,\n", " SUM(if(event.action = 'settings-open' AND event.context = 'interlanguage', 1, 0)) as n_sidebar_settings\n", "FROM event.universallanguageselector\n", "WHERE\n", " year = 2021\n", " AND month = 06\n", "-- first full day of events\n", " AND Day >= 23\n", " AND useragent.is_bot = false\n", " AND event.skinVersion = 'latest'\n", " AND event.isAnon = false\n", " AND wiki IN ('frwiktionary', 'hewiki', 'ptwikiversity', 'frwiki', \n", " 'euwiki', 'ptwiki', 'kowiki', 'trwiki', 'srwiki', 'bnwiki', 'dewikivoyage', 'vecwiki')\n", "GROUP BY\n", " event.web_session_id,\n", " wiki\n", "\"" ] }, { "cell_type": "code", "execution_count": 431, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Don't forget to authenticate with Kerberos using kinit\n", "\n" ] } ], "source": [ "ab_test_data <- wmfdata::query_hive(query)" ] }, { "cell_type": "code", "execution_count": 432, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "`summarise()` ungrouping output (override with `.groups` argument)\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
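<table>
<caption>A tibble: 11 × 4</caption>
<thead>
<tr><th>wiki</th><th>test</th><th>control</th><th>all</th></tr>
<tr><th>&lt;chr&gt;</th><th>&lt;int&gt;</th><th>&lt;int&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>bnwiki</td><td>205</td><td>220</td><td>550</td></tr>
<tr><td>dewikivoyage</td><td>0</td><td>10</td><td>10</td></tr>
<tr><td>euwiki</td><td>392</td><td>176</td><td>883</td></tr>
<tr><td>frwiki</td><td>10095</td><td>15302</td><td>32761</td></tr>
<tr><td>frwiktionary</td><td>270</td><td>338</td><td>807</td></tr>
<tr><td>hewiki</td><td>2131</td><td>3465</td><td>7106</td></tr>
<tr><td>kowiki</td><td>1641</td><td>1711</td><td>4351</td></tr>
<tr><td>ptwiki</td><td>3301</td><td>4436</td><td>10159</td></tr>
<tr><td>srwiki</td><td>451</td><td>967</td><td>1818</td></tr>
<tr><td>trwiki</td><td>1303</td><td>1554</td><td>3718</td></tr>
<tr><td>vecwiki</td><td>3</td><td>108</td><td>113</td></tr>
</tbody>
</table>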
\n" ], "text/latex": [ "A tibble: 11 × 4\n", "\\begin{tabular}{llll}\n", " wiki & test & control & all\\\\\n", " & & & \\\\\n", "\\hline\n", "\t bnwiki & 205 & 220 & 550\\\\\n", "\t dewikivoyage & 0 & 10 & 10\\\\\n", "\t euwiki & 392 & 176 & 883\\\\\n", "\t frwiki & 10095 & 15302 & 32761\\\\\n", "\t frwiktionary & 270 & 338 & 807\\\\\n", "\t hewiki & 2131 & 3465 & 7106\\\\\n", "\t kowiki & 1641 & 1711 & 4351\\\\\n", "\t ptwiki & 3301 & 4436 & 10159\\\\\n", "\t srwiki & 451 & 967 & 1818\\\\\n", "\t trwiki & 1303 & 1554 & 3718\\\\\n", "\t vecwiki & 3 & 108 & 113\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 11 × 4\n", "\n", "| wiki <chr> | test <int> | control <int> | all <int> |\n", "|---|---|---|---|\n", "| bnwiki | 205 | 220 | 550 |\n", "| dewikivoyage | 0 | 10 | 10 |\n", "| euwiki | 392 | 176 | 883 |\n", "| frwiki | 10095 | 15302 | 32761 |\n", "| frwiktionary | 270 | 338 | 807 |\n", "| hewiki | 2131 | 3465 | 7106 |\n", "| kowiki | 1641 | 1711 | 4351 |\n", "| ptwiki | 3301 | 4436 | 10159 |\n", "| srwiki | 451 | 967 | 1818 |\n", "| trwiki | 1303 | 1554 | 3718 |\n", "| vecwiki | 3 | 108 | 113 |\n", "\n" ], "text/plain": [ " wiki test control all \n", "1 bnwiki 205 220 550\n", "2 dewikivoyage 0 10 10\n", "3 euwiki 392 176 883\n", "4 frwiki 10095 15302 32761\n", "5 frwiktionary 270 338 807\n", "6 hewiki 2131 3465 7106\n", "7 kowiki 1641 1711 4351\n", "8 ptwiki 3301 4436 10159\n", "9 srwiki 451 967 1818\n", "10 trwiki 1303 1554 3718\n", "11 vecwiki 3 108 113" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ab_test_data %>%\n", " group_by(wiki) %>%\n", " summarise(test = sum(n_header),\n", " control = sum(n_other + n_sidebar_link + n_sidebar_settings),\n", " all = sum(num_sessions))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirmed that we are logging both control and test events and the instrumentation allows us to distinguish these events.\n", "\n", "The splits are not perfectly balanced but there are no 
significant differences indicating a regression or difference in sampling rate. Differences appear as expected based on a 50/50 split.\n", "\n", "Vec" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overall" ] }, { "cell_type": "code", "execution_count": 347, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
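<table>
<caption>A data.frame: 1 × 3</caption>
<thead>
<tr><th>test</th><th>control</th><th>all</th></tr>
<tr><th>&lt;int&gt;</th><th>&lt;int&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>18425</td><td>26055</td><td>57645</td></tr>
</tbody>
</table>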
\n" ], "text/latex": [ "A data.frame: 1 × 3\n", "\\begin{tabular}{lll}\n", " test & control & all\\\\\n", " & & \\\\\n", "\\hline\n", "\t 18425 & 26055 & 57645\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 3\n", "\n", "| test <int> | control <int> | all <int> |\n", "|---|---|---|\n", "| 18425 | 26055 | 57645 |\n", "\n" ], "text/plain": [ " test control all \n", "1 18425 26055 57645" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ab_test_data %>%\n", " summarise(test = sum(n_header),\n", " control = sum(n_other + n_sidebar_link + n_sidebar_settings),\n", " all = sum(num_sessions))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 4 }