{ "cells": [ { "cell_type": "markdown", "id": "8b0df975", "metadata": {}, "source": [ "# AB-test - Part 2/3(AB-test)\n", "\n", "> AB-test - Part 2(AB-test)\n", "\n", "- toc: true\n", "- branch: master\n", "- badges: true\n", "- comments: true\n", "- author: Zmey56\n", "- categories: [data analysis, ab-test, aa-test]" ] }, { "cell_type": "markdown", "id": "96ee691c", "metadata": {}, "source": [ "This article continues the [previous article on A/B testing](https://alex.gladkikh.org/datascience/catboost/xgboost/job/2022/08/09/aa-test-article.html) and is part of a series of articles about my work in analytics.\n", "\n", "Now let's analyze the results of the experiment that ran from 2022-05-24 to 2022-05-30 inclusive. Groups 1 and 2 took part in the experiment.\n", "\n", "Group 2 was shown posts by one of the new recommendation algorithms, while group 1 served as the control.\n", "\n", "The main hypothesis is that the new algorithm in group 2 will lead to an increase in CTR.\n", "\n", "As usual, the first step is to import the necessary libraries." 
] }, { "cell_type": "code", "execution_count": 8, "id": "22076dc2", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import pandahouse as ph\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from scipy import stats" ] }, { "cell_type": "markdown", "id": "f9b4ce8e", "metadata": {}, "source": [ "We connect to the database, in which users have already been split into five groups." ] }, { "cell_type": "code", "execution_count": 2, "id": "ca062b5e", "metadata": {}, "outputs": [], "source": [ "connection = {\n", " 'host': 'https://clickhouse.lab.karpov.courses',\n", " 'password': 'dpo_python_2020',\n", " 'user': 'student',\n", " 'database': 'simulator_20220620'\n", "}" ] }, { "cell_type": "markdown", "id": "ee7a07fa", "metadata": {}, "source": [ "We select only groups 1 and 2 from the database and compute each user's likes, views, and CTR (likes divided by views)." ] }, { "cell_type": "code", "execution_count": 3, "id": "e9a15564", "metadata": {}, "outputs": [], "source": [ "q = \"\"\"\n", "SELECT exp_group, \n", " user_id,\n", " sum(action = 'like') as likes,\n", " sum(action = 'view') as views,\n", " likes/views as ctr\n", "FROM {db}.feed_actions \n", "WHERE toDate(time) between '2022-05-24' and '2022-05-30'\n", " and exp_group in (1,2)\n", "GROUP BY exp_group, user_id\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 4, "id": "27ef6d34", "metadata": {}, "outputs": [], "source": [ "df = ph.read_clickhouse(q, connection=connection)" ] }, { "cell_type": "code", "execution_count": 5, "id": "ec30fdc9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| exp_group | user_id | likes | views | ctr |\n", "
|---|---|---|---|---|\n", "
| 1 | 10079 | 10079 | 10079 | 10079 |\n", "
| 2 | 9952 | 9952 | 9952 | 9952 |\n", "