{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "colab": { "name": "2021-06-28-step-by-step-ab-testing-guide.ipynb", "provenance": [], "toc_visible": true } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "-44lu3y8xbZz" }, "source": [ "# A/B testing step-by-step guide in Python\n", "> In this notebook we'll go over the process of analysing an A/B test, from formulating a hypothesis, testing it, and finally interpreting results.\n", "\n", "- toc: true\n", "- badges: true\n", "- comments: true\n", "- categories: [ABTest]\n", "- author: \"Renato Fillinich\"\n", "- image:" ] }, { "cell_type": "markdown", "metadata": { "id": "D3XBuKL-xbZ3" }, "source": [ "In this notebook we'll go over the process of analysing an A/B test, from formulating a hypothesis, testing it, and finally interpreting results. For our data, we'll use a dataset from Kaggle which contains the results of an A/B test on what seems to be 2 different designs of a website page (old_page vs. new_page). Here's what we'll do:\n", "\n", "1. Designing our experiment\n", "2. Collecting and preparing the data\n", "3. Visualising the results\n", "4. Testing the hypothesis\n", "5. Drawing conclusions\n", "\n", "To make it a bit more realistic, here's a potential **scenario** for our study:\n", "\n", "> Let's imagine you work on the product team at a medium-sized **online e-commerce business**. The UX designer worked really hard on a new version of the product page, with the hope that it will lead to a higher conversion rate. 
The product manager (PM) told you that the **current conversion rate** is about **13%** on average throughout the year, and that the team would be happy with an **increase of 2 percentage points**, meaning that the new design will be considered a success if it raises the conversion rate to 15%.\n", "\n", "Before rolling out the change, the team would be more comfortable testing it on a small number of users to see how it performs, so you suggest running an **A/B test** on a subset of your user base." ] }, { "cell_type": "markdown", "metadata": { "id": "mSR2XKnOxbZ5" }, "source": [ "***\n", "## 1. Designing our experiment" ] }, { "cell_type": "markdown", "metadata": { "id": "UddxvZVjxbZ6" }, "source": [ "### Formulating a hypothesis\n", "\n", "First things first, we want to make sure we formulate a hypothesis at the start of our project. This will make sure our interpretation of the results is correct as well as rigorous.\n", "\n", "Given we don't know if the new design will perform better than, worse than, or the same as our current design, we'll choose a **two-tailed test**:\n", "\n", "$$H_0: p = p_0$$\n", "$$H_a: p \\ne p_0$$\n", "\n", "where $p$ and $p_0$ stand for the conversion rate of the new and old design, respectively. We'll also set a **confidence level of 95%**:\n", "\n", "$$\\alpha = 0.05$$\n", "\n", "The $\\alpha$ value is a threshold we set, by which we say \"if the probability of observing a result at least as extreme as ours when the null hypothesis is true (the $p$-value) is lower than $\\alpha$, then we reject the null hypothesis\". Since our $\\alpha=0.05$ (indicating 5% probability), our confidence (1 - $\\alpha$) is 95%.\n", "\n", "Don't worry if you are not familiar with the above: all this really means is that whatever conversion rate we observe for our new design in our test, we want to be 95% confident it is statistically different from the conversion rate of our old design before we decide to reject the null hypothesis $H_0$. 
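
To make the decision rule concrete, here is a minimal sketch in plain Python of how a two-tailed p-value can be derived from a z statistic and compared with alpha. The helper names `two_sided_p_value` and `reject_null` are made up for this illustration, and the sketch assumes the z statistic follows a standard normal distribution when the null hypothesis is true.

```python
import math

ALPHA = 0.05  # significance level chosen above


def two_sided_p_value(z):
    # Probability of a standard normal value at least as far from 0 as z,
    # counting both tails: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def reject_null(p_value, alpha=ALPHA):
    # Reject H0 only when the p-value falls below the threshold
    return p_value < alpha


for z in (0.5, 1.96, 2.5):
    p = two_sided_p_value(z)
    print(f'z = {z:.2f} -> p-value = {p:.3f}, reject H0: {reject_null(p)}')
```

Because the test is two-tailed, a large negative z counts as much as a large positive one: only the distance from zero matters.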
" ] }, { "cell_type": "markdown", "metadata": { "id": "23b3NwD9xbZ7" }, "source": [ "### Choosing the variables\n", "\n", "For our test we'll need **two groups**:\n", "* A `control` group - They'll be shown the old design\n", "* A `treatment` (or experimental) group - They'll be shown the new design\n", "\n", "This will be our *Independent Variable*. The reason we have two groups even though we know the baseline conversion rate is that we want to control for other variables that could have an effect on our results, such as seasonality: by having a `control` group we can directly compare their results to the `treatment` group, because the only systematic difference between the groups is the design of the product page, and we can therefore attribute any differences in results to the designs.\n", "\n", "For our *Dependent Variable* (i.e. what we are trying to measure), we are interested in capturing the `conversion rate`. A way we can code this is by each user session with a binary variable:\n", "* `0` - The user did not buy the product during this user session\n", "* `1` - The user bought the product during this user session\n", "\n", "This way, we can easily calculate the mean for each group to get the conversion rate of each design." ] }, { "cell_type": "markdown", "metadata": { "id": "yKqt1-cOxbZ-" }, "source": [ "### Choosing a sample size\n", "\n", "It is important to note that since we won't test the whole user base (our population), the conversion rates that we'll get will inevitably be only *estimates* of the true rates.\n", "\n", "The number of people (or user sessions) we decide to capture in each group will have an effect on the precision of our estimated conversion rates: **the larger the sample size**, the more precise our estimates (i.e. 
the smaller our confidence intervals), and **the higher the chance of detecting a difference** between the two groups, if present.\n", "\n", "On the other hand, the larger our sample gets, the more expensive (and impractical) our study becomes.\n", "\n", "*So how many people should we have in each group?*\n", "\n", "The sample size we need is estimated through something called *Power analysis*, and it depends on a few factors:\n", "* **Power of the test** ($1 - \\beta$) - This represents the probability of finding a statistically significant difference between the groups in our test when a difference is actually present. This is usually set at 0.8 by convention\n", "* **Alpha value** ($\\alpha$) - The significance level we set earlier at 0.05\n", "* **Effect size** - How big of a difference we expect there to be between the conversion rates\n", "\n", "Since our team would be happy with a difference of 2 percentage points, we can use 13% and 15% to calculate the effect size we expect. 
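
For intuition about how these three ingredients combine, the arithmetic can be sketched with the standard library alone. This is a rough illustration under stated assumptions: the effect size is taken to be Cohen's h (the arcsine-transformed difference between the two proportions), the sample size comes from the usual normal-approximation formula for a two-sided test with equal group sizes, and the function names are invented for this example. The result should only be expected to land close to what a dedicated power-analysis routine returns.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    # Arcsine-transformed difference between two proportions
    return 2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))


def approx_n_per_group(effect_size, alpha=0.05, power=0.8):
    # Normal approximation, two-sided test, equal group sizes:
    # n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size) ** 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


h = cohens_h(0.13, 0.15)
n = approx_n_per_group(h)
print(f'effect size h = {h:.4f}')
print(f'approximate observations per group = {math.ceil(n)}')
```

The sign of h only reflects which proportion is passed first; the sample size depends on its square, so swapping 13% and 15% gives the same answer. Halving the effect size roughly quadruples the required sample.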
\n", "\n", "Luckily, **Python takes care of all these calculations for us**:" ] }, { "cell_type": "code", "metadata": { "id": "J0Bq-G8DxbaA" }, "source": [ "# Packages imports\n", "import numpy as np\n", "import pandas as pd\n", "import scipy.stats as stats\n", "import statsmodels.stats.api as sms\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from math import ceil\n", "\n", "%matplotlib inline\n", "\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "# Some plot styling preferences\n", "plt.style.use('seaborn-whitegrid')\n", "font = {'family' : 'Helvetica',\n", " 'weight' : 'bold',\n", " 'size' : 14}\n", "\n", "mpl.rc('font', **font)" ], "execution_count": 5, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "eiaZhk-vxbaC", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8c4fc745-a881-4d6d-f6f3-6e25308ff967" }, "source": [ "effect_size = sms.proportion_effectsize(0.13, 0.15) # Calculating effect size based on our expected rates\n", "\n", "required_n = sms.NormalIndPower().solve_power(\n", " effect_size, \n", " power=0.8, \n", " alpha=0.05, \n", " ratio=1\n", " ) # Calculating sample size needed\n", "\n", "required_n = ceil(required_n) # Rounding up to next whole number \n", "\n", "print(required_n)" ], "execution_count": 6, "outputs": [ { "output_type": "stream", "text": [ "4720\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "eZl2r4v7xbaF" }, "source": [ "We'd need **at least 4720 observations for each group**. \n", "\n", "Having set the `power` parameter to 0.8 in practice means that if there exists an actual difference in conversion rate between our designs, assuming the difference is the one we estimated (13% vs. 15%), we have about 80% chance to detect it as statistically significant in our test with the sample size we calculated." ] }, { "cell_type": "markdown", "metadata": { "id": "CURpYxEZxbaG" }, "source": [ "***\n", "## 2. 
Collecting and preparing the data" ] }, { "cell_type": "markdown", "metadata": { "id": "QnEwYEjhxbaH" }, "source": [ "Great stuff! So now that we have our required sample size, we need to collect the data. Usually at this point you would work with your team to set up the experiment, likely with the help of the Engineering team, and make sure that you collect enough data based on the sample size needed.\n", "\n", "However, since we'll use a dataset that we found online, in order to simulate this situation we'll:\n", "1. Download the dataset from Kaggle\n", "2. Read the data into a pandas DataFrame\n", "3. Check and clean the data as needed\n", "4. Randomly sample `n=4720` rows from the DataFrame for each group *****\n", "\n", "***Note**: Normally, we would not need to perform step 4, this is just for the sake of the exercise\n", "\n", "Since I already downloaded the dataset, I'll go straight to number 2." ] }, { "cell_type": "code", "metadata": { "id": "Z4s5ICJQxbaH", "colab": { "base_uri": "https://localhost:8080/", "height": 204 }, "outputId": "36350c16-000d-4124-f4be-38115ca5b396" }, "source": [ "df = pd.read_csv('https://github.com/sparsh-ai/reco-data/raw/master/ab-testing.zip')\n", "\n", "df.head()" ], "execution_count": 7, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
" <thead>\n",
" <tr><th></th><th>user_id</th><th>timestamp</th><th>group</th><th>landing_page</th><th>converted</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>851104</td><td>2017-01-21 22:11:48.556739</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>1</th><td>804228</td><td>2017-01-12 08:01:45.159739</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>2</th><td>661590</td><td>2017-01-11 16:55:06.154213</td><td>treatment</td><td>new_page</td><td>0</td></tr>\n",
" <tr><th>3</th><td>853541</td><td>2017-01-08 18:28:03.143765</td><td>treatment</td><td>new_page</td><td>0</td></tr>\n",
" <tr><th>4</th><td>864975</td><td>2017-01-21 01:52:26.210827</td><td>control</td><td>old_page</td><td>1</td></tr>\n",
" </tbody>\n",
"</table>
" ], "text/plain": [ " user_id timestamp group landing_page converted\n", "0 851104 2017-01-21 22:11:48.556739 control old_page 0\n", "1 804228 2017-01-12 08:01:45.159739 control old_page 0\n", "2 661590 2017-01-11 16:55:06.154213 treatment new_page 0\n", "3 853541 2017-01-08 18:28:03.143765 treatment new_page 0\n", "4 864975 2017-01-21 01:52:26.210827 control old_page 1" ] }, "metadata": { "tags": [] }, "execution_count": 7 } ] }, { "cell_type": "code", "metadata": { "id": "Z3RiYHMBxbaI", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "d912411a-3e0c-4bbe-dba2-d67f074a1238" }, "source": [ "df.info()" ], "execution_count": 8, "outputs": [ { "output_type": "stream", "text": [ "\n", "RangeIndex: 294478 entries, 0 to 294477\n", "Data columns (total 5 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 user_id 294478 non-null int64 \n", " 1 timestamp 294478 non-null object\n", " 2 group 294478 non-null object\n", " 3 landing_page 294478 non-null object\n", " 4 converted 294478 non-null int64 \n", "dtypes: int64(2), object(3)\n", "memory usage: 11.2+ MB\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "3jGYR9EJxbaI", "colab": { "base_uri": "https://localhost:8080/", "height": 142 }, "outputId": "dd8c7044-844f-4f04-a0bd-8a6b535e2117" }, "source": [ "# To make sure all the control group are seeing the old page and viceversa\n", "\n", "pd.crosstab(df['group'], df['landing_page'])" ], "execution_count": 9, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
" <thead>\n",
" <tr><th>landing_page</th><th>new_page</th><th>old_page</th></tr>\n",
" <tr><th>group</th><th></th><th></th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>control</th><td>1928</td><td>145274</td></tr>\n",
" <tr><th>treatment</th><td>145311</td><td>1965</td></tr>\n",
" </tbody>\n",
"</table>
" ], "text/plain": [ "landing_page new_page old_page\n", "group \n", "control 1928 145274\n", "treatment 145311 1965" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "markdown", "metadata": { "id": "v4EyzPcPxbaJ" }, "source": [ "There are **294478 rows** in the DataFrame, each representing a user session, as well as **5 columns** :\n", "* `user_id` - The user ID of each session\n", "* `timestamp` - Timestamp for the session\n", "* `group` - Which group the user was assigned to for that session {`control`, `treatment`}\n", "* `landing_page` - Which design each user saw on that session {`old_page`, `new_page`}\n", "* `converted` - Whether the session ended in a conversion or not (binary, `0`=not converted, `1`=converted)\n", "\n", "We'll actually only use the `group` and `converted` columns for the analysis.\n", "\n", "Before we go ahead and sample the data to get our subset, let's make sure there are no users that have been sampled multiple times." ] }, { "cell_type": "code", "metadata": { "id": "32x1ywhYxbaJ", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "cb57b90a-00eb-4144-ec79-c88774bd58c8" }, "source": [ "session_counts = df['user_id'].value_counts(ascending=False)\n", "multi_users = session_counts[session_counts > 1].count()\n", "\n", "print(f'There are {multi_users} users that appear multiple times in the dataset')" ], "execution_count": 10, "outputs": [ { "output_type": "stream", "text": [ "There are 3894 users that appear multiple times in the dataset\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "yI1v4QCrxbaK" }, "source": [ "There are, in fact, users that appear more than once. Since the number is pretty low, we'll go ahead and remove them from the DataFrame to avoid sampling the same users twice." 
] }, { "cell_type": "code", "metadata": { "id": "1VTCNR5SxbaK", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "298a791b-4df8-4ea2-d2f1-293cae82b43b" }, "source": [ "users_to_drop = session_counts[session_counts > 1].index\n", "\n", "df = df[~df['user_id'].isin(users_to_drop)]\n", "print(f'The updated dataset now has {df.shape[0]} entries')" ], "execution_count": 11, "outputs": [ { "output_type": "stream", "text": [ "The updated dataset now has 286690 entries\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "VYalyUeJxbaK" }, "source": [ "### Sampling\n", "\n", "Now that our DataFrame is nice and clean, we can proceed and sample `n=4720` entries for each of the groups. We can use pandas' `DataFrame.sample()` method to do this, which will perform Simple Random Sampling for us. \n", "\n", "**Note**: I've set `random_state=22` so that the results are reproducible if you feel like following on your own Notebook: just use `random_state=22` in your function and you should get the same sample as I did." ] }, { "cell_type": "code", "metadata": { "id": "6EXbtFaexbaL" }, "source": [ "control_sample = df[df['group'] == 'control'].sample(n=required_n, random_state=22)\n", "treatment_sample = df[df['group'] == 'treatment'].sample(n=required_n, random_state=22)\n", "\n", "ab_test = pd.concat([control_sample, treatment_sample], axis=0)\n", "ab_test.reset_index(drop=True, inplace=True)" ], "execution_count": 12, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ORf0Hlv2xbaL", "colab": { "base_uri": "https://localhost:8080/", "height": 419 }, "outputId": "0edd346b-7612-4ef1-bb54-7578893a3326" }, "source": [ "ab_test" ], "execution_count": 13, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
" <thead>\n",
" <tr><th></th><th>user_id</th><th>timestamp</th><th>group</th><th>landing_page</th><th>converted</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>763854</td><td>2017-01-21 03:43:17.188315</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>1</th><td>690555</td><td>2017-01-18 06:38:13.079449</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>2</th><td>861520</td><td>2017-01-06 21:13:40.044766</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>3</th><td>630778</td><td>2017-01-05 16:42:36.995204</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>4</th><td>656634</td><td>2017-01-04 15:31:21.676130</td><td>control</td><td>old_page</td><td>0</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>9435</th><td>908512</td><td>2017-01-14 22:02:29.922674</td><td>treatment</td><td>new_page</td><td>0</td></tr>\n",
" <tr><th>9436</th><td>873211</td><td>2017-01-05 00:57:16.167151</td><td>treatment</td><td>new_page</td><td>0</td></tr>\n",
" <tr><th>9437</th><td>631276</td><td>2017-01-20 18:56:58.167809</td><td>treatment</td><td>new_page</td><td>0</td></tr>\n",
" <tr><th>9438</th><td>662301</td><td>2017-01-03 08:10:57.768806</td><td>treatment</td><td>new_page</td><td>0</td></tr>\n",
" <tr><th>9439</th><td>944623</td><td>2017-01-19 10:56:01.648653</td><td>treatment</td><td>new_page</td><td>1</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>9440 rows × 5 columns</p>
" ], "text/plain": [ " user_id timestamp group landing_page converted\n", "0 763854 2017-01-21 03:43:17.188315 control old_page 0\n", "1 690555 2017-01-18 06:38:13.079449 control old_page 0\n", "2 861520 2017-01-06 21:13:40.044766 control old_page 0\n", "3 630778 2017-01-05 16:42:36.995204 control old_page 0\n", "4 656634 2017-01-04 15:31:21.676130 control old_page 0\n", "... ... ... ... ... ...\n", "9435 908512 2017-01-14 22:02:29.922674 treatment new_page 0\n", "9436 873211 2017-01-05 00:57:16.167151 treatment new_page 0\n", "9437 631276 2017-01-20 18:56:58.167809 treatment new_page 0\n", "9438 662301 2017-01-03 08:10:57.768806 treatment new_page 0\n", "9439 944623 2017-01-19 10:56:01.648653 treatment new_page 1\n", "\n", "[9440 rows x 5 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 13 } ] }, { "cell_type": "code", "metadata": { "id": "fIFKXUmBxbaN", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "cdecfb42-bbf6-4e21-ab13-778bb1b5af33" }, "source": [ "ab_test.info()" ], "execution_count": 14, "outputs": [ { "output_type": "stream", "text": [ "\n", "RangeIndex: 9440 entries, 0 to 9439\n", "Data columns (total 5 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 user_id 9440 non-null int64 \n", " 1 timestamp 9440 non-null object\n", " 2 group 9440 non-null object\n", " 3 landing_page 9440 non-null object\n", " 4 converted 9440 non-null int64 \n", "dtypes: int64(2), object(3)\n", "memory usage: 368.9+ KB\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "5mGQRVnSxbaO", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "9c440cdc-37ac-4233-eb7f-b824768b18b1" }, "source": [ "ab_test['group'].value_counts()" ], "execution_count": 15, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "control 4720\n", "treatment 4720\n", "Name: group, dtype: int64" ] }, "metadata": { "tags": [] }, "execution_count": 15 } ] }, { "cell_type": "markdown", 
"metadata": { "id": "HPUc1bznxbaO" }, "source": [ "Great, looks like everything went as planned, and we are now ready to analyse our results." ] }, { "cell_type": "markdown", "metadata": { "id": "g1PoRb9KxbaP" }, "source": [ "***\n", "## 3. Visualising the results" ] }, { "cell_type": "markdown", "metadata": { "id": "fmwnE9rXxbaP" }, "source": [ "The first thing we can do is to calculate some **basic statistics** to get an idea of what our samples look like." ] }, { "cell_type": "code", "metadata": { "id": "OrfD9yUyxbaQ", "colab": { "base_uri": "https://localhost:8080/", "height": 103 }, "outputId": "2e3fc958-aaf7-49c1-aed5-a8bd53ac2b68" }, "source": [ "conversion_rates = ab_test.groupby('group')['converted']\n", "\n", "std_p = lambda x: np.std(x, ddof=0) # Std. deviation of the proportion\n", "se_p = lambda x: stats.sem(x, ddof=0) # Std. error of the proportion (std / sqrt(n))\n", "\n", "conversion_rates = conversion_rates.agg([np.mean, std_p, se_p])\n", "conversion_rates.columns = ['conversion_rate', 'std_deviation', 'std_error']\n", "\n", "\n", "conversion_rates.style.format('{:.3f}')" ], "execution_count": 16, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
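<table>\n",
" <thead>\n",
" <tr><th></th><th>conversion_rate</th><th>std_deviation</th><th>std_error</th></tr>\n",
" <tr><th>group</th><th></th><th></th><th></th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>control</th><td>0.123</td><td>0.329</td><td>0.005</td></tr>\n",
" <tr><th>treatment</th><td>0.126</td><td>0.331</td><td>0.005</td></tr>\n",
" </tbody>\n",
"</table>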
" ], "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 16 } ] }, { "cell_type": "markdown", "metadata": { "id": "MPADWAVexbaR" }, "source": [ "Judging by the stats above, it does look like **our two designs performed very similarly**, with our new design performing slightly better, approx. **12.3% vs. 12.6% conversion rate**.\n", "\n", "Plotting the data will make these results easier to grasp:" ] }, { "cell_type": "code", "metadata": { "id": "gcasaVmXxbaR", "colab": { "base_uri": "https://localhost:8080/", "height": 489 }, "outputId": "11dcf7af-5468-43f0-959e-a7e4577fb1bd" }, "source": [ "plt.figure(figsize=(8,6))\n", "\n", "sns.barplot(x=ab_test['group'], y=ab_test['converted'], ci=False)\n", "\n", "plt.ylim(0, 0.17)\n", "plt.title('Conversion rate by group', pad=20)\n", "plt.xlabel('Group', labelpad=15)\n", "plt.ylabel('Converted (proportion)', labelpad=15);" ], "execution_count": 17, "outputs": [ { "output_type": "stream", "text": [ "findfont: Font family ['Helvetica'] not found. Falling back to DejaVu Sans.\n", "findfont: Font family ['Helvetica'] not found. Falling back to DejaVu Sans.\n", "findfont: Font family ['Helvetica'] not found. Falling back to DejaVu Sans.\n" ], "name": "stderr" }, { "output_type": "display_data", "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "tags": [] } } ] }, { "cell_type": "markdown", "metadata": { "id": "9s8WH7MoxbaS" }, "source": [ "The conversion rates for our groups are indeed very close. Also note that the conversion rate of the `control` group is lower than what we would have expected given what we knew about our avg. conversion rate (12.3% vs. 13%). This goes to show that there is some variation in results when sampling from a population.\n", "\n", "So... the `treatment` group's value is higher. **Is this difference *statistically significant***?" ] }, { "cell_type": "markdown", "metadata": { "id": "Cz0hF-gXxbaS" }, "source": [ "***\n", "## 4. Testing the hypothesis" ] }, { "cell_type": "markdown", "metadata": { "id": "np2wDTn4xbaS" }, "source": [ "The last step of our analysis is testing our hypothesis. Since we have a very large sample, we can use the normal approximation for calculating our $p$-value (i.e. z-test). \n", "\n", "Again, Python makes all the calculations very easy. We can use the `statsmodels.stats.proportion` module to get the $p$-value and confidence intervals:" ] }, { "cell_type": "code", "metadata": { "id": "WRFRW8NpxbaT" }, "source": [ "from statsmodels.stats.proportion import proportions_ztest, proportion_confint" ], "execution_count": 18, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "dbjM4L56xbaU" }, "source": [ "control_results = ab_test[ab_test['group'] == 'control']['converted']\n", "treatment_results = ab_test[ab_test['group'] == 'treatment']['converted']" ], "execution_count": 19, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "HUm9Alv-xbaV", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "bd092939-cc8c-4401-97f0-73e4ac35d3c4" }, "source": [ "n_con = control_results.count()\n", "n_treat = treatment_results.count()\n", "successes = [control_results.sum(), treatment_results.sum()]\n", "nobs = [n_con, n_treat]\n", "\n", "z_stat, pval = proportions_ztest(successes, nobs=nobs)\n", "(lower_con, 
lower_treat), (upper_con, upper_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)\n", "\n", "print(f'z statistic: {z_stat:.2f}')\n", "print(f'p-value: {pval:.3f}')\n", "print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')\n", "print(f'ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]')" ], "execution_count": 20, "outputs": [ { "output_type": "stream", "text": [ "z statistic: -0.34\n", "p-value: 0.732\n", "ci 95% for control group: [0.114, 0.133]\n", "ci 95% for treatment group: [0.116, 0.135]\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "wY-zEPrLxbaV" }, "source": [ "***\n", "## 5. Drawing conclusions" ] }, { "cell_type": "markdown", "metadata": { "id": "AqesRmhrxbaW" }, "source": [ "Since our $p$-value=0.732 is well above our $\\alpha$=0.05, we cannot reject the null hypothesis $H_0$, which means that we found no statistically significant difference between our new design and our old one (let alone an improvement) :(\n", "\n", "Additionally, if we look at the confidence interval for the `treatment` group ([0.116, 0.135], i.e. 11.6-13.5%) we notice that:\n", "1. It includes our baseline value of 13% conversion rate\n", "2. It does not include our target value of 15% (the 2 percentage point uplift we were aiming for)\n", "\n", "What this means is that it is more likely that the true conversion rate of the new design is similar to our baseline, rather than the 15% target we had hoped for. This is further evidence that our new design is not likely to be an improvement on our old design, and that unfortunately we are back to the drawing board! " ] } ] }