{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.8.2 64-bit", "display_name": "Python 3.8.2 64-bit", "metadata": { "interpreter": { "hash": "f3131ab0c53afd587e929ab5f3e0081bb3b1675889ab2c259292a9bd8773e05a" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# A/B testing with Python" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Introduction to A/B testing with python.\n", "\n", "\n", "From the [Data Science from Scratch book](https://www.oreilly.com/library/view/data-science-from/9781492041122/)." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Libraries and helper functions" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import math as m\n", "from typing import Tuple" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def normal_probability_below(x: float, mu: float = 0, sigma: float = 1) -> float:\n", " return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2\n", "\n", "def normal_probability_above(lo: float, mu: float = 0, sigma: float = 1) -> float:\n", " return 1 - normal_probability_below(lo, mu, sigma)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def two_sided_p_value(x: float, mu: float = 0, sigma: float = 1) -> float:\n", " \"\"\"Return the probability of getting at least as extreme value as `x`, given\n", " that our values are from a normal distribution with `mu` mean and `sigma` std.\n", " \"\"\"\n", "\n", " # If x is greater than the mean return everything above x\n", " if x >= mu:\n", " return 2 * normal_probability_above(x, mu, sigma)\n", " # If x is less than the mean than return everything below x\n", " else:\n", " return 2 * normal_probability_below(x, mu, sigma)" ] }, { "source": [ "## A/B test" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def estimate_parameters(N: int, true: int) -> Tuple[float, float]:\n", " p = true / N\n", " sigma = m.sqrt(p * (1 - p) / N)\n", " return p, sigma" ] }, { "source": [ "$H_0$: $p_a$ and $p_b$ are the same\n", "\n", "With simplification this means that $p_a - p_b = 0$" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def a_b_test_statistic(N_A: int, A: int, N_B: int, B:int) -> float:\n", " p_A, sigma_A = estimate_parameters(N_A, A)\n", " p_B, simga_B = estimate_parameters(N_B, B)\n", "\n", " return (p_B - p_A) / m.sqrt(sigma_A ** 2 + simga_B ** 2)" ] }, { "source": [ "## Example 1" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(-1.1403464899034472, 0.2541419765422359)" ] }, "metadata": {}, "execution_count": 20 } ], "source": [ "z = a_b_test_statistic(1000, 200, 1000, 180)\n", "z, two_sided_p_value(z)" ] }, { "source": [ "That is, the probability of at least such a big difference occurring assuming that the two probabilities are the same is ~0.25." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Example 2" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Let's decrease the occurance of $b$ even more." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(-2.948839123097944, 0.003189699706216853)" ] }, "metadata": {}, "execution_count": 19 } ], "source": [ "z = a_b_test_statistic(1000, 200, 1000, 150)\n", "z, two_sided_p_value(z)" ] }, { "source": [ "That is the probability of at least this large difference to occur if the probabilities are the same is 0.03." ], "cell_type": "markdown", "metadata": {} } ] }