{
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2-final"
  },
  "orig_nbformat": 2,
  "kernelspec": {
   "name": "Python 3.8.2 64-bit",
   "display_name": "Python 3.8.2 64-bit",
   "metadata": {
    "interpreter": {
     "hash": "f3131ab0c53afd587e929ab5f3e0081bb3b1675889ab2c259292a9bd8773e05a"
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "source": [
    "# A/B testing with Python"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "Introduction to A/B testing with python.\n",
    "\n",
    "\n",
    "From the [Data Science from Scratch book](https://www.oreilly.com/library/view/data-science-from/9781492041122/)."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "## Libraries and helper functions"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math as m\n",
    "from typing import Tuple"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normal_probability_below(x: float, mu: float = 0, sigma: float = 1) -> float:\n",
    "    return (1 + m.erf((x - mu) / m.sqrt(2) / sigma)) / 2\n",
    "\n",
    "def normal_probability_above(lo: float, mu: float = 0, sigma: float = 1) -> float:\n",
    "    return 1 - normal_probability_below(lo, mu, sigma)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def two_sided_p_value(x: float, mu: float = 0, sigma: float = 1) -> float:\n",
    "    \"\"\"Return the probability of getting at least as extreme value as `x`, given\n",
    "    that our values are from a normal distribution with `mu` mean and `sigma` std.\n",
    "    \"\"\"\n",
    "\n",
    "    # If x is greater than the mean return everything above x\n",
    "    if x >= mu:\n",
    "        return 2 * normal_probability_above(x, mu, sigma)\n",
    "    # If x is less than the mean than return everything below x\n",
    "    else:\n",
    "        return 2 * normal_probability_below(x, mu, sigma)"
   ]
  },
  {
   "source": [
    "## A/B test"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "def estimate_parameters(N: int, true: int) -> Tuple[float, float]:\n",
    "    p = true / N\n",
    "    sigma = m.sqrt(p * (1 - p) / N)\n",
    "    return p, sigma"
   ]
  },
  {
   "source": [
    "$H_0$: $p_a$ and $p_b$ are the same\n",
    "\n",
    "With simplification this means that $p_a - p_b = 0$"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def a_b_test_statistic(N_A: int, A: int, N_B: int, B:int) -> float:\n",
    "    p_A, sigma_A = estimate_parameters(N_A, A)\n",
    "    p_B, simga_B = estimate_parameters(N_B, B)\n",
    "\n",
    "    return (p_B - p_A) / m.sqrt(sigma_A ** 2 + simga_B ** 2)"
   ]
  },
  {
   "source": [
    "## Example 1"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(-1.1403464899034472, 0.2541419765422359)"
      ]
     },
     "metadata": {},
     "execution_count": 20
    }
   ],
   "source": [
    "z = a_b_test_statistic(1000, 200, 1000, 180)\n",
    "z, two_sided_p_value(z)"
   ]
  },
  {
   "source": [
    "That is, the probability of at least such a big difference occurring assuming that the two probabilities are the same is ~0.25."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "## Example 2"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "Let's decrease the occurance of $b$ even more."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(-2.948839123097944, 0.003189699706216853)"
      ]
     },
     "metadata": {},
     "execution_count": 19
    }
   ],
   "source": [
    "z = a_b_test_statistic(1000, 200, 1000, 150)\n",
    "z, two_sided_p_value(z)"
   ]
  },
  {
   "source": [
    "That is the probability of at least this large difference to occur if the probabilities are the same is 0.03."
   ],
   "cell_type": "markdown",
   "metadata": {}
  }
 ]
}