{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# More Distributions and the Central Limit Theorem\n", "> It’s time to explore one of the most important probability distributions in statistics, the normal distribution. You’ll create histograms to plot normal distributions and gain an understanding of the central limit theorem, before expanding your knowledge of statistical functions by adding the Poisson, exponential, and t-distributions to your repertoire. This is a summary of the lecture \"Introduction to Statistics in Python\", via DataCamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Statistics]\n", "- image: images/uniform_dist_amir.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Normal Distribution\n", "- The normal distribution\n", " - Symmetrical\n", " - Area under the curve = 1\n", " - Curve never hits 0\n", " - Described by mean and standard deviation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Distribution of Amir's sales\n", "Since each deal Amir worked on (both won and lost) was different, each was worth a different amount of money. These values are stored in the `amount` column of `amir_deals`. As part of Amir's performance review, you want to be able to estimate the probability of him selling different amounts, but before you can do this, you'll need to determine what kind of distribution the `amount` variable follows." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
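<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>product</th>\n", " <th>client</th>\n", " <th>status</th>\n", " <th>amount</th>\n", " <th>num_users</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
" <tr>\n", " <th>1</th>\n", " <td>Product F</td>\n", " <td>Current</td>\n", " <td>Won</td>\n", " <td>7389.52</td>\n", " <td>19</td>\n", " </tr>\n",
" <tr>\n", " <th>2</th>\n", " <td>Product C</td>\n", " <td>New</td>\n", " <td>Won</td>\n", " <td>4493.01</td>\n", " <td>43</td>\n", " </tr>\n",
" <tr>\n", " <th>3</th>\n", " <td>Product B</td>\n", " <td>New</td>\n", " <td>Won</td>\n", " <td>5738.09</td>\n", " <td>87</td>\n", " </tr>\n",
" <tr>\n", " <th>4</th>\n", " <td>Product I</td>\n", " <td>Current</td>\n", " <td>Won</td>\n", " <td>2591.24</td>\n", " <td>83</td>\n", " </tr>\n",
" <tr>\n", " <th>5</th>\n", " <td>Product E</td>\n", " <td>Current</td>\n", " <td>Won</td>\n", " <td>6622.97</td>\n", " <td>17</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "</div>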
" ], "text/plain": [ " product client status amount num_users\n", "1 Product F Current Won 7389.52 19\n", "2 Product C New Won 4493.01 43\n", "3 Product B New Won 5738.09 87\n", "4 Product I Current Won 2591.24 83\n", "5 Product E Current Won 6622.97 17" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "amir_deals = pd.read_csv('./dataset/amir_deals.csv', index_col=0)\n", "amir_deals.head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXMAAAD4CAYAAAAeugY9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAARY0lEQVR4nO3df4xlZX3H8fe3LOiWsbtQ9GZdSQdSYkqYiO6ELrVpZvAXQlM0oQnE6G7FjGlrY+skzaJJq7EmayPamDbVtaAbo4xWsRDUGkIZiUmDnVV0Fle6gFtlobsSYWUIaR399o/7LB13Z+beuT/mzjzzfiU3c85znnvP95mz97Nnzjn33MhMJEnr268MugBJUvcMc0mqgGEuSRUwzCWpAoa5JFVg02qu7Lzzzsvh4eHVXOVznnnmGc4+++yBrHs1Oc76bJSxOs6lHThw4InMfOFyfVY1zIeHh5mZmVnNVT5nenqasbGxgax7NTnO+myUsTrOpUXEf7Xq42EWSaqAYS5JFTDMJakChrkkVcAwl6QKGOaSVAHDXJIqYJhLUgUMc0mqwKp+AlRqZXjPl9vqNzkyz+42+651R/ZePegSVAH3zCWpAoa5JFXAMJekChjmklQBw1ySKmCYS1IFDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBVqGeUQ8PyK+GRHfiYgHIuJ9pf2CiLgvIg5HxOci4qz+lytJWkw7e+b/A1yRmS8DLgWujIidwAeBj2TmRcCTwA39K1OStJyWYZ5Nc2X2zPJI4ArgC6V9P/CGvlQoSWqprWPmEXFGRNwPHAfuAh4GnsrM+dLlUWB7f0qUJLUSmdl+54itwJeAvwI+mZm/WdrPB76SmSOLPGcCmABoNBo7pqamelH3is3NzTE0NDSQda+m9T7O2aMn2urX2AzHnu1zMatkZPuWZZev923aLse5tPHx8QOZObpcnxV9B2hmPhUR08BOYGtEbCp75y8BHlviOfuAfQCjo6M5Nja2klX2zPT0NINa92pa7+Ns93s9J0fmuWm2jq+wPfKmsWWXr/dt2i7H2Z12rmZ5YdkjJyI2A68GDgH3ANeWbruA23tenSSpLe3s2mwD9kfEGTTD//OZeWdEfA+Yioi/Ab4N3NzHOiVJy2gZ5pn5XeDli7Q/AlzWj6IkSSvjJ0AlqQJ1nEGS1rHhFid9J0fm2z4xvBJH9l7d89fU4LhnLkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBQxzSaqAYS5JFTDMJakChrkkVcB7s+g0re4VImntcc9ckipgmEtSBQxzSaqAYS5JFTDMJakChrkkVcAwl6QKGOaSVAHDXJIq0DLMI+L8iLgnIg5FxAMR8c7S/t6IOBoR95fHVf0vV5K0mHY+zj8PTGbmtyLiBcCBiLirLPtI
Zn6of+VJktrRMswz83Hg8TL9dEQcArb3uzBJUvsiM9vvHDEM3AtcArwL2A38FJihuff+5CLPmQAmABqNxo6pqalua+7I3NwcQ0NDA1n3aurFOGePnuhRNf3T2AzHnh10FaujX2Md2b6l9y/aBd+jSxsfHz+QmaPL9Wk7zCNiCPg68IHMvC0iGsATQALvB7Zl5luXe43R0dGcmZlpa329Nj09zdjY2EDWvZp6Mc71cNfEyZF5bprdGDf97NdYj+y9uuev2Q3fo0uLiJZh3tbVLBFxJvBF4DOZeRtAZh7LzJ9n5i+ATwCXrag6SVLPtHM1SwA3A4cy88ML2rct6PZG4GDvy5MktaOdv91eCbwZmI2I+0vbu4HrI+JSmodZjgBv70uFkqSW2rma5RtALLLoK70vR5LUCT8BKkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBQxzSaqAYS5JFTDMJakChrkkVcAwl6QKGOaSVAHDXJIqYJhLUgUMc0mqgGEuSRUwzCWpAoa5JFXAMJekChjmklQBw1ySKtAyzCPi/Ii4JyIORcQDEfHO0n5uRNwVEYfLz3P6X64kaTHt7JnPA5OZ+VvATuBPI+JiYA9wd2ZeBNxd5iVJA9AyzDPz8cz8Vpl+GjgEbAeuAfaXbvuBN/SrSEnS8iIz2+8cMQzcC1wC/DAzty5Y9mRmnnaoJSImgAmARqOxY2pqqsuSOzM3N8fQ0NBA1r2aejHO2aMnelRN/zQ2w7FnB13F6ujXWEe2b+n9i3bB9+jSxsfHD2Tm6HJ92g7ziBgCvg58IDNvi4in2gnzhUZHR3NmZqat9fXa9PQ0Y2NjA1n3aurFOIf3fLk3xfTR5Mg8N81uGnQZq6JfYz2y9+qev2Y3fI8uLSJahnlbV7NExJnAF4HPZOZtpflYRGwry7cBx1dUnSSpZ9q5miWAm4FDmfnhBYvuAHaV6V3A7b0vT5LUjnb+dnsl8GZgNiLuL23vBvYCn4+IG4AfAn/YnxIlSa20DPPM/AYQSyx+VW/LkSR1wk+ASlIFDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBQxzSaqAYS5JFTDMJakChrkkVWBjfFWLpNMM8hul1tq3HNXAPXNJqoBhLkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBQxzSapAyzCPiFsi4nhEHFzQ9t6IOBoR95fHVf0tU5K0nHb2zD8FXLlI+0cy89Ly+Epvy5IkrUTLMM/Me4GfrEItkqQORWa27hQxDNyZmZeU+fcCu4GfAjPAZGY+ucRzJ4AJgEajsWNqaqoHZa/c3NwcQ0NDA1n3aurFOGePnuhRNf3T2AzHnh10FaujxrGObN9yWpvv0aWNj48fyMzR5fp0GuYN4AkggfcD2zLzra1eZ3R0NGdmZlpX3gfT09OMjY0NZN2rqRfjHOTd9No1OTLPTbMb46afNY51sbsm+h5dWkS0DPOOrmbJzGOZ+fPM/AXwCeCyTl5HktQbHYV5RGxbMPtG4OBSfSVJ/dfyb7eIuBUYA86LiEeBvwbGIuJSmodZjgBv72ONkqQWWoZ5Zl6/SPPNfahFktShus6qVKaTE5GTI/PsXgcnMCX1lh/nl6QKGOaSVAHDXJIqYJhLUgUMc0mqgGEuSRUwzCWpAoa5JFXAMJekChjmklQBw1ySKmCYS1IFDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBQxzSapAyzCPiFsi4nhEHFzQdm5E3BURh8vPc/pbpiRpOe3smX8KuPKUtj3A3Zl5EXB3mZckDUjLMM/Me4GfnNJ8DbC/TO8H3tDjuiRJKxCZ2bpTxDBwZ2ZeUuafysytC5Y/mZmLHmqJiAlgAqDRaOyYmprqQdkrNzc3x9DQ0EDW3anZoydW/JzGZjj2bB+KWWM2yjihzrGObN9yWtt6fI92opNxjo+PH8jM0eX6bOqqqjZk
5j5gH8Do6GiOjY31e5WLmp6eZlDr7tTuPV9e8XMmR+a5abbvm3XgNso4oc6xHnnT2Glt6/E92ol+jbPTq1mORcQ2gPLzeO9KkiStVKdhfgewq0zvAm7vTTmSpE60c2nircC/Ay+NiEcj4gZgL/CaiDgMvKbMS5IGpOWBuMy8folFr+pxLZKkDtV1VkXSujC8yMn9yZH5jk76r8SRvVf39fUHyY/zS1IFDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAp4aWIbFruMSpLWEvfMJakChrkkVcAwl6QKGOaSVAHDXJIqYJhLUgUMc0mqgGEuSRUwzCWpAoa5JFXAMJekChjmklQBw1ySKmCYS1IFuroFbkQcAZ4Gfg7MZ+ZoL4qSJK1ML+5nPp6ZT/TgdSRJHfIwiyRVIDKz8ydH/AB4Ekjg45m5b5E+E8AEQKPR2DE1NdXx+roxNzfH0NBQR8+dPXqix9X0T2MzHHt20FX030YZJ2ycsdY+zpHtW4DOsmh8fPxAq8PY3Yb5izPzsYh4EXAX8GeZee9S/UdHR3NmZqbj9XVjenqasbGxjp67nr42bnJknptm6/82wI0yTtg4Y619nEf2Xg10lkUR0TLMuzrMkpmPlZ/HgS8Bl3XzepKkznQc5hFxdkS84OQ08FrgYK8KkyS1r5u/aRrAlyLi5Ot8NjP/tSdVSZJWpOMwz8xHgJf1sBZJUoe8NFGSKmCYS1IFDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAqsmxshdHt/lMmReXavo3usSNJKuGcuSRUwzCWpAoa5JFXAMJekChjmklQBw1ySKmCYS1IFDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAoY5pJUAcNckipgmEtSBboK84i4MiIejIiHImJPr4qSJK1Mx2EeEWcA/wC8HrgYuD4iLu5VYZKk9nWzZ34Z8FBmPpKZ/wtMAdf0pixJ0kpEZnb2xIhrgSsz821l/s3Ab2fmO07pNwFMlNmXAg92Xm5XzgOeGNC6V5PjrM9GGavjXNpvZOYLl+vQzXeAxiJtp/3PkJn7gH1drKcnImImM0cHXUe/Oc76bJSxOs7udHOY5VHg/AXzLwEe664cSVInugnz/wAuiogLIuIs4Drgjt6UJUlaiY4Ps2TmfES8A/gacAZwS2Y+0LPKem/gh3pWieOsz0YZq+PsQscnQCVJa4efAJWkChjmklSB6sN8vd9yICLOj4h7IuJQRDwQEe8s7edGxF0Rcbj8PKe0R0R8tIz3uxHxigWvtav0PxwRuwY1puVExBkR8e2IuLPMXxAR95WaP1dOthMRzyvzD5Xlwwte48bS/mBEvG4wI1leRGyNiC9ExPfLtr28xm0aEX9R/t0ejIhbI+L5NWzTiLglIo5HxMEFbT3bfhGxIyJmy3M+GhGLXQr+yzKz2gfNE7MPAxcCZwHfAS4edF0rHMM24BVl+gXAf9K8fcLfAntK+x7gg2X6KuCrND8HsBO4r7SfCzxSfp5Tps8Z9PgWGe+7gM8Cd5b5zwPXlemPAX9cpv8E+FiZvg74XJm+uGzn5wEXlO1/xqDHtcg49wNvK9NnAVtr26bAduAHwOYF23J3DdsU+D3gFcDBBW09237AN4HLy3O+Cry+ZU2D3uB9/oVfDnxtwfyNwI2DrqvLMd0OvIbmJ2m3lbZtwINl+uPA9Qv6P1iWXw98fEH7L/VbCw+an1W4G7gCuLP8Q34C2HTq9qR5FdXlZXpT6RenbuOF/dbKA/i1EnJxSntV27SE+Y9KWG0q2/R1tWxTYPiUMO/J9ivLvr+g/Zf6LfWo/TDLyX9MJz1a2tal8mfny4H7gEZmPg5Qfr6odFtqzOvhd/F3wF8Cvyjzvw48lZnzZX5hzc+Npyw/Ufqvh3FeCPwY+GQ5pPRPEXE2lW3TzDwKfAj4IfA4zW10gDq3KfRu+20v06e2L6v2MG/rlgPrQUQMAV8E/jwzf7pc10Xacpn2NSEifh84npkHFjYv0jVbLFvT
4yw20fwT/R8z8+XAMzT/LF/KuhxrOWZ8Dc1DIy8GzqZ5l9VT1bBNl7PScXU03trDvIpbDkTEmTSD/DOZeVtpPhYR28rybcDx0r7UmNf67+KVwB9ExBGad+C8guae+taIOPnhtoU1PzeesnwL8BPW/jihWeOjmXlfmf8CzXCvbZu+GvhBZv44M38G3Ab8DnVuU+jd9nu0TJ/avqzaw3zd33KgnMW+GTiUmR9esOgO4OTZ7100j6WfbH9LOYO+EzhR/uT7GvDaiDin7DG9trStCZl5Y2a+JDOHaW6nf8vMNwH3ANeWbqeO8+T4ry39s7RfV66MuAC4iObJpDUjM/8b+FFEvLQ0vQr4HpVtU5qHV3ZGxK+Wf8cnx1ndNi16sv3KsqcjYmf5vb1lwWstbdAnEVbhJMVVNK8AeRh4z6Dr6aD+36X5J9Z3gfvL4yqaxxLvBg6Xn+eW/kHzS0MeBmaB0QWv9VbgofL4o0GPbZkxj/H/V7NcSPON+xDwz8DzSvvzy/xDZfmFC57/njL+B2njKoABjfFSYKZs13+heTVDddsUeB/wfeAg8GmaV6Ss+20K3ErzPMDPaO5J39DL7QeMlt/Zw8Dfc8rJ8sUefpxfkipQ+2EWSdoQDHNJqoBhLkkVMMwlqQKGuSRVwDCXpAoY5pJUgf8DkM7rnGIYrtoAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Histogram of amount with 10 bins and show plot\n", "amir_deals['amount'].hist(bins=10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Probabilities from the normal distribution\n", "Since each deal Amir worked on (both won and lost) was different, each was worth a different amount of money. These values are stored in the `amount` column of `amir_deals` and follow a normal distribution with a mean of 5000 dollars and a standard deviation of 2000 dollars. As part of his performance metrics, you want to calculate the probability of Amir closing a deal worth various amounts.\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8943502263331446\n", "0.9772498680518208\n", "0.6826894921370859\n", "3651.0204996078364\n" ] } ], "source": [ "from scipy.stats import norm\n", "\n", "# Probability of deal < 7500\n", "prob_less_7500 = norm.cdf(7500, 5000, 2000)\n", "print(prob_less_7500)\n", "\n", "# Probability of deal > 1000\n", "prob_over_1000 = 1 - norm.cdf(1000, 5000, 2000)\n", "print(prob_over_1000)\n", "\n", "# Probability of deal between 3000 and 7000\n", "prob_3000_to_7000 = norm.cdf(7000, 5000, 2000) - norm.cdf(3000, 5000, 2000)\n", "print(prob_3000_to_7000)\n", "\n", "# Calculate amount that 25% of deals will be less than\n", "pct_25 = norm.ppf(0.25, 5000, 2000)\n", "print(pct_25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulating sales under new market conditions\n", "The company's financial analyst is predicting that next quarter, the worth of each sale will increase by 20% and the volatility, or standard deviation, of each sale's worth will increase by 30%. To see what Amir's sales might look like next quarter under these new market conditions, you'll simulate new sales amounts using the normal distribution." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAOF0lEQVR4nO3dfYxldX3H8ffHXRBBLCC3BoHpQGpIjIlAJxakMS0oChj5xz+W1PpQm0n6iLaJWdI/jP9hY4w1NerEh5oW8QHRmt0gGh/SmLRrdxF1cdny4IrrE0uMgrapot/+cc8sl+Huzlm4Z+a3M+9XcjLn/s7v3vn+5jf7yZnzsCdVhSSpXU9b7wIkSUdnUEtS4wxqSWqcQS1JjTOoJalxW4f40DPPPLPm5+eH+GhJ2pD27NnzUFWNpm0bJKjn5+fZvXv3EB8tSRtSku8eaZuHPiSpcQa1JDXOoJakxhnUktQ4g1qSGmdQS1LjegV1kjcnuSvJ3iQ3Jzlp6MIkSWOrBnWSs4G/ARaq6gXAFmDb0IVJksb6HvrYCjwjyVbgZOAHw5UkSZq06p2JVfX9JO8AHgD+F/h8VX1+Zb8ki8AiwNzc3KzrlGZifvvOdfm+B268Zl2+rzaGPoc+TgeuBc4DnguckuQ1K/tV1VJVLVTVwmg09XZ1SdKT0OfQx0uB71TVoar6FXAr8OJhy5IkLesT1A8AlyQ5OUmAK4B9w5YlSVq2alBX1S7gFuAO4Fvde5YGrkuS1On135xW1VuBtw5ciyRpCu9MlKTGGdSS1DiDWpIaZ1BLUuMMaklqnEEtSY0zqCWpcQa1JDXOoJakxhnUktQ4g1qSGmdQS1LjDGpJapxBLUmNM6glqXEGtSQ1rs/DbS9IcufE8nCSN61FcZKkHk94qar9wIUASbYA3wc+PXBdkqTOsR76uAK4r6q+O0QxkqQnOtag3gbcPEQhkqTpegd1khOBVwGfPML2xSS7k+w+dOjQrOqTpE3vWPaorwLuqKofT9tYVUtVtVBVC6PRaDbVSZKOKaivw8MekrTmegV1kpOBlwG3DluOJGmlVS/PA6iq/wGePXAtkqQpvDNRkhpnUEtS4wxqSWqcQS1JjTOoJalxBrUkNc6glqTGGdSS1DiDWpIaZ1BLUuMMaklqnEEtSY0zqCWpcQa1JDXOoJakxhnUktQ4g1qSGtf3UVynJbklyd1J9iW5dOjCJEljvR7FBfwj8LmqenWSE4GTB6xJkjRh1aBO8izgJcDrAarql8Avhy1LkrSszx71+cAh4MNJXgjsAa6vql9MdkqyCCwCzM3NzbrODW1++851+b4HbrxmXb6vpGPT5xj1VuBi4L1VdRHwC2D7yk5VtVRVC1W1MBqNZlymJG1efYL6IHCwqnZ1r29hHNySpDWwalBX1Y+A7yW5oGu6Avj2oFVJkg7re9XHXwM3dVd83A+8YbiSJEmTegV1Vd0JLAxciyRpCu9MlKTGGdSS1DiDWpIaZ1BLUuMMaklqnEEtSY0zqCWpcQa1JDXOoJakxhnUktQ4g1qSGmdQS1LjDGpJapxBLUmNM6glqXEGtSQ1zqCWpMb1esJLkgPAI8CvgUeryqe9SNIa6fvMRIA/qqqHBqtEkjSVhz4kqXF996gL+HySAt5fVUsrOyRZBBYB5ubmZlehBjO/fed6lyCph7571JdV1cXAVcBfJnnJyg5VtVRVC1W1MBqNZlqkJG1mvYK6qn7QfX0Q+DTwoiGLkiQ9ZtWgTnJKklOX14Ergb1DFyZJGutzjPo5wKeTLPf/aFV9btCqJEmHrRrUVXU/8MI1qEWSNIWX50lS4wxqSWqcQS1JjTOoJalxBrUkNc6glqTGGdSS1DiDWpIaZ1BLUuMMakl
qnEEtSY0zqCWpcQa1JDXOoJakxhnUktQ4g1qSGmdQS1Ljegd1ki1Jvp5kx5AFSZIe71j2qK8H9g1ViCRpul5BneQc4BrgA8OWI0laqe8e9buAtwC/OVKHJItJdifZfejQoZkUJ0nqEdRJXgk8WFV7jtavqpaqaqGqFkaj0cwKlKTNrs8e9WXAq5IcAD4GXJ7kXwetSpJ02KpBXVU3VNU5VTUPbAO+VFWvGbwySRLgddSS1Lytx9K5qr4CfGWQSiRJU7lHLUmNM6glqXEGtSQ1zqCWpMYZ1JLUOINakhpnUEtS4wxqSWqcQS1JjTOoJalxBrUkNc6glqTGGdSS1DiDWpIaZ1BLUuMMaklqnEEtSY3r8xTyk5J8Lck3ktyV5G1rUZgkaazPo7j+D7i8qn6e5ATgq0luq6r/HLg2SRI9grqqCvh59/KEbqkhi5IkPabXw22TbAH2AL8LvKeqdk3pswgsAszNzc2yRum4N79957p97wM3XrNu31uz0etkYlX9uqouBM4BXpTkBVP6LFXVQlUtjEajWdcpSZvWMV31UVU/Bb4CvGKQaiRJT9Dnqo9RktO69WcALwXuHrowSdJYn2PUZwEf6Y5TPw34RFXtGLYsSdKyPld9fBO4aA1qkSRN4Z2JktQ4g1qSGmdQS1LjDGpJapxBLUmNM6glqXEGtSQ1zqCWpMYZ1JLUOINakhpnUEtS4wxqSWqcQS1JjTOoJalxBrUkNc6glqTGGdSS1Lg+z0w8N8mXk+xLcleS69eiMEnSWJ9nJj4K/F1V3ZHkVGBPki9U1bcHrk2SRI896qr6YVXd0a0/AuwDzh66MEnSWJ896sOSzDN+0O2uKdsWgUWAubm5J13Q/PadT/q9ktqxGf8tH7jxmkE+t/fJxCTPBD4FvKmqHl65vaqWqmqhqhZGo9Esa5SkTa1XUCc5gXFI31RVtw5bkiRpUp+rPgJ8ENhXVe8cviRJ0qQ+e9SXAX8CXJ7kzm65euC6JEmdVU8mVtVXgaxBLZKkKbwzUZIaZ1BLUuMMaklqnEEtSY0zqCWpcQa1JDXOoJakxhnUktQ4g1qSGmdQS1LjDGpJapxBLUmNM6glqXEGtSQ1zqCWpMYZ1JLUOINakhrX55mJH0ryYJK9a1GQJOnx+uxR/zPwioHrkCQdwapBXVX/DvxkDWqRJE2x6sNt+0qyCCwCzM3NzepjJT1F89t3rncJeopmdjKxqpaqaqGqFkaj0aw+VpI2Pa/6kKTGGdSS1Lg+l+fdDPwHcEGSg0neOHxZkqRlq55MrKrr1qIQSdJ0HvqQpMYZ1JLUOINakhpnUEtS4wxqSWqcQS1JjTOoJalxBrUkNc6glqTGGdSS1DiDWpIaZ1BLUuMMaklqnEEtSY0zqCWpcQa1JDXOoJakxvUK6iSvSLI/yb1Jtg9dlCTpMX2embgFeA9wFfB84Lokzx+6MEnSWJ896hcB91bV/VX1S+BjwLXDliVJWrbqw22Bs4HvTbw+CPz+yk5JFoHF7uXPk+x/6uU160zgofUuYo1txjHD5hy3Y36S8van9PbfOdKGPkGdKW31hIaqJWDpGIo6biXZXVUL613HWtqMY4bNOW7H3J4+hz4OAudOvD4H+MEw5UiSVuoT1P8FPC/JeUlOBLYBnx22LEnSslUPfVTVo0n+Crgd2AJ8qKruGryytm2KQzwrbMYxw+Yct2NuTKqecLhZktQQ70yUpMYZ1JLUOIO6k+TcJF9Osi/JXUmu79rPSPKFJPd0X0/v2pPk3d1t9d9McvHEZ72u639Pktet15j6SLIlydeT7Ohen5dkV1f7x7sTyCR5evf63m77/MRn3NC170/y8vUZSX9JTktyS5K7u/m+dBPM85u73+u9SW5OctJGm+skH0ryYJK9E20zm9ckv5fkW9173p1k2qXLw6gql/Fx+rOAi7v1U4H/ZnzL/D8A27v27cDbu/WrgdsYX2d+CbCraz8DuL/7enq3fvp6j+8o4/5b4KPAju7
1J4Bt3fr7gD/v1v8CeF+3vg34eLf+fOAbwNOB84D7gC3rPa5VxvwR4M+69ROB0zbyPDO+ae07wDMm5vj1G22ugZcAFwN7J9pmNq/A14BLu/fcBly1ZmNb7x9uqwvwb8DLgP3AWV3bWcD+bv39wHUT/fd3268D3j/R/rh+LS2Mr4n/InA5sKP7BXwI2NptvxS4vVu/Hbi0W9/a9QtwA3DDxGce7tfiAjyrC62saN/I87x8d/EZ3dztAF6+EecamF8R1DOZ127b3RPtj+s39OKhjym6P/UuAnYBz6mqHwJ0X3+76zbt1vqzj9LeoncBbwF+071+NvDTqnq0ez1Z++Fxddt/1vU/nsYLcD5wCPhwd8jnA0lOYQPPc1V9H3gH8ADwQ8Zzt4eNP9cwu3k9u1tf2b4mDOoVkjwT+BTwpqp6+Ghdp7TVUdqbkuSVwINVtWeyeUrXWmXbcTHeCVsZ/3n83qq6CPgF4z+Jj+S4H3d3XPZaxocrngucwvh/w1xpo8310RzrGNd17Ab1hCQnMA7pm6rq1q75x0nO6rafBTzYtR/p1vrj5Zb7y4BXJTnA+H9EvJzxHvZpSZZvhJqs/fC4uu2/BfyE42e8yw4CB6tqV/f6FsbBvVHnGeClwHeq6lBV/Qq4FXgxG3+uYXbzerBbX9m+JgzqTncG94PAvqp658SmzwLLZ35fx/jY9XL7a7uzx5cAP+v+tLoduDLJ6d2ezJVdW1Oq6oaqOqeq5hmfMPpSVf0x8GXg1V23leNd/jm8uutfXfu27kqB84DnMT7p0qSq+hHwvSQXdE1XAN9mg85z5wHgkiQnd7/ny2Pe0HPdmcm8dtseSXJJ9zN87cRnDW+9D/63sgB/wPhPmW8Cd3bL1YyPzX0RuKf7ekbXP4wfqHAf8C1gYeKz/hS4t1vesN5j6zH2P+Sxqz7OZ/yP717gk8DTu/aTutf3dtvPn3j/33c/h/2s4ZnwpzDeC4Hd3Vx/hvHZ/Q09z8DbgLuBvcC/ML5yY0PNNXAz42Pwv2K8B/zGWc4rsND9/O4D/okVJ6SHXLyFXJIa56EPSWqcQS1JjTOoJalxBrUkNc6glqTGGdSS1DiDWpIa9/+dexEybvGoZAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Calculate new average amount\n", "new_mean = 5000 * 1.2\n", "\n", "# Calculate new standard deviation\n", "new_sd = 2000 * 1.3\n", "\n", "# Simulate 36 new sales\n", "new_sales = norm.rvs(new_mean, new_sd, 36)\n", "\n", "# Create histogram and show\n", "plt.hist(new_sales);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The central limit theorem\n", "- Sampling Distribution\n", "- Central limit theorem (CLT)\n", " - The sampling distribution of a statistic becomes closer to the normal distribution as the number of trials increases\n", " - Samples should be random and indepdendent\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The CLT in action\n", "The central limit theorem states that a sampling distribution of a sample statistic approaches the normal distribution as you take more samples, no matter the original distribution being sampled from.\n", "\n", "In this exercise, you'll focus on the sample mean and see the central limit theorem in action while examining the `num_users` column of `amir_deals` more closely, which contains the number of people who intend to use the product Amir is selling." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAQQklEQVR4nO3df4zkdX3H8ee7HOp5azkoOjkP0sWUWA0bQSaU1qaZRW1OaAomNpEYe0Sa9Q9tabtJc7VJq7EmZyLSNjGmpyDXhrJaxELA2JArKzFpsXtK3cPTonjFO+idxONkCamuvvvHfNdMll3muz9mZz/7fT6Sycz3M9+Z7/vNZ/fFd7/3/c5EZiJJKs8vDLsASdLqGOCSVCgDXJIKZYBLUqEMcEkq1LaN3Nj555+fo6Ojtdd/7rnn2LFjx+AK2qTsu1ma2jc0t/eV9n348OGnM/OVi8c3NMBHR0eZmZmpvf709DSdTmdwBW1S9t0sTe0bmtv7SvuOiP9ZatxDKJJUKANckgplgEtSoQxwSSqUAS5JhTLAJalQBrgkFcoAl6RCGeCSVKgNvRJzLUb33T+0bR/bf83Qti1Jy3EPXJIKZYBLUqEMcEkqlAEuSYUywCWpUMWchTJMG30GzOTYPDfsu9+zXyS9KPfAJalQBrgkFapvgEfEyyLiqxHxXxHxaER8qBq/KCIejojHIuKzEfGSwZcrSVpQZw/8/4CrMvMNwKXAnoi4EvgocEtmXgycBm4cXJmSpMX6Bnh2zVWLZ1e3BK4C7qrGDwLXDaRCSdKSah0Dj4izIuIR4BTwAPBd4JnMnK9WOQ7sHkyJkqSlRGbWXzliJ/AF4C+Bz2Tmr1TjFwJfzMyxJV4zAUwAtFqty6empmpvb25ujpGREQBmT5yp/brStbbDyedhbPc5wy5lQ/XOd5M0tW9obu8r7Xt8fPxwZrYXj6/oPPDMfCYipoErgZ0Rsa3aC78AeHKZ1xwADgC02+3sdDq1tzc9Pc3C+jcM8dMIN9rk2Dw3z27j2Ls6wy5lQ/XOd5M0tW9obu/r1Xeds1BeWe15ExHbgbcAR4EHgXdUq+0F7llzNZKk2ursge8CDkbEWXQD/3OZeV9EfBOYioi/Br4O3DrAOiVJi/QN8Mz8BnDZEuOPA1cMoihJUn9eiSlJhTLAJalQBrgkFcoAl6RCGeCSVCgDXJIKZYBLUqEMcEkqlAEuSYUywCWpUAa4JBXKAJekQhngklQoA1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygCXpEIZ4JJUKANckgplgEtSoQxwSSpU3wCPiAsj4sGIOBoRj0bETdX4ByPiREQ8Ut2uHny5kqQF22qsMw9MZubXIuIVwOGIeKB67pbM/NjgypMkLadvgGfmU8BT1eNnI+IosHvQhUmSXlxkZv2VI0aBh4BLgD8FbgB+BMzQ3Us/vcRrJoAJgFardfnU1FTt7c3NzTEyMgLA7IkztV9XutZ2OPk8jO0+Z9ilbKje+W6SpvYNze19pX2Pj48fzsz24vHaAR4RI8CXgY9k5t0R0QKeBhL4MLArM9/zYu/RbrdzZmamdtHT09N0Oh0ARvfdX/t1pZscm+fm2W0c23/NsEvZUL3z3SRN7Rua2/tK+46IJQO81lkoEXE28Hngjsy8GyAzT2bmTzPzZ8CngCtqVyNJWrM6Z6EEcCtwNDM/3jO+q2e1twNH1r88SdJy6pyF8ibg3cBsRDxSjX0AuD4iLqV7COUY8N6BVChJWlKds1C+AsQST31x/cuRJNXllZiSVCgDXJIKZYBLUqEMcEkqlAEuSYUywCWpUAa4JBXKAJekQhngklQoA1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygC
XpEIZ4JJUKANckgplgEtSoQxwSSpUnW+l15CM7rt/KNs9tv+aoWxX0sq4By5JhTLAJalQfQM8Ii6MiAcj4mhEPBoRN1Xj50XEAxHxWHV/7uDLlSQtqLMHPg9MZubrgCuB90XE64F9wKHMvBg4VC1LkjZI3wDPzKcy82vV42eBo8Bu4FrgYLXaQeC6QRUpSXqhyMz6K0eMAg8BlwBPZObOnudOZ+YLDqNExAQwAdBqtS6fmpqqvb25uTlGRkYAmD1xpvbrStfaDiefH972x3afM5Tt9s53kzS1b2hu7yvte3x8/HBmtheP1w7wiBgBvgx8JDPvjohn6gR4r3a7nTMzM7WLnp6eptPpAMM7pW4YJsfmuXl2eGd4Dus0wt75bpKm9g3N7X2lfUfEkgFe6yyUiDgb+DxwR2beXQ2fjIhd1fO7gFO1q5EkrVmds1ACuBU4mpkf73nqXmBv9XgvcM/6lydJWk6dv9PfBLwbmI2IR6qxDwD7gc9FxI3AE8DvDaZESdJS+gZ4Zn4FiGWefvP6liNJqssrMSWpUH6YlV5gWGf8TI7N0xnKlod7ltPte3YMbdsqm3vgklQoA1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygCXpEIZ4JJUKANckgplgEtSoQxwSSqUAS5JhTLAJalQBrgkFcoAl6RCGeCSVCgDXJIK5VeqaVMZ5lebSaVxD1ySCmWAS1KhDHBJKlTfAI+I2yLiVEQc6Rn7YESciIhHqtvVgy1TkrRYnT3w24E9S4zfkpmXVrcvrm9ZkqR++gZ4Zj4E/HADapEkrUBkZv+VIkaB+zLzkmr5g8ANwI+AGWAyM08v89oJYAKg1WpdPjU1Vbu4ubk5RkZGAJg9cab260rX2g4nnx92FRuvqX1fdM5ZP/85b5re3/EmWWnf4+PjhzOzvXh8tQHeAp4GEvgwsCsz39Pvfdrtds7MzNQuenp6mk6nAzTr/ODJsXlunm3eKfpN7fv2PTt+/nPeNL2/402y0r4jYskAX9VZKJl5MjN/mpk/Az4FXLGa95Ekrd6qAjwidvUsvh04sty6kqTB6Pv3akTcCXSA8yPiOPBXQCciLqV7COUY8N4B1ihJWkLfAM/M65cYvnUAtUiSVsArMSWpUAa4JBXKAJekQhngklQoA1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygCXpEIZ4JJUKANckgrVvK8/kTaZ2RNnuGEI3zh1bP81G77NBQvfsDU5Nr+hvQ+z50FwD1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygCXpEIZ4JJUKANckgrVN8Aj4raIOBURR3rGzouIByLiser+3MGWKUlarM4e+O3AnkVj+4BDmXkxcKhaliRtoL4BnpkPAT9cNHwtcLB6fBC4bp3rkiT1EZnZf6WIUeC+zLykWn4mM3f2PH86M5c8jBIRE8AEQKvVunxqaqp2cXNzc4yMjADdD/xpitZ2OPn8sKvYePa9scZ2n7PxG60s/D5vdO/D7LlXb7bVMT4+fjgz24vHB/5phJl5ADgA0G63s9Pp1H7t9PQ0C+sP49PahmVybJ6bZ5v3QZH2vbGOvauz4dtccEPPpxFuZO/D7LlXb7atxWrPQjkZEbsAqvtTa65EkrQiqw3we4G91eO9wD3rU44kqa46pxHeCfw78NqIOB4RNwL7gbdGxGPAW6tlSdIG6nvwKTOvX+apN69zLZKkFfBKTEkqlAEuSYUywCWpUAa4JBXKAJekQhngklQoA1ySCtW8D56QBMBogz5faMEwez62/5p1f0/3wCWpUAa4JBXKAJekQhngklQoA1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygCXpEIZ4JJUKANckgplgEtSoQxwSSrUmj4PPCKOAc8CPwXmM7O9HkVJkvpbjy90GM/Mp9fhfSRJK+AhFEkqVGTm6l8c8T3gNJDA32fmgSXWmQAmAFqt1uVTU1O1339ubo6RkRE
AZk+cWXWdpWlth5PPD7uKjWffzdOk3sd2n/Pzx73ZVsf4+PjhpQ5RrzXAX52ZT0bEq4AHgD/MzIeWW7/dbufMzEzt95+enqbT6QDN+v6+ybF5bp5t3teV2nfzNKn33u/E7M22OiJiyQBf0yGUzHyyuj8FfAG4Yi3vJ0mqb9UBHhE7IuIVC4+B3waOrFdhkqQXt5a/XVrAFyJi4X3+KTO/tC5VSZL6WnWAZ+bjwBvWsRZJ0gp4GqEkFcoAl6RCGeCSVCgDXJIKZYBLUqEMcEkqlAEuSYUywCWpUAa4JBXKAJekQhngklQoA1ySCmWAS1KhDHBJKpQBLkmFMsAlqVAGuCQVygCXpEIZ4JJUKANckgplgEtSoQxwSSqUAS5JhTLAJalQawrwiNgTEd+OiO9ExL71KkqS1N+qAzwizgI+AbwNeD1wfUS8fr0KkyS9uLXsgV8BfCczH8/MHwNTwLXrU5YkqZ/IzNW9MOIdwJ7M/INq+d3Ar2Xm+xetNwFMVIuvBb69gs2cDzy9qgLLZt/N0tS+obm9r7TvX87MVy4e3LaGAmKJsRf83yAzDwAHVrWBiJnMbK/mtSWz72Zpat/Q3N7Xq++1HEI5DlzYs3wB8OTaypEk1bWWAP9P4OKIuCgiXgK8E7h3fcqSJPWz6kMomTkfEe8H/hU4C7gtMx9dt8q6VnXoZQuw72Zpat/Q3N7Xpe9V/yOmJGm4vBJTkgplgEtSoTZtgDflMv2IuDAiHoyIoxHxaETcVI2fFxEPRMRj1f25w651vUXEWRHx9Yi4r1q+KCIernr+bPWP41tOROyMiLsi4lvVvP96Q+b7T6qf8SMRcWdEvGwrznlE3BYRpyLiSM/YkvMbXX9X5dw3IuKNK9nWpgzwhl2mPw9MZubrgCuB91W97gMOZebFwKFqeau5CTjas/xR4Jaq59PAjUOpavD+FvhSZv4q8Aa6/w229HxHxG7gj4B2Zl5C98SHd7I15/x2YM+iseXm923AxdVtAvjkSja0KQOcBl2mn5lPZebXqsfP0v1l3k2334PVageB64ZT4WBExAXANcCnq+UArgLuqlbZcj0DRMQvAr8F3AqQmT/OzGfY4vNd2QZsj4htwMuBp9iCc56ZDwE/XDS83PxeC/xDdv0HsDMidtXd1mYN8N3A93uWj1djW1pEjAKXAQ8Drcx8CrohD7xqeJUNxN8Afwb8rFr+JeCZzJyvlrfqnL8G+AHwmerw0acjYgdbfL4z8wTwMeAJusF9BjhMM+Yclp/fNWXdZg3wWpfpbyURMQJ8HvjjzPzRsOsZpIj4HeBUZh7uHV5i1a0459uANwKfzMzLgOfYYodLllId870WuAh4NbCD7uGDxbbinL+YNf3cb9YAb9Rl+hFxNt3wviMz766GTy78KVXdnxpWfQPwJuB3I+IY3cNjV9HdI99Z/XkNW3fOjwPHM/PhavkuuoG+lecb4C3A9zLzB5n5E+Bu4DdoxpzD8vO7pqzbrAHemMv0q2O/twJHM/PjPU/dC+ytHu8F7tno2gYlM/88My/IzFG6c/tvmfku4EHgHdVqW6rnBZn5v8D3I+K11dCbgW+yhee78gRwZUS8vPqZX+h7y895Zbn5vRf4/epslCuBMwuHWmrJzE15A64G/hv4LvAXw65ngH3+Jt0/mb4BPFLdrqZ7TPgQ8Fh1f96wax1Q/x3gvurxa4CvAt8B/hl46bDrG1DPlwIz1Zz/C3BuE+Yb+BDwLeAI8I/AS7finAN30j3O/xO6e9g3Lje/dA+hfKLKuVm6Z+nU3paX0ktSoTbrIRRJUh8GuCQVygCXpEIZ4JJUKANckgplgEtSoQxwSSrU/wPxKoMAqosfTAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Create a histogram of num_users and show\n", "amir_deals['num_users'].hist();" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "32.0\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAN/klEQVR4nO3df4jk9X3H8ec7nm3FDf6oZrme0g1FUsUj17q1glB2TROusVQDCVRSUWq5FLRYONpc8k8tbeD6h7H/lFJTrUdrspVGUbyQ9Li4lUJJs5dce9prME2u4il3SNS4Iimr7/4x3yN7c7M3s7vf+fG+PB+w7Hw/893PvPbD3GtnvjffmchMJEn1vGfcASRJG2OBS1JRFrgkFWWBS1JRFrgkFbVllDd22WWX5czMTKtzvvXWW1x44YWtzjls1TJXywtmHhUzj8ahQ4dezczLu8dHWuAzMzMsLS21Oufi4iJzc3Otzjls1TJXywtmHhUzj0ZE/G+vcQ+hSFJRFrgkFWWBS1JRFrgkFWWBS1JRFrgkFWWBS1JRFrgkFWWBS1JRIz0TU+szs2f/WG732N6bx3K7ktbHR+CSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVJQFLklFWeCSVFTfAo+IKyPimYg4GhHPR8S9zfilEXEgIl5ovl8y/LiSpFMGeQS+AuzOzKuBG4C7I+IaYA9wMDOvAg4225KkEelb4Jn5SmZ+q7n8JnAU2AbcAuxrdtsH3DqskJKkM0VmDr5zxAzwLHAt8GJmXrzqutcy84zDKBGxC9gFMD09fd3CwsImI59ueXmZqampVucctkEzHzn+xgjSnGn7totO2x7lGrf1O09fACfeHnz/7t95HM7l+/IkqZh5fn7+UGbOdo8PXOARMQX8C/C5zHw8Il4fpMBXm52dzaWlpXVGP7vFxUXm5uZanXPYBs08s2f/8MP0cGzvzadtj3KN2/qdd29f4f4jWwbev/t3Hodz+b48SSpmjoieBT7Qq1Ai4nzgy8Cjmfl4M3wiIrY2128FTrYVVpLU3yCvQgngIeBoZn5+1VVPAXc0l+8Anmw/niRpLYM8x7wRuB04EhGHm7HPAnuBxyLiLuBF4BPDiShJ6qVvgWfmvwKxxtUfajeOJGlQnokpSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUVZ4JJUlAUuSUX1/VR6/eSZ2bP/tO3d21e4s2vsXNP9O4/Ssb03j+22VZuPwCWpKAtckoqywCWpKAtckoqywCWpKAtckoqywCWpKAtckoqywCWpKAtckoqywCWpKAtckoqywCWpKAtckorqW+AR8XBEnIyI51aN3RcRxyPicPP10eHGlCR1G+QR+CPAzh7jD2TmjubrK+3GkiT107fAM/NZ4AcjyCJJWofIzP47RcwAT2fmtc32fcCdwA+BJWB3Zr62xs/uAnYBTE9PX7ewsNBC7
B9bXl5mamqq1TmHbdDMR46/MYI0/U1fACfeHneK9amUefu2i4Bz+748SSpmnp+fP5SZs93jGy3waeBVIIE/A7Zm5u/2m2d2djaXlpbWl7yPxcVF5ubmWp1z2AbNPM6P+Vpt9/YV7j9S69P3KmU+9ZFq5/J9eZJUzBwRPQt8Q69CycwTmflOZr4LfAG4frMBJUnrs6ECj4itqzY/Bjy31r6SpOHo+xwzIr4EzAGXRcRLwJ8AcxGxg84hlGPAp4aYUZLUQ98Cz8zbegw/NIQskqR18ExMSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekoixwSSrKApekovoWeEQ8HBEnI+K5VWOXRsSBiHih+X7JcGNKkroN8gj8EWBn19ge4GBmXgUcbLYlSSPUt8Az81ngB13DtwD7msv7gFtbziVJ6iMys/9OETPA05l5bbP9emZevOr61zKz52GUiNgF7AKYnp6+bmFhoYXYP7a8vMzU1FSrc3Y7cvyNVuebvgBOvN3qlENVLS/Uyrx920XAaO7LbTPzaMzPzx/KzNnu8S3DvuHMfBB4EGB2djbn5uZanX9xcZG25+x25579rc63e/sK9x8Z+tK3plpeqJX52CfngNHcl9tm5vHa6KtQTkTEVoDm+8n2IkmSBrHRAn8KuKO5fAfwZDtxJEmDGuRlhF8C/g34QES8FBF3AXuBD0fEC8CHm21J0gj1PUiYmbetcdWHWs4iSVoHz8SUpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqygKXpKIscEkqasu4A0g/6Wb27Adg9/YV7mwuj8KxvTeP7LY0HD4Cl6SiLHBJKsoCl6SiLHBJKsoCl6SiNvUqlIg4BrwJvAOsZOZsG6EkSf218TLC+cx8tYV5JEnr4CEUSSoqMnPjPxzxfeA1IIG/ycwHe+yzC9gFMD09fd3CwsKGb6+X5eVlpqamWp2z25Hjb7Q63/QFcOLtVqccqmp5wcyjstHM27dd1H6YAY2iM9o2Pz9/qNch6s0W+M9l5ssR8T7gAPAHmfnsWvvPzs7m0tLShm+vl8XFRebm5lqds9tMy2fH7d6+wv1H6pwEWy0vmHlUNpp5nGeBjqIz2hYRPQt8U4dQMvPl5vtJ4Ang+s3MJ0ka3IYLPCIujIj3nroMfAR4rq1gkqSz28zztWngiYg4Nc8XM/OrraSSJPW14QLPzO8BH2wxiyRpHXwZoSQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlEWuCQVZYFLUlFbxh1gUDN79vcc3719hTvXuE6SVpvZs39snXFs782tz+kjcEkqygKXpKIscEkqygKXpKIscEkqalMFHhE7I+I7EfHdiNjTVihJUn8bLvCIOA/4K+A3gGuA2yLimraCSZLObjOPwK8HvpuZ38vM/wMWgFvaiSVJ6icyc2M/GPFxYGdm/l6zfTvwq5l5T9d+u4BdzeYHgO9sPG5PlwGvtjznsFXLXC0vmHlUzDwaP5+Zl3cPbuZMzOgxdsZfg8x8EHhwE7dz9hARS5k5O6z5h6Fa5mp5wcyjYubx2swhlJeAK1dtXwG8vLk4k
qRBbabAvwlcFRHvj4ifAn4beKqdWJKkfjZ8CCUzVyLiHuBrwHnAw5n5fGvJBje0wzNDVC1ztbxg5lEx8xht+D8xJUnj5ZmYklSUBS5JRZUp8Ii4MiKeiYijEfF8RNzbjN8XEccj4nDz9dFxZz0lIn4mIv49Iv6jyfynzfj7I+IbEfFCRPxj85/AE+EsmR+JiO+vWucd487aLSLOi4hvR8TTzfbErvMpPTJP9DpHxLGIONJkW2rGLo2IA806H4iIS8adc7U1Mk9sb6xHmQIHVoDdmXk1cANw96pT9x/IzB3N11fGF/EMPwJuyswPAjuAnRFxA/AXdDJfBbwG3DXGjN3WygzwR6vW+fD4Iq7pXuDoqu1JXudTujPD5K/zfJPt1Gup9wAHm3U+2GxPmu7MMLm9MbAyBZ6Zr2Tmt5rLb9K5028bb6qzy47lZvP85iuBm4B/asb3AbeOIV5PZ8k80SLiCuBm4G+b7WCC1xnOzFzYLXTWFyZwnc9lZQp8tYiYAX4J+EYzdE9E/GdEPDyBT9/Oi4jDwEngAPA/wOuZudLs8hIT9oeoO3NmnlrnzzXr/EBE/PQYI/byl8AfA+822z/LhK8zZ2Y+ZZLXOYF/johDzdtkAExn5ivQeaAFvG9s6XrrlRkmuDcGVa7AI2IK+DLwh5n5Q+CvgV+g83T/FeD+McY7Q2a+k5k76Jypej1wda/dRpvq7LozR8S1wGeAXwR+BbgU+PQYI54mIn4TOJmZh1YP99h1YtZ5jcwwwevcuDEzf5nOu5DeHRG/Nu5AA+iVeaJ7Y1ClCjwizqdT3o9m5uMAmXmiKZx3gS/QKcmJk5mvA4t0jt9fHBGnTqKa2LcgWJV5Z3MIKzPzR8DfMVnrfCPwWxFxjM67Yt5E59HtJK/zGZkj4h8mfJ3JzJeb7yeBJ+jkOxERWwGa7yfHl/BMvTJX6Y1+yhR4c0zzIeBoZn5+1fjWVbt9DHhu1NnWEhGXR8TFzeULgF+nc+z+GeDjzW53AE+OJ+GZ1sj836v+gQadY5wTs86Z+ZnMvCIzZ+i8pcPXM/OTTPA6r5H5dyZ5nSPiwoh476nLwEfo5HuKzvrChK3zWpknuTfWYzPvRjhqNwK3A0ea47MAn6XzQRI76Dw9PgZ8ajzxetoK7IvOh1+8B3gsM5+OiP8CFiLiz4Fv0/nDNCnWyvz1iLiczqGJw8DvjzPkgD7N5K7zWh6d4HWeBp7o/G1hC/DFzPxqRHwTeCwi7gJeBD4xxozd1sr89xPcGwPzVHpJKqrMIRRJ0ukscEkqygKXpKIscEkqygKXpKIscEkqygKXpKL+HwENbfPBfp+tAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Set seed to 104\n", "np.random.seed(104)\n", "\n", "# Sample 20 num_users with replacement from amir_deals\n", "samp_20 = amir_deals['num_users'].sample(20, replace=True)\n", "\n", "# Take the mean of samp_20\n", "print(np.mean(samp_20))\n", "\n", "sample_means = []\n", "# Loop 100 times\n", "for i in range(100):\n", " # Take sample of 20 num_users\n", " samp_20 = amir_deals['num_users'].sample(20, replace=True)\n", " # Calculate mean of samp_20\n", " samp_20_mean = np.mean(samp_20)\n", " # Append samp_20_mean to sample_means\n", " sample_means.append(samp_20_mean)\n", " \n", "# Convert to Series and plot histogram\n", "sample_means_series = pd.Series(sample_means)\n", "sample_means_series.hist();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The mean of means\n", "You want to know what the average number of users (`num_users`) is per deal, but you want to know this number for the entire company so that you can see if Amir's deals have more or fewer users than the company's average deal. The problem is that over the past year, the company has worked on more than ten thousand deals, so it's not realistic to compile all the data. Instead, you'll estimate the mean by taking several random samples of deals, since this is much easier than collecting data from everyone in the company." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
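<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>product</th>\n", " <th>num_users</th>\n", " </tr>\n", " <tr>\n", " <th>Unnamed: 0</th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
" <tr>\n", " <th>1</th>\n", " <td>3544</td>\n", " <td>19</td>\n", " </tr>\n",
" <tr>\n", " <th>2</th>\n", " <td>5073</td>\n", " <td>43</td>\n", " </tr>\n",
" <tr>\n", " <th>3</th>\n", " <td>6149</td>\n", " <td>87</td>\n", " </tr>\n",
" <tr>\n", " <th>4</th>\n", " <td>7863</td>\n", " <td>83</td>\n", " </tr>\n",
" <tr>\n", " <th>5</th>\n", " <td>14</td>\n", " <td>17</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "</div>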
" ], "text/plain": [ " product num_users\n", "Unnamed: 0 \n", "1 3544 19\n", "2 5073 43\n", "3 6149 87\n", "4 7863 83\n", "5 14 17" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_deals = pd.read_csv('./dataset/all_deals.csv', index_col=0)\n", "all_deals.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "38.31333333333332\n", "37.651685393258425\n" ] } ], "source": [ "# Set seed to 321\n", "np.random.seed(321)\n", "\n", "sample_means = []\n", "# Loop 30 times to take 30 means\n", "for i in range(30):\n", " # Take sample of size 20 from num_users col of all_deals with replacement\n", " cur_sample = all_deals['num_users'].sample(20, replace=True)\n", " # Take mean of cur_sample\n", " cur_mean = np.mean(cur_sample)\n", " # Append cur_mean to sample_means\n", " sample_means.append(cur_mean)\n", " \n", "# Print mean of sample_means\n", "print(np.mean(sample_means))\n", "\n", "# Print mean of num_users in amir_deals\n", "print(amir_deals['num_users'].mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Poisson distribution\n", "- Poisson process\n", " - Events appear to happen at a certain rate, but completely at random\n", " - Examples\n", " - Number of animals adopted from an animal shelter per week\n", " - Number of people arriving at a restaurant per hour\n", " - Number of earthquakes in California per year\n", " - Time unit is irrelevant, as long as you use the same unit when talking about the same situation\n", "- Poisson distribution\n", " - Probability of some # of events occurring over a fixed period of time\n", " - Examples\n", " - Probability of $\\geq$ 5 animals adopted from an animal shelter per week\n", " - Probability of 12 people arriving at a restaurant per hour\n", " - Probability of $\\lt$ 20 earthquakes in California per year\n", "- Lambda $\\lambda$\n", " - Average number of events per 
time interval\n", "- CLT still applies!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tracking lead responses\n", "Your company uses sales software to keep track of new sales leads. It organizes them into a queue so that anyone can follow up on one when they have a bit of free time. Since the number of lead responses is a countable outcome over a period of time, this scenario corresponds to a Poisson distribution. On average, Amir responds to 4 leads each day. In this exercise, you'll calculate probabilities of Amir responding to different numbers of leads.\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.1562934518505317\n", "0.17140068409793663\n", "0.23810330555354436\n", "0.0028397661205137315\n" ] } ], "source": [ "from scipy.stats import poisson\n", "\n", "# Probability of 5 responses (Amir, lambda = 4)\n", "prob_5 = poisson.pmf(5, 4)\n", "print(prob_5)\n", "\n", "# Probability of 5 responses for a coworker with lambda = 5.5\n", "prob_coworker = poisson.pmf(5, 5.5)\n", "print(prob_coworker)\n", "\n", "# Probability of 2 or fewer responses\n", "prob_2_or_less = poisson.cdf(2, 4)\n", "print(prob_2_or_less)\n", "\n", "# Probability of more than 10 responses\n", "prob_over_10 = 1 - poisson.cdf(10, 4)\n", "print(prob_over_10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More probability distributions\n", "- Exponential distribution\n", " - Probability of a certain amount of time passing between Poisson events\n", " - Examples\n", " - Probability of $\\gt$ 1 day between adoptions\n", " - Probability of $\\lt$ 10 minutes between restaurant arrivals\n", " - Probability of 6-8 months between earthquakes\n", " - Also uses $\\lambda$ (the rate); the expected time between events is $1/\\lambda$\n", " - Continuous (time)\n", "- (Student's) t-distribution\n", " - Similar shape to the normal distribution\n", "- Degrees of Freedom (DoF)\n", " - The t-distribution's degrees-of-freedom parameter (df) controls the thickness of the tails\n", " - Lower DF: thicker tails, higher standard deviation\n", " - 
Higher DF: closer to normal distribution\n", "- Log-normal distribution\n", " - Variable whose logarithm is normally distributed\n", " - Results in distributions that are skewed, unlike the normal distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modeling time between leads\n", "To further evaluate Amir's performance, you want to know how much time it takes him to respond to a lead after he opens it. On average, it takes 2.5 hours for him to respond. In this exercise, you'll calculate probabilities of different amounts of time passing between Amir receiving a lead and sending a response.\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.3297\n", "0.2019\n", "0.0993\n" ] } ], "source": [ "from scipy.stats import expon\n", "\n", "# scipy's expon takes the mean time between events as scale (2.5 hours = 1 / 0.4).\n", "# The second positional argument is loc, so scale is passed by keyword.\n", "# Print probability response takes < 1 hour (rounded to 4 decimals)\n", "print(round(expon.cdf(1, scale=2.5), 4))\n", "\n", "# Print probability response takes > 4 hours\n", "print(round(1 - expon.cdf(4, scale=2.5), 4))\n", "\n", "# Print probability response takes 3-4 hours\n", "print(round(expon.cdf(4, scale=2.5) - expon.cdf(3, scale=2.5), 4))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }