{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "uoP34MT3umZC" }, "source": [ "# Stochastic simulation helps you grasp concepts of statistics\n", "## Dr. Tirthajyoti Sarkar ([LinkedIn](https://www.linkedin.com/in/tirthajyoti-sarkar-2127aa7/), [Github](https://github.com/tirthajyoti)), Fremont, CA, July 2020\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulation helps distill concepts\n", "### Grasping statistical concepts can be hard\n", "Do you find the concepts of statistical analysis (law of large numbers, expectation value, confidence interval, p-value) somewhat difficult and troublesome to grasp?\n", "\n", "You are not alone.\n", "\n", "Our brains and psyche have not evolved to deal with rigorous statistical methods. In fact, [a study](https://www.sciencedaily.com/releases/2018/10/181012082713.htm) of why people struggle to solve statistical problems reveals a preference for complicated rather than simpler, more intuitive solutions, which often leads to failure in solving the problem altogether.\n", "\n", "We are good with a small set of numbers. The short-term working memory of the human brain holds only around [7–8 items/numbers](https://human-memory.net/short-term-working-memory/).\n", "\n", "Therefore, whenever a process unfolds on a scale of thousands or millions of events, we tend to lose our grasp on the 'inherent nature' of that process. 
The laws and patterns that manifest only in the limit of large numbers seem random and meaningless to us.\n", "\n", "Statistics deals with large numbers, and a great many theories and results in statistical modeling and analysis are valid only in the limit of large numbers.\n", "\n", "![frustrated-at-stat](https://raw.githubusercontent.com/tirthajyoti/Stats-Maths-with-Python/master/images/Frustrated-at-stat.png)\n", "\n", "### Data science/Machine learning is rooted in statistics - what to do?\n", "In this era of data science and machine learning, where knowledge of the core statistical concepts is considered essential for success, this can be worrisome for data science practitioners and for folks who are on their way to learning the trade.\n", "\n", "But do not despair. There is a surprisingly easy way to tackle this, and it is called 'simulation', in particular discrete, stochastic, event-based simulation.\n", "\n", "## Let me show you the simplest possible example\n", "\n", "![dice](https://cdn.pixabay.com/photo/2016/09/08/18/45/cube-1655118_1280.jpg)\n", "\n", "Suppose we are throwing a fair dice with 6 possible faces, numbered 1 to 6. The outcome of a throw, i.e. the face value taken from the set {1,2,3,4,5,6}, is represented by a random variable $X$. In a formal setting, the so-called 'expectation value' (denoted by $\\text{E}[X]$) of a random variable $X$ is given by,\n", "\n", "$$\\text{E}[X] = \\sum_{x} x\\,f(x) \\ \\ \\text{(discrete } X\\text{)}, \\qquad \\text{E}[X] = \\int_{-\\infty}^{\\infty} x\\,f(x)\\,dx \\ \\ \\text{(continuous } X\\text{)}$$\n", "\n", "where $f(x)$ is the probability mass function (PMF) or probability density function (PDF) of $X$, i.e. the mathematical function that describes the distribution of the possible values that $X$ can assume.\n", "\n", "For a dice throw, the random variable $X$ is discrete, i.e. it can assume only discrete values, so it has a PMF (and not a PDF). And it is a very simple PMF,\n", "\n", "$$f(x) = \\frac{1}{6}, \\quad x \\in \\{1,2,3,4,5,6\\}$$\n", "\n", "This is because the random variable has a 'uniform probability distribution' over the sample space {1,2,3,4,5,6}, i.e. 
any dice throw can result in any one of these values, completely at random and without any bias towards a particular value. Therefore, the expected value is,\n", "\n", "$$\\text{E}[X] = \\sum_{x} x\\,f(x) = \\frac{1}{6}\\cdot(1+2+3+4+5+6)=\\frac{21}{6}=3.5$$\n", "\n", "So, as per theory, this is the expected value of the dice-throwing process.\n", "\n", "**Is it the most probable value?** No. A dice does not even have a face with 3.5! So, what is the meaning of this quantity?\n", "\n", "**Is it some kind of probability?** No. The value is clearly greater than 1, and probability values always lie between 0 and 1.\n", "\n", "**Does it mean we can expect the face to turn up either 3 or 4 most of the time (3.5 is the average of 3 and 4)?** No. The PMF tells us that all the faces are equally likely to turn up.\n", "\n", "Fortunately, the answer is provided by a fundamental tenet of statistics, the Law of Large Numbers, which says that, as the number of throws grows, the average of the observed values converges to the expected value. In other words, in the long run, the expected value is simply the average of all the values that the random variable takes.\n", "\n", "Notice the phrase \"_in the long run_\". How do we verify this? Can we simulate such a scenario?\n", "\n", "Sure we can. Simple Python code can help us simulate the scenario and verify the Law of Large Numbers."
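, "\n", "\n", "Before running the code, a short worked calculation, using standard results for independent throws, suggests how quickly the average should settle near 3.5. The variance of a single throw is\n", "\n", "$$\\text{Var}[X] = \\text{E}[X^2] - \\text{E}[X]^2 = \\frac{1}{6}\\cdot(1+4+9+16+25+36) - 3.5^2 = \\frac{91}{6} - \\frac{49}{4} = \\frac{35}{12} \\approx 2.917$$\n", "\n", "and, for the average $\\bar{X_n}$ of $n$ independent throws, the variance shrinks by a factor of $n$, so that\n", "\n", "$$\\text{SD}[\\bar{X_n}] = \\sqrt{\\frac{35}{12\\,n}} \\approx \\frac{1.708}{\\sqrt{n}}$$\n", "\n", "For $n = 5$ throws, the typical deviation of the average from 3.5 is therefore about 0.76, while for $n = 1000$ it is only about 0.054. This $1/\\sqrt{n}$ shrinkage is one way to make the phrase \"_in the long run_\" concrete."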
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "zvw2bqmQumZD" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `dice` array and a simple dice-throwing function" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "dice = np.array([1,2,3,4,5,6])\n", "\n", "def dice_throw(dice):\n", "    \"\"\"\n", "    Simulates a single dice throw by picking one face uniformly at random\n", "    \"\"\"\n", "    return np.random.choice(dice)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Throw the dice a few times" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Here are 10 throws...\n", "3,3,5,1,3,2,3,1,2,6,\n", "\n", "Here are 10 more throws...\n", "3,6,4,2,3,3,5,2,6,3," ] } ], "source": [ "print(\"Here are 10 throws...\")\n", "for i in range(10):\n", "    print(dice_throw(dice),end=',')\n", "\n", "print(\"\\n\\nHere are 10 more throws...\")\n", "for i in range(10):\n", "    print(dice_throw(dice),end=',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate for a long time" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "av = []\n", "n_throws = []\n", "# For each sample size i, throw the dice i times and record the average of those throws\n", "for i in [5,10,15,20,25,50,75,100,150,200,250,500,750,1000]:\n", "    throws = []\n", "    for j in range(i):\n", "        throws.append(dice_throw(dice))\n", "    mean = np.array(throws).mean()\n", "    av.append(mean)\n", "    n_throws.append(i)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(8,5))\n", "plt.title(\"Average of n dice throws\",fontsize=18)\n", "plt.plot(n_throws,av,color='blue',marker='o',markersize=10)\n", "# Dashed line: the theoretical expected value E[X] = 3.5\n", "plt.hlines(y=3.5,xmin=0,xmax=1100,linestyle='--',lw=3)\n", "plt.xticks(fontsize=15)\n", "plt.yticks(fontsize=15)\n", "plt.xlabel(\"Number of dice throws (n)\",fontsize=15)\n", "plt.ylabel(\"Average of the throws\",fontsize=15)\n", "plt.grid(True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "2TTVgVHpumZG" }, "source": [ "## Confidence interval\n", "\n", "### Some essential definitions\n", "\n", "***Population***: The whole collection whose properties we want to measure. We can (almost) never gather enough data about the whole population. Therefore, we (almost) never know the true values of the population properties.\n", "\n", "***Sample***: A subset of data from the population, which we can actually gather and which helps us estimate the properties of the population. Because we cannot measure the true values of the population properties, we can only estimate them. This is the central job of statisticians.\n", "\n", "***Statistic***: A statistic is a function of a sample. It is a random variable because every time you take a new sample (from the same population) you will get a new value for the statistic. Examples are the sample mean and the sample variance defined below, which are unbiased estimators of the population mean and the population variance, respectively.\n", "\n", "$$\\bar{X_n} = \\frac{1}{n}\\sum_{i=1}^{n}X_i, \\ \\ S_n^2=\\frac{1}{n-1}\\sum_{i=1}^{n}(X_i-\\bar{X_n})^2$$\n", "\n", "The sample standard deviation is $S_n = \\sqrt{S_n^2}$.\n", "\n", "### Confidence interval\n", "\n", "A confidence interval is a range (a lower and an upper bound) around a statistic of our choice. We need these bounds to quantify the uncertainty that comes from the random nature of our sampling. 
Let's clarify this further with the example of the confidence interval for the mean.\n", "\n", "Depending on where and how we draw the sample, it may or may not be a good representation of the population. So, if we **repeat the process of drawing the sample many times** and construct an interval around each sample mean, in some cases the interval will contain the true mean of the population, and in other cases it will miss it.\n", "\n", "**Can we say anything about the proportion of samples whose interval contains the true mean?**\n", "\n", "The answer to this question is found in the confidence interval. If certain assumptions are met (for example, independent observations from a roughly normal population), we can construct an interval that, over a large number of repeated samples, contains the true mean a chosen fraction of the time.\n", "\n", "The necessary formulas are given below. We won't get into the details of these formulas or of why the t-distribution in particular is used. Readers can refer to any undergraduate-level statistics text or to the many excellent online resources to understand the rationale.\n", "\n", "**Source**: https://psu.instructure.com/courses/1844486/pages/chapter-3-confidence-intervals\n", "\n", "Confidence intervals are a calculated range or boundary around a statistic, used to estimate a population parameter, that is supported mathematically with a certain level of confidence.\n", "\n", "For a 95% confidence level, this is *__different__* from having a 95% probability that the true population proportion lies within our particular confidence interval. Rather, if we were to repeat this process many times, about 95% of our calculated confidence intervals would contain the true proportion.\n", "\n", "The equation to create a confidence interval can also be written as:\n", "\n", "$$\\text{Sample Proportion or Mean} \\pm (\\text{multiplier} \\times \\text{Standard Error})$$\n", "\n", "where the multiplier is typically a z-value for a proportion and a t-value for a mean. The _Standard Error_ is calculated differently for a proportion and for a mean:\n", "\n", "$$\\text{Standard Error for Proportion} = \\sqrt{\\frac{\\text{Sample Proportion} \\times (1 - \\text{Sample Proportion})}{\\text{Number of Observations}}}$$\n", "\n", "$$\\text{Standard Error for Mean} = \\frac{\\text{Standard Deviation}}{\\sqrt{\\text{Number of Observations}}}$$\n", "\n", "Therefore, the $100(1-\\alpha)\\%$ C.I. (**C**onfidence **I**nterval) for the mean is given by,\n", "\n", "$$ C.I. = \\bar{X_n} \\pm t_{\\alpha/2,\\,n-1}\\cdot\\frac{S_n}{\\sqrt{n}} $$\n", "\n", "where $\\bar{X_n}$ is the sample mean, $S_n$ is the sample standard deviation, $n$ is the sample size (number of observations), and $t_{\\alpha/2,\\,n-1}$ is the critical value of the t-distribution with $(n-1)$ degrees of freedom that leaves an area of $\\alpha/2$ in the upper tail, i.e. its $(1-\\alpha/2)$ quantile.\n", "\n", "### What is the practical utility?\n", "To understand the true practical utility of the confidence interval, be careful about its definition and the process behind it.\n", "\n", "When you calculate a 95% confidence interval of the mean, you are not calculating any probability (0.95 or otherwise). You are calculating two specific numbers (lower and upper bounds around the sample mean). The 95% describes the procedure: if we were to repeat the sampling process many times, about 95% of the intervals constructed this way would contain the (unknown) true population mean.\n", "Here lies the practical utility. We are not repeating the process. 
We are just drawing the sample once and constructing this range.\n", "\n", "If we could repeat the process a million times, we would be able to verify the claim that the true mean lies inside such a range in about 95% of the cases.\n", "\n", "But sampling a million times can be quite expensive, or downright impossible, in real life. So, the theoretical calculation of the confidence interval provides us with the lower and upper bounds from just one draw of the sample. This is amazing, isn't it?\n", "\n", "### But in the simulation, we can experiment a million times!\n", "Yes, simulation is fantastic. We can repeat the sampling process a huge number of times and verify the claim that our theoretical confidence interval contains the population mean approximately the stated fraction of the time (below, we check 90% and 99% intervals).\n", "\n", "Let's verify it using a real-life example of factory production. Let's say that in a factory, a certain machine produces 20 tons of product per week on average, with a standard deviation of 5 tons. These are the true population mean and standard deviation. So, assuming the weekly production is normally distributed, we can write simple Python code to generate a typical production run over a year (52 weeks) and plot it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": {}, "colab_type": "code", "id": "40xYmAb_umZG" }, "outputs": [], "source": [ "num_weeks = 52\n", "# Weekly production (tons): normal with mean 20 and standard deviation 5\n", "production = np.random.normal(loc=20,scale=5,size=num_weeks)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 288 }, "colab_type": "code", "executionInfo": { "elapsed": 1635, "status": "ok", "timestamp": 1567211402670, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "BGgac_VsumZI", "outputId": "63758a0d-a59f-403b-d367-2e54187ed75c" }, "outputs": [], "source": [ "plt.figure(figsize=(10,4))\n", "plt.title(\"Typical factory production over a year\",fontsize=16)\n", "plt.plot(production,c='blue',lw=2,marker='o',markersize=10)\n", "plt.grid(True)\n", "plt.xlabel(\"Weeks\",fontsize=15)\n", "plt.ylabel(\"Production (tons)\",fontsize=15)\n", "# Dashed red line: the sample mean of this year's production\n", "plt.hlines(y=production.mean(),xmin=-2,xmax=54,color='red',linestyle='--',lw=3)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sample mean" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": {}, "colab_type": "code", "id": "hMisk5v3umZM" }, "outputs": [], "source": [ "n = len(production)\n", "m = production.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sample standard deviation" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1506, "status": "ok", "timestamp": 1567211402671, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "uXeImgLCumZO", "outputId": "e4c0ae30-25ca-4497-92c1-f4d42e6a4f1f" }, "outputs": [ { "data": { "text/plain": [ "4.919252299560373" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ddof=1 divides by (n-1), matching the definition of S_n\n", "production.std(ddof=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sample standard error" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1496, "status": "ok", "timestamp": 1567211402671, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "rNII4IARumZQ", "outputId": "2daf79cc-ee3e-476d-af24-2f0b93011c2e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.6821775539618872\n" ] } ], "source": [ "# Standard error of the mean: S_n / sqrt(n)\n", "std_err=production.std(ddof=1)/np.sqrt(n)\n", "print(std_err)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 90% confidence interval" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1491, "status": "ok", "timestamp": 1567211402672, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "tjCO4IQBumZU", "outputId": "5a599449-94f2-4646-e55e-6be4bae00b0e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "90% confidence interval of mean from 18.19130937896271 to 20.47618007932221\n" ] } ], "source": [ "confidence = 0.9\n", "# Critical t-value with (n-1) degrees of freedom\n", "h = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)\n", "i90 =[m-h,m+h]\n", "print(\"90% confidence interval of mean from \",m-h,\" to \",m+h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 99% confidence interval" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1484, "status": "ok", "timestamp": 1567211402672, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "Ww8G6CVmumZX", "outputId": "4b4a7bdf-33c7-4a7a-8a78-00f811ae6f72" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99% confidence interval of mean from 17.509783661041904 to 21.157705797243015\n" ] } ], "source": [ "confidence = 0.99\n", "# Critical t-value with (n-1) degrees of freedom\n", "h = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)\n", "i99 =[m-h,m+h]\n", "print(\"99% confidence interval of mean from \",m-h,\" to \",m+h)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1477, "status": "ok", "timestamp": 1567211402672, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "TUkK7WPfumZZ", "outputId": "75861ce1-7970-419a-e3cc-d404daddd87c" }, "outputs": [ { "data": { "text/plain": [ "(18.19130937896271, 20.47618007932221)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "i90[0],i90[1]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1471, "status": "ok", "timestamp": 1567211402673, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "mWojRg6PumZb", "outputId": "2721d35c-0322-4180-c406-0aa4f712eec2" }, "outputs": [ { "data": { "text/plain": [ "(17.509783661041904, 21.157705797243015)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "i99[0],i99[1]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yyikuQkG6LnN" }, "source": [ "### Repeat the random process many times" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": {}, "colab_type": "code", "id": "pQyXDKYhumZd" }, "outputs": [], "source": [ "def repeat(n):\n", "    \"\"\"\n", "    Simulates the factory run `n` number of times.\n", "    Counts how often the population mean (i.e. 20) is contained in the C.I.\n", "    \"\"\"\n", "    interval_90_count = 0\n", "    interval_99_count = 0\n", "    num_weeks = 52\n", "\n", "    for i in range(n):\n", "        production = np.random.normal(loc=20,scale=5,size=num_weeks)\n", "        m = production.mean()\n", "        std_err = production.std(ddof=1)/np.sqrt(num_weeks)\n", "        # For 90% C.I. (t critical value with num_weeks-1 degrees of freedom)\n", "        confidence = 0.9\n", "        h = std_err * stats.t.ppf((1 + confidence) / 2, num_weeks - 1)\n", "        if m-h <= 20 <= m+h:\n", "            interval_90_count+=1\n", "        # For 99% C.I.\n", "        confidence = 0.99\n", "        h = std_err * stats.t.ppf((1 + confidence) / 2, num_weeks - 1)\n", "        if m-h <= 20 <= m+h:\n", "            interval_99_count+=1\n", "    return (interval_90_count,interval_99_count)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": {}, "colab_type": "code", "id": "HTb5JIMwumZf" }, "outputs": [], "source": [ "repetitions = 10000\n", "int_90,int_99 = repeat(repetitions)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1458, "status": "ok", "timestamp": 1567211402674, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "DtUlX-6fumZh", "outputId": "0bb67e2a-9903-493a-a2a4-3677fe94b8c3" }, "outputs": [ { "data": { "text/plain": [ "0.898" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "round(int_90/repetitions,3)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1452, "status": "ok", "timestamp": 1567211402674, "user": { "displayName": "Tirthajyoti Sarkar", "photoUrl": "https://lh3.googleusercontent.com/a-/AAuE7mD6d7dlMqdpzL4sermIF1ujmpSRxY2WnE4tuB-UsQ=s64", "userId": "01914075970409030121" }, "user_tz": 420 }, "id": "VrJpX6L8umZk", "outputId": "a6ddbf00-a448-46bd-b2b4-630e90193254" }, "outputs": [ { "data": { "text/plain": [ "0.99" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "round(int_99/repetitions,3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary and thoughts for simulation\n", "In this notebook, we demonstrated the power of simulation for understanding concepts of statistical estimation, such as the expected value and the confidence interval. In reality, we rarely get the chance to repeat a statistical experiment thousands of times, but we can simulate the process on a computer, which helps us distill these concepts in a clear and intuitive manner.\n", "\n", "Once you master the art of simulating a stochastic event, you have a new weapon of analysis for investigating the properties of random variables and the sometimes esoteric statistical theory behind them.\n", "\n", "For example, you can use stochastic simulation to investigate\n", "\n", "- The convergence of the distribution of the mean of many independent stochastic events to a Normal distribution (verifying the Central Limit Theorem by numerical experiment)\n", "- What happens when you mix or transform several statistical distributions in one way or another? What kind of resulting distributions do you get?\n", "- If a stochastic event does not follow the theoretical assumptions, what kind of aberrant behavior can you get in the result? 
In such cases, simulation may be your only friend, because the standard theory may no longer hold when its assumptions are not met.\n", "- What kind of statistical properties emerge from the operation of a Deep Learning network?\n", "\n", "For learning the foundational principles of data science and machine learning, the importance of these kinds of exercises cannot be overemphasized.\n", "\n", "![simulation-ds](https://raw.githubusercontent.com/tirthajyoti/Stats-Maths-with-Python/master/images/Simulation-problems-DS.png)" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "Conf_inv_sampling_hypothesis.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }