{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# First time here?\n", "Please see our [demo](ReadMeFirst.ipynb) on how to use the notebooks.\n", "\n", "___\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Why is Everything a Gaussian?\n", "\n", "In a nutshell, the central limit theorem states that the sample average of a sequence of random variables will converge towards their expected values. The distribution of sample averages, for a sufficiently large sample size, will approach that of a normal distribution.\n", "\n", "This notebook will illustrate this principal with an example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's import required libraries for Python..." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "from ipywidgets import interact, interactive, fixed, interact_manual\n", "import ipywidgets as widgets\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's define a function to generate some random samples. We're going to use the Poisson distribution here, but you can (and should) try others. The function will take a number of tries (or replicas) and the number of samples to draw from the distribution. It will then show a histogram of the mean values from each replica." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def randomSample(number_of_measurements, points_per_measurement):\n", " means = []\n", " for i in range(number_of_measurements):\n", " samples = np.random.poisson(2, points_per_measurement)\n", " means.append(np.mean(samples))\n", "\n", " count, bins, ignores = plt.hist(means, 30, density=True)\n", " \n", " mu = np.mean(means)\n", " sigma = np.std(means)\n", " \n", " plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( -(bins-mu)**2 / (2*sigma**2)),\n", " linewidth=2, color='r')\n", " plt.show()\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start by setting the number of measurements to be large. In that case, if the number of points per measurement is 1, you'll get back a good approximation of the underlying distribution (Poisson). However, as the number of data points in each measurement increases, the resulting probability distribution will shift and eventually become a Gaussian. \n", "\n", "On the other hand, try setting the number of measurements to be small. Now, regardless of how many points you use per measurement, the resulting distribution may or may not look particularly Gaussian. \n", "\n", "Play with the two sliders (each move of the sliders creates a brand new data set!) to get a feel for how the data behaves..." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "