{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "The formula for [sample variance](https://en.wikipedia.org/wiki/Variance#Sample_variance):\n", " $$s_n^2 = \\frac{1}{n-1}\\sum (x_i-\\bar{x})^2$$\n", " has that funny $n-1$ in the denominator.\n", " \n", "The n-1 is referred to as [Bessel's correction](https://en.wikipedia.org/wiki/Bessel%27s_correction).\n", "The usual explanation involves vague terms such as [degrees of freedom](https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics%29) which always sounded flaky to me." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Let us first check the n-1 by experiment" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "f (generic function with 1 method)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function f(n)\n", " x = randn(n)\n", " norm(x-mean(x))^2\n", "end" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10.00378620254928" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n=11\n", "mean([f(n) for i=1:1_000_000])" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.9965121482424095" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n=5\n", "mean([f(n) for i=1:1_000_000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. A few facts about randn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "randn(n) is an n-vector of independent standard normals.\n", "\n", "If Q is any orthgonal matrix, $Q*$randn(n) is also an n-vector of independent standard normals.\n", "There is no mathematical way to distinguish randn(n) from $Q*$randn(n). This is because the\n", "probability function is proportional to $e^{-\\|x\\|^2/2}$, i.e., it only depends on the length of x, not\n", "the direction.\n", "\n", "Also the expected value of randn(1)^2 is 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Linear Algebra makes n-1 easy to understand" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider the projection matrix $P=I-1/n$. The matrix-vector product $Px$ computes x-mean(x)." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4×4 Array{Rational{Int64},2}:\n", " 3//4 -1//4 -1//4 -1//4\n", " -1//4 3//4 -1//4 -1//4\n", " -1//4 -1//4 3//4 -1//4\n", " -1//4 -1//4 -1//4 3//4" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# example \n", "n = 4\n", "P = eye(Int,n) - 1//n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we write the eigendecomposition $P=Q\\Lambda Q'$, then $\\Lambda$ has one diagonal entry (say the first) $0$ and the\n", "rest $1$.\n", "
\n", "Therefore if x=randn(n) so is Qx as a random variable, and $$\\|PQx\\|^2 = \\|Q\\Lambda x\\|^2 = \\|\\Lambda x\\|^2=x_2^2 +\\ldots+x_n^2 $$ which is obviously n-1 in expectation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.2", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }