{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Inequalities and Identities" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import sympy\n", "from sympy import symbols\n", "sympy.init_printing(use_unicode=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## General-purpose Inequalities" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Arithmetic-Geometric Mean Inequality\n", "Let $\\{ a_j \\}_{j=1}^n$ and $\\{q_j\\}_{j=1}^n$ be sets of nonnegative numbers with\n", "$\\sum_j q_j = 1$.\n", "Then\n", "\\begin{equation*} \n", " \\prod_{j=1}^n a_j^{q_j} < \\sum_{j=1}^n q_j a_j.\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Cauchy-Schwarz Inequality\n", "\n", "Let $\\mathcal{X}$ be a real linear vector space, and let $x, y \\in \\mathcal{X}$. Then \n", "\\begin{equation*} \n", " | \\langle x, y \\rangle|^2 \\le \\langle x, x \\rangle \\langle y, y \\rangle.\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rearrangement Inequalities \n", "\n", "#### Rearrangements of sequences\n", "\n", "Let $x \\equiv (x_j)_{j=1}^n \\in {\\mathbb R}^n$.\n", "The _decreasing rearrangement_ of $x$, denoted $x^*$, has the same multiset of \n", "component values as $x$, but ordered so that\n", "\\begin{equation*} \n", "x_1^* \\ge x_2^* \\ge \\cdots \\ge x_n^*.\n", "\\end{equation*}\n", "\n", "The _increasing rearrangmement_ of $x$, denoted $x_*$, has the same multiset of component\n", "values as $x$, but ordered so that\n", "\\begin{equation*} \n", "x_{*1} \\le x_{*2} \\le \\cdots \\le x_{*n}.\n", "\\end{equation*}\n", "\n", "#### The Hardy-Littlewood Rearrangement Inequality for two vectors\n", "Suppose $x \\in \\mathbb{R}^n$ and $y \\in \\mathbb{R}^n$.\n", "Let $x \\cdot y \\equiv \\sum_{j=1}^n x_j y_j$.\n", "Then \n", "\\begin{equation*} \n", " x^* \\cdot y_* \\le x \\cdot y \\le x^* \\cdot y^*.\n", "\\end{equation*}\n", "\n", "#### The Symmetric Decreasing Rearrangement of a function\n", "\n", "Suppose $f: \\mathbb{R}^n \\rightarrow \\mathbb{R}^+$ has a finite integral. \n", "The _symmetric decreasing rearrangement_ $f^*$ of $f$ is a function such that:\n", "\n", "+ $f(x) = f(-x)$ (symmetry)\n", "+ $f(x) \\ge f(y)$ if $y \\ge x \\ge 0$ (decreasing)\n", "+ $\\int_{\\mathbb{R}^n} 1_{f(x) \\ge t} dx = \\int_{\\mathbb{R}^n} 1_{f^*(x) \\ge t} dx$ for all $t \\ge 0$ (the measure of the level sets is the same)\n", "\n", "\n", "#### The Hardy-Littlewood Rearrangement Inequality for two functions\n", "Suppose $f: \\mathbb{R}^{n} \\to \\mathbb{R}^{+}$ and $g: \\mathbb{R}^{n} \\to \\mathbb{R}^{+}$.\n", "Then \n", "\\begin{equation*} \n", " \\int_{\\mathbb{R}^n} f(x) g(x) dx \\le \\int_{\\mathbb{R}^n} f^*(x) g^*(x) dx.\n", "\\end{equation*}\n", "\n", "\n", "#### The Riesz Rearrangement Inequality for three functions\n", "Suppose $f: \\mathbb{R}^{n} \\to \\mathbb{R}^{+}$, $g: \\mathbb{R}^{n} \\to \\mathbb{R}^{+}$, and\n", "$g: \\mathbb{R}^{n} \\to \\mathbb{R}^{+}$.\n", "\n", "Then \n", "\\begin{equation*} \n", "\\int f(x)g(x-y)h(y)\\,dx\\,dy \\le \\int f^{*}(x)g^{*}(x-y)h^{*}(y)\\,dx\\,dy.\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stirling's inequalities\n", "\n", "\\begin{equation} \n", "e n^{n+1/2} e^{-n} \\ge n! \\ge \\sqrt{2 \\pi} n^{n+1/2} e^{-n}.\n", "\\end{equation}\n", "\n", "For $\\ell \\ge 1$ and $m \\ge 2$,\n", "\n", "\n", "\\begin{equation} \n", "{ {\\ell m } \\choose { \\ell }} \\ge \\frac{m^{m(\\ell-1)+1}}{\\sqrt{\\ell} (m-1)^{(m-1)(\\ell-1)}}.\n", "\\end{equation}\n", "\n", "\n", "## Entropy bounds on combinations\n", "\n", "\n", "\\begin{equation} \n", "\\frac{2^{nH(k/n)}}{n+1} \\le {n \\choose k} \\le 2^{nH(k/n)},\n", "\\end{equation}\n", "\n", "where $H(q) \\equiv -q \\log_2(q) - (1-q) \\log_2 (1-q)$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identities involving conditional expectation\n", "\n", "### Law of Iterated Expectation (Adam's Law, Law of Total Expectation) <a name=\"law_of_total_expectation\" />\n", "\n", "If $X$ and $Y$ are random variables with a joint distribution,\n", "\n", "\\begin{equation*} \n", " \\mathbb{E} X = \\mathbb{E}_Y \\mathbb{E}(X|Y).\n", "\\end{equation*}\n", "\n", "### Law of Total Variance (Eve's Law)\n", "\n", "If $X$ and $Y$ are random variables with a joint distribution,\n", "\n", "\\begin{equation*} \n", " Var(X) = \\mathbb{E}_Y Var(X|Y) + Var \\mathbb{E} (X|Y).\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Probability Inequalities\n", "This section follows Lugosi (2006) closely.\n", "\n", "### The tail-sum formula\n", "If $X$ is a nonnegative real-valued random variable,\n", "\\begin{equation*} \n", " {\\mathbb E} X = \\int_0^\\infty {\\mathbb{P}} \\{X \\ge x\\} dx\n", "\\end{equation*}\n", "\n", "### Jensen's Inequality\n", "If $\\phi$ is a convex function from ${\\mathcal X}$ to $\\mathbb{R}$, then $\\phi({\\mathbb E} X) \\le {\\mathbb E} \\phi(X)$.\n", "\n", "\n", "\n", "### Markov's, Chebychev's, and related inequalities\n", "From the tail-sum formula,\n", "\\begin{equation*} \n", " {\\mathbb E} X \\ge \\int_0^t {\\mathbb{P}} \\{X \\ge x\\} dx \\ge t {\\mathbb{P}} \\{X \\ge t \\}\n", "\\end{equation*}\n", "so\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ X \\ge t \\} \\le \\frac{{\\mathbb E} X}{t}.\n", "\\end{equation*}\n", "This is _Markov's Inequality_.\n", "\n", "Moreover, for any strictly monotonic function $f$ and nonnegative $X$,\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ X \\ge t \\} = {\\mathbb{P}} \\{ f(X) \\ge f(t) \\} \\le \\frac{{\\mathbb E} f(X)}{f(t)}.\n", "\\end{equation*}\n", "In particular, for any real-valued $X$ and real $q > 0$,\n", "$| X - {\\mathbb E} X |$ is a nonnegative\n", "random variable and $f(x) = x^q$ is strictly monotonic, so \n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{| X - {\\mathbb E} X | \\ge t \\} = {\\mathbb{P}} \\{ | X - {\\mathbb E} X |^q \\ge t^q \\} \\le\n", " \\frac{{\\mathbb E} | X - {\\mathbb E} X |^q}{t^q}.\n", "\\end{equation*}\n", "This is the _Generalized Markov Inequality_.\n", "\n", "_Chebychev's Inequality_ is a special case:\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ | X - {\\mathbb E} X |^2 \\ge t^2 \\} \\le\n", " \\frac{{\\mathbb E} | X - {\\mathbb E} X |^2}{t^2}\n", " = \\frac{{\\mbox{Var}} X}{t^2}.\n", "\\end{equation*}\n", "\n", "### Chernoff bounds\n", "If $X$ has a moment generating function (if ${\\mathbb{E}} e^{sX}$ exists for $s \\in [-a, a]$\n", "for some $a > 0$), we can apply the generalized Markov Inequality with $f(x) = e^{sx}$ for positive $s$:\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ X \\ge t \\} = {\\mathbb{P}} \\{ e^{sX} \\ge e^{st} \\} \\le \\frac{{\\mathbb E} e^{sX}}{e^{st}}\n", "\\end{equation*}\n", "for all $s \\in [-a, a]$.\n", "For any particular $X$, one can optimize this over $s$:\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ X \\ge t \\} = {\\mathbb{P}} \\{ e^{sX} \\ge e^{st} \\} \\le \\inf_{s \\in [-a, a]}\n", " \\frac{{\\mathbb E} e^{sX}}{e^{st}}.\n", "\\end{equation*}\n", "\n", "### Hoeffding's Inequality\n", "Let $\\{ X_j \\}_{j=1}^n$ be independent, and define \n", "$S_n \\equiv \\sum_{j=1}^n X_j$.\n", "Then, applying the Chernoff bound gives\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ S_n - {\\mathbb E} S_n \\ge t \\} \\le e^{-st} {\\mathbb E} e^{sS_n} =\n", " e^{-st} \\prod_{j=1}^n e^{s(X_j - E X_j)}.\n", "\\end{equation*}\n", "Hoeffding bounds the moment generating function for a bounded random variable\n", "$X$:\n", "If $\\mathbb{P} (X \\in [a, b]) = 1$, then\n", "\\begin{equation*} \n", " {\\mathbb E} e^{sX} \\le e^{s^2 (b-a)^2/8},\n", "\\end{equation*}\n", "from which follows _Hoeffding's tail bound_:\n", "\n", "If $\\{X_j\\}_{j=1}^n$ are independent and ${\\mathbb{P}} \\{a_j \\le X_j \\le b_j\\} = 1$,\n", "then\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ S_n - {\\mathbb E} S_n \\ge t \\} \\le e^{-2t^2/\\sum_{j=1}^n (b_j - a_j)^2}\n", "\\end{equation*} \n", "and\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ S_n - {\\mathbb E} S_n \\le -t \\} \\le e^{-2t^2/\\sum_{j=1}^n (b_j - a_j)^2}\n", "\\end{equation*} \n", "\n", "### Hoeffding's Other Inequality\n", "Suppose $f$ is a convex, continuous, real function and ${\\mathcal X}$ is a finite set.\n", "Let $\\{X_j \\}_{j=1}^n$ be a simple random sample from ${\\mathcal X}$ and\n", "let $\\{Y_j \\}_{j=1}^n$ be an iid uniform random sample (with replacement) from ${\\mathcal X}$.\n", "Then\n", "\\begin{equation*} \n", " {\\mathbb E} f \\left ( \\sum_{j=1}^n X_j \\right ) \\le {\\mathbb E} f \\left ( \\sum_{j=1}^n Y_j \\right ).\n", "\\end{equation*}\n", "\n", "### Bernstein's Inequality\n", "\n", "There are many Bernstein Inequalities, which are derived by applying Markov's inequality to random variables of\n", "the form $\\exp (\\lambda \\sum _{j=1}^n X_j )$ for suitable $\\lambda$. \n", "Chernoff bounds, Hoeffding's inequality, and Azuma's inequality are special cases.\n", "\n", "Here are two of Bernstein's inequalities.\n", "\n", "1. Suppose $\\{X_j \\}_{j=1}^n$ are independent with ${\\mathbb E} X_j = 0$ for all $j$,\n", "${\\mathbb{P}} \\{ | X_j | \\le c\\} = 1$,\n", "$\\sigma_j^2 = {\\mathbb E} X_j^2$ and $\\sigma = \\frac{1}{n} \\sum_{j=1}^n \\sigma_j^2$.\n", "Then for any $\\epsilon > 0$,\n", "\\begin{equation*} \n", " {\\mathbb{P}} \\{ S_n/n > \\epsilon \\} \\le e^{-n \\epsilon^2 / 2(\\sigma^2 + c\\epsilon/3)}.\n", "\\end{equation*}\n", "\n", "1. $\\{X_j \\}_{j=1}^n$ are independent with ${\\mathbb E} X_j = 0$ for all $j$.\n", "Suppose that for some $C>0$ and every integer $k \\ge 2$,\n", "\\begin{equation}\n", "\\mathbb{E} \\left| X_j^k \\right| \\le \\frac{C^{k-2} k!}{2} \\mathbb{E} X_i^{2}.\n", "\\end{equation}\n", "Then for all $t \\in \\left [ 0, (2C)^{-1} \\sqrt{\\sum \\mathbb{E} X_j^2} \\right ]$,\n", "\\begin{equation}\n", "\\mathbb{P} \\left( \\sum_{j=1}^n X_j \\ge 2t \\sqrt{\\sum \\mathbb {E} X_j^2} \\right) < e^{-t^2}.\n", "\\end{equation}\n", "\n", "There are also _empirical_ Bernstein bounds, which bound probabilities in terms of observed\n", "(sample) moments instead of the theoretical moments.\n", "The following example is from Maurer and Pontil, 1987 (Theorem 11):\n", "\n", "Let $X = (X_j)_{j=1}^n$ be a vector of independent random variables that take values in $[0, 1]$. \n", "Let $\\bar{X} := \\frac{1}{n} \\sum_j X_j$ and $V_n(X) := \\frac{1}{n(n-1)} \\sum_{i, j = 1}^n (X_i - X_j)^2/2$.\n", "Let $\\delta > 0$. \n", "Then with probability at least $1 − \\delta$ in $X$,\n", "\\begin{equation}\n", "\\mathbb{E} \\bar{X} \\le \\bar{X} + \\sqrt{ \\frac{2V_n (X) \\ln 2/\\delta}{n}} + \\frac{7 \\ln 2/\\delta}{3(n − 1)}.\n", "\\end{equation}\n", "\n", "### Gibbs' inquality\n", "\n", "Suppose the distributions $\\mathbb{P}$, $\\mathbb{Q}$, and $\\mathbb{R}$ on a common outcome space\n", "have densities $f_{\\mathbb{P}}$, $f_{\\mathbb{Q}}$, and $f_{\\mathbb{R}}$ with respect\n", "to the same dominating measure $\\mu$.\n", "Then\n", "\\begin{equation}\n", "\\mathbb{E}_\\mathbb{Q} \\log(f_{\\mathbb{Q}}/f_{\\mathbb{P}}) \\ge \\mathbb{E}_\\mathbb{Q} \\log(f_{\\mathbb{R}}/f_{\\mathbb{P}}).\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Martingales\n", "\n", "See [martingales](./martingales.ipynb) for more.\n", "\n", "A sequence of random variables $\\{ X_j \\}$ is a _martingale_ if \n", "\n", "+ $\\mathbb{E} |X_j| < \\infty$ $\\forall j$, and\n", "\n", "+ $\\mathbb{E} (|X_{j+1}| \\mid X_j) = X_j$.\n", "\n", "\n", "A sequence of random variables $\\{ X_j \\}$ is a martingale with respect to the sequence\n", "$\\{Y_j \\}$ if\n", "\n", "+ $\\mathbb{E} |X_j| < \\infty$ $\\forall j$, and\n", "\n", "+ $\\mathbb{E} (|X_{j+1}| \\mid Y_j) = X_j$.\n", "\n", "A _nonnegative martingale_ is a martingale for which $\\Pr \\{X_j \\ge 0 \\} = 1$ $\\forall j$.\n", "\n", "### Examples\n", "\n", "+ A gambler's fortune in a series of fair bets with the same stakes is a martingale: the expected fortune after the $j+1$st bet is the fortune after the $j$th bet.\n", "\n", "+ Likelihood ratios: Suppose $\\{Y_j \\}$ are IID with density $f$, and let $g$ be some other density. \n", "Define $X_j \\equiv \\prod_{i=1}^j g(X_i)/f(X_i)$. \n", "Then $(X_j)$ is a martingale with respect to $\\{Y_j\\}$.\n", "(More generaly, likelihood ratios are nonnegative martingales with respect to the distribution in the denominator.)\n", "\n", "\n", "### Ville's Inequality\n", "\n", "If $(X_j)$ is a nonnegative supermartingale, then\n", "\\begin{equation*} \n", " \\Pr \\{ \\sup_j X_j \\ge a \\} \\le \\mathbb{E} X_1/a.\n", "\\end{equation*}\n", "\n", "Ville's inequality is foundational for sequential testing.\n", "See [the SPRT](./sprt.ipynb), [the SPRT without replacement](./pSPRTnoReplacement.ipynb),\n", "and [nonparametric confidence sets](./nonparametricConfidence.ipynb)." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 4 }