{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Derivative-free proof that the median minimizes the sum of absolute differences\n", "\n", "\n", " #### [Back to main page](https://petrosyan.page/fall2020math3215)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Definition (Median)**\n", "\n", "\n", "
\n", "\n", "\n", "Let $D=\\{x_1,\\dots, x_n\\}$ be a set of values with $x_1\\leq \\cdots\\leq x_n$.\n", "The median of the set is defined to be the following number:\n", "1. If $n$ is odd and $n=2k+1$ then \n", "\n", "$$\\text{Median}(D)=x_{k+1}.$$\n", "\n", "2. If $n$ is even and $n=2k$ then\n", "\n", "$$\\text{Median}(D)=\\frac{x_{k}+x_{k+1}}{2}.$$\n", "\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Theorem.**\n", "\n", "\n", "
\n", " \n", " \n", "The minimum of the sum\n", "\n", "$$g(c)=\\sum\\limits_{i=1}^n|x_i-c|$$\n", "\n", "is attained at the median. \n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Proof.**\n", "\n", "Let us take the case of only two points $D=\\{x_1, x_2\\}$. Using the triangle inequality \n", "\n", "$$|x_1-c|+|x_2-c|\\geq |x_1-x_2|=x_2-x_1.$$\n", "\n", "Moreover, for any $c\\in [x_1,x_2]$, \n", "\n", "$$|x_1-c|+|x_2-c|=(c-x_1)+(x_2-c)=x_2-x_1.$$\n", "\n", "So any point in $[x_1,x_2]$ minimizes $|x_1-c|+|x_2-c|$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **When n is even**\n", "\n", "Then rewrite \n", "\n", "$$\\sum\\limits_{i=1}^{2k}|x_i-c|= \\color{blue}{(|x_1-c|+|x_{2k}-c|)}+\\color{blue}{(|x_2-c|+|x_{2k-1}-c|)}+\\cdots + \\color{blue}{(|x_{k}-c|+|x_{k+1}-c|)}$$\n", "\n", "
\n", "\n", "\n", "According to the discussion above the first term is minimal when $c\\in [x_1, x_{2k}]$, the second term is minimal when $c\\in [x_2, x_{2k-1}]$ and so on. Thus, when $c\\in [x_k,x_{k+1}]$ all the terms have their smallest values. In particular, the median is in that interval: $\\frac{x_{k}+x_{k+1}}{2}\\in [x_k,x_{k+1}]$ so we have that the median the sum takes its minimal value. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **When n is odd**\n", "\n", "Then rewrite \n", "\n", "$$\\sum\\limits_{i=1}^{2k+1}|x_i-c|= \\color{blue}{(|x_1-c|+|x_{2k+1}-c|)}+\\color{blue}{(|x_2-c|+|x_{2k}-c|)}+\\cdots + \\color{blue}{(|x_{k}-c|+|x_{k+2}-c|)}+\\color{green}{|x_{k+1}-c|}.$$\n", "\n", "
\n", "\n", "\n", "Again, according to the discussion above the first term is minimal when $c\\in [x_1, x_{2k+1}]$, the second term is minimal when $c\\in [x_2, x_{2k}]$ and so on. Fiinally, the last term $|x_{k+1}-c|$ is minimal when $c=x_{k+1}$. Thus, when $c=x_{k+1}$ all the terms have their smallest values since it belongs to all the above intervals. And we concude thay the minimal value of their sum is at the median. $\\blacksquare$\n", "\n", "A more general proof of the theorem can be found in \"The theory of probability\" by Gnedenko (p. 190) among other places. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }