{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np;\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "np.random.seed(1337)\n",
    "\n",
    "kwargs = {'linewidth' : 3.5}\n",
    "font = {'weight' : 'normal', 'size'   : 24}\n",
    "matplotlib.rc('font', **font)\n",
    "\n",
    "def error_plot(ys, yscale='log'):\n",
    "    plt.figure(figsize=(8, 8))\n",
    "    plt.xlabel('Step')\n",
    "    plt.ylabel('Error')\n",
    "    plt.yscale(yscale)\n",
    "    plt.plot(range(len(ys)), ys, **kwargs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$\\LaTeX \\text{ commands here}\n",
    "\\newcommand{\\R}{\\mathbb{R}}\n",
    "\\newcommand{\\im}{\\text{im}\\,}\n",
    "\\newcommand{\\norm}[1]{||#1||}\n",
    "\\newcommand{\\inner}[1]{\\langle #1 \\rangle}\n",
    "\\newcommand{\\span}{\\mathrm{span}}\n",
    "\\newcommand{\\proj}{\\mathrm{proj}}\n",
    "\\newcommand{\\OPT}{\\mathrm{OPT}}\n",
    "$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<hr style=\"border: 5px solid black\">\n",
    "\n",
    "**Georgia Tech, CS 4540**\n",
    "\n",
    "# L8: Duality, Convexity, & Postive Semidefiniteness\n",
    "\n",
    "Jake Abernethy & Benjamin Bray\n",
    "\n",
    "*Thursday, September 13, 2018*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Recall:  Linear Program in Standard Form\n",
    "\n",
    "<div style=\"padding:20px;margin:20px;border: 1px solid black\">\n",
    "\\begin{align}\n",
    "\\max_{x \\in \\R^n} &\\quad c^T x \\\\\n",
    "\\text{such that}  &\\quad Ax \\leq b & \\text{(only $\\leq$ constraints)} \\\\\n",
    "                  &\\quad  x \\geq 0 & \\text{(variables nonnegative)}\n",
    "\\end{align}\n",
    "</div>\n",
    "\n",
    "* Decision variables $x \\in \\R^n$\n",
    "* Linear objective $c^T x$ for $c \\in \\R^n$\n",
    "* Constraint matrix $A \\in \\R^{m \\times n}$ and vector $b \\in \\R^m$.  Each row corresponds to a constraint."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Recall:  Activity Analysis Problem\n",
    "\n",
    "A company has $m$ different resources which can be used to manufacture $n$ different products.\n",
    "\n",
    "* $x = (x_1,\\dots,x_n) \\in \\R^n$ is the amount of each product manufactured\n",
    "* $r = (r_1,\\dots,r_m) \\in \\R^m$ is the supply of each resource\n",
    "* $c = (c_1,\\dots,c_n) \\in \\R^n$ is the profit generated by each product\n",
    "* $A = [a_{ij}] \\in \\R^{m \\times n}$, where $a_{ij}$ is the amount of resource $i$ needed to make product $j$\n",
    "\n",
    "<div style=\"padding:10px;margin:20px;border: 1px solid black\">\n",
    "$$\\max_{x \\in \\R^n} \\quad c^T x \\quad\n",
    "\\text{s.t.} \\quad A x \\leq r \\quad\\text{and}\\quad x \\geq 0$$\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Problem:  Interpreting the Dual\n",
    "\n",
    "<div style=\"padding:20px;margin:20px;border: 1px solid black\">\n",
    "<b>Problem:</b>  Write down the dual linear program for the activity analysis problem.  What \"units\" do the dual variables need have?  How can they be interpreted?\n",
    "</div>\n",
    "\n",
    "> If it helps, assume the amount of each resource available is measured in *kilograms*, and our profit is measured in *euros* €.  Maybe we're manufacturing rope, so the amount of each product is measured in *meters*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Solution:  Interpreting the Dual (1)\n",
    "\n",
    "Using our rope factory example,\n",
    "\n",
    "* $r_i$ is measured in $(kg)$\n",
    "* $x_j$ is measured in $(m)$\n",
    "* $c_j$ is measured in $(€/m)$\n",
    "* $a_{ij}$ is measured in $(kg/m)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Solution:  Interpreting the Dual (2)\n",
    "\n",
    "The dual program has the constraint $y^T A \\geq c$, which has terms like $y_i a_{ij} \\geq c_j$.  The units are\n",
    "\n",
    "$$\n",
    "\\bigg( ? \\bigg) \\times \\bigg( \\frac{kg}{m} \\bigg) \\geq \\bigg( \\frac{€}{m} \\bigg)\n",
    "$$\n",
    "\n",
    "* For this to make sense, $y_i$ needs to have units $(€/kg)$ or **euros per kilogram**.\n",
    "* Then the dual objective $y^T r$ has units $(€/kg)\\times (kg) = (€)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Solution:  Interpreting the Dual (3)\n",
    "\n",
    "We decided that $y_i$ has units $(€/kg)$, and $y^T r$ has units $(€)$.\n",
    "* Imagine that we also have the option of **selling our resources** directly to a buyer, for a price of $y_i$ euros per kilogram, rather than turning them into rope.\n",
    "* This is only worth it to us if the resources used to make each rope are **more valuable** than the ropes themselves.  This is the dual constraint!\n",
    "* The buyer knows this, so he'll try to **minimize the amount he pays for our raw materials** by choosing as small $y_i$ as possible.\n",
    "\n",
    "<div style=\"padding:20px;margin:20px;border: 1px solid black\">\n",
    "$$\n",
    "\\begin{align}\n",
    "\\min_{y \\in \\R^m} &\\quad y^T r \\\\\n",
    "\\text{such that}  &\\quad y^T A \\geq c^T & \\text{(dominates objective)} \\\\\n",
    "                  &\\quad  y \\geq 0 & \\text{(nonnegative combinations)}\n",
    "\\end{align}\n",
    "$$\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Farkas Lemma (v2) and Strong Duality\n",
    "\n",
    "<b>Lemma.</b> (Farkas)  Let $A \\in \\R^{m \\times n}$ and $b \\in \\R^m$.  Then exactly one of the following two statements is true:\n",
    "1.  There exists $x \\in \\R^n$ such that $Ax \\geq b$ and $x \\geq 0$.\n",
    "2.  There exists $y \\in \\R^m$ such that $y^T A \\leq 0$ and $b^T y > 0$ and $y \\geq 0$\n",
    "\n",
    "#### Problem:  Strong Duality\n",
    "\n",
    "**Prove** that strong duality holds using the Farkas Lemma (v2). That is, the following two are *equal*:\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\max_{x \\in \\R^n} &\\quad c^T x & \\text{such that}  &\\quad A x \\leq b, \\quad x \\geq 0 \\\\\n",
    "= \\min_{y \\in \\R^m} &\\quad y^T b & \\text{such that}  &\\quad y^T A \\geq c^T, \\quad y \\geq 0 \\\\\n",
    "\\end{align}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Positive Semidefinite Matrices\n",
    "\n",
    "**Definition**: A matrix $M \\in \\R^{n\\times n}$ is *positive semidefinite* (PSD) if $x^\\top M x \\geq 0\\; \\forall x \\in \\R^n$\n",
    "\n",
    "Commonly, we work with matrices $M$ that are *symmetric* (that is, $M = M^\\top$). In this case, the following are equivalent:\n",
    "\n",
    "1. $M$ is PSD\n",
    "2. The eigenvalues of $M$ are all $\\geq 0$\n",
    "3. The matrix $M$ can be written as $B^\\top B$ for some $B \\in \\R^{n \\times n}$\n",
    "\n",
    "#### Problem\n",
    "Show that (3) $\\implies$ (1) $\\implies$ (2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Solution\n",
    "\n",
    "+ (3) $\\implies$ (1):  If $M = B^\\top B$ then for any $x$ we have\n",
    "$$x^\\top M x = x^\\top (B^\\top B) x = (x^\\top B^\\top) (B x) = (B x)^\\top (B x) = \\|B x\\|_2^2 \\geq 0$$\n",
    "+ (1) $\\implies$ (2): If $M$ is PSD, then for any vector $x$ we have $x^\\top M x \\geq 0$. Let's take an vector $v$ of $M$, with eigenvalue $\\lambda$, hence $M v = \\lambda v$. We need to show that $\\lambda \\geq 0$. Notice that $v \\ne 0$, hence, if we multiply on the left hand side by $v$, we use the fact that $M$ is PSD to get\n",
    "$$ 0 \\leq v^\\top M v = v^\\top(\\lambda v) = \\lambda v^\\top v = \\lambda \\| v\\|^2.$$\n",
    "The fact that $0 \\leq \\lambda \\| v \\|^2$ can only occur if $\\lambda \\geq 0$. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### PSD matrices\n",
    "\n",
    "We often use the notation $M \\succcurlyeq 0$ to denote that $M$ is PSD. The notation $M \\succcurlyeq N$ is equivalent to $M - N \\succcurlyeq 0$, i.e. $M - N$ is PSD.\n",
    "\n",
    "#### Problem A\n",
    "\n",
    "Let $M \\in \\R^{n \\times n}$ be an arbitrary symmetric matrix. Let $P \\in \\R$ be any nonsingular matrix. Show that\n",
    "$$M \\succcurlyeq 0 \\Longleftrightarrow P^\\top M P \\succcurlyeq 0$$\n",
    "\n",
    "#### Problem B\n",
    "\n",
    "Show that the set $S_+^n := \\{ M \\in \\R^{n\\times n}: M = M^\\top, M \\succcurlyeq 0 \\}$ forms a convex cone."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Solutions\n",
    "\n",
    "+ (Problem A)\n",
    "    + First assume that $M \\succcurlyeq 0$. Then $x^\\top M x \\geq 0$ for all $x$. Now take an arbitrary vector $y$, and observe that $y^\\top (P^\\top M P) y = (P y)^\\top M (P y)$. Of course, $P y$ is just some arbitrary vector, hence $(P y)^\\top M (P y) \\geq 0$ as desired\n",
    "    + For the second direction, assume that for all $y$ we have $y^\\top (P^\\top M P) y \\geq 0$. We need to show that for all $x$, $x^\\top M x \\geq 0$. Notice that $P$ is invertible, hence we can set $y = P^{-1} x$, and observe that\n",
    "    $$ 0 \\leq  (P^{-1} x)^\\top (P^\\top M P) (P^{-1} x) = x^\\top (P^{-1})^\\top P^\\top M P P^{-1} x = x^\\top M x$$\n",
    "    as desired.\n",
    "+ (Problem B) Let us quickly check convexity. Let $N,M \\in S_+^n$, and observe that for any $x$ we have $x^\\top M x \\geq 0$ and $x^\\top N x \\geq 0$. Hence, if we take a convex combination $\\theta M + (1-\\theta)$, with $\\theta \\in [0,1]$ we have\n",
    "$$ x^\\top (\\theta M + (1-\\theta) N) x = \\theta x^\\top M x + (1-\\theta) x^\\top N x$$\n",
    "which is nonnegative since both terms are nonnegative"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Convex Functions\n",
    "\n",
    "A function $f : \\R^d \\to \\R$ is convex if, for any $x,y \\in \\text{dom}(f)$ and any $\\theta \\in [0,1]$, we have\n",
    "$$f(\\theta x + (1-\\theta)y) \\leq \\theta f(x) + (1-\\theta) f(y)$$\n",
    "\n",
    "**First-order Condition** When $f$ is differentiable, then $f$ is convex if and only if\n",
    "$$f(y) \\geq f(x) + \\nabla f(x)^\\top(y - x) \\quad \\text{ for any } x,y \\in \\text{dom}(f)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Problem\n",
    "\n",
    "**First-order Condition** When $f$ is differentiable, then $f$ is convex if and only if\n",
    "$$f(y) \\geq f(x) + \\nabla f(x)^\\top(y - x) \\quad \\text{ for any } x,y \\in \\text{dom}(f)$$\n",
    "\n",
    "Use the first order condition to show that, for any convex and differentiable $f$, we have\n",
    "$$(\\nabla f(y) - \\nabla f(x))^\\top(y - x) \\geq 0 \\text{ for any } x,y \\in \\text{dom}(f)$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Answer\n",
    "\n",
    "Let's apply the first order condition twice, once at $x$ and once at $y$:\n",
    "\\begin{eqnarray*}\n",
    "f(y) & \\geq & f(x) + \\nabla f(x)^\\top(y - x) \\\\\n",
    "f(x) & \\geq & f(y) + \\nabla f(y)^\\top(x - y).\n",
    "\\end{eqnarray*}\n",
    "Add these two inequalities together, and subtract $f(x) + f(y)$ from both sides and you are done!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Problem\n",
    "\n",
    "**Second-order Condition** When $f$ is twice differentiable, then $f$ is convex if and only if\n",
    "the hessian satisfies $\\nabla^2f(x) \\succcurlyeq 0$.\n",
    "\n",
    "Find conditions under which the following function is convex $f(x) = \\frac 1 2 x^\\top A x$ for a symmetric matrix $A \\in \\R^{d \\times d}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Answer\n",
    "\n",
    "As we computed earlier in lecture, for $f(x) = \\frac 1 2 x^\\top A x$, the hessian is $\\nabla^2 f(x) = A$. We know that a twice-differentiable function is convex if its hessian is positive semi-definite. Therefore, $f$ is convex if and only if $A$ is positive semi-definite."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python [conda env:anaconda3]",
   "language": "python",
   "name": "conda-env-anaconda3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}