{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np;\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "\n", "np.random.seed(1337)\n", "\n", "kwargs = {'linewidth' : 3.5}\n", "font = {'weight' : 'normal', 'size' : 24}\n", "matplotlib.rc('font', **font)\n", "\n", "def error_plot(ys, yscale='log'):\n", " plt.figure(figsize=(8, 8))\n", " plt.xlabel('Step')\n", " plt.ylabel('Error')\n", " plt.yscale(yscale)\n", " plt.plot(range(len(ys)), ys, **kwargs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\LaTeX \\text{ commands here}\n", "\\newcommand{\\R}{\\mathbb{R}}\n", "\\newcommand{\\im}{\\text{im}\\,}\n", "\\newcommand{\\norm}[1]{||#1||}\n", "\\newcommand{\\inner}[1]{\\langle #1 \\rangle}\n", "\\newcommand{\\span}{\\mathrm{span}}\n", "\\newcommand{\\proj}{\\mathrm{proj}}\n", "\\newcommand{\\OPT}{\\mathrm{OPT}}\n", "$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "\n", "**Georgia Tech, CS 4540**\n", "\n", "# L8: Duality, Convexity, & Postive Semidefiniteness\n", "\n", "Jake Abernethy & Benjamin Bray\n", "\n", "*Thursday, September 13, 2018*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Recall: Linear Program in Standard Form\n", "\n", "
\n", "\\begin{align}\n", "\\max_{x \\in \\R^n} &\\quad c^T x \\\\\n", "\\text{such that} &\\quad Ax \\leq b & \\text{(only $\\leq$ constraints)} \\\\\n", " &\\quad x \\geq 0 & \\text{(variables nonnegative)}\n", "\\end{align}\n", "
\n", "\n", "* Decision variables $x \\in \\R^n$\n", "* Linear objective $c^T x$ for $c \\in \\R^n$\n", "* Constraint matrix $A \\in \\R^{m \\times n}$ and vector $b \\in \\R^m$. Each row corresponds to a constraint." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Recall: Activity Analysis Problem\n", "\n", "A company has $m$ different resources which can be used to manufacture $n$ different products.\n", "\n", "* $x = (x_1,\\dots,x_n) \\in \\R^n$ is the amount of each product manufactured\n", "* $r = (r_1,\\dots,r_m) \\in \\R^m$ is the supply of each resource\n", "* $c = (c_1,\\dots,c_n) \\in \\R^n$ is the profit generated by each product\n", "* $A = [a_{ij}] \\in \\R^{m \\times n}$, where $a_{ij}$ is the amount of resource $i$ needed to make product $j$\n", "\n", "
\n", "$$\\max_{x \\in \\R^n} \\quad c^T x \\quad\n", "\\text{s.t.} \\quad A x \\leq r \\quad\\text{and}\\quad x \\geq 0$$\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Problem: Interpreting the Dual\n", "\n", "
\n", "Problem: Write down the dual linear program for the activity analysis problem. What \"units\" do the dual variables need have? How can they be interpreted?\n", "
\n", "\n", "> If it helps, assume the amount of each resource available is measured in *kilograms*, and our profit is measured in *euros* €. Maybe we're manufacturing rope, so the amount of each product is measured in *meters*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Solution: Interpreting the Dual (1)\n", "\n", "Using our rope factory example,\n", "\n", "* $r_i$ is measured in $(kg)$\n", "* $x_j$ is measured in $(m)$\n", "* $c_j$ is measured in $(€/m)$\n", "* $a_{ij}$ is measured in $(kg/m)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Solution: Interpreting the Dual (2)\n", "\n", "The dual program has the constraint $y^T A \\geq c$, which has terms like $y_i a_{ij} \\geq c_j$. The units are\n", "\n", "$$\n", "\\bigg( ? \\bigg) \\times \\bigg( \\frac{kg}{m} \\bigg) \\geq \\bigg( \\frac{€}{m} \\bigg)\n", "$$\n", "\n", "* For this to make sense, $y_i$ needs to have units $(€/kg)$ or **euros per kilogram**.\n", "* Then the dual objective $y^T r$ has units $(€/kg)\\times (kg) = (€)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Solution: Interpreting the Dual (3)\n", "\n", "We decided that $y_i$ has units $(€/kg)$, and $y^T r$ has units $(€)$.\n", "* Imagine that we also have the option of **selling our resources** directly to a buyer, for a price of $y_i$ euros per kilogram, rather than turning them into rope.\n", "* This is only worth it to us if the resources used to make each rope are **more valuable** than the ropes themselves. This is the dual constraint!\n", "* The buyer knows this, so he'll try to **minimize the amount he pays for our raw materials** by choosing as small $y_i$ as possible.\n", "\n", "
\n", "$$\n", "\\begin{align}\n", "\\min_{y \\in \\R^m} &\\quad y^T r \\\\\n", "\\text{such that} &\\quad y^T A \\geq c^T & \\text{(dominates objective)} \\\\\n", " &\\quad y \\geq 0 & \\text{(nonnegative combinations)}\n", "\\end{align}\n", "$$\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Farkas Lemma (v2) and Strong Duality\n", "\n", "Lemma. (Farkas) Let $A \\in \\R^{m \\times n}$ and $b \\in \\R^m$. Then exactly one of the following two statements is true:\n", "1. There exists $x \\in \\R^n$ such that $Ax \\geq b$ and $x \\geq 0$.\n", "2. There exists $y \\in \\R^m$ such that $y^T A \\leq 0$ and $b^T y > 0$ and $y \\geq 0$\n", "\n", "#### Problem: Strong Duality\n", "\n", "**Prove** that strong duality holds using the Farkas Lemma (v2). That is, the following two are *equal*:\n", "\n", "$$\n", "\\begin{align}\n", "\\max_{x \\in \\R^n} &\\quad c^T x & \\text{such that} &\\quad A x \\leq b, \\quad x \\geq 0 \\\\\n", "= \\min_{y \\in \\R^m} &\\quad y^T b & \\text{such that} &\\quad y^T A \\geq c^T, \\quad y \\geq 0 \\\\\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Positive Semidefinite Matrices\n", "\n", "**Definition**: A matrix $M \\in \\R^{n\\times n}$ is *positive semidefinite* (PSD) if $x^\\top M x \\geq 0\\; \\forall x \\in \\R^n$\n", "\n", "Commonly, we work with matrices $M$ that are *symmetric* (that is, $M = M^\\top$). In this case, the following are equivalent:\n", "\n", "1. $M$ is PSD\n", "2. The eigenvalues of $M$ are all $\\geq 0$\n", "3. The matrix $M$ can be written as $B^\\top B$ for some $B \\in \\R^{n \\times n}$\n", "\n", "#### Problem\n", "Show that (3) $\\implies$ (1) $\\implies$ (2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Solution\n", "\n", "+ (3) $\\implies$ (1): If $M = B^\\top B$ then for any $x$ we have\n", "$$x^\\top M x = x^\\top (B^\\top B) x = (x^\\top B^\\top) (B x) = (B x)^\\top (B x) = \\|B x\\|_2^2 \\geq 0$$\n", "+ (1) $\\implies$ (2): If $M$ is PSD, then for any vector $x$ we have $x^\\top M x \\geq 0$. Let's take an vector $v$ of $M$, with eigenvalue $\\lambda$, hence $M v = \\lambda v$. We need to show that $\\lambda \\geq 0$. Notice that $v \\ne 0$, hence, if we multiply on the left hand side by $v$, we use the fact that $M$ is PSD to get\n", "$$ 0 \\leq v^\\top M v = v^\\top(\\lambda v) = \\lambda v^\\top v = \\lambda \\| v\\|^2.$$\n", "The fact that $0 \\leq \\lambda \\| v \\|^2$ can only occur if $\\lambda \\geq 0$. \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### PSD matrices\n", "\n", "We often use the notation $M \\succcurlyeq 0$ to denote that $M$ is PSD. The notation $M \\succcurlyeq N$ is equivalent to $M - N \\succcurlyeq 0$, i.e. $M - N$ is PSD.\n", "\n", "#### Problem A\n", "\n", "Let $M \\in \\R^{n \\times n}$ be an arbitrary symmetric matrix. Let $P \\in \\R$ be any nonsingular matrix. Show that\n", "$$M \\succcurlyeq 0 \\Longleftrightarrow P^\\top M P \\succcurlyeq 0$$\n", "\n", "#### Problem B\n", "\n", "Show that the set $S_+^n := \\{ M \\in \\R^{n\\times n}: M = M^\\top, M \\succcurlyeq 0 \\}$ forms a convex cone." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Solutions\n", "\n", "+ (Problem A)\n", " + First assume that $M \\succcurlyeq 0$. Then $x^\\top M x \\geq 0$ for all $x$. Now take an arbitrary vector $y$, and observe that $y^\\top (P^\\top M P) y = (P y)^\\top M (P y)$. Of course, $P y$ is just some arbitrary vector, hence $(P y)^\\top M (P y) \\geq 0$ as desired\n", " + For the second direction, assume that for all $y$ we have $y^\\top (P^\\top M P) y \\geq 0$. We need to show that for all $x$, $x^\\top M x \\geq 0$. Notice that $P$ is invertible, hence we can set $y = P^{-1} x$, and observe that\n", " $$ 0 \\leq (P^{-1} x)^\\top (P^\\top M P) (P^{-1} x) = x^\\top (P^{-1})^\\top P^\\top M P P^{-1} x = x^\\top M x$$\n", " as desired.\n", "+ (Problem B) Let us quickly check convexity. Let $N,M \\in S_+^n$, and observe that for any $x$ we have $x^\\top M x \\geq 0$ and $x^\\top N x \\geq 0$. Hence, if we take a convex combination $\\theta M + (1-\\theta)$, with $\\theta \\in [0,1]$ we have\n", "$$ x^\\top (\\theta M + (1-\\theta) N) x = \\theta x^\\top M x + (1-\\theta) x^\\top N x$$\n", "which is nonnegative since both terms are nonnegative" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Convex Functions\n", "\n", "A function $f : \\R^d \\to \\R$ is convex if, for any $x,y \\in \\text{dom}(f)$ and any $\\theta \\in [0,1]$, we have\n", "$$f(\\theta x + (1-\\theta)y) \\leq \\theta f(x) + (1-\\theta) f(y)$$\n", "\n", "**First-order Condition** When $f$ is differentiable, then $f$ is convex if and only if\n", "$$f(y) \\geq f(x) + \\nabla f(x)^\\top(y - x) \\quad \\text{ for any } x,y \\in \\text{dom}(f)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Problem\n", "\n", "**First-order Condition** When $f$ is differentiable, then $f$ is convex if and only if\n", "$$f(y) \\geq f(x) + \\nabla f(x)^\\top(y - x) \\quad \\text{ for any } x,y \\in \\text{dom}(f)$$\n", "\n", "Use the first order condition to show that, for any convex and differentiable $f$, we have\n", "$$(\\nabla f(y) - \\nabla f(x))^\\top(y - x) \\geq 0 \\text{ for any } x,y \\in \\text{dom}(f)$$\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Answer\n", "\n", "Let's apply the first order condition twice, once at $x$ and once at $y$:\n", "\\begin{eqnarray*}\n", "f(y) & \\geq & f(x) + \\nabla f(x)^\\top(y - x) \\\\\n", "f(x) & \\geq & f(y) + \\nabla f(y)^\\top(x - y).\n", "\\end{eqnarray*}\n", "Add these two inequalities together, and subtract $f(x) + f(y)$ from both sides and you are done!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Problem\n", "\n", "**Second-order Condition** When $f$ is twice differentiable, then $f$ is convex if and only if\n", "the hessian satisfies $\\nabla^2f(x) \\succcurlyeq 0$.\n", "\n", "Find conditions under which the following function is convex $f(x) = \\frac 1 2 x^\\top A x$ for a symmetric matrix $A \\in \\R^{d \\times d}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Answer\n", "\n", "As we computed earlier in lecture, for $f(x) = \\frac 1 2 x^\\top A x$, the hessian is $\\nabla^2 f(x) = A$. We know that a twice-differentiable function is convex if its hessian is positive semi-definite. Therefore, $f$ is convex if and only if $A$ is positive semi-definite." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python [conda env:anaconda3]", "language": "python", "name": "conda-env-anaconda3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.5" } }, "nbformat": 4, "nbformat_minor": 2 }