{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Convexity, Duality, and Optimality Conditions (KL Chapter 11, BV Chapter 5)
1.1  Convexity
1.2  Duality
1.3  KKT optimality conditions
1.3.1  Nonconvex problems
1.3.2  Convex problems
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convexity, Duality, and Optimality Conditions (KL Chapter 11, BV Chapter 5)\n", "\n", "## Convexity\n", "\n", "\n", "\n", "* A function $f: \\mathbb{R}^n \\mapsto \\mathbb{R}$ is **convex** if \n", " 1. $\\text{dom} f$ is a convex set: $\\lambda \\mathbf{x} + (1-\\lambda) \\mathbf{y} \\in \\text{dom} f$ for all $\\mathbf{x},\\mathbf{y} \\in \\text{dom} f$ and any $\\lambda \\in (0, 1)$, and \n", " 2. $f(\\lambda \\mathbf{x} + (1-\\lambda) \\mathbf{y}) \\le \\lambda f(\\mathbf{x}) + (1-\\lambda) f(\\mathbf{y})$, for all $\\mathbf{x},\\mathbf{y} \\in \\text{dom} f$ and $\\lambda \\in (0,1)$.\n", " \n", "* $f$ is **strictly convex** if the inequality is strict for all $\\mathbf{x} \\ne \\mathbf{y} \\in \\text{dom} f$ and $\\lambda$.\n", "\n", "* **Supporting hyperplane inequality**. A differentiable function $f$ is convex if and only if \n", "$$\n", "f(\\mathbf{x}) \\ge f(\\mathbf{y}) + \\nabla f(\\mathbf{y})^T (\\mathbf{x}-\\mathbf{y})\n", "$$ \n", "for all $\\mathbf{x}, \\mathbf{y} \\in \\text{dom} f$.\n", "\n", "* **Second-order condition for convexity**. A twice differentiable function $f$ is convex if and only if $\\nabla^2f(\\mathbf{x})$ is psd for all $\\mathbf{x} \\in \\text{dom} f$. It is strictly convex if and only if $\\nabla^2f(\\mathbf{x})$ is pd for all $\\mathbf{x} \\in \\text{dom} f$.\n", "\n", "* Convexity and global optima. Suppose $f$ is a convex function. \n", " 1. Any stationary point $\\mathbf{y}$, i.e., $\\nabla f(\\mathbf{y})=\\mathbf{0}$, is a global minimum. (Proof: By supporting hyperplane inequality, $f(\\mathbf{x}) \\ge f(\\mathbf{y}) + \\nabla f(\\mathbf{y})^T (\\mathbf{x} - \\mathbf{y}) = f(\\mathbf{y})$ for all $\\mathbf{x} \\in \\text{dom} f$.) \n", " 2. Any local minimum is a global minimum. \n", " 3. The set of (global) minima is convex. \n", " 4. If $f$ is strictly convex, then the global minimum, if exists, is unique.\n", " \n", "* Example: Least squares estimate. $f(\\beta) = \\frac 12 \\| \\mathbf{y} - \\mathbf{X} \\beta \\|_2^2$ has Hessian $\\nabla^2f = \\mathbf{X}^T \\mathbf{X}$ which is psd. So $f$ is convex and any stationary point (solution to the normal equation) is a global minimum. When $\\mathbf{X}$ is rank deficient, the set of solutions is convex.\n", "\n", "* **Jensen's inequality**. If $h$ is convex and $\\mathbf{W}$ a random vector taking values in $\\text{dom} f$, then \n", "$$\n", "\t\\mathbf{E}[h(\\mathbf{W})] \\ge h [\\mathbf{E}(\\mathbf{W})],\n", "$$\n", "provided both expectations exist. For a strictly convex $h$, equality holds if and only if $W = \\mathbf{E}(W)$ almost surely. \n", "\n", " Proof: Take $\\mathbf{x} = \\mathbf{W}$ and $\\mathbf{y} = \\mathbf{E} (\\mathbf{W})$ in the supporting hyperplane inequality.\n", " \n", "* **Information inequality**. Let $f$ and $g$ be two densities with respect to a common measure $\\mu$ and $f, g>0$ almost everywhere relative to $\\mu$. Then \n", "$$\n", "\t\\mathbf{E}_f (\\log f) \\ge \\mathbf{E}_f (\\log g),\n", "$$\n", "with equality if and only if $f = g$ almost everywhere on $\\mu$. \n", "\n", " Proof: Apply Jensen's inequality to the convex function $- \\ln(t)$ and random variable $W=g(X)/f(X)$ where $X \\sim f$.\n", " \n", " Important applications of information inequality: M-estimation, EM algorithm." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Duality\n", "\n", "* Consider optimization problem\n", "\\begin{eqnarray*}\n", " &\\text{minimize}& f_0(\\mathbf{x}) \\\\\n", " &\\text{subject to}& f_i(\\mathbf{x}) \\le 0, \\quad i = 1,\\ldots,m \\\\\n", " & & h_i(\\mathbf{x}) = 0, \\quad i = 1,\\ldots,p.\n", "\\end{eqnarray*}\n", "\n", "* The **Lagrangian** is \n", "\\begin{eqnarray*}\n", " L(\\mathbf{x}, \\lambda, \\nu) = f_0(\\mathbf{x}) + \\sum_{i=1}^m \\lambda_i f_i(\\mathbf{x}) + \\sum_{i=1}^p \\nu_i h_i(\\mathbf{x}).\n", "\\end{eqnarray*}\n", "The vectors $\\lambda = (\\lambda_1,\\ldots, \\lambda_m)^T$ and $\\nu = (\\nu_1,\\ldots,\\nu_p)^T$ are called the **Lagrange multiplier vectors** or **dual variables**.\n", "\n", "* The **Lagrange dual function** is the minimum value of the Langrangian over $\\mathbf{x}$\n", "\\begin{eqnarray*}\n", " g(\\lambda, \\mu) = \\inf_{\\mathbf{x}} L(\\mathbf{x}, \\lambda, \\nu) = \\inf_{\\mathbf{x}} \\left( f_0(\\mathbf{x}) + \\sum_{i=1}^m \\lambda_i f_i(\\mathbf{x}) + \\sum_{i=1}^p \\nu_i h_i(\\mathbf{x}) \\right).\n", "\\end{eqnarray*}\n", "\n", "* Denote the optimal value of original problem by $p^\\star$. For any $\\lambda \\succeq \\mathbf{0}$ and any $\\nu$, we have\n", "\\begin{eqnarray*}\n", " g(\\lambda, \\nu) \\le p^\\star.\n", "\\end{eqnarray*}\n", "Proof: For any feasible point $\\tilde{\\mathbf{x}}$, \n", "\\begin{eqnarray*}\n", " L(\\tilde{\\mathbf{x}}, \\lambda, \\nu) = f_0(\\tilde{\\mathbf{x}}) + \\sum_{i=1}^m \\lambda_i f_i(\\tilde{\\mathbf{x}}) + \\sum_{i=1}^p \\nu_i h_i(\\tilde{\\mathbf{x}}) \\le f_0(\\tilde{\\mathbf{x}})\n", "\\end{eqnarray*}\n", "because the second term is non-positive and the third term is zero. Then\n", "\\begin{eqnarray*}\n", " g(\\lambda, \\mu) = \\inf_{\\mathbf{x}} L(\\mathbf{x}, \\lambda, \\mu) \\le L(\\tilde{\\mathbf{x}}, \\lambda, \\nu) \\le f_0(\\tilde{\\mathbf{x}}).\n", "\\end{eqnarray*}\n", "\n", "* Since each pair $(\\lambda, \\nu)$ with $\\lambda \\succeq \\mathbf{0}$ gives a lower bound to the optimal value $p^\\star$. It is natural to ask for the best possible lower bound the Lagrange dual function can provide. This leads to the **Lagrange dual problem**\n", "\\begin{eqnarray*}\n", " &\\text{maximize}& g(\\lambda, \\nu) \\\\\n", " &\\text{subject to}& \\lambda \\succeq \\mathbf{0},\n", "\\end{eqnarray*}\n", "which is a convex problem regardless the primal problem is convex or not.\n", "\n", "* We denote the optimal value of the Lagrange dual problem by $d^\\star$, which satifies the **week duality**\n", "\\begin{eqnarray*}\n", " d^\\star \\le p^\\star.\n", "\\end{eqnarray*}\n", "The difference $p^\\star - d^\\star$ is called the **optimal duality gap**. \n", "\n", "* If the primal problem is convex, that is\n", "\\begin{eqnarray*}\n", " &\\text{minimize}& f_0(\\mathbf{x}) \\\\\n", " &\\text{subject to}& f_i(\\mathbf{x}) \\le 0, \\quad i = 1,\\ldots,m \\\\\n", " & & \\mathbf{A} \\mathbf{x} = \\mathbf{b},\n", "\\end{eqnarray*}\n", "with $f_0,\\ldots,f_m$ convex, we usually (but not always) have the **strong duality**, i.e., $d^\\star = p^\\star$. \n", "\n", "* The conditions under which strong duality holds are called **constraint qualifications**. A commonly used one is **Slater's condition**: There exists a point in the relative interior of the domain such that\n", "\\begin{eqnarray*}\n", " f_i(\\mathbf{x}) < 0, \\quad i = 1,\\ldots,m, \\quad \\mathbf{A} \\mathbf{x} = \\mathbf{b}.\n", "\\end{eqnarray*}\n", "Such a point is also called **strictly feasible**." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## KKT optimality conditions\n", "\n", "KKT is \"one of the great triumphs of 20th century applied mathematics\" (KL Chapter 11).\n", "\n", "\n", "\n", "\n", "\n", "### Nonconvex problems\n", "\n", "* Assume $f_0,\\ldots,f_m,h_1,\\ldots,h_p$ are differentiable. Let $\\mathbf{x}^\\star$ and $(\\lambda^\\star, \\nu^\\star)$ be any primal and dual optimal points with zero duality gap, i.e., strong duality holds. \n", "\n", "* Since $\\mathbf{x}^\\star$ minimizes $L(\\mathbf{x}, \\lambda^\\star, \\nu^\\star)$ over $\\mathbf{x}$, its gradient vanishes at $\\mathbf{x}^\\star$, we have the **Karush-Kuhn-Tucker (KKT) conditions**\n", "\\begin{eqnarray*}\n", " f_i(\\mathbf{x}^\\star) &\\le& 0, \\quad i = 1,\\ldots,m \\\\\n", " h_i(\\mathbf{x}^\\star) &=& 0, \\quad i = 1,\\ldots,p \\\\\n", " \\lambda_i^\\star &\\ge& 0, \\quad i = 1,\\ldots,m \\\\\n", " \\lambda_i^\\star f_i(\\mathbf{x}^\\star) &=& 0, \\quad i=1,\\ldots,m \\\\\n", " \\nabla f_0(\\mathbf{x}^\\star) + \\sum_{i=1}^m \\lambda_i^\\star \\nabla f_i(\\mathbf{x}^\\star) + \\sum_{i=1}^p \\nu_i^\\star \\nabla h_i(\\mathbf{x}^\\star) &=& \\mathbf{0}.\n", "\\end{eqnarray*}\n", "\n", "* The fourth condition (**complementary slackness**) follows from\n", "\\begin{eqnarray*}\n", " f_0(\\mathbf{x}^\\star) &=& g(\\lambda^\\star, \\nu^\\star) \\\\\n", " &=& \\inf_{\\mathbf{x}} \\left( f_0(\\mathbf{x}) + \\sum_{i=1}^m \\lambda_i^\\star f_i(\\mathbf{x}) + \\sum_{i=1}^p \\nu_i^\\star h_i(\\mathbf{x}) \\right) \\\\\n", " &\\le& f_0(\\mathbf{x}^\\star) + \\sum_{i=1}^m \\lambda_i^\\star f_i(\\mathbf{x}^\\star) + \\sum_{i=1}^p \\nu_i^\\star h_i(\\mathbf{x}^\\star) \\\\\n", " &\\le& f_0(\\mathbf{x}^\\star).\n", "\\end{eqnarray*}\n", "Since $\\sum_{i=1}^m \\lambda_i^\\star f_i(\\mathbf{x}^\\star)=0$ and each term is non-positive, we have $\\lambda_i^\\star f_i(\\mathbf{x}^\\star)=0$, $i=1,\\ldots,m$.\n", "\n", "* To summarize, for any optimization problem with differentiable objective and constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convex problems\n", "\n", "* When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. \n", "\n", "* If $f_i$ are convex and $h_i$ are affine, and $(\\tilde{\\mathbf{x}}, \\tilde \\lambda, \\tilde \\nu)$ satisfy the KKT conditions, then $\\tilde{\\mathbf{x}}$ and $(\\tilde \\lambda, \\tilde \\nu)$ are primal and dual optimal, with zero duality gap.\n", "\n", "* The KKT conditions play an important role in optimization. Many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions." 
] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "117.19999694824219px", "width": "251.60000610351562px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "399px", "left": "0px", "right": "1237.800048828125px", "top": "33px", "width": "199.8000030517578px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }