{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "***\n", "# Lecture 2 - Minima of Convex Functions\n", "***\n", "$\\newcommand{\\vct}[1]{\\mathbf{#1}}$\n", "$\\newcommand{\\mtx}[1]{\\mathbf{#1}}$\n", "$\\newcommand{\\e}{\\varepsilon}$\n", "$\\newcommand{\\norm}[1]{\\|#1\\|}$\n", "$\\newcommand{\\minimize}{\\text{minimize}\\quad}$\n", "$\\newcommand{\\maximize}{\\text{maximize}\\quad}$\n", "$\\newcommand{\\subjto}{\\quad\\text{subject to}\\quad}$\n", "$\\newcommand{\\R}{\\mathbb{R}}$\n", "$\\newcommand{\\trans}{T}$\n", "$\\newcommand{\\ip}[2]{\\langle {#1}, {#2} \\rangle}$\n", "$\\newcommand{\\zerovct}{\\vct{0}}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lecture we will study the unconstrained problem\n", "\n", "[1]\n", "\\begin{equation*}\n", " \\minimize\\ f(\\vct{x}),\n", "\\end{equation*}\n", "\n", "where $\\vct{x}\\in \\R^n$. **Optimality conditions** aim to identify properties that potential minimizers need to satisfy in relation to $f(\\vct{x})$. We will review the well known local optimality conditions for differentiable functions from calculus. We then introduce convex functions and discuss some of their properties." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Unconstrained optimization\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Solutions to [1] come in different flavours, as in the following definition.\n", "\n", "**Definition.** A point $\\vct{x}^*\\in \\R^n$ is a\n", "\n", " * **global minimizer** of [1] if for all $\\vct{x}\\in U$, $f(\\vct{x}^*)\\leq f(\\vct{x})$;\n", " * a **local minimizer**, if there is an open neighbourhood $U$ of $\\vct{x}$ such that $f(\\vct{x}^*)\\leq f(\\vct{x})$ for all $\\vct{x}\\in U$;\n", " * a **strict local minimizer**, if there is an open neighbourhood $U$ of $\\vct{x}$ such that $f(\\vct{x}^*)0$ guarantee that we have a local minimizer. These conditions generalise to higher dimension, but first we need to know what $f''(x)>0$ means when we have more than one variable.\n", "\n", "Recall that a matrix $\\mtx{A}$ is **positive semidefinite**, written $\\mtx{A}\\succeq \\zerovct$, if for every $\\vct{x}\\in \\R^n$, $\\vct{x}^{\\top}\\mtx{A}\\vct{x}\\geq 0$, and **positive definite**, written $\\mtx{A}\\succ \\zerovct$, if $\\vct{x}^{\\top}\\mtx{A}\\vct{x}>0$. The property that the Hessian matrix is positive semidefinite is a multivariate generalization of the property that the second derivative is nonnegative. The known conditions for a minimizer involving the second derivative generalise accordingly.\n", "\n", "**Theorem.** Let $f\\in C^2(U)$ for some open set $U$ and $\\vct{x}^*\\in U$. \n", " If $\\vct{x}^*$ is a local minimizer, then $\\nabla f(\\vct{x}^*)=0$ and $\\nabla^2f(\\vct{x}^*)$ is positive semidefinite. Conversely, if $\\nabla f(\\vct{x}^*)=\\zerovct$ and $\\nabla^2f(\\vct{x}^*)$ is positive definite, then $\\vct{x}^*$ is a strict local minimizer. \n", "\n", "Unfortunately, the above criteria are not able to identify global minimizers, as differentiability is a local property. If, however, the function is **convex**, then we can say a lot more!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Convex sets and functions\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now come to the central notion of this course. \n", "\n", "**Definition.** A set $C\\subseteq \\R^n$ is **convex** if for all $\\vct{x},\\vct{y}\\in C$ and $\\lambda \\in [0,1]$, the line $\\lambda \\vct{x}+(1-\\lambda)\\vct{y}\\in C$. A **convex body** is a convex set that is closed and bounded.\n", "\n", "**Definition.**\n", "Let $S\\subseteq \\R^n$. A function $f\\colon S\\to \\R$ is called **convex** if $S$ is convex and for all $\\vct{x},\\vct{y}\\in \\R^n$ and $\\lambda\\in [0,1]$,\n", "\n", "\\begin{equation*}\n", " f(\\lambda \\vct{x}+(1-\\lambda)\\vct{y})\\leq \\lambda f(\\vct{x})+(1-\\lambda)f(\\vct{y}).\n", "\\end{equation*}\n", "\n", "The function $f$ is called **strictly convex** if\n", "\n", "\\begin{equation*}\n", " f(\\lambda \\vct{x}+(1-\\lambda)\\vct{y})< \\lambda f(\\vct{x})+(1-\\lambda)f(\\vct{y}).\n", "\\end{equation*}\n", "\n", "A function $f$ is called **concave**, if $-f$ is convex. \n", "\n", "The next figure illustrates how a convex set in $\\R^2$ and a function of one variable looks like. The graph of the function lies below any line connecting two points on it.\n", "\n", "![Convex function](images/convset.png)\n", "\n", "Convex function have pleasant properties, while at the same time covering many of the functions that arise in applications. Perhaps the most important property is that local minima are global minima.\n", "\n", "**Theorem.** Let $f\\colon \\R^n\\to \\R$ be a convex function. Then any local minimizer of $f$ is a global minimizer.\n", "\n", "**Proof.** Let $\\vct{x}^*$ be a local minimizer and assume that it is not a global minimizer. Then there exists a vector $\\vct{y}\\in \\R^n$ such that $f(\\vct{y})" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import numpy.linalg as la\n", "import matplotlib.pyplot as plt\n", "% matplotlib inline\n", "\n", "# Create random data: we use the randn function\n", "X = np.random.randn(3,2)\n", "y = np.random.randn(3)\n", "\n", "# Solve least squares problem minimize \\|X\\beta-y\\|^2\n", "# the index 0 says that we get the first component of the solution\n", "# (the function lstsq give more output than just the beta vector)\n", "beta = la.lstsq(X,y)[0]\n", "\n", "# Create function and plot the contours\n", "def f(a,b):\n", " return sum((a*X[:,0]+b*X[:,1]-y)**2)\n", "\n", "# Find the \"right\" boundaries around the minimum\n", "xx = np.linspace(beta[0]-8,beta[0]+8,100)\n", "yy = np.linspace(beta[1]-8,beta[1]+8,100)\n", "\n", "# A mesh grid gives a pair of matrix XX and YY,\n", "# such that for each (i,j), the pairs (XX[i,j],YY[i,j])\n", "# cover all the points on the square defined by xx and yy\n", "XX, YY = np.meshgrid(xx,yy)\n", "\n", "# Compute the Z values corresponding to the meshgrid\n", "Z = np.zeros(XX.shape)\n", "for i in range(Z.shape[0]):\n", " for j in range(Z.shape[1]):\n", " Z[i,j] = f(XX[i,j],YY[i,j])\n", "\n", "# Before plotting the contour map, choose a nice colormap\n", "cmap = plt.cm.get_cmap(\"coolwarm\")\n", "plt.contourf(XX,YY,Z, cmap = cmap)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [py27]", "language": "python", "name": "Python [py27]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }