{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## EECS 545 - Machine Learning\n", "## Lecture 3: Convex Optimization\n", "### Date: January 13, 2016\n", "### Instructor: Jacob Abernethy\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "from IPython.core.display import HTML, Image\n", "from IPython.display import YouTubeVideo\n", "from sympy import init_printing, Matrix, symbols, Rational\n", "import sympy as sym\n", "from warnings import filterwarnings\n", "init_printing(use_latex = 'mathjax')\n", "filterwarnings('ignore')\n", "\n", "%pylab inline\n", "\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Some important notes\n", "* HW1 is out! Due January 25th at 11pm\n", "* Homework will be submitted via *Gradescope*. Please see Piazza for precise instructions. Do it soon, not at the last minute!!\n", "* There is an *optional* tutorial **this evening**, 5pm, in Dow 1013. Come see Daniel go over some tough problems.\n", "* No class on Monday January 18, MLK day!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Python: We recommend Anacdona\n", "\"Drawing\"\n", "* Anaconda is standalone Python distribution that includes all the most important scientific packages: *numpy, scipy, matplotlib, sympy, sklearn, etc.*\n", "* Easy to install, available for OS X, Windows, Linux.\n", "* Small warning: it's kind of large (250MB)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Some notes on using Python\n", "* HW1 has only a very simple programming exercise, just as a warmup. We don't expect you to submit code this time\n", "* This is a good time to start learning Python basics\n", "* There are **a ton** of good places on the web to learn python, we'll post some\n", "* Highly recommended: **ipython**; it's a much more user friendly terminal interface to python\n", "* Even better: **jupyter notebook**, a web based interface. This is how I'm making these slides!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Checking if all is installed, and HelloWorld\n", "If you got everything installed, this should run:\n", "```python\n", "\n", "# numpy is crucial for vectors, matrices, etc.\n", "import numpy as np \n", "# Lots of cool plotting tools with matplotlib\n", "import matplotlib.pyplot as plt\n", "# For later: scipy has a ton of stats tools\n", "import scipy as sp\n", "# For later: sklearn has many standard ML algs\n", "import sklearn\n", "# Here we go!\n", "print(\"Hello World!\")\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## More on learning python\n", "* We will have one tutorial devoted to this\n", "* If you're new to Python, go slow!\n", " * First learn the basics (lists, dicts, for loops, etc.)\n", " * Then spend a couple days playing with numpy\n", " * Then explore matplotlib\n", " * etc.\n", "* Piazza = your friend. We have a designated python instructor (IA Ben Bray) who has lots of answers." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Outline for this Lecture\n", "- Convexity\n", " - Convex Set\n", " - Convex Function\n", "- Introduction to Optimization\n", "- Introduction to Lagrange Duality" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "> In this lecture, we will first introduce convex set, convex function and optimization problem. One approach to solve optimization problem is to solve its dual problem. We will briefly cover some basics of duality in this lecture. More about optimization and duality will come when we study support vector machine (SVM)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Convexity\n", "### Convex Sets\n", "- $C \\subseteq \\mathbb{R}^n$ is **convex** if\n", "$$t x + (1-t)y \\in C$$\n", "for any $x, y \\in C$ and $0 \\leq t \\leq 1$\n", "- that is, a set is convex if the line connecting **any** two points in the set is entirely inside the set\n", "\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "
(Left: Convex Set; Right: Non-convex Set)
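{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "> As a quick sanity check of the definition, we can sample pairs of points and test the condition $tx + (1-t)y \in C$ numerically. A failed check *refutes* convexity; passing is only evidence, not a proof. The sets `in_disk` and `in_annulus` here are toy examples, not from the lecture:\n", "```python\n", "import numpy as np\n", "rng = np.random.RandomState(0)\n", "\n", "def in_disk(p):      # unit disk: convex\n", "    return np.linalg.norm(p) <= 1.0\n", "\n", "def in_annulus(p):   # ring: NOT convex\n", "    return 0.5 <= np.linalg.norm(p) <= 1.0\n", "\n", "def looks_convex(member, trials=10000):\n", "    # sample x, y in the set; check that t*x + (1-t)*y stays inside\n", "    for _ in range(trials):\n", "        x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)\n", "        if member(x) and member(y):\n", "            t = rng.uniform()\n", "            if not member(t * x + (1 - t) * y):\n", "                return False\n", "    return True\n", "\n", "print(looks_convex(in_disk))     # True\n", "print(looks_convex(in_annulus))  # False: e.g. (-0.7, 0) and (0.7, 0), t = 0.5\n", "```" ] },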
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Convex Functions\n", "* We say that a function $f$ is *convex* if, for any distinct pair of points $x,y$ we have\n", "$$f(tx_1+(1-t)x_2) \\leq tf(x_1)+(1-t)f(x_2) \\quad \\forall t \\in[0,1]$$\n", "* A function $f$ is said to be *concave* if $-f$ is convex\n", "

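{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "> The same spot-check idea works for the function definition: a single violation of $f(tx_1+(1-t)x_2) \leq tf(x_1)+(1-t)f(x_2)$ certifies that $f$ is *not* convex, while passing many random trials is merely evidence. The test functions below are toy examples:\n", "```python\n", "import numpy as np\n", "rng = np.random.RandomState(0)\n", "\n", "def violates_convexity(f, lo=-5.0, hi=5.0, trials=10000):\n", "    # search for x1, x2, t with f(t*x1 + (1-t)*x2) > t*f(x1) + (1-t)*f(x2)\n", "    for _ in range(trials):\n", "        x1, x2 = rng.uniform(lo, hi, 2)\n", "        t = rng.uniform()\n", "        lhs = f(t * x1 + (1 - t) * x2)\n", "        rhs = t * f(x1) + (1 - t) * f(x2)\n", "        if lhs > rhs + 1e-12:   # small tolerance for floating point\n", "            return True\n", "    return False\n", "\n", "print(violates_convexity(lambda x: x ** 2))  # False: x^2 is convex\n", "print(violates_convexity(np.sin))            # True: sin is not convex\n", "```" ] },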
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Fun Facts About Convex Functions\n", "* If $f$ is differentiable, then $f$ is convex iff $f$ \"lies above its linear approximation\", i.e.:\n", "$$ f(x + y) \\geq f(x) + \\nabla_x f(x) \\cdot y \\quad \\forall x,y$$\n", "* If $f$ is twice-differentiable, then the hessian is always positive semi-definite!\n", "* This last one you will show on your homeowork :-)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Introduction to Optimization\n", "### The Most General Optimization Problem\n", "Assume $f$ is some function, and $C \\subset \\mathbb{R}^n$ is some set. The following is an *optimization problem*:\n", "$$\n", "\\begin{array}{ll}\n", "\\mbox{minimize} & f(x) \\\\\n", "\\mbox{subject to} & x \\in C\n", "\\end{array}\n", "$$\n", "* How hard is it to find a solution that is (near-) optimal? This is one of the fundamental problems in Computer Science and Operations Research.\n", "* A huge portion of ML relies on this task" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### A Rough Optimization Hierarchy\n", "$$\n", "\\mbox{minimize } \\ f(x) \\quad \\mbox{subject to } x \\in C\n", "$$\n", "* **[Really Easy]** $C = \\mathbb{R}^n$ (i.e. problem is *unconstrained*), $f$ is convex, $f$ is differentiable, strictly convex, and \"slowly-changing\" gradients\n", "* **[Easyish]** $C = \\mathbb{R}^n$, $f$ is convex\n", "* **[Medium]** $C$ is a convex set, $f$ is convex\n", "* **[Hard]** $C$ is a convex set, $f$ is non-convex\n", "* **[REALLY Hard]** $C$ is an arbitrary set, $f$ is non-convex" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Optimization Without Constraints\n", "$$\n", "\\begin{array}{ll}\n", "\\mbox{minimize} & f(x) \\\\\n", "\\mbox{subject to} & x \\in \\mathbb{R}^n\n", "\\end{array}\n", "$$\n", "* This problem tends to be easier than constrained optimization\n", "* We just need to find an $x$ such that $\\nabla f(x) = \\vec{0}$\n", "* Techniques like *gradient descent* or *Newton's method* work in this setting. (More on this later)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Optimization With Constraints\n", "$$\n", "\\begin{aligned}\n", "& {\\text{minimize}} & & f(\\mathbf{x})\\\\\n", "& \\text{subject to} & & g_i(\\mathbf{x}) \\leq 0, \\quad i = 1, ..., m\\\\\n", "& & & h_j(x) = 0, \\quad j = 1, ..., n\n", "\\end{aligned}\n", "$$\n", "* Here $C = \\{ x : g_i(x) \\leq 0,\\ h_j(x) = 0, \\ i=1, \\ldots, m,\\ j = 1, ..., n \\}$\n", "* $C$ is convex as long as all $g_i(x)$ convex and all $h_j(x)$ affine\n", "* The solution of this optimization may occur in the *interior* of $C$, in which case the optimal $x$ will have $\\nabla f(x) = 0$\n", "* But what if the solution occurs on the *boundary* of $C$?\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Introduction to Lagrange Duality\n", "* In some cases original (**primal**) optimization problem can hard to solve, solving a proxy problem sometimes can be easier\n", "* The proxy problem could be **dual** problem which is transformed from primal problem\n", "* Here is how to transform from primal to dual. For primal problem\n", "$$\n", "\\begin{aligned}\n", "& {\\text{minimize}} & & f(\\mathbf{x})\\\\\n", "& \\text{subject to} & & g_i(\\mathbf{x}) \\leq 0, \\quad i = 1, ..., m\\\\\n", "& & & h_j(x) = 0, \\quad j = 1, ..., n\n", "\\end{aligned}\n", "$$\n", "Its Lagrangian is \n", "$$L(x,\\boldsymbol{\\lambda}, \\boldsymbol{\\nu}) := f(x) + \\sum_{i=1}^m \\lambda_i g_i(x) + \\sum_{j=1}^n \\nu_j h_j(x)$$\n", "of which $\\boldsymbol{\\lambda} \\in \\mathbb{R}^m$, $\\boldsymbol{\\nu} \\in \\mathbb{R}^n$ are **dual variables**\n", "\n", "* The **Langragian dual function** is \n", "$$L_D(\\boldsymbol{\\lambda}, \\boldsymbol{\\nu}) \\triangleq \\underset{x}{\\inf}L(x,\\boldsymbol{\\lambda}, \\boldsymbol{\\nu}) = \\underset{x}{\\inf} \\ \\left[ f(x) + \\sum_{i=1}^m \\lambda_i g_i(x) + \\sum_{j=1}^n \\nu_j h_j(x) \\right] $$\n", "The minization is usually done by finding the stable point of $L(x,\\boldsymbol{\\lambda}, \\boldsymbol{\\nu})$ with respect to $x$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The Lagrange Dual Problem\n", "\n", "* Then the **dual problem** is \n", "$$\n", "\\begin{aligned}\n", "& {\\text{maximize}} & & L_D(\\mathbf{\\lambda}, \\mathbf{\\nu})\\\\\n", "& \\text{subject to} & & \\lambda_i, \\nu_j \\geq 0 \\quad i = 1, \\ldots, m ,\\ j = 1, ..., n\\\\\n", "\\end{aligned}\n", "$$\n", "Instead of solving primal problem with respect to $x$, we now need to solve dual problem with respect to $\\mathbf{\\lambda}$ and $\\mathbf{\\nu}$\n", "* $ L_D(\\mathbf{\\lambda}, \\mathbf{\\nu})$ is concave even if primal problem is not convex\n", "* Let the $p^*$ and $d^*$ denote the optimal values of primal problem and dual problem, we always have *weak duality*: $p^* \\geq d^*$\n", "* Under nice conditions, we get *strong duality*: $p^* = d^*$\n", "* Many details are ommited here and they will come when we study **support vector machine (SVM)**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Recommended reading:\n", "* Free online!\n", "* Chapter 5 covers duality\n", "" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }