{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 1 [30 points] Maximum and minimum eigenvalues\n", "\n", "Consider the maximum eigenvalue $\\lambda_{\\max}$ and the minimum eigenvalue $\\lambda_{\\min}$ for any $X\\in\\mathbb{S}^{n}$. \n", "\n", "The purpose of this exercise is to investigate whether $\\lambda_{\\max}(X)$ and $\\lambda_{\\min}(X)$ are convex functions of the matrix variable $X\\in\\mathbb{S}^{n}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (a) [10 points] An inequality\n", "\n", "For any $x\\in\\mathbb{R}^{n}$ and any $X\\in\\mathbb{S}^{n}$, **derive** the inequality:\n", "\n", "$$\\lambda_{\\min}(X) \\leq \\dfrac{x^{\\top}Xx}{x^{\\top}x} \\leq \\lambda_{\\max}(X).$$\n", "\n", "(**Hint:** start from the spectral theorem applied to a symmetric matrix.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (b) [10 points] Variational characterization of $\\lambda_{\\max}$ and $\\lambda_{\\min}$\n", "\n", "Use part (a), to **prove** the following two formula:\n", "\n", "$$\\lambda_{\\max}(X) = \\underset{\\|x\\|_{2}=1}{\\sup} x^{\\top}X x, \\qquad \\lambda_{\\min}(X) = \\underset{\\|x\\|_{2}=1}{\\inf} x^{\\top}X x, \\qquad x\\in\\mathbb{R}^{n}, \\quad X\\in\\mathbb{S}^{n}.$$\n", "\n", "(**Hint:** think if and when the equalities are achieved in the weak inequality derived in part (a).)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (c) [10 points] Function convexity\n", "\n", "Use part (b) to argue which of the two functions $f(X) := \\lambda_{\\max}(X)$ and $g(X) = \\lambda_{\\min}(X)$ is/are convex over $\\textbf{dom}(f) = \\textbf{dom}(g) = \\mathbb{S}^{n}$. Explain.\n", "\n", "(**Hint:** think operations preserving function convexity/concavity from Lec. 8.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 2 [30 points] Support function revisited\n", "\n", "Consider the support function $h(y)$ of any compact convex set introduced in HW3, Prob. 1(b)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (a) [1 + 9 = 10 points] Function convexity\n", "\n", "Is the support function convex or concave or neither? Give reasons to explain your answer.\n", "\n", "(**Hint:** again think operations preserving function convexity/concavity from Lec. 8.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (b) [(1 + 7) + 2 = 10 points] Convex conjugate\n", "\n", "(i) What is the function $f(x)$ whose convex conjugate equals the support function $h(y)$? Give reasons to explain your answer.\n", "\n", "(ii) What is the geometric interpretation of your result in 2(b)(i)?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (c) [(1 + 4) + (1 + 4) = 10 points] Epigraph\n", "\n", "(i) Is the epigraph of support function: $\\text{epi}(h)$ a cone? Why/why not?\n", "\n", "(ii) Is $\\text{epi}(h)$ a convex cone? Why/why not? 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 3 [40 points] Bregman divergence\n", "\n", "Given a **strictly convex differentiable function** $\\psi$ over a closed convex set $\\mathcal{X}$, the associated Bregman divergence $D_{\\psi}:\\mathcal{X}\\times\\mathcal{X}\\mapsto\\mathbb{R}_{\\geq 0}$ is defined as\n", "\n", "$$D_{\\psi}(x,y) := \\psi(x) - \\psi(y) - \\langle\\nabla\\psi(y), x-y\\rangle, \\qquad\\forall x,y\\in\\mathcal{X}.$$\n", "\n", "The above formula has a **clean geometric meaning**: the Bregman divergence quantifies the amount by which a strictly convex function lies above its linear approximation (tangent hyperplane).\n", "\n", "For example, when $\\psi(x) = \\|x\\|_{2}^{2}$, then for any closed $\\mathcal{X}\\subset\\mathbb{R}^{n}$, we have $D_{\\psi}(x,y) = \\|x-y\\|_{2}^{2}$, the squared Euclidean distance. (It is a good idea to verify this yourself!)\n", "\n", "As another example, when $\\psi(x) = \\sum_{i=1}^{n}x_{i}\\log x_{i}$, $\\mathcal{X}\\equiv \\{x\\in\\mathbb{R}^{n}_{\\geq 0}\\mid \\sum_{i=1}^{n}x_{i}=1\\}$ (the standard simplex, see Lec. 5, p. 15), then $D_{\\psi}(x,y) = \\sum_{i=1}^{n}x_{i}\\log(x_{i}/y_{i})$, the Kullback-Leibler divergence/relative entropy. (Again, good idea to verify this yourself!)\n", "\n", "Even though $D_{\\psi}(x,y)\\neq D_{\\psi}(y,x)$ in general, the Bregman divergence has [many nice properties](https://en.wikipedia.org/wiki/Bregman_divergence#Properties), and appears frequently in the machine learning applications." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (a) [(1 + 4) + (2 + 3) + 5 = 15 points] Function convexity\n", "\n", "(i) Is $D_{\\psi}(x,y)$ **strictly** convex in $x$? Why/why not?\n", "\n", "(ii) If $\\psi$ is **$m$-strongly convex**, then prove that $D_{\\psi}(x,y) \\geq \\frac{m}{2}\\|x-y\\|_{2}^{2}$, **and** that $D_{\\psi}(x,y)$ is $m$-strongly convex in $x$. \n", "\n", "(iii) Construct a counterexample to demonstrate that $D_{\\psi}(x,y)$ **need not be convex in its second argument $y$**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (b) [5 + 5 + 15 = 25 points] Distance-like function\n", "\n", "The Bregman divergence $D_{\\psi}(x,y)$ behaves like **the square of** a generalized distance between elements in $\\mathcal{X}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(i) **Prove** that \n", "$$\\nabla_{x}D_{\\psi}(x,y) = \\nabla_{x}\\psi(x) - \\nabla_{y}\\psi(y),$$ \n", "\n", "where the subscript of the gradient operator $\\nabla$ shows the gradient is taken with respect to which vector. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(ii) **Prove** the generalized [law of cosines](https://en.wikipedia.org/wiki/Law_of_cosines):\n", "\n", "$$D_{\\psi}(x,y) = D_{\\psi}(x,z) + D_{\\psi}(z,y) - \\langle x-z, \\nabla\\psi(y) - \\nabla\\psi(z)\\rangle, \\qquad \\forall x,y,z\\in\\mathcal{X}.$$\n", "\n", "**Remark:** Notice that for $\\psi(x) = \\|x\\|_{2}^2$, this reduces to $\\|x-y\\|_{2}^{2} = \\|x-z\\|_{2}^{2} + \\|z-y\\|_{2}^{2} - 2\\underbrace{\\langle x-z, y-z\\rangle}_{\\|x-z\\|_{2}\\|y-z\\|_{2}\\cos\\angle yzx}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(iii) Recall that $\\mathcal{X}\\equiv\\textbf{dom}(\\psi)$. Consider a closed convex set $\\mathcal{C}$ such that $\\mathcal{C}\\cap\\text{interior}\\left(\\mathcal{X}\\right)\\neq\\emptyset$. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "(iii) Recall that $\\mathcal{X}\\equiv\\textbf{dom}(\\psi)$. Consider a closed convex set $\\mathcal{C}$ such that $\\mathcal{C}\\cap\\text{interior}\\left(\\mathcal{X}\\right)\\neq\\emptyset$. For notational convenience, introduce $\\mathcal{A}:=\\mathcal{C}\\cap\\text{interior}\\left(\\mathcal{X}\\right)$, which is a convex set.\n", "\n", "Define the **Bregman projection** of $y$ onto $\\mathcal{A}$ with respect to $\\psi$ as\n", "\n", "$${\\rm{proj}}_{\\mathcal{A}}\\left(y\\right) := \\underset{a\\in\\mathcal{A}}{\\arg\\min} \\,D_{\\psi}\\left(a, y\\right).$$\n", "\n", "Using parts (i)-(ii), **prove** the generalized [Pythagorean theorem](https://en.wikipedia.org/wiki/Pythagorean_theorem):\n", "$$D_{\\psi}(x,y) \\geq D_{\\psi}(x, {\\rm{proj}}_{\\mathcal{A}}\\left(y\\right)) \\: + \\: D_{\\psi}( {\\rm{proj}}_{\\mathcal{A}}\\left(y\\right),y).$$\n", "\n", "**Hint:** If $a^{\\text{opt}}$ is a local minimizer of the (possibly nonconvex) function $f(a)$ over $\\mathcal{A}$, then the directional derivative of $f$ at $a^{\\text{opt}}$ along any feasible direction is nonnegative:\n", "$$\\bigg\\langle a - a^{\\text{opt}}, \\nabla_{a} f(a)\\big\\vert_{a=a^{\\text{opt}}}\\bigg\\rangle \\geq 0, \\qquad \\forall a\\in\\mathcal{A}.$$\n", "If $f$ is convex over $\\mathcal{A}$, then the above inequality is also sufficient to guarantee that $a^{\\text{opt}}$ is a minimizer.\n", "\n", "**Remark:** In the generalized Pythagorean theorem, equality is achieved when the set $\\mathcal{A}$ is affine." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.16" } }, "nbformat": 4, "nbformat_minor": 2 }