{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 1 [55 points] Convex sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (a) [5 x 5 = 25 points] Prove or disprove set convexity\n", "\n", "For each of the following, **state clearly if the set is convex or not**. Then **prove your statement**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(i) Rectangle: (vector inequality is to be understood elementwise)\n", "\n", "$$\\{x\\in\\mathbb{R}^{n} \\mid \\alpha \\leq x \\leq \\beta, \\quad \\alpha,\\beta\\in\\mathbb{R}^{n}\\}.$$ \n", "\n", "\n", "(ii) Wedge: \n", "\n", "$$\\{x\\in\\mathbb{R}^{n} \\mid a_{1}^{\\top}x \\leq b_{1}, \\quad a_{2}^{\\top}x \\leq b_{2}, \\quad a_{1},a_{2}\\in\\mathbb{R}^{n}, \\quad b_{1},b_{2}\\in\\mathbb{R}\\}.$$\n", "\n", "(iii) Set of points closer to a given point than a given set: \n", "\n", "$$\\{x\\in\\mathbb{R}^{n} \\mid \\| x - x_{0}\\|_{2} \\:\\leq\\: \\| x - y\\|_{2}, \\quad \\forall\\: y\\in\\mathcal{S}\\subseteq\\mathbb{R}^{n}, \\quad x_{0}\\in\\mathbb{R}^{n}\\}.$$ \n", "\n", "(iv) Set of points closer to one set than another:\n", "\n", "$$\\{x\\in\\mathbb{R}^{n} \\mid \\text{dist}\\left(x,\\mathcal{A}\\right)\\leq \\text{dist}\\left(x,\\mathcal{B}\\right), \\quad \\mathcal{A},\\mathcal{B}\\subset\\mathbb{R}^{n}\\}, \\quad \\text{dist}\\left(x,\\mathcal{A}\\right):= \\underset{a\\in\\mathcal{A}}{\\inf}\\|x-a\\|_{2}.$$\n", "\n", "(v) The set:\n", "$$\\{c\\in\\mathbb{R}^{n+1} \\mid c_{0} + c_{n} = 1, \\big|c_{0} + c_{1}x + c_{2}x^{2} + ... + c_{n}x^{n}\\big| \\leq 1 \\;\\text{for}\\; 2020\\leq x\\leq 2021\\}.$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 1(a):\n", "\n", "(i) **Convex**. In fact, it is a polyhedron, since the set is an intersection of $2n$ halfspaces in $\\mathbb{R}^{n}$.\n", "\n", "(ii) **Convex**. In fact, a polyhedron since it is an intersection of 2 halfspaces. It is a convex (polyhedral) cone for $b_1 = b_2 =0$.\n", "\n", "(iii) **Convex**. This is beacuse the set can be expressed as\n", "$$\\bigcap_{y\\in\\mathcal{S}}\\{x\\in\\mathbb{R}^{n} \\:\\mid\\: \\parallel x - x_{0}\\parallel_{2} \\:\\leq\\: \\parallel x - y\\parallel_{2}\\},$$\n", "that is, an intersection of halfspaces. Notice that for fixed $y\\in\\mathbb{R}^{n}$, the set $\\{x\\in\\mathbb{R}^{n} \\:\\mid\\: \\parallel x - x_{0}\\parallel_{2} \\:\\leq\\: \\parallel x - y\\parallel_{2}\\}$ is a halfspace.\n", "\n", "(iv) In general, this set is **nonconvex**. To disprove convexity, consider counterexample: $n=1$ with $\\mathcal{A}=\\{-1,1\\}$ and $\\mathcal{B}=\\{0\\}$. Then\n", "$$\\{x\\in\\mathbb{R} \\:\\mid\\: {\\rm{dist}}(x,\\mathcal{A})\\leq {\\rm{dist}}(x,\\mathcal{B}), \\quad \\mathcal{A},\\mathcal{B}\\subseteq\\mathbb{R}^{n}\\} = \\{x \\leq -1/2\\} \\cup \\{x \\geq 1/2\\},$$\n", "which is not convex.\n", "\n", "(v) **Convex**. For fixed $x\\in[2020,2021]$, let \n", "$$\\mathcal{S}_{x} := \\{c\\in\\mathbb{R}^{n+1}\\mid c_{0}+c_{n}=1, \\big|c_{0} + c_{1}x + c_{2}x^{2} + ... + c_{n}x^{n}\\big| \\leq 1\\}.$$\n", "For each fixed $x$, the set $\\mathcal{S}_{x}$ is convex since it is an intersection of 2 halfspaces and 1 hyperplane in $\\mathbb{R}^{n+1}$. The given set is $\\bigcap_{2020\\leq x\\leq2021}\\mathcal{S}_{x}$, that is, an uncountable intersection of convex sets, and hence convex (see Lec. 6, p. 2). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (b) [10 + 10 + 10 = 30 points] Support function\n", "\n", "The support function of a compact convex set $\\mathcal{X}\\subset\\mathbb{R}^{n}$ is defined as\n", "\n", "$$h(y) := \\sup_{x\\in\\mathcal{X}}\\:\\{x^{\\top}y \\:\\mid y\\in\\mathbb{R}^{n}\\},$$\n", "\n", "which can be geometrically interpreted as the perpendicular distane from the origin to the supporting hyperplane of $\\mathcal{X}$. Therefore, the support function uniquely determines a convex set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(i) Suppose two compact convex sets $\\mathcal{X}_{1},\\mathcal{X}_{2}\\subset\\mathbb{R}^{n}$ have support functions $h_1(\\cdot)$ and $h_2(\\cdot)$, respectively. Derive the support function $h(\\cdot)$ for the set sum/Minkowski sum $\\mathcal{X}_{1}+\\mathcal{X}_{2}$ (which is a convex set, see Lec. 6, p. 4). In other words, express $h(\\cdot)$ in terms of $h_{1}(\\cdot)$ and $h_2(\\cdot)$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 1(b)(i):\n", "\n", "By definition, the Minkowski sum\n", "$$\\mathcal{X} := \\mathcal{X}_{1} + \\mathcal{X}_{2} := \\{x\\in\\mathbb{R}^{n} \\:\\mid\\: x = x_{1} + x_{2}, \\: x_{1} \\in \\mathcal{X}_{1} \\subset \\mathbb{R}^{n}, \\: x_{2} \\in \\mathcal{X}_{2}\\subset \\mathbb{R}^{n}\\}.$$\n", "Therefore, its support function\n", "$$h(y) = \\sup_{x = (x_{1} + x_{2})\\in\\mathcal{X}}\\:\\{x^{\\top}y \\:\\mid y\\in\\mathbb{R}^{n}\\} = \\sup_{x_{1} \\in \\mathcal{X}_{1}, x_{2}\\in\\mathcal{X}_{2}}\\:\\{(x_{1} + x_{2})^{\\top}y \\:\\mid y\\in\\mathbb{R}^{n}\\} = \\sup_{x_{1} \\in \\mathcal{X}_{1}}\\:\\{x_{1}^{\\top}y \\:\\mid y\\in\\mathbb{R}^{n}\\} \\:+\\: \\sup_{x_{2}\\in\\mathcal{X}_{2}}\\:\\{x_{2}^{\\top}y \\:\\mid y\\in\\mathbb{R}^{n}\\} = h_{1}(y) + h_{2}(y).$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(ii) Prove that $h(y+z) \\leq h(y) + h(z)$ for any compact convex set $\\mathcal{X}\\subset\\mathbb{R}^{n}$, and any $y,z\\in\\mathbb{R}^{n}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 1(b)(ii):\n", "\n", "Using the definition of support function, it is clear that we need to establish: $\\underset{x\\in\\mathcal{X}}{\\sup}\\{x^{\\top}y + x^{\\top}z\\} \\leq \\underset{x\\in\\mathcal{X}}{\\sup}\\{x^{\\top}y\\} + \\underset{x\\in\\mathcal{X}}{\\sup}\\{x^{\\top}z\\}$. \n", "\n", "Let us establish more general result: \n", "$$\\underset{x\\in\\mathcal{X}}{\\sup}\\{f(x) + g(x)\\} \\leq \\left(\\underset{x\\in\\mathcal{X}}{\\sup}f(x)\\right) + \\left(\\underset{x\\in\\mathcal{X}}{\\sup}g(x)\\right), \\quad\\text{for any}\\;f,g.$$\n", "Suppose $\\widetilde{x}$ is the supremizer for the above LHS. But any function evaluation must not exceed its supremal/maximal evaluation:\n", "$$f(\\widetilde{x}) \\leq \\underset{x\\in\\mathcal{X}}{\\sup}f(x), \\qquad g(\\widetilde{x}) \\leq \\underset{x\\in\\mathcal{X}}{\\sup}g(x).$$\n", "Adding the above two, the result follows." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(iii) Derive the support function $h(y)$ for the Euclidean ball $\\mathcal{B}(0,r)\\subset\\mathbb{R}^{n}$ (see Lec. 5, p. 12) with center at origin and radius $r>0$. (Hint: [Cauchy-Schwarz inequality](https://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 1(b)(iii):\n", "\n", "For any $x\\in\\mathcal{B}(0,r)\\subset\\mathbb{R}^{n}$, we have $\\|x\\|_{2}\\leq r$. By the Cauchy-Schwarz inequality, we get\n", "\n", "$$ |x^{\\top} y| \\leq \\|x\\|_{2} \\|y\\|_{2} \\leq r \\|y\\|_{2} \\quad \\Rightarrow\\quad -r \\|y\\|_{2}\\leq x^{\\top} y \\leq r \\|y\\|_{2}.$$\n", "\n", "But the specific choice $x = r\\frac{y}{\\|y\\|_{2}}$ yields $x^{\\top} y = r \\|y\\|_{2}$, thus achieving the upper bound above. Therefore,\n", "\n", "$$h(y) := \\sup_{x\\in\\mathcal{X}}\\:\\{x^{\\top}y \\:\\mid y\\in\\mathbb{R}^{n}\\} = r \\|y\\|_{2}.$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem 2 [3 x 15 = 45 points] Convex functions\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each of the following, **state clearly if the function is convex or concave or both or neither**. Then **prove your statement**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(i) $f(x,t) = \\log\\left(\\dfrac{1}{t^{2} - x^{\\top}x}\\right)$ over ${\\rm{\\mathbf{dom}}}(f) = \\{(x,t) \\in \\mathbb{R}^{n} \\times \\mathbb{R} \\:\\mid\\: \\parallel x\\parallel_{2} \\:<\\: t\\}$.\n", "\n", "(ii) $f(p,q) = \\displaystyle\\sum_{i=1}^{n} p_{i}\\log\\left(\\displaystyle\\frac{p_{i}}{q_{i}}\\right)$ over ${\\rm{\\mathbf{dom}}}(f) = \\{(p,q)\\in\\mathbb{R}_{\\geq 0}^{n}\\times\\mathbb{R}_{\\geq 0}^{n} \\:\\mid\\: \\mathbf{1}^{\\top}p = \\mathbf{1}^{\\top}q = 1\\}$.\n", "\n", "(iii) $f(x) = \\dfrac{1}{x}\\displaystyle\\int_{0}^{x}\\!g(t)\\:{\\mathrm{d}}t$ over ${\\rm{\\mathbf{dom}}}(f) = \\mathbb{R}_{>0}$, where $g:\\mathbb{R}\\mapsto\\mathbb{R}$ is convex and differentiable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 2(i):\n", "\n", "$f(x,t)$ is a **convex function**. \n", "\n", "First, notice that $\\textbf{dom}(f)$ is the second order/Lorentz/ice cream cone, which is a convex set (see Lec. 5, p. 8-9). An **alternative way** to show that $\\textbf{dom}(f)$ is a convex set is to notice that\n", "\n", "$$\\textbf{dom}(f) = \\,\\text{preimage of the unit 2-norm ball}\\:\\{x\\in\\mathbb{R}^{n}\\mid \\|x\\|_{2}\\leq 1\\}\\,\\text{under the perspective function (Lec. 6, p. 4-6)}.$$\n", "\n", "Since the unit 2-norm ball is convex, hence by Lec. 6, p. 6, $\\textbf{dom}(f)$ is a convex set.\n", "\n", "Having established that the set convexity requirement for the domain checks out, there are two ways to argue the rest:\n", "\n", "**Approach 1: identify the operations preserving function convexity**\n", "\n", "Write: \n", "$$f(x,t) = - \\log(t^{2} - x^{\\top}x) = -\\log t - \\log\\left(t - \\frac{x^{\\top}x}{t}\\right).$$\n", "The first term: $-\\log t$ is convex over $t>0$ (Lec. 7, p. 15). Let us examine the second term above. The quadratic-over-linear function $x^{\\top}x/t$ is convex on $\\{(x,t)\\:|\\: t>0\\}$ following the same arguments in Lec. 7, p. 17. Therefore, its negative $-x^{\\top}x/t$ is concave. Consequently, $(t - x^{\\top}x/t)$ is sum of concaves (linear is both convex and concave), and hence concave. Since $\\log(\\cdot)$ is concave increasing, hence by composition rules in textbook p. 84 (we mentioned this Lec. 8, p. 16), the composite function $\\log(t - x^{\\top}x/t)$ is concave, whose negative is $f(x,t)$. So we have shown that $f(x,t)$ is negative of a concave function, hence is convex in $(x,t)$.\n", "\n", "**Approach 2: keep calm and do calculus**\n", "\n", "Let us check the second order (Hessian) condition. For $h:\\mathbb{R}\\mapsto\\mathbb{R}$, $g:\\mathbb{R}^{m}\\mapsto\\mathbb{R}$, notice that the chain rule for Hessian of the composition $f(z) = h \\circ g (z)$ is\n", "\n", "$$\\text{Hess}\\left(f\\right) = h^{\\prime}\\left(g(z)\\right)\\:\\text{Hess}(g) \\: + \\: h^{\\prime\\prime}(g(z))\\:\\left(\\nabla g\\right)\\left(\\nabla g\\right)^{\\top}, \\quad \\text{which is an $m\\times m$ symmetric matrix}.$$\n", "\n", "For us, $h(.) = -\\log(.)$, $g(z) = z^{\\top}Dz$ where $z=(t,x)\\in\\mathbb{R}^{n+1}$, and the $(n+1)\\times(n+1)$ diagonal matrix $D={\\rm{diag}}(1,-\\mathbf{1})$. Direct application of the above chain rule gives\n", "\n", "$$\\text{Hess}\\left(f\\right) = \\displaystyle\\frac{2}{\\left(z^{\\top}Dz\\right)^2}\\left[2Dzz^{\\top}D \\: - \\: \\left(z^{\\top}Dz\\right)D\\right].$$\n", "\n", "Since the pre-factor of the Hessian above is positive, all that remains to show is: $y^{\\top}\\left[2Dzz^{\\top}D \\: - \\: \\left(z^{\\top}Dz\\right)D\\right]y \\geq 0$ for all $y\\in\\mathbb{R}^{n+1}$. This follows from noting that\n", "\n", "\\begin{aligned}\n", "y^{\\top}\\left[2Dzz^{\\top}D \\: - \\: \\left(z^{\\top}Dz\\right)D\\right]y &= 2 \\left(y^{\\top}Dz\\right)^{2} \\: - \\: \\left(y^{\\top}Dy\\right) \\left(z^{\\top}Dz\\right)\\\\\n", "&= 2\\left(y_{1}z_{1} \\:-\\: y_{2}z_{2} \\:-\\: y_{3}z_{3} \\:-\\: ... \\:-\\: y_{n+1}z_{n+1}\\right)^{2} \\: - \\: \\left(y_{1}^{2} \\:-\\: y_{2}^{2} \\:-\\: y_{3}^{2} \\:-\\: ... \\:-\\: y_{n+1}^{2}\\right)\\left(z_{1}^{2} \\:-\\: z_{2}^{2} \\:-\\: z_{3}^{2} \\:-\\: ... \\:-\\: z_{n+1}^{2}\\right)\\\\\n", "&= \\left(y_{1}z_{1} \\:-\\: y_{2}z_{2} \\:-\\: y_{3}z_{3} \\:-\\: ... \\:-\\: y_{n+1}z_{n+1}\\right)^{2} \\: + \\: \\displaystyle\\sum_{i\\neq j} \\left(y_{i}z_{j}\\right)^{2}\\\\\n", "&\\geq 0.\n", "\\end{aligned}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 2(ii):\n", "\n", "$f(p,q)$ is a **convex function**.\n", "\n", "First, notice that $\\textbf{dom}(f)$ is the Cartesian product of two standard/probability simplices (Lec. 5, p. 15). Each of the simplices being an intersection of 2 hyperplanes and 2 halfspaces, is a convex set, in fact a polyhedron. Since the Cartesian product of convex sets is convex (lEC. 6, P. 3), $\\textbf{dom}(f)$ is a convex set. \n", "\n", "Having established that the set convexity requirement for the domain checks out, there are two ways to argue the rest:\n", "\n", "**Approach 1: identify the operations preserving function convexity**\n", "\n", "See Example 3.19 in textbook.\n", "\n", "**Approach 2: check the zeroth order condition (Lec. 7, p. 7-8)**\n", "\n", "Choose two diffrent pairs of probability vectors $(p_{1},q_{1})\\in\\textbf{dom}(f)$ and $(p_{2},q_{2})\\in\\textbf{dom}(f)$. For any $0\\leq\\theta\\leq 1$, let\n", "$$p = \\theta p_{1} + (1-\\theta)p_{2}, \\quad q = \\theta q_{1} + (1-\\theta)q_{2}.$$\n", "We will now prove that\n", "$$f(p,q) \\leq \\theta f(p_{1},q_{1}) \\:+\\: (1-\\theta) f(p_{2},q_{2}).$$\n", "We start with the following Lemma.\n", "\n", "> **Lemma (Log sum inequality):** Let $(a,b)\\in\\mathbb{R}^{n}_{\\geq 0} \\times \\mathbb{R}^{n}_{\\geq 0}$. Then\n", "$$\\displaystyle\\sum_{i=1}^{n} a_{i}\\log\\displaystyle\\frac{a_{i}}{b_{i}} \\geq \\left(\\displaystyle\\sum_{i=1}^{n}a_{i}\\right)\\log\\displaystyle\\frac{\\displaystyle\\sum_{i=1}^{n}a_{i}}{\\displaystyle\\sum_{i=1}^{n}b_{i}}.$$\n", ">**Proof:** For notational ease, let us denote $\\sum_{i}a_{i}=\\alpha$, and $\\sum_{i}b_{i}=\\beta$. Recalling that $g(x) = x\\log x$ is a convex function (which can be easily checked), we have\n", "$$ \\displaystyle\\sum_{i=1}^{n} a_{i}\\log\\displaystyle\\frac{a_{i}}{b_{i}} = \\displaystyle\\sum_{i=1}^{n} b_{i} g\\left(\\displaystyle\\frac{a_{i}}{b_{i}}\\right) = \\beta \\displaystyle\\sum_{i=1}^{n} \\displaystyle\\frac{b_{i}}{\\beta} g\\left(\\displaystyle\\frac{a_{i}}{b_{i}}\\right) \\geq \\beta g\\left(\\displaystyle\\sum_{i=1}^{n}\\displaystyle\\frac{b_{i}}{\\beta}\\displaystyle\\frac{a_{i}}{b_{i}}\\right) \\geq \\beta g\\left(\\displaystyle\\frac{1}{\\beta}\\displaystyle\\sum_{i=1}^{n}a_{i}\\right) = \\beta g\\left(\\frac{\\alpha}{\\beta}\\right) = \\alpha \\log\\frac{\\alpha}{\\beta},$$\n", "where the inequality follows from the Jensen's inequality applied to the convex function $g(.)$. We have also used $b_{i}/\\beta \\geq 0$, $\\sum_{i}b_{i}/\\beta = 1$.\n", "\n", "Now back to our function Kullback-Leibler divergence, let $a_{1i}=\\theta p_{1i}$, $a_{2i}=(1-\\theta) p_{2i}$, $b_{1i}=\\theta q_{1i}$, $b_{2i}=(1-\\theta) q_{2i}$. Then,\n", "$$\\begin{aligned}\n", "f(p,q) = \\displaystyle\\sum_{i=1}^{n} \\left(\\theta p_{1i} + (1-\\theta)p_{2i}\\right)\\log\\displaystyle\\frac{\\theta p_{1i} + (1-\\theta)p_{2i}}{\\theta q_{1i} + (1-\\theta)q_{2i}} &= \\displaystyle\\sum_{i=1}^{n} \\left(a_{1i} + a_{2i}\\right) \\log\\displaystyle\\frac{a_{1i}+a_{2i}}{b_{1i}+b_{2i}}\\\\\n", "&\\leq \\displaystyle\\sum_{i=1}^{n} \\left(a_{1i}\\log\\displaystyle\\frac{a_{1i}}{b_{1i}} + a_{2i}\\log\\displaystyle\\frac{a_{2i}}{b_{2i}}\\right) \\qquad \\text{(using the log sum inequality from above lemma)}\\\\\n", "&= \\displaystyle\\sum_{i=1}^{n} \\left(\\theta p_{1i}\\log\\displaystyle\\frac{\\theta p_{1i}}{\\theta q_{1i}} \\: + \\: (1-\\theta)p_{2i}\\log\\displaystyle\\frac{(1-\\theta)p_{2i}}{(1-\\theta)q_{2i}}\\right) \\\\\n", "&= \\theta\\:f(p_{1},q_{1}) \\: + \\: (1-\\theta)\\:f(p_{2},q_{2}).\n", "\\end{aligned}$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution for 2(iii):\n", "\n", "$f(x)$ is a **convex function**.\n", "\n", "First, $\\textbf{dom}(f)$, being a halfspace, is a convex set.\n", "\n", "Next, we check the second order condition for $f(x)$ as follows:\n", "\\begin{align}\n", "f^{\\prime}(x) &= -\\dfrac{1}{x^2}\\int_{0}^{x}g(t)\\:{\\mathrm{d}}t \\:+\\: \\frac{1}{x}g(x)\\\\\n", "\\Rightarrow f^{\\prime\\prime}(x) &= \\frac{2}{x^3} \\bigg\\{\\left(\\int_{0}^{x}g(t)\\:{\\mathrm{d}}t\\right) - xg(x) + \\frac{x^2}{2}g^{\\prime}(x)\\bigg\\} \\\\\n", "&= \\frac{2}{x^3} \\int_{0}^{x}\\left(g(t) - g(x) -g^{\\prime}(x)(t-x)\\right){\\mathrm{d}}t \\geq 0\n", "\\end{align}\n", "where the last inequality is due to the first order condition of convexity for the differentiable function $g(\\cdot)$, which makes the above integrand, and hence the integral $\\geq 0$." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.16" } }, "nbformat": 4, "nbformat_minor": 2 }