{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Proving $f(X) = -\\log\\det(X)$ is convex on $\\mathbf{dom}(f)=\\mathbb{S}^{n}_{++}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Proof 1 (showing post-composition of $f(\\cdot)$ with affine function is convex)\n", "\n", "See text p. 74 bottom." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Proof 2 (showing first order condition for function convexity)\n", "\n", "In class, we proved that $\\displaystyle\\frac{\\partial f}{\\partial X} = -X^{-1}$ for $X\\in\\mathbb{S}^{n}_{++}$. To establish convexity via first order condition, we need to demonstrate that\n", "\n", "$$f(Y) \\:\\geq\\: f(X) \\: + \\: \\Big\\langle \\displaystyle\\frac{\\partial f}{\\partial X}, Y - X \\Big\\rangle,$$\n", "\n", "where the inner product $\\langle A, B \\rangle := \\text{trace}(A^{\\top}B)$. Substituting our $f(\\cdot)$ and $\\displaystyle\\frac{\\partial f}{\\partial X}$ in the above condition, we are required to prove:\n", "\n", "$$\\text{trace}\\left(X^{-1}Y\\right) \\:-\\: n \\:+\\: \\log\\det\\left(XY^{-1}\\right) \\,\\geq\\, 0.$$\n", "\n", "We observe that the LHS of the above inequality is twice the Kullback-Leibler divergence $D_{\\rm{KL}}\\left(\\mathcal{N}\\left(0,Y\\right) || \\mathcal{N}\\left(0,X\\right)\\right)$, where $D_{\\rm{KL}}\\left(p(x) || q(x)\\right)$ between two probability density functions $p(x)$ and $q(x)$ over $x\\in\\mathbb{R}^{n}$, is defined as\n", "$$D_{\\rm{KL}}\\left(p(x) || q(x)\\right) := \\displaystyle\\int_{\\mathbb{R}^{n}} p(x)\\log\\displaystyle\\frac{p(x)}{q(x)}\\:{\\rm{d}}x \\geq 0.$$\n", "That the above is always $\\geq 0$, follows from a generalization of Jensen's inequality: for $h,g:\\mathbb{R}\\mapsto\\mathbb{R}$, and $h(\\cdot)$ convex, $\\mathbb{E}\\left[h \\circ g(x)\\right] \\geq h\\left(\\mathbb{E}\\left[g(x)\\right]\\right)$. You can think of it as Jensen's inequality applied to the random variable $g(x(\\omega))$. Specifically, take $h(x) = -\\log(x)$ and $g(x) = q(x)/p(x)$ in that generalized Jensen's inequality to get the result. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Proof 3 (showing second order condition for function convexity)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to compute the Hessian of $f(X)$, which will be $n^{2} \\times n^{2}$ matrix. Hessian is the negative Jacobian of the matrix-valued function $F(X):= X^{-1}$, that is, ${\\rm{Hess}}(f) = -{\\rm{D}}F(X)$, where $F(X) = - \\displaystyle\\frac{\\partial f}{\\partial X}$. \n", "\n", "Taking differential ${\\rm{d}}(\\cdot)$ to both sides of the identity $XX^{-1}=I$, we obtain\n", "\n", "$${\\rm{d}}(X^{-1}) = -X^{-1}\\left({\\rm{d}}X\\right)X^{-1}.$$\n", "\n", "Applying ${\\rm{vec}}(\\cdot)$ operator to both sides of the above, using the fact that the operators ${\\rm{vec}}(\\cdot)$ and ${\\rm{d}}(\\cdot)$ commute, and employing the identity ${\\rm{vec}}(PQR) = \\left(R^{\\top}\\otimes P\\right) {\\rm{vec}}(Q)$, we get\n", "\n", "$${\\rm{d}}({\\rm{vec}}(X^{-1})) = - \\left(X^{-1}\\otimes X^{-1}\\right) {\\rm{d}}\\left({\\rm{vec}}(X)\\right),$$\n", "\n", "which yields ${\\rm{D}}F(X) = - \\left(X^{-1}\\otimes X^{-1}\\right)$, and hence ${\\rm{Hess}}(f) = -{\\rm{D}}F(X) = X^{-1}\\otimes X^{-1}$. Consequently, ${\\rm{Hess}}(f)$ is positive definite since it has eigenvalues $\\left(\\lambda_{i}\\lambda_{j}\\right)^{-1}$ for $i,j=1,...,n$, where $\\lambda_{1}, ..., \\lambda_{n}$ are the eigenvalues of $X$. 
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.14" } }, "nbformat": 4, "nbformat_minor": 2 }