{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Optimization Examples - Second Order Cone Programming (SOCP)\n", "\n", "## SOCP\n", "\n", "* A **second-order cone program (SOCP)**\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\mathbf{f}^T \\mathbf{x} \\\\\n", "\t&\\text{subject to}& \\|\\mathbf{A}_i \\mathbf{x} + \\mathbf{b}_i\\|_2 \\le \\mathbf{c}_i^T \\mathbf{x} + d_i, \\quad i = 1,\\ldots,m \\\\\n", "\t& & \\mathbf{F} \\mathbf{x} = \\mathbf{g}\n", "\\end{eqnarray*}\n", "over $\\mathbf{x} \\in \\mathbb{R}^n$. This says the points $(\\mathbf{A}_i \\mathbf{x} + \\mathbf{b}_i, \\mathbf{c}_i^T \\mathbf{x} + d_i)$ live in the second order cone (ice cream cone, Lorentz cone, quadratic cone)\n", "\\begin{eqnarray*}\n", "\t\\mathbf{Q}^{n+1} = \\{(\\mathbf{x}, t): \\|\\mathbf{x}\\|_2 \\le t\\}\n", "\\end{eqnarray*}\n", "in $\\mathbb{R}^{n+1}$.\n", "\n", "* QP is a special case of SOCP. Why?\n", " \n", "* When $\\mathbf{c}_i = \\mathbf{0}$ for $i=1,\\ldots,m$, SOCP is equivalent to a **quadratically constrained quadratic program (QCQP)**\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& (1/2) \\mathbf{x}^T \\mathbf{P}_0 \\mathbf{x} + \\mathbf{q}_0^T \\mathbf{x} \\\\\n", "\t&\\text{subject to}& (1/2) \\mathbf{x}^T \\mathbf{P}_i \\mathbf{x} + \\mathbf{q}_i^T \\mathbf{x} + r_i \\le 0, \\quad i = 1,\\ldots,m \\\\\n", "\t& & \\mathbf{A} \\mathbf{x} = \\mathbf{b},\n", "\\end{eqnarray*}\n", "where $\\mathbf{P}_i \\in \\mathbf{S}_+^n$, $i=0,1,\\ldots,m$. Why?\n", "\n", "* A **rotated quadratic cone** in $\\mathbb{R}^{n+2}$ is\n", "\\begin{eqnarray*}\n", "\t\\mathbf{Q}_r^{n+2} = \\{(\\mathbf{x}, t_1, t_2): \\|\\mathbf{x}\\|_2^2 \\le 2 t_1 t_2, t_1 \\ge 0, t_2 \\ge 0\\}.\n", "\\end{eqnarray*}\n", "A point $\\mathbf{x} \\in \\mathbb{R}^{n+1}$ belongs to the second order cone $\\mathbf{Q}^{n+1}$ if and only if \n", "\\begin{eqnarray*}\n", "\t\\begin{pmatrix} \\mathbf{I}_{n-2} & 0 & 0 \\\\\n", "\t0 & - 1/\\sqrt 2 & 1 / \\sqrt 2 \\\\\n", "\t0 & 1/\\sqrt 2 & 1 / \\sqrt 2\n", "\t\\end{pmatrix} \\mathbf{x}\n", "\\end{eqnarray*}\n", "belongs to the rotated quadratic cone $\\mathbf{Q}_r^{n+1}$.\n", "\n", " Gurobi allows users to input second order cone constraint and quadratic constraints directly.\n", "\n", " Mosek allows users to input second order cone constraint, quadratic constraints, and rotated quadratic cone constraint directly.\n", "\n", "* Following sets are _(rotated) quadratic cone representable sets_:\n", "\n", " * (Absolute values) $|x| \\le t \\Leftrightarrow (x, t) \\in \\mathbf{Q}^2$.\n", "\n", " * Euclidean norms) $\\|\\mathbf{x}\\|_2 \\le t \\Leftrightarrow (\\mathbf{x}, t) \\in \\mathbf{Q}^{n+1}$.\n", "\n", " * (Sume of squares) $\\|\\mathbf{x}\\|_2^2 \\le t \\Leftrightarrow (\\mathbf{x}, t, 1/2) \\in \\mathbf{Q}_r^{n+2}$.\n", "\n", " * (Ellipsoid) For $\\mathbf{P} \\in \\mathbf{S}_+^n$ and if $\\mathbf{P} = \\mathbf{F}^T \\mathbf{F}$, where $\\mathbf{F} \\in \\mathbf{R}^{n \\times k}$, then\n", " \\begin{eqnarray*}\n", " & & (1/2) \\mathbf{x}^T \\mathbf{P} \\mathbf{x} + \\mathbf{c}^T \\mathbf{x} + r \\le 0 \\\\\n", " &\\Leftrightarrow& \\mathbf{x}^T \\mathbf{P} \\mathbf{x} \\le 2t, t + \\mathbf{c}^T \\mathbf{x} + r = 0\t\\\\\n", " &\\Leftrightarrow& (\\mathbf{F} \\mathbf{x}, t, 1) \\in \\mathbf{Q}_r^{k+2}, t + \\mathbf{c}^T \\mathbf{x} + r = 0.\n", " \\end{eqnarray*}\n", " Similarly,\n", " \\begin{eqnarray*}\n", " \\|\\mathbf{F} (\\mathbf{x} - \\mathbf{c})\\|_2 \\le t \\Leftrightarrow (\\mathbf{y}, t) \\in \\mathbf{Q}^{n+1}, \\mathbf{y} = \\mathbf{F}(\\mathbf{x} - \\mathbf{c}).\n", " \\end{eqnarray*}\n", " This fact shows that QP and QCQP are instances of SOCP.\n", "\n", " * (Second order cones) $\\|\\mathbf{A} \\mathbf{x} + \\mathbf{b}\\|_2 \\le \\mathbf{c}^T \\mathbf{x} + d \\Leftrightarrow (\\mathbf{A} \\mathbf{x} + \\mathbf{b}, \\mathbf{c}^T \\mathbf{x} + d) \\in \\mathbf{Q}^{m+1}$.\n", "\n", " * (Simple polynomial sets)\n", " \\begin{eqnarray*}\n", " \\{(t, x): |t| \\le \\sqrt x, x \\ge 0\\} &=& \\{ (t,x): (t, x, 1/2) \\in \\mathbf{Q}_r^3\\} \\\\\n", " \\{(t, x): t \\ge x^{-1}, x \\ge 0\\} &=& \\{ (t,x): (\\sqrt 2, x, t) \\in \\mathbf{Q}_r^3\\} \\\\\n", " \\{(t, x): t \\ge x^{3/2}, x \\ge 0\\} &=& \\{ (t,x): (x, s, t), (s, x, 1/8) \\in \\mathbf{Q}_r^3\\} \\\\\n", " \\{(t, x): t \\ge x^{5/3}, x \\ge 0\\} &=& \\{ (t,x): (x, s, t), (s, 1/8, z), (z, s, x) \\in \\mathbf{Q}_r^3\\} \\\\\n", " \\{(t, x): t \\ge x^{(2k-1)/k}, x \\ge 0\\}&,& k \\ge 2, \\text{can be represented similarly} \\\\\n", " \\{(t, x): t \\ge x^{-2}, x \\ge 0\\} &=& \\{ (t,x): (s, t, 1/2), (\\sqrt 2, x, s) \\in \\mathbf{Q}_r^3\\} \\\\\n", " \\{(t, x, y): t \\ge |x|^3/y^2, y \\ge 0\\} &=& \\{ (t,x,y): (x, z) \\in \\mathbf{Q}^2, (z, y/ 2, s), (s, t/2, z) \\in \\mathbf{Q}_r^3\\}\n", " \\end{eqnarray*}\n", "\n", " * (Geometric mean) The hypograph of the (concave) geometric mean function\n", " \\begin{eqnarray*}\n", " \\mathbf{K}_{\\text{gm}}^n = \\{(\\mathbf{x}, t) \\in \\mathbb{R}^{n+1}: (x_1 x_2 \\cdots x_n)^{1/n} \\ge t, \\mathbf{x} \\succeq \\mathbf{0}\\}\n", " \\end{eqnarray*}\n", " can be represented by rotated quadratic cones. For example,\n", " \\begin{eqnarray*}\n", " \\mathbf{K}_{\\text{gm}}^2 &=& \\{(x_1, x_2, t): \\sqrt{x_1 x_2} \\ge t, x_1, x_2 \\ge 0\\} \\\\\n", " &=& \\{(x_1, x_2, t): (\\sqrt 2 t, x_1, x_2) \\in \\mathbf{Q}_r^3\\}.\n", " \\end{eqnarray*}\n", "\n", " * (Harmonic mean) The hypograph of the harmonic mean function $\\left( n^{-1} \\sum_{i=1}^n x_i^{-1} \\right)^{-1}$ can be represented by rotated quadratic cones\n", " \\begin{eqnarray*}\n", " & & \\left( n^{-1} \\sum_{i=1}^n x_i^{-1} \\right)^{-1} \\ge t, \\mathbf{x} \\succeq \\mathbf{0} \\\\\n", " &\\Leftrightarrow& n^{-1} \\sum_{i=1}^n x_i^{-1} \\le y, \\mathbf{x} \\succeq \\mathbf{0} \\\\\n", " &\\Leftrightarrow& x_i z_i \\ge 1, \\sum_{i=1}^n z_i = ny, \\mathbf{x} \\succeq \\mathbf{0} \\\\\n", " &\\Leftrightarrow& 2 x_i z_i \\ge 2, \\sum_{i=1}^n z_i = ny, \\mathbf{x} \\succeq \\mathbf{0}, \\mathbf{z} \\succeq \\mathbf{0} \\\\\n", " &\\Leftrightarrow& (\\sqrt 2, x_i, z_i) \\in \\mathbf{Q}_r^3, \\mathbf{0}^T \\mathbf{z} = ny, \\mathbf{x} \\succeq \\mathbf{0}, \\mathbf{z} \\succeq \\mathbf{0}.\n", " \\end{eqnarray*}\n", "\n", " * (Convex increasing rational powers) For $p,q \\in \\mathbf{Z}_+$ and $p/q \\ge 1$,\n", " \\begin{eqnarray*}\n", " \\mathbf{K}^{p/q} = \\{(x, t): x^{p/q} \\le t, x \\ge 0\\} = \\{(x,t): (t\\mathbf{1}_q, \\mathbf{1}_{p-q}, x) \\in \\mathbf{K}_{\\text{gm}}^p\\}.\n", " \\end{eqnarray*}\n", "\n", " * (Convex decreasing rational powers) For any $p,q \\in \\mathbf{Z}_+$,\n", " \\begin{eqnarray*}\n", " \\mathbf{K}^{-p/q} = \\{(x, t): x^{-p/q} \\le t, x \\ge 0\\} = \\{(x,t): (x\\mathbf{1}_p, t\\mathbf{1}_{q}, 1) \\in \\mathbf{K}_{\\text{gm}}^{p+q}\\}.\n", " \\end{eqnarray*}\n", "\n", " * (Power cones) The _power cone_ with rational powers is\n", " \\begin{eqnarray*}\n", " \\mathbf{K}_{\\alpha}^{n+1} = \\left\\{ (\\mathbf{x},y) \\in \\mathbb{R}_+^n \\times \\mathbb{R}: |y| \\le \\prod_{j=1}^n x_j^{p_j/q_j} \\right\\},\n", " \\end{eqnarray*}\n", " where $p_j, q_j$ are integers satisfying $0 < p_j \\le q_j$ and $\\sum_{j=1}^n p_j/q_j = 1$. Let $\\beta = \\text{lcm}(q_1,\\ldots, q_n)$ and\n", " \\begin{eqnarray*}\n", " s_j = \\beta \\sum_{k=1}^j \\frac{p_k}{q_k}, \\quad j=1,\\ldots,n-1.\n", " \\end{eqnarray*}\n", " Then it can be represented as\n", " \\begin{eqnarray*}\n", " & & |y| \\le (z_1 z_2 \\cdots z_\\beta)^{1/q} \\\\\n", " & & z_1 = \\cdots = z_{s_1} = x_1, \\quad z_{s_1+1} = \\cdots = z_{s_2} = x_2, \\quad z_{s_{n-1}+1} = \\cdots = z_\\beta = x_n.\n", " \\end{eqnarray*}\n", " \n", "* References for above examples: Papers [Lobo, Vandergerghe, Boyd, Lebret (1998)](https://doi.org/10.1016/S0024-3795(98)10032-0), [Alizadeh and Goldfarb (2003)](https://doi.org/10.1007/s10107-002-0339-5), and book by [Ben-Tal and Nemirovski (2001)](https://doi.org/10.1137/1.9780898718829). Now our catalogue of SOCP terms includes all above terms.\n", "\n", "* Most of these function are implemented as the built-in function in the convex optimization modeling language cvx (for Matlab) or Convex.jl (for Julia)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SOCP example: group lasso\n", "\n", "* In many applications, we need to perform variable selection at group level. For instance, in factorial analysis, we want to select or de-select the group of regression coefficients for a factor simultaneously. [Yuan and Lin (2006)](https://doi.org/10.1111/j.1467-9868.2005.00532.x) propose the group lasso that\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\frac 12 \\|\\mathbf{y} - \\beta_0 \\mathbf{1} - \\mathbf{X} \\beta\\|_2^2 + \\lambda \\sum_{g=1}^G w_g \\|\\beta_g\\|_2,\n", "\\end{eqnarray*}\n", "where $\\beta_g$ is the subvector of regression coefficients for group $g$, and $w_g$ are fixed group weights. This is equivalent to the SOCP\n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\frac 12 \\beta^T \\mathbf{X}^T \\left(\\mathbf{I} - \\frac{\\mathbf{1} \\mathbf{1}^T}{n} \\right) \\mathbf{X} \\beta + \\\\\n", "\t& & \\quad \\mathbf{y}^T \\left(\\mathbf{I} - \\frac{\\mathbf{1} \\mathbf{1}^T}{n} \\right) \\mathbf{X} \\beta + \\lambda \\sum_{g=1}^G w_g t_g \\\\\n", "\t&\\text{subject to}& \\|\\beta_g\\|_2 \\le t_g, \\quad g = 1,\\ldots, G,\n", "\\end{eqnarray*}\n", "in variables $\\beta$ and $t_1,\\ldots,t_G$.\n", " \n", "* Overlapping groups are allowed here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SOCP example: sparse group lasso\n", "\n", "* \n", "\\begin{eqnarray*}\n", "\t&\\text{minimize}& \\frac 12 \\|\\mathbf{y} - \\beta_0 \\mathbf{1} - \\mathbf{X} \\beta\\|_2^2 + \\lambda_1 \\|\\beta\\|_1 + \\lambda_2 \\sum_{g=1}^G w_g \\|\\beta_g\\|_2\n", "\\end{eqnarray*}\n", "achieves sparsity at both group and individual coefficient level and can be solved by SOCP as well.\n", "\n", "\n", "\n", "* Apparently we can solve any previous loss functions (quantile, $\\ell_1$, composite quantile, Huber, multi-response model) plus group or sparse group penalty by SOCP." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SOCP example: square-root lasso \n", "\n", "* [Belloni, Chernozhukov, and Wang (2011)]() minimizes\n", "\\begin{eqnarray*}\n", "\t\\|\\mathbf{y} - \\beta_0 \\mathbf{1} - \\mathbf{X} \\beta\\|_2 + \\lambda \\|\\beta\\|_1\n", "\\end{eqnarray*}\n", "by SOCP. This variant generates the same solution path as lasso (why?) but \n", "simplifies the choice of $\\lambda$.\n", "\n", "* A demo example: