{ "cells": [ { "cell_type": "markdown", "metadata": { "code_folding": [ 1 ], "slideshow": { "slide_type": "slide" } }, "source": [ "# Lecture 2: Matrix norms and unitary matrices" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Recap of the previous lecture\n", "\n", "- Floating point (double, single, number of bytes), rounding error\n", "- Vector norms are measures of smallness, used to compute the distance and accuracy\n", "- Forward/backward error (and stability of algorithms)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Notations\n", "\n", "We use notation \n", "$A= \\begin{bmatrix} a_{11} & \\dots & a_{1m} \\\\ \\vdots & \\ddots & \\vdots \\\\ a_{n1} & \\dots & a_{nm} \\end{bmatrix} \\equiv \\{a_{ij}\\}_{i,j=1}^{n,m}\\in \\mathbb{C}^{n\\times m}$.\n", "\n", "$A^*\\stackrel{\\mathrm{def}}{=}\\overline{A^\\top}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrices and norms\n", "\n", "- Recall vector norms that allow to evaluate distance between two vectors or how large are elements of a vector.\n", "\n", "- How to generalize this concept to matrices?\n", "\n", "- A trivial answer is that there is no big difference between matrices and vectors, and here comes the simplest matrix norm –– **Frobenius** norm:\n", "\n", "$$\n", " \\Vert A \\Vert_F \\stackrel{\\mathrm{def}}{=} \\Big(\\sum_{i=1}^n \\sum_{j=1}^m |a_{ij}|^2\\Big)^{1/2}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrix norms\n", "$\\Vert \\cdot \\Vert$ is called a **matrix norm** if it is a vector norm on the vector space of $n \\times m$ matrices:\n", "1. $\\|A\\| \\geq 0$ and if $\\|A\\| = 0$ then $A = O$\n", "3. $\\|\\alpha A\\| = |\\alpha| \\|A\\|$\n", "4. $\\|A+B\\| \\leq \\|A\\| + \\|B\\|$ (triangle inequality)\n", "\n", "Additionally some norms can satisfy the *submultiplicative property*\n", "\n", "* $\\Vert A B \\Vert \\leq \\Vert A \\Vert \\Vert B \\Vert$ \n", "\n", "These norms are called **submultiplicative norms**.\n", "\n", "The submultiplicative property is needed in many places, for example in the estimates for the error of solution of linear systems (we will cover this subject later). \n", "\n", "Example of a non-submultiplicative norm is Chebyshev norm \n", "\n", "$$\n", "\\|A\\|_C = \\displaystyle{\\max_{i,j}}\\, |a_{ij}|\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Operator norms\n", "The most important class of the matrix norms is the class of **operator norms**. Mathematically, they are defined as\n", "\n", "$$\n", " \\Vert A \\Vert_{*,**} = \\sup_{x \\ne 0} \\frac{\\Vert A x \\Vert_*}{\\Vert x \\Vert_{**}},\n", "$$\n", "\n", "where $\\Vert \\cdot \\Vert_*$ and $\\| \\cdot \\|_{**}$ are **vector norms**.\n", "\n", "It is easy to check that operator norms are submultiplicative.\n", "\n", "**Frobenius norm** is a matrix norm, but not an operator norm, i.e. you can not find $\\Vert \\cdot \\Vert_*$ and $\\| \\cdot \\|_{**}$ that induce it. A nontrivial fact. The general criterion can be found in [Theorem 6 and Corollary 4](http://www.sciencedirect.com/science/article/pii/S0024379515004346).\n", "For $\\Vert \\cdot \\Vert_* = \\| \\cdot \\|_{**}$ let us check on the blackboard!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrix $p$-norms\n", "\n", "Important case of operator norms are matrix $p$-norms, which are defined for $\\|\\cdot\\|_* = \\|\\cdot\\|_{**} = \\|\\cdot\\|_p$.
\n", " Among all $p$-norms three norms are the most common ones: \n", "\n", "- $p = 1, \\quad \\Vert A \\Vert_{1} = \\displaystyle{\\max_j \\sum_{i=1}^n} |a_{ij}|$.\n", "\n", "- $p = 2, \\quad$ spectral norm, denoted by $\\Vert A \\Vert_2$.\n", "\n", "- $p = \\infty, \\quad \\Vert A \\Vert_{\\infty} = \\displaystyle{\\max_i \\sum_{j=1}^m} |a_{ij}|$.\n", "\n", "Let us check it for $p=\\infty$ on a blackboard." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Spectral norm\n", "\n", "- Spectral norm, $\\Vert A \\Vert_2$ is one of the most used matrix norms (along with the Frobenius norm). \n", "- It can not be computed directly from the entries using a simple formula, like the Frobenius norm, however, there are efficient algorithms to compute it. \n", "- It is directly related to the **singular value decomposition** (SVD) of the matrix. It holds\n", "\n", "$$\n", " \\Vert A \\Vert_2 = \\sigma_1(A) = \\sqrt{\\lambda_\\max(A^*A)}\n", "$$\n", "\n", "where $\\sigma_1(A)$ is the largest singular value of the matrix $A$ and $^*$ is a *conjugate transpose* of the matrix. \n", "\n", "- We will soon learn all about the SVD. Meanwhile, we can already compute the norm in Python." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Spectral: 54.2841177779168 \n", "Frobenius: 446.1247275008811 \n", "1-norm: 102.41503739208666 \n", "infinity: 1670.3307675670248\n" ] } ], "source": [ "import numpy as np\n", "n = 100\n", "m = 2000\n", "a = np.random.randn(n, m) #Random n x n matrix\n", "s1 = np.linalg.norm(a, 2) #Spectral\n", "s2 = np.linalg.norm(a, 'fro') #Frobenius\n", "s3 = np.linalg.norm(a, 1) #1-norm\n", "s4 = np.linalg.norm(a, np.inf) \n", "print('Spectral: {0:} \\nFrobenius: {1:} \\n1-norm: {2:} \\ninfinity: {3:}'.format(s1, s2, s3, s4))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Examples\n", "\n", "Several examples of optimization problems where matrix norms arise:\n", "* $\\displaystyle{\\min_{\\mathrm{rank}(A_r)=r}}\\| A - A_r\\|$ –– finding best rank-r approximation. SVD helps to solve this problem for $\\|\\cdot\\|_2$ and $\\|\\cdot\\|_F$.\n", "\n", "\n", "* $\\displaystyle{\\min_B}\\| P_\\Omega \\odot(A - B)\\| + \\mathrm{rank}(B)$ –– matrix completion. \n", "$$\n", "(P_\\Omega)_{ij} = \\begin{cases} 1 & i,j\\in\\Omega \\\\ 0 & \\text{otherwise} \\end{cases}\n", "$$\n", "$\\odot$ denotes Hadamard product (elementwise)\n", "\n", "\n", "* $\\displaystyle{\\min_{B,C\\geq 0}} \\|A - BC\\|$ –– nonnegative matrix factorization. Symbol $B\\geq0$ here means that all elements of $B$ are nonnegative." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Scalar product\n", "While norm is a measure of distance, the **scalar product** takes angle into account. \n", "\n", "It is defined as\n", "\n", "* **For vectors:**\n", "$$\n", " (x, y) = x^* y = \\sum_{i=1}^n \\overline{x}_i y_i ,\n", "$$\n", "where $\\overline{x}$ denotes the *complex conjugate* of $x$. The Euclidean norm is then\n", "$$\n", " \\Vert x \\Vert_2 = \\sqrt{(x, x)},\n", "$$\n", "or it is said the norm is **induced** by scalar product. \n", "\n", "\n", "* **For matrices** (Frobenius scalar product):\n", "$$\n", " (A, B)_F = \\displaystyle{\\sum_{i=1}^{n}\\sum_{j=1}^{m}} \\overline{a}_{ij} b_{ij} \\equiv \\mathrm{trace}(A^* B),\n", "$$\n", "where $\\mathrm{trace}(A)$ denotes sum of diagonal elements of $A$. One can check that $\\|A\\|_F = \\sqrt{(A, A)_F}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Remark**. The angle between two vectors is defined as\n", "$$\n", " \\cos \\phi = \\frac{(x, y)}{\\Vert x \\Vert_2 \\Vert y \\Vert_2}.\n", "$$\n", "Similar expression holds for matrices." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "An important property of the scalar product is the **Cauchy-Schwarz-Bunyakovski inequality**:\n", "\n", "$$|(x, y)| \\leq \\Vert x \\Vert_2 \\Vert y \\Vert_2,$$\n", "\n", "and thus the angle between two vectors is defined properly." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Matrices preserving the norm\n", "\n", "- For stability it is really important that the error does not grow after we apply some transformations. \n", "\n", "- Suppose you are given $\\widehat{x}$ –– the approximation of $x$ such that, \n", "\n", "$$\n", " \\frac{\\Vert x - \\widehat{x} \\Vert}{\\Vert x \\Vert} \\leq \\varepsilon.\n", "$$\n", "\n", "- Let us calculate a linear transformation of $x$ and $\\widehat{x}$: \n", "\n", "$$\n", " y = U x, \\quad \\widehat{y} = U \\widehat{x}.\n", "$$\n", "\n", "- When building new algorithms, we want to use transformations that do not increase (or even preserve) the error:\n", "\n", "$$\n", " \\frac{\\Vert y - \\widehat{y} \\Vert}{\\Vert y \\Vert } = \\frac{\\Vert U ( x - \\widehat{x}) \\Vert}{\\Vert U x\\Vert} \\leq \\varepsilon.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The question is for which kind of matrices the norm of the vector **will not change**, so that\n", "\n", "$$\n", "\\frac{\\Vert U ( x - \\widehat{x}) \\Vert}{\\Vert U x\\Vert} = \\frac{ \\|x - \\widehat{x}\\|}{\\|x\\|}.\n", "$$\n", "\n", "For the euclidean norm $\\|\\cdot\\|_2$ the answer is **unitary** (or orthogonal in the real case) matrices." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Unitary (orthogonal) matrices\n", "Let $U$ be complex $n \\times n$ matrix, and $\\Vert U z \\Vert_2 = \\Vert z \\Vert_2$ for all $z$. \n", "\n", "This can happen **if and only if** (can be abbreviated as **iff**)\n", "\n", "$$\n", " U^* U = I_n,\n", "$$\n", "\n", "where $I_n$ is an identity matrix $n\\times n$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Complex $n\\times n$ square matrix is called **unitary** if\n", "$$\n", " U^*U = UU^* = I_n,\n", "$$\n", "which means that columns and rows of unitary matrices both form orthonormal basis in $\\mathbb{C}^{n}$.\n", "\n", "For rectangular matrices of size $m\\times n$ ($n\\not= m$) only one of the equalities can hold\n", "\n", "$$\n", " U^*U = I_n \\text{ –– left unitary for $m>n$} \\quad \\text{or} \\quad UU^* = I_m \\text{ –– right unitary for $m " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Important property of Householder reflection\n", "\n", "A nice property of Housholder transformation is that it can zero all elements of a vector except for the first one:\n", "\n", "$$\n", " H \\begin{bmatrix} \\times \\\\ \\times \\\\ \\times \\\\ \\times \\end{bmatrix} = \n", " \\begin{bmatrix} \\times \\\\ 0 \\\\ 0 \\\\ 0 \\end{bmatrix}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Proof.* Let $e_1 = (1,0,\\dots, 0)^T$, then we want to find $v$ such that\n", "\n", "$$\n", " H x = x - 2(v^* x) v = \\alpha e_1,\n", "$$\n", "\n", "where $\\alpha$ is unknown constant. Since $\\|\\cdot\\|_2$ is unitary invariant we get\n", "$$\\|x\\|_2 = \\|Hx\\|_2 = \\|\\alpha e_1\\|_2 = |\\alpha|.$$\n", "and $$\\alpha = \\pm \\|x\\|_2$$\n", "\n", "Also, we can express $v$ from $x - 2(v^* x) v = \\alpha e_1$:\n", "\n", "$$v = \\dfrac{x-\\alpha e_1}{2 v^* x}$$\n", "\n", "Multiplying the latter expression by $x^*$ we get\n", "\n", "$$\n", " x^* x - 2 (v^* x) x^* v = \\alpha x_1;\n", "$$\n", "$$\n", " \\|x\\|_2^2 - 2 (v^* x)^2 = \\alpha x_1\n", "$$\n", "$$\n", " (v^* x)^2 = \\frac{\\|x\\|_2^2 - \\alpha x_1}{2}.\n", "$$\n", "\n", "So, $v$ exists and equals\n", "$$\n", " v = \\dfrac{x \\pm \\|x\\|_2 e_1}{2v^* x} = \\dfrac{x \\pm \\|x\\|_2 e_1}{\\pm\\sqrt{2(\\|x\\|_2^2 \\mp \\|x\\|_2 x_1)}}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Housholder algorithm for QR decomposition\n", "\n", "Using the obtained property we can make arbitrary matrix $A$ lower triangular:\n", "$$\n", "H_2 H_1 A = \n", "\\begin{bmatrix}\n", "\\times & \\times & \\times & \\times \\\\ \n", "0 & \\times & \\times & \\times \\\\ \n", "0 & 0 & \\boldsymbol{\\times} & \\times\\\\ \n", "0 &0 & \\boldsymbol{\\times} & \\times \\\\ \n", "0 &0 & \\boldsymbol{\\times} & \\times \n", "\\end{bmatrix}\n", "$$\n", "then finding $H_3=\\begin{bmatrix}I_2 & \\\\ & {\\widetilde H}_3 \\end{bmatrix}$ such that\n", "$$ \n", "{\\widetilde H}_3 \\begin{bmatrix} \\boldsymbol{\\times}\\\\ \\boldsymbol{\\times} \\\\ \\boldsymbol{\\times} \\end{bmatrix} = \n", "\\begin{bmatrix} \\times \\\\ 0 \\\\ 0 \\end{bmatrix}.\n", "$$\n", "we get\n", "$$\n", " H_3 H_2 H_1 A = \n", " \\begin{bmatrix}\n", " \\times & \\times & \\times & \\times \\\\ \n", " 0 & \\times & \\times & \\times \\\\ \n", " 0 & 0 & {\\times} & \\times\\\\ \n", " 0 &0 & 0 & \\times \\\\ \n", " 0 &0 & 0 & \\times \n", " \\end{bmatrix}\n", "$$\n", "Finding $H_4$ by analogy we arrive at upper-triangular matrix." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Since product and inverse of unitary matrices is a unitary matrix we get:\n", "\n", "** Corollary:** (QR decomposition) Every $A\\in \\mathbb{C}^{n\\times m}$ can be represented as\n", "\n", "$$\n", " A = QR,\n", "$$\n", "\n", "where $Q$ is unitary and $R$ is upper triangular. \n", "\n", "See [poster](../decompositions.pdf), what are the sizes of $Q$ and $R$ for $n>m$ and $n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "alpha = -3*np.pi / 4\n", "G = np.array([\n", " [np.cos(alpha), -np.sin(alpha)],\n", " [np.sin(alpha), np.cos(alpha)]\n", "])\n", "x = np.array([-1./np.sqrt(2), 1./np.sqrt(2)])\n", "y = G.dot(x)\n", "\n", "plt.quiver([0, 0], [0, 0], [x[0], y[0]], [x[1], y[1]], angles='xy', scale_units='xy', scale=1)\n", "plt.xlim(-1., 1.)\n", "plt.ylim(-1., 1.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## QR via Givens rotations\n", "\n", "Similarly we can make matrix upper-triagular using Givens rotations:\n", "\n", "$$\\begin{bmatrix} \\times & \\times & \\times \\\\ \\bf{*} & \\times & \\times \\\\ \\bf{*} & \\times & \\times \\end{bmatrix} \\to \\begin{bmatrix} * & \\times & \\times \\\\ * & \\times & \\times \\\\ 0 & \\times & \\times \\end{bmatrix} \\to \\begin{bmatrix} \\times & \\times & \\times \\\\ 0 & * & \\times \\\\ 0 & * & \\times \\end{bmatrix} \\to \\begin{bmatrix} \\times & \\times & \\times \\\\ 0 & \\times & \\times \\\\ 0 & 0 & \\times \\end{bmatrix} $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Givens vs. Housholder\n", "\n", "- Housholder is useful for dense matrices (complexity is $\\approx$ twice smaller than for Jacobi) and we need to zero large number of elements.\n", "- Givens rotations are more suitable for sparse matrice or parallel machines as it acts locally on elements." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Singular Value Decomposition\n", "\n", "SVD will be considered later in more details.\n", "\n", "**Theorem** Any matrix $A\\in \\mathbb{C}^{n\\times m}$ can be written as a product of three matrices: \n", "\n", "$$\n", " A = U \\Sigma V^*,\n", "$$\n", "\n", "where \n", "- $U$ is an $n \\times n$ unitary matrix\n", "- $V$ is an $m \\times m$ unitary matrix\n", "- $\\Sigma$ is a diagonal matrix with non-negative elements $\\sigma_1 \\geq \\ldots, \\geq \\sigma_{\\min (m,n)}$ on the diagonal.\n", "\n", "Moreover, if $\\text{rank}(A) = r$, then $\\sigma_{r+1} = \\dots = \\sigma_{\\min (m,n)} = 0$.\n", "\n", "See [poster](../decompositions.pdf) for the visualization.\n", "\n", "If one truncates (replace by $0$) all singular values except for $r$ first, then the resulting matrix yields best rank-$r$ approximation both in $\\|\\cdot\\|_2$ and $\\|\\cdot\\|_F$. This is called Eckart-Young theorem and will be proved later in our course." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Summary \n", "\n", "- Most important matrix norms: Frobenius and spectral\n", "\n", "- Unitary matrices preserve thes norms\n", "\n", "- There are two \"basic\" classes of unitary matrices, Householder and Givens." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Questions?" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"./styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": false, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }