{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Lecture 6: Eigenvalues and eigenvectors" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Recap of the previous lecture\n", "- Linear systems\n", "- Gaussian elimination\n", "- LU decomposition\n", "- Condition number as a measure of forward stability of the problem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Today lecture\n", "Today we will talk about:\n", "- Eigenvectors and their applications (PageRank)\n", "- Gershgorin circles\n", "- Computing eigenvectors using power method\n", "- Schur theorem\n", "- Normal matrices" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What is an eigenvector?\n", "\n", "- **Definition.** A vector $x \\ne 0$ is called an **eigenvector** of a square matrix $A$ if there exists a number $\\lambda$ such that \n", "\n", "$$ Ax = \\lambda x. $$\n", "\n", "- The number $\\lambda$ is called an **eigenvalue**. The name **eigenpair** is also used.\n", "\n", "- Since $A - \\lambda I$ should have a non-trivial kernel, eigenvalues are the roots of the characteristic polynomial\n", "\n", "$$ \\det (A - \\lambda I) = 0.$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Eigendecomposition\n", "If matrix $A$ of size $n\\times n$ has $n$ eigenvectors $s_i$, $i=1,\\dots,n$:\n", "\n", "$$ As_i = \\lambda_i s_i, $$\n", "\n", "then this can be written as\n", "\n", "$$ A S = S \\Lambda, \\quad\\text{where}\\quad S=(s_1,\\dots,s_n), \\quad \\Lambda = \\text{diag}(\\lambda_1, \\dots, \\lambda_n), $$\n", "\n", "or equivalently\n", "\n", "$$ A = S\\Lambda S^{-1}. $$\n", "\n", "- This is called **eigendecomposition** of a matrix. Matrices that can be represented by their eigendecomposition are called **diagonalizable**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Existence\n", "\n", "- What classes of matrices are diagonalizable?\n", "\n", "- Simple example can be matrices with all different eigenvalues.\n", "- More generally, matrix is diagonalizable iff **algebraic multiplicity** of each eigenvalue (mutiplicity of eigenvalue in the characteristic polynomial) is equal to its **geometric multiplicity** (dimension of eigensubspace). \n", "\n", "- For our purposes the most important class of diagonalizable matrices is the class of **normal matrices**: \n", "\n", "$$AA^* = A^* A.$$\n", "\n", "- You will learn how to prove that normal matrices are diagonalizable after a few slides (Schur decomposition topic)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Example\n", "\n", "* You can simply check that, e.g. matrix \n", "\n", "$$A = \\begin{pmatrix} 1 & 1 \\\\ 0 & 1 \\end{pmatrix}$$ \n", "\n", "has one eigenvalue $1$ of multiplicity $2$ (since its characteristic polynomial is $p(\\lambda)=(1-\\lambda)^2$), but only one eigenvector $\\begin{pmatrix} c \\\\ 0 \\end{pmatrix}$ and hence the matrix is not diagonalizable." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why eigenvectors and eigenvalues are important?\n", "\n", "- Eigenvectors are both important auxiliary tools and also play important role in applications. 
\n", "- To start with all our microworld is governed by the **Schrodinger equation** which is an eigenvalue problem:\n", "\n", "$$ H \\psi = E \\psi, $$\n", "\n", "where $E$ is the ground state energy, $\\psi$ is called wavefunction and $H$ is the Hamiltonian. \n", "- More than 50% of the computer power is spent on solving this type of problems for computational material / drug design." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Eigenvalues are vibrational frequencies\n", "\n", "A typical computation of eigenvectors / eigenvectors is for studying \n", "\n", "- Vibrational computations of mechanical structures\n", "- Model order reduction of complex systems" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from IPython.display import YouTubeVideo \n", "YouTubeVideo(\"xKGA3RNzvKg\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Google PageRank\n", "- One of the most famous eigenvectors computation is the **Google PageRank**. \n", "- It is not actively used by Google nowadays, but it was one of the main features in its early stages. The question is how do we rank webpages, which one is important, and which one is not. \n", "- All we know about the web is which page refers to which. PageRank is defined by a recursive definition. \n", "- Denote by $p_i$ the **importance** of the $i$-th page. \n", "- Then we define this importance as an average value of all importances of all pages that refer to the current page. It gives us a linear system \n", "\n", "$$ p_i = \\sum_{j \\in N(i)} \\frac{p_j}{L(j)}, $$\n", "\n", "where $L(j)$ is the number of outgoing links on the $j$-th page, $N(i)$ are all the neighbours of the $i$-th page. It can be rewritten as \n", "\n", "$$ p = G p, \\quad G_{ij} = \\frac{1}{L(j)} $$\n", "\n", "or as an eigenvalue problem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", " Gp = 1 p,\n", "$$\n", "\n", "i.e. the eigenvalue $1$ is already known. Note that $G$ is **left stochastic**, i.e. its columns sum up to $1$. \n", "Check that any left stochastic matrix has maximum eigenvalue equal to $1$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Demo\n", "- We can compute PageRank using some Python packages. \n", "- We will use ```networkx``` package for working with graphs that can be installed using \n", "\n", "```conda install networkx```\n", "\n", "- Other packages to work with graphs in Python are [igraph](https://igraph.org/) and [graph-tool](https://graph-tool.skewed.de/)\n", "- They can be useful in your projects" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- We will use a simple example of [Zachary karate club network](https://en.wikipedia.org/wiki/Zachary%27s_karate_club). \n", "- This data was manually collected in 1977, and is a classical social network dataset." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import networkx as nx\n", "kn = nx.read_gml('karate.gml')\n", "#nx.write_gml(kn, 'karate2.gml')\n", "nx.draw_networkx(kn, node_color=\"red\") #Draw the graph" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Now we can actually compute the PageRank using the NetworkX built-in function. \n", "- We also plot the size of the nodes larger if its PageRank is larger." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "pr = nx.algorithms.link_analysis.pagerank(kn)\n", "pr_vector = list(pr.values())\n", "pr_vector = np.array(pr_vector) * 3000\n", "nx.draw_networkx(kn, node_size=pr_vector, node_color=\"red\", labels=None)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Computations of eigenvalues\n", "\n", "- How to compute eigenvalues and eigenvectors? \n", "\n", "There are two types of eigenproblems:\n", "\n", "- full eigenproblem (all eigenvalues & eigenvectors are required)\n", "- partial eigenvalues (minimal/maximal eigenvalues, eigenvalues within the specified region are required)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Computation of the eigenvalues via characteristic equations\n", "The eigenvalue problem has the form \n", "\n", "$$ Ax = \\lambda x, $$\n", "\n", "or\n", "\n", "$$ (A - \\lambda I) x = 0, $$\n", "\n", "therefore matrix $A - \\lambda I$ has non-trivial kernel and should be singular. \n", "\n", "That means, that the **determinant** \n", "\n", "$$ p(\\lambda) = \\det(A - \\lambda I) = 0. $$\n", "\n", "- The equation is called **characteristic equations** and is a polynomial of order $n$. \n", "- The $n$-degree polynomial has $n$ complex roots!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Recall the determinant\n", "The determinant of a square matrix $A$ is defined as \n", "\n", "$$\\det A = \\sum_{\\sigma \\in S_n} \\mathrm{sgn}({\\sigma})\\prod^n_{i=1} a_{i, \\sigma_i},$$\n", "\n", "where \n", "- $S_n$ is the set of all **permutations** of the numbers $1, \\ldots, n$\n", "- $\\mathrm{sgn}$ is the **signature** of the permutation ( $(-1)^p$, where $p$ is the number of transpositions to be made)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Properties of determinant\n", "Determinant has many nice properties:\n", "\n", "_1._ $\\det(AB) = \\det(A) \\det(B)$\n", "\n", "_2._ If we have one row as a sum of two vectors, determinant is a sum of two determinants\n", "\n", "_3._ \"Minor expansion\": we can expand determinant through a selected row or column.\n", "\n", "- If you do it via **minor expansion**, we get **exponential** complexity in $n$.\n", "\n", "- Can we do $\\mathcal{O}(n^3)$? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Eigenvalues and characteristic equation\n", "\n", "- Now we go back to the eigenvalues.\n", "\n", "- The characteristic equation can be used to compute the eigenvalues, which leads to **naïve** algorithm:\n", "\n", "$$p(\\lambda) = \\det(A - \\lambda I)$$\n", "\n", "1. 
Compute the coefficients of the polynomial\n", "2. Compute the roots\n", "\n", "**Is this a good idea**? \n", "\n", "**Give your feedback**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can do a short demo of this" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "n = 40\n", "a = [[1.0 / (i - j + 0.5) for i in range(n)] for j in range(n)]\n", "a = np.array(a)\n", "ev = np.linalg.eigvals(a)\n", "# There is a special numpy function for the characteristic polynomial\n", "cf = np.poly(a)\n", "ev_roots = np.roots(cf)\n", "#print('Coefficients of the polynomial:', cf)\n", "#print('Polynomial roots:', ev_roots)\n", "plt.scatter(ev_roots.real, ev_roots.imag, marker='x', label='Roots of the char. polynomial')\n", "b = a + 0 * np.random.randn(n, n)  # replace 0 by a small number to perturb the matrix\n", "ev_b = np.linalg.eigvals(b)\n", "plt.scatter(ev_b.real, ev_b.imag, marker='o', label='Lapack')\n", "#plt.scatter(ev_roots.real, ev_roots.imag, marker='o', label='Brute force')\n", "plt.legend(loc='best')\n", "plt.xlabel('Real part')\n", "plt.ylabel('Imaginary part')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Moral\n", "\n", "- Do not do that unless you have a good reason. \n", "- Polynomial rootfinding in the monomial basis $\\{1,x,x^2,\\dots\\}$ is very **ill-conditioned** (other bases can be much better!). Note that the Gram matrix of the monomials \n", "\n", "$$h_{ij} = \\int_0^1 x^i x^j\\, dx = \\frac{1}{i+j+1},$$\n", "\n", "is the Hilbert matrix, whose singular values decay exponentially. \n", "- So, monomials are \"almost\" linearly dependent." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Gershgorin circles\n", "- There is a very interesting theorem that sometimes helps to localize the eigenvalues. \n", "\n", "- It is called the **Gershgorin theorem**. \n", "\n", "- It states that all eigenvalues $\\lambda_i, i = 1, \\ldots, n$ are located inside the union of the **Gershgorin circles** $C_i$, where $C_i$ is a disk on the complex plane with center $a_{ii}$ and radius \n", "\n", "$$r_i = \\sum_{j \\ne i} |a_{ij}|.$$\n", "\n", "- Moreover, if a circle does not intersect the other circles, it contains exactly one eigenvalue. \n", "- The proof is instructive since it uses the concepts we looked at in the previous lectures." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Proof\n", "First, we need to show that if the matrix $A$ is **strictly diagonally dominant**, i.e. \n", "\n", "$$\n", " |a_{ii}| > \\sum_{j \\ne i} |a_{ij}| \\quad \\text{for all } i,\n", "$$\n", "then such a matrix is non-singular." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We separate the diagonal part and the off-diagonal part, and get \n", "\n", "$$\n", " A = D + S = D( I + D^{-1}S),\n", "$$\n", "\n", "and $\\Vert D^{-1} S\\Vert_\\infty < 1$. Therefore, by using the **Neumann series**, the matrix $I + D^{-1}S$ is invertible and hence $A$ is invertible." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now the proof follows by contradiction: \n", "\n", "- if an eigenvalue $\\lambda$ lies outside all of the circles, the matrix $(A - \\lambda I)$ \n", "is strictly diagonally dominant\n", "- thus it is invertible\n", "- that means that $(A - \\lambda I) x = 0$ implies $x = 0$, which contradicts the fact that an eigenvector is non-zero. 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A short demo" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "%matplotlib inline\n", "n = 3\n", "fig, ax = plt.subplots(1, 1)\n", "a = [[5, 1, 1], [1, 0, 0.5], [2, 0, 10]]\n", "#a = [[1.0 / (i - j + 0.5) for i in xrange(n)] for j in xrange(n)]\n", "a = np.array(a)\n", "#a = np.diag(np.arange(n))\n", "a = a + 2 * np.random.randn(n, n)\n", "#u = np.random.randn(n, n)\n", "#a = np.linalg.inv(u).dot(a).dot(u)\n", "xg = np.diag(a).real\n", "yg = np.diag(a).imag\n", "rg = np.zeros(n)\n", "ev = np.linalg.eigvals(a)\n", "for i in range(n):\n", " rg[i] = np.sum(np.abs(a[i, :])) - np.abs(a[i, i])\n", " crc = plt.Circle((xg[i], yg[i]), radius=rg[i], fill=False)\n", " ax.add_patch(crc)\n", "plt.scatter(ev.real, ev.imag, color='r', label=\"Eigenvalues\")\n", "plt.axis('equal')\n", "plt.legend()\n", "ax.set_title('Eigenvalues and Gershgorin circles')\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Note**: There are more complicated figures, like **[Cassini ovals](https://en.wikipedia.org/wiki/Cassini_oval)**, that include the spectrum\n", "\n", "$$ _{ij} = \\{z\\in\\mathbb{C}: |a_{ii} - z|\\cdot |a_{jj} - z|\\leq r_i r_j\\}, \\quad r_i = \\sum_{l\\not= i} |a_{il}|. $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Power method\n", "\n", "- We are often interested in the computation of the part of the spectrum, like the largest eigenvalues or smallest eigenvalues. \n", "- Also it is interesting to note that for the Hermitian matrices $(A = A^*)$ the eigenvalues are always real (prove it!). \n", "- A power method is the simplest method for the computation of **the largest eigenvalue in modulus**. \n", "- It is also our first example of the **iterative method** and **Krylov method**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Power method\n", "\n", "- The eigenvalue problem\n", "\n", "$$Ax = \\lambda x, \\quad \\Vert x \\Vert_2 = 1 \\ \\text{for stability}.$$ \n", "\n", "can be rewritten as a **fixed-point iteration**.\n", "- This iteration is called **power method** and finds the largest in modulus eigenvalue of $A$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Power method has the form\n", "\n", "$$ x_{k+1} = A x_k, \\quad x_{k+1} := \\frac{x_{k+1}}{\\Vert x_{k+1} \\Vert_2}$$\n", "\n", "and \n", "\n", "$$ x_{k+1}\\to v_1,$$ \n", "\n", "where $Av_1 = \\lambda_1 v_1$ and $\\lambda_1$ is the largest eigenvalue and $v_1$ is the corresponding eigenvector.\n", "\n", "- On the $(k+1)$-th iteration approximation to $\\lambda_1$ can be found as\n", "\n", "$$ \\lambda^{(k+1)} = (Ax_{k+1}, x_{k+1}), $$\n", "\n", "- Note that $\\lambda^{(k+1)}$ is not required for the $(k+2)$-th iteration, but might be useful to measure error on each iteration: $\\|Ax_{k+1} - \\lambda^{(k+1)}x_{k+1}\\|$. \n", "\n", "- The convergence is geometric, but the convergence ratio is $q^k$, where $q = \\left|\\frac{\\lambda_{2}}{\\lambda_{1}}\\right| < 1$, for $\\lambda_1>\\lambda_2\\geq\\dots\\geq \\lambda_n$ and $k$ is the number of iteration. \n", "- It means, the convergence can be artitrary small. 
To see this, it is sufficient to consider a $2 \\times 2$ diagonal matrix." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Convergence analysis for $A=A^*$\n", "\n", "Let's have a more precise look at the power method when $A$ is Hermitian.\n", "In two slides you will learn that every Hermitian matrix is diagonalizable. Therefore, there exists an orthonormal basis of eigenvectors $v_1,\\dots,v_n$ such that $Av_i = \\lambda_i v_i$. Let us decompose $x_0$ into a sum of $v_i$ with coefficients $c_i$ (assuming $c_1 \\ne 0$):\n", "\n", "$$ x_0 = c_1 v_1 + \\dots + c_n v_n. $$\n", "\n", "Since $v_i$ are eigenvectors, we have\n", "\n", "$$\n", "\\begin{split}\n", "x_1 &= \\frac{Ax_0}{\\|Ax_0\\|} = \\frac{c_1 \\lambda_1 v_1 + \\dots + c_n \\lambda_n v_n}{\\|c_1 \\lambda_1 v_1 + \\dots + c_n \\lambda_n v_n \\|} \\\\\n", "&\\vdots\\\\\n", "x_k &= \\frac{Ax_{k-1}}{\\|Ax_{k-1}\\|} = \\frac{c_1 \\lambda_1^k v_1 + \\dots + c_n \\lambda_n^k v_n}{\\|c_1 \\lambda_1^k v_1 + \\dots + c_n \\lambda_n^k v_n \\|}\n", "\\end{split}\n", "$$\n", "\n", "Now you see that \n", "\n", "$$\n", "x_k = \\frac{c_1}{|c_1|}\\left(\\frac{\\lambda_1}{|\\lambda_1|}\\right)^k\\frac{ v_1 + \\frac{c_2}{c_1}\\frac{\\lambda_2^k}{\\lambda_1^k}v_2 + \\dots + \\frac{c_n}{c_1}\\frac{\\lambda_n^k}{\\lambda_1^k}v_n}{\\left\\|v_1 + \\frac{c_2}{c_1}\\frac{\\lambda_2^k}{\\lambda_1^k}v_2 + \\dots + \\frac{c_n}{c_1}\\frac{\\lambda_n^k}{\\lambda_1^k}v_n\\right\\|},\n", "$$\n", "\n", "which converges to $v_1$ (up to a sign) since $\\left| \\frac{c_1}{|c_1|}\\left(\\frac{\\lambda_1}{|\\lambda_1|}\\right)^k\\right| = 1$ and $\\left(\\frac{\\lambda_2}{\\lambda_1}\\right)^k \\to 0$ if $|\\lambda_2|<|\\lambda_1|$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Things to remember about the power method\n", "- The power method gives an estimate of the largest-in-modulus eigenvalue (the spectral radius) of the given matrix\n", "- One step requires one matrix-by-vector product. If the matrix allows for an $\\mathcal{O}(n)$ matvec (for example, it is sparse), \n", " then the power method is tractable for large $n$.\n", "- Convergence can be slow\n", "- If only a rough estimate is needed, a few iterations are sufficient (a short numerical sketch is given after the discussion of normal matrices below)\n", "- The solution vector lies in the **Krylov subspace** $\\mathrm{span}\\{x_0, Ax_0,\\dots,A^{k}x_0\\}$ and has the form $\\mu A^k x_0$, where $\\mu$ is a normalization constant. \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Matrix decomposition: the Schur form\n", "There is one class of matrices whose eigenvalues can be found easily: **triangular matrices**\n", "\n", "$$\n", " A = \\begin{pmatrix}\n", " \\lambda_1 & * & * \\\\\n", " 0 & \\lambda_2 & * \\\\\n", " 0 & 0 & \\lambda_3 \\\\\n", " \\end{pmatrix}.\n", "$$\n", "\n", "The eigenvalues of $A$ are $\\lambda_1, \\lambda_2, \\lambda_3$. Why? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Because the determinant is \n", "\n", "$$\n", " \\det(A - \\lambda I) = (\\lambda_1 - \\lambda) (\\lambda_2 - \\lambda) (\\lambda_3 - \\lambda).\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Thus, computing the eigenvalues of triangular matrices is easy. Now, unitary matrices come to the rescue. \n", "- Let $U$ be a unitary matrix, i.e. $U^* U = I$. 
Then \n", "\n", "$$\n", " \\det(A - \\lambda I) = \\det(U (U^* A U - \\lambda I) U^*) = \\det(UU^*) \\det(U^* A U - \\lambda I) = \\det(U^* A U - \\lambda I),\n", "$$\n", "\n", "where we have used the famous multiplicativity property of the determinant, $\\det(AB) = \\det(A) \\det(B)$. \n", "- This means that the matrices $U^* A U$ and $A$ have the same characteristic polynomial and hence the same eigenvalues. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- If we manage to make $U^* A U = T$, where $T$ is **upper triangular**, then we are done. \n", "\n", "- Multiplying from the left and the right by $U$ and $U^*$ respectively, we get the desired decomposition: \n", "\n", "$$ A = U T U^*. $$\n", "\n", "This is the celebrated **Schur decomposition**. \n", "- Recall that multiplication by unitary matrices is numerically stable, thus the eigenvalues are computed very accurately. The Schur decomposition also shows why we need matrix decompositions: it represents a matrix as a product of three matrices with a convenient structure. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Schur theorem\n", "\n", "**Theorem:** Every matrix $A \\in \\mathbb{C}^{n \\times n}$ can be represented in the Schur form $A = UTU^*$, where $U$ is unitary and $T$ is upper triangular. \n", "\n", "**Sketch of the proof**.\n", "1. Every matrix has at least one eigenvector (take a root $\\lambda_1$ of the characteristic polynomial; then $(A-\\lambda_1 I)$ is singular and has a non-trivial nullspace). Let \n", "\n", "$$Av_1 = \\lambda_1 v_1, \\quad \\Vert v_1 \\Vert_2 = 1.$$\n", "\n", "2. Let $U_1 = [v_1,v_2,\\dots,v_n]$, where $v_2,\\dots, v_n$ complete $v_1$ to an orthonormal basis (so that $U_1$ is unitary). Then \n", " \n", "$$ U^*_1 A U_1 = \\begin{pmatrix} \\lambda_1 & * \\\\ 0 & A_2 \\end{pmatrix}, $$\n", " \n", "where $A_2$ is an $(n-1) \\times (n-1)$ matrix. This is called a **block triangular form**. We can now work with $A_2$ only, and so on. \n", " \n", "**Note**: Since this proof needs eigenvectors at every step, it is not a practical algorithm." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Application of the Schur theorem\n", "\n", "- An important application of the Schur theorem: **normal matrices**. \n", "\n", "- **Definition.** A matrix $A$ is called **normal** if \n", "\n", "$$ AA^* = A^* A. $$\n", "\n", "**Q:** Examples of normal matrices?" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Examples: Hermitian matrices, unitary matrices. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Normal matrices\n", "\n", "**Theorem**: $A$ is a **normal matrix** iff $A = U \\Lambda U^*$, where $U$ is unitary and $\\Lambda$ is diagonal. \n", "\n", "**Sketch of the proof:** \n", "- One direction is straightforward (if the decomposition holds, the matrix is normal). \n", "- The other direction is more involved. Consider the Schur form of the matrix $A$. Then $AA^* = A^*A$ implies $TT^* = T^* T$. \n", "- By comparing the diagonal elements of $TT^*$ and $T^*T$ we immediately see that the only upper triangular matrix $T$ that satisfies $TT^* = T^* T$ is a diagonal matrix!\n", "\n", "#### Important consequence\n", "\n", "Therefore, every normal matrix is **unitarily diagonalizable**, which means that it can be diagonalized by a unitary matrix $U$. \n", "\n", "In other words, every normal matrix has an orthonormal basis of eigenvectors."
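] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- A minimal numerical sketch (assuming a random real **symmetric**, hence normal, matrix): we run the **power method** from the earlier slides, compare the result with ```numpy.linalg.eigh```, and check that the eigenvector basis returned by ```eigh``` is indeed orthonormal." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(0)\n", "n = 50\n", "B = np.random.randn(n, n)\n", "A = B @ B.T                       # symmetric positive semidefinite => normal\n", "x = np.random.randn(n)\n", "x /= np.linalg.norm(x)\n", "for _ in range(1000):             # power method: x <- A x / ||A x||\n", "    x = A @ x\n", "    x /= np.linalg.norm(x)\n", "lam = x @ (A @ x)                 # Rayleigh quotient estimate of lambda_1\n", "evals, evecs = np.linalg.eigh(A)  # eigenvalues in ascending order\n", "print('power method estimate:', lam)\n", "print('largest eigenvalue (eigh):', evals[-1])\n", "print('eigenvectors are orthonormal:', np.allclose(evecs.T @ evecs, np.eye(n)))"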
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## How we compute the Schur decomposition?\n", "\n", "- Everything is fine, but how we compute the Schur form?\n", "\n", "- This will be covered in the next lecture." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Variational principle for eigenvalues\n", "\n", "- In many cases, minimal/maximal eigenvalues are needed\n", "- Then, if $A$ is a Hermitian matrix, the **Rayleigh quotient** is defined as\n", "\n", "$$R_A(x) = \\frac{(Ax, x)}{(x, x)},$$\n", "\n", "and the maximal eigenvalue is the maximum of $R_A(x)$, and the minimal eigenvalue is the minimal of $R_A(x)$. \n", "\n", "- Thus, we can use optimization method to find these **extreme eigenvalues**.\n", "\n", "Now, \"advanced\" concept." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Spectrum and pseudospectrum\n", "\n", "- For linear dynamical systems given by the matrix $A$, spectrum can tell a lot about the system (i.e. stability, ...)\n", "\n", "- However, for **non-normal matrices**, spectrum can be unstable with respect to small perturbations.\n", "\n", "- In order to measure such perturbation, the notion of **pseudospectrum** has been developed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Pseudospectrum\n", "\n", "We consider the union of all possible eigenvalues of all perturbations of the matrix $A$.\n", "\n", "$$\\Lambda_{\\epsilon}(A) = \\{ \\lambda \\in \\mathbb{C}: \\exists E, x \\ne 0: (A + E) x = \\lambda x, \\quad \\Vert E \\Vert_2 \\leq \\epsilon. \\}$$\n", "\n", "- For small $E$ and normal $A$ these will be circules around eigenvalues, for non-normal matrices, the structure can be much different. More details: http://www.cs.ox.ac.uk/pseudospectra/\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Summary of todays lecture\n", "\n", "- Eigenvalues, eigenvectors\n", "- Gershgorin theorem\n", "- Power method\n", "- Schur theorem \n", "- Normal matrices\n", "- Some advanced topics\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Next lecture\n", "- Matrix factorizations and how we compute them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Questions?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"./styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "latex_envs": { "LaTeX_envs_menu_present": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": false, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }