{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Iterative Methods for Solving Linear Equations\n", "\n", "## Introduction\n", "\n", "So far we have considered direct methods for solving linear equations. \n", "\n", "* **Direct methods** (flops fixed _a priori_) vs **iterative methods**:\n", " - Direct method (GE/LU, Cholesky, QR, SVD): good for dense, small to moderate sized $\\mathbf{A}$ \n", " - Iterative methods (Jacobi, Gauss-Seidal, SOR, conjugate-gradient, GMRES): good for large, sparse, structured linear system, parallel computing, warm start\n", " \n", "\n", " \n", "## PageRank problem\n", "\n", "\n", "\n", "* $\\mathbf{A} \\in \\{0,1\\}^{n \\times n}$ the connectivity matrix of webpages with entries\n", "$$\n", "\\begin{eqnarray*}\n", "\ta_{ij} = \\begin{cases}\n", "\t1 & \\text{if page $i$ links to page $j$} \\\\\n", "\t0 & \\text{otherwise}\n", "\t\\end{cases}. \n", "\\end{eqnarray*}\n", "$$\n", "$n \\approx 10^9$ in May 2017.\n", "\n", "* $r_i = \\sum_j a_{ij}$ is the *out-degree* of page $i$. \n", "\n", "* [Larry Page](https://en.wikipedia.org/wiki/PageRank) imagines a random surfer wandering on internet according to following rules:\n", " - From a page $i$ with $r_i>0$\n", " * with probability $p$, (s)he randomly chooses a link on page $i$ (uniformly) and follows that link to the next page \n", " * with probability $1-p$, (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page \n", " - From a page $i$ with $r_i=0$ (a dangling page), (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page\n", " \n", "The process defines a Markov chain on the space of $n$ pages. Stationary distribution of this Markov chain gives the ranks (probabilities) of each page.\n", "\n", "* Stationary distribution is the top left eigenvector of the transition matrix $\\mathbf{P}$ corresponding to eigenvalue 1. Equivalently it can be cast as a linear equation.\n", "$$\n", " (\\mathbf{I} - \\mathbf{P}^T) \\mathbf{x} = \\mathbf{0}.\n", "$$\n", "\n", "* Gene Golub: Largest matrix computation in world. \n", "\n", "* GE/LU will take $2 \\times (10^9)^3/3/10^{12} \\approx 6.66 \\times 10^{14}$ seconds $\\approx 20$ million years on a tera-flop supercomputer!\n", "\n", "* Iterative methods come to the rescue." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jacobi method\n", "\n", "\n", "\n", "Solve $\\mathbf{A} \\mathbf{x} = \\mathbf{b}$.\n", "\n", "* Jacobi iteration: \n", "$$\n", "x_i^{(t+1)} = \\frac{b_i - \\sum_{j=1}^{i-1} a_{ij} x_j^{(t)} - \\sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}.\n", "$$\n", "\n", "* With splitting: $\\mathbf{A} = \\mathbf{L} + \\mathbf{D} + \\mathbf{U}$, Jacobi iteration can be written as\n", "$$\n", " \\mathbf{D} \\mathbf{x}^{(t+1)} = - (\\mathbf{L} + \\mathbf{U}) \\mathbf{x}^{(t)} + \\mathbf{b},\n", "$$\n", "i.e., \n", "$$\n", "\t\\mathbf{x}^{(t+1)} = - \\mathbf{D}^{-1} (\\mathbf{L} + \\mathbf{U}) \\mathbf{x}^{(t)} + \\mathbf{D}^{-1} \\mathbf{b} = - \\mathbf{D}^{-1} \\mathbf{A} \\mathbf{x}^{(t)} + \\mathbf{x}^{(t)} + \\mathbf{D}^{-1} \\mathbf{b}.\n", "$$\n", "\n", "* One round costs $2n^2$ flops with an **unstructured** $\\mathbf{A}$. Gain over GE/LU if converges in $o(n)$ iterations. Saving is huge for **sparse** or **structured** $\\mathbf{A}$. By structured, we mean the matrix-vector multiplication $\\mathbf{A} \\mathbf{v}$ is fast ($O(n)$ or $O(n \\log n)$)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gauss-Seidel method\n", "\n", "\n", "\n", "\n", "\n", "* Gauss-Seidel (GS) iteration:\n", "$$\n", "x_i^{(t+1)} = \\frac{b_i - \\sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \\sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}.\n", "$$\n", "\n", "* With splitting, $(\\mathbf{D} + \\mathbf{L}) \\mathbf{x}^{(t+1)} = - \\mathbf{U} \\mathbf{x}^{(t)} + \\mathbf{b}$, i.e., \n", "$$\n", "\\mathbf{x}^{(t+1)} = - (\\mathbf{D} + \\mathbf{L})^{-1} \\mathbf{U} \\mathbf{x}^{(t)} + (\\mathbf{D} + \\mathbf{L})^{-1} \\mathbf{b}.\n", "$$\n", "\n", "* GS converges for any $\\mathbf{x}^{(0)}$ for symmetric and pd $\\mathbf{A}$.\n", "\n", "* Convergence rate of Gauss-Seidel is the spectral radius of the $(\\mathbf{D} + \\mathbf{L})^{-1} \\mathbf{U}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Successive over-relaxation (SOR)\n", "\n", "* SOR: \n", "$$\n", "x_i^{(t+1)} = \\omega \\left( b_i - \\sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \\sum_{j=i+1}^n a_{ij} x_j^{(t)} \\right) / a_{ii} + (1-\\omega) x_i^{(t)},\n", "$$\n", "i.e., \n", "$$\n", "\\mathbf{x}^{(t+1)} = (\\mathbf{D} + \\omega \\mathbf{L})^{-1} [(1-\\omega) \\mathbf{D} - \\omega \\mathbf{U}] \\mathbf{x}^{(t)} + (\\mathbf{D} + \\omega \\mathbf{L})^{-1} (\\mathbf{D} + \\mathbf{L})^{-1} \\omega \\mathbf{b}.\n", "$$\n", "\n", "* Need to pick $\\omega \\in [0,1]$ beforehand, with the goal of improving convergence rate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conjugate gradient method\n", "\n", "* **Conjugate gradient and its variants are the top-notch iterative methods for solving huge, structured linear systems.**\n", "\n", "* A UCLA invention! \n", "\n", "* Solving $\\mathbf{A} \\mathbf{x} = \\mathbf{b}$ is equivalent to minimizing the quadratic function $\\frac{1}{2} \\mathbf{x}^T \\mathbf{A} \\mathbf{x} - \\mathbf{b}^T \\mathbf{x}$. \n", "\n", "* To do later, when we study optimization. \n", "\n", "[Kershaw's results](http://www.sciencedirect.com/science/article/pii/0021999178900980?via%3Dihub) for a fusion problem.\n", "\n", "| Method | Number of Iterations |\n", "|----------------------------------------|----------------------|\n", "| Gauss Seidel | 208,000 |\n", "| Block SOR methods | 765 |\n", "| Incomplete Cholesky conjugate gradient | 25 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numerical examples\n", "\n", "The [`IterativeSolvers.jl`](https://github.com/JuliaMath/IterativeSolvers.jl) package implements many commonly used iterative solvers." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "typeof(A) = SparseMatrixCSC{Float64,Int64}\n", "typeof(Afull) = Array{Float64,2}\n" ] }, { "data": { "text/plain": [ "0.00425125" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools, IterativeSolvers\n", "\n", "srand(280)\n", "\n", "n = 4000\n", "# a sparse pd matrix, about 0.5% non-zero entries\n", "A = sprand(n, n, 0.002)\n", "A = A + A' + (2n) * I\n", "@show typeof(A)\n", "b = randn(n)\n", "# dense matrix representation of A\n", "Afull = full(A)\n", "@show typeof(Afull)\n", "# actual sparsity level\n", "countnz(A) / length(A)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 31.33 KiB\n", " allocs estimate: 2\n", " --------------\n", " minimum time: 5.618 ms (0.00% GC)\n", " median time: 7.455 ms (0.00% GC)\n", " mean time: 7.712 ms (0.00% GC)\n", " maximum time: 14.812 ms (0.00% GC)\n", " --------------\n", " samples: 646\n", " evals/sample: 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dense matrix-vector multiplication\n", "@benchmark Afull * b" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 31.33 KiB\n", " allocs estimate: 2\n", " --------------\n", " minimum time: 66.647 μs (0.00% GC)\n", " median time: 69.379 μs (0.00% GC)\n", " mean time: 74.066 μs (2.24% GC)\n", " maximum time: 1.427 ms (92.48% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse matrix-vector multiplication\n", "@benchmark A * b" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 122.10 MiB\n", " allocs estimate: 11\n", " --------------\n", " minimum time: 277.813 ms (2.90% GC)\n", " median time: 279.444 ms (2.87% GC)\n", " mean time: 284.541 ms (3.83% GC)\n", " maximum time: 335.722 ms (17.85% GC)\n", " --------------\n", " samples: 18\n", " evals/sample: 1" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dense solve via Cholesky\n", "xchol = cholfact(Afull) \\ b\n", "@benchmark cholfact(Afull) \\ b" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "vecnorm(xjacobi - xchol) = 0.009481824791564454\n" ] }, { "data": { "text/plain": [ "0.009481824791564454" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Jacobi solution is fairly close to Cholesky solution\n", "xjacobi, = jacobi(A, b)\n", "@show vecnorm(xjacobi - xchol)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 94.36 KiB\n", " allocs estimate: 13\n", " --------------\n", " minimum time: 1.034 ms (0.00% GC)\n", " median time: 1.061 ms (0.00% GC)\n", " mean time: 1.100 ms (0.51% GC)\n", " maximum time: 3.343 ms (58.19% GC)\n", " --------------\n", " samples: 4537\n", " evals/sample: 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark jacobi(A, b)" ] }, { 
"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.009481824791564454" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Gauss-Seidel solution is fairly close to Cholesky solution\n", "xgs, = gauss_seidel(A, b)\n", "vecnorm(xgs - xchol)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 63.11 KiB\n", " allocs estimate: 13\n", " --------------\n", " minimum time: 1.116 ms (0.00% GC)\n", " median time: 1.215 ms (0.00% GC)\n", " mean time: 1.232 ms (0.26% GC)\n", " maximum time: 3.518 ms (65.23% GC)\n", " --------------\n", " samples: 4050\n", " evals/sample: 1" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark gauss_seidel(A, b)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.009481824791561674" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# symmetric SOR with ω=0.75\n", "xsor, = ssor(A, b, 0.75)\n", "vecnorm(xsor - xchol)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 94.81 KiB\n", " allocs estimate: 38\n", " --------------\n", " minimum time: 1.123 ms (0.00% GC)\n", " median time: 1.165 ms (0.00% GC)\n", " mean time: 1.210 ms (0.39% GC)\n", " maximum time: 2.644 ms (55.17% GC)\n", " --------------\n", " samples: 4121\n", " evals/sample: 1" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark sor(A, b, 0.75)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.009481824791658372" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# conjugate gradient\n", "xcg, = cg(A, b)\n", "vecnorm(xcg - xchol)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 126.86 KiB\n", " allocs estimate: 31\n", " --------------\n", " minimum time: 229.348 μs (0.00% GC)\n", " median time: 237.998 μs (0.00% GC)\n", " mean time: 254.097 μs (3.40% GC)\n", " maximum time: 2.293 ms (81.10% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark cg(A, b)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 0.6.2\n", "Commit d386e40c17 (2017-12-13 18:08 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)\n", " LAPACK: libopenblas64_\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-3.9.1 (ORCJIT, skylake)\n" ] } ], "source": [ "versioninfo()" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.2", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.2" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "143px", 
"width": "252px" }, "navigate_menu": true, "number_sections": false, "sideBar": true, "threshold": 4, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }