{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Algorithms
1.1  Measure of efficiency
1.2  Performance of computer systems
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 1.1.0\n", "Commit 80516ca202 (2019-01-21 21:24 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-6.0.1 (ORCJIT, skylake)\n", "Environment:\n", " JULIA_EDITOR = code\n" ] } ], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Algorithms\n", "\n", "* Algorithm is loosely defined as a set of instructions for doing something. Input $\\to$ Output.\n", "\n", "* [Knuth](https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming): (1) finiteness, (2) definiteness, (3) input, (4) output, (5) effectiveness.\n", "\n", "\n", "## Measure of efficiency\n", "\n", "* A basic unit for measuring algorithmic efficiency is **flop**. \n", "> A flop (**floating point operation**) consists of a floating point addition, subtraction, multiplication, division, or comparison, and the usually accompanying fetch and store. \n", "\n", "Some books count multiplication followed by an addition (fused multiply-add, FMA) as one flop. This results a factor of up to 2 difference in flop counts.\n", "\n", "* How to measure efficiency of an algorithm? Big O notation. If $n$ is the size of a problem, an algorithm has order $O(f(n))$, where the leading term in the number of flops is $c \\cdot f(n)$. 
For example,\n", " - matrix-vector multiplication `A * b`, where `A` is $m \\times n$ and `b` is $n \\times 1$, takes $2mn$ flops, i.e. $O(mn)$ \n", " - matrix-matrix multiplication `A * B`, where `A` is $m \\times n$ and `B` is $n \\times p$, takes $2mnp$ flops, i.e. $O(mnp)$\n", "\n", "* A hierarchy of computational complexity: \n", " Let $n$ be the problem size.\n", " - Exponential order: $O(b^n)$ (\"horrible\"; typical of brute-force approaches to NP-hard problems) \n", " - Polynomial order: $O(n^q)$ (doable) \n", " - $O(n \\log n)$ (fast) \n", " - Linear order: $O(n)$ (fast) \n", " - Log order: $O(\\log n)$ (super fast) \n", " \n", "* Classification of data sets by [Huber](http://link.springer.com/chapter/10.1007%2F978-3-642-52463-9_1).\n", "\n", "| Data Size | Bytes | Storage Mode |\n", "|-----------|-----------|-----------------------|\n", "| Tiny | $10^2$ | Piece of paper |\n", "| Small | $10^4$ | A few pieces of paper |\n", "| Medium | $10^6$ (megabytes) | A floppy disk |\n", "| Large | $10^8$ | Hard disk |\n", "| Huge | $10^9$ (gigabytes) | Hard disk(s) |\n", "| Massive | $10^{12}$ (terabytes) | RAID storage |\n", "\n", "* Difference between $O(n^2)$ and $O(n\\log n)$ on massive data. Suppose we have a teraflop supercomputer capable of doing $10^{12}$ flops per second. For a problem of size $n=10^{12}$, an $O(n \\log n)$ algorithm takes about (using the natural logarithm) \n", "$$10^{12} \\log (10^{12}) / 10^{12} \\approx 27.6 \\text{ seconds}.$$ \n", "An $O(n^2)$ algorithm takes about $10^{24} / 10^{12} = 10^{12}$ seconds, which is approximately 31,710 years!\n", "\n", "* QuickSort and FFT (co-invented by Tukey!) are celebrated algorithms that turn $O(n^2)$ operations into $O(n \\log n)$ (on average, in the case of QuickSort). Another example is Strassen's method, which turns $O(n^3)$ matrix multiplication into $O(n^{\\log_2 7})$. \n", "\n", "* One goal of this course is to get familiar with the flop counts for some common numerical tasks in statistics. 
\n", "> **The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.**\n", "\n", "* For example, compare the flop counts of the two mathematically equivalent expressions `A * B * x` and `A * (B * x)`, where `A` and `B` are matrices and `x` is a vector." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.64 MiB\n", " allocs estimate: 3\n", " --------------\n", " minimum time: 13.637 ms (0.00% GC)\n", " median time: 13.927 ms (0.00% GC)\n", " mean time: 14.571 ms (3.61% GC)\n", " maximum time: 49.666 ms (70.42% GC)\n", " --------------\n", " samples: 343\n", " evals/sample: 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools, Random\n", "\n", "Random.seed!(123) # seed the random number generator for reproducibility\n", "n = 1000\n", "A = randn(n, n)\n", "B = randn(n, n)\n", "x = randn(n)\n", "\n", "# evaluated left to right as (A * B) * x: about 2n^3 + 2n^2 flops, i.e. O(n^3)\n", "@benchmark $A * $B * $x" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 15.88 KiB\n", " allocs estimate: 2\n", " --------------\n", " minimum time: 431.931 μs (0.00% GC)\n", " median time: 533.876 μs (0.00% GC)\n", " mean time: 555.505 μs (0.00% GC)\n", " maximum time: 1.894 ms (0.00% GC)\n", " --------------\n", " samples: 8707\n", " evals/sample: 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# two matrix-vector products: about 2n^2 + 2n^2 flops, i.e. O(n^2)\n", "@benchmark $A * ($B * $x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Performance of computer systems\n", "\n", "* **FLOPS** (floating point operations per second) is a measure of computer performance. \n", "\n", "* For example, my laptop has the Intel i7-6920HQ (Skylake) CPU with 4 cores running at 2.90 GHz ($2.9 \\times 10^9$ cycles per second)."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 1.1.0\n", "Commit 80516ca202 (2019-01-21 21:24 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-6.0.1 (ORCJIT, skylake)\n", "Environment:\n", " JULIA_EDITOR = code\n" ] } ], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Each Intel Skylake core can do 16 DP flops per cycle and 32 SP flops per cycle. The **theoretical throughput** of my laptop is therefore\n", "$$ 4 \\times 2.9 \\times 10^9 \\times 16 = 185.6 \\text{ GFLOPS DP} $$\n", "in double precision and\n", "$$ 4 \\times 2.9 \\times 10^9 \\times 32 = 371.2 \\text{ GFLOPS SP} $$\n", "in single precision. \n", "\n", "* In Julia, `LinearAlgebra.peakflops` estimates the peak flop rate of the computer using double precision `gemm!`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.5873294017052066e11" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using LinearAlgebra\n", "\n", "LinearAlgebra.peakflops(2^14) # matrix size 2^14" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "67px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": true, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }