{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Algorithms
1.1  Measure of efficiency
1.2  Performance of computer systems
versioninfo()

# Algorithms

* An algorithm is loosely defined as a set of instructions for doing something. Input $\to$ Output.

* [Knuth](https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming): (1) finiteness, (2) definiteness, (3) input, (4) output, (5) effectiveness.


## Measure of efficiency

* A basic unit for measuring algorithmic efficiency is the **flop**.  
> A flop (**floating point operation**) consists of a floating point addition, subtraction, multiplication, division, or comparison, and the usually accompanying fetch and store.  

Some books count a multiplication followed by an addition (fused multiply-add, FMA) as one flop. This results in a factor of up to 2 difference in flop counts.

* How to measure the efficiency of an algorithm? Big O notation. If $n$ is the size of a problem, an algorithm has order $O(f(n))$, where the leading term in the number of flops is $c \cdot f(n)$. For example,
    - matrix-vector multiplication `A * b`, where `A` is $m \times n$ and `b` is $n \times 1$, takes $2mn$ or $O(mn)$ flops
    - matrix-matrix multiplication `A * B`, where `A` is $m \times n$ and `B` is $n \times p$, takes $2mnp$ or $O(mnp)$ flops

* A hierarchy of computational complexity:  
    Let $n$ be the problem size.
    - Exponential order: $O(b^n)$ (NP-hard="horrible")
    - Polynomial order: $O(n^q)$ (doable)
    - $O(n \log n)$ (fast)
    - Linear order $O(n)$ (fast)
    - Log order $O(\log n)$ (super fast)

* Classification of data sets by [Huber](http://link.springer.com/chapter/10.1007%2F978-3-642-52463-9_1).

| Data Size | Bytes | Storage Mode |
|-----------|-----------|-----------------------|
| Tiny | $10^2$ | Piece of paper |
| Small | $10^4$ | A few pieces of paper |
| Medium | $10^6$ (megabytes) | A floppy disk |
| Large | $10^8$ | Hard disk |
| Huge | $10^9$ (gigabytes) | Hard disk(s) |
| Massive | $10^{12}$ (terabytes) | RAID storage |

* Difference between $O(n^2)$ and $O(n \log n)$ on massive data. Suppose we have a teraflop supercomputer capable of doing $10^{12}$ flops per second. For a problem of size $n=10^{12}$, an $O(n \log n)$ algorithm takes about
$$10^{12} \log (10^{12}) / 10^{12} \approx 27 \text{ seconds}.$$
An $O(n^2)$ algorithm takes about $10^{24} / 10^{12} = 10^{12}$ seconds, which is approximately 31710 years!

* QuickSort and FFT (invented by Tukey!) are celebrated algorithms that turn $O(n^2)$ operations into $O(n \log n)$. Another example is Strassen's method, which turns $O(n^3)$ matrix multiplication into $O(n^{\log_2 7})$.
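
The teraflop arithmetic and the Strassen exponent quoted above can be checked with a few lines of Julia. This is only a rough sketch: it assumes the constant $c$ in front of the leading term is 1 and uses the natural logarithm, as in the displayed formula; the variable names are made up for illustration.

```julia
# Back-of-the-envelope timings on a machine doing 1e12 flops per second.
# Assumption: the constant in front of the leading term is 1.
rate = 1e12                          # flops per second (teraflop machine)
n = 1e12                             # problem size

t_nlogn = n * log(n) / rate          # seconds for an O(n log n) algorithm
t_n2 = n^2 / rate                    # seconds for an O(n^2) algorithm
years_n2 = t_n2 / (365 * 24 * 3600)  # same time expressed in years

strassen_exponent = log2(7)          # exponent in Strassen's O(n^{log2 7})

println("O(n log n): ", round(t_nlogn, digits = 2), " seconds")
println("O(n^2):     ", round(Int, years_n2), " years")
println("Strassen exponent: ", round(strassen_exponent, digits = 3))
```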

* One goal of this course is to get familiar with the flop counts for some common numerical tasks in statistics. 
> **The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.**

* For example, compare flops of the two mathematically equivalent expressions: `A * B * x` and `A * (B * x)` where `A` and `B` are matrices and `x` is a vector.

using BenchmarkTools, Random

Random.seed!(123) # seed
n = 1000
A = randn(n, n)
B = randn(n, n)
x = randn(n)

# complexity is n^3 + n^2 = O(n^3)
@benchmark $A * $B * $x

BenchmarkTools.Trial: 
 memory estimate: 7.64 MiB
 allocs estimate: 3
 --------------
 minimum time: 13.637 ms (0.00% GC)
 median time: 13.927 ms (0.00% GC)
 mean time: 14.571 ms (3.61% GC)
 maximum time: 49.666 ms (70.42% GC)
 --------------
 samples: 343
 evals/sample: 1

# complexity is n^2 + n^2 = O(n^2)
@benchmark $A * ($B * $x)

BenchmarkTools.Trial: 
 memory estimate: 15.88 KiB
 allocs estimate: 2
 --------------
 minimum time: 431.931 μs (0.00% GC)
 median time: 533.876 μs (0.00% GC)
 mean time: 555.505 μs (0.00% GC)
 maximum time: 1.894 ms (0.00% GC)
 --------------
 samples: 8707
 evals/sample: 1
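
The gap between the two benchmarks can be compared with the flop counts predicted by the $2mn$ and $2mnp$ rules stated earlier. The sketch below is illustrative only: `matvec_flops` and `matmul_flops` are hypothetical helper names, not library functions, and the counts keep only the leading terms.

```julia
using LinearAlgebra, Random

# Hypothetical helpers: leading-term flop counts from the rules in the text.
matvec_flops(m, n) = 2 * m * n          # (m x n) matrix times n-vector
matmul_flops(m, n, p) = 2 * m * n * p   # (m x n) matrix times (n x p) matrix

n = 1000
flops_left = matmul_flops(n, n, n) + matvec_flops(n, n)  # (A * B) * x
flops_right = 2 * matvec_flops(n, n)                     # A * (B * x)
ratio = flops_left / flops_right

# Both orderings give the same vector up to floating point rounding.
Random.seed!(123)
A = randn(n, n)
B = randn(n, n)
x = randn(n)
same = (A * B) * x ≈ A * (B * x)

println("(A * B) * x: ", flops_left, " flops")
println("A * (B * x): ", flops_right, " flops")
println("flop ratio:  ", ratio)
println("results agree: ", same)
```

The measured speed-up in the benchmarks above is smaller than this flop ratio. A plausible explanation is that matrix-matrix products reuse data in cache far better than matrix-vector products, so flop counts alone do not determine run time.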

## Performance of computer systems

* **FLOPS** (floating point operations per second) is a measure of computer performance. 

* For example, my laptop has the Intel i7-6920HQ (Skylake) CPU with 4 cores running at 2.90 GHz (cycles per second).

Intel Skylake CPUs can do 16 DP flops per cycle and 32 SP flops per cycle. Then the **theoretical throughput** of my laptop is\n", "$$ 4 \\times 2.9 \\times 10^9 \\times 16 = 185.6 \\text{ GFLOPS DP} $$\n", "in double precision and\n", "$$ 4 \\times 2.9 \\times 10^9 \\times 32 = 371.2 \\text{ GFLOPS SP} $$\n", "in single precision. \n", "\n", "* In Julia, `LinearAlgebra.peakflops` estimates the peak flop rate of the computer using double precision `gemm!`" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.5873294017052066e11" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using LinearAlgebra\n", "\n", "LinearAlgebra.peakflops(2^14) # matrix size 2^14" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "67px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": true, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }