{ "metadata": { "language": "Julia", "name": "", "signature": "sha256:d626340af21f8d147b9383b7356827d91ec39cb12d353836849107f24b740e24" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Design considerations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a JIT (\"Just in Time\")-compiled language, Julia is designed for good performance. Currently, it is usually expected that it should usually be able to reach speeds within at most a factor of 2 of that of corresponding C code.\n", "\n", "However, to attain decent performance, there are certain principles that must be used in code; see the [Performance tips](http://julia.readthedocs.org/en/latest/manual/performance-tips/) section of the Julia manual for more details." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Profiling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When profiling, always run each function once with the correct argument types before timing it, since the first time it is run the compilation time will play a large role." ] }, { "cell_type": "code", "collapsed": false, "input": [ "Pkg.update()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Updating METADATA...\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Updating cache of Compose...\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Updating cache of Gadfly...\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Computing changes...\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Cloning cache of Contour from git://github.com/tlycken/Contour.jl.git\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Upgrading Compose: v0.3.0 => v0.3.1\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Installing Contour v0.0.1\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Upgrading Gadfly: v0.3.0 => v0.3.1\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "INFO: Building Datetime\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "@time sin(10)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: 6" ] }, { "output_type": "stream", "stream": "stdout", "text": [ ".941e-6 seconds (96 bytes allocated)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "-0.5440211108893698" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "a = 3" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "3" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "No global variables please!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Global variables are slow in Julia: **do not use global variables!**\n", "\n", "Your main program should be wrapped in a function.\n", "Any time you are tempted to use globals, just send them as arguments to functions, and return them if necessary.\n", "\n", "If you have many variables to pass around, wrap them in a type, e.g. called `State`" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Type stability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second important idea for gaining performance is that of *type stability*.\n", "\n", "Any calculation will be immediately slowed down by having variables which can change type during a calculation, simply due to the extra work that must be done at run-time to check the type of the variables. (This is one of the main reasons for the slowness of Python and the necessity for type declarations in Cython to gain speed.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A simple example (due to Leah Hanson) is the following pair of almost-identical functions:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "function sum1(N::Int)\n", " total = 0\n", " \n", " for i in 1:N\n", " total += i/2\n", " end\n", " \n", " total\n", "end\n", "\n", "function sum2(N::Int)\n", " total = 0.0\n", " \n", " for i in 1:N\n", " total += i/2\n", " end\n", " \n", " total\n", "end" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "sum2 (generic function with 1 method)" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We must first run the functions once each to compile them, before looking at any timings:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sum1(10), sum2(10)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "(27.5,27.5)" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Happily, they produce the same result!]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "N = 10000000\n", "\n", "@time sum1(N)\n", "@time sum2(N)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: 0." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "656544448 seconds (320108552 bytes allocated, 36.27% gc time)\n", "elapsed time: 0.039772714 seconds (96 bytes allocated)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "2.50000025e13" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second version is consistently over **10 times faster** than the first version, due simply to type stability. It also allocates almost no memory. The first version allocates an enormous amount of memory (in fact, it is allocating and deallocating all the time), and spends a large fraction of its time in garbage collection." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To help with type stability, there are functions `zero(x)` and `one(x)` that return the correctly-typed zero and one with the same type as the variable `x`:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Packages: `Lint.jl`, `TypeCheck.jl`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = 1\n", "zero(x)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "0" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "y = 0.5\n", "zero(y)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "0.0" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "x = BigFloat(\"0.1\")\n", "one(x)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "1e+00 with 256 bits of precision" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exploring the guts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Julia gives us access to basically every step in the compilation process:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "code_lowered(sum1, (Int,))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ "1-element Array{Any,1}:\n", " :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i},{{:N,:Any,0},{:total,:Any,2},{:#s246,:Any,18},{:#s245,:Any,2},{:#s244,:Any,18},{:i,:Any,18}},{}}, :(begin # In[23], line 2:\n", " total = 0 # line 4:\n", " #s246 = colon(1,N)\n", " #s245 = top(start)(#s246)\n", " unless top(!)(top(done)(#s246,#s245)) goto 1\n", " 2: \n", " #s244 = top(next)(#s246,#s245)\n", " i = top(tupleref)(#s244,1)\n", " #s245 = top(tupleref)(#s244,2) # line 5:\n", " total = total + i / 2\n", " 3: \n", " unless top(!)(top(!)(top(done)(#s246,#s245))) goto 2\n", " 1: \n", " 0: # line 8:\n", " return total\n", " end))))" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "code_lowered(sum2, (Int,))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 35, "text": [ "1-element Array{Any,1}:\n", " :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i},{{:N,:Any,0},{:total,:Any,2},{:#s246,:Any,18},{:#s245,:Any,2},{:#s244,:Any,18},{:i,:Any,18}},{}}, :(begin # In[23], line 12:\n", " total = 0.0 # line 14:\n", " #s246 = colon(1,N)\n", " #s245 = top(start)(#s246)\n", " unless top(!)(top(done)(#s246,#s245)) goto 1\n", " 2: \n", " #s244 = top(next)(#s246,#s245)\n", " i = top(tupleref)(#s244,1)\n", " #s245 = top(tupleref)(#s244,2) # line 15:\n", " total = total + i / 2\n", " 3: \n", " unless top(!)(top(!)(top(done)(#s246,#s245))) goto 2\n", " 1: \n", " 0: # line 18:\n", " return total\n", " end))))" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "code_typed(sum1, (Int,))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 36, "text": [ "1-element Array{Any,1}:\n", " :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i,:_var0,:_var1},{{:N,Int64,0},{:total,Any,2},{:#s246,UnitRange{Int64},18},{:#s245,Int64,2},{:#s244,(Int64,Int64),18},{:i,Int64,18},{:_var0,Int64,18},{:_var1,Int64,18}},{}}, :(begin # In[23], line 2:\n", " total = 0 # line 4:\n", " #s246 = $(Expr(:new, UnitRange{Int64}, 1, :(top(getfield)(Intrinsics,:select_value)(top(sle_int)(1,N::Int64)::Bool,N::Int64,top(box)(Int64,top(sub_int)(1,1))::Int64)::Int64)))::UnitRange{Int64}\n", " #s245 = top(getfield)(#s246::UnitRange{Int64},:start)::Int64\n", " unless top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool goto 1\n", " 2: \n", " _var0 = #s245::Int64\n", " _var1 = top(box)(Int64,top(add_int)(#s245::Int64,1))::Int64\n", " i = _var0::Int64\n", " #s245 = _var1::Int64 # line 5:\n", " total = total::Union(Int64,Float64) + top(box)(Float64,top(div_float)(top(box)(Float64,top(sitofp)(Float64,i::Int64))::Float64,top(box)(Float64,top(sitofp)(Float64,2))::Float64))::Float64::Float64\n", " 3: \n", " unless top(box)(Bool,top(not_int)(top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool))::Bool goto 2\n", " 1: \n", " 0: # line 8:\n", " return total::Union(Int64,Float64)\n", " end::Union(Int64,Float64)))))" ] } ], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "code_typed(sum2, (Int,))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 37, "text": [ "1-element Array{Any,1}:\n", " :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i,:_var0,:_var1},{{:N,Int64,0},{:total,Float64,2},{:#s246,UnitRange{Int64},18},{:#s245,Int64,2},{:#s244,(Int64,Int64),18},{:i,Int64,18},{:_var0,Int64,18},{:_var1,Int64,18}},{}}, :(begin # In[23], line 12:\n", " total = 0.0 # line 14:\n", " #s246 = $(Expr(:new, UnitRange{Int64}, 1, :(top(getfield)(Intrinsics,:select_value)(top(sle_int)(1,N::Int64)::Bool,N::Int64,top(box)(Int64,top(sub_int)(1,1))::Int64)::Int64)))::UnitRange{Int64}\n", " #s245 = top(getfield)(#s246::UnitRange{Int64},:start)::Int64\n", " unless top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool goto 1\n", " 2: \n", " _var0 = #s245::Int64\n", " _var1 = top(box)(Int64,top(add_int)(#s245::Int64,1))::Int64\n", " i = _var0::Int64\n", " #s245 = _var1::Int64 # line 15:\n", " total = top(box)(Float64,top(add_float)(total::Float64,top(box)(Float64,top(div_float)(top(box)(Float64,top(sitofp)(Float64,i::Int64))::Float64,top(box)(Float64,top(sitofp)(Float64,2))::Float64))::Float64))::Float64\n", " 3: \n", " unless top(box)(Bool,top(not_int)(top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool))::Bool goto 2\n", " 1: \n", " 0: # line 18:\n", " return total::Float64\n", " end::Float64))))" ] } ], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "code_llvm(sum1, (Int, ))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\n", "define %jl_value_t* @\"julia_sum1;19538\"(i64) {\n", "top:\n", " %1 = alloca [5 x %jl_value_t*], align 8\n", " %.sub = getelementptr inbounds [5 x %jl_value_t*]* %1, i64 0, i64 0\n", " %2 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 2, !dbg !2590\n", " store %jl_value_t* inttoptr (i64 6 to %jl_value_t*), %jl_value_t** %.sub, align 8\n", " %3 = load %jl_value_t*** @jl_pgcstack, align 8, !dbg !2590\n", " %4 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 1, !dbg !2590\n", " %.c = bitcast %jl_value_t** %3 to %jl_value_t*, !dbg !2590\n", " store %jl_value_t* %.c, %jl_value_t** %4, align 8, !dbg !2590\n", " store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2590\n", " %5 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 3\n", " store %jl_value_t* null, %jl_value_t** %5, align 8\n", " %6 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 4\n", " store %jl_value_t* null, %jl_value_t** %6, align 8\n", " store %jl_value_t* inttoptr (i64 140474354759232 to %jl_value_t*), %jl_value_t** %2, align 8, !dbg !2591\n", " %7 = icmp sgt i64 %0, 0, !dbg !2592\n", " br i1 %7, label %L, label %L3, !dbg !2592\n", "\n", "L: ; preds = %top, %L\n", " %8 = phi %jl_value_t* [ %16, %L ], [ inttoptr (i64 140474354759232 to %jl_value_t*), %top ], !dbg !2592\n", " %\"#s245.0\" = phi i64 [ %9, %L ], [ 1, %top ]\n", " %9 = add i64 %\"#s245.0\", 1, !dbg !2592\n", " store %jl_value_t* %8, %jl_value_t** %5, align 8, !dbg !2593\n", " %10 = sitofp i64 %\"#s245.0\" to double, !dbg !2593\n", " %11 = fmul double %10, 5.000000e-01, !dbg !2593\n", " %12 = call %jl_value_t* @alloc_2w(), !dbg !2593\n", " %13 = getelementptr inbounds %jl_value_t* %12, i64 0, i32 0, !dbg !2593\n", " store %jl_value_t* inttoptr (i64 140474354684320 to %jl_value_t*), %jl_value_t** %13, align 8, !dbg !2593\n", " %14 = getelementptr inbounds %jl_value_t* %12, i64 1, i32 0, !dbg !2593\n", " %15 = bitcast %jl_value_t** %14 to double*, !dbg !2593\n", " store double %11, double* %15, align 8, !dbg !2593\n", " store %jl_value_t* %12, %jl_value_t** %6, align 8, !dbg !2593\n", " %16 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 140474385387040 to %jl_value_t*), %jl_value_t** %5, i32 2), !dbg !2593\n", " store %jl_value_t* %16, %jl_value_t** %2, align 8, !dbg !2593\n", " %17 = icmp eq i64 %\"#s245.0\", %0, !dbg !2593\n", " br i1 %17, label %L3, label %L, !dbg !2593\n", "\n", "L3: ; preds = %L, %top\n", " %18 = phi %jl_value_t* [ inttoptr (i64 140474354759232 to %jl_value_t*), %top ], [ %16, %L ]\n", " %19 = load %jl_value_t** %4, align 8, !dbg !2594\n", " %20 = getelementptr inbounds %jl_value_t* %19, i64 0, i32 0, !dbg !2594\n", " store %jl_value_t** %20, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2594\n", " ret %jl_value_t* %18, !dbg !2594\n", "}\n" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "code_native(sum1, (Int,))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\t.section\t__TEXT,__text,regular,pure_instructions\n", "Filename: In[23]\n", "Source line: 2\n", "\tpush\tRBP\n", "\tmov\tRBP, RSP\n", "\tpush\tR15\n", "\tpush\tR14\n", "\tpush\tR13\n", "\tpush\tR12\n", "\tpush\tRBX\n", "\tsub\tRSP, 56\n", "\tmov\tR12, RDI\n", "\tmov\tQWORD PTR [RBP - 80], 6\n", "Source line: 2\n", "\tmovabs\tRCX, 4463631920\n", "\tmov\tRAX, QWORD PTR [RCX]\n", "\tmov\tQWORD PTR [RBP - 72], RAX\n", "\tlea\tRAX, QWORD PTR [RBP - 80]\n", "\tmov\tQWORD PTR [RCX], RAX\n", "\tmov\tQWORD PTR [RBP - 56], 0\n", "\tmov\tQWORD PTR [RBP - 48], 0\n", "\tmovabs\tRAX, 140474354759232\n", "Source line: 2\n", "\tmov\tQWORD PTR [RBP - 64], RAX\n", "\ttest\tR12, R12\n", "\tjle\t121\n", "\tmov\tEBX, 1\n", "Source line: 5\n", "\tmovabs\tR13, 4451150720\n", "\tmovabs\tR15, 140474354684320\n", "\tmovabs\tRCX, 4605169152\n", "\tvmovsd\tXMM0, QWORD PTR [RCX]\n", "\tvmovsd\tQWORD PTR [RBP - 88], XMM0\n", "\tmovabs\tR14, 4450807376\n", "\tmov\tQWORD PTR [RBP - 56], RAX\n", "\tcall\tR13\n", "\tmov\tQWORD PTR [RAX], R15\n", "\tvcvtsi2sd\tXMM0, XMM0, RBX\n", "\tvmulsd\tXMM0, XMM0, QWORD PTR [RBP - 88]\n", "\tvmovsd\tQWORD PTR [RAX + 8], XMM0\n", "\tmov\tQWORD PTR [RBP - 48], RAX\n", "\tmovabs\tRDI, 140474385387040\n", "\tlea\tRSI, QWORD PTR [RBP - 56]\n", "\tmov\tEDX, 2\n", "\tcall\tR14\n", "Source line: 4\n", "\tinc\tRBX\n", "Source line: 5\n", "\tdec\tR12\n", "\tmov\tQWORD PTR [RBP - 64], RAX\n", "\tjne\t-67\n", "Source line: 8\n", "\tmov\tRCX, QWORD PTR [RBP - 72]\n", "Source line: 2\n", "\tmovabs\tRDX, 4463631920\n", "Source line: 8\n", "\tmov\tQWORD PTR [RDX], RCX\n", "\tadd\tRSP, 56\n", "\tpop\tRBX\n", "\tpop\tR12\n", "\tpop\tR13\n", "\tpop\tR14\n", "\tpop\tR15\n", "\tpop\tRBP\n", "\tret\n" ] } ], "prompt_number": 39 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Profiling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simple profiling of a function may be achieved using the `@time` macro\n", "\n", "A detailed profile may be obtained using `@profile`.\n", "\n", "A graphical view is available via the `ProfileView.jl` package." ] }, { "cell_type": "code", "collapsed": false, "input": [ "@profile sum1(10000000)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "2.50000025e13" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "f(N) = sum1(N)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "f (generic function with 1 method)" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "@profile f(10000000)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "2.50000025e13" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }