{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Writing Julia Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The purpose of these notes is to introduce [R](http://www.r-project.org) programmers to the [Julia](http://julialang.org) programming language.\n", "\n", "Julia functions are sufficiently similar to R functions that most R programmers can read most Julia functions. They are sufficiently different that initial attempts at writing a Julia function can be frustrating.\n", "\n", "One way of learning a programming language like Julia is to read programs written by people experienced in that language. One of the amazing aspects of Julia is that much of the Base system is written in Julia itself. One of the current (June 2014) deficiencies of Julia is that the documentation, especially package documentation, is quite sparse, so you often need to read the source of a function to understand its purpose.\n", "\n", "In these notes I am assuming the use of julia-0.4.2 or later." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 0.4.6\n", "Commit 2e358ce (2016-06-19 17:16 UTC)\n", "Platform Info:\n", " System: Linux (x86_64-unknown-linux-gnu)\n", " CPU: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz\n", " WORD_SIZE: 64\n", " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)\n", " LAPACK: libopenblas64_\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-3.3\n" ] } ], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tools for examining functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Method signatures and sources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Entering the name of a function in R returns the definition of the function. 
Entering the name of a function in Julia instead reports the number of methods defined for that generic function. Remember that all Julia functions are generic functions." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "mean (generic function with 4 methods)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `methods` function gives the signatures of the methods for the generic." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "4 methods for generic function mean:" ], "text/plain": [ "# 4 methods for generic function \"mean\":\n", "mean{T<:Real}(r::Range{T<:Real}) at range.jl:710\n", "mean(A::AbstractArray{T,N}) at statistics.jl:19\n", "mean{T}(A::AbstractArray{T,N}, region) at statistics.jl:31\n", "mean(iterable) at statistics.jl:6" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "methods(mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In an IJulia notebook the file names are links to the corresponding files in the GitHub repository." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prebuilt binary packages for various operating systems may install the `base` source directory elsewhere. The Ubuntu binaries, for example, put it in `/usr/share/julia/base`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The @which macro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to functions, Julia provides macros. Writing macros is not a priority when you start, but you should know about some of the macros that are useful when working with functions. A call to a macro always begins with the `@` character, and parentheses around the argument list are optional."
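 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of this syntax, the cell below calls the `@assert` macro from `Base` twice, once with and once without parentheses; the two calls are equivalent. When parentheses are used there must be no space between the macro name and the opening parenthesis: for a hypothetical macro `@m`, writing `@m (a, b)` passes the tuple `(a, b)` as a single argument." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"# The same macro called with and without parentheses.\n",
"# Each form checks the condition and throws an AssertionError if it is false.\n",
"@assert(1 + 1 == 2)\n",
"@assert 1 + 1 == 2"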
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For generics that have many methods (try `methods(*)`), it may not be obvious which method applies to a particular set of arguments. The `@which` macro tells you which method will be used." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "mean(A::AbstractArray{T,N}) at statistics.jl:19" ], "text/plain": [ "mean(A::AbstractArray{T,N}) at statistics.jl:19" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which mean([1:10;])" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "In a REPL session the file names will not be live links. Instead, you can use `@less mean([1:10;])` to view the source file at the indicated location." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other useful macros" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two very useful macros for tuning the performance of your functions are `@time` and `@profile`. Recent versions of `@time` report the elapsed time, the number of allocations and the total number of bytes allocated, and the percentage of the time spent in garbage collection.\n", "\n", "Note that the number of bytes allocated does not measure the increase in memory used by the process. Often objects are allocated, populated, used and freed within the course of a function's execution.\n", "\n", "Also, bear in mind that timings of very short executions are quite variable."
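 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of this distinction, using a hypothetical one-line function `sumones`: each call allocates a temporary vector of `n` `Float64` values (at least `8n` bytes), sums it and returns a single number, after which nothing refers to the vector and it can be garbage collected. The `@allocated` macro reports the number of bytes allocated while evaluating an expression, so it counts the temporary vector even though the call leaves nothing behind but one number. The exact count will vary between Julia versions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"# Hypothetical helper: allocates a temporary vector, sums it and discards it\n",
"sumones(n) = sum(ones(n))\n",
"\n",
"sumones(10)                         # warm-up call, so compilation is not counted below\n",
"bytes = @allocated sumones(10_000)  # at least 8 * 10_000 bytes for the temporary vector\n",
"bytes"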
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x1000000 Array{Float64,2}:\n", " -0.141432 -0.307227 -0.276117 … -0.134498 0.0388967 -0.0461761" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean(randn((25,1_000_000)),1) # 1,000,000 replications of a normal sample of size 25" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.154326 seconds (584 allocations: 198.398 MB, 2.17% gc time)\n" ] } ], "source": [ "@time mn25n = mean(randn(25,1_000_000),1);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first time a function is called with a particular signature (that is, a particular combination of argument types) it will be slower, often much slower, than on subsequent calls. This is because the first call with a given signature can trigger the compilation of many methods.\n", "\n", "Another useful macro is `@profile`, especially when paired with the `view` function in the `ProfileView` package."
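 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of this compilation effect, using a hypothetical function `sumsq`. The first `@time` includes compiling `sumsq` for a `Vector{Float64}` argument, the second reuses the compiled code, and the call with a `Vector{Int}` argument is a new signature, so it is compiled again. Actual timings depend on the machine and the Julia version." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [
"# Hypothetical function: sum of the squares of the elements of x\n",
"sumsq(x) = sum(abs2, x)\n",
"\n",
"xf = rand(1_000)   # a Vector{Float64}\n",
"@time sumsq(xf)    # first call with this signature: includes compilation\n",
"@time sumsq(xf)    # same signature: reuses the compiled method\n",
"xi = [1:1_000;]    # a Vector{Int}, i.e. a new signature\n",
"@time sumsq(xi)    # compiled again for the new argument type"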
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "@profile mean(randn(25,1_000_000),1);" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[Profile results: ProfileView flame graph]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Pkg.add(\"ProfileView\") # similar to install.packages() in R\n", "using ProfileView # similar to library(ProfileView) in R\n", "ProfileView.view()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case the results aren't very interesting, but in a REPL session the bars in the image are live links to lines in the source code.\n", "\n", "Note that loading one package may bring in others, as the output of `whos()` shows." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Base 26878 KB Module\n", " ColorTypes 253 KB Module\n", " Colors 577 KB Module\n", " Compat 225 KB Module\n", " Core 4712 KB Module\n", " FixedPointNumbers 32 KB Module\n", " IJulia 8189 KB Module\n", " IPythonDisplay 27 KB Module\n", " JSON 232 KB Module\n", " Main 47285 KB Module\n", " Nettle 57 KB Module\n", " ProfileView 529 KB Module\n", " ProfileViewSVG 17 KB Module\n", " Reexport 3628 bytes Module\n", " ZMQ 80 KB Module\n", " mn25n 7812 KB 1x1000000 Array{Float64,2}\n" ] } ], "source": [ "whos()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulation of the distribution of sample statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example I like to use for function creation is simulation of the distribution of sample statistics. 
For creating samples from various distributions we will use the Distributions package. I was going to use the Gadfly package for visualization, but it was not working at the time of writing." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Pkg.add(\"Distributions\") # done previously\n", "using Distributions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous section we simulated the means of samples of size 25 from a standard normal distribution. The distribution of this sample statistic should be normal with mean zero and standard deviation 1/√25 = 0.2, since the standard deviation of the mean of n independent observations is σ/√n and here σ = 1." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "-7.607745026973842e-5" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean(mn25n) # expect a value close to zero" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.19998354641851948" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "std(mn25n) # expect a value close to 0.2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `randn` function for sampling from a normal distribution is in the `Base` module. For other distributions we use the Distributions package. Check the documentation for the Distributions package to see all the possibilities.\n", "\n", "To write a general function for simulating the distribution of sample statistics we need the distribution from which to sample, the sample size, the statistic to evaluate and the number of replications to perform. A distribution object encompasses both the distribution type and the parameters of the distribution."
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DataType" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(Poisson)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4-element Array{Any,1}:\n", " call(::Type{Distributions.Poisson}) at /home/bates/.julia/v0.4/Distributions/src/univariate/discrete/poisson.jl:25 \n", " call(::Type{Distributions.Poisson}, λ::Real) at /home/bates/.julia/v0.4/Distributions/src/univariate/discrete/poisson.jl:24\n", " call{T}(::Type{T}, arg) at essentials.jl:56 \n", " call{T}(::Type{T}, args...) at essentials.jl:57 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "methods(Poisson)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Distributions.Poisson(λ=5.0)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = Poisson(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function that evaluates the statistic `s` on each of `N` samples of size `n` from the distribution `d` can be written as follows. The first evaluation of `s` determines the element type of the result array." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "sampstat (generic function with 1 method)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function sampstat(N::Integer,s::Function,n::Integer,d::Distribution)\n", "    samp = rand(d,n)              # simulate the first sample\n", "    v1 = s(samp)                  # evaluate the statistic\n", "    res = Array(typeof(v1),(N,))  # create the result array\n", "    res[1] = v1                   # set the first element\n", "    for i in 2:N\n", "        res[i] = s(rand(d,n))\n", "    end\n", "    res\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"It might be good to step through the evaluation first, just to check that everything works as intended." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "median (generic function with 43 methods)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 100; n = 20; s = median" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x20 Array{Int64,2}:\n", " 8 8 5 4 1 3 4 13 2 4 5 3 5 5 5 5 6 5 5 5" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "samp = rand(d,n);\n", "samp'" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v1 = s(samp)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = Array(typeof(v1),N);\n", "res[1] = v1" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for i in 2:N\n", "    res[i] = s(rand(d,n))\n", "end" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x100 Array{Float64,2}:\n", " 5.0 5.5 6.0 4.5 4.0 6.0 4.0 5.0 … 5.0 5.0 5.0 6.0 5.0 4.0 4.5" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we generate a large sample and check the timing." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.110126 seconds 
(6.01 M allocations: 572.455 MB, 7.03% gc time)\n" ] } ], "source": [ "@time med20p = sampstat(1_000_000,median,20,d);" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x1000000 Array{Float64,2}:\n", " 4.0 4.0 5.0 5.0 4.0 4.0 4.5 5.0 … 4.5 2.5 5.0 5.0 4.0 5.5 4.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "med20p'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that a large amount of storage (about 572 MB) is allocated and about 7% of the execution time is spent in garbage collection. Because the function is fairly simple, we can guess that much of the allocation takes place in the call to `rand`. For each of the 1,000,000 replications a new vector of 20 `Int64` values is allocated, populated, passed to the function `s`, and then released to be garbage collected later.\n", "\n", "Julia allows __mutating__ functions, which change the contents of one or more of their arguments. This may seem heretical to R programmers, but it can be very useful. Tuning the performance of long-running functions often comes down to finding out where temporary storage is being allocated and avoiding that allocation.\n", "\n", "By convention, the names of mutating functions end in `!`, so `rand` allocates new storage whereas `rand!` fills storage passed to it. Fortunately the vector `samp` has already been allocated and we can reuse it."
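, "\n",
"\n",
"Base often follows this convention by defining the non-mutating function as a call to the mutating one on a copy of the argument; `sort`, for instance, essentially applies `sort!` to a copy of its argument. A minimal sketch of the pattern, using hypothetical names `myscale!` and `myscale` that are not part of Base, is\n",
"```julia\n",
"# Hypothetical mutating function: multiply each element of v by a, in place\n",
"function myscale!(v::AbstractVector, a::Number)\n",
"    for i in eachindex(v)\n",
"        v[i] *= a\n",
"    end\n",
"    v    # return the modified argument, as mutating Base functions usually do\n",
"end\n",
"\n",
"# Non-mutating companion: work on a copy so the caller's vector is unchanged\n",
"myscale(v::AbstractVector, a::Number) = myscale!(copy(v), a)\n",
"```\n",
"After `w = myscale(v, 2)` the vector `v` is unchanged, whereas `myscale!(v, 2)` overwrites it. The only allocation is the `copy`, and it happens only in the non-mutating version."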
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x20 Array{Int64,2}:\n", " 8 8 5 4 1 3 4 13 2 4 5 3 5 5 5 5 6 5 5 5" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "samp'" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x20 Array{Int64,2}:\n", " 3 6 6 4 3 5 3 6 6 4 4 8 4 6 5 3 7 5 5 5" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rand!(d,samp)'" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1x20 Array{Int64,2}:\n", " 3 6 6 4 3 5 3 6 6 4 4 8 4 6 5 3 7 5 5 5" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "samp'" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "sampstat1 (generic function with 1 method)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function sampstat1(N::Integer,s::Function,n::Integer,d::Distribution)\n", " samp = rand(d,n) # simulate the first sample\n", " v1 = s(samp) # evaluate the statistic\n", " res = Array(typeof(v1),(N,)) # create the result array\n", " res[1] = v1 # set the first element\n", " for i in 2:N\n", " res[i] = s(rand!(d,samp))\n", " end\n", " res\n", "end" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.032929 seconds (5.00 M allocations: 366.426 MB, 2.46% gc time)\n" ] }, { "data": { "text/plain": [ "1x1000000 Array{Float64,2}:\n", " 5.0 5.0 5.0 5.0 4.5 5.0 4.5 5.0 … 5.0 5.0 5.0 6.0 4.5 6.5 5.0" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@time med20p1 = 
sampstat1(1_000_000,s,n,d)'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reusing the `samp` vector has reduced the amount of storage allocated (from about 572 MB to 366 MB) and the fraction of time spent in garbage collection, but the total execution time has dropped only slightly (from about 1.11 to 1.03 seconds). A less obvious source of allocation is the `median` function. The non-mutating version must copy the vector before doing the partial sort used to evaluate the median. We don't need to preserve the sample, because we are going to overwrite it in the next iteration anyway, so we can use the mutating `median!` instead." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.967369 seconds (4.00 M allocations: 152.581 MB, 1.40% gc time)\n" ] }, { "data": { "text/plain": [ "1x1000000 Array{Float64,2}:\n", " 6.0 5.0 5.0 5.0 5.0 6.0 5.0 4.0 … 4.0 6.5 6.0 4.0 5.0 5.0 4.0" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@time med20p1 = sampstat1(1_000_000,median!,n,d)'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We should, of course, check that the results are consistent by setting the random number generator seed and comparing the results from the two `sampstat` functions and the mutating and non-mutating `median` functions."
] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "true" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(1234321)\n", "s1 = sampstat(100_000,median,n,d);\n", "srand(1234321)\n", "s2 = sampstat(100_000,median!,n,d);\n", "srand(1234321)\n", "s3 = sampstat1(100_000,median,n,d);\n", "srand(1234321)\n", "s4 = sampstat1(100_000,median!,n,d);\n", "all(s1 .== s2 .== s3 .== s4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Syntax of function (actually method) definitions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have seen that a method is defined by a block of the form\n", "```\n", "function functionname(arg1::Type1,arg2::Type2,...,argn::Typen)\n", " ...\n", "end\n", "```\n", "\n", "Be careful not to let your R instincts take over and write\n", "```\n", "functionname = function(arg1::Type1,...,argn::Typen)\n", " ...\n", "end\n", "```\n", "That is not syntactically wrong, but it isn't what you want. (The gory details are that it creates an anonymous function, which, at least in Julia 0.4, is generally less efficient because of the way the compiler and type inference operate, and then assigns this anonymous function to a name.)\n", "\n", "The type annotations are optional. You obviously need them when you are defining methods that are distinguished by the types of their arguments. At other times you may want to use them to validate the arguments. In `sampstat` the argument `s` is called as a function, so it should be a `Function`. The argument `d` is passed to `rand` or `rand!`; it could be something other than a `Distribution`, but we want a `Distribution` here. It wouldn't make sense for `n` and `N` to be anything other than integers, as they represent sizes. The `Integer` type is an abstract type with several subtypes. 
" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "DataType" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(Integer)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4-element Array{Any,1}:\n", " BigInt \n", " Bool \n", " Signed \n", " Unsigned" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subtypes(Integer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Those coming to Julia with a background in statically typed languages (C, C++, Java) tend to overspecify the argument types. Generally I use argument type declarations only when I want to distinguish methods or to validate the argument types. Frequently in R I will begin a function with\n", "```s\n", "f <- function(a, b, c) {\n", " stopifnot(is.integer(a), length(a) == 1L, ...)\n", " ...\n", "}\n", "```\n", "to validate the types of the actual arguments. In Julia I specify the types that I want in the argument list. If arguments of other types are passed, no applicable method is found and the call fails with a `MethodError`.\n", "\n", "The value of a function evaluation is the value of the last expression evaluated. There is also a `return` statement, which can be useful for short-circuiting the evaluation.\n", "\n", "A method whose body is a single expression can be written in a compact form with an `=` sign. For example, the `insupport` generic checks whether a value `x` is in the support of a distribution `d`. 
The method for the Binomial type is\n", "```julia\n", "insupport(d::Binomial, x::Real) = isinteger(x) && 0 <= x <= d.size\n", "```\n", "If we want to check whether all of a vector's elements are in the support, we could write" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "vsupport (generic function with 1 method)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function vsupport(d::Binomial,v::Vector)\n", " for el in v\n", " insupport(d, el) || return false\n", " end\n", " true\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Written this way, if you have a million values to check and the second one fails, the remaining values are never checked.\n", "\n", "A short anonymous function can be written with the \"stabby lambda\" syntax, `x -> expression`. To apply the logit transform to a vector of values in (0,1), we could write" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
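<table><tr><th></th><th>μ</th><th>η</th></tr>
<tr><th>1</th><td>0.0944217798994027</td><td>-2.260801891104243</td></tr>
<tr><th>2</th><td>0.9366111003876687</td><td>2.692979387335428</td></tr>
<tr><th>3</th><td>0.2583267373061602</td><td>-1.0546835920616155</td></tr>
<tr><th>4</th><td>0.9309236942019032</td><td>2.600965538110032</td></tr>
<tr><th>5</th><td>0.5552832964597973</td><td>0.22204096940697216</td></tr>
<tr><th>6</th><td>0.8715098619396013</td><td>1.9143750261699557</td></tr>
<tr><th>7</th><td>0.04155298306875843</td><td>-3.138344970491189</td></tr>
<tr><th>8</th><td>0.9687790528318041</td><td>3.4349473171034735</td></tr>
<tr><th>9</th><td>0.6535657113441573</td><td>0.6347499234293605</td></tr>
<tr><th>10</th><td>0.4581009073306117</td><td>-0.16799032669054131</td></tr></table>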
" ], "text/plain": [ "10×2 DataFrames.DataFrame\n", "│ Row │ μ │ η │\n", "├─────┼───────────┼──────────┤\n", "│ 1 │ 0.0944218 │ -2.2608 │\n", "│ 2 │ 0.936611 │ 2.69298 │\n", "│ 3 │ 0.258327 │ -1.05468 │\n", "│ 4 │ 0.930924 │ 2.60097 │\n", "│ 5 │ 0.555283 │ 0.222041 │\n", "│ 6 │ 0.87151 │ 1.91438 │\n", "│ 7 │ 0.041553 │ -3.13834 │\n", "│ 8 │ 0.968779 │ 3.43495 │\n", "│ 9 │ 0.653566 │ 0.63475 │\n", "│ 10 │ 0.458101 │ -0.16799 │" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "μ = rand(10);\n", "η = map(p -> log(p/(1-p)), μ);\n", "using DataFrames\n", "DataFrame(μ = μ, η = η)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many of the reduction functions (`any`, `all`, `sum`, `prod`, etc.) can take a function as the first argument. In that form `any` and `all` short-circuit, stopping as soon as the result is determined, so the `vsupport` function could be written more compactly as" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "vsupport (generic function with 2 methods)" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vsupport(d::Distribution, v) = all(x -> insupport(d, x), v)" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.4.6", "language": "julia", "name": "julia-0.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.4.6" } }, "nbformat": 4, "nbformat_minor": 0 }