{ "metadata": { "language": "Julia", "name": "", "signature": "sha256:bca8e2ee1f2efc021aa64eea50c39f5a85d7b59e56ffd1b1aa29f04c81ef9e44" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "NUFFT with Julia" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Julia](http://julialang.org/) language offers an interesting alternative to python when crunching numbers. Python has several ways to improve its inherently low performance, such as numpy, cython, or numba. In an excellent [post](https://jakevdp.github.io/blog/2015/02/24/optimizing-python-with-numpy-and-numba/) Jake Vanderplas discusses how to use numba to achieve a performance similar to Fortran, without writing any cython or Fortran code. The results are amazing. Because the philosophies behind Julia and numba are similar, I wanted to see how Julia would perform. As I am new to Julia, this was also a way to learn the language, and I am sure I have made several mistakes that I would appreciate if you readers can point out. \n", "\n", "We'll start with a Direct Fourier Transform (DFT). We'll import some python things into Julia, to compare different results." ] }, { "cell_type": "code", "collapsed": false, "input": [ "using PyCall\n", "@pyimport nufft as nufft_fortran\n", "@pyimport numpy as np" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "Warning: imported binding for transpose overwritten in module __anon__\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we define 4 versions of the DFT. The first one uses Python and is as close as I could get to In[1] in Jake's post. The other two use element-wise operations instead of whole-array operations. They just differ in the loop order, which in this case is irelevant. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "function nufftfreqs(M,df=1.0)\n", " \"\"\"Compute the frequency range used in nufft for M frequency bins\"\"\"\n", " df * [-fld(M, 2): M - fld(M, 2)-1]\n", "end\n", "\n", "function nudftpy(x, y, M, df=1.0, iflag=1)\n", " \"\"\"Non-Uniform Direct Fourier Transform. Using numpy\"\"\"\n", " sign = iflag < 0 ? -1 : 1\n", " (1 / length(x)) * np.dot(y, np.exp(sign * 1im * x*transpose(nufftfreqs(M, df))))\n", "end\n", "\n", "function nudft(x, y, M, df=1.0, iflag=1)\n", " \"\"\"Non-Uniform Direct Fourier Transform. Using whole array operations\"\"\"\n", " sign = iflag < 0 ? -1 : 1\n", " (1 / length(x)) * *(transpose(y''), exp(sign * 1im * x*transpose(nufftfreqs(M, df))))\n", "end\n", "\n", "function nudft2(x::Vector, y::Vector, M::Int, df=1.0, iflag=1)\n", " \"\"\"Non-Uniform Direct Fourier Transform. Using comprehensions\"\"\"\n", " freqs = nufftfreqs(M, df)\n", " sign = iflag < 0 ? -1 : 1\n", " n = size(x,1)\n", " m = size(freqs, 1)\n", " r = (1 / n) * [y[i]*exp(sign*1im*x[i]*freqs[j]) for i=1:n, j=1:m]\n", " sum(r,1)\n", "end\n", "\n", "function nudft3(x::Vector, y::Vector, M::Int, df=1.0, iflag=1)\n", " \"\"\"Non-Uniform Direct Fourier Transform. Using comprehensions, summing different axis\"\"\"\n", " freqs = nufftfreqs(M, df)\n", " sign = iflag < 0 ? -1 : 1\n", " n = size(x,1)\n", " m = size(freqs, 1)\n", " r = (1 / n) * [y[i]*exp(sign*1im*x[i]*freqs[j]) for j=1:m, i=1:n]\n", " transpose(sum(r,2))\n", "end\n" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "nudft3 (generic function with 3 methods)" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we test them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = 100 * rand(1000)\n", "y = sin(x)\n", "Y0 = @time nudftpy(x, y, 1000) \n", "Y1 = @time nudft(x, y, 1000)\n", "Y2 = @time nudft2(x, y, 1000)\n", "Y3 = @time nudft3(x, y, 1000)\n", "Yf = @time nufft_fortran.nufft1(x, y, 1000)\n", "print([np.allclose(Y0, Yf), np.allclose(Y1, Yf),np.allclose(Y2, Yf),np.allclose(Y3, Yf)])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: 0." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "272123779 seconds (32289652 bytes allocated)\n", "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "0.211703597 seconds (32309564 bytes allocated, 26.77% gc time)\n", "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "0.14629479 seconds (32874852 bytes allocated)\n", "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "0.207123126 seconds (32918144 bytes allocated, 28.28% gc time)\n", "elapsed time: 0.002116402 seconds (19784 bytes allocated)\n", "Bool[true,true,true,true]" ] } ], "prompt_number": 263 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results agree! But they are incredibly inefficient. The functions are also slower than the same numpy function called from python (about twice as fast in python). Probably there is some overhead in calling python from Julia. Besides the `@time` macro in Julia works differently than the `@timeit` magicfunction...\n", "\n", "Our real interest is comparing Julia with numba, so I'll go on. Here is an FFT implementation of the `nuft_numpy` function of Jake's post (In[4]). To be easily identifiable, I've kept the same doc-strings and function names, which kind-of make no sense here..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "function _compute_grid_params(M, epsilon)\n", " # Choose Msp & tau from eps following Dutt & Rokhlin (1993)\n", " if epsilon <= 1E-33 || epsilon >= 1E-1\n", " error(@sprintf(\"eps = %f; must satisfy 1e-33 < eps < 1e-1.\", epsilon))\n", " end\n", " ratio = epsilon > 1E-11 ? 2 : 3\n", " Msp = itrunc(-log(epsilon) / (pi * (ratio - 1) / (ratio - 0.5)) + 0.5)\n", " Mr = max(ratio * M, 2 * Msp)\n", " lambda_ = Msp / (ratio * (ratio - 0.5))\n", " tau = pi * lambda_ / M^2\n", " Msp, Mr, tau\n", "end\n", "\n", "function nufft_python(x, c, M, df=1.0, epsilon=1E-15, iflag=1):\n", " \"\"\"Fast Non-Uniform Fourier Transform with Python\"\"\"\n", " Msp, Mr, tau = _compute_grid_params(M, epsilon)\n", " N = length(x)\n", "\n", " # Construct the convolved grid\n", " ftau = zeros(typeof(c[1]), Mr)\n", " Mr = size(ftau,1)\n", " hx = 2pi / Mr\n", " mm = [-Msp:Msp-1]\n", " for i=1:N\n", " xi = (x[i] * df) % (2 * pi)\n", " m = 1 + div(xi,hx)\n", " spread = exp(-0.25 * (xi - hx * (m + mm)).^2 / tau)\n", " ftau[1+mod(m + mm, Mr)] += c[i] * spread\n", " end\n", " # Compute the FFT on the convolved grid\n", " if iflag < 0\n", " Ftau = (1 / Mr) * fft(ftau)\n", " else\n", " Ftau = ifft(ftau)\n", " end\n", " Ftau = [Ftau[end-div(M,2)+1:end], Ftau[1:div(M,2)+M%2]]\n", " # Deconvolve the grid using convolution theorem\n", " k = nufftfreqs(M)\n", " (Ftau.*(1 / N) * sqrt(pi / tau)).* exp(tau * k.^2)\n", "end" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "nufft_python (generic function with 4 methods)" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Following again Jake's post, I write a couple of functions to test our implementations. They are a litteral translation from the Python code." ] }, { "cell_type": "code", "collapsed": false, "input": [ "function test_nufft(nufft_func, M=1000, Mtime=100000)\n", " # Test vs the direct method\n", " print(repeat(\"-\",30), \"\\n\")\n", " print(\"testing \",nufft_func, \"\\n\")\n", " x = 100 * rand(M + 1)\n", " y = sin(x)\n", " for df in [1.0, 2.0]\n", " for iflag in [1, -1]\n", " F1 = nudft(x, y, M, df, iflag)\n", " F2 = nufft_func(x, y, M, df, 1E-15, iflag)\n", " assert(all(x -> isapprox(x...), zip(F1, F2)))\n", " end\n", " end\n", " print(\"- Results match the DFT\\n\")\n", " \n", " # Time the nufft function\n", " x = 100 * rand(Mtime)\n", " y = sin(x)\n", " times = Float64[]\n", " for i = 1:5\n", " tic()\n", " F = nufft_func(x, y, Mtime)\n", " t1 = toq()\n", " push!(times,t1)\n", " end\n", " @printf(\"- Execution time (M=%d): %.2f sec\\n\",Mtime, median(times))\n", "end" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 67, "text": [ "test_nufft (generic function with 3 methods)" ] } ], "prompt_number": 67 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's test it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "test_nufft(nufft_python)\n", "test_nufft(nufft_fortran.nufft1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "------------------------------\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "testing nufft_python\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 1.07 sec\n", "------------------------------\n", "testing fn\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 0.12 sec\n" ] } ], "prompt_number": 71 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results are an order of magnitude slower than the fortran code. They are about three times faster than the numpy_python code in my computer (3.7 sec). Good! So pure Julia is faster than simple python!\n", "\n", "Python spends most of the time in the loop. This can be improved using numpy `add` function. As this function does not exist in Julia, I used a loop instead. The result is pretty cumbersome, but it was just a game to see if I could get something close to the numpy version." ] }, { "cell_type": "code", "collapsed": false, "input": [ "function nufft_numpy(x, y, M, df=1.0, epsilon=1E-15, iflag=1):\n", " \"\"\"Fast Non-Uniform Fourier Transform\"\"\"\n", " Msp, Mr, tau = _compute_grid_params(M, epsilon)\n", " N = length(x)\n", " # Construct the convolved grid\n", " ftau = zeros(typeof(y[1]), Mr)\n", " hx = 2pi / Mr\n", " xmod = map(mod2pi, x*df)\n", " m = 1+ int(xmod/hx)\n", " mm = [-Msp:Msp-1]\n", " mpmm = broadcast(+, transpose(m), mm)\n", " spread = broadcast(*, exp(-0.25 * (transpose(xmod).- hx*mpmm).^ 2 / tau), transpose(y))\n", " for (i,s) in zip(map(xi->1+mod(xi, Mr), mpmm), spread)\n", " ftau[i] += s\n", " end\n", " # Compute the FFT on the convolved grid\n", " if iflag < 0\n", " Ftau = (1 / Mr) * fft(ftau)\n", " else\n", " Ftau = ifft(ftau)\n", " end\n", " Ftau = [Ftau[end-div(M,2)+1:end], Ftau[1:div(M,2)+M%2]]\n", " # Deconvolve the grid using convolution theorem\n", " k = nufftfreqs(M)\n", " (Ftau.*(1 / N) * sqrt(pi / tau)).* exp(tau * k.^2)\n", "end" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 69, "text": [ "nufft_numpy (generic function with 4 methods)" ] } ], "prompt_number": 69 }, { "cell_type": "code", "collapsed": false, "input": [ "test_nufft(nufft_numpy)\n", "test_nufft(nufft_fortran.nufft1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "------------------------------\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "testing nufft_numpy\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 4.58 sec\n", "------------------------------\n", "testing fn\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 0.12 sec\n" ] } ], "prompt_number": 70 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So our attempt to emulate the numpy setup was a disaster. This could have been expected. After all, in python we were trying to remove a for loop, but these loops are not inherently slower in Julia, so that the complicated broadcasting resulted in a degradation of performance. It's nice to see that complex code works worse!\n", "\n", "Let's see if the numba code results in more efficient Julia code. This is the line-by-line translation of the numba code (In[11]):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "function build_grid(x, c, tau, Msp, ftau)\n", " Mr = size(ftau,1)\n", " hx = 2pi / Mr\n", " for i=1:size(x,1)\n", " xi = mod2pi(x[i])\n", " m = 1 + int(xi/hx)\n", " for mm=-Msp:Msp-1\n", " ftau[1 + mod((m + mm) , Mr)] += c[i] * exp(-0.25 * (xi - hx * (m + mm))^2 / tau)\n", " end\n", " end\n", " ftau\n", "end\n", "\n", "function nufft_numba(x, c, M, df=1.0, eps=1E-15, iflag=1)\n", " \"\"\"Fast Non-Uniform Fourier Transform from Python numba code\"\"\"\n", " Msp, Mr, tau = _compute_grid_params(M, eps)\n", " N = length(x)\n", "\n", " # Construct the convolved grid\n", " ftau = build_grid(x * df, c, tau, Msp, zeros(typeof(c[1]), Mr))\n", "\n", " # Compute the FFT on the convolved grid\n", " if iflag < 0\n", " Ftau = (1 / Mr) * fft(ftau)\n", " else\n", " Ftau = ifft(ftau)\n", " end\n", " Ftau = [Ftau[end-div(M,2)+1:end], Ftau[1:div(M,2)+M%2]]\n", "\n", " # Deconvolve the grid using convolution theorem\n", " k = nufftfreqs(M)\n", " (1 / N) * sqrt(pi / tau) .* exp(tau * k.^2).*Ftau\n", "end" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 72, "text": [ "nufft_numba (generic function with 4 methods)" ] } ], "prompt_number": 72 }, { "cell_type": "code", "collapsed": false, "input": [ "test_nufft(nufft_numba)\n", "test_nufft(nufft_fortran.nufft1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "------------------------------\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "testing nufft_numba\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 0.32 sec\n", "------------------------------\n", "testing fn\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 0.12 sec\n" ] } ], "prompt_number": 73 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last, we can potentially gain some more speed pre-calculating the exponentials. Again, this is a pure translation of Jake's code." ] }, { "cell_type": "code", "collapsed": false, "input": [ "function build_grid_fast(x, c, tau, Msp, ftau, E3)\n", " Mr = size(ftau,1)\n", " hx = 2pi / Mr\n", " # precompute some exponents\n", " for j=0:Msp\n", " E3[j+1] = exp(-(pi * j / Mr)^2 / tau)\n", " end\n", " # spread values onto ftau\n", " for i=1:size(x,1)\n", " xi = mod2pi(x[i])\n", " m = 1 + int(xi/hx)\n", " xi = xi - hx * m\n", " E1 = exp(-0.25 * xi^2 / tau)\n", " E2 = exp((xi * pi) / (Mr * tau))\n", " E2mm = 1\n", " for mm=0:Msp-1\n", " ftau[1+mod((m + mm) , Mr)] += c[i] * E1 * E2mm * E3[mm+1]\n", " E2mm *= E2\n", " ftau[1+mod((m - mm - 1) , Mr)] += c[i] * E1 / E2mm *E3[mm+2]\n", " end\n", " end\n", " ftau\n", "end\n", "\n", "function nufft_numba_fast(x, c, M, df=1.0, eps=1E-15, iflag=1)\n", " \"\"\"Fast Non-Uniform Fourier Transform from Python numba code\"\"\"\n", " Msp, Mr, tau = _compute_grid_params(M, eps)\n", " N = length(x)\n", "\n", " # Construct the convolved grid\n", " ftau = build_grid_fast(x * df, c, tau, Msp, \n", " zeros(typeof(c[1]), Mr), zeros(typeof(x[1]), Msp+1) )\n", "\n", " # Compute the FFT on the convolved grid\n", " if iflag < 0\n", " Ftau = (1 / Mr) * fft(ftau)\n", " else\n", " Ftau = ifft(ftau)\n", " end\n", " Ftau = [Ftau[end-div(M,2)+1:end], Ftau[1:div(M,2)+M%2]]\n", "\n", " # Deconvolve the grid using convolution theorem\n", " k = nufftfreqs(M)\n", " (1 / N) * sqrt(pi / tau) .* exp(tau * k.^2).*Ftau\n", "end" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 289, "text": [ "nufft_numba_fast (generic function with 4 methods)" ] } ], "prompt_number": 289 }, { "cell_type": "code", "collapsed": false, "input": [ "test_nufft(nufft_numba_fast)\n", "test_nufft(nufft_fortran.nufft1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "------------------------------\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "testing nufft_numba_fast\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 0.70 sec\n", "------------------------------\n", "testing fn\n", "- Results match the DFT\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "- Execution time (M=" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "100000): 0.12 sec\n" ] } ], "prompt_number": 290 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here I am surprised to see that the performance is worse than before. I really don't see any reason for that, and of course, a profiler should be the next step. Whereas the `nufft_numba_fast` in python is almost as efficient as the fortran code (0.14 sec vs. 0.11 sec), with Julia it is 100% slower than the simpler `nufft_numba`.\n", "\n", "Conclusion? Julia is easy and powerful, but for those used to python, numba is a great alternative that can produce even faster code with less effort (for a Python programmer).\n", "\n", "As I am new to Julia I may have made several mistakes and I would appreciate if readers can point them out." ] } ], "metadata": {} } ] }