{ "metadata": { "name": "", "signature": "sha256:74c8e7fb80d34a84df949565029c27e03a8b0c1502901ae4ae8447df0bf27c28" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# We're going to start with a (compressed) Cython tutorial\n", "http://docs.cython.org/src/tutorial/cython_tutorial.html\n", "##Cython is a superset of python which adds typing and additional features to the python language to help the cython interpreter convert the cython code to C/C++. This can allow users to convert slow Python code to fast C/C++ code with minimal changes. It can also be used to wrap existing C/C++/Fortran code.\n", "\n", "lets look at a simple prime number finder in python and cython:\n", "```\n", "python simplesetup.py build_ext --inplace\n", "```\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import simplecy, simplepy\n", "simplepy.primes(30) == simplecy.primes(30)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "array([ True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True, True, True, True,\n", " True, True, True], dtype=bool)" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit simplecy.primes(30)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100000 loops, best of 3: 3.32 \u00b5s per loop\n" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit simplepy.primes(30)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1000 loops, best of 3: 907 \u00b5s per loop\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A simple example with a 300x speedup!\n", "\n", "Let's take a quick look at the C source. The interesting part starts around line 600 (on Cython 21.1) reproduced in part below:\n", "\n", "```c++\n", "\n", "static PyObject *__pyx_pf_8simplecy_primes(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_kmax) {\n", " int __pyx_v_n;\n", " int __pyx_v_k;\n", " int __pyx_v_i;\n", " int __pyx_v_p[1000];\n", " PyObject *__pyx_v_result = NULL;\n", " PyObject *__pyx_r = NULL;\n", " __Pyx_RefNannyDeclarations\n", " PyObject *__pyx_t_1 = NULL;\n", " int __pyx_t_2;\n", " int __pyx_t_3;\n", " int __pyx_t_4;\n", " int __pyx_lineno = 0;\n", " const char *__pyx_filename = NULL;\n", " int __pyx_clineno = 0;\n", " __Pyx_RefNannySetupContext(\"primes\", 0);\n", "\n", " /* \"simplecy.pyx\":7\n", " * cdef int n, k, i\n", " * cdef int p[1000]\n", " * result = [] # <<<<<<<<<<<<<<\n", " * if kmax > 1000:\n", " * kmax = 1000\n", " */\n", " __pyx_t_1 = PyList_New(0); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 7; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n", " __Pyx_GOTREF(__pyx_t_1);\n", " __pyx_v_result = ((PyObject*)__pyx_t_1);\n", " __pyx_t_1 = 0;\n", "\n", " /* \"simplecy.pyx\":8\n", " * cdef int p[1000]\n", " * result = []\n", " * if kmax > 1000: # <<<<<<<<<<<<<<\n", " * kmax = 1000\n", " * k = 0\n", " */\n", " __pyx_t_2 = ((__pyx_v_kmax > 1000) != 0);\n", " if (__pyx_t_2) {\n", "\n", " /* \"simplecy.pyx\":9\n", " * result = []\n", " * if kmax > 1000:\n", " * kmax = 1000 # <<<<<<<<<<<<<<\n", " * k = 0\n", " * n = 2\n", " */\n", " __pyx_v_kmax = 1000;\n", " goto __pyx_L3;\n", " }\n", " __pyx_L3:;\n", "\n", " /* \"simplecy.pyx\":10\n", " * if kmax > 1000:\n", " * kmax = 1000\n", " * k = 0 # <<<<<<<<<<<<<<\n", " * n = 2\n", " * while k < kmax:\n", " */\n", " __pyx_v_k = 0;\n", "\n", " /* \"simplecy.pyx\":11\n", " * kmax = 1000\n", " * k = 0\n", " * n = 2 # <<<<<<<<<<<<<<\n", " * while k < kmax:\n", " * i = 0\n", " */\n", " __pyx_v_n = 2;\n", "\n", " /* \"simplecy.pyx\":12\n", " * k = 0\n", " * n = 2\n", " * while k < kmax: # <<<<<<<<<<<<<<\n", " * i = 0\n", " * while i < k and n % p[i] != 0:\n", " */\n", " while (1) {\n", " __pyx_t_2 = ((__pyx_v_k < __pyx_v_kmax) != 0);\n", " if (!__pyx_t_2) break;\n", "\n", " /* \"simplecy.pyx\":13\n", " * n = 2\n", " * while k < kmax:\n", " * i = 0 # <<<<<<<<<<<<<<\n", " * while i < k and n % p[i] != 0:\n", " * i = i + 1\n", " */\n", " __pyx_v_i = 0;\n", "\n", " /* \"simplecy.pyx\":14\n", " * while k < kmax:\n", " * i = 0\n", " * while i < k and n % p[i] != 0: # <<<<<<<<<<<<<<\n", " * i = i + 1\n", " * if i == k:\n", " */\n", " \n", "\n", " /* \"simplecy.pyx\":21\n", " * result.append(n)\n", " * n = n + 1\n", " * return result # <<<<<<<<<<<<<<\n", " */\n", " __Pyx_XDECREF(__pyx_r);\n", " __Pyx_INCREF(__pyx_v_result);\n", " __pyx_r = __pyx_v_result;\n", " goto __pyx_L0;\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Writing good (ie. fast) Cython code can be non-trivial. Sometimes the general rule of just add static typing everywhere can make things SLOWER!\n", "\n", "###We'll leave the details of fast cython code for another time (Ross mentioned briefly the basic tools to let you know how much C vs. Python code is being called)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Wrapping Existing (or new) C++ and Fortran\n", "###Often the simplest strategy for writing fast code is to write the compute intensive functions in C++ or Fortran and then interface to them in Python. For some simple tasks where the inputs are simple and the number of calls are limited the python ctypes module is great (though it has many problems and haters). For more complex inputs or situations where making many calls to a function is needed Cython offers a great alternative to writing your own C/C++ interface.\n", "\n", "###We will look at wrapping a C++ and a Fortran function (from PyNE http://pyne.io)\n", "\n", "### These two functions implement the same functionality - a conversion from string to float. We will wrap the C++ directly with Cython and wrap the Fortran version by first implementing a C++ API and then wrapping that (see f2py for a more automated solution)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import endftod" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit endftod.endftod_cpp(' 2.040000+2')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1000000 loops, best of 3: 545 ns per loop\n" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit endftod.endftod_f(' 2.040000+2')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1000000 loops, best of 3: 1.59 \u00b5s per loop\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "endftod.endftod_f(' 2.040000+2') == endftod.endftod_f('2.040000E+2')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "True" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "endftod.endftod_cpp(' 2.040000+2') == endftod.endftod_cpp('2.040000E+2')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "False" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Now that we've gone over cython in a bit more detail lets look at Memoryviews - a better way of accessing numpy arrays in cython\n", "\n", "from the docs :\n", "\n", "Typed memoryviews allow efficient access to memory buffers, such as those underlying NumPy arrays, without incurring any Python overhead. Memoryviews are similar to the current NumPy array buffer support (np.ndarray[np.float64_t, ndim=2]), but they have more features and cleaner syntax.\n", "\n", "Memoryviews are more general than the old NumPy array buffer support, because they can handle a wider variety of sources of array data. For example, they can handle C arrays and the Cython array type (Cython arrays).\n", "\n", "A memoryview can be used in any context (function parameters, module-level, cdef class attribute, etc) and can be obtained from nearly any object that exposes writable buffer through the PEP 3118 buffer interface\n", "\n", "Now lets look at what this can do for cython (via some notebooks by Jake Vanderplas)\n", "\n", "http://docs.cython.org/src/userguide/memoryviews.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#We can put all this together \n", "\n", "```cython\n", "from __future__ import print_function\n", "from cython.parallel import prange\n", "import cython\n", "import numpy as np\n", "cimport numpy as np\n", "from libc.stdlib cimport free\n", "\n", "cdef extern double trapfilter(int *, int, int, int, int, int, double, int, int) nogil\n", "\n", "@cython.boundscheck(False)\n", "@cython.wraparound(False)\n", "def trapfilterlots(np.ndarray inputarr not None, int peaking,\n", " int gap, int M, int len_in, double N):\n", " \"\"\"\n", " Filter a 2-d array of pulses. return heights and polarities. currently uses\n", " first 1200 points for baseline subtraction\n", " \"\"\"\n", " cdef int [:] avg = np.average(inputarr[:, :1200], 1).astype(np.int32)\n", " cdef int [:, :] inarr = np.ascontiguousarray(inputarr,'i4')\n", " cdef double [:] final = np.zeros(inarr.shape[0])\n", " cdef int [:] pol = np.zeros(inarr.shape[0], dtype=np.int32)\n", " cdef Py_ssize_t i\n", " cdef int m = inarr.shape[1]\n", " for i in prange(inarr.shape[0], nogil=True):\n", " pol[i] = 1\n", " if inarr[i][2200] - avg[i] < 0:\n", " pol[i] = -1\n", " final[i] = trapfilter(&inarr[i][0], peaking,\n", " gap, M, m, len_in, N, avg[i], pol[i])\n", " return np.asarray(final), np.asarray(pol)\n", "```" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }