{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Python Tour of Data Science: High Performance Computing\n", "\n", "[Michaël Defferrard](http://deff.ch), *PhD student*, [EPFL](http://epfl.ch) [LTS2](http://lts2.epfl.ch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise: is Python slow ?\n", "\n", "That is one of the most heard complain about Python. Because [CPython](https://en.wikipedia.org/wiki/CPython), the Python reference implementation, [interprets the language](https://en.wikipedia.org/wiki/Interpreted_language) (i.e. it compiles Python code to intermediate bytecode which is then interpreted by a virtual machine), it is inherentably slower than [compiled languages](https://en.wikipedia.org/wiki/Compiled_language), especially for computation heavy tasks such as number crunching.\n", "\n", "There are three ways around it that we'll explore in this exercise:\n", "1. Specialized libraries.\n", "1. Compile Python to machine code.\n", "1. Implement in a compiled language and call from Python.\n", "\n", "In this exercise we'll compare many possible implementations of a function. Our goal is to compare the execution time of our implementations and get a sense of the many ways to write efficient Python code. We test seven implementations:\n", "\n", "Along the exercise we'll use the function\n", "$$accuracy(\\hat{y}, y) = \\frac1n \\sum_{i=0}^{i=n-1} 1(\\hat{y}_i = y_i),$$\n", "where $1(x)$ is the [indicator function](https://en.wikipedia.org/wiki/Indicator_function) and $n$ is the number of samples. This function computes the accuracy, i.e. the percentage of correct predictions, of a classifier. A pure Python implementation is given below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def accuracy_python(y_pred, y_true):\n", " \"\"\"Plain Python implementation.\"\"\"\n", " num_correct = 0\n", " for y_pred_i, y_true_i in zip(y_pred, y_true):\n", " if y_pred_i == y_true_i:\n", " num_correct += 1\n", " return num_correct / len(y_true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we test and measure the execution time of the above implementation. The [%timeit](http://ipython.readthedocs.io/en/stable/interactive/magics.html?highlight=timeit#magic-timeit) function provided by IPython is a useful helper to measure the execution time of a line of Python code. As we'll see, the above implementation is very inefficient compared to what we can achieve." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Expected accuracy: 0.1\n", "Empirical accuracy: 0.100053\n", "10 loops, best of 3: 141 ms per loop\n" ] } ], "source": [ "import numpy as np\n", "\n", "c = 10 # Number of classes.\n", "n = int(1e6) # Number of samples.\n", "\n", "y_true = np.random.randint(0, c, size=n)\n", "y_pred = np.random.randint(0, c, size=n)\n", "print('Expected accuracy: {}'.format(1/c))\n", "print('Empirical accuracy: {}'.format(accuracy_python(y_pred, y_true)))\n", "\n", "%timeit accuracy_python(y_pred, y_true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1 Specialized libraries\n", "\n", "Specialized libraries, which provide efficient compiled implementations of the heavy computations, is an easy way to solve the performance problem. That is for example NumPy, which uses efficient BLAS and LAPACK implementations as a backend. SciPy and scikit-learn fall in the same category.\n", "\n", "Implement below the accuracy function using:\n", "1. Only functions provided by [NumPy](http://www.numpy.org/). The idea here is to *vectorize* the computation.\n", "2. The implementation of [scikit-learn](http://scikit-learn.org/), our machine learning library.\n", "\n", "Then test that it provides the correct result and measure it's execution time. How much faster are they compared to the pure Python implementation ?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def accuracy_numpy(y_pred, y_true):\n", " \"\"\"Numpy implementation.\"\"\"\n", " # Your code here.\n", " return 0\n", "\n", "def accuracy_sklearn(y_pred, y_true):\n", " \"\"\"Scikit-learn implementation.\"\"\"\n", " # Your code here.\n", " return 0\n", "\n", "# assert np.allclose(accuracy_numpy(y_pred, y_true), accuracy_python(y_pred, y_true))\n", "# assert np.allclose(accuracy_sklearn(y_pred, y_true), accuracy_python(y_pred, y_true))\n", "\n", "# %timeit accuracy_numpy(y_pred, y_true)\n", "# %timeit accuracy_sklearn(y_pred, y_true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2 Compiled Python\n", "\n", "The second option of choice, when the algorirthm does not exist in our favorite libraries and we have to implement it, it to implement in Python and compile it to machine code.\n", "\n", "Below you'll compile Python with two frameworks.\n", "1. [Numba](http://numba.pydata.org) is a [just-in-time (JIT)](https://en.wikipedia.org/wiki/Just-in-time_compilation) compiler for Python, using the [LLVM](http://llvm.org) compiler infrastructure.\n", "1. [Cython](http://cython.org), which requires type information, [transpiles](https://en.wikipedia.org/wiki/Source-to-source_compiler) Python to C then compiles the generated C code.\n", "\n", "While these two approaches offer maximal compatibility with the CPython and NumPy ecosystems, another approach is to use another Python implementation such as [PyPy](http://pypy.org), which features a just-in-time compiler and supports multiple back-ends (C, CLI, JVM). Alternatives are [Jython](http://www.jython.org), which runs Python on the Java platform, and [IronPython](http://ironpython.net) / [PythonNet](http://pythonnet.github.io) for the .NET platform." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from numba import jit\n", "\n", "@jit\n", "def accuracy_numba(y_pred, y_true):\n", " \"\"\"Plain Python implementation, compiled by LLVM through Numba.\"\"\"\n", " return 0" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/lib/python3.5/site-packages/Cython/Distutils/old_build_ext.py:30: UserWarning: Cython.Distutils.old_build_ext does not properly handle dependencies and is deprecated.\n", " \"Cython.Distutils.old_build_ext does not properly handle dependencies \"\n" ] } ], "source": [ "%load_ext Cython" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%cython\n", "cimport numpy as np\n", "cimport cython\n", "\n", "def accuracy_cython(np.ndarray[long, ndim=1] y_pred, np.ndarray[long, ndim=1] y_true):\n", " \"\"\"Python implementation with type information, transpiled to C by Cython.\"\"\"\n", " return 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate below the performance of those two implementations, while testing their correctness. How do they compare with plain Python and specialized libraries ?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3 Using C from Python\n", "\n", "Here we'll explore our third option to make Python faster: implement in another language ! Below you'll\n", "1. implement the accuracy function in C,\n", "1. compile it, e.g. with the GNU compiler collection ([GCC](https://gcc.gnu.org/)),\n", "1. call it from Python." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting function.c\n" ] } ], "source": [ "%%file function.c\n", "\n", "// The content of this cell is written to the \"function.c\" file in the current directory.\n", "\n", "double accuracy(long* y_pred, long* y_true, int n) {\n", " // Your code here.\n", " return 0;\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The below cell describe a shell script, which will be executed by IPython as if you typed the commands in your terminal. Those commands are compiling the above C program into a dynamic library with GCC. You can use any other compiler, text editor or IDE to produce the C library. Windows users, you may want to use Microsoft toolchain." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "libfunction.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=eed9b1a0543aa0dd93d1f53b12a2e62fc97d0293, not stripped\n" ] } ], "source": [ "%%script sh\n", "FILE=function\n", "gcc -c -O3 -Wall -std=c11 -pedantic -fPIC -o $FILE.o $FILE.c\n", "gcc -o lib$FILE.so -shared $FILE.o\n", "file lib$FILE.so" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The below cell finally create a wrapper around our C library so that we can easily use it from Python." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import ctypes\n", "\n", "libfunction = np.ctypeslib.load_library('libfunction', './')\n", "libfunction.accuracy.restype = ctypes.c_double\n", "libfunction.accuracy.argtypes = [\n", " np.ctypeslib.ndpointer(dtype=np.int),\n", " np.ctypeslib.ndpointer(dtype=np.int),\n", " ctypes.c_int\n", "]\n", "\n", "def accuracy_c(y_pred, y_true):\n", " n = y_pred.size\n", " return libfunction.accuracy(y_pred, y_true, n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate below the performance of your C implementation, and test its correctness. How does it compare with the others ?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4 Using Fortran from Python\n", "\n", "Same idea as before, with Fortran ! [Fortran](https://en.wikipedia.org/wiki/Fortran) is an imperative programming language developed in the 50s, especially suited to numeric computation and scientific computing. While you probably won't write new code in Fortran, you may have to interface with legacy code (especially in large and old corporations). Here we'll resort to the [f2py](https://docs.scipy.org/doc/numpy-dev/f2py/) utility provided by the Numpy project for the (almost automatic) generation of a wrapper." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting function.f\n" ] } ], "source": [ "%%file function.f\n", "\n", "! The content of this cell is written to the \"function.f\" file in the current directory.\n", "\n", " SUBROUTINE DACCURACY(YPRED, YTRUE, ACC, N)\n", "\n", "CF2PY INTENT(OUT) :: ACC\n", "CF2PY INTENT(HIDE) :: N\n", " INTEGER*4 YPRED(N)\n", " INTEGER*4 YTRUE(N)\n", " DOUBLE PRECISION ACC\n", " INTEGER N, NCORRECT\n", " \n", " ! Your code here.\n", "\n", " ACC = 0 \n", " END" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The below command compile the Fortran code and generate a Python wrapper." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[39mrunning build\u001b[0m\r\n", "\u001b[39mrunning config_cc\u001b[0m\r\n", "\u001b[39munifing config_cc, config, build_clib, build_ext, build commands --compiler options\u001b[0m\r\n", "\u001b[39mrunning config_fc\u001b[0m\r\n", "\u001b[39munifing config_fc, config, build_clib, build_ext, build commands --fcompiler options\u001b[0m\r\n", "\u001b[39mrunning build_src\u001b[0m\r\n", "\u001b[39mbuild_src\u001b[0m\r\n", "\u001b[39mbuilding extension \"function\" sources\u001b[0m\r\n", "\u001b[39mf2py options: []\u001b[0m\r\n", "\u001b[39mf2py:> /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/functionmodule.c\u001b[0m\r\n", "\u001b[39mcreating /tmp/tmpkn97e3yu/src.linux-x86_64-3.5\u001b[0m\r\n", "Reading fortran codes...\r\n", "\tReading file 'function.f' (format:fix,strict)\r\n", "Post-processing...\r\n", "\tBlock: function\r\n", "\t\t\tBlock: daccuracy\r\n", "Post-processing (stage 2)...\r\n", "Building modules...\r\n", "\tBuilding module \"function\"...\r\n", "\t\tConstructing wrapper function \"daccuracy\"...\r\n", "\t\t acc = daccuracy(ypred,ytrue)\r\n", "\tWrote C/API module \"function\" to file \"/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/functionmodule.c\"\r\n", "\u001b[39m adding '/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.c' to sources.\u001b[0m\r\n", "\u001b[39m adding '/tmp/tmpkn97e3yu/src.linux-x86_64-3.5' to include_dirs.\u001b[0m\r\n", "\u001b[39mcopying /usr/lib/python3.5/site-packages/numpy/f2py/src/fortranobject.c -> /tmp/tmpkn97e3yu/src.linux-x86_64-3.5\u001b[0m\r\n", "\u001b[39mcopying /usr/lib/python3.5/site-packages/numpy/f2py/src/fortranobject.h -> /tmp/tmpkn97e3yu/src.linux-x86_64-3.5\u001b[0m\r\n", "\u001b[39mbuild_src: building npy-pkg config files\u001b[0m\r\n", "\u001b[39mrunning build_ext\u001b[0m\r\n", "\u001b[39mcustomize UnixCCompiler\u001b[0m\r\n", "\u001b[39mcustomize UnixCCompiler using build_ext\u001b[0m\r\n", "\u001b[39mcustomize Gnu95FCompiler\u001b[0m\r\n", "\u001b[39mFound executable /usr/bin/gfortran\u001b[0m\r\n", "\u001b[39mcustomize Gnu95FCompiler\u001b[0m\r\n", "\u001b[39mcustomize Gnu95FCompiler using build_ext\u001b[0m\r\n", "\u001b[39mbuilding 'function' extension\u001b[0m\r\n", "\u001b[39mcompiling C sources\u001b[0m\r\n", "\u001b[39mC compiler: gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fPIC\r\n", "\u001b[0m\r\n", "\u001b[39mcreating /tmp/tmpkn97e3yu/tmp\u001b[0m\r\n", "\u001b[39mcreating /tmp/tmpkn97e3yu/tmp/tmpkn97e3yu\u001b[0m\r\n", "\u001b[39mcreating /tmp/tmpkn97e3yu/tmp/tmpkn97e3yu/src.linux-x86_64-3.5\u001b[0m\r\n", "\u001b[39mcompile options: '-I/tmp/tmpkn97e3yu/src.linux-x86_64-3.5 -I/usr/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -c'\u001b[0m\r\n", "\u001b[39mgcc: /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.c\u001b[0m\r\n", "In file included from /usr/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,\r\n", " from /usr/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,\r\n", " from /usr/lib/python3.5/site-packages/numpy/core/include/numpy/arrayobject.h:4,\r\n", " from /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.h:13,\r\n", " from /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.c:2:\r\n", "/usr/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning \"Using deprecated NumPy API, disable it by \" \"#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION\" [-Wcpp]\r\n", " #warning \"Using deprecated NumPy API, disable it by \" \\\r\n", " ^~~~~~~\r\n", "/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.c: In function ‘format_def’:\r\n", "/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.c:139:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]\r\n", " if (size < sizeof(notalloc)) {\r\n", " ^\r\n", "\u001b[39mgcc: /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/functionmodule.c\u001b[0m\r\n", "In file included from /usr/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1777:0,\r\n", " from /usr/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,\r\n", " from /usr/lib/python3.5/site-packages/numpy/core/include/numpy/arrayobject.h:4,\r\n", " from /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.h:13,\r\n", " from /tmp/tmpkn97e3yu/src.linux-x86_64-3.5/functionmodule.c:19:\r\n", "/usr/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning \"Using deprecated NumPy API, disable it by \" \"#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION\" [-Wcpp]\r\n", " #warning \"Using deprecated NumPy API, disable it by \" \\\r\n", " ^~~~~~~\r\n", "/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/functionmodule.c:112:12: warning: ‘f2py_size’ defined but not used [-Wunused-function]\r\n", " static int f2py_size(PyArrayObject* var, ...)\r\n", " ^~~~~~~~~\r\n", "\u001b[39mcompiling Fortran sources\u001b[0m\r\n", "\u001b[39mFortran f77 compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops\r\n", "Fortran f90 compiler: /usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops\r\n", "Fortran fix compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops\u001b[0m\r\n", "\u001b[39mcompile options: '-I/tmp/tmpkn97e3yu/src.linux-x86_64-3.5 -I/usr/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -c'\u001b[0m\r\n", "\u001b[39mgfortran:f77: function.f\u001b[0m\r\n", "function.f:11:33:\r\n", "\r\n", " INTEGER N, NTOTAL, NCORRECT\r\n", " 1\r\n", "Warning: Unused variable ‘ncorrect’ declared at (1) [-Wunused-variable]\r\n", "function.f:11:23:\r\n", "\r\n", " INTEGER N, NTOTAL, NCORRECT\r\n", " 1\r\n", "Warning: Unused variable ‘ntotal’ declared at (1) [-Wunused-variable]\r\n", "function.f:4:32:\r\n", "\r\n", " SUBROUTINE DACCURACY(YPRED, YTRUE, ACC, N)\r\n", " 1\r\n", "Warning: Unused dummy argument ‘ypred’ at (1) [-Wunused-dummy-argument]\r\n", "function.f:4:39:\r\n", "\r\n", " SUBROUTINE DACCURACY(YPRED, YTRUE, ACC, N)\r\n", " 1\r\n", "Warning: Unused dummy argument ‘ytrue’ at (1) [-Wunused-dummy-argument]\r\n", "\u001b[39m/usr/bin/gfortran -Wall -g -Wall -g -shared /tmp/tmpkn97e3yu/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/functionmodule.o /tmp/tmpkn97e3yu/tmp/tmpkn97e3yu/src.linux-x86_64-3.5/fortranobject.o /tmp/tmpkn97e3yu/function.o -L/usr/lib -lpython3.5m -lgfortran -o ./function.cpython-35m-x86_64-linux-gnu.so\u001b[0m\r\n", "Removing build directory /tmp/tmpkn97e3yu\r\n" ] } ], "source": [ "!f2py -c -m function function.f # >> /dev/null" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import function\n", "def accuracy_fortran(y_pred, y_true):\n", " return function.daccuracy(y_pred, y_true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate below the performance of your Fortran implementation, and test its correctness. How does it compare with the others ?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5 Analysis\n", "\n", "Plot a graph with `n` as the x-axis and the execution time of the various methods on the y-axis. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To reflect on what we've done, answer the following questions:\n", "1. Which implementation is the fastest ?\n", "1. Which solution was the easiest to work with ?\n", "1. Which solution would you use in which situation ?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }