{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "%autosave 10" ], "language": "python", "metadata": {}, "outputs": [ { "javascript": [ "IPython.notebook.set_autosave_interval(10000)" ], "metadata": {}, "output_type": "display_data" }, { "output_type": "stream", "stream": "stdout", "text": [ "Autosaving every 10 seconds\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##\u00a0Intro\n", "\n", "- Multiple cores, and GPUs, are becoming mainstream. Eventually thousands of cores will become typical.\n", "\n", "## Cython\n", "\n", "- All except two pure Python standard library modules will compile as-is using Cython, so it is compliant.\n", " - Just use Cython without annotation, usually quite good.\n", "- Can re-use existing C/C++\n", "- SciPy/pandas already use Cython.\n", "\n", "##\u00a0Workflow\n", "\n", "1. Write Cython code in `pyx` file.\n", " - Can start as original pure Python.\n", "2. Compile, execute `setup.py`.\n", "3. Get extension module, `*.so` (Linux), `*.pyd` (Windows).\n", "4. Use your extension.\n", " - Users do not require Cython to use compiled modules.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "!!AI goes through Cython basics, same as tutorial on webpage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Annotations - Cython at work\n", "\n", "- Generate HTML of annotated output, because C output is not intended for human consumption.\n", "- Python is yellow, C is white, click to see C source.\n", "- Try to turn yellow to white.\n", "\n", "## Pure Python with Decorators\n", "\n", "In newer Cython versions\n", "\n", " import cython\n", " \n", " @cython.locals(a=cython.double, b=cython.double)\n", " def add(a, b):\n", " return a + b\n", "\n", "Still valid Python, will of course need `cython` module. Can still compile down.\n", "\n", "This makes sense. Decorators are used for cross-cutting concerns, i.e. not part of core logic, and Cython is just that.\n", "\n", "##\u00a0Automate the Automation\n", "\n", "`pyximport` tries to compile all `pyz` and `py` files, then:\n", "\n", " import pyximport\n", " pyximport.install()\n", " \n", " import cy_101_pyximport\n", " \n", " print(cy_101_pyximport.typed_python_func(3, 4))\n", "\n", "## The Buffer Interface\n", "\n", "- NumPy-inspired low-level view of C data structures.\n", "- Want to avoid copying data, this is expensive. Just use data inplace.\n", "\n", "##\u00a0Example\n", "\n", "- `a` and `b` are 2000 x 2000, 4 million elements.\n", "` (a + b) * 2 + (a * b)`.\n", " - !!AI *not* matrix multiplication, element-wise multiplication.\n", " - Hence classic \"embarrassingly parallel\" problem.\n", "\n", "##\u00a0With multiprocessing\n", "\n", "- Split matrix into 4 quadrants.\n", "- But 6 times slower than NumPy.\n", " - Due to serialisation / pickling of data to and from processes.\n", " - Need to keep data close to calculators, not send it around.\n", "\n", "## The Buffer Interface from Cython\n", "\n", "Key point:\n", "\n", " @cython.boundscheck(False)\n", " @cython.wraparound(False)\n", " def func(object[double, ndim=2] buf1 not None,\n", " object[double, ndim=2] buf2 not None,\n", " object[double, ndim=2] output=one):\n", "\n", "\n", "You want to loop over the NumPy arrray. Usually a no-no, but since this gets converted down to C fine.\n", "\n", "Why does this speed up?\n", "\n", "- Tiling. Rather than going row by row going into quadrants improves usage of caches (subset of data can fit into L1/L2). Without parallelisation it is faster.\n", "\n", "## Cython and OpenMP\n", "\n", "- OpenMP is very featureful, and Cython can access that.\n", " - e.g. optimally give work to workers who need work, rather than same-size sharing.\n", "- You can (must) disable the GIL in key points.\n", "- However you lose the safety-net of Python. You need to know about C and OpenMP.\n", " \n", "### Using threads explicitly\n", "\n", "- If you want to you can explicitly refer to threads, but not necessary to use OpenMP.\n", "- Use `with nogil` when in C-only code.\n", "- Use `with gil` when need Python objects.\n", "- Can nest `with gil` below `with nogil`.\n", "- You need `with nogil` to have genuine parallelism when using threads explicitly.\n", "\n", "### Use threads implicitly\n", "\n", "- `prange`\n", "- But OpenMP with 4 threads gives x2 speedup (hyperthreading not useful), but tiling effect gives x4 speedup.\n", "\n", "##\u00a0Questions\n", "\n", "- Haven't tried OpenCL and benchmarked it." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }