{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Volvemos al torneo del rendimiento!!!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recapitulando. Un artículo sobre Cython donde conseguíamos [mejoras de velocidad de código Python con numpy arrays de 40x usando Cython](http://pybonacci.org/2015/03/09/c-elemental-querido-cython/) desembocó [en mejoras de 70x usando numba](http://pybonacci.org/2015/03/13/como-acelerar-tu-codigo-python-con-numba/). En esta tercera toma vamos a ver si con Cython conseguimos las velocidades de numba tomando algunas ideas de la implementación de JuanLu y definiendo una función un poco más inteligente que mi implementación con Cython ([busca_min_cython9](http://pybonacci.org/2015/03/09/c-elemental-querido-cython/#Cythonizando,-que-es-gerundio-%28toma-9%29.))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preparamos el *setup inicial*." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import numba\n", "\n", "np.random.seed(0)\n", "\n", "data = np.random.randn(2000, 2000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "JuanLu hizo alguna trampa usando un numpy array en lugar de dos listas y devolviendo el resultado usando `numpy.nonzero`. En realidad no es trampa, es pura envidia mía al ver que ha usado una forma más inteligente de conseguir lo mismo que hacía mi función original :-P\n", "\n", "Usando esa implementación considero que es más inteligente tener un numpy array de salida por lo que el uso de `np.nonzero` sería innecesario y añadiría algo de pérdida de rendimiento si luego vamos a seguir trabajando con numpy arrays. Por tanto, la implementación de JuanLu eliminando el uso de `numpy.nonzero` sería:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def busca_min_np_jit(malla):\n", " minimos = np.zeros_like(malla, dtype=bool)\n", " _busca_min(malla, minimos)\n", " return minimos # en lugar de 'return np.nonzero(minimos)'\n", "\n", "@numba.jit(nopython=True)\n", "def _busca_min(malla, minimos):\n", " for i in range(1, malla.shape[1]-1):\n", " for j in range(1, malla.shape[0]-1):\n", " if (malla[j, i] < malla[j-1, i-1] and\n", " malla[j, i] < malla[j-1, i] and\n", " malla[j, i] < malla[j-1, i+1] and\n", " malla[j, i] < malla[j, i-1] and\n", " malla[j, i] < malla[j, i+1] and\n", " malla[j, i] < malla[j+1, i-1] and\n", " malla[j, i] < malla[j+1, i] and\n", " malla[j, i] < malla[j+1, i+1]):\n", " minimos[i, j] = True" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100 loops, best of 3: 33 ms per loop\n" ] } ], "source": [ "%timeit -n 100 busca_min_np_jit(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ejecutándolo 100 veces obtenemos un valor más bajo de 33.6 ms devolviendo un numpy.array de 1's y 0's con los 1's indicando la posición de los máximos.\n", "\n", "La implementación original la vamos a modificar un poco para que devuelva lo mismo." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def busca_min(malla):\n", " minimos = np.zeros_like(malla)\n", " for i in range(1, malla.shape[1]-1):\n", " for j in range(1, malla.shape[0]-1):\n", " if (malla[j, i] < malla[j-1, i-1] and\n", " malla[j, i] < malla[j-1, i] and\n", " malla[j, i] < malla[j-1, i+1] and\n", " malla[j, i] < malla[j, i-1] and\n", " malla[j, i] < malla[j, i+1] and\n", " malla[j, i] < malla[j+1, i-1] and\n", " malla[j, i] < malla[j+1, i] and\n", " malla[j, i] < malla[j+1, i+1]):\n", " minimos[i, j] = 1\n", "\n", " return minimos" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loops, best of 3: 3.4 s per loop\n" ] } ], "source": [ "%timeit busca_min(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Los tiempos son similares a la función original y, aunque estamos usando más memoria, tenemos una mejora con numba que ya llega a los dos órdenes de magnitud (alrededor de 100x!!) y una función más usable para trabajar con numpy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vamos a modificar la opción Cython más rápida que obtuvimos para que se comporte igual que las de Numba y Python.\n", "\n", "Primero cargamos la extensión Cython." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# antes cythonmagic\n", "%load_ext Cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vamos a usar la opción `annotate` para ver cuanto 'blanco' tenemos y la nueva versión Cython la vamos a llamar `busca_min_cython10`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

Generated by Cython 0.22

\n", "
+01: import numpy as np
\n", "
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, -1); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "/* … */\n",
       "  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "
 02: from cython cimport boundscheck, wraparound
\n", "
 03: 
\n", "
+04: cpdef char[:,::1] busca_min_cython10(double[:, ::1] malla):
\n", "
static PyObject *__pyx_pw_46_cython_magic_de7594aedda59602146d5e749862b110_1busca_min_cython10(PyObject *__pyx_self, PyObject *__pyx_arg_malla); /*proto*/\n",
       "static __Pyx_memviewslice __pyx_f_46_cython_magic_de7594aedda59602146d5e749862b110_busca_min_cython10(__Pyx_memviewslice __pyx_v_malla, CYTHON_UNUSED int __pyx_skip_dispatch) {\n",
       "  unsigned int __pyx_v_i;\n",
       "  unsigned int __pyx_v_j;\n",
       "  unsigned int __pyx_v_ii;\n",
       "  unsigned int __pyx_v_jj;\n",
       "  __Pyx_memviewslice __pyx_v_minimos = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  unsigned int __pyx_v_start;\n",
       "  __Pyx_memviewslice __pyx_r = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"busca_min_cython10\", 0);\n",
       "/* … */\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __Pyx_XDECREF(__pyx_t_1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_XDECREF(__pyx_t_3);\n",
       "  __Pyx_XDECREF(__pyx_t_4);\n",
       "  __Pyx_XDECREF(__pyx_t_5);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_6, 1);\n",
       "  __pyx_r.data = NULL;\n",
       "  __pyx_r.memview = NULL;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_de7594aedda59602146d5e749862b110.busca_min_cython10\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "\n",
       "  goto __pyx_L2;\n",
       "  __pyx_L0:;\n",
       "  if (unlikely(!__pyx_r.memview)) {\n",
       "    PyErr_SetString(PyExc_TypeError,\"Memoryview return value is not initialized\");\n",
       "  }\n",
       "  __pyx_L2:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_minimos, 1);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "/* Python wrapper */\n",
       "static PyObject *__pyx_pw_46_cython_magic_de7594aedda59602146d5e749862b110_1busca_min_cython10(PyObject *__pyx_self, PyObject *__pyx_arg_malla); /*proto*/\n",
       "static PyObject *__pyx_pw_46_cython_magic_de7594aedda59602146d5e749862b110_1busca_min_cython10(PyObject *__pyx_self, PyObject *__pyx_arg_malla) {\n",
       "  __Pyx_memviewslice __pyx_v_malla = { 0, 0, { 0 }, { 0 }, { 0 } };\n",
       "  PyObject *__pyx_r = 0;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"busca_min_cython10 (wrapper)\", 0);\n",
       "  assert(__pyx_arg_malla); {\n",
       "    __pyx_v_malla = __Pyx_PyObject_to_MemoryviewSlice_d_dc_double(__pyx_arg_malla); if (unlikely(!__pyx_v_malla.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L3_error;}\n",
       "  }\n",
       "  goto __pyx_L4_argument_unpacking_done;\n",
       "  __pyx_L3_error:;\n",
       "  __Pyx_AddTraceback(\"_cython_magic_de7594aedda59602146d5e749862b110.busca_min_cython10\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return NULL;\n",
       "  __pyx_L4_argument_unpacking_done:;\n",
       "  __pyx_r = __pyx_pf_46_cython_magic_de7594aedda59602146d5e749862b110_busca_min_cython10(__pyx_self, __pyx_v_malla);\n",
       "  int __pyx_lineno = 0;\n",
       "  const char *__pyx_filename = NULL;\n",
       "  int __pyx_clineno = 0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "\n",
       "static PyObject *__pyx_pf_46_cython_magic_de7594aedda59602146d5e749862b110_busca_min_cython10(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_malla) {\n",
       "  PyObject *__pyx_r = NULL;\n",
       "  __Pyx_RefNannyDeclarations\n",
       "  __Pyx_RefNannySetupContext(\"busca_min_cython10\", 0);\n",
       "  __Pyx_XDECREF(__pyx_r);\n",
       "  if (unlikely(!__pyx_v_malla.memview)) { __Pyx_RaiseUnboundLocalError(\"malla\"); {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;} }\n",
       "  __pyx_t_1 = __pyx_f_46_cython_magic_de7594aedda59602146d5e749862b110_busca_min_cython10(__pyx_v_malla, 0); if (unlikely(!__pyx_t_1.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __pyx_t_2 = __pyx_memoryview_fromslice(__pyx_t_1, 2, (PyObject *(*)(char *)) __pyx_memview_get_char, (int (*)(char *, PyObject *)) __pyx_memview_set_char, 0);; if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_1, 1);\n",
       "  __pyx_r = __pyx_t_2;\n",
       "  __pyx_t_2 = 0;\n",
       "  goto __pyx_L0;\n",
       "\n",
       "  /* function exit code */\n",
       "  __pyx_L1_error:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_t_1, 1);\n",
       "  __Pyx_XDECREF(__pyx_t_2);\n",
       "  __Pyx_AddTraceback(\"_cython_magic_de7594aedda59602146d5e749862b110.busca_min_cython10\", __pyx_clineno, __pyx_lineno, __pyx_filename);\n",
       "  __pyx_r = NULL;\n",
       "  __pyx_L0:;\n",
       "  __PYX_XDEC_MEMVIEW(&__pyx_v_malla, 1);\n",
       "  __Pyx_XGIVEREF(__pyx_r);\n",
       "  __Pyx_RefNannyFinishContext();\n",
       "  return __pyx_r;\n",
       "}\n",
       "
 05:     cdef unsigned int i, j
\n", "
+06:     cdef unsigned int ii = malla.shape[1]-1
\n", "
  __pyx_v_ii = ((__pyx_v_malla.shape[1]) - 1);\n",
       "
+07:     cdef unsigned int jj = malla.shape[0]-1
\n", "
  __pyx_v_jj = ((__pyx_v_malla.shape[0]) - 1);\n",
       "
+08:     cdef char[:,::1] minimos = np.zeros_like(malla, dtype = np.int8)
\n", "
  __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_zeros_like); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_2);\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_t_1 = __pyx_memoryview_fromslice(__pyx_v_malla, 2, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_3);\n",
       "  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_1);\n",
       "  __Pyx_GIVEREF(__pyx_t_1);\n",
       "  __pyx_t_1 = 0;\n",
       "  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_1);\n",
       "  __pyx_t_4 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_4)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_4);\n",
       "  __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_int8); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_5);\n",
       "  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;\n",
       "  if (PyDict_SetItem(__pyx_t_1, __pyx_n_s_dtype, __pyx_t_5) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "  __pyx_t_5 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_t_3, __pyx_t_1); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_GOTREF(__pyx_t_5);\n",
       "  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;\n",
       "  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;\n",
       "  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;\n",
       "  __pyx_t_6 = __Pyx_PyObject_to_MemoryviewSlice_d_dc_char(__pyx_t_5);\n",
       "  if (unlikely(!__pyx_t_6.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 8; __pyx_clineno = __LINE__; goto __pyx_L1_error;}\n",
       "  __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;\n",
       "  __pyx_v_minimos = __pyx_t_6;\n",
       "  __pyx_t_6.memview = NULL;\n",
       "  __pyx_t_6.data = NULL;\n",
       "
 09:     #minimos[...] = 0
\n", "
+10:     cdef unsigned int start = 1
\n", "
  __pyx_v_start = 1;\n",
       "
 11:     #cdef float [:, :] malla_view = malla
\n", "
 12:     with boundscheck(False), wraparound(False):
\n", "
+13:         for j in range(start, ii):
\n", "
  __pyx_t_7 = __pyx_v_ii;\n",
       "  for (__pyx_t_8 = __pyx_v_start; __pyx_t_8 < __pyx_t_7; __pyx_t_8+=1) {\n",
       "    __pyx_v_j = __pyx_t_8;\n",
       "
+14:             for i in range(start, jj):
\n", "
    __pyx_t_9 = __pyx_v_jj;\n",
       "    for (__pyx_t_10 = __pyx_v_start; __pyx_t_10 < __pyx_t_9; __pyx_t_10+=1) {\n",
       "      __pyx_v_i = __pyx_t_10;\n",
       "
+15:                 if (malla[j, i] < malla[j-1, i-1] and
\n", "
      __pyx_t_12 = __pyx_v_j;\n",
       "      __pyx_t_13 = __pyx_v_i;\n",
       "      __pyx_t_14 = (__pyx_v_j - 1);\n",
       "      __pyx_t_15 = (__pyx_v_i - 1);\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_12 * __pyx_v_malla.strides[0]) )) + __pyx_t_13)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_14 * __pyx_v_malla.strides[0]) )) + __pyx_t_15)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+16:                     malla[j, i] < malla[j-1, i] and
\n", "
      __pyx_t_17 = __pyx_v_j;\n",
       "      __pyx_t_18 = __pyx_v_i;\n",
       "      __pyx_t_19 = (__pyx_v_j - 1);\n",
       "      __pyx_t_20 = __pyx_v_i;\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_17 * __pyx_v_malla.strides[0]) )) + __pyx_t_18)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_19 * __pyx_v_malla.strides[0]) )) + __pyx_t_20)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+17:                     malla[j, i] < malla[j-1, i+1] and
\n", "
      __pyx_t_21 = __pyx_v_j;\n",
       "      __pyx_t_22 = __pyx_v_i;\n",
       "      __pyx_t_23 = (__pyx_v_j - 1);\n",
       "      __pyx_t_24 = (__pyx_v_i + 1);\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_21 * __pyx_v_malla.strides[0]) )) + __pyx_t_22)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_23 * __pyx_v_malla.strides[0]) )) + __pyx_t_24)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+18:                     malla[j, i] < malla[j, i-1] and
\n", "
      __pyx_t_25 = __pyx_v_j;\n",
       "      __pyx_t_26 = __pyx_v_i;\n",
       "      __pyx_t_27 = __pyx_v_j;\n",
       "      __pyx_t_28 = (__pyx_v_i - 1);\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_25 * __pyx_v_malla.strides[0]) )) + __pyx_t_26)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_27 * __pyx_v_malla.strides[0]) )) + __pyx_t_28)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+19:                     malla[j, i] < malla[j, i+1] and
\n", "
      __pyx_t_29 = __pyx_v_j;\n",
       "      __pyx_t_30 = __pyx_v_i;\n",
       "      __pyx_t_31 = __pyx_v_j;\n",
       "      __pyx_t_32 = (__pyx_v_i + 1);\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_29 * __pyx_v_malla.strides[0]) )) + __pyx_t_30)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_31 * __pyx_v_malla.strides[0]) )) + __pyx_t_32)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+20:                     malla[j, i] < malla[j+1, i-1] and
\n", "
      __pyx_t_33 = __pyx_v_j;\n",
       "      __pyx_t_34 = __pyx_v_i;\n",
       "      __pyx_t_35 = (__pyx_v_j + 1);\n",
       "      __pyx_t_36 = (__pyx_v_i - 1);\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_33 * __pyx_v_malla.strides[0]) )) + __pyx_t_34)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_35 * __pyx_v_malla.strides[0]) )) + __pyx_t_36)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+21:                     malla[j, i] < malla[j+1, i] and
\n", "
      __pyx_t_37 = __pyx_v_j;\n",
       "      __pyx_t_38 = __pyx_v_i;\n",
       "      __pyx_t_39 = (__pyx_v_j + 1);\n",
       "      __pyx_t_40 = __pyx_v_i;\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_37 * __pyx_v_malla.strides[0]) )) + __pyx_t_38)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_39 * __pyx_v_malla.strides[0]) )) + __pyx_t_40)) )))) != 0);\n",
       "      if (__pyx_t_16) {\n",
       "      } else {\n",
       "        __pyx_t_11 = __pyx_t_16;\n",
       "        goto __pyx_L8_bool_binop_done;\n",
       "      }\n",
       "
+22:                     malla[j, i] < malla[j+1, i+1]):
\n", "
      __pyx_t_41 = __pyx_v_j;\n",
       "      __pyx_t_42 = __pyx_v_i;\n",
       "      __pyx_t_43 = (__pyx_v_j + 1);\n",
       "      __pyx_t_44 = (__pyx_v_i + 1);\n",
       "      __pyx_t_16 = (((*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_41 * __pyx_v_malla.strides[0]) )) + __pyx_t_42)) ))) < (*((double *) ( /* dim=1 */ ((char *) (((double *) ( /* dim=0 */ (__pyx_v_malla.data + __pyx_t_43 * __pyx_v_malla.strides[0]) )) + __pyx_t_44)) )))) != 0);\n",
       "      __pyx_t_11 = __pyx_t_16;\n",
       "      __pyx_L8_bool_binop_done:;\n",
       "      if (__pyx_t_11) {\n",
       "
+23:                     minimos[i,j] = 1
\n", "
        __pyx_t_45 = __pyx_v_i;\n",
       "        __pyx_t_46 = __pyx_v_j;\n",
       "        *((char *) ( /* dim=1 */ ((char *) (((char *) ( /* dim=0 */ (__pyx_v_minimos.data + __pyx_t_45 * __pyx_v_minimos.strides[0]) )) + __pyx_t_46)) )) = 1;\n",
       "        goto __pyx_L7;\n",
       "      }\n",
       "      __pyx_L7:;\n",
       "    }\n",
       "  }\n",
       "
 24: 
\n", "
+25:     return minimos
\n", "
  __PYX_INC_MEMVIEW(&__pyx_v_minimos, 0);\n",
       "  __pyx_r = __pyx_v_minimos;\n",
       "  goto __pyx_L0;\n",
       "
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%cython --annotate\n", "import numpy as np\n", "from cython cimport boundscheck, wraparound\n", "\n", "cpdef char[:,::1] busca_min_cython10(double[:, ::1] malla):\n", " cdef unsigned int i, j\n", " cdef unsigned int ii = malla.shape[1]-1\n", " cdef unsigned int jj = malla.shape[0]-1\n", " cdef char[:,::1] minimos = np.zeros_like(malla, dtype = np.int8)\n", " #minimos[...] = 0\n", " cdef unsigned int start = 1\n", " #cdef float [:, :] malla_view = malla\n", " with boundscheck(False), wraparound(False):\n", " for j in range(start, ii):\n", " for i in range(start, jj):\n", " if (malla[j, i] < malla[j-1, i-1] and\n", " malla[j, i] < malla[j-1, i] and\n", " malla[j, i] < malla[j-1, i+1] and\n", " malla[j, i] < malla[j, i-1] and\n", " malla[j, i] < malla[j, i+1] and\n", " malla[j, i] < malla[j+1, i-1] and\n", " malla[j, i] < malla[j+1, i] and\n", " malla[j, i] < malla[j+1, i+1]):\n", " minimos[i,j] = 1\n", "\n", " return minimos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vemos que la mayor parte está en 'blanco'. Eso significa que estamos evitando usar la C-API de CPython y la mayor parte sucede en C. Estoy usando `typed memoryviews` que permite trabajar de forma 'transparente' numpy arrays.\n", "\n", "Vamos a ejecutar la nueva versión 100 veces, de la misma forma que hemos hecho con Numba:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100 loops, best of 3: 27.6 ms per loop\n" ] } ], "source": [ "%timeit -n 100 busca_min_cython10(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow, virtualmente obtenemos la misma velocidad entre Numba y Cython y dos órdenes de magnitud de mejora con respecto a la versión Python." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "res_numba = busca_min_np_jit(data)\n", "res_cython = busca_min_cython10(data)\n", "res_python = busca_min(data)\n", "\n", "np.testing.assert_array_equal(res_numba, res_cython)\n", "np.testing.assert_array_equal(res_numba, res_python)\n", "np.testing.assert_array_equal(res_cython, res_python)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Parece que el resultado es el mismo en todo momento" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "Probemos con arrays de menos y más tamaño." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 loops, best of 3: 2.04 ms per loop\n", "3 loops, best of 3: 1.75 ms per loop\n", "1 loops, best of 3: 209 ms per loop\n" ] } ], "source": [ "data = np.random.randn(500, 500)\n", "%timeit -n 3 busca_min_np_jit(data)\n", "%timeit -n 3 busca_min_cython10(data)\n", "%timeit busca_min(data)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 loops, best of 3: 216 ms per loop\n", "3 loops, best of 3: 174 ms per loop\n", "1 loops, best of 3: 21.6 s per loop\n" ] } ], "source": [ "data = np.random.randn(5000, 5000)\n", "%timeit -n 3 busca_min_np_jit(data)\n", "%timeit -n 3 busca_min_cython10(data)\n", "%timeit busca_min(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parece que las distintas versiones escalan de la misma forma y el rendimiento parece, más o menos, lineal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusiones de este nuevo capítulo." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Las conclusiones que saco yo de este mano a mano que hemos llevado a cabo JuanLu (featuring Numba) y yo (featuring Cython):\n", "\n", "* Cython: Si te restringes a cosas sencllas, es relativamente sencillo de usar. Básicamente habría que optimizar bucles y, solo en caso de que sea necesario, añadir tipos a otras variables para evitar pasar por la C-API de CPython en ciertas operaciones puesto que puede tener un coste elevado en el rendimiento. Para cosas más complejas, a pesar de que sigue siendo más placentero que C se puede complicar un poco más (pero no mucho más, una vez que lo entendido cómo usarlo).\n", "* Numba: Es bastante sorprendente lo que se puede llegar a conseguir con poco esfuerzo. Parece que siempre introducirá un poco de *overhead* puesto que hace muchas cosas entre bambalinas y de la otra forma (Cython) hace lo que le digamos que haga. También es verdad que muchas cosas no están soportadas, que los errores que obtenemos puede ser un poco crípticos y se hace difícil depurar el código. Pero a pesar de todo lo anterior y conociendo el historial de la gente que está detrás del proyecto Numba creo que su futuro será brillante. Por ejemplo, [Numbagg](https://github.com/shoyer/numbagg) es una librería que usa Numba y que pretende hacer lo mismo que [bottleneck](https://github.com/kwgoodman/bottleneck) (una librería muy especializada para determinadas operaciones de Numpy), que usa Cython consiguiendo [resultados comparables aunque levemente peores](https://github.com/shoyer/numbagg#benchmarks).\n", "\n", "No sé si habrá algún capítulo más de esta serie... Lo dejo en manos de JuanLu o de cualquiera que nos quiera enviar un nuevo post relacionado." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.0" } }, "nbformat": 4, "nbformat_minor": 0 }