{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmarks" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "qutip 4.3.1\n", "numpy 1.15.4\n", "matplotlib 3.0.2\n", "matplotlib.pylab 1.15.4\n", "newtonprop 0.1.0\n", "snakeviz 1.0.0\n", "numba 0.41.0\n", "CPython 3.6.7\n", "IPython 7.2.0\n" ] } ], "source": [ "# NBVAL_IGNORE_OUTPUT\n", "%load_ext watermark\n", "import os\n", "import qutip\n", "import numpy as np\n", "from numpy import pi\n", "import matplotlib\n", "import matplotlib.pylab as plt\n", "import newtonprop\n", "import snakeviz\n", "import numba\n", "%watermark -v --iversions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While only available interactively, `snakeviz` is a very good way to look at profiling data:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%load_ext snakeviz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The benchmarks are based on the Examples, so we rerun the entire Example notebook in this context:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "%run example.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Propagation runtime" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "tlist = np.linspace(0, 10, 100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we measure low long the propagation with QuTiP takes. This happens mostly in compiled code, so it is pretty fast." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15.6 ms ± 941 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "qutip.mesolve(L, rho0, tlist)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "39.1 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%%timeit\n", "propagate_expm(L, rho0, tlist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Newton propagator, being a reference implementation, is implemented in pure Python, and is several orders of magnitude slower. To make the comparison fair, we limit the precision to $10^{-8}$, which is roughly the precision of `mesolve`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.85 s ± 282 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "propagate(L, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using lower-level data types, things get considerably faster:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "209 ms ± 6.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%%timeit\n", "propagate(apply_cythonized_L, rho0_data, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "210 ms ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "propagate(L_vectorized, rho0_vectorized, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can profile the how much time is spent in the various routines, comparing `mesolve` and different variations of the Newton propagator. See https://docs.python.org/3/library/profile.html#instant-user-s-manual for the meaning of the colums." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### mesolve" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we look the QuTiP's `mesolve`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " " ] } ], "source": [ "stats = %prun -q -r qutip.mesolve(L, rho0, tlist);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can look at which top-level routines we spent the most time in *cumulativly*, that is including sub-calls:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 8378 function calls (8174 primitive calls) in 0.023 seconds\n", "\n", " Ordered by: cumulative time\n", " List reduced from 105 to 10 due to restriction <10>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 1 0.000 0.000 0.023 0.023 {built-in method builtins.exec}\n", " 1 0.000 0.000 0.023 0.023 :1()\n", " 1 0.000 0.000 0.023 0.023 mesolve.py:80(mesolve)\n", " 1 0.000 0.000 0.023 0.023 mesolve.py:775(_mesolve_const)\n", " 1 0.001 0.001 0.023 0.023 mesolve.py:985(_generic_ode_solve)\n", " 99 0.000 0.000 0.009 0.000 _ode.py:396(integrate)\n", " 99 0.008 0.000 0.008 0.000 _ode.py:989(run)\n", " 102 0.001 0.000 0.008 0.000 qobj.py:211(__init__)\n", " 112 0.000 0.000 0.004 0.000 qobj.py:1927(type)\n", " 310 0.001 0.000 0.004 0.000 fromnumeric.py:64(_wrapreduction)\n", "\n", "\n" ] } ], "source": [ "stats.sort_stats('cumtime').print_stats(10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, the bottom-level routines where we *actually* spent time:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 8378 function calls (8174 primitive calls) in 0.023 seconds\n", "\n", " Ordered by: internal time\n", " List reduced from 105 to 10 due to restriction <10>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 99 0.008 0.000 0.008 0.000 _ode.py:989(run)\n", " 310 0.002 0.000 0.002 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", " 100 0.002 0.000 0.003 0.000 {qutip.cy.spconvert.dense2D_to_fastcsr_fmode}\n", " 1 0.001 0.001 0.023 0.023 mesolve.py:985(_generic_ode_solve)\n", " 611 0.001 0.000 0.001 0.000 {built-in method numpy.core.multiarray.array}\n", " 203 0.001 0.000 0.003 0.000 fastsparse.py:47(__init__)\n", " 310 0.001 0.000 0.004 0.000 fromnumeric.py:64(_wrapreduction)\n", " 100 0.001 0.000 0.001 0.000 superoperator.py:227(vec2mat)\n", " 102 0.001 0.000 0.008 0.000 qobj.py:211(__init__)\n", " 301 0.001 0.000 0.001 0.000 qobj.py:1934(shape)\n", "\n", "\n" ] } ], "source": [ "stats.sort_stats('tottime').print_stats(10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is dominated by the ODE solver and sparse matrix operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we're working interactively, we could use `snakeviz` to analyze further details:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "#%snakeviz qutip.mesolve(L, rho0, tlist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### qutip-propagation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we look at the Newton propagator operating on high-level qutip objects:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "#%snakeviz propagate(L, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " " ] } ], "source": [ "stats = %prun -q -r propagate(L, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8);" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3902775 function calls (3810705 primitive calls) in 4.882 seconds\n", "\n", " Ordered by: cumulative time\n", " List reduced from 182 to 10 due to restriction <10>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 1 0.000 0.000 4.882 4.882 {built-in method builtins.exec}\n", " 1 0.000 0.000 4.882 4.882 :1()\n", " 1 0.000 0.000 4.882 4.882 :10(propagate)\n", " 99 0.006 0.000 4.881 0.049 propagator.py:186(step)\n", " 99 0.052 0.001 4.874 0.049 propagator.py:314(_step)\n", " 99 0.137 0.001 3.991 0.040 propagator.py:26(_arnoldi)\n", " 35937 0.141 0.000 1.640 0.000 qobj.py:211(__init__)\n", " 5445 0.019 0.000 1.583 0.000 qobj.py:465(__sub__)\n", " 7623 0.084 0.000 1.568 0.000 qobj.py:360(__add__)\n", " 47817 0.042 0.000 1.288 0.000 qobj.py:1927(type)\n", "\n", "\n" ] } ], "source": [ "stats.sort_stats('cumtime').print_stats(10);" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3902775 function calls (3810705 primitive calls) in 4.882 seconds\n", "\n", " Ordered by: internal time\n", " List reduced from 182 to 10 due to restriction <10>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 110385 0.552 0.000 0.552 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", " 74547 0.357 0.000 0.737 0.000 fastsparse.py:47(__init__)\n", " 272547 0.289 0.000 0.289 0.000 {built-in method numpy.core.multiarray.array}\n", " 109395 0.233 0.000 0.860 0.000 fromnumeric.py:64(_wrapreduction)\n", " 7623 0.157 0.000 0.509 0.000 fastsparse.py:74(_binopt)\n", " 35937 0.141 0.000 1.640 0.000 qobj.py:211(__init__)\n", " 6435 0.139 0.000 0.217 0.000 {qutip.cy.spmath.zcsr_mult}\n", " 99 0.137 0.001 3.991 0.040 propagator.py:26(_arnoldi)\n", " 91476 0.122 0.000 1.106 0.000 dimensions.py:44(is_scalar)\n", " 548262 0.117 0.000 0.162 0.000 {built-in method builtins.isinstance}\n", "\n", "\n" ] } ], "source": [ "stats.sort_stats('tottime').print_stats(10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### cythonized qutip-propagation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, we can look at the more efficient propagation using a cythonized application of the QuTiP objects:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "#%snakeviz propagate(apply_cythonized_L, rho0_data, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " " ] } ], "source": [ "stats = %prun -q -r propagate(apply_cythonized_L, rho0_data, tlist, zero_vectorized, norm_vectorized, inner_vectorized, tol=1e-8);" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 95441 function calls in 0.289 seconds\n", "\n", " Ordered by: cumulative time\n", " List reduced from 72 to 10 due to restriction <10>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 1 0.000 0.000 0.289 0.289 {built-in method builtins.exec}\n", " 1 0.000 0.000 0.289 0.289 :1()\n", " 1 0.000 0.000 0.289 0.289 :10(propagate)\n", " 99 0.002 0.000 0.289 0.003 propagator.py:186(step)\n", " 99 0.041 0.000 0.284 0.003 propagator.py:314(_step)\n", " 99 0.067 0.001 0.201 0.002 propagator.py:26(_arnoldi)\n", " 990 0.033 0.000 0.058 0.000 linalg.py:953(eigvals)\n", " 1386 0.013 0.000 0.038 0.000 linalg.py:2203(norm)\n", " 1287 0.002 0.000 0.036 0.000 :4(norm_vectorized)\n", " 3762 0.028 0.000 0.029 0.000 {built-in method numpy.core.multiarray.dot}\n", "\n", "\n" ] } ], "source": [ "stats.sort_stats('cumtime').print_stats(10);" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 95441 function calls in 0.289 seconds\n", "\n", " Ordered by: internal time\n", " List reduced from 72 to 10 due to restriction <10>\n", "\n", " ncalls tottime percall cumtime percall filename:lineno(function)\n", " 99 0.067 0.001 0.201 0.002 propagator.py:26(_arnoldi)\n", " 99 0.041 0.000 0.284 0.003 propagator.py:314(_step)\n", " 990 0.033 0.000 0.058 0.000 linalg.py:953(eigvals)\n", " 3762 0.028 0.000 0.029 0.000 {built-in method numpy.core.multiarray.dot}\n", " 990 0.016 0.000 0.016 0.000 {built-in method qutip.cy.spmatfuncs.cy_ode_rhs}\n", " 1386 0.013 0.000 0.038 0.000 linalg.py:2203(norm)\n", " 8613 0.011 0.000 0.014 0.000 defmatrix.py:186(__getitem__)\n", " 5445 0.008 0.000 0.008 0.000 {built-in method numpy.core.multiarray.vdot}\n", " 99 0.008 0.000 0.008 0.000 propagator.py:142(_extend_leja)\n", " 1089 0.006 0.000 0.006 0.000 {method 'reduce' of 'numpy.ufunc' objects}\n", "\n", "\n" ] } ], "source": [ "stats.sort_stats('tottime').print_stats(10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now see that the (Python-level) arnoldi and step routines of Newton come out as the bottle neck, as the application of the Liouvillian has become efficient level (running at C speed)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the Newton implementation, in particular the `_extend_leja` function has been sped up through the use of numba. Without that, `_extend_leja` would dominate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Number of function evaluations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the Newton propagator is ultimately still limited by being implemented in Python, it is more fair to measure the runtime in terms of the number of average number of applications of the Liouvillian per time step. This is under the assumption that in an efficient implementation (and for large Hilbert spaces), this is the dominating factor.\n", "\n", "The number of applications depends on the chosen precision and on the length of the time step: longer time steps tend do be more efficient, as only then we're in a regime where the fast convergence of the Newton series kicks in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We construct a dummy `Qobj` that counts its own applications to a state:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "class CountingQobj(qutip.Qobj):\n", " def __init__(self, *args, **kwargs):\n", " self.counter = 0\n", " super().__init__(*args, **kwargs)\n", " def __mul__(self, other):\n", " if isinstance(other, qutip.Qobj):\n", " self.counter += 1\n", " return super().__mul__(other)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def count_applications_newton(nt, tol=1e-8):\n", " tlist = np.linspace(0, 10, nt)\n", " L_count = CountingQobj(L)\n", " propagate(L_count, rho0, tlist, zero_qutip, norm_qutip, inner_qutip, tol=1e-8)\n", " return L_count.counter / len(tlist)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "23.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "count_applications_newton(nt=10)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9.9" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "count_applications_newton(nt=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To compare this to the average number of applications in `mesolve`, we use a trimmed-down version of the `mesolve` routine:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "import scipy\n", "from qutip.solver import Options\n", "from qutip.superoperator import mat2vec\n", "from qutip.cy.spmatfuncs import cy_ode_rhs\n", "from qutip.mesolve import _generic_ode_solve\n", "from qutip.ui.progressbar import BaseProgressBar\n", "\n", "\n", "def mesolve(L, rho0, tlist):\n", " opt = Options()\n", "\n", " def func(t, rho, data, ind, ptr):\n", " func.counter += 1\n", " return cy_ode_rhs(t, rho, data, ind, ptr)\n", "\n", " func.counter = 0\n", "\n", " r = scipy.integrate.ode(func)\n", " r.set_f_params(L.data.data, L.data.indices, L.data.indptr)\n", " r.set_integrator('zvode', method=opt.method, order=opt.order,\n", " atol=opt.atol, rtol=opt.rtol, nsteps=opt.nsteps,\n", " first_step=opt.first_step, min_step=opt.min_step,\n", " max_step=opt.max_step)\n", " initial_vector = mat2vec(rho0.full()).ravel('F')\n", " r.set_initial_value(initial_vector, tlist[0])\n", " dt = tlist[1] - tlist[0]\n", " for step in range(len(tlist)-1):\n", " r.integrate(r.t + dt)\n", " return func.counter\n", "\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "def count_applications_mesolve(nt):\n", " tlist = np.linspace(0, 10, nt)\n", " return mesolve(L, rho0, tlist) / len(tlist)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40.7" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "count_applications_mesolve(10)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.07" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "count_applications_mesolve(100)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }