{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Performance implications of frozen distributions in SciPy\n", "\n", "This notebook is a quick demonstration of the surprising performance implications of two different interfaces for sampling from probability distributions in SciPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.stats\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One option for sampling from a distribution — in this case, a Poisson distribution with a λ of 5 — is to create a distribution object, like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "distribution = scipy.stats.poisson(5)\n", "distribution.random_state = 0x00c0ffee\n", "distribution.rvs(size=12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another option is to use the `poisson` object and specify the distribution parameters and the random state in each call, like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rs = np.random.RandomState(1234)\n", "\n", "scipy.stats.poisson.rvs(5, size=12, random_state=rs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`RandomState` is really a pseudorandom number generator, and we can see that each call modifies its state:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rs = np.random.RandomState(1234)\n", "\n", "print(scipy.stats.poisson.rvs(5, size=12, random_state=rs))\n", "print(scipy.stats.poisson.rvs(5, size=12, random_state=rs))\n", "print(scipy.stats.poisson.rvs(5, size=12, random_state=rs))\n", "print(scipy.stats.poisson.rvs(5, size=12, random_state=rs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These two techniques produce identical results. To see whether or not they behave identically, let's collect some timings." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mkpoisson(l,seed):\n", " p = scipy.stats.poisson(l)\n", " p.random_state = seed\n", " return p\n", "\n", "def experiment_one(agents, steps):\n", " def mkpoisson(l,seed):\n", " p = scipy.stats.poisson(l)\n", " p.random_state = seed\n", " return p\n", "\n", " seeds = np.random.randint(1<<32, size=agents)\n", " streams = [mkpoisson(12, seed) for seed in seeds]\n", " for p in streams:\n", " p.rvs(size=steps)\n", "\n", "def experiment_two(agents, steps):\n", " seeds = np.random.randint(1<<32, size=agents)\n", " states = [np.random.RandomState(seed) for seed in seeds]\n", " for rs in states:\n", " scipy.stats.poisson.rvs(12, size=steps, random_state=rs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "experiment_one(10000,1000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "experiment_two(10000,1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you're running these experiments in a similar environment to my computer, you'll see radically different timings — `experiment_two` is going to be substantially faster than `experiment_one`. (I've observed a 2x difference on Binder and a 4-5x difference locally.)\n", "\n", "To see why, let's profile both functions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from cProfile import run\n", "import pstats\n", "from pstats import SortKey\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "run(\"experiment_one(10000,1000)\", sort=SortKey.TIME)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "run(\"experiment_two(10000,1000)\", sort=SortKey.TIME)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }