{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# `numpy.arange()` with floats considered harmful\n", "\n", "A few examples used in this notebook are taken from http://quantumwise.com/forum/index.php?topic=110.0#.VIVgyIctjRZ.\n", "\n", "We often need series of regularly spaced numbers. [numpy.arange()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) is great for that, as long as the step size is an integer.\n", "\n", "If the step size is a floating-point number, however, the result might not be what you expect." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## But when I last tried it worked as expected ...\n", "\n", "[numpy.arange()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) is NumPy's extension of Python's built-in [range()](https://docs.python.org/3/library/functions.html#func-range) function.\n", "The latter only works with integers, which are also no problem for the former:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(7) # defaults: start=0, step=1" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, 7)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 3, 5])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, 7, 2)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, 7, 6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `start` value is always part of the returned range, the `stop` value is never part of it.\n", "\n", "Contrary to `range()`, `numpy.arange()` also works with floating-point numbers." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(0.4, 1.1, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also here, as expected, the `stop` value is not part of the range." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## So what's the problem?\n", "\n", "The problem is that some numbers (infinitely many, actually) cannot be represented exactly as fixed-size binary floating-point numbers. Doing calculations with floating-point numbers will often lead to rounding errors. And exactly those rounding errors may lead to unexpected results.\n", "\n", "For example here:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(0.5, 1.1, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow, now `stop` is suddenly included in the range! We didn't expect that!\n", "\n", "That's the reason why care has to be taken when using `np.arange()` with floating-point numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## OK, I see the problem now, but what's the solution?\n", "\n", "Well, there isn't a single solution for all use cases, it depends on what you need.\n", "\n", "If you know the first and the last value that should be in your range and the total number of values, you should use [numpy.linspace()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(0.5, 1.1, 7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In contrast to `np.arange()`, the resulting range includes the last value.\n", "\n", "If you want it to exclude the last value, you can use the `endpoint` argument (and reduce the number of values by 1)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(0.5, 1.1, 6, endpoint=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you, however, want to create a range with a given *spacing*, you'd have to manually calculate the *number of values* before being able to use `numpy.linspace()`.\n", "\n", "It would be great if there were a feature of either `numpy.linspace()` or `numpy.arange()` that would allow specifying a *spacing* but would also avoid the above-mentioned problem.\n", "This was discussed [years ago on the NumPy mailing list](https://mail.python.org/pipermail/numpy-discussion/2007-September/029129.html). Sadly, nobody could come up with a good solution.\n", "\n", "So for now, you're stuck with adding or subtracting small amounts to/from the `stop` value to get the desired behavior." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "step = 0.1\n", "np.arange(0.5, 1.1 - step*0.5, step)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This yields the expected result, but it only works if you know that the distance between `start` and `stop` is (approximately) an integer multiple of the spacing." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "assert np.isclose((1.1 - 0.5) % step, 0.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to include the `stop` value, you can do something like this:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(0.5, 1.1 + step*0.5, step)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, you have to make sure that the spacing fits to the distance:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "assert np.isclose((1.1 + 0.5) % step, 0.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Can't this be done automatically?\n", "\n", "Well, let's try ..." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def myrange(start, stop, step=1):\n", " # Don't use this function, see below!\n", " return np.arange(start, stop - step*0.5, step)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.1, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, the problematic case from above works. But let's change the `stop` value a little bit:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.11, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we would expect `1.1` to be part of the range, because it's clearly smaller than `1.11`. So let's modify our function:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def myrange(start, stop, step=1):\n", " if np.isclose((stop - start) % step, 0.0):\n", " stop -= step * 0.5\n", " return np.arange(start, stop, step)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.11, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, now `1.1` is part of the range, as expected." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.1, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The previous example still works. Good.\n", "\n", "This isn't a very elegant solution, though. The behavior depends on the tolerances used in [numpy.isclose()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isclose.html), namely `rtol` and `atol`. Let's add those to our function!" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def myrange(start, stop, step=1, dtype=None, **kwargs):\n", " if np.isclose((stop - start) % step, 0.0, **kwargs):\n", " stop -= step * 0.5\n", " return np.arange(start, stop, step, dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we also enabled the `dtype` argument from `numpy.arange()`." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.11, 0.1)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. ])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.11, 0.1, atol=0.02)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a quite constructed example, but it should show that setting the tolerance works\n", "\n", "One difference to `numpy.arange()` is that even when only passing integers, the result consists of floating-point numbers." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1., 2., 3., 4., 5., 6.])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(1, 7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could add this as branch to our function implementation, but I think it isn't that important. And if we really want integers, we can still force them with `dtype`:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(1, 7, dtype=int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, that doesn't look that bad, does it?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## But what if I want to include the endpoint?\n", "\n", "OK, you're right. Sometimes, we want the endpoint to be included, so let's try to add this as an option." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "def myrange(start, stop, step=1, endpoint=False, dtype=None, **kwargs):\n", " # Don't use this function, see below!\n", " if np.isclose((stop - start) % step, 0.0, **kwargs):\n", " if endpoint:\n", " stop += step * 0.5\n", " else:\n", " stop -= step * 0.5\n", " return np.arange(start, stop, step, dtype)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.1, 0.1, endpoint=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, this seems to work. Let's try another example ..." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1. , 1.2, 1.4, 1.6, 1.8])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(1, 2, 0.2, endpoint=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What? The endpoint clearly isn't part of the range!\n", "\n", "The problem is that we check if the result of the modulus operator (`%`) is close to `0.0`, but we should also check the case where it is close to `step`!\n", "\n", "We should try to add this check:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def myrange(start, stop, step=1, endpoint=False, dtype=None, **kwargs):\n", " # Don't use this function, see below!\n", " remainder = (stop - start) % step\n", " if np.any(np.isclose(remainder, (0.0, step), **kwargs)):\n", " if endpoint:\n", " stop += step * 0.5\n", " else:\n", " stop -= step * 0.5\n", " return np.arange(start, stop, step, dtype)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1. , 1.2, 1.4, 1.6, 1.8, 2. ])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(1, 2, 0.2, endpoint=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Good, that's settled. Let's try another one:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.11, 0.1, endpoint=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now this is a little strange, because `1.11` is not included. But on the other hand, it isn't even part of the series!\n", "\n", "As far as I can tell, the argument `endpoint` only really makes sense for integer multiples.\n", "For now, I don't know a better solution than to just disallow this case." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "def myrange(start, stop, step=1, endpoint=False, dtype=None, **kwargs):\n", " remainder = (stop - start) % step\n", " if np.any(np.isclose(remainder, (0.0, step), **kwargs)):\n", " if endpoint:\n", " stop += step * 0.5\n", " else:\n", " stop -= step * 0.5\n", " elif endpoint:\n", " raise ValueError(\"Invalid stop value for endpoint=True\")\n", " return np.arange(start, stop, step, dtype)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "myrange(0.5, 1.1, 0.1, endpoint=True)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "Invalid stop value for endpoint=True", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmyrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0.5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1.11\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mendpoint\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mmyrange\u001b[0;34m(start, stop, step, endpoint, dtype, **kwargs)\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mstop\u001b[0m \u001b[0;34m-=\u001b[0m \u001b[0mstep\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m0.5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mendpoint\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Invalid stop value for endpoint=True\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstop\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstep\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Invalid stop value for endpoint=True" ] } ], "source": [ "myrange(0.5, 1.11, 0.1, endpoint=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, I guess I'll leave it at that for now.\n", "This is also more or less what I implemented as helper function in the [Sound Field Synthesis Toolbox](http://sfs-python.readthedocs.io/#sfs.util.strict_arange).\n", "\n", "It's not a perfect solution, but probably it works for your use case.\n", "\n", "Have fun!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " \n", " \"CC0\"\n", " \n", "
\n", " To the extent possible under law,\n", " the person who associated CC0\n", " with this work has waived all copyright and related or neighboring\n", " rights to this work.\n", "

" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }