{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python and numpy bool Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This blog post is triggered by a colleague stopping me in the hall and asking \"What does `~` do in Python?\" She was surprised by the behavior of the `~` operator when applied to Python bool types and I was surprised that it behaved differently on numpy bools than on Python bools. All in all enough surprises to write a short blog post about the difference between the two variable types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `~` Operator\n", "\n", "Let's start with the original question, what does `~n` do in Python? Answer: [It inverts the bits of `n`](https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types), where `n` is an integer. So for example:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 85 in binary: +1010101\n", "~85 in binary: -1010110 is -86 in integer\n" ] } ], "source": [ "n = 85\n", "print(\" {0:d} in binary: {0:+b}\".format(n))\n", "print(\"~{0:d} in binary: {1:+b} is {1:d} in integer\".format(n, ~n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may find it surprising that `~85` does not return the bit pattern `0101010` but this is just due to the [two's complement](https://wiki.python.org/moin/BitwiseOperators) representation of integers in Python. 
\n", "\n", "## Python bool\n", "\n", "Understanding two's complement and knowing that Python `bool`s are a subclass of `int`, it is not surprising that" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " True in binary: 0b1\n", "~True in binary: -0b10 is -2 in integer\n" ] } ], "source": [ "print(\" True in binary: {:s}\".format(bin(True)))\n", "print(\"~True in binary: {:s} is {:d} in integer\".format(bin(~True), ~True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and so the truth value of `~True` is `True`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "print(bool(~True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "because any integer other than zero evaluates to `True`. This may come as a surprise if you are not aware that bools in Python are in fact integers, which use two's complement. It's even a little bit more confusing because" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-1 True\n" ] } ], "source": [ "print(~False, bool(~False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`~False` in fact evaluates to True. 
If you want to correctly negate Python boolean values, use logical `not` rather than bitwise not (`~`):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False True\n" ] } ], "source": [ "print(not True, not False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numpy bool\n", "\n", "What surprised *me* was that numpy bools show a different behavior:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ True True True True True True True True True True]\n", "[False False False False False False False False False False]\n" ] } ], "source": [ "import numpy as np\n", "a = np.ones(10, dtype=bool)\n", "print(a)\n", "print(~a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reason for this is that numpy bools are an entirely different type. They are not a subclass of Python bools and they are also not a subclass of any numeric type. This is all clearly stated in the [numpy reference manual](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types), even with the following warning:\n", "\n", "> The bool\\_ type is not a subclass of the int\\_ type (the bool\\_ is not even a number type). This is different than Python’s default implementation of bool as a sub-class of int.\n", "\n", "yet reading this without an example like the one above, I didn't fully understand the consequences.\n", "\n", "In numpy we can make things even a little more convoluted if we mix Python bools and numpy.bool\\_ in an object array, where `~` is applied to each element according to that element's own type."
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ True True False True]\n", "[ True True True False]\n" ] } ], "source": [ "b = np.array([True, True, False, np.True_], dtype=object)\n", "print(b.astype(bool))\n", "print((~b).astype(bool))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My advice above to use logical `not` also does not work for numpy arrays, because `not` is not applied element-wise but tries to evaluate the boolean value of the entire array." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "ename": "ValueError", "evalue": "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mnot\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all()" ] } ], "source": [ "print(not b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The correct thing to do for numpy arrays is to use the ufunc `logical_not`, which gives the expected result for both our arrays `a` and `b`" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Array a\n", "[ True True True True True True True True True True]\n", "[False False False False False False False False False False]\n", "\n", "Array b\n", "[True True False True]\n", "[False False True False]\n" ] } ], "source": [ "print(\"Array a\")\n", "print(a)\n", "print(np.logical_not(a))\n", "print(\"\\nArray b\")\n", "print(b)\n", "print(np.logical_not(b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "This blog post is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. 