{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "
\n", "\n", "[Photo](https://www.pexels.com/photo/landscape-photography-of-desert-33147/) by [Pixabay](https://pixabay.com/en/desert-africa-bedouin-footprints-1007157/) / [CC0](https://www.pexels.com/creative-commons-images/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Motivation\n", "\n", "This paragraph [in the Python docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", "\n", "> If a class does not define an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method it should not define a [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__) operation either; if it defines [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) but not [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).\n", "\n", "..._why_?\n", "\n", "This talk is a foray into the Python data model so that we understand the reasoning behind this advice. And, above everything else, just an excuse to talk about [hash tables](https://en.wikipedia.org/wiki/Hash_table)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Outline\n", "\n", "1. Object _equality_ versus object _identity_.\n", "2. Rich comparison operators for our classes.\n", "3. Python dictionaries and sets are _hash tables_.\n", "4. Fundamentals of hash tables, $\\mathcal{O}(1)$ for the win.\n", "5. The properties of a good hash function.\n", "6. Implementing our own, full-fledged hash table.\n", "7. Revising the initial paragraph — we're now ready.\n", "8. Understanding —and remembering— why.\n", "\n", "_On your marks, get set, go!_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "alternative:subslide" ] }, "source": [ "# Outline\n", "\n", "1. Python dictionaries and sets are _hash tables_.\n", "2. Fundamentals of hash tables, $\\mathcal{O}(1)$ for the win.\n", "3. The properties of a good hash function.\n", "4. Implementing our own, full-fledged hash table.\n", "5. Revising the initial paragraph — we're now ready.\n", "6. Understanding —and remembering— why.\n", "\n", "_On your marks, get set, go!_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Object _equality_ versus object _identity_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "# `==` tests for _equality_\n", "\n", "The `==` operator is the object **value comparison** operator.\n", "\n", "It tests whether two objects have the same _value_." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [2, 3]\n", "y = [2, 3]\n", "\n", "x == y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Both variables have the same value, so the result of the comparison is `True`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## `__eq__()`\n", "\n", "In the [Python data model](https://docs.python.org/3/reference/datamodel.html), equality is implemented via [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and its counterpart, [\\_\\_ne\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__ne__)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# `is` tests for _identity_\n", "\n", "The `is` operator, on the other hand, is the object **identity comparison** operator.\n", "\n", "It tests whether two objects are the _same_." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [2, 3]\n", "y = [2, 3]\n", "\n", "x is y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Although `x` and `y` have the same value, they are different objects, so the comparison with is returns `False`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## `id()`\n", "\n", "Identity is checked by comparing the built-in function [`id()`](https://docs.python.org/3/library/functions.html#id). It returns the \"identity\" of an object: an integer guaranteed to be unique and constant for each object during its lifetime." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116482324168" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = []\n", "id(x)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116482324488" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = []\n", "id(y)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x is y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "In [CPython](https://github.com/python/cpython) this number is in fact the address of the object in memory." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# However...\n", "\n", "Also known as \"Trivia Facts, Part I\".\n", "\n", "\n", "## Integers\n", "\n", "\n", "What we just said doesn't work as we might expect with integers." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "94149475332416" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 23\n", "id(x)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "94149475332416" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = 23\n", "id(x)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x is y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "What? But... they're different objects... _right_?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "And to make things even worse:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116482290160" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = 581\n", "id(a)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116482290224" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = 581\n", "id(b)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a is b" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "The difference is that the current implementation (of, again, CPython) keeps an array of integer objects for **all integers between −5 and 256**. When we create an integer in that range what we are actually getting is just a reference back to that existing object.\n", "\n", "In other words: in CPython, integers between −5 and 256 are [singletons](https://en.wikipedia.org/wiki/Singleton_pattern).\n", "\n", "This is an _implementation detail_. For example, until February 2006 this happened only for values [−5, 99]. It's undocumented for a reason. Never rely on this!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## Strings\n", "\n", "The same behavior may be exhibited by small strings literals." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116481858392" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = \"dog\"\n", "id(x)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116481858392" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = \"dog\"\n", "id(y)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x is y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "These strings are _interned_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# String interning\n", "\n", "From [Wikipedia](https://en.wikipedia.org/wiki/String_interning) \\[emphasis added\\]:\n", "\n", "> In computer science, string interning is a method of **storing only one copy of each distinct string value**, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = \"\".join([\"d\", \"o\", \"g\"])\n", "y = \"dog\"\n", "x is y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Why the difference? Note that we said earlier that this behavior is exhibited for _literals_. Here, `x` is an expression. Because Python does not know its value until runtime, `x` is not interned and therefore it's a different object than `y`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "We can force a string to be interned, however, using [`sys.intern()`](https://docs.python.org/3.2/library/sys.html#sys.intern)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "\n", "x = sys.intern(\"\".join([\"a\", \"b\", \"c\"]))\n", "y = \"abc\"\n", "\n", "x is y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# How is this useful?\n", "\n", "From the [Python docs](https://docs.python.org/3/library/sys.html#sys.intern):\n", "\n", "> Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# A Python class of our own\n", "\n", "Let's move beyond built-in types and define our own class, `Circle`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Circle(x=2, y=3, radius=5)\n" ] } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __repr__(self):\n", " return \"Circle(x={}, y={}, radius={})\".format(\n", " self.x, self.y, self.radius)\n", "\n", "\n", "c = Circle(2, 3, 5)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "To avoid hard-coding the name of the class, we can use instead:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __repr__(self):\n", " return \"{}(x={}, y={}, radius={})\".format(\n", " type(self).__name__, # note this\n", " self.x,\n", " self.y,\n", " self.radius)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# However...\n", "\n", "The comparison between the instances of our class is broken!" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 = Circle(2, 3, 5)\n", "c2 = Circle(2, 3, 5)\n", "c1 == c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "We didn't expect this — the _values_ are the same, so they should compare equal." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "True\n", "True\n" ] } ], "source": [ "print(c1.x == c2.x)\n", "print(c1.y == c2.y)\n", "print(c1.radius == c2.radius)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# We didn't define `__eq__()`\n", "\n", "From the [Python docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", "\n", "> User-defined classes have \\[a\\] [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) [method] by default; with \\[it\\], all objects compare unequal (except with themselves) \\[...\\]\n", "\n", "In other words, if we don't define [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) by default it uses `is`.\n", "\n", "And `c1` and `c2` are different objects." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 is c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Rich comparison methods\n", "\n", "\n", "For our classes to be fully comparable, they must implement the \"_rich_\" comparison methods.\n", "\n", "Each comparison operation calls one of these methods. The correspondences are:\n", "\n", "- [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__lt__) $\\rightarrow$ `x < y`\n", "- [`__le__()`](https://docs.python.org/3/reference/datamodel.html#object.__le__) $\\rightarrow$ `x <= y`\n", "- [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) $\\rightarrow$ `x == y`\n", "- [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) $\\rightarrow$ `x != y` \n", "- [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) $\\rightarrow$ `x > y`\n", "- [`__ge__()`](https://docs.python.org/3/reference/datamodel.html#object.__ge__) $\\rightarrow$ `x >= y`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "For example, so that we can use `==` with our class:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", " \n", " \n", "c1 = Circle(2, 3, 5)\n", "c2 = Circle(2, 3, 5)\n", "c1 == c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# `bool()`\n", "\n", "We _could_ return anything, as Python will call `bool()` to determine the truth value." ] }, { "cell_type": "code", "execution_count": 188, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "'Yes, they are equal!'" ] }, "execution_count": 188, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " if self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius:\n", " return \"Yes, they are equal!\"\n", " else:\n", " return \"Nope, they're different\"\n", " \n", " \n", "c1 = Circle(2, 3, 5)\n", "c2 = Circle(2, 3, 5)\n", "c1 == c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "As the condition of an `if` statement:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I already told you, they're equal!\n" ] } ], "source": [ "if c1 == c2:\n", " print(\"I already told you, they're equal!\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "But the truth value of _any_ non-empty string is `True`:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "c1 = Circle(1, 3, 9)\n", "c2 = Circle(2, 5, 7)\n", "\n", "if c1 == c2: # bool(\"Nope, they're different\")\n", " print(\"These circles are equal too!\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Dont' do this. Return `True` or `False`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# `NotImplemented`\n", "\n", "- If the comparison is not defined, we return the singleton [`NotImplemented`](https://docs.python.org/3/library/constants.html#NotImplemented).\n", "- It's a singleton, _not_ an exception!\n", "- This makes Python understand and complain that the two objects cannot be compared." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "ename": "TypeError", "evalue": "unorderable types: Circle() < Circle()", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mc1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mc2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mc1\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mc2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unorderable types: Circle() < Circle()" ] } ], "source": [ "c1 = Circle(2, 3, 5)\n", "c2 = Circle(2, 3, 5)\n", "c1 < c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "The `<` operator calls [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__lt__), but we haven't defined it. Python returns [`NotImplemented`](https://docs.python.org/3/library/constants.html#NotImplemented), which in turn raises [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError).\n", "\n", "Reminder: we *exclusively* have a [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) by default, comparing equal only to ourselves." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Returning `NotImplemented` ourselves\n", "\n", "If we want to make sure that the object is only compared to another object of the same type:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " if type(other) != type(self):\n", " return NotImplemented\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", " \n", " \n", "c1 = Circle(2, 3, 5)\n", "c1 == 5" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "To account and allow for subclasses:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " if not isinstance(other, type(self)): # instead of type() == type()\n", " return NotImplemented\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Reflections\n", "\n", "## `__lt__()` and `__gt__()`\n", "\n", "Let's say we want to sort our `Circle` object by increasing area." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __lt__(self, other):\n", " return self.radius < other.radius" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "It works as we would expect:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 = Circle(2, 5, 13)\n", "c2 = Circle(3, 7, 11)\n", "c1 < c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "However..." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 > c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "How?! We didn't define [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__), but Python is not complaining." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "From the [Python docs](https://docs.python.org/3/reference/datamodel.html#object.__gt__):\n", "\n", "> There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__lt__) and [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) are each other’s reflection [...]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "This is what happened here: `Circle` doesn't define [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__), so..." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "NotImplemented" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1.__gt__(c2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "... was translated into:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c2.__lt__(c1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Technically speaking, we fallback to the right operand’s reflected method." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "Let's verify it via the good ol' [printf debugging](https://en.wikipedia.org/wiki/Debugging#Techniques) technique." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "called Circle(x=3, y=7, radius=11) < Circle(x=2, y=5, radius=13)\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __repr__(self):\n", " return \"{}(x={}, y={}, radius={})\".format(\n", " type(self).__name__, self.x, self.y, self.radius)\n", " \n", " def __lt__(self, other):\n", " print(\"called {} < {}\".format(self, other)) # we added this\n", " return self.radius < other.radius\n", "\n", "\n", "c1 = Circle(2, 5, 13)\n", "c2 = Circle(3, 7, 11)\n", "c1 > c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "So, indeed, Python resorted to the reflection of the method we have." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## `__le__()` and `__ge__()`\n", "\n", "The same happens for [`__le__()`](https://docs.python.org/3/reference/datamodel.html#object.__le__) and [`__ge__()`](https://docs.python.org/3/reference/datamodel.html#object.__ge__)." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "called Circle(x=3, y=7, radius=13) >= Circle(x=2, y=5, radius=11)\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __repr__(self):\n", " return \"{}(x={}, y={}, radius={})\".format(\n", " type(self).__name__, self.x, self.y, self.radius)\n", " \n", " def __ge__(self, other):\n", " print(\"called {} >= {}\".format(self, other)) # more printf debugging\n", " return self.radius >= other.radius\n", "\n", "\n", "c1 = Circle(2, 5, 11)\n", "c2 = Circle(3, 7, 13)\n", "c1 <= c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## `__eq__()` and `__ne__()`\n", "\n", "Finally, [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) are *their own reflection*.\n", "\n", "This means that if `x.__eq__(y)` is not defined, we try to use `y.__eq__(x)`.\n", "\n", "Don't see why it's useful? You're not alone." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "Let's go back to an earlier example." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "class Circle:\n", " \"\"\"A regular circle.\"\"\"\n", "\n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __repr__(self):\n", " return \"{}(x={}, y={}, radius={})\".format(\n", " type(self).__name__, self.x, self.y, self.radius)\n", " \n", " def __eq__(self, other):\n", " print(\"calling {} == {}\".format(self, other))\n", " if not isinstance(other, type(self)): # we allow for subclasses\n", " return NotImplemented\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", "\n", "\n", "class PremiumCircle(Circle): \n", " \"A luxury Circle, with high quality values.\"\n", " pass # it's the exact same thing, of course" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "Focus your attention on `__eq__()`:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "def __eq__(self, other):\n", " if not isinstance(other, type(self)): # we allow for subclasses\n", " return NotImplemented\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "```python\n", "if not isinstance(other, type(self)): # we allow for subclasses\n", " return NotImplemented\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "This means that `__eq__()` would only work if the subclass is the _right_ side of `==` (and, therefore, binds to `other`), because..." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "issubclass(PremiumCircle, Circle)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "... but..." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "issubclass(Circle, PremiumCircle)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "... so `__eq__()` will return `NotImplemented`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "However..." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "calling PremiumCircle(x=4, y=5, radius=8) == Circle(x=2, y=3, radius=5)\n", "calling Circle(x=2, y=3, radius=5) == PremiumCircle(x=4, y=5, radius=8)\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c1 = PremiumCircle(4, 5, 8)\n", "c2 = Circle(2, 3, 5)\n", "c1 == c2 # subclass on the *left* side!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Works! This is, again, because of reflection. Note how we called `__eq__()` twice.\n", "\n", "`PremiumCircle.__eq__(Circle)` returns `NotImplemented`, because `PremiumCircle` is not a subclass of `Circle`. Python uses reflection to try the opposite: `Circle.__eq__(PremiumCircle)`: this comparison can take place because `PremiumCircle` *is* a subclass of Circle." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "The reflection of [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) with themseves allows us not having to worry about whether `self` is a subclass of `other` or the other way around.\n", "\n", "From the [docs of NotImplemented](https://docs.python.org/3/library/constants.html#NotImplemented) \\[emphasis added\\]:\n", "\n", "> When a binary (or in-place) method returns `NotImplemented` **the interpreter will try the reflected operation on the other type** (or some other fallback, depending on the operator). If all attempts return `NotImplemented`, the interpreter will raise an appropriate exception [...]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Implied relationships\n", "\n", "By default, [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) delegates to [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and inverts the result unless it is `NotImplemented`.\n", "\n", "This means that defining [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) gives us access to `!=` for free." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", "\n", " def __eq__(self, other):\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", " \n", "\n", "c1 = Circle(2, 5, 11)\n", "c2 = Circle(3, 7, 13)\n", "c1 != c2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## And nothing else\n", "\n", "There are **no other** implied relationships among the comparison operators.\n", "\n", "Python cannot do smart things like figuring out that if `x >= y` and `x != y`, then if follows that `x > y`.\n", "\n", "Humans understand that if (a) `x` is greater than or equal to `y` and (b) they're not equal, that means that `x` must be greater than `y`. Python cannot." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "ename": "TypeError", "evalue": "unorderable types: Circle() > Circle()", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0mc1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m11\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0mc2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m13\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0mc1\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mc2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unorderable types: Circle() > Circle()" ] } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", " \n", " def __ge__(self, other):\n", " return self.radius >= other.radius\n", " \n", " \n", "c1 = Circle(2, 5, 11)\n", "c2 = Circle(3, 7, 13)\n", "c1 > c2 " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# So, all of them?\n", "\n", "So does this mean the in practice we need to define _all_ the comparison methods?" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x, self.y, self.radius = x, y, radius\n", "\n", " def __eq__(self, other):\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", " \n", " def __ne__(self, other):\n", " return not self == other\n", " \n", " def __ge__(self, other):\n", " return self.radius >= other.radius\n", " \n", " def __le__(self, other):\n", " return self.radius <= other.radius\n", " \n", " def __gt__(self, other):\n", " return self.radius > other.radius\n", " \n", " def __lt__(self, other):\n", " return self.radius < other.radius" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## Not _all_, because:\n", "\n", "- `!=` returns the opposite of `==`, so we don't have to implement [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__).\n", "- For [`__le__()`](https://docs.python.org/3/reference/datamodel.html#object.__le__) we can use the reflection of [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) (or vice versa).\n", "- For [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) we can use the reflection of [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) (or vice versa)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "So we can leave it in:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", "\n", " def __ge__(self, other):\n", " return self.radius >= other.radius\n", " \n", " def __gt__(self, other):\n", " return self.radius > other.radius" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# `functools.total_ordering()`\n", "\n", "We can forget all these rules and let [`functools.total_ordering()`](https://docs.python.org/3/library/functools.html#functools.total_ordering) do all the work.\n", "\n", "It just needs [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and one more (any) comparison operator to derive the rest for us _automatically_." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "import functools\n", "\n", "@functools.total_ordering\n", "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __eq__(self, other):\n", " return self.x == other.x and \\\n", " self.y == other.y and \\\n", " self.radius == other.radius\n", "\n", " def __ge__(self, other):\n", " return self.radius >= other.radius" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "The name [total ordering](https://en.wikipedia.org/wiki/Total_order) comes from mathematics: \"binary relation on some set X, which is antisymmetric, transitive, and a connex relation\". In human-being-speak we can say that a total order is that of a set of elements in which all the possible pairs are comparable.\n", "\n", "The downside, from [the Python docs](https://docs.python.org/3/library/functools.html#functools.total_ordering) \\[emphasis added\\]:\n", "\n", "> [...] it does come at the **cost of slower execution and more complex stack traces** for the derived comparison methods. If performance benchmarking indicates this is a bottleneck for a given application, implementing all six rich comparison methods instead is likely to provide an easy speed boost." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Prehistory's `__cmp__()`\n", "\n", "Before Python 3, if the rich comparison methods were not implemented, Python falled back to a different magic method: [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__).\n", "\n", "`object.__cmp__(self, other)` would return:\n", "\n", "- A negative integer if `self < other`.\n", "- Zero if `self == other`.\n", "- a positive integer if `self > other`.\n", "\n", "Thus, our early ancestors could choose between implementing [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__) or the rich comparison methods — named like this to differentiate them from the more primitive alternative.\n", "\n", "In Python 3 [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__) no longer exists." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Wrapping up this section\n", "\n", "We now know how to create well behaved totally ordered classes.\n", "\n", "- `==` is for equality...\n", "- ... and `is` for identity.\n", "- The default value of [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__).\n", "- Rich comparison operators:\n", " * Return `NotImplemented` if comparison is undefined.\n", " * Fallback to the right operand's reflected method.\n", " * Have only one implied relationsip: [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) $\\leftrightarrow$ [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__)\n", "- [`functools.total_ordering()`](https://docs.python.org/3/library/functools.html#functools.total_ordering) simplifies our life.\n", "- [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__) exists only in history books." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "
\n", "\n", "
\n", "\n", "[Photo](https://www.pexels.com/photo/adventure-arid-barren-dawn-206724/) by [Pixabay](https://pixabay.com/en/desert-morocco-sand-dune-dry-1748462/) | [CC0](https://www.pexels.com/creative-commons-images/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# All right, I guess — but do we _really_ care?\n", "\n", "We do care if we want to use our classes in a Python dictionary or a set." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Mandatory disclaimer\n", "\n", "Everything to be known about dictionaries was said by [Brandon Rhodes](https://rhodesmill.org/brandon/) back in 2010.\n", "\n", "
\n", "\n", " The Mighty Dictionary (Brandon Rhodes, PyCon 2010).\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# One-minute overview\n", "\n", "If we work with a a Python list, we access elements by their index number." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "numbers = [2, 3, 5, 7, 11]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers[0]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers[2]" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers[-1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A [bijective function](https://en.wikipedia.org/wiki/Bijection) maps each index element to the corresponding value." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "numbers = [2, 3, 5, 7, 11]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "For example, index two (i.e. third element in the array) maps to value 5.\n", "\n", "However, we cannot easily go the other way around. Given, say, the value 7:\n", " - Is it in the array?\n", " - What's its index?\n", "\n", "The answer to these questions is $\\mathcal{O}(n)$." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 in numbers # O(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same for:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers.index(7) # O(n)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Python is doing the equivalent of:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def index(iterable, wanted):\n", " \"\"\"Return index of 'wanted' in 'iterable'; -1 if not found.\"\"\"\n", " \n", " for index, element in enumerate(iterable):\n", " if element == wanted:\n", " return index\n", " return -1\n", "\n", "\n", "index(numbers, 7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make it explicit: we have to visit _all_ the elements in the iterable, so $\\mathcal{O}(n)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Dictionaries\n", "\n", "- Python [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) are a [mapping type](https://docs.python.org/3/library/stdtypes.html#typesmapping).\n", "- Instead of indexing elements by a range of numbers, we use *keys*.\n", "- Keys can be any **immutable** (more of that later) type, e.g. numbers or strings.\n", "- They map each key to the corresponding value." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "numbers = {} # now a dict\n", "numbers[1] = \"one\"\n", "numbers[3] = \"three\"\n", "numbers[7] = \"seven\"" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'three'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numbers[3]" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "5 in numbers" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "del numbers[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These operations, unlike for the list, are $\\mathcal{O}(1)$. How? Because dictionaries are Python's name for..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Hash tables\n", "\n", "- The most important data structure known to Humankind, [as Steve Yegge put it](https://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html).\n", "- Formally known as [unordered associative array](https://en.wikipedia.org/wiki/Hash_table), invented in 1953.\n", "- Insertion, deletion and lookups have an amortized cost of $\\mathcal{O}(1)$ (!)\n", "- Speed, speed, speed. Instant lookup of a value, regardless of how many we have!\n", "\n", "How do hash tables achieve this?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# A dictionary is really a list\n", "\n", "This is the first plot twist.\n", "\n", "A hash table is nothing else than a list (or _array_, in non-Python-specific terminology).\n", "\n", "Let's create our hash table:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "table = [None, None, None, None, None]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's visualize it vertically:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "table = [\n", " None, # index = 0\n", " None, # index = 1\n", " None, # index = 2\n", " None, # index = 3\n", " None, # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The difference is that the **index** of each element depends on its **value**.\n", "- Given a value, we can compute its index, and directly go there in $\\mathcal{O}(1)$.\n", "- In this manner, we know exactly where to look up the element — no need to loop over all of them.\n", "\n", "How do we compute the indexes?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Hash functions\n", "\n", "From [Wikipedia](https://en.wikipedia.org/wiki/Hash_function):\n", "\n", "> A hash function is any function that can be used to map data of arbitrary size to data of a fixed size.\n", "\n", "- It's a mathematical function that maps data to some smaller value.\n", "- We move from the [domain](https://en.wikipedia.org/wiki/Domain_of_a_function) of the hash function (the set of possible input values)...\n", "- ... to a _smaller_ domain: the number of outputs of the hash function.\n", "- The hash function _compresses_ our data to a fixed size." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# A simple hash function" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "def simple_hash(str_):\n", " return len(str_)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple_hash(\"dog\")" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple_hash(\"tiger\")" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "17" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple_hash(\"Tyrannosaurus Rex\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We take data (strings, in this case) and return a numeric value." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Using our hash function to compute indexes" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "table = [\n", " None, # index = 0\n", " None, # index = 1\n", " None, # index = 2\n", " None, # index = 3\n", " None, # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get the hash of `\"dog\"`:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hash of dog is 3\n" ] } ], "source": [ "word = 'dog'\n", "word_hash = simple_hash(word)\n", "print(\"Hash of\", word, \"is\", word_hash)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Use the hash as the index to store the value in our table:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[None, None, None, 'dog', None]" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index = word_hash\n", "table[index] = word # O(1)\n", "table" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's now look up a value. Is `\"bear\"` in the hash table?" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word = 'bear'\n", "index = simple_hash(word)\n", "table[index] != None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# First problem: out of range\n", "\n", "Our function can return _any_ value, exceeding the size of the hash table." ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "list assignment index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mword\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'monkey'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mindex\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msimple_hash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mword\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mtable\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mword\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: list assignment index out of range" ] } ], "source": [ "word = 'monkey'\n", "index = simple_hash(word)\n", "table[index] = word" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Solution: modulo operator\n", "\n", "- That is, we take the remainder after dividing by the length of the list.\n", "- [Modular arithmetic](https://en.wikipedia.org/wiki/Modular_arithmetic), the same way we all know that 70 minutes is one hour _plus_ 10 minutes.\n", "\n", "So the operation becomes:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6 % 5 == 1\n" ] } ], "source": [ "word = 'monkey'\n", "index = simple_hash(word) % len(table)\n", "print('{} % {} == {}'.format(simple_hash(word), len(table), index))" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[None, 'monkey', None, 'dog', None]" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table[index] = word\n", "table" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Is `\"monkey\"` in the hash table?" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word = 'monkey'\n", "index = simple_hash(word) % len(table)\n", "table[index] != None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Second problem: collisions!\n", "\n", "Because we transform the input values into a _smaller_ domain, sooner or later we'll come across two _different_ keys that map to the _same_ hash value.\n", "\n", "This is known as a **collision**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Right now, our hash table looks like this:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "table = [\n", " None, # index = 0\n", " 'monkey', # index = 1\n", " None, # index = 2\n", " 'dog', # index = 3\n", " None, # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "A [wild new](https://knowyourmeme.com/memes/a-wild-x-appears-wild-x-appeared) word appears:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[None, 'monkey', None, 'cat', None]" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word = 'cat'\n", "index = simple_hash(word) % len(table)\n", "table[index] = word\n", "table" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[None, 'monkey', None, 'cat', None]" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `\"cat\"` and `\"dog\"` have the same length.\n", "- Since our hash function is the length, we ended up in the same position.\n", "- We override our value! `\"dog\"` is gone.\n", "\n", "This is a collision. They _will_ happen, so our hash table needs to take them into account.\n", "\n", "More on this later." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# More Trivia: PHP\n", "\n", "[PHP functions originally bucketed by strlen, were renamed to balance length](https://news.ycombinator.com/item?id=6919216):\n", "\n", "> Well, there were other factors in play there. htmlspecialchars was a\n", " very early function. Back when PHP had less than 100 functions and the\n", " function hashing mechanism was strlen(). **In order to get a nice hash\n", " distribution of function names across the various function name lengths\n", " names were picked specifically to make them fit into a specific length\n", " bucket**. This was circa late 1994 when PHP was a tool just for my own\n", " personal use and I wasn't too worried about not being able to remember\n", " the few function names. \\[emphasis added\\]\n", " \n", "Most programming languages use a hash table to store [symbols][1]: e.g. function name to function object or pointer.\n", "\n", "PHP was hashing by _length_, which lead to lots of collisions.\n", "\n", " [1]: https://en.wikipedia.org/wiki/Symbol_(programming)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Solution? Rename functions to smooth the distribution of lengths.\n", "\n", "From [PHP has inconsistent function naming](https://tnx.nl/php.html#names):\n", "\n", "```text\n", "underscore no underscore:\n", "stream_get_line readline\n", "disk_free_space diskfreespace\n", "is_object isset\n", "mcal_day_of_week jddayofweek\n", "```\n", "\n", "We live in the darkest timeline." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Behave, hash function!\n", "\n", "Formally speaking, a good hash function satisfies two main properties:\n", "\n", "## [Determinism](https://en.wikipedia.org/wiki/Hash_function#Determinism)\n", "\n", "The same input value should produce the same output value. In other words: the hash is a [mathematical function][1] of the data being hashed. We cannot depend on external variable parameters such as a pseudo-random number generator.\n", "\n", "Our function satisfied this, as the length of a string is deterministic.\n", "\n", " [1]: https://en.wikipedia.org/wiki/Function_(mathematics)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## [Uniformity](https://en.wikipedia.org/wiki/Hash_function#Uniformity)\n", "\n", "The hashes should be distributed as evently as possible over the output range. In other words: every hash should have the same probability of being generated.\n", "\n", "This is where our hash function (as well as PHP's) failed. Words in English...\n", "\n", "> [...] are reasonably well fitted by a shifted Poisson distribution with mean and variance equal to 6.94 and 5.80. [[Source]](https://www.sciencedirect.com/science/article/pii/0378375886901692)\n", "\n", "This would lead to _a lot_ of collisions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# A real-life example: MD5\n", "\n", "- A hash function producing a 128-bit hash value.\n", "- Designed by [Ronald Rivest](https://en.wikipedia.org/wiki/Ron_Rivest) in 1992." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "```text\n", "$ echo \"Let there be light\" | md5sum\n", "b3d440d0691efe7d7d485602f134f469 -\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "- 128 bits $\\rightarrow$ probability of two random hashes accidentally colliding is $1\\ /\\ {2}^{128}$\n", "- Initially designed to be used as a cryptographic hash function.\n", "- In 2004 it was shown that MD5 is not collision-resistant [[Wikipedia](https://en.wikipedia.org/wiki/MD5#History_and_cryptanalysis)]\n", "- For example, given a file we can find generate a different one with the same hash." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "MD5 is still useful to verify data integrity against _unintentional_ corruption.\n", "\n", "For example, [Debian](https://www.debian.org/) gives us [these MD5 checksums](https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/MD5SUMS):\n", "\n", "```text\n", "f8446a84356a6bcbf79201cc9f46063f debian-9.5.0-amd64-netinst.iso\n", "47a5dca818220d8558d37dfa11b85550 debian-9.5.0-amd64-xfce-CD-1.iso\n", "2daa085925a556d35a0eeb27768cc892 debian-mac-9.5.0-amd64-netinst.iso\n", "```\n", "\n", "Then when I download the ISO:\n", "\n", "```text\n", "wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-9.5.0-amd64-netinst.iso\n", "md5sum debian-9.5.0-amd64-netinst.iso\n", "f8446a84356a6bcbf79201cc9f46063f\n", "```\n", "\n", "Same hash, so _moooost_ likely no corruption." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python's `hash()`\n", "\n", "Meet the built-in [`hash()`](https://docs.python.org/3/library/functions.html#hash) function:\n", "\n", "> Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Integers" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(0)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(5)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "113" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(113)" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "691752902764107783" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(7.3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Trivia redux" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "-5" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(-5)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "-2" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(-2)" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "-2" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(-1) # also -2!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Why?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "> The hash value -1 is reserved (it’s used to flag errors in the C implementation). If the hash\n", " algorithm generates this value, we simply use -2 instead. [[Source](http://effbot.org/zone/python-hash.htm)]\n", " \n", "\n", "Proof in the [CPython source code](https://github.com/python/cpython/blob/2847ccae4687cb43334d87d86fb6c11cb14218f5/Objects/longobject.c#L2986-L2988):\n", "\n", "```c\n", "if (x == (Py_uhash_t)-1)\n", " x = (Py_uhash_t)-2;\n", "return (Py_hash_t)x;\n", "``` " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Strings\n", "\n", "For strings the algorithm is much more complex." ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4448411893666489140" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(\"dog\")" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1695142227333385162" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(\"cat\")" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-4077740646546332916" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(\"giraffe\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Looking at the [CPython source code](https://github.com/python/cpython/blob/ad65f15581173542f1d2a9968a63bee272510ce3/Python/pyhash.c#L161-L183):\n", "\n", "```c\n", "if (len < Py_HASH_CUTOFF) {\n", " /* Optimize hashing of very small strings with inline DJBX33A. */\n", " Py_uhash_t hash;\n", " const unsigned char *p = src;\n", " hash = 5381; /* DJBX33A starts with 5381 */\n", "\n", " switch(len) {\n", " /* ((hash << 5) + hash) + *p == hash * 33 + *p */\n", " case 7: hash = ((hash << 5) + hash) + *p++; /* fallthrough */\n", " case 6: hash = ((hash << 5) + hash) + *p++; /* fallthrough */\n", " case 5: hash = ((hash << 5) + hash) + *p++; /* fallthrough */\n", " case 4: hash = ((hash << 5) + hash) + *p++; /* fallthrough */\n", " case 3: hash = ((hash << 5) + hash) + *p++; /* fallthrough */\n", " case 2: hash = ((hash << 5) + hash) + *p++; /* fallthrough */\n", " case 1: hash = ((hash << 5) + hash) + *p++; break;\n", " default:\n", " Py_UNREACHABLE();\n", " }\n", " hash ^= len;\n", " hash ^= (Py_uhash_t) _Py_HashSecret.djbx33a.suffix;\n", " x = (Py_hash_t)hash;\n", " }\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Tuples" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3713081631934410656" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash((1, 2))" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4660141651573361605" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash((\"dog\", \"cat\"))" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1646678630949266879" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash((\"giraffe\", 113, 5.1))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Taking a look at the [CPython source code](https://github.com/python/cpython/blob/2847ccae4687cb43334d87d86fb6c11cb14218f5/Objects/tupleobject.c#L355-L367): " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```c\n", "x = 0x345678UL;\n", "p = v->ob_item;\n", "while (--len >= 0) {\n", " y = PyObject_Hash(*p++);\n", " if (y == -1)\n", " return -1;\n", " x = (x ^ y) * mult;\n", " /* the cast might truncate len; that doesn't change hash stability */\n", " mult += (Py_hash_t)(82520UL + len + len);\n", "}\n", "x += 97531UL;\n", "if (x == (Py_uhash_t)-1)\n", " x = -2;\n", "``` " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `hash()` on our own classes\n", "\n", "[`hash()`](https://docs.python.org/3/library/functions.html#hash) delegates the method to the [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) magic method." ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 3\n", "hash(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is equivalent to..." ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.__hash__()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's _not_ implement it in our class..." ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "-9223363279575207557" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", "\n", "c = Circle(5, 7, 9)\n", "hash(c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "It returns _something_, nevertheless!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "From the [docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", "\n", "> User-defined classes have [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) methods by default; with them, all objects compare unequal (except with themselves) and `x.__hash__()` returns an appropriate value such that `x == y` implies both that `x is y` and `hash(x) == hash(y)`.\n", "\n", "So the hash will only be the same if... the object is the same.\n", "\n", "The (default) hash function is not based on the _value_ of the object, as we would expect, but on its _identity_." ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", "\n", "c1 = Circle(5, 7, 9)\n", "c2 = Circle(5, 7, 9)" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hash of c1: -9223363279575207564\n", "Hash of c2: 8757280120398\n" ] } ], "source": [ "print(\"Hash of c1:\", hash(c1))\n", "print(\"Hash of c2:\", hash(c2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Different objects, different hashes!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Implementing `__hash__()`\n", "\n", "To do things right, we need to implement our own [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) method.\n", "\n", "The [docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__) recommend \\[emphasis added\\]:\n", "\n", "> [...] it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by **packing them into a tuple and hashing the tuple**." ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "3789705017597642099" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Circle:\n", " \n", " def __init__(self, x, y, radius):\n", " self.x = x\n", " self.y = y\n", " self.radius = radius\n", " \n", " def __hash__(self):\n", " return hash((self.x, self.y, self.radius))\n", "\n", "\n", "c = Circle(2, 3, 11)\n", "hash(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this manner, we delegate [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) to the hash function to the `tuple` object, which in turn will use the hashes of the attributes (`x`, `y` and `radius`) of our class.\n", "\n", "This hashes will be, as we need, both deterministic and uniform over the output range." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Wrapping up this section\n", "\n", "- Dictionaries are hash tables.\n", "- Hash tables are _the_ data structure: $\\mathcal{O}(1)$ operations.\n", "- Using hash functions to compute hash table indexes. Two problems:\n", " * Values out of range $\\rightarrow$ modulo operator.\n", " * Collisions $\\rightarrow$ will happen, sooner or later.\n", "- PHP's inconsistent function naming — now we know why.\n", "- The two properties of a good hash function:\n", " * Determinism.\n", " * Uniformity.\n", "- A real-life example: MD5.\n", "- Python's built-in `hash()`.\n", "- Implementing `__hash__()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "alternative:subslide" ] }, "source": [ "# Wrapping up this section\n", "\n", "- Dictionaries are hash tables.\n", "- Hash tables are _the_ data structure: $\\mathcal{O}(1)$ operations.\n", "- Using hash functions to compute hash table indexes. Two problems:\n", " * Values out of range $\\rightarrow$ modulo operator.\n", " * Collisions $\\rightarrow$ will happen, sooner or later.\n", "- PHP's inconsistent function naming — now we know why.\n", "- The two properties of a good hash function:\n", " * Determinism.\n", " * Uniformity.\n", "- Python's built-in `hash()`.\n", "- Implementing `__hash__()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", "\n", "
\n", "\n", "[Photo](https://www.pexels.com/photo/sand-desert-sand-dunes-dune-50628/) by [Pixabay](https://pixabay.com/en/morocco-africa-desert-marroc-sand-123976/) | [CC0](https://www.pexels.com/creative-commons-images/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Implementing our own hash table\n", "\n", "Let's use all we learned to implement our own table hash — this time handling collisions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We need to split this class over multiple [Jupyter](http://jupyter.org/) cells, so we'll use this decorator:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "def add_to(cls):\n", " \"\"\"A class decorator to dynamically add methods.\"\"\"\n", "\n", " def wrapper(func): \n", " setattr(cls, func.__name__, func)\n", " return cls\n", " return wrapper" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class HashTable:\n", " \"\"\"Initial prototype of a hash table.\"\"\"\n", " \n", " def __init__(self, size=8):\n", " self._slots = [None] * size\n", " \n", " def __len__(self):\n", " count = 0\n", " for key in self._slots:\n", " if key is not None:\n", " count += 1\n", " return count" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "@add_to(HashTable)\n", "def __setitem__(self, key, value):\n", " index = hash(key) % len(self._slots)\n", " self._slots[index] = value\n", "\n", "\n", "@add_to(HashTable)\n", "def __getitem__(self, key):\n", " index = hash(key) % len(self._slots)\n", " if self._slots[index] is None:\n", " raise KeyError(key)\n", " return self._slots[index]\n", "\n", "\n", "@add_to(HashTable)\n", "def __delitem__(self, key):\n", " index = hash(key) % len(self._slots)\n", " self._slots[index] = None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's try it..." ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "'Katze'" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table = HashTable()\n", "table[\"cat\"] = \"Katze\"\n", "table[\"cat\"]" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[None, None, None, None, None, None, 'Katze', None]" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table._slots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can double check this is the right index:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(\"cat\") % 8" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Deletion also works:" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "KeyError", "evalue": "'cat'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdel\u001b[0m \u001b[0mtable\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"cat\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mtable\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"cat\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mindex\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_slots\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_slots\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 12\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_slots\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: 'cat'" ] } ], "source": [ "del table[\"cat\"]\n", "table[\"cat\"]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# A couple of improvements:\n", "\n", "1. Keep a *counter* of values, instead of calculating them every single time.\n", "2. Factor our *index calculation* to a private method of the class." ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class HashTable:\n", " \"\"\"Initial prototype of a hash table, second attempt.\"\"\"\n", " \n", " def __init__(self, size=8):\n", " self._slots = [None] * size\n", " self._count = 0\n", " \n", " def __len__(self):\n", " return self._count\n", " \n", " def _get_index(self, key):\n", " return hash(key) % len(self._slots)" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "@add_to(HashTable)\n", "def __setitem__(self, key, value):\n", " index = self._get_index(key)\n", " self._slots[index] = value\n", " self._count += 1\n", "\n", " \n", "@add_to(HashTable)\n", "def __getitem__(self, key):\n", " index = self._get_index(key)\n", " if self._slots[index] is None:\n", " raise KeyError(key)\n", " return self._slots[index]\n", "\n", "\n", "@add_to(HashTable)\n", "def __delitem__(self, key):\n", " index = self._get_index(key)\n", " if self._slots[index] is None:\n", " raise KeyError(key)\n", " self._slots[index] = None\n", " self._count -= 1" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[None, None, 'Bär', None, 'Hund', None, None, None]" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table = HashTable()\n", "table[\"dog\"] = \"Hund\"\n", "table[\"bear\"] = \"Bär\"\n", "table._slots" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(table)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is all good, but again – **we have to keep collisions in mind**.\n", "\n", "Sooner than later two elements are going to end up in the _same_ slot." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# A simple approach: separate chaining\n", "\n", "- Each slot is independent, and keeps a list entries with the _same_ index. \n", "- All the keys that share index are kept one after another in the list...\n", "- ... that is, chained, hence the name.\n", "- The algorithm becomes:\n", " * Get the index $i$ of our element.\n", " * Add the element to the $i$-th list in our hash table." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Thus, our empty hash table looks like this:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "table = [\n", " [], # index = 0\n", " [], # index = 1\n", " [], # index = 2\n", " [], # index = 3\n", " [], # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Say we want to map `\"dog\"` to its German translation, `\"Hund\"`.\n", "\n", "It lands in, for example, `index == 3`." ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "table = [\n", " [], # index = 0\n", " [], # index = 1\n", " [], # index = 2\n", " ['Hund'], # index = 3\n", " [], # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Then we map `\"bear\"` to its German translation, `\"Bär\"`.\n", "\n", "It lands in, say, `index == 0`." ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "table = [\n", " [\"Bär\"], # index = 0\n", " [], # index = 1\n", " [], # index = 2\n", " ['Hund'], # index = 3\n", " [], # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now we store `\"cat\"`, and assume that it _also_ lands in `index == 3`." ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "table = [\n", " [\"Bär\"], # index = 0\n", " [], # index = 1\n", " [], # index = 2\n", " ['Hund', 'Katze'], # index = 3\n", " [], # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Do you see the problem we face now?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Handling collisions\n", "\n", "Somebody asks us \"How do you say _cat_ in German\".\n", "\n", "We use the hash of `\"cat\"` to compute its index, and get that `index == 3`." ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "table = [\n", " [\"Bär\"], # index = 0\n", " [], # index = 1\n", " [], # index = 2\n", " ['Hund', 'Katze'], # index = 3\n", " [], # index = 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Is it `\"Hund\"` or `\"Katze\"`? How can we tell them apart? We can't!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# We also need the _key_\n", "\n", "In order to deal with collisions we have to store the *key* itself, in addition to the mapped value.\n", "\n", "
\n", "\n", "\n", "Image by Jorge Stolfi | CC BY-SA 3.0\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Now with keys!" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "import collections\n", "\n", "Entry = collections.namedtuple('Entry', 'key, value')\n", "\n", "\n", "class HashTable:\n", " \"\"\"A separate-chaining hash table.\"\"\"\n", " \n", " def __init__(self, size=8):\n", " self._slots = [list() for _ in range(size)]\n", " self._count = 0\n", " \n", " def __len__(self):\n", " return self._count\n", " \n", " def _get_index(self, key):\n", " return hash(key) % len(self._slots)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "@add_to(HashTable) \n", "def __setitem__(self, key, value):\n", " slot_index = self._get_index(key)\n", " slot = self._slots[slot_index]\n", " for entry_index, entry in enumerate(slot):\n", " if entry.key == key: # note the equality comparison here...\n", " slot[entry_index] = entry._replace(value=value)\n", " break\n", " else:\n", " entry = Entry(key, value)\n", " self._slots[slot_index].append(entry)\n", " self._count += 1" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "@add_to(HashTable)\n", "def __getitem__(self, key):\n", " index = self._get_index(key)\n", " for entry in self._slots[index]:\n", " if entry.key == key: # and here...\n", " return entry.value\n", " raise KeyError(key)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "@add_to(HashTable)\n", "def __delitem__(self, key):\n", " slot_index = self._get_index(key)\n", " slot = self._slots[slot_index]\n", " for entry_index, entry in enumerate(slot):\n", " if entry.key == key: # ... and here\n", " del slot[entry_index]\n", " break\n", " else:\n", " raise KeyError(key)\n", " self._count -= 1" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "table = HashTable()" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "table[\"dog\"] = \"perro\"\n", "table[\"bear\"] = \"Bär\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`\"perro\"` is Spanish, not German! Let's replace the value." ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "'Hund'" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table[\"dog\"] = \"Hund\"\n", "table[\"dog\"]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's take a look at the internals:" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "[[],\n", " [],\n", " [Entry(key='bear', value='Bär')],\n", " [],\n", " [Entry(key='dog', value='Hund')],\n", " [],\n", " [],\n", " []]" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table._slots" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Real life is _so_ much more complex\n", "\n", "- We used this collision resolution strategy here because it's simple.\n", "- Python uses a different one: [open addressing](https://en.wikipedia.org/wiki/Hash_table#Open_addressing) with pseudo-random probing.\n", "- See the superb documentation at [cpython/Objects/dictobject.c](https://github.com/python/cpython/blob/master/Objects/dictobject.c)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", "\n", "\n", "Image by Jorge Stolfi | CC BY-SA 3.0\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Still useful\n", "\n", "Python may use a different strategy, but the foundations are the same. We need:\n", "\n", "1. The hash to find the corresponding slot (also known as _bucket_).\n", "2. The element itself to compare (`==`) and tell it apart from other elements what may have collided with it." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Our hash table is really nice! Let's add functionality to loop over the elements:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Now with `items()`!" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bear -> Bär\n", "dog -> Hund\n" ] } ], "source": [ "@add_to(HashTable)\n", "def items(self):\n", " for slot in self._slots:\n", " for entry in slot:\n", " yield entry\n", " \n", "\n", "table = HashTable()\n", "table[\"dog\"] = \"Hund\"\n", "table[\"bear\"] = \"Bär\"\n", "\n", "for key, value in table.items():\n", " print(key, \"->\", value)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "Alternatively, in a single line:" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "import itertools\n", "\n", "@add_to(HashTable)\n", "def items(self):\n", " yield from itertools.chain.from_iterable(self._slots)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Resizing\n", "\n", "Our hash table is _almost_ perfect! The only downside:\n", "\n", "We only have *eight* slots! Sooner than later it's going to get full and performance start degrading.\n", "\n", "The solution is to resize, also known as _rehashing_.\n", "\n", "- Allocate a new, _larger_ underlying data structure.\n", "- Remove each entry from old `slots`...\n", "- ... and insert it into the new one.\n", "- After _all_ entries are removed from the old `slots`, we no longer need it.\n", "- From now on, use the new slots data structure.\n", "- Takes forever, but [amortized cost](https://en.wikipedia.org/wiki/Amortized_analysis) is $\\mathcal{O}(1)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Now with dynamic resizing!" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "import collections\n", "import itertools\n", "\n", "Entry = collections.namedtuple('Entry', 'key, value')\n", "\n", "\n", "class HashTable:\n", " \"\"\"A separate-chaining, dynamically resized hash table.\"\"\"\n", " \n", " def __init__(self, initial_size=8, load_factor=0.8):\n", " self._slots = [list() for _ in range(initial_size)]\n", " self._size = 0\n", " self._load_factor = load_factor \n", " \n", " def __len__(self):\n", " return self._size\n", " \n", " @staticmethod\n", " def _get_index(key, slots):\n", " return hash(key) % len(slots)\n", " \n", " @classmethod\n", " def _insert(cls, slots, key, value):\n", " index = cls._get_index(key, slots)\n", " slots[index] = value" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "@add_to(HashTable)\n", "def __setitem__(self, key, value):\n", " slot_index = self._get_index(key, self._slots)\n", " slot = self._slots[slot_index]\n", " for entry_index, entry in enumerate(slot):\n", " if entry.key == key: # note the equality comparison here...\n", " slot[entry_index] = entry._replace(value=value)\n", " break\n", " else:\n", " entry = Entry(key, value)\n", " self._slots[slot_index].append(entry)\n", " self._size += 1\n", "\n", " if len(self) / len(self._slots) >= self._load_factor:\n", " self._rehash() " ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "@add_to(HashTable)\n", "def _rehash(self):\n", " new_slots = [list() for _ in range(len(self._slots) * 2)]\n", " for key, value in self.items():\n", " HashTable._insert(new_slots, key, value)\n", " self._slots = new_slots" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# The remaining methods; not worth showing on the slides, as they're basically the same ones\n", "# we implemented earlier, but they're included here for the sake of completeness --- this is\n", "# the last version of our hash table implementation.\n", "\n", "@add_to(HashTable) \n", "def __getitem__(self, key):\n", " index = self._get_index(key, self._slots)\n", " for entry in self._slots[index]:\n", " if entry.key == key: # note the equality comparison here...\n", " return entry.value\n", " raise KeyError(key)\n", "\n", "\n", "@add_to(HashTable)\n", "def __delitem__(self, key):\n", " slot_index = self._get_index(key, self._slots)\n", " slot = self._slots[slot_index]\n", " for entry_index, entry in enumerate(slot):\n", " if entry.key == key: # ... and here\n", " del slot[entry_index]\n", " break\n", " else:\n", " raise KeyError(key)\n", " self._size -= 1\n", "\n", "\n", "@add_to(HashTable)\n", "def items(self):\n", " yield from itertools.chain.from_iterable(self._slots) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Future improvements\n", "\n", "- Shrink when we delete too many elements (i.e., it's too empty, so make it smaller).\n", "- Alternatives to all-at-once-rehashing:\n", " * [Incremental resizing](https://en.wikipedia.org/wiki/Hash_table#Incremental_resizing).\n", " * [Monotonic keys](https://en.wikipedia.org/wiki/Hash_table#Monotonic_keys).\n", " * [Linear hashing](https://en.wikipedia.org/wiki/Hash_table#Linear_hashing)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In the words of [Andrew Cooke](https://stackoverflow.com/users/181772/andrew-cooke) on [Stack Overflow](https://stackoverflow.com/a/10130506) \\[emphasis added\\]:\n", "\n", "> Therefore, the hash code only hints at the location of an object, an equality test follows to test candidate keys. To implement a membership test in a hash-table set, **the hash code gives you \"bucket\" number in which to search for the value**. However, all set items with the same hash code are in the bucket. For this, **you also need an equality test to distinguish between all candidates in the bucket**. \n", "\n", "That's it." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Sets\n", "\n", "What we learned is also useful to understand [how Python sets work](https://groups.google.com/forum/#!topic/comp.lang.python/4-nf21y-M8g):\n", "\n", "- They are, in essence, a hash table with dummy values.\n", "- In other words: the members of the set are the keys of the hash table.\n", "- Plus, of course, optimizations to exploit this lack of values." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Wrapping up this section\n", "\n", "- We implemented our own hash table.\n", " * Need to store both keys and values.\n", " * Handled collisions resolution via separate chaining.\n", " * Dynamic resizing when it's too full.\n", "- Python dictionaries have a much more complex implementation.\n", "- Sets are in essence a hash table with dummy values." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "alternative:slide" ] }, "source": [ "# Wrapping up this section\n", "\n", "- We implemented our own hash table.\n", " * Need to store both keys and values.\n", " * Handled collisions resolution via separate chaining.\n", "- Python dictionaries have a much more complex implementation.\n", "- Sets are in essence a hash table with dummy values." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", "\n", "
\n", "\n", "[Photo](https://www.pexels.com/photo/silhouette-photography-of-desert-998649/) by [Francesco Ungaro](https://www.pexels.com/@ghostpresenter) | [Pexels License](https://www.pexels.com/photo-license/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# We're back\n", "\n", "We're —finally— ready to revisit [the paragraph](https://docs.python.org/3/reference/datamodel.html#object.__hash__) from the beginning:\n", "\n", "> If a class does not define an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method it should not define a [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__) operation either; if it defines [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) but not [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).\n", "\n", "We now have the tools to understand —and remember— _why_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Primo\n", "\n", "> If a class does not define an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method it should not define a [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__) operation either; [...]\n", "\n", "Rephrased: defining [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) but _not_ [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) is wrong. Why?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's think of a hash table look up:\n", "\n", "- Python uses [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) to compute the slot where the value goes.\n", "- This gives us the \"_bucket_\" in which to search for our value.\n", "- Now we need the equality test to distingush between all candidates in the bucket.\n", "- We didn't implement [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), so by default our objects compare unequal except with themselves.\n", "- Therefore, **we cannot find our entry**.\n", "\n", "See [this Stack Overflow question](https://stackoverflow.com/q/42061019) for an example of somebody who ran into this." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## An example that works\n", "\n", "In our implementation of a hash table, if we look up `\"cat\"`:" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "table = [\n", " [], # 0\n", " [Entry(key='bear', value='Bär')], # 1\n", " [], # 2\n", " [Entry(key='dog', value='Hund'), Entry(key='cat', value='Katze')], # 3\n", " [], # 4\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Python uses `str.__hash__()` and determines that `index == 3`.\n", "* There are two different entries there, so I need to compare for equality.\n", "* Python uses `str.__eq__()` to compare `\"cat\"` to `\"dog\"` $\\rightarrow$ `False`.\n", "* Second entry: `\"cat\" == \"cat\"` $\\rightarrow$ `True`. We found our entry." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## An example that doesn't work.\n", "\n", "If we have [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) but not [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), we'll land in the right slot but won't be able to find our value." ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Cyborg:\n", "\n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", " \n", " def __hash__(self):\n", " return hash((self.name, self.weapon))\n", "\n", "\n", "robot = Cyborg('T-1000', 'Franchi SPAS-12')\n", "cyborgs = set()\n", "cyborgs.add(robot)\n", "robot in cyborgs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However.." ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cyborgs = {Cyborg('T-1000', 'Franchi SPAS-12')}\n", "Cyborg('T-1000', 'Franchi SPAS-12') in cyborgs # doesn't work!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), these objects only compare equal to _themselves_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "As we saw earlier for `Point`..." ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:subslide" ] }, "outputs": [], "source": [ "robot1 = Cyborg('T-800', 'M79 grenade launcher')\n", "robot2 = Cyborg('T-800', 'M79 grenade launcher')" ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "-7726681088492938838" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(robot1)" ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "-7726681088492938838" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(robot2)" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot1 == robot2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Secondo\n", "\n", "> [...] if [a class] defines [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) but not [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), its instances will not be usable as items in hashable collections.\n", "\n", "This is —now— almost trivial: if we don't have [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), we cannot know in which slot the value is." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Indeed, Python won't even let us compute their hash." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'Cyborg'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-850'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Ithaca 37'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'" ] } ], "source": [ "class Cyborg:\n", "\n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", " \n", " def __eq__(self, other):\n", " return self.name == other.name and self.weapon == other.weapon\n", "\n", "\n", "robot = Cyborg('T-850', 'Ithaca 37')\n", "hash(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## But wait\n", "\n", "However, we saw earlier that by default both [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) return _something_.\n", "\n", "Indeed, if we don't define _none_ of them Python doesn't complain." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8744344590021" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class Cyborg:\n", "\n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", "\n", "\n", "robot = Cyborg('T-850', 'Ithaca 37')\n", "hash(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Making it unhashable\n", "\n", "How do we _not_ define [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__)? By setting it to `None`.\n", "\n", "From [the docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", "\n", "> When the [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) method of a class is `None`, instances of the class will raise an appropriate [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking `isinstance(obj, collections.abc.Hashable)`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## An example" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'Cyborg'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-850'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Ithaca 37'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'" ] } ], "source": [ "class Cyborg:\n", "\n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", " \n", " __hash__ = None\n", "\n", "\n", "robot = Cyborg('T-850', 'Ithaca 37')\n", "hash(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Also:" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'Cyborg'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mbest_friend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mbest_friend\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"Sarah Connor\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'" ] } ], "source": [ "best_friend = dict()\n", "best_friend[robot] = \"Sarah Connor\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And of course:" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'Cyborg'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mrobot_army\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrobot_army\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'" ] } ], "source": [ "robot_army = set()\n", "robot_army.add(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "All of them fail because our class is not hashable." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Getting It Wrong, Part I\n", "\n", "From [the docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", "\n", "> If a class that does not override [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) wishes to suppress hash support, it should include `__hash__ = None` in the class definition.\n", "\n", "Note that _returning_ `None` does not work." ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "ename": "TypeError", "evalue": "__hash__ method should return an integer", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-1000'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Franchi SPAS-12'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: __hash__ method should return an integer" ] } ], "source": [ "class Cyborg:\n", " \n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", "\n", " def __hash__(self):\n", " return None\n", "\n", "\n", "robot = Cyborg('T-1000', 'Franchi SPAS-12')\n", "hash(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Getting It Wrong, Part II\n", "\n", "The docs also [warn us](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", "\n", "> A class which defines its own [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) that explicitly raises a [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) would be incorrectly identified as hashable by an `isinstance(obj, collections.abc.Hashable)` call." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## Abstract base classes\n", "\n", "[`collections.abc.Hashable`](https://docs.python.org/3/library/collections.abc.html#collections.abc.Hashable) is an [abstract base class](https://docs.python.org/3/library/abc.html). We use them\n", "\n", ">[... ] to test whether a class provides a particular interface; for example, whether it is hashable or whether it is a mapping." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## Example 1\n", "\n", "Is this variable iterable?" ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "word = \"abc\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "We could try to loop over it and catch the error..." ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def is_iterable(obj):\n", " try:\n", " for item in obj:\n", " # can exit immediately, just want to check it doesn't fail\n", " return True\n", " except TypeError:\n", " return False\n", "\n", "\n", "is_iterable(word)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "... or instead use:" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import collections.abc\n", "\n", "isinstance(word, collections.abc.Iterable)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## Example 2\n", "\n", "Does this variable have a size? " ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "t = (x ** 2 for x in [4, 5, 6, 8, 9])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "We could try to call [`len()`](https://docs.python.org/3/library/functions.html#len)..." ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def is_sized(obj):\n", " try:\n", " len(obj)\n", " return True\n", " except TypeError:\n", " return False\n", "\n", "is_sized(t)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Alternatively, we could check whether the object `hasattr(obj, '__len__')`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "... or instead:" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isinstance(t, collections.abc.Sized)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "## Example 3\n", "\n", "In the same way, to check whether a variable is hashable we could just try..." ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word = \"Katze\"\n", "\n", "def is_hashable(obj):\n", " try:\n", " hash(obj)\n", " return True\n", " except TypeError:\n", " return False\n", "\n", "\n", "is_hashable(word)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "... or instead:" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isinstance(word, collections.abc.Hashable)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Don't raise `TypeError`\n", "\n", "The docs warn us that raising [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) ourselves will not work." ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "ename": "TypeError", "evalue": "class Cyborg is not hashable", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-800'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'M79 grenade launcher'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36m__hash__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__hash__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"class Cyborg is not hashable\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: class Cyborg is not hashable" ] } ], "source": [ "class Cyborg:\n", " \n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", "\n", " def __hash__(self):\n", " raise TypeError(\"class Cyborg is not hashable\")\n", "\n", "\n", "robot = Cyborg('T-800', 'M79 grenade launcher')\n", "hash(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "This was expected. However..." ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isinstance(robot, collections.abc.Hashable)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Wrong! We broke it!\n", "\n", "Use `__hash__ = None`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Terzo\n", "\n", "And finally:\n", "\n", "> [...] If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).\n", "\n", "This takes us to our final point: the hashability of _mutable_ objects." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Hashing mutable objects.\n", "\n", "Well, we can't.\n", "\n", "Tuples are hashable..." ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "2528502973977326415" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash((1, 2, 3))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "... but lists aren't." ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m9\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" ] } ], "source": [ "hash([5, 7, 9])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# I'd never try to do that!\n", "\n", "Perhaps we've never tried calling [`hash()`](https://docs.python.org/3/library/functions.html#hash) explicitly for a list, but we may have run into this:\n", "\n", "Say we want to map two points $(x, y)$, to the result of ${x}^{y}$." ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0msquares\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0msquares\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m**\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" ] } ], "source": [ "squares = dict()\n", "squares[[2, 7]] = 2 ** 7" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Same error! It's unhashable." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For this we need to use tuples..." ] }, { "cell_type": "code", "execution_count": 150, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{(2, 7): 128}" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares = dict()\n", "squares[(2, 7)] = 2 ** 7\n", "squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... or perhaps a two-level [`defaultdict`](https://docs.python.org/3/library/collections.html#collections.defaultdict):" ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(dict, {2: {7: 128}})" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import collections\n", "\n", "squares = collections.defaultdict(dict)\n", "squares[2][7] = 2 ** 7\n", "squares" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "128" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squares[2][7]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# But I _really_ want to hash a list\n", "\n", "Let's imagine a list were hashable and we used it as a key in a dictionary." ] }, { "cell_type": "code", "execution_count": 153, "metadata": {}, "outputs": [], "source": [ "x = [2, 3, 5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Python would use [`__hash__()`](https://docs.python.org/3/library/functions.html#hash) to compute the slot where the list goes.\n", "- We would go to that slot and store the list there, as usual." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Later on, we mutate our list, say:" ] }, { "cell_type": "code", "execution_count": 154, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2, 7, 5]" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1] = 7\n", "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Now we want to loop up `x` in the dictionary.\n", "- Python would use [`__hash__()`](https://docs.python.org/3/library/functions.html#hash) and get a _different_ value, hence a _different_ slot.\n", "- Even if (by chance) we ended up in the same slot, we would need the equality test to distinguish between all candidates in the bucket: old and new `x` are different, so they will compare unequal.\n", "- We can never find our list in the dictionary!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# A hashable mutable class — sadness ensues\n", "\n", "- Let's implement a `Cyborg` class.\n", "- We'll implement [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__)...\n", "- ... but _also_ implement [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__).\n", "\n", "What could go wrong?" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "import functools\n", "\n", "@functools.total_ordering\n", "class Cyborg:\n", " \n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", " \n", " def __repr__(self):\n", " return \"Cyborg({}, {})\".format(self.name, self.weapon)\n", " \n", " def __eq__(self, other):\n", " return self.name == other.name and self.weapon == other.weapon\n", " \n", " def __lt__(self, other):\n", " if self.name < other.name:\n", " return True\n", " return self.weapon < other.weapon" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "alternative:subslide" ] }, "outputs": [], "source": [ "\n", "class Cyborg:\n", " \n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", " \n", " def __repr__(self):\n", " return \"Cyborg({}, {})\".format(self.name, self.weapon)\n", " \n", " def __eq__(self, other):\n", " return self.name == other.name and self.weapon == other.weapon\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our hash function combines both attributes in a tuple. Both are strings, so immutable, so hashable." ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "@add_to(Cyborg) \n", "def __hash__(self):\n", " return hash((self.name, self.weapon))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We really shouldn't be doing this..." ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "robot1 = Cyborg('T-800', 'Franchi SPAS-12')\n", "robot2 = Cyborg('T-800', 'MP5 Submachine gun')" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 158, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot1 == robot2" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot1 >= robot2" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "-2292265897856400339" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(robot1)" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "6957369841330577170" ] }, "execution_count": 161, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash(robot2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the problem here?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Our class is mutable!\n", "\n", "This will break!" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2292265897856400339" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot = Cyborg('T-800', 'Franchi SPAS-12')\n", "hash(robot)" ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'John Connor'" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "target = dict()\n", "target[robot] = \"John Connor\"\n", "target[robot]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So far this works, but..." ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4054157808228906460" ] }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot.weapon = 'M79 grenade launcher'\n", "hash(robot)" ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "Cyborg(T-800, M79 grenade launcher)", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtarget\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mKeyError\u001b[0m: Cyborg(T-800, M79 grenade launcher)" ] } ], "source": [ "target[robot]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We broke it!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "That's what the docs were explaining when they told us \\[emphasis added\\]:\n", "\n", "> [...] If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (**if the object’s hash value changes, it will be in the wrong hash bucket**)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Dictionaries store _references_\n", "\n", "A final comment. The dictionary (and thus also sets) store _a reference_ to our object." ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116472769168" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot = Cyborg('T-1000', 'Remington 870')\n", "id(robot)" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [], "source": [ "terminators = set()\n", "terminators.add(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Let's retrieve the element from the set [without removing it](https://stackoverflow.com/q/59825):" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "140116472769168" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id(next(iter(terminators))) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "It's not a copy — it's the same object." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "- In other words: mutating our object (which should have never been hashable)...\n", "- ... also modifies the copy stored in the hash hable (dictionary / set!)." ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "Cyborg(T-1000, RSB-80 Plasma Gun)" ] }, "execution_count": 169, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot.weapon = 'RSB-80 Plasma Gun'\n", "robot" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "{Cyborg(T-1000, RSB-80 Plasma Gun)}" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terminators" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "source": [ "This implies that if our hashable mutated object landed (by chance) in the same slot as it originally did, we would still be able to find it." ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original hash: 2225710126195077769\n" ] } ], "source": [ "robot = Cyborg('T-1000', 'Remington 870')\n", "\n", "terminators = set()\n", "terminators.add(robot)\n", "original_hash = hash(robot)\n", "print(\"Original hash:\", original_hash)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "Let's mutate the object but make sure it goes to the same slot:" ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mutated hash: 2225710126195077769\n" ] } ], "source": [ "robot.weapon = 'RSB-80 Plasma Gun'\n", "Cyborg.__hash__ = lambda x: original_hash\n", "print(\"Mutated hash: \", hash(robot))" ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [ "alternative:skip" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "robot in terminators" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" }, "tags": [ "alternative:skip" ] }, "source": [ "We can find it! It went to the same bucket.\n", "\n", "Hashable mutable objects will break _most_ of the time, but not always." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Doing it right\n", "\n", "Experiments aside, let's do the right thing and make our mutable class non-hashable." ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'Cyborg'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-1000'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'RSB-80 Plasma Gun'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mterminators\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mterminators\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'" ] } ], "source": [ "class Cyborg:\n", " \n", " def __init__(self, name, weapon):\n", " self.name = name\n", " self.weapon = weapon\n", " \n", " __hash__ = None\n", " \n", " \n", "robot = Cyborg('T-1000', 'RSB-80 Plasma Gun')\n", "terminators = set()\n", "terminators.add(robot)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [ "alternative:skip" ] }, "source": [ "# Wrapping up this section\n", "\n", "We revisited and understood the intimidating paragraph from the docs.\n", "\n", "- If your class doesn't define `__eq__()`, don't define `__hash__()` either. \n", "- Mutable classes should not be hashable.\n", " * To make it non-hashable, set `__hash__ = None`.\n", "- If your class defines `__eq__()`, implement `__hash__()` for it to be hashable. \n", "- Abstract base classes simplify testing whether an interface is satisfied.\n", "- Dictionaries store references, not copies of the data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "tags": [ "alternative:slide" ] }, "source": [ "# Wrapping up this section\n", "\n", "We revisited and understood the intimidating paragraph from the docs.\n", "\n", "- If your class doesn't define `__eq__()`, don't define `__hash__()` either. \n", "- Mutable classes should not be hashable.\n", " * To make it non-hashable, set `__hash__ = None`.\n", "- If your class defines `__eq__()`, implement `__hash__()` for it to be hashable. \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "\n", "
\n", "\n", "[Photo](https://www.pexels.com/photo/landscape-photography-of-desert-33147/) by [Pixabay](https://pixabay.com/en/desert-africa-bedouin-footprints-1007157/) / [CC0](https://www.pexels.com/creative-commons-images/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Bonus: Trollface\n", " \n", "The whole point of using hash tables is to get $\\mathcal{O}(1)$ operations.\n", "\n", "Say we use a dictionary to map every number to its square:" ] }, { "cell_type": "code", "execution_count": 176, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0,\n", " 1: 1,\n", " 2: 4,\n", " 3: 9,\n", " 4: 16,\n", " 5: 25,\n", " 6: 36,\n", " 7: 49,\n", " 8: 64,\n", " 9: 81,\n", " 10: 100,\n", " 11: 121,\n", " 12: 144,\n", " 13: 169,\n", " 14: 196,\n", " 15: 225,\n", " 16: 256,\n", " 17: 289,\n", " 18: 324,\n", " 19: 361,\n", " 20: 400,\n", " 21: 441,\n", " 22: 484,\n", " 23: 529,\n", " 24: 576,\n", " 25: 625,\n", " 26: 676,\n", " 27: 729,\n", " 28: 784,\n", " 29: 841,\n", " 30: 900,\n", " 31: 961,\n", " 32: 1024,\n", " 33: 1089,\n", " 34: 1156,\n", " 35: 1225,\n", " 36: 1296,\n", " 37: 1369,\n", " 38: 1444,\n", " 39: 1521,\n", " 40: 1600,\n", " 41: 1681,\n", " 42: 1764,\n", " 43: 1849,\n", " 44: 1936,\n", " 45: 2025,\n", " 46: 2116,\n", " 47: 2209,\n", " 48: 2304,\n", " 49: 2401,\n", " 50: 2500,\n", " 51: 2601,\n", " 52: 2704,\n", " 53: 2809,\n", " 54: 2916,\n", " 55: 3025,\n", " 56: 3136,\n", " 57: 3249,\n", " 58: 3364,\n", " 59: 3481,\n", " 60: 3600,\n", " 61: 3721,\n", " 62: 3844,\n", " 63: 3969,\n", " 64: 4096,\n", " 65: 4225,\n", " 66: 4356,\n", " 67: 4489,\n", " 68: 4624,\n", " 69: 4761,\n", " 70: 4900,\n", " 71: 5041,\n", " 72: 5184,\n", " 73: 5329,\n", " 74: 5476,\n", " 75: 5625,\n", " 76: 5776,\n", " 77: 5929,\n", " 78: 6084,\n", " 79: 6241,\n", " 80: 6400,\n", " 81: 6561,\n", " 82: 6724,\n", " 83: 6889,\n", " 84: 7056,\n", " 85: 7225,\n", " 86: 7396,\n", " 87: 7569,\n", " 88: 7744,\n", " 89: 7921,\n", " 90: 8100,\n", " 91: 8281,\n", " 92: 8464,\n", " 93: 8649,\n", " 94: 8836,\n", " 95: 9025,\n", " 96: 9216,\n", " 97: 9409,\n", " 98: 9604,\n", " 99: 9801,\n", " 100: 10000,\n", " 101: 10201,\n", " 102: 10404,\n", " 103: 10609,\n", " 104: 10816,\n", " 105: 11025,\n", " 106: 11236,\n", " 107: 11449,\n", " 108: 11664,\n", " 109: 11881,\n", " 110: 12100,\n", " 111: 12321,\n", " 112: 12544,\n", " 113: 12769,\n", " 114: 12996,\n", " 115: 13225,\n", " 116: 13456,\n", " 117: 13689,\n", " 118: 13924,\n", " 119: 14161,\n", " 120: 14400,\n", " 121: 14641,\n", " 122: 14884,\n", " 123: 15129,\n", " 124: 15376,\n", " 125: 15625,\n", " 126: 15876,\n", " 127: 16129,\n", " 128: 16384,\n", " 129: 16641,\n", " 130: 16900,\n", " 131: 17161,\n", " 132: 17424,\n", " 133: 17689,\n", " 134: 17956,\n", " 135: 18225,\n", " 136: 18496,\n", " 137: 18769,\n", " 138: 19044,\n", " 139: 19321,\n", " 140: 19600,\n", " 141: 19881,\n", " 142: 20164,\n", " 143: 20449,\n", " 144: 20736,\n", " 145: 21025,\n", " 146: 21316,\n", " 147: 21609,\n", " 148: 21904,\n", " 149: 22201,\n", " 150: 22500,\n", " 151: 22801,\n", " 152: 23104,\n", " 153: 23409,\n", " 154: 23716,\n", " 155: 24025,\n", " 156: 24336,\n", " 157: 24649,\n", " 158: 24964,\n", " 159: 25281,\n", " 160: 25600,\n", " 161: 25921,\n", " 162: 26244,\n", " 163: 26569,\n", " 164: 26896,\n", " 165: 27225,\n", " 166: 27556,\n", " 167: 27889,\n", " 168: 28224,\n", " 169: 28561,\n", " 170: 28900,\n", " 171: 29241,\n", " 172: 29584,\n", " 173: 29929,\n", " 174: 30276,\n", " 175: 30625,\n", " 176: 30976,\n", " 177: 31329,\n", " 178: 31684,\n", " 179: 32041,\n", " 180: 32400,\n", " 181: 32761,\n", " 182: 33124,\n", " 183: 33489,\n", " 184: 33856,\n", " 185: 34225,\n", " 186: 34596,\n", " 187: 34969,\n", " 188: 35344,\n", " 189: 35721,\n", " 190: 36100,\n", " 191: 36481,\n", " 192: 36864,\n", " 193: 37249,\n", " 194: 37636,\n", " 195: 38025,\n", " 196: 38416,\n", " 197: 38809,\n", " 198: 39204,\n", " 199: 39601,\n", " 200: 40000,\n", " 201: 40401,\n", " 202: 40804,\n", " 203: 41209,\n", " 204: 41616,\n", " 205: 42025,\n", " 206: 42436,\n", " 207: 42849,\n", " 208: 43264,\n", " 209: 43681,\n", " 210: 44100,\n", " 211: 44521,\n", " 212: 44944,\n", " 213: 45369,\n", " 214: 45796,\n", " 215: 46225,\n", " 216: 46656,\n", " 217: 47089,\n", " 218: 47524,\n", " 219: 47961,\n", " 220: 48400,\n", " 221: 48841,\n", " 222: 49284,\n", " 223: 49729,\n", " 224: 50176,\n", " 225: 50625,\n", " 226: 51076,\n", " 227: 51529,\n", " 228: 51984,\n", " 229: 52441,\n", " 230: 52900,\n", " 231: 53361,\n", " 232: 53824,\n", " 233: 54289,\n", " 234: 54756,\n", " 235: 55225,\n", " 236: 55696,\n", " 237: 56169,\n", " 238: 56644,\n", " 239: 57121,\n", " 240: 57600,\n", " 241: 58081,\n", " 242: 58564,\n", " 243: 59049,\n", " 244: 59536,\n", " 245: 60025,\n", " 246: 60516,\n", " 247: 61009,\n", " 248: 61504,\n", " 249: 62001,\n", " 250: 62500,\n", " 251: 63001,\n", " 252: 63504,\n", " 253: 64009,\n", " 254: 64516,\n", " 255: 65025,\n", " 256: 65536,\n", " 257: 66049,\n", " 258: 66564,\n", " 259: 67081,\n", " 260: 67600,\n", " 261: 68121,\n", " 262: 68644,\n", " 263: 69169,\n", " 264: 69696,\n", " 265: 70225,\n", " 266: 70756,\n", " 267: 71289,\n", " 268: 71824,\n", " 269: 72361,\n", " 270: 72900,\n", " 271: 73441,\n", " 272: 73984,\n", " 273: 74529,\n", " 274: 75076,\n", " 275: 75625,\n", " 276: 76176,\n", " 277: 76729,\n", " 278: 77284,\n", " 279: 77841,\n", " 280: 78400,\n", " 281: 78961,\n", " 282: 79524,\n", " 283: 80089,\n", " 284: 80656,\n", " 285: 81225,\n", " 286: 81796,\n", " 287: 82369,\n", " 288: 82944,\n", " 289: 83521,\n", " 290: 84100,\n", " 291: 84681,\n", " 292: 85264,\n", " 293: 85849,\n", " 294: 86436,\n", " 295: 87025,\n", " 296: 87616,\n", " 297: 88209,\n", " 298: 88804,\n", " 299: 89401,\n", " 300: 90000,\n", " 301: 90601,\n", " 302: 91204,\n", " 303: 91809,\n", " 304: 92416,\n", " 305: 93025,\n", " 306: 93636,\n", " 307: 94249,\n", " 308: 94864,\n", " 309: 95481,\n", " 310: 96100,\n", " 311: 96721,\n", " 312: 97344,\n", " 313: 97969,\n", " 314: 98596,\n", " 315: 99225,\n", " 316: 99856,\n", " 317: 100489,\n", " 318: 101124,\n", " 319: 101761,\n", " 320: 102400,\n", " 321: 103041,\n", " 322: 103684,\n", " 323: 104329,\n", " 324: 104976,\n", " 325: 105625,\n", " 326: 106276,\n", " 327: 106929,\n", " 328: 107584,\n", " 329: 108241,\n", " 330: 108900,\n", " 331: 109561,\n", " 332: 110224,\n", " 333: 110889,\n", " 334: 111556,\n", " 335: 112225,\n", " 336: 112896,\n", " 337: 113569,\n", " 338: 114244,\n", " 339: 114921,\n", " 340: 115600,\n", " 341: 116281,\n", " 342: 116964,\n", " 343: 117649,\n", " 344: 118336,\n", " 345: 119025,\n", " 346: 119716,\n", " 347: 120409,\n", " 348: 121104,\n", " 349: 121801,\n", " 350: 122500,\n", " 351: 123201,\n", " 352: 123904,\n", " 353: 124609,\n", " 354: 125316,\n", " 355: 126025,\n", " 356: 126736,\n", " 357: 127449,\n", " 358: 128164,\n", " 359: 128881,\n", " 360: 129600,\n", " 361: 130321,\n", " 362: 131044,\n", " 363: 131769,\n", " 364: 132496,\n", " 365: 133225,\n", " 366: 133956,\n", " 367: 134689,\n", " 368: 135424,\n", " 369: 136161,\n", " 370: 136900,\n", " 371: 137641,\n", " 372: 138384,\n", " 373: 139129,\n", " 374: 139876,\n", " 375: 140625,\n", " 376: 141376,\n", " 377: 142129,\n", " 378: 142884,\n", " 379: 143641,\n", " 380: 144400,\n", " 381: 145161,\n", " 382: 145924,\n", " 383: 146689,\n", " 384: 147456,\n", " 385: 148225,\n", " 386: 148996,\n", " 387: 149769,\n", " 388: 150544,\n", " 389: 151321,\n", " 390: 152100,\n", " 391: 152881,\n", " 392: 153664,\n", " 393: 154449,\n", " 394: 155236,\n", " 395: 156025,\n", " 396: 156816,\n", " 397: 157609,\n", " 398: 158404,\n", " 399: 159201,\n", " 400: 160000,\n", " 401: 160801,\n", " 402: 161604,\n", " 403: 162409,\n", " 404: 163216,\n", " 405: 164025,\n", " 406: 164836,\n", " 407: 165649,\n", " 408: 166464,\n", " 409: 167281,\n", " 410: 168100,\n", " 411: 168921,\n", " 412: 169744,\n", " 413: 170569,\n", " 414: 171396,\n", " 415: 172225,\n", " 416: 173056,\n", " 417: 173889,\n", " 418: 174724,\n", " 419: 175561,\n", " 420: 176400,\n", " 421: 177241,\n", " 422: 178084,\n", " 423: 178929,\n", " 424: 179776,\n", " 425: 180625,\n", " 426: 181476,\n", " 427: 182329,\n", " 428: 183184,\n", " 429: 184041,\n", " 430: 184900,\n", " 431: 185761,\n", " 432: 186624,\n", " 433: 187489,\n", " 434: 188356,\n", " 435: 189225,\n", " 436: 190096,\n", " 437: 190969,\n", " 438: 191844,\n", " 439: 192721,\n", " 440: 193600,\n", " 441: 194481,\n", " 442: 195364,\n", " 443: 196249,\n", " 444: 197136,\n", " 445: 198025,\n", " 446: 198916,\n", " 447: 199809,\n", " 448: 200704,\n", " 449: 201601,\n", " 450: 202500,\n", " 451: 203401,\n", " 452: 204304,\n", " 453: 205209,\n", " 454: 206116,\n", " 455: 207025,\n", " 456: 207936,\n", " 457: 208849,\n", " 458: 209764,\n", " 459: 210681,\n", " 460: 211600,\n", " 461: 212521,\n", " 462: 213444,\n", " 463: 214369,\n", " 464: 215296,\n", " 465: 216225,\n", " 466: 217156,\n", " 467: 218089,\n", " 468: 219024,\n", " 469: 219961,\n", " 470: 220900,\n", " 471: 221841,\n", " 472: 222784,\n", " 473: 223729,\n", " 474: 224676,\n", " 475: 225625,\n", " 476: 226576,\n", " 477: 227529,\n", " 478: 228484,\n", " 479: 229441,\n", " 480: 230400,\n", " 481: 231361,\n", " 482: 232324,\n", " 483: 233289,\n", " 484: 234256,\n", " 485: 235225,\n", " 486: 236196,\n", " 487: 237169,\n", " 488: 238144,\n", " 489: 239121,\n", " 490: 240100,\n", " 491: 241081,\n", " 492: 242064,\n", " 493: 243049,\n", " 494: 244036,\n", " 495: 245025,\n", " 496: 246016,\n", " 497: 247009,\n", " 498: 248004,\n", " 499: 249001,\n", " 500: 250000,\n", " 501: 251001,\n", " 502: 252004,\n", " 503: 253009,\n", " 504: 254016,\n", " 505: 255025,\n", " 506: 256036,\n", " 507: 257049,\n", " 508: 258064,\n", " 509: 259081,\n", " 510: 260100,\n", " 511: 261121,\n", " 512: 262144,\n", " 513: 263169,\n", " 514: 264196,\n", " 515: 265225,\n", " 516: 266256,\n", " 517: 267289,\n", " 518: 268324,\n", " 519: 269361,\n", " 520: 270400,\n", " 521: 271441,\n", " 522: 272484,\n", " 523: 273529,\n", " 524: 274576,\n", " 525: 275625,\n", " 526: 276676,\n", " 527: 277729,\n", " 528: 278784,\n", " 529: 279841,\n", " 530: 280900,\n", " 531: 281961,\n", " 532: 283024,\n", " 533: 284089,\n", " 534: 285156,\n", " 535: 286225,\n", " 536: 287296,\n", " 537: 288369,\n", " 538: 289444,\n", " 539: 290521,\n", " 540: 291600,\n", " 541: 292681,\n", " 542: 293764,\n", " 543: 294849,\n", " 544: 295936,\n", " 545: 297025,\n", " 546: 298116,\n", " 547: 299209,\n", " 548: 300304,\n", " 549: 301401,\n", " 550: 302500,\n", " 551: 303601,\n", " 552: 304704,\n", " 553: 305809,\n", " 554: 306916,\n", " 555: 308025,\n", " 556: 309136,\n", " 557: 310249,\n", " 558: 311364,\n", " 559: 312481,\n", " 560: 313600,\n", " 561: 314721,\n", " 562: 315844,\n", " 563: 316969,\n", " 564: 318096,\n", " 565: 319225,\n", " 566: 320356,\n", " 567: 321489,\n", " 568: 322624,\n", " 569: 323761,\n", " 570: 324900,\n", " 571: 326041,\n", " 572: 327184,\n", " 573: 328329,\n", " 574: 329476,\n", " 575: 330625,\n", " 576: 331776,\n", " 577: 332929,\n", " 578: 334084,\n", " 579: 335241,\n", " 580: 336400,\n", " 581: 337561,\n", " 582: 338724,\n", " 583: 339889,\n", " 584: 341056,\n", " 585: 342225,\n", " 586: 343396,\n", " 587: 344569,\n", " 588: 345744,\n", " 589: 346921,\n", " 590: 348100,\n", " 591: 349281,\n", " 592: 350464,\n", " 593: 351649,\n", " 594: 352836,\n", " 595: 354025,\n", " 596: 355216,\n", " 597: 356409,\n", " 598: 357604,\n", " 599: 358801,\n", " 600: 360000,\n", " 601: 361201,\n", " 602: 362404,\n", " 603: 363609,\n", " 604: 364816,\n", " 605: 366025,\n", " 606: 367236,\n", " 607: 368449,\n", " 608: 369664,\n", " 609: 370881,\n", " 610: 372100,\n", " 611: 373321,\n", " 612: 374544,\n", " 613: 375769,\n", " 614: 376996,\n", " 615: 378225,\n", " 616: 379456,\n", " 617: 380689,\n", " 618: 381924,\n", " 619: 383161,\n", " 620: 384400,\n", " 621: 385641,\n", " 622: 386884,\n", " 623: 388129,\n", " 624: 389376,\n", " 625: 390625,\n", " 626: 391876,\n", " 627: 393129,\n", " 628: 394384,\n", " 629: 395641,\n", " 630: 396900,\n", " 631: 398161,\n", " 632: 399424,\n", " 633: 400689,\n", " 634: 401956,\n", " 635: 403225,\n", " 636: 404496,\n", " 637: 405769,\n", " 638: 407044,\n", " 639: 408321,\n", " 640: 409600,\n", " 641: 410881,\n", " 642: 412164,\n", " 643: 413449,\n", " 644: 414736,\n", " 645: 416025,\n", " 646: 417316,\n", " 647: 418609,\n", " 648: 419904,\n", " 649: 421201,\n", " 650: 422500,\n", " 651: 423801,\n", " 652: 425104,\n", " 653: 426409,\n", " 654: 427716,\n", " 655: 429025,\n", " 656: 430336,\n", " 657: 431649,\n", " 658: 432964,\n", " 659: 434281,\n", " 660: 435600,\n", " 661: 436921,\n", " 662: 438244,\n", " 663: 439569,\n", " 664: 440896,\n", " 665: 442225,\n", " 666: 443556,\n", " 667: 444889,\n", " 668: 446224,\n", " 669: 447561,\n", " 670: 448900,\n", " 671: 450241,\n", " 672: 451584,\n", " 673: 452929,\n", " 674: 454276,\n", " 675: 455625,\n", " 676: 456976,\n", " 677: 458329,\n", " 678: 459684,\n", " 679: 461041,\n", " 680: 462400,\n", " 681: 463761,\n", " 682: 465124,\n", " 683: 466489,\n", " 684: 467856,\n", " 685: 469225,\n", " 686: 470596,\n", " 687: 471969,\n", " 688: 473344,\n", " 689: 474721,\n", " 690: 476100,\n", " 691: 477481,\n", " 692: 478864,\n", " 693: 480249,\n", " 694: 481636,\n", " 695: 483025,\n", " 696: 484416,\n", " 697: 485809,\n", " 698: 487204,\n", " 699: 488601,\n", " 700: 490000,\n", " 701: 491401,\n", " 702: 492804,\n", " 703: 494209,\n", " 704: 495616,\n", " 705: 497025,\n", " 706: 498436,\n", " 707: 499849,\n", " 708: 501264,\n", " 709: 502681,\n", " 710: 504100,\n", " 711: 505521,\n", " 712: 506944,\n", " 713: 508369,\n", " 714: 509796,\n", " 715: 511225,\n", " 716: 512656,\n", " 717: 514089,\n", " 718: 515524,\n", " 719: 516961,\n", " 720: 518400,\n", " 721: 519841,\n", " 722: 521284,\n", " 723: 522729,\n", " 724: 524176,\n", " 725: 525625,\n", " 726: 527076,\n", " 727: 528529,\n", " 728: 529984,\n", " 729: 531441,\n", " 730: 532900,\n", " 731: 534361,\n", " 732: 535824,\n", " 733: 537289,\n", " 734: 538756,\n", " 735: 540225,\n", " 736: 541696,\n", " 737: 543169,\n", " 738: 544644,\n", " 739: 546121,\n", " 740: 547600,\n", " 741: 549081,\n", " 742: 550564,\n", " 743: 552049,\n", " 744: 553536,\n", " 745: 555025,\n", " 746: 556516,\n", " 747: 558009,\n", " 748: 559504,\n", " 749: 561001,\n", " 750: 562500,\n", " 751: 564001,\n", " 752: 565504,\n", " 753: 567009,\n", " 754: 568516,\n", " 755: 570025,\n", " 756: 571536,\n", " 757: 573049,\n", " 758: 574564,\n", " 759: 576081,\n", " 760: 577600,\n", " 761: 579121,\n", " 762: 580644,\n", " 763: 582169,\n", " 764: 583696,\n", " 765: 585225,\n", " 766: 586756,\n", " 767: 588289,\n", " 768: 589824,\n", " 769: 591361,\n", " 770: 592900,\n", " 771: 594441,\n", " 772: 595984,\n", " 773: 597529,\n", " 774: 599076,\n", " 775: 600625,\n", " 776: 602176,\n", " 777: 603729,\n", " 778: 605284,\n", " 779: 606841,\n", " 780: 608400,\n", " 781: 609961,\n", " 782: 611524,\n", " 783: 613089,\n", " 784: 614656,\n", " 785: 616225,\n", " 786: 617796,\n", " 787: 619369,\n", " 788: 620944,\n", " 789: 622521,\n", " 790: 624100,\n", " 791: 625681,\n", " 792: 627264,\n", " 793: 628849,\n", " 794: 630436,\n", " 795: 632025,\n", " 796: 633616,\n", " 797: 635209,\n", " 798: 636804,\n", " 799: 638401,\n", " 800: 640000,\n", " 801: 641601,\n", " 802: 643204,\n", " 803: 644809,\n", " 804: 646416,\n", " 805: 648025,\n", " 806: 649636,\n", " 807: 651249,\n", " 808: 652864,\n", " 809: 654481,\n", " 810: 656100,\n", " 811: 657721,\n", " 812: 659344,\n", " 813: 660969,\n", " 814: 662596,\n", " 815: 664225,\n", " 816: 665856,\n", " 817: 667489,\n", " 818: 669124,\n", " 819: 670761,\n", " 820: 672400,\n", " 821: 674041,\n", " 822: 675684,\n", " 823: 677329,\n", " 824: 678976,\n", " 825: 680625,\n", " 826: 682276,\n", " 827: 683929,\n", " 828: 685584,\n", " 829: 687241,\n", " 830: 688900,\n", " 831: 690561,\n", " 832: 692224,\n", " 833: 693889,\n", " 834: 695556,\n", " 835: 697225,\n", " 836: 698896,\n", " 837: 700569,\n", " 838: 702244,\n", " 839: 703921,\n", " 840: 705600,\n", " 841: 707281,\n", " 842: 708964,\n", " 843: 710649,\n", " 844: 712336,\n", " 845: 714025,\n", " 846: 715716,\n", " 847: 717409,\n", " 848: 719104,\n", " 849: 720801,\n", " 850: 722500,\n", " 851: 724201,\n", " 852: 725904,\n", " 853: 727609,\n", " 854: 729316,\n", " 855: 731025,\n", " 856: 732736,\n", " 857: 734449,\n", " 858: 736164,\n", " 859: 737881,\n", " 860: 739600,\n", " 861: 741321,\n", " 862: 743044,\n", " 863: 744769,\n", " 864: 746496,\n", " 865: 748225,\n", " 866: 749956,\n", " 867: 751689,\n", " 868: 753424,\n", " 869: 755161,\n", " 870: 756900,\n", " 871: 758641,\n", " 872: 760384,\n", " 873: 762129,\n", " 874: 763876,\n", " 875: 765625,\n", " 876: 767376,\n", " 877: 769129,\n", " 878: 770884,\n", " 879: 772641,\n", " 880: 774400,\n", " 881: 776161,\n", " 882: 777924,\n", " 883: 779689,\n", " 884: 781456,\n", " 885: 783225,\n", " 886: 784996,\n", " 887: 786769,\n", " 888: 788544,\n", " 889: 790321,\n", " 890: 792100,\n", " 891: 793881,\n", " 892: 795664,\n", " 893: 797449,\n", " 894: 799236,\n", " 895: 801025,\n", " 896: 802816,\n", " 897: 804609,\n", " 898: 806404,\n", " 899: 808201,\n", " 900: 810000,\n", " 901: 811801,\n", " 902: 813604,\n", " 903: 815409,\n", " 904: 817216,\n", " 905: 819025,\n", " 906: 820836,\n", " 907: 822649,\n", " 908: 824464,\n", " 909: 826281,\n", " 910: 828100,\n", " 911: 829921,\n", " 912: 831744,\n", " 913: 833569,\n", " 914: 835396,\n", " 915: 837225,\n", " 916: 839056,\n", " 917: 840889,\n", " 918: 842724,\n", " 919: 844561,\n", " 920: 846400,\n", " 921: 848241,\n", " 922: 850084,\n", " 923: 851929,\n", " 924: 853776,\n", " 925: 855625,\n", " 926: 857476,\n", " 927: 859329,\n", " 928: 861184,\n", " 929: 863041,\n", " 930: 864900,\n", " 931: 866761,\n", " 932: 868624,\n", " 933: 870489,\n", " 934: 872356,\n", " 935: 874225,\n", " 936: 876096,\n", " 937: 877969,\n", " 938: 879844,\n", " 939: 881721,\n", " 940: 883600,\n", " 941: 885481,\n", " 942: 887364,\n", " 943: 889249,\n", " 944: 891136,\n", " 945: 893025,\n", " 946: 894916,\n", " 947: 896809,\n", " 948: 898704,\n", " 949: 900601,\n", " 950: 902500,\n", " 951: 904401,\n", " 952: 906304,\n", " 953: 908209,\n", " 954: 910116,\n", " 955: 912025,\n", " 956: 913936,\n", " 957: 915849,\n", " 958: 917764,\n", " 959: 919681,\n", " 960: 921600,\n", " 961: 923521,\n", " 962: 925444,\n", " 963: 927369,\n", " 964: 929296,\n", " 965: 931225,\n", " 966: 933156,\n", " 967: 935089,\n", " 968: 937024,\n", " 969: 938961,\n", " 970: 940900,\n", " 971: 942841,\n", " 972: 944784,\n", " 973: 946729,\n", " 974: 948676,\n", " 975: 950625,\n", " 976: 952576,\n", " 977: 954529,\n", " 978: 956484,\n", " 979: 958441,\n", " 980: 960400,\n", " 981: 962361,\n", " 982: 964324,\n", " 983: 966289,\n", " 984: 968256,\n", " 985: 970225,\n", " 986: 972196,\n", " 987: 974169,\n", " 988: 976144,\n", " 989: 978121,\n", " 990: 980100,\n", " 991: 982081,\n", " 992: 984064,\n", " 993: 986049,\n", " 994: 988036,\n", " 995: 990025,\n", " 996: 992016,\n", " 997: 994009,\n", " 998: 996004,\n", " 999: 998001,\n", " ...}" ] }, "execution_count": 176, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "{x: x ** 2 for x in range(10_000)}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Lookups are $\\mathcal{O}(1)$. Proof:" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "58 ns ± 6.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" ] } ], "source": [ "%%timeit squares = {x: x ** 2 for x in range(1_000)}\n", "999 in squares" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "64 ns ± 9.71 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" ] } ], "source": [ "%%timeit squares = {x: x ** 2 for x in range(10_000)}\n", "9_999 in squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Size of dictionary is $10x$, but it takes the same amount of time. This is $\\mathcal{O}(1)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's subclass `int` and make it always return the same hash" ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "class TrollInt:\n", " \n", " def __init__(self, value):\n", " self.value = value\n", " \n", " def __eq__(self, other):\n", " return self.value == other.value\n", " \n", " def __hash__(self):\n", " return 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "This means that they'll all go the same slot.\n", "\n", "Python will need to resolve (endless) collisions" ] }, { "cell_type": "code", "execution_count": 185, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "379 µs ± 89.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%%timeit squares = {TrollInt(x): TrollInt(x ** 2) for x in range(1_000)}\n", "squares[TrollInt(999)] " ] }, { "cell_type": "code", "execution_count": 186, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.28 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit squares = {TrollInt(x): TrollInt(x ** 2) for x in range(10_000)}\n", "squares[TrollInt(9_999)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We've turned our $\\mathcal{O}(1)$ operations into $\\mathcal{O}(n)$.\n", "\n", "Problem?" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 2 }