\n",
"\n",
"[Photo](https://www.pexels.com/photo/landscape-photography-of-desert-33147/) by [Pixabay](https://pixabay.com/en/desert-africa-bedouin-footprints-1007157/) / [CC0](https://www.pexels.com/creative-commons-images/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Motivation\n",
"\n",
"This paragraph [in the Python docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n",
"\n",
"> If a class does not define an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method it should not define a [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__) operation either; if it defines [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) but not [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).\n",
"\n",
"..._why_?\n",
"\n",
"This talk is a foray into the Python data model so that we understand the reasoning behind this advice. And, above everything else, just an excuse to talk about [hash tables](https://en.wikipedia.org/wiki/Hash_table)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Outline\n",
"\n",
"1. Object _equality_ versus object _identity_.\n",
"2. Rich comparison operators for our classes.\n",
"3. Python dictionaries and sets are _hash tables_.\n",
"4. Fundamentals of hash tables, $\\mathcal{O}(1)$ for the win.\n",
"5. The properties of a good hash function.\n",
"6. Implementing our own, full-fledged hash table.\n",
"7. Revising the initial paragraph — we're now ready.\n",
"8. Understanding —and remembering— why.\n",
"\n",
"_On your marks, get set, go!_"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": [
"alternative:subslide"
]
},
"source": [
"# Outline\n",
"\n",
"1. Python dictionaries and sets are _hash tables_.\n",
"2. Fundamentals of hash tables, $\\mathcal{O}(1)$ for the win.\n",
"3. The properties of a good hash function.\n",
"4. Implementing our own, full-fledged hash table.\n",
"5. Revising the initial paragraph — we're now ready.\n",
"6. Understanding —and remembering— why.\n",
"\n",
"_On your marks, get set, go!_"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Object _equality_ versus object _identity_"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# `==` tests for _equality_\n",
"\n",
"The `==` operator is the object **value comparison** operator.\n",
"\n",
"It tests whether two objects have the same _value_."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = [2, 3]\n",
"y = [2, 3]\n",
"\n",
"x == y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Both variables have the same value, so the result of the comparison is `True`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## `__eq__()`\n",
"\n",
"In the [Python data model](https://docs.python.org/3/reference/datamodel.html), equality is implemented via [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and its counterpart, [\\_\\_ne\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__ne__)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# `is` tests for _identity_\n",
"\n",
"The `is` operator, on the other hand, is the object **identity comparison** operator.\n",
"\n",
"It tests whether two objects are the _same_."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = [2, 3]\n",
"y = [2, 3]\n",
"\n",
"x is y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Although `x` and `y` have the same value, they are different objects, so the comparison with is returns `False`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## `id()`\n",
"\n",
"Identity is checked by comparing the built-in function [`id()`](https://docs.python.org/3/library/functions.html#id). It returns the \"identity\" of an object: an integer guaranteed to be unique and constant for each object during its lifetime."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116482324168"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = []\n",
"id(x)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116482324488"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = []\n",
"id(y)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x is y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"In [CPython](https://github.com/python/cpython) this number is in fact the address of the object in memory."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# However...\n",
"\n",
"Also known as \"Trivia Facts, Part I\".\n",
"\n",
"\n",
"## Integers\n",
"\n",
"\n",
"What we just said doesn't work as we might expect with integers."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"94149475332416"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = 23\n",
"id(x)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"94149475332416"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = 23\n",
"id(x)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x is y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"What? But... they're different objects... _right_?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"And to make things even worse:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116482290160"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = 581\n",
"id(a)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116482290224"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b = 581\n",
"id(b)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a is b"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"The difference is that the current implementation (of, again, CPython) keeps an array of integer objects for **all integers between −5 and 256**. When we create an integer in that range what we are actually getting is just a reference back to that existing object.\n",
"\n",
"In other words: in CPython, integers between −5 and 256 are [singletons](https://en.wikipedia.org/wiki/Singleton_pattern).\n",
"\n",
"This is an _implementation detail_. For example, until February 2006 this happened only for values [−5, 99]. It's undocumented for a reason. Never rely on this!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## Strings\n",
"\n",
"The same behavior may be exhibited by small strings literals."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116481858392"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = \"dog\"\n",
"id(x)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116481858392"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = \"dog\"\n",
"id(y)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x is y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"These strings are _interned_."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# String interning\n",
"\n",
"From [Wikipedia](https://en.wikipedia.org/wiki/String_interning) \\[emphasis added\\]:\n",
"\n",
"> In computer science, string interning is a method of **storing only one copy of each distinct string value**, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = \"\".join([\"d\", \"o\", \"g\"])\n",
"y = \"dog\"\n",
"x is y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Why the difference? Note that we said earlier that this behavior is exhibited for _literals_. Here, `x` is an expression. Because Python does not know its value until runtime, `x` is not interned and therefore it's a different object than `y`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"We can force a string to be interned, however, using [`sys.intern()`](https://docs.python.org/3.2/library/sys.html#sys.intern)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sys\n",
"\n",
"x = sys.intern(\"\".join([\"a\", \"b\", \"c\"]))\n",
"y = \"abc\"\n",
"\n",
"x is y"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# How is this useful?\n",
"\n",
"From the [Python docs](https://docs.python.org/3/library/sys.html#sys.intern):\n",
"\n",
"> Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# A Python class of our own\n",
"\n",
"Let's move beyond built-in types and define our own class, `Circle`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Circle(x=2, y=3, radius=5)\n"
]
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __repr__(self):\n",
" return \"Circle(x={}, y={}, radius={})\".format(\n",
" self.x, self.y, self.radius)\n",
"\n",
"\n",
"c = Circle(2, 3, 5)\n",
"print(c)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"To avoid hard-coding the name of the class, we can use instead:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __repr__(self):\n",
" return \"{}(x={}, y={}, radius={})\".format(\n",
" type(self).__name__, # note this\n",
" self.x,\n",
" self.y,\n",
" self.radius)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# However...\n",
"\n",
"The comparison between the instances of our class is broken!"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = Circle(2, 3, 5)\n",
"c2 = Circle(2, 3, 5)\n",
"c1 == c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"We didn't expect this — the _values_ are the same, so they should compare equal."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"True\n",
"True\n"
]
}
],
"source": [
"print(c1.x == c2.x)\n",
"print(c1.y == c2.y)\n",
"print(c1.radius == c2.radius)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# We didn't define `__eq__()`\n",
"\n",
"From the [Python docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n",
"\n",
"> User-defined classes have \\[a\\] [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) [method] by default; with \\[it\\], all objects compare unequal (except with themselves) \\[...\\]\n",
"\n",
"In other words, if we don't define [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) by default it uses `is`.\n",
"\n",
"And `c1` and `c2` are different objects."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 is c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Rich comparison methods\n",
"\n",
"\n",
"For our classes to be fully comparable, they must implement the \"_rich_\" comparison methods.\n",
"\n",
"Each comparison operation calls one of these methods. The correspondences are:\n",
"\n",
"- [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__lt__) $\\rightarrow$ `x < y`\n",
"- [`__le__()`](https://docs.python.org/3/reference/datamodel.html#object.__le__) $\\rightarrow$ `x <= y`\n",
"- [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) $\\rightarrow$ `x == y`\n",
"- [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) $\\rightarrow$ `x != y` \n",
"- [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) $\\rightarrow$ `x > y`\n",
"- [`__ge__()`](https://docs.python.org/3/reference/datamodel.html#object.__ge__) $\\rightarrow$ `x >= y`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"For example, so that we can use `==` with our class:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
" \n",
" \n",
"c1 = Circle(2, 3, 5)\n",
"c2 = Circle(2, 3, 5)\n",
"c1 == c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# `bool()`\n",
"\n",
"We _could_ return anything, as Python will call `bool()` to determine the truth value."
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"'Yes, they are equal!'"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" if self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius:\n",
" return \"Yes, they are equal!\"\n",
" else:\n",
" return \"Nope, they're different\"\n",
" \n",
" \n",
"c1 = Circle(2, 3, 5)\n",
"c2 = Circle(2, 3, 5)\n",
"c1 == c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"As the condition of an `if` statement:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"I already told you, they're equal!\n"
]
}
],
"source": [
"if c1 == c2:\n",
" print(\"I already told you, they're equal!\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"But the truth value of _any_ non-empty string is `True`:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"c1 = Circle(1, 3, 9)\n",
"c2 = Circle(2, 5, 7)\n",
"\n",
"if c1 == c2: # bool(\"Nope, they're different\")\n",
" print(\"These circles are equal too!\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Dont' do this. Return `True` or `False`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# `NotImplemented`\n",
"\n",
"- If the comparison is not defined, we return the singleton [`NotImplemented`](https://docs.python.org/3/library/constants.html#NotImplemented).\n",
"- It's a singleton, _not_ an exception!\n",
"- This makes Python understand and complain that the two objects cannot be compared."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unorderable types: Circle() < Circle()",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mc1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mc2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mc1\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mc2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unorderable types: Circle() < Circle()"
]
}
],
"source": [
"c1 = Circle(2, 3, 5)\n",
"c2 = Circle(2, 3, 5)\n",
"c1 < c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"The `<` operator calls [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__lt__), but we haven't defined it. Python returns [`NotImplemented`](https://docs.python.org/3/library/constants.html#NotImplemented), which in turn raises [TypeError](https://docs.python.org/3/library/exceptions.html#TypeError).\n",
"\n",
"Reminder: we *exclusively* have a [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) by default, comparing equal only to ourselves."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Returning `NotImplemented` ourselves\n",
"\n",
"If we want to make sure that the object is only compared to another object of the same type:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" if type(other) != type(self):\n",
" return NotImplemented\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
" \n",
" \n",
"c1 = Circle(2, 3, 5)\n",
"c1 == 5"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"To account and allow for subclasses:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" if not isinstance(other, type(self)): # instead of type() == type()\n",
" return NotImplemented\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Reflections\n",
"\n",
"## `__lt__()` and `__gt__()`\n",
"\n",
"Let's say we want to sort our `Circle` object by increasing area."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __lt__(self, other):\n",
" return self.radius < other.radius"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"It works as we would expect:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = Circle(2, 5, 13)\n",
"c2 = Circle(3, 7, 11)\n",
"c1 < c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"However..."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 > c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"How?! We didn't define [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__), but Python is not complaining."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"From the [Python docs](https://docs.python.org/3/reference/datamodel.html#object.__gt__):\n",
"\n",
"> There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__lt__) and [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) are each other’s reflection [...]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"This is what happened here: `Circle` doesn't define [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__), so..."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"NotImplemented"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1.__gt__(c2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"... was translated into:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c2.__lt__(c1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Technically speaking, we fallback to the right operand’s reflected method."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Let's verify it via the good ol' [printf debugging](https://en.wikipedia.org/wiki/Debugging#Techniques) technique."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"called Circle(x=3, y=7, radius=11) < Circle(x=2, y=5, radius=13)\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __repr__(self):\n",
" return \"{}(x={}, y={}, radius={})\".format(\n",
" type(self).__name__, self.x, self.y, self.radius)\n",
" \n",
" def __lt__(self, other):\n",
" print(\"called {} < {}\".format(self, other)) # we added this\n",
" return self.radius < other.radius\n",
"\n",
"\n",
"c1 = Circle(2, 5, 13)\n",
"c2 = Circle(3, 7, 11)\n",
"c1 > c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"So, indeed, Python resorted to the reflection of the method we have."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## `__le__()` and `__ge__()`\n",
"\n",
"The same happens for [`__le__()`](https://docs.python.org/3/reference/datamodel.html#object.__le__) and [`__ge__()`](https://docs.python.org/3/reference/datamodel.html#object.__ge__)."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"called Circle(x=3, y=7, radius=13) >= Circle(x=2, y=5, radius=11)\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __repr__(self):\n",
" return \"{}(x={}, y={}, radius={})\".format(\n",
" type(self).__name__, self.x, self.y, self.radius)\n",
" \n",
" def __ge__(self, other):\n",
" print(\"called {} >= {}\".format(self, other)) # more printf debugging\n",
" return self.radius >= other.radius\n",
"\n",
"\n",
"c1 = Circle(2, 5, 11)\n",
"c2 = Circle(3, 7, 13)\n",
"c1 <= c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## `__eq__()` and `__ne__()`\n",
"\n",
"Finally, [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) are *their own reflection*.\n",
"\n",
"This means that if `x.__eq__(y)` is not defined, we try to use `y.__eq__(x)`.\n",
"\n",
"Don't see why it's useful? You're not alone."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Let's go back to an earlier example."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"class Circle:\n",
" \"\"\"A regular circle.\"\"\"\n",
"\n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __repr__(self):\n",
" return \"{}(x={}, y={}, radius={})\".format(\n",
" type(self).__name__, self.x, self.y, self.radius)\n",
" \n",
" def __eq__(self, other):\n",
" print(\"calling {} == {}\".format(self, other))\n",
" if not isinstance(other, type(self)): # we allow for subclasses\n",
" return NotImplemented\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
"\n",
"\n",
"class PremiumCircle(Circle): \n",
" \"A luxury Circle, with high quality values.\"\n",
" pass # it's the exact same thing, of course"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Focus your attention on `__eq__()`:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"def __eq__(self, other):\n",
" if not isinstance(other, type(self)): # we allow for subclasses\n",
" return NotImplemented\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"```python\n",
"if not isinstance(other, type(self)): # we allow for subclasses\n",
" return NotImplemented\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"This means that `__eq__()` would only work if the subclass is the _right_ side of `==` (and, therefore, binds to `other`), because..."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"issubclass(PremiumCircle, Circle)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"... but..."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"issubclass(Circle, PremiumCircle)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"... so `__eq__()` will return `NotImplemented`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"However..."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"calling PremiumCircle(x=4, y=5, radius=8) == Circle(x=2, y=3, radius=5)\n",
"calling Circle(x=2, y=3, radius=5) == PremiumCircle(x=4, y=5, radius=8)\n"
]
},
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c1 = PremiumCircle(4, 5, 8)\n",
"c2 = Circle(2, 3, 5)\n",
"c1 == c2 # subclass on the *left* side!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Works! This is, again, because of reflection. Note how we called `__eq__()` twice.\n",
"\n",
"`PremiumCircle.__eq__(Circle)` returns `NotImplemented`, because `PremiumCircle` is not a subclass of `Circle`. Python uses reflection to try the opposite: `Circle.__eq__(PremiumCircle)`: this comparison can take place because `PremiumCircle` *is* a subclass of Circle."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"The reflection of [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) with themseves allows us not having to worry about whether `self` is a subclass of `other` or the other way around.\n",
"\n",
"From the [docs of NotImplemented](https://docs.python.org/3/library/constants.html#NotImplemented) \\[emphasis added\\]:\n",
"\n",
"> When a binary (or in-place) method returns `NotImplemented` **the interpreter will try the reflected operation on the other type** (or some other fallback, depending on the operator). If all attempts return `NotImplemented`, the interpreter will raise an appropriate exception [...]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Implied relationships\n",
"\n",
"By default, [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) delegates to [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and inverts the result unless it is `NotImplemented`.\n",
"\n",
"This means that defining [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) gives us access to `!=` for free."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
"\n",
" def __eq__(self, other):\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
" \n",
"\n",
"c1 = Circle(2, 5, 11)\n",
"c2 = Circle(3, 7, 13)\n",
"c1 != c2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## And nothing else\n",
"\n",
"There are **no other** implied relationships among the comparison operators.\n",
"\n",
"Python cannot do smart things like figuring out that if `x >= y` and `x != y`, then if follows that `x > y`.\n",
"\n",
"Humans understand that if (a) `x` is greater than or equal to `y` and (b) they're not equal, that means that `x` must be greater than `y`. Python cannot."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unorderable types: Circle() > Circle()",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0mc1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m11\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0mc2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m13\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0mc1\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mc2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unorderable types: Circle() > Circle()"
]
}
],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
" \n",
" def __ge__(self, other):\n",
" return self.radius >= other.radius\n",
" \n",
" \n",
"c1 = Circle(2, 5, 11)\n",
"c2 = Circle(3, 7, 13)\n",
"c1 > c2 "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# So, all of them?\n",
"\n",
"So does this mean the in practice we need to define _all_ the comparison methods?"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x, self.y, self.radius = x, y, radius\n",
"\n",
" def __eq__(self, other):\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
" \n",
" def __ne__(self, other):\n",
" return not self == other\n",
" \n",
" def __ge__(self, other):\n",
" return self.radius >= other.radius\n",
" \n",
" def __le__(self, other):\n",
" return self.radius <= other.radius\n",
" \n",
" def __gt__(self, other):\n",
" return self.radius > other.radius\n",
" \n",
" def __lt__(self, other):\n",
" return self.radius < other.radius"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## Not _all_, because:\n",
"\n",
"- `!=` returns the opposite of `==`, so we don't have to implement [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__).\n",
"- For [`__le__()`](https://docs.python.org/3/reference/datamodel.html#object.__le__) we can use the reflection of [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) (or vice versa).\n",
"- For [`__gt__()`](https://docs.python.org/3/reference/datamodel.html#object.__gt__) we can use the reflection of [`__lt__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) (or vice versa)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"So we can leave it in:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
"\n",
" def __ge__(self, other):\n",
" return self.radius >= other.radius\n",
" \n",
" def __gt__(self, other):\n",
" return self.radius > other.radius"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# `functools.total_ordering()`\n",
"\n",
"We can forget all these rules and let [`functools.total_ordering()`](https://docs.python.org/3/library/functools.html#functools.total_ordering) do all the work.\n",
"\n",
"It just needs [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and one more (any) comparison operator to derive the rest for us _automatically_."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"import functools\n",
"\n",
"@functools.total_ordering\n",
"class Circle:\n",
" \n",
" def __init__(self, x, y, radius):\n",
" self.x = x\n",
" self.y = y\n",
" self.radius = radius\n",
" \n",
" def __eq__(self, other):\n",
" return self.x == other.x and \\\n",
" self.y == other.y and \\\n",
" self.radius == other.radius\n",
"\n",
" def __ge__(self, other):\n",
" return self.radius >= other.radius"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"The name [total ordering](https://en.wikipedia.org/wiki/Total_order) comes from mathematics: \"binary relation on some set X, which is antisymmetric, transitive, and a connex relation\". In human-being-speak we can say that a total order is that of a set of elements in which all the possible pairs are comparable.\n",
"\n",
"The downside, from [the Python docs](https://docs.python.org/3/library/functools.html#functools.total_ordering) \\[emphasis added\\]:\n",
"\n",
"> [...] it does come at the **cost of slower execution and more complex stack traces** for the derived comparison methods. If performance benchmarking indicates this is a bottleneck for a given application, implementing all six rich comparison methods instead is likely to provide an easy speed boost."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Prehistory's `__cmp__()`\n",
"\n",
"Before Python 3, if the rich comparison methods were not implemented, Python falled back to a different magic method: [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__).\n",
"\n",
"`object.__cmp__(self, other)` would return:\n",
"\n",
"- A negative integer if `self < other`.\n",
"- Zero if `self == other`.\n",
"- a positive integer if `self > other`.\n",
"\n",
"Thus, our early ancestors could choose between implementing [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__) or the rich comparison methods — named like this to differentiate them from the more primitive alternative.\n",
"\n",
"In Python 3 [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__) no longer exists."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Wrapping up this section\n",
"\n",
"We now know how to create well behaved totally ordered classes.\n",
"\n",
"- `==` is for equality...\n",
"- ... and `is` for identity.\n",
"- The default value of [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__).\n",
"- Rich comparison operators:\n",
" * Return `NotImplemented` if comparison is undefined.\n",
" * Fallback to the right operand's reflected method.\n",
" * Have only one implied relationsip: [`__ne__()`](https://docs.python.org/3/reference/datamodel.html#object.__ne__) $\\leftrightarrow$ [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__)\n",
"- [`functools.total_ordering()`](https://docs.python.org/3/library/functools.html#functools.total_ordering) simplifies our life.\n",
"- [`__cmp__()`](https://docs.python.org/2/reference/datamodel.html#object.__cmp__) exists only in history books."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"
\n",
"\n",
"
\n",
"\n",
"[Photo](https://www.pexels.com/photo/adventure-arid-barren-dawn-206724/) by [Pixabay](https://pixabay.com/en/desert-morocco-sand-dune-dry-1748462/) | [CC0](https://www.pexels.com/creative-commons-images/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# All right, I guess — but do we _really_ care?\n",
"\n",
"We do care if we want to use our classes in a Python dictionary or a set."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Mandatory disclaimer\n",
"\n",
"Everything to be known about dictionaries was said by [Brandon Rhodes](https://rhodesmill.org/brandon/) back in 2010.\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Still useful\n",
"\n",
"Python may use a different strategy, but the foundations are the same. We need:\n",
"\n",
"1. The hash to find the corresponding slot (also known as _bucket_).\n",
"2. The element itself to compare (`==`) and tell it apart from other elements what may have collided with it."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Our hash table is really nice! Let's add functionality to loop over the elements:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Now with `items()`!"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bear -> Bär\n",
"dog -> Hund\n"
]
}
],
"source": [
"@add_to(HashTable)\n",
"def items(self):\n",
" for slot in self._slots:\n",
" for entry in slot:\n",
" yield entry\n",
" \n",
"\n",
"table = HashTable()\n",
"table[\"dog\"] = \"Hund\"\n",
"table[\"bear\"] = \"Bär\"\n",
"\n",
"for key, value in table.items():\n",
" print(key, \"->\", value)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Alternatively, in a single line:"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"import itertools\n",
"\n",
"@add_to(HashTable)\n",
"def items(self):\n",
" yield from itertools.chain.from_iterable(self._slots)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Resizing\n",
"\n",
"Our hash table is _almost_ perfect! The only downside:\n",
"\n",
"We only have *eight* slots! Sooner than later it's going to get full and performance start degrading.\n",
"\n",
"The solution is to resize, also known as _rehashing_.\n",
"\n",
"- Allocate a new, _larger_ underlying data structure.\n",
"- Remove each entry from old `slots`...\n",
"- ... and insert it into the new one.\n",
"- After _all_ entries are removed from the old `slots`, we no longer need it.\n",
"- From now on, use the new slots data structure.\n",
"- Takes forever, but [amortized cost](https://en.wikipedia.org/wiki/Amortized_analysis) is $\\mathcal{O}(1)$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Now with dynamic resizing!"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"import collections\n",
"import itertools\n",
"\n",
"Entry = collections.namedtuple('Entry', 'key, value')\n",
"\n",
"\n",
"class HashTable:\n",
" \"\"\"A separate-chaining, dynamically resized hash table.\"\"\"\n",
" \n",
" def __init__(self, initial_size=8, load_factor=0.8):\n",
" self._slots = [list() for _ in range(initial_size)]\n",
" self._size = 0\n",
" self._load_factor = load_factor \n",
" \n",
" def __len__(self):\n",
" return self._size\n",
" \n",
" @staticmethod\n",
" def _get_index(key, slots):\n",
" return hash(key) % len(slots)\n",
" \n",
" @classmethod\n",
" def _insert(cls, slots, key, value):\n",
" index = cls._get_index(key, slots)\n",
" slots[index] = value"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"@add_to(HashTable)\n",
"def __setitem__(self, key, value):\n",
" slot_index = self._get_index(key, self._slots)\n",
" slot = self._slots[slot_index]\n",
" for entry_index, entry in enumerate(slot):\n",
" if entry.key == key: # note the equality comparison here...\n",
" slot[entry_index] = entry._replace(value=value)\n",
" break\n",
" else:\n",
" entry = Entry(key, value)\n",
" self._slots[slot_index].append(entry)\n",
" self._size += 1\n",
"\n",
" if len(self) / len(self._slots) >= self._load_factor:\n",
" self._rehash() "
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"@add_to(HashTable)\n",
"def _rehash(self):\n",
" new_slots = [list() for _ in range(len(self._slots) * 2)]\n",
" for key, value in self.items():\n",
" HashTable._insert(new_slots, key, value)\n",
" self._slots = new_slots"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"# The remaining methods; not worth showing on the slides, as they're basically the same ones\n",
"# we implemented earlier, but they're included here for the sake of completeness --- this is\n",
"# the last version of our hash table implementation.\n",
"\n",
"@add_to(HashTable) \n",
"def __getitem__(self, key):\n",
" index = self._get_index(key, self._slots)\n",
" for entry in self._slots[index]:\n",
" if entry.key == key: # note the equality comparison here...\n",
" return entry.value\n",
" raise KeyError(key)\n",
"\n",
"\n",
"@add_to(HashTable)\n",
"def __delitem__(self, key):\n",
" slot_index = self._get_index(key, self._slots)\n",
" slot = self._slots[slot_index]\n",
" for entry_index, entry in enumerate(slot):\n",
" if entry.key == key: # ... and here\n",
" del slot[entry_index]\n",
" break\n",
" else:\n",
" raise KeyError(key)\n",
" self._size -= 1\n",
"\n",
"\n",
"@add_to(HashTable)\n",
"def items(self):\n",
" yield from itertools.chain.from_iterable(self._slots) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Future improvements\n",
"\n",
"- Shrink when we delete too many elements (i.e., it's too empty, so make it smaller).\n",
"- Alternatives to all-at-once-rehashing:\n",
" * [Incremental resizing](https://en.wikipedia.org/wiki/Hash_table#Incremental_resizing).\n",
" * [Monotonic keys](https://en.wikipedia.org/wiki/Hash_table#Monotonic_keys).\n",
" * [Linear hashing](https://en.wikipedia.org/wiki/Hash_table#Linear_hashing)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"In the words of [Andrew Cooke](https://stackoverflow.com/users/181772/andrew-cooke) on [Stack Overflow](https://stackoverflow.com/a/10130506) \\[emphasis added\\]:\n",
"\n",
"> Therefore, the hash code only hints at the location of an object, an equality test follows to test candidate keys. To implement a membership test in a hash-table set, **the hash code gives you \"bucket\" number in which to search for the value**. However, all set items with the same hash code are in the bucket. For this, **you also need an equality test to distinguish between all candidates in the bucket**. \n",
"\n",
"That's it."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Sets\n",
"\n",
"What we learned is also useful to understand [how Python sets work](https://groups.google.com/forum/#!topic/comp.lang.python/4-nf21y-M8g):\n",
"\n",
"- They are, in essence, a hash table with dummy values.\n",
"- In other words: the members of the set are the keys of the hash table.\n",
"- Plus, of course, optimizations to exploit this lack of values."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Wrapping up this section\n",
"\n",
"- We implemented our own hash table.\n",
" * Need to store both keys and values.\n",
" * Handled collisions resolution via separate chaining.\n",
" * Dynamic resizing when it's too full.\n",
"- Python dictionaries have a much more complex implementation.\n",
"- Sets are in essence a hash table with dummy values."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": [
"alternative:slide"
]
},
"source": [
"# Wrapping up this section\n",
"\n",
"- We implemented our own hash table.\n",
" * Need to store both keys and values.\n",
" * Handled collisions resolution via separate chaining.\n",
"- Python dictionaries have a much more complex implementation.\n",
"- Sets are in essence a hash table with dummy values."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"
\n",
"\n",
"
\n",
"\n",
"[Photo](https://www.pexels.com/photo/silhouette-photography-of-desert-998649/) by [Francesco Ungaro](https://www.pexels.com/@ghostpresenter) | [Pexels License](https://www.pexels.com/photo-license/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# We're back\n",
"\n",
"We're —finally— ready to revisit [the paragraph](https://docs.python.org/3/reference/datamodel.html#object.__hash__) from the beginning:\n",
"\n",
"> If a class does not define an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method it should not define a [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__) operation either; if it defines [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) but not [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).\n",
"\n",
"We now have the tools to understand —and remember— _why_."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Primo\n",
"\n",
"> If a class does not define an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method it should not define a [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__) operation either; [...]\n",
"\n",
"Rephrased: defining [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) but _not_ [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) is wrong. Why?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Let's think of a hash table look up:\n",
"\n",
"- Python uses [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) to compute the slot where the value goes.\n",
"- This gives us the \"_bucket_\" in which to search for our value.\n",
"- Now we need the equality test to distingush between all candidates in the bucket.\n",
"- We didn't implement [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), so by default our objects compare unequal except with themselves.\n",
"- Therefore, **we cannot find our entry**.\n",
"\n",
"See [this Stack Overflow question](https://stackoverflow.com/q/42061019) for an example of somebody who ran into this."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## An example that works\n",
"\n",
"In our implementation of a hash table, if we look up `\"cat\"`:"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"table = [\n",
" [], # 0\n",
" [Entry(key='bear', value='Bär')], # 1\n",
" [], # 2\n",
" [Entry(key='dog', value='Hund'), Entry(key='cat', value='Katze')], # 3\n",
" [], # 4\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Python uses `str.__hash__()` and determines that `index == 3`.\n",
"* There are two different entries there, so I need to compare for equality.\n",
"* Python uses `str.__eq__()` to compare `\"cat\"` to `\"dog\"` $\\rightarrow$ `False`.\n",
"* Second entry: `\"cat\" == \"cat\"` $\\rightarrow$ `True`. We found our entry."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## An example that doesn't work.\n",
"\n",
"If we have [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) but not [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), we'll land in the right slot but won't be able to find our value."
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 127,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Cyborg:\n",
"\n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
" \n",
" def __hash__(self):\n",
" return hash((self.name, self.weapon))\n",
"\n",
"\n",
"robot = Cyborg('T-1000', 'Franchi SPAS-12')\n",
"cyborgs = set()\n",
"cyborgs.add(robot)\n",
"robot in cyborgs"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"However.."
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 128,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cyborgs = {Cyborg('T-1000', 'Franchi SPAS-12')}\n",
"Cyborg('T-1000', 'Franchi SPAS-12') in cyborgs # doesn't work!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), these objects only compare equal to _themselves_."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"As we saw earlier for `Point`..."
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:subslide"
]
},
"outputs": [],
"source": [
"robot1 = Cyborg('T-800', 'M79 grenade launcher')\n",
"robot2 = Cyborg('T-800', 'M79 grenade launcher')"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-7726681088492938838"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(robot1)"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-7726681088492938838"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(robot2)"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot1 == robot2"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Secondo\n",
"\n",
"> [...] if [a class] defines [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) but not [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), its instances will not be usable as items in hashable collections.\n",
"\n",
"This is —now— almost trivial: if we don't have [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), we cannot know in which slot the value is."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Indeed, Python won't even let us compute their hash."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'Cyborg'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-850'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Ithaca 37'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'"
]
}
],
"source": [
"class Cyborg:\n",
"\n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
" \n",
" def __eq__(self, other):\n",
" return self.name == other.name and self.weapon == other.weapon\n",
"\n",
"\n",
"robot = Cyborg('T-850', 'Ithaca 37')\n",
"hash(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## But wait\n",
"\n",
"However, we saw earlier that by default both [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) and [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) return _something_.\n",
"\n",
"Indeed, if we don't define _none_ of them Python doesn't complain."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8744344590021"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class Cyborg:\n",
"\n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
"\n",
"\n",
"robot = Cyborg('T-850', 'Ithaca 37')\n",
"hash(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Making it unhashable\n",
"\n",
"How do we _not_ define [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__)? By setting it to `None`.\n",
"\n",
"From [the docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n",
"\n",
"> When the [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) method of a class is `None`, instances of the class will raise an appropriate [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking `isinstance(obj, collections.abc.Hashable)`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## An example"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'Cyborg'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-850'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Ithaca 37'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'"
]
}
],
"source": [
"class Cyborg:\n",
"\n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
" \n",
" __hash__ = None\n",
"\n",
"\n",
"robot = Cyborg('T-850', 'Ithaca 37')\n",
"hash(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Also:"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'Cyborg'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mbest_friend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mbest_friend\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"Sarah Connor\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'"
]
}
],
"source": [
"best_friend = dict()\n",
"best_friend[robot] = \"Sarah Connor\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"And of course:"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'Cyborg'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mrobot_army\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mrobot_army\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'"
]
}
],
"source": [
"robot_army = set()\n",
"robot_army.add(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"All of them fail because our class is not hashable."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Getting It Wrong, Part I\n",
"\n",
"From [the docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n",
"\n",
"> If a class that does not override [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__) wishes to suppress hash support, it should include `__hash__ = None` in the class definition.\n",
"\n",
"Note that _returning_ `None` does not work."
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"ename": "TypeError",
"evalue": "__hash__ method should return an integer",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-1000'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Franchi SPAS-12'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: __hash__ method should return an integer"
]
}
],
"source": [
"class Cyborg:\n",
" \n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
"\n",
" def __hash__(self):\n",
" return None\n",
"\n",
"\n",
"robot = Cyborg('T-1000', 'Franchi SPAS-12')\n",
"hash(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Getting It Wrong, Part II\n",
"\n",
"The docs also [warn us](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n",
"\n",
"> A class which defines its own [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) that explicitly raises a [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) would be incorrectly identified as hashable by an `isinstance(obj, collections.abc.Hashable)` call."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## Abstract base classes\n",
"\n",
"[`collections.abc.Hashable`](https://docs.python.org/3/library/collections.abc.html#collections.abc.Hashable) is an [abstract base class](https://docs.python.org/3/library/abc.html). We use them\n",
"\n",
">[... ] to test whether a class provides a particular interface; for example, whether it is hashable or whether it is a mapping."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## Example 1\n",
"\n",
"Is this variable iterable?"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"word = \"abc\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"We could try to loop over it and catch the error..."
]
},
{
"cell_type": "code",
"execution_count": 138,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 138,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def is_iterable(obj):\n",
" try:\n",
" for item in obj:\n",
" # can exit immediately, just want to check it doesn't fail\n",
" return True\n",
" except TypeError:\n",
" return False\n",
"\n",
"\n",
"is_iterable(word)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"... or instead use:"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 139,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import collections.abc\n",
"\n",
"isinstance(word, collections.abc.Iterable)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## Example 2\n",
"\n",
"Does this variable have a size? "
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"t = (x ** 2 for x in [4, 5, 6, 8, 9])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"We could try to call [`len()`](https://docs.python.org/3/library/functions.html#len)..."
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 141,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def is_sized(obj):\n",
" try:\n",
" len(obj)\n",
" return True\n",
" except TypeError:\n",
" return False\n",
"\n",
"is_sized(t)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Alternatively, we could check whether the object `hasattr(obj, '__len__')`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"... or instead:"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 142,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(t, collections.abc.Sized)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"## Example 3\n",
"\n",
"In the same way, to check whether a variable is hashable we could just try..."
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 143,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"word = \"Katze\"\n",
"\n",
"def is_hashable(obj):\n",
" try:\n",
" hash(obj)\n",
" return True\n",
" except TypeError:\n",
" return False\n",
"\n",
"\n",
"is_hashable(word)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"... or instead:"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 144,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(word, collections.abc.Hashable)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Don't raise `TypeError`\n",
"\n",
"The docs warn us that raising [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError) ourselves will not work."
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"ename": "TypeError",
"evalue": "class Cyborg is not hashable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-800'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'M79 grenade launcher'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m\u001b[0m in \u001b[0;36m__hash__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__hash__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"class Cyborg is not hashable\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: class Cyborg is not hashable"
]
}
],
"source": [
"class Cyborg:\n",
" \n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
"\n",
" def __hash__(self):\n",
" raise TypeError(\"class Cyborg is not hashable\")\n",
"\n",
"\n",
"robot = Cyborg('T-800', 'M79 grenade launcher')\n",
"hash(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"This was expected. However..."
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 146,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"isinstance(robot, collections.abc.Hashable)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Wrong! We broke it!\n",
"\n",
"Use `__hash__ = None`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Terzo\n",
"\n",
"And finally:\n",
"\n",
"> [...] If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).\n",
"\n",
"This takes us to our final point: the hashability of _mutable_ objects."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Hashing mutable objects.\n",
"\n",
"Well, we can't.\n",
"\n",
"Tuples are hashable..."
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2528502973977326415"
]
},
"execution_count": 147,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash((1, 2, 3))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"... but lists aren't."
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'list'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mhash\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m9\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'"
]
}
],
"source": [
"hash([5, 7, 9])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# I'd never try to do that!\n",
"\n",
"Perhaps we've never tried calling [`hash()`](https://docs.python.org/3/library/functions.html#hash) explicitly for a list, but we may have run into this:\n",
"\n",
"Say we want to map two points $(x, y)$, to the result of ${x}^{y}$."
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'list'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0msquares\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0msquares\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m**\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'"
]
}
],
"source": [
"squares = dict()\n",
"squares[[2, 7]] = 2 ** 7"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Same error! It's unhashable."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For this we need to use tuples..."
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{(2, 7): 128}"
]
},
"execution_count": 150,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"squares = dict()\n",
"squares[(2, 7)] = 2 ** 7\n",
"squares"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... or perhaps a two-level [`defaultdict`](https://docs.python.org/3/library/collections.html#collections.defaultdict):"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"defaultdict(dict, {2: {7: 128}})"
]
},
"execution_count": 151,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import collections\n",
"\n",
"squares = collections.defaultdict(dict)\n",
"squares[2][7] = 2 ** 7\n",
"squares"
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"128"
]
},
"execution_count": 152,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"squares[2][7]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# But I _really_ want to hash a list\n",
"\n",
"Let's imagine a list were hashable and we used it as a key in a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {},
"outputs": [],
"source": [
"x = [2, 3, 5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Python would use [`__hash__()`](https://docs.python.org/3/library/functions.html#hash) to compute the slot where the list goes.\n",
"- We would go to that slot and store the list there, as usual."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Later on, we mutate our list, say:"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2, 7, 5]"
]
},
"execution_count": 154,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[1] = 7\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"- Now we want to loop up `x` in the dictionary.\n",
"- Python would use [`__hash__()`](https://docs.python.org/3/library/functions.html#hash) and get a _different_ value, hence a _different_ slot.\n",
"- Even if (by chance) we ended up in the same slot, we would need the equality test to distinguish between all candidates in the bucket: old and new `x` are different, so they will compare unequal.\n",
"- We can never find our list in the dictionary!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# A hashable mutable class — sadness ensues\n",
"\n",
"- Let's implement a `Cyborg` class.\n",
"- We'll implement [`__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__)...\n",
"- ... but _also_ implement [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__).\n",
"\n",
"What could go wrong?"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"import functools\n",
"\n",
"@functools.total_ordering\n",
"class Cyborg:\n",
" \n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
" \n",
" def __repr__(self):\n",
" return \"Cyborg({}, {})\".format(self.name, self.weapon)\n",
" \n",
" def __eq__(self, other):\n",
" return self.name == other.name and self.weapon == other.weapon\n",
" \n",
" def __lt__(self, other):\n",
" if self.name < other.name:\n",
" return True\n",
" return self.weapon < other.weapon"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": [
"alternative:subslide"
]
},
"outputs": [],
"source": [
"\n",
"class Cyborg:\n",
" \n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
" \n",
" def __repr__(self):\n",
" return \"Cyborg({}, {})\".format(self.name, self.weapon)\n",
" \n",
" def __eq__(self, other):\n",
" return self.name == other.name and self.weapon == other.weapon\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Our hash function combines both attributes in a tuple. Both are strings, so immutable, so hashable."
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"@add_to(Cyborg) \n",
"def __hash__(self):\n",
" return hash((self.name, self.weapon))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We really shouldn't be doing this..."
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"robot1 = Cyborg('T-800', 'Franchi SPAS-12')\n",
"robot2 = Cyborg('T-800', 'MP5 Submachine gun')"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot1 == robot2"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot1 >= robot2"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-2292265897856400339"
]
},
"execution_count": 160,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(robot1)"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"6957369841330577170"
]
},
"execution_count": 161,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(robot2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the problem here?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Our class is mutable!\n",
"\n",
"This will break!"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-2292265897856400339"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot = Cyborg('T-800', 'Franchi SPAS-12')\n",
"hash(robot)"
]
},
{
"cell_type": "code",
"execution_count": 163,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'John Connor'"
]
},
"execution_count": 163,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"target = dict()\n",
"target[robot] = \"John Connor\"\n",
"target[robot]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"So far this works, but..."
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4054157808228906460"
]
},
"execution_count": 164,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot.weapon = 'M79 grenade launcher'\n",
"hash(robot)"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "Cyborg(T-800, M79 grenade launcher)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtarget\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m: Cyborg(T-800, M79 grenade launcher)"
]
}
],
"source": [
"target[robot]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We broke it!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"That's what the docs were explaining when they told us \\[emphasis added\\]:\n",
"\n",
"> [...] If a class defines mutable objects and implements an [\\_\\_eq\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__eq__) method, it should not implement [\\_\\_hash\\_\\_()](https://docs.python.org/3/reference/datamodel.html#object.__hash__), since the implementation of hashable collections requires that a key’s hash value is immutable (**if the object’s hash value changes, it will be in the wrong hash bucket**)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Dictionaries store _references_\n",
"\n",
"A final comment. The dictionary (and thus also sets) store _a reference_ to our object."
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116472769168"
]
},
"execution_count": 166,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot = Cyborg('T-1000', 'Remington 870')\n",
"id(robot)"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [],
"source": [
"terminators = set()\n",
"terminators.add(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Let's retrieve the element from the set [without removing it](https://stackoverflow.com/q/59825):"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"140116472769168"
]
},
"execution_count": 168,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id(next(iter(terminators))) "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"It's not a copy — it's the same object."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"- In other words: mutating our object (which should have never been hashable)...\n",
"- ... also modifies the copy stored in the hash hable (dictionary / set!)."
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"Cyborg(T-1000, RSB-80 Plasma Gun)"
]
},
"execution_count": 169,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot.weapon = 'RSB-80 Plasma Gun'\n",
"robot"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"{Cyborg(T-1000, RSB-80 Plasma Gun)}"
]
},
"execution_count": 170,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terminators"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"This implies that if our hashable mutated object landed (by chance) in the same slot as it originally did, we would still be able to find it."
]
},
{
"cell_type": "code",
"execution_count": 171,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original hash: 2225710126195077769\n"
]
}
],
"source": [
"robot = Cyborg('T-1000', 'Remington 870')\n",
"\n",
"terminators = set()\n",
"terminators.add(robot)\n",
"original_hash = hash(robot)\n",
"print(\"Original hash:\", original_hash)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"Let's mutate the object but make sure it goes to the same slot:"
]
},
{
"cell_type": "code",
"execution_count": 172,
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mutated hash: 2225710126195077769\n"
]
}
],
"source": [
"robot.weapon = 'RSB-80 Plasma Gun'\n",
"Cyborg.__hash__ = lambda x: original_hash\n",
"print(\"Mutated hash: \", hash(robot))"
]
},
{
"cell_type": "code",
"execution_count": 173,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": [
"alternative:skip"
]
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 173,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robot in terminators"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": [
"alternative:skip"
]
},
"source": [
"We can find it! It went to the same bucket.\n",
"\n",
"Hashable mutable objects will break _most_ of the time, but not always."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Doing it right\n",
"\n",
"Experiments aside, let's do the right thing and make our mutable class non-hashable."
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'Cyborg'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mrobot\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCyborg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'T-1000'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'RSB-80 Plasma Gun'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mterminators\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mset\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mterminators\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrobot\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'Cyborg'"
]
}
],
"source": [
"class Cyborg:\n",
" \n",
" def __init__(self, name, weapon):\n",
" self.name = name\n",
" self.weapon = weapon\n",
" \n",
" __hash__ = None\n",
" \n",
" \n",
"robot = Cyborg('T-1000', 'RSB-80 Plasma Gun')\n",
"terminators = set()\n",
"terminators.add(robot)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [
"alternative:skip"
]
},
"source": [
"# Wrapping up this section\n",
"\n",
"We revisited and understood the intimidating paragraph from the docs.\n",
"\n",
"- If your class doesn't define `__eq__()`, don't define `__hash__()` either. \n",
"- Mutable classes should not be hashable.\n",
" * To make it non-hashable, set `__hash__ = None`.\n",
"- If your class defines `__eq__()`, implement `__hash__()` for it to be hashable. \n",
"- Abstract base classes simplify testing whether an interface is satisfied.\n",
"- Dictionaries store references, not copies of the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": [
"alternative:slide"
]
},
"source": [
"# Wrapping up this section\n",
"\n",
"We revisited and understood the intimidating paragraph from the docs.\n",
"\n",
"- If your class doesn't define `__eq__()`, don't define `__hash__()` either. \n",
"- Mutable classes should not be hashable.\n",
" * To make it non-hashable, set `__hash__ = None`.\n",
"- If your class defines `__eq__()`, implement `__hash__()` for it to be hashable. \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"