{ "cells": [ { "cell_type": "code", "execution_count": 73, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:52:59.867143Z", "start_time": "2019-01-30T15:52:58.120170Z" }, "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "import matplotlib.dates as mdates\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from datetime import date\n", "from dateutil.parser import parse\n", "history = \"\"\"December, 1989:Implementation started\n", "1990:Internal releases at CWI\n", "February 20, 1991:0.9.0 (released to alt.sources)\n", "February, 1991:0.9.1\n", "September, 1991:0.9.2\n", "December 24, 1991:0.9.4\n", "January 2, 1992:0.9.5 (Macintosh only)\n", "April 6, 1992:0.9.6\n", "January 9, 1993:0.9.8\n", "July 29, 1993:0.9.9\n", "January 26, 1994:1.0.0\n", "February 15, 1994:1.0.2\n", "May 4, 1994:1.0.3\n", "July 14, 1994:1.0.4\n", "October 11, 1994:1.1\n", "November 10, 1994:1.1.1\n", "April 13, 1995:1.2\n", "October 13, 1995:1.3\n", "October 25, 1996:1.4\n", "January 3, 1998:1.5\n", "October 31, 1998:1.5.1\n", "April 13, 1999:1.5.2\n", "September 5, 2000:1.6\n", "October 16, 2000:2.0\n", "February 25, 2001:1.6.1\n", "April 17, 2001:2.1\n", "December 21, 2001:2.2\n", "July 29, 2003:2.3\n", "November 30, 2004:2.4\n", "September 16, 2006:2.5\n", "October 1, 2008:2.6\n", "December 3, 2008:3.0\n", "June 27, 2009: 3.1\n", "July 3, 2010: 2.7\n", "February 20, 2011: 3.2\n", "September 29, 2012: 3.3\n", "March 16, 2014: 3.4\n", "September 13, 2015: 3.5\n", "December 23, 2016: 3.6\n", "June 27, 2018:3.7\n", "January 1, 2020: 2.7 EOL\"\"\"\n", "\n", "dates = []\n", "names = []\n", "for entry in history.split('\\n'):\n", "    datestr, version = entry.split(':')\n", "    dates.append(parse(datestr))\n", "    names.append(version.strip())  # some entries have a space after the colon\n", "\n", "\n", "def plot_timeline(dates, names, title, spans=()):\n", "    levels = np.array([-5, 5, -4, 4, -3, 3, -2, 2])\n", "    fig, ax = plt.subplots(figsize=(12, 5))\n", "\n", "    # Create the base line\n", "    start = 
min(dates)\n", "    stop = max(dates)\n", "    ax.plot((start, stop), (0, 0), 'k', alpha=.5)\n", "\n", "    # Iterate through releases annotating each one\n", "    for ii, (iname, idate) in enumerate(zip(names, dates)):\n", "        level = levels[ii % len(levels)]\n", "        vert = 'top' if level < 0 else 'bottom'\n", "\n", "        ax.scatter(idate, 0, s=100, facecolor='w', edgecolor='k', zorder=9999)\n", "        # Plot a line up to the text\n", "        ax.plot((idate, idate), (0, level), c='r', alpha=.7)\n", "        # Give the text a faint background and align it properly\n", "        ax.text(idate, level, iname,\n", "                horizontalalignment='right', verticalalignment=vert, fontsize=14,\n", "                backgroundcolor=(1., 1., 1., .3))\n", "    # Shade each requested span: (xmin, xmax[, ymin, ymax])\n", "    for args in spans:\n", "        ax.axvspan(*args, alpha=0.2)\n", "    ax.set(title=title)\n", "    # Set the xticks formatting\n", "    # One major tick per year, labelled with the year\n", "    ax.get_xaxis().set_major_locator(mdates.YearLocator())\n", "    ax.get_xaxis().set_major_formatter(mdates.DateFormatter(\"%Y\"))\n", "    fig.autofmt_xdate()\n", "\n", "    # Remove components for a cleaner look\n", "    plt.setp((ax.get_yticklabels() + ax.get_yticklines() +\n", "              list(ax.spines.values())), visible=False)\n", "    plt.tight_layout()\n", "    plt.savefig(f\"{title.lower().replace(' ','_')}.png\")\n", "\n", "\n", "plot_timeline(dates, names, \"Python Release Dates\",\n", "              [(parse(\"july 3, 2010\"), parse(\"january 1, 2020\")),\n", "               (parse(\"December 3, 2008\"), date.today(), 0.2, 0.8)\n", "               ])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python 3: More than just `print()`\n", "\n", "* Andrew Bolster\n", "* Threat Intelligence Data Scientist (Alert Logic)\n", "* Founding Director (Farset Labs)\n", "* Pythonista for ~10 years" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What We'll Cover\n", "\n", "* History of Python\n", "* Significant Features\n" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { 
"end_time": "2019-01-22T12:49:00.132641Z", "start_time": "2019-01-22T12:49:00.127189Z" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "## But TL;DR?\n", "\n", "* As of Python 3.7; in all but one test type (_why are you doing crypto in python?_), [*3 is 20% faster than 2*](https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b)\n", "* The language features developed make Python both performent and stable\n", "* The breaking changes between 2 and 3 were due to poor historical architectural decisions; there are no plans for breaking changes going forward\n", "* [2.7 EOL is in less than a year](https://pythonclock.org/): Most major packages have already dropped (non security) support for it, including:\n", " * Numpy\n", " * Pandas\n", " * matplotlib\n", " * dask\n", " * sympy\n", "* Of the [Top 360 most popular Python modules](http://py3readiness.org/) only one hasn't migrated to Python 3: [`apache-beam`](https://beam.apache.org/get-started/beam-overview/) (Which is a Java-first SDK anyway so stuff 'em)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Timeline\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Significant Features/Changes\n", "* `print` _let's just get that out of the way, shall we?_\n", "* Integer Division\n", "* f-strings\n", "* υηι¢σ∂є\n", "* Iterable Unpacking\n", "* Iterators, Generators, `next`s, oh my!\n", "* changes to `dict` behaviour\n", "* dataclasses" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## `print()`\n", "\n", "Probably the most obvious, contentious, but also meaningless change in py3\n" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:01:32.544310Z", "start_time": "2019-01-22T13:01:32.538414Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "Hello World!\n" ] } ], "source": [ "print(\"Hello World!\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "But it's more than just brackets;" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:01:37.906623Z", "start_time": "2019-01-22T13:01:37.899258Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World !\n" ] } ], "source": [ "print(\"Hello\", \"World\", \"!\") # Native Tuples" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:01:39.919136Z", "start_time": "2019-01-22T13:01:39.914965Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello\tWorld\t!\n" ] } ], "source": [ "print(\"Hello\", \"World\", \"!\", sep='\\t') # Custom Separators" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:01:42.219575Z", "start_time": "2019-01-22T13:01:42.214818Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World!\n" ] } ], "source": [ "print(\"Hello\", end=' ') # Tail override\n", "print(\"World!\")" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:02:34.550211Z", "start_time": "2019-01-22T13:02:34.543254Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "fatal error\n" ] } ], "source": [ "import sys\n", "print(\"fatal error\", file=sys.stderr) # Can still do piping to file handlers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Integer Division (/ vs //)\n", "\n", "`print` as a keyword vs `print()` as a function is just a bit of 
syntactic sugar to simplify the CPython API, however... some changes are more subtle and _more likely to cause non-trivial bugs when porting 'stable' code_\n", "\n", "This is one of them..." ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T12:32:01.859025Z", "start_time": "2019-01-22T12:32:01.849628Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4 / 2 = 2.0\n", "3 / 2 = 1.5\n", "4 // 2 = 2\n", "3 // 2 = 1\n", "4 / 2.0 = 2.0\n", "3 / 2.0 = 1.5\n", "3 // 2.0 = 1.0\n" ] } ], "source": [ "print('4 / 2 =', 4 / 2)\n", "print('3 / 2 =', 3 / 2)\n", "print('4 // 2 =', 4 // 2)\n", "print('3 // 2 =', 3 // 2)\n", "print('4 / 2.0 =', 4 / 2.0)\n", "print('3 / 2.0 =', 3 / 2.0)\n", "print('3 // 2.0 =', 3 // 2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Division *now* works in a way that you'd expect a duck-typed language to; i.e.\n", "\n", "* `/` _always_ returns a `float` even when it's not numerically necessary\n", "* `//` returns an `int` when both operands are `int`s and a `float` otherwise, but always with an integer value ($\\in \\mathbb{Z}$)\n" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T12:32:07.512628Z", "start_time": "2019-01-22T12:32:07.498864Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4 / 2 = <class 'float'>\n", "3 / 2 = <class 'float'>\n", "4 // 2 = <class 'int'>\n", "3 // 2 = <class 'int'>\n", "4 / 2.0 = <class 'float'>\n", "3 / 2.0 = <class 'float'>\n", "3 // 2.0 = <class 'float'>\n" ] } ], "source": [ "print('4 / 2 =', type(4 / 2))\n", "print('3 / 2 =', type(3 / 2))\n", "print('4 // 2 =', type(4 // 2))\n", "print('3 // 2 =', type(3 // 2))\n", "print('4 / 2.0 =', type(4 / 2.0))\n", "print('3 / 2.0 =', type(3 / 2.0))\n", "print('3 // 2.0 =', type(3 // 2.0))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Also note 
that this is not `round`; this is `floor` division" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T12:40:20.782290Z", "start_time": "2019-01-22T12:40:20.773670Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5 / 6 =  0.8333333333333334\n", "5 // 6 =  0\n", "‖5/6‖ =  1\n" ] } ], "source": [ "print(\"5 / 6 = \", 5/6)\n", "print(\"5 // 6 = \", 5//6)\n", "print(\"‖5/6‖ = \", round(5/6))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And if anyone is wondering about performance...\n", "\n", "(Pinch of salt: CPython constant-folds `7//2` at compile time, so the first timing flatters `//`.)\n" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T12:43:30.812714Z", "start_time": "2019-01-22T12:43:19.461666Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)\n" ] } ], "source": [ "%%timeit\n", "\n", "x = 7//2" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T12:44:53.160826Z", "start_time": "2019-01-22T12:44:53.156828Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from math import floor" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T12:45:00.845239Z", "start_time": "2019-01-22T12:44:53.954721Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84.7 ns ± 3.32 ns per loop (mean ± std. dev. 
of 7 runs, 10000000 loops each)\n" ] } ], "source": [ "%%timeit\n", "\n", "x = floor(7/2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## υηι¢σ∂є\n", "\n", "From the original 3.0 release notes:\n", "> Everything you thought you knew about binary data and Unicode has changed.\n", "\n", "* All strings (`str`) are Unicode; encoded text is stored as binary (`bytes`)\n", "* No more `u\"\"` junk\n", "* `open`s with the wrong encoding will fail **loudly**\n", "\n", "The transition is largely painless unless you're doing something really 'clever' to get around Python 2's utter failings in interacting with Unicode in sensible ways. \n", "\n", "Best of all... this isn't just a string processing change; this is fundamental to the interpreter... so..." ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:14:57.604808Z", "start_time": "2019-01-22T13:14:57.596488Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "array([-0.44807362,  0.89399666])" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from numpy import array, cos, sin\n", "\n", "\n", "def rotate(vector, angle):\n", "    θ = angle  # radians, not degrees\n", "    mat = [[cos(θ), -sin(θ)],\n", "           [sin(θ), cos(θ)]]\n", "    mat = array(mat)\n", "    return mat @ vector  # << Sneaky mat_mul operator for free too\n", "\n", "\n", "rotate([1, 0], 90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately identifiers can only be made of (Unicode) letters, digits and underscores, so no emojis as variable names, but you can go mad elsewhere:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:22:56.458846Z", "start_time": "2019-01-22T13:22:56.451679Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 2 is 💩\n", "Python 2 was 💩\n", "💩 si 2 nohtyP\n" ] 
} ], "source": [ "import emoji  # pip install emoji\n", "# use_aliases is the emoji<2.0 spelling; newer releases take language='alias'\n", "this_is_a_regular_string = emoji.emojize(\n", "    \"Python 2 is :poop:\", use_aliases=True)\n", "print(this_is_a_regular_string)\n", "print(this_is_a_regular_string.replace('is', 'was'))\n", "print(''.join(reversed(this_is_a_regular_string)))  # << Spoilers Ahead" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## f-strings (py3.6)\n", "\n", "* Replacement for `%` and `str.format()` methods\n", "* Jinja-like templating of local scope variables\n", "    * (Basically like having an interpreter inside a string)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:14:49.938548Z", "start_time": "2019-01-30T15:14:49.934398Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is a thing\n" ] } ], "source": [ "thing = 'thing'\n", "print(f\"This is a {thing}\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:14:59.801799Z", "start_time": "2019-01-30T15:14:59.797968Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is a loud THING\n" ] } ], "source": [ "print(f\"This is a loud {thing.upper()}\")" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:28:49.745385Z", "start_time": "2019-01-30T15:28:49.734304Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Andrew Bolster (30)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from datetime import date\n", "\n", "\n", "class Person:\n", "    def __init__(self, first_name: str, last_name: str, birthday: date, gender: str):\n", "        self.first_name = first_name\n", "        self.last_name = last_name\n", "        self.birthday = birthday\n", "        self.gender = gender\n", "\n", "    @property\n", "    def age(self):\n", 
" today = date.today()\n", " return today.year - self.birthday.year - ((today.month, today.day) < (self.birthday.month, self.birthday.day))\n", "\n", " def __str__(self):\n", " return f\"{self.first_name} {self.last_name}\"\n", "\n", " def __repr__(self):\n", " return f\"{self.first_name} {self.last_name} ({self.age})\"\n", "\n", "\n", "p = Person(\"Andrew\", \"Bolster\", date(1988, 5, 17), 'Male')\n", "p" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:28:57.854122Z", "start_time": "2019-01-30T15:28:57.849412Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'Andrew Bolster'" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{p}\" # Defaults to __str__" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:28:58.461623Z", "start_time": "2019-01-30T15:28:58.458100Z" } }, "outputs": [ { "data": { "text/plain": [ "'Andrew Bolster (30)'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{p!r}\" # Can be poked to use __repr__" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:30:22.531617Z", "start_time": "2019-01-30T15:30:22.524765Z" } }, "outputs": [ { "data": { "text/plain": [ "'Andrew Bolster is 30 and this is a multiline f-string'" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{p}\"\\\n", " f\" is {p.age}\"\\\n", " f\" and this is a multiline {'f-string'}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Performance wise; f-strings are _fast_\n", "About 30% faster than `%`\n", "50% faster than `.format()`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Also support standard formatting syntax" ] }, { "cell_type": "code", "execution_count": 
49, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:32:41.453011Z", "start_time": "2019-01-30T15:32:41.448888Z" } }, "outputs": [ { "data": { "text/plain": [ "3.141592653589793" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from math import pi\n", "pi" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:37:50.754745Z", "start_time": "2019-01-30T15:37:50.747244Z" } }, "outputs": [ { "data": { "text/plain": [ "'003.142'" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{pi:07.4}\" # {value:width.precision}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Iterators and stuff\n", "\n", "### `range` behaves like `xrange` used to\n", "\n", "`xrange` is dead, long live `range`\n" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:40:56.186205Z", "start_time": "2019-01-22T13:40:56.177204Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(10000, 49995000, 9999, 0)" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "span = range(10000)\n", "len(span), sum(span), max(span), min(span)" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:39:57.252348Z", "start_time": "2019-01-22T13:39:57.244898Z" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "4 in span" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`range` now returns a lazy `range` object rather than a list; elements are not populated until used" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:42:09.702243Z", "start_time": "2019-01-22T13:42:09.694114Z" } }, "outputs": [ { "data": { "text/plain": [ "100000000000000000" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stupid_span = range(int(10e16))\n", "len(stupid_span)" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:42:50.658772Z", "start_time": "2019-01-22T13:42:50.652803Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "101\n" ] } ], "source": [ "for i in stupid_span:  # Doesn't blow up memory\n", "    if i ** 2 > 10000:\n", "        break\n", "print(i)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`zip`, `map`, `filter` and a load of other built-ins now return iterators, and `dict` methods like `.keys()` return views" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:44:50.190575Z", "start_time": "2019-01-22T13:44:50.184689Z" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zip(range(5), range(5, 0, -1))" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:45:29.267369Z", "start_time": "2019-01-22T13:45:29.260768Z" } }, "outputs": [ { "data": { "text/plain": [ "{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from string import ascii_letters\n", "dict(zip(ascii_letters, range(5)))" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { 
"ExecuteTime": { "end_time": "2019-01-22T13:45:43.412319Z", "start_time": "2019-01-22T13:45:43.405548Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['a', 'b', 'c', 'd', 'e'])" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = dict(zip(ascii_letters, range(5)))\n", "d.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note; this is a 'view', not an actual list, even though it looks like it.\n", "The motivation for this is that `dict.items()` etc. in py2 produced realised-views of the values as a fully populated list. This was expensive." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Oh, BTW, `dicts` are now sorted!\n", "Previously insertion-sorting was not guaranteed; `dicts` will *always* be returned in the same order as they were inserted" ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:49:55.202126Z", "start_time": "2019-01-22T13:49:55.195646Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'b': 1, 'c': 2, 'd': 3, 'e': 4, 'a': -1}" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "del d['a']\n", "d['a'] = -1\n", "d" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, since this is a view; things don't always work how you'd imagine" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:50:41.392550Z", "start_time": "2019-01-22T13:50:41.376765Z" } }, "outputs": [ { "data": { "text/plain": [ "[4, 3, 2, 1, 0]" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(reversed(range(5)))" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "ExecuteTime": { "end_time": 
"2019-01-22T13:50:50.477168Z", "start_time": "2019-01-22T13:50:50.460473Z" } }, "outputs": [ { "ename": "TypeError", "evalue": "'dict_keys' object is not reversible", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mreversed\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0md\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'dict_keys' object is not reversible" ] } ], "source": [ "reversed(d.keys())" ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:51:15.731120Z", "start_time": "2019-01-22T13:51:15.725230Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['a', 'e', 'd', 'c', 'b']" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(reversed(list(d.keys()))) # need to instantiate the view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The downside to all of this is that sometimes your code will be peppered with `list`s..." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Advanced Unpacking\n" ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:54:19.839626Z", "start_time": "2019-01-22T13:54:19.834510Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 [1, 2, 3, 4, 5, 6, 7]\n", "1 [2, 3, 4, 5, 6, 7]\n", "2 [3, 4, 5, 6, 7]\n", "3 [4, 5, 6, 7]\n", "4 [5, 6, 7]\n", "5 [6, 7]\n", "6 [7]\n", "7 []\n" ] } ], "source": [ "# First with sensible lists:\n", "values = [0, 1, 2, 3, 4, 5, 6, 7]\n", "while values:\n", "    first, *values = values\n", "    print(first, values)" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "ExecuteTime": { "end_time": "2019-01-22T13:54:11.231558Z", "start_time": "2019-01-22T13:54:11.225678Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 7 [1, 2, 3, 4, 5, 6]\n", "1 6 [2, 3, 4, 5]\n", "2 5 [3, 4]\n", "3 4 []\n" ] } ], "source": [ "# Now peeling items off both ends:\n", "values = [0, 1, 2, 3, 4, 5, 6, 7]\n", "while values:\n", "    head, *values, tail = values\n", "    print(head, tail, values)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Type annotations\n", "\n", "* Lazy type hinting\n", "* Great for documenting what you expect and for IDE assistance; does no type validation at runtime\n", "* Extensions available for auto-generating Sphinx docs from hints\n", "* Optional type checking via [mypy](http://mypy-lang.org/)\n", "* Not used for any runtime performance optimisation or anything\n", "* _Kinda_ used in `dataclasses`" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T13:56:38.250976Z", "start_time": "2019-01-30T13:56:38.245350Z" } }, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 25, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "def add(a: int, b: int)->int:\n", " return a+b\n", "\n", "add(5, 5)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T13:56:45.414795Z", "start_time": "2019-01-30T13:56:45.411122Z" } }, "outputs": [ { "data": { "text/plain": [ "'thisthat'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add('this', 'that') # badness" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from typing import Iterator\n", "\n", "def fib(n: int) -> Iterator[int]:\n", " a, b = 0, 1\n", " while a < n:\n", " yield a\n", " a, b = b, a+b" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:02:12.204337Z", "start_time": "2019-01-30T16:02:12.182405Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['first', 'second', 'third', 'forth', 'fifth']" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from typing import *\n", "from operator import itemgetter\n", "\n", "def listtacular(listicle: Dict[AnyStr,int])->List[AnyStr]:\n", " listable = []\n", " for k, v in sorted(listicle.items(), key=itemgetter(1)):\n", " listable.append(k)\n", " return listable\n", "\n", "listtacular({'first':1, 'fifth':5, 'second':2, 'forth':4, 'third':3})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## `dataclass` (py3.7)\n", "\n", "Basically, a massive shortcut for building object classes\n", "\n", "Highly recommend watching Raymond Hettinger's PyCon 2018 talk https://www.youtube.com/watch?v=T-TwcmT6Rcw \n", "\n", "TLDR:\n", "1. It makes a mutable data holder, in the spirit of `collections.namedtuple`\n", "2. It writes boiler-plate code for you, simplifying the process of writing the `class`." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T13:20:57.027659Z", "start_time": "2019-01-30T13:20:57.009374Z" } }, "outputs": [], "source": [ "# Code you write\n", "from dataclasses import dataclass\n", "\n", "\n", "@dataclass\n", "class Color:\n", "    hue: int\n", "    saturation: float\n", "    lightness: float = 0.5" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T13:21:01.499483Z", "start_time": "2019-01-30T13:21:01.487207Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from dataclasses import Field, _MISSING_TYPE, _DataclassParams\n", "\n", "class Color:\n", "    'Color(hue: int, saturation: float, lightness: float = 0.5)'\n", "\n", "    def __init__(self, hue: int, saturation: float, lightness: float = 0.5) -> None:\n", "        self.hue = hue\n", "        self.saturation = saturation\n", "        self.lightness = lightness\n", "\n", "    def __repr__(self):\n", "        return (self.__class__.__qualname__ +\n", "                f\"(hue={self.hue!r}, saturation={self.saturation!r}, \"\n", "                f\"lightness={self.lightness!r})\")\n", "\n", "    def __eq__(self, other):\n", "        if other.__class__ is self.__class__:\n", "            return (self.hue, self.saturation, self.lightness) == (other.hue, other.saturation, other.lightness)\n", "        return NotImplemented\n", "\n", "    __hash__ = None\n", "\n", "    hue: int\n", "    saturation: float\n", "    lightness: float = 0.5" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T13:21:01.499483Z", "start_time": "2019-01-30T13:21:01.487207Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "    __dataclass_params__ = _DataclassParams(\n", "        init=True,\n", "        repr=True,\n", "        eq=True,\n", "        order=False,\n", "        unsafe_hash=False,\n", "        frozen=False)\n", "\n", "    __dataclass_fields__ = {\n", "        'hue': Field(default=_MISSING_TYPE,\n", "                     default_factory=_MISSING_TYPE,\n", "                     init=True,\n", "                     
repr=True,\n", "                     hash=None,\n", "                     compare=True,\n", "                     metadata={}),\n", "        'saturation': Field(default=_MISSING_TYPE,\n", "                            default_factory=_MISSING_TYPE,\n", "                            init=True,\n", "                            repr=True,\n", "                            hash=None,\n", "                            compare=True,\n", "                            metadata={}),\n", "        'lightness': Field(default=0.5,\n", "                           default_factory=_MISSING_TYPE,\n", "                           init=True,\n", "                           repr=True,\n", "                           hash=None,\n", "                           compare=True,\n", "                           metadata={})\n", "    }\n", "    __dataclass_fields__['hue'].name = 'hue'\n", "    __dataclass_fields__['hue'].type = int\n", "    __dataclass_fields__['saturation'].name = 'saturation'\n", "    __dataclass_fields__['saturation'].type = float\n", "    __dataclass_fields__['lightness'].name = 'lightness'\n", "    __dataclass_fields__['lightness'].type = float" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:45:47.382917Z", "start_time": "2019-01-30T15:45:47.372961Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Person(first_name='Andrew', last_name='Bolster', birthday=datetime.date(1988, 5, 17), gender='Male')" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dataclasses import dataclass\n", "from datetime import date\n", "\n", "\n", "@dataclass\n", "class Person:  # Basically, gets rid of boring boilerplate\n", "    first_name: str\n", "    last_name: str\n", "    birthday: date\n", "    gender: str\n", "\n", "\n", "p = Person(\"Andrew\", \"Bolster\", date(1988, 5, 17), 'Male')\n", "p" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:46:45.947800Z", "start_time": "2019-01-30T15:46:45.940899Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Person(first_name='Andrew', last_name='Bolster')" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dataclasses import dataclass, field\n", "\n", "\n", "@dataclass\n", "class Person:  # Basically, gets rid of boring boilerplate\n", 
" first_name: str\n", " last_name: str\n", " birthday: date = field(repr=False)\n", " gender: str = field(repr=False)\n", "\n", "\n", "p = Person(\"Andrew\", \"Bolster\", date(1988, 5, 17), 'Male')\n", "p" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:47:38.078481Z", "start_time": "2019-01-30T15:47:38.070977Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Person(first_name='Andrew', last_name='Bolster')" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from dataclasses import dataclass, field\n", "from datetime import date\n", "\n", "\n", "@dataclass\n", "class Person: # Basically, gets rid of boring boilerplate\n", " first_name: str\n", " last_name: str\n", " birthday: date = field(repr=False)\n", " gender: str = field(repr=False)\n", "\n", " @property\n", " def age(self):\n", " today = date.today()\n", " return today.year - self.birthday.year \\\n", " - ((today.month, today.day) < (self.birthday.month, self.birthday.day))\n", "\n", " def __str__(self):\n", " return f\"{self.first_name} {self.last_name} ({self.age})\"\n", "\n", "\n", "p = Person(\"Andrew\", \"Bolster\", date(1988, 5, 17), 'Male')\n", "p" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T15:47:44.756966Z", "start_time": "2019-01-30T15:47:44.754169Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Andrew Bolster (30)\n" ] } ], "source": [ "print(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### But wait, there's more!\n", "\n", "Passing class decorator arguments to augment output objects; e.g.\n", "\n", "* 'order': adds `__lt__`/`__gt__` etc methods based on tuple-ordering of attributes\n", "* 'frozen': adds `__hash__` method to add immutability / hashability\n", "\n", "\n", "Also `field` declarations to 
provide per-attribute control over these things" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:43:49.214708Z", "start_time": "2019-01-30T16:43:49.203055Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "from dataclasses import dataclass, field\n", "from datetime import datetime\n", "import uuid\n", "\n", "@dataclass(order=True, frozen=True)\n", "class MP:\n", "    name: str\n", "    gender: str = field(repr=False)\n", "    salary: int = field(hash=False, repr=False, metadata={'units': 'GBP'})\n", "    age: int = field(hash=False, repr=False)\n", "    party: str = field(hash=True, repr=True, default='Independent')\n", "    ate: list = field(default_factory=list, compare=False, repr=False)\n", "    emp_id: uuid.UUID = field(\n", "        default_factory=uuid.uuid4, compare=True, repr=False\n", "    )\n", "\n", "    def eats(self, thing):\n", "        # frozen only blocks attribute assignment; the list itself can still be mutated\n", "        self.ate.append((thing, datetime.now()))" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:51:32.675485Z", "start_time": "2019-01-30T16:51:32.659837Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "MP(name='Sammy Wilson', party='DUP')" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "e1 = MP(name='Sammy Wilson',\n", "        gender='male', party='DUP',\n", "        salary=77_379,  # Another cool py3 feature ;)\n", "        age=65,\n", "        )\n", "e2 = MP(name='Caroline Lucas',\n", "        gender='female', party='Greens',\n", "        salary=77_379,  # Another cool py3 feature ;)\n", "        age=56,\n", "        )\n", "e1  # Non-repr fields not displayed" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:51:33.277433Z", "start_time": "2019-01-30T16:51:33.271347Z" } }, "outputs": [ { "data": { "text/plain": [ "[MP(name='Sammy Wilson', party='DUP'),\n", " MP(name='Caroline Lucas', party='Greens')]" ] }, 
"execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[e1, e2]" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:51:33.865637Z", "start_time": "2019-01-30T16:51:33.857772Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[MP(name='Caroline Lucas', party='Greens'),\n", " MP(name='Sammy Wilson', party='DUP')]" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted([e1, e2]) # thanks to 'order'" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:51:34.547711Z", "start_time": "2019-01-30T16:51:34.541765Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{MP(name='Sammy Wilson', party='DUP'): 'Brexiteers',\n", " MP(name='Caroline Lucas', party='Greens'): 'Sane'}" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "affiliations = {\n", " e1: 'Brexiteers',\n", " e2: 'Sane'\n", "}\n", "affiliations # thanks to 'frozen'" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "ExecuteTime": { "end_time": "2019-01-30T16:51:35.224956Z", "start_time": "2019-01-30T16:51:35.216883Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sammy Wilson, from the Brexiteers camp, ate fish and chips\n", "Caroline Lucas, from the Sane camp, ate Nothing\n" ] } ], "source": [ "e1.eats('fish')\n", "e1.eats('chips')\n", "for e, camp in affiliations.items():\n", " msg = f\"{e.name}, from the {camp} camp, \"\\\n", " f\"ate {' and '.join([m[0] for m in e.ate]) if e.ate else 'Nothing'}\"\n", " print(msg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What we done covered\n", "\n", "* `print` \n", "* / vs // \n", "* **unicode**\n", "* Catching 
Constructions (i.e. `first, *rest = iterable`)\n", "* changes to dict (i.e. views)\n", "* f-strings (including performance)\n", "* `typing` / type hinting\n", "* `dataclasses`\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Anything I've missed / Undersold?**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Conclusion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* If you're not using at least Python 3.5, you're missing out" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* If you're still stuck on 2.7, you're going to be left behind" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* If you're still _starting new projects_ in 2.7, you deserve all the pain that's coming your way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python (py37)", "language": "python", "name": "py37" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "livereveal": { "transitionSpeed": "fast" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }