{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Transforming Code into Beautiful, Idiomatic Python\n", "by Raymond Hettinger
\n", "Learn to take better advantage of Python's best features and improve existing code through a series of code transformations, \"When you see this, do that instead.\"\n", "\n", "Raymond gave the talk but did not take these notes. Notes by Alvin Chia. The notes may include further information. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python 3.6.4 :: Anaconda custom (64-bit)\r\n" ] } ], "source": [ "# These notes were taken using the following Python version\n", "!python --version" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ " " ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import HTML\n", "HTML(' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Looping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looping over a range of numbers" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "4\n", "9\n", "16\n", "25\n" ] } ], "source": [ "for i in [0, 1, 2, 3, 4, 5]:\n", " print(i**2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is there a better way to write this?" 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "4\n", "9\n", "16\n", "25\n" ] } ], "source": [ "for i in range(6):\n", " print(i**2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looping over a collection" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "colors = ['red', 'green', 'blue', 'yellow']" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "yellow\n", "blue\n", "green\n", "red\n" ] } ], "source": [ "for i in range(len(colors)-1, -1, -1):\n", " print(colors[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code above is hard to read. Write it the Pythonic way:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "yellow\n", "blue\n", "green\n", "red\n" ] } ], "source": [ "for color in reversed(colors):\n", " print(color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looping over a collection and indices" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 ----> red\n", "1 ----> green\n", "2 ----> blue\n", "3 ----> yellow\n" ] } ], "source": [ "for i in range(len(colors)):\n", " print(i,'---->',colors[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How do you rewrite this Pythonically, without indexing by hand?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 ----> red\n", "1 ----> green\n", "2 ----> blue\n", "3 ----> yellow\n" ] } ], "source": [ "# Use enumerate\n", "# Fast, beautiful code\n", "# Whenever you manipulate indices directly, something is probably wrong. 
\n", "for i, color in enumerate(colors):\n", " print(i, '---->', color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looping over two collections" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "names = ['raymond', 'rachel', 'matthew']\n", "colors = ['red', 'green', 'blue', 'yellow']" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raymond --> red\n", "rachel --> green\n", "matthew --> blue\n" ] } ], "source": [ "n = min(len(names), len(colors))\n", "for i in range(n):\n", " print(names[i], '-->', colors[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why would anyone write it this way? Because it works in every other language they have learned.
\n", "What is the Python way? Use zip. It was already in the very first version of Lisp, as you can see in the original Lisp paper.
\n", "zip has a deep history and is a proven winning performer.
\n", "The code now is clean and beautiful." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raymond --> red\n", "rachel --> green\n", "matthew --> blue\n" ] } ], "source": [ "for name, color in zip(names, colors):\n", " print(name, '-->', color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Is anything wrong with the code above?
\n", "In Python 2, in order to loop, zip materializes a third list in memory. That list consists of tuples, each of which is its own separate object, so this code uses more memory than the first version.
\n", "How do you make a program run fast? On modern processors only one thing matters: whether the code is running in L1 cache.
\n", "The Intel optimization guide has a horrifying line in it: on a cache miss, a simple move becomes as expensive as a floating-point divide. It can go from half a clock cycle to 400 to 600 clock cycles. You can lose two and a half orders of magnitude by not being in cache.
\n", "If these lists are really big, the zip is not going to fit in cache.
\n", "For Python 2, use izip (iterators) instead of zip.
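
A small sketch of the lazy approach, assuming Python 3, where the built-in zip yields one pair at a time instead of building a list of tuples up front. The variable names `pairs`, `first` and `remaining` are only for illustration:

```python
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

pairs = zip(names, colors)   # no list of tuples is created here
first = next(pairs)          # pairs are produced one at a time, on demand
print(first)

remaining = list(pairs)      # consume the rest; zip stops at the shorter input
print(remaining)
```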
\n", "For Python 3, the built in zip does the same job as izip in Python 2.x (returns an iterator instead of a list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looping in a sorted order" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "colors = ['red', 'green', 'blue', 'yellow']" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "blue\n", "green\n", "red\n", "yellow\n" ] } ], "source": [ "for color in sorted(colors):\n", " print(color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How do you loop in reverse sorted order? " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "yellow\n", "red\n", "green\n", "blue\n" ] } ], "source": [ "for color in sorted(colors, reverse=True):\n", " print(color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Custom sort order" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "colors = ['red', 'green', 'blue', 'yellow']" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Old way using cmp parameter\n", "def compare_length(c1, c2):\n", " if len(c1) < len(c2): return -1\n", " if len(c1) > len(c2): return 1\n", " return 0\n", "# Python 2.x only: sorted() accepted a cmp parameter\n", "# In Python 3.x the cmp parameter is removed completely\n", "# print(sorted(colors, cmp=compare_length))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Horrifyingly slow. You can write something shorter and faster.
\n", "How many times will this function be called? Suppose you have a million items in a list. A sort makes on the order of n log n comparisons, and log base 2 of a million is about 20, so that is roughly 20 million comparisons, which is long and slow.
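
To make that count concrete, here is a minimal sketch that counts how often a comparison function is really called. It uses a smaller list so that it runs quickly. The names `counter` and `counting_compare` are made up for this illustration, and `functools.cmp_to_key` stands in for the cmp parameter that Python 3 removed:

```python
import math
import random
from functools import cmp_to_key

counter = {'calls': 0}  # hypothetical call counter, only for this illustration

def counting_compare(a, b):
    # Old-style comparison function that also records each call
    counter['calls'] += 1
    return (a > b) - (a < b)

random.seed(0)
n = 1000
data = [random.random() for _ in range(n)]
result = sorted(data, key=cmp_to_key(counting_compare))

# Rough n * log2(n) estimate to compare against the measured count
estimate = n * math.log2(n)
print(counter['calls'], round(estimate))
```

The measured count depends on the input order, so treat the estimate as an order of magnitude rather than an exact prediction.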
\n", "Is there a better way?
\n", "Use sorted(colors, key=len). The key function gets called exactly once per item. Which is better? A million calls or 20 million calls? " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['red', 'blue', 'green', 'yellow']\n" ] } ], "source": [ "print(sorted(colors, key=len))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Call a function until a sentinel value" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# blocks = []\n", "# while True:\n", "# block = f.read(32)\n", "# if block == '':\n", "# break\n", "# blocks.append(block)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Use iter with a sentinel value\n", "# from functools import partial\n", "# blocks = []\n", "# for block in iter(partial(f.read, 32), ''):\n", "# blocks.append(block)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distinguishing multiple exit points in loop" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Using flags which slow down your code\n", "def find(seq, target):\n", " found = False\n", " for i, value in enumerate(seq):\n", " if value == target:\n", " found = True\n", " break\n", " if not found:\n", " return -1\n", " return i" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Better way! 
\n", "def find(seq, target):\n", " for i, value in enumerate(seq):\n", " if value == target:\n", " break\n", " else:\n", " # the else branch runs only when the loop finished\n", " # without hitting break (think of it as no-break)\n", " return -1\n", " return i" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dictionary Skills" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Mastering dictionaries is a fundamental Python skill\n", "- They are a fundamental tool for expressing relationships, linking, counting, and grouping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looping over dictionary keys" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "d = {'matthew': 'blue', 'rachel':'green', 'raymond':'red'}" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "matthew --> blue\n", "rachel --> green\n", "raymond --> red\n" ] } ], "source": [ "# needs to rehash every key\n", "for k in d:\n", " print(k,'-->',d[k])" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "matthew --> blue\n", "rachel --> green\n", "raymond --> red\n" ] } ], "source": [ "# needs to rehash every key\n", "for k in d.keys():\n", " print(k,'-->',d[k])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Originally, Python items() built a real list of tuples and returned that. That could potentially take a lot of extra memory.

\n", "Then, generators were introduced to the language in general, and that method was reimplemented as an iterator-generator method named iteritems(). The original remains for backwards compatibility.

\n", "One of Python 3's changes is that items() now returns a view object, so a list is never fully built. The iteritems() method is also gone, since items() in Python 3 works like viewitems() in Python 2.7.
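
A minimal sketch of that Python 3 behavior: items() hands back a lightweight view object rather than a list, and the view reflects later changes to the dictionary. The variable names `items` and `snapshot` are illustrative only:

```python
d = {'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}

items = d.items()            # a dict_items view, no list of tuples is built
print(type(items).__name__)

d['betty'] = 'yellow'        # the view reflects later changes to the dict
print(len(items))

# Build a real list only when a snapshot is actually needed
snapshot = list(d.items())
print(snapshot[0])
```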

" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "matthew --> blue\n", "rachel --> green\n", "raymond --> red\n" ] } ], "source": [ "# better way uses tuple unpacking but makes a list of d.items() for Python 2.x\n", "# For Python 2.x: dict.items() return a copy of the dictionary's list of (key,value) pairs\n", "# For Python 3.x: dict.items() return an iterator over the dictionary's (key, value) pairs.\n", "for k, v in d.items():\n", " print(k,'-->',v)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# best way in Python 2.7 to use iterator\n", "# For Python 2.7 dict.iteritems(): Return an iterator over the dictionary's (key, value) pairs.\n", "# for k, v in d.iteritems():\n", "# print(k,'-->',v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct a dictionary from pairs" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "names = ['raymond', 'rachel', 'matthew']\n", "colors = ['red', 'green', 'blue']" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "d = dict(zip(names, colors))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'matthew': 'blue', 'rachel': 'green', 'raymond': 'red'}" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Counting with dictionaries" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "colors = ['red', 'green', 'red', 'blue', 'green', 'red']" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'blue': 1, 'green': 2, 'red': 3}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {}\n", 
"# basic way to loop through a dictionary\n", "for color in colors:\n", " if color not in d:\n", " d[color] = 0\n", " d[color] += 1\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next level improvement, use get method" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'blue': 1, 'green': 2, 'red': 3}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {}\n", "for color in colors:\n", " d[color] = d.get(color, 0) + 1\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To further improve the code, use defaultdict. " ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'blue': 1, 'green': 2, 'red': 3}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import defaultdict\n", "d = defaultdict(int) #default value for integers are 0\n", "for color in colors:\n", " d[color] += 1\n", "d = dict(d) # convert back to d when you don't need defaultdict\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grouping with dictionaries" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "names = ['raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie']" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{5: ['roger', 'betty'],\n", " 6: ['rachel', 'judith'],\n", " 7: ['raymond', 'matthew', 'melissa', 'charlie']}" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {}\n", "for name in names:\n", " key = len(name)\n", " if key not in d:\n", " d[key] = []\n", " d[key].append(name)\n", "d" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# Use dict.setdefault()\n", "d = {}\n", "for name in 
names:\n", " key = len(name)\n", " d.setdefault(key, []).append(name)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(list,\n", " {5: ['roger', 'betty'],\n", " 6: ['rachel', 'judith'],\n", " 7: ['raymond', 'matthew', 'melissa', 'charlie']})" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Modern way, speed up the code\n", "from collections import defaultdict\n", "d = defaultdict(list)\n", "for name in names:\n", " key = len(name)\n", " d[key].append(name)\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Is a dictionary popitem() atomic?" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raymond --> red\n", "rachel --> green\n", "matthew --> blue\n" ] } ], "source": [ "d = {'matthew': 'blue', 'rachel':'green', 'raymond':'red'}\n", "\n", "while d:\n", " key, value = d.popitem()\n", " print(key, '-->', value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes, it is atomic. Therefore, you don't have to put locks around it. It can be used between threads to atomically pull out a task." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linking dictionaries" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "import os\n", "import argparse\n", "from collections import ChainMap\n", "\n", "defaults = {'color': 'red', 'user':'guest'}\n", "parser = argparse.ArgumentParser(description=\"Linking dictionaries\")\n", "parser.add_argument('-u', '--user')\n", "parser.add_argument('-c', '--color')\n", "namespace = parser.parse_args([])\n", "command_line_args = {k:v for k, v in vars(namespace).items() if v}\n", "# third dictionary not shown - os.environ" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "# Traditional way to do it - found in standard library\n", "# Standard defaults, if someone specified environment variables, it should update and takes precedence.\n", "# Command line arguments should take precedence over environment variables.\n", "# To be fast, don't copy\n", "d = defaults.copy()\n", "d.update(os.environ) \n", "d.update(command_line_args)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Use ChainMap instead\n", "# Precedence order: command_line_args > os.environ > defaults\n", "# Links all independent dictionary together without copying\n", "d = ChainMap(command_line_args, os.environ, defaults)\n", "\n", "#In Python2.7: from ConfigParser import _ChainMap as ChainMap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[What is the purpose of collections.ChainMap?](https://stackoverflow.com/questions/23392976/what-is-the-purpose-of-collections-chainmap) In Python 3.3 a ChainMap class was added to the collections module:
\n", "
A ChainMap class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple update() calls.
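
A minimal sketch of that linking, using made-up layers (`defaults`, `env_layer`, `cli_layer`) in place of real configuration sources. Lookups search the layers in order, and the underlying dictionaries are referenced rather than copied:

```python
from collections import ChainMap

# Hypothetical layers, for illustration only
defaults = {'color': 'red', 'user': 'guest'}
env_layer = {'user': 'admin'}
cli_layer = {'color': 'blue'}

# Earlier mappings take precedence over later ones
config = ChainMap(cli_layer, env_layer, defaults)
print(config['color'], config['user'])

# The layers stay separate, so the original default is still reachable
print(config.maps[-1]['color'])

# No copy was made: a change to a layer is visible through the chain
defaults['editor'] = 'vim'
print(config['editor'])
```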

\n", "The answer points out two other motivations/advantages/differences of ChainMap, compared to using a dict-update loop and thus only storing the \"final\" version:\n", "- More information: since a ChainMap structure is \"layered\", it supports answering questions like: Am I getting the \"default\" value, or an overridden one? What is the original (\"default\") value? At what level did the value get overridden (borrowing @b4hand's config example: user-config or command-line-overrides)? Using a simple dict, the information needed for answering these questions is already lost.

\n", "- Speed tradeoff: suppose you have N layers and at most M keys in each, constructing a ChainMap takes O(N) and each lookup O(N) worst-case[*], while construction of a dict using an update-loop takes O(NM) and each lookup O(1). This means that if you construct often and only perform a few lookups each time, or if M is big, ChainMap's lazy-construction approach works in your favor.\n", "\n", "[*] The analysis in the second point assumes dict-access is O(1), when in fact it is O(1) on average, and O(M) worst case. See more details [here](https://wiki.python.org/moin/TimeComplexity#dict).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Improving Clarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Positional arguments and indices are nice\n", "- Keywords and names are better\n", "- The first way is convenient for the computer\n", "- The second corresponds to how humans think" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clarify function calls with keyword arguments" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "def twitter_search(msg, retweets, numtweets, popular):\n", " pass" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "# .... somewhere in the code\n", "# Code commonly found in client-side customer base\n", "# what is False? what is 20? what is True?\n", "twitter_search('@obama', False, 20, True)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "# Replace unreadable code with keyword arguments\n", "# Save microseconds of compute time or hours of programmer time? 
\n", "twitter_search('@obama', retweets=False, numtweets=20, popular=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clarify multiple return values with named tuples" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0, 4)" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def pt_in_circle():\n", " return 0, 4\n", "\n", "# what are 0 and 4\n", "pt_in_circle()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use collections.namedtuple instead.
\n", "- namedtuple instances are just as memory efficient as regular tuples because they do not have per-instance dictionaries. \n", "- Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function. The arguments are the name of the new class and a string containing the names of the elements.\n", "- [Python Documentation here](https://docs.python.org/3/library/collections.html#collections.namedtuple)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PointInCircle(x=0, y=4)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import namedtuple\n", "\n", "# declare a namedtuple \n", "PointInCircle = namedtuple(\"PointInCircle\", \"x y\")\n", "\n", "def pt_in_circle():\n", " # Clarity in code and readability \n", " return PointInCircle(x=0, y=4)\n", "\n", "# Readable __repr__ with a name=value style\n", "pt_in_circle()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "# Use tuple unpacking \n", "x, y = pt_in_circle()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Named tuples are especially useful for assigning field names to result tuples returned by the csv or sqlite3 modules:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')\n", "\n", "import csv\n", "# Python 3: open in text mode with newline='' (binary mode was the Python 2 idiom)\n", "with open(\"employees.csv\", newline='') as f:\n", "    for emp in map(EmployeeRecord._make, csv.reader(f)):\n", "        print(emp.name, emp.title)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unpacking sequences" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "p = 'Raymond', 'Hettinger', 0x30, 'python@example.com'" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# Longer way using indexing 
to unpack\n", "fname = p[0]\n", "lname = p[1]\n", "age = p[2]\n", "email = p[3]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# Better and readable way to unpack\n", "fname, lname, age, email = p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Efficiency" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- An optimization fundamental rule\n", "- Don't cause data to move around unnecessarily \n", "- It takes only a little care to avoid O(n**2) behavior instead of linear behavior" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Concatenating strings" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "names = ['raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie']" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raymond, rachel, matthew, roger, betty, melissa, judith, charlie\n" ] } ], "source": [ "# c style to concatenate string\n", "s = names[0]\n", "for name in names[1:]:\n", " s += ', ' + name\n", "print(s)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "raymond, rachel, matthew, roger, betty, melissa, judith, charlie\n" ] } ], "source": [ "# Use .join instead\n", "print(\", \".join(names))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Updating sequences" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['mark', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith']" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names = ['raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie']\n", "# Using indicies\n", "del names[0]\n", "names.pop()\n", "names.insert(0,'mark')\n", "names" ] 
}, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "deque(['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie'])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use a deque instead\n", "from collections import deque\n", "names = deque(['raymond', 'rachel', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie'])\n", "del names[0]\n", "names.popleft()\n", "names.appendleft('mark')\n", "names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Decorators and Context Managers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Helps separate business logic from administrative logic\n", "- Clean beautiful tools for factoring code and improving code reuse\n", "- Good naming is essential \n", "- Remember the Spiderman rule: With great power comes great responsibility!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using decorators to factor-out administrative logic " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the function below, the administrative and business logic are all mixed together. \n", "1. The administrative logic is caching the page in a dictionary, so that if I look up the same web page over and over again, I simply remember it. \n", "2. The business logic is opening a URL and returning a web page
" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "import urllib.request\n", "\n", "def web_lookup(url, saved={}):\n", " if url in saved:\n", " return saved[url]\n", " page = urllib.request.urlopen(url).read()\n", " saved[url] = page\n", " return page" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How do we factor-out the administrative logic? By using decorators." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "# for python 3 \n", "import urllib.request\n", "from functools import lru_cache\n", "\n", "@lru_cache(maxsize=100)\n", "def web_lookup(url):\n", " return urllib.request.urlopen(url).read()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "# for python 2, write a simple caching decorator (urllib.urlopen is the Python 2 API)\n", "from functools import wraps\n", "def cache(func):\n", " saved = {}\n", " @wraps(func)\n", " def newfunc(*args):\n", " if args in saved:\n", " return saved[args] # return the cached result instead of recursing\n", " result = func(*args)\n", " saved[args] = result \n", " return result\n", " return newfunc\n", "\n", "@cache\n", "def web_lookup(url):\n", " return urllib.urlopen(url).read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[What does wraps do? 
Read here](https://stackoverflow.com/questions/308999/what-does-functools-wraps-do)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Factor-out temporary contexts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[From Python docs](https://docs.python.org/3.6/library/decimal.html): The usual start to using decimals is importing the module, viewing the current context with getcontext() and, if necessary, setting new values for precision, rounding, or enabled traps:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from decimal import *\n", "getcontext()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "getcontext().prec = 7 # Set a new precision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to the problem: " ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.1415929203539823008849557522123893805309734513274\n" ] } ], "source": [ "# Copy and stored old context. Set new precision to compute. Then reset to previous context\n", "old_context = getcontext().copy()\n", "getcontext().prec = 50\n", "print(Decimal(355)/Decimal(113))\n", "setcontext(old_context)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.1415929203539823008849557522123893805309734513274\n", "3.141593\n" ] } ], "source": [ "# Here is a better way. 
Using a local context manager, it has reusable logic \n", "with localcontext(Context(prec=50)):\n", " print(Decimal(355)/Decimal(113))\n", "print(Decimal(355)/Decimal(113))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to open and close files" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "f = open('data.txt')\n", "try:\n", " data = f.read()\n", "finally:\n", " f.close()" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "# A simple way \n", "with open('data.txt') as f:\n", " data = f.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to use locks" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "# Either this \n", "from threading import Thread, Lock\n", "lock = Lock()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Critical Session 1\n", "Critical Session 2\n" ] } ], "source": [ "# Or this (based on the YouTube presentation)\n", "import threading\n", " \n", "# Make a lock\n", "lock = threading.Lock()\n", "\n", "# Old way to use a lock\n", "lock.acquire()\n", "try:\n", " print(\"Critical Session 1\")\n", " print(\"Critical Session 2\")\n", "finally:\n", " lock.release()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Separate the administrative logic of getting a lock from the printing by using a context manager:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Critical Session 1\n", "Critical Session 2\n" ] } ], "source": [ "# Reuse the shared lock made above; a fresh Lock() created here would protect nothing\n", "with lock:\n", " print(\"Critical Session 1\")\n", " print(\"Critical Session 2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to remove a file" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "import 
os\n", "try:\n", " os.remove('somefile.tmp')\n", "except OSError:\n", " pass" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "# Better Alternative:\n", "import os\n", "from contextlib import suppress\n", "\n", "with suppress(OSError):\n", " os.remove('somefile.tmp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to redirect to stdout" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "# C style using try, finally to redirect to stdout/stderr\n", "import sys\n", "with open('help.txt', 'w') as f:\n", " oldstdout = sys.stdout\n", " sys.stdout = f \n", " try:\n", " help(pow)\n", " finally:\n", " sys.stdout = oldstdout" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A better way is to use context manager.
\n", "`contextlib.redirect_stdout` is a context manager that temporarily redirects `sys.stdout` to another file or file-like object.
\n", "This tool adds flexibility to existing functions or classes whose output is hardwired to stdout.
\n", "For example, the output of help() is normally sent to sys.stdout. You can capture the output in a string by redirecting it to an io.StringIO object.\n" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Help on built-in function pow in module builtins:\\n\\npow(x, y, z=None, /)\\n Equivalent to x**y (with two arguments) or x**y % z (with three arguments)\\n \\n Some types, such as ints, are able to use a more efficient algorithm when\\n invoked using the three argument form.\\n\\n'" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import io\n", "from contextlib import redirect_stdout\n", "f = io.StringIO()\n", "with redirect_stdout(f):\n", "    help(pow)\n", "f.getvalue()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To send the output of help() to a file on disk, redirect the output to a regular file." ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "from contextlib import redirect_stdout\n", "with open('help.txt', 'w') as f:\n", "    with redirect_stdout(f):\n", "        help(pow)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function pow in module builtins:\r\n", "\r\n", "pow(x, y, z=None, /)\r\n", " Equivalent to x**y (with two arguments) or x**y % z (with three arguments)\r\n", " \r\n", " Some types, such as ints, are able to use a more efficient algorithm when\r\n", " invoked using the three argument form.\r\n", "\r\n" ] } ], "source": [ "!cat help.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To send the output of help() to sys.stderr:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Help on built-in function pow in module builtins:\n", "\n", "pow(x, y, z=None, /)\n", " Equivalent to x**y
(with two arguments) or x**y % z (with three arguments)\n", " \n", " Some types, such as ints, are able to use a more efficient algorithm when\n", " invoked using the three argument form.\n", "\n" ] } ], "source": [ "import sys\n", "from contextlib import redirect_stdout\n", "with redirect_stdout(sys.stderr):\n", "    help(pow)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the global side effect on sys.stdout means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts.\n", "\n", "contextlib.redirect_stderr is similar to redirect_stdout, but it redirects sys.stderr to another file or file-like object." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to use @contextmanager" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Context manager docs are [here](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<h1>\n", "foo\n", "</h1>\n" ] } ], "source": [ "# Example of using @contextmanager (not recommended as a way to generate HTML tags)\n", "from contextlib import contextmanager\n", "\n", "@contextmanager\n", "def tag(name):\n", "    print(\"<%s>\" % name)   # runs on entering the with block\n", "    yield\n", "    print(\"</%s>\" % name)  # runs on leaving the with block\n", "\n", "with tag(\"h1\"):\n", "    print(\"foo\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Concise Expressive One-Liners" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Two conflicting rules:**\n", "1. Don't put too much on one line\n", "2. Don't break atoms of thought into subatomic particles\n", "\n", "**Raymond's rule**\n", "- One logical line of code equals one sentence in English\n", "- One logical line = One statement" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List Comprehensions and Generator Expressions" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "385\n" ] } ], "source": [ "result = []\n", "for i in range(11):\n", "    s = i ** 2\n", "    result.append(s)\n", "print(sum(result))" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "385\n" ] } ], "source": [ "# Better way: use a list comprehension\n", "print(sum([i ** 2 for i in range(11)]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why is the second alternative better?
\n", "The first method spells out how to compute the sum, step by step. \n", "The second method states exactly what the problem asks for: the sum of squares. It expresses a single unit of thought in mathematical terms: the sum of the squares of i for i from 0 to 10." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "385\n" ] } ], "source": [ "# There's an even better way: take an eraser and erase the square brackets.\n", "# It becomes a generator expression\n", "print(sum(i ** 2 for i in range(11)))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }