{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Effective Python - 59 Specific Ways to Write Better Python. \n", "# *Chapter 1 - Pythonic Thinking*\n", "Book by Brett Slatkin. \n", "Summary notes by Tyler Banks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 1: Know Which Version of Python You’re Using\n" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)\n", "3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]\n" ] } ], "source": [ "import sys\n", "print(sys.version_info)\n", "print(sys.version)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 2: Follow the PEP 8 Style Guide\n", "http://www.python.org/dev/peps/pep-0008/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whitespace\n", "* Use 4 spaces per indentation level, not tabs\n", "* Lines should be 79 characters or fewer\n", "* Continuations of long expressions should be indented by 4 extra spaces\n", "* Functions and classes should be separated by two blank lines\n", "* In a class, methods should be separated by one blank line\n", "* Don't put spaces around list indexes, function calls, or keyword argument assignments\n", "* Put one space before and after variable assignment\n", "\n", "Naming\n", "* Functions, variables and attributes should be in `lowercase_underscore` format\n", "* Protected instance attributes should be in `_leading_underscore` format.\n", "* Private instance attributes should be in `__double_leading_underscore` format\n", "* Classes and exceptions should be in `CapitalizedWord` format\n", "* Module constants should be in `ALL_CAPS` format\n", "* Instance methods in a class should use `self` as the name of the first parameter, referring to the object\n", "* Class methods should use `cls` as the name of the first parameter, referring to the class\n", "\n", "Expressions 
and Statements\n", "* Use inline negation (`if a is not b`) instead of negating a positive expression (`if not a is b`)\n", "* Don't check for empty values by checking length (`if len(alist) == 0`). Use `if not alist`\n", "* Avoid single-line `if` statements, `for` and `while` loops, and `except` statements. Spread them over multiple lines.\n", "* Always put import statements at the top of a file\n", "* Always use absolute names for modules when importing, not names relative to the current module's path: `from bar import foo`, not `import foo`\n", "* Imports should be in sections in the following order: standard library modules, third-party modules, your own modules. Within each section, imports should be in alphabetical order\n", "\n", "See Pylint (http://www.pylint.org/) to analyze your source code and flag style violations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 3: Know the Differences Between `bytes`, `str`, and `unicode`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "General Python 3\n", "* There are two types that represent sequences of characters: `bytes` and `str`\n", "* `bytes` contains raw 8-bit values, `str` contains Unicode characters" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "this is text\n", "b'this is text'\n", "this is text\n", "False\n", "True\n" ] } ], "source": [ "# Convert between str and bytes using encode and decode\n", "string = \"this is text\"\n", "print(string)\n", "\n", "bytes_ = string.encode('utf-8')\n", "print(\"{}\".format(bytes_))\n", "\n", "string1 = bytes_.decode('utf-8')\n", "print(string1)\n", "\n", "print(bytes_ == string)\n", "print(string == string1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `bytes` and `str` are never equivalent\n", "* Files opened with `open` default to text mode, not binary; the default text encoding is platform-dependent (often UTF-8)\n", "* Use 'wb' to open binary files for writing" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "#with 
open('/tmp/random.bin', 'wb') as f:\n", "#    f.write(os.urandom(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `bytes` contains sequences of 8-bit values. `str` contains Unicode characters. They can't be used together with operators like `>` or `+`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 4: Write Helper Functions Instead of Complex Expressions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Don't overcomplicate one-line statements\n", "* Move complex expressions to helper functions, especially for repeated code\n", "* `if`/`else` is more readable than `or`/`and`" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9\n", "0\n" ] } ], "source": [ "# Example:\n", "my_values = {'red': [9, 8, 7]}\n", "print(my_values.get('red', [''])[0] or 0)\n", "print(my_values.get('blue', [''])[0] or 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The preceding reads: look up 'red' in my_values (falling back to `['']` if the key is missing), take the first element (`[0]`), and if that is falsy, return 0\n", "* Do something like this instead" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] } ], "source": [ "def get_first_int(values, key, default=0):\n", "    found = values.get(key, [''])\n", "    if found[0]:\n", "        found = int(found[0])\n", "    else:\n", "        found = default\n", "    return found\n", "\n", "print(get_first_int(my_values, 'blue'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 5: Know How to Slice Sequences" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Slicing is built in to `list`, `str`, and `bytes`\n", "* Slicing can be extended to any class that implements the `__getitem__` and `__setitem__` methods. 
(Inheritance from collections.abc -- Item 28)\n", "* Basic form is `alist[start:end]`, where `start` is inclusive and `end` is exclusive." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First four: ['a', 'b', 'c', 'd']\n", "Last four: ['e', 'f', 'g', 'h']\n", "Middle two: ['d', 'e']\n" ] } ], "source": [ "a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']\n", "print('First four:', a[:4])\n", "print('Last four: ', a[-4:])\n", "print('Middle two:', a[3:-3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Using `alist[0:len(alist)]` is redundant\n", "* Slicing a list produces a whole new list, so modifying the result won't affect the original list" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 'z', 'y', 'e', 'f', 'g', 'h']\n", "['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']\n" ] } ], "source": [ "b = a[:]\n", "b[0:2] = (1, 2)\n", "b[2:4] = ['z', 'y']\n", "print(b)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "b = a[:]\n", "assert b == a and b is not a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Slicing is forgiving of start and end indexes that are out of bounds, making it easy to express slices at the front or back of the list\n", "* Assigning to a list slice will replace the range even if their sizes are different" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 6: Avoid Using `start`, `end`, and `stride` in a Single Slice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Using `start`, `end`, and `stride` in a slice can be confusing\n", "* Prefer using positive stride in slices without `start` or `end` indexes and avoid using negative `stride` if possible\n", "* Avoid using `start`, `end`, and `stride` in a single slice\n", "* Consider doing two assignments (one to 
slice, another to stride) or use `islice` from `itertools`" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']\n", "['a', 'c', 'e']\n", "['a', 'c', 'e']\n" ] } ], "source": [ "a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']\n", "print(a)\n", "# Bad\n", "b = a[0:6:2]\n", "print(b)\n", "# Good\n", "c = a[0:6]\n", "d = c[::2]\n", "print(d)\n", "assert b == d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 7: Use List Comprehensions Instead of `map` and `filter`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* List comprehension -- deriving one list from another\n", "* List comprehensions are clearer than `map` and `filter` because they don't require `lambda` functions. \n", "* Ex: You want to compute the square of each number in a list" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]\n" ] } ], "source": [ "a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n", "squares = [x**2 for x in a]\n", "print(squares)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* List comprehensions also let you filter the input with a condition" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4, 16, 36, 64, 100]\n" ] } ], "source": [ "even_squares = [x**2 for x in a if x % 2 == 0]\n", "print(even_squares)\n", "\n", "# Bad, confusing use of map and filter\n", "alt = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))\n", "assert even_squares == list(alt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Dictionaries and sets have their own equivalents." 
] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{1: 'ghost', 2: 'habanero', 3: 'cayenne'}\n", "{8, 5, 7}\n" ] } ], "source": [ "chile_ranks = {'ghost': 1, 'habanero': 2, 'cayenne': 3}\n", "rank_dict = {rank: name for name, rank in chile_ranks.items()}\n", "chile_len_set = {len(name) for name in rank_dict.values()}\n", "print(rank_dict)\n", "print(chile_len_set)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 8: Avoid More Than Two Expressions in List Comprehensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* List comprehensions allow more than one loop level\n", "* Avoid more than two expressions (loops or conditions) in one comprehension, for readability\n", "* Ex: flatten a matrix" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9]\n" ] } ], "source": [ "matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]\n", "flat = [x for row in matrix for x in row]\n", "print(flat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Ex: squaring each value while keeping the matrix shape" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1, 4, 9], [16, 25, 36], [49, 64, 81]]\n" ] } ], "source": [ "squared = [[x**2 for x in row] for row in matrix]\n", "print(squared)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n", "[6, 8, 10]\n", "[6, 8, 10]\n" ] } ], "source": [ "# Additional Examples\n", "a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n", "b = [x for x in a if x > 4 if x % 2 == 0]\n", "c = [x for x in a if x > 4 and x % 2 == 0]\n", "print(a)\n", "print(b)\n", "print(c)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "# Bad\n", "# my_lists = [\n", "#     [[1, 2, 3], [4, 5, 6]],\n", 
"#     …\n", "# ]\n", "# flat = [x for sublist1 in my_lists\n", "#         for sublist2 in sublist1\n", "#         for x in sublist2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 9: Consider Generator Expressions for Large Comprehensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* List comprehensions work well for small inputs, but large inputs could crash your program by consuming too much memory\n", "* Example: reading a file and returning the number of characters on each line" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[21, 6, 15, 16, 20]\n" ] } ], "source": [ "value = [len(x) for x in open('data/i9_file.txt')]\n", "print(value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Generator expressions don't materialize the whole output sequence when run; they evaluate to an iterator that yields one item at a time\n", "* Generator expressions are created by putting list-comprehension syntax between `()` characters" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<generator object <genexpr> at 0x0000017B4729FD00>\n" ] } ], "source": [ "it = (len(x) for x in open('data/i9_file.txt'))\n", "print(it)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21\n" ] } ], "source": [ "print(next(it))" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(6, 2.449489742783178)\n" ] } ], "source": [ "roots = ((x, x**0.5) for x in it)\n", "print(next(roots))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Chaining generators like this runs quickly in Python.\n", "* For large streams of input, generators are the best tool\n", "* Iterators returned by generator expressions are stateful, so be careful not to use them more than once" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## Item 10: Prefer `enumerate` Over `range`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `range` is useful for loops that iterate over a set of integers\n", "* It is clumsy when you need both the index and the item while iterating over a list" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "#random_bits = 0\n", "#for i in range(64):\n", "#    if randint(0, 1):\n", "#        random_bits |= 1 << i" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "vanilla is delicious\n", "chocolate is delicious\n", "pecan is delicious\n", "strawberry is delicious\n" ] } ], "source": [ "flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']\n", "for flavor in flavor_list:\n", "    print('%s is delicious' % flavor)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1: vanilla\n", "2: chocolate\n", "3: pecan\n", "4: strawberry\n" ] } ], "source": [ "# Clumsy\n", "for i in range(len(flavor_list)):\n", "    flavor = flavor_list[i]\n", "    print('%d: %s' % (i + 1, flavor))" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1: vanilla\n", "2: chocolate\n", "3: pecan\n", "4: strawberry\n" ] } ], "source": [ "# Much better\n", "for i, flavor in enumerate(flavor_list):\n", "    print('%d: %s' % (i + 1, flavor))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* You can even specify the number at which enumerate starts! 
Notice the second `enumerate` argument" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1: vanilla\n", "2: chocolate\n", "3: pecan\n", "4: strawberry\n" ] } ], "source": [ "for i, flavor in enumerate(flavor_list, 1):\n", "    print('%d: %s' % (i, flavor))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 11: Use `zip` to Process Iterators in Parallel" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "names = ['Cecilia', 'Lise', 'Marie']\n", "letters = [len(n) for n in names]" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cecilia\n" ] } ], "source": [ "# Start code\n", "longest_name = None\n", "max_letters = 0\n", "for i in range(len(names)):\n", "    count = letters[i]\n", "    if count > max_letters:\n", "        longest_name = names[i]\n", "        max_letters = count\n", "print(longest_name)" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cecilia\n" ] } ], "source": [ "# Better\n", "longest_name = None  # Reset so this cell does not depend on the previous loop\n", "max_letters = 0\n", "for i, name in enumerate(names):\n", "    count = letters[i]\n", "    if count > max_letters:\n", "        longest_name = name\n", "        max_letters = count\n", "print(longest_name)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cecilia\n" ] } ], "source": [ "# Best\n", "longest_name = None  # Reset so this cell does not depend on the previous loop\n", "max_letters = 0\n", "for name, count in zip(names, letters):\n", "    if count > max_letters:\n", "        longest_name = name\n", "        max_letters = count\n", "print(longest_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `zip` stops as soon as the shortest iterator is exhausted, so be careful with inputs of different lengths\n", "* In Python 3, `zip` is a lazy generator that yields tuples\n", "* Use `zip_longest` from `itertools` to iterate over multiple iterators regardless of length" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## Item 12: Avoid `else` Blocks After `for` and `while` Loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Python allows `else` blocks after loops (`while` and `for`)\n", "* The `else` block runs only if the loop body did not encounter a `break` statement\n", "* This behavior is confusing; avoid it" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For Else block!\n" ] } ], "source": [ "for x in []:\n", "    print('Never runs')\n", "else:\n", "    print('For Else block!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Item 13: Take Advantage of Each Block in `try`/`except`/`else`/`finally`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `try`/`finally` lets you run cleanup code regardless of whether exceptions were raised in the `try` block\n", "* `else` helps minimize the amount of code in the `try` block and visually distinguishes the success case from the `try`/`except` blocks\n", "* `else` can be used to perform additional actions after a successful `try` block but before the common cleanup in `finally`" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "UNDEFINED = object()\n", "def divide_json(path):\n", "    handle = open(path, 'r+')  # May raise IOError\n", "    try:\n", "        data = handle.read()  # May raise UnicodeDecodeError\n", "        op = json.loads(data)  # May raise ValueError\n", "        value = (\n", "            op['numerator'] /\n", "            op['denominator'])  # May raise ZeroDivisionError\n", "    except ZeroDivisionError as e:\n", "        return UNDEFINED\n", "    else:\n", "        op['result'] = value\n", "        result = json.dumps(op)\n", "        handle.seek(0)\n", "        handle.write(result)  # May raise IOError\n", "        return value\n", "    finally:\n", "        handle.close()  # Always runs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" 
}, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }