{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# code for loading the format for the notebook\n", "import os\n", "\n", "# path : store the current path to convert back to it later\n", "path = os.getcwd()\n", "os.chdir(os.path.join('..', '..', 'notebook_format'))\n", "\n", "from formats import load_style\n", "load_style(plot_style=False)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ethen 2017-10-24 15:01:20 \n", "\n", "CPython 3.5.2\n", "IPython 6.2.1\n" ] } ], "source": [ "os.chdir(path)\n", "\n", "# magic to print version\n", "%load_ext watermark\n", "%watermark -a 'Ethen' -d -t -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Structure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of the materials are a condensed reimplementation from the resource: Python3 Cookbook Chapter 1. Data Structures and Algorithms, which originally was freely available online." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Assignments to Unpack Iterables into Separate Variables\n", "\n", "**Example1:** Unpacking a tuple." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n", "5\n" ] } ], "source": [ "# note that in Python,\n", "# it's the \",\" that creates the tuple,\n", "# so we technically do not need the ()\n", "# around the 4, 5. But if we need to create\n", "# a single element tuple, then we do need to ()\n", "# e.g. (4,) would be a single element tuple\n", "p = (4, 5)\n", "x, y = p\n", "print(x)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works for all types of sequences (iterables), including tuples, list, strings, generators, files.\n", "\n", "**Example2:** If you want to discard some of the elements, simply use `_` to represent it (You can use anything you want, this is just convention)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "50\n", "91.1\n" ] } ], "source": [ "data = ['ACME', 50, 91.1, (2012, 12, 21)]\n", "_, shares, price, _ = data\n", "print(shares)\n", "print(price)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use \"Star Expressions\" to Unpack Iterables of Arbitrary Length\n", "\n", "**Example1:** Data that has name, e-mail and arbitrary number of telephone numbers." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['773-555-1212', '847-555-1212']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212')\n", "name, email, *phone_numbers = record\n", "phone_numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The star expression will always unpack a list (including none).\n", "\n", "**Example2:** Performing different actions when looping over different \"tagged\" tuples. \"Tagged\" simply means they have some known pattern or structure. e.g. The first element is a tag indicating what other information does this tag contain." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "foo 1 2\n", "bar hello\n", "foo 3 4\n" ] } ], "source": [ "records = [\n", " ('foo', 1, 2),\n", " ('bar', 'hello'),\n", " ('foo', 3, 4)\n", "]\n", "\n", "def do_foo(x, y):\n", " print('foo', x, y)\n", " \n", "def do_bar(s):\n", " print('bar', s)\n", "\n", "for tag, *args in records:\n", " if tag == 'foo':\n", " do_foo(*args)\n", " elif tag == 'bar':\n", " do_bar(*args)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example3:** String manipulation and throwing away variables." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nobody\n", "/var/empty\n", "/usr/bin/false\n" ] } ], "source": [ "line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'\n", "uname, *_, homedir, sh = line.split(':')\n", "print(uname)\n", "print(homedir)\n", "print(sh)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Keeping the Last N Items Using deque\n", "\n", "**Example1:** A fix-sized queue that removes the oldest item when a new item is added and the queue is full." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "deque([1, 2, 3], maxlen=3)\n", "deque([2, 3, 4], maxlen=3)\n" ] } ], "source": [ "from collections import deque\n", "\n", "\n", "# specify the maxlen argument\n", "q = deque(maxlen = 3)\n", "q.append(1)\n", "q.append(2)\n", "q.append(3)\n", "print(q)\n", "q.append(4)\n", "print(q)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example2:** A unbounded queue. You can pop and add item from both end with O(1)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "deque([1, 2, 3])\n", "deque([4, 1, 2, 3])\n", "3\n", "deque([4, 1, 2])\n", "4\n", "deque([1, 2])\n" ] } ], "source": [ "q = deque()\n", "q.append(1)\n", "q.append(2)\n", "q.append(3)\n", "print(q)\n", "\n", "q.appendleft(4)\n", "print(q)\n", "\n", "# removes the right-most element\n", "print(q.pop())\n", "print(q)\n", "\n", "# removes the left-most element\n", "print(q.popleft())\n", "print(q)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding the Largest or Smallest N Items Using heapq\n", "\n", "**Example1:** `nlargest()` and `nsmallest()`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[42, 37, 23]\n", "[-4, 1, 2]\n" ] } ], "source": [ "import heapq\n", "\n", "\n", "nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]\n", "print(heapq.nlargest(3, nums))\n", "print(heapq.nsmallest(3, nums))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example2:** `nlargest()` and `nsmallest()` with more complex data structure." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'name': 'YHOO', 'price': 16.35, 'shares': 45},\n", " {'name': 'FB', 'price': 21.09, 'shares': 200},\n", " {'name': 'HPQ', 'price': 31.75, 'shares': 35}]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "portfolio = [\n", " {'name': 'IBM', 'shares': 100, 'price': 91.1},\n", " {'name': 'AAPL', 'shares': 50, 'price': 543.22},\n", " {'name': 'FB', 'shares': 200, 'price': 21.09},\n", " {'name': 'HPQ', 'shares': 35, 'price': 31.75},\n", " {'name': 'YHOO', 'shares': 45, 'price': 16.35},\n", " {'name': 'ACME', 'shares': 75, 'price': 115.65}\n", "]\n", "\n", "cheap = heapq.nsmallest(3, portfolio, key = lambda s: s['price'])\n", "cheap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When to use what:\n", "\n", "- Use `nlargest()` and `nsmallest()` if you are trying to find a relatively small number of items. \n", "- Use `min()` and `max()` if you are simply trying to find the single smallest or largest item (N=1). \n", "- Use `sorted(items)[:N]` or `sorted(items)[-N:]` if N is about the same size as the input." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PriorityQueue\n", "\n", "\n", "\n", "We can think of priority queue as a modified queue: instead of retrieving the next element by insertion time, it retrieves the highest-priority element. The priority of individual elements is decided by the ordering applied to their keys.\n", "\n", "Priority queues are commonly used for dealing with scheduling problems. High-priority tasks on the system should take precedence over lower-priority tasks. By organizing pending tasks in a priority queue that uses the task urgency as the key, the task scheduler can allow the highest-priority tasks to run first.\n", "\n", "Let’s take a look at how we can implement priority queues in Python using built-in data structures or data structures that ship with Python’s standard library." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 'eat')\n", "(2, 'code')\n", "(3, 'sleep')\n" ] } ], "source": [ "from queue import PriorityQueue\n", "\n", "q = PriorityQueue()\n", "q.put((2, 'code'))\n", "q.put((1, 'eat'))\n", "q.put((3, 'sleep'))\n", "\n", "while not q.empty():\n", " # note that apart from returning\n", " # the item from the queue, it will\n", " # also remove it from the queue\n", " next_item = q.get()\n", " print(next_item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can infer from the output, prioriy queue stores the elements by its priority, in this case the first element in the tuple. After from sorting primitive types such as integers, we can also sort objects that we've defined. To perform sorting on custom objects we need to implement the dunder methods for all 6 comparisons.\n", "\n", "| Operator | Method |\n", "|----------|------------|\n", "| == | ``__eq__`` |\n", "| != | ``__ne__`` |\n", "| < | ``__le__`` |\n", "| <= | ``__le__`` |\n", "| > | ``__gt__`` |\n", "| >= | ``__ge__`` |" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "class Skill:\n", " def __init__(self, priority, description):\n", " self.priority = priority\n", " self.description = description\n", " \n", " def __eq__(self, other):\n", " return self.priority == other.priority\n", "\n", " def __ne__(self, other):\n", " return self.priority != other.priority\n", "\n", " def __lt__(self, other):\n", " return self.priority < other.priority\n", "\n", " def __le__(self, other):\n", " return self.priority <= other.priority\n", "\n", " def __gt__(self, other):\n", " return self.priority > other.priority\n", "\n", " def __ge__(self, other):\n", " return self.priority >= other.priority\n", "\n", " def __repr__(self):\n", " return '{}: {}'.format(self.description, self.priority)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Java: 1\n", "R: 5\n", "Python: 10\n" ] } ], "source": [ "q = PriorityQueue()\n", "q.put(Skill(5, 'R'))\n", "q.put(Skill(10, 'Python'))\n", "q.put(Skill(1, 'Java'))\n", "\n", "while not q.empty():\n", " next_item = q.get()\n", " print(next_item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Keeping Dictionaries in Order Using OrderedDict" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "grok 4; spam 3; foo 1; bar 2; \n", "foo 1; bar 2; spam 3; grok 4; " ] } ], "source": [ "from collections import OrderedDict\n", "\n", "\n", "d = dict()\n", "d['foo'] = 1\n", "d['bar'] = 2\n", "d['spam'] = 3\n", "d['grok'] = 4\n", "\n", "for key in d:\n", " print(key, d[key], end = '; ')\n", "\n", "print()\n", "od = OrderedDict()\n", "od['foo'] = 1\n", "od['bar'] = 2\n", "od['spam'] = 3\n", "od['grok'] = 4\n", "\n", "# the order remains the same compared to the dict\n", "for key in od:\n", " print(key, od[key], end = '; ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Be aware that the size of an **OrderedDict** is more than twice as large as a normal dictionary!!" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'{\"foo\": 1, \"bar\": 2, \"spam\": 3, \"grok\": 4}'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# one useful case may be keeping the order when converting it into json\n", "import json\n", "json.dumps(od)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating with Dictionaries Using zip\n", "\n", "Performing calculations (e.g., min, max, sorting, etc.) on a dictionary by convert the it into tuples of ( value, key )." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(10.75, 'FB')\n" ] } ], "source": [ "# company and stocks\n", "prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55,\n", " 'HPQ': 37.20, 'FB': 10.75}\n", "\n", "min_price = min(zip(prices.values(), prices.keys()))\n", "print(min_price)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding Commonalities in Two Dictionaries Using set operations\n", "\n", "We can perform common set operations with dictionary `.keys()` and `.items()` without having to convert them into a set as they are unique." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "a = {'x': 1, 'y': 2, 'z': 3}\n", "b = {'w': 10, 'x': 11, 'y': 2}" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'x', 'y'}\n", "{'z'}\n", "{('y', 2)}\n", "{'y': 2, 'x': 1}\n" ] } ], "source": [ "# common keys\n", "print(a.keys() & b.keys())\n", "\n", "# keys in a that are not in b\n", "print(a.keys() - b.keys())\n", "\n", "# (key,value) pairs in common\n", "print(a.items() & b.items())\n", "\n", "# make a new dictionary with certain keys removed\n", "c = {key: a[key] for key in a.keys() - {'z'}}\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Removing Duplicates from a Sequence while Maintaining Order" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{1, 2, 10, 5, 9}\n" ] } ], "source": [ "# simply calling set() removes duplicated but does not retain the original order\n", "a = [1, 5, 2, 1, 9, 1, 5, 10]\n", "print(set(a))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 5, 2, 9, 10]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def dedupe(items):\n", " seen = set()\n", " for item in items:\n", " if item not in seen:\n", " seen.add(item)\n", " yield item\n", "\n", "list(dedupe(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Naming Slices" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2, 3]\n", "[2, 3]\n", "2\n", "4\n", "1\n" ] } ], "source": [ "items = [0, 1, 2, 3, 4, 5, 6]\n", "important = slice(2, 4, 1)\n", "\n", "# instead of hardcoding the indexing slicing\n", "# we name the slice so we will know what it means\n", "# when we look at it later\n", "print(items[2:4])\n", "print(items[important])\n", "\n", "# you can also look at various info of the slice\n", "print(important.start)\n", "print(important.stop)\n", "print(important.step)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find the Most Frequently Items in a Sequence Using Counter\n", "\n", "The `most_common` functionality from **Counter**." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('eyes', 8), ('the', 5), ('look', 4)]\n" ] } ], "source": [ "from collections import Counter\n", "\n", "words = [\n", " 'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',\n", " 'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',\n", " 'eyes', \"don't\", 'look', 'around', 'the', 'eyes', 'look', 'into',\n", " 'my', 'eyes', \"you're\", 'under'\n", "]\n", "word_counts = Counter(words)\n", "top_three = word_counts.most_common(3)\n", "print(top_three)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Counter** is simply a dictionary and we can manually increment the count." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']\n", "for word in morewords:\n", " word_counts[word] += 1\n", "\n", "word_counts['eyes']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Counter({'eyes': 9, 'the': 5, 'my': 4, 'look': 4, 'into': 3, 'around': 2, 'not': 2, 'in': 1, 'you': 1, 'looking': 1, \"you're\": 1, 'why': 1, 'under': 1, 'are': 1, \"don't\": 1})\n" ] } ], "source": [ "# we can also perform math operations between them\n", "a = Counter(words)\n", "b = Counter(morewords)\n", "print(a + b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting with itemgetter and attrgetter\n", "\n", "By using the `key` argument with the `sorted` function we can accomplish a bit more complex operations when it comes to sorting. Note that key only accepts functions as its parameters, thus in the following we use a lambda to create an anonymous function and sort by the second element of the tuples." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students = [ \n", " ('john', 'A', 15),\n", " ('jane', 'B', 12),\n", " ('dave', 'B', 10),\n", "]\n", "\n", "# sort by age , which is the last element\n", "# in the three element tuple list\n", "sorted(students, key = lambda x: x[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of doing that we can use `itemgetter` for convenience." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from operator import itemgetter\n", "sorted(students, key = itemgetter(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same notion also applies to sorting by dictionary key." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004},\n", " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n", " {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n", " {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from operator import itemgetter\n", "\n", "\n", "rows = [\n", " {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},\n", " {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},\n", " {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},\n", " {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}\n", "]\n", "\n", "rows_by_fname = sorted(rows, key = itemgetter('fname'))\n", "rows_by_fname" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method is faster then `key = lambda r: r['fname']`. And we can assign multiple values inside the **itemgetter()**.\n", "\n", "There's also `attrgetter` for performing sorting according to class object's attribute." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "class Student: \n", " def __init__(self, name, grade, age):\n", " self.age = age\n", " self.name = name\n", " self.grade = grade\n", "\n", " def __repr__(self): \n", " return repr((self.name, self.grade, self.age))\n", "\n", "student_objects = [ \n", " Student('john', 'A', 15),\n", " Student('jane', 'B', 12),\n", " Student('dave', 'B', 10)\n", "] " ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can sort using the lambda anonymous function way\n", "sorted(student_objects, key = lambda student: student.age)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from operator import attrgetter\n", "\n", "# or sort using attrgetter, notice that we can pass in\n", "# multiple arguments. So here we are sorting by grade then age\n", "sorted(student_objects, key = attrgetter('grade', 'age'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Named Tuples\n", "\n", "Create a tuple with names for clearity (compared with tuples) and also has the immutable feature." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "55\n", "55\n" ] } ], "source": [ "from collections import namedtuple\n", "\n", "\n", "# create the name tuple by assigning the field name\n", "Color = namedtuple('Color', ['red', 'green', 'blue'])\n", "color = Color(55, 55, 55)\n", "\n", "# access the element using .field name\n", "print(color.red)\n", "\n", "# be aware that we can use index to access the element\n", "# in the namedtuple, but this defeats the whole purpose\n", "# of using namedtuple versus plain tuple, i.e. color.red\n", "# is arguably more readable than color[0]\n", "print(color[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "namedtuple can be used as a replacement for a dictionary, which requires more space to store. However, be aware that a it's immutable." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Color(red=75, green=55, blue=55)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# because of its immutable feature\n", "# color.red = 75 # this will return an error\n", "\n", "# use ._replace() if you really really really\n", "# wish to change the value after creation\n", "color = color._replace(red = 75)\n", "color" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DefaultDict\n", "\n", "Whenever you need a dictionary, and each value of the dictionary has to start with the default value, use **defaultdict**. A **defaultdict** will never raise a KeyError. Any key that does not exist gets the value returned by the default factory." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Chunky Monkey\n", "Vanilla\n" ] } ], "source": [ "# example 1: Joe does not exist in the dictionary, return default value\n", "from collections import defaultdict\n", "\n", "\n", "ice_cream = defaultdict(lambda: 'Vanilla')\n", "ice_cream['Sarah'] = 'Chunky Monkey'\n", "ice_cream['Abdul'] = 'Butter Pecan'\n", "print(ice_cream['Sarah'])\n", "print(ice_cream['Joe'])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TX Austin, Houston, Dallas\n", "NY Albany, Syracuse, Buffalo, Rochester\n", "CA Sacramento, Palo Alto\n", "GA Atlanta\n" ] } ], "source": [ "# example 2: Grouping with dictionaries\n", "from collections import defaultdict\n", "\n", "\n", "city_list = [('TX','Austin'), ('TX','Houston'), ('NY','Albany'), ('NY', 'Syracuse'), \n", " ('NY', 'Buffalo'), ('NY', 'Rochester'), ('TX', 'Dallas'), ('CA','Sacramento'), \n", " ('CA', 'Palo Alto'), ('GA', 'Atlanta')]\n", "\n", "cities_by_state = defaultdict(list)\n", "\n", "for state, city in city_list:\n", " cities_by_state[state].append(city)\n", "\n", "for state, cities in cities_by_state.items():\n", " print(state, \", \".join(cities))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering Sequence Elements\n", "\n", "**Example1:** Replacing the values that don’t meet the criteria with a new value." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 4, 0, 10, 0, 2, 3, 0]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mylist = [1, 4, -5, 10, -7, 2, 3, -1]\n", "clip_neg = [m if m > 0 else 0 for m in mylist]\n", "clip_neg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example2:** Use `filter` when the filtering criteria can't be easily expressed." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['1', '2', '-3', '4', '5']\n" ] } ], "source": [ "values = ['1', '2', '-3', '-', '4', 'N/A', '5']\n", "\n", "def is_int(val):\n", " try:\n", " x = int(val)\n", " return True\n", " except ValueError:\n", " return False\n", "\n", "# filter returns an iterator, you'll have to manually\n", "# convert it to a list\n", "ivals = list(filter(is_int, values))\n", "print(ivals)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example3:** `itertools.compress()` for filtering a sequence with an accompanying boolean sequence. Suppose you want to make a list of all addresses where the corresponding count value was greater than 5." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['5800 E 58TH', '4801 N BROADWAY', '1039 W GRANVILLE']\n" ] } ], "source": [ "from itertools import compress\n", "\n", "addresses = [\n", " '5412 N CLARK',\n", " '5148 N CLARK',\n", " '5800 E 58TH',\n", " '2122 N CLARK'\n", " '5645 N RAVENSWOOD',\n", " '1060 W ADDISON',\n", " '4801 N BROADWAY',\n", " '1039 W GRANVILLE',\n", "]\n", "\n", "counts = [0, 3, 10, 4, 1, 7, 6, 1]\n", "more5 = [n > 5 for n in counts]\n", "\n", "# it also returns an iterator\n", "print(list(compress(addresses, more5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracting a Subset of a Dictionary" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55,\n", " 'HPQ': 37.20, 'FB': 10.75}\n", "\n", "# Make a dictionary of all prices over 200\n", "p1 = {key: value for key, value in prices.items() if value > 200}\n", "\n", "# Make a dictionary of tech stocks\n", "tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}\n", "p2 = {key: value for key, value in prices.items() if key in tech_names}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transforming and Reducing Data Using Generation Expression\n", "\n", "**Example1:** Notice that we do not need to write `sum((x * x for x in nums))` to first create the generator." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "55" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nums = [1, 2, 3, 4, 5]\n", "s = sum(x * x for x in nums)\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example2:** Determine if any .py files exist in a directory." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "uncomment to run, but there's no directory named 'dirname'\n" ] } ], "source": [ "\"\"\"\n", "import os\n", "\n", "files = os.listdir('dirname')\n", "if any(name.endswith('.py') for name in files):\n", " print('There be python!')\n", "else:\n", " print('Sorry, no python.')\n", "\"\"\"\n", "print(\"uncomment to run, but there's no directory named 'dirname'\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## For else\n", "\n", "For else let's us remove extraneous flag variables. Examples with and without `for ... else ...`" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "def contains_even_number1(l):\n", " \"\"\"Prints whether or not the list l contains an even number.\"\"\"\n", " has_even_number = False\n", " for element in l:\n", " if element % 2 == 0:\n", " has_even_number = True\n", " break\n", "\n", " if has_even_number:\n", " print(\"list contains an even number\")\n", " else:\n", " print(\"list does not contain an even number\")\n", "\n", "\n", "def contains_even_number2(l):\n", " \"\"\"Prints whether or not the list l contains an even number.\"\"\"\n", " for element in l:\n", " if element % 2 == 0:\n", " print(\"list contains an even number\")\n", " break\n", " # we can think of the else statement as \"if no break\"\n", " else:\n", " print(\"list does not contain an even number\")" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "list contains an even number\n", "list contains an even number\n" ] } ], "source": [ "list1 = [3, 5, 8]\n", "contains_even_number1(list1)\n", "contains_even_number2(list1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that Python also has a [try, except, else expression](https://stackoverflow.com/questions/16138232/is-it-a-good-practice-to-use-try-except-else-in-python). In this scenario, else will evaluate only if there is no exception from the try block, allowing us to simplify the more complicated code below:\n", "\n", "```python\n", "\n", "# we have a sentinel flag indicating whether our try succeeded without any exception.\n", "no_error = False\n", "try:\n", " try_this(whatever)\n", " no_error = True\n", "except SomeException as the_exception:\n", " handle(the_exception)\n", "if no_error:\n", " return something\n", "```\n", "\n", "We can re-write the flow above using else:\n", "\n", "```python\n", "try:\n", " try_this(whatever)\n", "except SomeException as the_exception:\n", " handle(the_exception)\n", "else: # only execute when there is no exception from the try block\n", " return something\n", "```\n", "\n", "The reason why this type of flow control exists is because A try block allows us to handle an \"expected\" error. The except block should only catch exceptions we are prepared to handle. If we handle an unexpected error, our code may do the wrong thing and hide bugs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Notes: Comparing and Sorting](http://portingguide.readthedocs.io/en/latest/comparisons.html)\n", "- [Blog: for ... else in Python](http://psung.blogspot.tw/2007/12/for-else-in-python.html)\n", "- [Blog: Priority Queues in Python](https://dbader.org/blog/priority-queues-in-python)\n", "- [Blog: Using defaultdict in Python](https://www.accelebrate.com/blog/using-defaultdict-python/)\n", "- [Youtube: Namedtuple - When and why should you use namedtuples?](https://www.youtube.com/watch?v=GfxJYp9_nJA)\n", "- [Online Book: Python3 Cookbook Chapter 1. Data Structures and Algorithms](http://chimera.labs.oreilly.com/books/1230000000393/ch01.html)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" }, "toc": { "nav_menu": { "height": "487px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "335px" }, "toc_section_display": "block", "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }