{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# COMP 364: A brief tour of the Standard Library\n", "\n", "There are three kinds of modules/packages:\n", "\n", "* Modules you make yourself\n", "* Third-party modules (e.g. [matplotlib](https://matplotlib.org/))\n", "* Standard library modules\n", "\n", "Standard library modules come included in Python and they contain many useful tools.\n", "\n", "They are maintained by the core Python development team so you can count on them being reliable.\n", "\n", "The Python Standard Library is very extensive, so I will just show you some highlights.\n", "\n", "Refer to [this](https://docs.python.org/3/tutorial/stdlib.html) and [this](https://docs.python.org/3/library/index.html) for a more complete view on the Standard Library.\n", "\n", "**Note:** Standard Library packages and modules are NOT the same thing as **built-in** objects (e.g. `print`, `open`, `zip`, `enumerate`). You still have to `import` standard library modules/packages you just don't have to install them from elsewhere.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## System-related \n", "\n", "* `sys`: functions and variables working on the Python interpreter\n", "* `os`: operating system functionality\n", "* `shutil`: file manipulation\n", "\n", "### `sys`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interpreter is located at: /Users/carlosgonzalezoliver/anaconda/envs/py36/bin/python\n", "\n", "Look for modules in: ['', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python36.zip', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/lib-dynload', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages/IPython/extensions', '/Users/carlosgonzalezoliver/.ipython']\n", "\n" ] } ], "source": [ "import sys\n", "\n", "#get the interpreter path\n", "print(f\"Interpreter is located at: {sys.executable}\\n\")\n", "#get module search path\n", "print(f\"Look for modules in: {sys.path}\\n\")\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "ename": "SystemExit", "evalue": "", "output_type": "error", "traceback": [ "An exception has occurred, use %tb to see the full traceback.\n", "\u001b[0;31mSystemExit\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2870: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.\n", " warn(\"To exit: use 'exit', 'quit', or Ctrl-D.\", stacklevel=1)\n" ] } ], "source": [ "#kill the interpreter, stops your program's execution (works better outside of notebooks)\n", "sys.exit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `sys`: command-line arguments\n", "\n", "Until now we have been getting input from the user in an \"interactive\" way. \n", "\n", "That is, the program pauses execution and waits for the user to respond to the `input` query.\n", "\n", "You can also let users give input to your program at the beginning of execution and then execution is never halted.\n", "\n", "This is done through **command-line arguments**\n", "\n", "Imagine you have a file `divide.py` that divides two numbers given by the user.\n", "\n", "Using `input()` we had\n", "\n", "```python\n", "a = int(input(\"Give me the first number: \"))\n", "b = int(input(\"Give me the second number: \"))\n", "\n", "print(a / b)\n", "```\n", "\n", "With command-line arguments, the information is taken **before** execution.\n", "\n", "```python\n", "import sys\n", "\n", "a = int(sys.argv[1])\n", "b = int(sys.argv[2])\n", "\n", "print(a/b)\n", "```\n", "\n", "From the command line, you would call the program as such:\n", "\n", "```\n", "$ python divide.py 3 2\n", "```\n", "\n", "`sys.argv` stores a list of strings given by the command line.\n", "\n", "In this case:\n", "\n", "```python\n", "print(sys.argv)\n", "```\n", "\n", "Would produce:\n", "\n", "```python\n", "[\"divide.py\", \"3\", \"2\"]\n", "```\n", "\n", "Command line arguments are often preferred when it is desireable to automate the execution of a program.\n", "\n", "## `os`\n", "\n", "This module lets you perform actions related to the operating system.\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "My operating system type is: posix\n", "I am currently in directory: /Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24\n" ] } ], "source": [ "import os\n", "\n", "print(f\"My operating system type is: {os.name}\")\n", "\n", "print(f\"I am currently in directory: {os.getcwd()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also change your current working directory" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/Users/carlosgonzalezoliver/Projects'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "os.chdir(\"/Users/carlosgonzalezoliver/Projects\")\n", "os.getcwd()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see what files are in a directory. No arguments means, look in the current directory." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['.DS_Store',\n", " '.ipynb_checkpoints',\n", " 'AbstractClassification',\n", " 'ArXiVDT',\n", " 'BGSA_Workshop',\n", " 'briancaffey.github.io',\n", " 'cgoliver.github.io',\n", " 'cminerva',\n", " 'Crick',\n", " 'dapps',\n", " 'Dorys_Whaleish_Dictionary.py',\n", " 'ETHDogs',\n", " 'Ethereum',\n", " 'Euler',\n", " 'Git_Talk',\n", " 'Git_Tutorial',\n", " 'Google',\n", " 'haikus',\n", " 'Kattis',\n", " 'Keras-RCNN',\n", " 'Kernels',\n", " 'kernelx',\n", " 'machine_learning',\n", " 'mateRNAl',\n", " 'myproject',\n", " 'Notebooks',\n", " 'Nussinov',\n", " 'Pear',\n", " 'Pickypedia',\n", " 'Plumbing',\n", " 'pocketcluster',\n", " 'Popgen_sols',\n", " 'pyCourses',\n", " 'pyMeet',\n", " 'RNA',\n", " 'RNA-Popgen-Notebook',\n", " 'SeizuresBot',\n", " 'Test',\n", " 'testblog',\n", " 'tPPI',\n", " 'Voting',\n", " 'zminerva']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "os.listdir()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can give a path." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['.ipynb_checkpoints',\n", " 'L24.ipynb',\n", " 'rand_dict.json',\n", " 'rand_dict.pickle',\n", " 'test.csv',\n", " 'test.txt']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "os.listdir(\"/Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's go back to where we were." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "os.chdir(\"/Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also create new directories." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "os.mkdir(\"Temp\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['.ipynb_checkpoints',\n", " 'L24.ipynb',\n", " 'rand_dict.json',\n", " 'rand_dict.pickle',\n", " 'Temp',\n", " 'test.csv',\n", " 'test.txt']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "os.listdir()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `shutil`\n", "\n", "`shutil` is used for file manipulation (not file content manipulation)\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "with open(\"test.txt\", \"w\") as t:\n", " t.write(\"Hello\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['.ipynb_checkpoints',\n", " 'L24.ipynb',\n", " 'rand_dict.json',\n", " 'rand_dict.pickle',\n", " 'Temp',\n", " 'test.csv',\n", " 'test.txt']" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "os.listdir()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'test_copy.txt'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import shutil\n", "#copy the file\n", "shutil.copyfile(\"test.txt\", \"test_copy.txt\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.listdir()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "#delete a directory\n", "shutil.rmtree(\"Temp\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "#deleting files is done with os\n", "os.remove(\"test_copy.txt\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "os.listdir()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Math\n", "\n", "There are a couple convenient \"math\" modules\n", "\n", "* math: basic math operations and quantities\n", "* random: pseudo-random numbers\n", "* statistics: basic statistics functions\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "e^2: 7.38905609893065\n", "log(1): 0.0\n", "3^4: 81.0\n", "sin(4): -0.7568024953079282\n" ] } ], "source": [ "import math\n", "\n", "\n", "print(f\"e^2: {math.exp(2)}\")\n", "\n", "print(f\"log(1): {math.log(1)}\")\n", "\n", "print(f\"3^4: {math.pow(3, 4)}\")\n", "\n", "print(f\"sin(4): {math.sin(4)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random\n", "\n", "The `random` module gives you pseudo-random (no perfectly random generator exists) functionality." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "uniform random number: 0.4199826783981393\n", "uniform random number between 4 and 15 11\n", "gaussian random number with mean 0 and variance 1: 0.5279296760327262 \n" ] } ], "source": [ "import random\n", "#random number uniformly from 0 and 1\n", "print(f\"uniform random number: {random.random()}\")\n", "\n", "print(f\"uniform random number between 4 and 15 {random.randrange(4, 16)}\")\n", "\n", "mu = 0\n", "sigma = 1\n", "print(f\"gaussian random number with mean {mu} and variance {sigma}: {random.gauss(mu, sigma)} \")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check that we're actually getting uniform and Gaussian distributions." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEKCAYAAAAIO8L1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEXZJREFUeJzt3XuMZnV9x/H3x11B0VYgTLcrFxfbjS1SLzgaL40BsRWV\numiVrtF2qbRbU6+t0YIkpUlDQ6JWW1tptkp3mxJwa6GgVYGuq6RJgS4X5S5EWIHC7ig21mrEhW//\neA7uw/IbZpid5zmzM+9XsnnO+Z1z5nx/WZbP/H7n8qSqkCRpT0/quwBJ0sJkQEiSmgwISVKTASFJ\najIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUtLzvAvbGIYccUqtWreq7DEnap1xzzTXfqaqJmfbbpwNi\n1apVbNu2re8yJGmfkmT7bPZzikmS1GRASJKaRhYQSc5NsjPJjUNtH0lya5JvJLkoyYFD205PckeS\n25K8ZlR1SZJmZ5QjiI3ACXu0XQ4cXVXPA74JnA6Q5ChgLfDc7phPJVk2wtokSTMYWUBU1RXAA3u0\nXVZVu7rVK4HDuuU1wAVV9eOquhO4A3jJqGqTJM2sz2sQ7wC+1C0fCtw9tO2eru0xkqxPsi3Jtqmp\nqRGXKElLVy8BkeQMYBdw3hM9tqo2VNVkVU1OTMx4G68kaY7G/hxEklOAE4Hja/f3nd4LHD6022Fd\nmySpJ2MdQSQ5AfgQ8Iaq+uHQpkuAtUn2T3IksBq4epy1SZIebWQjiCTnA8cChyS5BziTwV1L+wOX\nJwG4sqreWVU3JdkM3Mxg6uldVfXQqGrT+B236bhezrt13dZezistBiMLiKp6a6P5M4+z/1nAWaOq\nR5L0xPgktSSpyYCQJDUZEJKkJgNCktRkQEiSmvbpLwzaV3nLp6R9gSMISVKTASFJajIgJElNBoQk\nqcmAkCQ1GRCSpKYlfZtrX7ebStK+wBGEJKnJgJAkNS3pKaalxik1SU+EIwhJUpMBIUlqMiAkSU0G\nhCSpyYCQJDV5F5OkfZ7fsTIajiAkSU0GhCSpyYCQJDUZEJKkppFdpE5yLnAisLOqju7aDgY+C6wC\n7gJOrqrvddtOB04FHgLeW1WXjqo2SZoPfb6+ZhwXyEc5gtgInLBH22nAlqpaDWzp1klyFLAWeG53\nzKeSLBthbZKkGYwsIKrqCuCBPZrXAJu65U3ASUPtF1TVj6vqTuAO4CWjqk2SNLNxX4NYUVX3dcv3\nAyu65UOBu4f2u6dre4wk65NsS7JtampqdJVK0hLX20Xqqiqg5nDchqqarKrJiYmJEVQmSYLxB8SO\nJCsBus+dXfu9wOFD+x3WtUmSejLugLgEWNctrwMuHmpfm2T/JEcCq4Grx1ybJGnIKG9zPR84Fjgk\nyT3AmcDZwOYkpwLbgZMBquqmJJuBm4FdwLuq6qFR1SZJmtnIAqKq3jrNpuOn2f8s4KxR1SNJemJ8\nklqS1OTrvrWoLfYnXaVRcgQhSWoyICRJTQaEJKnJgJAkNRkQkqQm72KSRqSvO6i8e0rzxRGEJKnJ\ngJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiSmgwI\nSVKTASFJajIgJElNBoQkqcmAkCQ19RIQSf4oyU1JbkxyfpKnJDk4yeVJbu8+D+qjNknSwNgDIsmh\nwHuByao6GlgGrAVOA7ZU1WpgS7cuSepJX1NMy4GnJlkOHAD8N7AG2NRt3wSc1FNtkiQG/6Meq6q6\nN8lHgW8DPwIuq6rLkqyoqvu63e4HVrSOT7IeWA9wxBFHjKNkaZ9y3Kbjejv31nVbezu35l8fU0wH\nMRgtHAk8E3hakrcP71NVBVTr+KraUFWTVTU5MTEx8nolaanqY4rp1cCdVTVVVT8BLgReDuxIshKg\n+9zZQ22SpE4fAfFt4KVJDkgS4HjgFuASYF23zzrg4h5qkyR1+rgGcVWSzwHXAruA64ANwNOBzUlO\nBbYDJ4+7NknSbmMPCICqOhM4c4/mHzMYTUiSFgCfpJYkNRkQkqQmA0KS1GRASJKaDAhJUlMvdzFJ\nWpz6fM2H5p8jCElSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1DSrgEjyitm0SZIWj9mOID45\nyzZJ0iLxuE9SJ3kZg68DnUjyx0ObfhZYNsrCJEn9mulVG/sx+Ka35cDPDLV/H3jzqIqSJPXvcQOi\nqr4GfC3JxqraPqaaJEkLwGxf1rd/kg3AquFjqupVoyhKktS/2QbEPwN/B3waeGh05UiSForZBsSu\nqjpnpJVIkhaU2d7m+vkkf5hkZZKDH/kz0sokSb2a7QhiXff5waG2Ap49v+VIkhaKWQVEVR056kIk\nSQvLrAIiye+02qvqH+e3HEnSQjHbKaYXDy0/BTgeuBYwICRpkZrtFNN7hteTHAhcMNeTdsd/Gjia\nwbWMdwC3AZ9l8KzFXcDJVfW9uZ5DkrR35vq67/8D9ua6xF8BX66qXwKeD9wCnAZsqarVwJZuXZLU\nk9leg/g8g9/0YfCSvl8GNs/lhEmeAbwSOAWgqh4EHkyyBji2220T8FXgT+ZyDknS3pvtNYiPDi3v\nArZX1T1zPOeRwBTwD0meD1wDvA9YUVX3dfvcD6yY48+XJM2DWU0xdS/tu5XBG10PAh7ci3MuB44B\nzqmqFzKYrnrUdFJVFbtHLI+SZH2SbUm2TU1N7UUZkqTHM9tvlDsZuBp4C3AycFWSub7u+x7gnqq6\nqlv/HIPA2JFkZXe+lcDO1sFVtaGqJqtqcmJiYo4lSJJmMtsppjOAF1fVToAkE8C/M/if+xNSVfcn\nuTvJc6rqNga3zN7c/VkHnN19XvxEf7Ykaf7MNiCe9Eg4dL7L3O+AAngPcF6S/YBvAb/b/bzNSU4F\ntjMYqUiSejLbgPhykkuB87v13wK+ONeTVtX1wGRj0/Fz/ZmSpPk103dS/yKDu4s+mORNwK92m/4T\nOG/UxUmS+jPTCOITwOkAVXUhcCFAkl/ptv3GSKuTJPVmpusIK6rqhj0bu7ZVI6lIkrQgzBQQBz7O\ntqfOZyGSpIVlpoDYluT392xM8nsMnoCWJC1SM12DeD9wUZK3sTsQJoH9gDeOsjBJUr8eNyCqagfw\n8iTHMXg1N8C/VdVXRl6ZJKlXs/0+iK3A1hHXIklaQPbmaWhJ0iJmQEiSmgwISVKTASFJajIgJElN\nBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRA\nSJKaeguIJMuSXJfkC936wUkuT3J793lQX7VJkvodQbwPuGVo/TRgS1WtBrZ065KknvQSEEkOA14P\nfHqoeQ2wqVveBJw07rokSbv1NYL4BPAh4OGhthVVdV+3fD+wYuxVSZJ+auwBkeREYGdVXTPdPlVV\nQE1z/Pok25Jsm5qaGlWZkrTk9TGCeAXwhiR3ARcAr0ryT8COJCsBus+drYOrakNVTVbV5MTExLhq\nlqQlZ+wBUVWnV9VhVbUKWAt8pareDlwCrOt2WwdcPO7aJEm7LaTnIM4Gfi3J7cCru3VJUk+W93ny\nqvoq8NVu+bvA8X3WI0nabSGNICRJC4gBIUlqMiAkSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiSmgwI\nSVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTQaEJKnJgJAk\nNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqGntAJDk8ydYkNye5Kcn7uvaDk1ye5Pbu86Bx1yZJ2q2P\nEcQu4ANVdRTwUuBdSY4CTgO2VNVqYEu3LknqydgDoqruq6pru+X/BW4BDgXWAJu63TYBJ427NknS\nbr1eg0iyCnghcBWwoqru6zbdD6zoqSxJEj0GRJKnA/8CvL+qvj+8raoKqGmOW59kW5JtU1NTY6hU\nkpamXgIiyZMZhMN5VXVh17wjycpu+0pgZ+vYqtpQVZNVNTkxMTGegiVpCerjLqYAnwFuqaq/HNp0\nCbCuW14HXDzu2iRJuy3v4ZyvAH4buCHJ9V3bh4Gzgc1JTgW2Ayf3UJskqTP2gKiq/wAyzebjx1mL\nJGl6PkktSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0GhCSpyYCQ\nJDUZEJKkJgNCktRkQEiSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkpoMCElS\nkwEhSWoyICRJTQsuIJKckOS2JHckOa3veiRpqVpQAZFkGfC3wGuBo4C3Jjmq36okaWlaUAEBvAS4\no6q+VVUPAhcAa3quSZKWpIUWEIcCdw+t39O1SZLGbHnfBTxRSdYD67vVHyS5bQ4/5hDgO/NX1T5h\nKfYZlma/7fMSkFOyN31+1mx2WmgBcS9w+ND6YV3bT1XVBmDD3pwkybaqmtybn7GvWYp9hqXZb/u8\nNIyjzwttium/gNVJjkyyH7AWuKTnmiRpSVpQI4iq2pXk3cClwDLg3Kq6qeeyJGlJWlABAVBVXwS+\nOOLT7NUU1T5qKfYZlma/7fPSMPI+p6pGfQ5J0j5ooV2DkCQtEIs+IJKcm2RnkhuH2g5OcnmS27vP\ng/qscb5N0+e3JLkpycNJFt3dHtP0+SNJbk3yjSQXJTmwzxrn2zR9/vOuv9cnuSzJM/uscRRa/R7a\n9oEkleSQPmoblWn+rv8syb3d3/X1SV433+dd9AEBbARO2KPtNGBLVa0GtnTri8lGHtvnG4E3AVeM\nvZrx2Mhj+3w5cHRVPQ/4JnD6uIsasY08ts8fqarnVdULgC8Afzr2qkZvI4/tN0kOB34d+Pa4CxqD\njTT6DHy8ql7Q/Zn3a7eLPiCq6grggT2a1wCbuuVNwEljLWrEWn2uqluqai4PFe4TpunzZVW1q1u9\nksFzNYvGNH3+/tDq04BFd5Fxmn/TAB8HPsTS6vNILfqAmMaKqrqvW74fWNFnMRqLdwBf6ruIcUhy\nVpK7gbexOEcQj5FkDXBvVX2971rG7D3dlOK5o5gqX6oB8VM1uI1r0f3God2SnAHsAs7ru5ZxqKoz\nqupwBv19d9/1jFqSA4APs0TCcMg5wLOBFwD3AR+b7xMs1YDYkWQlQPe5s+d6NCJJTgFOBN5WS++e\n7vOA3+y7iDH4BeBI4OtJ7mIwlXhtkp/vtaoRq6odVfVQVT0M/D2Dt2HPq6UaEJcA67rldcDFPdai\nEUlyAoM56TdU1Q/7rmcckqweWl0D3NpXLeNSVTdU1c9V1aqqWsXgLdDHVNX9PZc2Uo/8ktt5I4Mb\nUeb3HIv9l6ok5wPHMnjb4w7gTOBfgc3AEcB24OSqGvsFoFGZps8PAJ8EJoD/Aa6vqtf0VeN8m6bP\npwP7A9/tdruyqt7ZS4EjME2fXwc8B3iYwX/b76yqe6f7GfuiVr+r6jND2+8CJqtq0bzddZq/62MZ\nTC8VcBfwB0PXVufnvIs9ICRJc7NUp5gkSTMwICRJTQaEJKnJgJAkNRkQkqQmA0KaQZKtSV6zR9v7\nk5zzOMf8YPSVSaNlQEgzO5/B96MPW9u1S4uWASHN7HPA65PsB5BkFfBM4LokW5Jcm+SG7oVxj5Lk\n2CRfGFr/m+71HyR5UZKvJbkmyaV7PBkr9c6AkGbQPWV/NfDarmktgyfxfwS8saqOAY4DPpYks/mZ\nSZ7M4Mn2N1fVi4BzgbPmu3ZpbyzvuwBpH/HINNPF3eepQIC/SPJKBq+2OJTBq+Nn8w6g5wBHA5d3\nmbKMwRs5pQXDgJBm52Lg40mOAQ6oqmu6qaIJ4EVV9ZPuHUBP2eO4XTx6pP7I9gA3VdXLRlu2NHdO\nMUmzUFU/ALYymAp65OL0M4CdXTgcBzyrceh24Kgk+3ffiX18134bMJHkZTCYckry3JF2QnqCHEFI\ns3c+cBG772g6D/h8khuAbTRerV1VdyfZzOBVzHcC13XtDyZ5M/DXSZ7B4N/iJ4CbRt4LaZZ8m6sk\nqckpJklSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRASJKa/h/RkGhK08TF6QAAAABJRU5E\nrkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEKCAYAAAAIO8L1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEAFJREFUeJzt3X+sX3ddx/Hniw4G8nPLrk3ZCh2xIWyoA8rigJhdp27i\nj24GZwnREqeTOJBFRTdInGhqMMiPOAWtbqEkc7OyLRtmAqNUkEQY7Zhs3Q9p2Jq16dbCJDA1YLe3\nf9xT96V82vu9bc8933vv85F88z3nc875nvdJ1752Puecz0lVIUnSoZ42dAGSpMlkQEiSmgwISVKT\nASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUdMLQBRyLU045pVatWjV0GZK0oGzfvv3rVTU123oL\nOiBWrVrFtm3bhi5DkhaUJLvGWc8uJklSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRASJKa\nDAhJUtOCfpJams30punB9r11/dbB9i0dD55BSJKaDAhJUpMBIUlqMiAkSU0GhCSpyYCQJDUZEJKk\nJgNCktRkQEiSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkpoMCElSk68clXoy\n1OtOfdWpjhfPICRJTQaEJKmpt4BIsjLJ1iT3JtmR5O1d+8lJbk/y1e77pJFtrkyyM8kDSc7vqzZJ\n0uz6PIM4APxuVZ0B/BhwWZIzgCuALVW1GtjSzdMtWwecCVwAfCjJsh7rkyQdQW8BUVV7q+rObvrb\nwH3AqcBaYFO32ibgwm56LXBDVX2nqh4EdgJn91WfJOnI5uUaRJJVwCuALwLLq2pvt+gRYHk3fSrw\n8Mhmu7s2SdIAeg+IJM8BbgQur6pvjS6rqgJqjr93aZJtSbbt37//OFYqSRrVa0AkeToz4XBdVd3U\nNT+aZEW3fAWwr2vfA6wc2fy0ru17VNXGqlpTVWumpqb6K16Slrg+72IKcA1wX1W9f2TRrcD6bno9\ncMtI+7okJyY5HVgN3NFXfZKkI+vzSerXAr8C3J3krq7tncB7gM1JLgF2ARcDVNWOJJuBe5m5A+qy\nqnqix/okSUfQW0BU1eeBHGbxeYfZZgOwoa+aJEnj80lqSVKTASFJajIgJElNBoQkqcmAkCQ1GRCS\npCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlq\nMiAkSU0GhCSp6YShC9DSML1peugSJM2RZxCSpCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTQaEJKnJ\ngJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0GhCSpqbeASHJtkn1J7hlp+6Mke5Lc1X1e\nP7LsyiQ7kzyQ5Py+6pIkjafPM4iPABc02j9QVWd1n9sAkpwBrAPO7Lb5UJJlPdYmSZpFbwFRVZ8D\nHhtz9bXADVX1nap6ENgJnN1XbZKk2Q1xDeJtSb7SdUGd1LWdCjw8ss7urk2SNJD5DogPAy8BzgL2\nAu+b6w8kuTTJtiTb9u/ff7zrkyR15jUgqurRqnqiqp4E/panupH2ACtHVj2ta2v9xsaqWlNVa6am\npvotWJKWsHkNiCQrRmYvAg7e4XQrsC7JiUlOB1YDd8xnbZKk73VCXz+c5HrgXOCUJLuBq4Bzk5wF\nFPAQ8JsAVbUjyWbgXuAAcFlVPdFXbZKk2fUWEFX1xkbzNUdYfwOwoa96JElz45PUkqQmA0KS1GRA\nSJKaDAhJUlNvF6klDWN60/Rg+966futg+9bx5xmEJKnJgJAkNRkQkqQmA0KS1DRWQCR57ThtkqTF\nY9wziKvHbJMkLRJHvM01yTnAa4CpJL8zsuh5gK8ElaRFbLbnIJ4BPKdb77kj7d8C3tBXUZKk4R0x\nIKrqs8Bnk3ykqnbNU02SpAkw7pPUJybZCKwa3aaqfqKPoiRJwxs3IP4R+Gvg7wBf5CNJS8C4AXGg\nqj7cayWSpIky7m2uH0/yW0lWJDn54KfXyiRJgxr3DGJ99/2OkbYCXnJ8y5EkTYqxAqKqTu+7EEnS\nZBkrIJL8aqu9qj56fMuRJE2KcbuYXj0y/UzgPOBOwICQpEVq3C6mt43OJ3kBcEMvFUmSJsLRDvf9\nX4DXJSRpERv3GsTHmblrCWYG6XsZsLmvoiRJwxv3GsSfj0wfAHZV1e4e6pEkTYixupi6QfvuZ2ZE\n15OA7/ZZlCRpeON2MV0MvBf4FyDA1UneUVUf67E2HWfTm6aHLkHSAjJuF9O7gFdX1T6AJFPApwED\nQpIWqXHvYnrawXDofGMO20qSFqBxzyA+keSTwPXd/C8Dt/VTkiRpEsz2TuofApZX1TuS/CLwum7R\nvwHX9V2cJGk4s51BfBC4EqCqbgJuAkjyw92yn++1OknSYGa7jrC8qu4+tLFrW9VLRZKkiTBbQLzg\nCMuedTwLkSRNltkCYluS3zi0McmvA9v7KUmSNAlmuwZxOXBzkjfxVCCsAZ4BXNRnYZKkYR3xDKKq\nHq2q1wDvBh7qPu+uqnOq6pEjbZvk2iT7ktwz0nZyktuTfLX7Pmlk2ZVJdiZ5IMn5x3JQkqRjN+5Y\nTFur6uru85kxf/sjwAWHtF0BbKmq1cCWbp4kZwDrgDO7bT6UZNmY+5Ek9aC3p6Gr6nPAY4c0rwU2\nddObgAtH2m+oqu9U1YPATuDsvmqTJM1uvofLWF5Ve7vpR4Dl3fSpwMMj6+3u2r5PkkuTbEuybf/+\n/f1VKklL3GDjKVVV8dRLiOay3caqWlNVa6ampnqoTJIE8x8QjyZZAdB9HxwAcA+wcmS907o2SdJA\n5jsgbgXWd9PrgVtG2tclOTHJ6cBq4I55rk2SNGLc0VznLMn1wLnAKUl2A1cB7wE2J7kE2AVcDFBV\nO5JsBu5l5pWml1XVE33VJkmaXW8BUVVvPMyi8w6z/gZgQ1/1SJLmxpf+SJKaDAhJUpMBIUlqMiAk\nSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLU\nZEBIkpoMCElSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0G\nhCSpyYCQJDWdMHQBkhaP6U3Tg+x36/qtg+x3sfMMQpLUZEBIkpoMCElSkwEhSWoyICRJTYPcxZTk\nIeDbwBPAgapak+Rk4B+AVcBDwMVV9Z9D1CdJGvYMYrqqzqqqNd38FcCWqloNbOnmJUkDmaQuprXA\npm56E3DhgLVI0pI3VEAU8Okk25Nc2rUtr6q93fQjwPJhSpMkwXBPUr+uqvYk+UHg9iT3jy6sqkpS\nrQ27QLkU4EUvelH/lfZgqKdNJWkuBjmDqKo93fc+4GbgbODRJCsAuu99h9l2Y1Wtqao1U1NT81Wy\nJC058x4QSZ6d5LkHp4GfBu4BbgXWd6utB26Z79okSU8ZootpOXBzkoP7//uq+kSSLwGbk1wC7AIu\nHqA2SVJn3gOiqr4G/Gij/RvAefNdjySpbZJuc5UkTRADQpLUZEBIkpoMCElSkwEhSWoyICRJTQaE\nJKnJgJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiS\nmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkppOGLoASTpW05umB9nv1vVbB9nv\nfPEMQpLUZEBIkpoMCElSkwEhSWpa0heph7qwJUkLgWcQkqQmA0KS1GRASJKaJi4gklyQ5IEkO5Nc\nMXQ9krRUTdRF6iTLgL8CfgrYDXwpya1Vde+wlUnS9xvyRpf5eIp70s4gzgZ2VtXXquq7wA3A2oFr\nkqQladIC4lTg4ZH53V2bJGmeTVQX0ziSXApc2s0+nuSBMTc9Bfh6P1XNG49hMngMk2FJH0PenGPZ\n74vHWWnSAmIPsHJk/rSu7f9V1UZg41x/OMm2qlpzbOUNy2OYDB7DZPAY+jdpXUxfAlYnOT3JM4B1\nwK0D1yRJS9JEnUFU1YEkbwU+CSwDrq2qHQOXJUlL0kQFBEBV3Qbc1sNPz7lbagJ5DJPBY5gMHkPP\nUlVD1yBJmkCTdg1CkjQhlkxAJPmTJF9JcleSTyV54dA1zVWS9ya5vzuOm5O8YOia5irJLyXZkeTJ\nJBN790bLYhgGJsm1SfYluWfoWo5GkpVJtia5t/vv6O1D1zRXSZ6Z5I4k/94dw7uHrulwlkwXU5Ln\nVdW3uunfBs6oqrcMXNacJPlp4DPdxfw/A6iqPxi4rDlJ8jLgSeBvgN+rqm0DlzSWbhiY/2BkGBjg\njQttGJgkPw48Dny0ql4+dD1zlWQFsKKq7kzyXGA7cOFC+nNIEuDZVfV4kqcDnwfeXlVfGLi077Nk\nziAOhkPn2cCCS8aq+lRVHehmv8DMcyILSlXdV1XjPtw4SRbFMDBV9TngsaHrOFpVtbeq7uymvw3c\nxwIbbaFmPN7NPr37TOS/R0smIACSbEjyMPAm4A+HrucY/Rrwz0MXsYQ4DMyESbIKeAXwxWErmbsk\ny5LcBewDbq+qiTyGRRUQST6d5J7GZy1AVb2rqlYC1wFvHbbattmOoVvnXcABZo5j4oxzDNKxSPIc\n4Ebg8kN6BxaEqnqiqs5iphfg7CQT2d03cc9BHIuq+skxV72OmWctruqxnKMy2zEkeTPwc8B5NaEX\nkObw57CQzDoMjOZH129/I3BdVd00dD3Hoqq+mWQrcAEwcTcOLKoziCNJsnpkdi1w/1C1HK0kFwC/\nD/xCVf330PUsMQ4DMwG6C7zXAPdV1fuHrudoJJk6eAdikmcxc+PDRP57tJTuYroReCkzd9DsAt5S\nVQvq/wCT7AROBL7RNX1hAd6JdRFwNTAFfBO4q6rOH7aq8SR5PfBBnhoGZsPAJc1ZkuuBc5kZRfRR\n4KqqumbQouYgyeuAfwXuZubvMsA7uxEYFoQkPwJsYua/o6cBm6vqj4etqm3JBIQkaW6WTBeTJGlu\nDAhJUpMBIUlqMiAkSU0GhCSpyYCQZtGNHnr+IW2XJ/nwEbZ5/HDLpIXCgJBmdz0zD8aNWte1S4uW\nASHN7mPAz3ZPUB8cJO6FwJeTbElyZ5K7W2NNJTk3yT+NzP9lN1wKSV6V5LNJtif5ZDeUtTQxDAhp\nFlX1GHAH8DNd0zpgM/A/wEVV9UpgGnhfNxTErLrxhK4G3lBVrwKuBRbck9la3BbVYH1Sjw52M93S\nfV8CBPjT7iU8TzIz/Pdy4JExfu+lwMuB27tMWQbsPf5lS0fPgJDGcwvwgSSvBH6gqrZ3XUVTwKuq\n6n+TPAQ885DtDvC9Z+oHlwfYUVXn9Fu2dPTsYpLG0L0BbCszXUEHL04/H9jXhcM08OLGpruAM5Kc\n2I3geV7X/gAwleQcmOlySnJmrwchzZFnENL4rgdu5qk7mq4DPp7kbmAbjSGbq+rhJJuZGev/QeDL\nXft3k7wB+Iskz2fm7+IHgR29H4U0JkdzlSQ12cUkSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRA\nSJKaDAhJUtP/AdirAzKSPOSXAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "def rand_plot(samples):\n", " n, bins, patches = plt.hist(samples, 10, normed=0, facecolor='green', alpha=0.75)\n", " plt.xlabel(\"Value\")\n", " plt.ylabel(\"Count\")\n", " plt.show()\n", " \n", "#uniform random number\n", "\n", "unif = [random.uniform(10, 15) for _ in range(1000)]\n", "rand_plot(unif)\n", "\n", "#gaussian random number\n", "\n", "gaussian = [random.gauss(mu, sigma) for _ in range(1000)]\n", "rand_plot(gaussian)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also do random things with lists." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "duck\n", "tails\n", "['eagle', 'swan', 'goose', 'duck']\n" ] } ], "source": [ "#randomly pick one item\n", "\n", "birds = [\"duck\", \"goose\", \"eagle\", \"swan\"]\n", "\n", "print(random.choice(birds))\n", "\n", "#coin toss\n", "coin = [\"heads\", \"tails\"]\n", "print(random.choice(coin))\n", "\n", "#shuffle the items of a list in place\n", "random.shuffle(birds)\n", "print(birds)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data structures\n", "\n", "The `collections` module lets us enhance some of the container types we've seen for more user friendliness." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Counter({'red': 4, 'blue': 2, 'black': 1})\n", "4\n", "[('red', 4), ('blue', 2)]\n" ] } ], "source": [ "import collections\n", "\n", "#count number of occurences from a list\n", "c = collections.Counter([\"red\", \"red\", \"red\", \"black\", \"red\", \"blue\", \"blue\"])\n", "print(c)\n", "print(c['red'])\n", "#get the 2 most common elements\n", "print(c.most_common(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`namedtuple` lets us give names to the indices of a tuple." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.1\n", "Carlos\n", "cs\n" ] } ], "source": [ "Student = collections.namedtuple('Student', ['name', 'grade', 'major'])\n", "\n", "s = Student('Carlos', 2.1, 'cs')\n", "print(s.grade)\n", "print(s.name)\n", "print(s.major)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Useful for giving CSV entries meaningful names.\n", "\n", "`test.csv`:\n", "\n", "```\n", "carlos,2.4,cs\n", "jim,3.1,math\n", "joan,2.5,phys\n", "jack,3.6,cs\n", "```" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Student(name='carlos', grade='2.4', major='cs')\n", "carlos\n", "Student(name='jim', grade='3.1', major='math')\n", "jim\n", "Student(name='joan', grade='2.5', major='phys')\n", "joan\n", "Student(name='jack', grade='3.6', major='cs')\n", "jack\n" ] } ], "source": [ "with open(\"test.csv\", \"r\") as students:\n", " for s in students:\n", " #the _make() function lets you make a NamedTuple from an iterable\n", " line = s.strip().split(\",\")\n", " tup = Student._make(line)\n", " print(tup)\n", " print(tup.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `datetime` module is useful for handling date formats." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2017-11-09\n", "2017\n", "2017-11-06\n", "\n", "Days till Christmas: 49 days, 0:00:00\n", "0\n", "0\n" ] } ], "source": [ "import datetime as dt\n", "\n", "date = dt.date(2017, 11, 9)\n", "print(date)\n", "print(date.year)\n", "\n", "#today's date\n", "print(dt.date.today())\n", "\n", "#compare dates\n", "christmas = dt.date(2017, 12, 25)\n", "till_christmas = christmas - dt.date.today()\n", "#produces a timedelta object\n", "print(type(till_christmas))\n", "\n", "print(f\"Days till Christmas: {till_christmas}\")\n", "\n", "#day of the week as an integer\n", "print(dt.date.today().weekday())\n", "print(christmas.weekday())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Quality Control\n", "\n", "The `timeit` module helps you time the execution of some code snippets." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7.79176791899954" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import timeit\n", "\n", "timeit.timeit(\"[x*x for x in range(100)]\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `doctest` module lets you put executable python in docstrings as test calls to make sure everything works as expected. The module looks for `>>>` interactive python calls and compares the actual call to what is in the string as the output. " ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "TestResults(failed=0, attempted=2)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import doctest\n", "\n", "def mysquare(x):\n", " \"\"\"\n", " This function computes the square of a number.\n", " >>> mysquare(5)\n", " 25\n", " \"\"\"\n", " return x*x\n", "def mymean(nums):\n", " \"\"\"\n", " This function computes the mean of a list of numbers.\n", " >>> mymean([2, 2, 3, 4])\n", " 2.75\n", " \"\"\"\n", " tot = 0\n", " for i in nums:\n", " tot += i\n", " return tot / len(nums)\n", "\n", "doctest.testmod()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Storage\n", "\n", "`pickle` is a very useful module for storing python objects in files so that you can keep working on them later.\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'dog': 0.3856088112400382, 'cat': 0.7451045373130774, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}\n" ] } ], "source": [ "import pickle\n", "\n", "rand_dict = {}\n", "\n", "animals = [\"dog\", \"cat\", \"giraffe\", \"lion\", \"zebra\"]\n", "\n", "for a in animals:\n", " rand_dict[a] = random.random()\n", "print(rand_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can now store, or **dump** the dictionary to a file.\n", "\n", "Pickle stores objects as a binary representation which is not human readable and only works in Python but is very fast." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "pickle.dump(rand_dict, open(\"rand_dict.pickle\", \"wb\"))" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "loaded = pickle.load(open(\"rand_dict.pickle\", \"rb\"))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'dog': 0.3856088112400382, 'cat': 0.7451045373130774, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}\n" ] } ], "source": [ "print(loaded)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`json` does a similar job but the contents are human-readable and can be read by any language. The downside is it's not as fast.\n", "\n", "JSON cannot store any custom classes and not all python classes can be JSONed." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import json\n", "\n", "json.dump(rand_dict, open(\"rand_dict.json\", \"w\"))" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "jsoned = json.load(open(\"rand_dict.json\", \"r\"))" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'cat': 0.7451045373130774,\n", " 'dog': 0.3856088112400382,\n", " 'giraffe': 0.4958727957365945,\n", " 'lion': 0.7046721287417177,\n", " 'zebra': 0.07175260118616189}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jsoned" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiprocessing\n", "\n", "Sometimes you can have tasks that can be easily parallelized.\n", "\n", "Since most computers have more than one processor, we can let multiple processors work on our Python at the same time.\n", "\n", "For example:\n", "\n", "For a given number $n$ I want to compute the sum of every number **up to $n$** cubed. \n", "\n", "Obviously the process of squaring a particular number in the list is independent of squaring any other number." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parallel job took: 14.64725112915039\n", "Serial job took 25.040592908859253\n" ] } ], "source": [ "from multiprocessing import Pool\n", "import time\n", "\n", "def cube_sum(x):\n", " return sum([i**3 for i in range(x)])\n", "\n", "#we use the context manager to take care of all the setup\n", "#we create a Pool object which contains the processors we can send tasks to\n", "#here we have chosen to use 4 processes\n", "\n", "start = time.time()\n", "nums = [i for i in range(10000)]\n", "\n", "with Pool(4) as p:\n", " result = p.map(cube_sum, nums)\n", "print(f\"Parallel job took: {time.time() - start}\")\n", "\n", "### normally:\n", "start_serial = time.time()\n", "serial_result = [cube_sum(x) for x in nums]\n", "print(f\"Serial job took {time.time() - start_serial}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reason I came up with such a weird function is that parallelizing is not always faster.\n", "\n", "There is quite a bit of setup and communication that needs to happen to coordinate the processors (aka **overhead**).\n", "\n", "If the actual comptuation is faster than the overhead then the normal serial method is faster." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Others\n", "\n", "There are many other modules that I did not cover, and many other functionalities of the ones I did cover that I didn't have time to show you.\n", "\n", "Some notable Standard Library modules worth looking into:\n", "\n", "* `re`: searching for patterns inside strings\n", "* `statistics`: basics statistics function (mean, std, etc)\n", "* `os.path`, `glob`: handling file paths \n", "* `csv`: automatic CSV file parsing\n", "* `logging`: code and error logging\n", "* `argpars`: command line argument parser\n", "* `tkinter`: building graphical user interfaces" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }