{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Writing Modules and Functions\n", "===\n", "\n", "## Unit 14, Lecture 2\n", "\n", "\n", "*Numerical Methods and Statistics*\n", "\n", "----\n", "\n", "\n", "#### May 1, 2018" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Writing Good, Reliable, Documented Functions\n", "====\n", "\n", "We're going to focus now on what goes into writing a good Python function. If you want your function to be reusable, you need to store it in a text file that ends in `.py`. We can do this using the `%%writefile` magic. Let's see an example:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting test.py\n" ] } ], "source": [ "%%writefile test.py\n", "\n", "def hello_world():\n", " print('Hello World')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello World\n" ] } ], "source": [ "import test\n", "\n", "test.hello_world()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If you look in the file system, you'll see we have a file called `test.py`. If it's in the same directory as your notebook, you can get everything from the test file using `import`. Here are some examples if it's somewhere else:\n", "\n", "1. If `test.py` is in the parent directory of yours: add that directory to the search path with `import sys; sys.path.append('..')`, then `import test`. The relative form `from .. import test` only works from code inside a package\n", "2. If `test.py` is in a subdirectory called sub: `import sub.test`. 
To make `sub` a regular package, put an empty file called `__init__.py` inside the `sub` folder" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Modules\n", "===\n", "\n", "This file we've created is called a module, just like the `math` or `numpy` modules. A module can hold multiple functions as well as variables. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting test.py\n" ] } ], "source": [ "%%writefile test.py\n", "\n", "pi = 3.0\n", "\n", "def square(x):\n", " return x*x\n", "\n", "\n", "def hello_world():\n", " print('Hello World')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "AttributeError", "evalue": "module 'test' has no attribute 'pi'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mtest\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'pi is exactly {}'\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mtest\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpi\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mAttributeError\u001b[0m: module 'test' has no attribute 'pi'" ] } ], "source": [ "import test\n", "print('pi is exactly {}'.format(test.pi))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Uh-oh! It is using an outdated version of `test.py`. 
To get Python to reload it, we can restart the kernel or use the `reload` function" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pi is exactly 3.0\n" ] } ], "source": [ "from importlib import reload\n", "reload(test)\n", "print('pi is exactly {}'.format(test.pi))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Documenting\n", "===\n", "\n", "You can add helpful documentation at the module level (top of the file) and at the function level" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting test.py\n" ] } ], "source": [ "%%writefile test.py\n", "'''This module contains nonsense'''\n", "\n", "pi = 3.0\n", "\n", "def square(x):\n", " '''Want to square a number? This function will help'''\n", " return x*x\n", "\n", "\n", "def hello_world():\n", " print('Hello World')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on module test:\n", "\n", "NAME\n", " test - This module contains nonsense\n", "\n", "FUNCTIONS\n", " hello_world()\n", " \n", " square(x)\n", " Want to square a number? This function will help\n", "\n", "DATA\n", " pi = 3.0\n", "\n", "FILE\n", " /home/jovyan/work/test.py\n", "\n", "\n" ] } ], "source": [ "reload(test)\n", "help(test)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Writing a good function\n", "====\n", "\n", "The reason for creating a module like `test.py` is to write a function once and for all so you don't need to copy-pasta. Let's try this for confidence intervals of data. Here are the steps:\n", "\n", "1. Document what your function should do (plan)\n", "2. Get basic functionality working in a notebook (develop)\n", "3. Move function to a file and import (deploy)\n", "4. Write some cells in a notebook to test basic cases until you have everything working (test)\n", "5. Finally polish off your code by testing bad inputs and trying to break it (break it)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Example: Writing a function to compute confidence intervals\n", "----\n", "\n", "Let's see this in action for computing confidence intervals" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "1. Plan\n", "---\n", "\n", "I'll be writing out the documentation. I'll use the NumPy docstring format, which the Sphinx Napoleon extension can parse. This is more complex than what we've seen before. We specify what the function does, examples of using it, what it takes, and what it returns. It's important to write your documentation FIRST, so you know what code to write" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def conf_interval(data, interval_type='double', confidence=0.95):\n", " '''This function takes in the data and computes a confidence interval\n", " \n", " Examples\n", " --------\n", "\n", " data = [4,3,2,5]\n", " center, width = conf_interval(data)\n", " print('The mean is {} +/- {}'.format(center, width))\n", " \n", " Parameters\n", " ----------\n", " data : list\n", " The list of data points\n", " interval_type : str\n", " What kind of confidence interval. Can be double, upper, lower.\n", " confidence : float\n", " The confidence of the interval\n", " Returns\n", " -------\n", " center, width\n", " Center is the mean of the data. Width is the width of the confidence interval. 
\n", " If a lower or upper is specified, width is the upper or lower value.\n", " '''" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "2. Develop\n", "----\n", "\n", "Let's first try to compute just a double-sided confidence interval" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.66666666667 1.63127456586 0.975\n" ] } ], "source": [ "import scipy.stats as ss\n", "import numpy as np\n", "\n", "data = [4,3,5,3,6, 7]\n", "interval_type = 'double'\n", "confidence = 0.95\n", "\n", "center = np.mean(data)\n", "s = np.std(data, ddof=1)\n", "ppf = 1 - (1 - confidence) / 2\n", "# NOTE: this passes len(data) as the degrees of freedom; a t-interval for the mean conventionally uses len(data) - 1\n", "t = ss.t.ppf(ppf, len(data))\n", "width = s / np.sqrt(len(data)) * t\n", "\n", "print(center, width, ppf)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now let's try adding some logic for the `interval_type` argument" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.66666666667 1.29545352026\n" ] } ], "source": [ "interval_type = 'lower'\n", "if interval_type == 'lower':\n", " ppf = confidence\n", " t = ss.t.ppf(ppf, len(data))\n", " top = s / np.sqrt(len(data)) * t\n", " print(center, top)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The lower confidence interval should run from negative infinity to a value above the mean, but we printed only the distance from the mean to that value. We need to adjust the code."
 ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.66666666667 5.96212018693\n" ] } ], "source": [ "interval_type = 'lower'\n", "if interval_type == 'lower':\n", " ppf = confidence\n", " t = ss.t.ppf(ppf, len(data))\n", " top = s / np.sqrt(len(data)) * t\n", " print(center, center + top)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.66666666667 3.3712131464\n" ] } ], "source": [ "interval_type = 'upper'\n", "if interval_type == 'upper':\n", " ppf = 1 - confidence\n", " t = ss.t.ppf(ppf, len(data))\n", " top = s / np.sqrt(len(data)) * t\n", " print(center, center + top)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can see there is quite a bit of repeated code. Let's try to put the whole thing together without the repetition" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.66666666667 5.96212018693 0.95\n" ] } ], "source": [ "import scipy.stats as ss\n", "import numpy as np\n", "\n", "data = [4,3,5,3,6, 7]\n", "interval_type = 'lower'\n", "confidence = 0.95\n", "\n", "center = np.mean(data)\n", "s = np.std(data, ddof=1)\n", "if interval_type == 'lower':\n", " ppf = confidence\n", "elif interval_type == 'upper':\n", " ppf = 1 - confidence\n", "else:\n", " ppf = 1 - (1 - confidence) / 2\n", "t = ss.t.ppf(ppf, len(data))\n", "width = s / np.sqrt(len(data)) * t\n", "\n", "if interval_type == 'lower' or interval_type == 'upper':\n", " width = width + center\n", "\n", "print(center, width, ppf)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "3. 
Deploy\n", "---\n", "\n", "Let's put everything together now into a file" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting utilities.py\n" ] } ], "source": [ "%%writefile utilities.py\n", "\n", "import scipy.stats as ss\n", "import numpy as np\n", "\n", "def conf_interval(data, interval_type='double', confidence=0.95):\n", " '''This function takes in the data and computes a confidence interval\n", " \n", " Examples\n", " --------\n", "\n", " data = [4,3,2,5]\n", " center, width = conf_interval(data)\n", " print('The mean is {} +/- {}'.format(center, width))\n", " \n", " Parameters\n", " ----------\n", " data : list\n", " The list of data points\n", " interval_type : str\n", " What kind of confidence interval. Can be double, upper, lower.\n", " confidence : float\n", " The confidence of the interval\n", " Returns\n", " -------\n", " center, width\n", " Center is the mean of the data. Width is the width of the confidence interval. 
\n", " If a lower or upper is specified, width is the upper or lower value.\n", " '''\n", "\n", " center = np.mean(data)\n", " s = np.std(data, ddof=1)\n", " if interval_type == 'lower':\n", " ppf = confidence\n", " elif interval_type == 'upper':\n", " ppf = 1 - confidence\n", " else:\n", " ppf = 1 - (1 - confidence) / 2\n", " t = ss.t.ppf(ppf, len(data))\n", " width = s / np.sqrt(len(data)) * t\n", " \n", " if interval_type == 'lower' or interval_type == 'upper':\n", " width = center + width\n", " return center, width" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import utilities\n", "reload(utilities)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "I wrote some example code with the documentation. Let's see if it works" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The mean is 3.5 +/- 1.792187609015029\n" ] } ], "source": [ "data = [4,3,2,5]\n", "center, width = utilities.conf_interval(data)\n", "print('The mean is {} +/- {}'.format(center, width))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "4. 
Test\n", "----\n", "\n", "Let's now test the code for a few different cases" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12.413339150292749, 0.062443365827491451)\n" ] } ], "source": [ "#see if it recovers the correct mean\n", "data = ss.norm.rvs(size=1000, loc=12.4)\n", "print(utilities.conf_interval(data))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12.413339150292749, 12.360949919612446)\n" ] } ], "source": [ "#see if it can handle upper/lower\n", "print(utilities.conf_interval(data, 'upper'))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12.413339150292749, 12.465728380973053)\n" ] } ], "source": [ "print(utilities.conf_interval(data, 'lower'))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12.413339150292749, 0.036626408883252866)\n" ] } ], "source": [ "#Check different confidence values\n", "print(utilities.conf_interval(data, confidence=0.75))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "5. Break it\n", "---" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "(12.413339150292749, nan)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utilities.conf_interval(data, confidence=95)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Passing a percentage like 95 instead of a probability like 0.95 is a pretty common mistake. 
We should probably check that confidence is a valid probability. " ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice\n", " warnings.warn(\"Degrees of freedom <= 0 for slice\", RuntimeWarning)\n", "/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars\n", " ret = ret.dtype.type(ret / rcount)\n" ] }, { "data": { "text/plain": [ "(3.0, nan)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "utilities.conf_interval([3], confidence=0.5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Uh-oh, only one value was given. We should probably warn if there are not enough values." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting utilities.py\n" ] } ], "source": [ "%%writefile utilities.py\n", "\n", "import scipy.stats as ss\n", "import numpy as np\n", "\n", "def conf_interval(data, interval_type='double', confidence=0.95):\n", " '''This function takes in the data and computes a confidence interval\n", " \n", " Examples\n", " --------\n", "\n", " data = [4,3,2,5]\n", " center, width = conf_interval(data)\n", " print('The mean is {} +/- {}'.format(center, width))\n", " \n", " Parameters\n", " ----------\n", " data : list\n", " The list of data points\n", " interval_type : str\n", " What kind of confidence interval. Can be double, upper, lower.\n", " confidence : float\n", " The confidence of the interval\n", " Returns\n", " -------\n", " center, width\n", " Center is the mean of the data. 
Width is the width of the confidence interval. \n", " If a lower or upper is specified, width is the upper or lower value.\n", " '''\n", " \n", " if(len(data) < 3):\n", " print('Not enough data given. Must have at least 3 values')\n", "\n", " center = np.mean(data)\n", " s = np.std(data, ddof=1)\n", " if interval_type == 'lower':\n", " ppf = confidence\n", " elif interval_type == 'upper':\n", " ppf = 1 - confidence\n", " else:\n", " ppf = 1 - (1 - confidence) / 2\n", " t = ss.t.ppf(ppf, len(data))\n", " width = s / np.sqrt(len(data)) * t\n", " \n", " if interval_type == 'lower' or interval_type == 'upper':\n", " width = center + width\n", " return center, width" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Not enough data given. Must have at least 3 values\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice\n", " warnings.warn(\"Degrees of freedom <= 0 for slice\", RuntimeWarning)\n", "/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars\n", " ret = ret.dtype.type(ret / rcount)\n" ] }, { "data": { "text/plain": [ "(3.0, nan)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reload(utilities)\n", "utilities.conf_interval([3])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Ah, but notice it didn't actually stop the program! The message was printed, and the function still went on to return `nan`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exceptions\n", "---\n", "\n", "What we need is to produce one of those error messages you see a lot, which stops execution. 
We can do that by *raising* an exception" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "RuntimeError", "evalue": "This is a problem", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mRuntimeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'This is a problem'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mRuntimeError\u001b[0m: This is a problem" ] } ], "source": [ "raise RuntimeError('This is a problem')" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "ValueError", "evalue": "Your value is bad and you should feel bad", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Your value is bad and you should feel bad'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: Your value is bad and you should feel bad" ] } ], "source": [ "raise ValueError('Your value is bad and you should feel bad')" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting utilities.py\n" ] } ], "source": [ "%%writefile utilities.py\n", "\n", "import scipy.stats as ss\n", "import 
numpy as np\n", "\n", "def conf_interval(data, interval_type='double', confidence=0.95):\n", " '''This function takes in the data and computes a confidence interval\n", " \n", " Examples\n", " --------\n", "\n", " data = [4,3,2,5]\n", " center, width = conf_interval(data)\n", " print('The mean is {} +/- {}'.format(center, width))\n", " \n", " Parameters\n", " ----------\n", " data : list\n", " The list of data points\n", " interval_type : str\n", " What kind of confidence interval. Can be double, upper, lower.\n", " confidence : float\n", " The confidence of the interval\n", " Returns\n", " -------\n", " center, width\n", " Center is the mean of the data. Width is the width of the confidence interval. \n", " If a lower or upper is specified, width is the upper or lower value.\n", " '''\n", " \n", " if(len(data) < 3):\n", " raise ValueError('Not enough data given. Must have at least 3 values')\n", "\n", " center = np.mean(data)\n", " s = np.std(data, ddof=1)\n", " if interval_type == 'lower':\n", " ppf = confidence\n", " elif interval_type == 'upper':\n", " ppf = 1 - confidence\n", " else:\n", " ppf = 1 - (1 - confidence) / 2\n", " t = ss.t.ppf(ppf, len(data))\n", " width = s / np.sqrt(len(data)) * t\n", " \n", " if interval_type == 'lower' or interval_type == 'upper':\n", " width = center + width\n", " return center, width" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "ValueError", "evalue": "Not enough data given. 
Must have at least 3 values", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mreload\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mutilities\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mutilities\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mconf_interval\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m/home/jovyan/work/utilities.py\u001b[0m in \u001b[0;36mconf_interval\u001b[1;34m(data, interval_type, confidence)\u001b[0m\n\u001b[0;32m 29\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 30\u001b[0m \u001b[1;32mif\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m<\u001b[0m \u001b[1;36m3\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 31\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Not enough data given. Must have at least 3 values'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 32\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 33\u001b[0m \u001b[0mcenter\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmean\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mValueError\u001b[0m: Not enough data given. 
Must have at least 3 values" ] } ], "source": [ "reload(utilities)\n", "utilities.conf_interval([3])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting utilities.py\n" ] } ], "source": [ "%%writefile utilities.py\n", "\n", "import scipy.stats as ss\n", "import numpy as np\n", "\n", "def conf_interval(data, interval_type='double', confidence=0.95):\n", " '''This function takes in the data and computes a confidence interval\n", " \n", " Examples\n", " --------\n", "\n", " data = [4,3,2,5]\n", " center, width = conf_interval(data)\n", " print('The mean is {} +/- {}'.format(center, width))\n", " \n", " Parameters\n", " ----------\n", " data : list\n", " The list of data points\n", " interval_type : str\n", " What kind of confidence interval. Can be double, upper, lower.\n", " confidence : float\n", " The confidence of the interval\n", " Returns\n", " -------\n", " center, width\n", " Center is the mean of the data. Width is the width of the confidence interval. \n", " If a lower or upper is specified, width is the upper or lower value.\n", " '''\n", " \n", " if(len(data) < 3):\n", " raise ValueError('Not enough data given. 
Must have at least 3 values')\n", " if(interval_type not in ['upper', 'lower', 'double']):\n", " raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))\n", " if(0 > confidence or confidence > 1):\n", " raise ValueError('Confidence must be between 0 and 1')\n", " \n", " center = np.mean(data)\n", " s = np.std(data, ddof=1)\n", " if interval_type == 'lower':\n", " ppf = confidence\n", " elif interval_type == 'upper':\n", " ppf = 1 - confidence\n", " else:\n", " ppf = 1 - (1 - confidence) / 2\n", " t = ss.t.ppf(ppf, len(data))\n", " width = s / np.sqrt(len(data)) * t\n", " \n", " if interval_type == 'lower' or interval_type == 'upper':\n", " width = center + width\n", " return center, width" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "ValueError", "evalue": "Not enough data given. Must have at least 3 values", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mreload\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mutilities\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mutilities\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mconf_interval\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m/home/jovyan/work/utilities.py\u001b[0m in \u001b[0;36mconf_interval\u001b[1;34m(data, interval_type, confidence)\u001b[0m\n\u001b[0;32m 29\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 30\u001b[0m \u001b[1;32mif\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m 
\u001b[1;33m<\u001b[0m \u001b[1;36m3\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 31\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Not enough data given. Must have at least 3 values'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 32\u001b[0m \u001b[1;32mif\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minterval_type\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;32min\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;34m'upper'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'lower'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'double'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 33\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'I do not know how to make a {} confidence interval'\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minterval_type\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mValueError\u001b[0m: Not enough data given. 
Must have at least 3 values" ] } ], "source": [ "reload(utilities)\n", "utilities.conf_interval([3])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "ename": "ValueError", "evalue": "Confidence must be between 0 and 1", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mutilities\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mconf_interval\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m32\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mconfidence\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m95\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m/home/jovyan/work/utilities.py\u001b[0m in \u001b[0;36mconf_interval\u001b[1;34m(data, interval_type, confidence)\u001b[0m\n\u001b[0;32m 33\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'I do not know how to make a {} confidence interval'\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0minterval_type\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 34\u001b[0m \u001b[1;32mif\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m0\u001b[0m \u001b[1;33m>\u001b[0m \u001b[0mconfidence\u001b[0m \u001b[1;32mor\u001b[0m \u001b[0mconfidence\u001b[0m \u001b[1;33m>\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 35\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'Confidence must be between 0 and 
1'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;36m 36\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 37\u001b[0m \u001b[0mcenter\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmean\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mValueError\u001b[0m: Confidence must be between 0 and 1" ] } ], "source": [ "utilities.conf_interval([3,4,32], confidence=95)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Packaging up your files\n", "====\n", "\n", "Now we'll learn how to put all our files together into a package that we can always use." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "You need to arrange your files and folders in a special way. Let's say I'm putting all my functions together into a package called che116. I need to arrange it like this:\n", "\n", "    che116-package/      <-- the top directory\n", "        setup.py         <-- the file which gives info about the package\n", "        che116/          <-- a folder where the code is stored\n", "            __init__.py  <-- a file that can be completely empty. The name is important\n", "            stats.py     <-- where I would put some functions related to stats" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here are the contents of the three files we need to make. **NOTE:** You need to create the folders above before you can run this. Change the path after `%%writefile` to match where you want each file."
] }, { "cell_type": "code", "execution_count": 43, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting unit_15/che116-package/setup.py\n" ] } ], "source": [ "%%writefile unit_15/che116-package/setup.py\n", "\n", "from setuptools import setup\n", "\n", "setup(name = 'che116', #the name for install purposes\n", "      author = 'Andrew White', #for your own info\n", "      description = 'Some stuff I wrote for CHE 116', #displayed when install/update\n", "      version='1.0',\n", "      packages=['che116']) #This name should match the directory where you put your code" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting unit_15/che116-package/che116/__init__.py\n" ] } ], "source": [ "%%writefile unit_15/che116-package/che116/__init__.py\n", "'''You can put some comments in here if you want. They should describe the package.'''" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting unit_15/che116-package/che116/stats.py\n" ] } ], "source": [ "%%writefile unit_15/che116-package/che116/stats.py\n", "\n", "\n", "import scipy.stats as ss\n", "import numpy as np\n", "\n", "def conf_interval(data, interval_type='double', confidence=0.95):\n", "    '''This function takes in the data and computes a confidence interval\n", "    \n", "    Examples\n", "    --------\n", "\n", "    data = [4,3,2,5]\n", "    center, width = conf_interval(data)\n", "    print('The mean is {} +/- {}'.format(center, width))\n", "    \n", "    Parameters\n", "    ----------\n", "    data : list\n", "        The list of data points\n", "    interval_type : str\n", "        What kind of confidence interval. Can be double, upper, lower.\n", "    confidence : float\n", "        The confidence of the interval\n", "    Returns\n", "    -------\n", "    center, width\n", "        Center is the mean of the data. Width is the width of the confidence interval. \n", "        If a lower or upper is specified, width is the upper or lower value.\n", "    '''\n", "    \n", "    if(len(data) < 3):\n", "        raise ValueError('Not enough data given. Must have at least 3 values')\n", "    if(interval_type not in ['upper', 'lower', 'double']):\n", "        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))\n", "    if(0 > confidence or confidence > 1):\n", "        raise ValueError('Confidence must be between 0 and 1')\n", "    \n", "    center = np.mean(data)\n", "    s = np.std(data, ddof=1)\n", "    if interval_type == 'lower':\n", "        ppf = confidence\n", "    elif interval_type == 'upper':\n", "        ppf = 1 - confidence\n", "    else:\n", "        ppf = 1 - (1 - confidence) / 2\n", "    #the t-distribution for a sample mean has N - 1 degrees of freedom\n", "    t = ss.t.ppf(ppf, len(data) - 1)\n", "    width = s / np.sqrt(len(data)) * t\n", "    \n", "    if interval_type == 'lower' or interval_type == 'upper':\n", "        width = center + width\n", "    return center, width" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Installing your package\n", "---\n", "\n", "Once you're done, run `pip install -e [path to your folder]`, where the path is the directory where you put the `setup.py` file. 
The `-e` means editable: if you edit any of the above files, you do not need to reinstall." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['Obtaining file:///home/jovyan/work/unit_15/che116-package',\n", " 'Installing collected packages: che116',\n", " ' Running setup.py develop for che116',\n", " 'Successfully installed che116-1.0']" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%system pip install -e unit_15/che116-package" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "(3.6666666666666665, 4.7274821017614208)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#YOU MUST RESTART KERNEL FIRST TIME THROUGH\n", "#after install + restart, you'll always have your package available\n", "import che116 #binds the package name so help(che116) works later\n", "import che116.stats as cs\n", "\n", "cs.conf_interval([4,3,4])" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on package che116:\n", "\n", "NAME\n", " che116 - You can put some comments in here if you want. 
They should describe the package.\n", "\n", "PACKAGE CONTENTS\n", " stats\n", "\n", "FILE\n", " /home/jovyan/work/unit_15/che116-package/che116/__init__.py\n", "\n", "\n" ] } ], "source": [ "help(che116)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on module che116.stats in che116:\n", "\n", "NAME\n", " che116.stats\n", "\n", "FUNCTIONS\n", " conf_interval(data, interval_type='double', confidence=0.95)\n", " This function takes in the data and computes a confidence interval\n", " \n", " Examples\n", " --------\n", " \n", " data = [4,3,2,5]\n", " center, width = conf_interval(data)\n", " print('The mean is {} +/- {}'.format(center, width))\n", " \n", " Parameters\n", " ----------\n", " data : list\n", " The list of data points\n", " interval_type : str\n", " What kind of confidence interval. Can be double, upper, lower.\n", " confidence : float\n", " The confidence of the interval\n", " Returns\n", " -------\n", " center, width\n", " Center is the mean of the data. Width is the width of the confidence interval. \n", " If a lower or upper is specified, width is the upper or lower value.\n", "\n", "FILE\n", " /home/jovyan/work/unit_15/che116-package/che116/stats.py\n", "\n", "\n" ] } ], "source": [ "help(che116.stats)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }