{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to NumPy\n", "\n", "## Objectives\n", "\n", "- Make new NumPy arrays\n", "- Perform arithmetic with arrays\n", "- Use common array methods and NumPy functions\n", "- Filter arrays using boolean arrays and \"fancy\" indexing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# add the \"as np\" so we don't have to type \"numpy\" each time\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making Arrays\n", "\n", "At the beginning of the Python lesson we loaded data into NumPy from a text file.\n", "Let's look at a few other ways to make arrays.\n", "If you've got data in Python lists, those can be converted to arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# list like yesterday\n", "odds = [1, 3, 5, 7]\n", "odds" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# pass the list to np.array\n", "odds_arr = np.array(odds)\n", "odds_arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists and arrays have some similarities:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print('index', odds[1], odds_arr[1])\n", "print('slice', odds[1:3], odds_arr[1:3])\n", "print('length', len(odds), len(odds_arr))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for x in odds:\n", "    print(x)\n", "for x in odds_arr:\n", "    print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But some things are different:" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Question\n", "\n", "How would you add 1 to every number in the `odds` list and make a new list with those (even) 
numbers? (You don't need to write code; instead, talk about how you'd do this.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "evens = []\n", "for x in odds:\n", "    y = x + 1\n", "    evens.append(y)\n", "evens" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With arrays this is simpler:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "even_arr = odds_arr + 1\n", "even_arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now suppose we want to add these odd and even numbers together. With lists that's another loop, but with arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "even_arr + odds_arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we did need to loop over an array to perform a complex calculation and then store the result in a new array. We can't append to an array the way we can to a list: arrays have a fixed size and no `append` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# this raises an AttributeError: arrays have no append method\n", "odds_arr.append(9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to create the new array with the appropriate size and shape. (By the way, you can check the size and shape of an array with the `.size` and `.shape` attributes.)\n", "\n", "Functions for creating new arrays include `np.empty`, `np.ones`, `np.zeros`, and `np.arange`. The one thing you *have* to tell the first three is how big the array needs to be." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.ones(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.zeros((3, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Questions\n", "\n", "Why did I put an extra pair of parentheses in the call to `np.zeros`?\n", "\n", "Which shape value is for the number of rows in the array, and which is for the number of columns?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.zeros((4, 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.arange` doesn't take a shape; instead it takes start, stop, and step values and makes an array of numbers covering that range (the stop value itself is excluded):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "r = np.arange(0, 20, 2)\n", "r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See even more functions for making arrays at http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array Methods and NumPy Functions" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "We're going to load some specially prepared precipitation data derived from the `precip_yearly.csv` file. It's saved in a special cross-platform NumPy binary format. Learn more about NumPy's special binary array storage format at http://docs.scipy.org/doc/numpy/reference/routines.io.html." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "store = np.load('mean_ca_precip.npz')\n", "years = store['years']\n", "precip = store['precip']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "print(years)\n", "print(precip)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays have methods attached to them for doing calculations and transformations with data. For example you can get the mean of an array (as seen with Pandas):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "precip.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the NumPy package (`np`) has functions that work on arrays, e.g. to calculate a logarithm:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.log(precip)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "Use IPython's tab completion feature to compare available array methods (e.g. `precip.mean`) with available NumPy functions (e.g. `np.log`). Do you notice any differences?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a massive number of routines in NumPy. For more info see http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs and http://docs.scipy.org/doc/numpy/reference/routines.html." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boolean Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's examine which years had above-average precipitation and which had below-average precipitation.\n", "First we'll need the average precipitation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "avg = precip.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can compare that average to the values in the `precip` array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "precip > avg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The comparison creates a new array of boolean (True/False) values that is True where the precipitation that year was above average and False where it was at or below average.\n", "We can use boolean arrays to pull values out of other arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "above = precip > avg\n", "print(precip[above])\n", "print(years[above])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use boolean indexing without first assigning the boolean array to a variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "years[precip < avg]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But if you do have a boolean array and want its opposite, you can use the `~` operator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "years[~above]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use boolean indexing to make assignments to arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arr = 
np.random.normal(size=(4, 4))\n", "arr" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "arr[arr < 0] = 0\n", "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "Write a function that clips data in an array to given low and high values.\n", "For example, calling `clip(arr, 0, 1)` should return an array where values lower than zero have been replaced with 0 and values higher than one have been replaced with 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }