{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numerical operations with NumPy\n", "\n", "Now that we know the basics of using NumPy's arrays, we can move on to doing some mathematical operations with them. As mentioned in the previous section on benchmarking, NumPy arrays offer functionality beyond that of Python lists when it comes to manipulating the entries.\n", "\n", "To see the difference in semantics, compare what happens when we multiply a list and an array by some factor:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "python_list = [1, 2, 3, 4, 5, 6]\n", "python_list * 3" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 3, 6, 9, 12, 15, 18])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "numpy_array = np.array([1, 2, 3, 4, 5, 6])\n", "numpy_array * 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that multiplication means very different things to the two types:\n", " - A Python list interprets multiplication as repetition, so you end up with a new list containing the entries repeated.\n", " - A NumPy array takes the requested operation and applies it to each of its entries in turn.\n", "\n", "This is at the very core of how NumPy makes mathematical operations fast."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Elementwise operations\n", "\n", "As we just saw, arithmetic operations are applied to the whole array elementwise:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3, 4, 5])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 2, 3, 4])\n", "a + 1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2, 4, 8, 16])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2**a # two to the power of a" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1., 0., 1., 2.])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.ones(4) + 1 # [2, 2, 2, 2]\n", "a - b" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2., 4., 6., 8.])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a * b" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2, 3, 6, 13, 28])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "j = np.arange(5)\n", "2**(j + 1) - j" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, however, that array multiplication is not the same as matrix multiplication:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1.],\n", " [1., 1., 1.],\n", " [1., 1., 1.]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = np.ones((3, 3))\n", "c * c # This will do element-wise multiplication" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "To do matrix multiplication you must either use the `dot()` method to calculate the dot product:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3., 3., 3.],\n", " [3., 3., 3.],\n", " [3., 3., 3.]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c.dot(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or use the new `@` operator which was added in Python 3.5:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3., 3., 3.],\n", " [3., 3., 3.],\n", " [3., 3., 3.]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c @ c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Try simple arithmetic elementwise operations: add even elements with odd elements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparisons between arrays give an array containing booleans:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, True, False, True])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 2, 3, 4])\n", "b = np.array([4, 2, 2, 4])\n", "a == b" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, True, False])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a > b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to perform an array-wise comparison, you can use `np.array_equal()`.\n", "\n", "NumPy also has a series of more complicated function which can be applied to an array such as:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": 
[ "array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(5)\n", "np.sin(a)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 1. , 1.41421356, 1.73205081, 2. ])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sqrt(a)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Look at the help for `np.allclose()`. When might this be useful?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic reductions\n", "\n", "A *reduction* in programming means taking some compound object and reducing it down to some basic property of itself. 
For example, the sum of all the elements of an array is a reduction, as is computing its maximum value.\n", "\n", "The sum of an array can be calculated with the `sum()` method or the `np.sum()` function:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([1, 2, 3, 4])\n", "np.sum(x)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The maximum and minimum, and the indices at which they occur, can also be calculated:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([1, 3, 2])\n", "x.min()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.max()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.argmin() # index of minimum" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.argmax() # index of maximum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Logical operations can be performed over the whole array. 
Like the built-in Python functions, `all()` returns whether *all* the items in the array are `True` and `any()` returns whether *any* of the items are `True`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.all([True, True, False])" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.any([True, True, False])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, there are some simple statistics that can be computed:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.75" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([1, 2, 3, 1])\n", "x.mean()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.5" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.median(x)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.82915619758885" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.std() # full population standard dev." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "What is the difference between `sum()` and `cumsum()`?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Broadcasting\n", "\n", "Basic operations on NumPy arrays (addition, etc.) 
are elementwise; this means that if you are operating on two arrays, they must normally be the same shape.\n", "\n", "Nevertheless, it's also possible to do operations on arrays of different shapes if NumPy can transform these arrays so that they all have the same shape: this conversion is called broadcasting.\n", "\n", "Here's an example to demonstrate:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 0, 0],\n", " [10, 10, 10],\n", " [20, 20, 20],\n", " [30, 30, 30]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.tile(np.arange(0, 40, 10), (3, 1)).transpose()\n", "a" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([0, 1, 2])\n", "b" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 3)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3,)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the shapes of the two arrays are different: one is 4×3 and the other is one-dimensional of size 3.\n", "\n", "NumPy can look at the two arrays and see that the width of `a` is `3` and the width of `b` is `3`, and so infers that you want to match those together. The [rule works](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html#general-broadcasting-rules) by comparing the shapes of the two arrays dimension by dimension, starting from the trailing (last) one; two dimensions are compatible when they\n", "1. are equal, or\n", "2. 
one of them is 1.\n", "\n", "In our case here, the trailing dimension of `a` is `3` and the trailing dimension of `b` is also `3`, so broadcasting can occur, with `b` effectively repeated for each of the four rows of `a`:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2],\n", " [10, 11, 12],\n", " [20, 21, 22],\n", " [30, 31, 32]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have already used broadcasting without knowing it:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[2., 2., 2., 2., 2.],\n", " [1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1.],\n", " [1., 1., 1., 1., 1.]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.ones((4, 5))\n", "a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Try creating a number of different NumPy arrays of different sizes and dimensions and try broadcasting them amongst each other. Make sure you understand the rules concerning what can be broadcast and what cannot." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What you should know\n", "\n", "- Know how to create arrays: `array`, `arange`, `ones`, `zeros`.\n", "\n", "- Know the shape of the array with `array.shape`, then use slicing to obtain different views of the array: `array[::2]`, etc. Adjust the shape of the array using `reshape` or flatten it with `ravel` and understand the difference between views and copies.\n", "\n", "- Obtain a subset of the elements of an array and/or modify their values with masks\n", " ```\n", " a[a < 0] = 0\n", " ```\n", "- Know miscellaneous operations on arrays, such as finding the mean or max (`array.max()`, `array.mean()`). 
No need to retain everything, but have the reflex to search in the documentation (online docs, `help()`, `lookfor()`).\n", "\n", "- For advanced use: master indexing with arrays of integers, as well as broadcasting. Know more NumPy functions to handle various array operations.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# What's next\n", "\n", "This is the end of the prepared material for this course, but there is plenty more good material online. There are two main routes to take from here:\n", "\n", "- If you want to learn more about the numerical side of things, with a focus on SciPy and NumPy, look through the free notes at [scipy-lectures.org](http://www.scipy-lectures.org/), probably starting at chapter 1.5.\n", "\n", "- If you want to learn more about pandas, then the best book for that is [Python for Data Analysis, 2nd Edition](http://shop.oreilly.com/product/0636920050896.do) by Wes McKinney, one of the authors of pandas. For free material, there are some excellent tutorials on the [pandas website](http://pandas.pydata.org/pandas-docs/stable/tutorials.html)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }