{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Numpy\n", "> This tutorial introduces numpy, a Python library for performing numerical computations.\n", "\n", "- toc: false \n", "- badges: true\n", "- comments: true\n", "- categories: [numpy]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### In order to use numpy we need to import the library using the keyword `import`. Also, to avoid typing `numpy` every time we want to use one of its functions, we can provide an alias using the keyword `as`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now, we have access to all the functions available in `numpy` by typing `np.name_of_function`. For example, the equivalent of `1 + 1` in Python can be done in `numpy`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.add(1,1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Although this might not seem very useful, even simple operations like this one can be much quicker in `numpy` than in standard Python when working with lots of numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Tip: To access the documentation explaining how a function is used, its input parameters and output format, we can press `Shift+Tab` after the function name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.add" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### By default the result of a function or operation is shown underneath the cell containing the code. 
If we want to reuse this result for a later operation we can assign it to a variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.add(2,3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We have just declared a variable `a` that holds the result of the function. We can now use or display this variable at any point in this notebook. For example, we can show its contents by typing the variable name in a new cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.1: Can you use the previous numpy function to add the values `34` and `29`?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from check_answer import check_answer\n", "\n", "# Replace the ? symbols with the correct expressions and values\n", "answ = ?\n", "\n", "check_answer(\"1.1\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### One of numpy's core concepts is the `array`, which is similar to a Python list but can be multidimensional and offers much more functionality. To declare a numpy array explicitly we do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.array([1,2,3,4,5,6,7,8,9])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Most of the functions and operations defined in numpy can be applied to arrays. 
For example, with the previous `add` operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 = np.array([1,2,3,4])\n", "arr2 = np.array([3,4,5,6])\n", "\n", "np.add(arr1, arr2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A simpler and more convenient notation can also be used:\n", "\n", "> Note: This operation detects that numpy arrays are being added and calls the previous function for efficient execution. Keep in mind that these arrays can contain a large number of values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 + arr2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Arrays can be sliced and diced. We can get subsets of an array using the indexing notation `[start:end:stride]`. Let's see what this means:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])\n", "\n", "print(arr[5])\n", "print(arr[5:])\n", "print(arr[:5])\n", "print(arr[::2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Experiment with the indexes to understand the meaning of start, end and stride. What happens if you don't specify a start? What value does numpy use instead? Note that numpy indexes start at `0`, the same convention used in Python lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.2: Can you declare a new array with contents [5,4,3,2,1] and slice it to select the last 3 items?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replace the ? symbols with the correct expressions and values\n", "arr = ?\n", "answ = arr[?:?]\n", "\n", "check_answer(\"1.2\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Indexes can also be negative, meaning that you start counting from the end. 
For example, to select the last 2 elements in an array we can do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])\n", "\n", "arr[-2:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.3: Can you figure out how to select all the elements in the previous array excluding the last one, [15]?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replace the ? symbols with the correct expressions and values\n", "answ = arr[?]\n", "\n", "check_answer(\"1.3\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.4: What about excluding the last element of the array but selecting every 3rd element this time? Remember that the third index, if used, indicates the stride.\n", "> Tip: The result should be `[0,3,6,9,12]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answ = arr[?:?:?]\n", "\n", "check_answer(\"1.4\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numpy arrays can have multiple dimensions. For example, we define a 2-dimensional `(1,9)` array using nested square brackets [ ]. The convention in numpy is that the outer [ ] represent the first dimension and the inner [ ] contain the last dimension." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For example, the following cell declares a 2-dimensional array with shape (1, 9):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.array([[1,2,3,4,5,6,7,8,9]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### To visualise the shape (dimensions) of a numpy array we can add the suffix `.shape` to an array expression or variable containing a numpy array."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 = np.array([1,2,3,4,5,6,7,8,9])\n", "arr2 = np.array([[1,2,3,4,5,6,7,8,9]])\n", "arr3 = np.array([[1],[2],[3],[4],[5],[6],[7],[8],[9]])\n", "\n", "arr1.shape, arr2.shape, arr3.shape, np.array([1,2,3]).shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Arrays can be reshaped into different shapes using the function `reshape`:\n", "\n", "> Note: The total number of elements has to be the same before and after the reshape operation, otherwise numpy will raise an error." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.array([1,2,3,4,5,6,7,8]).reshape((2,4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The following example shows how a 9-element array can be reshaped into 2-dimensional arrays with different shapes.\n", "\n", "> Note: We are declaring the array and reshaping it all in one line. This is called 'chaining' and allows us to conveniently perform multiple operations. The expressions are evaluated from left to right, so in this case we first create an array and then apply the reshape operation to the resulting array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 = np.array([1,2,3,4,5,6,7,8,9]).reshape(1,9)\n", "arr2 = np.array([1,2,3,4,5,6,7,8,9]).reshape(9,1)\n", "arr3 = np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3)\n", "\n", "arr1.shape, arr2.shape, arr3.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.5: Can you declare a 1-dimensional array with 6 elements and then reshape it into a 2-dimensional array with shape (2,3)? You can try to do this in one or two lines."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# variable answ should contain your 2-dimensional array with shape (2,3)\n", "answ = ?\n", "\n", "check_answer(\"1.5\", answ.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### There are convenient functions in numpy for declaring common arrays without having to type all their elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 = np.arange(9)\n", "arr2 = np.ones((3,3))\n", "arr3 = np.zeros((2,2,2))\n", "\n", "print(arr1)\n", "print(\"--------\")\n", "print(arr2)\n", "print(\"--------\")\n", "print(arr3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.6: Can you declare a 3-dimensional array with shape (5,3,3)? The contents of the array don't matter for this exercise so you can use any of the previously introduced functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# variable answ should contain your 3-dimensional array with shape (5,3,3)\n", "answ = ?\n", "\n", "check_answer(\"1.6\", answ.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.7: Can you create another array with the same shape and use the numpy `add` function to add both arrays?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 = ?\n", "arr2 = ?\n", "\n", "answ = np.?\n", "\n", "check_answer(\"1.7\", answ.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numpy has useful functions for calculating the mean, standard deviation and sum of the elements of an array."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.arange(9).reshape((3,3))\n", "\n", "print(arr)\n", "print(\"--------\")\n", "print(\"Mean:\", np.mean(arr))\n", "print(\"Std Dev:\", np.std(arr))\n", "print(\"Sum:\", np.sum(arr))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### These operations can be performed along a specific axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.arange(9).reshape((3,3))\n", "\n", "print(arr)\n", "print(\"--------\")\n", "print(\"Sum along the vertical axis:\", np.sum(arr, axis=0))\n", "print(\"--------\")\n", "print(\"Sum along the horizontal axis:\", np.sum(arr, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.8: Declare a 2-dimensional array with shape (20,10) all filled with ones. Then calculate the sum of its values along the first dimension (axis=0). The result has to be a 1-dimensional array with shape (10,)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# variable answ should contain your 1-dimensional array with shape (10,)\n", "answ = ?\n", "\n", "check_answer(\"1.8\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numpy arrays can contain numerical values of different types. These types can be divided into the following groups:\n", "\n", "* Integers\n", "  * Unsigned\n", "    * 8 bits: `uint8`\n", "    * 16 bits: `uint16`\n", "    * 32 bits: `uint32`\n", "    * 64 bits: `uint64`\n", "  * Signed\n", "    * 8 bits: `int8`\n", "    * 16 bits: `int16`\n", "    * 32 bits: `int32`\n", "    * 64 bits: `int64`\n", "\n", "* Floats\n", "  * 32 bits: `float32`\n", "  * 64 bits: `float64`\n", "\n", "#### We can look up the type of an array by using the `.dtype` suffix."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.ones((10,10,10))\n", "\n", "arr.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### To specify the type of an array, we can add the `dtype` parameter to the declaration expression." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.ones((10,10,10), dtype=np.uint8)\n", "\n", "arr.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We can also change the type of an existing array using the `.astype` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.ones((10,10,10))\n", "arr = arr.astype(np.float32)\n", "\n", "# Or all in one line\n", "arr = np.ones((10,10,10)).astype(np.float32)\n", "\n", "arr.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.9: Change the type of the following array to `int16`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answ = np.arange(10)\n", "\n", "answ = answ  # Your code goes here\n", "\n", "check_answer(\"1.9\", answ.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Broadcasting: numpy is set up internally in a way that allows performing array operations efficiently. Sometimes it is not entirely obvious what is going on. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.zeros((10,10))\n", "\n", "a = a + 1\n", "\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The previous code declares a 10x10 array, assigns it to a variable `a` and then adds `1` to it. However, `1` is a single value and is not even an array, so it is not entirely clear what is going on. 
Broadcasting is numpy's ability to replicate or promote the arrays (or single values) involved in an operation so that their shapes match." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(9).reshape((3,3))\n", "\n", "b = np.arange(3)\n", "\n", "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.10: Can you declare a new 1-dimensional array with shape (10,) all filled with `2` values?\n", "\n", "> Tip: _We have just seen an example of broadcasting by adding a single value to an array. Broadcasting also works with other operations, such as multiplication or division, so you can complete this exercise by declaring an initial array containing all zeros or ones and then using one operation to modify all its values._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answ = ?\n", "\n", "check_answer(\"1.10\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean values: Numpy arrays normally store numeric values but they can also contain boolean values. Boolean is a data type with two possible values: `True` and `False`. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([True, False, True])\n", "\n", "arr, arr.shape, arr.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### We can operate on boolean arrays using the numpy functions for performing logical operations such as `and`, `or` and others."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr1 = np.array([True, True, False, False])\n", "arr2 = np.array([True, False, True, False])\n", "\n", "print(np.logical_and(arr1, arr2))\n", "print(np.logical_or(arr1, arr2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### These operations are conveniently offered by numpy with the `*` and `+` operators.\n", "\n", "> Note: Here the `*` and `+` symbols are not performing multiplication and addition as with numerical arrays. Numpy detects the type of the arrays involved in the operation and changes the behaviour of these operators. This ability to change the behaviour of operators depending on the situation is called 'operator overloading' in programming languages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(arr1 * arr2)\n", "print(arr1 + arr2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean arrays are often the result of comparing a numerical array with certain values. This is useful to detect values that are equal to, below or above a number in a numpy array. For example, if we want to know which values in an array are equal to 1 and which values are greater than 2 we can do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([1, 3, 5, 1, 6, 3, 1, 5, 7, 1])\n", "\n", "print(arr == 1)\n", "print(arr > 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.11: For this exercise you'll need to combine array comparisons and logical operators to find the values in the following array that are greater than `3` and less than `7`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([1, 3, 5, 1, 6, 3, 1, 5, 7, 1])\n", "\n", "answ = ?\n", "\n", "check_answer(\"1.11\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean types are quite handy for indexing and selecting parts of images as we will see later. Many numpy functions also work with Boolean types." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([1,2,3,4,5,6,7,8,9])\n", "mask = np.array([True,False,True,False,True,False,True,False,True])\n", "\n", "arr[mask]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise 1.12: Based on the previous example, how would you select the values in the array that are greater than `3` and less than `7`? In this case you want a new smaller array containing just the values that are within that range." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.array([1, 3, 5, 1, 6, 3, 1, 5, 7, 1])\n", "\n", "answ = ?\n", "\n", "check_answer(\"1.12\", answ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Depending on the languages you have used before, this behaviour in Python might surprise you:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([0,0,0])\n", "\n", "# We try to make a copy of array a with name b\n", "b = a\n", "\n", "# We modify the first element of b\n", "b[0] = 1\n", "\n", "print(a)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Both `a` and `b` show the change. This is because `a` and `b` are references to the same underlying array. If you want to have variables with independent arrays you'll have to use the `np.copy` function, as in `b = np.copy(a)`, to explicitly make a copy of the array."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([0,0,0])\n", "\n", "# We explicitly make a copy of array a with name b\n", "b = np.copy(a)\n", "\n", "# We modify the first element of b\n", "b[0] = 1\n", "\n", "print(a)\n", "print(b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }