{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numerical Python with numpy\n", "\n", "NumPy ('Numerical Python') is the de facto standard module for doing numerical work in Python. Its main feature is its array data type, which allows very compact and efficient storage of homogeneous (of the same type) data.\n", "\n", "A lot of the material in this section is based on [SciPy Lecture Notes](http://www.scipy-lectures.org/intro/numpy/array_object.html) ([CC-by 4.0](http://www.scipy-lectures.org/preface.html#license)).\n", "\n", "As you go through this material, you'll likely find it useful to refer to the [NumPy documentation](https://docs.scipy.org/doc/numpy/), particularly the [array objects](https://docs.scipy.org/doc/numpy/reference/arrays.html) section.\n", "\n", "As with `pandas`, there is a standard convention for importing `numpy`, and that is as `np`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have access to the `numpy` package we can start using its features.\n", "\n", "## Creating arrays\n", "\n", "In many ways a NumPy array can be treated like a standard Python `list` and much of the way you interact with it is identical. 
Given a list, you can create an array as follows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4 5 6 7 8]\n" ] } ], "source": [ "python_list = [1, 2, 3, 4, 5, 6, 7, 8]\n", "numpy_array = np.array(python_list)\n", "print(numpy_array)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ndim gives the number of dimensions\n", "numpy_array.ndim" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(8,)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the shape of an array is a tuple of its length in each dimension. In this case it is only 1-dimensional\n", "numpy_array.shape" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# as in standard Python, len() gives a sensible answer\n", "len(numpy_array)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]]\n" ] } ], "source": [ "nested_list = [[1, 2, 3], [4, 5, 6]]\n", "two_dim_array = np.array(nested_list)\n", "print(two_dim_array)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "two_dim_array.ndim" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 3)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "two_dim_array.shape" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "It's very common when working with data to not have it already in a Python list but rather to want to create some data from scratch. `numpy` comes with a whole suite of functions for creating arrays. We will now run through some of the most commonly used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first is `np.arange` (meaning \"array range\") which works in a vary similar fashion the the standard Python `range()` function, including how it defaults to starting from zero, doesn't include the number at the top of the range and how it allows you to specify a 'step:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(10) #0 .. n-1 (!)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 3, 5, 7])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, 9, 2) # start, end (exclusive), step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next up is the `np.linspace` (meaning \"linear space\") which generates a given floating point numbers starting from the first argument up to the second argument. The third argument defines how many numbers to create:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(0, 1, 6) # start, end, num-points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how it included the end point unlike `arange()`. 
You can change this feature by using the `endpoint` argument:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.2, 0.4, 0.6, 0.8])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(0, 1, 5, endpoint=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.ones` creates an n-dimensional array filled with the value `1.0`. The argument you give to the function defines the shape of the array:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1.],\n", " [1., 1., 1.],\n", " [1., 1., 1.]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ones((3, 3)) # reminder: (3, 3) is a tuple" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Likewise, you can create an array of any size filled with zeros:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0.],\n", " [0., 0.]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((2, 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.eye` (referring to the mathematical identity matrix, commonly labelled as `I`) creates a square matrix of a given size with `1.0` on the diagonal and `0.0` elsewhere:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 0.],\n", " [0., 1., 0.],\n", " [0., 0., 1.]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.eye(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.diag` creates a square matrix with the given values on the diagonal and zeros elsewhere:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "array([[1, 0, 0, 0],\n", " [0, 2, 0, 0],\n", " [0, 0, 3, 0],\n", " [0, 0, 0, 4]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.diag([1, 2, 3, 4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you can fill an array with random numbers, specifying the seed if you want reproducibility:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.37454012, 0.95071431, 0.73199394, 0.59865848])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.seed(42)\n", "\n", "np.random.rand(4) # uniform in [0, 1]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.23415337, -0.23413696, 1.57921282, 0.76743473])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.randn(4) # Gaussian" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercises\n", "\n", "- Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.\n", "- Create different kinds of arrays with random numbers.\n", "- Look at the function `np.empty`. What does it do? When might this be useful?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reshaping arrays\n", "\n", "Behind the scenes, a multi-dimensional NumPy `array` is just stored as a linear segment of memory. The fact that it is presented as having more than one dimension is simply a layer on top of that (sometimes called a *view*). 
This means that we can simply change that interpretive layer and change the shape of an array very quickly (i.e. without NumPy having to copy any data around).\n", "\n", "This is mostly done with the `reshape()` method on the array object:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array = np.arange(16)\n", "my_array" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(16,)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array.shape" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4, 5, 6, 7],\n", " [ 8, 9, 10, 11, 12, 13, 14, 15]])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array.reshape((2, 8))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11],\n", " [12, 13, 14, 15]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array.reshape((4, 4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if you check, `my_array.shape` will still return `(16,)`, as `reshape()` simply returns a *view* on the original data; it hasn't actually *changed* it. 
If you want to edit the original object in-place then you can use the `resize()` method.\n", "\n", "You can also transpose an array using the `transpose()` method which mirrors the array along its diagonal:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 8],\n", " [ 1, 9],\n", " [ 2, 10],\n", " [ 3, 11],\n", " [ 4, 12],\n", " [ 5, 13],\n", " [ 6, 14],\n", " [ 7, 15]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array.reshape((2, 8)).transpose()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 4, 8, 12],\n", " [ 1, 5, 9, 13],\n", " [ 2, 6, 10, 14],\n", " [ 3, 7, 11, 15]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array.reshape((4, 4)).transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercises\n", "\n", "Using the NumPy documentation at https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html, create, **in one line**, a NumPy array which looks like:\n", "\n", "```python\n", "[10, 60, 20, 70, 30, 80, 40, 90, 50, 100]\n", "```\n", "\n", "Hint: you will need to use `transpose()`, `reshape()` and `arange()` as well as one new function from the \"Shape manipulation\" section of the documentation. Can you find a method which uses fewer than 4 function calls?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic data types\n", "\n", "You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. `2.` vs `2`). 
This is due to a difference in the data-type used:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 2, 3])\n", "a.dtype" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([1., 2., 3.])\n", "b.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input. You can also set the data-type explicitly with the `dtype` argument:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = np.array([1, 2, 3], dtype=float)\n", "c.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default data type for arrays created by functions such as `np.ones` and `np.zeros` is 64-bit floating point." 
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = np.ones((3, 3))\n", "d.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are other data types as well:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('complex128')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "e = np.array([1+2j, 3+4j, 5+6*1j])\n", "e.dtype" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('bool')" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f = np.array([True, False, False, True])\n", "f.dtype" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('