{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Disclaimer:** Most of the content in this notebook comes from [www.scipy-lectures.org](http://www.scipy-lectures.org/intro/index.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# NumPy\n", "\n", "[NumPy](http://www.numpy.org/) is **the** fundamental package for scientific computing with Python. It is the basic building block of most data analysis in Python and contains highly optimized routines for creating and manipulating arrays.\n", "\n", "### Everything revolves around numpy arrays\n", "* **`SciPy`** adds a bunch of useful science and engineering routines that operate on numpy arrays, e.g. signal processing, statistical distributions, image analysis, etc.\n", "* **`pandas`** adds powerful methods for manipulating numpy arrays. Like data frames in R - but typically faster.\n", "* **`scikit-learn`** supports state-of-the-art machine learning over numpy arrays. Inputs and outputs of virtually all functions are numpy arrays.\n", "* If you want many more short exercises than the ones in this notebook - you can find 100 of them [here](http://www.labri.fr/perso/nrougier/teaching/numpy.100/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy arrays vs Python lists\n", "\n", "NumPy arrays look very similar to Python lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "a = np.array([0, 1, 2, 3])\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**So, why is this useful?** NumPy arrays are memory-efficient containers that provide fast numerical operations. We can show this very quickly by running the same simple computation once on a Python list and once on a NumPy array."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "L = range(1000)\n", "\n", "# Squaring the first 1000 integers with a plain Python list comprehension\n", "%timeit [i**2 for i in L]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array(L)\n", "\n", "# Squaring the first 1000 integers with a NumPy array\n", "%timeit a**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating arrays\n", "\n", "## Manual construction of arrays\n", "\n", "You can create NumPy arrays manually in almost the same way as Python lists.\n", "\n", "### 1-D" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([0, 1, 2, 3])\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2-D, 3-D, ..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.array([[0, 1, 2], [3, 4, 5]]) # 2 x 3 array\n", "b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = np.array([[[1], [2]], [[3], [4]]])\n", "c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions for creating arrays\n", "\n", "In practice, we rarely enter items one by one. Therefore, NumPy offers many different helper functions.\n", "\n", "### Evenly spaced" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(10) # 0 .. 
n-1 (!)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.arange(1, 9, 2) # start, end (exclusive), step\n", "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ... or by number of points" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = np.linspace(0, 1, 6) # start, end (inclusive), num-points\n", "c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = np.linspace(0, 1, 5, endpoint=False)\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Common arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.ones((3, 3)) # reminder: (3, 3) is a tuple\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.zeros((2, 2))\n", "b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = np.eye(3)\n", "c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = np.diag(np.array([1, 2, 3, 4]))\n", "d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `np.random`: random numbers (Mersenne Twister PRNG)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.random.rand(4) # uniform in [0, 1)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.random.randn(4) # Gaussian (standard normal)\n", "b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(1234) # Setting the random seed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic data types\n", "\n", "You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. ``2.`` vs ``2``). 
This is due to a difference in the data-type used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 2, 3])\n", "a.dtype" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.array([1., 2., 3.])\n", "b.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.\n", "\n", "You can explicitly specify which data-type you want:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = np.array([1, 2, 3], dtype=float)\n", "c.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **default** data type is floating point:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.ones((3, 3))\n", "a.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are also other types:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Complex\n", "d = np.array([1+2j, 3+4j, 5+6*1j])\n", "d.dtype" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Bool\n", "e = np.array([True, False, False, True])\n", "e.dtype" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Strings\n", "f = np.array(['Bonjour', 'Hello', 'Hallo',])\n", "f.dtype # <--- strings containing max. 
7 letters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And many more...\n", "* ``int32``\n", "* ``int64``\n", "* ``uint32``\n", "* ``uint64``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Indexing and slicing\n", "\n", "The items of an array can be accessed and assigned to in the same way as other Python sequences (e.g. lists):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(10)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[0], a[2], a[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Warning**: Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The usual Python idiom for reversing a sequence is supported:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[::-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For multidimensional arrays, indices are tuples of integers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.diag(np.arange(3))\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[1, 1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[2, 1] = 10 # third row, second column\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Note\n", "\n", "* In 2D, the first dimension corresponds to **rows**, the second to **columns**.\n", "* For multidimensional ``a``, ``a[0]`` is interpreted by taking all elements in the unspecified dimensions."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Slicing: Arrays, like other Python sequences, can also be sliced" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(10)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[2:9:3] # [start:end:step]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the end index is not included!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[:4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "None of the three slice components is required: by default, `start` is 0,\n", "`end` is the end of the array and `step` is 1:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[1:3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[::2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[3:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A small illustrated summary of NumPy indexing and slicing...\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also combine assignment and slicing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(10)\n", "a[5:] = 10\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.arange(5)\n", "a[5:] = b[::-1]\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fancy indexing\n", "\n", "NumPy arrays can be indexed with slices, but also with boolean or integer arrays (**masks**). This method is called *fancy indexing*. 
It creates **copies, not views**.\n", "\n", "## Using boolean masks" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(3)\n", "a = np.random.randint(0, 21, 15)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(a % 3 == 0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mask = (a % 3 == 0)\n", "extract_from_a = a[mask] # or, a[a%3==0]\n", "extract_from_a # extract a sub-array with the mask" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing with a mask can be very useful to assign a new value to a sub-array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[a % 3 == 0] = -1\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing with an array of integers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(0, 100, 10)\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing can be done with an array of integers, in which the same index may be repeated several times:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "New values can be assigned with this kind of indexing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[[9, 7]] = -100\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The image below illustrates various fancy indexing applications:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Elementwise operations\n", "\n", "NumPy provides many elementwise operations that are much quicker than comparable list comprehensions in plain Python.\n", "\n", "## Basic operations\n", 
"\n", "With scalars:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 2, 3, 4])\n", "a + 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "2**a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All arithmetic operates elementwise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.ones(4) + 1\n", "a - b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a * b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "j = np.arange(5)\n", "2**(j + 1) - j" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Array multiplication is not matrix multiplication" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "c = np.ones((3, 3))\n", "c * c # NOT matrix multiplication!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Matrix multiplication\n", "c.dot(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other operations\n", "\n", "### Comparisons" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 2, 3, 4])\n", "b = np.array([4, 2, 2, 4])\n", "a == b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a > b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Array-wise comparisons:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 2, 3, 4])\n", "b = np.array([4, 2, 2, 4])\n", "c = np.array([1, 2, 3, 4])\n", "np.array_equal(a, b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.array_equal(a, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"### Logical operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 1, 0, 0], dtype=bool)\n", "b = np.array([1, 0, 1, 0], dtype=bool)\n", "np.logical_or(a, b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.logical_and(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transcendental functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(5)\n", "np.sin(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.log(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.exp(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Shape mismatches" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# NBVAL_SKIP\n", "a + np.array([1, 2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Broadcasting?* We'll return to that later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transposition" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.triu(np.ones((3, 3)), 1)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The transposition is a view\n", "\n", "As a result, the following code **is wrong** and will **not make a matrix symmetric**:\n", "\n", " >>> a += a.T\n", "\n", "It will work for small arrays (because of buffering) but fail for large one, in unpredictable ways." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic reductions\n", "\n", "NumPy offers many quick functions to compute things like sum, mean, max etc.\n", "\n", "## Computing sums" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([1, 2, 3, 4])\n", "np.sum(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: Many NumPy functions are also available as methods of the array itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sum by rows and by columns:\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([[1, 1], [2, 2]])\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.sum(axis=0) # columns (first dimension)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.sum(axis=1) # rows (second dimension)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other reductions\n", "\n", "Other functions such as `mean`, `std` and `cumsum` work the same way (and also take ``axis=``)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extrema" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([1, 3, 2])\n", "x.min()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.max()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.argmin() # index of minimum" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.argmax() # index of maximum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Logical operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.all([True, True, False])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.any([True, True, False])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can be used for array comparisons:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.zeros((100, 100))\n", "np.any(a != 0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.all(a == a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 2, 3, 2])\n", "b = np.array([2, 2, 3, 2])\n", "c = np.array([6, 4, 4, 5])\n", "((a <= b) & (b <= c)).all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Statistics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([1, 2, 3, 1])\n", "y = np.array([[1, 2, 3], [5, 6, 1]])\n", "x.mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.median(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.median(y, axis=-1) # last axis" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.std() # full population standard dev." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... and many more (best to learn as you go)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Broadcasting\n", "\n", "* Basic operations on ``numpy`` arrays (addition, etc.) are elementwise\n", "\n", "* This works on arrays of the same size. ***Nevertheless***, It's also possible to do operations on arrays of different sizes if *NumPy* can transform these arrays so that they all have the same size: this conversion is called **broadcasting**.\n", "\n", "The image below gives an example of broadcasting:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's verify this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.tile(np.arange(0, 40, 10), (3, 1)).T\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = np.array([0, 1, 2])\n", "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have already used broadcasting without knowing it!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.ones((4, 5))\n", "a[0] = 2 # a scalar (dimension 0) is broadcast over a whole row (dimension 1)\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A useful trick:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(0, 40, 10)\n", "a.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = a[:, np.newaxis] # adds a new axis -> 2D array\n", "a.shape\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Broadcasting seems a bit magical, but it is actually quite natural to use it when we want to solve a problem whose output data is an array with more dimensions than the input data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to compute the distance from the origin of points on a 5x5 grid, we can do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x, y = np.arange(5), np.arange(5)[:, None]\n", "distance = np.sqrt(x ** 2 + y ** 2)\n", "distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or in color:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Array shape manipulation\n", "\n", "Sometimes your arrays don't have the right shape. 
Also for this, NumPy has many solutions.\n", "\n", "## Flattening" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1, 2, 3], [4, 5, 6]])\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.ravel()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.T" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.T.ravel()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Higher dimensions: last dimensions ravel out \"first\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reshaping\n", "\n", "The inverse operation to flattening:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.reshape(6, 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or," ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.reshape((6, -1)) # unspecified (-1) value is inferred" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding a dimension\n", "\n", "Indexing with the ``np.newaxis`` or ``None`` object allows us to add an axis to an array (you have seen this already above in the broadcasting section):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = np.array([1, 2, 3])\n", "z" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z[:, np.newaxis]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z[np.newaxis, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dimension shuffling" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(4*3*2).reshape(4, 3, 2)\n", "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = a.transpose(1, 2, 0)\n", "b" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resizing\n", "\n", "Size of an array can be changed with ``ndarray.resize``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.arange(4)\n", "a.resize((8,))\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sorting data\n", "\n", "Sorting along an axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[4, 3, 5], [1, 2, 1]])\n", "b = np.sort(a, axis=1)\n", "b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Important**: Note that the code above sorts each row separately!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In-place sort:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.sort(axis=1)\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorting with fancy indexing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([4, 3, 1, 2])\n", "j = np.argsort(a)\n", "j" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a[j]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finding minima and maxima:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([4, 3, 1, 2])\n", "j_max = np.argmax(a)\n", "j_min = np.argmin(a)\n", "j_max, j_min" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `npy` - NumPy's own data format\n", "\n", "NumPy has its own binary format, not portable but with efficient I/O:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = np.ones((3, 3))\n", "np.save('pop.npy', data)\n", "data3 = np.load('pop.npy')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary - What do you need to know to get started?\n", "\n", "* Know how to create arrays : ``array``, ``arange``, ``ones``, ``zeros``.\n", "* Know the shape of the array with ``array.shape``, then use slicing to obtain different views of the array: ``array[::2]``, etc. Adjust the shape of the array using ``reshape`` or flatten it with ``ravel``.\n", "* Obtain a subset of the elements of an array and/or modify their values with masks \n", " ``a[a < 0] = 0``\n", "* Know miscellaneous operations on arrays, such as finding the mean or max (``array.max()``, ``array.mean()``).\n", "* For advanced use: master the indexing with arrays of integers, as well as broadcasting." 
] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }