{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Disclaimer:** Most of the content in this notebook comes from [www.scipy-lectures.org](http://www.scipy-lectures.org/intro/index.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NumPy\n",
"\n",
"[NumPy](http://www.numpy.org/) is **the** fundamental package for scientific computing with Python. It is the basic building block of most data analysis in Python and contains highly optimized routines for creating and manipulating arrays.\n",
"\n",
"### Everything revolves around numpy arrays\n",
"* **`SciPy`** adds a collection of science and engineering routines that operate on NumPy arrays, e.g. signal processing, statistical distributions, image analysis, etc.\n",
"* **`pandas`** adds powerful methods for manipulating NumPy arrays, much like data frames in R, but typically faster.\n",
"* **`scikit-learn`** supports state-of-the-art machine learning over numpy arrays. Inputs and outputs of virtually all functions are numpy arrays.\n",
"* If you want many more short exercises than the ones in this notebook - you can find 100 of them [here](http://www.labri.fr/perso/nrougier/teaching/numpy.100/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NumPy arrays vs Python lists\n",
"\n",
"NumPy arrays look very similar to Python lists."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"a = np.array([0, 1, 2, 3])\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**So, why is this useful?** NumPy arrays are memory-efficient containers that provide fast numerical operations. We can show this very quickly by running the same simple computation on a plain Python sequence and on a NumPy array."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"L = range(1000)\n",
"\n",
"# Squaring the first 1000 integers with a Python list comprehension\n",
"%timeit [i**2 for i in L]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array(L)\n",
"\n",
"# Squaring the first 1000 integers with a NumPy array\n",
"%timeit a**2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating arrays\n",
"\n",
"## Manual construction of arrays\n",
"\n",
"You can create NumPy arrays manually in almost the same way as Python lists.\n",
"\n",
"### 1-D"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([0, 1, 2, 3])\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2-D, 3-D, ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.array([[0, 1, 2], [3, 4, 5]]) # 2 x 3 array\n",
"b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c = np.array([[[1], [2]], [[3], [4]]])\n",
"c"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Functions for creating arrays\n",
"\n",
"In practice, we rarely enter items one by one. Therefore, NumPy offers many different helper functions.\n",
"\n",
"### Evenly spaced"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(10) # 0 .. n-1 (!)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.arange(1, 9, 2) # start, end (exclusive), step\n",
"b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ... or by number of points"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c = np.linspace(0, 1, 6) # start, end, num-points\n",
"c"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d = np.linspace(0, 1, 5, endpoint=False)\n",
"d"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Common arrays"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.ones((3, 3)) # reminder: (3, 3) is a tuple\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.zeros((2, 2))\n",
"b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c = np.eye(3)\n",
"c"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d = np.diag(np.array([1, 2, 3, 4]))\n",
"d"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `np.random`: random numbers (Mersenne Twister PRNG)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.random.rand(4) # uniform in [0, 1)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.random.randn(4) # Gaussian\n",
"b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(1234) # Setting the random seed"
]
},
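{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, NumPy 1.17 and later also provide a `Generator`-based interface created with `np.random.default_rng`. The cell below is a minimal sketch, assuming such a version is installed; the names `rng`, `u` and `g` are only illustrative. Note that this generator is based on PCG64 rather than the Mersenne Twister, so it does not reproduce the numbers of the legacy functions above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1234) # seeded generator, independent of np.random.seed\n",
"u = rng.random(4) # uniform in [0, 1)\n",
"g = rng.standard_normal(4) # standard normal (Gaussian)\n",
"u, g"
]
},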
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic data types\n",
"\n",
"You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. ``2.`` vs ``2``). This is due to a difference in the data-type used:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([1, 2, 3])\n",
"a.dtype"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.array([1., 2., 3.])\n",
"b.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.\n",
"\n",
"You can explicitly specify which data-type you want:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c = np.array([1, 2, 3], dtype=float)\n",
"c.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For most array-creation functions (such as `np.ones`), the **default** data type is floating point:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.ones((3, 3))\n",
"a.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are also other types:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Complex\n",
"d = np.array([1+2j, 3+4j, 5+6*1j])\n",
"d.dtype"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Bool\n",
"e = np.array([True, False, False, True])\n",
"e.dtype"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Strings\n",
"f = np.array(['Bonjour', 'Hello', 'Hallo',])\n",
"f.dtype # <--- strings containing max. 7 letters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And many more...\n",
"* ``int32``\n",
"* ``int64``\n",
"* ``uint32``\n",
"* ``uint64``"
]
},
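{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the earlier point about memory concrete, the following sketch compares how many bytes the same values occupy under two different data-types. `itemsize` and `nbytes` are standard array attributes; the names `values`, `small` and `large` are only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"values = [1, 2, 3, 4]\n",
"small = np.array(values, dtype=np.int8) # 1 byte per element\n",
"large = np.array(values, dtype=np.float64) # 8 bytes per element\n",
"print(small.itemsize, small.nbytes) # bytes per element, total bytes\n",
"print(large.itemsize, large.nbytes)\n",
"\n",
"# astype converts to another data-type and returns a new array\n",
"small.astype(np.float64).dtype"
]
},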
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Indexing and slicing\n",
"\n",
"The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(10)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[0], a[2], a[-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The usual Python idiom for reversing a sequence is supported:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[::-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For multidimensional arrays, indexes are tuples of integers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.diag(np.arange(3))\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[1, 1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[2, 1] = 10 # third row, second column\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Note\n",
"\n",
"* In 2D, the first dimension corresponds to **rows**, the second to **columns**.\n",
"* For multidimensional ``a``, ``a[0]`` is interpreted by taking all elements in the unspecified dimensions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Slicing: arrays, like other Python sequences, can also be sliced"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(10)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[2:9:3] # [start:end:step]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the last index is not included!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[:4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"None of the three slice components is required: by default, `start` is 0,\n",
"`end` is the length of the array (one past the last element) and `step` is 1:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[1:3]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[::2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[3:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure: a small illustrated summary of NumPy indexing and slicing.)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also combine assignment and slicing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(10)\n",
"a[5:] = 10\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.arange(5)\n",
"a[5:] = b[::-1]\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fancy indexing\n",
"\n",
"NumPy arrays can be indexed with slices, but also with boolean arrays (**masks**) or integer arrays. This method is called *fancy indexing*. It creates **copies, not views**.\n",
"\n",
"## Using boolean masks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(3)\n",
"a = np.random.randint(0, 21, 15)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(a % 3 == 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mask = (a % 3 == 0)\n",
"extract_from_a = a[mask] # or, a[a%3==0]\n",
"extract_from_a # extract a sub-array with the mask"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing with a mask can be very useful to assign a new value to a sub-array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[a % 3 == 0] = -1\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexing with an array of integers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(0, 100, 10)\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indexing can be done with an array of integers, in which the same index may be repeated several times:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"New values can be assigned with this kind of indexing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[[9, 7]] = -100\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure: various applications of fancy indexing.)*"
]
},
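{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between views and copies can be checked directly. The cell below is a small sketch using `np.shares_memory`; the names `view` and `copy` are only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(10)\n",
"view = a[2:5] # basic slicing returns a view\n",
"copy = a[[2, 3, 4]] # fancy indexing returns a copy\n",
"\n",
"print(np.shares_memory(a, view)) # True\n",
"print(np.shares_memory(a, copy)) # False\n",
"\n",
"view[0] = -1 # also modifies a\n",
"copy[1] = -2 # leaves a untouched\n",
"a"
]
},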
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Elementwise operations\n",
"\n",
"NumPy provides many elementwise operations that are much quicker than comparable list comprehensions in plain Python.\n",
"\n",
"## Basic operations\n",
"\n",
"With scalars:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([1, 2, 3, 4])\n",
"a + 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"2**a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All arithmetic operates elementwise:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.ones(4) + 1\n",
"a - b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a * b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"j = np.arange(5)\n",
"2**(j + 1) - j"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Array multiplication is not matrix multiplication"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c = np.ones((3, 3))\n",
"c * c # NOT matrix multiplication!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Matrix multiplication\n",
"c.dot(c)"
]
},
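{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since Python 3.5, the `@` operator can also be used for matrix multiplication on NumPy arrays. A short sketch, with the illustrative names `m` and `v`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"m = np.array([[1, 2], [3, 4]])\n",
"v = np.array([1, 1])\n",
"\n",
"print(m * m) # elementwise product\n",
"print(m @ m) # matrix product, same as m.dot(m)\n",
"print(m @ v) # matrix-vector product"
]
},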
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other operations\n",
"\n",
"### Comparisons"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([1, 2, 3, 4])\n",
"b = np.array([4, 2, 2, 4])\n",
"a == b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a > b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Array-wise comparisons:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([1, 2, 3, 4])\n",
"b = np.array([4, 2, 2, 4])\n",
"c = np.array([1, 2, 3, 4])\n",
"np.array_equal(a, b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.array_equal(a, c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logical operations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([1, 1, 0, 0], dtype=bool)\n",
"b = np.array([1, 0, 1, 0], dtype=bool)\n",
"np.logical_or(a, b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.logical_and(a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transcendental functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(5)\n",
"np.sin(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.log(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.exp(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Shape mismatches"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# NBVAL_SKIP\n",
"a + np.array([1, 2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Broadcasting?* We'll return to that later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transposition"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.triu(np.ones((3, 3)), 1)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.T"
]
},
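{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The transposition is a view\n",
"\n",
"As a result, the following code is **unsafe** in NumPy versions before 1.13 and may **not make a matrix symmetric**:\n",
"\n",
"    >>> a += a.T\n",
"\n",
"In those versions it works for small arrays (because of buffering) but fails for large ones, in unpredictable ways. NumPy 1.13 and later handle the overlapping memory correctly, but writing `a = a + a.T` makes the intent explicit and works in every version."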
]
},
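{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this point: `a.T` shares memory with `a`, and building the symmetric matrix as a new array avoids relying on in-place behaviour. The name `sym` is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.triu(np.ones((3, 3)), 1)\n",
"print(np.shares_memory(a, a.T)) # the transpose is a view on the same data\n",
"\n",
"sym = a + a.T # a new array; a itself is unchanged\n",
"np.array_equal(sym, sym.T) # check that the result is symmetric"
]
},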
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic reductions\n",
"\n",
"NumPy offers many quick functions to compute things like sum, mean, max etc.\n",
"\n",
"## Computing sums"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.array([1, 2, 3, 4])\n",
"np.sum(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: Many NumPy functions are also available as methods of the array itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sum by rows and by columns:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.array([[1, 1], [2, 2]])\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.sum(axis=0) # columns (first dimension)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.sum(axis=1) # rows (second dimension)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other reductions\n",
"\n",
"Other reductions such as `mean`, `std` and `cumsum` work the same way (and also take ``axis=``)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extrema"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.array([1, 3, 2])\n",
"x.min()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.max()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.argmin() # index of minimum"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.argmax() # index of maximum"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logical operations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.all([True, True, False])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.any([True, True, False])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These can be used for array comparisons:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.zeros((100, 100))\n",
"np.any(a != 0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.all(a == a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([1, 2, 3, 2])\n",
"b = np.array([2, 2, 3, 2])\n",
"c = np.array([6, 4, 4, 5])\n",
"((a <= b) & (b <= c)).all()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Statistics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.array([1, 2, 3, 1])\n",
"y = np.array([[1, 2, 3], [5, 6, 1]])\n",
"x.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.median(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.median(y, axis=-1) # last axis"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.std() # full population standard dev."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"... and many more (best to learn as you go)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Broadcasting\n",
"\n",
"* Basic operations on ``numpy`` arrays (addition, etc.) are elementwise.\n",
"\n",
"* This works on arrays of the same size. ***Nevertheless***, it's also possible to do operations on arrays of different sizes if *NumPy* can transform these arrays so that they all have the same size: this conversion is called **broadcasting**.\n",
"\n",
"*(Figure: an example of broadcasting.)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's verify this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.tile(np.arange(0, 40, 10), (3, 1)).T\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = np.array([0, 1, 2])\n",
"a + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have already used broadcasting without knowing it!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.ones((4, 5))\n",
"a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A useful trick:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(0, 40, 10)\n",
"a.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = a[:, np.newaxis] # adds a new axis -> 2D array\n",
"a.shape\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting seems a bit magical, but it is actually quite natural to use it when we want to solve a problem whose output data is an array with more dimensions than input data."
]
},
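{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of that idea, the sketch below builds a small multiplication table (2-D) from two 1-D inputs. `np.broadcast` is used only to inspect the resulting shape; the names `rows` and `cols` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rows = np.arange(1, 4)[:, np.newaxis] # shape (3, 1)\n",
"cols = np.arange(1, 5) # shape (4,)\n",
"\n",
"print(np.broadcast(rows, cols).shape) # shape of the broadcast result\n",
"rows * cols # 3 x 4 multiplication table"
]
},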
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to compute the distance from the origin of points on a 5x5 grid, we can do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x, y = np.arange(5), np.arange(5)[:, None]\n",
"distance = np.sqrt(x ** 2 + y ** 2)\n",
"distance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or in color:"
]
},
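{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to draw such a picture is sketched below. It assumes Matplotlib is installed, which this notebook does not otherwise require, and it recomputes `distance` so that the cell runs on its own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"x, y = np.arange(5), np.arange(5)[:, np.newaxis]\n",
"distance = np.sqrt(x ** 2 + y ** 2)\n",
"\n",
"plt.pcolormesh(distance) # one colored cell per grid point\n",
"plt.colorbar(label='distance from origin')\n",
"plt.show()"
]
},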
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Array shape manipulation\n",
"\n",
"Sometimes your arrays don't have the right shape. NumPy has many tools for this, too.\n",
"\n",
"## Flattening"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([[1, 2, 3], [4, 5, 6]])\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.ravel()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.T"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.T.ravel()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Higher dimensions: last dimensions ravel out \"first\"."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reshaping\n",
"\n",
"The inverse operation to flattening:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.reshape(6, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or,"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.reshape((6, -1)) # unspecified (-1) value is inferred"
]
},
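{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `reshape` may return either a view or a copy, depending on the memory layout of the input. The following sketch shows one case of each; `np.shares_memory` can be used to check when in doubt. The arrays here are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.zeros((3, 2))\n",
"b = a.reshape(2, 3) # contiguous input: reshape returns a view here\n",
"b[0, 0] = 99 # so this also changes a\n",
"print(a)\n",
"\n",
"c = a.T.reshape(6) # transposed (non-contiguous) input: a copy is needed\n",
"print(np.shares_memory(a, b), np.shares_memory(a, c))"
]
},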
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding a dimension\n",
"\n",
"Indexing with the ``np.newaxis`` or ``None`` object allows us to add an axis to an array (you have seen this already above in the broadcasting section):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = np.array([1, 2, 3])\n",
"z"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z[:, np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z[np.newaxis, :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dimension shuffling"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(4*3*2).reshape(4, 3, 2)\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = a.transpose(1, 2, 0)\n",
"b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resizing\n",
"\n",
"The size of an array can be changed with ``ndarray.resize``:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.arange(4)\n",
"a.resize((8,))\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sorting data\n",
"\n",
"Sorting along an axis:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([[4, 3, 5], [1, 2, 1]])\n",
"b = np.sort(a, axis=1)\n",
"b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Important**: Note that the code above sorts each row separately!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In-place sort:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a.sort(axis=1)\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting with fancy indexing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([4, 3, 1, 2])\n",
"j = np.argsort(a)\n",
"j"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a[j]"
]
},
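{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same idea extends to 2-D data. As a sketch, the rows of a small table can be reordered by the values in one of its columns; the array `table` is made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"table = np.array([[3, 30], [1, 10], [2, 20]])\n",
"order = np.argsort(table[:, 0]) # row order that sorts the first column\n",
"table[order] # rows rearranged with fancy indexing"
]
},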
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finding minima and maxima:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = np.array([4, 3, 1, 2])\n",
"j_max = np.argmax(a)\n",
"j_min = np.argmin(a)\n",
"j_max, j_min"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `npy` - NumPy's own data format\n",
"\n",
"NumPy has its own binary format, which is not widely readable outside NumPy but offers efficient I/O:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = np.ones((3, 3))\n",
"np.save('pop.npy', data)\n",
"data3 = np.load('pop.npy')"
]
},
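{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several arrays can also be stored in a single `.npz` archive with `np.savez`. A minimal sketch, writing to a temporary directory so that no files are left behind; the file name `arrays.npz` and the keys `ones` and `counts` are arbitrary choices for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import numpy as np\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    path = os.path.join(tmp, 'arrays.npz')\n",
"    np.savez(path, ones=np.ones((3, 3)), counts=np.arange(4))\n",
"    with np.load(path) as archive: # arrays are accessed by key\n",
"        restored = archive['counts'].copy()\n",
"\n",
"restored"
]
},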
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary - What do you need to know to get started?\n",
"\n",
"* Know how to create arrays: ``array``, ``arange``, ``ones``, ``zeros``.\n",
"* Know the shape of the array with ``array.shape``, then use slicing to obtain different views of the array: ``array[::2]``, etc. Adjust the shape of the array using ``reshape`` or flatten it with ``ravel``.\n",
"* Obtain a subset of the elements of an array and/or modify their values with masks \n",
" ``a[a < 0] = 0``\n",
"* Know miscellaneous operations on arrays, such as finding the mean or max (``array.max()``, ``array.mean()``).\n",
"* For advanced use: master the indexing with arrays of integers, as well as broadcasting."
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}