{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 3.4 NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Estimated time to complete this notebook: 20 minutes*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.4.1 Limitations of Python Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The normal Python List is just one dimensional.\n", "To make a matrix, we have to nest Python lists:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "x = [list(range(5)) for N in range(5)]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4],\n", " [0, 1, 2, 3, 4]]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[2][2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Applying an operation to every element is a pain:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "TypeError", "evalue": "can only concatenate list (not \"int\") to list", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: can only concatenate list (not \"int\") to list" ] } ], "source": [ "x + 5" ] }, { "cell_type": "code", 
"execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[5, 6, 7, 8, 9],\n", " [5, 6, 7, 8, 9],\n", " [5, 6, 7, 8, 9],\n", " [5, 6, 7, 8, 9],\n", " [5, 6, 7, 8, 9]]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[[elem + 5 for elem in row] for row in x]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Common useful operations like transposing a matrix or reshaping a 10 by 10 matrix into a 20 by 5 matrix are not easy to code in raw Python lists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.4.2 The NumPy array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy's array type represents a multidimensional matrix $M_{i,j,k...n}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The NumPy array seems at first to be just like a list:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "my_array = np.array(range(5))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array[2]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Hello\n", "HelloHello\n", "HelloHelloHello\n", "HelloHelloHelloHello\n" ] } ], "source": [ "for element in my_array:\n", " print(\"Hello\" * element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also see our first weakness of NumPy arrays versus Python lists:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [ "raises-exception" 
] }, "outputs": [ { "ename": "AttributeError", "evalue": "'numpy.ndarray' object has no attribute 'append'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmy_array\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m: 'numpy.ndarray' object has no attribute 'append'" ] } ], "source": [ "my_array.append(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You typically don't change the size of a NumPy array once you've created it, whereas appending to a Python list is efficient.\n", "However, you get lots of goodies in return..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.4.3 Elementwise Operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most operations on NumPy arrays are applied element-wise automatically:" 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3, 4, 5, 6])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_array + 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These \"vectorized\" operations are very fast: (see [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) for more information on the `%%timeit` magic)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "big_list = range(10000)\n", "big_array = np.arange(10000)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.24 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "[x**2 for x in big_list]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.68 µs ± 72.9 ns per loop (mean ± std. dev. 
of 7 runs, 100000 loops each)\n" ] } ], "source": [ "%%timeit\n", "big_array**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.4.4 Arange and linspace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy has two convenient functions for creating evenly spaced floating-point arrays:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "x = np.arange(0, 10, 0.1) # Start, stop, step size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that a non-integer step size does not work with Python's built-in `range`:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "TypeError", "evalue": "'float' object cannot be interpreted as an integer", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0.1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'float' object cannot be interpreted as an integer" ] } ], "source": [ "y = list(range(0, 10, 0.1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can quickly create an evenly spaced range of a known size (e.g. 
for graph plotting):" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "values = np.linspace(0, math.pi, 100) # Start, stop, number of steps" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.03173326, 0.06346652, 0.09519978, 0.12693304,\n", " 0.1586663 , 0.19039955, 0.22213281, 0.25386607, 0.28559933,\n", " 0.31733259, 0.34906585, 0.38079911, 0.41253237, 0.44426563,\n", " 0.47599889, 0.50773215, 0.53946541, 0.57119866, 0.60293192,\n", " 0.63466518, 0.66639844, 0.6981317 , 0.72986496, 0.76159822,\n", " 0.79333148, 0.82506474, 0.856798 , 0.88853126, 0.92026451,\n", " 0.95199777, 0.98373103, 1.01546429, 1.04719755, 1.07893081,\n", " 1.11066407, 1.14239733, 1.17413059, 1.20586385, 1.23759711,\n", " 1.26933037, 1.30106362, 1.33279688, 1.36453014, 1.3962634 ,\n", " 1.42799666, 1.45972992, 1.49146318, 1.52319644, 1.5549297 ,\n", " 1.58666296, 1.61839622, 1.65012947, 1.68186273, 1.71359599,\n", " 1.74532925, 1.77706251, 1.80879577, 1.84052903, 1.87226229,\n", " 1.90399555, 1.93572881, 1.96746207, 1.99919533, 2.03092858,\n", " 2.06266184, 2.0943951 , 2.12612836, 2.15786162, 2.18959488,\n", " 2.22132814, 2.2530614 , 2.28479466, 2.31652792, 2.34826118,\n", " 2.37999443, 2.41172769, 2.44346095, 2.47519421, 2.50692747,\n", " 2.53866073, 2.57039399, 2.60212725, 2.63386051, 2.66559377,\n", " 2.69732703, 2.72906028, 2.76079354, 2.7925268 , 2.82426006,\n", " 2.85599332, 2.88772658, 2.91945984, 2.9511931 , 2.98292636,\n", " 3.01465962, 3.04639288, 3.07812614, 3.10985939, 3.14159265])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy comes with 'vectorised' versions of common functions which work element-by-element when applied to arrays:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "tags": [ 
"raises-exception" ] }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from matplotlib import pyplot as plt\n", "\n", "plt.plot(values, np.sin(values))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we don't have to use awkward list comprehensions when using these." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.4.5 Multi-Dimensional Arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy's true power comes from multi-dimensional arrays:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[0., 0.],\n", " [0., 0.],\n", " [0., 0.],\n", " [0., 0.]],\n", "\n", " [[0., 0.],\n", " [0., 0.],\n", " [0., 0.],\n", " [0., 0.]],\n", "\n", " [[0., 0.],\n", " [0., 0.],\n", " [0., 0.],\n", " [0., 0.]]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros([3, 4, 2]) # 3 arrays with 4 rows and 2 columns each" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike a list-of-lists in Python, we can reshape arrays:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n", " 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,\n", " 34, 35, 36, 37, 38, 39])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array(range(40))\n", "x" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0, 1],\n", " [ 2, 3],\n", " [ 4, 5],\n", " [ 6, 7],\n", " [ 8, 9]],\n", "\n", " [[10, 11],\n", " [12, 13],\n", " [14, 15],\n", " [16, 17],\n", " [18, 19]],\n", "\n", " [[20, 21],\n", " [22, 23],\n", " [24, 25],\n", " [26, 27],\n", " [28, 29]],\n", "\n", " [[30, 31],\n", " [32, 33],\n", " [34, 35],\n", " [36, 37],\n", " [38, 
39]]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = x.reshape([4, 5, 2]) # 4 Arrays - 5 Rows - 2 Columns\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And index along several axes at once:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "35" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y[3, 2, 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Including selecting on inner axes while taking all from the outermost:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 5, 15, 25, 35])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y[:, 2, 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And subselecting ranges:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[20, 21]],\n", "\n", " [[30, 31]]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y[2:, :1, :] # Last 2 arrays, 1st row, all columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And [transpose](https://en.wikipedia.org/wiki/Transpose) arrays:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0, 10, 20, 30],\n", " [ 2, 12, 22, 32],\n", " [ 4, 14, 24, 34],\n", " [ 6, 16, 26, 36],\n", " [ 8, 18, 28, 38]],\n", "\n", " [[ 1, 11, 21, 31],\n", " [ 3, 13, 23, 33],\n", " [ 5, 15, 25, 35],\n", " [ 7, 17, 27, 37],\n", " [ 9, 19, 29, 39]]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can get the dimensions of an array with `shape`:" ] }, { "cell_type": "code", 
"execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 5, 2)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.shape # 4 Arrays - 5 Rows - 2 Columns" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2, 5, 4)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.transpose().shape # 2 Arrays - 5 Rows - 4 Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some NumPy functions act on the whole array by default, but can be told to act only along certain axes:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2],\n", " [ 3, 4, 5],\n", " [ 6, 7, 8],\n", " [ 9, 10, 11]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(12).reshape(4, 3)\n", "x" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 4., 7., 10.])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.mean(1) # Mean along the second axis, leaving the first." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4.5, 5.5, 6.5])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.mean(0) # Mean along the first axis, leaving the second." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.5" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.mean() # mean of all axes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.4.6 Array Datatypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Python `list` can contain data of mixed type:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "x = [\"hello\", 2, 3.4]" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(x[2])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(x[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A NumPy array always contains just one datatype:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['hello', '2', '3.4'], dtype='