{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to Numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Attribution note**: This notebook was created by David Dotson and Oliver Beckstein and is made available under a CC-BY 4.0 license. Some of the material was inspired by lessons developed by [Software Carpentry](http://software-carpentry.org/) ([Programming with Python](http://swcarpentry.github.io/python-novice-inflammation/)), as well as material previously created by Oliver Beckstein ([Python and Numpy for SimBioNano PHY 598](http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/04/PythonAndNumpy/p04_class.html))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When it comes to doing numerical work, Python by itself is rather slow. By slow we mean compared to languages like C and Fortran, which benefit from being **compiled** languages in which a program is preprocessed into machine code by a compiler. Python by contrast is an **interpreted** language, in which each line in a program is fed to the Python interpreter in sequence, then executed. The flexiblity and ease of use that come with Python come at the cost of pure performance." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, though Python code itself may be slow, Python can be used to run code that is written in a compiled language and already compiled. We will use a library (a.k.a., a Python *module*) that does exactly this underneath the hood to get fast performance for numerical operations on arrays." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importing a module is like taking a piece of equipment out of a storage locker and setting it up on a lab bench. Importing the name `numpy` makes all the functions and classes (object types) available to us. The core data structure that `numpy` provides is known as the `numpy` array:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3 4]\n" ] } ], "source": [ "somenums = numpy.array([1, 2, 3, 4])\n", "print(somenums)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A numpy array looks superfically similar to a `list`, which is a builtin to Python. They are fundamentally different, however, in how they both work and how they exist in memory. `numpy` arrays don't store references to other objects, but instead point to contiguous blocks of memory in which each element is of exactly the same data type. For instance, we just made an array of 64 bit integers:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "somenums.dtype" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Han Solo', 'Kylo Ren', '7'], dtype=' \"Get each row in the array starting from the row at index 5 up to and not including the row at index 73.\"\n", "\n", "We could even coarse-grain by slicing out every fifth row in this range:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(14, 3)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position[5:73:5].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now what if we wanted a specific *element* of the array? Indexing works for this too:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0012600012229571492" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position[42, 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the y-position of the 42nd frame. \n", "\n", "NOTE: **The numpy indexing notation differs from Python indexing of nested lists**, which would look like `positions[42][1]`. \n", "\n", "Incidentally, this also works for numpy arrays but is less readable and slower, as we can demonstrate with the `%timeit` magic function of IPython/Jupyter:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "113 ns ± 0.771 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)\n" ] } ], "source": [ "%timeit position[42, 1]" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "233 ns ± 3.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "%timeit position[42][1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first index/slice corresponds to the first *axis* of the array, which for a 2-D array corresponds to the rows. The second index/slice would then be the columns. If we had a 3-D array, indexing the first axis would yield 2-D arrays. If we had a 4-D array, indexing the first axis would yield 3-D arrays, and so on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Numpy slicing\n", "\n", "Obtain an array giving the mean of the x, y positions (separately) from the frame at index 10 to the frame at index 43 as a 1-D array." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do this by slicing both the first axis (rows) and the second axis (columns), then using the `mean` method of the resulting array. To only take a mean across the rows (a mean for each column), we must specify the `axis=0` keyword." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([9.99999853e-01, 7.65000753e-04])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position[10:42, :2].mean(axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we wanted the smaller of the two numbers only?" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.000780000767676743" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position[10:43, :2].mean(axis=0).min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since slicing and methods of arrays often yield arrays, you can chain operations in this way. This is what qualifies as a *pythonic* way to work with these objects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fancy and boolean indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also possible to index arrays with lists of indices to select out; these can be repeated and in any order." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n", " 17, 18, 19, 20, 21, 22, 23])" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fancy indexing with a *list of indices*:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2, 23, 2, 3])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[[2, -1, 2, 3]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fancy indexing with *booleans*:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, False, False, False, False, False, False,\n", " False, False, True, True, True, True, True, True, True,\n", " True, True, True, True, True, True])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big_values = A > 10\n", "big_values" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[big_values]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Boolean indexing is a convenient way to select parts of an array.** Test it out interactively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our trajectory: We can select arbitrary time steps:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 9.99999999e-01, 6.00000600e-05, -6.28319159e-05],\n", " [ 9.99999997e-01, 1.20000120e-04, -1.25663832e-04],\n", " [ 9.99999990e-01, 2.10000210e-04, -2.19911705e-04],\n", " [ 4.08082062e-01, -1.63206333e+00, -1.22464680e-15],\n", " [ 9.99999999e-01, 6.00000600e-05, -6.28319159e-05]])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position[[2, 4, 7, -1, 2]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use arrays of booleans to get back arrays with items for which `True` was the value in the boolean array used:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, ..., False, False, False])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(position[:, :2] > 2).any(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use this array to get only the rows for which either the x or y position is greater than 2:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(336428, 3)" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position[(position[:,:2] > 2).any(axis=1)].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Boolean arrays are useful for filtering data for rows of interest." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Technical note**: fancy and boolean indexing like this generally gives back a new array instead of a *view* to the existing one. Slicing, by contrast, always gives a view. This matters when using indexing or slicing to alter the values in an array." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array arithmetic " ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 2, 3, 4])\n", "b = np.array([-1, 8, 7, -3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Array operations are element-wise**:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 10, 10, 1])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + b" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 2, -6, -4, 7])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a - b" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ -1, 16, 21, -12])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a * b" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1. , 0.25 , 0.42857143, -1.33333333])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a / b" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.0000e+00, 2.5600e+02, 2.1870e+03, 1.5625e-02])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.asarray(a, dtype=np.float) ** b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For n-D arrays the same applies:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "U = np.array([[1, 2], [3, 4]])\n", "V = np.array([[10, 20], [30, 40]])" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[11, 22],\n", " [33, 44]])" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U + V" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ -9, -18],\n", " [-27, -36]])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U - V" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 10, 40],\n", " [ 90, 160]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U * V" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.1, 0.1],\n", " [0.1, 0.1]])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U / V" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trajectory example continued " ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "position = create_position()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the distance from a reference value, e.g., the end point of the trajectory:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 4.08082062e-01, -1.63206333e+00, -1.22464680e-15])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ref = position[-1]\n", "ref" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "d_i = |\\mathbf{r}_i - \\mathbf{r}_\\text{ref}| = \\sqrt{\\sum_{k=0}^2 (r_{i,k} - r_{\\text{ref},k})^2}\n", "$$\n", "\n", "Let's rewrite with the difference vector\n", "$$\n", "\\Delta\\mathbf{r}_i = \\mathbf{r}_i - \\mathbf{r}_\\text{ref}\n", "$$\n", "as\n", "$$\n", "d_i = \\sqrt{\\Delta\\mathbf{r}_i \\cdot \\Delta\\mathbf{r}_i} = \\sqrt{\\sum_{k=0}^2 \\Delta r_{i,k}^2}\n", "$$\n", "\n", "This will give us an instruction for how to write it in numpy:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_All_ the difference vectors:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "Delta_r = position - ref" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1000000, 3)" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Delta_r.shape" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 5.91917938e-01, 1.63206333e+00, 1.22464680e-15],\n", " [ 5.91917938e-01, 1.63209333e+00, -3.14159579e-05],\n", " [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Delta_r[[0, 1, -1]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Square each element:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3.50366846e-01, 2.66363072e+00, 1.49975978e-30],\n", " [3.50366845e-01, 2.66372865e+00, 9.86962414e-10],\n", " [3.50366845e-01, 2.66382657e+00, 3.94784965e-09],\n", " ...,\n", " [1.33352927e-09, 2.53458565e-09, 3.94784966e-09],\n", " [3.33385299e-10, 6.33642303e-10, 9.86962414e-10],\n", " [0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Delta_r ** 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sum *along the second axis*:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3.01399757e+00, 3.01409549e+00, 3.01419342e+00, ...,\n", " 7.81596457e-09, 1.95399002e-09, 0.00000000e+00])" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = np.sum(Delta_r ** 2, axis=1)\n", "s" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1000000,)" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take the root for each individual $\\Delta\\mathbf{r}_i \\cdot \\Delta\\mathbf{r}_i$:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([1.73608685e+00, 1.73611506e+00, 1.73614326e+00, ...,\n", " 8.84079441e-05, 4.42039593e-05, 0.00000000e+00])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sqrt(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Altogether:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.73608685e+00, 1.73611506e+00, 1.73614326e+00, ...,\n", " 8.84079441e-05, 4.42039593e-05, 0.00000000e+00])" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sqrt(np.sum((position - ref)**2, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "looks very similar to" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "d_i = \\sqrt{\\sum_{k=0}^2 (r_{i,k} - r_{\\text{ref},k})^2}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Note on matrices " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that multiplication between two arrays is **not** the same as matrix multiplcation. **Arithmetic operations are element-wise.**" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, -5, -10])" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v = np.array([0, -1, 10])\n", "w = np.array([3, 5, -1])\n", "v * w" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 1 0]\n", " [0 1 0]\n", " [0 0 2]]\n", "[[ 2. 0. 0. ]\n", " [-1. 1. 0. ]\n", " [ 0. 0. 0.5]]\n" ] } ], "source": [ "A = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 2]])\n", "B = np.array([[2, 0, 0], [-1, 1, 0], [0, 0, 0.5]])\n", "print(A)\n", "print(B)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The multiplication is _element-wise_:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2., 0., 0.],\n", " [-0., 1., 0.],\n", " [ 0., 0., 1.]])" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A * B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But there is a method for doing matrix multiplication/inner products: vector/vector\n", "$$\n", "\\mathbf{v}\\cdot\\mathbf{w} = \\sum_{i=1}^3 v_i w_i,\n", "$$\n", "matrix/vector\n", "$$\n", "\\mathbf{A}\\cdot\\mathbf{v} = \\sum_{i=j}^3 A_{ij} v_j,\n", "$$ \n", "matrix/matrix \n", "$$\n", "[\\mathsf{A}\\cdot\\mathsf{B}]_{ik} = \\sum_{j=1}^3 A_{ij} B_{jk}\n", "$$" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-15" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(v, w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or with `@` (matrix multiplication operator)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-15" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v @ w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy arrays also have the `dot()` method:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1, -1, 20])" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(v)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 1., 0.],\n", " [-1., 1., 0.],\n", " [ 0., 0., 1.]])" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.dot(B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And more linear algebra functions can be found in the [`numpy.linalg`](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html) module:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['LinAlgError',\n", " '__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__path__',\n", " '__spec__',\n", " '_umath_linalg',\n", " 'cholesky',\n", " 'cond',\n", " 'det',\n", " 'eig',\n", " 'eigh',\n", " 'eigvals',\n", " 'eigvalsh',\n", " 'inv',\n", " 'lapack_lite',\n", " 'linalg',\n", " 'lstsq',\n", " 'matrix_power',\n", " 'matrix_rank',\n", " 'multi_dot',\n", " 'norm',\n", " 'pinv',\n", " 'qr',\n", " 'slogdet',\n", " 'solve',\n", " 'svd',\n", " 'tensorinv',\n", " 'tensorsolve',\n", " 'test']" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(np.linalg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating new arrays" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0.],\n", " [0., 0., 0.]])" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros((2,3))" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1.],\n", " [1., 1., 1.]])" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ones((2,3))" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ True, True, True],\n", " [ True, True, True]])" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.ones((2,3), dtype=np.bool)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Number ranges:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`np.arange` is the equivalent to `range`:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-6, -4, -2, 0, 2, 4])" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(-6, 6, 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can use float steps:" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,\n", " 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(10, step=0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More useful is `np.linspace(start, stop, num)` to make exactly `num` equally distant numbers between `start` and `stop` _inclusive_:" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-6., -5., -4., -3., -2., -1., 0., 1., 2., 3., 4., 5., 6.])" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(-6, 6, 13)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Thinking in arrays " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Say we wanted to displace our particle a full 5 units in each of the x, y, and z directions. If you have experience with a language like C, you might be used to writing nested loops like this one to achieve this:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "position = create_position()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(We use the `%%time` magic to get the time for a whole code block.)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.84 s, sys: 10.7 ms, total: 1.85 s\n", "Wall time: 1.86 s\n" ] } ], "source": [ "%%time\n", "for i in range(position.shape[0]):\n", " for j in range(position.shape[1]):\n", " position[i, j] += 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even slower when you try to do it in a \"Pythonic\" fashion with the (otherwise very good!) `enumerate()` function:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "position = create_position()" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.74 s, sys: 12.7 ms, total: 2.76 s\n", "Wall time: 2.77 s\n" ] } ], "source": [ "%%time\n", "for i, row in enumerate(position):\n", " for j, element in enumerate(row):\n", " position[i, j] += 5 " ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[6. , 5. , 5. ],\n", " [6. , 5.00003 , 4.99996858],\n", " [6. , 5.00006 , 4.99993717],\n", " ...,\n", " [5.40811858, 3.36798701, 4.99993717],\n", " [5.40810032, 3.36796184, 4.99996858],\n", " [5.40808206, 3.36793667, 5. ]])" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "position" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But one of the main points of `numpy` is performance, so we'd do better to spend as little time in an operation running through the Python interpreter, which is the case in the above loop. Instead we can do:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "position = create_position()" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.66 ms, sys: 1.18 ms, total: 3.83 ms\n", "Wall time: 2.13 ms\n" ] } ], "source": [ "%%time\n", "position += 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Speed-up for using array operations instead of `for` loops:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1003.8461538461538" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2.61 / 2.6e-3" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "On my laptop the difference in speed is about a factor of 400–1000 (you might see speed-ups on the order of 100 to 1000). The larger the array the more pronounced the difference in speed will be, too. The general rule when using `numpy` is to try and put what you're trying to do in terms of operations on whole arrays (or slices of them). **Avoid Python loops unless absolutely necessary.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: array arithmetic with *broadcasting*\n", "\n", "Rescale (multiply) the y-positions by 2 and displace the x and z positions by 3 and -100, respectively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a lot of ways to do this, but the most succinct way is to take advantage of *broadcasting*. That is, doing:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "position = create_position()" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "position = position * np.array([1, 2, 1]) + np.array([3, 0, -100])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy` will take the 3-element, 1-D arrays here and apply them to whole columns in `position`. Note that we already took advantage of broadcasting rules in a way, since multiplying an array by a scalar is the same as multiplying the array by an array of equal shape with all elements equal to the scalar." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }