{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Author**: `marimuthu`(mario) [[@kmario23](https://github.com/kmario23)]\n", "\n", "# **An introduction to NumPy**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy (*Numerical Python*) is a Python library for scientific computing that provides high-performance vector, matrix, and higher-dimensional data structures. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.\n", "\n", "It offers the `ndarray` data structure for storing homogeneous data and `ufuncs` (universal functions) for processing it efficiently. Important features include basic slicing, advanced (fancy) indexing, and broadcasting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **How are NumPy arrays different from Python lists?**\n", "\n", " - Python lists are very general: they can contain any kind of object and are dynamically typed.\n", " - They do not support mathematical operations such as matrix multiplication and dot products. Implementing such operations for Python lists would not be very efficient because of the dynamic typing.\n", " - NumPy arrays are statically typed and homogeneous. The type of the elements is determined when the array is created.\n", " - NumPy arrays are memory efficient.\n", " - Because of the static typing, mathematical functions such as multiplication and addition of NumPy arrays can be implemented efficiently in a compiled language (C and Fortran are used)." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# mandatory imports\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1.15.3'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check version\n", "np.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Getting help**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[0;31mInit signature:\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndarray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m/\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDocstring:\u001b[0m \n", "ndarray(shape, dtype=float, buffer=None, offset=0,\n", " strides=None, order=None)\n", "\n", "An array object represents a multidimensional, homogeneous array\n", "of fixed-size items. An associated data-type object describes the\n", "format of each element in the array (its byte-order, how many bytes it\n", "occupies in memory, whether it is an integer, a floating point number,\n", "or something else, etc.)\n", "\n", "Arrays should be constructed using `array`, `zeros` or `empty` (refer\n", "to the See Also section below). 
The parameters given here refer to\n", "a low-level method (`ndarray(...)`) for instantiating an array.\n", "\n", "For more information, refer to the `numpy` module and examine the\n", "methods and attributes of an array.\n", "\n", "Parameters\n", "----------\n", "(for the __new__ method; see Notes below)\n", "\n", "shape : tuple of ints\n", " Shape of created array.\n", "dtype : data-type, optional\n", " Any object that can be interpreted as a numpy data type.\n", "buffer : object exposing buffer interface, optional\n", " Used to fill the array with data.\n", "offset : int, optional\n", " Offset of array data in buffer.\n", "strides : tuple of ints, optional\n", " Strides of data in memory.\n", "order : {'C', 'F'}, optional\n", " Row-major (C-style) or column-major (Fortran-style) order.\n", "\n", "Attributes\n", "----------\n", "T : ndarray\n", " Transpose of the array.\n", "data : buffer\n", " The array's elements, in memory.\n", "dtype : dtype object\n", " Describes the format of the elements in the array.\n", "flags : dict\n", " Dictionary containing information related to memory use, e.g.,\n", " 'C_CONTIGUOUS', 'OWNDATA', 'WRITEABLE', etc.\n", "flat : numpy.flatiter object\n", " Flattened version of the array as an iterator. The iterator\n", " allows assignments, e.g., ``x.flat = 3`` (See `ndarray.flat` for\n", " assignment examples; TODO).\n", "imag : ndarray\n", " Imaginary part of the array.\n", "real : ndarray\n", " Real part of the array.\n", "size : int\n", " Number of elements in the array.\n", "itemsize : int\n", " The memory use of each array element in bytes.\n", "nbytes : int\n", " The total number of bytes required to store the array data,\n", " i.e., ``itemsize * size``.\n", "ndim : int\n", " The array's number of dimensions.\n", "shape : tuple of ints\n", " Shape of the array.\n", "strides : tuple of ints\n", " The step-size required to move from one element to the next in\n", " memory. 
For example, a contiguous ``(3, 4)`` array of type\n", " ``int16`` in C-order has strides ``(8, 2)``. This implies that\n", " to move from element to element in memory requires jumps of 2 bytes.\n", " To move from row-to-row, one needs to jump 8 bytes at a time\n", " (``2 * 4``).\n", "ctypes : ctypes object\n", " Class containing properties of the array needed for interaction\n", " with ctypes.\n", "base : ndarray\n", " If the array is a view into another array, that array is its `base`\n", " (unless that array is also a view). The `base` array is where the\n", " array data is actually stored.\n", "\n", "See Also\n", "--------\n", "array : Construct an array.\n", "zeros : Create an array, each element of which is zero.\n", "empty : Create an array, but leave its allocated memory unchanged (i.e.,\n", " it contains \"garbage\").\n", "dtype : Create a data-type.\n", "\n", "Notes\n", "-----\n", "There are two modes of creating an array using ``__new__``:\n", "\n", "1. If `buffer` is None, then only `shape`, `dtype`, and `order`\n", " are used.\n", "2. If `buffer` is an object exposing the buffer interface, then\n", " all keywords are interpreted.\n", "\n", "No ``__init__`` method is needed because the array is fully initialized\n", "after the ``__new__`` method.\n", "\n", "Examples\n", "--------\n", "These examples illustrate the low-level `ndarray` constructor. Refer\n", "to the `See Also` section above for easier ways of constructing an\n", "ndarray.\n", "\n", "First mode, `buffer` is None:\n", "\n", ">>> np.ndarray(shape=(2,2), dtype=float, order='F')\n", "array([[ -1.13698227e+002, 4.25087011e-303],\n", " [ 2.88528414e-306, 3.27025015e-309]]) #random\n", "\n", "Second mode:\n", "\n", ">>> np.ndarray((2,), buffer=np.array([1,2,3]),\n", "... offset=np.int_().itemsize,\n", "... dtype=int) # offset = 1*itemsize, i.e. 
skip first element\n", "array([2, 3])\n", "\u001b[0;31mFile:\u001b[0m ~/anaconda3/lib/python3.6/site-packages/numpy/__init__.py\n", "\u001b[0;31mType:\u001b[0m type\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# read about signature and docstring \n", "np.ndarray?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function mean in module numpy.core.fromnumeric:\n", "\n", "mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)\n", " Compute the arithmetic mean along the specified axis.\n", " \n", " Returns the average of the array elements. The average is taken over\n", " the flattened array by default, otherwise over the specified axis.\n", " `float64` intermediate and return values are used for integer inputs.\n", " \n", " Parameters\n", " ----------\n", " a : array_like\n", " Array containing numbers whose mean is desired. If `a` is not an\n", " array, a conversion is attempted.\n", " axis : None or int or tuple of ints, optional\n", " Axis or axes along which the means are computed. The default is to\n", " compute the mean of the flattened array.\n", " \n", " .. versionadded:: 1.7.0\n", " \n", " If this is a tuple of ints, a mean is performed over multiple axes,\n", " instead of a single axis or all the axes as before.\n", " dtype : data-type, optional\n", " Type to use in computing the mean. For integer inputs, the default\n", " is `float64`; for floating point inputs, it is the same as the\n", " input dtype.\n", " out : ndarray, optional\n", " Alternate output array in which to place the result. The default\n", " is ``None``; if provided, it must have the same shape as the\n", " expected output, but the type will be cast if necessary.\n", " See `doc.ufuncs` for details.\n", " \n", " keepdims : bool, optional\n", " If this is set to True, the axes which are reduced are left\n", " in the result as dimensions with size one. 
With this option,\n", " the result will broadcast correctly against the input array.\n", " \n", " If the default value is passed, then `keepdims` will not be\n", " passed through to the `mean` method of sub-classes of\n", " `ndarray`, however any non-default value will be. If the\n", " sub-class' method does not implement `keepdims` any\n", " exceptions will be raised.\n", " \n", " Returns\n", " -------\n", " m : ndarray, see dtype parameter above\n", " If `out=None`, returns a new array containing the mean values,\n", " otherwise a reference to the output array is returned.\n", " \n", " See Also\n", " --------\n", " average : Weighted average\n", " std, var, nanmean, nanstd, nanvar\n", " \n", " Notes\n", " -----\n", " The arithmetic mean is the sum of the elements along the axis divided\n", " by the number of elements.\n", " \n", " Note that for floating-point input, the mean is computed using the\n", " same precision the input has. Depending on the input data, this can\n", " cause the results to be inaccurate, especially for `float32` (see\n", " example below). 
Specifying a higher-precision accumulator using the\n", " `dtype` keyword can alleviate this issue.\n", " \n", " By default, `float16` results are computed using `float32` intermediates\n", " for extra precision.\n", " \n", " Examples\n", " --------\n", " >>> a = np.array([[1, 2], [3, 4]])\n", " >>> np.mean(a)\n", " 2.5\n", " >>> np.mean(a, axis=0)\n", " array([ 2., 3.])\n", " >>> np.mean(a, axis=1)\n", " array([ 1.5, 3.5])\n", " \n", " In single precision, `mean` can be inaccurate:\n", " \n", " >>> a = np.zeros((2, 512*512), dtype=np.float32)\n", " >>> a[0, :] = 1.0\n", " >>> a[1, :] = 0.1\n", " >>> np.mean(a)\n", " 0.54999924\n", " \n", " Computing the mean in float64 is more accurate:\n", " \n", " >>> np.mean(a, dtype=np.float64)\n", " 0.55000000074505806\n", "\n" ] } ], "source": [ "# or use help()\n", "\n", "help(np.mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-------\n", "\n", "### **Creating N-dimensional NumPy arrays**\n", "There are a number of ways to initialize new NumPy arrays, for example:\n", "\n", " - from Python lists or tuples\n", " - using functions that are dedicated to generating NumPy arrays, such as *`numpy.arange`*, *`numpy.linspace`*, etc.\n", " - by reading data from files\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **From lists**\n", " To create new vector and matrix arrays from Python lists, we can use the `numpy.array()` function." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4], dtype=int32)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a vector: the argument to the array function is a Python list\n", "# more generally, 1D array\n", "lst = [1,2,3,4.0]\n", "v = np.array(lst, dtype=np.int32)\n", "\n", "v" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int32')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get its datatype\n", "v.dtype" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a matrix: the argument to the array function is a nested Python list (can also be a tuple of tuples)\n", "# more generally, a 2D array\n", "list_of_lists = [[1, 2], [3, 4]]\n", "M = np.array(list_of_lists)\n", "\n", "M" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3, 4]], dtype=int32)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a row vector\n", "\n", "row_vec = v[np.newaxis, :] # v[None, :]\n", "row_vec" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1],\n", " [2],\n", " [3],\n", " [4]], dtype=int32)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a column vector\n", "col_vec = v[:, np.newaxis] # v[:, None]\n", "col_vec\n", "\n", "# read more about newaxis here: https://stackoverflow.com/questions/29241056/how-does-numpy-newaxis-work-and-when-to-use-it" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Construction using 
intrinsic array generating functions**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy provides many functions for generating arrays. Some of them are:\n", "\n", "- numpy.arange()\n", "- numpy.linspace()\n", "- numpy.logspace()\n", "- numpy.random." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0. , 0.20408163, 0.40816327, 0.6122449 , 0.81632653,\n", " 1.02040816, 1.2244898 , 1.42857143, 1.63265306, 1.83673469,\n", " 2.04081633, 2.24489796, 2.44897959, 2.65306122, 2.85714286,\n", " 3.06122449, 3.26530612, 3.46938776, 3.67346939, 3.87755102,\n", " 4.08163265, 4.28571429, 4.48979592, 4.69387755, 4.89795918,\n", " 5.10204082, 5.30612245, 5.51020408, 5.71428571, 5.91836735,\n", " 6.12244898, 6.32653061, 6.53061224, 6.73469388, 6.93877551,\n", " 7.14285714, 7.34693878, 7.55102041, 7.75510204, 7.95918367,\n", " 8.16326531, 8.36734694, 8.57142857, 8.7755102 , 8.97959184,\n", " 9.18367347, 9.3877551 , 9.59183673, 9.79591837, 10. ])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# when using linspace, both end points ARE included\n", "np.linspace(0, 10)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1. 
, 1.742909 , 3.03773178, 5.29449005,\n", " 9.22781435, 16.08324067, 28.03162489, 48.85657127,\n", " 85.15255772, 148.4131591 ])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.logspace(0, 5, 10, base=np.e)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.41541553, -0.8747973 , 0.15095532, 1.84904123],\n", " [-0.56014374, 1.63895079, 1.27462562, -0.46117795],\n", " [-0.11496104, -0.62632673, 0.39435467, -0.78113887]],\n", "\n", " [[ 0.24521882, -0.45360077, 0.65377784, -0.28579184],\n", " [-0.79458074, -1.16854651, 0.95008769, 1.00244117],\n", " [-0.62781925, 1.01931728, -1.15421105, -0.91988477]]])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a 3D array\n", "# a random array where the values come from a standard Normal distribution\n", "gaussian = np.random.randn(2 * 3 * 4)\n", "\n", "# reshape the array to the desired shape.\n", "# the shape (and the number of dimensions) can be altered, but\n", "# the total number of elements CANNOT be changed during a reshape operation\n", "\n", "gaussian = gaussian.reshape(2, 3, 4)\n", "gaussian" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.]])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# an array full of zero values\n", "# one can also specify a desired datatype\n", "\n", "zero_arr = np.zeros((3, 4))\n", "zero_arr" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 1., 1., 1.],\n", " [1., 1., 1., 1.],\n", " [1., 1., 1., 1.]], dtype=float32)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# an array full of ones\n", "# one can also specify datatype\n", "\n", "ones_arr = np.ones((3, 
4), dtype=np.float32)\n", "ones_arr" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 0.],\n", " [0., 1., 0.],\n", " [0., 0., 1.]], dtype=float128)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a 3x3 identity (matrix) array\n", "\n", "iden = np.identity(3, dtype=np.float128) # or np.eye(3, dtype=np.float128)\n", "iden" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 0., 0.],\n", " [0., 2., 0., 0.],\n", " [0., 0., 3., 0.],\n", " [0., 0., 0., 4.]])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a diagonal array\n", "\n", "diag = np.diag([1, 2, 3, 4.0])\n", "diag" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'int': [numpy.int8, numpy.int16, numpy.int32, numpy.int64],\n", " 'uint': [numpy.uint8, numpy.uint16, numpy.uint32, numpy.uint64],\n", " 'float': [numpy.float16, numpy.float32, numpy.float64, numpy.float128],\n", " 'complex': [numpy.complex64, numpy.complex128, numpy.complex256],\n", " 'others': [bool, object, bytes, str, numpy.void]}" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get the supported scalar data types, grouped by kind\n", "np.sctypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **A note on datatypes**\n", "\n", "If no datatype is specified during array construction using `np.array()`, NumPy infers a default `dtype` from the elements of the array and, for integers, from the platform. \n", "\n", "- If all the values of the array are integers, the platform's default integer type is assigned: typically `np.int64` on 64-bit Linux and macOS, and `np.int32` on 32-bit systems (and on Windows in NumPy versions before 2.0). 
\n", "- If at least one value is a float, then `np.float64` is assigned regardless of the platform (i.e., integers are up-cast to floating point).\n", "\n", "---------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **NumPy Array Attributes**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays\n", "- *Indexing of arrays*: Getting and setting the value of individual array elements\n", "- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array\n", "- *Reshaping of arrays*: Changing the shape of a given array\n", "- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each array has attributes such as: \n", " - `ndim` (the number of dimensions)\n", " - `shape` (the size of each dimension)\n", " - `size` (the total number of elements in the array)\n", " - `nbytes` (the total memory consumed by the array, in bytes)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# a 3D random array where the values come from a standard Normal distribution\n", "gaussian = np.random.randn(2 * 3 * 4).reshape((2, 3, 4))" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total dimensions of the array is: 3\n", "the shape of the array is: (2, 3, 4)\n", "total number of items is: 24\n", "memory consumed by each item is: 8\n", "total memory consumed by the whole array is: 192\n" ] } ], "source": [ "# get number of dimensions of the array\n", "gaussian.ndim\n", "print(\"total dimensions of the array is: \", gaussian.ndim)\n", "\n", "# get the shape of the array\n", "gaussian.shape\n", "print(\"the shape of the array is: \", gaussian.shape)\n", "\n", "# 
get the total number of elements in the array\n", "gaussian.size\n", "print(\"total number of items is: \", gaussian.size)\n", "\n", "# get memory consumed by each item in the array\n", "gaussian.itemsize\n", "print(\"memory consumed by each item is: \", gaussian.itemsize)\n", "\n", "# get memory consumed by the array\n", "gaussian.nbytes\n", "print(\"total memory consumed by the whole array is: \", gaussian.nbytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **Array Indexing**\n", "\n", " - We can index elements in an array using square brackets and indices. For 1D arrays, indexing works the same as with Python lists." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 16, 6, 22, 19, 7, 12, 8, 21])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1D array of random integers\n", "# get 10 integers from 0 to 22 (the upper bound 23 is exclusive)\n", "\n", "num_samples = 10\n", "integers = np.random.randint(23, size=num_samples)\n", "integers" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing a 1D array needs only one index\n", "# get 3rd element (remember: NumPy, unlike MATLAB, uses 0-based indexing)\n", "integers[2]" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],\n", " [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],\n", " [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twoD_arr = np.arange(1, 46).reshape(3, -1)\n", "twoD_arr" ] }, { "cell_type": "code", "execution_count": 
59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# indexing a 2D array with two indices\n", "# returns a scalar value\n", "\n", "# value at last row and last column\n", "twoD_arr[-1, -1]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45])" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# however, if we use only one (valid) index then it returns a 1D array\n", "\n", "# get all elements in the last row\n", "twoD_arr[-1] # or twoD_arr[-1, ] or twoD_arr[-1, :]" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[-1.24336147, -0.23806955, 1.07747428, -0.50192961],\n", " [ 0.51076712, 0.84448581, 0.87265159, 0.14996984],\n", " [-0.67491747, -0.91893385, 2.78620975, -0.77090699]],\n", "\n", " [[ 1.18190561, -0.80861693, -3.56556081, 0.63648925],\n", " [-0.83482671, -0.62060468, 0.10749168, 0.47283747],\n", " [-0.75846323, -0.92861168, -0.03909349, -2.39948965]]])" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remember `gaussian` is a 3D array. 
\n", "gaussian" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1.18190561, -0.80861693, -3.56556081, 0.63648925],\n", " [-0.83482671, -0.62060468, 0.10749168, 0.47283747],\n", " [-0.75846323, -0.92861168, -0.03909349, -2.39948965]])" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# So, a 2D array is returned when using one index\n", "\n", "# return last slice\n", "gaussian[-1]" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1.18190561, -0.80861693, -3.56556081, 0.63648925])" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a 1D array is returned when using a pair of indices\n", "\n", "# return first row from last slice\n", "gaussian[-1, 0]" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.75846323, -0.92861168, -0.03909349, -2.39948965])" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# return last row from last slice\n", "gaussian[-1, -1]" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-2.3994896530870284" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# return last element of row of last slice\n", "idx = (-1, -1, -1)\n", "gaussian[idx]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also assign new values to elements in an array using indexing:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 99, 6, 22, 19, 7, 12, 8, 21])" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# updating the array by assigning values\n", "# truncation will happen if there's a datatype 
mismatch\n", "#print(integers)\n", "integers[2] = 99.21\n", "integers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Index slicing**\n", "\n", "Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array. \n", "Negative indices count from the end of the array (positive indices from the beginning):" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 99, 6, 22, 19, 7, 12, 8, 21])" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "integers" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([19, 7, 12, 8, 21])" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slice a portion of the array\n", "# similar to Python sequence (e.g. list) slicing\n", "# x[start:stop:step]\n", "\n", "# get last 5 elements\n", "integers[-5:]\n", "\n", "# if `stop` is omitted then it'll be sliced till the end of the array\n", "# by default, step is 1" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 99, 22, 7, 8])" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get alternate elements (every other element) from the array\n", "# equivalently step = 2\n", "\n", "integers[::2]" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([21, 8, 12, 7, 19, 22, 6, 99, 4, 0])" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reversing the array\n", "integers[::-1]" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 6, 22, 19, 7, 12, 8, 21])" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# forward 
traversal of array\n", "integers[3::]" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 6, 99, 4, 0])" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reverse traversal of array (starting from 4th element)\n", "integers[3::-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Array slices are mutable: if a slice is assigned a new value, the original array from which the slice was extracted is modified:" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 99, 6, 22, 19, 7, 12, 8, 21])" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "integers" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 99, 6, 22, 19, 7, 12, -23, -46])" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# assign new values to the last two elements\n", "integers[-2:] = [-23, -46]\n", "\n", "integers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **nD arrays (a.k.a tensors)**" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19]])" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a 2D array\n", "twenty = (np.arange(4 * 5)).reshape(4, 5)\n", "twenty" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2],\n", " [5, 6, 7]])" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slice first 2 rows and 3 columns\n", "twenty[:2, :3]" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "array([[ 0, 4],\n", " [15, 19]])" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slice and get only the corner elements\n", "# three \"jumps\" along dimension 0\n", "# four \"jumps\" along dimension 1\n", "twenty[::3, ::4]" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[15, 16, 17, 18, 19],\n", " [10, 11, 12, 13, 14],\n", " [ 5, 6, 7, 8, 9],\n", " [ 0, 1, 2, 3, 4]])" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reversing the order of elements along columns (i.e. along dimension 0)\n", "twenty[::-1, ...]" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4, 3, 2, 1, 0],\n", " [ 9, 8, 7, 6, 5],\n", " [14, 13, 12, 11, 10],\n", " [19, 18, 17, 16, 15]])" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reversing the order of elements along rows (i.e. along dimension 1)\n", "twenty[..., ::-1]" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[19, 18, 17, 16, 15],\n", " [14, 13, 12, 11, 10],\n", " [ 9, 8, 7, 6, 5],\n", " [ 4, 3, 2, 1, 0]])" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reversing the rows and columns (i.e. 
along both dimensions)\n", "twenty[::-1, ::-1]" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[19, 18, 17, 16, 15],\n", " [14, 13, 12, 11, 10],\n", " [ 9, 8, 7, 6, 5],\n", " [ 4, 3, 2, 1, 0]])" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or more intuitively\n", "np.flip(twenty, axis=(0, 1))\n", "\n", "# or equivalently\n", "np.flipud(np.fliplr(twenty))\n", "np.fliplr(np.flipud(twenty))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Fancy indexing**\n", " - Fancy indexing is the name for using an array or a list in place of an index:" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19]])" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# a 2D array\n", "twenty" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19]])" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# get 2nd, 3rd, and 4th rows\n", "row_indices = [1, 2, 3]\n", "twenty[row_indices]" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 6, 12, 19])" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col_indices = [1, 2, -1] # remember, index -1 means the last element\n", "twenty[row_indices, col_indices]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use index masks:\n", " - If the index mask is a NumPy array of data type *bool*, then an element is selected (*True*) or not (*False*) depending on the value of the index mask at the 
position of each element." ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 4, 99, 6, 22, 19, 7, 12, -23, -46])" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1D array\n", "integers" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 99, 22, 7, -23])" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the mask must have the same length as the array being indexed; otherwise an IndexError is raised\n", "# mask for indexing alternate elements in the array\n", "row_mask = np.array([True, False, True, False, True, False, True, False, True, False])\n", "\n", "integers[row_mask]" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0, 99, 22, 7, -23])" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# alternatively; the builtin `bool` is used here because the `np.bool` alias is not available in every NumPy release\n", "row_mask = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=bool)\n", "\n", "integers[row_mask]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This feature is very useful to conditionally select elements from an array, using for example comparison operators:" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,\n", " 6.5, 7. , 7.5, 8. , 8.5, 9. 
, 9.5])" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "range_arr = np.arange(0, 10, 0.5)\n", "range_arr" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, False, False, False, False, False, False,\n", " False, False, True, True, True, True, False, False, False,\n", " False, False])" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multiplying two boolean arrays acts as an element-wise logical AND\n", "mask = (range_arr > 5) * (range_arr < 7.5)\n", "mask" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5.5, 6. , 6.5, 7. ])" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "range_arr[mask]" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5.5, 6. , 6.5, 7. ])" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# or equivalently\n", "\n", "mask = (5 < range_arr) & (range_arr < 7.5)\n", "range_arr[mask]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **view** vs **copy**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the name suggests, a *view* is simply another way of **viewing** the data of an array. Technically, this means that the view and the original array _share_ the same data; nothing is copied. You can create *views* by selecting a slice of the original array, or by changing the dtype (or a combination of both). These different kinds of views are described below:\n", "\n", "- **Slice views**\n", " - This is probably the most common way of creating views in NumPy. The rule of thumb for creating a slice view is that the viewed elements can be addressed with offsets, strides, and counts in the original array. 
For example:" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(10)\n", "a" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 4, 7])" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a slice view\n", "s1 = a[1::3]\n", "s1" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above code snippet, `s1` is a *view* of `a`. If we update elements of `a`, then the changes are reflected in `s1`." ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 4, 77])" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[7] = 77\n", "s1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Dtype views**\n", " - Another way to create array views is by assigning another dtype to the same data area. 
For example:" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.arange(10, dtype='int16')\n", "b" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 3, 3, 5, 5, 7, 7, 9, 9], dtype=int16)" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b32 = b.view(np.int32)\n", "b32 += 1\n", "\n", "# check array b and see the changes reflected\n", "b" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 0, 1, 0, 3, 0, 3, 0, 5, 0, 5, 0, 7, 0, 7, 0, 9, 0, 9, 0],\n", " dtype=int8)" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b8 = b.view(np.int8)\n", "b8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Note**: `dtype` views are not as useful as slice views, but can come in handy in some cases (for example, for quickly looking at the bytes of a generic array).\n", "\n", " - Fancy indexing returns copies not *views*.\n", " - Basic slicing returns *views* not copies." 
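] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two rules above can be checked with a minimal sketch. The names `base`, `sliced`, and `picked` are made up for this illustration, and `np.shares_memory` is assumed to be available (NumPy 1.11 or later). It should report `True` for the slice and `False` for the fancy-indexed result:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# hypothetical toy array, independent of the arrays used above\n", "base = np.arange(10)\n", "sliced = base[2:5] # basic slicing -> view\n", "picked = base[[2, 3, 4]] # fancy indexing -> copy\n", "\n", "# does each result share its data buffer with `base`?\n", "print(np.shares_memory(base, sliced), np.shares_memory(base, picked))"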
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **Useful functions**" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4, 5, 6],\n", " [ 7, 8, 9, 10, 11, 12, 13],\n", " [14, 15, 16, 17, 18, 19, 20],\n", " [21, 22, 23, 24, 25, 26, 27],\n", " [28, 29, 30, 31, 32, 33, 34]])" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# toy data\n", "arr = np.arange(5 * 7).reshape(5, 7)\n", "arr" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[28, 29, 30, 31, 32, 33, 34],\n", " [21, 22, 23, 24, 25, 26, 27],\n", " [14, 15, 16, 17, 18, 19, 20],\n", " [ 7, 8, 9, 10, 11, 12, 13],\n", " [ 0, 1, 2, 3, 4, 5, 6]])" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# randomly shuffle the array along axis 0\n", "# NOTE: this is an in-place operation\n", "np.random.shuffle(arr)\n", "arr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# argmax of an array\n", "arr = np.arange(4, 2 * 11).reshape(2, 9)\n", "arr" ] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 8)" ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# argmax over the flattened array (axis=None), then convert the flat index to a (row, column) tuple\n", "amax = np.argmax(arr, axis=None)\n", "idx = np.unravel_index(amax, arr.shape)\n", "idx" ] }, { "cell_type": "code", "execution_count": 154, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# retrieve element\n", "arr[idx]" ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 156, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "# if only the maximum value is needed, `np.max` does the job directly\n", "np.max(arr)" ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4, 5, 6, 7, 8, 9, 10, 11, 12],\n", " [13, 14, 15, 16, 17, 18, 19, 20, 21]])" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr" ] }, { "cell_type": "code", "execution_count": 166, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5]])" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# compute `mean` along an axis;\n", "# avoid writing a `for` loop for this\n", "# use the built-in reduction `np.mean()` instead\n", "\n", "## Signature: np.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)\n", "avg = np.mean(arr, axis=0, keepdims=True) # `keepdims=True` keeps the reduced axis with length 1, so the result has the same number of dimensions as the input\n", "avg" ] }, { "cell_type": "code", "execution_count": 168, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),\n", " array([7, 8, 0, 1, 2, 3, 4, 5, 6, 7, 8]))" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# conditional check: returns a tuple of (row indices, column indices) where the condition holds\n", "idxs = np.where(arr > 10)\n", "idxs" ] }, { "cell_type": "code", "execution_count": 174, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21])" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we can get the actual elements with the above index arrays\n", "greater10 = arr[idxs]\n", "greater10" ] }, { "cell_type": "code", "execution_count": 175, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4, 5, 6, 7, 8, 9, 10, 11, 12],\n", " [13, 14, 15, 16, 17, 18, 19, 20, 21]])" ] }, "execution_count": 175, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "# note that this would give us a copy of array.\n", "\n", "greater10[-1] = 0\n", "arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Storing NumPy arrays using native file format**" ] }, { "cell_type": "code", "execution_count": 176, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "persist/random-array.npy: data\n" ] } ], "source": [ "random_arr = np.random.randn(2, 3, 4)\n", "np.save(\"persist/random-array.npy\", random_arr)\n", "\n", "# The exclamation mark means that this line should be run through `bash` as though it were run on the terminal\n", "!file persist/random-array.npy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----------\n", "\n", "## **Linear Algebra**\n", "\n", "Vectorizing the code is key to writing efficient numerical calculation with Python/NumPy. This means that, as much as possible, a program should be formulated in terms of matrix and vector operations, like matrix-matrix multiplication." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scalar-array operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers." 
] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 184, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vec = np.arange(0, 5)\n", "\n", "vec" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 2, 4, 6, 8])" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note that original `vec` still remains unaffected since we haven't assigned the new array to it.\n", "vec * 2" ] }, { "cell_type": "code", "execution_count": 183, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3, 4, 5, 6])" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vec + 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Element-wise array-array operations\n", "\n", "When we add, subtract, multiply, and divide arrays with each other, the default behaviour is **element-wise** operations:" ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19]])" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.arange(4 * 5).reshape(4, 5)\n", "arr" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 30, 80, 130, 180],\n", " [ 80, 255, 430, 605],\n", " [ 130, 430, 730, 1030],\n", " [ 180, 605, 1030, 1455]])" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.matmul(arr, (arr.T))" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5,) (4, 5)\n" ] }, { "data": { "text/plain": [ "array([[ 0, 1, 4, 9, 
16],\n", " [ 0, 6, 14, 24, 36],\n", " [ 0, 11, 24, 39, 56],\n", " [ 0, 16, 34, 54, 76]])" ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(vec.shape, arr.shape)\n", "\n", "# the shapes must be broadcast-compatible: here the trailing dimensions (5 and 5) match\n", "vec * arr" ] }, { "cell_type": "code", "execution_count": 193, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (5,1) (4,5) ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m<ipython-input-193-ea24296fd5a0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m#print(vec[:, None].shape)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mvec\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0marr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (5,1) (4,5) " ] } ], "source": [ "# otherwise broadcasting fails and a ValueError is raised: (5, 1) is not compatible with (4, 5)\n", "#print(vec[:, None].shape)\n", "\n", "vec[:, None] * arr" ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 0, 0, 0, 0],\n", " [ 5, 6, 7, 8, 9],\n", " [20, 22, 24, 26, 28],\n", " [45, 48, 51, 54, 57]])" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# however, this works: (4, 1) broadcasts against (4, 5)\n", "vec[:4, None] * arr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrix algebra\n", "What about the glorified matrix multiplication? There are two ways. 
We can either use the `dot` function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments, or we can use the `@` operator (available since Python 3.5)." ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 4) (4, 5)\n" ] }, { "data": { "text/plain": [ "array([[350, 380, 410, 440, 470],\n", " [380, 414, 448, 482, 516],\n", " [410, 448, 486, 524, 562],\n", " [440, 482, 524, 566, 608],\n", " [470, 516, 562, 608, 654]])" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# matrix-matrix product\n", "print(arr.T.shape, arr.shape)\n", "np.dot(arr.T, arr)" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shapes: (4, 5) (5,)\n" ] }, { "data": { "text/plain": [ "array([ 30, 80, 130, 180])" ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# matrix-vector product\n", "print(\"shapes: \", arr.shape, vec.shape)\n", "np.dot(arr, vec) # but not this: np.dot(vec, arr), since shapes (5,) and (4, 5) are not aligned" ] }, { "cell_type": "code", "execution_count": 197, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shapes: (1, 5) (5, 1)\n" ] }, { "data": { "text/plain": [ "array([[30]])" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col_vec = vec[:, None]\n", "print(\"shapes: \", (col_vec).T.shape, (col_vec).shape)\n", "\n", "# inner product\n", "(col_vec.T) @ (col_vec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See also the related functions: `inner`, `outer`, `cross`, `kron`, `tensordot`. Try for example `help(np.kron)`." 
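] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of two of these related functions, the cell below applies `np.outer` and `np.kron` to two small vectors. The names `u` and `w` are made up for this illustration and are not used elsewhere in the notebook:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# hypothetical toy vectors\n", "u = np.array([1, 2, 3])\n", "w = np.array([4, 5])\n", "\n", "# outer product: every element of u times every element of w -> shape (3, 2)\n", "print(np.outer(u, w).shape)\n", "\n", "# Kronecker product of two 1D arrays: a flat array of length 3 * 2 -> shape (6,)\n", "print(np.kron(u, w).shape)"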
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-------------------\n", "\n", "#### **Stacking and repeating arrays**\n", "\n", "Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate` we can create larger vectors and matrices from smaller ones:" ] }, { "cell_type": "code", "execution_count": 201, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 201, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([[1, 2], [3, 4]])\n", "a" ] }, { "cell_type": "code", "execution_count": 202, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# repeat each element 3 times (the array is flattened first, since no axis is given)\n", "np.repeat(a, 3)" ] }, { "cell_type": "code", "execution_count": 203, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 1, 2, 1, 2],\n", " [3, 4, 3, 4, 3, 4]])" ] }, "execution_count": 203, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# tile the matrix 3 times along the last axis\n", "np.tile(a, 3)" ] }, { "cell_type": "code", "execution_count": 204, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[5, 6]])" ] }, "execution_count": 204, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([[5, 6]])\n", "b" ] }, { "cell_type": "code", "execution_count": 205, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [5, 6]])" ] }, "execution_count": 205, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# concatenate a and b along axis 0\n", "np.concatenate((a, b), axis=0)" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 5],\n", " [3, 4, 6]])" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# concatenate a and b 
along axis 1\n", "np.concatenate((a, b.T), axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### hstack and vstack" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [5, 6]])" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.vstack((a,b))" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 5],\n", " [3, 4, 6]])" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hstack((a,b.T))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----------------\n", "\n", "### **Copy and \"deep copy\"**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For performance reasons, assignments in Python do not copy the underlying objects; they only bind another name to the same object. This matters, for example, when objects are passed between functions, because it avoids copying large amounts of memory unnecessarily (technical term: *pass by reference*). 
" ] }, { "cell_type": "code", "execution_count": 206, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 206, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[1, 2], [3, 4]])\n", "\n", "A" ] }, { "cell_type": "code", "execution_count": 207, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4]])" ] }, "execution_count": 207, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# now array B is referring to the same array data as A \n", "B = A\n", "B" ] }, { "cell_type": "code", "execution_count": 209, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[10, 2],\n", " [ 3, 4]])" ] }, "execution_count": 209, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# changing B affects A\n", "B[0,0] = 10\n", "\n", "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to avoid this behavior and get a new, completely independent object `B` copied from `A`, we need to make a so-called \"deep copy\" using the function `np.copy`:" ] }, { "cell_type": "code", "execution_count": 210, "metadata": {}, "outputs": [], "source": [ "B = np.copy(A)" ] }, { "cell_type": "code", "execution_count": 211, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[10, 2],\n", " [ 3, 4]])" ] }, "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# now, if we modify B, A is not affected\n", "B[0,0] = -5\n", "\n", "A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---------------------\n", "### **Vectorizing functions**\n", "\n", "As mentioned several times by now, to get good performance we should always try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. 
The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs." ] }, { "cell_type": "code", "execution_count": 212, "metadata": {}, "outputs": [], "source": [ "def Theta(x):\n", " \"\"\"\n", " scalar implementation of the Heaviside step function.\n", " \"\"\"\n", " if x >= 0:\n", " return 1\n", " else:\n", " return 0" ] }, { "cell_type": "code", "execution_count": 213, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m<ipython-input-213-489043f2ba8f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mv1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mTheta\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mv1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m<ipython-input-212-d160bfd9c2b9>\u001b[0m in \u001b[0;36mTheta\u001b[0;34m(x)\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mscalar\u001b[0m \u001b[0mimplementation\u001b[0m \u001b[0mof\u001b[0m \u001b[0mthe\u001b[0m \u001b[0mHeaviside\u001b[0m 
\u001b[0mstep\u001b[0m \u001b[0mfunction\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \"\"\"\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0;32mif\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" ] } ], "source": [ "v1 = np.array([-3,-2,-1,0,1,2,3])\n", "\n", "Theta(v1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That didn't work because we didn't write the function `Theta` so that it can handle a _vector_ input... \n", "\n", "To get a vectorized version of `Theta` we can use the NumPy function `np.vectorize`. In many cases it can automatically vectorize a function (note that it is provided mainly for convenience; internally it is essentially a loop):" ] }, { "cell_type": "code", "execution_count": 215, "metadata": {}, "outputs": [], "source": [ "Theta_vec = np.vectorize(Theta)" ] }, { "cell_type": "code", "execution_count": 216, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 1, 1, 1, 1])" ] }, "execution_count": 216, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Theta_vec(v1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the other hand, we can also implement the function to accept a vector input from the beginning (this requires more effort but might give better performance):" ] }, { "cell_type": "code", "execution_count": 217, "metadata": {}, "outputs": [], "source": [ "def Theta(x):\n", " \"\"\"\n", " Vector-aware implementation of the Heaviside step function.\n", " \"\"\"\n", " return 1 * (x >= 0)" ] }, { "cell_type": "code", "execution_count": 218, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0, 1, 1, 1, 
1])" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Theta(v1)" ] }, { "cell_type": "code", "execution_count": 219, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0, 1)" ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# it even works with scalar input\n", "Theta(-1.2), Theta(2.6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--------------------------\n", "--------------------------\n", "\n", "## **Advanced NumPy**" ] }, { "cell_type": "code", "execution_count": 220, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "from IPython.display import display, HTML" ] }, { "cell_type": "code", "execution_count": 221, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<img src=\"https://i.stack.imgur.com/p2PGi.png\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ], "source": [ "axis_visual = \"https://i.stack.imgur.com/p2PGi.png\"\n", "Image(url=axis_visual)" ] }, { "cell_type": "code", "execution_count": 222, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<img src=\"https://www.oreilly.com/library/view/elegant-scipy/9781491922927/assets/elsp_0105.png\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr_visual = \"https://www.oreilly.com/library/view/elegant-scipy/9781491922927/assets/elsp_0105.png\"\n", "Image(url=arr_visual)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Computing statistics across axes**" ] }, { "cell_type": "code", "execution_count": 223, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3, 4, 5],\n", " [ 6, 7, 8, 9, 10, 11],\n", " [12, 13, 14, 15, 16, 17],\n", " [18, 19, 20, 21, 22, 23],\n", " [24, 25, 26, 27, 28, 29]])" ] }, 
"execution_count": 223, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr = np.arange(5 * 6).reshape(5, 6)\n", "arr" ] }, { "cell_type": "code", "execution_count": 225, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[12., 13., 14., 15., 16., 17.]])" ] }, "execution_count": 225, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arr.mean(axis=0, keepdims=True)" ] }, { "cell_type": "code", "execution_count": 226, "metadata": {}, "outputs": [], "source": [ "# what would be the result for:\n", "avg = arr.mean(axis=1, keepdims=True)\n", "\n", "# similarly, max, min, std, etc." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Broadcasting**" ] }, { "cell_type": "code", "execution_count": 227, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<img src=\"https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png\" width=\"720\" height=\"480\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 227, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bcast_visual = \"https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png\"\n", "Image(url=bcast_visual, width=720, height=480)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **RandomState**\n", "\n", "For reproducing the results, fix the seed:\n", "\n", " A fixed seed and a fixed series of calls to 'RandomState' methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect." 
] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2, 3, 1, 3, 3, 0],\n", " [1, 1, 1, 3, 2, 1],\n", " [4, 3, 0, 2, 4, 4]])" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rng = np.random.RandomState(seed=42)\n", "# random integers from the half-open interval [-1, 5), arranged in shape (3, 6)\n", "data = rng.randint(-1, 5, (3, 6))\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Sampling from Distributions**" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.49671415, -0.1382643 ],\n", " [ 0.64768854, 1.52302986],\n", " [-0.23415337, -0.23413696],\n", " [ 1.57921282, 0.76743473]],\n", "\n", " [[-0.46947439, 0.54256004],\n", " [-0.46341769, -0.46572975],\n", " [ 0.24196227, -1.91328024],\n", " [-1.72491783, -0.56228753]],\n", "\n", " [[-1.01283112, 0.31424733],\n", " [-0.90802408, -1.4123037 ],\n", " [ 1.46564877, -0.2257763 ],\n", " [ 0.0675282 , -1.42474819]]])" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# for reproducibility\n", "rng = np.random.RandomState(seed=42)\n", "\n", "std_normal_dist = rng.standard_normal(size=(3, 4, 2))\n", "std_normal_dist" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[ 0.49671415, -0.1382643 ],\n", " [ 0.64768854, 1.52302986],\n", " [-0.23415337, -0.23413696],\n", " [ 1.57921282, 0.76743473]],\n", "\n", " [[-0.46947439, 0.54256004],\n", " [-0.46341769, -0.46572975],\n", " [ 0.24196227, -1.91328024],\n", " [-1.72491783, -0.56228753]],\n", "\n", " [[-1.01283112, 0.31424733],\n", " [-0.90802408, -1.4123037 ],\n", " [ 1.46564877, -0.2257763 ],\n", " [ 0.0675282 , -1.42474819]]])" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if reproducibility matters ...\n", "rng = np.random.RandomState(seed=42)\n", "\n", "# an array of shape (3, 4, 2), i.e. 24 points, randomly 
sampled from a normal distribution\n", "# loc=mean, scale=std deviation\n", "rng.normal(loc=0.0, scale=1.0, size=(3, 4, 2))" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[[0.37454012, 0.95071431],\n", " [0.73199394, 0.59865848],\n", " [0.15601864, 0.15599452],\n", " [0.05808361, 0.86617615]],\n", "\n", " [[0.60111501, 0.70807258],\n", " [0.02058449, 0.96990985],\n", " [0.83244264, 0.21233911],\n", " [0.18182497, 0.18340451]],\n", "\n", " [[0.30424224, 0.52475643],\n", " [0.43194502, 0.29122914],\n", " [0.61185289, 0.13949386],\n", " [0.29214465, 0.36636184]]])" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# uniform distribution\n", "rng = np.random.RandomState(seed=42)\n", "\n", "rng.uniform(low=0, high=1.0, size=(3, 4, 2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--------------------------\n", "--------------------------\n", "\n", "## **Further references:**\n", "\n", "- [DataQuest NumPy cheatsheet](https://www.dataquest.io/blog/large_files/numpy-cheat-sheet.pdf)\n", "- https://docs.scipy.org/doc/numpy/reference/\n", "- Your own imagination & dexterity!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }