{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy and Matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numpy\n", "\n", "NumPy is a numerical computing library for Python, built around arrays and linear algebra. Its computationally expensive routines are implemented in compiled code (mostly C, with linear algebra delegated to BLAS/LAPACK libraries) for speed. \n", "\n", "* The reference manual is at . \n", "* A nice tutorial can be found at \n", "* or: \n", "* If you already know Matlab, a comparison is at " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing libraries\n", "\n", "To import a library in Python, you only need to use the keyword `import` at the beginning of your script / notebook (or, more precisely, anywhere before its first use).\n", "\n", "```python\n", "import numpy\n", "```\n", "\n", "Think of it as the equivalent of `#include` in C/C++ (if you know Java, the keyword will look familiar). You can then use the functions and objects provided by the library through its namespace:\n", "\n", "```python\n", "x = numpy.array([1, 2, 3])\n", "```\n", "\n", "If you do not want to type `numpy.` every time, and if you are not worried about numpy redefining a function you already use, you can also import every definition declared by the library into your current namespace with:\n", "\n", "```python\n", "from numpy import *\n", "```\n", "\n", "and use the objects directly:\n", "\n", "```python\n", "x = array([1, 2, 3])\n", "```\n", "\n", "However, it is better practice to keep the namespace and give the library a short alias when its name is long (numpy is still okay, but think of matplotlib...):\n", "\n", "```python\n", "import numpy as np\n", "```\n", "\n", "You can then use the objects like this:\n", "\n", "```python\n", "x = np.array([1, 2, 3])\n", "```\n", "\n", "Remember that you can get help on any NumPy function:\n", "\n", "```python\n", "help(np.array)\n", "help(np.ndarray.transpose)\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### Vectors and matrices\n", "\n", "The basic object in NumPy is an **array** with `d` dimensions (1D = vector, 2D = matrix, 3D or more = tensor). Arrays can store either integers or floats, using various precisions.\n", "\n", "To create a vector of three floats, build an `array()` object by providing a list of floats as input:\n", "\n", "```python\n", "A = np.array( [ 1., 2., 3.] )\n", "```\n", "\n", "Matrices should be initialized with a list of lists. For a 3x4 matrix of 8-bit unsigned integers, it is:\n", "\n", "```python\n", "B = np.array( [ \n", " [ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 4, 3, 2, 1] \n", " ] , dtype=np.uint8)\n", "```\n", "\n", "Most of the time, you won't care about the type (the default floating-point precision is what you want for machine learning), but if you need it, you can always specify it with the parameter `dtype={int32, uint16, float64, ...}`. Note that the type is inferred from the values: a list containing only integers (`np.array( [ 1, 2, 3] )`) produces an integer array, so write `1.` or pass `dtype=float` if you need floats." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following attributes of an array can be accessed:\n", "\n", "- `A.shape` : returns the shape of the vector `(n,)` or matrix `(m, n)`.\n", "\n", "- `A.size` : returns the total number of elements in the array.\n", "\n", "- `A.ndim` : returns the number of dimensions of the array (vector: 1, matrix: 2).\n", "\n", "- `A.dtype.name` : returns the type of data stored in the array (int32, uint16, float64...)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Define the two arrays $A$ and $B$ from above and print those attributes. Modify the arrays (number of elements, type) and observe how they change. \n", "\n", "*Hint:* you can print an array just like any other Python object." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, the values are stored sequentially as a vector, even if your array has more than one dimension. The apparent shape is just used for mathematical operations. You can **reshape** a matrix very easily with the `reshape()` method:\n", "\n", "```python\n", "B = np.array( [ \n", " [ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 4, 3, 2, 1] \n", "]) # B has 3 rows, 4 columns\n", "\n", "C = B.reshape((6, 2)) # C has 6 rows, 2 columns\n", "```\n", "\n", "The only constraint is that the total number of elements must stay the same. Beware of the order in which the elements are placed: by default, the new array is filled row by row.\n", "\n", "**Q:** Create a vector with 8 elements and reshape it into a 2x4 matrix." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialization of an array\n", "\n", "Providing a list of values to `array()` would be tedious for large arrays. NumPy offers constructors that build the most common vectors and matrices directly.\n", "\n", "`np.zeros(shape)` creates an array of shape `shape` filled with zeros. Note: if you give a single integer `d` for the shape, it will be interpreted as a vector of shape `(d,)`. \n", "\n", "`np.ones(shape)` creates an array of shape `shape` filled with ones. \n", "\n", "`np.full(shape, val)` creates an array of shape `shape` filled with `val`. 
\n", "\n", "`np.eye(n)` creates the identity matrix of shape `(n, n)` (ones on the diagonal, zeros elsewhere).\n", "\n", "`np.arange(a, b)` creates a vector of integers whose values increase linearly from `a` (included) to `b` (excluded).\n", "\n", "`np.linspace(a, b, n)` creates a vector of `n` values evenly spaced between `a` and `b` (both included).\n", "\n", "\n", "**Q:** Create and print:\n", "\n", "* a 2x3 matrix filled with zeros.\n", "* a vector of 12 elements initialized to 3.14.\n", "* a vector of 11 elements whose values increase linearly from 0.0 to 10.0.\n", "* a vector of 11 elements whose values increase linearly from 10 to 20." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random distributions\n", "\n", "In many cases, it is useful to initialize a vector or matrix with random values. **Random number generators** (rng) allow you to draw numbers from any probability distribution (uniform, normal, etc.) using pseudo-random methods. \n", "\n", "In numpy versions before 1.17, the only way to initialize random arrays was to call functions of the `numpy.random` module directly:\n", "\n", "```python\n", "A = np.random.uniform(-1.0, 1.0, (10, 10)) # a 10x10 matrix with values uniformly taken between -1 and 1\n", "```\n", "\n", "Since numpy 1.17, these functions are kept as a legacy interface and the recommended approach is to explicitly create the underlying rng:\n", "\n", "```python\n", "rng = np.random.default_rng()\n", "A = rng.uniform(-1.0, 1.0, (10, 10))\n", "```\n", "\n", "The advantages of this new method (reproducibility, parallel seeds) will not matter for these exercises, but let's adopt good habits from the start.\n", "\n", "The generator has many built-in methods, covering virtually any useful probability distribution. 
Read the documentation of the random generator (`numpy.random.Generator`) for the full list.\n", "\n", "**Q:** Create:\n", "\n", "* A vector of 20 elements following a normal distribution with mean 2.0 and standard deviation 3.0.\n", "* A 10x10 matrix whose elements come from the exponential distribution with scale parameter $\\beta = 2$.\n", "* A vector of 10 integers randomly chosen between 1 and 100 (hint: involves `arange` and `rng.choice`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manipulation of matrices: indices, slices\n", "\n", "To access a particular element of a matrix, you can use the usual Python list style (the first element has index 0), once per dimension:\n", "\n", "```python\n", "A = np.array(\n", " [ \n", " [ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 9, 10, 11, 12]\n", " ]\n", ")\n", "\n", "x = A[0, 2] # The element on the first row and third column\n", "```\n", "\n", "For matrices, the first index represents the rows, the second the columns. `[0, 2]` represents the element at the first row, third column.\n", "\n", "**Q:** Define this matrix and replace the element `12` with a zero using indices:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to access complete rows or columns of a matrix using **slices**. The `:` symbol is a shortcut for \"everything\":\n", "\n", "```python\n", "b = A[:, 2] # third column\n", "c = A[0, :] # first row\n", "```\n", "\n", "**Q:** Set the fourth column of `A` to 1." 
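, "\n",
"\n",
"As a short sketch of index and slice access, here is a different, hypothetical matrix `M` (so that the exercise above stays open). Note that assigning to a slice modifies the matrix in place:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical 3x4 matrix containing 0..11, filled row by row\n",
"M = np.arange(12).reshape(3, 4)\n",
"\n",
"print(M[1, 2])   # single element (second row, third column): 6\n",
"print(M[:, 0])   # first column: 0, 4, 8\n",
"print(M[2, :])   # third row: 8, 9, 10, 11\n",
"\n",
"# Assigning to a slice changes M itself\n",
"M[0, :] = -1     # every element of the first row becomes -1\n",
"print(M[0, :])   # -1, -1, -1, -1\n",
"```"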
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with Python lists, you can specify a range `start:stop` to get only a subset of a row/column (beware: `stop` is excluded):\n", "\n", "```python\n", "d = A[0, 1:3] # second and third elements of the first row\n", "e = A[1, :2] # first and second elements of the second row\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use boolean arrays to select elements:\n", "\n", "```python\n", "A = np.array( \n", " [ [ -2, 2, 1, -4],\n", " [ 3, -1, -5, -3] ])\n", "\n", "negatives = A < 0 # Boolean array where each element is True when the condition is met.\n", "A[negatives] = 0 # All negative elements of A (where the boolean matrix is True) will be set to 0\n", "```\n", "\n", "A simpler way to write it is:\n", "\n", "```python\n", "A[A < 0] = 0\n", "```\n", "\n", "**Q:** Print `A`, `negatives` and `A` again after the assignment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Basic linear algebra \n", "\n", "Let's first define some matrices:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "A = np.array( [ [ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8] ])\n", "\n", "B = np.array( [ [ 1, 2],\n", " [ 3, 4],\n", " [ 5, 6],\n", " [ 7, 8] ])\n", "\n", "C = np.array( [ [ 1, 2, 3, 4],\n", " [ 5, 6, 7, 8],\n", " [ 9, 0, 1, 1],\n", " [ 13, 7, 2, 6] ])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Transpose a matrix \n", "\n", "A matrix can be transposed with the `transpose()` method or the `.T` shortcut:\n", "\n", "```python\n", "D = A.transpose() \n", "E = A.T # equivalent\n", "```\n", "\n", "**Q:** Try it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "`transpose()` does not change `A`: it returns a transposed view of the same data. To replace `A` with its transpose, use the assignment `A = A.T`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Multiply two matrices \n", "\n", "There are two ways to multiply matrices:\n", "\n", "- element-wise: Two arrays of the same shape (or, more generally, of shapes that NumPy can broadcast together) can be multiplied *element-wise* by using the `*` operator:\n", "\n", "```python\n", "D = A * B\n", "```\n", "\n", "- algebraically: To perform a **matrix multiplication**, you have to use the `np.dot()` function (or the equivalent `@` operator). Beware: the inner dimensions must match! A `(m, n)` matrix times a `(n, p)` matrix gives a `(m, p)` matrix.\n", "\n", "```python\n", "E = np.dot(A, B)\n", "```\n", "\n", "**Q:** Use the matrices `A` and `B` previously defined and multiply them element-wise and algebraically. You may have to transpose one of them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Multiplying a matrix with a vector\n", "\n", "`*` and `np.dot` also apply to matrix-vector multiplications $\\mathbf{y} = A \\times \\mathbf{x}$ or vector-vector multiplications.\n", "\n", "**Q:** Define a vector $\\mathbf{x}$ with four elements and multiply it with the matrix $A$ using `*` and `np.dot`. What do you obtain? Try the same by multiplying the vector $\\mathbf{x}$ with itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Inverting a matrix\n", "\n", "Inverting a matrix (when possible) can be done using the `inv()` function, which is defined in the `linalg` submodule of NumPy.\n", "\n", "```python\n", "inv_C = np.linalg.inv(C)\n", "```\n", "\n", "**Q:**\n", "\n", "1. Invert `C` and print the result.\n", "2. Multiply `C` with its inverse and print the result. What do you observe? Why is NumPy called a *numerical computation* library?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summing elements \n", "\n", "One can sum the elements of a matrix globally, row-wise or column-wise:\n", "\n", "```python\n", "# Globally\n", "S1 = np.sum(A)\n", "\n", "# Per column\n", "S2 = np.sum(A, axis=0) \n", "\n", "# Per row\n", "S3 = np.sum(A, axis=1) \n", "```\n", "\n", "**Q:** Try them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You also have access to the minimum (`np.min()`), maximum (`np.max()`), mean (`np.mean()`) of an array, also per row/column. \n", "\n", "**Q:** Try them out:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mathematical operations \n", "\n", "You can apply any usual mathematical operations (cos, sin, exp, etc...) on each element of a matrix (element-wise):\n", "\n", "```python\n", "D = np.exp(A)\n", "E = np.cos(A)\n", "F = np.log(A)\n", "G = (A+3) * np.cos(A-2)\n", "```\n", "\n", "**Q:** Try it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matplotlib\n", "\n", "Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.\n", "\n", "* Reference: \n", "* Tutorial by N. Rougier: \n", "\n", "This is the default historical visualization library in Python, which anybody should know, but not the nicest. 
If you are interested in having better visualizations, have a look at:\n", "\n", "* `seaborn` \n", "* `ggplot2` \n", "* `bokeh` \n", "* `plotly` \n", "\n", "We will nevertheless stick to matplotlib in these exercises.\n", "\n", "The `pyplot` module is the most widely used, as its interface is similar to Matlab's. It is customary to import it under the alias `plt`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `plt.plot()`\n", "\n", "The `plt.plot()` command draws simple line plots:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(0., 10., 100)\n", "y = x**2 + 1.\n", "\n", "plt.figure()\n", "plt.plot(x, y)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plot()` takes two vectors `x` and `y` as inputs (they must have the same size) and plots them against each other. It is standard to define the x-axis with `np.linspace()` if you just want to plot a function. 100 points is usually a good choice, but you can experiment with fewer points.\n", "\n", "The call to `plt.show()` is required at the end to display the window when using a script (forgetting it is a very common mistake!). It is not needed in Jupyter notebooks as it is implicitly called, but let's get into the habit anyway. \n", "\n", "The call to `plt.figure()` is also optional, as a new figure is created when you call `plt.plot()` for the first time.\n", "\n", "**Q:** Create a third vector `z` (e.g. `z = -x**2 + 2`) and plot it against `x` right after `y` (i.e. between `plt.plot(x, y)` and `plt.show()`). What happens?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Now call `plt.figure()` again between the two plots. What happens?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the plot is quite empty. This is fine when experimenting in a notebook, but not when incorporating the figures in your thesis. You can make a plot look better by adding a title, labels on the axes, etc. 
\n", "\n", "```python\n", "plt.title('My title')\n", "plt.xlabel('x-axis')\n", "plt.ylabel('y-axis')\n", "```\n", "\n", "**Q:** Make the previous plots nicer by adding a title and axis labels.\n", "\n", "*Hint:* if you know LaTeX equations, you can insert simple formulas in the title or axis labels by enclosing them between two dollar signs, e.g. `'$x^2 + 1$'`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you make multiple plots on the same figure by calling `plt.plot()` multiple times, you can add a label to each plot to create a legend with `plt.legend()`:\n", "\n", "```python\n", "plt.plot(x, y, label='y')\n", "plt.plot(x, z, label='z')\n", "plt.legend()\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another advantage of declaring a figure is that you can modify its size (which is very small in a notebook by default) with the `figsize` argument (width and height, in inches):\n", "\n", "```python\n", "plt.figure(figsize=(16, 10))\n", "```\n", "\n", "**Q:** Experiment with figure sizes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Side-by-side plots\n", "\n", "To make separate plots in the same figure, you can use `plt.subplot(abc)`.\n", "\n", "The function takes three digits `a`, `b`, `c` as input (e.g. 221 or 122) where:\n", "\n", "- `a` is the number of rows.\n", "- `b` is the number of columns.\n", "- `c` is the index (starting at 1, counted row by row) of the current subplot.\n", "\n", "Here is a dummy example of a 2x2 grid of plots:\n", "\n", "```python\n", "plt.subplot(221)\n", "plt.plot(x, y)\n", "\n", "plt.subplot(222)\n", "plt.plot(x, z)\n", "\n", "plt.subplot(223)\n", "plt.plot(y, x)\n", "\n", "plt.subplot(224)\n", "plt.plot(z, x)\n", "```\n", "\n", "**Q:** Try it." 
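, "\n",
"\n",
"Titles, axis labels, legends and subplots can be combined. The following is only a minimal sketch: it assumes the non-interactive `Agg` backend and a hypothetical output file name `subplots.png` so that it also runs as a plain script. In a notebook, you can drop those two lines and call `plt.show()` instead:\n",
"\n",
"```python\n",
"import matplotlib\n",
"matplotlib.use('Agg')             # assumption: render off-screen, no window\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"x = np.linspace(0., 10., 100)\n",
"y = x**2 + 1.\n",
"z = -x**2 + 2.\n",
"\n",
"fig = plt.figure(figsize=(12, 5))\n",
"\n",
"plt.subplot(121)                  # 1 row, 2 columns, first subplot\n",
"plt.plot(x, y, label='y')\n",
"plt.plot(x, z, label='z')\n",
"plt.title('Two curves')\n",
"plt.xlabel('x')\n",
"plt.ylabel('value')\n",
"plt.legend()\n",
"\n",
"plt.subplot(122)                  # 1 row, 2 columns, second subplot\n",
"plt.plot(x, y - z)\n",
"plt.title('Difference $y - z$')\n",
"\n",
"plt.savefig('subplots.png')       # hypothetical file name\n",
"```"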
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `plt.imshow()`\n", "\n", "Matrices can be displayed using `plt.imshow()`. You can choose the color map with the `cmap` argument (e.g. `gray` or `hot`).\n", "\n", "```python\n", "plt.imshow(A, cmap=plt.cm.hot, interpolation='nearest')\n", "plt.colorbar()\n", "```\n", "\n", "`plt.colorbar()` adds a vertical bar showing the color scale. \n", "\n", "The interpolation method can also be selected, which matters for small matrices (`'nearest'` shows the raw pixels, while `interpolation=\"bicubic\"` gives a smoother display; the default depends on your matplotlib version).\n", "\n", "By default, the element `(0, 0)` is at the top-left of the image and the first axis is vertical. You can change this with the `origin` parameter.\n", "\n", "**Q:** Create a 10x10 matrix (e.g. randomly) and plot it. Try different color maps and interpolation methods." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `plt.scatter()`\n", "\n", "If you want to display dots instead of lines or pixels, `plt.scatter` takes two vectors of the same size and plots them against each other:\n", "\n", "```python\n", "plt.scatter(x, y)\n", "```\n", "\n", "**Q:** Create two vectors with 100 elements and make a scatter plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `plt.hist()`\n", "\n", "Histograms can be useful to visualize the distribution of some data. If `z` is a vector of values, the histogram is simply:\n", "\n", "```python\n", "plt.hist(z, bins=20)\n", "```\n", "\n", "The number of bins is 10 by default, but you can of course change it.\n", "\n", "**Q:** Draw 1000 values from a normal distribution of your choice and make a histogram." 
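, "\n",
"\n",
"`plt.hist()` also returns the bin counts and the bin edges, which can be handy for checking a plot. A minimal sketch, assuming the non-interactive `Agg` backend, an arbitrary seed and uniformly distributed data (so that the exercise above stays open):\n",
"\n",
"```python\n",
"import matplotlib\n",
"matplotlib.use('Agg')             # assumption: render off-screen, no window\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)   # arbitrary seed for reproducibility\n",
"values = rng.uniform(0.0, 1.0, 500)\n",
"\n",
"plt.figure()\n",
"counts, edges, patches = plt.hist(values, bins=20)\n",
"\n",
"print(len(counts))         # 20 bins\n",
"print(len(edges))          # 21 edges, one more than the number of bins\n",
"print(int(counts.sum()))   # 500, every value falls into exactly one bin\n",
"```"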
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.12 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "vscode": { "interpreter": { "hash": "3d24234067c217f49dc985cbc60012ce72928059d528f330ba9cb23ce737906d" } } }, "nbformat": 4, "nbformat_minor": 4 }