{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Numpy and Matplotlib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Numpy numerical library\n",
"\n",
"NumPy is a linear algebra library in Python, with computationally expensive methods written in FORTRAN for speed. \n",
"\n",
"* The reference manual is at . \n",
"* A nice tutorial can be found at \n",
"* or: \n",
"* If you already know Matlab, a comparison is at "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Importing libraries\n",
"\n",
"To import a library in Python, you only need to use the keyword `import` at the beginning of your script / notebook (or more exactly, before you use it).\n",
"\n",
"```python\n",
"import numpy\n",
"```\n",
"\n",
"Think of it as the equivalent of `#include ` in C/C++ (if you know Java, you will not be shocked). You can then use the functions and objects provided by the library using the namespace of the library:\n",
"\n",
"```python\n",
"x = numpy.array([1, 2, 3])\n",
"```\n",
"\n",
"If you do not want to type `numpy.` everytime, and if you are not afraid that numpy redefines any important function, you can also simply import every definition declared by the library in your current namespace with:\n",
"\n",
"```python\n",
"from numpy import *\n",
"```\n",
"\n",
"and use the objects directly:\n",
"\n",
"```python\n",
"x = array([1, 2, 3])\n",
"```\n",
"\n",
"However, it is good practice to give an alias to the library when its name is too long (numpy is still okay, but think of matplotlib...):\n",
"\n",
"```python\n",
"import numpy as np \n",
"```\n",
"\n",
"You can then use the objects like this:\n",
"\n",
"```python\n",
"x = np.array([1, 2, 3])\n",
"```\n",
"\n",
"Remember that you can get help on any NumPy function:\n",
"\n",
"```python\n",
"help(np.array)\n",
"help(np.ndarray.transpose)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vectors and matrices\n",
"\n",
"The basic object in NumPy is an **array** with d-dimensions (1D = vector, 2D = matrix, 3D or more = tensor). They can store either integers or floats, using various precisions.\n",
"\n",
"In order to create a vector of three floats, you simply have to build an `array()` object by providing a list of floats as input:\n",
"\n",
"```python\n",
"A = np.array( [ 1., 2., 3.] )\n",
"```\n",
"\n",
"Matrices should be initialized with a list of lists. For a 3x4 matrix of 8 bits unsigned integers, it is:\n",
"\n",
"```python\n",
"B = np.array( [ \n",
" [ 1, 2, 3, 4],\n",
" [ 5, 6, 7, 8],\n",
" [ 4, 3, 2, 1] \n",
" ] , dtype=np.uint8)\n",
"```\n",
"\n",
"Most of the time, you won't care about the type (the default floating-point precision is what you want for machine learning), but if you need it, you can always specify it with the parameter `dtype={int32, uint16, float64, ...}`. Note that even if you pass integers to the array (`np.array( [ 1, 2, 3] )`), they will be converted to floats by default."
]
},
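{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of how the data type is chosen, here is a minimal sketch (the variable names `ints`, `floats` and `small` are only illustrative). When `dtype` is not given, NumPy infers it from the values in the list:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"ints = np.array([1, 2, 3])                   # integers only: integer array\n",
"floats = np.array([1., 2., 3.])              # floats: float64 array\n",
"small = np.array([1, 2, 3], dtype=np.uint8)  # explicit 8-bit unsigned integers\n",
"\n",
"print(ints.dtype, floats.dtype, small.dtype)\n",
"```\n",
"\n",
"The exact integer type (`int32` or `int64`) depends on the platform."
]
},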
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following attributes of an array can be accessed:\n",
"\n",
"- `A.shape` : returns the shape of the vector `(n,)` or matrix `(m, n)`.\n",
"\n",
"- `A.size` : returns the total number of elements in the array.\n",
"\n",
"- `A.ndim` : returns the number of dimensions of the array (vector: 1, matrix:2).\n",
"\n",
"- `A.dtype.name` : returns the type of data stored in the array (int32, uint16, float64...)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Q:** Define the two arrays $A$ and $B$ from above and print those attributes. Modify the arrays (number of elements, type) and observe how they change. \n",
"\n",
"*Hint:* you can print an array just like any other Python object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Internally, the values are stored sequentially as a vector, even if your array has more than one dimension. The apparent shape is just used for mathematical operations. You can **reshape** a matrix very easily with the `reshape()` method:\n",
"\n",
"```python\n",
"B = np.array( [ \n",
" [ 1, 2, 3, 4],\n",
" [ 5, 6, 7, 8],\n",
" [ 4, 3, 2, 1] \n",
"]) # B has 3 rows, 4 columns\n",
"\n",
"C = B.reshape((6, 2)) # C has 6 rows, 2 columns\n",
"```\n",
"\n",
"The only thing to respect is that the total number of elements must be the same. Beware also of the order in which the elements will be put.\n",
"\n",
"**Q:** Create a vector with 8 elements and reshape it into a 2x4 matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialization of an array\n",
"\n",
"Providing a list of values to `array()` would be tedious for large arrays. Numpy offers constructors that allow to construct simply most vectors or matrices.\n",
"\n",
"`np.zeros(shape)` creates an array of shape `shape` filled with zeros. Note: if you give a single integer for the shape, it will be interpreted as a vector of shape `(d,)`. \n",
"\n",
"`np.ones(shape)` creates an array of shape `shape` filled with ones. \n",
"\n",
"`np.full(shape, val)` creates an array of shape `shape` filled with `val`. \n",
"\n",
"`np.eye(n)` creates a diagonal matrix of shape `(n, n)`.\n",
"\n",
"`np.arange(a, b)` creates a vector of integers whose value linearly increase from `a` to `b` (excluded).\n",
"\n",
"`np.linspace(a, b, n)` creates a vector of `n` values evenly distributed between `a` and `b` (included).\n",
"\n",
"\n",
"**Q:** Create and print:\n",
"\n",
"* a 2x3 matrix filled with zeros.\n",
"* a vector of 12 elements initialized to 3.14.\n",
"* a vector of 11 elements whose value linearly increases from 0.0 to 10.0.\n",
"* a vector of 11 elements whose value linearly increases from 10 to 20."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Random distributions\n",
"\n",
"In many cases, it is useful to initialize a vector or matrix with random values. **Random number generators** (rng) allows to draw numbers from any probability distribution (uniform, normal, etc.) using pseudo-random methods. \n",
"\n",
"In numpy versions before 1.16, the `numpy.random` module had direct methods allowing to initialize arrays:\n",
"\n",
"```python\n",
"A = np.random.uniform(-1.0, 1.0, (10, 10)) # a 10x10 matrix with values uniformly taken between -1 and 1\n",
"```\n",
"\n",
"Since numpy 1.16, this method has been deprecated in favor of a more explicit initialization of the underlying rng:\n",
"\n",
"```python\n",
"rng = np.random.default_rng()\n",
"A = rng.uniform(-1.0, 1.0, (10, 10))\n",
"```\n",
"\n",
"The advantages of this new method (reproducibility, parallel seeds) will not matter for these exercises, but let's take good habits already.\n",
"\n",
"The generator has many built-in methods, covering virtually any useful probability distribution. Read the documentation of the random generator:\n",
"\n",
"\n",
"\n",
"**Q:** Create:\n",
"\n",
"* A vector of 20 elements following a normal distribution with mean 2.0 and standard devation 3.0.\n",
"* A 10x10 matrix whose elements come from the exponential distribution with $\\beta = 2$.\n",
"* A vector of 10 integers randomly chosen between 1 and 100 (hint: involves `arange` and `rng.choice`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Manipulation of matrices: indices, slices\n",
"\n",
"To access a particular element of a matrix, you can use the usual Python list style (the first element has a rank of 0), once per dimension:\n",
"\n",
"```python\n",
"A = np.array(\n",
" [ \n",
" [ 1, 2, 3, 4],\n",
" [ 5, 6, 7, 8],\n",
" [ 9, 10, 11, 12]\n",
" ]\n",
")\n",
"\n",
"x = A[0, 2] # The element on the first row and third column\n",
"```\n",
"\n",
"For matrices, the first index represents the rows, the second the columns. [0, 2] represents the element at the first row, third column.\n",
"\n",
"**Q:** Define this matrix and replace the element `12` by a zero using indices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is possible to access complete row or columns of a matrix using **slices**. The `:` symbol is a shortcut for \"everything\":\n",
"\n",
"```python\n",
"b = A[:, 2] # third column\n",
"c = A[0, :] # first row\n",
"```\n",
"\n",
"**Q:** Set the fourth column of A to 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As for python lists, you can specify a range `start:stop` to get only a subset of a row/column (beware, stop is excluded):\n",
"\n",
"```python\n",
"d = A[0, 1:3] # second and third elements of the first row\n",
"e = A[1, :2] # first and second elements of the second row\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use boolean arrays to retrieve indices:\n",
"\n",
"```python\n",
"A = np.array( \n",
" [ [ -2, 2, 1, -4],\n",
" [ 3, -1, -5, -3] ])\n",
"\n",
"negatives = A < 0 # Boolean array where each element is True when the condition is met.\n",
"A[negatives] = 0 # All negative elements of A (where the boolean matrix is True) will be set to 0\n",
"```\n",
"\n",
"A simpler way to write it is:\n",
"\n",
"```python\n",
"A[A < 0] = 0\n",
"```\n",
"\n",
"**Q:** print A, negatives and A again after the assignment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basic linear algebra \n",
"\n",
"Let's first define some matrices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"A = np.array( [ [ 1, 2, 3, 4],\n",
" [ 5, 6, 7, 8] ])\n",
"\n",
"B = np.array( [ [ 1, 2],\n",
" [ 3, 4],\n",
" [ 5, 6],\n",
" [ 7, 8] ])\n",
"\n",
"C = np.array( [ [ 1, 2, 3, 4],\n",
" [ 5, 6, 7, 8],\n",
" [ 9, 0, 1, 1],\n",
" [ 13, 7, 2, 6] ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Transpose a matrix \n",
"\n",
"A matrix can be transposed with the `transpose()` method or the `.T` shortcut:\n",
"\n",
"```python\n",
"D = A.transpose() \n",
"E = A.T # equivalent\n",
"```\n",
"\n",
"**Q:** Try it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`transpose()` does not change `A`, it only returns a transposed copy. To transpose `A` definitely, you have to use the assigment `A = A.T`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multiply two matrices \n",
"\n",
"There are two manners to multiply matrices:\n",
"\n",
"- element-wise: Two arrays of **exactly** the same shape can be multiplied *element-wise* by using the `*` operator:\n",
"\n",
"```python\n",
"D = A * B\n",
"```\n",
"\n",
"- algebrically: To perform a **matrix multiplication**, you have to use the `dot()` method. Beware: the dimensions must match! `(m, n) * (n, p) = (m, p)`\n",
"\n",
"```python\n",
"E = np.dot(A, B)\n",
"```\n",
"\n",
"**Q:** Use the matrices `A` and `B` previously defined and multiply them element-wise and algebrically. You may have to transpose one of them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multiplying a matrix with a vector\n",
"\n",
"`*` and `np.dot` also apply on matrix-vector multiplications $\\mathbf{y} = A \\times \\mathbf{x}$ or vector-vector multiplications.\n",
"\n",
"**Q:** Define a vector $\\mathbf{x}$ with four elements and multiply it with the matrix $A$ using `*` and `np.dot`. What do you obtain? Try the same by multiplying the vector $\\mathbf{x}$ and itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Inverting a matrix\n",
"\n",
"Inverting a Matrix (when possible) can be done using the `inv()` method whitch is defined in the `linalg` submodule of NumPy.\n",
"\n",
"```python\n",
"inv_C = np.linalg.inv(C)\n",
"```\n",
"\n",
"**Q:**\n",
"\n",
"1. Invert `C` and print the result.\n",
"2. Multiply `C` with its inverse and print the result. What do observe? Why is Numpy called a *numerical computation* library?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Summing elements \n",
"\n",
"One can sum the elements of a matrix globally, row-wise or column-wise:\n",
"\n",
"```python\n",
"# Globally\n",
"S1 = np.sum(A)\n",
"\n",
"# Per column\n",
"S2 = np.sum(A, axis=0) \n",
"\n",
"# Per row\n",
"S3 = np.sum(A, axis=1) \n",
"```\n",
"\n",
"**Q:** Try them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You also have access to the minimum (`np.min()`), maximum (`np.max()`), mean (`np.mean()`) of an array, also per row/column. \n",
"\n",
"**Q:** Try them out:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mathematical operations \n",
"\n",
"You can apply any usual mathematical operations (cos, sin, exp, etc...) on each element of a matrix (element-wise):\n",
"\n",
"```python\n",
"D = np.exp(A)\n",
"E = np.cos(A)\n",
"F = np.log(A)\n",
"G = (A+3) * np.cos(A-2)\n",
"```\n",
"\n",
"**Q:** Try it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Matplotlib\n",
"\n",
"Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.\n",
"\n",
"* Reference: \n",
"* Tutorial by N. Rougier: \n",
"\n",
"This is the default historical visualization library in Python, which anybody should know, but not the nicest. If you are interested in having better visualizations, have a look at:\n",
"\n",
"* `seaborn` \n",
"* `ggplot2` \n",
"* `bokeh` \n",
"* `plotly` \n",
"\n",
"We will nevertheless stick to matplotlib in these exercises.\n",
"\n",
"The `pyplot` module is the most famous, as it has a similar interface to Matlab. It is customary to use the `plt` namescape for it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `plt.plot()`\n",
"\n",
"The `plt.plot()` command allows to make simple line drawings:\n",
"\n",
"```python\n",
"x = np.linspace(0., 10., 100)\n",
"y = x**2 + 1.\n",
"\n",
"plt.figure()\n",
"plt.plot(x, y)\n",
"plt.show()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`plot()` takes two vectors `x` and `y` as inputs (they must have the same size) and plots them against each other. It is standard to define the x-axis with `np.linspace()` if you just want to plot a function. 100 points is usually a good choice, but you can experiments with less points.\n",
"\n",
"The call to `plt.show()` is obligatory at the end to display the window when using a script (very common mistake to forget it!). It is not needed in Jupyter notebooks as it is implicitly called, but let's take the habit anyway. \n",
"\n",
"The call to `plt.figure()` is also optional, as a new figure is created when you call `plt.plot()` for the first time.\n",
"\n",
"**Q:** Create a third vector `z` (e.g. `z = -x**2 + 2`) and plot it against `x` right after `y` (i.e. between `plt.plot(x, y)` and `plt.show()`). What happens?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Q:** Now call `plt.figure()` again between the two plots. What happens?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the plot is quite empty. This is fine when experimenting in a notebook, but not when incorporating the figures in your thesis. You can make a plot look better by adding a title, labels on the axes, etc. \n",
"\n",
"```python\n",
"plt.title('My title')\n",
"plt.xlabel('x-axis')\n",
"plt.ylabel('y-axis')\n",
"```\n",
"\n",
"**Q:** Make the previous plots nicer by adding legends and axes.\n",
"\n",
"*Hint:* if you know LateX equations, you can insert simple formulas in the title or axes by using two dollar signs `$$`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you make multiple plots on the same figure by calling `plt.plot()` multiple times, you can add a label to each plot to create a legend with `plt.legend()`:\n",
"\n",
"```python\n",
"plt.plot(x, y, label='y')\n",
"plt.plot(x, z, label='z')\n",
"plt.legend()\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another advantage of declaring a figure is that you can modify its size (which is very small in a notebook by default) with the `figsize` argument in inches:\n",
"\n",
"```python\n",
"plt.figure(figsize=(16, 10))\n",
"```\n",
"\n",
"**Q:** Experiment with figure sizes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Side-by-side plots\n",
"\n",
"To make separate plots in the same figure, you can use `plt.subplot(abc)`.\n",
"\n",
"The function takes three digits a, b, c as input (e.g. 221 or 122) where:\n",
"\n",
"- a is the number of rows.\n",
"- b is the number of columns.\n",
"- c is the index (starting at 1) of the current subplot.\n",
"\n",
"Here is a dummy example of a 2x2 grid of plots:\n",
"\n",
"```python\n",
"plt.subplot(221)\n",
"plt.plot(x, y)\n",
"\n",
"plt.subplot(222)\n",
"plt.plot(x, z)\n",
"\n",
"plt.subplot(223)\n",
"plt.plot(y, x)\n",
"\n",
"plt.subplot(224)\n",
"plt.plot(z, x)\n",
"```\n",
"\n",
"**Q:** Try it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `plt.imshow`\n",
"\n",
"Matrices can be displayed using `plt.imshow()`. You can choose the color code with the `cmap` argument (e.g. `gray` or `hot`).\n",
"\n",
"```python\n",
"plt.imshow(A, cmap=plt.cm.hot, interpolation='nearest')\n",
"plt.colorbar()\n",
"```\n",
"\n",
"`plt.colorbar()` allows to show a vertical bar indicating the color code. \n",
"\n",
"The interpolation method can also be selected for small matrices (`'nearest` by default, but you can choose `interpolation=\"bicubic\"` for a smoother display).\n",
"\n",
"(0, 0) is at the top-left of the image, the first axis is vertical. Change it with the `origin` parameter.\n",
"\n",
"**Q:** Create a 10x10 matrix (e.g. randomly) and plot it. Try different color maps ( and interpolation methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `plt.scatter`\n",
"\n",
"If you want to display dots instead of of lines or pixels, `plt.scatter` takes two vectors of same size and plots them against each other:\n",
"\n",
"```python\n",
"plt.scatter(x, y)\n",
"```\n",
"\n",
"**Q:** Create two vectors with 100 elements and make a scatter plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `plt.hist()`\n",
"\n",
"Histograms can be useful to visualize the distribution of some data. If `z` is a vector of values, the histogram is simply:\n",
"\n",
"```python\n",
"plt.hist(z, bins=20)\n",
"```\n",
"\n",
"The number of bins is 10 by default, but you can of course change it.\n",
"\n",
"**Q:** Draw 1000 values from a normal distribution of your choice and make an histogram."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
},
"vscode": {
"interpreter": {
"hash": "3d24234067c217f49dc985cbc60012ce72928059d528f330ba9cb23ce737906d"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}