{ "cells": [ { "cell_type": "code", "execution_count": 578, "id": "2ac47cb3", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from preamble import *" ] }, { "cell_type": "code", "execution_count": 579, "id": "448b459d", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn" ] }, { "cell_type": "markdown", "id": "8b0a1a96", "metadata": {}, "source": [ "# Reproducibility\n", "- We set the random seeds so that the training results are the same on every run\n", "- Feel free to change the seed number to see the effects of the random initialization of the network weights on the training results" ] }, { "cell_type": "code", "execution_count": 580, "id": "ad3d6c46", "metadata": {}, "outputs": [], "source": [ "SEED = 10\n", "torch.manual_seed(SEED)\n", "# Ask PyTorch for deterministic implementations where available\n", "torch.use_deterministic_algorithms(True)\n", "np.random.seed(SEED)" ] }, { "cell_type": "markdown", "id": "69778674", "metadata": {}, "source": [ "# Conventions for this notebook\n", "\n", "## Jargon\n", "- Unit = activation = neuron\n", "- Model = neural network\n", "- Feature = one dimension of the input vector = one independent variable\n", "- Hypothesis = prediction = output of the model\n", "\n", "\n", "## Indices\n", "- **Data points:** $i = 1,..., n$ \n", "- **Features:** $k = 1,..., p$ \n", "- **Layers:** $j = 1,..., l$ \n", "- **Activation unit label:** $s$ \n", "\n", "## Scalars\n", "- $u^j$ = number of units in layer $j$\n", "- $a_s^j$ is the activation unit $s$ in layer $j$\n", "\n", "## Vectors and matrices\n", "Check slide 13 of the [lecture notes](https://github.com/ansantam/2022-MT-ARD-ST3-ML-workshop/blob/main/slides/1-neural-networks.pdf) for a visualization of the dimensions.\n", "\n", "- $\\pmb{X}$: input vector of dimension $[n \\times (p \\times 1)]$\n", "- $a^j$: activation vector of layer $j$ of dimension $[(u^j + 1) \\times 1]$\n", "- $\\pmb{\\theta}^j$: weight matrix from layer $j$ to $j+1$, of dimension $[u^{j+1} \\times (u^j + 1)]$\n", "\n", "\n", " 
where the $+1$ accounts for the bias unit (index 0) \n", "\n", "\n", "\n", "$$\n", "\\pmb{X} =\n", "\\begin{bmatrix}\n", "x_0 \\\\\n", "x_1 \\\\\n", "\\vdots \\\\\n", "x_p\n", "\\end{bmatrix} \\ \\ ; \\ \\\n", "\\pmb{\\theta}^j =\n", "\\begin{bmatrix}\n", "\\theta_{10} & \\dots & \\theta_{1 u^j}\\\\\n", "\\theta_{20} & \\ddots\\\\\n", "\\vdots \\\\\n", "\\theta_{(u^{j+1}) 0} & & \\theta_{(u^{j+1}) u^j}\\\\\n", "\\end{bmatrix} \n", "$$\n", "\n" ] }, { "cell_type": "markdown", "id": "fa2320f4", "metadata": {}, "source": [ "# 0. Introduction\n", "In this notebook we will train a neural network to fit an arbitrary function.\n", "\n", "## 0.1. Universal Approximation Theorem\n", "- When the activation function is a suitable non-linear function, a two-layer neural network (one hidden layer) with enough hidden units can be proven to be a **universal function approximator**: it can approximate any continuous function on a bounded interval to arbitrary accuracy.\n", "- This is where the power of neural networks comes from! \n", "\n", "## 0.2. Create a function to fit\n", "Let's create a simple non-linear function to fit with our neural network:" ] }, { "cell_type": "code", "execution_count": 581, "id": "e58cebe3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'Function to be fitted')" ] }, "execution_count": 581, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sample_points = 3e3\n", "x_lim = 100\n", "x = np.linspace(0, x_lim, int(sample_points))\n", "y = np.sin(x * x_lim * 1e-4) * np.cos(x * x_lim * 1e-3) * 3\n", "plt.plot(x, y)\n", "plt.xlabel('X')\n", "plt.ylabel('Y')\n", "plt.title('Function to be fitted')" ] }, { "cell_type": "markdown", "id": "a73959ae", "metadata": {}, "source": [ "## 0.3. Data shape\n", "- Our data is 1D, meaning it has only one feature.\n", "- We want a model that for a given $x$ it returns the correspondent $y$ value.\n", "- This means that a model with one neuron input and a one neuron output suffices:" ] }, { "cell_type": "code", "execution_count": 582, "id": "13d603ef", "metadata": {}, "outputs": [], "source": [ "n_input = 1\n", "n_out = 1" ] }, { "cell_type": "markdown", "id": "5ba45463", "metadata": {}, "source": [ "In order for the model to take each point of the data one by one we need to do some additional re-shaping, where we introduce an additional dimension for each entry:" ] }, { "cell_type": "code", "execution_count": 583, "id": "5726813c", "metadata": {}, "outputs": [], "source": [ "x_reshape = x.reshape((int(len(x) / n_input), n_input))\n", "y_reshape = y.reshape((int(len(y) / n_out), n_out))" ] }, { "cell_type": "code", "execution_count": 584, "id": "e448eb81", "metadata": {}, "outputs": [], "source": [ "# # Uncomment to check the shape change\n", "# print(x.shape, y.shape)\n", "# print(x_reshape.shape, y_reshape.shape)\n", "# print(x[10], x_reshape[10])" ] }, { "cell_type": "markdown", "id": "4335bb62", "metadata": {}, "source": [ "## 0.4. Data type\n", "The data that we will input to the model needs to be of the type `torch.float32`\n", "\n", "_Side Remark_: The default dtype of torch tensors (also the layer parameters) is `torch.float32`, which is related to the GPU performance optimization. 
If one wants to use `torch.float64`/`torch.double` instead, one can set the tensors to double precision via `v = v.double()` or set the global precision via `torch.set_default_dtype(torch.float64)`. Just keep in mind that the NN parameters and the input tensors must have the same precision.\n", "\n", "Before starting, let's convert our NumPy data arrays to torch tensors:" ] }, { "cell_type": "code", "execution_count": 585, "id": "b413e6fd", "metadata": {}, "outputs": [], "source": [ "x_torch = torch.from_numpy(x_reshape)\n", "y_torch = torch.from_numpy(y_reshape)" ] }, { "cell_type": "code", "execution_count": 586, "id": "57ce4393", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64 float64\n", "torch.float64 torch.float64\n" ] } ], "source": [ "# Type checking:\n", "print(x.dtype, y.dtype)\n", "print(x_torch.dtype, y_torch.dtype)" ] }, { "cell_type": "markdown", "id": "794384e7", "metadata": {}, "source": [ "The dtype is still `float64`, because `torch.from_numpy` keeps the NumPy dtype, but we can easily convert it:" ] }, { "cell_type": "code", "execution_count": 587, "id": "acce223d", "metadata": {}, "outputs": [], "source": [ "x_torch = x_torch.to(dtype=torch.float32)\n", "y_torch = y_torch.to(dtype=torch.float32)" ] }, { "cell_type": "code", "execution_count": 588, "id": "6597c669", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64 float64\n", "torch.float32 torch.float32\n" ] } ], "source": [ "# Type checking:\n", "print(x.dtype, y.dtype)\n", "print(x_torch.dtype, y_torch.dtype)" ] }, { "cell_type": "markdown", "id": "7b139da4", "metadata": {}, "source": [ "## 0.5. 
Data normalization\n", "We will also need to normalize the data to make sure we are in the non-linear region of the activation functions:\n", "" ] }, { "cell_type": "code", "execution_count": 589, "id": "6cb2f611", "metadata": {}, "outputs": [], "source": [ "x_norm = torch.nn.functional.normalize(x_torch, p=5, dim=0)\n", "y_norm = torch.nn.functional.normalize(y_torch, p=5, dim=0)" ] }, { "cell_type": "markdown", "id": "29c1044a", "metadata": {}, "source": [ "The [`torch.nn.functional.normalize`](https://pytorch.org/docs/stable/generated/torch.nn.functional.normalize.html) function performs $L_p$ normalization: it divides the tensor by its $L_p$ norm along `dim` (here $p=5$ and `dim=0`, i.e. over all data points), where the $L_p$ norm is:\n", "\n", "$$\n", "||x||_p = \\left(\\sum_{i=1}^n |x_i|^p\\right)^{1/p}, \\quad p>0\n", "$$" ] }, { "cell_type": "code", "execution_count": 590, "id": "03c39d09", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 1.0, 'Normalized function')" ] }, "execution_count": 590, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(x_norm.detach().numpy(), y_norm.detach().numpy())\n", "plt.title('Normalized function')" ] }, { "cell_type": "markdown", "id": "f4f44199", "metadata": {}, "source": [ "# 2. Build your model" ] }, { "cell_type": "markdown", "id": "3eedf40a", "metadata": {}, "source": [ "- In PyTorch [`Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential) stands for *sequential container*, where modules can be added sequentially and are connected in a cascading way. The output for each module is forwarded sequentially to the next.\n", "- Now we will build a simple model with one hidden layer with `Sequential`\n", "- Remember that every layer in a neural network is followed by an **activation layer** that performs some additional operations on the neurons.\n", "\n", "## 2.1 Activation functions\n", "\n", "## 2.2. Model architecture" ] }, { "cell_type": "code", "execution_count": 591, "id": "56a81cf8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequential(\n", " (0): Linear(in_features=1, out_features=5, bias=True)\n", " (1): Tanh()\n", " (2): Linear(in_features=5, out_features=1, bias=True)\n", " (3): Tanh()\n", ")\n" ] } ], "source": [ "n_hidden_01 = 5\n", "\n", "model0 = nn.Sequential(nn.Linear(n_input, n_hidden_01),\n", " nn.Tanh(),\n", " nn.Linear(n_hidden_01, n_out),\n", " nn.Tanh()\n", " )\n", "print(model0)" ] }, { "cell_type": "code", "execution_count": 592, "id": "ffe7db45", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequential(\n", " (0): Linear(in_features=1, out_features=10, bias=True)\n", " (1): Tanh()\n", " (2): Linear(in_features=10, out_features=1, bias=True)\n", " (3): Tanh()\n", ")\n" ] } ], "source": [ "n_hidden_11 = 10\n", "\n", "model1 = nn.Sequential(nn.Linear(n_input, n_hidden_11),\n", " nn.Tanh(),\n", " nn.Linear(n_hidden_11, 
n_out),\n", " nn.Tanh()\n", " )\n", "print(model1)" ] }, { "cell_type": "code", "execution_count": 593, "id": "8f7cb08d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sequential(\n", " (0): Linear(in_features=1, out_features=5, bias=True)\n", " (1): Tanh()\n", " (2): Linear(in_features=5, out_features=5, bias=True)\n", " (3): Tanh()\n", " (4): Linear(in_features=5, out_features=1, bias=True)\n", " (5): Tanh()\n", ")\n" ] } ], "source": [ "n_hidden_21 = 5\n", "n_hidden_22 = 5\n", "model2 = nn.Sequential(nn.Linear(n_input, n_hidden_21),\n", " nn.Tanh(),\n", " nn.Linear(n_hidden_21, n_hidden_22),\n", " nn.Tanh(),\n", " nn.Linear(n_hidden_22, n_out),\n", " nn.Tanh()\n", " )\n", "print(model2)" ] }, { "cell_type": "markdown", "id": "bd82d8db", "metadata": {}, "source": [ "\n", " How much do you think each hyperparameter (number of hidden layers, number of units per layer) will affect the quality of the model? \n", "" ] }, { "cell_type": "markdown", "id": "4c905a1b", "metadata": {}, "source": [ "You can uncomment and execute the next line to explore the methods of the `model0` object you created:" ] }, { "cell_type": "code", "execution_count": 594, "id": "2d4584d5", "metadata": {}, "outputs": [], "source": [ "# dir(model0)" ] }, { "cell_type": "markdown", "id": "f6d3419d", "metadata": {}, "source": [ "## 2.3 - Understanding the PyTorch model\n", "Try the `parameters` method (it must be called on a model instance, such as `model0`)." 
] }, { "cell_type": "code", "execution_count": 595, "id": "a0d0892c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 595, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model0.parameters()" ] }, { "cell_type": "markdown", "id": "c834e0b8", "metadata": {}, "source": [ "The `parameters` method returns a *generator*, so we need to iterate over it to see its contents:" ] }, { "cell_type": "code", "execution_count": 596, "id": "08caf645", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([5, 1])\n", "torch.Size([5])\n", "torch.Size([1, 5])\n", "torch.Size([1])\n" ] } ], "source": [ "for element in model0.parameters():\n", "    print(element.shape)" ] }, { "cell_type": "markdown", "id": "e44a30e9", "metadata": {}, "source": [ "\n", " Can you identify the elements of the model by their dimensions? Note that PyTorch stores the bias of each layer separately from its weight matrix.\n", "" ] }, { "cell_type": "markdown", "id": "a750495a", "metadata": {}, "source": [ "- The first element is the weight matrix $\\theta^0$ from layer 0 (input) to layer 1 (hidden), of dimensions $u^{j+1} \\times u^j = u^1 \\times u^0 = 5 \\times 1$ (so, without bias)\n", "- The second element is the bias vector of layer 1, with one entry per hidden unit ($u^1 = 5$)\n", "- The third element is the weight matrix $\\theta^1$ from layer 1 (hidden) to layer 2 (output), of dimensions $u^{j+1} \\times u^j = u^2 \\times u^1 = 1 \\times 5$ (without bias)\n", "- The fourth element is the bias of the output layer ($u^2 = 1$)" ] }, { "cell_type": "markdown", "id": "35a13a21", "metadata": {}, "source": [ "Let's have a look at the contents of those tensors:" ] }, { "cell_type": "code", "execution_count": 597, "id": "383b806d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameter containing:\n", "tensor([[-0.0838],\n", " [-0.0343],\n", " [-0.3750],\n", " [ 0.2300],\n", " [-0.5721]], requires_grad=True)\n", "Parameter containing:\n", "tensor([-0.1763, 0.3876, 
0.9386, 0.2356, -0.3393], requires_grad=True)\n", "Parameter containing:\n", "tensor([[ 0.0429, -0.0501, 0.1825, 0.0512, 0.1752]], requires_grad=True)\n", "Parameter containing:\n", "tensor([0.4337], requires_grad=True)\n" ] } ], "source": [ "for element in model0.parameters():\n", "    print(element)" ] }, { "cell_type": "markdown", "id": "e6da98e7", "metadata": {}, "source": [ "\n", " What are these values?\n", "" ] }, { "cell_type": "markdown", "id": "f11f48b6", "metadata": {}, "source": [ "# 3 - Define the loss function\n", "- Reminder: the **loss function** measures how far the predictions made by the model are from the actual values.\n", "- `torch.nn` provides many different types of [loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions). One of the most popular ones is the [Mean Squared Error (MSE)](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss), since it can be applied to a wide variety of cases.\n", "- In general, cost functions are chosen depending on desirable properties, such as convexity." ] }, { "cell_type": "code", "execution_count": 598, "id": "2ebd648b", "metadata": {}, "outputs": [], "source": [ "loss_function = nn.MSELoss()" ] }, { "cell_type": "markdown", "id": "b443be59", "metadata": {}, "source": [ "# 4 - Define the optimizer\n", "[`torch.optim`](https://pytorch.org/docs/stable/optim.html) provides implementations of various optimization algorithms. The optimizer object holds the current state and updates the parameters of the model based on the computed gradients. It takes as input an iterable containing the model parameters, which we explored before." 
] }, { "cell_type": "code", "execution_count": 599, "id": "9d116553", "metadata": {}, "outputs": [], "source": [ "# How many points to pass to the model at a time.\n", "# Note: not used below, where every epoch passes the full data set at once.\n", "batch_size = 200\n", "learning_rate = 0.015" ] }, { "cell_type": "code", "execution_count": 600, "id": "8921c861", "metadata": {}, "outputs": [], "source": [ "optimizer0 = torch.optim.Adam(model0.parameters(), lr=learning_rate)\n", "optimizer1 = torch.optim.Adam(model1.parameters(), lr=learning_rate)\n", "optimizer2 = torch.optim.Adam(model2.parameters(), lr=learning_rate)" ] }, { "cell_type": "code", "execution_count": 601, "id": "b89395a5", "metadata": {}, "outputs": [], "source": [ "# Alternative optimizer (see Section 7): uncomment to use SGD instead of Adam\n", "# optimizer0 = torch.optim.SGD(model0.parameters(), lr=learning_rate)\n", "# optimizer1 = torch.optim.SGD(model1.parameters(), lr=learning_rate)\n", "# optimizer2 = torch.optim.SGD(model2.parameters(), lr=learning_rate)" ] }, { "cell_type": "markdown", "id": "5fbca570", "metadata": {}, "source": [ "# 5 - Train the model in a loop\n", "The model learns iteratively in a loop of a given number of epochs. 
Each iteration of the loop consists of:\n", "- A **forward propagation**: compute the prediction $y$ given the input $x$ and the current weights, and calculate the loss\n", "- A **backward propagation**: compute the gradient of the loss function with respect to every model weight\n", "- A **gradient descent step**: update the model weights using those gradients" ] }, { "cell_type": "code", "execution_count": 602, "id": "94274de5", "metadata": {}, "outputs": [], "source": [ "epochs = 1000" ] }, { "cell_type": "code", "execution_count": 603, "id": "77062514", "metadata": {}, "outputs": [], "source": [ "losses0 = []\n", "for epoch in range(epochs):\n", "    pred_y0 = model0(x_norm)  # forward pass on the full data set\n", "    optimizer0.zero_grad()  # clear gradients from the previous epoch\n", "    loss0 = loss_function(pred_y0, y_norm)\n", "    losses0.append(loss0.item())  # keep the loss history for plotting\n", "    loss0.backward()  # backward pass: compute the gradients\n", "    optimizer0.step()  # update the weights" ] }, { "cell_type": "code", "execution_count": 604, "id": "628a5f23", "metadata": {}, "outputs": [], "source": [ "losses1 = []\n", "for epoch in range(epochs):\n", "    pred_y1 = model1(x_norm)\n", "    optimizer1.zero_grad()\n", "    loss1 = loss_function(pred_y1, y_norm)\n", "    losses1.append(loss1.item())\n", "    loss1.backward()\n", "    optimizer1.step()" ] }, { "cell_type": "code", "execution_count": 605, "id": "f360c837", "metadata": {}, "outputs": [], "source": [ "losses2 = []\n", "for epoch in range(epochs):\n", "    pred_y2 = model2(x_norm)\n", "    optimizer2.zero_grad()\n", "    loss2 = loss_function(pred_y2, y_norm)\n", "    losses2.append(loss2.item())\n", "    loss2.backward()\n", "    optimizer2.step()" ] }, { "cell_type": "code", "execution_count": 606, "id": "bd335e65", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(losses0, label='Model 0')\n", "plt.plot(losses1, label='Model 1')\n", "plt.plot(losses2, label='Model 2')\n", "plt.ylabel('Loss')\n", "plt.xlabel('Epoch')\n", "plt.title(\"Learning rate %f\"%(learning_rate))\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "7f72684a", "metadata": {}, "source": [ "# 6 - Test the trained model\n", "- Let's create some random points in the x-axis within the model's interval that will serve as test data.\n", "- We will do the same data manipulations as before." ] }, { "cell_type": "code", "execution_count": 607, "id": "b953f650", "metadata": {}, "outputs": [], "source": [ "test_points = 50\n", "x_test = np.random.uniform(0, np.max(x_norm.detach().numpy()), test_points)\n", "x_test_reshape = x_test.reshape((int(len(x_test) / n_input), n_input))\n", "x_test_torch = torch.from_numpy(x_test_reshape)\n", "x_test_torch = x_test_torch.to(dtype=torch.float32)" ] }, { "cell_type": "markdown", "id": "3cc5f187", "metadata": {}, "source": [ "Now we predict the y-value with our model:" ] }, { "cell_type": "code", "execution_count": 608, "id": "cfa4d71a", "metadata": {}, "outputs": [], "source": [ "y0_test_torch = model0(x_test_torch)\n", "y1_test_torch = model1(x_test_torch)\n", "y2_test_torch = model2(x_test_torch)" ] }, { "cell_type": "code", "execution_count": 609, "id": "bfbc0ae0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 609, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(x_norm.detach().numpy(), y_norm.detach().numpy())\n", "plt.scatter(x_test_torch.detach().numpy(), y0_test_torch.detach().numpy(), color='red', label='Model 0')\n", "plt.scatter(x_test_torch.detach().numpy(), y1_test_torch.detach().numpy(), color='orange', label='Model 1')\n", "plt.scatter(x_test_torch.detach().numpy(), y2_test_torch.detach().numpy(), color='green', label='Model 2')\n", "plt.legend()" ] }, { "cell_type": "markdown", "id": "67c1e49a", "metadata": {}, "source": [ "\n", " What do you think of these results, do they follow the universal approximation theorem?\n", "" ] }, { "cell_type": "markdown", "id": "059e1811", "metadata": {}, "source": [ "# 7 - Play with the notebook!\n", "Some ideas:\n", "- Change the number of epochs in `Section 5` to 5000 and re-train the models. What happens?\n", "- Change the random seed in the `Reproducibility` cell at the very top. How do the results change?\n", "- Change the optimizer in `Section 4` from `Adam` to `SGD` and re-train the models. What happens?\n", "- [**if time allows, takes several minutes**] Change the epochs in `Section 5` to 1000000. What happens?\n", "- Go back to 1000 epochs and the Adam optimizer. Change the learning rate in `Section 4` to 0.05. How do the results change? what does it tell us about our previous value?\n", "- Change the learning rate to 0.5. What happens now?\n", "\n" ] }, { "cell_type": "code", "execution_count": 610, "id": "6455d58a", "metadata": {}, "outputs": [], "source": [ "# %reset -f" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }