{ "cells": [ { "cell_type": "markdown", "id": "268e92dc-f5f5-4b6c-86df-67358aec18a6", "metadata": {}, "source": [ "# Learning a Simple Linear Model with PyTorch\n", "\n", "**Authors:** Jeffrey Huang and Alex Michels\n", "\n", "In this notebook, we will use PyTorch to fit a simple linear regression model." ] }, { "cell_type": "code", "execution_count": 1, "id": "0d186bbb-3290-4cef-bd0d-760fd6d5d84e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.1.0+cu121'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from pathlib import Path\n", "import time\n", "import torch\n", "from torch import nn # nn contains all of PyTorch's building blocks for neural networks\n", "\n", "# Check PyTorch version\n", "torch.__version__" ] }, { "cell_type": "markdown", "id": "4ae430a7", "metadata": {}, "source": [ "This notebook is a quick overview of the general ML workflow. In this scenario, we don't have data to start off with. After importing the necessary libraries, we instead create our own data that follows a straight line. This can be done with `torch.arange`, one of PyTorch's built-in functions for generating arrays, which are called tensors in an ML context." 
] }, { "cell_type": "code", "execution_count": 2, "id": "942dde0e-0f32-43e5-a0bb-2dc6b6725870", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor([[0.0000],\n", " [0.0200],\n", " [0.0400],\n", " [0.0600],\n", " [0.0800],\n", " [0.1000],\n", " [0.1200],\n", " [0.1400],\n", " [0.1600],\n", " [0.1800]]),\n", " tensor([[0.3000],\n", " [0.3140],\n", " [0.3280],\n", " [0.3420],\n", " [0.3560],\n", " [0.3700],\n", " [0.3840],\n", " [0.3980],\n", " [0.4120],\n", " [0.4260]]))" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create *known* parameters, our model will try to predict these\n", "weight = 0.7\n", "bias = 0.3\n", "\n", "# Create data\n", "start = 0\n", "end = 1\n", "step = 0.02\n", "X = torch.arange(start, end, step).unsqueeze(dim=1)\n", "y = weight * X + bias\n", "\n", "X[:10], y[:10]" ] }, { "cell_type": "markdown", "id": "d4db6553", "metadata": {}, "source": [ "Next, we split our generated dataset into training and test sets. This step lets us measure the performance of the model on data that it hasn't seen before, so-called \"out-of-sample\" data. A common ratio for a train/test split is 80-20." 
] }, { "cell_type": "code", "execution_count": 3, "id": "7129a675-2eab-4151-b3c5-94a9ee097055", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(40, 40, 10, 10)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create random train/test split\n", "train_split = int(0.8 * len(X)) # 80% of data used for training set, 20% for testing\n", "indices = np.random.permutation(X.shape[0]) # shuffle the data so our partitions are random\n", "training_idx, test_idx = indices[:train_split], indices[train_split:]\n", "X_train, y_train = X[training_idx, :], y[training_idx, :]\n", "X_test, y_test = X[test_idx, :], y[test_idx, :]\n", "\n", "len(X_train), len(y_train), len(X_test), len(y_test)" ] }, { "cell_type": "markdown", "id": "d69b659d", "metadata": {}, "source": [ "We can next define a function to help visualize the data. This is especially effective with data of simple dimensions that would be plotted on a 2D plane, such as our generated data. More complex, higher dimensional data may require some other form of representation." 
] }, { "cell_type": "code", "execution_count": 4, "id": "56eecfa0-ea57-427e-982d-e5350158a74b", "metadata": {}, "outputs": [], "source": [ "def plot_predictions(train_data=X_train,\n", "                     train_labels=y_train,\n", "                     test_data=X_test,\n", "                     test_labels=y_test,\n", "                     predictions=None):\n", "    \"\"\"\n", "    Plots training data, test data and compares predictions.\n", "    \"\"\"\n", "    plt.figure(figsize=(10, 7))\n", "    # Plot training data in blue\n", "    plt.scatter(train_data, train_labels, c=\"b\", s=4, label=\"Training data\")\n", "    # Plot test data in orange\n", "    plt.scatter(test_data, test_labels, c=\"orange\", s=4, label=\"Testing data\")\n", "    if predictions is not None:\n", "        # Plot the predictions in red (predictions were made on the test data)\n", "        plt.scatter(test_data, predictions, c=\"r\", s=4, label=\"Predictions\")\n", "    # Show the legend\n", "    plt.legend(prop={\"size\": 14});" ] }, { "cell_type": "markdown", "id": "3e07c4fa", "metadata": {}, "source": [ "We can test this function below. Understandably, we only see a straight line at first. The prediction points are not present yet because we haven't yet made a model for our data, let alone trained it or made predictions with it." ] }, { "cell_type": "code", "execution_count": 5, "id": "2bf4c854-0895-4446-9417-65d53179b107", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_predictions()" ] }, { "cell_type": "markdown", "id": "91695133", "metadata": {}, "source": [ "## Creating the Model\n", "\n", "The next step is to define the model. As this is mostly linear data, the neural network for this simple; there isn't any complex architecture with many hidden layers. \n", "\n", "Instead, the model's weights and biases replicate the standard linear regression model. \n", "\n", "We also define a forward function for making a forward pass through the model. In more complex ML models this would be a function to let data go through every layer in the model to generate an output. Since this is a simple linear case, the forward function just plugs in an input x into weight and bias parameters." ] }, { "cell_type": "code", "execution_count": 6, "id": "cf5a7f4d-1e8d-4484-87d5-53b0461595ea", "metadata": {}, "outputs": [], "source": [ "# Create a Linear Regression model class\n", "class LinearRegressionModel(nn.Module): # <- almost everything in PyTorch is a nn.Module (think of this as neural network lego blocks)\n", " def __init__(self):\n", " super().__init__() \n", " self.weights = nn.Parameter(torch.randn(1, # <- start with random weights (this will get adjusted as the model learns)\n", " dtype=torch.float), # <- PyTorch loves float32 by default\n", " requires_grad=True) # <- can we update this value with gradient descent?)\n", "\n", " self.bias = nn.Parameter(torch.randn(1, # <- start with random bias (this will get adjusted as the model learns)\n", " dtype=torch.float), # <- PyTorch loves float32 by default\n", " requires_grad=True) # <- can we update this value with gradient descent?))\n", "\n", " # Forward defines the computation in the model\n", " def forward(self, x: torch.Tensor) -> torch.Tensor: # <- \"x\" is the input data (e.g. 
training/testing features)\n", "        return self.weights * x + self.bias # <- this is the linear regression formula (y = m*x + b)" ] }, { "cell_type": "code", "execution_count": 7, "id": "f6691ccd-7fcd-4505-8ca5-1e8aa136e159", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Parameter containing:\n", " tensor([0.3367], requires_grad=True),\n", " Parameter containing:\n", " tensor([0.1288], requires_grad=True)]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set manual seed since nn.Parameter values are randomly initialized\n", "torch.manual_seed(42)\n", "\n", "# Create an instance of the model (this is a subclass of nn.Module that contains nn.Parameter(s))\n", "model_0 = LinearRegressionModel()\n", "\n", "# Check the nn.Parameter(s) within the nn.Module subclass we created\n", "list(model_0.parameters())" ] }, { "cell_type": "markdown", "id": "581537ab", "metadata": {}, "source": [ "We can use the following syntax to make predictions with our model. Note that we haven't yet trained the model. Can you guess how well these predictions will perform?\n", "\n", "These will be terrible predictions because the model hasn't even seen the training data yet." 
] }, { "cell_type": "code", "execution_count": 8, "id": "2f72d15d-a977-4215-8a2f-3bfa79ed660a", "metadata": {}, "outputs": [], "source": [ "# Make predictions with model\n", "with torch.inference_mode():\n", "    y_preds = model_0(X_test)" ] }, { "cell_type": "code", "execution_count": 9, "id": "c91ae8e5-c8b0-4c47-8e33-3ffe9f117f1a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of testing samples: 10\n", "Number of predictions made: 10\n", "Predicted values:\n", "tensor([[0.3914],\n", " [0.4588],\n", " [0.2635],\n", " [0.1355],\n", " [0.4386],\n", " [0.2298],\n", " [0.1490],\n", " [0.3376],\n", " [0.4251],\n", " [0.1894]])\n" ] } ], "source": [ "print(f\"Number of testing samples: {len(X_test)}\")\n", "print(f\"Number of predictions made: {len(y_preds)}\")\n", "print(f\"Predicted values:\\n{y_preds}\")" ] }, { "cell_type": "markdown", "id": "3b286c67-7d69-488b-ae48-808631432c02", "metadata": {}, "source": [ "We can plot our predictions to see how poorly the untrained model performs:" ] }, { "cell_type": "code", "execution_count": 10, "id": "95197ded-d2d0-4065-a209-fb51da64859f", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_predictions(predictions=y_preds)" ] }, { "cell_type": "markdown", "id": "abe9e56b", "metadata": {}, "source": [ "## Training the Model\n", "\n", "To prepare for the model training process, we additionally have to define a loss function and an optimizer. In this case we use the Mean Absolute Error (MAE) as our loss function.\n", "\n", "$$MAE = \\frac{\\sum_{i=1}^{n}|y_{i}-x_{i}|}{n} = \\frac{\\sum_{i=1}^{n} |e_{i}|}{n}$$\n", "\n", "where $y_{i}$ are the predictions, $x_{i}$ are the true values and $e_{i}$ is the error ($y_{i}-x_{i}$).\n", "\n", "\n", "We will optimize with the [Stochastic Gradient Descent algorithm](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)." ] }, { "cell_type": "code", "execution_count": 11, "id": "d6bda52d-d676-4212-b994-8410934bd6cb", "metadata": {}, "outputs": [], "source": [ "# Create the loss function\n", "loss_fn = nn.L1Loss() # MAE loss is same as L1Loss\n", "\n", "# Create the optimizer\n", "optimizer = torch.optim.SGD(params=model_0.parameters(), # parameters of target model to optimize\n", " lr=0.01) # learning rate (how much the optimizer should change parameters at each step, higher=more (less stable), lower=less (might take a long time))" ] }, { "cell_type": "markdown", "id": "d15cdb75", "metadata": {}, "source": [ "It's also good to check whether we are currently using CPU or GPU resources. Many ML tasks execute more efficiently on GPUs, and so it is worthwhile to execute this step. The following lines of code check if the current device PyTorch is running on is CPU or CUDA, along with more information. Depending on your system environment, this may change." 
] }, { "cell_type": "code", "execution_count": 12, "id": "9ecb3850-501e-4656-9cf5-84e31a5c9ae7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using device: cpu\n", "\n" ] } ], "source": [ "# train on the GPU or on the CPU, if a GPU is not available\n", "device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')\n", "\n", "print('Using device:', device)\n", "print()\n", "\n", "# Additional info when using CUDA\n", "if device.type == 'cuda':\n", "    print(torch.cuda.get_device_name(0))\n", "    print('Memory Usage:')\n", "    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')\n", "    print('Cached: ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')" ] }, { "cell_type": "markdown", "id": "591f3874", "metadata": {}, "source": [ "Now begins the long-awaited training step. We put together the pieces we've defined previously and iteratively feed the training set through the model (each full pass is called an \"epoch\"). Over multiple epochs, the model parameters are optimized. It is also worthwhile to time the training step. As this is a simple linear model, training should be quick." 
] }, { "cell_type": "code", "execution_count": 13, "id": "a34c2021-e94c-42e4-89ae-0690d730e8d1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0 | MAE Train Loss: 0.34703248739242554 | MAE Test Loss: 0.345443993806839 \n", "Epoch: 10 | MAE Train Loss: 0.2236068695783615 | MAE Test Loss: 0.22056642174720764 \n", "Epoch: 20 | MAE Train Loss: 0.10422110557556152 | MAE Test Loss: 0.10919070243835449 \n", "Epoch: 30 | MAE Train Loss: 0.056145280599594116 | MAE Test Loss: 0.07201977074146271 \n", "Epoch: 40 | MAE Train Loss: 0.04523301124572754 | MAE Test Loss: 0.06203325837850571 \n", "Epoch: 50 | MAE Train Loss: 0.0405350998044014 | MAE Test Loss: 0.05537428334355354 \n", "Epoch: 60 | MAE Train Loss: 0.03626192361116409 | MAE Test Loss: 0.04946230724453926 \n", "Epoch: 70 | MAE Train Loss: 0.03198875114321709 | MAE Test Loss: 0.04355033114552498 \n", "Epoch: 80 | MAE Train Loss: 0.02771509811282158 | MAE Test Loss: 0.037638384848833084 \n", "Epoch: 90 | MAE Train Loss: 0.023439554497599602 | MAE Test Loss: 0.0317264162003994 \n", "Training took 0.2305295467376709 seconds\n" ] } ], "source": [ "start_time = time.time()\n", "torch.manual_seed(42)\n", "\n", "# Set the number of epochs (how many times the model will pass over the training data)\n", "epochs = 100\n", "\n", "# Create empty loss lists to track values\n", "train_loss_values = []\n", "test_loss_values = []\n", "epoch_count = []\n", "\n", "for epoch in range(epochs):\n", "    ### Training\n", "    # Put model in training mode (this is the default state of a model)\n", "    model_0.train()\n", "    # 1. Forward pass on train data using the model's forward() method\n", "    y_pred = model_0(X_train)\n", "    # print(y_pred)\n", "    # 2. Calculate the loss (how different our model's predictions are from the ground truth)\n", "    loss = loss_fn(y_pred, y_train)\n", "    # 3. Zero grad of the optimizer\n", "    optimizer.zero_grad()\n", "    # 4. Loss backwards\n", "    loss.backward()\n", "    # 5. 
Progress the optimizer\n", "    optimizer.step()\n", "    ### Testing\n", "    # Put the model in evaluation mode\n", "    model_0.eval()\n", "    with torch.inference_mode():\n", "        # 1. Forward pass on test data\n", "        test_pred = model_0(X_test)\n", "\n", "        # 2. Calculate loss on test data\n", "        test_loss = loss_fn(test_pred, y_test.type(torch.float)) # predictions come in torch.float datatype, so comparisons need to be done with tensors of the same type\n", "\n", "    # Print out what's happening\n", "    if epoch % 10 == 0:\n", "        epoch_count.append(epoch)\n", "        train_loss_values.append(loss.detach().numpy())\n", "        test_loss_values.append(test_loss.detach().numpy())\n", "        print(f\"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} \")\n", "curr_time = time.time()\n", "print(f\"Training took {curr_time - start_time} seconds\")" ] }, { "cell_type": "markdown", "id": "1c6d85dd", "metadata": {}, "source": [ "Plotting the loss curves over epochs shows the loss decreasing as training proceeds, which indicates that the model is improving incrementally." ] }, { "cell_type": "code", "execution_count": 14, "id": "8c9b8154-7d2d-42d8-bd6e-d9d3db8156ce", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the loss curves\n", "plt.plot(epoch_count, train_loss_values, label=\"Train loss\")\n", "plt.plot(epoch_count, test_loss_values, label=\"Test loss\")\n", "plt.title(\"Training and test loss curves\")\n", "plt.ylabel(\"Loss\")\n", "plt.xlabel(\"Epochs\")\n", "plt.legend();" ] }, { "cell_type": "code", "execution_count": 15, "id": "9dd7076e-6115-4de0-9cc3-0a2c9cc076f6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The model learned the following values for weights and bias:\n", "OrderedDict([('weights', tensor([0.6178])), ('bias', tensor([0.3428]))])\n", "\n", "And the original values for weights and bias are:\n", "weights: 0.7, bias: 0.3\n" ] } ], "source": [ "# Find our model's learned parameters\n", "print(\"The model learned the following values for weights and bias:\")\n", "print(model_0.state_dict())\n", "print(\"\\nAnd the original values for weights and bias are:\")\n", "print(f\"weights: {weight}, bias: {bias}\")" ] }, { "cell_type": "markdown", "id": "7bd562fd", "metadata": {}, "source": [ "## Testing the Model\n", "\n", "We can now evaluate the trained model. We see that it very nearly approximates the values in the testing set." ] }, { "cell_type": "code", "execution_count": 16, "id": "b965b5a3-4aed-4fd6-ad74-038113c05623", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[0.8247],\n", " [0.9482],\n", " [0.5899],\n", " [0.3552],\n", " [0.9112],\n", " [0.5281],\n", " [0.3799],\n", " [0.7258],\n", " [0.8864],\n", " [0.4540]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1. Set the model in evaluation mode\n", "model_0.eval()\n", "\n", "# 2. Setup the inference mode context manager\n", "with torch.inference_mode():\n", " # 3. 
Make sure the calculations are done with the model and data on the same device\n", "    # in our case, we haven't set up device-agnostic code yet so our data and model are\n", "    # on the CPU by default.\n", "    # model_0.to(device)\n", "    # X_test = X_test.to(device)\n", "    y_preds = model_0(X_test)\n", "y_preds" ] }, { "cell_type": "code", "execution_count": 17, "id": "c24e631f-b161-44fd-94ef-758c315bfc24", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_predictions(predictions=y_preds)" ] }, { "cell_type": "markdown", "id": "7c048192", "metadata": {}, "source": [ "## Saving the Trained Model\n", "\n", "Lastly, we can save the trained model. We can either save the entire model or just its state_dict (its parameters). The latter is the safer option in most cases." ] }, { "cell_type": "code", "execution_count": 18, "id": "a8349e7e-7c9d-49dd-85e1-9c7ee2eabb2b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saving model to: models/01_pytorch_workflow_model_0.pth\n" ] } ], "source": [ "# 1. Create models directory \n", "MODEL_PATH = Path(\"models\")\n", "MODEL_PATH.mkdir(parents=True, exist_ok=True)\n", "\n", "# 2. Create model save path \n", "MODEL_NAME = \"01_pytorch_workflow_model_0.pth\"\n", "MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME\n", "\n", "# 3. Save the model state dict \n", "print(f\"Saving model to: {MODEL_SAVE_PATH}\")\n", "torch.save(obj=model_0.state_dict(), # only saving the state_dict() only saves the models learned parameters\n", " f=MODEL_SAVE_PATH)" ] }, { "cell_type": "code", "execution_count": 19, "id": "ae1fb441-97d6-473a-9f63-646601cd508b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 jovyan users 1680 Nov 7 20:37 models/01_pytorch_workflow_model_0.pth\n" ] } ], "source": [ "# Check the saved file path\n", "!ls -l models/01_pytorch_workflow_model_0.pth" ] }, { "cell_type": "code", "execution_count": 20, "id": "02d45a04-583e-434c-881a-cafb2ac4c924", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Instantiate a new instance of our model (this will be instantiated with random weights)\n", "loaded_model_0 = LinearRegressionModel()\n", "\n", "# Load the state_dict of our saved model (this will update the new instance of our model 
with trained weights)\n", "loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))" ] }, { "cell_type": "code", "execution_count": 21, "id": "a302042f-1bfb-463d-abac-dd24f645ed91", "metadata": {}, "outputs": [], "source": [ "# 1. Put the loaded model into evaluation mode\n", "loaded_model_0.eval()\n", "\n", "# 2. Use the inference mode context manager to make predictions\n", "with torch.inference_mode():\n", "    loaded_model_preds = loaded_model_0(X_test) # perform a forward pass on the test data with the loaded model" ] }, { "cell_type": "code", "execution_count": 22, "id": "9116eee9-abe7-4b26-9e80-cd7b9fa154bd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[True],\n", " [True],\n", " [True],\n", " [True],\n", " [True],\n", " [True],\n", " [True],\n", " [True],\n", " [True],\n", " [True]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compare previous model predictions with loaded model predictions (these should be the same)\n", "y_preds == loaded_model_preds" ] }, { "cell_type": "code", "execution_count": null, "id": "ca43e7cd-e9af-4a1d-a99d-3e16ba6c9015", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3-0.9.4", "language": "python", "name": "python3-0.9.4" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }