{ "cells": [ { "cell_type": "markdown", "id": "299596ab", "metadata": { "id": "299596ab" }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fonnesbeck/Bios8366/blob/master/notebooks/Section7_2-Introduction-to-PyTorch.ipynb)\n", "\n", "# Introduction to PyTorch\n", "\n", "![](https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/images/Pytorch_logo.png?raw=true)" ] }, { "cell_type": "markdown", "id": "d40eaa4e", "metadata": { "id": "d40eaa4e" }, "source": [ "PyTorch is one of the most popular deep learning Python libraries, and it is widely used by the AI research community. Many developers and researchers use PyTorch to accelerate deep learning research experimentation and prototyping.\n", "\n", "The PyTorch library is primarily developed by Facebook’s AI Research Lab (FAIR) and is free and open source software with over 1,700 contributors. It allows you to easily run array-based calculations, build dynamic neural networks, and perform autodifferentiation in Python with strong graphics processing unit (GPU) acceleration, all important features required for deep learning research. Although some use it for accelerated tensor computing, most use it for deep learning development.\n", "\n", "PyTorch’s simple and flexible interface enables fast experimentation. You can load data, apply transforms, and build models with a few lines of code. Then, you have the flexibility to write customized training, validation, and test loops and deploy trained models with ease.\n", "\n", "Its simple Python API, GPU support, and flexibility make it a popular choice among academic and commercial research organizations. Since being open sourced in 2017, PyTorch has reached a stable release and can be easily installed on Windows, Mac, and Linux operating systems. 
The framework continues to expand rapidly and now facilitates deployment to production environments in the cloud and mobile platforms." ] }, { "cell_type": "markdown", "id": "48b2244d", "metadata": { "id": "48b2244d" }, "source": [ "## PyTorch Programming Concepts\n", "\n", "PyTorch's computation takes place over a **directed graph**, which is comprised of a set of *nodes* and associated *edges*, used to describe operations and tensors.\n", "\n", "![](https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/images/tf_dag.png?raw=true)\n", "\n", "### Tensors\n", "\n", "We were introduced to tensors in previous sections, when we discussed PyMC and Aesara. Similarly in PyTorch, a tensor is the fundamental data structure for storing and manipulating data. Like an Aesara tensor (or a NumPy array, for that matter), a tensor is a multidimensional array containing elements of a single data type. Tensors can be used to represent scalars, vectors, matrices, and n-dimensional arrays and are derived from the torch.Tensor class. However, tensors are more than just arrays of numbers. Creating or instantiating a tensor object from the `torch.Tensor` class gives us access to a set of built-in class attributes and operations or class methods that provide a robust set of built-in capabilities. \n", "\n", "Tensors also include added benefits that make them more suitable than NumPy arrays for deep learning calculations. Importantly, tensor operations can be performed significantly faster using **GPU acceleration**. Also, tensors can be stored and manipulated at scale using **distributed** processing on multiple CPUs and GPUs and across multiple servers. And third, tensors keep track of their graph computations, which is very important in implementing a deep learning library.\n", "\n", "### Simple CPU Example\n", "\n", "Here’s a simple example that creates a tensor, performs a tensor operation, and uses a built-in method on the tensor itself. 
By default, the tensor data type will be derived from the input data type and the tensor will be allocated to the CPU device. \n", "\n", "First, we import the PyTorch library, then we create two tensors, `x` and `y`, from two-dimensional lists. \n", "\n", "Next, we add the two tensors and store the result in `z`. We can just use the `+` operator here because the `torch.Tensor` class supports operator overloading. \n", "\n", "Finally, we print the new tensor, `z`, which we can see is the matrix sum of `x` and `y`, and we print the size of `z`. Notice that `z` is a tensor object itself and the `size()` method is used to return its matrix dimensions, namely 2 × 3:" ] }, { "cell_type": "code", "execution_count": null, "id": "a7ef87dd", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "a7ef87dd", "outputId": "a3277b42-ed6f-4411-dd44-9c40e8873ab3" }, "outputs": [], "source": [ "import torch\n", "\n", "x = torch.tensor([[1,2,3],[4,5,6]])\n", "y = torch.tensor([[7,8,9],[10,11,12]])\n", "z = x + y\n", "print(z)" ] }, { "cell_type": "code", "execution_count": null, "id": "0f2d3e05", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0f2d3e05", "outputId": "aa76d01e-38f3-4c1b-a523-a1ff1d23d2bd" }, "outputs": [], "source": [ "z.size()" ] }, { "cell_type": "markdown", "id": "73e9cbf1", "metadata": { "id": "73e9cbf1" }, "source": [ "### Simple GPU Example\n", "\n", "The ability to accelerate tensor operations on a GPU is a major advantage of tensors over NumPy arrays. This is the same example from above, but here we move the tensors to the GPU device if one is available. Notice that the output tensor is also allocated to the GPU. You can use the device attribute (e.g., `z.device`) to confirm where the tensor resides.\n", "\n", "The `torch.cuda.is_available()` function will return `True` if your machine has GPU support. If your machine contains multiple GPUs, you can also control which GPU is being used." 
] }, { "cell_type": "code", "execution_count": null, "id": "5400a935", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5400a935", "outputId": "48dcf3d9-726a-4246-bddc-44b0f8622e30" }, "outputs": [], "source": [ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "\n", "x = torch.tensor([[1,2,3],[4,5,6]],\n", "                 device=device)\n", "y = torch.tensor([[7,8,9],[10,11,12]],\n", "                 device=device)\n", "z = x + y\n", "print(z)\n", "\n", "print(z.device)" ] }, { "cell_type": "markdown", "id": "3341393e", "metadata": { "id": "3341393e" }, "source": [ "### GPU support\n", "\n", "A crucial feature of PyTorch is its support for GPUs (Graphics Processing Units). A GPU can perform many thousands of small operations in parallel, making it well suited to the large matrix operations in neural networks. When comparing GPUs to CPUs, we can list the following main differences (credit: [Kevin Krewell, 2009](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/)) \n", "\n", "![](https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/images/comparison_CPU_GPU.png?raw=true)\n", "\n", "CPUs and GPUs each have their own advantages and disadvantages, which is why many computers contain both components and use them for different tasks. If you are not familiar with GPUs, you can read more in this [NVIDIA blog post](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/) or [here](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html). \n", "\n", "GPUs can accelerate the training of your network by up to a factor of $100$, which is essential for large neural networks. PyTorch implements a lot of functionality for supporting GPUs (mostly those of NVIDIA due to the libraries [CUDA](https://developer.nvidia.com/cuda-zone) and [cuDNN](https://developer.nvidia.com/cudnn)). 
First, let's check whether you have a GPU available:" ] }, { "cell_type": "code", "execution_count": null, "id": "7f6e2b82", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7f6e2b82", "outputId": "e12e34a7-c52c-4fff-d6f6-a08e529862db" }, "outputs": [], "source": [ "gpu_avail = torch.cuda.is_available()\n", "print(f\"Is the GPU available? {gpu_avail}\")" ] }, { "cell_type": "markdown", "id": "f7bfde90", "metadata": { "id": "f7bfde90" }, "source": [ "If you have a GPU on your computer but the command above returns False, make sure you have the correct CUDA-version installed. On Google Colab, make sure that you have selected a GPU in your runtime setup (in the menu, check under `Runtime -> Change runtime type`). \n", "\n", "By default, all tensors you create are stored on the CPU. We can push a tensor to the GPU by using the function `.to(...)`, or `.cuda()`. However, it is often a good practice to define a `device` object in your code which points to the GPU if you have one, and otherwise to the CPU. Then, you can write your code with respect to this device object, and it allows you to run the same code on both a CPU-only system, and one with a GPU. Let's try it below. 
We can specify the device as follows: " ] }, { "cell_type": "code", "execution_count": null, "id": "7d3eab8a", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7d3eab8a", "outputId": "02bc8ba4-b61f-45f2-8b9e-3598132fc262" }, "outputs": [], "source": [ "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", "print(\"Device\", device)" ] }, { "cell_type": "markdown", "id": "cfcec77b", "metadata": { "id": "cfcec77b" }, "source": [ "Now let's create a tensor and push it to the device:" ] }, { "cell_type": "code", "execution_count": null, "id": "f821672e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "f821672e", "outputId": "7f41a420-e340-4a4e-a662-5671f111e5cc" }, "outputs": [], "source": [ "x = torch.zeros(2, 3)\n", "x = x.to(device)\n", "print(\"X\", x)" ] }, { "cell_type": "markdown", "id": "7d8ae1cc", "metadata": { "id": "7d8ae1cc" }, "source": [ "In case you have a GPU, you should now see the attribute `device='cuda:0'` being printed next to your tensor. The zero next to cuda indicates that this is the zero-th GPU device on your computer. PyTorch also supports multi-GPU systems, but this you will only need once you have very big networks to train (if interested, see the [PyTorch documentation](https://pytorch.org/docs/stable/distributed.html#distributed-basics)). 
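The `cuda:0` suffix can also be written out explicitly when a specific GPU is wanted. The following is a minimal sketch rather than part of the original notebook: `pick_device` is a hypothetical helper name, and it assumes that silently falling back to the CPU is acceptable when the requested GPU index does not exist.

```python
import torch

# Hypothetical helper (not a PyTorch function): return the GPU with the
# requested index if it exists, otherwise fall back to the CPU.
def pick_device(gpu_index=0):
    if torch.cuda.is_available() and gpu_index < torch.cuda.device_count():
        return torch.device(f'cuda:{gpu_index}')
    return torch.device('cpu')

dev = pick_device(0)

# Tensors can be created directly on the chosen device
t = torch.zeros(2, 3, device=dev)
print(t.device)
```

On a machine without a GPU this prints `cpu`, so the same code runs unchanged on both kinds of system.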
We can also compare the runtime of a large matrix multiplication on the CPU with the same operation on the GPU:" ] }, { "cell_type": "code", "execution_count": null, "id": "e802613b", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e802613b", "outputId": "5e03f60c-6535-4bc6-a61a-b0898ad7ffca" }, "outputs": [], "source": [ "import time\n", "x = torch.randn(5000, 5000)\n", "\n", "## CPU version\n", "start_time = time.time()\n", "_ = torch.matmul(x, x)\n", "end_time = time.time()\n", "print(f\"CPU time: {(end_time - start_time):6.5f}s\")\n", "\n", "## GPU version (skipped when no CUDA device is available)\n", "if torch.cuda.is_available():\n", "    x = x.to(device)\n", "    # CUDA is asynchronous, so we need to use different timing functions\n", "    start = torch.cuda.Event(enable_timing=True)\n", "    end = torch.cuda.Event(enable_timing=True)\n", "    start.record()\n", "    _ = torch.matmul(x, x)\n", "    end.record()\n", "    torch.cuda.synchronize() # Waits for everything to finish running on the GPU\n", "    print(f\"GPU time: {0.001 * start.elapsed_time(end):6.5f}s\") # Milliseconds to seconds" ] }, { "cell_type": "markdown", "id": "f9901033", "metadata": { "id": "f9901033" }, "source": [ "Depending on the size of the operation and the CPU/GPU in your system, the speedup of this operation can be >50x. As `matmul` operations are very common in neural networks, we can already see the great benefit of training a NN on a GPU. The time estimate can be relatively noisy here because we have only run the operation once. Feel free to repeat the measurement several times, at the cost of a longer runtime.\n", "\n", "When generating random numbers, the seed between CPU and GPU is not synchronized. Hence, we need to set the seed on the GPU separately to ensure reproducible code. Note that due to different GPU architectures, running the same code on different GPUs does not guarantee the same random numbers. Still, we don't want our code to give us a different output every time we run it on the exact same hardware. 
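As a quick illustration of what a fixed seed buys us, the sketch below (an addition for illustration, using only `torch.manual_seed` and `torch.rand`) draws random numbers twice on the CPU after resetting the seed. The seed value 42 is an arbitrary choice.

```python
import torch

# Seed the default generator, draw, then reseed and draw again
torch.manual_seed(42)
first = torch.rand(3)
torch.manual_seed(42)
second = torch.rand(3)

# Same seed, same device, same sequence of calls: identical numbers
print(torch.equal(first, second))

# Without reseeding, the generator moves on to new values
third = torch.rand(3)
print(torch.equal(first, third))
```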
Hence, we also set the seed on the GPU:" ] }, { "cell_type": "code", "execution_count": null, "id": "zhXTAiv2InvS", "metadata": { "id": "zhXTAiv2InvS" }, "outputs": [], "source": [ "# GPU operations have a separate seed we also want to set\n", "if torch.cuda.is_available(): \n", " torch.cuda.manual_seed(42)\n", " torch.cuda.manual_seed_all(42)" ] }, { "cell_type": "markdown", "id": "b5110173", "metadata": { "deletable": true, "editable": true, "id": "b5110173" }, "source": [ "## Creating Tensors\n", "\n", "You can create tensors from preexisting numeric data or create random samplings. Tensors can be created from preexisting data stored in built-in structures such as lists, tuples, scalars, or serialized data files, as well as in NumPy arrays." ] }, { "cell_type": "markdown", "id": "909e4114", "metadata": { "id": "909e4114" }, "source": [ "The function `torch.Tensor` allocates memory for the desired tensor, but reuses any values that have already been in the memory. To directly assign values to the tensor during initialization, there are many alternatives including:\n", "\n", "* `torch.zeros`: Creates a tensor filled with zeros\n", "* `torch.ones`: Creates a tensor filled with ones\n", "* `torch.rand`: Creates a tensor with random values uniformly sampled between 0 and 1\n", "* `torch.randn`: Creates a tensor with random values sampled from a normal distribution with mean 0 and variance 1\n", "* `torch.arange`: Creates a tensor containing the values $N,N+1,N+2,...,M$\n", "* `torch.Tensor` (input list): Creates a tensor from the list elements you provide" ] }, { "cell_type": "code", "execution_count": null, "id": "3861149d", "metadata": { "collapsed": true, "deletable": true, "editable": true, "id": "3861149d" }, "outputs": [], "source": [ "import numpy\n", "\n", "# Created from preexisting arrays\n", "w = torch.tensor([1,2,3])\n", "w = torch.tensor((1,2,3))\n", "w = torch.tensor(numpy.array([1,2,3]))\n", "\n", "# Initialized by size\n", "w = 
torch.empty(100,200)\n", "w = torch.zeros(100,200)\n", "w = torch.ones(100,200)" ] }, { "cell_type": "markdown", "id": "d5c06f82", "metadata": { "deletable": true, "editable": true, "id": "d5c06f82" }, "source": [ "If you want to initialize a tensor with random values, PyTorch supports a robust set of functions that you can use, such as `torch.rand()`, `torch.randn()`, and `torch.randint()`." ] }, { "cell_type": "code", "execution_count": null, "id": "fbd28729", "metadata": { "deletable": true, "editable": true, "id": "fbd28729" }, "outputs": [], "source": [ "# Initialized by size with random values\n", "w = torch.rand(100,200)            # uniform on [0, 1)\n", "w = torch.randn(100,200)           # standard normal\n", "w = torch.randint(5,10,(100,200))  # integers from 5 to 9\n", "\n", "# Initialized with specified data type or device\n", "w = torch.empty((100,200), dtype=torch.float64,\n", "                device=device)\n", "\n", "# Initialized to have the same size, data type,\n", "# and device as another tensor\n", "x = torch.empty_like(w)" ] }, { "cell_type": "markdown", "id": "3d0871e3", "metadata": { "deletable": true, "editable": true, "id": "3d0871e3" }, "source": [ "In addition, functions with the `_like` suffix, such as `torch.empty_like()` and `torch.ones_like()`, return tensors that have the same size, data type, and device as another tensor but are initialized differently." ] }, { "cell_type": "markdown", "id": "cec02914", "metadata": { "id": "cec02914" }, "source": [ "## Tensor Attributes\n", "\n", "One PyTorch quality that has contributed to its popularity is the fact that it’s very Pythonic and object oriented in nature. Since a tensor is its own data type, you can read attributes of the tensor object itself. 
Assuming `x` is a tensor, you can access several attributes of `x` as follows:\n", "\n", "`x.dtype`\n", "\n", "- Indicates the tensor’s data type (for example, `torch.float32` or `torch.int64`)\n", "\n", "`x.device`\n", "\n", "- Indicates the tensor’s device location (e.g., CPU or GPU memory)\n", "\n", "`x.shape`\n", "\n", "- Shows the tensor’s dimensions\n", "\n", "`x.ndim`\n", "\n", "- Identifies the number of a tensor’s dimensions or rank\n", "\n", "`x.requires_grad`\n", "\n", "- A Boolean attribute that indicates whether the tensor keeps track of graph computations (see “Dynamic Computation Graph and Backpropagation” below)\n", "\n", "`x.grad`\n", "\n", "- Contains the actual gradients if `requires_grad` is True and `backward()` has been called\n", "\n", "`x.grad_fn`\n", "\n", "- Stores the graph computation function used if `requires_grad` is True\n", "\n", "`x.is_cuda`, `x.is_sparse`, `x.is_quantized`, `x.is_leaf`, `x.is_mkldnn`\n", "\n", "- Boolean attributes that indicate whether the tensor meets certain conditions\n", "\n", "`x.layout`\n", "\n", "- Indicates how a tensor is laid out in memory\n", "\n", "## Data Types\n", "\n", "During deep learning development, it’s important to be aware of the data type used by your data and its calculations. So when you create tensors, you should control what data types are being used. 
You can specify the data type when creating the tensor by using the `dtype` parameter, or you can cast a tensor to a new dtype using the appropriate casting method or the `to()` method:" ] }, { "cell_type": "code", "execution_count": null, "id": "0b891887", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "deletable": true, "editable": true, "id": "0b891887", "outputId": "a3d2fb12-6081-4e51-e374-e74e0e5429eb" }, "outputs": [], "source": [ "# Specify the data type at creation using dtype\n", "w = torch.tensor([1,2,3], dtype=torch.float32)\n", "\n", "# Use the casting method to cast to a new data type\n", "w.int() # returns a new int32 tensor; w itself remains float32\n", "w = w.int() # rebinding w to the result makes it int32\n", "\n", "# Use the to() method to cast to a new type\n", "w = w.to(torch.float64) \n", "w = w.to(dtype=torch.float64) \n", "\n", "# PyTorch automatically promotes data types during\n", "# operations (here int32 + float32 gives float32)\n", "x = torch.tensor([1,2,3], dtype=torch.int32)\n", "y = torch.tensor([1,2,3], dtype=torch.float32)\n", "z = x + y \n", "print(z.dtype)\n" ] }, { "cell_type": "markdown", "id": "34adf80e", "metadata": { "deletable": true, "editable": true, "id": "34adf80e" }, "source": [ "## Tensor Operations\n", "\n", "PyTorch supports a robust set of tensor operations that allow you to access and transform your tensor data.\n", "\n", "### Indexing, Slicing, Combining, and Splitting Tensors\n", "\n", "Once you have created tensors, you may want to access portions of the data and combine or split tensors to form new tensors. You can slice and index tensors in the same way you would slice and index NumPy arrays. Note that indexing and slicing will return tensors even if the result is only a single element. 
You will need to use the `item()` function to convert a single-element tensor to a Python value when passing to other functions like `print()`:" ] }, { "cell_type": "code", "execution_count": null, "id": "0256adf4", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "deletable": true, "editable": true, "id": "0256adf4", "outputId": "caed38e6-6c32-427e-d2ab-7ec7765d0661" }, "outputs": [], "source": [ "x = torch.tensor([[1,2],[3,4],[5,6],[7,8]])\n", "x" ] }, { "cell_type": "code", "execution_count": null, "id": "6fae3164", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6fae3164", "outputId": "ca2cac9d-1a9a-4303-b140-9e01468b58af" }, "outputs": [], "source": [ "x[1,1]" ] }, { "cell_type": "code", "execution_count": null, "id": "ec137a3c", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ec137a3c", "outputId": "e79a325c-c33d-4fa8-9971-0ce39891cddd" }, "outputs": [], "source": [ "x[1,1].item()" ] }, { "cell_type": "code", "execution_count": null, "id": "59d887c9", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "59d887c9", "outputId": "149db0c3-b5ac-46ca-f783-9c641a1c8674" }, "outputs": [], "source": [ "# Slicing\n", "x[:2,1]" ] }, { "cell_type": "code", "execution_count": null, "id": "0d1effea", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0d1effea", "outputId": "ea67c243-607b-4139-9a3b-b92264789cff" }, "outputs": [], "source": [ "# Boolean indexing\n", "x[x<5]" ] }, { "cell_type": "markdown", "id": "92011c5b", "metadata": { "deletable": true, "editable": true, "id": "92011c5b" }, "source": [ "PyTorch also supports transposing and reshaping arrays" ] }, { "cell_type": "code", "execution_count": null, "id": "576f54aa", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "deletable": true, "editable": true, "id": "576f54aa", "outputId": "c7c708c1-8948-4ad9-9f94-ea1d137265d1" }, "outputs": [], "source": [ "# Transpose array; 
x.t() or x.T can be used\n", "x.t()" ] }, { "cell_type": "code", "execution_count": null, "id": "cc098225", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cc098225", "outputId": "9d9537ae-0c68-4ccd-c0df-d00acf294a61" }, "outputs": [], "source": [ "# Change shape; usually view() is preferred over\n", "# reshape()\n", "x.view((2,4))" ] }, { "cell_type": "markdown", "id": "c5a12a4e", "metadata": { "deletable": true, "editable": true, "id": "c5a12a4e" }, "source": [ "You can also combine or split tensors by using functions like `torch.stack()` and `torch.unbind()`, respectively." ] }, { "cell_type": "code", "execution_count": null, "id": "d1b38d4a", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "deletable": true, "editable": true, "id": "d1b38d4a", "outputId": "90b2c007-a332-4fe9-8317-c26ddb9dcf88" }, "outputs": [], "source": [ "# Combining tensors\n", "y = torch.stack((x, x))\n", "y" ] }, { "cell_type": "markdown", "id": "49b416b1", "metadata": { "deletable": true, "editable": true, "id": "49b416b1" }, "source": [ "PyTorch provides a robust set of built-in functions that can be used to access, split, and combine tensors in different ways.\n", "\n", "`torch.cat()`\n", "- Concatenates the given sequence of tensors in the given dimension.\n", "\n", "`torch.chunk()`\n", "- Splits a tensor into a specific number of chunks. 
Each chunk is a view of the input tensor.\n", "\n", "`torch.gather()`\n", "- Gathers values along an axis specified by the dimension.\n", "\n", "`torch.index_select()`\n", "- Returns a new tensor that indexes the input tensor along a dimension using the entries in the index, which is a LongTensor.\n", "\n", "`torch.masked_select()`\n", "- Returns a new 1D tensor that indexes the input tensor according to the Boolean mask, which is a BoolTensor.\n", "\n", "\n", "`torch.narrow()`\n", "- Returns a tensor that is a narrow version of the input tensor.\n", "\n", "`torch.nonzero()`\n", "- Returns the indices of nonzero elements.\n", "\n", "`torch.reshape()`\n", "- Returns a tensor with the same data and number of elements as the input tensor, but a different shape. It returns a view when possible and a copy otherwise; use `view()` instead to ensure the tensor is not copied.\n", "\n", "`torch.split()`\n", "- Splits the tensor into chunks. Each chunk is a view or subdivision of the original tensor.\n", "\n", "`torch.squeeze()`\n", "- Returns a tensor with all the dimensions of the input tensor of size 1 removed.\n", "\n", "`torch.stack()`\n", "- Concatenates a sequence of tensors along a new dimension.\n", "\n", "`torch.t()`\n", "- Expects the input to be a 2D tensor and transposes dimensions 0 and 1.\n", "\n", "`torch.take()`\n", "- Returns a 1D tensor with the elements of the input at the given indices, treating the input as if it were flattened.\n", "\n", "`torch.transpose()`\n", "- Transposes only the specified dimensions.\n", "\n", "`torch.unbind()`\n", "- Removes a tensor dimension and returns a tuple of all slices along that dimension.\n", "\n", "`torch.unsqueeze()`\n", "- Returns a new tensor with a dimension of size 1 inserted at the specified position.\n", "\n", "`torch.where()`\n", "- Returns a tensor of selected elements from either one of two tensors, depending on the specified condition." ] }, { "cell_type": "markdown", "id": "a6dbd9c6", "metadata": { "id": "a6dbd9c6" }, "source": [ "Tensors can also be converted back into NumPy arrays using the `numpy()` method."
] }, { "cell_type": "code", "execution_count": null, "id": "74c8d44f", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "deletable": true, "editable": true, "id": "74c8d44f", "outputId": "45a84d39-99ad-4a82-9ef2-47f21a04aaa2" }, "outputs": [], "source": [ "tensor = torch.arange(4)\n", "np_arr = tensor.numpy()\n", "\n", "print(\"PyTorch tensor:\", tensor)\n", "print(\"Numpy array:\", np_arr)" ] }, { "cell_type": "markdown", "id": "23b70548", "metadata": { "id": "23b70548" }, "source": [ "Converting a tensor to a NumPy array requires the tensor to be on the CPU, not the GPU. If you have a tensor on the GPU, you need to call `.cpu()` on it beforehand, which gives a line like `np_arr = tensor.cpu().numpy()`." ] }, { "cell_type": "markdown", "id": "86b9d17b", "metadata": { "id": "86b9d17b" }, "source": [ "### Tensor Operations for Mathematics\n", "\n", "Deep learning development is strongly based on mathematical computations, so PyTorch supports a very robust set of built-in math functions. Whether you are creating new data transforms, customizing loss functions, or building your own optimization algorithms, you can speed up your research and development with the math functions provided by PyTorch.\n", "\n", "PyTorch supports many different types of math functions, including pointwise operations, reduction functions, comparison calculations, and linear algebra operations, as well as spectral and other math computations. \n", "\n", "For example, a commonly used operation is matrix multiplication, which is essential for neural networks. If we have an input vector `x`, which is transformed using a learned weight matrix `W`, this can be computed in several ways:\n", "\n", "- `torch.matmul`: Performs the matrix product over two tensors, where the specific behavior depends on the dimensions. If both inputs are matrices (2-dimensional tensors), it performs the standard matrix product. 
For higher dimensional inputs, the function supports broadcasting (for details see the documentation). Can also be written as `a @ b`, similar to NumPy.\n", "\n", "- `torch.mm`: Performs the matrix product over two matrices, but doesn’t support broadcasting\n", "\n", "- `torch.bmm`: Performs the matrix product with support for a batch dimension. If the first tensor `T` is of shape `(b, n, m)`, and the second tensor `R` is `(b, m, p)`, the output `O` is of shape `(b, n, p)`, and has been calculated by performing `b` matrix multiplications of the submatrices of `T` and `R`: `O[i] = T[i] @ R[i]`. \n", "\n", "- `torch.einsum`: Performs matrix multiplications and more (i.e. sums of products) using the Einstein summation convention." ] }, { "cell_type": "code", "execution_count": null, "id": "e4caa0c5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e4caa0c5", "outputId": "6194798d-8431-4a4a-8f4a-ff5448463cb8" }, "outputs": [], "source": [ "x = torch.arange(6)\n", "x = x.view(2, 3)\n", "print(\"X\", x)" ] }, { "cell_type": "code", "execution_count": null, "id": "00a0142e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "00a0142e", "outputId": "2d01817c-f90d-4127-f0ad-f9ef3f4e8bb8" }, "outputs": [], "source": [ "W = torch.arange(9).view(3, 3) # We can also stack multiple operations in a single line\n", "print(\"W\", W)" ] }, { "cell_type": "code", "execution_count": null, "id": "4d25c340", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4d25c340", "outputId": "1668a450-f2ff-4869-b330-cd2a2eec6bb9" }, "outputs": [], "source": [ "h = torch.matmul(x, W) # Verify the result by calculating it by hand too!\n", "print(\"h\", h)" ] }, { "cell_type": "markdown", "id": "ab630042", "metadata": { "id": "ab630042" }, "source": [ "### Dynamic Computation Graph and Backpropagation\n", "\n", "One of the main reasons for using PyTorch in Deep Learning projects is that we can automatically get **gradients/derivatives** 
of functions that we define. We will mainly use PyTorch for implementing neural networks, and they are just fancy functions. If we use weight matrices in our function that we want to learn, then those are called the **parameters** or simply the **weights**.\n", "\n", "If our neural network outputs a single scalar value, we talk about taking the **derivative**, but you will see that quite often we will have **multiple** output variables (\"values\"); in that case we talk about **gradients**. It's a more general term.\n", "\n", "Given an input $\\mathbf{x}$, we define our function by **manipulating** that input, usually by matrix-multiplications with weight matrices and additions with so-called bias vectors. As we manipulate our input, we are automatically creating a **computational graph**. This graph shows how to arrive at our output from our input. \n", "\n", "PyTorch is a **define-by-run** framework; this means that we can just do our manipulations, and PyTorch will keep track of that graph for us. Thus, we create a dynamic computation graph along the way.\n", "\n", "So, the only thing we have to do is to compute the **output**, and then we can ask PyTorch to automatically get the **gradients**. \n", "\n", "The first thing we have to do is to specify which tensors require gradients. By default, when we create a tensor, it does not require gradients." ] }, { "cell_type": "code", "execution_count": null, "id": "de424ba1", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "de424ba1", "outputId": "e13ce81c-9959-4dca-e878-49b311f59a89" }, "outputs": [], "source": [ "x = torch.ones((3,))\n", "print(x.requires_grad)" ] }, { "cell_type": "markdown", "id": "094b3e3c", "metadata": { "id": "094b3e3c" }, "source": [ "We can change this for an existing tensor using the function `requires_grad_()` (the trailing underscore indicates that this is an in-place operation). 
Alternatively, when creating a tensor, you can pass the argument `requires_grad=True` to most initializers we have seen above." ] }, { "cell_type": "code", "execution_count": null, "id": "52eb0a5e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "52eb0a5e", "outputId": "af7efa87-344e-4c55-fefd-299fca77714e" }, "outputs": [], "source": [ "x.requires_grad_(True)\n", "print(x.requires_grad)" ] }, { "cell_type": "markdown", "id": "b381a4d6", "metadata": { "id": "b381a4d6" }, "source": [ "In order to get familiar with the concept of a computation graph, we will create one for the following function:\n", "\n", "$$y = \\frac{1}{|x|}\\sum_i \\left[(x_i + 2)^2 + 3\\right]$$\n", "\n", "You could imagine that $x$ are our parameters, and we want to optimize (either maximize or minimize) the output $y$. For this, we want to obtain the gradients $\\partial y / \\partial \\mathbf{x}$. For our example, we'll use $\\mathbf{x}=[0,1,2]$ as our input." ] }, { "cell_type": "code", "execution_count": null, "id": "4af38547", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4af38547", "outputId": "7dacb327-f15d-4d57-b515-008fcbb84f62" }, "outputs": [], "source": [ "x = torch.arange(3, dtype=torch.float32, requires_grad=True) # Only float tensors can have gradients\n", "print(\"X\", x)" ] }, { "cell_type": "markdown", "id": "a82a7a3a", "metadata": { "id": "a82a7a3a" }, "source": [ "Now let's build the computation graph step by step. You can combine multiple operations in a single line, but we will separate them here to get a better understanding of how each operation is added to the computation graph." 
] }, { "cell_type": "code", "execution_count": null, "id": "3b904ff2", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3b904ff2", "outputId": "d566e293-1f57-4e5d-a87d-f8b45e1aff2d" }, "outputs": [], "source": [ "a = x + 2\n", "b = a ** 2\n", "c = b + 3\n", "y = c.mean()\n", "print(\"Y\", y)" ] }, { "cell_type": "markdown", "id": "8aa7f8d5", "metadata": { "id": "8aa7f8d5" }, "source": [ "Using the statements above, we have created the following computation graph:\n", "\n", "`x` → `a = x + 2` → `b = a ** 2` → `c = b + 3` → `y = c.mean()`\n", "\n", "We calculate $a$ based on the inputs $x$ and the constant $2$, $b$ is $a$ squared, and so on. The visualization is an abstraction of the dependencies between inputs and outputs of the operations we have applied.\n", "\n", "Each node of the computation graph has automatically defined a function for calculating the gradients with respect to its inputs, `grad_fn`. You can see this in the printed output tensor $y$ above. This is why the computation graph is usually visualized in the reverse direction (arrows point from the result to the inputs). 
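To make the role of `grad_fn` more concrete, here is a small self-contained sketch (added for illustration) that rebuilds the same graph and inspects the backward function recorded at a few nodes. It relies only on the `grad_fn` attribute and its `next_functions` field; the exact class names printed may differ between PyTorch versions.

```python
import torch

# Rebuild the graph from the example: y = mean((x + 2)^2 + 3)
x = torch.arange(3, dtype=torch.float32, requires_grad=True)
a = x + 2
b = a ** 2
c = b + 3
y = c.mean()

# Leaf tensors created by the user have no grad_fn
print(x.grad_fn)

# Results of operations record the backward function that produced them
print(type(a.grad_fn).__name__)
print(type(y.grad_fn).__name__)

# next_functions links each node back towards the inputs, which is the
# direction backpropagation walks through the graph
print(y.grad_fn.next_functions)
```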
We can perform backpropagation on the computation graph by calling the function `backward()` on the last output, which effectively calculates the gradients for each tensor that has the property `requires_grad=True`:" ] }, { "cell_type": "code", "execution_count": null, "id": "b8b63b84", "metadata": { "id": "b8b63b84" }, "outputs": [], "source": [ "y.backward()" ] }, { "cell_type": "markdown", "id": "b5be707c", "metadata": { "id": "b5be707c" }, "source": [ "`x.grad` will now contain the gradient $\\partial y/ \\partial \\mathbf{x}$, and this gradient indicates how a change in $\\mathbf{x}$ will affect output $y$ given the current input $\\mathbf{x}=[0,1,2]$:" ] }, { "cell_type": "code", "execution_count": null, "id": "fabb19be", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fabb19be", "outputId": "42c7e5c5-e0ed-441c-90d8-3e1be9fe1ce9" }, "outputs": [], "source": [ "print(x.grad)" ] }, { "cell_type": "markdown", "id": "f818f2d4", "metadata": { "id": "f818f2d4" }, "source": [ "We can also verify these gradients by hand. We will calculate the gradients using the chain rule, in the same way as PyTorch did it:\n", "\n", "$$\\frac{\\partial y}{\\partial x_i} = \\frac{\\partial y}{\\partial c_i}\\frac{\\partial c_i}{\\partial b_i}\\frac{\\partial b_i}{\\partial a_i}\\frac{\\partial a_i}{\\partial x_i}$$\n", "\n", "Note that we have simplified this equation to index notation by using the fact that all operations besides the mean do not combine the elements in the tensor. The partial derivatives are:\n", "\n", "$$\n", "\\frac{\\partial a_i}{\\partial x_i} = 1,\\hspace{1cm}\n", "\\frac{\\partial b_i}{\\partial a_i} = 2\\cdot a_i,\\hspace{1cm}\n", "\\frac{\\partial c_i}{\\partial b_i} = 1,\\hspace{1cm}\n", "\\frac{\\partial y}{\\partial c_i} = \\frac{1}{3}\n", "$$\n", "\n", "Hence, with the input being $\\mathbf{x}=[0,1,2]$, we have $\\mathbf{a}=[2,3,4]$ and $\\partial y/\\partial x_i = \\frac{2}{3} a_i$, so our gradients are $\\partial y/\\partial \\mathbf{x}=[4/3,2,8/3]$. 
The previous code cell should have printed the same result." ] }, { "cell_type": "markdown", "id": "150620b6", "metadata": { "id": "150620b6" }, "source": [ "## Data Loading\n", "\n", "PyTorch provides powerful built-in classes and utilities, such as the `Dataset`, `DataLoader`, and `Sampler` classes, for loading various types of data. The `Dataset` class defines how to access and preprocess data from a file or data sources. The `Sampler` class defines how to sample data from a dataset in order to create batches, while the `DataLoader` class combines a dataset with a sampler and allows you to iterate over a set of batches.\n", "\n", "PyTorch libraries such as Torchvision and Torchtext also provide classes to support specialized data like computer vision and natural language data. The torchvision.datasets module is a good example of how to utilize built-in classes to load data. The torchvision.datasets module provides a number of subclasses to load image data from popular academic datasets.\n", "\n", "One of these popular datasets is CIFAR-10. 
The CIFAR-10 dataset consists of 50,000 training images and 10,000 test images of 10 possible objects: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "62b5b16a", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 84, "referenced_widgets": [ "3e0bd075e0174bc1bdcb137bafa82b80", "9c92e8e0174c44388aaf7e50297e0d3b", "378e399b9dde438e91ea576916fc7f5b", "3070edc6c79b40309c198b5c4845c569", "eccc0d3be2f84bd5954077b0c5ed0fee", "e0537b86f5b64a098a72b5452093d378", "0c08e5867eef48cfae0adf6a02915631", "7fa15f28033e4c0cbdca4d87019dde74", "9ecef5efa63041599041d76ee295a8c8", "208cd522e37a4d4e8a23bb7af68cbbc5", "fc8f946314f641c5af76cf7d62c7b4db" ] }, "deletable": true, "editable": true, "id": "62b5b16a", "outputId": "d39f21a2-21c1-4561-f00b-9cda6f722d17" }, "outputs": [], "source": [ "from torchvision.datasets import CIFAR10\n", "\n", "train_data = CIFAR10(root=\"./train/\",\n", " train=True,\n", " download=True)" ] }, { "cell_type": "markdown", "id": "0cfbb6c6", "metadata": { "id": "0cfbb6c6" }, "source": [ "The `train` parameter determines whether we load the training data or the testing data, and setting `download` to `True` will download the data for us if we don’t have it already." 
] }, { "cell_type": "code", "execution_count": null, "id": "2e83360c", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2e83360c", "outputId": "480d33f1-fbd4-4d4a-e94b-739a09aec922" }, "outputs": [], "source": [ "train_data" ] }, { "cell_type": "code", "execution_count": null, "id": "bb65de6f", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bb65de6f", "outputId": "85bfca58-3950-4f64-818d-48fca7268b82" }, "outputs": [], "source": [ "len(train_data)" ] }, { "cell_type": "code", "execution_count": null, "id": "3e48da2f", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3e48da2f", "outputId": "138eb1e6-fb69-48b0-e0a9-0a7d4101bc87" }, "outputs": [], "source": [ "train_data.data.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "6e057ec7", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6e057ec7", "outputId": "f8e5201a-3abf-4f5a-a2f2-f368c153ba84" }, "outputs": [], "source": [ "train_data.targets[:10]" ] }, { "cell_type": "code", "execution_count": null, "id": "1a2e3f61", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1a2e3f61", "outputId": "2210d9cf-365b-4b36-e6f1-ebc4175ffb41" }, "outputs": [], "source": [ "train_data.classes" ] }, { "cell_type": "code", "execution_count": null, "id": "fc91dc6e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fc91dc6e", "outputId": "f464ed20-ad53-4f86-ae82-977e8b64fe22" }, "outputs": [], "source": [ "train_data.class_to_idx" ] }, { "cell_type": "code", "execution_count": null, "id": "aba65f5d", "metadata": { "id": "aba65f5d" }, "outputs": [], "source": [ "data, label = train_data[0]" ] }, { "cell_type": "code", "execution_count": null, "id": "08a358a5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "08a358a5", "outputId": "d8fced7f-1d16-4258-d091-2d232df7f57f" }, "outputs": [], "source": [ "type(data)" ] }, { "cell_type": "code", 
"execution_count": null, "id": "e519b432", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 49 }, "id": "e519b432", "outputId": "b767b36c-fbab-42b3-8ac4-8cc932678ace" }, "outputs": [], "source": [ "data" ] }, { "cell_type": "markdown", "id": "a2163963", "metadata": { "deletable": true, "editable": true, "id": "a2163963" }, "source": [ "The data consists of a PIL image object. PIL is a common image format that uses the Pillow library to store image pixel values in the format of height × width × channels. A color image has three channels (RGB) for red, green, and blue.\n", "\n", "We can also load the test data into another dataset object called test_data." ] }, { "cell_type": "code", "execution_count": null, "id": "219c6f38", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 84, "referenced_widgets": [ "97433cc60d6841859129fcb83c67ec4b", "c41ea09afdd8433c9d4593ccf9a4d4e9", "a797b8179d314847915848d12720352e", "b6cbb990998e4ce890309fb7065803ff", "909d0ff2a5454898827ccb8b0c83f318", "f7325c3b35ad4414bd1c7ab3ffb7858f", "7fce1614a72d47e7b0fd3d59c692e2cb", "9af871bc187c47b89603944054aac415", "b528593e4a024f428b1a0a8fb49e2386", "7d4297ecd6bd41f58fa31e61c6cfa341", "068c5d4019c740019f30c75afaf877c6" ] }, "deletable": true, "editable": true, "id": "219c6f38", "outputId": "18ee6cdd-026d-42c4-f489-e832865a6cdb" }, "outputs": [], "source": [ "test_data = CIFAR10(root=\"./test/\",\n", " train=False,\n", " download=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "5149dbed", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5149dbed", "outputId": "e34cb70e-2f3d-4806-cc0f-e702cb46428b" }, "outputs": [], "source": [ "test_data" ] }, { "cell_type": "markdown", "id": "176bf8d3", "metadata": { "deletable": true, "editable": true, "id": "176bf8d3" }, "source": [ "## Data Transforms\n", "\n", "In the data loading step, we pulled data from its source and created dataset objects that contain information 
about the dataset and the data itself. However, the data might need to be adjusted before it is passed into the NN model for training and testing. For example, data values may be normalized to assist training, augmented to create larger datasets, or converted from one type of object to a tensor.\n", "\n", "These adjustments are accomplished by applying transforms. The beauty of using transforms in PyTorch is that you can define a sequence of transforms and apply it when the data is accessed. " ] }, { "cell_type": "code", "execution_count": null, "id": "47a5b599", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "deletable": true, "editable": true, "id": "47a5b599", "outputId": "7ff8c00a-ac25-433e-fe7c-1a10844ed522" }, "outputs": [], "source": [ "from torchvision import transforms\n", "\n", "train_transforms = transforms.Compose([\n", " transforms.RandomCrop(32, padding=4),\n", " transforms.RandomHorizontalFlip(),\n", " transforms.ToTensor(),\n", " transforms.Normalize(\n", " mean=(0.4914, 0.4822, 0.4465),\n", " std=(0.2023, 0.1994, 0.2010))])\n", "\n", "train_data = CIFAR10(root=\"./train/\",\n", " train=True,\n", " download=True,\n", " transform=train_transforms)" ] }, { "cell_type": "markdown", "id": "f475febe", "metadata": { "deletable": true, "editable": true, "id": "f475febe" }, "source": [ "We define a set of transforms using the `transforms.Compose()` class. This class accepts a list of transforms and applies them in sequence. Here we randomly crop and flip images, convert them to tensors, and normalize the tensor values to predetermined means and standard deviations.\n", "\n", "The transforms are passed to the dataset class during instantiation and become part of the dataset object. The transforms are applied whenever the dataset object is accessed, returning a new result consisting of the transformed data."
] }, { "cell_type": "code", "execution_count": null, "id": "addd9065", "metadata": { "deletable": true, "editable": true, "id": "addd9065", "outputId": "62e9e57f-ce4e-431c-afe1-8025f69a7f1f" }, "outputs": [], "source": [ "train_data" ] }, { "cell_type": "markdown", "id": "3e1572ff", "metadata": { "deletable": true, "editable": true, "id": "3e1572ff" }, "source": [ "The data output is now a tensor of size 3 × 32 × 32. It has also been randomly cropped, horizontally flipped, and normalized." ] }, { "cell_type": "code", "execution_count": null, "id": "bce5602f", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 284 }, "deletable": true, "editable": true, "id": "bce5602f", "outputId": "9338dd4b-984b-43ae-d93c-fdb5fe1a415f" }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "data, label = train_data[0]\n", "\n", "def imshow(img):\n", " npimg = img.numpy()\n", " plt.imshow(np.transpose(npimg, (1, 2, 0)))\n", " plt.show()\n", "\n", "imshow(data)" ] }, { "cell_type": "markdown", "id": "9719bec9", "metadata": { "deletable": true, "editable": true, "id": "9719bec9" }, "source": [ "The colors may look strange because of the normalization, but this actually helps NN models do a better job of classifying the images.\n", "\n", "We can define a different set of transforms for testing and apply them to our test data as well. In the case of test data, we do not want to crop or flip the image, but we do need to convert the image to tensors and normalize the tensor values." 
] }, { "cell_type": "code", "execution_count": null, "id": "016521dd", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "deletable": true, "editable": true, "id": "016521dd", "outputId": "90faee82-958d-42f4-cb81-90d2da69ac00" }, "outputs": [], "source": [ "test_transforms = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize(\n", " (0.4914, 0.4822, 0.4465),\n", " (0.2023, 0.1994, 0.2010))])\n", "\n", "test_data = CIFAR10(\n", " root=\"./test/\",\n", " train=False,\n", " transform=test_transforms)\n", "\n", "test_data" ] }, { "cell_type": "markdown", "id": "d8bc68cd", "metadata": { "deletable": true, "editable": true, "id": "d8bc68cd" }, "source": [ "## Data Batching\n", "\n", "Now that we have defined the transforms and created the datasets, we can access data samples one at a time. However, when you train your model, you will want to pass in small batches of data at each iteration. Sending data in batches not only allows more efficient training but also takes advantage of the parallel nature of GPUs to accelerate training.\n", "\n", "Batch processing can easily be implemented using the torch.utils.data.DataLoader class." ] }, { "cell_type": "code", "execution_count": null, "id": "56ef5f4e", "metadata": { "deletable": true, "editable": true, "id": "56ef5f4e" }, "outputs": [], "source": [ "trainloader = torch.utils.data.DataLoader(\n", " train_data,\n", " batch_size=16,\n", " shuffle=True)" ] }, { "cell_type": "markdown", "id": "df4d5486", "metadata": { "deletable": true, "editable": true, "id": "df4d5486" }, "source": [ "We use a batch size of 16 samples and shuffle our dataset so that the dataloader retrieves a random sampling of the data.\n", "\n", "The dataloader object combines a dataset and a sampler, and provides an iterable over the given dataset. 
In other words, your training loop can use this object to sample your dataset and apply transforms one batch at a time instead of applying them for the complete dataset at once. This considerably improves efficiency and speed when training and testing models.\n", "\n", "We can configure our data loader with the following input arguments (only a selection, see full list [here](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)):\n", "\n", "* `batch_size`: Number of samples to stack per batch\n", "* `shuffle`: If True, the data is returned in a random order. This is important during training for introducing stochasticity. \n", "* `num_workers`: Number of subprocesses to use for data loading. The default, 0, means that the data will be loaded in the main process which can slow down training for datasets where loading a data point takes a considerable amount of time (e.g. large images). More workers are recommended for those, but can cause issues on Windows computers. For tiny datasets as ours, 0 workers are usually faster.\n", "* `pin_memory`: If True, the data loader will copy Tensors into CUDA pinned memory before returning them. This can save some time for large data points on GPUs. Usually a good practice to use for a training set, but not necessarily for validation and test to save memory on the GPU.\n", "* `drop_last`: If True, the last batch is dropped in case it is smaller than the specified batch size. This occurs when the dataset size is not a multiple of the batch size. Only potentially helpful during training to keep a consistent batch size." 
] }, { "cell_type": "code", "execution_count": null, "id": "43ecbfd5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "deletable": true, "editable": true, "id": "43ecbfd5", "outputId": "2bfe85da-476b-4ab4-c44f-247cceb6d6b3" }, "outputs": [], "source": [ "data_batch, labels_batch = next(iter(trainloader))\n", "\n", "data_batch.size()" ] }, { "cell_type": "markdown", "id": "cf797ec1", "metadata": { "deletable": true, "editable": true, "id": "cf797ec1" }, "source": [ "We need to use `iter()` to create an iterator from the trainloader and then use `next()` to retrieve a single batch from it. This is only necessary when accessing one batch. As we’ll see later, our training loops will access the dataloader directly without the need for `iter()` and `next()`. After checking the sizes of the data and labels, we see they return batches of size 16." ] }, { "cell_type": "code", "execution_count": null, "id": "dd2c5c09", "metadata": { "collapsed": true, "deletable": true, "editable": true, "id": "dd2c5c09" }, "outputs": [], "source": [ "testloader = torch.utils.data.DataLoader(\n", " test_data,\n", " batch_size=16,\n", " shuffle=False)" ] }, { "cell_type": "markdown", "id": "3826f6b3", "metadata": { "deletable": true, "editable": true, "id": "3826f6b3" }, "source": [ "We set shuffle to False since there’s usually no need to shuffle the test data.\n", "\n", "PyTorch also provides a submodule called `torch.utils.data` that you can use to create your own dataset and dataloader classes like the ones you saw in Torchvision. 
It consists of `Dataset`, `Sampler`, and `DataLoader` classes.\n", " " ] }, { "cell_type": "markdown", "id": "f07c13e9", "metadata": { "id": "f07c13e9" }, "source": [ "## Example: Continuous XOR\n", "\n", "If we want to build a neural network in PyTorch, we could specify all our parameters (weight matrices, bias vectors) using `Tensors` (with `requires_grad=True`), ask PyTorch to calculate the gradients and then adjust the parameters. But things can quickly get cumbersome if we have a lot of parameters. In PyTorch, there is a package called `torch.nn` that makes building neural networks more convenient. \n", "\n", "We will introduce the libraries and all additional parts you might need to train a neural network in PyTorch, using a simple classifier on the XOR example from the previous section: given two binary inputs $x_1$ and $x_2$, the label to predict is $1$ if either $x_1$ or $x_2$ is $1$ while the other is $0$, and $0$ in all other cases. Recall that a single neuron (linear classifier) cannot learn this simple function.\n", "Hence, we will build a small neural network that can learn this function. \n", "To make it a little bit more interesting, we move the XOR into continuous space and introduce some Gaussian noise on the binary inputs. Our desired separation of an XOR dataset could look as follows:\n", "\n", "![image.png]()" ] }, { "cell_type": "markdown", "id": "0dbf9668", "metadata": { "id": "0dbf9668" }, "source": [ "### The model\n", "\n", "The package `torch.nn` defines a series of useful classes like linear network layers, activation functions, loss functions etc. A full list can be found [here](https://pytorch.org/docs/stable/nn.html). 
" ] }, { "cell_type": "code", "execution_count": null, "id": "a0eb8f0b", "metadata": { "id": "a0eb8f0b" }, "outputs": [], "source": [ "import torch.nn as nn" ] }, { "cell_type": "markdown", "id": "fe7eda13", "metadata": { "id": "fe7eda13" }, "source": [ "In addition to `torch.nn`, there is also `torch.nn.functional`. It contains functions that are used in network layers. This is in contrast to `torch.nn` which defines them as `nn.Modules`, and `torch.nn` actually uses a lot of functionalities from `torch.nn.functional`. Hence, the functional package is useful in many situations, and so we import it as well here." ] }, { "cell_type": "code", "execution_count": null, "id": "3d8f322d", "metadata": { "id": "3d8f322d" }, "outputs": [], "source": [ "import torch.nn.functional as F" ] }, { "cell_type": "markdown", "id": "a469bc33", "metadata": { "id": "a469bc33" }, "source": [ "#### nn.Module\n", "\n", "In PyTorch, a neural network is comprised of modules. Modules can contain other modules, and a neural network is considered to be a module itself as well. \n", "\n", "The basic template of a module is as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "9fc8b8ee", "metadata": { "id": "9fc8b8ee" }, "outputs": [], "source": [ "class MyModule(nn.Module):\n", " \n", " def __init__(self):\n", " super().__init__()\n", " # Some init for my module\n", " \n", " def forward(self, x):\n", " # Function for performing the calculation of the module.\n", " pass" ] }, { "cell_type": "markdown", "id": "fefdd1a4", "metadata": { "id": "fefdd1a4" }, "source": [ "The `forward` method is where the computation of the module takes place, and it is executed when you call the module (`module = MyModule(); module(x)`). In the `__init__` method we specify the parameters of the module using `nn.Parameter` or by defining other modules that are used in the forward function. 
The backward calculation is done automatically; if a custom gradient is ever needed, it is defined through a custom `torch.autograd.Function` rather than on the module itself.\n", "\n", "#### Simple classifier\n", "We can now make use of the pre-defined modules in the `torch.nn` package, and define our own small neural network. We will use a minimal network with an input layer, one hidden layer with tanh as activation function, and an output layer. \n", "\n", "![image.png]()\n", "\n", "The input neurons are shown in blue, which represent the coordinates $x_1$ and $x_2$ of a data point. The hidden neurons including a tanh activation are shown in white, and the output neuron in red.\n", "In PyTorch, we can define this as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "49e269ea", "metadata": { "id": "49e269ea" }, "outputs": [], "source": [ "class SimpleClassifier(nn.Module):\n", "\n", " def __init__(self, num_inputs, num_hidden, num_outputs):\n", " super().__init__()\n", " # Initialize the modules we need to build the network\n", " self.linear1 = nn.Linear(num_inputs, num_hidden)\n", " self.act_fn = nn.Tanh()\n", " self.linear2 = nn.Linear(num_hidden, num_outputs)\n", "\n", " def forward(self, x):\n", " # Perform the calculation of the model to determine the prediction\n", " x = self.linear1(x)\n", " x = self.act_fn(x)\n", " x = self.linear2(x)\n", " return x" ] }, { "cell_type": "markdown", "id": "ce9025ca", "metadata": { "id": "ce9025ca" }, "source": [ "For the examples in this notebook, we will use a tiny neural network with two input neurons and four hidden neurons. As we perform binary classification, we will use a single output neuron. Note that we do not apply an activation to the output. This is because other functions, especially the loss, are more efficient and precise to calculate on the original outputs instead of the sigmoid output. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "d2d99a2b", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d2d99a2b", "outputId": "c09a3ddd-e15f-4ce9-cde7-76ca8f356daa" }, "outputs": [], "source": [ "model = SimpleClassifier(num_inputs=2, num_hidden=4, num_outputs=1)\n", "model" ] }, { "cell_type": "markdown", "id": "1d82bffd", "metadata": { "id": "1d82bffd" }, "source": [ "The parameters of a module can be obtained by using its `parameters()` functions, or `named_parameters()` to get a name to each parameter object. " ] }, { "cell_type": "code", "execution_count": null, "id": "498aa1d7", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "498aa1d7", "outputId": "826feb33-b1d5-411f-e049-81ce14a746da" }, "outputs": [], "source": [ "for name, param in model.named_parameters():\n", " print(f\"Parameter {name}, shape {param.shape}\")" ] }, { "cell_type": "markdown", "id": "47146b4f", "metadata": { "id": "47146b4f" }, "source": [ "Each linear layer has a weight matrix of the shape `[output, input]`, and a bias of the shape `[output]`. The tanh activation function does not have any parameters. Note that parameters are only registered for `nn.Module` objects that are attributes of the class. If you define a list of modules, the parameters of those are not registered for the outer module and can cause some issues when you try to optimize your module. There are alternatives, like `nn.ModuleList`, `nn.ModuleDict` and `nn.Sequential`, that allow you to have different data structures of modules. " ] }, { "cell_type": "markdown", "id": "eb225d4c", "metadata": { "id": "eb225d4c" }, "source": [ "#### Custom XOR Dataset\n", "\n", "To define a custom dataset in PyTorch, we simply specify two functions: `__getitem__`, and `__len__`. The `__getitem__` method returns the `idx`-indexed data point in the dataset, while the `__len__` method returns the size of the dataset. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "aae2bb58", "metadata": { "id": "aae2bb58" }, "outputs": [], "source": [ "import torch.utils.data as data\n", "\n", "class XORDataset(data.Dataset):\n", "\n", " def __init__(self, size, std=0.1):\n", " \"\"\"\n", " Inputs:\n", " size - Number of data points we want to generate\n", " std - Standard deviation of the noise (see generate_continuous_xor function)\n", " \"\"\"\n", " super().__init__()\n", " self.size = size\n", " self.std = std\n", " self.generate_continuous_xor()\n", "\n", " def generate_continuous_xor(self):\n", " # Each data point in the XOR dataset has two variables, x and y, that can be either 0 or 1\n", " # The label is their XOR combination, i.e. 1 if only x or only y is 1 while the other is 0.\n", " # If x=y, the label is 0.\n", " data = torch.randint(low=0, high=2, size=(self.size, 2), dtype=torch.float32)\n", " label = (data.sum(dim=1) == 1).to(torch.long)\n", " # To make it slightly more challenging, we add a bit of gaussian noise to the data points.\n", " data += self.std * torch.randn(data.shape)\n", "\n", " self.data = data\n", " self.label = label\n", "\n", " def __len__(self):\n", " # Number of data point we have. 
Alternatively self.data.shape[0], or self.label.shape[0]\n", " return self.size\n", "\n", " def __getitem__(self, idx):\n", " # Return the idx-th data point of the dataset\n", " # If we have multiple things to return (data point and label), we can return them as tuple\n", " data_point = self.data[idx]\n", " data_label = self.label[idx]\n", " return data_point, data_label" ] }, { "cell_type": "markdown", "id": "c91a11ec", "metadata": { "id": "c91a11ec" }, "source": [ "Let's try to create such a dataset and inspect it:" ] }, { "cell_type": "code", "execution_count": null, "id": "81f96424", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "81f96424", "outputId": "7cb501d7-98aa-415a-9479-ca9db49ea441" }, "outputs": [], "source": [ "dataset = XORDataset(size=200)\n", "print(\"Size of dataset:\", len(dataset))\n", "print(\"Data point 0:\", dataset[0])" ] }, { "cell_type": "code", "execution_count": null, "id": "47cc981c", "metadata": { "id": "47cc981c" }, "outputs": [], "source": [ "def visualize_samples(data, label):\n", " if isinstance(data, torch.Tensor):\n", " data = data.cpu().numpy()\n", " if isinstance(label, torch.Tensor):\n", " label = label.cpu().numpy()\n", " data_0 = data[label == 0]\n", " data_1 = data[label == 1]\n", " \n", " plt.figure(figsize=(4,4))\n", " plt.scatter(data_0[:,0], data_0[:,1], edgecolor=\"#333\", label=\"Class 0\")\n", " plt.scatter(data_1[:,0], data_1[:,1], edgecolor=\"#333\", label=\"Class 1\")\n", " plt.title(\"Dataset samples\")\n", " plt.ylabel(r\"$x_2$\")\n", " plt.xlabel(r\"$x_1$\")\n", " plt.legend()" ] }, { "cell_type": "code", "execution_count": null, "id": "4bd6b4e6", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 297 }, "id": "4bd6b4e6", "outputId": "28176085-131a-4e03-946d-5d1ae3a712cc" }, "outputs": [], "source": [ "visualize_samples(dataset.data, dataset.label)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "0132dbf8", "metadata": { "id": "0132dbf8" }, "source": [ 
"Let's create a simple data loader:" ] }, { "cell_type": "code", "execution_count": null, "id": "458d8a43", "metadata": { "id": "458d8a43" }, "outputs": [], "source": [ "data_loader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)" ] }, { "cell_type": "markdown", "id": "225c5edd", "metadata": {}, "source": [ "The data loader is iterable; `next(iter(...))` fetches the first batch from the loader.\n", "If `shuffle` is `True`, this will return a different batch every time we run this cell.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "048bad12", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "048bad12", "outputId": "a99e8785-0ae1-4ac7-d704-c1d4b0e28600" }, "outputs": [], "source": [ "data_inputs, data_labels = next(iter(data_loader))" ] }, { "cell_type": "markdown", "id": "a9bc9781", "metadata": {}, "source": [ "The shape of the outputs is `[batch_size, d_1,...,d_N]` where `d_1,...,d_N` are the \n", "dimensions of the data point returned from the dataset class." ] }, { "cell_type": "code", "execution_count": null, "id": "6fea4669", "metadata": {}, "outputs": [], "source": [ "print(\"Data inputs\", data_inputs.shape, \"\\n\", data_inputs)\n", "print(\"Data labels\", data_labels.shape, \"\\n\", data_labels)" ] }, { "cell_type": "markdown", "id": "5584ab5a", "metadata": { "id": "5584ab5a" }, "source": [ "### Optimization\n", "\n", "After defining the model and the dataset, it is time to prepare the optimization of the model. During training, we will perform the following steps:\n", "\n", "1. Get a batch from the data loader\n", "2. Obtain the predictions from the model for the batch\n", "3. Calculate the loss based on the difference between predictions and labels\n", "4. Backpropagation: calculate the gradients of the loss with respect to every parameter\n", "5. Update the parameters of the model in the direction opposite to the gradients, so that the loss decreases\n", "\n", "We have seen how we can do steps 1, 2 and 4 in PyTorch. 
Now, we will look at step 3 and 5." ] }, { "cell_type": "markdown", "id": "8a5aab14", "metadata": { "id": "8a5aab14" }, "source": [ "#### Loss modules\n", "\n", "We can calculate the loss for a batch by simply performing a few tensor operations as those are automatically added to the computation graph. For instance, for binary classification, we can use Binary Cross Entropy (BCE) which is defined as follows:\n", "\n", "$$\\mathcal{L}_{BCE} = -\\sum_i \\left[ y_i \\log x_i + (1 - y_i) \\log (1 - x_i) \\right]$$\n", "\n", "where $y$ are our labels, and $x$ our predictions, both in the range of $[0,1]$. However, PyTorch already provides a list of predefined loss functions which we can use (see [here](https://pytorch.org/docs/stable/nn.html#loss-functions) for a full list). For instance, for BCE, PyTorch has two modules: `nn.BCELoss()`, `nn.BCEWithLogitsLoss()`. While `nn.BCELoss` expects the inputs $x$ to be in the range $[0,1]$, i.e. the output of a sigmoid, `nn.BCEWithLogitsLoss` combines a sigmoid layer and the BCE loss in a single class. This version is numerically more stable than using a standard sigmoid followed by a BCE loss because of the logarithms applied in the loss function. Hence, it is best to use loss functions applied on logits where possible. For our model, we therefore use the module `nn.BCEWithLogitsLoss`. " ] }, { "cell_type": "code", "execution_count": null, "id": "b0e39ec6", "metadata": { "id": "b0e39ec6" }, "outputs": [], "source": [ "loss_module = nn.BCEWithLogitsLoss()" ] }, { "cell_type": "markdown", "id": "cefa4095", "metadata": { "id": "cefa4095" }, "source": [ "#### Stochastic Gradient Descent\n", "\n", "For updating the parameters, PyTorch provides the package `torch.optim` that has most popular optimizers implemented. We will discuss the specific optimizers and their differences later in the course, but will for now use the simplest of them: `torch.optim.SGD`. 
Stochastic Gradient Descent updates parameters by multiplying the gradients with a small constant, called learning rate, and subtracting those from the parameters (hence minimizing the loss). Therefore, we slowly move towards the direction of minimizing the loss. A good default value of the learning rate for a small network is 0.1. \n", "\n", "The input to the optimizer are the parameters of the model `model.parameters()`\n" ] }, { "cell_type": "code", "execution_count": null, "id": "65772a2e", "metadata": { "id": "65772a2e" }, "outputs": [], "source": [ "optimizer = torch.optim.SGD(model.parameters(), lr=0.1)" ] }, { "cell_type": "markdown", "id": "76006d49", "metadata": { "id": "76006d49" }, "source": [ "The optimizer provides two useful methods: `step()`, and `zero_grad()`. The `step()` method updates the parameters based on the gradients as explained above, while `zero_grad()` sets the gradients of all parameters to zero. While this function seems less relevant at first, it is a crucial step before performing backpropagation. If we would call the `backward` function on the loss while the parameter gradients are non-zero from the previous batch, the new gradients would actually be added to the previous ones instead of overwriting them. This is done because a parameter might occur multiple times in a computation graph, and we need to sum the gradients in this case instead of replacing them. Hence, remember to call `optimizer.zero_grad()` before calculating the gradients of a batch." ] }, { "cell_type": "markdown", "id": "1ad88a1c", "metadata": { "id": "1ad88a1c" }, "source": [ "### Training\n", "\n", "Finally, we are ready to train our model. As a first step, we create a slightly larger dataset and specify a data loader with a larger batch size. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "ef0df3d2", "metadata": { "id": "ef0df3d2" }, "outputs": [], "source": [ "train_dataset = XORDataset(size=2500)\n", "train_data_loader = data.DataLoader(train_dataset, batch_size=128, shuffle=True)" ] }, { "cell_type": "markdown", "id": "dd5b80f0", "metadata": { "id": "dd5b80f0" }, "source": [ "Now, we can write a small training function. Remember the five steps: \n", "\n", "1. load a batch\n", "2. obtain the predictions\n", "3. calculate the loss\n", "4. backpropagate\n", "5. update \n", " \n", "Additionally, we have to push all data and model parameters to the device of our choice (GPU if available). For our simple example, communicating the data to the GPU actually takes much more time than we could save from running the operation on GPU. For large networks, the communication time is significantly smaller than the actual runtime making a GPU crucial in these cases. Nevertheless, we will push the data to GPU here. " ] }, { "cell_type": "code", "execution_count": null, "id": "1e12ea55", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1e12ea55", "outputId": "f6a6608f-13bb-400b-e032-9fe8c426aa4c" }, "outputs": [], "source": [ "# Push model to device. Has to be only done once\n", "model.to(device)" ] }, { "cell_type": "markdown", "id": "a499ebb7", "metadata": { "id": "a499ebb7" }, "source": [ "In addition, we set our model to training mode. This is done by calling `model.train()`. There exist certain modules that need to perform a different forward step during training than during testing (e.g. `BatchNorm` and `Dropout`), and we can switch between them using `model.train()` and `model.eval()`." 
] }, { "cell_type": "code", "execution_count": null, "id": "b4aa1325", "metadata": { "id": "b4aa1325" }, "outputs": [], "source": [ "from tqdm import tqdm\n", "\n", "def train_model(model, optimizer, data_loader, loss_module, num_epochs=100):\n", "    # Set model to train mode\n", "    model.train()\n", "\n", "    # Training loop\n", "    for epoch in tqdm(range(num_epochs)):\n", "        for data_inputs, data_labels in data_loader:\n", "\n", "            ## Step 1: Move input data to device (only strictly necessary if we use GPU)\n", "            data_inputs = data_inputs.to(device)\n", "            data_labels = data_labels.to(device)\n", "\n", "            ## Step 2: Run the model on the input data\n", "            preds = model(data_inputs)\n", "            preds = preds.squeeze(dim=1) # Output is [Batch size, 1], but we want [Batch size]\n", "\n", "            ## Step 3: Calculate the loss\n", "            loss = loss_module(preds, data_labels.float())\n", "\n", "            ## Step 4: Perform backpropagation\n", "            # Before calculating the gradients, we need to ensure that they are all zero.\n", "            # Otherwise, the new gradients would be added to the existing ones instead of overwriting them.\n", "            optimizer.zero_grad()\n", "            # Perform backpropagation\n", "            loss.backward()\n", "\n", "            ## Step 5: Update the parameters\n", "            optimizer.step()" ] }, { "cell_type": "code", "execution_count": null, "id": "e6c86b96", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e6c86b96", "outputId": "6a521e6d-09fe-4e81-9916-b7ca277a1c6d" }, "outputs": [], "source": [ "train_model(model, optimizer, train_data_loader, loss_module)" ] }, { "cell_type": "markdown", "id": "0b862513", "metadata": { "id": "0b862513" }, "source": [ "#### Saving a model\n", "\n", "After we finish training a model, we serialize it to disk so that we can use the same weights to reconstruct the model when needed. For this, we extract the `state_dict` from the model, which contains all learnable parameters. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "d43f7552", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d43f7552", "outputId": "916eb44c-db76-46db-8e69-587029608aef" }, "outputs": [], "source": [ "state_dict = model.state_dict()\n", "print(state_dict)" ] }, { "cell_type": "markdown", "id": "88e0ab8b", "metadata": { "id": "88e0ab8b" }, "source": [ "To save the state dictionary, we can use `torch.save`:" ] }, { "cell_type": "code", "execution_count": null, "id": "d067493b", "metadata": { "id": "d067493b" }, "outputs": [], "source": [ "torch.save(state_dict, \"our_model.tar\")" ] }, { "cell_type": "markdown", "id": "adc2f9e2", "metadata": { "id": "adc2f9e2" }, "source": [ "To load a model from a state dict, we use the function `torch.load` to load the state dict from the disk, and the module function `load_state_dict` to overwrite our parameters with the new values:" ] }, { "cell_type": "code", "execution_count": null, "id": "e5ab5e1a", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e5ab5e1a", "outputId": "727287b1-e7d8-4277-ee64-719c76e3d9b4" }, "outputs": [], "source": [ "# Load state dict from the disk (make sure it is the same name as above)\n", "state_dict = torch.load(\"our_model.tar\")\n", "\n", "# Create a new model and load the state\n", "new_model = SimpleClassifier(num_inputs=2, num_hidden=4, num_outputs=1)\n", "new_model.load_state_dict(state_dict)\n", "\n", "# Verify that the parameters are the same\n", "print(\"Original model\\n\", model.state_dict())\n", "print(\"\\nLoaded model\\n\", new_model.state_dict())" ] }, { "cell_type": "markdown", "id": "3c9333b2", "metadata": { "id": "3c9333b2" }, "source": [ "A detailed tutorial on saving and loading models in PyTorch can be found [here](https://pytorch.org/tutorials/beginner/saving_loading_models.html)." 
] }, { "cell_type": "markdown", "id": "b9356299", "metadata": { "id": "b9356299" }, "source": [ "### Evaluation\n", "\n", "Once we have trained a model, it is time to evaluate it on a held-out test set. As our dataset consists of randomly generated data points, we first need to create a test set with a corresponding data loader." ] }, { "cell_type": "code", "execution_count": null, "id": "293a8e46", "metadata": { "id": "293a8e46" }, "outputs": [], "source": [ "test_dataset = XORDataset(size=500)\n", "test_data_loader = data.DataLoader(test_dataset, batch_size=128, shuffle=False, drop_last=False)" ] }, { "cell_type": "markdown", "id": "fc3c64bf", "metadata": { "id": "fc3c64bf" }, "source": [ "As a metric, we will use **accuracy**, which is calculated as follows:\n", "\n", "$$acc = \\frac{\\#\\text{correct predictions}}{\\#\\text{all predictions}} = \\frac{TP+TN}{TP+TN+FP+FN}$$\n", "\n", "where TP are the true positives, TN the true negatives, FP the false positives, and FN the false negatives. \n", "\n", "When evaluating the model, we don't need to keep track of the computation graph, as we don't intend to calculate the gradients. This reduces the required memory and speeds up the model. In PyTorch, we can temporarily deactivate the computation graph using the context manager `torch.no_grad()`. \n", "\n", "Remember to set the model to eval mode."
] }, { "cell_type": "code", "execution_count": null, "id": "dc8df44c", "metadata": { "id": "dc8df44c" }, "outputs": [], "source": [ "def eval_model(model, data_loader):\n", "    model.eval() # Set model to eval mode\n", "    true_preds, num_preds = 0., 0.\n", "\n", "    with torch.no_grad(): # Deactivate gradients for the following code\n", "        for data_inputs, data_labels in data_loader:\n", "\n", "            # Determine the model's predictions on the evaluation data\n", "            data_inputs, data_labels = data_inputs.to(device), data_labels.to(device)\n", "            preds = model(data_inputs)\n", "            preds = preds.squeeze(dim=1)\n", "            preds = torch.sigmoid(preds) # Sigmoid to map predictions between 0 and 1\n", "            pred_labels = (preds >= 0.5).long() # Binarize predictions to 0 and 1\n", "\n", "            # Keep records of predictions for the accuracy metric (true_preds=TP+TN, num_preds=TP+TN+FP+FN)\n", "            true_preds += (pred_labels == data_labels).sum()\n", "            num_preds += data_labels.shape[0]\n", "\n", "    acc = true_preds / num_preds\n", "    print(f\"Accuracy of the model: {100.0*acc:4.2f}%\")" ] }, { "cell_type": "code", "execution_count": null, "id": "0f7b7f70", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0f7b7f70", "outputId": "4f32c18a-3411-4382-f832-4512bd53f120" }, "outputs": [], "source": [ "eval_model(model, test_data_loader)" ] }, { "cell_type": "markdown", "id": "e682dcd2", "metadata": { "id": "e682dcd2" }, "source": [ "#### Visualizing classification boundaries\n", "\n", "To visualize what our model has learned, we can perform a prediction for every data point in a range of $[-0.5, 1.5]$, and visualize the predicted class as in the sample figure at the beginning of this section. This shows where the model has created decision boundaries, and which points would be classified as $0$ and which as $1$. We therefore get a background image made up of blue (class 0) and orange (class 1). In the spots where the model is uncertain, we will see a blurry overlap. 
The specific code is less relevant than the output figure, which should show a clear separation of classes:" ] }, { "cell_type": "code", "execution_count": null, "id": "ada53a67", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 460 }, "id": "ada53a67", "outputId": "d6e10c9d-5401-4655-c088-01d1ec19a21a" }, "outputs": [], "source": [ "from matplotlib.colors import to_rgba\n", "\n", "@torch.no_grad()\n", "def visualize_classification(model, data, label):\n", "    if isinstance(data, torch.Tensor):\n", "        data = data.cpu().numpy()\n", "    if isinstance(label, torch.Tensor):\n", "        label = label.cpu().numpy()\n", "    data_0 = data[label == 0]\n", "    data_1 = data[label == 1]\n", "\n", "    fig = plt.figure(figsize=(4,4), dpi=100)\n", "    plt.scatter(data_0[:,0], data_0[:,1], edgecolor=\"#333\", label=\"Class 0\")\n", "    plt.scatter(data_1[:,0], data_1[:,1], edgecolor=\"#333\", label=\"Class 1\")\n", "    plt.title(\"Dataset samples\")\n", "    plt.ylabel(r\"$x_2$\")\n", "    plt.xlabel(r\"$x_1$\")\n", "    plt.legend()\n", "\n", "    # Let's make use of a lot of operations we have learned above\n", "    model.to(device)\n", "    c0 = torch.Tensor(to_rgba(\"C0\")).to(device)\n", "    c1 = torch.Tensor(to_rgba(\"C1\")).to(device)\n", "    x1 = torch.arange(-0.5, 1.5, step=0.01, device=device)\n", "    x2 = torch.arange(-0.5, 1.5, step=0.01, device=device)\n", "    xx1, xx2 = torch.meshgrid(x1, x2) # Meshgrid function as in numpy\n", "    model_inputs = torch.stack([xx1, xx2], dim=-1)\n", "    preds = model(model_inputs)\n", "    preds = torch.sigmoid(preds)\n", "    output_image = (1 - preds) * c0[None,None] + preds * c1[None,None] # Specifying \"None\" in a dimension creates a new one\n", "    output_image = output_image.cpu().numpy() # Convert to numpy array. 
This only works for tensors on CPU, hence first push to CPU\n", "    plt.imshow(output_image, origin='lower', extent=(-0.5, 1.5, -0.5, 1.5))\n", "    plt.grid(False)\n", "    return fig\n", "\n", "_ = visualize_classification(model, dataset.data, dataset.label)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "fe247b77", "metadata": { "id": "fe247b77" }, "source": [ "The decision boundaries might not look exactly like those in the figure at the beginning of this section, which can be caused by running the code on a CPU or on a different GPU architecture. Nevertheless, the accuracy should be approximately the same. " ] }, { "cell_type": "markdown", "id": "283da4f0", "metadata": {}, "source": [ "## Exercise\n", "\n", "Build a multi-layer network to predict the grape varietal from the wine dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "0b4aab25", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "wine = pd.read_table(\"../data/wine.dat\", sep=r'\\s+')\n", "\n", "attributes = ['Alcohol',\n", "              'Malic acid',\n", "              'Ash',\n", "              'Alcalinity of ash',\n", "              'Magnesium',\n", "              'Total phenols',\n", "              'Flavanoids',\n", "              'Nonflavanoid phenols',\n", "              'Proanthocyanins',\n", "              'Color intensity',\n", "              'Hue',\n", "              'OD280/OD315 of diluted wines',\n", "              'Proline']\n", "\n", "grape = wine.pop('region')\n", "y = grape.values-1\n", "X = wine.values" ] }, { "cell_type": "code", "execution_count": null, "id": "74726b3a", "metadata": {}, "outputs": [], "source": [ "# Write your answer here" ] }, { "cell_type": "markdown", "id": "5b4b7b9f", "metadata": { "id": "5b4b7b9f" }, "source": [ "---\n", "## References\n", "\n", "- [Deep Learning with PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)\n", "- [University of Amsterdam Deep Learning Tutorials](https://uvadlc-notebooks.readthedocs.io/en/latest/)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "name": 
"Section7_2-Introduction-to-PyTorch.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "068c5d4019c740019f30c75afaf877c6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "0c08e5867eef48cfae0adf6a02915631": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "208cd522e37a4d4e8a23bb7af68cbbc5": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, 
"grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "3070edc6c79b40309c198b5c4845c569": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_208cd522e37a4d4e8a23bb7af68cbbc5", "placeholder": "​", "style": "IPY_MODEL_fc8f946314f641c5af76cf7d62c7b4db", "value": " 170499072/? 
[00:02<00:00, 66048766.89it/s]" } }, "378e399b9dde438e91ea576916fc7f5b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_7fa15f28033e4c0cbdca4d87019dde74", "max": 170498071, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_9ecef5efa63041599041d76ee295a8c8", "value": 170498071 } }, "3e0bd075e0174bc1bdcb137bafa82b80": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_9c92e8e0174c44388aaf7e50297e0d3b", "IPY_MODEL_378e399b9dde438e91ea576916fc7f5b", "IPY_MODEL_3070edc6c79b40309c198b5c4845c569" ], "layout": "IPY_MODEL_eccc0d3be2f84bd5954077b0c5ed0fee" } }, "7d4297ecd6bd41f58fa31e61c6cfa341": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, 
"grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "7fa15f28033e4c0cbdca4d87019dde74": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "7fce1614a72d47e7b0fd3d59c692e2cb": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": 
"@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "909d0ff2a5454898827ccb8b0c83f318": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "97433cc60d6841859129fcb83c67ec4b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_c41ea09afdd8433c9d4593ccf9a4d4e9", "IPY_MODEL_a797b8179d314847915848d12720352e", "IPY_MODEL_b6cbb990998e4ce890309fb7065803ff" ], "layout": "IPY_MODEL_909d0ff2a5454898827ccb8b0c83f318" } }, "9af871bc187c47b89603944054aac415": { "model_module": "@jupyter-widgets/base", "model_module_version": 
"1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "9c92e8e0174c44388aaf7e50297e0d3b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_e0537b86f5b64a098a72b5452093d378", "placeholder": "​", "style": "IPY_MODEL_0c08e5867eef48cfae0adf6a02915631", "value": "" } }, "9ecef5efa63041599041d76ee295a8c8": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", 
"_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "a797b8179d314847915848d12720352e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_9af871bc187c47b89603944054aac415", "max": 170498071, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_b528593e4a024f428b1a0a8fb49e2386", "value": 170498071 } }, "b528593e4a024f428b1a0a8fb49e2386": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "b6cbb990998e4ce890309fb7065803ff": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_7d4297ecd6bd41f58fa31e61c6cfa341", "placeholder": "​", "style": "IPY_MODEL_068c5d4019c740019f30c75afaf877c6", "value": " 170499072/? 
[00:02<00:00, 64265175.65it/s]" } }, "c41ea09afdd8433c9d4593ccf9a4d4e9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_f7325c3b35ad4414bd1c7ab3ffb7858f", "placeholder": "​", "style": "IPY_MODEL_7fce1614a72d47e7b0fd3d59c692e2cb", "value": "" } }, "e0537b86f5b64a098a72b5452093d378": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "eccc0d3be2f84bd5954077b0c5ed0fee": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": 
"1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "f7325c3b35ad4414bd1c7ab3ffb7858f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, 
"right": null, "top": null, "visibility": null, "width": null } }, "fc8f946314f641c5af76cf7d62c7b4db": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } } } } }, "nbformat": 4, "nbformat_minor": 5 }