{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"2022-01-22-torch.ipynb","provenance":[{"file_id":"https://github.com/recohut/nbs/blob/main/raw/T206654%20%7C%20PyTorch%20Fundamentals%20Part%203.ipynb","timestamp":1644661382940}],"collapsed_sections":["o955FXZZkCEk","wMJvhqEB5iC-","qQHxydHZ5qaV","nHyMLmcM5iDE","OtZrzIZJ5iDH","j-AfPHnK5iDI","mQB4RAJh5iDK","CxTuMi0B5iDL","YeIFAlZj5iDM","X1rQSGrW5iDM","UpSbWJFB5iDO","WD2vgrnLlJLn"],"authorship_tag":"ABX9TyMYLi3jz7aoJGE1dY9UNTyw"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"GvJoPkNIlpiY"},"source":["# PyTorch Fundamentals"]},{"cell_type":"markdown","metadata":{"id":"L_1mg9UYj7Q2"},"source":["## Tensors"]},{"cell_type":"markdown","metadata":{"id":"V1JS0B802NXL"},"source":["Before we dive deep into the world of PyTorch development, it’s important to familiarize yourself with the fundamental data structure in PyTorch: the tensor (torch.Tensor). Tensors are how PyTorch handles and stores data, and since deep learning is fundamentally the collection and manipulation of floating-point numbers, understanding them will also help you see how PyTorch implements more advanced deep learning functions. In addition, you may find yourself using tensor operations frequently when preprocessing input data or manipulating output data during model development."]},{"cell_type":"markdown","metadata":{"id":"dmq42z9n2qPS"},"source":["In PyTorch, a tensor is a data structure used to store and manipulate data. Like a NumPy array, a tensor is a multidimensional array containing elements of a single data type. Tensors can represent scalars, vectors, matrices, and n-dimensional arrays, and every tensor is an instance of the torch.Tensor class. However, tensors are more than just arrays of numbers. 
Creating a tensor gives you access to the attributes and methods of the torch.Tensor class, which provide a rich set of built-in capabilities. This guide describes many of these attributes and operations."]},{"cell_type":"markdown","metadata":{"id":"ClxVmz532rjS"},"source":["Tensors also offer benefits that make them better suited than NumPy arrays for deep learning calculations. First, tensor operations can run significantly faster with GPU acceleration. Second, tensors can be stored and manipulated at scale using distributed processing across multiple CPUs, GPUs, and servers. Third, tensors can record the operations performed on them in a computation graph, which lets PyTorch compute gradients automatically (autograd), a core requirement for a deep learning library."]},{"cell_type":"markdown","metadata":{"id":"mW6c7P-N22oi"},"source":["**Simple example**"]},{"cell_type":"markdown","metadata":{"id":"JddfsjK83LfC"},"source":["First, we import the PyTorch library and create two tensors, x and y, from two-dimensional lists. Next, we add the two tensors and store the result in z. We can use the + operator directly because the torch.Tensor class overloads it. Finally, we print the new tensor z, which is the elementwise sum of x and y, and we print the size of z. 
Notice that z is a tensor object itself and the size() method is used to return its matrix dimensions, namely 2 × 3:"]},{"cell_type":"code","metadata":{"id":"FsxVCauz3MKH"},"source":["import torch\n","\n","x = torch.tensor([[1,2,3],[4,5,6]])\n","y = torch.tensor([[7,8,9],[10,11,12]])\n","z = x + y"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"mEIMUuTr3Udo","executionInfo":{"status":"ok","timestamp":1631129024511,"user_tz":-330,"elapsed":467,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"e8894882-b7a2-4255-a9dd-400e80ec6d03"},"source":["print(z)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[ 8, 10, 12],\n"," [14, 16, 18]])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"GXEbbu4M3TNI","executionInfo":{"status":"ok","timestamp":1631129038084,"user_tz":-330,"elapsed":425,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"c4fec653-9e02-468f-83de-f1317127d6f2"},"source":["print(z.size())"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["torch.Size([2, 3])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":885},"id":"WyeCg7C73eiY","executionInfo":{"status":"ok","timestamp":1631129159852,"user_tz":-330,"elapsed":541,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"fa75bf8e-78fc-4201-f7ce-50913cf15b61"},"source":["', '.join(dir(z))"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"},"text/plain":["'T, __abs__, __add__, __and__, __array__, __array_priority__, __array_wrap__, __bool__, __class__, __complex__, __contains__, __deepcopy__, __delattr__, __delitem__, __dict__, __dir__, __div__, 
__doc__, __eq__, __float__, __floordiv__, __format__, __ge__, __getattribute__, __getitem__, __gt__, __hash__, __iadd__, __iand__, __idiv__, __ifloordiv__, __ilshift__, __imod__, __imul__, __index__, __init__, __init_subclass__, __int__, __invert__, __ior__, __ipow__, __irshift__, __isub__, __iter__, __itruediv__, __ixor__, __le__, __len__, __long__, __lshift__, __lt__, __matmul__, __mod__, __module__, __mul__, __ne__, __neg__, __new__, __nonzero__, __or__, __pos__, __pow__, __radd__, __rdiv__, __reduce__, __reduce_ex__, __repr__, __reversed__, __rfloordiv__, __rmul__, __rpow__, __rshift__, __rsub__, __rtruediv__, __setattr__, __setitem__, __setstate__, __sizeof__, __str__, __sub__, __subclasshook__, __torch_function__, __truediv__, __weakref__, __xor__, _backward_hooks, _base, _cdata, _coalesced_, _dimI, _dimV, _grad, _grad_fn, _indices, _is_view, _make_subclass, _nnz, _reduce_ex_internal, _to_sparse_csr, _update_names, _values, _version, abs, abs_, absolute, absolute_, acos, acos_, acosh, acosh_, add, add_, addbmm, addbmm_, addcdiv, addcdiv_, addcmul, addcmul_, addmm, addmm_, addmv, addmv_, addr, addr_, align_as, align_to, all, allclose, amax, amin, angle, any, apply_, arccos, arccos_, arccosh, arccosh_, arcsin, arcsin_, arcsinh, arcsinh_, arctan, arctan_, arctanh, arctanh_, argmax, argmin, argsort, as_strided, as_strided_, as_subclass, asin, asin_, asinh, asinh_, atan, atan2, atan2_, atan_, atanh, atanh_, backward, baddbmm, baddbmm_, bernoulli, bernoulli_, bfloat16, bincount, bitwise_and, bitwise_and_, bitwise_not, bitwise_not_, bitwise_or, bitwise_or_, bitwise_xor, bitwise_xor_, bmm, bool, broadcast_to, byte, cauchy_, cdouble, ceil, ceil_, cfloat, char, cholesky, cholesky_inverse, cholesky_solve, chunk, clamp, clamp_, clamp_max, clamp_max_, clamp_min, clamp_min_, clip, clip_, clone, coalesce, col_indices, conj, contiguous, copy_, copysign, copysign_, cos, cos_, cosh, cosh_, count_nonzero, cpu, cross, crow_indices, cuda, cummax, cummin, cumprod, cumprod_, 
cumsum, cumsum_, data, data_ptr, deg2rad, deg2rad_, dense_dim, dequantize, det, detach, detach_, device, diag, diag_embed, diagflat, diagonal, diff, digamma, digamma_, dim, dist, div, div_, divide, divide_, dot, double, dsplit, dtype, eig, element_size, eq, eq_, equal, erf, erf_, erfc, erfc_, erfinv, erfinv_, exp, exp2, exp2_, exp_, expand, expand_as, expm1, expm1_, exponential_, fill_, fill_diagonal_, fix, fix_, flatten, flip, fliplr, flipud, float, float_power, float_power_, floor, floor_, floor_divide, floor_divide_, fmax, fmin, fmod, fmod_, frac, frac_, frexp, gather, gcd, gcd_, ge, ge_, geometric_, geqrf, ger, get_device, grad, grad_fn, greater, greater_, greater_equal, greater_equal_, gt, gt_, half, hardshrink, has_names, heaviside, heaviside_, histc, hsplit, hypot, hypot_, i0, i0_, igamma, igamma_, igammac, igammac_, imag, index_add, index_add_, index_copy, index_copy_, index_fill, index_fill_, index_put, index_put_, index_select, indices, inner, int, int_repr, inverse, is_coalesced, is_complex, is_contiguous, is_cuda, is_distributed, is_floating_point, is_leaf, is_meta, is_mkldnn, is_mlc, is_nonzero, is_pinned, is_quantized, is_same_size, is_set_to, is_shared, is_signed, is_sparse, is_sparse_csr, is_vulkan, is_xpu, isclose, isfinite, isinf, isnan, isneginf, isposinf, isreal, istft, item, kron, kthvalue, layout, lcm, lcm_, ldexp, ldexp_, le, le_, lerp, lerp_, less, less_, less_equal, less_equal_, lgamma, lgamma_, log, log10, log10_, log1p, log1p_, log2, log2_, log_, log_normal_, log_softmax, logaddexp, logaddexp2, logcumsumexp, logdet, logical_and, logical_and_, logical_not, logical_not_, logical_or, logical_or_, logical_xor, logical_xor_, logit, logit_, logsumexp, long, lstsq, lt, lt_, lu, lu_solve, map2_, map_, masked_fill, masked_fill_, masked_scatter, masked_scatter_, masked_select, matmul, matrix_exp, matrix_power, max, maximum, mean, median, min, minimum, mm, mode, moveaxis, movedim, msort, mul, mul_, multinomial, multiply, multiply_, mv, mvlgamma, 
mvlgamma_, name, names, nan_to_num, nan_to_num_, nanmedian, nanquantile, nansum, narrow, narrow_copy, ndim, ndimension, ne, ne_, neg, neg_, negative, negative_, nelement, new, new_empty, new_empty_strided, new_full, new_ones, new_tensor, new_zeros, nextafter, nextafter_, nonzero, norm, normal_, not_equal, not_equal_, numel, numpy, orgqr, ormqr, outer, output_nr, permute, pin_memory, pinverse, polygamma, polygamma_, positive, pow, pow_, prelu, prod, put, put_, q_per_channel_axis, q_per_channel_scales, q_per_channel_zero_points, q_scale, q_zero_point, qr, qscheme, quantile, rad2deg, rad2deg_, random_, ravel, real, reciprocal, reciprocal_, record_stream, refine_names, register_hook, reinforce, relu, relu_, remainder, remainder_, rename, rename_, renorm, renorm_, repeat, repeat_interleave, requires_grad, requires_grad_, reshape, reshape_as, resize, resize_, resize_as, resize_as_, retain_grad, roll, rot90, round, round_, rsqrt, rsqrt_, scatter, scatter_, scatter_add, scatter_add_, select, set_, sgn, sgn_, shape, share_memory_, short, sigmoid, sigmoid_, sign, sign_, signbit, sin, sin_, sinc, sinc_, sinh, sinh_, size, slogdet, smm, softmax, solve, sort, sparse_dim, sparse_mask, sparse_resize_, sparse_resize_and_clear_, split, split_with_sizes, sqrt, sqrt_, square, square_, squeeze, squeeze_, sspaddmm, std, stft, storage, storage_offset, storage_type, stride, sub, sub_, subtract, subtract_, sum, sum_to_size, svd, swapaxes, swapaxes_, swapdims, swapdims_, symeig, t, t_, take, take_along_dim, tan, tan_, tanh, tanh_, tensor_split, tile, to, to_dense, to_mkldnn, to_sparse, tolist, topk, trace, transpose, transpose_, triangular_solve, tril, tril_, triu, triu_, true_divide, true_divide_, trunc, trunc_, type, type_as, unbind, unflatten, unfold, uniform_, unique, unique_consecutive, unsafe_chunk, unsafe_split, unsafe_split_with_sizes, unsqueeze, unsqueeze_, values, var, vdot, view, view_as, vsplit, where, xlogy, xlogy_, xpu, 
zero_'"]},"metadata":{},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"_dYo0ElY3gmG"},"source":["**Running it on a GPU (if available)**"]},{"cell_type":"code","metadata":{"id":"t7tC4u3r4Fl_"},"source":["device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n","\n","x = torch.tensor([[1,2,3],[4,5,6]],\n"," device=device)\n","y = torch.tensor([[7,8,9],[10,11,12]],\n"," device=device)\n","z = x + y"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"__m8emIe4o2F","executionInfo":{"status":"ok","timestamp":1631129347120,"user_tz":-330,"elapsed":10,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"4efd6e86-e5ad-48ed-8c85-b6562e61a108"},"source":["print(z)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[ 8, 10, 12],\n"," [14, 16, 18]])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"zjYvTp7c4p8V","executionInfo":{"status":"ok","timestamp":1631129356196,"user_tz":-330,"elapsed":456,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"331b0120-3803-40ff-944d-557bc90b65fd"},"source":["print(z.size())"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["torch.Size([2, 3])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"G8z472dV4sLM","executionInfo":{"status":"ok","timestamp":1631129360441,"user_tz":-330,"elapsed":505,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"3a4bee36-05c4-4acc-9590-1bfa2fa2d8d9"},"source":["print(z.device)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["cpu\n"]}]},{"cell_type":"markdown","metadata":{"id":"gkJvL2pC4tMn"},"source":["The previous section showed a simple way to create tensors; however, there are many 
other ways to do it. You can create tensors from preexisting numeric data or generate random samples. Tensors can be created from preexisting data stored in array-like structures such as lists, tuples, scalars, and NumPy arrays, or loaded from serialized data files.\n\nThe following code illustrates some common ways to create tensors. First, it shows how to create a tensor from a list using torch.tensor(). The same function also accepts other data structures, such as tuples or NumPy arrays (but not sets):"]},{"cell_type":"code","metadata":{"id":"alVKYnj45bT9"},"source":["import numpy \n","\n","# Created from pre-existing arrays\n","w = torch.tensor([1,2,3]) # <1>\n","w = torch.tensor((1,2,3)) # <2>\n","w = torch.tensor(numpy.array([1,2,3])) # <3>\n","\n","# Initialized by size\n","w = torch.empty(100,200) # <4>\n","w = torch.zeros(100,200) # <5>\n","w = torch.ones(100,200) # <6>\n","\n","# Initialized by size with random values\n","w = torch.rand(100,200) # <7>\n","w = torch.randn(100,200) # <8>\n","w = torch.randint(5,10,(100,200)) # <9> \n","\n","# Initialized with specified data type or device\n","w = torch.empty((100,200), dtype=torch.float64, \n"," device=\"cpu\")\n","\n","# Initialized to have same size, data type, \n","# and device as another tensor\n","x = torch.empty_like(w)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"2LVb-lUd5pLa"},"source":["1. from a list\n","2. from a tuple\n","3. from a NumPy array\n","4. uninitialized; element values are not predictable\n","5. all elements initialized to 0.0\n","6. all elements initialized to 1.0\n","7. creates a 100 × 200 tensor with elements from a uniform distribution on the interval [0, 1)\n","8. elements are random numbers from a normal distribution with mean 0 and variance 1\n","9. 
elements are random integers from 5 (inclusive) up to 10 (exclusive)"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Xdx5vlig_kIn","executionInfo":{"status":"ok","timestamp":1631131339172,"user_tz":-330,"elapsed":619,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"ebc0a95a-8a12-4b1d-acaa-11d079b00ba2"},"source":["x = torch.tensor([[1,2,3],[4,5,6]])\n","\n","print(torch.empty_like(x))\n","print(torch.empty_like(x))\n","print(torch.zeros_like(x))\n","print(torch.ones_like(x))\n","\n","print(torch.full_like(x, fill_value=5))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[94291637941632, 2, 3],\n"," [ 4, 5, 6]])\n","tensor([[94291637941632, 2, 3],\n"," [ 4, 5, 6]])\n","tensor([[0, 0, 0],\n"," [0, 0, 0]])\n","tensor([[1, 1, 1],\n"," [1, 1, 1]])\n","tensor([[5, 5, 5],\n"," [5, 5, 5]])\n"]}]},{"cell_type":"markdown","metadata":{"id":"SN3rMtBV6HE4"},"source":["The following table lists PyTorch functions used to create tensors. Call each one from the torch namespace, e.g., torch.empty()."]},{"cell_type":"markdown","metadata":{"id":"HuBlN-LC6njl"},"source":["| Function | Description |\n","| -------- | ----------- |\n","| torch.tensor(data, dtype=None, device=None,
requires_grad=False, pin_memory=False) | Creates a tensor from an existing data structure |\n","| torch.empty(*size, out=None, dtype=None,
layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with uninitialized elements; their values are whatever data happens to be in memory |\n","| torch.zeros(*size, out=None, dtype=None,
layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with all elements initialized to 0.0 |\n","| torch.ones(*size, out=None, dtype=None,
layout=torch.strided, device=None, requires_grad=False) | Creates a tensor with all elements initialized to 1.0 |\n","| torch.arange(start=0, end, step=1, out=None,
dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a 1D tensor of values over a range with a common step value |\n","| torch.linspace(start, end, steps=100,
out=None, dtype=None, layout=torch.strided,
device=None, requires_grad=False) | Creates a 1D tensor of linearly spaced points between the start and end |\n","| torch.logspace(start, end, steps=100,
base=10.0, out=None, dtype=None,
layout=torch.strided, device=None, requires_grad=False) | Creates a 1D tensor of points evenly spaced on a logarithmic scale, from base^start to base^end |\n","| torch.eye(n, m=None, out=None, dtype=None,
layout=torch.strided, device=None, requires_grad=False) | Creates a 2D tensor with ones on the diagonal and zeros everywhere else |\n","| torch.full(size, fill_value, out=None,
dtype=None, layout=torch.strided, device=None, requires_grad=False) | Creates a tensor filled with fill_value |\n","| torch.load(f) | Loads an object, such as a tensor, that was saved with torch.save() |\n","| torch.save(obj, f) | Saves an object, such as a tensor, to a file using pickle-based serialization |"]},{"cell_type":"markdown","metadata":{"id":"d7p4SR2x7TYN"},"source":["During deep learning development, it’s important to be aware of the data types used by your data and its calculations, so you should control which data types are used when you create tensors. As mentioned previously, all elements of a tensor have the same data type. You can specify the data type when creating a tensor by using the dtype parameter, or you can cast a tensor to a new dtype using the appropriate casting method or the to() method, as shown in the following code:"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"mLRaLgB_93Og","executionInfo":{"status":"ok","timestamp":1631130783444,"user_tz":-330,"elapsed":525,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"24ef8ed3-944d-469c-ff3a-aa9b1ec6de1c"},"source":["# Specify data type at creation using dtype\n","w = torch.tensor([1,2,3], dtype=torch.float32)\n","\n","# Use a casting method to cast to a new data type\n","w.int() # returns a new int32 tensor; w itself remains float32\n","w = w.int() # reassigning w makes it int32\n","\n","# Use the to() method to cast to a new type\n","w = w.to(torch.float64) # dtype passed positionally\n","w = w.to(dtype=torch.float64) # dtype passed as a keyword\n","\n","# PyTorch automatically promotes data types during operations\n","x = torch.tensor([1,2,3], dtype=torch.int32)\n","y = torch.tensor([1,2,3], dtype=torch.float32)\n","z = x + y # int32 + float32 is promoted to float32\n","print(z.dtype)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["torch.float32\n"]}]},{"cell_type":"markdown","metadata":{"id":"trT-uz9G_HV8"},"source":["The table below lists the most commonly used data types in PyTorch. 
Each data type corresponds to a different tensor type depending on the tensor’s device; this is the name reported by the tensor’s type() method. The corresponding tensor types are shown in the two rightmost columns for CPUs and GPUs, respectively."]},{"cell_type":"markdown","metadata":{"id":"D5zCAiIQ-Inm"},"source":["| Data type | dtype | CPU tensor | GPU tensor |\n","| --------- | ----- | ---------- | ---------- |\n","| 32-bit floating point (default) | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |\n","| 64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |\n","| 16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |\n","| 8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |\n","| 8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |\n","| 16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |\n","| 32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |\n","| 64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |\n","| Boolean | torch.bool | torch.BoolTensor | torch.cuda.BoolTensor |"]},{"cell_type":"markdown","metadata":{"id":"8AjkaSqO-7Zu"},"source":["**Indexing, Slicing, Combining, and Splitting Tensors**"]},{"cell_type":"markdown","metadata":{"id":"m_VAewgyAjwx"},"source":["Once you have created tensors, you may want to access portions of the data and combine or split tensors to form new tensors. The following code demonstrates how to perform these types of operations. 
You can slice and index tensors in the same way you would slice and index NumPy arrays."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"uQGhWD-6AlPD","executionInfo":{"status":"ok","timestamp":1631131457846,"user_tz":-330,"elapsed":4,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"d4e71867-d856-4022-e32c-812db216e17a"},"source":["x = torch.tensor([[1,2],[3,4],[5,6],[7,8]])\n","x"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["tensor([[1, 2],\n"," [3, 4],\n"," [5, 6],\n"," [7, 8]])"]},"metadata":{},"execution_count":22}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7O6X3OYuAtOa","executionInfo":{"status":"ok","timestamp":1631131458570,"user_tz":-330,"elapsed":7,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"89f2c247-2efe-48c5-fdff-1009791d1ca7"},"source":["# Indexing, returns a tensor\n","print(x[1,1])"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor(4)\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"-LbU5OomAthN","executionInfo":{"status":"ok","timestamp":1631131478134,"user_tz":-330,"elapsed":807,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"91f2a05b-c509-4295-e596-76a2e511907c"},"source":["# Indexing, returns a value as a Python number\n","print(x[1,1].item())"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["4\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ptvE3KWXAyKE","executionInfo":{"status":"ok","timestamp":1631131493805,"user_tz":-330,"elapsed":451,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"45ac3bac-dce8-4cde-b604-459b1e086b29"},"source":["# 
Slicing\n","print(x[:2,1])"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([2, 4])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"42-DNNI4A2Df","executionInfo":{"status":"ok","timestamp":1631131513968,"user_tz":-330,"elapsed":612,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"02a75a93-d252-4ea5-90a4-bf0bcf4b38aa"},"source":["# Boolean indexing\n","# Only keep elements less than 5\n","print(x[x<5])"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([1, 2, 3, 4])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nuG5jm8BA682","executionInfo":{"status":"ok","timestamp":1631131530965,"user_tz":-330,"elapsed":510,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"02be09e8-4f1a-45d8-b662-28b7cfb3130b"},"source":["# Transpose array; x.t() or x.T can be used\n","print(x.t())"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[1, 3, 5, 7],\n"," [2, 4, 6, 8]])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"zHp9ZqpHA_HI","executionInfo":{"status":"ok","timestamp":1631131548192,"user_tz":-330,"elapsed":10,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"4e0fe869-b7f9-42c2-c83e-51a49c08f2b0"},"source":["# Change shape; view() never copies data but requires a compatible\n","# memory layout, while reshape() copies only when it has to\n","print(x.view((2,4)))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[1, 2, 3, 4],\n"," [5, 6, 7, 8]])\n"]}]},{"cell_type":"markdown","metadata":{"id":"fCZINmpeBDO1"},"source":["You can also combine or split tensors by using functions like torch.stack() and torch.unbind(), respectively, as shown in the following 
code:"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Wu2qkNljBQlP","executionInfo":{"status":"ok","timestamp":1631131625385,"user_tz":-330,"elapsed":446,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"0d3d90ef-0c3c-4d6a-f42a-f295b76be533"},"source":["# Combining tensors\n","y = torch.stack((x, x))\n","print(y)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[[1, 2],\n"," [3, 4],\n"," [5, 6],\n"," [7, 8]],\n","\n"," [[1, 2],\n"," [3, 4],\n"," [5, 6],\n"," [7, 8]]])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"W9IKh1U-Bsmn","executionInfo":{"status":"ok","timestamp":1631131718764,"user_tz":-330,"elapsed":10,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"a07d0050-9ad3-4442-9854-7570cd066ece"},"source":["x"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["tensor([[1, 2],\n"," [3, 4],\n"," [5, 6],\n"," [7, 8]])"]},"metadata":{},"execution_count":31}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Zxyw4MgwBWMR","executionInfo":{"status":"ok","timestamp":1631131697996,"user_tz":-330,"elapsed":890,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"1736171c-1bfa-416d-8736-a15b5dc155c8"},"source":["# Splitting tensors\n","a,b = x.unbind(dim=1)\n","print(a,b)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([1, 3, 5, 7]) tensor([2, 4, 6, 8])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"85HKy4e_Bn0V","executionInfo":{"status":"ok","timestamp":1631131777258,"user_tz":-330,"elapsed":621,"user":{"displayName":"Sparsh 
Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"9226f00b-ed00-459b-db86-e35a8bfa4ebb"},"source":["# Splitting tensors\n","a,b,c,d = x.unbind(dim=0)\n","print(a,b,c,d)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([1, 2]) tensor([3, 4]) tensor([5, 6]) tensor([7, 8])\n"]}]},{"cell_type":"markdown","metadata":{"id":"w5GOH2K8BvZ5"},"source":["PyTorch provides a robust set of built-in functions that can be used to access, split, and combine tensors in different ways. The table below lists some commonly used functions for manipulating tensor elements."]},{"cell_type":"markdown","metadata":{"id":"rJUrzJA8CWDm"},"source":["| Function | Description |\n","| -------- | ----------- |\n","| torch.**cat**() | Concatenates the given sequence of tensors along the given dimension. |\n","| torch.**chunk**() | Splits a tensor into a given number of chunks. Each chunk is a view of the input tensor. |\n","| torch.**gather**() | Gathers values along an axis specified by the dimension. |\n","| torch.**index\\_select**() | Returns a new tensor that indexes the input tensor along a dimension using the entries in the index, which is a LongTensor. |\n","| torch.**masked\\_select**() | Returns a new 1D tensor that indexes the input tensor according to the Boolean mask, which is a BoolTensor. |\n","| torch.**narrow**() | Returns a narrowed view of the input tensor: a slice of a given length along one dimension, starting at a given index. |\n","| torch.**nonzero**() | Returns the indices of nonzero elements. |\n","| torch.**reshape**() | Returns a tensor with the same data and number of elements as the input tensor, but a different shape.
Use view() instead if the tensor must not be copied. |\n","| torch.**split**() | Splits the tensor into chunks. Each chunk is a view or subdivision of the original tensor. |\n","| torch.**squeeze**() | Returns a tensor with all the dimensions of the input tensor of size 1 removed. |\n","| torch.**stack**() | Concatenates a sequence of tensors along a new dimension. |\n","| torch.**t**() | Expects the input to be a 2D tensor and transposes dimensions 0 and 1. |\n","| torch.**take**() | Returns a 1D tensor of the elements at the given indices, treating the input as if it were flattened to 1D. |\n","| torch.**transpose**() | Transposes only the specified dimensions. |\n","| torch.**unbind**() | Removes a tensor dimension and returns a tuple of all slices along that dimension. |\n","| torch.**unsqueeze**() | Returns a new tensor with a dimension of size 1 inserted at the specified position. |\n","| torch.**where**() | Returns a tensor of elements selected from either of two tensors, depending on the specified condition. |"]},{"cell_type":"markdown","metadata":{"id":"lJLaGnC7C3xR"},"source":["Deep learning development relies heavily on mathematical computations, so PyTorch provides a comprehensive set of built-in math functions. Whether you are creating new data transforms, customizing loss functions, or building your own optimization algorithms, you can speed up your research and development with the math functions provided by PyTorch."]},{"cell_type":"markdown","metadata":{"id":"OsVAAMcODZ2L"},"source":["PyTorch supports many different types of math functions, including pointwise operations, reduction functions, comparison calculations, and linear algebra operations, as well as spectral and other math computations. The first category of math operations we’ll look at is pointwise operations. Pointwise operations apply an operation to each element of the tensor individually and return a new tensor.\n","\n","They are useful for rounding and truncation as well as trigonometric and logical operations. 
By default, the functions create a new tensor or write into one passed in with the out parameter. If you want to perform an in-place operation, append an underscore to the method name (for example, x.add_(y) instead of x.add(y)).\n","\n","The table below lists some commonly used pointwise operations."]},{"cell_type":"markdown","metadata":{"id":"hNIlm_R5DlCy"},"source":["| Operation type | Sample functions |\n","| -------------- | ---------------- |\n","| Basic math | add(), div(), mul(), neg(), reciprocal(), true\\_divide() |\n","| Truncation | ceil(), clamp(), floor(), floor\\_divide(), fmod(), frac(), lerp(), remainder(), round(), sigmoid(), trunc() |\n","| Complex numbers | abs(), angle(), conj(), imag(), real() |\n","| Trigonometry | acos(), asin(), atan(), cos(), cosh(), deg2rad(), rad2deg(), sin(), sinh(), tan(), tanh() |\n","| Exponents and logarithms | exp(), expm1(), log(), log10(), log1p(), log2(), logaddexp(), pow(), rsqrt(), sqrt(), square() |\n","| Logical | logical\\_and(), logical\\_not(), logical\\_or(), logical\\_xor() |\n","| Fused add with multiply/divide | addcdiv(), addcmul() |\n","| Bitwise operators | bitwise\\_not(), bitwise\\_and(), bitwise\\_or(), bitwise\\_xor() |\n","| Error functions | erf(), erfc(), erfinv() |\n","| Gamma functions | digamma(), lgamma(), mvlgamma(), polygamma() |"]},{"cell_type":"markdown","metadata":{"id":"6SfQFUgAD4sS"},"source":["The second category of math functions we’ll look at is reduction operations. Reduction operations reduce many values to a single value or a smaller set of values; that is, they reduce the dimensionality or rank of the tensor. Reduction operations include functions for finding maximum or minimum values as well as many statistical calculations, like finding the mean or standard deviation.\n","\n","These operations are frequently used in deep learning. 
For example, classification models often use the argmax() function to reduce softmax outputs to the index of the predicted class."]},{"cell_type":"markdown","metadata":{"id":"5F3pQs2vFAqX"},"source":["| Function | Description |\n","| -------- | ----------- |\n","| torch.**argmax**(_input, dim, keepdim=False, out=None_) | Returns the index(es) of the maximum value across all elements, or just a dimension if it’s specified |\n","| torch.**argmin**(_input, dim, keepdim=False, out=None_) | Returns the index(es) of the minimum value across all elements, or just a dimension if it’s specified |\n","| torch.**dist**(_input, other, p=2_) | Computes the _p_\\-norm of the difference between two tensors |\n","| torch.**logsumexp**(_input, dim, keepdim=False, out=None_) | Computes the log of the summed exponentials of the input tensor along the given dimension |\n","| torch.**mean**(_input, dim, keepdim=False, out=None_) | Computes the mean or average across all elements, or just a dimension if it’s specified |\n","| torch.**median**(_input, dim, keepdim=False, out=None_) | Computes the median or middle value across all elements, or just a dimension if it’s specified |\n","| torch.**mode**(_input, dim=-1, keepdim=False, out=None_) | Computes the mode or most frequent value along a dimension (the last one by default) |\n","| torch.**norm**(_input, p='fro', dim=None, keepdim=False, out=None, dtype=None_) | Computes the matrix or vector norm across all elements, or just a dimension if it’s specified |\n","| torch.**prod**(_input, dim, keepdim=False, dtype=None_) | Computes the product of all elements, or just a dimension if it’s specified |\n","| torch.**std**(_input, dim, keepdim=False, out=None_) | Computes the standard deviation across all elements, or just a dimension if it’s specified |\n","| torch.**std\\_mean**(_input, unbiased=True_) | Computes the standard deviation and mean across all elements, or just a dimension if it’s specified |\n","| 
torch.**sum**(_input, dim, keepdim=False, out=None_) | Computes the sum of all elements, or just a dimension if it’s specified |\n","| torch.**unique**(_input, sorted=True, return\\_inverse=False, return\\_counts=False, dim=None_) | Removes duplicates across the entire tensor, or just a dimension if it’s specified |\n","| torch.**unique\\_consecutive**(_input, return\\_inverse=False, return\\_counts=False, dim=None_) | Similar to torch.unique() but only removes consecutive duplicates |\n","| torch.**var**(_input, dim, keepdim=False, out=None_) | Computes the variance across all elements, or just a dimension if it’s specified |\n","| torch.**var\\_mean**(_input, dim, keepdim=False, out=None_) | Computes the mean and variance across all elements, or just a dimension if it’s specified |"]},{"cell_type":"markdown","metadata":{"id":"L52k6ec3FQde"},"source":["Note that many of these functions accept the dim parameter, which specifies the dimension of reduction for multidimensional tensors. This is similar to the axis parameter in NumPy. By default, when dim is not specified, the reduction occurs across all dimensions. For a 2D tensor, specifying dim=1 reduces across the columns, producing one result per row. For example, torch.mean(x, 1) computes the mean of each row of tensor x."]},{"cell_type":"markdown","metadata":{"id":"1chfJ5HJF1Bp"},"source":["> Tip: It’s common to chain methods together. For example, torch.rand(2,2).max().item() creates a 2 × 2 tensor of random floats, finds the maximum value, and returns the value itself from the resulting tensor."]},{"cell_type":"markdown","metadata":{"id":"qUE7FYCCF3JW"},"source":["Next, we’ll look at PyTorch’s comparison functions. Comparison functions usually compare all the values within a tensor, or compare one tensor’s values to another’s. They can return a tensor of Booleans based on each element’s value, as torch.eq() and torch.isnan() do. 
There are also functions to find the maximum or minimum value, sort tensor values, return the top subset of tensor elements, and more.\n","\n","The table below lists some commonly used comparison functions."]},{"cell_type":"markdown","metadata":{"id":"YXtj9JNOF-mn"},"source":["| Operation type | Sample functions |\n","| -------------- | ---------------- |\n","| Compare a tensor to other tensors | eq(), ge(), gt(), le(), lt(), ne() or \\==, \\>=, \\>, <=, <, !=, respectively |\n","| Test tensor status or conditions | isclose(), isfinite(), isinf(), isnan() |\n","| Return a single Boolean for the entire tensor | allclose(), equal() |\n","| Find value(s) over the entire tensor or along a given dimension | argsort(), kthvalue(), max(), min(), sort(), topk() |"]},{"cell_type":"markdown","metadata":{"id":"2rqA_1urGJRf"},"source":["The next category of math functions we’ll look at is linear algebra. Linear algebra functions implement matrix operations and are central to deep learning computations.\n","\n","Many computations, including gradient descent and optimization algorithms, use linear algebra to implement their calculations. 
PyTorch supports a robust set of built-in linear algebra operations, many of which are based on the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) standardized libraries."]},{"cell_type":"markdown","metadata":{"id":"vYenKx13GSTS"},"source":["| Function | Description |\n","| -------- | ----------- |\n","| torch.**matmul**() | Computes a matrix product of two tensors; supports broadcasting |\n","| torch.**chain\\_matmul**() | Computes a matrix product of _N_ tensors |\n","| torch.**mm**() | Computes a matrix product of two tensors (if broadcasting is required, use matmul()) |\n","| torch.**addmm**() | Computes a matrix product of two tensors and adds it to the input |\n","| torch.**bmm**() | Computes a batch of matrix products |\n","| torch.**addbmm**() | Computes a batch of matrix products, sums them, and adds the result to the input |\n","| torch.**baddbmm**() | Computes a batch of matrix products and adds it to the input batch |\n","| torch.**mv**() | Computes the product of a matrix and a vector |\n","| torch.**addmv**() | Computes the product of a matrix and a vector and adds it to the input |\n","| torch.**matrix\\_power**() | Raises a square matrix (or batch of square matrices) to the integer power _n_ |\n","| torch.**eig**() | Finds the eigenvalues and eigenvectors of a real square tensor (deprecated in newer releases in favor of torch.linalg.eig()) |\n","| torch.**inverse**() | Computes the inverse of a square tensor |\n","| torch.**det**() | Computes the determinant of a matrix or batch of matrices |\n","| torch.**logdet**() | Computes the log determinant of a matrix or batch of matrices |\n","| torch.**dot**() | Computes the inner product of two 1D tensors |\n","| torch.**addr**() | Computes the outer product of two vectors and adds it to the input |\n","| torch.**solve**() | Returns the solution to a system of linear equations (deprecated in newer releases in favor of torch.linalg.solve()) |\n","| torch.**svd**() | Performs a singular value decomposition |\n","| torch.**pca\\_lowrank**() | Performs a linear principal component analysis |\n","| torch.**cholesky**() | Computes a Cholesky 
decomposition |\n","| torch.**cholesky\\_inverse**() | Computes the inverse of a symmetric positive-definite matrix from its Cholesky factor |\n","| torch.**cholesky\\_solve**() | Solves a system of linear equations using the Cholesky factor |"]},{"cell_type":"markdown","metadata":{"id":"rmCdezGDGc-1"},"source":["The final category of math operations we’ll consider is spectral and other math operations. Depending on the domain of interest, these functions may be useful for data transforms or analysis. For example, spectral operations like the fast Fourier transform (FFT) can play an important role in computer vision or digital signal processing applications."]},{"cell_type":"markdown","metadata":{"id":"e4pOe7roGrGN"},"source":["| Operation type | Sample functions |\n","| -------------- | ---------------- |\n","| Fast, inverse, and short-time Fourier transforms | fft(), ifft(), stft() |\n","| Real-to-complex FFT and complex-to-real inverse FFT (IFFT) | rfft(), irfft() |\n","| Windowing algorithms | bartlett\\_window(), blackman\\_window(), hamming\\_window(), hann\\_window() |\n","| Histogram and bin counts | histc(), bincount() |\n","| Cumulative operations | cummax(), cummin(), cumprod(), cumsum(), trace() (sum of the diagonal), 
einsum() (sum of products using Einstein summation) |\n","| Distance and normalization functions | cdist(), renorm() |\n","| Cross product, dot product, and Cartesian product | cross(), tensordot(), cartesian\\_prod() |\n","| Functions that create a diagonal tensor with elements of the input tensor | diag(), diag\\_embed(), diag\\_flat(), diagonal() |\n","| Einstein summation | einsum() |\n","| Matrix reduction and restructuring functions | flatten(), flip(), rot90(), repeat\\_interleave(), meshgrid(), roll(), combinations() |\n","| Functions that return the lower or upper triangles and their indices | tril(), tril\\_indices(), triu(), triu\\_indices() |"]},{"cell_type":"markdown","metadata":{"id":"xo2INk3MG-yP"},"source":["One function, backward(), deserves special mention because it’s what makes PyTorch so powerful for deep learning development. The backward() function uses PyTorch’s automatic differentiation package, torch.autograd, to compute gradients of tensors by applying the chain rule. In the following cells, f is the sum of the squares of the elements of x, so calling f.backward() stores the gradient df/dx = 2x in x.grad."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0y6hSl9PHPwh","executionInfo":{"status":"ok","timestamp":1631133192775,"user_tz":-330,"elapsed":522,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"71d6cb87-37f6-414b-8427-2dfeb7cb75c5"},"source":["x = torch.tensor([[1,2,3],[4,5,6]], \n"," dtype=torch.float, requires_grad=True)\n","print(x)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[1., 2., 3.],\n"," [4., 5., 6.]], requires_grad=True)\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ZzqsahapHWKV","executionInfo":{"status":"ok","timestamp":1631133204109,"user_tz":-330,"elapsed":425,"user":{"displayName":"Sparsh 
Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"3a474ad1-549f-4497-c43f-b6acf9b708a3"},"source":["f = 
x.pow(2).sum()\n","print(f)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor(91., grad_fn=<SumBackward0>)\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"aajVVfvhHU1k","executionInfo":{"status":"ok","timestamp":1631133219903,"user_tz":-330,"elapsed":634,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"33b1db4f-a128-47f6-8475-edc6d9c05283"},"source":["f.backward()\n","print(x.grad) # df/dx = 2x"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["tensor([[ 2., 4., 6.],\n"," [ 8., 10., 12.]])\n"]}]},{"cell_type":"markdown","metadata":{"id":"o955FXZZkCEk"},"source":["## Gradient Descent"]},{"cell_type":"markdown","metadata":{"id":"JJlouCu108PF"},"source":["We'll implement the basic functions of the gradient descent algorithm to find the decision boundary for a small dataset. First, we'll write some helper functions to plot and visualize the data."]},{"cell_type":"code","metadata":{"id":"NSmwqCcu1L6W"},"source":["import matplotlib.pyplot as plt\n","import numpy as np\n","import pandas as pd\n","\n","# Some helper functions for plotting and drawing lines\n","\n","def plot_points(X, y):\n"," admitted = X[np.argwhere(y==1)]\n"," rejected = X[np.argwhere(y==0)]\n"," plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected], s = 25, color = 'blue', edgecolor = 'k')\n"," plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted], s = 25, color = 'red', edgecolor = 'k')\n","\n","def display(m, b, color='g--'):\n"," plt.xlim(-0.05,1.05)\n"," plt.ylim(-0.05,1.05)\n"," x = np.arange(-10, 10, 0.1)\n"," plt.plot(x, m*x+b, color)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":265},"id":"ZRHt-Mj01Slt","executionInfo":{"status":"ok","timestamp":1631195611817,"user_tz":-330,"elapsed":525,"user":{"displayName":"Sparsh 
Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"00c2c3d3-31ee-4e01-80f9-0f0aa9b4c2ca"},"source":["data = pd.read_csv('https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/intro-neural-networks/gradient-descent/data.csv', header=None)\n","X = np.array(data[[0,1]])\n","y = np.array(data[2])\n","plot_points(X,y)\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"BvrTusgE12Xe"},"source":["- Sigmoid activation function\n","\n","$$\\sigma(x) = \\frac{1}{1+e^{-x}}$$\n","\n","- Output (prediction) formula\n","\n","$$\\hat{y} = \\sigma(w_1 x_1 + w_2 x_2 + b)$$\n","\n","- Error function\n","\n","$$Error(y, \\hat{y}) = - y \\log(\\hat{y}) - (1-y) \\log(1-\\hat{y})$$\n","\n","- The function that updates the weights\n","\n","$$ w_i \\longrightarrow w_i + \\alpha (y - \\hat{y}) x_i$$\n","\n","$$ b \\longrightarrow b + \\alpha (y - \\hat{y})$$"]},{"cell_type":"code","metadata":{"id":"v7_iirSK1b2M"},"source":["# Activation (sigmoid) function\n","def sigmoid(x):\n"," return 1 / (1 + np.exp(-x))\n","\n","# Output (prediction) formula\n","def output_formula(features, weights, bias):\n"," return sigmoid(np.dot(features, weights) + bias)\n","\n","# Error (log-loss) formula\n","def error_formula(y, output):\n"," return - y*np.log(output) - (1 - y) * np.log(1-output)\n","\n","# Gradient descent step\n","def update_weights(x, y, weights, bias, learnrate):\n"," output = output_formula(x, weights, bias)\n"," d_error = y - output\n"," weights += learnrate * d_error * x\n"," bias += learnrate * d_error\n"," return weights, bias"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"g55v6K862D3T"},"source":["The following training function runs the gradient descent algorithm over all the data for a given number of epochs. 
It will also plot the data, and some of the boundary lines obtained as we run the algorithm."]},{"cell_type":"code","metadata":{"id":"_jJ3uDFO2Gq9"},"source":["np.random.seed(44)\n","\n","epochs = 100\n","learnrate = 0.01\n","\n","def train(features, targets, epochs, learnrate, graph_lines=False):\n"," \n"," errors = []\n"," n_records, n_features = features.shape\n"," last_loss = None\n"," weights = np.random.normal(scale=1 / n_features**.5, size=n_features)\n"," bias = 0\n"," for e in range(epochs):\n"," del_w = np.zeros(weights.shape)\n"," for x, y in zip(features, targets):\n"," weights, bias = update_weights(x, y, weights, bias, learnrate)\n"," \n"," # Printing out the log-loss error on the training set\n"," out = output_formula(features, weights, bias)\n"," loss = np.mean(error_formula(targets, out))\n"," errors.append(loss)\n"," if e % (epochs / 10) == 0:\n"," print(\"\\n========== Epoch\", e,\"==========\")\n"," if last_loss and last_loss < loss:\n"," print(\"Train loss: \", loss, \" WARNING - Loss Increasing\")\n"," else:\n"," print(\"Train loss: \", loss)\n"," last_loss = loss\n"," \n"," # Converting the output (float) to boolean as it is a binary classification\n"," # e.g. 
0.95 --> True (= 1), 0.31 --> False (= 0)\n"," predictions = out > 0.5\n"," \n"," accuracy = np.mean(predictions == targets)\n"," print(\"Accuracy: \", accuracy)\n"," if graph_lines and e % (epochs / 100) == 0:\n"," display(-weights[0]/weights[1], -bias/weights[1])\n"," \n","\n"," # Plotting the solution boundary\n"," plt.title(\"Solution boundary\")\n"," display(-weights[0]/weights[1], -bias/weights[1], 'black')\n","\n"," # Plotting the data\n"," plot_points(features, targets)\n"," plt.show()\n","\n"," # Plotting the error\n"," plt.title(\"Error Plot\")\n"," plt.xlabel('Number of epochs')\n"," plt.ylabel('Error')\n"," plt.plot(errors)\n"," plt.show()"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"tDM8b5Kg2-9T"},"source":["When we run the function, we'll obtain the following:\n","- 10 updates with the current training loss and accuracy\n","- A plot of the data and some of the boundary lines obtained. The final one is in black. Notice how the lines get closer and closer to the best fit, as we go through more epochs.\n","- A plot of the error function. 
Notice how it decreases as we go through more epochs."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"A_zVS6ES2_KU","executionInfo":{"status":"ok","timestamp":1631196445953,"user_tz":-330,"elapsed":1818,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"f86b820e-f754-4b38-cb81-aecc6e226c97"},"source":["train(X, y, epochs, learnrate, True)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","========== Epoch 0 ==========\n","Train loss: 0.7135845195381634\n","Accuracy: 0.4\n","\n","========== Epoch 10 ==========\n","Train loss: 0.6225835210454962\n","Accuracy: 0.59\n","\n","========== Epoch 20 ==========\n","Train loss: 0.5548744083669508\n","Accuracy: 0.74\n","\n","========== Epoch 30 ==========\n","Train loss: 0.501606141872473\n","Accuracy: 0.84\n","\n","========== Epoch 40 ==========\n","Train loss: 0.4593334641861401\n","Accuracy: 0.86\n","\n","========== Epoch 50 ==========\n","Train loss: 0.42525543433469987\n","Accuracy: 0.93\n","\n","========== Epoch 60 ==========\n","Train loss: 0.3973461571671399\n","Accuracy: 0.93\n","\n","========== Epoch 70 ==========\n","Train loss: 0.3741469765239074\n","Accuracy: 0.93\n","\n","========== Epoch 80 ==========\n","Train loss: 0.35459973368161973\n","Accuracy: 0.94\n","\n","========== Epoch 90 ==========\n","Train loss: 0.3379273658879921\n","Accuracy: 0.94\n"]},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"wMJvhqEB5iC-"},"source":["## Predicting Student Admissions with Neural Networks\n","In this section, we predict student admissions to graduate school at UCLA based on three pieces of data:\n","- GRE Scores (Test)\n","- GPA Scores (Grades)\n","- Class rank (1-4)\n","\n","The dataset originally came from: http://www.ats.ucla.edu/"]},{"cell_type":"markdown","metadata":{"id":"qQHxydHZ5qaV"},"source":["### Loading the data"]},{"cell_type":"code","metadata":{"id":"X2hOA87D5iDB","colab":{"base_uri":"https://localhost:8080/","height":204},"executionInfo":{"status":"ok","timestamp":1631196738357,"user_tz":-330,"elapsed":521,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"e6239a19-beb5-488d-a8e3-09927caaec92"},"source":["# Importing pandas and numpy\n","import pandas as pd\n","import numpy as np\n","\n","# Reading the csv file into a pandas DataFrame\n","data = pd.read_csv('https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/intro-neural-networks/student-admissions/student_data.csv')\n","\n","# Printing out the first 5 rows of our data\n","data.head()"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" admit gre gpa rank\n","0 0 380 3.61 3\n","1 1 660 3.67 3\n","2 1 800 4.00 1\n","3 1 640 3.19 4\n","4 0 520 2.93 4"]},"metadata":{},"execution_count":1}]},{"cell_type":"markdown","metadata":{"id":"nHyMLmcM5iDE"},"source":["### Plotting the data\n","\n","First, let's plot our data to see what it looks like. To get a 2D plot, let's ignore the rank."]},{"cell_type":"code","metadata":{"id":"lOVx1Waa5iDF","colab":{"base_uri":"https://localhost:8080/","height":279},"executionInfo":{"status":"ok","timestamp":1631196744289,"user_tz":-330,"elapsed":674,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"e83bb84c-0446-4354-aa89-9a3c7aa8288e"},"source":["# %matplotlib inline\n","import matplotlib.pyplot as plt\n","\n","# Function to help us plot\n","def plot_points(data):\n"," X = np.array(data[[\"gre\",\"gpa\"]])\n"," y = np.array(data[\"admit\"])\n"," admitted = X[np.argwhere(y==1)]\n"," rejected = X[np.argwhere(y==0)]\n"," plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected], s = 25, color = 'red', edgecolor = 'k')\n"," plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted], s = 25, color = 'cyan', edgecolor = 'k')\n"," plt.xlabel('Test (GRE)')\n"," plt.ylabel('Grades (GPA)')\n"," \n","# Plotting the points\n","plot_points(data)\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"WsL43x5v5iDG"},"source":["Roughly, students with high grades and test scores were admitted, while those with low scores were not, but the data is not as cleanly separable as we had hoped. Maybe taking the rank into account would help. Let's make four plots, one for each rank."]},{"cell_type":"code","metadata":{"id":"XNtmwfBS5iDG","colab":{"base_uri":"https://localhost:8080/","height":1000},"executionInfo":{"status":"ok","timestamp":1631196748158,"user_tz":-330,"elapsed":1230,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"b52a77a1-58c1-4764-de1d-f644c9367d43"},"source":["# Separating the ranks\n","data_rank1 = data[data[\"rank\"]==1]\n","data_rank2 = data[data[\"rank\"]==2]\n","data_rank3 = data[data[\"rank\"]==3]\n","data_rank4 = data[data[\"rank\"]==4]\n","\n","# Plotting the graphs\n","plot_points(data_rank1)\n","plt.title(\"Rank 1\")\n","plt.show()\n","plot_points(data_rank2)\n","plt.title(\"Rank 2\")\n","plt.show()\n","plot_points(data_rank3)\n","plt.title(\"Rank 3\")\n","plt.show()\n","plot_points(data_rank4)\n","plt.title(\"Rank 4\")\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}},{"output_type":"display_data","data":{"image/png":"\n","text/plain":["
"]},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"OtZrzIZJ5iDH"},"source":["This looks more promising, as it seems that the lower the rank, the higher the acceptance rate. Let's use the rank as one of our inputs. In order to do this, we should one-hot encode it.\n","\n","### One-hot encoding the rank\n","Use the `get_dummies` function in pandas in order to one-hot encode the data.\n","\n","Hint: To drop a column, it's suggested that you use `one_hot_data`[.drop( )](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html)."]},{"cell_type":"code","metadata":{"id":"JH_R-AIm5iDI","colab":{"base_uri":"https://localhost:8080/","height":359},"executionInfo":{"status":"ok","timestamp":1631196791193,"user_tz":-330,"elapsed":498,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"03a91117-96b5-4ae0-f241-af27b22ad1d5"},"source":["# Make dummy variables for rank\n","one_hot_data = pd.concat([data, pd.get_dummies(data['rank'], prefix='rank')], axis=1)\n","\n","# Drop the previous rank column\n","one_hot_data = one_hot_data.drop('rank', axis=1)\n","\n","# Print the first 10 rows of our data\n","one_hot_data[:10]"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" admit gre gpa rank_1 rank_2 rank_3 rank_4\n","0 0 380 3.61 0 0 1 0\n","1 1 660 3.67 0 0 1 0\n","2 1 800 4.00 1 0 0 0\n","3 1 640 3.19 0 0 0 1\n","4 0 520 2.93 0 0 0 1\n","5 1 760 3.00 0 1 0 0\n","6 1 560 2.98 1 0 0 0\n","7 0 400 3.08 0 1 0 0\n","8 1 540 3.39 0 0 1 0\n","9 0 700 3.92 0 1 0 0"]},"metadata":{},"execution_count":4}]},{"cell_type":"markdown","metadata":{"id":"j-AfPHnK5iDI"},"source":["### Scaling the data\n","The next step is to scale the data. The range for grades is 1.0-4.0, whereas the range for test scores is roughly 200-800, which is much larger. Features on such different scales make training harder for a neural network. Let's map both features into the range 0-1 by dividing the grades by 4.0 and the test scores by 800."]},{"cell_type":"code","metadata":{"id":"X68T9T2f5iDJ","colab":{"base_uri":"https://localhost:8080/","height":359},"executionInfo":{"status":"ok","timestamp":1631196829494,"user_tz":-330,"elapsed":525,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"bac0abcf-e771-4ed3-c2e7-ae2bf9b6038c"},"source":["# Copying our data explicitly so that one_hot_data is left unchanged\n","processed_data = one_hot_data.copy()\n","\n","# Scaling the columns\n","processed_data['gre'] = processed_data['gre']/800\n","processed_data['gpa'] = processed_data['gpa']/4.0\n","processed_data[:10]"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" admit gre gpa rank_1 rank_2 rank_3 rank_4\n","0 0 0.475 0.9025 0 0 1 0\n","1 1 0.825 0.9175 0 0 1 0\n","2 1 1.000 1.0000 1 0 0 0\n","3 1 0.800 0.7975 0 0 0 1\n","4 0 0.650 0.7325 0 0 0 1\n","5 1 0.950 0.7500 0 1 0 0\n","6 1 0.700 0.7450 1 0 0 0\n","7 0 0.500 0.7700 0 1 0 0\n","8 1 0.675 0.8475 0 0 1 0\n","9 0 0.875 0.9800 0 1 0 0"]},"metadata":{},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"mQB4RAJh5iDK"},"source":["### Splitting the data into Training and Testing"]},{"cell_type":"markdown","metadata":{"id":"2nm16-HL5iDK"},"source":["In order to test our algorithm, we'll split the data into a Training and a Testing set. The size of the testing set will be 10% of the total data."]},{"cell_type":"code","metadata":{"id":"m2MzzBT65iDK","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1631196836661,"user_tz":-330,"elapsed":430,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"98650c0d-8e50-4003-b1b2-de9d1653e363"},"source":["sample = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)\n","train_data, test_data = processed_data.iloc[sample], processed_data.drop(sample)\n","\n","print(\"Number of training samples is\", len(train_data))\n","print(\"Number of testing samples is\", len(test_data))\n","print(train_data[:10])\n","print(test_data[:10])"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Number of training samples is 360\n","Number of testing samples is 40\n"," admit gre gpa rank_1 rank_2 rank_3 rank_4\n","99 0 0.500 0.8275 0 0 1 0\n","194 1 0.750 0.8675 0 1 0 0\n","61 0 0.700 0.8300 0 0 0 1\n","302 1 0.500 0.7875 0 1 0 0\n","394 1 0.575 0.9975 0 0 1 0\n","62 0 0.800 0.9175 0 0 1 0\n","345 0 0.625 0.7575 0 0 1 0\n","344 0 0.650 0.8375 0 0 1 0\n","68 0 0.725 0.9225 1 0 0 0\n","314 0 0.675 0.8650 0 0 0 1\n"," admit gre gpa rank_1 rank_2 rank_3 rank_4\n","5 1 0.950 
0.7500 0 1 0 0\n","13 0 0.875 0.7700 0 1 0 0\n","23 0 0.850 0.7975 0 0 0 1\n","26 1 0.775 0.9025 1 0 0 0\n","30 0 0.675 0.9450 0 0 0 1\n","58 0 0.500 0.9125 0 1 0 0\n","72 0 0.600 0.8475 0 0 0 1\n","74 0 0.900 0.8625 0 0 0 1\n","75 0 0.900 1.0000 0 0 1 0\n","80 0 0.875 0.7250 0 0 0 1\n"]}]},{"cell_type":"markdown","metadata":{"id":"CxTuMi0B5iDL"},"source":["### Splitting the data into features and targets (labels)\n","Now, as a final step before the training, we'll split the data into features (X) and targets (y)."]},{"cell_type":"code","metadata":{"id":"fHTwjUce5iDL","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1631196840524,"user_tz":-330,"elapsed":417,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"4dd5991a-a39b-4a27-d890-c004409ba32b"},"source":["features = train_data.drop('admit', axis=1)\n","targets = train_data['admit']\n","features_test = test_data.drop('admit', axis=1)\n","targets_test = test_data['admit']\n","\n","print(features[:10])\n","print(targets[:10])"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" gre gpa rank_1 rank_2 rank_3 rank_4\n","99 0.500 0.8275 0 0 1 0\n","194 0.750 0.8675 0 1 0 0\n","61 0.700 0.8300 0 0 0 1\n","302 0.500 0.7875 0 1 0 0\n","394 0.575 0.9975 0 0 1 0\n","62 0.800 0.9175 0 0 1 0\n","345 0.625 0.7575 0 0 1 0\n","344 0.650 0.8375 0 0 1 0\n","68 0.725 0.9225 1 0 0 0\n","314 0.675 0.8650 0 0 0 1\n","99 0\n","194 1\n","61 0\n","302 1\n","394 1\n","62 0\n","345 0\n","344 0\n","68 0\n","314 0\n","Name: admit, dtype: int64\n"]}]},{"cell_type":"markdown","metadata":{"id":"YeIFAlZj5iDM"},"source":["### Training the 1-layer Neural Network\n","The following function trains the 1-layer neural network. 
\n","First, we'll write some helper functions."]},{"cell_type":"code","metadata":{"id":"obL2qqRx5iDM"},"source":["# Activation (sigmoid) function\n","def sigmoid(x):\n"," return 1 / (1 + np.exp(-x))\n","\n","def sigmoid_prime(x):\n"," return sigmoid(x) * (1-sigmoid(x))\n"," \n","def error_formula(y, output):\n"," return - y*np.log(output) - (1 - y) * np.log(1-output)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"X1rQSGrW5iDM"},"source":["### Backpropagate the error\n","The error term (the negative gradient of the loss with respect to the weights, for a single record) is $$ (y-\\hat{y})x $$ for the binary cross-entropy loss and \n","$$ (y-\\hat{y})\\hat{y}(1-\\hat{y})x $$ for the squared error $\\frac{1}{2}(y-\\hat{y})^2$, where $\\hat{y}(1-\\hat{y}) = \\sigma'(h)$ is the sigmoid derivative evaluated at $h = w \\cdot x$. "]},{"cell_type":"code","metadata":{"id":"6l8htF5M5iDN"},"source":["def error_term_formula(x, y, output):\n","# for binary cross entropy loss\n"," return (y - output)*x\n","# for squared error: output*(1 - output) equals sigmoid'(h), with h = np.dot(x, weights)\n","# return (y - output)*output*(1 - output)*x"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"xkPopj_E5iDN","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1631196887826,"user_tz":-330,"elapsed":3917,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"9eb95e1e-bd44-4592-815d-8c00e5863c42"},"source":["# Neural Network hyperparameters\n","epochs = 1000\n","learnrate = 0.0001\n","\n","# Training function\n","def train_nn(features, targets, epochs, learnrate):\n"," \n"," # Use the same seed to make debugging easier\n"," np.random.seed(42)\n","\n"," n_records, n_features = features.shape\n"," last_loss = None\n","\n"," # Initialize weights\n"," weights = np.random.normal(scale=1 / n_features**.5, size=n_features)\n","\n"," for e in range(epochs):\n"," del_w = np.zeros(weights.shape)\n"," for x, y in zip(features.values, targets):\n"," # Loop through all records, x is the input, y is the target\n","\n"," # Activation of the output unit\n"," # Notice we 
multiply the inputs and the weights here \n"," # rather than storing h as a separate variable \n"," output = sigmoid(np.dot(x, weights))\n","\n"," # The error term\n"," error_term = error_term_formula(x, y, output)\n","\n"," # The gradient descent step, the error times the gradient times the inputs\n"," del_w += error_term\n","\n"," # Update the weights here. The learning rate times the \n"," # change in weights\n"," # don't have to divide by n_records since it is compensated by the learning rate\n"," weights += learnrate * del_w #/ n_records \n","\n"," # Printing out the mean square error on the training set\n"," if e % (epochs / 10) == 0:\n"," out = sigmoid(np.dot(features, weights))\n"," loss = np.mean(error_formula(targets, out))\n"," print(\"Epoch:\", e)\n"," if last_loss and last_loss < loss:\n"," print(\"Train loss: \", loss, \" WARNING - Loss Increasing\")\n"," else:\n"," print(\"Train loss: \", loss)\n"," last_loss = loss\n"," print(\"=========\")\n"," print(\"Finished training!\")\n"," return weights\n"," \n","weights = train_nn(features, targets, epochs, learnrate)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Epoch: 0\n","Train loss: 0.7554101695287188\n","=========\n","Epoch: 100\n","Train loss: 0.6257618907012589\n","=========\n","Epoch: 200\n","Train loss: 0.6138277106962134\n","=========\n","Epoch: 300\n","Train loss: 0.6106428180453167\n","=========\n","Epoch: 400\n","Train loss: 0.6086718721854811\n","=========\n","Epoch: 500\n","Train loss: 0.607187860874402\n","=========\n","Epoch: 600\n","Train loss: 0.606032020277616\n","=========\n","Epoch: 700\n","Train loss: 0.6051189345648322\n","=========\n","Epoch: 800\n","Train loss: 0.6043881919205627\n","=========\n","Epoch: 900\n","Train loss: 0.6037952479383182\n","=========\n","Finished training!\n"]}]},{"cell_type":"markdown","metadata":{"id":"UpSbWJFB5iDO"},"source":["### Calculating the Accuracy on the Test 
Data"]},{"cell_type":"code","metadata":{"id":"AReeZ67y5iDO","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1631196888241,"user_tz":-330,"elapsed":6,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"2339ead3-1b6e-4fee-98a7-3f7920ceb00e"},"source":["# Calculate accuracy on test data\n","test_out = sigmoid(np.dot(features_test, weights))\n","predictions = test_out > 0.5\n","accuracy = np.mean(predictions == targets_test)\n","print(\"Prediction accuracy: {:.3f}\".format(accuracy))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Prediction accuracy: 0.625\n"]}]},{"cell_type":"markdown","metadata":{"id":"WD2vgrnLlJLn"},"source":["## Vision model on CIFAR image dataset"]},{"cell_type":"markdown","metadata":{"id":"xCdsm69B-zyJ"},"source":["You’ll build a deep learning model and train the model using a common training loop structure. Then, you’ll test your model’s performance and tweak hyperparameters to improve your results and training speed. Finally, we’ll explore ways to deploy your model to prototype systems or production."]},{"cell_type":"markdown","metadata":{"id":"X_NpMaWa-0It"},"source":["First, we load the CIFAR-10 image data and convert it to numeric values in the form of tensors. The tensors will act as inputs during the model training stage; however, before they are passed in, the tensors are usually preprocessed via transforms and grouped into batches for better training performance. Thus, the data preparation stage takes generic data and converts it to batches of tensors that can be passed into your NN model."]},{"cell_type":"markdown","metadata":{"id":"w3DtG7yY_FSm"},"source":["Next, in the model experimentation and development stage, we will design an NN model, train the model with our training data, test its performance, and optimize our hyperparameters to improve performance to a desired level. 
To do so, we will separate our dataset into three parts: one for training, one for validation, and one for testing. We’ll design an NN model and train its parameters with our training data. PyTorch provides elegantly designed modules and classes in the torch.nn module to help you create and train your NNs. We will define a loss function and optimizer from a selection of the many built-in PyTorch functions. Then we’ll perform backpropagation and update the model parameters in our training loop."]},{"cell_type":"markdown","metadata":{"id":"MDCzltgR_Wf7"},"source":["Within each epoch, we’ll also validate our model by passing in validation data, measuring performance, and potentially tuning hyperparameters. Finally, we’ll test our model by passing in test data and measuring the model’s performance against unseen data. In practice, validation and test loops may be optional, but we show them here for completeness."]},{"cell_type":"markdown","metadata":{"id":"JJltU6wr_h7o"},"source":["The last stage of deep learning model development is the model deployment stage. In this stage, we have a fully trained model—so what do we do with it? If you are a deep learning research scientist conducting experiments, you may want to simply save the model to a file and load it for further research and experimentation, or you may want to provide access to it via a repository like PyTorch Hub. You may also want to deploy it to an edge device or local server to demonstrate a prototype or a proof of concept.\n","\n","On the other hand, if you are a software developer or systems engineer, you may want to deploy your model to a product or service. In this case, you can deploy your model to a production environment on a cloud server or deploy it to an edge device or mobile phone. When deploying trained models, the model often requires additional postprocessing. For example, you may classify a batch of images, but you only want to report the most confident result. 
The model deployment stage also handles any postprocessing that is needed to go from your model’s output values to the final solution."]},{"cell_type":"markdown","metadata":{"id":"Ml-b4gkD_xYS"},"source":["PyTorch provides powerful built-in classes and utilities, such as the Dataset, DataLoader, and Sampler classes, for loading various types of data. The Dataset class defines how to access and preprocess data from a file or data sources. The Sampler class defines how to sample data from a dataset in order to create batches, while the DataLoader class combines a dataset with a sampler and allows you to iterate over a set of batches."]},{"cell_type":"code","metadata":{"id":"pIC4ZPIOATAb"},"source":["import torch\n","import torchvision\n","\n","from torchvision.datasets import CIFAR10"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":83,"referenced_widgets":["3b1bbde53e234e2ea985f0f07877c792","0a799c17626b43b8b38b79dacde55151","dfee4821887947a08e77015f08dbb098","d21e616effd549e88a3b6ae5a9111dd3","b6b4b2c8fb56426391335802bb4a308b","e1ea823e09fc45dc8ae837c9f125f3f3","ba713e8d5e2a4fbd985879ace34a8ab2","f40e9824762146c8b7da2a0fa7c40899","85df6f0c4928469dbed5c8eb554e612e","5e10a87722e747c3861e535c1bbcb6ee","0a9fb274ebdb43d5901e59bd3c9caacb"]},"id":"d-kAioq6BZMk","executionInfo":{"status":"ok","timestamp":1631165207095,"user_tz":-330,"elapsed":7346,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"70f85ff8-da99-413f-fc9a-570e6f7f73d3"},"source":["train_data = CIFAR10(root=\"./train/\",\n"," train=True, \n"," download=True)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to 
./train/cifar-10-python.tar.gz\n"]},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"3b1bbde53e234e2ea985f0f07877c792","version_minor":0,"version_major":2},"text/plain":[" 0%| | 0/170498071 [00:00\n","2\n","\n","\n","\n","6\n","frog\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":83,"referenced_widgets":["81e9d38f7d484dd790e7b6927de2b447","be1fa3882b9646aa84f518b34b4382a8","4986bf68729f42a283f2ff29fb07e293","ce6f449b2b94421c9d71c9c01d9c0026","6d82a6732a28409aadba1a17802e0a58","c239f2f37cee48e68ec47b676dd5be3b","aaf297df8b214dabb0b7cfc4153b25e3","566005b140a843ec95618ad33f3ee20b","295277ab3eb84e4180f04e3cd3c49f1e","433630fb825a4f518a328b59d93a70ad","1063afb7a51f4e7bb8c48189cac7a4b5"]},"id":"X8PkwxC_Bbar","executionInfo":{"status":"ok","timestamp":1631165383490,"user_tz":-330,"elapsed":6130,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"3ddb8d77-9593-4b61-e987-fa18fb6dd55c"},"source":["test_data = CIFAR10(root=\"./test/\", \n"," train=False, \n"," download=True)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./test/cifar-10-python.tar.gz\n"]},{"output_type":"display_data","data":{"application/vnd.jupyter.widget-view+json":{"model_id":"81e9d38f7d484dd790e7b6927de2b447","version_minor":0,"version_major":2},"text/plain":[" 0%| | 0/170498071 [00:00\n","torch.Size([3, 32, 32])\n","tensor([[[-2.4291, -2.4291, -2.4291, ..., -2.4291, -2.4291, -2.4291],\n"," [-2.4291, -2.4291, -2.4291, ..., -2.4291, -2.4291, -2.4291],\n"," [-2.4291, -2.4291, -2.4291, ..., -2.4291, -2.4291, -2.4291],\n"," ...,\n"," [-2.4291, -2.4291, -2.4291, ..., 0.2073, 0.0328, -0.0835],\n"," [-2.4291, -2.4291, -2.4291, ..., 0.3430, 0.0910, 0.1297],\n"," [-2.4291, -2.4291, -2.4291, ..., 0.4593, 0.2267, 0.3430]],\n","\n"," [[-2.4183, -2.4183, -2.4183, ..., 
-2.4183, -2.4183, -2.4183],\n"," [-2.4183, -2.4183, -2.4183, ..., -2.4183, -2.4183, -2.4183],\n"," [-2.4183, -2.4183, -2.4183, ..., -2.4183, -2.4183, -2.4183],\n"," ...,\n"," [-2.4183, -2.4183, -2.4183, ..., -0.2156, -0.3532, -0.6089],\n"," [-2.4183, -2.4183, -2.4183, ..., -0.3532, -0.5892, -0.6482],\n"," [-2.4183, -2.4183, -2.4183, ..., -0.2746, -0.4319, -0.3139]],\n","\n"," [[-2.2214, -2.2214, -2.2214, ..., -2.2214, -2.2214, -2.2214],\n"," [-2.2214, -2.2214, -2.2214, ..., -2.2214, -2.2214, -2.2214],\n"," [-2.2214, -2.2214, -2.2214, ..., -2.2214, -2.2214, -2.2214],\n"," ...,\n"," [-2.2214, -2.2214, -2.2214, ..., -1.2069, -1.4605, -1.4605],\n"," [-2.2214, -2.2214, -2.2214, ..., -1.1678, -1.3629, -1.3239],\n"," [-2.2214, -2.2214, -2.2214, ..., -0.9922, -1.1678, -1.1093]]])\n"]}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"vbfud9n2Cdk3","executionInfo":{"status":"ok","timestamp":1631165571307,"user_tz":-330,"elapsed":1019,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"324530ee-f55b-42ae-95c1-773f4b8712d2"},"source":["test_transforms = transforms.Compose([\n"," transforms.ToTensor(),\n"," transforms.Normalize(\n"," (0.4914, 0.4822, 0.4465),\n"," (0.2023, 0.1994, 0.2010))])\n","\n","test_data = torchvision.datasets.CIFAR10(\n"," root=\"./test/\", \n"," train=False, \n"," transform=test_transforms)\n","\n","print(test_data)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Dataset CIFAR10\n"," Number of datapoints: 10000\n"," Root location: ./test/\n"," Split: Test\n"," StandardTransform\n","Transform: Compose(\n"," ToTensor()\n"," Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.201))\n"," )\n"]}]},{"cell_type":"markdown","metadata":{"id":"-5xEDXNeC1o-"},"source":["Now that we have defined the transforms and created the datasets, we can access data samples one at a time. 
However, when you train your model, you will want to pass in small batches of data at each iteration. Sending data in batches not only allows more efficient training but also takes advantage of the parallel nature of GPUs to accelerate training.\n","\n","Batch processing can easily be implemented using the torch.utils.data.DataLoader class. Let’s start with an example of how Torchvision uses this class, and then we’ll cover it in more detail."]},{"cell_type":"code","metadata":{"id":"SS_RrZScDFvw"},"source":["trainloader = torch.utils.data.DataLoader(\n"," train_data,\n"," batch_size=16,\n"," shuffle=True)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"x-IEueVMDs-w"},"source":["testloader = torch.utils.data.DataLoader(\n"," test_data,\n"," batch_size=16,\n"," shuffle=False)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"JW_yheMmDR9m"},"source":["The dataloader object combines a dataset and a sampler, and provides an iterable over the given dataset. In other words, your training loop can use this object to sample your dataset and apply transforms one batch at a time instead of applying them for the complete dataset at once. 
This considerably improves efficiency and speed when training and testing models."]},{"cell_type":"markdown","metadata":{"id":"7QQ6ss7YDSPi"},"source":["The following code shows how to retrieve a batch of samples from the trainloader:"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"yFxW7zDFDaCU","executionInfo":{"status":"ok","timestamp":1631165738759,"user_tz":-330,"elapsed":376,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"328cc60b-8537-4e7c-f2e5-918e7e81c0c5"},"source":["data_batch, labels_batch = next(iter(trainloader))\n","\n","print(data_batch.size())\n","print(labels_batch.size())"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["torch.Size([16, 3, 32, 32])\n","torch.Size([16])\n"]}]},{"cell_type":"markdown","metadata":{"id":"DJcGmSRLDerp"},"source":["We need to use iter() to create an iterator from the trainloader and then call next() to retrieve its first batch. This is only necessary when accessing a single batch. As we’ll see later, our training loops will iterate over the dataloader directly without the need for iter() and next(). Checking the sizes of the data and labels confirms that each batch contains 16 samples: 16 images with 3 color channels of 32x32 pixels, and 16 labels."]},{"cell_type":"markdown","metadata":{"id":"Te-zeXO7DpsL"},"source":["So far, I’ve shown you how to load, transform, and batch image data using Torchvision. However, you can use PyTorch to prepare other types of data as well. PyTorch libraries such as Torchtext and Torchaudio provide dataset and dataloader classes for text and audio data, and new external libraries are being developed all the time.\n","\n","PyTorch also provides a submodule called torch.utils.data that you can use to create your own dataset and dataloader classes like the ones you saw in Torchvision. 
It consists of Dataset, Sampler, and DataLoader classes."]},{"cell_type":"markdown","metadata":{"id":"semtaM19D4jV"},"source":["PyTorch supports map- and iterable-style dataset classes. A map-style dataset is derived from the abstract class torch.utils.data.Dataset. It implements the `__getitem__()` and `__len__()` methods, and represents a map from (possibly nonintegral) indices/keys to data samples. For example, such a dataset, when accessed with dataset[idx], could read the idx-th image and its corresponding label from a folder on the disk. Map-style datasets are more commonly used than iterable-style datasets, and any dataset that represents a map from keys to data samples should subclass Dataset."]},{"cell_type":"markdown","metadata":{"id":"YtARpb62O95_"},"source":["All subclasses should override `__getitem__()`, which fetches a data sample for a given key. Subclasses can also optionally override `__len__()`, which is expected to return the size of the dataset by many Sampler implementations and by the default options of DataLoader."]},{"cell_type":"markdown","metadata":{"id":"DDbVqGrpPLFv"},"source":["An iterable-style dataset, on the other hand, is derived from the torch.utils.data.IterableDataset abstract class. It implements the `__iter__()` protocol and represents an iterable over data samples. This type of dataset is typically used when reading data from a database or a remote server, as well as data generated in real time. Iterable datasets are useful when random reads are expensive or uncertain, and when the batch size depends on fetched data."]},{"cell_type":"markdown","metadata":{"id":"jKbI_kYyPeEV"},"source":["In addition to dataset classes, PyTorch also provides sampler classes, which offer a way to iterate over indices of dataset samples. 
Samplers are derived from the torch.utils.data.Sampler base class.\n","\n","Every Sampler subclass needs to implement an `__iter__()` method that provides a way to iterate over indices of dataset elements, and it may implement a `__len__()` method that returns the length of the returned iterators."]},{"cell_type":"markdown","metadata":{"id":"P4-YSshXQCyK"},"source":["On their own, dataset and sampler objects do not give you batches of data to loop over during training. The dataloader object solves this problem. A Dataset object provides access to individual data samples and information about the data, while a Sampler yields the indices of the samples to draw, in a specified or random order. The DataLoader class combines a dataset with a sampler, groups the samples into batches, and returns an iterable over those batches."]},{"cell_type":"markdown","metadata":{"id":"XlIm6jP-QKRZ"},"source":["One of the most powerful features of PyTorch is its Python module torch.nn, which makes it easy to design and experiment with new models. The following code illustrates how you can create a simple model with torch.nn. In this example, we will create a fully connected model called SimpleNet. 
It consists of an input layer, a hidden layer, and an output layer that takes in 2,048 input values and returns 2 output values for classification:"]},{"cell_type":"code","metadata":{"id":"l4Td7yLBQ8Lt"},"source":["import torch.nn as nn\n","import torch.nn.functional as F\n","\n","class SimpleNet(nn.Module):\n","\n","    def __init__(self):\n","        super(SimpleNet, self).__init__()\n","        self.fc1 = nn.Linear(2048, 256)\n","        self.fc2 = nn.Linear(256, 64)\n","        self.fc3 = nn.Linear(64, 2)\n","\n","    def forward(self, x):\n","        x = x.view(-1, 2048)  # flatten each sample to 2,048 values\n","        x = F.relu(self.fc1(x))\n","        x = F.relu(self.fc2(x))\n","        x = F.softmax(self.fc3(x), dim=1)\n","        return x"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Ub45qjGyRTLT","executionInfo":{"status":"ok","timestamp":1631169362176,"user_tz":-330,"elapsed":5,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"4265488e-5330-400f-c02b-ab43a6a66723"},"source":["simplenet = SimpleNet()\n","print(simplenet)"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SimpleNet(\n"," (fc1): Linear(in_features=2048, out_features=256, bias=True)\n"," (fc2): Linear(in_features=256, out_features=64, bias=True)\n"," (fc3): Linear(in_features=64, out_features=2, bias=True)\n",")\n"]}]},{"cell_type":"markdown","metadata":{"id":"zaOuo05hRTXq"},"source":["This simple model demonstrates the following decisions you need to make during model design:\n","1. **Module definition**: How will you define the layers of your NN? How will you combine these layers into building blocks? In the example, we chose three linear or fully connected layers.\n","2. **Activation functions**: Which activation functions will you use at the end of each layer or module? In the example, we chose to use relu activation for the input and hidden layers and softmax for the output layer.\n","3. 
**Module connections**: How will your modules be connected to each other? In the example, we chose to simply connect each linear layer in sequence.\n","4. **Output selection**: What output values and formats will be returned? In this example, we return two values from the softmax() function."]},{"cell_type":"markdown","metadata":{"id":"ZiE7TMF-R_z1"},"source":["The next step in model development is to train your model with your training data. Training a model involves nothing more than initializing the model’s parameters, passing in data, and adjusting the parameters so that the model captures the patterns in the data more accurately.\n","\n","In other words, you set the parameters to some initial values, pass data through the model, and then compare the model’s outputs with the true outputs to measure the error. The goal is to change the parameters and repeat the process until the error is minimized and the model’s outputs are as close as possible to the true outputs."]},{"cell_type":"markdown","metadata":{"id":"EGhjUdCvTJ5W"},"source":["In this example, we will train the LeNet5 model with the CIFAR-10 dataset that we used earlier. The LeNet5 model is a simple convolutional NN developed by Yann LeCun and his team at Bell Labs in the 1990s to classify handwritten digits. 
(Unbeknownst to me at the time, I actually worked for Bell Labs in the same building in Holmdel, NJ, while this work was being performed.)"]},{"cell_type":"code","metadata":{"id":"qyM0Bi4iTgUu"},"source":["import torch\n","from torch import nn\n","import torch.nn.functional as F\n","\n","class LeNet5(nn.Module):\n","    def __init__(self):\n","        super(LeNet5, self).__init__()\n","        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input channels (RGB), 6 output channels, 5x5 kernel\n","        self.conv2 = nn.Conv2d(6, 16, 5)\n","        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of 5x5 after two conv/pool stages\n","        self.fc2 = nn.Linear(120, 84)\n","        self.fc3 = nn.Linear(84, 10)  # one output per CIFAR-10 class\n","\n","    def forward(self, x):\n","        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n","        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n","        x = x.view(-1, int(x.nelement() / x.shape[0]))  # flatten all dimensions except the batch\n","        x = F.relu(self.fc1(x))\n","        x = F.relu(self.fc2(x))\n","        x = self.fc3(x)\n","        return x\n","\n","device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n","model = LeNet5().to(device=device)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"JV3k8lOLTszU"},"source":["Next, we need to define the loss function (which is also called the criterion) and the optimizer algorithm. The loss function determines how we measure the performance of our model and computes the loss or error between predictions and truth. We’ll attempt to minimize the loss by adjusting the model parameters during training. 
The optimizer defines how we update our model’s parameters during training.\n","\n","To define the loss function and the optimizer, we use the torch.optim and torch.nn packages as shown in the following code:"]},{"cell_type":"code","metadata":{"id":"xR3rFFNiTtDZ"},"source":["from torch import optim\n","from torch import nn\n","\n","criterion = nn.CrossEntropyLoss()\n","optimizer = optim.SGD(model.parameters(),\n","                      lr=0.001,\n","                      momentum=0.9)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"xArCva7zT5fM","executionInfo":{"status":"ok","timestamp":1631170522614,"user_tz":-330,"elapsed":341743,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"afecda60-2966-473d-c9c7-72fe74dd7936"},"source":["N_EPOCHS = 10\n","for epoch in range(N_EPOCHS):  # <1>\n","\n","    epoch_loss = 0.0\n","    for inputs, labels in trainloader:\n","        inputs = inputs.to(device)  # <2>\n","        labels = labels.to(device)\n","\n","        optimizer.zero_grad()  # <3>\n","\n","        outputs = model(inputs)  # <4>\n","        loss = criterion(outputs, labels)  # <5>\n","        loss.backward()  # <6>\n","        optimizer.step()  # <7>\n","\n","        epoch_loss += loss.item()  # <8>\n","    print(\"Epoch: {} Loss: {}\".format(epoch,\n","                                      epoch_loss/len(trainloader)))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. 
(Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)\n"," return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)\n"]},{"output_type":"stream","name":"stdout","text":["Epoch: 0 Loss: 1.9107245240402222\n","Epoch: 1 Loss: 1.600426623764038\n","Epoch: 2 Loss: 1.4878545830726624\n","Epoch: 3 Loss: 1.3998275267791749\n","Epoch: 4 Loss: 1.339401881046295\n","Epoch: 5 Loss: 1.2818300464820862\n","Epoch: 6 Loss: 1.2469202939224242\n","Epoch: 7 Loss: 1.2177171779727936\n","Epoch: 8 Loss: 1.194690007867813\n","Epoch: 9 Loss: 1.1706352946281433\n"]}]},{"cell_type":"markdown","metadata":{"id":"nzNfnkUgUbQF"},"source":["1. Outer training loop; loop over 10 epochs.\n","2. Move inputs and labels to the device (the GPU, if available).\n","3. Zero out gradients before each backpropagation pass, or they’ll accumulate.\n","4. Perform forward pass.\n","5. Compute loss.\n","6. Perform backpropagation; compute gradients.\n","7. Adjust parameters based on gradients.\n","8. Accumulate batch loss so we can average over the epoch."]},{"cell_type":"markdown","metadata":{"id":"7izhHuA_Uo70"},"source":["The training loop consists of two loops. In the outer loop, we will process the entire set of training data during every iteration or epoch. However, instead of waiting to process the entire dataset before updating the model’s parameters, we process smaller batches of data, one batch at a time. The inner loop iterates over these batches and updates the parameters after each one."]},{"cell_type":"markdown","metadata":{"id":"1i5HdMfOU7Uv"},"source":["> Warning: By default, PyTorch accumulates the gradients during each call to loss.backward() (i.e., the backward pass). This is useful in some situations, such as training RNNs or accumulating gradients over several mini-batches; however, it is usually not what you want in a standard training loop like the one above, where each batch should produce its own parameter update. 
In most cases, you will need to call optimizer.zero_grad() to zero the gradients before doing backpropagation so the optimizer updates the model parameters correctly."]},{"cell_type":"markdown","metadata":{"id":"tNCw6Ci5Wv9u"},"source":["Now that we have trained our model and attempted to minimize the loss, how can we evaluate its performance? How do we know that our model will generalize and work with data it has never seen before?\n","\n","Model development often includes validation and testing loops to detect overfitting and to check that the model will perform well against unseen data. Let’s address validation first. Here, I’ll provide you with a quick reference for how you can add validation to your training loops with PyTorch.\n","\n","Typically, we will reserve a portion of the training data for validation. The validation data will not be used to train the NN; instead, we’ll use it to test the performance of the model at the end of each epoch.\n","\n","Validation is good practice when training your models. It’s commonly performed when adjusting hyperparameters. For example, we may want to reduce the learning rate after five epochs."]},{"cell_type":"markdown","metadata":{"id":"g00vfkZEW80J"},"source":["Before we perform validation, we need to split our training dataset into a training dataset and a validation dataset. We use the random_split() function from torch.utils.data to reserve 10,000 of our 50,000 training images for validation. 
Once we create our train_set and val_set, we create our dataloaders for each one. Note that len() of a dataloader returns the number of batches: 40,000 / 16 = 2,500 training batches and 10,000 / 16 = 625 validation batches."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"viaCRVKdXJer","executionInfo":{"status":"ok","timestamp":1631170911674,"user_tz":-330,"elapsed":497,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"6ae8b391-d7bd-4caa-918d-bfb817416e10"},"source":["from torch.utils.data import random_split\n","\n","train_set, val_set = random_split(\n","    train_data,\n","    [40000, 10000])\n","\n","trainloader = torch.utils.data.DataLoader(\n","    train_set,\n","    batch_size=16,\n","    shuffle=True)\n","\n","valloader = torch.utils.data.DataLoader(\n","    val_set,\n","    batch_size=16,\n","    shuffle=False)  # shuffling is not needed for validation\n","\n","print(len(trainloader))\n","print(len(valloader))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["2500\n","625\n"]}]},{"cell_type":"markdown","metadata":{"id":"-wY5C8ycX4hI"},"source":["If the loss decreases for validation data, then the model is doing well. 
However, if the training loss decreases but the validation loss does not, then there’s a good chance the model is overfitting."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"q3JcA2nLXjFO","executionInfo":{"status":"ok","timestamp":1631171348512,"user_tz":-330,"elapsed":333293,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"c69de3b0-c8c2-436e-fa85-13b50510abcb"},"source":["from torch import optim\n","from torch import nn\n","\n","model = LeNet5().to(device)\n","criterion = nn.CrossEntropyLoss()\n","optimizer = optim.SGD(model.parameters(),\n","                      lr=0.001,\n","                      momentum=0.9)\n","\n","N_EPOCHS = 10\n","for epoch in range(N_EPOCHS):\n","\n","    # Training\n","    train_loss = 0.0\n","    model.train()  # put the model in training mode\n","    for inputs, labels in trainloader:\n","        inputs = inputs.to(device)\n","        labels = labels.to(device)\n","\n","        optimizer.zero_grad()\n","\n","        outputs = model(inputs)\n","        loss = criterion(outputs, labels)\n","        loss.backward()\n","        optimizer.step()\n","\n","        train_loss += loss.item()\n","\n","    # Validation\n","    val_loss = 0.0\n","    model.eval()  # put the model in evaluation mode\n","    with torch.no_grad():  # gradients are not needed for validation\n","        for inputs, labels in valloader:\n","            inputs = inputs.to(device)\n","            labels = labels.to(device)\n","\n","            outputs = model(inputs)\n","            loss = criterion(outputs, labels)\n","\n","            val_loss += loss.item()\n","\n","    print(\"Epoch: {} Train Loss: {} Val Loss: {}\".format(\n","        epoch,\n","        train_loss/len(trainloader),\n","        val_loss/len(valloader)))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Epoch: 0 Train Loss: 1.9745660563468934 Val Loss: 1.7492280321121216\n","Epoch: 1 Train Loss: 1.6637929407119751 Val Loss: 1.5639272161483764\n","Epoch: 2 Train Loss: 1.5348106513500213 Val Loss: 1.4912440963745117\n","Epoch: 3 Train Loss: 1.4464851764440536 Val Loss: 1.385581601524353\n","Epoch: 4 Train Loss: 1.374079407954216 Val Loss: 1.4018443069458009\n","Epoch: 5 Train Loss: 1.316621362066269 Val 
Loss: 1.2531775268554688\n","Epoch: 6 Train Loss: 1.2859153034687043 Val Loss: 1.2561434190750123\n","Epoch: 7 Train Loss: 1.2512328678131104 Val Loss: 1.2327665576934814\n","Epoch: 8 Train Loss: 1.2264495978951455 Val Loss: 1.2291773901939391\n","Epoch: 9 Train Loss: 1.195324891924858 Val Loss: 1.1875609773635865\n"]}]},{"cell_type":"markdown","metadata":{"id":"eFNMCQiyXm62"},"source":["> Note: Running the .train() or .eval() method on your model object puts the model in training or evaluation mode, respectively. Calling these methods is only necessary if your model operates differently during training and evaluation. For example, dropout is applied only during training, and batch normalization uses per-batch statistics during training but its stored running statistics during validation and testing. It’s good practice to call .train() and .eval() in your loops."]},{"cell_type":"markdown","metadata":{"id":"M2AsMS_YX03g"},"source":["As you can see, our model is training well and does not seem to be overfitting, since both the training loss and the validation loss are generally decreasing. If we train the model for more epochs, we may get even better results.\n","\n","We’re not quite finished, though. Our model may still be overfitting. We might have just gotten lucky with our choice of hyperparameters, leading to good validation results. As a further test against overfitting, we will run some test data through our model.\n","\n","The model has never seen the test data during training, nor has the test data had any influence on the hyperparameters. 
Let’s see how we perform against the test dataset."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"GXWxEZP8YBFl","executionInfo":{"status":"ok","timestamp":1631171352290,"user_tz":-330,"elapsed":3808,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"139bdffb-e3b6-48ff-90e9-a3dd72dd011a"},"source":["num_correct = 0.0\n","\n","model.eval()\n","with torch.no_grad():  # gradients are not needed for testing\n","    for x_test_batch, y_test_batch in testloader:\n","        y_test_batch = y_test_batch.to(device)\n","        x_test_batch = x_test_batch.to(device)\n","        y_pred_batch = model(x_test_batch)\n","        _, predicted = torch.max(y_pred_batch, 1)\n","        num_correct += (predicted == y_test_batch).float().sum()\n","\n","# Divide by the number of test samples (correct even if the last batch is smaller than batch_size)\n","accuracy = num_correct/len(testloader.dataset)\n","\n","print(len(testloader), testloader.batch_size)\n","\n","print(\"Test Accuracy: {}\".format(accuracy))"],"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["625 16\n","Test Accuracy: 0.6157000064849854\n"]}]},{"cell_type":"markdown","metadata":{"id":"cosznu2cYl3A"},"source":["> Tip: You now know how to create training, validation, and test loops using PyTorch. Feel free to use this code as a reference when creating your own loops."]},{"cell_type":"markdown","metadata":{"id":"xu8H1j2mYdUl"},"source":["Now that you have a fully trained model, let’s explore what you can do with it in the model deployment stage. One of the simplest things you can do is save your trained model for future use. When you want to run your model against new inputs, you can simply load it and call the model with the new values.\n","\n","The following code illustrates the recommended way to save and load a trained model. It uses the state_dict() method, which returns a dictionary that maps each layer’s parameters (for example, conv1.weight and conv1.bias) to their tensors. In other words, we only need to save the model’s learned parameters. 
We already have the model’s design defined in our model class, so we don’t need to save the architecture. When we load the model, we use the constructor to create a “blank model,” and then we use load_state_dict() to set the parameters for each layer:"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"iiJyV2pvYpiv","executionInfo":{"status":"ok","timestamp":1631171428335,"user_tz":-330,"elapsed":408,"user":{"displayName":"Sparsh Agarwal","photoUrl":"","userId":"13037694610922482904"}},"outputId":"86a16ec6-0b0d-48e0-d2b1-0a8029fe91d5"},"source":["torch.save(model.state_dict(), \"./lenet5_model.pt\")\n","\n","model = LeNet5().to(device)\n","model.load_state_dict(torch.load(\"./lenet5_model.pt\"))"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{},"execution_count":24}]},{"cell_type":"markdown","metadata":{"id":"PF7IUdY_ZT0B"},"source":["> Note: A common PyTorch convention is to save models using either a .pt or .pth file extension."]}]}