{ "cells": [ { "cell_type": "markdown", "id": "9e126499", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "# Data Manipulation\n", "\n" ] }, { "cell_type": "markdown", "id": "7c971253", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "To start, we import the PyTorch library.\n", "Note that the package name is `torch`" ] }, { "cell_type": "code", "execution_count": 1, "id": "01fa8e58", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:55.152236Z", "iopub.status.busy": "2023-08-18T19:32:55.151500Z", "iopub.status.idle": "2023-08-18T19:32:57.051589Z", "shell.execute_reply": "2023-08-18T19:32:57.050409Z" }, "origin_pos": 6, "tab": [ "pytorch" ] }, "outputs": [], "source": [ "import torch" ] }, { "cell_type": "markdown", "id": "6f47ba3d", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A tensor represents a (possibly multidimensional) array of numerical values" ] }, { "cell_type": "code", "execution_count": 2, "id": "b6aa30a9", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.056039Z", "iopub.status.busy": "2023-08-18T19:32:57.055276Z", "iopub.status.idle": "2023-08-18T19:32:57.089028Z", "shell.execute_reply": "2023-08-18T19:32:57.088195Z" }, "origin_pos": 14, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = torch.arange(12, dtype=torch.float32)\n", "x" ] }, { "cell_type": "code", "execution_count": 3, "id": "640cadaf", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.093138Z", "iopub.status.busy": "2023-08-18T19:32:57.092473Z", "iopub.status.idle": "2023-08-18T19:32:57.098450Z", "shell.execute_reply": "2023-08-18T19:32:57.097452Z" }, "origin_pos": 21, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 3, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "x.numel()" ] }, { "cell_type": "markdown", "id": "26fba460", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We can access a tensor's *shape*" ] }, { "cell_type": "code", "execution_count": 4, "id": "6e0a9616", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.102194Z", "iopub.status.busy": "2023-08-18T19:32:57.101575Z", "iopub.status.idle": "2023-08-18T19:32:57.107424Z", "shell.execute_reply": "2023-08-18T19:32:57.106501Z" }, "origin_pos": 24, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "torch.Size([12])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.shape" ] }, { "cell_type": "markdown", "id": "1f9312ce", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Change the shape of a tensor\n", "without altering its size or values" ] }, { "cell_type": "code", "execution_count": 5, "id": "6092207c", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.111467Z", "iopub.status.busy": "2023-08-18T19:32:57.110749Z", "iopub.status.idle": "2023-08-18T19:32:57.117759Z", "shell.execute_reply": "2023-08-18T19:32:57.116917Z" }, "origin_pos": 26, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[ 0., 1., 2., 3.],\n", " [ 4., 5., 6., 7.],\n", " [ 8., 9., 10., 11.]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = x.reshape(3, 4)\n", "X" ] }, { "cell_type": "markdown", "id": "db1fae63", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can construct a tensor with all elements set to 0\n", "or 1" ] }, { "cell_type": "code", "execution_count": 6, "id": "383cafca", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.122018Z", "iopub.status.busy": "2023-08-18T19:32:57.121194Z", "iopub.status.idle": "2023-08-18T19:32:57.128294Z", "shell.execute_reply": "2023-08-18T19:32:57.127285Z" },
"origin_pos": 30, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[[0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.]],\n", "\n", " [[0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.]]])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.zeros((2, 3, 4))" ] }, { "cell_type": "code", "execution_count": 7, "id": "0ea249d4", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.132534Z", "iopub.status.busy": "2023-08-18T19:32:57.131716Z", "iopub.status.idle": "2023-08-18T19:32:57.139029Z", "shell.execute_reply": "2023-08-18T19:32:57.138135Z" }, "origin_pos": 35, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[[1., 1., 1., 1.],\n", " [1., 1., 1., 1.],\n", " [1., 1., 1., 1.]],\n", "\n", " [[1., 1., 1., 1.],\n", " [1., 1., 1., 1.],\n", " [1., 1., 1., 1.]]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.ones((2, 3, 4))" ] }, { "cell_type": "markdown", "id": "01f34860", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Sample each element randomly (and independently)\n", "from a standard normal distribution" ] }, { "cell_type": "code", "execution_count": 8, "id": "2254595d", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.143051Z", "iopub.status.busy": "2023-08-18T19:32:57.142388Z", "iopub.status.idle": "2023-08-18T19:32:57.149695Z", "shell.execute_reply": "2023-08-18T19:32:57.148813Z" }, "origin_pos": 40, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[ 0.1351, -0.9099, -0.2028, 2.1937],\n", " [-0.3200, -0.7545, 0.8086, -1.8730],\n", " [ 0.3929, 0.4931, 0.9114, -0.7072]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.randn(3, 4)" ] }, { "cell_type": "markdown", "id": "dc6681ad", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Supplying the exact values for each element"
] }, { "cell_type": "code", "execution_count": 9, "id": "b26863d8", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.153567Z", "iopub.status.busy": "2023-08-18T19:32:57.153222Z", "iopub.status.idle": "2023-08-18T19:32:57.160436Z", "shell.execute_reply": "2023-08-18T19:32:57.159548Z" }, "origin_pos": 45, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[2, 1, 4, 3],\n", " [1, 2, 3, 4],\n", " [4, 3, 2, 1]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])" ] }, { "cell_type": "markdown", "id": "255741b1", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`[-1]` selects the last row and `[1:3]`\n", "selects the second and third rows" ] }, { "cell_type": "code", "execution_count": 10, "id": "d9049a53", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.164537Z", "iopub.status.busy": "2023-08-18T19:32:57.163812Z", "iopub.status.idle": "2023-08-18T19:32:57.171699Z", "shell.execute_reply": "2023-08-18T19:32:57.170451Z" }, "origin_pos": 49, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "(tensor([ 8., 9., 10., 11.]),\n", " tensor([[ 4., 5., 6., 7.],\n", " [ 8., 9., 10., 11.]]))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[-1], X[1:3]" ] }, { "cell_type": "markdown", "id": "30e6e22b", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We can also *write* elements of a matrix by specifying indices" ] }, { "cell_type": "code", "execution_count": 11, "id": "9246619c", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.176047Z", "iopub.status.busy": "2023-08-18T19:32:57.175685Z", "iopub.status.idle": "2023-08-18T19:32:57.182893Z", "shell.execute_reply": "2023-08-18T19:32:57.181890Z" }, "origin_pos": 52, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[ 0., 1., 
2., 3.],\n", " [ 4., 5., 17., 7.],\n", " [ 8., 9., 10., 11.]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[1, 2] = 17\n", "X" ] }, { "cell_type": "markdown", "id": "8708c166", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To assign multiple elements the same value,\n", "we apply the indexing on the left-hand side \n", "of the assignment operation" ] }, { "cell_type": "code", "execution_count": 12, "id": "0532f024", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.186970Z", "iopub.status.busy": "2023-08-18T19:32:57.186270Z", "iopub.status.idle": "2023-08-18T19:32:57.193303Z", "shell.execute_reply": "2023-08-18T19:32:57.192338Z" }, "origin_pos": 56, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[12., 12., 12., 12.],\n", " [12., 12., 12., 12.],\n", " [ 8., 9., 10., 11.]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[:2, :] = 12\n", "X" ] }, { "cell_type": "code", "execution_count": 13, "id": "6dd6724c", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.197301Z", "iopub.status.busy": "2023-08-18T19:32:57.196599Z", "iopub.status.idle": "2023-08-18T19:32:57.206136Z", "shell.execute_reply": "2023-08-18T19:32:57.205188Z" }, "origin_pos": 61, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([162754.7969, 162754.7969, 162754.7969, 162754.7969, 162754.7969,\n", " 162754.7969, 162754.7969, 162754.7969, 2980.9580, 8103.0840,\n", " 22026.4648, 59874.1406])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.exp(x)" ] }, { "cell_type": "code", "execution_count": 14, "id": "89bc996d", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.210417Z", "iopub.status.busy": "2023-08-18T19:32:57.209741Z", "iopub.status.idle": "2023-08-18T19:32:57.219298Z", "shell.execute_reply": 
"2023-08-18T19:32:57.218318Z" }, "origin_pos": 66, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "(tensor([ 3., 4., 6., 10.]),\n", " tensor([-1., 0., 2., 6.]),\n", " tensor([ 2., 4., 8., 16.]),\n", " tensor([0.5000, 1.0000, 2.0000, 4.0000]),\n", " tensor([ 1., 4., 16., 64.]))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = torch.tensor([1.0, 2, 4, 8])\n", "y = torch.tensor([2, 2, 2, 2])\n", "x + y, x - y, x * y, x / y, x ** y" ] }, { "cell_type": "markdown", "id": "4217e631", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "*Concatenate* multiple tensors" ] }, { "cell_type": "code", "execution_count": 15, "id": "43aa9012", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.223534Z", "iopub.status.busy": "2023-08-18T19:32:57.222711Z", "iopub.status.idle": "2023-08-18T19:32:57.233166Z", "shell.execute_reply": "2023-08-18T19:32:57.232145Z" }, "origin_pos": 71, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "(tensor([[ 0., 1., 2., 3.],\n", " [ 4., 5., 6., 7.],\n", " [ 8., 9., 10., 11.],\n", " [ 2., 1., 4., 3.],\n", " [ 1., 2., 3., 4.],\n", " [ 4., 3., 2., 1.]]),\n", " tensor([[ 0., 1., 2., 3., 2., 1., 4., 3.],\n", " [ 4., 5., 6., 7., 1., 2., 3., 4.],\n", " [ 8., 9., 10., 11., 4., 3., 2., 1.]]))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = torch.arange(12, dtype=torch.float32).reshape((3,4))\n", "Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])\n", "torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)" ] }, { "cell_type": "markdown", "id": "55bcaa66", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Construct a binary tensor via *logical statements*" ] }, { "cell_type": "code", "execution_count": 16, "id": "91d39e58", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.237276Z", "iopub.status.busy": "2023-08-18T19:32:57.236485Z",
"iopub.status.idle": "2023-08-18T19:32:57.243133Z", "shell.execute_reply": "2023-08-18T19:32:57.242117Z" }, "origin_pos": 75, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[False, True, False, True],\n", " [False, False, False, False],\n", " [False, False, False, False]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X == Y" ] }, { "cell_type": "markdown", "id": "b7ccd1ea", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Summing all the elements in the tensor" ] }, { "cell_type": "code", "execution_count": 17, "id": "080b0125", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.247142Z", "iopub.status.busy": "2023-08-18T19:32:57.246480Z", "iopub.status.idle": "2023-08-18T19:32:57.253117Z", "shell.execute_reply": "2023-08-18T19:32:57.252212Z" }, "origin_pos": 77, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor(66.)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.sum()" ] }, { "cell_type": "markdown", "id": "8861ed40", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Perform elementwise binary operations\n", "by invoking the *broadcasting mechanism*" ] }, { "cell_type": "code", "execution_count": 18, "id": "be37d2de", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.256932Z", "iopub.status.busy": "2023-08-18T19:32:57.256264Z", "iopub.status.idle": "2023-08-18T19:32:57.263823Z", "shell.execute_reply": "2023-08-18T19:32:57.262881Z" }, "origin_pos": 81, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "(tensor([[0],\n", " [1],\n", " [2]]),\n", " tensor([[0, 1]]))" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = torch.arange(3).reshape((3, 1))\n", "b = torch.arange(2).reshape((1, 2))\n", "a, b" ] }, { "cell_type": "code", "execution_count": 19, "id": "9f62e827", 
"metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.267856Z", "iopub.status.busy": "2023-08-18T19:32:57.267172Z", "iopub.status.idle": "2023-08-18T19:32:57.273497Z", "shell.execute_reply": "2023-08-18T19:32:57.272587Z" }, "origin_pos": 85, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "tensor([[0, 1],\n", " [1, 2],\n", " [2, 3]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + b" ] }, { "cell_type": "markdown", "id": "b4200afd", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Running operations can cause new memory to be\n", "allocated to host results" ] }, { "cell_type": "code", "execution_count": 20, "id": "754a7433", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.277697Z", "iopub.status.busy": "2023-08-18T19:32:57.277047Z", "iopub.status.idle": "2023-08-18T19:32:57.283549Z", "shell.execute_reply": "2023-08-18T19:32:57.282613Z" }, "origin_pos": 87, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "before = id(Y)\n", "Y = Y + X\n", "id(Y) == before" ] }, { "cell_type": "markdown", "id": "04fb2680", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Performing in-place operations" ] }, { "cell_type": "code", "execution_count": 21, "id": "c4d62609", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.287695Z", "iopub.status.busy": "2023-08-18T19:32:57.286964Z", "iopub.status.idle": "2023-08-18T19:32:57.293078Z", "shell.execute_reply": "2023-08-18T19:32:57.292048Z" }, "origin_pos": 92, "tab": [ "pytorch" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "id(Z): 140381179266448\n", "id(Z): 140381179266448\n" ] } ], "source": [ "Z = torch.zeros_like(Y)\n", "print('id(Z):', id(Z))\n", "Z[:] = X + Y\n", "print('id(Z):', id(Z))" ] }, { "cell_type": "markdown", 
"id": "fd10587e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "If the value of `X` is not reused in subsequent computations,\n", "we can also use `X[:] = X + Y` or `X += Y`\n", "to reduce the memory overhead of the operation" ] }, { "cell_type": "code", "execution_count": 22, "id": "b8c13447", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.296911Z", "iopub.status.busy": "2023-08-18T19:32:57.296361Z", "iopub.status.idle": "2023-08-18T19:32:57.302754Z", "shell.execute_reply": "2023-08-18T19:32:57.301805Z" }, "origin_pos": 97, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "before = id(X)\n", "X += Y\n", "id(X) == before" ] }, { "cell_type": "markdown", "id": "3be062ac", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Converting to a NumPy tensor (`ndarray`)" ] }, { "cell_type": "code", "execution_count": 23, "id": "576963aa", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.306812Z", "iopub.status.busy": "2023-08-18T19:32:57.306088Z", "iopub.status.idle": "2023-08-18T19:32:57.312356Z", "shell.execute_reply": "2023-08-18T19:32:57.311478Z" }, "origin_pos": 103, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "(numpy.ndarray, torch.Tensor)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = X.numpy()\n", "B = torch.from_numpy(A)\n", "type(A), type(B)" ] }, { "cell_type": "markdown", "id": "b36343ba", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Convert a size-1 tensor to a Python scalar" ] }, { "cell_type": "code", "execution_count": 24, "id": "388c5252", "metadata": { "execution": { "iopub.execute_input": "2023-08-18T19:32:57.316471Z", "iopub.status.busy": "2023-08-18T19:32:57.315825Z", "iopub.status.idle": "2023-08-18T19:32:57.322867Z", "shell.execute_reply": 
"2023-08-18T19:32:57.322007Z" }, "origin_pos": 108, "tab": [ "pytorch" ] }, "outputs": [ { "data": { "text/plain": [ "(tensor([3.5000]), 3.5, 3.5, 3)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = torch.tensor([3.5])\n", "a, a.item(), float(a), int(a)" ] } ], "metadata": { "celltoolbar": "Slideshow", "language_info": { "name": "python" }, "required_libs": [], "rise": { "autolaunch": true, "enable_chalkboard": true, "overlay": "
", "scroll": true } }, "nbformat": 4, "nbformat_minor": 5 }