# CS6493 - Tutorial 1
## Introduction to Google Colab and PyTorch

Welcome to the CS6493 tutorial. In this session, you will become familiar with our experimental environment and practice some basic PyTorch operations.

## 1. Google Colab

You can use Google Colab to run the toy models. Here are some important notes for using Colab:

- You are expected to be familiar with Python and Jupyter.
- We will use **Google Colab** for the following experiments. Please run the experiments in [Google Colab](https://colab.research.google.com/).  
  (If you do not have a Google Account, please register for one.)
- Please go to **Edit -> Notebook Settings** and select Python 3 and GPU as the hardware accelerator.
- Before running a specific model, check the resources you need and compare them with the available resources by using **!nvidia-smi**.
- We will be happy to assist you throughout the tutorial, so please feel free to ask any questions you may have.


## 2. PyTorch

We use the [PyTorch](https://pytorch.org/) framework for our implementations. In this section, we cover installation and the basic operations of PyTorch.

### 2.1 Installation
Since Colab comes with PyTorch preinstalled, you can check the PyTorch version and whether it supports GPUs with the following commands.

In [None]:
# check the GPU resource in the Colab
!nvidia-smi

In [None]:
import torch
print("PyTorch version: ", torch.__version__)

Additionally, if a specific version of PyTorch is required by some of the repositories, you can visit the official PyTorch website to find the appropriate version. It is recommended to use a full command with the exact version details, as shown below:
```
# CUDA 12.1
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```



In [None]:
# you can try this if you want to install a specific version of PyTorch
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

You can use the following code to check GPU support and the number of available devices.

In [None]:
import torch
print("PyTorch version: ", torch.__version__)
print("GPU support: ", torch.cuda.is_available())
print("Available devices count: ", torch.cuda.device_count())
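If a GPU is attached, a few more details can be queried from PyTorch itself. The sketch below assumes at most one visible GPU (index 0) and guards every CUDA call, so it should also run on a CPU-only runtime. The variable names are illustrative.

```python
import torch

# CUDA version this PyTorch build was compiled against (None for CPU-only builds)
cuda_build = torch.version.cuda
print("CUDA version PyTorch was built with:", cuda_build)

if torch.cuda.is_available():
    # index 0 refers to the first visible GPU
    gpu_name = torch.cuda.get_device_name(0)
    total_mem_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {gpu_name}, {total_mem_gib:.1f} GiB total memory")
else:
    gpu_name = None
    print("No GPU detected; running on CPU")
```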

## 2.2 Quick start - Tensor in PyTorch

In this section, we introduce some basic concepts and operations of tensors.

In [None]:
import numpy as np

Tensors are a specialized data structure that is very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators. 

One simple way to understand and utilize tensors is to know what each dimension represents.
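As an illustration of reading dimensions, the sketch below builds a hypothetical NLP batch. The names `batch_size`, `seq_len`, and `hidden_dim` and their values are assumptions chosen for the example, not fixed by this tutorial.

```python
import torch

# Hypothetical batch: 2 sentences, 5 tokens each, 8-dimensional embeddings
batch_size, seq_len, hidden_dim = 2, 5, 8
embeddings = torch.rand(batch_size, seq_len, hidden_dim)

print(embeddings.shape)        # torch.Size([2, 5, 8])
print(embeddings[0].shape)     # first sentence: torch.Size([5, 8])
print(embeddings[0, 1].shape)  # second token of the first sentence: torch.Size([8])

# Averaging over the token dimension (dim=1) gives one vector per sentence
sentence_vectors = embeddings.mean(dim=1)
print(sentence_vectors.shape)  # torch.Size([2, 8])
```

Knowing that dim 0 is the batch and dim 1 is the token position makes it clear which `dim` to pass to reductions such as `mean` or `sum`.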

### Create Tensors

Tensors can be created directly from Python data (such as nested lists) or from NumPy arrays. You can specify the data type of the tensor explicitly; otherwise, it is inferred automatically from the data.

In [None]:
data = [[0,1], [2,3]]
tensor_data = torch.tensor(data)
tensor_data_float = torch.tensor(data).float()
print(f"Long Tensor: \n {tensor_data} \n")  # the data type is LongTensor
print(f"Float Tensor: \n {tensor_data_float} \n")
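Besides calling `.float()` afterwards, the data type can be set when the tensor is created. A minimal sketch of the common options, with illustrative variable names:

```python
import torch

data = [[0, 1], [2, 3]]

t_default = torch.tensor(data)                     # inferred from ints: torch.int64
t_float = torch.tensor(data, dtype=torch.float32)  # set explicitly at creation
t_mixed = torch.tensor([[0, 1.5], [2, 3]])         # one float makes the whole tensor float32
t_cast = t_default.to(torch.float64)               # convert an existing tensor

print(t_default.dtype, t_float.dtype, t_mixed.dtype, t_cast.dtype)
```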

In [None]:
np_data = np.array(data)
tensor_np_data = torch.tensor(np_data)
tensor_np_data_float = torch.tensor(np_data).float()
print(f"Long Tensor: \n {tensor_np_data} \n")  # the data type is LongTensor
print(f"Float Tensor: \n {tensor_np_data_float} \n")
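One detail worth knowing when converting from NumPy: `torch.tensor` copies the data, whereas `torch.from_numpy` shares memory with the source array. The sketch below shows the difference on a CPU array.

```python
import numpy as np
import torch

np_data = np.array([[0, 1], [2, 3]])
copied = torch.tensor(np_data)      # independent copy of the data
shared = torch.from_numpy(np_data)  # shares memory with np_data

np_data[0, 0] = 100                 # modify the NumPy array in place
print(copied[0, 0].item())          # 0   (the copy is unaffected)
print(shared[0, 0].item())          # 100 (the shared tensor sees the change)

# .numpy() goes the other way and also shares memory (CPU tensors only)
back_to_numpy = shared.numpy()
```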

You can also create tensors filled with a constant (e.g., 0 or 1) or with random values:

In [None]:
zeros_tensor = torch.zeros((2,3))
ones_tensor = torch.ones((2,3))
random_tensor = torch.rand((2,3))
print(f"Zeros Tensor: \n {zeros_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Random Tensor: \n {random_tensor} \n")
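Random tensors differ on every run. If reproducible values are needed, for example when debugging, the random number generator can be seeded. A small sketch (the seed value 0 is arbitrary):

```python
import torch

torch.manual_seed(0)
a = torch.rand(2, 3)
torch.manual_seed(0)      # reset the generator to the same state
b = torch.rand(2, 3)
print(torch.equal(a, b))  # True: the same seed gives the same values

# *_like helpers copy the shape and dtype of an existing tensor
zeros_like_a = torch.zeros_like(a)
print(zeros_like_a.shape)  # torch.Size([2, 3])
```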

### Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [None]:
tensor = torch.rand(2,3)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
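A few related attributes and methods are often handy alongside `shape`, `dtype`, and `device`. A short sketch:

```python
import torch

tensor = torch.rand(2, 3)

print(tensor.ndim)           # 2: number of dimensions
print(tensor.numel())        # 6: total number of elements
print(tensor.size(0))        # 2: size along dimension 0
print(tensor.requires_grad)  # False: gradients are not tracked by default
```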

### Operations on Tensors

There are over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation, and more. In this section, we introduce only the operations used frequently in our later tutorials and projects.

**Move Tensor to Device**

By default, tensors are created on the CPU. To use a GPU, explicitly move tensors to it with the `.to()` method (after checking that a GPU is available). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

In [None]:
# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor = tensor.to(device)
print(f"Device tensor is stored on: {tensor.device}")
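To avoid an extra copy, a tensor can also be created directly on the target device, and it has to be moved back to the CPU before converting to NumPy. A minimal sketch that falls back to the CPU when no GPU is present:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

on_device = torch.ones(2, 3, device=device)  # created directly on the device
other = torch.rand(2, 3).to(device)          # created on the CPU, then moved

# Both operands of an operation must live on the same device
result = on_device + other

# NumPy only works with CPU memory, so move the tensor back first
as_numpy = result.cpu().numpy()
print(as_numpy.shape)  # (2, 3)
```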

**Tensor indexing, slicing and reshape**

In [2]:
import torch
tensor = torch.rand(4, 6)
tensor

tensor([[0.7335, 0.7830, 0.2709, 0.9149, 0.0203, 0.6076],
        [0.1010, 0.8982, 0.8029, 0.2490, 0.0831, 0.7048],
        [0.6930, 0.3721, 0.4666, 0.3729, 0.9197, 0.5189],
        [0.7800, 0.2683, 0.0027, 0.6551, 0.1588, 0.7311]])

In [None]:
# let's take a look at its first row, first column, and last column
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:,0]}")
print(f"Last column: {tensor[:, -1]}")
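Indexing also supports ranges, strides, and boolean masks, much like NumPy. The sketch below uses `torch.arange` instead of random values so that the results are easy to check by eye.

```python
import torch

# 4x6 tensor holding 0..23, row by row
tensor = torch.arange(24).reshape(4, 6)

sub_block = tensor[1:3, 2:5]  # rows 1-2, columns 2-4
every_other = tensor[:, ::2]  # all rows, columns 0, 2, 4
mask = tensor > 20            # boolean tensor with the same shape
large = tensor[mask]          # 1-D tensor of the selected elements

print(sub_block)    # tensor([[ 8,  9, 10], [14, 15, 16]])
print(every_other.shape)  # torch.Size([4, 3])
print(large)        # tensor([21, 22, 23])
```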

In [None]:
# reshape
print(f"Reshape to (2,12): \n {tensor.view(2, 12)} \n")
print(f"Reshape to (2,2,6): \n {tensor.view(-1, 2, 6)} \n")
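Two properties of `.view()` are easy to miss: the result shares storage with the original tensor, and it only works when the memory layout allows it. The sketch below shows both, along with `.reshape()`, which copies the data when a view is not possible.

```python
import torch

tensor = torch.arange(24).reshape(4, 6)

# A view shares storage: writing through it changes the original
viewed = tensor.view(2, 12)
viewed[0, 0] = -1
print(tensor[0, 0].item())  # -1

# A transposed tensor is not contiguous in memory, so .view() fails on it
transposed = tensor.T
print(transposed.is_contiguous())  # False
try:
    transposed.view(24)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

# .reshape() handles this case by copying when necessary
flat = transposed.reshape(24)
print(flat.shape)  # torch.Size([24])
```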

**Joining tensors.** You can use `torch.cat` to concatenate a sequence of tensors along a given dimension. All tensors must have the same shape except in the concatenation dimension.

In [None]:
t1 = torch.zeros(4, 2)
new_t = torch.cat([tensor, t1, t1], dim=1)
new_t
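A related function is `torch.stack`, which joins tensors along a new dimension instead of an existing one. A short comparison, with illustrative names:

```python
import torch

a = torch.zeros(4, 6)
b = torch.ones(4, 6)

cat_rows = torch.cat([a, b], dim=0)   # extends dimension 0
cat_cols = torch.cat([a, b], dim=1)   # extends dimension 1
stacked = torch.stack([a, b], dim=0)  # inserts a new leading dimension

print(cat_rows.shape)  # torch.Size([8, 6])
print(cat_cols.shape)  # torch.Size([4, 12])
print(stacked.shape)   # torch.Size([2, 4, 6])
```

`torch.stack` requires all inputs to have exactly the same shape, which makes it a natural way to assemble individual examples into a batch.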

**Arithmetic operations**

The basic arithmetic operations in PyTorch are similar to those in NumPy, such as `.pow()`, `.div()`, `.sum()`, and more. Here we focus on multiplication in PyTorch.

In [None]:
# This computes the matrix multiplication between two tensors. y1, y2 will have the same value
print(f"Shape of original tensor: {tensor.shape}")
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

print(f"Shape of matrix multiplication resulting tensor: {y1.shape}")

# This computes the element-wise product. z1 and z2 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

print(f"Shape of element-wise product resulting tensor: {z1.shape}")
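Matrix multiplication also works on batched (3-D) tensors: the leading dimension is treated as the batch, and the last two dimensions are multiplied as matrices. The sketch below additionally shows a reduction with `keepdim=True` followed by broadcasting. The tensor names and shapes are assumptions chosen for illustration.

```python
import torch

q = torch.rand(2, 4, 8)  # batch of 2, each a 4x8 matrix
k = torch.rand(2, 4, 8)

# Swap the last two dimensions of k, then multiply batch-wise:
# (2, 4, 8) @ (2, 8, 4) -> (2, 4, 4)
scores = q @ k.transpose(-2, -1)
print(scores.shape)  # torch.Size([2, 4, 4])

# Sum over the last dimension; keepdim=True keeps it as size 1
row_sums = scores.sum(dim=-1, keepdim=True)
print(row_sums.shape)  # torch.Size([2, 4, 1])

# Broadcasting divides every row by its own sum
normalized = scores / row_sums
```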

## 2.3 Practice

In NLP, **attention** is a popular technique for measuring the importance of each component relative to the others. Formally, the attention mechanism is defined as:

$\text{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \text{Softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right)\mathbf{V}$

where $d_k$ is the dimensionality of the key vectors and

$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

Try implementing the softmax function and the attention mechanism yourself.

Hint: decompose the equation into basic components and work out how to implement each one. If you get stuck, search for your question; you can usually find answers on Stack Overflow or in the official NumPy and PyTorch documentation.
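One way to decompose the equation is to track the shape of each intermediate result. The symbols $n$ (number of queries), $m$ (number of keys and values), and $d_v$ (value dimension) are introduced here only for illustration. For a single example in the batch:

$\mathbf{Q} \in \mathbb{R}^{n \times d_k}, \quad \mathbf{K} \in \mathbb{R}^{m \times d_k}, \quad \mathbf{V} \in \mathbb{R}^{m \times d_v}$

$\mathbf{Q}\mathbf{K}^T \in \mathbb{R}^{n \times m}, \quad (\mathbf{Q}\mathbf{K}^T)_{ij} = \mathbf{q}_i \cdot \mathbf{k}_j$

$\text{Softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right) \in \mathbb{R}^{n \times m}$ (applied to each row, so every row sums to 1)

$\text{Softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right)\mathbf{V} \in \mathbb{R}^{n \times d_v}$

In this exercise, $n = m = 4$ and $d_k = d_v = 8$, so the output for each example has the same shape as $\mathbf{Q}$.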

In [None]:
# note that the batch size is 2 and that Q, K, and V each have shape 4x8 per example;
# you can regard them as 4 queries, 4 keys, and 4 values, each represented by an 8-dimensional vector
v = torch.rand((2,4,8))
k = v
q = torch.rand((2,4,8))
d_k = 8

In [None]:
# insert your code
def attention(q, k, v):
    pass

def softmax(x):
    pass
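For reference after you have tried the exercise, here is one possible solution sketch, not the only valid one. It makes two assumptions beyond the stubs above: `softmax` takes an extra `dim` argument (defaulting to the last dimension), and the row maximum is subtracted before exponentiation for numerical stability, which does not change the mathematical result.

```python
import math
import torch

def softmax(x, dim=-1):
    # Subtracting the maximum avoids overflow in exp() for large inputs
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp_x = torch.exp(shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)

def attention(q, k, v):
    d_k = q.size(-1)                                   # key dimension
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, n_queries, n_keys)
    weights = softmax(scores, dim=-1)                  # each row sums to 1
    return weights @ v                                 # (batch, n_queries, d_v)

v = torch.rand((2, 4, 8))
k = v
q = torch.rand((2, 4, 8))
out = attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 8])
```

A quick way to check your own implementation is to compare it against the built-in `torch.softmax` using `torch.allclose`.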
