{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "D_a2USyd4giE" }, "source": [ "# **Homework 3 - Convolutional Neural Network**\n", "\n", "This is the example code of homework 3 of the machine learning course by Prof. Hung-yi Lee.\n", "\n", "In this homework, you are required to build a convolutional neural network for image classification, possibly with some advanced training tips.\n", "\n", "\n", "There are three levels here:\n", "\n", "**Easy**: Build a simple convolutional neural network as the baseline. (2 pts)\n", "\n", "**Medium**: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)\n", "\n", "**Hard**: Utilize provided unlabeled data to obtain better results. (2 pts)" ] }, { "cell_type": "markdown", "metadata": { "id": "VHpJocsDr6iA" }, "source": [ "## **About the Dataset**\n", "\n", "The dataset used here is food-11, a collection of food images in 11 classes.\n", "\n", "For the requirement in the homework, TAs slightly modified the data.\n", "Please DO NOT access the original fully-labeled training data or testing labels.\n", "\n", "Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zhzdomRTOKoJ", "outputId": "dadbc075-15af-410d-9051-7a9634a374f2" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. 
The arguments for commonly used modules:\n", " # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)\n", " # torch.nn.MaxPool2d(kernel_size, stride, padding)\n", "\n", " # input image size: [3, 128, 128]\n", " self.cnn_layers = nn.Sequential(\n", " # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)\n", " nn.Conv2d(3, 64, 3, 1, 1), # output image size: [64, 128, 128]\n", " # torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)\n", " nn.BatchNorm2d(64), # does not change the image size\n", " # torch.nn.ReLU(inplace=False)\n", " nn.ReLU(), # adds a nonlinear transformation\n", " # torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)\n", " nn.MaxPool2d(2, 2, 0), # output image size: [64, 64, 64]\n", "\n", " nn.Conv2d(64, 128, 3, 1, 1), # output image size: [128, 64, 64]\n", " nn.BatchNorm2d(128),\n", " nn.ReLU(),\n", " nn.MaxPool2d(2, 2, 0), # output image size: [128, 32, 32]\n", "\n", " nn.Conv2d(128, 256, 3, 1, 1), # output image size: [256, 32, 32]\n", " nn.BatchNorm2d(256),\n", " nn.ReLU(),\n", " nn.MaxPool2d(4, 4, 0), # output image size: [256, 8, 8]\n", " )\n", " self.fc_layers = nn.Sequential(\n", " nn.Linear(256 * 8 * 8, 256), # fully-connected layers need flattened input\n", " nn.ReLU(),\n", " nn.Linear(256, 256), # ?\n", " nn.ReLU(),\n", " nn.Linear(256, 11)\n", " )\n", "\n", " def forward(self, x):\n", " # input (x): [batch_size, 3, 128, 128]\n", " # output: [batch_size, 11]\n", "\n", " # Extract features by convolutional layers.\n", " x = self.cnn_layers(x)\n", "\n", " # The extracted feature map must be flattened before going to fully-connected layers.\n", " x = x.flatten(1) # flatten\n", "\n", " # The features are transformed by fully-connected layers to obtain the final logits. 
logits: the output of the final fully-connected layer\n", " x = self.fc_layers(x)\n", " return x" ] }, { "cell_type": "markdown", "metadata": { "id": "cbm81gwD50fo" }, "source": [ "cnn_layers - First Layer\n", "\n", "![](https://cdn.jsdelivr.net/gh/WanpengXu/myPicGo/img/202208060048059.png)\n", "\n", "1. Conv2d:\n", "\n", "```python\n", "torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)\n", "```\n", "\n", "Convolution\n", "\n", "$$\n", "H_{out} = ⌊\\frac{128+2*1-1*(3-1)-1}{1}+1⌋=128\n", "$$\n", "\n", "$$\n", "W_{out} = ⌊\\frac{128+2*1-1*(3-1)-1}{1}+1⌋=128\n", "$$\n", "\n", "2. BatchNorm2d\n", "\n", "```python\n", "torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)\n", "```\n", "\n", "Batch normalization\n", "\n", "![](https://cdn.jsdelivr.net/gh/WanpengXu/myPicGo/img/202208060104296.png)\n", "\n", "The first three steps are analogous to standardizing a random variable in probability theory, $X(𝜇, σ) ∼ N(0, 1)$; ϵ is the `eps` parameter, added to the mini-batch variance to keep the computation numerically stable.\n", "\n", "3. ReLU\n", "\n", "```python\n", "torch.nn.ReLU(inplace=False)\n", "```\n", "\n", "![](https://pytorch.org/docs/stable/_images/ReLU.png)\n", "\n", "4. 
MaxPool2d\n", "\n", "```python\n", "torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)\n", "```\n", "\n", "![](https://cdn.jsdelivr.net/gh/WanpengXu/myPicGo/img/202208060116297.png)\n", "\n", "$$\n", "H_{out}=⌊\\frac{128+2*0-1*(2-1)-1}{2}+1⌋=64\n", "$$\n", "\n", "$$\n", "W_{out}=⌊\\frac{128+2*0-1*(2-1)-1}{2}+1⌋=64\n", "$$\n", "\n", "forward\n", "\n", "```python\n", "x = x.flatten(1)\n", "```\n", "or\n", "```python\n", "x = x.view(x.size()[0], -1)\n", "```\n", "\n", "![](https://cdn.jsdelivr.net/gh/WanpengXu/myPicGo/img/202208060133103.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "aEnGbriXORN3" }, "source": [ "## **Training**\n", "\n", "You can finish supervised learning by simply running the provided code without any modification.\n", "\n", "The function \"get_pseudo_labels\" is used for semi-supervised learning.\n", "It is expected to get better performance if you use unlabeled data for semi-supervised learning.\n", "However, you have to implement the function on your own and need to adjust several hyperparameters manually.\n", "\n", "For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).\n", "\n", "Again, please notice that utilizing external data (or pre-trained model) for training is **prohibited**." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "swlf5EwA-hxA" }, "outputs": [], "source": [ "def get_pseudo_labels(dataset, model, threshold=0.65):\n", " # This function generates pseudo-labels of a dataset using given model.\n", " # It returns an instance of DatasetFolder containing images whose prediction confidences exceed a given threshold.\n", " # You are NOT allowed to use any models trained on external data for pseudo-labeling.\n", " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "\n", " # Construct a data loader.\n", " data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)\n", "\n", " # Make sure the model is in eval mode.\n", " model.eval()\n", " # Define softmax function.\n", " softmax = nn.Softmax(dim=-1)\n", "\n", " # Iterate over the dataset by batches.\n", " for batch in tqdm(data_loader):\n", " img, _ = batch\n", "\n", " # Forward the data\n", " # Using torch.no_grad() accelerates the forward process.\n", " with torch.no_grad():\n", " logits = model(img.to(device))\n", "\n", " # Obtain the probability distributions by applying softmax on logits.\n", " probs = softmax(logits)\n", "\n", " # ---------- TODO ----------\n", " # Filter the data and construct a new dataset.\n", "\n", " # Turn off the eval mode.\n", " model.train()\n", " return dataset" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "PHaFE-8oQtkC", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "b1f9d2b1-cd1b-42ba-f8f7-05351d25f617" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "\r 0%| | 0/25 [00:00