{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "accelerator": "GPU", "colab": { "name": "“SHARE MLSpring2021 - HW2-1.ipynb”的副本", "provenance": [], "collapsed_sections": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "OYlaRwNu7ojq" }, "source": [ "# **Homework 2-1 Phoneme Classification**\n", "\n", "* Slides: https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/hw/HW02/HW02.pdf\n", "* Video (Chinese): https://youtu.be/PdjXnQbu2zo\n", "* Video (English): https://youtu.be/ESRr-VCykBs\n" ] }, { "cell_type": "markdown", "metadata": { "id": "emUd7uS7crTz" }, "source": [ "## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)\n", "The TIMIT corpus of reading speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.\n", "\n", "This homework is a multiclass classification task, \n", "we are going to train a deep neural network classifier to predict the phonemes for each frame from the speech corpus TIMIT.\n", "\n", "link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3" ] }, { "cell_type": "markdown", "metadata": { "id": "KVUGfWTo7_Oj" }, "source": [ "## Download Data\n", "Download data from google drive, then unzip it.\n", "\n", "You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.

\n", "`timit_11/`\n", "- `train_11.npy`: training data
\n", "- `train_label_11.npy`: training label
\n", "- `test_11.npy`: testing data

\n", "\n", "**notes: if the google drive link is dead, you can download the data directly from Kaggle and upload it to the workspace**\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OzkiMEcC3Foq", "outputId": "6500d6a7-7187-474f-ca8f-86157488a3c0" }, "source": [ "!gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip\n", "!unzip data.zip\n", "!ls" ], "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.\n", " category=FutureWarning,\n", "Downloading...\n", "From: https://drive.google.com/uc?id=1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR\n", "To: /content/data.zip\n", "100% 372M/372M [00:03<00:00, 95.5MB/s]\n", "Archive: data.zip\n", " creating: timit_11/\n", " inflating: timit_11/train_11.npy \n", " inflating: timit_11/test_11.npy \n", " inflating: timit_11/train_label_11.npy \n", "data.zip sample_data timit_11\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "_L_4anls8Drv" }, "source": [ "## Preparing Data\n", "Load the training and testing data from the `.npy` file (NumPy array)." ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "IJjLT8em-y9G", "outputId": "ca14d59f-58b3-4f23-e324-d1fb2278583e" }, "source": [ "import numpy as np\n", "\n", "print('Loading data ...')\n", "\n", "data_root='./timit_11/'\n", "\n", "# read data from .npy file using np.load\n", "train = np.load(data_root + 'train_11.npy')\n", "train_label = np.load(data_root + 'train_label_11.npy')\n", "test = np.load(data_root + 'test_11.npy')\n", "\n", "print('Size of training data: {}'.format(train.shape))\n", "print('Size of testing data: {}'.format(test.shape))" ], "execution_count": 2, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Loading data ...\n", "Size of training data: (1229932, 429)\n", "Size of testing data: (451552, 429)\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "us5XW_x6udZQ" }, "source": [ "## Create Dataset" ] }, { "cell_type": "code", "metadata": { "id": "Fjf5EcmJtf4e" }, "source": [ "import torch\n", "from torch.utils.data import Dataset\n", "\n", "class TIMITDataset(Dataset):\n", " def __init__(self, X, y=None):\n", " self.data = torch.from_numpy(X).float()\n", " if y is not None:\n", " y = y.astype(np.int)\n", " self.label = torch.LongTensor(y)\n", " else:\n", " self.label = None\n", "\n", " def __getitem__(self, idx):\n", " if self.label is not None:\n", " return self.data[idx], self.label[idx]\n", " else:\n", " return self.data[idx]\n", "\n", " def __len__(self):\n", " return len(self.data)\n" ], "execution_count": 3, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "otIC6WhGeh9v" }, "source": [ "Split the labeled data into a training set and a validation set, you can modify the variable `VAL_RATIO` to change the ratio of validation data." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sYqi_lAuvC59", "outputId": "e5cd332e-e122-4490-e1fe-c834907edda0" }, "source": [ "VAL_RATIO = 0.2\n", "\n", "percent = int(train.shape[0] * (1 - VAL_RATIO)) # pivot of train data and dev data\n", "train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]\n", "print('Size of training set: {}'.format(train_x.shape))\n", "print('Size of validation set: {}'.format(val_x.shape))" ], "execution_count": 4, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Size of training set: (983945, 429)\n", "Size of validation set: (245987, 429)\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "nbCfclUIgMTX" }, "source": [ "Create a data loader from the dataset, feel free to tweak the variable `BATCH_SIZE` here." ] }, { "cell_type": "code", "metadata": { "id": "RUCbQvqJurYc", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c66c0f90-838e-45eb-f96a-a6b1a81b1868" }, "source": [ "BATCH_SIZE = 64\n", "\n", "from torch.utils.data import DataLoader\n", "\n", "train_set = TIMITDataset(train_x, train_y)\n", "val_set = TIMITDataset(val_x, val_y)\n", "train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) # only shuffle the training data\n", "val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)" ], "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:8: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.\n", "Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n", " \n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "_SY7X0lUgb50" }, "source": [ "Cleanup the unneeded variables to save memory.
\n", "\n", "**notes: if you need to use these variables later, then you may remove this block or clean up unneeded variables later
The data is quite large, so be aware of memory usage in Colab.**" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "y8rzkGraeYeN", "outputId": "9c326abe-0b60-4aed-c345-8dccbd81e9bb" }, "source": [ "del train, train_label, train_x, train_y, val_x, val_y\n", "\n", "# Garbage Collector\n", "import gc\n", "# gc.collect() runs a full collection: all generations (0-2) of the generational collector are scanned, and unreachable objects are freed via mark-and-sweep\n", "gc.collect() # collecting right after del ensures the memory these variables referenced is released immediately, freeing RAM" ], "execution_count": 6, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "153" ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "markdown", "metadata": { "id": "IRqKNvNZwe3V" }, "source": [ "## Create Model" ] }, { "cell_type": "markdown", "metadata": { "id": "FYr1ng5fh9pA" }, "source": [ "Define the model architecture; you are encouraged to change and experiment with it." ] }, { "cell_type": "code", "metadata": { "id": "lbZrwT6Ny0XL" }, "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "class Classifier(nn.Module):\n", "    def __init__(self):\n", "        super(Classifier, self).__init__()\n", "        self.layer1 = nn.Linear(429, 1024)\n", "        self.layer2 = nn.Linear(1024, 512)\n", "        self.layer3 = nn.Linear(512, 128)\n", "        self.out = nn.Linear(128, 39)\n", "\n", "        self.act_fn = nn.Sigmoid()\n", "\n", "    def forward(self, x):\n", "        x = self.layer1(x)\n", "        x = self.act_fn(x)\n", "\n", "        x = self.layer2(x)\n", "        x = self.act_fn(x)\n", "\n", "        x = self.layer3(x)\n", "        x = self.act_fn(x)\n", "\n", "        x = self.out(x)\n", "\n", "        return x" ], "execution_count": 7, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VRYciXZvPbYh" }, "source": [ "## Training" ] }, { "cell_type": "code", "metadata": { "id": "y114Vmm3Ja6o" }, "source": [ "# check device\n", "def get_device():\n", "    return 'cuda' if torch.cuda.is_available() else 'cpu'" ], "execution_count": 8, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "sEX-yjHjhGuH" }, "source": [ "Fix random seeds for reproducibility." ] }, { "cell_type": "code", "metadata": { "id": "88xPiUnm0tAd" }, "source": [ "# fix random seed\n", "def same_seeds(seed):\n", "    torch.manual_seed(seed)\n", "    if torch.cuda.is_available():\n", "        torch.cuda.manual_seed(seed)\n", "        torch.cuda.manual_seed_all(seed)\n", "    np.random.seed(seed)\n", "    torch.backends.cudnn.benchmark = False\n", "    torch.backends.cudnn.deterministic = True" ], "execution_count": 9, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KbBcBXkSp6RA" }, "source": [ "Feel free to change the training parameters here."
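] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example of the kind of architecture experiment encouraged in the Create Model section above, the sketch below swaps the sigmoid activations for ReLU and adds batch normalization and dropout; the layer widths and dropout rate are illustrative assumptions, not tuned values:\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Sketch: a variant of Classifier with ReLU, batch norm, and dropout\n", "import torch.nn as nn\n", "\n", "class DeepClassifier(nn.Module):\n", "    def __init__(self, dropout=0.25):\n", "        super(DeepClassifier, self).__init__()\n", "        self.net = nn.Sequential(\n", "            nn.Linear(429, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),\n", "            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),\n", "            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),\n", "            nn.Linear(128, 39),\n", "        )\n", "\n", "    def forward(self, x):\n", "        return self.net(x)\n", "\n", "# To try it, replace `model = Classifier().to(device)` in the next cell with:\n", "# model = DeepClassifier().to(device)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell sets up the seed, device, hyperparameters, model, loss function, and optimizer."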
] }, { "cell_type": "code", "metadata": { "id": "QTp3ZXg1yO9Y", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "630d8c09-f3f0-4b0d-d01f-b6f5785d139c" }, "source": [ "# fix random seed for reproducibility 可重复性\n", "same_seeds(0)\n", "\n", "# get device \n", "device = get_device()\n", "print(f'DEVICE: {device}')\n", "\n", "# training parameters\n", "num_epoch = 20 # number of training epoch\n", "learning_rate = 0.0001 # learning rate\n", "\n", "# the path where checkpoint saved\n", "model_path = './model.ckpt'\n", "\n", "# create model, define a loss function, and optimizer\n", "model = Classifier().to(device)\n", "criterion = nn.CrossEntropyLoss() \n", "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)" ], "execution_count": 10, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "DEVICE: cuda\n" ] } ] }, { "cell_type": "code", "metadata": { "id": "CdMWsBs7zzNs", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "6a7ed041-e6e9-425e-c206-3148ca868cce" }, "source": [ "# start training\n", "\n", "best_acc = 0.0\n", "for epoch in range(num_epoch):\n", " train_acc = 0.0\n", " train_loss = 0.0\n", " val_acc = 0.0\n", " val_loss = 0.0\n", "\n", " # training\n", " model.train() # set the model to training mode\n", " for i, data in enumerate(train_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " optimizer.zero_grad() \n", " outputs = model(inputs) \n", " batch_loss = criterion(outputs, labels)\n", " # torch.max(input, dim): Returns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim.\n", " _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability\n", " batch_loss.backward() \n", " optimizer.step() \n", "\n", " # Tensor.cpu(): Returns a copy of this object in CPU memory.\n", " # Tensor.==: 对应位置相同为1,不同为0!!,等同于Tensor.eq_()方法。\n", " # Tensor.sum(): Returns the sum of all elements in the input tensor.\n", " # Tensor.item(): Returns the value of this tensor as a standard Python number.\n", " train_acc += (train_pred.cpu() == labels.cpu()).sum().item() # 2 tensor -> 1 tensor -> 1 tensor of 1 element -> 1 number\n", " train_loss += batch_loss.item()\n", "\n", " # validation\n", " if len(val_set) > 0:\n", " model.eval() # set the model to evaluation mode\n", " with torch.no_grad():\n", " for i, data in enumerate(val_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " outputs = model(inputs)\n", " batch_loss = criterion(outputs, labels) \n", " _, val_pred = torch.max(outputs, 1) \n", " \n", " val_acc += (val_pred.cpu() == labels.cpu()).sum().item() # get the index of the class with the highest probability\n", " val_loss += batch_loss.item()\n", "\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)\n", " ))\n", "\n", " # if the model improves, save a checkpoint at this epoch\n", " if val_acc > best_acc:\n", " best_acc = val_acc\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))\n", " else:\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)\n", " ))\n", "\n", "# if not validating, save 
the last epoch\n", "if len(val_set) == 0:\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model at last epoch')\n" ], "execution_count": 11, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[001/020] Train Acc: 0.467302 Loss: 1.811661 | Val Acc: 0.567428 loss: 1.433065\n", "saving model with acc 0.567\n", "[002/020] Train Acc: 0.594383 Loss: 1.330666 | Val Acc: 0.628639 loss: 1.211098\n", "saving model with acc 0.629\n", "[003/020] Train Acc: 0.644506 Loss: 1.154064 | Val Acc: 0.660421 loss: 1.101216\n", "saving model with acc 0.660\n", "[004/020] Train Acc: 0.672216 Loss: 1.052246 | Val Acc: 0.676300 loss: 1.038718\n", "saving model with acc 0.676\n", "[005/020] Train Acc: 0.691347 Loss: 0.983104 | Val Acc: 0.685154 loss: 1.001852\n", "saving model with acc 0.685\n", "[006/020] Train Acc: 0.705615 Loss: 0.931955 | Val Acc: 0.689301 loss: 0.984177\n", "saving model with acc 0.689\n", "[007/020] Train Acc: 0.716344 Loss: 0.891687 | Val Acc: 0.694516 loss: 0.964627\n", "saving model with acc 0.695\n", "[008/020] Train Acc: 0.725881 Loss: 0.857907 | Val Acc: 0.697720 loss: 0.951889\n", "saving model with acc 0.698\n", "[009/020] Train Acc: 0.733717 Loss: 0.829495 | Val Acc: 0.696691 loss: 0.949866\n", "[010/020] Train Acc: 0.741151 Loss: 0.803701 | Val Acc: 0.699374 loss: 0.944832\n", "saving model with acc 0.699\n", "[011/020] Train Acc: 0.748049 Loss: 0.781106 | Val Acc: 0.697773 loss: 0.946494\n", "[012/020] Train Acc: 0.753793 Loss: 0.760380 | Val Acc: 0.702830 loss: 0.938236\n", "saving model with acc 0.703\n", "[013/020] Train Acc: 0.759404 Loss: 0.741234 | Val Acc: 0.700452 loss: 0.945627\n", "[014/020] Train Acc: 0.764573 Loss: 0.723574 | Val Acc: 0.702159 loss: 0.942118\n", "[015/020] Train Acc: 0.769470 Loss: 0.707325 | Val Acc: 0.704432 loss: 0.936154\n", "saving model with acc 0.704\n", "[016/020] Train Acc: 0.773687 Loss: 0.691314 | Val Acc: 0.701736 loss: 0.945713\n", "[017/020] Train Acc: 0.778676 Loss: 0.676633 | Val Acc: 0.701586 loss: 0.953081\n", "[018/020] Train Acc: 0.783106 Loss: 0.662425 | Val Acc: 0.699667 loss: 0.963290\n", "[019/020] Train Acc: 0.786395 Loss: 0.649180 | Val Acc: 0.700082 loss: 0.957681\n", "[020/020] Train Acc: 0.790643 Loss: 0.636623 | Val Acc: 0.699732 loss: 0.964269\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "1Hi7jTn3PX-m" }, "source": [ "## Testing" ] }, { "cell_type": "markdown", "metadata": { "id": "NfUECMFCn5VG" }, "source": [ "Create a testing dataset, and load model from the saved checkpoint." ] }, { "cell_type": "code", "metadata": { "id": "1PKjtAScPWtr", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "2e5c5981-04b6-4cbb-d3b7-6810a437af59" }, "source": [ "# create testing dataset\n", "test_set = TIMITDataset(test, None)\n", "test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)\n", "\n", "# create model and load weights from checkpoint\n", "model = Classifier().to(device)\n", "model.load_state_dict(torch.load(model_path))" ], "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "metadata": { "id": "940TtCCdoYd0" }, "source": [ "Make prediction." 
] }, { "cell_type": "code", "metadata": { "id": "84HU5GGjPqR0" }, "source": [ "predict = []\n", "model.eval() # set the model to evaluation mode\n", "with torch.no_grad():\n", " for i, data in enumerate(test_loader):\n", " inputs = data\n", " inputs = inputs.to(device)\n", " outputs = model(inputs)\n", " _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability\n", "\n", " for y in test_pred.cpu().numpy():\n", " predict.append(y)" ], "execution_count": 13, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "AWDf_C-omElb" }, "source": [ "Write prediction to a CSV file.\n", "\n", "After finish running this block, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle." ] }, { "cell_type": "code", "metadata": { "id": "GuljYSPHcZir" }, "source": [ "with open('prediction.csv', 'w') as f:\n", " f.write('Id,Class\\n')\n", " for i, y in enumerate(predict):\n", " f.write('{},{}\\n'.format(i, y))" ], "execution_count": 14, "outputs": [] } ] }