{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "accelerator": "GPU", "colab": { "name": "“SHARE MLSpring2021 - HW2-1.ipynb”的副本", "provenance": [], "collapsed_sections": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "OYlaRwNu7ojq" }, "source": [ "# **Homework 2-1 Phoneme Classification**\n", "\n", "* Slides: https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/hw/HW02/HW02.pdf\n", "* Video (Chinese): https://youtu.be/PdjXnQbu2zo\n", "* Video (English): https://youtu.be/ESRr-VCykBs\n" ] }, { "cell_type": "markdown", "metadata": { "id": "emUd7uS7crTz" }, "source": [ "## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)\n", "The TIMIT corpus of reading speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.\n", "\n", "This homework is a multiclass classification task, \n", "we are going to train a deep neural network classifier to predict the phonemes for each frame from the speech corpus TIMIT.\n", "\n", "link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3" ] }, { "cell_type": "markdown", "metadata": { "id": "KVUGfWTo7_Oj" }, "source": [ "## Download Data\n", "Download data from google drive, then unzip it.\n", "\n", "You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.

\n", "`timit_11/`\n", "- `train_11.npy`: training data
\n", "- `train_label_11.npy`: training label
\n", "- `test_11.npy`: testing data

\n", "\n", "**notes: if the google drive link is dead, you can download the data directly from Kaggle and upload it to the workspace**\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OzkiMEcC3Foq", "outputId": "6500d6a7-7187-474f-ca8f-86157488a3c0" }, "source": [ "!gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip\n", "!unzip data.zip\n", "!ls" ], "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.\n", " category=FutureWarning,\n", "Downloading...\n", "From: https://drive.google.com/uc?id=1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR\n", "To: /content/data.zip\n", "100% 372M/372M [00:03<00:00, 95.5MB/s]\n", "Archive: data.zip\n", " creating: timit_11/\n", " inflating: timit_11/train_11.npy \n", " inflating: timit_11/test_11.npy \n", " inflating: timit_11/train_label_11.npy \n", "data.zip sample_data timit_11\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "_L_4anls8Drv" }, "source": [ "## Preparing Data\n", "Load the training and testing data from the `.npy` file (NumPy array)." ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "IJjLT8em-y9G", "outputId": "ca14d59f-58b3-4f23-e324-d1fb2278583e" }, "source": [ "import numpy as np\n", "\n", "print('Loading data ...')\n", "\n", "data_root='./timit_11/'\n", "\n", "# read data from .npy file using np.load\n", "train = np.load(data_root + 'train_11.npy')\n", "train_label = np.load(data_root + 'train_label_11.npy')\n", "test = np.load(data_root + 'test_11.npy')\n", "\n", "print('Size of training data: {}'.format(train.shape))\n", "print('Size of testing data: {}'.format(test.shape))" ], "execution_count": 2, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Loading data ...\n", "Size of training data: (1229932, 429)\n", "Size of testing data: (451552, 429)\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "us5XW_x6udZQ" }, "source": [ "## Create Dataset" ] }, { "cell_type": "code", "metadata": { "id": "Fjf5EcmJtf4e" }, "source": [ "import torch\n", "from torch.utils.data import Dataset\n", "\n", "class TIMITDataset(Dataset):\n", " def __init__(self, X, y=None):\n", " self.data = torch.from_numpy(X).float()\n", " if y is not None:\n", " y = y.astype(np.int)\n", " self.label = torch.LongTensor(y)\n", " else:\n", " self.label = None\n", "\n", " def __getitem__(self, idx):\n", " if self.label is not None:\n", " return self.data[idx], self.label[idx]\n", " else:\n", " return self.data[idx]\n", "\n", " def __len__(self):\n", " return len(self.data)\n" ], "execution_count": 3, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "otIC6WhGeh9v" }, "source": [ "Split the labeled data into a training set and a validation set, you can modify the variable `VAL_RATIO` to change the ratio of validation data." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sYqi_lAuvC59", "outputId": "e5cd332e-e122-4490-e1fe-c834907edda0" }, "source": [ "VAL_RATIO = 0.2\n", "\n", "percent = int(train.shape[0] * (1 - VAL_RATIO)) # pivot of train data and dev data\n", "train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]\n", "print('Size of training set: {}'.format(train_x.shape))\n", "print('Size of validation set: {}'.format(val_x.shape))" ], "execution_count": 4, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Size of training set: (983945, 429)\n", "Size of validation set: (245987, 429)\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "nbCfclUIgMTX" }, "source": [ "Create a data loader from the dataset, feel free to tweak the variable `BATCH_SIZE` here." ] }, { "cell_type": "code", "metadata": { "id": "RUCbQvqJurYc", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c66c0f90-838e-45eb-f96a-a6b1a81b1868" }, "source": [ "BATCH_SIZE = 64\n", "\n", "from torch.utils.data import DataLoader\n", "\n", "train_set = TIMITDataset(train_x, train_y)\n", "val_set = TIMITDataset(val_x, val_y)\n", "train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) # only shuffle the training data\n", "val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)" ], "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:8: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.\n", "Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n", " \n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "_SY7X0lUgb50" }, "source": [ "Cleanup the unneeded variables to save memory.
\n", "\n", "**notes: if you need to use these variables later, then you may remove this block or clean up unneeded variables later
The data is quite large, so be aware of memory usage in Colab.**" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "y8rzkGraeYeN", "outputId": "9c326abe-0b60-4aed-c345-8dccbd81e9bb" }, "source": [ "del train, train_label, train_x, train_y, val_x, val_y\n", "\n", "# Garbage Collector\n", "import gc\n", "# gc.collect() runs a full collection: all generations (0-2) of the generational collector are scanned, and unreachable objects are freed via mark-and-sweep\n", "gc.collect() # collecting right after del ensures the memory these variables referenced is released immediately, freeing RAM" ], "execution_count": 6, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "153" ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "markdown", "metadata": { "id": "IRqKNvNZwe3V" }, "source": [ "## Create Model" ] }, { "cell_type": "markdown", "metadata": { "id": "FYr1ng5fh9pA" }, "source": [ "Define the model architecture; you are encouraged to change and experiment with it." ] }, { "cell_type": "code", "metadata": { "id": "lbZrwT6Ny0XL" }, "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "class Classifier(nn.Module):\n", "    def __init__(self):\n", "        super(Classifier, self).__init__()\n", "        self.layer1 = nn.Linear(429, 1024)\n", "        self.layer2 = nn.Linear(1024, 512)\n", "        self.layer3 = nn.Linear(512, 128)\n", "        self.out = nn.Linear(128, 39)\n", "\n", "        self.act_fn = nn.Sigmoid()\n", "\n", "    def forward(self, x):\n", "        x = self.layer1(x)\n", "        x = self.act_fn(x)\n", "\n", "        x = self.layer2(x)\n", "        x = self.act_fn(x)\n", "\n", "        x = self.layer3(x)\n", "        x = self.act_fn(x)\n", "\n", "        x = self.out(x)\n", "\n", "        return x" ], "execution_count": 7, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VRYciXZvPbYh" }, "source": [ "## Training" ] }, { "cell_type": "code", "metadata": { "id": "y114Vmm3Ja6o" }, "source": [ "# check device\n", "def get_device():\n", "    return 'cuda' if torch.cuda.is_available() else 'cpu'" ], "execution_count": 8, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "sEX-yjHjhGuH" }, "source": [ "Fix random seeds for reproducibility." ] }, { "cell_type": "code", "metadata": { "id": "88xPiUnm0tAd" }, "source": [ "# fix random seed\n", "def same_seeds(seed):\n", "    torch.manual_seed(seed)\n", "    if torch.cuda.is_available():\n", "        torch.cuda.manual_seed(seed)\n", "        torch.cuda.manual_seed_all(seed)\n", "    np.random.seed(seed)\n", "    torch.backends.cudnn.benchmark = False\n", "    torch.backends.cudnn.deterministic = True" ], "execution_count": 9, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KbBcBXkSp6RA" }, "source": [ "Feel free to change the training parameters here."
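] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example of the kind of architecture experiment encouraged in the Create Model section above, the sketch below swaps the sigmoid activations for ReLU and adds batch normalization and dropout; the layer widths and dropout rate are illustrative assumptions, not tuned values:\n" ] }, { "cell_type": "code", "metadata": {}, "source": [ "# Sketch: a variant of Classifier with ReLU, batch norm, and dropout\n", "import torch.nn as nn\n", "\n", "class DeepClassifier(nn.Module):\n", "    def __init__(self, dropout=0.25):\n", "        super(DeepClassifier, self).__init__()\n", "        self.net = nn.Sequential(\n", "            nn.Linear(429, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),\n", "            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),\n", "            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),\n", "            nn.Linear(128, 39),\n", "        )\n", "\n", "    def forward(self, x):\n", "        return self.net(x)\n", "\n", "# To try it, replace `model = Classifier().to(device)` in the next cell with:\n", "# model = DeepClassifier().to(device)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell sets up the seed, device, hyperparameters, model, loss function, and optimizer."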
] }, { "cell_type": "code", "metadata": { "id": "QTp3ZXg1yO9Y", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "630d8c09-f3f0-4b0d-d01f-b6f5785d139c" }, "source": [ "# fix random seed for reproducibility 可重复性\n", "same_seeds(0)\n", "\n", "# get device \n", "device = get_device()\n", "print(f'DEVICE: {device}')\n", "\n", "# training parameters\n", "num_epoch = 20 # number of training epoch\n", "learning_rate = 0.0001 # learning rate\n", "\n", "# the path where checkpoint saved\n", "model_path = './model.ckpt'\n", "\n", "# create model, define a loss function, and optimizer\n", "model = Classifier().to(device)\n", "criterion = nn.CrossEntropyLoss() \n", "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)" ], "execution_count": 10, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "DEVICE: cuda\n" ] } ] }, { "cell_type": "code", "metadata": { "id": "CdMWsBs7zzNs", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "6a7ed041-e6e9-425e-c206-3148ca868cce" }, "source": [ "# start training\n", "\n", "best_acc = 0.0\n", "for epoch in range(num_epoch):\n", " train_acc = 0.0\n", " train_loss = 0.0\n", " val_acc = 0.0\n", " val_loss = 0.0\n", "\n", " # training\n", " model.train() # set the model to training mode\n", " for i, data in enumerate(train_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " optimizer.zero_grad() \n", " outputs = model(inputs) \n", " batch_loss = criterion(outputs, labels)\n", " # torch.max(input, dim): Returns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim.\n", " _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest probability\n", " batch_loss.backward() \n", " optimizer.step() \n", "\n", " # Tensor.cpu(): Returns a copy of this object in CPU memory.\n", " # Tensor.==: 对应位置相同为1,不同为0!!,等同于Tensor.eq_()方法。\n", " # Tensor.sum(): Returns the sum of all elements in the input tensor.\n", " # Tensor.item(): Returns the value of this tensor as a standard Python number.\n", " train_acc += (train_pred.cpu() == labels.cpu()).sum().item() # 2 tensor -> 1 tensor -> 1 tensor of 1 element -> 1 number\n", " train_loss += batch_loss.item()\n", "\n", " # validation\n", " if len(val_set) > 0:\n", " model.eval() # set the model to evaluation mode\n", " with torch.no_grad():\n", " for i, data in enumerate(val_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " outputs = model(inputs)\n", " batch_loss = criterion(outputs, labels) \n", " _, val_pred = torch.max(outputs, 1) \n", " \n", " val_acc += (val_pred.cpu() == labels.cpu()).sum().item() # get the index of the class with the highest probability\n", " val_loss += batch_loss.item()\n", "\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)\n", " ))\n", "\n", " # if the model improves, save a checkpoint at this epoch\n", " if val_acc > best_acc:\n", " best_acc = val_acc\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))\n", " else:\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)\n", " ))\n", "\n", "# if not validating, save 
the last epoch\n", "if len(val_set) == 0:\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model at last epoch')\n" ], "execution_count": 11, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[001/020] Train Acc: 0.467302 Loss: 1.811661 | Val Acc: 0.567428 loss: 1.433065\n", "saving model with acc 0.567\n", "[002/020] Train Acc: 0.594383 Loss: 1.330666 | Val Acc: 0.628639 loss: 1.211098\n", "saving model with acc 0.629\n", "[003/020] Train Acc: 0.644506 Loss: 1.154064 | Val Acc: 0.660421 loss: 1.101216\n", "saving model with acc 0.660\n", "[004/020] Train Acc: 0.672216 Loss: 1.052246 | Val Acc: 0.676300 loss: 1.038718\n", "saving model with acc 0.676\n", "[005/020] Train Acc: 0.691347 Loss: 0.983104 | Val Acc: 0.685154 loss: 1.001852\n", "saving model with acc 0.685\n", "[006/020] Train Acc: 0.705615 Loss: 0.931955 | Val Acc: 0.689301 loss: 0.984177\n", "saving model with acc 0.689\n", "[007/020] Train Acc: 0.716344 Loss: 0.891687 | Val Acc: 0.694516 loss: 0.964627\n", "saving model with acc 0.695\n", "[008/020] Train Acc: 0.725881 Loss: 0.857907 | Val Acc: 0.697720 loss: 0.951889\n", "saving model with acc 0.698\n", "[009/020] Train Acc: 0.733717 Loss: 0.829495 | Val Acc: 0.696691 loss: 0.949866\n", "[010/020] Train Acc: 0.741151 Loss: 0.803701 | Val Acc: 0.699374 loss: 0.944832\n", "saving model with acc 0.699\n", "[011/020] Train Acc: 0.748049 Loss: 0.781106 | Val Acc: 0.697773 loss: 0.946494\n", "[012/020] Train Acc: 0.753793 Loss: 0.760380 | Val Acc: 0.702830 loss: 0.938236\n", "saving model with acc 0.703\n", "[013/020] Train Acc: 0.759404 Loss: 0.741234 | Val Acc: 0.700452 loss: 0.945627\n", "[014/020] Train Acc: 0.764573 Loss: 0.723574 | Val Acc: 0.702159 loss: 0.942118\n", "[015/020] Train Acc: 0.769470 Loss: 0.707325 | Val Acc: 0.704432 loss: 0.936154\n", "saving model with acc 0.704\n", "[016/020] Train Acc: 0.773687 Loss: 0.691314 | Val Acc: 0.701736 loss: 0.945713\n", "[017/020] Train Acc: 0.778676 Loss: 0.676633 | Val Acc: 0.701586 loss: 0.953081\n", "[018/020] Train Acc: 0.783106 Loss: 0.662425 | Val Acc: 0.699667 loss: 0.963290\n", "[019/020] Train Acc: 0.786395 Loss: 0.649180 | Val Acc: 0.700082 loss: 0.957681\n", "[020/020] Train Acc: 0.790643 Loss: 0.636623 | Val Acc: 0.699732 loss: 0.964269\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "1Hi7jTn3PX-m" }, "source": [ "## Testing" ] }, { "cell_type": "markdown", "metadata": { "id": "NfUECMFCn5VG" }, "source": [ "Create a testing dataset, and load model from the saved checkpoint." ] }, { "cell_type": "code", "metadata": { "id": "1PKjtAScPWtr", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "2e5c5981-04b6-4cbb-d3b7-6810a437af59" }, "source": [ "# create testing dataset\n", "test_set = TIMITDataset(test, None)\n", "test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)\n", "\n", "# create model and load weights from checkpoint\n", "model = Classifier().to(device)\n", "model.load_state_dict(torch.load(model_path))" ], "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "metadata": { "id": "940TtCCdoYd0" }, "source": [ "Make prediction." 
] }, { "cell_type": "code", "metadata": { "id": "84HU5GGjPqR0" }, "source": [ "predict = []\n", "model.eval() # set the model to evaluation mode\n", "with torch.no_grad():\n", " for i, data in enumerate(test_loader):\n", " inputs = data\n", " inputs = inputs.to(device)\n", " outputs = model(inputs)\n", " _, test_pred = torch.max(outputs, 1) # get the index of the class with the highest probability\n", "\n", " for y in test_pred.cpu().numpy():\n", " predict.append(y)" ], "execution_count": 13, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "AWDf_C-omElb" }, "source": [ "Write prediction to a CSV file.\n", "\n", "After finish running this block, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle." ] }, { "cell_type": "code", "metadata": { "id": "GuljYSPHcZir" }, "source": [ "with open('prediction.csv', 'w') as f:\n", " f.write('Id,Class\\n')\n", " for i, y in enumerate(predict):\n", " f.write('{},{}\\n'.format(i, y))" ], "execution_count": 14, "outputs": [] } ] }