{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignment 2.4: Text classification via CNN (20 points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this assignment you will perform sentiment analysis of IMDB reviews using a CNN architecture. Carefully read [Convolutional Neural Networks for Sentence Classification](https://arxiv.org/pdf/1408.5882.pdf) by Yoon Kim." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import torch\n", "\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "\n", "from torchtext import datasets\n", "from torchtext.data import Field, LabelField\n", "from torchtext.data import Iterator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TEXT = Field(sequential=True, lower=True, batch_first=True)\n", "LABEL = LabelField(batch_first=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train, tst = datasets.IMDB.splits(TEXT, LABEL)\n", "# Hold out part of the training set for validation\n", "trn, vld = train.split()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# %%time\n", "TEXT.build_vocab(trn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LABEL.build_vocab(trn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating the Iterator (2 points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the training, validation, and test iterators here." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_iter, val_iter, test_iter = \n", "# =============================\n", "# Write code here\n", "# =============================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define CNN-based 
text classification model (8 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class CNN(nn.Module):\n", "    # V: vocabulary size, D: embedding dimension, kernel_sizes: convolution window widths\n", "    def __init__(self, V, D, kernel_sizes, dropout=0.5):\n", "        super(CNN, self).__init__()\n", "        \n", "        # =============================\n", "        # Write code here\n", "        # =============================\n", "        \n", "    def forward(self, x):\n", "        \n", "        # =============================\n", "        # Write code here\n", "        # =============================\n", "        \n", "        return logit" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernel_sizes = [3,4,5]\n", "vocab_size = len(TEXT.vocab)\n", "dropout = 0.5\n", "dim = 300\n", "\n", "model = CNN(vocab_size, dim, kernel_sizes, dropout)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Move the model to the GPU (requires a CUDA-capable device)\n", "model.cuda()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The training loop (3 points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the optimizer and the loss function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "opt = # your code goes here\n", "loss_func = # your code goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Think carefully about the stopping criteria." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "epochs = # your code goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "for epoch in range(1, epochs + 1):\n", "    running_loss = 0.0\n", "    running_corrects = 0\n", "    model.train()\n", "    for batch in train_iter:\n", "        \n", "        x = batch.text\n", "        y = batch.label\n", "        \n", "        opt.zero_grad()\n", "        preds = model(x)\n", "        loss = loss_func(preds, y)\n", "        loss.backward()\n", "        opt.step()\n", "        running_loss += loss.item()\n", "    \n", "    # Average over batches; this assumes loss_func returns the mean loss of each batch\n", "    epoch_loss = running_loss / len(train_iter)\n", "    \n", "    val_loss = 0.0\n", "    model.eval()\n", "    correct = 0\n", "    total = 0\n", "    # Gradients are not needed during validation\n", "    with torch.no_grad():\n", "        for batch in val_iter:\n", "            \n", "            x = batch.text\n", "            y = batch.label\n", "            \n", "            preds = model(x)\n", "            loss = loss_func(preds, y)\n", "            val_loss += loss.item()\n", "    \n", "    val_loss /= len(val_iter)\n", "    \n", "    print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, epoch_loss, val_loss))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate performance of the trained model (2 points)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for batch in test_iter:\n", "    x = batch.text\n", "    y = batch.label" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write down the calculated performance\n", "\n", "### Accuracy:\n", "### Precision:\n", "### Recall:\n", "### F1:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Experiments (5 points)\n", "\n", "Experiment with the model and achieve better results. Implement and describe your experiments in detail, and mention what was helpful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. ?\n", "### 2. ?\n", "### 3. ?" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }