{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.\n", "- Author: Sebastian Raschka\n", "- GitHub Repository: https://github.com/rasbt/deeplearning-models" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vY4SK0xKAJgm" }, "source": [ "# Model Zoo -- RNN with LSTM with Own Dataset" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "sc6xejhY-NzZ" }, "source": [ "Example notebook showing how to use your own CSV text dataset to train a simple RNN for sentiment classification (here: a binary classification problem with two labels, positive and negative) using LSTM (Long Short-Term Memory) cells." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "moNmVfuvnImW" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sebastian Raschka \n", "\n", "CPython 3.6.8\n", "IPython 7.2.0\n", "\n", "torch 1.0.1.post2\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -a 'Sebastian Raschka' -v -p torch\n", "\n", "import torch\n", "import torch.nn.functional as F\n", "from torchtext import data\n", "from torchtext import datasets\n", "import time\n", "import random\n", "import pandas as pd\n", "\n", "torch.backends.cudnn.deterministic = True" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "GSRL42Qgy8I8" }, "source": [ "## General Settings" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": {}, "colab_type": "code", "id": "OvW1RgfepCBq" }, "outputs": [], "source": [ "RANDOM_SEED = 123\n", "torch.manual_seed(RANDOM_SEED)\n", "\n", "VOCABULARY_SIZE = 20000\n", "LEARNING_RATE = 1e-4\n", "BATCH_SIZE = 128\n", "NUM_EPOCHS = 15\n", "DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "\n", "EMBEDDING_DIM = 128\n", "HIDDEN_DIM = 
256\n", "OUTPUT_DIM = 1" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "mQMmKUEisW4W" }, "source": [ "## Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cells download the IMDB movie review dataset (http://ai.stanford.edu/~amaas/data/sentiment/) for positive-negative sentiment classification as a CSV-formatted file:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-11-28 19:47:46-- https://github.com/rasbt/python-machine-learning-book-2nd-edition/raw/master/code/ch08/movie_data.csv.gz\n", "Resolving github.com (github.com)... 140.82.113.3\n", "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/rasbt/python-machine-learning-book-2nd-edition/master/code/ch08/movie_data.csv.gz [following]\n", "--2019-11-28 19:47:46-- https://raw.githubusercontent.com/rasbt/python-machine-learning-book-2nd-edition/master/code/ch08/movie_data.csv.gz\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.184.133\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.184.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 26521894 (25M) [application/octet-stream]\n", "Saving to: ‘movie_data.csv.gz’\n", "\n", "movie_data.csv.gz 100%[===================>] 25.29M 10.5MB/s in 2.4s \n", "\n", "2019-11-28 19:47:49 (10.5 MB/s) - ‘movie_data.csv.gz’ saved [26521894/26521894]\n", "\n" ] } ], "source": [ "!wget https://github.com/rasbt/python-machine-learning-book-2nd-edition/raw/master/code/ch08/movie_data.csv.gz" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "!gunzip -f movie_data.csv.gz " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the dataset looks okay:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
" | review | sentiment |\n",
"---|---|---|\n",
"0 | In 1974, the teenager Martha Moxley (Maggie Gr... | 1 |\n",
"1 | OK... so... I really like Kris Kristofferson a... | 0 |\n",
"2 | ***SPOILER*** Do not read this, if you think a... | 0 |\n",
"3 | hi for all the people who have seen this wonde... | 1 |\n",
"4 | I recently bought the DVD, forgetting just how... | 0 |\n",
"