{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xT7MKZuMRaCg" }, "source": [ "### Author: [Pratik Sharma](https://github.com/sharmapratik88/)\n", "## Project 10 - Natural Language Processing - Sentiment Analysis\n", "\n", "* Generate Word Embedding and retrieve outputs of each layer with Keras based on the Classification task.\n", "* Word embedding are a type of word representation that allows words with similar meaning to have a similar representation.\n", "* It is a distributed representation for the text that is perhaps one of the key breakthroughs for the impressive performance of deep learning methods on challenging natural language processing problems.\n", "* We will use the IMDb dataset to learn word embedding as we train our dataset. \n", "* This dataset contains 25,000 movie reviews from IMDB, labeled with a sentiment (positive or negative).\n", "\n", "**Data Description**\n", "\n", "* The Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers).\n", "* For convenience, the words are indexed by their frequency in the dataset, meaning the for that has index 1 is the most frequent word.\n", "* Use the first 20 words from each review to speed up training, using a max vocab size of 10,000.\n", "* As a convention, \"0\" does not stand for a specific word, but instead is used to encode any unknown word." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Wq4RCyyPSYRp" }, "source": [ "### Import Packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "OiJCJ6pmri5r", "outputId": "3ca7b205-d0d6-4d92-f377-014eea264daf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ], "source": [ "# Mounting Google Drive\n", "from google.colab import drive\n", "drive.mount('/content/drive')" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": {}, "colab_type": "code", "id": "QblnEOger1tI" }, "outputs": [], "source": [ "# Setting the current working directory\n", "import os; os.chdir('drive/My Drive/Great Learning/NLP')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "MSu6Pr6MsloO", "outputId": "1df95099-47ac-4fa4-8ace-528a27df9334" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "# Import packages\n", "import pandas as pd, numpy as np\n", "import tensorflow as tf\n", "assert tf.__version__ >= '2.0'\n", "\n", "from itertools import islice\n", "\n", "# Keras\n", "from keras.layers import Dense, Embedding, LSTM, Dropout, MaxPooling1D, Conv1D\n", "from keras.preprocessing.sequence import pad_sequences\n", "from keras.models import Model, Sequential\n", "from keras.preprocessing import sequence\n", "from keras.datasets import imdb\n", "\n", "from keras.callbacks import ModelCheckpoint, EarlyStopping\n", "\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import classification_report\n", "\n", "# Suppress warnings\n", "import warnings; warnings.filterwarnings('ignore')\n", "\n", "random_state = 42\n", 
"np.random.seed(random_state)\n", "tf.random.set_seed(random_state)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "qMEsHYrWxdtk" }, "source": [ "### Loading Dataset - Train & Test Split" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 252 }, "colab_type": "code", "id": "h0g381XzeCyz", "outputId": "52207c81-7fbb-43c8-b7bd-03f0921a8cb6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "------------------------------------------------------------ \n", "Number of rows in training dataset: 32000\n", "Number of columns in training dataset: 300\n", "Number of unique words in training dataset: 9999\n", "------------------------------------------------------------ \n", "Number of rows in validation dataset: 8000\n", "Number of columns in validation dataset: 300\n", "Number of unique words in validation dataset: 9984\n", "------------------------------------------------------------ \n", "Number of rows in test dataset: 10000\n", "Number of columns in test dataset: 300\n", "Number of unique words in test dataset: 9995\n", "------------------------------------------------------------ \n", "Unique Categories: (array([0, 1]), array([0, 1]), array([0, 1]))\n" ] } ], "source": [ "vocab_size = 10000\n", "maxlen = 300\n", "(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = vocab_size)\n", "\n", "x_train = pad_sequences(x_train, maxlen = maxlen, padding = 'pre')\n", "x_test = pad_sequences(x_test, maxlen = maxlen, padding = 'pre')\n", "\n", "X = np.concatenate((x_train, x_test), axis = 0)\n", "y = np.concatenate((y_train, y_test), axis = 0)\n", "\n", "x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = random_state, shuffle = True)\n", "x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size = 0.2, random_state = random_state, shuffle = True)\n", "\n", "print('---'*20, f'\\nNumber of rows in training dataset: {x_train.shape[0]}')\n", "print(f'Number of columns in training dataset: {x_train.shape[1]}')\n", "print(f'Number of unique words in training dataset: {len(np.unique(np.hstack(x_train)))}')\n", "\n", "\n", "print('---'*20, f'\\nNumber of rows in validation dataset: {x_valid.shape[0]}')\n", "print(f'Number of columns in validation dataset: {x_valid.shape[1]}')\n", "print(f'Number of unique words in validation dataset: {len(np.unique(np.hstack(x_valid)))}')\n", "\n", "\n", "print('---'*20, f'\\nNumber of rows in test dataset: {x_test.shape[0]}')\n", "print(f'Number of columns in test dataset: {x_test.shape[1]}')\n", "print(f'Number of unique words in test dataset: {len(np.unique(np.hstack(x_test)))}')\n", "\n", "\n", "print('---'*20, f'\\nUnique Categories: {np.unique(y_train), np.unique(y_valid), np.unique(y_test)}')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "evBIH06AhdCO" }, "source": [ "### Get word index and create a key-value pair for word and word id" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 104 }, "colab_type": "code", "id": "FJhKAPne7gST", "outputId": "38fec127-6795-4123-df8c-9f9798909346" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Review: the only possible way to enjoy this flick is to bang your head against the wall allow some internal of the brain let a bunch of your brain cells die and once you are officially mentally retarded perhaps then you might 
enjoy this film br br the only saving grace was the story between and stephanie govinda was excellent in the role of the cab driver and so was the brit girl perhaps if they would have created the whole movie on their in india and how they eventually fall in love would have made it a much more enjoyable film br br the only reason i gave it a 3 rating is because of and his ability as an actor when it comes to comedy br br and anil kapoor were wasted needlessly plus the scene at of the re union was just too much to being an international in the post 9 11 world anil kapoor would have got himself shot much before he even reached the sky bridge to his true love but then again the point of the movie was to defy logic gravity physics and throw an egg on the face of the general audience br br watch it at your own peril at least i know i have been for life\n", "Actual Sentiment: 0\n", "------------------------------------------------------------------------------------------ \n", " [(34704, 'fawn'), (52009, 'tsukino'), (52010, 'nunnery'), (16819, 'sonja'), (63954, 'vani'), (1411, 'woods'), (16118, 'spiders'), (2348, 'hanging'), (2292, 'woody'), (52011, 'trawling'), (52012, \"hold's\"), (11310, 'comically'), (40833, 'localized'), (30571, 'disobeying'), (52013, \"'royale\"), (40834, \"harpo's\"), (52014, 'canet'), (19316, 'aileen'), (52015, 'acurately'), (52016, \"diplomat's\"), (25245, 'rickman'), (6749, 'arranged'), (52017, 'rumbustious'), (52018, 'familiarness'), (52019, \"spider'\"), (68807, 'hahahah'), (52020, \"wood'\"), (40836, 'transvestism'), (34705, \"hangin'\"), (2341, 'bringing'), (40837, 'seamier'), (34706, 'wooded'), (52021, 'bravora'), (16820, 'grueling'), (1639, 'wooden'), (16821, 'wednesday'), (52022, \"'prix\"), (34707, 'altagracia'), (52023, 'circuitry'), (11588, 'crotch'), (57769, 'busybody'), (52024, \"tart'n'tangy\"), (14132, 'burgade'), (52026, 'thrace'), (11041, \"tom's\"), (52028, 'snuggles'), (29117, 'francesco'), (52030, 'complainers'), (52128, 'templarios'), (40838, '272')]\n" ] } ], "source": [ "def decode_review(x, y):\n", "    # Shift the raw word index by 3 to make room for the special tokens used by the Keras IMDB encoding\n", "    w2i = imdb.get_word_index()\n", "    w2i = {k: (v + 3) for k, v in w2i.items()}\n", "    w2i['<PAD>'] = 0\n", "    w2i['<START>'] = 1\n", "    w2i['<UNK>'] = 2\n", "    i2w = {i: w for w, i in w2i.items()}\n", "\n", "    ws = ' '.join(i2w[i] for i in x)\n", "    print(f'Review: {ws}')\n", "    print(f'Actual Sentiment: {y}')\n", "    return w2i, i2w\n", "\n", "w2i, i2w = decode_review(x_train[0], y_train[0])\n", "\n", "# Get the first 50 (id, word) pairs from the id-to-word dictionary\n", "print('---'*30, '\\n', list(islice(i2w.items(), 0, 50)))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dybtUgUReCy8" }, "source": [ "### Build Keras Embedding Layer Model\n", "We can think of the Embedding layer as a dictionary that maps an integer index assigned to a word to a dense word vector. This layer is very flexible and can be used in a few ways:\n", "\n", "* The embedding layer can be used at the start of a larger deep learning model.\n", "* We can load pre-trained word embeddings into the embedding layer when we create our model.\n", "* We can use the embedding layer on its own to train our own word2vec-style embeddings.\n", "\n", "The Keras embedding layer doesn't require us to one-hot encode our words; instead, we give each word a unique integer id. For the IMDB dataset we've loaded, this has already been done, but if it hadn't been we could use sklearn's [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)."
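, "\n", "As a quick illustration (a minimal sketch for intuition only, not part of the model trained below), an `Embedding` layer on its own simply looks up one dense vector per integer word id, turning a `(batch, sequence_length)` array of ids into a `(batch, sequence_length, embedding_dim)` tensor. The tiny `toy` model and the ids used here are purely hypothetical:\n", "\n", "```python\n", "import numpy as np\n", "from keras.models import Sequential\n", "from keras.layers import Embedding\n", "\n", "# Toy example: vocabulary of 10,000 ids, 8-dimensional embeddings, sequences of length 4\n", "toy = Sequential([Embedding(input_dim = 10000, output_dim = 8, input_length = 4)])\n", "toy.compile(optimizer = 'rmsprop', loss = 'mse')  # only needed if we wanted to train these embeddings\n", "\n", "ids = np.array([[4, 20, 7, 3]])  # one 'review' of four word ids\n", "print(toy.predict(ids).shape)    # (1, 4, 8): one 8-dimensional vector per word id\n", "```"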
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 470 }, "colab_type": "code", "id": "mSCO8ltUsOKE", "outputId": "8c8bdc9d-1871-47d2-eb28-0f46331a8418" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential_1\"\n", "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "embedding_1 (Embedding) (None, 300, 256) 2560000 \n", "_________________________________________________________________\n", "dropout_1 (Dropout) (None, 300, 256) 0 \n", "_________________________________________________________________\n", "conv1d_1 (Conv1D) (None, 300, 256) 327936 \n", "_________________________________________________________________\n", "conv1d_2 (Conv1D) (None, 300, 128) 163968 \n", "_________________________________________________________________\n", "max_pooling1d_1 (MaxPooling1 (None, 150, 128) 0 \n", "_________________________________________________________________\n", "conv1d_3 (Conv1D) (None, 150, 64) 41024 \n", "_________________________________________________________________\n", "max_pooling1d_2 (MaxPooling1 (None, 75, 64) 0 \n", "_________________________________________________________________\n", "lstm_1 (LSTM) (None, 75) 42000 \n", "_________________________________________________________________\n", "dense_1 (Dense) (None, 1) 76 \n", "=================================================================\n", "Total params: 3,135,004\n", "Trainable params: 3,135,004\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "None\n" ] } ], "source": [ "# Model\n", "model = Sequential()\n", "model.add(Embedding(vocab_size, 256, input_length = maxlen))\n", "model.add(Dropout(0.25))\n", "model.add(Conv1D(256, 5, padding = 'same', activation = 'relu', strides = 1))\n", "model.add(Conv1D(128, 5, padding = 'same', activation = 'relu', strides = 1))\n", "model.add(MaxPooling1D(pool_size = 2))\n", "model.add(Conv1D(64, 5, padding = 'same', activation = 'relu', strides = 1))\n", "model.add(MaxPooling1D(pool_size = 2))\n", "model.add(LSTM(75))\n", "model.add(Dense(1, activation = 'sigmoid'))\n", "model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])\n", "print(model.summary())\n", "\n", "# Adding callbacks\n", "es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1, patience = 0) \n", "mc = ModelCheckpoint('imdb_model.h5', monitor = 'val_loss', mode = 'min', save_best_only = True, verbose = 1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 218 }, "colab_type": "code", "id": "fJtm93UciQo7", "outputId": "a82f268f-7056-4ed1-fbf3-a18e476a0d50" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 32000 samples, validate on 8000 samples\n", "Epoch 1/3\n", "32000/32000 [==============================] - 77s 2ms/step - loss: 0.3472 - accuracy: 0.8342 - val_loss: 0.2467 - val_accuracy: 0.8984\n", "\n", "Epoch 00001: val_loss improved from inf to 0.24669, saving model to imdb_model.h5\n", "Epoch 2/3\n", "32000/32000 [==============================] - 75s 2ms/step - loss: 0.1824 - accuracy: 0.9311 - val_loss: 0.2559 - val_accuracy: 0.8997\n", "\n", "Epoch 00002: val_loss did not improve from 0.24669\n", "Epoch 00002: early stopping\n", "10000/10000 [==============================] - 2s 
190us/step\n", "Test accuracy: 90.14%\n" ] } ], "source": [ "# Fit the model\n", "model.fit(x_train, y_train, validation_data = (x_valid, y_valid), epochs = 3, batch_size = 64, verbose = True, callbacks = [es, mc])\n", "\n", "# Evaluate the model\n", "scores = model.evaluate(x_test, y_test, batch_size = 64)\n", "print('Test accuracy: %.2f%%' % (scores[1]*100))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 185 }, "colab_type": "code", "id": "Fyx6fuOqIpEP", "outputId": "824cda87-2374-4203-c9c8-85f7b20e628e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Classification Report:\n", " precision recall f1-score support\n", "\n", " 0 0.92 0.89 0.90 5086\n", " 1 0.89 0.92 0.90 4914\n", "\n", " accuracy 0.90 10000\n", " macro avg 0.90 0.90 0.90 10000\n", "weighted avg 0.90 0.90 0.90 10000\n", "\n" ] } ], "source": [ "y_pred = model.predict_classes(x_test)\n", "print(f'Classification Report:\\n{classification_report(y_pred, y_test)}')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Igq8Qm8GeCzG" }, "source": [ "### Retrive output of each layer in keras for a given single test sample from the trained model" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "0AqOnLa2eCzH", "outputId": "a88b3cdd-01df-4194-b9d3-817789d15bf7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " ---------------------------------------- embedding_1 layer ---------------------------------------- \n", "\n", "[[[ 4.74077724e-02 -1.45893563e-02 -1.92809459e-02 ... 1.59389190e-02\n", " -3.90756801e-02 -6.46728724e-02]\n", " [ 4.74077724e-02 -1.45893563e-02 -1.92809459e-02 ... 1.59389190e-02\n", " -3.90756801e-02 -6.46728724e-02]\n", " [ 4.74077724e-02 -1.45893563e-02 -1.92809459e-02 ... 1.59389190e-02\n", " -3.90756801e-02 -6.46728724e-02]\n", " ...\n", " [-5.12011871e-02 2.73237063e-04 -3.15764773e-05 ... 4.48421352e-02\n", " 2.12928746e-02 -1.26087647e-02]\n", " [ 6.66740909e-02 1.52700637e-02 -7.01705664e-02 ... -9.86870304e-02\n", " 4.93544117e-02 -3.51153836e-02]\n", " [-3.40692252e-02 -4.36996408e-02 4.43636142e-02 ... 1.14621185e-02\n", " 2.80509088e-02 -2.31574550e-02]]]\n", "\n", " ---------------------------------------- dropout_1 layer ---------------------------------------- \n", "\n", "[[[ 4.74077724e-02 -1.45893563e-02 -1.92809459e-02 ... 1.59389190e-02\n", " -3.90756801e-02 -6.46728724e-02]\n", " [ 4.74077724e-02 -1.45893563e-02 -1.92809459e-02 ... 1.59389190e-02\n", " -3.90756801e-02 -6.46728724e-02]\n", " [ 4.74077724e-02 -1.45893563e-02 -1.92809459e-02 ... 1.59389190e-02\n", " -3.90756801e-02 -6.46728724e-02]\n", " ...\n", " [-5.12011871e-02 2.73237063e-04 -3.15764773e-05 ... 4.48421352e-02\n", " 2.12928746e-02 -1.26087647e-02]\n", " [ 6.66740909e-02 1.52700637e-02 -7.01705664e-02 ... -9.86870304e-02\n", " 4.93544117e-02 -3.51153836e-02]\n", " [-3.40692252e-02 -4.36996408e-02 4.43636142e-02 ... 1.14621185e-02\n", " 2.80509088e-02 -2.31574550e-02]]]\n", "\n", " ---------------------------------------- conv1d_1 layer ---------------------------------------- \n", "\n", "[[[0. 0. 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0. 0. ]\n", " ...\n", " [0. 0.03142974 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0.04410822 0. ]\n", " [0.00265946 0. 0. ... 0. 0. 0. 
]]]\n", "\n", " ---------------------------------------- conv1d_2 layer ---------------------------------------- \n", "\n", "[[[0. 0. 0. ... 0. 0. 0.00992592]\n", " [0. 0. 0. ... 0. 0. 0.00992592]\n", " [0. 0. 0. ... 0. 0. 0.00992592]\n", " ...\n", " [0. 0. 0. ... 0. 0.00156896 0.05785441]\n", " [0. 0. 0. ... 0. 0. 0.04214466]\n", " [0. 0. 0. ... 0. 0.02336704 0.0316269 ]]]\n", "\n", " ---------------------------------------- max_pooling1d_1 layer ---------------------------------------- \n", "\n", "[[[0. 0. 0. ... 0. 0. 0.00992592]\n", " [0. 0. 0. ... 0. 0. 0.00992592]\n", " [0. 0. 0. ... 0. 0. 0.00992592]\n", " ...\n", " [0. 0. 0. ... 0. 0.00145619 0. ]\n", " [0. 0. 0. ... 0. 0.00156896 0.05785441]\n", " [0. 0. 0. ... 0. 0.02336704 0.04214466]]]\n", "\n", " ---------------------------------------- conv1d_3 layer ---------------------------------------- \n", "\n", "[[[0. 0. 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0. 0. ]\n", " ...\n", " [0.06621806 0. 0. ... 0. 0. 0. ]\n", " [0.03402259 0. 0. ... 0. 0. 0. ]\n", " [0.02756308 0. 0. ... 0. 0.00653153 0. ]]]\n", "\n", " ---------------------------------------- max_pooling1d_2 layer ---------------------------------------- \n", "\n", "[[[0. 0. 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0. 0. ]\n", " [0. 0. 0. ... 0. 0. 0. ]\n", " ...\n", " [0.09239879 0. 0. ... 0. 0.14975896 0. ]\n", " [0.10864462 0. 0. ... 0. 0. 0. ]\n", " [0.03402259 0. 0. ... 0. 0.00653153 0. ]]]\n", "\n", " ---------------------------------------- lstm_1 layer ---------------------------------------- \n", "\n", "[[ 0.50351334 -0.02411361 -0.5454778 -0.7051705 0.7910843 0.60205024\n", " -0.5404606 0.52277875 -0.7547373 0.50682384 0.7195332 -0.53433776\n", " -0.74312335 -0.02459037 -0.11192464 -0.7976935 0.15080199 0.57083166\n", " 0.01432618 -0.7756157 -0.5437074 -0.6041164 0.02231176 -0.55837494\n", " 0.20796183 -0.75535905 -0.6613373 0.7095603 0.32622436 -0.52310455\n", " 0.11562354 0.5646972 -0.6707359 -0.5060398 0.7623417 -0.6992541\n", " -0.04636298 0.43525052 -0.7030687 -0.02964382 0.0145278 -0.7140254\n", " -0.1343891 -0.38830882 -0.00243724 -0.40578288 -0.01659107 0.00169612\n", " 0.08863628 0.6849975 -0.62493294 0.6887381 0.00755857 -0.48001266\n", " -0.5722563 -0.571908 0.70052683 0.07025653 -0.08210693 -0.58302814\n", " -0.00641777 0.7430728 -0.1640142 0.6192563 -0.21381459 0.03986506\n", " -0.2063878 0.00244516 0.0595973 0.24460196 -0.68142074 -0.64712524\n", " 0.27605036 -0.6704749 0.03478413]]\n", "\n", " ---------------------------------------- dense_1 layer ---------------------------------------- \n", "\n", "[[0.00859277]]\n" ] } ], "source": [ "sample_x_test = x_test[np.random.randint(10000)]\n", "for layer in model.layers:\n", "\n", " model_layer = Model(inputs = model.input, outputs = model.get_layer(layer.name).output)\n", " output = model_layer.predict(sample_x_test.reshape(1,-1))\n", " print('\\n','--'*20, layer.name, 'layer', '--'*20, '\\n')\n", " print(output)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 87 }, "colab_type": "code", "id": "nANxPDVbxL_M", "outputId": "2c115657-8d1c-46a1-e20c-8ed09c3bd4ab" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Review: this movie was great and i was waiting for it for a long time when it finally came out i was really happy and looked forward to a 10 out of 10 it was great and lived up to my potential the performances were great on the part of the adults and most of the kids 
the only bad performance was by milo himself there was one problem that i encountered with this and others like it movie all of the characters i wanted to live were getting killed overall i give this movie an excellent 9 out of 10 maybe we should better people to kill next time though ok\n", "Actual Sentiment: 1\n", "Predicted sentiment: 1\n" ] } ], "source": [ "# Decode a sample test review and compare the actual vs. predicted sentiment\n", "decode_review(x_test[10], y_test[10])\n", "print(f'Predicted sentiment: {y_pred[10][0]}')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "LRVgcJiCm6FW" }, "source": [ "### Conclusion\n", "* On the IMDB sentiment classification task, the trained model achieves the following on the held-out test set:\n", "    * Accuracy: > 90%\n", "    * F1-score: > 90%\n", "    * Loss: approximately 0.25" ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "name": "09_Natural Language Processing_Sentiment Analysis.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }