{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "冒險21_打造 RNN 情意分析函數學習機",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyMuNoxkHeLlcLvpYz9y5WwW",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/yenlung/Python-AI-Book/blob/main/%E5%86%92%E9%9A%AA21_%E6%89%93%E9%80%A0RNN%E6%83%85%E6%84%8F%E5%88%86%E6%9E%90%E5%87%BD%E6%95%B8%E5%AD%B8%E7%BF%92%E6%A9%9F.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "PMWDNxo3N6z_"
      },
      "outputs": [],
      "source": [
        "%matplotlib inline\n",
        "\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pU0MI3Jm-vej"
      },
      "source": [
        "### 1. 讀入深度學習套件"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yGQHjFkg-vek"
      },
      "source": [
        "from tensorflow.keras.preprocessing import sequence\n",
        "from tensorflow.keras.models import Sequential\n",
        "from tensorflow.keras.layers import Dense, Embedding\n",
        "from tensorflow.keras.layers import LSTM\n",
        "from tensorflow.keras.datasets import imdb"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "upzJSb2Y-vek"
      },
      "source": [
        "### 2. 讀入數據\n",
        "\n",
        "一般自然語言處理, 我們會限制最大要使用的字數。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TMRJOWWD-vek",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "a7c4f0c6-1fab-4225-da74-18c6e57f74b2"
      },
      "source": [
        "(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz\n",
            "17465344/17464789 [==============================] - 0s 0us/step\n",
            "17473536/17464789 [==============================] - 0s 0us/step\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(f'訓練資料筆數:{len(x_train)}')\n",
        "print(f'測試資料筆數:{len(x_test)}')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ru3eSKVZmLLn",
        "outputId": "7121387f-ec85-423e-94f3-3abd3dea9b9f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "訓練資料筆數:25000\n",
            "測試資料筆數:25000\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q2HndB0cAH_d"
      },
      "source": [
        "注意每筆評論的長度當然是不一樣的。"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(f'第一筆訓練資料的長度:{len(x_train[0])}')\n",
        "print(f'第二筆測試資料的長度:{len(x_train[1])}')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-mXnNk5HnIkK",
        "outputId": "6d7227ca-fa2a-4736-9300-54ca8ff40b19"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "第一筆訓練資料的長度:218\n",
            "第二筆測試資料的長度:189\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TwmfAdEm-ven",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "8a8fd643-8366-4f6b-d181-d19873e8ab4f"
      },
      "source": [
        "print(f'第一筆資料的標籤:{y_train[0]}(正評)')\n",
        "print(f'第二筆資料的標籤:{y_train[1]}(負評)')"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "第一筆資料的標籤:1(正評)\n",
            "第二筆資料的標籤:0(負評)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kY0bt0Mv-veo"
      },
      "source": [
        "### 3. 資料處理\n",
        "\n",
        "雖然我們可以做真的 seq2seq, 可是資料長度不一樣對計算上有麻煩, 因此平常還是會固定一定長度, 其餘補 0。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ynXfhV0S-veo"
      },
      "source": [
        "x_train = sequence.pad_sequences(x_train, maxlen=100)\n",
        "x_test = sequence.pad_sequences(x_test, maxlen=100)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gLw1t5YZ-vep"
      },
      "source": [
        "### 4. step 01: 打造一個函數學習機"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "T3WoAWzg-vep"
      },
      "source": [
        "model = Sequential()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "paCqtY9E-vep"
      },
      "source": [
        "model.add(Embedding(10000, 128))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "e3ZFi6IO-vep"
      },
      "source": [
        "model.add(LSTM(128))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WYMx8FF7-veq"
      },
      "source": [
        "model.add(Dense(1, activation='sigmoid'))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XbmruxHeAxT8"
      },
      "source": [
        "#### 組裝"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Hc3kJo8u-veq"
      },
      "source": [
        "model.compile(loss='binary_crossentropy',\n",
        "             optimizer='adam',\n",
        "             metrics=['accuracy'])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7_s0X6DHA6A-"
      },
      "source": [
        "#### 欣賞我們的 model"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vKwxPjN6-veq",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "f3d02e97-00d9-4916-c46b-9bd25ec72dc8"
      },
      "source": [
        "model.summary()"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Model: \"sequential\"\n",
            "_________________________________________________________________\n",
            " Layer (type)                Output Shape              Param #   \n",
            "=================================================================\n",
            " embedding (Embedding)       (None, None, 128)         1280000   \n",
            "                                                                 \n",
            " lstm (LSTM)                 (None, 128)               131584    \n",
            "                                                                 \n",
            " dense (Dense)               (None, 1)                 129       \n",
            "                                                                 \n",
            "=================================================================\n",
            "Total params: 1,411,713\n",
            "Trainable params: 1,411,713\n",
            "Non-trainable params: 0\n",
            "_________________________________________________________________\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O19zWWcl-veq"
      },
      "source": [
        "### 5. step 02: 訓練"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iIL00x1u-veq",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "efbeea1d-3a67-4a12-e40d-c10582014d3f"
      },
      "source": [
        "model.fit(x_train, y_train, batch_size=32, epochs=10,\n",
        "          validation_data=(x_test, y_test))"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Epoch 1/10\n",
            "782/782 [==============================] - 22s 24ms/step - loss: 0.4183 - accuracy: 0.8095 - val_loss: 0.3533 - val_accuracy: 0.8436\n",
            "Epoch 2/10\n",
            "782/782 [==============================] - 18s 24ms/step - loss: 0.2572 - accuracy: 0.8975 - val_loss: 0.3465 - val_accuracy: 0.8459\n",
            "Epoch 3/10\n",
            "782/782 [==============================] - 19s 24ms/step - loss: 0.1825 - accuracy: 0.9292 - val_loss: 0.4315 - val_accuracy: 0.8412\n",
            "Epoch 4/10\n",
            "782/782 [==============================] - 18s 23ms/step - loss: 0.1317 - accuracy: 0.9512 - val_loss: 0.4378 - val_accuracy: 0.8267\n",
            "Epoch 5/10\n",
            "782/782 [==============================] - 19s 24ms/step - loss: 0.0930 - accuracy: 0.9672 - val_loss: 0.6900 - val_accuracy: 0.8292\n",
            "Epoch 6/10\n",
            "782/782 [==============================] - 18s 24ms/step - loss: 0.0794 - accuracy: 0.9730 - val_loss: 0.5700 - val_accuracy: 0.8333\n",
            "Epoch 7/10\n",
            "782/782 [==============================] - 19s 24ms/step - loss: 0.0505 - accuracy: 0.9833 - val_loss: 0.6798 - val_accuracy: 0.8370\n",
            "Epoch 8/10\n",
            "782/782 [==============================] - 18s 24ms/step - loss: 0.0460 - accuracy: 0.9848 - val_loss: 0.6846 - val_accuracy: 0.8300\n",
            "Epoch 9/10\n",
            "782/782 [==============================] - 18s 23ms/step - loss: 0.0302 - accuracy: 0.9912 - val_loss: 0.7452 - val_accuracy: 0.8279\n",
            "Epoch 10/10\n",
            "782/782 [==============================] - 18s 23ms/step - loss: 0.0195 - accuracy: 0.9941 - val_loss: 0.8609 - val_accuracy: 0.8259\n"
          ],
          "name": "stdout"
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<tensorflow.python.keras.callbacks.History at 0x7f8b51a9e490>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 17
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qKNqM34x-ver"
      },
      "source": [
        "### 6. 換個存檔方式\n",
        "\n",
        "這次是把 model 和訓練權重分開存, 使用上更有彈性。"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UkHz6PbACrg-",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "e5897b5a-650d-42ab-d9f2-872a92f8ad1f"
      },
      "source": [
        "from google.colab import drive\n",
        "\n",
        "drive.mount('/content/drive')"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Mounted at /content/drive\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2puuhaa-C0-Q",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "b2c97d72-4417-4f62-8a2d-22d40bc78f9b"
      },
      "source": [
        "%cd '/content/drive/My Drive/Colab Notebooks'"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "/content/drive/My Drive/Colab Notebooks\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DTbM9T7A-ver"
      },
      "source": [
        "model_json = model.to_json()\n",
        "open('imdb_model_architecture.json', 'w').write(model_json)\n",
        "model.save_weights('imdb_model_weights.h5')"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}