{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "冒險21_打造 RNN 情意分析函數學習機", "provenance": [], "collapsed_sections": [], "authorship_tag": "ABX9TyMuNoxkHeLlcLvpYz9y5WwW", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PMWDNxo3N6z_" }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": { "id": "pU0MI3Jm-vej" }, "source": [ "### 1. 讀入深度學習套件" ] }, { "cell_type": "code", "metadata": { "id": "yGQHjFkg-vek" }, "source": [ "from tensorflow.keras.preprocessing import sequence\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense, Embedding\n", "from tensorflow.keras.layers import LSTM\n", "from tensorflow.keras.datasets import imdb" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "upzJSb2Y-vek" }, "source": [ "### 2. 讀入數據\n", "\n", "一般自然語言處理, 我們會限制最大要使用的字數。" ] }, { "cell_type": "code", "metadata": { "id": "TMRJOWWD-vek", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "a7c4f0c6-1fab-4225-da74-18c6e57f74b2" }, "source": [ "(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)" ], "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz\n", "17465344/17464789 [==============================] - 0s 0us/step\n", "17473536/17464789 [==============================] - 0s 0us/step\n" ] } ] }, { "cell_type": "code", "source": [ "print(f'訓練資料筆數:{len(x_train)}')\n", "print(f'測試資料筆數:{len(x_test)}')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ru3eSKVZmLLn", "outputId": "7121387f-ec85-423e-94f3-3abd3dea9b9f" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "訓練資料筆數:25000\n", "測試資料筆數:25000\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "q2HndB0cAH_d" }, "source": [ "注意每筆評論的長度當然是不一樣的。" ] }, { "cell_type": "code", "source": [ "print(f'第一筆訓練資料的長度:{len(x_train[0])}')\n", "print(f'第二筆測試資料的長度:{len(x_train[1])}')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-mXnNk5HnIkK", "outputId": "6d7227ca-fa2a-4736-9300-54ca8ff40b19" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "第一筆訓練資料的長度:218\n", "第二筆測試資料的長度:189\n" ] } ] }, { "cell_type": "code", "metadata": { "id": "TwmfAdEm-ven", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8a8fd643-8366-4f6b-d181-d19873e8ab4f" }, "source": [ "print(f'第一筆資料的標籤:{y_train[0]}(正評)')\n", "print(f'第二筆資料的標籤:{y_train[1]}(負評)')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "第一筆資料的標籤:1(正評)\n", "第二筆資料的標籤:0(負評)\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "kY0bt0Mv-veo" }, "source": [ "### 3. 資料處理\n", "\n", "雖然我們可以做真的 seq2seq, 可是資料長度不一樣對計算上有麻煩, 因此平常還是會固定一定長度, 其餘補 0。" ] }, { "cell_type": "code", "metadata": { "id": "ynXfhV0S-veo" }, "source": [ "x_train = sequence.pad_sequences(x_train, maxlen=100)\n", "x_test = sequence.pad_sequences(x_test, maxlen=100)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "gLw1t5YZ-vep" }, "source": [ "### 4. step 01: 打造一個函數學習機" ] }, { "cell_type": "code", "metadata": { "id": "T3WoAWzg-vep" }, "source": [ "model = Sequential()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "paCqtY9E-vep" }, "source": [ "model.add(Embedding(10000, 128))" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "e3ZFi6IO-vep" }, "source": [ "model.add(LSTM(128))" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "WYMx8FF7-veq" }, "source": [ "model.add(Dense(1, activation='sigmoid'))" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "XbmruxHeAxT8" }, "source": [ "#### 組裝" ] }, { "cell_type": "code", "metadata": { "id": "Hc3kJo8u-veq" }, "source": [ "model.compile(loss='binary_crossentropy',\n", " optimizer='adam',\n", " metrics=['accuracy'])" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "7_s0X6DHA6A-" }, "source": [ "#### 欣賞我們的 model" ] }, { "cell_type": "code", "metadata": { "id": "vKwxPjN6-veq", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "f3d02e97-00d9-4916-c46b-9bd25ec72dc8" }, "source": [ "model.summary()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Model: \"sequential\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " embedding (Embedding) (None, None, 128) 1280000 \n", " \n", " lstm (LSTM) (None, 128) 131584 \n", " \n", " dense (Dense) (None, 1) 129 \n", " \n", "=================================================================\n", "Total params: 1,411,713\n", "Trainable params: 1,411,713\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "O19zWWcl-veq" }, "source": [ "### 5. step 02: 訓練" ] }, { "cell_type": "code", "metadata": { "id": "iIL00x1u-veq", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "efbeea1d-3a67-4a12-e40d-c10582014d3f" }, "source": [ "model.fit(x_train, y_train, batch_size=32, epochs=10,\n", " validation_data=(x_test, y_test))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Epoch 1/10\n", "782/782 [==============================] - 22s 24ms/step - loss: 0.4183 - accuracy: 0.8095 - val_loss: 0.3533 - val_accuracy: 0.8436\n", "Epoch 2/10\n", "782/782 [==============================] - 18s 24ms/step - loss: 0.2572 - accuracy: 0.8975 - val_loss: 0.3465 - val_accuracy: 0.8459\n", "Epoch 3/10\n", "782/782 [==============================] - 19s 24ms/step - loss: 0.1825 - accuracy: 0.9292 - val_loss: 0.4315 - val_accuracy: 0.8412\n", "Epoch 4/10\n", "782/782 [==============================] - 18s 23ms/step - loss: 0.1317 - accuracy: 0.9512 - val_loss: 0.4378 - val_accuracy: 0.8267\n", "Epoch 5/10\n", "782/782 [==============================] - 19s 24ms/step - loss: 0.0930 - accuracy: 0.9672 - val_loss: 0.6900 - val_accuracy: 0.8292\n", "Epoch 6/10\n", "782/782 [==============================] - 18s 24ms/step - loss: 0.0794 - accuracy: 0.9730 - val_loss: 0.5700 - val_accuracy: 0.8333\n", "Epoch 7/10\n", "782/782 [==============================] - 19s 24ms/step - loss: 0.0505 - accuracy: 0.9833 - val_loss: 0.6798 - val_accuracy: 0.8370\n", "Epoch 8/10\n", "782/782 [==============================] - 18s 24ms/step - loss: 0.0460 - accuracy: 0.9848 - val_loss: 0.6846 - val_accuracy: 0.8300\n", "Epoch 9/10\n", "782/782 [==============================] - 18s 23ms/step - loss: 0.0302 - accuracy: 0.9912 - val_loss: 0.7452 - val_accuracy: 0.8279\n", "Epoch 10/10\n", "782/782 [==============================] - 18s 23ms/step - loss: 0.0195 - accuracy: 0.9941 - val_loss: 0.8609 - val_accuracy: 0.8259\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 17 } ] }, { "cell_type": "markdown", "metadata": { "id": "qKNqM34x-ver" }, "source": [ "### 6. 換個存檔方式\n", "\n", "這次是把 model 和訓練權重分開存, 使用上更有彈性。" ] }, { "cell_type": "code", "metadata": { "id": "UkHz6PbACrg-", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "e5897b5a-650d-42ab-d9f2-872a92f8ad1f" }, "source": [ "from google.colab import drive\n", "\n", "drive.mount('/content/drive')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Mounted at /content/drive\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "2puuhaa-C0-Q", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "b2c97d72-4417-4f62-8a2d-22d40bc78f9b" }, "source": [ "%cd '/content/drive/My Drive/Colab Notebooks'" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "/content/drive/My Drive/Colab Notebooks\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "DTbM9T7A-ver" }, "source": [ "model_json = model.to_json()\n", "open('imdb_model_architecture.json', 'w').write(model_json)\n", "model.save_weights('imdb_model_weights.h5')" ], "execution_count": null, "outputs": [] } ] }