{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "冒險21_打造 RNN 情意分析函數學習機",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyMuNoxkHeLlcLvpYz9y5WwW",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PMWDNxo3N6z_"
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pU0MI3Jm-vej"
},
"source": [
"### 1. Import the deep learning packages"
]
},
{
"cell_type": "code",
"metadata": {
"id": "yGQHjFkg-vek"
},
"source": [
"from tensorflow.keras.preprocessing import sequence\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Embedding\n",
"from tensorflow.keras.layers import LSTM\n",
"from tensorflow.keras.datasets import imdb"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "upzJSb2Y-vek"
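},
"source": [
"### 2. Load the data\n",
"\n",
"In natural language processing we usually cap the number of distinct words used; here `num_words=10000` keeps only the most frequent words."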
]
},
{
"cell_type": "code",
"metadata": {
"id": "TMRJOWWD-vek",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a7c4f0c6-1fab-4225-da74-18c6e57f74b2"
},
"source": [
"(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz\n",
"17465344/17464789 [==============================] - 0s 0us/step\n",
"17473536/17464789 [==============================] - 0s 0us/step\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(f'Number of training samples: {len(x_train)}')\n",
"print(f'Number of test samples: {len(x_test)}')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ru3eSKVZmLLn",
"outputId": "7121387f-ec85-423e-94f3-3abd3dea9b9f"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Number of training samples: 25000\n",
"Number of test samples: 25000\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "q2HndB0cAH_d"
},
"source": [
"Note that the reviews naturally differ in length."
]
},
{
"cell_type": "code",
"source": [
"print(f'Length of the first training sample: {len(x_train[0])}')\n",
"print(f'Length of the second training sample: {len(x_train[1])}')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-mXnNk5HnIkK",
"outputId": "6d7227ca-fa2a-4736-9300-54ca8ff40b19"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Length of the first training sample: 218\n",
"Length of the second training sample: 189\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "TwmfAdEm-ven",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8a8fd643-8366-4f6b-d181-d19873e8ab4f"
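},
"source": [
"print(f'Label of the first sample: {y_train[0]} (positive review)')\n",
"print(f'Label of the second sample: {y_train[1]} (negative review)')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Label of the first sample: 1 (positive review)\n",
"Label of the second sample: 0 (negative review)\n"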
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kY0bt0Mv-veo"
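},
"source": [
"### 3. Data processing\n",
"\n",
"An RNN can in principle process sequences of any length, but reviews of different lengths are awkward to compute in batches. In practice we therefore fix the length at 100 tokens: shorter reviews are padded with 0 and longer ones are truncated."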
]
},
{
"cell_type": "code",
"metadata": {
"id": "ynXfhV0S-veo"
},
"source": [
"# Pad or truncate every review to exactly 100 tokens\n",
"x_train = sequence.pad_sequences(x_train, maxlen=100)\n",
"x_test = sequence.pad_sequences(x_test, maxlen=100)"
],
"execution_count": null,
"outputs": []
},
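{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the padding concrete, the cell below is a minimal NumPy sketch of what `pad_sequences` does with its default settings, as we understand them: zeros go in front of short sequences, and long sequences keep only their last `maxlen` tokens. The helper `pad_pre` and the toy reviews are made up for illustration; they are not part of Keras or of the IMDB data."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"def pad_pre(seqs, maxlen):\n",
"    # Hypothetical helper: front-pad with 0 and keep the last maxlen tokens\n",
"    out = np.zeros((len(seqs), maxlen), dtype='int32')\n",
"    for i, seq in enumerate(seqs):\n",
"        tail = seq[-maxlen:]            # truncate from the front if too long\n",
"        if tail:                        # leave an all-zero row for empty input\n",
"            out[i, -len(tail):] = tail  # right-align so zeros stay in front\n",
"    return out\n",
"\n",
"toy_reviews = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]\n",
"padded = pad_pre(toy_reviews, maxlen=5)\n",
"print(padded)"
],
"execution_count": null,
"outputs": []
},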
{
"cell_type": "markdown",
"metadata": {
"id": "gLw1t5YZ-vep"
},
"source": [
"### 4. Step 01: Build a function learner"
]
},
{
"cell_type": "code",
"metadata": {
"id": "T3WoAWzg-vep"
},
"source": [
"model = Sequential()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "paCqtY9E-vep"
},
"source": [
"# Embed each of the 10,000 word indices as a 128-dimensional vector\n",
"model.add(Embedding(10000, 128))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "e3ZFi6IO-vep"
},
"source": [
"# LSTM layer with 128 units; only its final output is passed on\n",
"model.add(LSTM(128))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WYMx8FF7-veq"
},
"source": [
"# One sigmoid unit: the predicted probability of a positive review\n",
"model.add(Dense(1, activation='sigmoid'))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "XbmruxHeAxT8"
},
"source": [
"#### Compile the model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Hc3kJo8u-veq"
},
"source": [
"model.compile(loss='binary_crossentropy',\n",
"              optimizer='adam',\n",
"              metrics=['accuracy'])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "7_s0X6DHA6A-"
},
"source": [
"#### Inspect our model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "vKwxPjN6-veq",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f3d02e97-00d9-4916-c46b-9bd25ec72dc8"
},
"source": [
"model.summary()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"sequential\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" embedding (Embedding) (None, None, 128) 1280000 \n",
" \n",
" lstm (LSTM) (None, 128) 131584 \n",
" \n",
" dense (Dense) (None, 1) 129 \n",
" \n",
"=================================================================\n",
"Total params: 1,411,713\n",
"Trainable params: 1,411,713\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
]
},
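{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias; the variable names are ours, chosen for illustration."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"vocab_size, embed_dim, units = 10000, 128, 128\n",
"\n",
"# Embedding: one embed_dim-sized vector per word index\n",
"embedding_params = vocab_size * embed_dim\n",
"\n",
"# LSTM: 4 gates, each with input weights, recurrent weights, and a bias\n",
"lstm_params = 4 * ((embed_dim + units) * units + units)\n",
"\n",
"# Dense: one weight per LSTM unit plus one bias\n",
"dense_params = units * 1 + 1\n",
"\n",
"total_params = embedding_params + lstm_params + dense_params\n",
"print(embedding_params, lstm_params, dense_params, total_params)"
],
"execution_count": null,
"outputs": []
},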
{
"cell_type": "markdown",
"metadata": {
"id": "O19zWWcl-veq"
},
"source": [
"### 5. Step 02: Train"
]
},
{
"cell_type": "code",
"metadata": {
"id": "iIL00x1u-veq",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "efbeea1d-3a67-4a12-e40d-c10582014d3f"
},
"source": [
"# Train for 10 epochs; the test set doubles as validation data here\n",
"model.fit(x_train, y_train, batch_size=32, epochs=10,\n",
"          validation_data=(x_test, y_test))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Epoch 1/10\n",
"782/782 [==============================] - 22s 24ms/step - loss: 0.4183 - accuracy: 0.8095 - val_loss: 0.3533 - val_accuracy: 0.8436\n",
"Epoch 2/10\n",
"782/782 [==============================] - 18s 24ms/step - loss: 0.2572 - accuracy: 0.8975 - val_loss: 0.3465 - val_accuracy: 0.8459\n",
"Epoch 3/10\n",
"782/782 [==============================] - 19s 24ms/step - loss: 0.1825 - accuracy: 0.9292 - val_loss: 0.4315 - val_accuracy: 0.8412\n",
"Epoch 4/10\n",
"782/782 [==============================] - 18s 23ms/step - loss: 0.1317 - accuracy: 0.9512 - val_loss: 0.4378 - val_accuracy: 0.8267\n",
"Epoch 5/10\n",
"782/782 [==============================] - 19s 24ms/step - loss: 0.0930 - accuracy: 0.9672 - val_loss: 0.6900 - val_accuracy: 0.8292\n",
"Epoch 6/10\n",
"782/782 [==============================] - 18s 24ms/step - loss: 0.0794 - accuracy: 0.9730 - val_loss: 0.5700 - val_accuracy: 0.8333\n",
"Epoch 7/10\n",
"782/782 [==============================] - 19s 24ms/step - loss: 0.0505 - accuracy: 0.9833 - val_loss: 0.6798 - val_accuracy: 0.8370\n",
"Epoch 8/10\n",
"782/782 [==============================] - 18s 24ms/step - loss: 0.0460 - accuracy: 0.9848 - val_loss: 0.6846 - val_accuracy: 0.8300\n",
"Epoch 9/10\n",
"782/782 [==============================] - 18s 23ms/step - loss: 0.0302 - accuracy: 0.9912 - val_loss: 0.7452 - val_accuracy: 0.8279\n",
"Epoch 10/10\n",
"782/782 [==============================] - 18s 23ms/step - loss: 0.0195 - accuracy: 0.9941 - val_loss: 0.8609 - val_accuracy: 0.8259\n"
],
"name": "stdout"
}
]
},
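{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log above hints at overfitting: training accuracy keeps climbing while validation loss rises after the second epoch. As a small check, the sketch below copies the validation values from that log by hand and finds the epoch with the lowest validation loss; the list names are ours. One option would be to stop training near that epoch, for instance with Keras's `EarlyStopping` callback."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"# Values copied by hand from the training log above\n",
"val_loss = [0.3533, 0.3465, 0.4315, 0.4378, 0.6900,\n",
"            0.5700, 0.6798, 0.6846, 0.7452, 0.8609]\n",
"val_acc = [0.8436, 0.8459, 0.8412, 0.8267, 0.8292,\n",
"           0.8333, 0.8370, 0.8300, 0.8279, 0.8259]\n",
"final_train_acc = 0.9941\n",
"\n",
"best_epoch = int(np.argmin(val_loss)) + 1  # epochs are 1-indexed in the log\n",
"gap = final_train_acc - val_acc[-1]        # train/validation gap after epoch 10\n",
"\n",
"print(f'Lowest val_loss at epoch {best_epoch}: {val_loss[best_epoch - 1]}')\n",
"print(f'Final accuracy gap (train - validation): {gap:.4f}')"
],
"execution_count": null,
"outputs": []
},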
{
"cell_type": "markdown",
"metadata": {
"id": "qKNqM34x-ver"
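},
"source": [
"### 6. A different way to save\n",
"\n",
"This time we save the model architecture and the trained weights separately, which gives more flexibility later."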
]
},
{
"cell_type": "code",
"metadata": {
"id": "UkHz6PbACrg-",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "e5897b5a-650d-42ab-d9f2-872a92f8ad1f"
},
"source": [
"from google.colab import drive\n",
"\n",
"drive.mount('/content/drive')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Mounted at /content/drive\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "2puuhaa-C0-Q",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b2c97d72-4417-4f62-8a2d-22d40bc78f9b"
},
"source": [
"%cd '/content/drive/My Drive/Colab Notebooks'"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"/content/drive/My Drive/Colab Notebooks\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "DTbM9T7A-ver"
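},
"source": [
"model_json = model.to_json()\n",
"# Write the architecture as JSON; the with-block closes the file\n",
"with open('imdb_model_architecture.json', 'w') as f:\n",
"    f.write(model_json)\n",
"# Save the trained weights separately in HDF5 format\n",
"model.save_weights('imdb_model_weights.h5')"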
],
"execution_count": null,
"outputs": []
}
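,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Saving the two pieces separately means the model can later be rebuilt without rerunning the training. The cell below is a minimal sketch of that round trip, assuming the two files written above and the TensorFlow 2.x-era Keras API used in this notebook; `load_sentiment_model` is a helper name introduced here for illustration. A model rebuilt from JSON carries no loss or optimizer, so the sketch compiles it again before use."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from tensorflow.keras.models import model_from_json\n",
"\n",
"def load_sentiment_model(arch_path, weights_path):\n",
"    # Rebuild the architecture from its JSON description\n",
"    with open(arch_path) as f:\n",
"        loaded = model_from_json(f.read())\n",
"    # Restore the trained weights into the rebuilt layers\n",
"    loaded.load_weights(weights_path)\n",
"    # The JSON holds no training configuration, so compile again\n",
"    loaded.compile(loss='binary_crossentropy',\n",
"                   optimizer='adam',\n",
"                   metrics=['accuracy'])\n",
"    return loaded\n",
"\n",
"# Example (assumes the files saved above are in the current directory):\n",
"# loaded_model = load_sentiment_model('imdb_model_architecture.json',\n",
"#                                     'imdb_model_weights.h5')"
],
"execution_count": null,
"outputs": []
}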
]
}