{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "5PUseM_N42in" }, "source": [ "이 노트북은 [케라스 창시자에게 배우는 딥러닝 2판](https://tensorflow.blog/kerasdl2/)의 예제 코드를 담고 있습니다.\n", "\n", "\n", " \n", " \n", " \n", "
\n", " \"Open\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "NhOmioaR42iq" }, "source": [ "## 트랜스포머 아키텍처" ] }, { "cell_type": "markdown", "metadata": { "id": "xWSmzK5142ir" }, "source": [ "### 셀프 어텐션 이해하기" ] }, { "cell_type": "markdown", "metadata": { "id": "2XdzrCS-42ir" }, "source": [ "#### 일반화된 셀프 어텐션: 쿼리-키-값 모델" ] }, { "cell_type": "markdown", "metadata": { "id": "Qo8cK-c_42ir" }, "source": [ "### 멀티 헤드 어텐션" ] }, { "cell_type": "markdown", "metadata": { "id": "F0HqiH6C42ir" }, "source": [ "### 트랜스포머 인코더" ] }, { "cell_type": "markdown", "metadata": { "id": "LjYMcwAl42is" }, "source": [ "**데이터 가져오기**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "i7VVn1yl42is", "outputId": "edfcf035-e3b2-4f70-e03b-5a0e192bb411" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "rm: cannot remove 'aclImdb': No such file or directory\n", " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 80.2M 100 80.2M 0 0 21.3M 0 0:00:03 0:00:03 --:--:-- 21.3M\n" ] } ], "source": [ "!rm -r aclImdb\n", "!curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz\n", "!tar -xf aclImdb_v1.tar.gz\n", "!rm -r aclImdb/train/unsup" ] }, { "cell_type": "markdown", "metadata": { "id": "D8aiOAjJ42iu" }, "source": [ "**데이터 준비**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7C8nBL2242iu", "outputId": "e536b586-ba3b-4be8-dab7-5aacd04dd0f9" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Found 20000 files belonging to 2 classes.\n", "Found 5000 files belonging to 2 classes.\n", "Found 25000 files belonging to 2 classes.\n" ] } ], "source": [ "import os, pathlib, shutil, random\n", "from tensorflow import keras\n", "batch_size = 32\n", "base_dir = pathlib.Path(\"aclImdb\")\n", "val_dir = base_dir / \"val\"\n", "train_dir = base_dir / \"train\"\n", "for category in (\"neg\", \"pos\"):\n", " os.makedirs(val_dir / category)\n", " files = os.listdir(train_dir / category)\n", " random.Random(1337).shuffle(files)\n", " num_val_samples = int(0.2 * len(files))\n", " val_files = files[-num_val_samples:]\n", " for fname in val_files:\n", " shutil.move(train_dir / category / fname,\n", " val_dir / category / fname)\n", "\n", "train_ds = keras.utils.text_dataset_from_directory(\n", " \"aclImdb/train\", batch_size=batch_size\n", ")\n", "val_ds = keras.utils.text_dataset_from_directory(\n", " \"aclImdb/val\", batch_size=batch_size\n", ")\n", "test_ds = keras.utils.text_dataset_from_directory(\n", " \"aclImdb/test\", batch_size=batch_size\n", ")\n", "text_only_train_ds = train_ds.map(lambda x, y: x)" ] }, { "cell_type": "markdown", "metadata": { "id": "e0MsIvhr42iv" }, "source": [ "**데이터 벡터화**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "--rJ5PYw42iv" }, "outputs": [], "source": [ "from tensorflow.keras import layers\n", "\n", "max_length = 600\n", "max_tokens = 20000\n", "text_vectorization = layers.TextVectorization(\n", " max_tokens=max_tokens,\n", " output_mode=\"int\",\n", " output_sequence_length=max_length,\n", ")\n", "text_vectorization.adapt(text_only_train_ds)\n", "\n", "int_train_ds = train_ds.map(\n", " lambda x, y: (text_vectorization(x), y),\n", " num_parallel_calls=4)\n", "int_val_ds = val_ds.map(\n", " lambda x, y: (text_vectorization(x), y),\n", " num_parallel_calls=4)\n", "int_test_ds = test_ds.map(\n", " lambda x, y: (text_vectorization(x), y),\n", " num_parallel_calls=4)" ] }, { "cell_type": "markdown", "metadata": { "id": "uZrJUai142iv" }, "source": [ "**`Layer` 층을 상속하여 구현한 트랜스포머 인코더**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "924jztFe42iv" }, "outputs": [], "source": [ "import tensorflow as tf\n", "from tensorflow import keras\n", "from tensorflow.keras import layers\n", "\n", "class TransformerEncoder(layers.Layer):\n", " def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):\n", " super().__init__(**kwargs)\n", " self.embed_dim = embed_dim\n", " self.dense_dim = dense_dim\n", " self.num_heads = num_heads\n", " self.attention = layers.MultiHeadAttention(\n", " num_heads=num_heads, key_dim=embed_dim)\n", " self.dense_proj = keras.Sequential(\n", " [layers.Dense(dense_dim, activation=\"relu\"),\n", " layers.Dense(embed_dim),]\n", " )\n", " self.layernorm_1 = layers.LayerNormalization()\n", " self.layernorm_2 = layers.LayerNormalization()\n", "\n", " def call(self, inputs, mask=None):\n", " if mask is not None:\n", " mask = mask[:, tf.newaxis, :]\n", " attention_output = self.attention(\n", " inputs, inputs, attention_mask=mask)\n", " proj_input = self.layernorm_1(inputs + attention_output)\n", " proj_output = self.dense_proj(proj_input)\n", " return self.layernorm_2(proj_input + proj_output)\n", "\n", " def get_config(self):\n", " config = super().get_config()\n", " config.update({\n", " \"embed_dim\": self.embed_dim,\n", " \"num_heads\": self.num_heads,\n", " \"dense_dim\": self.dense_dim,\n", " })\n", " return config" ] }, { "cell_type": "markdown", "metadata": { "id": "8yaSXFTO42iw" }, "source": [ "**트랜스포머 인코더를 사용하여 텍스트 분류하기**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-Nk9lx8A42iw", "outputId": "4aa30402-4915-4011-a165-293782cdb2fc" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Model: \"model\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " input_1 (InputLayer) [(None, None)] 0 \n", " \n", " embedding (Embedding) (None, None, 256) 5120000 \n", " \n", " transformer_encoder (Trans (None, None, 256) 543776 \n", " formerEncoder) \n", " \n", " global_max_pooling1d (Glob (None, 256) 0 \n", " alMaxPooling1D) \n", " \n", " dropout (Dropout) (None, 256) 0 \n", " \n", " dense_2 (Dense) (None, 1) 257 \n", " \n", "=================================================================\n", "Total params: 5664033 (21.61 MB)\n", "Trainable params: 5664033 (21.61 MB)\n", "Non-trainable params: 0 (0.00 Byte)\n", "_________________________________________________________________\n" ] } ], "source": [ "vocab_size = 20000\n", "embed_dim = 256\n", "num_heads = 2\n", "dense_dim = 32\n", "\n", "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", "x = layers.Embedding(vocab_size, embed_dim)(inputs)\n", "x = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)\n", "x = layers.GlobalMaxPooling1D()(x)\n", "x = layers.Dropout(0.5)(x)\n", "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", "model = keras.Model(inputs, outputs)\n", "model.compile(optimizer=\"rmsprop\",\n", " loss=\"binary_crossentropy\",\n", " metrics=[\"accuracy\"])\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "id": "7nXp84h_42iw" }, "source": [ "**트랜스포머 인코더 기반 모델 훈련하고 평가하기**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E3H26gQC42ix", "outputId": "aa448162-347e-4a68-862b-a212504b542f" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Epoch 1/20\n", "625/625 [==============================] - 49s 78ms/step - loss: 0.3364 - accuracy: 0.8550 - val_loss: 0.3118 - val_accuracy: 0.8732\n" ] }, { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py:3079: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.\n", " saving_api.save_model(\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "Epoch 2/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.3045 - accuracy: 0.8699 - val_loss: 0.2991 - val_accuracy: 0.8732\n", "Epoch 3/20\n", "625/625 [==============================] - 44s 70ms/step - loss: 0.2724 - accuracy: 0.8887 - val_loss: 0.2877 - val_accuracy: 0.8802\n", "Epoch 4/20\n", "625/625 [==============================] - 44s 70ms/step - loss: 0.2435 - accuracy: 0.9021 - val_loss: 0.2880 - val_accuracy: 0.8804\n", "Epoch 5/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.2141 - accuracy: 0.9158 - val_loss: 0.2930 - val_accuracy: 0.8814\n", "Epoch 6/20\n", "625/625 [==============================] - 42s 66ms/step - loss: 0.1861 - accuracy: 0.9284 - val_loss: 0.3078 - val_accuracy: 0.8786\n", "Epoch 7/20\n", "625/625 [==============================] - 43s 68ms/step - loss: 0.1538 - accuracy: 0.9419 - val_loss: 0.3307 - val_accuracy: 0.8772\n", "Epoch 8/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.1293 - accuracy: 0.9524 - val_loss: 0.3357 - val_accuracy: 0.8776\n", "Epoch 9/20\n", "625/625 [==============================] - 41s 65ms/step - loss: 0.1035 - accuracy: 0.9615 - val_loss: 0.3685 - val_accuracy: 0.8768\n", "Epoch 10/20\n", "625/625 [==============================] - 41s 66ms/step - loss: 0.0827 - accuracy: 0.9698 - val_loss: 0.3738 - val_accuracy: 0.8778\n", "Epoch 11/20\n", "625/625 [==============================] - 41s 65ms/step - loss: 0.0657 - accuracy: 0.9765 - val_loss: 0.4341 - val_accuracy: 0.8772\n", "Epoch 12/20\n", "625/625 [==============================] - 41s 65ms/step - loss: 0.0524 - accuracy: 0.9814 - val_loss: 0.4558 - val_accuracy: 0.8726\n", "Epoch 13/20\n", "625/625 [==============================] - 42s 68ms/step - loss: 0.0408 - accuracy: 0.9854 - val_loss: 0.4878 - val_accuracy: 0.8640\n", "Epoch 14/20\n", "625/625 [==============================] - 41s 65ms/step - loss: 0.0329 - accuracy: 0.9883 - val_loss: 0.5147 - val_accuracy: 0.8688\n", "Epoch 15/20\n", "625/625 [==============================] - 42s 68ms/step - loss: 0.0296 - accuracy: 0.9897 - val_loss: 0.6139 - val_accuracy: 0.8692\n", "Epoch 16/20\n", "625/625 [==============================] - 40s 65ms/step - loss: 0.0241 - accuracy: 0.9915 - val_loss: 0.6086 - val_accuracy: 0.8654\n", "Epoch 17/20\n", "625/625 [==============================] - 42s 68ms/step - loss: 0.0185 - accuracy: 0.9936 - val_loss: 0.6920 - val_accuracy: 0.8594\n", "Epoch 18/20\n", "625/625 [==============================] - 41s 65ms/step - loss: 0.0177 - accuracy: 0.9943 - val_loss: 0.6765 - val_accuracy: 0.8652\n", "Epoch 19/20\n", "625/625 [==============================] - 42s 67ms/step - loss: 0.0160 - accuracy: 0.9942 - val_loss: 0.6938 - val_accuracy: 0.8624\n", "Epoch 20/20\n", "625/625 [==============================] - 40s 64ms/step - loss: 0.0158 - accuracy: 0.9948 - val_loss: 0.7845 - val_accuracy: 0.8628\n", "782/782 [==============================] - 19s 24ms/step - loss: 0.2999 - accuracy: 0.8715\n", "테스트 정확도: 0.871\n" ] } ], "source": [ "callbacks = [\n", " keras.callbacks.ModelCheckpoint(\"transformer_encoder.h5\",\n", " save_best_only=True)\n", "]\n", "model.fit(int_train_ds, validation_data=int_val_ds, epochs=20, callbacks=callbacks)\n", "model = keras.models.load_model(\n", " \"transformer_encoder.h5\",\n", " custom_objects={\"TransformerEncoder\": TransformerEncoder})\n", "print(f\"테스트 정확도: {model.evaluate(int_test_ds)[1]:.3f}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Ffb0_42j42ix" }, "source": [ "#### 위치 인코딩을 사용해 위치 정보 주입하기" ] }, { "cell_type": "markdown", "metadata": { "id": "c3eCKYxO42ix" }, "source": [ "**서브클래싱으로 위치 임베딩 구현하기**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "7ngZXecb42ix" }, "outputs": [], "source": [ "class PositionalEmbedding(layers.Layer):\n", " def __init__(self, sequence_length, input_dim, output_dim, **kwargs):\n", " super().__init__(**kwargs)\n", " self.token_embeddings = layers.Embedding(\n", " input_dim=input_dim, output_dim=output_dim)\n", " self.position_embeddings = layers.Embedding(\n", " input_dim=sequence_length, output_dim=output_dim)\n", " self.sequence_length = sequence_length\n", " self.input_dim = input_dim\n", " self.output_dim = output_dim\n", "\n", " def call(self, inputs):\n", " length = tf.shape(inputs)[-1]\n", " positions = tf.range(start=0, limit=length, delta=1)\n", " embedded_tokens = self.token_embeddings(inputs)\n", " embedded_positions = self.position_embeddings(positions)\n", " return embedded_tokens + embedded_positions\n", "\n", " def compute_mask(self, inputs, mask=None):\n", " return tf.math.not_equal(inputs, 0)\n", "\n", " def get_config(self):\n", " config = super().get_config()\n", " config.update({\n", " \"output_dim\": self.output_dim,\n", " \"sequence_length\": self.sequence_length,\n", " \"input_dim\": self.input_dim,\n", " })\n", " return config" ] }, { "cell_type": "markdown", "metadata": { "id": "YAOg6EBF42ix" }, "source": [ "#### 텍스트 분류 트랜스포머" ] }, { "cell_type": "markdown", "metadata": { "id": "9RXcbNDa42iy" }, "source": [ "**트랜스포머 인코더와 위치 임베딩 합치기**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QmzXmO7042iy", "outputId": "672c3025-15f0-47ea-ea17-3cd329269ac6" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Model: \"model_1\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " input_2 (InputLayer) [(None, None)] 0 \n", " \n", " positional_embedding (Posi (None, None, 256) 5273600 \n", " tionalEmbedding) \n", " \n", " transformer_encoder_1 (Tra (None, None, 256) 543776 \n", " nsformerEncoder) \n", " \n", " global_max_pooling1d_1 (Gl (None, 256) 0 \n", " obalMaxPooling1D) \n", " \n", " dropout_1 (Dropout) (None, 256) 0 \n", " \n", " dense_7 (Dense) (None, 1) 257 \n", " \n", "=================================================================\n", "Total params: 5817633 (22.19 MB)\n", "Trainable params: 5817633 (22.19 MB)\n", "Non-trainable params: 0 (0.00 Byte)\n", "_________________________________________________________________\n", "Epoch 1/20\n", "625/625 [==============================] - 60s 93ms/step - loss: 0.5504 - accuracy: 0.7326 - val_loss: 0.3707 - val_accuracy: 0.8356\n", "Epoch 2/20\n", "625/625 [==============================] - 50s 80ms/step - loss: 0.3100 - accuracy: 0.8709 - val_loss: 0.3869 - val_accuracy: 0.8262\n", "Epoch 3/20\n", "625/625 [==============================] - 47s 74ms/step - loss: 0.2424 - accuracy: 0.9011 - val_loss: 0.3069 - val_accuracy: 0.8760\n", "Epoch 4/20\n", "625/625 [==============================] - 46s 74ms/step - loss: 0.1992 - accuracy: 0.9220 - val_loss: 0.2885 - val_accuracy: 0.8918\n", "Epoch 5/20\n", "625/625 [==============================] - 44s 71ms/step - loss: 0.1664 - accuracy: 0.9370 - val_loss: 0.3382 - val_accuracy: 0.8808\n", "Epoch 6/20\n", "625/625 [==============================] - 44s 70ms/step - loss: 0.1395 - accuracy: 0.9460 - val_loss: 0.3723 - val_accuracy: 0.8836\n", "Epoch 7/20\n", "625/625 [==============================] - 44s 70ms/step - loss: 0.1140 - accuracy: 0.9571 - val_loss: 0.4272 - val_accuracy: 0.8844\n", "Epoch 8/20\n", "625/625 [==============================] - 45s 71ms/step - loss: 0.0952 - accuracy: 0.9646 - val_loss: 0.4487 - val_accuracy: 0.8862\n", "Epoch 9/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0735 - accuracy: 0.9739 - val_loss: 0.4597 - val_accuracy: 0.8780\n", "Epoch 10/20\n", "625/625 [==============================] - 45s 71ms/step - loss: 0.0556 - accuracy: 0.9793 - val_loss: 0.6001 - val_accuracy: 0.8814\n", "Epoch 11/20\n", "625/625 [==============================] - 44s 71ms/step - loss: 0.0441 - accuracy: 0.9847 - val_loss: 0.6343 - val_accuracy: 0.8684\n", "Epoch 12/20\n", "625/625 [==============================] - 44s 70ms/step - loss: 0.0355 - accuracy: 0.9886 - val_loss: 0.7697 - val_accuracy: 0.8670\n", "Epoch 13/20\n", "625/625 [==============================] - 44s 70ms/step - loss: 0.0258 - accuracy: 0.9913 - val_loss: 0.7606 - val_accuracy: 0.8728\n", "Epoch 14/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0227 - accuracy: 0.9924 - val_loss: 0.9193 - val_accuracy: 0.8730\n", "Epoch 15/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0235 - accuracy: 0.9927 - val_loss: 0.9620 - val_accuracy: 0.8788\n", "Epoch 16/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0180 - accuracy: 0.9944 - val_loss: 1.0631 - val_accuracy: 0.8780\n", "Epoch 17/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0155 - accuracy: 0.9956 - val_loss: 0.9305 - val_accuracy: 0.8728\n", "Epoch 18/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0124 - accuracy: 0.9961 - val_loss: 0.9398 - val_accuracy: 0.8710\n", "Epoch 19/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0106 - accuracy: 0.9967 - val_loss: 0.9718 - val_accuracy: 0.8752\n", "Epoch 20/20\n", "625/625 [==============================] - 43s 69ms/step - loss: 0.0112 - accuracy: 0.9963 - val_loss: 1.1590 - val_accuracy: 0.8770\n", "782/782 [==============================] - 22s 28ms/step - loss: 0.3193 - accuracy: 0.8801\n", "테스트 정확도: 0.880\n" ] } ], "source": [ "vocab_size = 20000\n", "sequence_length = 600\n", "embed_dim = 256\n", "num_heads = 2\n", "dense_dim = 32\n", "\n", "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(inputs)\n", "x = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)\n", "x = layers.GlobalMaxPooling1D()(x)\n", "x = layers.Dropout(0.5)(x)\n", "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", "model = keras.Model(inputs, outputs)\n", "model.compile(optimizer=\"rmsprop\",\n", " loss=\"binary_crossentropy\",\n", " metrics=[\"accuracy\"])\n", "model.summary()\n", "\n", "callbacks = [\n", " keras.callbacks.ModelCheckpoint(\"full_transformer_encoder.h5\",\n", " save_best_only=True)\n", "]\n", "model.fit(int_train_ds, validation_data=int_val_ds, epochs=20, callbacks=callbacks)\n", "model = keras.models.load_model(\n", " \"full_transformer_encoder.h5\",\n", " custom_objects={\"TransformerEncoder\": TransformerEncoder,\n", " \"PositionalEmbedding\": PositionalEmbedding})\n", "print(f\"테스트 정확도: {model.evaluate(int_test_ds)[1]:.3f}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "T17kNue242iy" }, "source": [ "### BoW 모델 대신 언제 시퀀스 모델을 사용하나요?" ] } ], "metadata": { "accelerator": "GPU", "colab": { "name": "chapter11_part03_transformer.i", "provenance": [] }, "kernelspec": { "display_name": "default:Python", "language": "python", "name": "conda-env-default-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.10" } }, "nbformat": 4, "nbformat_minor": 0 }