{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "mrDzC9FUyWJk" }, "source": [ "# Machine Learning Textbook, 3rd Edition" ] }, { "cell_type": "markdown", "metadata": { "id": "90ZMzS7wyWJm" }, "source": [ "# Chapter 16 - Modeling Sequential Data Using Recurrent Neural Networks (2/2)" ] }, { "cell_type": "markdown", "metadata": { "id": "5_u_b0r4yWJn" }, "source": [ "### Table of contents" ] }, { "cell_type": "markdown", "metadata": { "id": "G-xguWppyWJn" }, "source": [ "- Implementing RNNs for sequence modeling in TensorFlow\n", "    - Project two: character-level language modeling in TensorFlow\n", "        - Preprocessing the dataset\n", "        - Building a character-level RNN model\n", "        - Evaluation phase - generating new text passages\n", "- Understanding language with the Transformer model\n", "    - Understanding the self-attention mechanism\n", "        - A basic form of self-attention\n", "        - Parameterizing the self-attention mechanism with query, key, and value weights\n", "    - Multi-head attention and the Transformer block\n", "- Summary" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-01-01T08:20:46.163997Z", "iopub.status.busy": "2021-01-01T08:20:46.163019Z", "iopub.status.idle": "2021-01-01T08:20:46.165178Z", "shell.execute_reply": "2021-01-01T08:20:46.165803Z" }, "id": "RLnk-L-ZyWJn" }, "outputs": [], "source": [ "from IPython.display import Image" ] }, { "cell_type": "markdown", "metadata": { "id": "k8wSLT9byWJn" }, "source": [ "## Project two: character-level language modeling in TensorFlow" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 240 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:46.170192Z", "iopub.status.busy": "2021-01-01T08:20:46.169457Z", "iopub.status.idle": "2021-01-01T08:20:46.205811Z", "shell.execute_reply": "2021-01-01T08:20:46.206653Z" }, "id": "-tysyJWSyWJn", "outputId": "c89e5f71-4c4f-49d4-99df-aa280a64ba43" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 2 } ], "source": [ "Image(url='https://git.io/JLdVE', width=700)" ] }, { "cell_type": "markdown", "metadata": { "id": "xrqTH2wvyWJo" }, "source": [ "### Preprocessing the dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:46.213066Z", "iopub.status.busy": "2021-01-01T08:20:46.212126Z", "iopub.status.idle": "2021-01-01T08:20:50.103519Z", 
"shell.execute_reply": "2021-01-01T08:20:50.102717Z" }, "id": "GtxzLN-RyWJo", "outputId": "b9182664-2a0e-462b-d722-1730f93a2378" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "--2025-09-03 02:58:30-- https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch16/1268-0.txt\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 1171600 (1.1M) [text/plain]\n", "Saving to: ‘1268-0.txt’\n", "\n", "1268-0.txt 100%[===================>] 1.12M --.-KB/s in 0.04s \n", "\n", "2025-09-03 02:58:30 (27.1 MB/s) - ‘1268-0.txt’ saved [1171600/1171600]\n", "\n" ] } ], "source": [ "# If you are running this notebook in Colab, run the following command to download the text file.\n", "!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch16/1268-0.txt" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.111825Z", "iopub.status.busy": "2021-01-01T08:20:50.110901Z", "iopub.status.idle": "2021-01-01T08:20:50.138395Z", "shell.execute_reply": "2021-01-01T08:20:50.137468Z" }, "id": "mRwzImO7yWJo", "outputId": "15af7faf-7544-460b-b21a-7b0a711e2584" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "567 1112917\n", "Total length: 1112350\n", "Unique characters: 80\n" ] } ], "source": [ "import numpy as np\n", "\n", "\n", "## Read and preprocess the text\n", "with open('1268-0.txt', 'r', encoding='UTF8') as fp:\n", "    text = fp.read()\n", "\n", "# keep only the novel itself, dropping the Project Gutenberg header and footer\n", "start_indx = text.find('THE MYSTERIOUS ISLAND')\n", "end_indx = text.find('End of the Project Gutenberg')\n", "print(start_indx, end_indx)\n", "\n", "text = text[start_indx:end_indx]\n", "char_set = set(text)\n", "print('Total length:', len(text))\n", "print('Unique characters:', len(char_set))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 288 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.143620Z", "iopub.status.busy": "2021-01-01T08:20:50.142625Z", "iopub.status.idle": "2021-01-01T08:20:50.148075Z", "shell.execute_reply": "2021-01-01T08:20:50.148851Z" }, "id": "HjzozZLuyWJp", "outputId": "51f50806-1836-422a-ba6d-bd711625a2e9" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 5 } ], "source": [ "Image(url='https://git.io/JLdVz', width=700)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.198178Z", "iopub.status.busy": "2021-01-01T08:20:50.167394Z", "iopub.status.idle": "2021-01-01T08:20:50.298755Z", "shell.execute_reply": "2021-01-01T08:20:50.297847Z" }, "id": "X3zvJimcyWJp", "outputId": "132fa178-2aee-4033-c5db-1d8503c2c6f0" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Shape of encoded text: (1112350,)\n", "THE MYSTERIOUS == Encoding ==> [44 32 29 1 37 48 43 44 29 42 33 39 45 43 1]\n", "[33 43 36 25 38 28] == Decoding ==> ISLAND\n" ] } ], "source": [ "# build the char -> int mapping (char2int) and the int -> char lookup array (char_array)\n", "chars_sorted = sorted(char_set)\n", "char2int = {ch:i for i,ch in enumerate(chars_sorted)}\n", "char_array = np.array(chars_sorted)\n", "\n", "text_encoded = np.array(\n", "    [char2int[ch] for ch in text],\n", "    dtype=np.int32)\n", "\n", "print('Shape of encoded text:', text_encoded.shape)\n", "\n", "print(text[:15], ' == Encoding ==> ', text_encoded[:15])\n", "print(text_encoded[15:21], ' == Decoding ==> ', ''.join(char_array[text_encoded[15:21]]))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 418 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.303833Z", "iopub.status.busy": "2021-01-01T08:20:50.302867Z", 
"iopub.status.idle": "2021-01-01T08:20:50.309281Z", "shell.execute_reply": "2021-01-01T08:20:50.310125Z" }, "id": "c9JdWEpwyWJp", "outputId": "3de0b1c5-394d-471b-91fb-8256b4859e02" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 7 } ], "source": [ "Image(url='https://git.io/JLdVV', width=700)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.316611Z", "iopub.status.busy": "2021-01-01T08:20:50.315609Z", "iopub.status.idle": "2021-01-01T08:20:52.091027Z", "shell.execute_reply": "2021-01-01T08:20:52.091768Z" }, "id": "_E5CojZ5yWJp", "outputId": "44795a95-7a66-44c2-8197-a9e80d764008" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "44 -> T\n", "32 -> H\n", "29 -> E\n", "1 -> \n", "37 -> M\n" ] } ], "source": [ "import tensorflow as tf\n", "\n", "\n", "ds_text_encoded = tf.data.Dataset.from_tensor_slices(text_encoded)\n", "\n", "for ex in ds_text_encoded.take(5):\n", " print('{} -> {}'.format(ex.numpy(), char_array[ex.numpy()]))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.100011Z", "iopub.status.busy": "2021-01-01T08:20:52.099319Z", "iopub.status.idle": "2021-01-01T08:20:52.107438Z", "shell.execute_reply": "2021-01-01T08:20:52.108132Z" }, "id": "Ilw6v7iKyWJq", "outputId": "9135408b-716d-414d-cce9-2fe4347c6822" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[44 32 29 1 37 48 43 44 29 42 33 39 45 43 1 33 43 36 25 38 28 1 6 6\n", " 6 0 0 0 0 0 40 67 64 53 70 52 54 53 1 51] -> 74\n", "'THE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced b' -> 'y'\n" ] } ], "source": [ "seq_length = 40\n", "chunk_size = seq_length + 1\n", "\n", "ds_chunks = ds_text_encoded.batch(chunk_size, 
drop_remainder=True)\n", "\n", "## inspection:\n", "for seq in ds_chunks.take(1):\n", "    input_seq = seq[:seq_length].numpy()\n", "    target = seq[seq_length].numpy()\n", "    print(input_seq, ' -> ', target)\n", "    print(repr(''.join(char_array[input_seq])),\n", "          ' -> ', repr(''.join(char_array[target])))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 514 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.113143Z", "iopub.status.busy": "2021-01-01T08:20:52.112271Z", "iopub.status.idle": "2021-01-01T08:20:52.120737Z", "shell.execute_reply": "2021-01-01T08:20:52.121466Z" }, "id": "9Ot6_7ptyWJq", "outputId": "351a0415-cc73-4040-edea-40f2898710f7" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 10 } ], "source": [ "Image(url='https://git.io/JLdVr', width=700)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.131350Z", "iopub.status.busy": "2021-01-01T08:20:52.130702Z", "iopub.status.idle": "2021-01-01T08:20:52.199641Z", "shell.execute_reply": "2021-01-01T08:20:52.198873Z" }, "id": "oDIGchMbyWJq", "outputId": "b70bf233-aa2a-430b-9e40-5d289e74797c" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Input (x): 'THE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced b'\n", "Target (y): 'HE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced by'\n", "\n", "Input (x): ' Anthony Matonak, and Trevor Carlson\\n\\n\\n\\n'\n", "Target (y): 'Anthony Matonak, and Trevor Carlson\\n\\n\\n\\n\\n'\n", "\n" ] } ], "source": [ "## Define a function that splits each chunk into input (x) and target (y)\n", "def split_input_target(chunk):\n", "    # at each position, the target is the next character of the input\n", "    input_seq = chunk[:-1]\n", "    target_seq = chunk[1:]\n", "    return input_seq, target_seq\n", "\n", "ds_sequences = ds_chunks.map(split_input_target)\n", "\n", "## Inspection:\n", "for example in ds_sequences.take(2):\n", "    print('Input (x):', repr(''.join(char_array[example[0].numpy()])))\n", "    print('Target (y):', repr(''.join(char_array[example[1].numpy()])))\n", "    print()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.206427Z", "iopub.status.busy": "2021-01-01T08:20:52.205493Z", "iopub.status.idle": "2021-01-01T08:20:52.214632Z", "shell.execute_reply": "2021-01-01T08:20:52.214055Z" }, "id": "EwXM1x7MyWJr", "outputId": "80db52f7-c6c1-4cbe-c1a3-100fe66a61a4" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "<_BatchDataset element_spec=(TensorSpec(shape=(None, 40), dtype=tf.int32, name=None), TensorSpec(shape=(None, 40), dtype=tf.int32, name=None))>" ] }, "metadata": {}, "execution_count": 12 } ], "source": [ "# Batch size and shuffle buffer size\n", "BATCH_SIZE = 64\n", "BUFFER_SIZE = 10000\n", "\n", "tf.random.set_seed(1)\n", "# batch() also accepts drop_remainder=True to discard the last, smaller batch\n", "ds = ds_sequences.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)\n", "\n", "ds" ] }, { "cell_type": "markdown", "metadata": { "id": "ALd_KVXbyWJr" }, "source": [ "### Building a character-level RNN model" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 247 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.222512Z", "iopub.status.busy": "2021-01-01T08:20:52.221844Z", "iopub.status.idle": "2021-01-01T08:20:52.507739Z", "shell.execute_reply": "2021-01-01T08:20:52.508469Z" }, "id": "2DQqLCqAyWJr", "outputId": "d4c729a2-5025-48bc-a143-546db078209b" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "\u001b[1mModel: \"sequential_7\"\u001b[0m\n" ], "text/html": [ "Model: \"sequential_7\"\n", "\n" ] }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", "│ embedding_11 (\u001b[38;5;33mEmbedding\u001b[0m) │ ? │ \u001b[38;5;34m0\u001b[0m (unbuilt) │\n", "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", "│ lstm_7 (\u001b[38;5;33mLSTM\u001b[0m) │ ? │ \u001b[38;5;34m0\u001b[0m (unbuilt) │\n", "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", "│ dense_7 (\u001b[38;5;33mDense\u001b[0m) │ ? │ \u001b[38;5;34m0\u001b[0m (unbuilt) │\n", "└─────────────────────────────────┴────────────────────────┴───────────────┘\n" ], "text/html": [ "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
"┃ Layer (type)                     Output Shape                  Param # ┃\n",
"┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
"│ embedding_11 (Embedding)        │ ?                      │   0 (unbuilt) │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ lstm_7 (LSTM)                   │ ?                      │   0 (unbuilt) │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense_7 (Dense)                 │ ?                      │   0 (unbuilt) │\n",
"└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
"\n" ] }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" ], "text/html": [ " Total params: 0 (0.00 B)\n", "\n" ] }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" ], "text/html": [ " Trainable params: 0 (0.00 B)\n", "\n" ] }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" ], "text/html": [ " Non-trainable params: 0 (0.00 B)\n", "\n" ] }, "metadata": {} } ], "source": [ "def build_model(vocab_size, embedding_dim, rnn_units):\n", "    # Embedding -> LSTM (one output per time step) -> Dense layer with one logit per character\n", "    model = tf.keras.Sequential([\n", "        tf.keras.layers.Embedding(vocab_size, embedding_dim),\n", "        tf.keras.layers.LSTM(\n", "            rnn_units, return_sequences=True),\n", "        tf.keras.layers.Dense(vocab_size)\n", "    ])\n", "    return model\n", "\n", "\n", "charset_size = len(char_array)\n", "embedding_dim = 256\n", "rnn_units = 512\n", "\n", "tf.random.set_seed(1)\n", "\n", "model = build_model(\n", "    vocab_size=charset_size,\n", "    embedding_dim=embedding_dim,\n", "    rnn_units=rnn_units)\n", "\n", "# No input shape has been specified yet, so summary() lists the layers as unbuilt (0 params)\n", "model.summary()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.523264Z", "iopub.status.busy": "2021-01-01T08:20:52.522357Z", "iopub.status.idle": "2021-01-01T08:52:55.869819Z", "shell.execute_reply": "2021-01-01T08:52:55.870445Z" }, "id": "6oDKVX0JyWJr", "outputId": "7ef015c9-ff3d-4ef8-c6f2-634dce2beb04" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Epoch 1/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m5s\u001b[0m 5ms/step - loss: 2.6163\n", "Epoch 2/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.7046\n", "Epoch 3/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.4891\n", "Epoch 4/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.3822\n", "Epoch 5/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.3160\n", "Epoch 6/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 
1.2699\n", "Epoch 7/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.2370\n", "Epoch 8/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.2099\n", "Epoch 9/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.1820\n", "Epoch 10/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.1663\n", "Epoch 11/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.1460\n", "Epoch 12/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.1275\n", "Epoch 13/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.1099\n", "Epoch 14/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0946\n", "Epoch 15/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0759\n", "Epoch 16/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0631\n", "Epoch 17/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0511\n", "Epoch 18/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0343\n", "Epoch 19/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0219\n", "Epoch 
20/20\n", "\u001b[1m424/424\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 5ms/step - loss: 1.0086\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 35 } ], "source": [ "# The Dense layer outputs raw logits, so the loss is created with from_logits=True\n", "model.compile(\n", "    optimizer='adam',\n", "    loss=tf.keras.losses.SparseCategoricalCrossentropy(\n", "        from_logits=True\n", "    ))\n", "\n", "model.fit(ds, epochs=20)" ] }, { "cell_type": "markdown", "metadata": { "id": "HPxMUjzzyWJr" }, "source": [ "### Evaluation phase - generating new text passages" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:52:55.887487Z", "iopub.status.busy": "2021-01-01T08:52:55.886806Z", "iopub.status.idle": "2021-01-01T08:52:55.892794Z", "shell.execute_reply": "2021-01-01T08:52:55.892440Z" }, "id": "qJ8Wf-ofyWJs", "outputId": "c352d67b-27de-4252-a0fc-e5b126e30fb4" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Probabilities: [0.33333334 0.33333334 0.33333334]\n", "array([[1, 2, 0, 1, 0, 1, 1, 2, 1, 1]])\n" ] } ], "source": [ "tf.random.set_seed(1)\n", "\n", "logits = [[1.0, 1.0, 1.0]]\n", "print('Probabilities:', tf.math.softmax(logits).numpy()[0])\n", "\n", "# tf.random.categorical draws class indices from the distribution defined by the logits\n", "samples = tf.random.categorical(\n", "    logits=logits, num_samples=10)\n", "tf.print(samples.numpy())" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:52:55.899437Z", "iopub.status.busy": "2021-01-01T08:52:55.898456Z", "iopub.status.idle": "2021-01-01T08:52:55.903367Z", "shell.execute_reply": "2021-01-01T08:52:55.904194Z" }, "id": "PZd-fbXByWJs", "outputId": "f8f78eff-ca97-4dd2-9d6f-07f89b2b8522" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Probabilities: [0.10650698 0.10650698 0.78698605]\n", "array([[2, 2, 0, 2, 2, 2, 2, 2, 1, 2]])\n" ] } ], "source": [ "tf.random.set_seed(1)\n", "\n", "logits = [[1.0, 1.0, 3.0]]\n", "print('Probabilities:', tf.math.softmax(logits).numpy()[0])\n", "\n", "samples = tf.random.categorical(\n", "    logits=logits, num_samples=10)\n", "tf.print(samples.numpy())" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:52:55.914798Z", "iopub.status.busy": "2021-01-01T08:52:55.914137Z", "iopub.status.idle": "2021-01-01T08:53:06.162148Z", "shell.execute_reply": "2021-01-01T08:53:06.162896Z" }, "id": "R_b2-tLDyWJs", "outputId": "5270df5d-186e-49e1-c14b-01d51662f4f8" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The island was egbly float in the water.\n", "\n", "They never seen for the present, from whole found it marked go river. But the obstinate vast continents of trees should ask any well. Here is a vessel was increased by\n", "Union? Gretoms,” said Herbert.\n", "\n", "“I should be a simple enemy. Master Jup should take up, with its entable that he might little obscure.\n", "\n", "As to Hester Cyrus Harding, Neb, and there were neither observation.\n", "\n", "The engineer intended a see the first servants\n", "of a coldience towards the last season.\n", "\n", "As to \n" ] } ], "source": [ "def sample(model, starting_str,\n", "           len_generated_text=500,\n", "           max_input_length=40,\n", "           scale_factor=1.0):\n", "    encoded_input = [char2int[s] for s in starting_str]\n", "    encoded_input = tf.reshape(encoded_input, (1, -1))\n", "\n", "    generated_str = starting_str\n", "\n", "    # model.reset_states()\n", "    for i in range(len_generated_text):\n", "        logits = model(encoded_input)\n", "        logits = tf.squeeze(logits, 0)\n", "\n", "        # scale_factor > 1 sharpens the distribution, < 1 flattens it (more randomness)\n", "        scaled_logits = logits * scale_factor\n", "        new_char_indx = tf.random.categorical(\n", "            scaled_logits, num_samples=1)\n", "\n", "        # keep only the character sampled for the last time step\n", "        new_char_indx = tf.squeeze(new_char_indx)[-1].numpy()\n", "\n", "        generated_str += str(char_array[new_char_indx])\n", "\n", "        new_char_indx = tf.expand_dims([new_char_indx], 0)\n", "        encoded_input = tf.concat(\n", "            [encoded_input, new_char_indx],\n", "            axis=1)\n", "        # feed back at most max_input_length characters\n", "        encoded_input = encoded_input[:, -max_input_length:]\n", "\n", "    return generated_str\n", "\n", "tf.random.set_seed(1)\n", "print(sample(model, starting_str='The island'))" ] }, { "cell_type": "markdown", "metadata": { "id": "Stm4dLH2yWJs" }, "source": [ "* **Predictability vs. randomness**" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:53:06.171621Z", "iopub.status.busy": "2021-01-01T08:53:06.170721Z", "iopub.status.idle": "2021-01-01T08:53:06.175211Z", "shell.execute_reply": "2021-01-01T08:53:06.175546Z" }, "id": "-UKHgrqTyWJs", "outputId": "a23daf10-03d3-4975-9c06-791e0a94d47f" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Probabilities before scaling: [0.10650698 0.10650698 0.78698604]\n", "Probabilities after scaling by 0.5: [0.21194156 0.21194156 0.57611688]\n", "Probabilities after scaling by 0.1: [0.31042377 0.31042377 0.37915245]\n" ] } ], "source": [ "logits = np.array([[1.0, 1.0, 3.0]])\n", "\n", "# multiplying the logits by a factor < 1 makes the probabilities more uniform\n", "print('Probabilities before scaling:', tf.math.softmax(logits).numpy()[0])\n", "\n", "print('Probabilities after scaling by 0.5:', tf.math.softmax(0.5*logits).numpy()[0])\n", "\n", "print('Probabilities after scaling by 0.1:', tf.math.softmax(0.1*logits).numpy()[0])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:53:06.182417Z", "iopub.status.busy": "2021-01-01T08:53:06.182031Z", "iopub.status.idle": "2021-01-01T08:53:16.415192Z", "shell.execute_reply": "2021-01-01T08:53:16.415932Z" }, "id": "ickgML_8yWJt", "outputId": "58b42e64-fce9-4312-e27e-6df66204e311" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The island was in the little band was endeavored to transist some profound drop were the case of the cart was enough to cave up the same time a sort of flight. 
A few moments were still the castaways could not be able to give the heart of the sea, and the convicts had not been able to die to the convicts.\n", "\n", "“That is not long it?” asked the reporter.\n", "\n", "“No,” replied the sailor.\n", "“And you will be the convicts have only lose in the fire of the left bank of the Mercy, and the wire was now to be feared. As to the \n" ] } ], "source": [ "tf.random.set_seed(1)\n", "print(sample(model, starting_str='The island',\n", " scale_factor=2.0))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:53:16.424579Z", "iopub.status.busy": "2021-01-01T08:53:16.424183Z", "iopub.status.idle": "2021-01-01T08:53:26.644678Z", "shell.execute_reply": "2021-01-01T08:53:26.643763Z" }, "id": "UF4sqqpcyWJt", "outputId": "4d3f3d2f-5491-4356-9035-72a30190ac8f" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The island had egblope fox Cork any way Heaven. SU“more ciff shote never, I\n", "rejoke,--1 of\n", "Jouttin. Be punfivecteds had bohes\n", "visipeh gave\n", "unfluence open,\n", "sig her a visib who indergrieable mathar,\n", "orcape\n", "lyevinaze,\n", "liste,\n", "ressec\n", "larquet,\n", "with\n", "occosped loss encrobce; in less.\n", "Harding’s dwelling was nowed hereods! 
excetively Eisable, unknowcbod\n", "frose, uses, smening, whn Pencroft been her cheside ral dwelbing!”\n", "\n", "Their\n", "yearspog\n", "here?ed zeal-slet,\n", "lift; vercy, lines us, ob,” rest Jup of tiedles’, sombority our \n" ] } ], "source": [ "tf.random.set_seed(1)\n", "print(sample(model, starting_str='The island',\n", " scale_factor=0.5))" ] }, { "cell_type": "markdown", "metadata": { "id": "_V30iCf5yWJt" }, "source": [ "# Understanding language with the Transformer model\n", "\n", "## Understanding the self-attention mechanism\n", "\n", "### A basic form of self-attention" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 350 }, "execution": { "iopub.execute_input": "2021-01-01T08:53:26.650577Z", "iopub.status.busy": "2021-01-01T08:53:26.649535Z", "iopub.status.idle": "2021-01-01T08:53:26.663374Z", "shell.execute_reply": "2021-01-01T08:53:26.664093Z" }, "id": "aWx-IXRtyWJt", "outputId": "d6bc0765-923d-4095-ee30-7c300d5eefe7" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 43 } ], "source": [ "Image(url='https://git.io/JLdVo', width=700)" ] }, { "cell_type": "markdown", "metadata": { "id": "EXNdJdxnyWJt" }, "source": [ "### Parameterizing the self-attention mechanism with query, key, and value weights" ] }, { "cell_type": "markdown", "metadata": { "id": "o99EBntFyWJt" }, "source": [ "## Multi-head attention and the Transformer block" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 278 }, "execution": { "iopub.execute_input": "2021-01-01T08:53:26.668048Z", "iopub.status.busy": "2021-01-01T08:53:26.667215Z", "iopub.status.idle": "2021-01-01T08:53:26.672956Z", "shell.execute_reply": "2021-01-01T08:53:26.673524Z" }, "id": "Klo3F7nvyWJu", "outputId": "5b86caef-b5b3-45dc-e937-4d7944e160e1" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 44 } ], "source": [ 
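"# A minimal NumPy sketch (not the book's code) of multi-head scaled dot-product self-attention.\n",
"# Assumptions: random matrices stand in for the learned query, key, value, and output\n",
"# projection weights; softmax and multi_head_self_attention are hypothetical helper names.\n",
"import numpy as np\n",
"\n",
"\n",
"def softmax(x, axis=-1):\n",
"    # subtract the max along the axis before exponentiating, for numerical stability\n",
"    x = x - x.max(axis=axis, keepdims=True)\n",
"    e = np.exp(x)\n",
"    return e / e.sum(axis=axis, keepdims=True)\n",
"\n",
"\n",
"def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):\n",
"    # X: (seq_len, d_model); each weight matrix: (d_model, d_model)\n",
"    seq_len, d_model = X.shape\n",
"    d_head = d_model // num_heads  # assumes d_model is divisible by num_heads\n",
"\n",
"    def split_heads(M):\n",
"        # (seq_len, d_model) -> (num_heads, seq_len, d_head)\n",
"        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)\n",
"\n",
"    Q = split_heads(X @ W_q)\n",
"    K = split_heads(X @ W_k)\n",
"    V = split_heads(X @ W_v)\n",
"    # attention weights per head: softmax(Q K^T / sqrt(d_head)) over the key positions\n",
"    weights = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head), axis=-1)\n",
"    heads = weights @ V  # (num_heads, seq_len, d_head)\n",
"    # concatenate the heads and apply the output projection\n",
"    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)\n",
"    return concat @ W_o, weights\n",
"\n",
"\n",
"rng = np.random.default_rng(1)\n",
"seq_len, d_model, num_heads = 8, 16, 4\n",
"X = rng.normal(size=(seq_len, d_model))\n",
"W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))\n",
"out, attn = multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads)\n",
"assert out.shape == (seq_len, d_model)\n",
"assert attn.shape == (num_heads, seq_len, seq_len)\n",
"assert np.allclose(attn.sum(axis=-1), 1.0)  # each row of attention weights sums to 1\n",
"\n",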
"Image(url='https://git.io/JLdV6', width=700)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "name": "ch16_part2.ipynb", "provenance": [], "gpuType": "A100" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 0 }