{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mrDzC9FUyWJk"
      },
      "source": [
        "# 머신 러닝 교과서 3판"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "90ZMzS7wyWJm"
      },
      "source": [
        "# 16장 - 순환 신경망으로 순차 데이터 모델링 (2/2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i-YXfR8UyWJm"
      },
      "source": [
        "**아래 링크를 통해 이 노트북을 주피터 노트북 뷰어(nbviewer.jupyter.org)로 보거나 구글 코랩(colab.research.google.com)에서 실행할 수 있습니다.**\n",
        "\n",
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://nbviewer.org/github/rickiepark/python-machine-learning-book-3rd-edition/blob/master/ch16/ch16_part2.ipynb\"><img src=\"https://jupyter.org/assets/share.png\" width=\"60\" />주피터 노트북 뷰어로 보기</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/rickiepark/python-machine-learning-book-3rd-edition/blob/master/ch16/ch16_part2.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />구글 코랩(Colab)에서 실행하기</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5_u_b0r4yWJn"
      },
      "source": [
        "### 목차"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G-xguWppyWJn"
      },
      "source": [
        "- 텐서플로로 시퀀스 모델링을 위한 RNN 구현하기\n",
        "    - 두 번째 프로젝트: 텐서플로로 글자 단위 언어 모델 구현\n",
        "        - 데이터셋 전처리\n",
        "        - 문자 수준의 RNN 모델 만들기\n",
        "        - 평가 단계 - 새로운 텍스트 생성\n",
        "- 트랜스포머 모델을 사용한 언어 이해\n",
        "    - 셀프 어텐션 메카니즘 이해하기\n",
        "        - 셀프 어텐션 기본 구조\n",
        "        - 쿼리, 키, 값 가중치를 가진 셀프 어텐션 메카니즘\n",
        "    - 멀티-헤드 어텐션과 트랜스포머 블록\n",
        "- 요약"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:46.163997Z",
          "iopub.status.busy": "2021-01-01T08:20:46.163019Z",
          "iopub.status.idle": "2021-01-01T08:20:46.165178Z",
          "shell.execute_reply": "2021-01-01T08:20:46.165803Z"
        },
        "id": "RLnk-L-ZyWJn"
      },
      "outputs": [],
      "source": [
        "from IPython.display import Image"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k8wSLT9byWJn"
      },
      "source": [
        "## 두 번째 프로젝트: 텐서플로로 글자 단위 언어 모델 구현"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 240
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:46.170192Z",
          "iopub.status.busy": "2021-01-01T08:20:46.169457Z",
          "iopub.status.idle": "2021-01-01T08:20:46.205811Z",
          "shell.execute_reply": "2021-01-01T08:20:46.206653Z"
        },
        "id": "-tysyJWSyWJn",
        "outputId": "68191190-4cae-492d-d3c3-7873b0fd4bfc"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JLdVE\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 2
        }
      ],
      "source": [
        "Image(url='https://git.io/JLdVE', width=700)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xrqTH2wvyWJo"
      },
      "source": [
        "### 데이터셋 전처리"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:46.213066Z",
          "iopub.status.busy": "2021-01-01T08:20:46.212126Z",
          "iopub.status.idle": "2021-01-01T08:20:50.103519Z",
          "shell.execute_reply": "2021-01-01T08:20:50.102717Z"
        },
        "id": "GtxzLN-RyWJo",
        "outputId": "c520517c-2ba7-41ed-9ea1-35f603ed08e0"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2023-11-10 06:06:00--  https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch16/1268-0.txt\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 1171600 (1.1M) [text/plain]\n",
            "Saving to: ‘1268-0.txt’\n",
            "\n",
            "1268-0.txt          100%[===================>]   1.12M  --.-KB/s    in 0.06s   \n",
            "\n",
            "2023-11-10 06:06:00 (18.0 MB/s) - ‘1268-0.txt’ saved [1171600/1171600]\n",
            "\n"
          ]
        }
      ],
      "source": [
        "# 코랩에서 실행할 경우 다음 코드를 실행해 주세요.\n",
        "!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch16/1268-0.txt"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:50.111825Z",
          "iopub.status.busy": "2021-01-01T08:20:50.110901Z",
          "iopub.status.idle": "2021-01-01T08:20:50.138395Z",
          "shell.execute_reply": "2021-01-01T08:20:50.137468Z"
        },
        "id": "mRwzImO7yWJo",
        "outputId": "9422b729-3fb5-4c3f-a064-230f4e71c900"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "567 1112917\n",
            "전체 길이: 1112350\n",
            "고유한 문자: 80\n"
          ]
        }
      ],
      "source": [
        "import numpy as np\n",
        "\n",
        "\n",
        "## 텍스트 읽고 전처리하기\n",
        "with open('1268-0.txt', 'r', encoding='UTF8') as fp:\n",
        "    text=fp.read()\n",
        "\n",
        "start_indx = text.find('THE MYSTERIOUS ISLAND')\n",
        "end_indx = text.find('End of the Project Gutenberg')\n",
        "print(start_indx, end_indx)\n",
        "\n",
        "text = text[start_indx:end_indx]\n",
        "char_set = set(text)\n",
        "print('전체 길이:', len(text))\n",
        "print('고유한 문자:', len(char_set))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 289
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:50.143620Z",
          "iopub.status.busy": "2021-01-01T08:20:50.142625Z",
          "iopub.status.idle": "2021-01-01T08:20:50.148075Z",
          "shell.execute_reply": "2021-01-01T08:20:50.148851Z"
        },
        "id": "HjzozZLuyWJp",
        "outputId": "27923ff3-33fc-4618-e6d9-e033d57f132a"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JLdVz\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ],
      "source": [
        "Image(url='https://git.io/JLdVz', width=700)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:50.198178Z",
          "iopub.status.busy": "2021-01-01T08:20:50.167394Z",
          "iopub.status.idle": "2021-01-01T08:20:50.298755Z",
          "shell.execute_reply": "2021-01-01T08:20:50.297847Z"
        },
        "id": "X3zvJimcyWJp",
        "outputId": "a33a234d-e9d3-4802-dcba-51927200a809"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "인코딩된 텍스트 크기:  (1112350,)\n",
            "THE MYSTERIOUS       == 인코딩 ==>  [44 32 29  1 37 48 43 44 29 42 33 39 45 43  1]\n",
            "[33 43 36 25 38 28]  == 디코딩 ==>  ISLAND\n"
          ]
        }
      ],
      "source": [
        "chars_sorted = sorted(char_set)\n",
        "char2int = {ch:i for i,ch in enumerate(chars_sorted)}\n",
        "char_array = np.array(chars_sorted)\n",
        "\n",
        "text_encoded = np.array(\n",
        "    [char2int[ch] for ch in text],\n",
        "    dtype=np.int32)\n",
        "\n",
        "print('인코딩된 텍스트 크기: ', text_encoded.shape)\n",
        "\n",
        "print(text[:15], '     == 인코딩 ==> ', text_encoded[:15])\n",
        "print(text_encoded[15:21], ' == 디코딩 ==> ', ''.join(char_array[text_encoded[15:21]]))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 419
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:50.303833Z",
          "iopub.status.busy": "2021-01-01T08:20:50.302867Z",
          "iopub.status.idle": "2021-01-01T08:20:50.309281Z",
          "shell.execute_reply": "2021-01-01T08:20:50.310125Z"
        },
        "id": "c9JdWEpwyWJp",
        "outputId": "7798af70-4fb3-4ac8-ee0e-bdad57ea53f3"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JLdVV\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 7
        }
      ],
      "source": [
        "Image(url='https://git.io/JLdVV', width=700)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:50.316611Z",
          "iopub.status.busy": "2021-01-01T08:20:50.315609Z",
          "iopub.status.idle": "2021-01-01T08:20:52.091027Z",
          "shell.execute_reply": "2021-01-01T08:20:52.091768Z"
        },
        "id": "_E5CojZ5yWJp",
        "outputId": "e98cd3a2-acd5-4d44-f4a1-e0a61dd70993"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "44 -> T\n",
            "32 -> H\n",
            "29 -> E\n",
            "1 ->  \n",
            "37 -> M\n"
          ]
        }
      ],
      "source": [
        "import tensorflow as tf\n",
        "\n",
        "\n",
        "ds_text_encoded = tf.data.Dataset.from_tensor_slices(text_encoded)\n",
        "\n",
        "for ex in ds_text_encoded.take(5):\n",
        "    print('{} -> {}'.format(ex.numpy(), char_array[ex.numpy()]))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:52.100011Z",
          "iopub.status.busy": "2021-01-01T08:20:52.099319Z",
          "iopub.status.idle": "2021-01-01T08:20:52.107438Z",
          "shell.execute_reply": "2021-01-01T08:20:52.108132Z"
        },
        "id": "Ilw6v7iKyWJq",
        "outputId": "9aba0cbc-eaeb-4fc7-c33c-c8cc692e1e89"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[44 32 29  1 37 48 43 44 29 42 33 39 45 43  1 33 43 36 25 38 28  1  6  6\n",
            "  6  0  0  0  0  0 40 67 64 53 70 52 54 53  1 51]  ->  74\n",
            "'THE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced b'  ->  'y'\n"
          ]
        }
      ],
      "source": [
        "seq_length = 40\n",
        "chunk_size = seq_length + 1\n",
        "\n",
        "ds_chunks = ds_text_encoded.batch(chunk_size, drop_remainder=True)\n",
        "\n",
        "## inspection:\n",
        "for seq in ds_chunks.take(1):\n",
        "    input_seq = seq[:seq_length].numpy()\n",
        "    target = seq[seq_length].numpy()\n",
        "    print(input_seq, ' -> ', target)\n",
        "    print(repr(''.join(char_array[input_seq])),\n",
        "          ' -> ', repr(''.join(char_array[target])))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 515
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:52.113143Z",
          "iopub.status.busy": "2021-01-01T08:20:52.112271Z",
          "iopub.status.idle": "2021-01-01T08:20:52.120737Z",
          "shell.execute_reply": "2021-01-01T08:20:52.121466Z"
        },
        "id": "9Ot6_7ptyWJq",
        "outputId": "b2b28175-5d33-4639-d228-693008e4c726"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JLdVr\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 10
        }
      ],
      "source": [
        "Image(url='https://git.io/JLdVr', width=700)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:52.131350Z",
          "iopub.status.busy": "2021-01-01T08:20:52.130702Z",
          "iopub.status.idle": "2021-01-01T08:20:52.199641Z",
          "shell.execute_reply": "2021-01-01T08:20:52.198873Z"
        },
        "id": "oDIGchMbyWJq",
        "outputId": "50dbdc08-1bb1-448a-b016-f971b13f3111"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "입력 (x): 'THE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced b'\n",
            "타깃 (y): 'HE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced by'\n",
            "\n",
            "입력 (x): ' Anthony Matonak, and Trevor Carlson\\n\\n\\n\\n'\n",
            "타깃 (y): 'Anthony Matonak, and Trevor Carlson\\n\\n\\n\\n\\n'\n",
            "\n"
          ]
        }
      ],
      "source": [
        "## x & y를 나누기 위한 함수를 정의합니다\n",
        "def split_input_target(chunk):\n",
        "    input_seq = chunk[:-1]\n",
        "    target_seq = chunk[1:]\n",
        "    return input_seq, target_seq\n",
        "\n",
        "ds_sequences = ds_chunks.map(split_input_target)\n",
        "\n",
        "## 확인:\n",
        "for example in ds_sequences.take(2):\n",
        "    print('입력 (x):', repr(''.join(char_array[example[0].numpy()])))\n",
        "    print('타깃 (y):', repr(''.join(char_array[example[1].numpy()])))\n",
        "    print()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:52.206427Z",
          "iopub.status.busy": "2021-01-01T08:20:52.205493Z",
          "iopub.status.idle": "2021-01-01T08:20:52.214632Z",
          "shell.execute_reply": "2021-01-01T08:20:52.214055Z"
        },
        "id": "EwXM1x7MyWJr",
        "outputId": "d5fd1538-cf59-45aa-f608-65585a141236"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<_BatchDataset element_spec=(TensorSpec(shape=(None, 40), dtype=tf.int32, name=None), TensorSpec(shape=(None, 40), dtype=tf.int32, name=None))>"
            ]
          },
          "metadata": {},
          "execution_count": 12
        }
      ],
      "source": [
        "# 배치 크기\n",
        "BATCH_SIZE = 64\n",
        "BUFFER_SIZE = 10000\n",
        "\n",
        "tf.random.set_seed(1)\n",
        "ds = ds_sequences.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)# drop_remainder=True)\n",
        "\n",
        "ds"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ALd_KVXbyWJr"
      },
      "source": [
        "### 문자 수준의 RNN 모델 만들기"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:52.222512Z",
          "iopub.status.busy": "2021-01-01T08:20:52.221844Z",
          "iopub.status.idle": "2021-01-01T08:20:52.507739Z",
          "shell.execute_reply": "2021-01-01T08:20:52.508469Z"
        },
        "id": "2DQqLCqAyWJr",
        "outputId": "5db512d4-2afb-4c20-c229-9b80fc90d8cc"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Model: \"sequential\"\n",
            "_________________________________________________________________\n",
            " Layer (type)                Output Shape              Param #   \n",
            "=================================================================\n",
            " embedding (Embedding)       (None, None, 256)         20480     \n",
            "                                                                 \n",
            " lstm (LSTM)                 (None, None, 512)         1574912   \n",
            "                                                                 \n",
            " dense (Dense)               (None, None, 80)          41040     \n",
            "                                                                 \n",
            "=================================================================\n",
            "Total params: 1636432 (6.24 MB)\n",
            "Trainable params: 1636432 (6.24 MB)\n",
            "Non-trainable params: 0 (0.00 Byte)\n",
            "_________________________________________________________________\n"
          ]
        }
      ],
      "source": [
        "def build_model(vocab_size, embedding_dim, rnn_units):\n",
        "    model = tf.keras.Sequential([\n",
        "        tf.keras.layers.Embedding(vocab_size, embedding_dim),\n",
        "        tf.keras.layers.LSTM(\n",
        "            rnn_units, return_sequences=True),\n",
        "        tf.keras.layers.Dense(vocab_size)\n",
        "    ])\n",
        "    return model\n",
        "\n",
        "\n",
        "charset_size = len(char_array)\n",
        "embedding_dim = 256\n",
        "rnn_units = 512\n",
        "\n",
        "tf.random.set_seed(1)\n",
        "\n",
        "model = build_model(\n",
        "    vocab_size = charset_size,\n",
        "    embedding_dim=embedding_dim,\n",
        "    rnn_units=rnn_units)\n",
        "\n",
        "model.summary()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:20:52.523264Z",
          "iopub.status.busy": "2021-01-01T08:20:52.522357Z",
          "iopub.status.idle": "2021-01-01T08:52:55.869819Z",
          "shell.execute_reply": "2021-01-01T08:52:55.870445Z"
        },
        "id": "6oDKVX0JyWJr",
        "outputId": "4fa28c17-1663-4500-adfa-b989c31b642e"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Epoch 1/20\n",
            "424/424 [==============================] - 20s 17ms/step - loss: 2.2842\n",
            "Epoch 2/20\n",
            "424/424 [==============================] - 8s 16ms/step - loss: 1.7162\n",
            "Epoch 3/20\n",
            "424/424 [==============================] - 6s 13ms/step - loss: 1.5044\n",
            "Epoch 4/20\n",
            "424/424 [==============================] - 7s 13ms/step - loss: 1.3886\n",
            "Epoch 5/20\n",
            "424/424 [==============================] - 7s 14ms/step - loss: 1.3185\n",
            "Epoch 6/20\n",
            "424/424 [==============================] - 7s 14ms/step - loss: 1.2698\n",
            "Epoch 7/20\n",
            "424/424 [==============================] - 8s 16ms/step - loss: 1.2330\n",
            "Epoch 8/20\n",
            "424/424 [==============================] - 6s 13ms/step - loss: 1.2033\n",
            "Epoch 9/20\n",
            "424/424 [==============================] - 7s 16ms/step - loss: 1.1789\n",
            "Epoch 10/20\n",
            "424/424 [==============================] - 7s 15ms/step - loss: 1.1565\n",
            "Epoch 11/20\n",
            "424/424 [==============================] - 7s 14ms/step - loss: 1.1369\n",
            "Epoch 12/20\n",
            "424/424 [==============================] - 8s 14ms/step - loss: 1.1182\n",
            "Epoch 13/20\n",
            "424/424 [==============================] - 7s 15ms/step - loss: 1.1011\n",
            "Epoch 14/20\n",
            "424/424 [==============================] - 6s 13ms/step - loss: 1.0846\n",
            "Epoch 15/20\n",
            "424/424 [==============================] - 7s 14ms/step - loss: 1.0692\n",
            "Epoch 16/20\n",
            "424/424 [==============================] - 8s 16ms/step - loss: 1.0542\n",
            "Epoch 17/20\n",
            "424/424 [==============================] - 6s 13ms/step - loss: 1.0396\n",
            "Epoch 18/20\n",
            "424/424 [==============================] - 7s 14ms/step - loss: 1.0249\n",
            "Epoch 19/20\n",
            "424/424 [==============================] - 8s 16ms/step - loss: 1.0105\n",
            "Epoch 20/20\n",
            "424/424 [==============================] - 7s 14ms/step - loss: 0.9967\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "<keras.src.callbacks.History at 0x7badc8bcbb50>"
            ]
          },
          "metadata": {},
          "execution_count": 14
        }
      ],
      "source": [
        "model.compile(\n",
        "    optimizer='adam',\n",
        "    loss=tf.keras.losses.SparseCategoricalCrossentropy(\n",
        "        from_logits=True\n",
        "    ))\n",
        "\n",
        "model.fit(ds, epochs=20)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HPxMUjzzyWJr"
      },
      "source": [
        "### 평가 단계 - 새로운 텍스트 생성"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:52:55.887487Z",
          "iopub.status.busy": "2021-01-01T08:52:55.886806Z",
          "iopub.status.idle": "2021-01-01T08:52:55.892794Z",
          "shell.execute_reply": "2021-01-01T08:52:55.892440Z"
        },
        "id": "qJ8Wf-ofyWJs",
        "outputId": "9109244e-c362-49e4-ba7c-5d976d8073c0"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "확률: [0.33333334 0.33333334 0.33333334]\n",
            "array([[1, 2, 0, 1, 0, 1, 1, 2, 1, 1]])\n"
          ]
        }
      ],
      "source": [
        "tf.random.set_seed(1)\n",
        "\n",
        "logits = [[1.0, 1.0, 1.0]]\n",
        "print('확률:', tf.math.softmax(logits).numpy()[0])\n",
        "\n",
        "samples = tf.random.categorical(\n",
        "    logits=logits, num_samples=10)\n",
        "tf.print(samples.numpy())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:52:55.899437Z",
          "iopub.status.busy": "2021-01-01T08:52:55.898456Z",
          "iopub.status.idle": "2021-01-01T08:52:55.903367Z",
          "shell.execute_reply": "2021-01-01T08:52:55.904194Z"
        },
        "id": "PZd-fbXByWJs",
        "outputId": "095725b4-3ca4-4ed7-dbbd-d0a8ce74028d"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "확률: [0.10650698 0.10650698 0.78698605]\n",
            "array([[2, 2, 0, 2, 2, 2, 2, 2, 1, 2]])\n"
          ]
        }
      ],
      "source": [
        "tf.random.set_seed(1)\n",
        "\n",
        "logits = [[1.0, 1.0, 3.0]]\n",
        "print('확률:', tf.math.softmax(logits).numpy()[0])\n",
        "\n",
        "samples = tf.random.categorical(\n",
        "    logits=logits, num_samples=10)\n",
        "tf.print(samples.numpy())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:52:55.914798Z",
          "iopub.status.busy": "2021-01-01T08:52:55.914137Z",
          "iopub.status.idle": "2021-01-01T08:53:06.162148Z",
          "shell.execute_reply": "2021-01-01T08:53:06.162896Z"
        },
        "id": "R_b2-tLDyWJs",
        "outputId": "23b9c035-c2c4-40ea-a4cc-4b5fdde61813"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The island had explore the island,\n",
            "and fleamed necessary.\n",
            "\n",
            "But Herbert, the reporter thought, there baromed oceans of those water from a long point.\n",
            "The wind heart that Herbert dresing\n",
            "the night, or comb,” answered the engineer\n",
            "approved the sources destroyed, replesieg, was abandoned, which\n",
            "the convicts were able to die read told them!”\n",
            "\n",
            "“Oh! follow?”\n",
            "said Pencroft, “but Neb, I\n",
            "were!” cried Herbert. “We have only a fireplace,” observed Cyrus Harding.\n",
            "\n",
            "The cord seek the projects of times, and on the report\n"
          ]
        }
      ],
      "source": [
        "def sample(model, starting_str,\n",
        "           len_generated_text=500,\n",
        "           max_input_length=40,\n",
        "           scale_factor=1.0):\n",
        "    encoded_input = [char2int[s] for s in starting_str]\n",
        "    encoded_input = tf.reshape(encoded_input, (1, -1))\n",
        "\n",
        "    generated_str = starting_str\n",
        "\n",
        "    model.reset_states()\n",
        "    for i in range(len_generated_text):\n",
        "        logits = model(encoded_input)\n",
        "        logits = tf.squeeze(logits, 0)\n",
        "\n",
        "        scaled_logits = logits * scale_factor\n",
        "        new_char_indx = tf.random.categorical(\n",
        "            scaled_logits, num_samples=1)\n",
        "\n",
        "        new_char_indx = tf.squeeze(new_char_indx)[-1].numpy()\n",
        "\n",
        "        generated_str += str(char_array[new_char_indx])\n",
        "\n",
        "        new_char_indx = tf.expand_dims([new_char_indx], 0)\n",
        "        encoded_input = tf.concat(\n",
        "            [encoded_input, new_char_indx],\n",
        "            axis=1)\n",
        "        encoded_input = encoded_input[:, -max_input_length:]\n",
        "\n",
        "    return generated_str\n",
        "\n",
        "tf.random.set_seed(1)\n",
        "print(sample(model, starting_str='The island'))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Stm4dLH2yWJs"
      },
      "source": [
        "* **예측 가능성 대 무작위성**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:53:06.171621Z",
          "iopub.status.busy": "2021-01-01T08:53:06.170721Z",
          "iopub.status.idle": "2021-01-01T08:53:06.175211Z",
          "shell.execute_reply": "2021-01-01T08:53:06.175546Z"
        },
        "id": "-UKHgrqTyWJs",
        "outputId": "cde805cd-1f94-48d1-b7f1-1ce0e868dace"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "스케일 조정 전의 확률:  [0.10650698 0.10650698 0.78698604]\n",
            "0.5배 조정 후 확률:   [0.21194156 0.21194156 0.57611688]\n",
            "0.1배 조정 후 확률:   [0.31042377 0.31042377 0.37915245]\n"
          ]
        }
      ],
      "source": [
        "logits = np.array([[1.0, 1.0, 3.0]])\n",
        "\n",
        "print('스케일 조정 전의 확률: ', tf.math.softmax(logits).numpy()[0])\n",
        "\n",
        "print('0.5배 조정 후 확률:  ', tf.math.softmax(0.5*logits).numpy()[0])\n",
        "\n",
        "print('0.1배 조정 후 확률:  ', tf.math.softmax(0.1*logits).numpy()[0])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:53:06.182417Z",
          "iopub.status.busy": "2021-01-01T08:53:06.182031Z",
          "iopub.status.idle": "2021-01-01T08:53:16.415192Z",
          "shell.execute_reply": "2021-01-01T08:53:16.415932Z"
        },
        "id": "ickgML_8yWJt",
        "outputId": "a3707745-c1af-4c78-b9d8-27822e957932"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The island had been placed in the water. His continued to five o’clock, the reporter thought that the convicts had passed the body of the water which were the passage of which the sulphate of an incident, the colonists threw the horizon, and only fell back to the exploration of the colonists. The wind from the colonists had not been torn by his former seep to the south\n",
            "with the exploration of the day before. The latter protected the palisade; but the colonists were the projects of the lake, some of the co\n"
          ]
        }
      ],
      "source": [
        "tf.random.set_seed(1)\n",
        "print(sample(model, starting_str='The island',\n",
        "             scale_factor=2.0))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:53:16.424579Z",
          "iopub.status.busy": "2021-01-01T08:53:16.424183Z",
          "iopub.status.idle": "2021-01-01T08:53:26.644678Z",
          "shell.execute_reply": "2021-01-01T08:53:26.643763Z"
        },
        "id": "UF4sqqpcyWJt",
        "outputId": "caa40546-1ea2-4a87-8e7e-c13319b5ea5f"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The island had egelyed holligge! By the Hap would passed, formedly never, I\n",
            "rejoyp,--“To “you wild be go five enes or enohmovery pohitat;\n",
            "unforwind hiry wish heat, what symuring rasila\n",
            "wimmed hearor, projeeved\n",
            "up,\n",
            "listengr.\n",
            "He\n",
            "\n",
            "When you,” proocle, five fires, rocies ngltog.\n",
            "However’s,! Then,\n",
            "force izedicarents? Theyt? look in\n",
            "LlEchummons\n",
            "be no ofeer?”\n",
            "said Pencroft.\n",
            "\n",
            "PAcam Daybreak, rib pushem.\n",
            "Sun. He hall\n",
            "as oiley oursquice, obstrug,-- effer fros, ungen warse;-, obtaw\n",
            "by the rooms, always somber\n",
            "to puri\n"
          ]
        }
      ],
      "source": [
        "tf.random.set_seed(1)\n",
        "print(sample(model, starting_str='The island',\n",
        "             scale_factor=0.5))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_V30iCf5yWJt"
      },
      "source": [
        "# 트랜스포머 모델을 사용한 언어 이해\n",
        "\n",
        "## 셀프 어텐션 메카니즘 이해하기\n",
        "\n",
        "### 셀프 어텐션 기본 구조"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 350
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:53:26.650577Z",
          "iopub.status.busy": "2021-01-01T08:53:26.649535Z",
          "iopub.status.idle": "2021-01-01T08:53:26.663374Z",
          "shell.execute_reply": "2021-01-01T08:53:26.664093Z"
        },
        "id": "aWx-IXRtyWJt",
        "outputId": "27468626-f20b-47bf-e3d7-608333a4813b"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JLdVo\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 21
        }
      ],
      "source": [
        "Image(url='https://git.io/JLdVo', width=700)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EXNdJdxnyWJt"
      },
      "source": [
        "### 쿼리, 키, 값 가중치를 가진 셀프 어텐션 메카니즘"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "o99EBntFyWJt"
      },
      "source": [
        "## 멀티-헤드 어텐션과 트랜스포머 블록"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 279
        },
        "execution": {
          "iopub.execute_input": "2021-01-01T08:53:26.668048Z",
          "iopub.status.busy": "2021-01-01T08:53:26.667215Z",
          "iopub.status.idle": "2021-01-01T08:53:26.672956Z",
          "shell.execute_reply": "2021-01-01T08:53:26.673524Z"
        },
        "id": "Klo3F7nvyWJu",
        "outputId": "94f7d7e7-93d3-4810-c919-890d532f3b00"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JLdV6\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ],
      "source": [
        "Image(url='https://git.io/JLdV6', width=700)"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "ch16_part2.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}