{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "mrDzC9FUyWJk" }, "source": [ "# 머신 러닝 교과서 3판" ] }, { "cell_type": "markdown", "metadata": { "id": "90ZMzS7wyWJm" }, "source": [ "# 16장 - 순환 신경망으로 순차 데이터 모델링 (2/2)" ] }, { "cell_type": "markdown", "metadata": { "id": "i-YXfR8UyWJm" }, "source": [ "**아래 링크를 통해 이 노트북을 주피터 노트북 뷰어(nbviewer.jupyter.org)로 보거나 구글 코랩(colab.research.google.com)에서 실행할 수 있습니다.**\n", "\n", "\n", " \n", " \n", "
\n", " 주피터 노트북 뷰어로 보기\n", " \n", " 구글 코랩(Colab)에서 실행하기\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "5_u_b0r4yWJn" }, "source": [ "### 목차" ] }, { "cell_type": "markdown", "metadata": { "id": "G-xguWppyWJn" }, "source": [ "- 텐서플로로 시퀀스 모델링을 위한 RNN 구현하기\n", " - 두 번째 프로젝트: 텐서플로로 글자 단위 언어 모델 구현\n", " - 데이터셋 전처리\n", " - 문자 수준의 RNN 모델 만들기\n", " - 평가 단계 - 새로운 텍스트 생성\n", "- 트랜스포머 모델을 사용한 언어 이해\n", " - 셀프 어텐션 메카니즘 이해하기\n", " - 셀프 어텐션 기본 구조\n", " - 쿼리, 키, 값 가중치를 가진 셀프 어텐션 메카니즘\n", " - 멀티-헤드 어텐션과 트랜스포머 블록\n", "- 요약" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-01-01T08:20:46.163997Z", "iopub.status.busy": "2021-01-01T08:20:46.163019Z", "iopub.status.idle": "2021-01-01T08:20:46.165178Z", "shell.execute_reply": "2021-01-01T08:20:46.165803Z" }, "id": "RLnk-L-ZyWJn" }, "outputs": [], "source": [ "from IPython.display import Image" ] }, { "cell_type": "markdown", "metadata": { "id": "k8wSLT9byWJn" }, "source": [ "## 두 번째 프로젝트: 텐서플로로 글자 단위 언어 모델 구현" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 240 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:46.170192Z", "iopub.status.busy": "2021-01-01T08:20:46.169457Z", "iopub.status.idle": "2021-01-01T08:20:46.205811Z", "shell.execute_reply": "2021-01-01T08:20:46.206653Z" }, "id": "-tysyJWSyWJn", "outputId": "68191190-4cae-492d-d3c3-7873b0fd4bfc" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 2 } ], "source": [ "Image(url='https://git.io/JLdVE', width=700)" ] }, { "cell_type": "markdown", "metadata": { "id": "xrqTH2wvyWJo" }, "source": [ "### 데이터셋 전처리" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:46.213066Z", "iopub.status.busy": "2021-01-01T08:20:46.212126Z", "iopub.status.idle": "2021-01-01T08:20:50.103519Z", "shell.execute_reply": "2021-01-01T08:20:50.102717Z" }, "id": "GtxzLN-RyWJo", "outputId": "c520517c-2ba7-41ed-9ea1-35f603ed08e0" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "--2023-11-10 06:06:00-- https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch16/1268-0.txt\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 1171600 (1.1M) [text/plain]\n", "Saving to: ‘1268-0.txt’\n", "\n", "1268-0.txt 100%[===================>] 1.12M --.-KB/s in 0.06s \n", "\n", "2023-11-10 06:06:00 (18.0 MB/s) - ‘1268-0.txt’ saved [1171600/1171600]\n", "\n" ] } ], "source": [ "# 코랩에서 실행할 경우 다음 코드를 실행해 주세요.\n", "!wget https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch16/1268-0.txt" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.111825Z", "iopub.status.busy": "2021-01-01T08:20:50.110901Z", "iopub.status.idle": "2021-01-01T08:20:50.138395Z", "shell.execute_reply": "2021-01-01T08:20:50.137468Z" }, "id": "mRwzImO7yWJo", "outputId": "9422b729-3fb5-4c3f-a064-230f4e71c900" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "567 1112917\n", "전체 길이: 1112350\n", "고유한 문자: 80\n" ] } ], "source": [ "import numpy as np\n", "\n", "\n", "## 텍스트 읽고 전처리하기\n", "with open('1268-0.txt', 'r', encoding='UTF8') as fp:\n", " text=fp.read()\n", "\n", "start_indx = text.find('THE MYSTERIOUS ISLAND')\n", "end_indx = text.find('End of the Project Gutenberg')\n", "print(start_indx, end_indx)\n", "\n", "text = text[start_indx:end_indx]\n", "char_set = set(text)\n", "print('전체 길이:', len(text))\n", "print('고유한 문자:', len(char_set))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 289 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.143620Z", "iopub.status.busy": "2021-01-01T08:20:50.142625Z", "iopub.status.idle": "2021-01-01T08:20:50.148075Z", "shell.execute_reply": "2021-01-01T08:20:50.148851Z" }, "id": "HjzozZLuyWJp", "outputId": "27923ff3-33fc-4618-e6d9-e033d57f132a" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 5 } ], "source": [ "Image(url='https://git.io/JLdVz', width=700)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.198178Z", "iopub.status.busy": "2021-01-01T08:20:50.167394Z", "iopub.status.idle": "2021-01-01T08:20:50.298755Z", "shell.execute_reply": "2021-01-01T08:20:50.297847Z" }, "id": "X3zvJimcyWJp", "outputId": "a33a234d-e9d3-4802-dcba-51927200a809" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "인코딩된 텍스트 크기: (1112350,)\n", "THE MYSTERIOUS == 인코딩 ==> [44 32 29 1 37 48 43 44 29 42 33 39 45 43 1]\n", "[33 43 36 25 38 28] == 디코딩 ==> ISLAND\n" ] } ], "source": [ "chars_sorted = sorted(char_set)\n", "char2int = {ch:i for i,ch in enumerate(chars_sorted)}\n", "char_array = np.array(chars_sorted)\n", "\n", "text_encoded = np.array(\n", " [char2int[ch] for ch in text],\n", " dtype=np.int32)\n", "\n", "print('인코딩된 텍스트 크기: ', text_encoded.shape)\n", "\n", "print(text[:15], ' == 인코딩 ==> ', text_encoded[:15])\n", "print(text_encoded[15:21], ' == 디코딩 ==> ', ''.join(char_array[text_encoded[15:21]]))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 419 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.303833Z", "iopub.status.busy": "2021-01-01T08:20:50.302867Z", "iopub.status.idle": "2021-01-01T08:20:50.309281Z", "shell.execute_reply": "2021-01-01T08:20:50.310125Z" }, "id": "c9JdWEpwyWJp", "outputId": "7798af70-4fb3-4ac8-ee0e-bdad57ea53f3" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 7 } ], "source": [ "Image(url='https://git.io/JLdVV', width=700)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:50.316611Z", "iopub.status.busy": "2021-01-01T08:20:50.315609Z", "iopub.status.idle": "2021-01-01T08:20:52.091027Z", "shell.execute_reply": "2021-01-01T08:20:52.091768Z" }, "id": "_E5CojZ5yWJp", "outputId": "e98cd3a2-acd5-4d44-f4a1-e0a61dd70993" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "44 -> T\n", "32 -> H\n", "29 -> E\n", "1 -> \n", "37 -> M\n" ] } ], "source": [ "import tensorflow as tf\n", "\n", "\n", "ds_text_encoded = tf.data.Dataset.from_tensor_slices(text_encoded)\n", "\n", "for ex in ds_text_encoded.take(5):\n", " print('{} -> {}'.format(ex.numpy(), char_array[ex.numpy()]))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.100011Z", "iopub.status.busy": "2021-01-01T08:20:52.099319Z", "iopub.status.idle": "2021-01-01T08:20:52.107438Z", "shell.execute_reply": "2021-01-01T08:20:52.108132Z" }, "id": "Ilw6v7iKyWJq", "outputId": "9aba0cbc-eaeb-4fc7-c33c-c8cc692e1e89" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[44 32 29 1 37 48 43 44 29 42 33 39 45 43 1 33 43 36 25 38 28 1 6 6\n", " 6 0 0 0 0 0 40 67 64 53 70 52 54 53 1 51] -> 74\n", "'THE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced b' -> 'y'\n" ] } ], "source": [ "seq_length = 40\n", "chunk_size = seq_length + 1\n", "\n", "ds_chunks = ds_text_encoded.batch(chunk_size, drop_remainder=True)\n", "\n", "## inspection:\n", "for seq in ds_chunks.take(1):\n", " input_seq = seq[:seq_length].numpy()\n", " target = seq[seq_length].numpy()\n", " print(input_seq, ' -> ', target)\n", " print(repr(''.join(char_array[input_seq])),\n", " ' -> ', repr(''.join(char_array[target])))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 515 }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.113143Z", "iopub.status.busy": "2021-01-01T08:20:52.112271Z", "iopub.status.idle": "2021-01-01T08:20:52.120737Z", "shell.execute_reply": "2021-01-01T08:20:52.121466Z" }, "id": "9Ot6_7ptyWJq", "outputId": "b2b28175-5d33-4639-d228-693008e4c726" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 10 } ], "source": [ "Image(url='https://git.io/JLdVr', width=700)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.131350Z", "iopub.status.busy": "2021-01-01T08:20:52.130702Z", "iopub.status.idle": "2021-01-01T08:20:52.199641Z", "shell.execute_reply": "2021-01-01T08:20:52.198873Z" }, "id": "oDIGchMbyWJq", "outputId": "50dbdc08-1bb1-448a-b016-f971b13f3111" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "입력 (x): 'THE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced b'\n", "타깃 (y): 'HE MYSTERIOUS ISLAND ***\\n\\n\\n\\n\\nProduced by'\n", "\n", "입력 (x): ' Anthony Matonak, and Trevor Carlson\\n\\n\\n\\n'\n", "타깃 (y): 'Anthony Matonak, and Trevor Carlson\\n\\n\\n\\n\\n'\n", "\n" ] } ], "source": [ "## x & y를 나누기 위한 함수를 정의합니다\n", "def split_input_target(chunk):\n", " input_seq = chunk[:-1]\n", " target_seq = chunk[1:]\n", " return input_seq, target_seq\n", "\n", "ds_sequences = ds_chunks.map(split_input_target)\n", "\n", "## 확인:\n", "for example in ds_sequences.take(2):\n", " print('입력 (x):', repr(''.join(char_array[example[0].numpy()])))\n", " print('타깃 (y):', repr(''.join(char_array[example[1].numpy()])))\n", " print()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.206427Z", "iopub.status.busy": "2021-01-01T08:20:52.205493Z", "iopub.status.idle": "2021-01-01T08:20:52.214632Z", "shell.execute_reply": "2021-01-01T08:20:52.214055Z" }, "id": "EwXM1x7MyWJr", "outputId": "d5fd1538-cf59-45aa-f608-65585a141236" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "<_BatchDataset element_spec=(TensorSpec(shape=(None, 40), dtype=tf.int32, name=None), TensorSpec(shape=(None, 40), dtype=tf.int32, name=None))>" ] }, "metadata": {}, "execution_count": 12 } ], "source": [ "# 배치 크기\n", "BATCH_SIZE = 64\n", "BUFFER_SIZE = 10000\n", "\n", "tf.random.set_seed(1)\n", "ds = ds_sequences.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)# drop_remainder=True)\n", "\n", "ds" ] }, { "cell_type": "markdown", "metadata": { "id": "ALd_KVXbyWJr" }, "source": [ "### 문자 수준의 RNN 모델 만들기" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.222512Z", "iopub.status.busy": "2021-01-01T08:20:52.221844Z", "iopub.status.idle": "2021-01-01T08:20:52.507739Z", "shell.execute_reply": "2021-01-01T08:20:52.508469Z" }, "id": "2DQqLCqAyWJr", "outputId": "5db512d4-2afb-4c20-c229-9b80fc90d8cc" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Model: \"sequential\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " embedding (Embedding) (None, None, 256) 20480 \n", " \n", " lstm (LSTM) (None, None, 512) 1574912 \n", " \n", " dense (Dense) (None, None, 80) 41040 \n", " \n", "=================================================================\n", "Total params: 1636432 (6.24 MB)\n", "Trainable params: 1636432 (6.24 MB)\n", "Non-trainable params: 0 (0.00 Byte)\n", "_________________________________________________________________\n" ] } ], "source": [ "def build_model(vocab_size, embedding_dim, rnn_units):\n", " model = tf.keras.Sequential([\n", " tf.keras.layers.Embedding(vocab_size, embedding_dim),\n", " tf.keras.layers.LSTM(\n", " rnn_units, return_sequences=True),\n", " tf.keras.layers.Dense(vocab_size)\n", " ])\n", " return model\n", "\n", "\n", "charset_size = len(char_array)\n", "embedding_dim = 256\n", "rnn_units = 512\n", "\n", "tf.random.set_seed(1)\n", "\n", "model = build_model(\n", " vocab_size = charset_size,\n", " embedding_dim=embedding_dim,\n", " rnn_units=rnn_units)\n", "\n", "model.summary()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:20:52.523264Z", "iopub.status.busy": "2021-01-01T08:20:52.522357Z", "iopub.status.idle": "2021-01-01T08:52:55.869819Z", "shell.execute_reply": "2021-01-01T08:52:55.870445Z" }, "id": "6oDKVX0JyWJr", "outputId": "4fa28c17-1663-4500-adfa-b989c31b642e" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Epoch 1/20\n", "424/424 [==============================] - 20s 17ms/step - loss: 2.2842\n", "Epoch 2/20\n", "424/424 [==============================] - 8s 16ms/step - loss: 1.7162\n", "Epoch 3/20\n", "424/424 [==============================] - 6s 13ms/step - loss: 1.5044\n", "Epoch 4/20\n", "424/424 [==============================] - 7s 13ms/step - loss: 1.3886\n", "Epoch 5/20\n", "424/424 [==============================] - 7s 14ms/step - loss: 1.3185\n", "Epoch 6/20\n", "424/424 [==============================] - 7s 14ms/step - loss: 1.2698\n", "Epoch 7/20\n", "424/424 [==============================] - 8s 16ms/step - loss: 1.2330\n", "Epoch 8/20\n", "424/424 [==============================] - 6s 13ms/step - loss: 1.2033\n", "Epoch 9/20\n", "424/424 [==============================] - 7s 16ms/step - loss: 1.1789\n", "Epoch 10/20\n", "424/424 [==============================] - 7s 15ms/step - loss: 1.1565\n", "Epoch 11/20\n", "424/424 [==============================] - 7s 14ms/step - loss: 1.1369\n", "Epoch 12/20\n", "424/424 [==============================] - 8s 14ms/step - loss: 1.1182\n", "Epoch 13/20\n", "424/424 [==============================] - 7s 15ms/step - loss: 1.1011\n", "Epoch 14/20\n", "424/424 [==============================] - 6s 13ms/step - loss: 1.0846\n", "Epoch 15/20\n", "424/424 [==============================] - 7s 14ms/step - loss: 1.0692\n", "Epoch 16/20\n", "424/424 [==============================] - 8s 16ms/step - loss: 1.0542\n", "Epoch 17/20\n", "424/424 [==============================] - 6s 13ms/step - loss: 1.0396\n", "Epoch 18/20\n", "424/424 [==============================] - 7s 14ms/step - loss: 1.0249\n", "Epoch 19/20\n", "424/424 [==============================] - 8s 16ms/step - loss: 1.0105\n", "Epoch 20/20\n", "424/424 [==============================] - 7s 14ms/step - loss: 0.9967\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 14 } ], "source": [ "model.compile(\n", " optimizer='adam',\n", " loss=tf.keras.losses.SparseCategoricalCrossentropy(\n", " from_logits=True\n", " ))\n", "\n", "model.fit(ds, epochs=20)" ] }, { "cell_type": "markdown", "metadata": { "id": "HPxMUjzzyWJr" }, "source": [ "### 평가 단계 - 새로운 텍스트 생성" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:52:55.887487Z", "iopub.status.busy": "2021-01-01T08:52:55.886806Z", "iopub.status.idle": "2021-01-01T08:52:55.892794Z", "shell.execute_reply": "2021-01-01T08:52:55.892440Z" }, "id": "qJ8Wf-ofyWJs", "outputId": "9109244e-c362-49e4-ba7c-5d976d8073c0" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "확률: [0.33333334 0.33333334 0.33333334]\n", "array([[1, 2, 0, 1, 0, 1, 1, 2, 1, 1]])\n" ] } ], "source": [ "tf.random.set_seed(1)\n", "\n", "logits = [[1.0, 1.0, 1.0]]\n", "print('확률:', tf.math.softmax(logits).numpy()[0])\n", "\n", "samples = tf.random.categorical(\n", " logits=logits, num_samples=10)\n", "tf.print(samples.numpy())" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:52:55.899437Z", "iopub.status.busy": "2021-01-01T08:52:55.898456Z", "iopub.status.idle": "2021-01-01T08:52:55.903367Z", "shell.execute_reply": "2021-01-01T08:52:55.904194Z" }, "id": "PZd-fbXByWJs", "outputId": "095725b4-3ca4-4ed7-dbbd-d0a8ce74028d" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "확률: [0.10650698 0.10650698 0.78698605]\n", "array([[2, 2, 0, 2, 2, 2, 2, 2, 1, 2]])\n" ] } ], "source": [ "tf.random.set_seed(1)\n", "\n", "logits = [[1.0, 1.0, 3.0]]\n", "print('확률:', tf.math.softmax(logits).numpy()[0])\n", "\n", "samples = tf.random.categorical(\n", " logits=logits, num_samples=10)\n", "tf.print(samples.numpy())" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:52:55.914798Z", "iopub.status.busy": "2021-01-01T08:52:55.914137Z", "iopub.status.idle": "2021-01-01T08:53:06.162148Z", "shell.execute_reply": "2021-01-01T08:53:06.162896Z" }, "id": "R_b2-tLDyWJs", "outputId": "23b9c035-c2c4-40ea-a4cc-4b5fdde61813" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The island had explore the island,\n", "and fleamed necessary.\n", "\n", "But Herbert, the reporter thought, there baromed oceans of those water from a long point.\n", "The wind heart that Herbert dresing\n", "the night, or comb,” answered the engineer\n", "approved the sources destroyed, replesieg, was abandoned, which\n", "the convicts were able to die read told them!”\n", "\n", "“Oh! follow?”\n", "said Pencroft, “but Neb, I\n", "were!” cried Herbert. “We have only a fireplace,” observed Cyrus Harding.\n", "\n", "The cord seek the projects of times, and on the report\n" ] } ], "source": [ "def sample(model, starting_str,\n", " len_generated_text=500,\n", " max_input_length=40,\n", " scale_factor=1.0):\n", " encoded_input = [char2int[s] for s in starting_str]\n", " encoded_input = tf.reshape(encoded_input, (1, -1))\n", "\n", " generated_str = starting_str\n", "\n", " model.reset_states()\n", " for i in range(len_generated_text):\n", " logits = model(encoded_input)\n", " logits = tf.squeeze(logits, 0)\n", "\n", " scaled_logits = logits * scale_factor\n", " new_char_indx = tf.random.categorical(\n", " scaled_logits, num_samples=1)\n", "\n", " new_char_indx = tf.squeeze(new_char_indx)[-1].numpy()\n", "\n", " generated_str += str(char_array[new_char_indx])\n", "\n", " new_char_indx = tf.expand_dims([new_char_indx], 0)\n", " encoded_input = tf.concat(\n", " [encoded_input, new_char_indx],\n", " axis=1)\n", " encoded_input = encoded_input[:, -max_input_length:]\n", "\n", " return generated_str\n", "\n", "tf.random.set_seed(1)\n", "print(sample(model, starting_str='The island'))" ] }, { "cell_type": "markdown", "metadata": { "id": "Stm4dLH2yWJs" }, "source": [ "* **예측 가능성 대 무작위성**" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:53:06.171621Z", "iopub.status.busy": "2021-01-01T08:53:06.170721Z", "iopub.status.idle": "2021-01-01T08:53:06.175211Z", "shell.execute_reply": "2021-01-01T08:53:06.175546Z" }, "id": "-UKHgrqTyWJs", "outputId": "cde805cd-1f94-48d1-b7f1-1ce0e868dace" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "스케일 조정 전의 확률: [0.10650698 0.10650698 0.78698604]\n", "0.5배 조정 후 확률: [0.21194156 0.21194156 0.57611688]\n", "0.1배 조정 후 확률: [0.31042377 0.31042377 0.37915245]\n" ] } ], "source": [ "logits = np.array([[1.0, 1.0, 3.0]])\n", "\n", "print('스케일 조정 전의 확률: ', tf.math.softmax(logits).numpy()[0])\n", "\n", "print('0.5배 조정 후 확률: ', tf.math.softmax(0.5*logits).numpy()[0])\n", "\n", "print('0.1배 조정 후 확률: ', tf.math.softmax(0.1*logits).numpy()[0])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:53:06.182417Z", "iopub.status.busy": "2021-01-01T08:53:06.182031Z", "iopub.status.idle": "2021-01-01T08:53:16.415192Z", "shell.execute_reply": "2021-01-01T08:53:16.415932Z" }, "id": "ickgML_8yWJt", "outputId": "a3707745-c1af-4c78-b9d8-27822e957932" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The island had been placed in the water. His continued to five o’clock, the reporter thought that the convicts had passed the body of the water which were the passage of which the sulphate of an incident, the colonists threw the horizon, and only fell back to the exploration of the colonists. The wind from the colonists had not been torn by his former seep to the south\n", "with the exploration of the day before. The latter protected the palisade; but the colonists were the projects of the lake, some of the co\n" ] } ], "source": [ "tf.random.set_seed(1)\n", "print(sample(model, starting_str='The island',\n", " scale_factor=2.0))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-01-01T08:53:16.424579Z", "iopub.status.busy": "2021-01-01T08:53:16.424183Z", "iopub.status.idle": "2021-01-01T08:53:26.644678Z", "shell.execute_reply": "2021-01-01T08:53:26.643763Z" }, "id": "UF4sqqpcyWJt", "outputId": "caa40546-1ea2-4a87-8e7e-c13319b5ea5f" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The island had egelyed holligge! By the Hap would passed, formedly never, I\n", "rejoyp,--“To “you wild be go five enes or enohmovery pohitat;\n", "unforwind hiry wish heat, what symuring rasila\n", "wimmed hearor, projeeved\n", "up,\n", "listengr.\n", "He\n", "\n", "When you,” proocle, five fires, rocies ngltog.\n", "However’s,! Then,\n", "force izedicarents? Theyt? look in\n", "LlEchummons\n", "be no ofeer?”\n", "said Pencroft.\n", "\n", "PAcam Daybreak, rib pushem.\n", "Sun. He hall\n", "as oiley oursquice, obstrug,-- effer fros, ungen warse;-, obtaw\n", "by the rooms, always somber\n", "to puri\n" ] } ], "source": [ "tf.random.set_seed(1)\n", "print(sample(model, starting_str='The island',\n", " scale_factor=0.5))" ] }, { "cell_type": "markdown", "metadata": { "id": "_V30iCf5yWJt" }, "source": [ "# 트랜스포머 모델을 사용한 언어 이해\n", "\n", "## 셀프 어텐션 메카니즘 이해하기\n", "\n", "### 셀프 어텐션 기본 구조" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 350 }, "execution": { "iopub.execute_input": "2021-01-01T08:53:26.650577Z", "iopub.status.busy": "2021-01-01T08:53:26.649535Z", "iopub.status.idle": "2021-01-01T08:53:26.663374Z", "shell.execute_reply": "2021-01-01T08:53:26.664093Z" }, "id": "aWx-IXRtyWJt", "outputId": "27468626-f20b-47bf-e3d7-608333a4813b" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 21 } ], "source": [ "Image(url='https://git.io/JLdVo', width=700)" ] }, { "cell_type": "markdown", "metadata": { "id": "EXNdJdxnyWJt" }, "source": [ "### 쿼리, 키, 값 가중치를 가진 셀프 어텐션 메카니즘" ] }, { "cell_type": "markdown", "metadata": { "id": "o99EBntFyWJt" }, "source": [ "## 멀티-헤드 어텐션과 트랜스포머 블록" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 279 }, "execution": { "iopub.execute_input": "2021-01-01T08:53:26.668048Z", "iopub.status.busy": "2021-01-01T08:53:26.667215Z", "iopub.status.idle": "2021-01-01T08:53:26.672956Z", "shell.execute_reply": "2021-01-01T08:53:26.673524Z" }, "id": "Klo3F7nvyWJu", "outputId": "94f7d7e7-93d3-4810-c919-890d532f3b00" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 22 } ], "source": [ "Image(url='https://git.io/JLdV6', width=700)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "name": "ch16_part2.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 0 }