{ "nbformat": 4, "nbformat_minor": 5, "metadata": { "accelerator": "GPU", "colab": { "name": "RNN.ipynb", "provenance": [], "collapsed_sections": [], "include_colab_link": true }, "kernelspec": { "display_name": "Python [conda env:PIC16B] *", "language": "python", "name": "conda-env-PIC16B-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "3005db6d" }, "source": [ "# Text Generation with Recurrent Neural Networks\n", "\n", "In this set of lecture notes, we'll consider a new kind of machine learning task. Previously, we've focused on *classification* problems. In classification problems, the goal is to assign a given piece of data to one of several categories. Today, we'll instead consider a simple *generation* problem. A *generative* model can be used to create \"realistic\" examples after it's been trained. Generative models are at the heart of machine learning topics like [deepfakes](https://en.wikipedia.org/wiki/Deepfake), [language generation](https://aiweirdness.com/post/140219420017/the-silicon-gourmet-training-a-neural-network-to), and [style transfer](https://www.tensorflow.org/tutorials/generative/style_transfer). \n", "\n", "*Parts of these lecture notes were based on [this tutorial](https://keras.io/examples/generative/lstm_character_level_text_generation/). It is recommended to run the code contained in these notes in a Google Colab instance with GPU acceleration enabled.* " ], "id": "3005db6d" }, { "cell_type": "code", "metadata": { "id": "f84e1501" }, "source": [ "import tensorflow as tf\n", "from tensorflow.keras.layers.experimental import preprocessing\n", "from tensorflow.keras import layers\n", "\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import pandas as pd" ], "id": "f84e1501", "execution_count": 2, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "908268fd", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "055c80d5-d80b-4cd8-8242-67b37d19d845" }, "source": [ "# link to Google Drive to read in trained model\n", "from google.colab import drive\n", "drive.mount('/content/drive')" ], "id": "908268fd", "execution_count": 3, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Mounted at /content/drive\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "2ea93a0e" }, "source": [ "\n", "## Our Task\n", "\n", "Today, we are going to see whether we can teach an algorithm to understand and reproduce the pinnacle of cultural achievement; the benchmark against which all art is to be judged; the mirror that reveals to humany its truest self. I speak, of course, of *Star Trek: Deep Space Nine.*\n", "\n", "
\n", " \"\"\n", "
\n", "
\n", "\n", "In particular, we are going to attempt to teach a neural network to generate *episode scripts*. This a text generation task: after training, our hope is that our model will be able to create scripts that are reasonably realistic in their appearance. \n" ], "id": "2ea93a0e" }, { "cell_type": "code", "metadata": { "id": "21ca9435" }, "source": [ "## miscellaneous data cleaning\n", "\n", "start_episode = 20 # Start in Season 2, Season 1 is not very good\n", "num_episodes = 50 # only pick this many episodes to train on\n", "\n", "url = \"https://github.com/PhilChodrow/PIC16B/blob/master/datasets/star_trek_scripts.json?raw=true\"\n", "star_trek_scripts = pd.read_json(url)\n", "\n", "cleaned = star_trek_scripts[\"DS9\"].str.replace(\"\\n\\n\\n\\n\\n\\nThe Deep Space Nine Transcripts -\", \"\")\n", "cleaned = cleaned.str.split(\"\\n\\n\\n\\n\\n\\n\\n\").str.get(-2)\n", "text = \"\\n\\n\".join(cleaned[start_episode:(start_episode + num_episodes)])\n", "for char in ['\\xa0', 'à', 'é', \"}\", \"{\"]:\n", " text = text.replace(char, \"\")" ], "id": "21ca9435", "execution_count": 4, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "1cb17a37" }, "source": [ "The result is one long string containing the scripts of 50 episodes of Star Trek: Deep Space 9. How glorious!" ], "id": "1cb17a37" }, { "cell_type": "code", "metadata": { "id": "aa08a1a2", "outputId": "068ae063-c54b-4aa1-f9a9-8ae74b6a6d74", "colab": { "base_uri": "https://localhost:8080/" } }, "source": [ "len(text)" ], "id": "aa08a1a2", "execution_count": 5, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "1570834" ] }, "metadata": {}, "execution_count": 5 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e46acb3a", "outputId": "6b4ee005-9221-46d1-dfe9-8653cecf37f1" }, "source": [ "print(text[0:500])" ], "id": "e46acb3a", "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " Last\n", "time on Deep Space Nine. \n", "SISKO: This is the emblem of the Alliance for Global Unity. They call\n", "themselves the Circle. \n", "O'BRIEN: What gives them the right to mess up our station? \n", "ODO: They're an extremist faction who believe in Bajor for the\n", "Bajorans. \n", "SISKO: I can't loan you a Starfleet runabout without knowing where you\n", "plan on taking it. \n", "KIRA: To Cardassia Four to rescue a Bajoran prisoner of war. \n", "(The prisoners are rescued.) \n", "KIRA: Come on. We have a ship waiting. \n", "JARO: What you \n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "acf3c0af" }, "source": [ "Our first step, as usual, is data preparation. What we need to do is format the data in such a way that we can treat the situation as a classification problem after all. That is: \n", "\n", "> Given a string of text, predict the next character in that string. \n", "\n", "Doing this repeatedly will allow the model to generate large bodies of text. \n", "\n", "To do this, we want to split our data like so: \n", "\n", "```\n", "predictor = \"to boldly g\"\n", "target = \"o\"\n", "```\n", "\n", "The following function will do this for us. The `max_len` argument gives the number of characters that should be in the predictor string, and the `step_size` argument lets us skip indices if we want to in order to decrease the size of the data. " ], "id": "acf3c0af" }, { "cell_type": "code", "metadata": { "id": "ed8c868b" }, "source": [ "def split(raw_text, max_len, step_size = 1):\n", "\n", " lines = []\n", " next_chars = []\n", "\n", " for i in range(0, len(text) - max_len, step_size):\n", " lines.append(text[i:i+max_len])\n", " next_chars.append(text[i+max_len])\n", " \n", " return lines, next_chars" ], "id": "ed8c868b", "execution_count": 7, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4e6c2016", "outputId": "ebef3c15-a0e6-43a8-d2c4-3ad7f5e77f3f" }, "source": [ "max_len = 20\n", "\n", "lines, next_chars = split(text, max_len = max_len, step_size = 5)\n", "for i in range(10, 15):\n", " print(lines[i] + \" => \" + next_chars[i])" ], "id": "4e6c2016", "execution_count": 8, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "he emblem of the All => i\n", "blem of the Alliance => \n", "of the Alliance for => G\n", "e Alliance for Globa => l\n", "iance for Global Uni => t\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "751e52ec" }, "source": [ "Our next step is to vectorize the characters. This is similar to the word vectorization task, but it's simple enough in this case that's arguably more convenient to actually handle it outside of TensorFlow. It is also possible to handle vectorization using TensorFlow tools, as demonstrated in [this tutorial](https://www.tensorflow.org/tutorials/text/text_generation). " ], "id": "751e52ec" }, { "cell_type": "code", "metadata": { "id": "ac803629" }, "source": [ "chars = sorted(set(text))\n", "char_indices = {char : chars.index(char) for char in chars}\n", "X = np.zeros((len(lines), max_len, len(chars)))\n", "y = np.zeros((len(lines), 1), dtype = np.int32)\n", "for i, line in enumerate(lines):\n", "\tfor t, char in enumerate(line):\n", "\t\tX[i, t, char_indices[char]] = 1\n", "\ty[i] = char_indices[next_chars[i]]" ], "id": "ac803629", "execution_count": 9, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "61ab612a" }, "source": [ "Let's take a look at the results. " ], "id": "61ab612a" }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "08f088dc", "outputId": "d7b236aa-1690-4769-f8f0-acdd248071c4" }, "source": [ "X.shape, y.shape" ], "id": "08f088dc", "execution_count": 10, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "((314163, 20, 78), (314163, 1))" ] }, "metadata": {}, "execution_count": 10 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "03435c5a", "outputId": "e29b8895-584a-4ec6-9bd4-1e0f6761480f" }, "source": [ "X[0]" ], "id": "03435c5a", "execution_count": 11, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([[0., 1., 0., ..., 0., 0., 0.],\n", " [0., 1., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " ...,\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 1., 0., ..., 0., 0., 0.]])" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "markdown", "metadata": { "id": "6334478d" }, "source": [ "What does this matrix say? `X[0]` is a single sequence of 20 characters. The first row says that the first character of this sequence is the character corresponding to integer `1`. The second row says that the second character corresponds to integer `1`, and so on (`1` is the space character `\" \"`, in case you're wondering). " ], "id": "6334478d" }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "b59a4010", "outputId": "c8272585-6df3-43df-d411-18ac4efb078f" }, "source": [ "y[0]" ], "id": "b59a4010", "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([42], dtype=int32)" ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "markdown", "metadata": { "id": "61697000" }, "source": [ "This says that the 21st character (the character after the first 20 characters of the string) is the character with index `42`. " ], "id": "61697000" }, { "cell_type": "markdown", "metadata": { "id": "03ffcfbb" }, "source": [ "Now we're ready to perform a train-test split: " ], "id": "03ffcfbb" }, { "cell_type": "code", "metadata": { "id": "0ef097e9" }, "source": [ "train_len = int(0.7*X.shape[0])\n", "X_train = X[0:train_len]\n", "X_val = X[train_len:]\n", "\n", "y_train = y[0:train_len]\n", "y_val = y[train_len:]" ], "id": "0ef097e9", "execution_count": 13, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "b43212e2" }, "source": [ "## Modeling\n", "\n", "Model time! We'll use a simple *Long Short-Term Memory* (LSTM) model for this example. LSTMs are one example of *recurrent* neural network layers. Here's a diagram illustrating the schematic functioning of a recurrent layer. \n", "\n", "![](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)\n", "*Image credit: [Chris Olah](https://colah.github.io/posts/2015-08-Understanding-LSTMs/), OpenAI*\n", "\n", "On the lefthand side, we have a \"zoomed out\" picture of a recurrent neural network layer. On the righthand side, we see the \"zoomed in\" version. The key point here is that output $h_2$ depends not only on input $x_2$, but also, indirectly, on inputs $x_0$ and $x_1$. This means that recurrent neural networks are highly suitable for modeling processes that have temporal structure. Text is an example: the last few characters are the \"history\" of the text. Timeseries data are another clear example, and indeed, we can use a very similar workflow to the one we'll use today in order to do forecasting in timeseries. \n", "\n", "Since training for this kind of task gets expensive fast, we'll use just one LSTM layer followed by a `Dense` output layer. " ], "id": "b43212e2" }, { "cell_type": "code", "metadata": { "id": "8802e47f" }, "source": [ "model = tf.keras.models.Sequential([\n", " layers.LSTM(128, name = \"LSTM\", input_shape=(max_len, len(chars))),\n", " layers.Dense(len(chars)) \n", "])" ], "id": "8802e47f", "execution_count": 14, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ac7272e9" }, "source": [ "model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True), \n", " optimizer = \"adam\")" ], "id": "ac7272e9", "execution_count": 15, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "d5de9f4f" }, "source": [ "Time for training. We'll do just one epoch for now, mostly just to prove that we've set up our model correctly. " ], "id": "d5de9f4f" }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "a87b3673", "outputId": "06cdab30-9a4a-421c-cb13-9c4bf6a648be" }, "source": [ "# code I used to train and save the model\n", "# model.fit(X_train, \n", "# y_train,\n", "# validation_data= (X_val, y_val),\n", "# batch_size=128, epochs = 200)\n", "# model.save('/content/drive/MyDrive/DS9_model') \n", "\n", "model.fit(X_train, \n", " y_train,\n", " validation_data= (X_val, y_val),\n", " batch_size=128, epochs = 1)" ], "id": "a87b3673", "execution_count": 16, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "1719/1719 [==============================] - 34s 17ms/step - loss: 2.5975 - val_loss: 2.2011\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "markdown", "metadata": { "id": "4d173822" }, "source": [ "Rather than training the entire model live during lecture, I'm instead going to load in a saved model that I previously trained for 200 epochs on Google Colab. On Colab, each epoch takes around 10s or so. 200 epochs corresponds to roughly 30 minutes. " ], "id": "4d173822" }, { "cell_type": "code", "metadata": { "id": "34381d8c", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "b3b6b801-77a8-47b1-ab45-a9a5bc537d9e" }, "source": [ "model = tf.keras.models.load_model('/content/drive/MyDrive/DS9_model')" ], "id": "34381d8c", "execution_count": 17, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named \"keras_metadata.pb\" in the SavedModel directory.\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "13e42394" }, "source": [ "Generative models define *probability distributions* over the space of possible outputs. So, our overall algorithm is going to generate new text in a partially randomized way. To make this happen, we define a `sample` function which will take the model outputs, turn them into probabilities, and then sample from the probabilities to produce a single character (well, technically, an integer corresponding to a single character). \n", "\n", "An important parameter here is the so-called *temperature* (this terminology comes from statistical physics. When the temperature is high, the model will more frequently choose low-probability characters. This is sometimes interpreted as \"creativity,\" and leads to more unpredictable outputs. When the temperature is low, on the other hand, the model will \"play it safe\" and tend to stick to known patterns. In the extreme limiting case as the temperature approaches 0, the model will ultimately get stuck in \"loops\" in which it repeats common phrases over and over again. \n", "\n", "I'm not going to dwell on the math here, but if you're familiar with phrases like \"softmax\" or \"Boltzmann distribution,\" this code implements and samples from such a distribution using the model predictions." ], "id": "13e42394" }, { "cell_type": "code", "metadata": { "id": "811ae0c5" }, "source": [ "def sample(preds, temp):\n", " \n", " # format the model predictions\n", " preds = np.asarray(preds).astype(\"float64\")\n", " \n", " # construct normalized Boltzman with temp\n", " probs = np.exp(preds/temp)\n", " probs = probs / probs.sum()\n", " \n", " # sample from Boltzman\n", " samp = np.random.multinomial(1, probs, 1)\n", " return np.argmax(samp)" ], "id": "811ae0c5", "execution_count": 18, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "73978e20" }, "source": [ "Note that this function takes in some model predictions and returns a single integer, which we can interpret as a character. \n", "\n", "Now that we know how to sample from the model predictions to create a new character, let's now define a convenient function that will allow us to create entire strings of specified length using this process. There's some index management here that can be a little tricky. " ], "id": "73978e20" }, { "cell_type": "code", "metadata": { "id": "56fb1bcc" }, "source": [ "def generate_string(seed_index, temp, gen_length, model): \n", " \n", " # sequence of integer indices for generated text\n", " gen_seq = np.zeros((max_len + gen_length, len(chars)))\n", " \n", " # first part of the generated indices actually corresponds to the real text\n", " seed = X[seed_index]\n", " gen_seq[0:max_len] = seed\n", " \n", " # character version\n", " gen_text = lines[seed_index]\n", " \n", " # main loop. \n", " # at each stage we are going to get a single \n", " # character from the model prediction (with the sample function)\n", " # and then feed that character BACK into the model as \"data\"\n", " # for the next prediction\n", " for i in range(0, gen_length):\n", " \n", " # this corresponds to the part of the generated\n", " # text that the model can \"see\"\n", " window = gen_seq[i: i + max_len]\n", " \n", " # get the prediction and sample a single index\n", " preds = model.predict(np.array([window]))[0]\n", " next_index = sample(preds, temp)\n", " \n", " # add sampled index to the current output\n", " gen_seq[max_len + i, next_index] = True\n", " \n", " # create the string version\n", " next_char = chars[next_index]\n", " gen_text += next_char\n", " \n", " # only return the string version because that's what we care about\n", " return(gen_text)" ], "id": "56fb1bcc", "execution_count": 19, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "8a157c57" }, "source": [ "Let's try it out! We'll create strings of length 500, separating the real seed text from the generated text. We'll also vary the temperature parameter of the `sample()` function, which controls how random the model's predictions can be. " ], "id": "8a157c57" }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "46e690c3", "outputId": "7aa7c1cb-da21-4268-df0f-1733794e93b4" }, "source": [ "gen_length = 200\n", "seed_index = 10000\n", "\n", "for temp in [0.01, 0.02, 0.03, 0.04, 0.05]:\n", "\n", " gen = generate_string(seed_index, temp, gen_length, model)\n", "\n", " print(4*\"-\")\n", " print(\"TEMPERATURE: \" + str(temp))\n", " print(gen[:-gen_length], end=\"\")\n", " print(\" => \", end = \"\")\n", " print(gen[-gen_length:], \"\")" ], "id": "46e690c3", "execution_count": 20, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "----\n", "TEMPERATURE: 0.01\n", "tioning. \n", "KIRA: No p => roblem still and with the system no one kink that I know that. \n", "ODO: I think you're get your own has now\n", "the replions are anything throks we can to had a moment. \n", "ODO: I don't take you won, I was just age. \n", "BASHIR: Now deathsmest we could have become betterany. \n", "KOVAT: It's Curzoa, I'll be ride this there's all your something our security begnapt sutptends all and I can't seems a Fedenal ship. \n", "DAX: I've must kind out an actoptation enter) \n", "KIRA: I'm not sure in trill through the starting duct. \n", "----\n", "TEMPERATURE: 0.02\n", "tioning. \n", "KIRA: No p => rourcted to know why it was the station? \n", "KIRA: I'm not service, Commander, but I can't dectine that that you're going to risk's ready. \n", "ODO: What don't say her. \n", "GARAK: I'm sure you don't have a ship. \n", "DAX: I'm not sure our sensors areved. \n", "KIRA: They're not a lot for business to say. \n", "DAX: Light, I was justignten to reques the evencent way to be a Karama. Be'turatiss.) \n", "k'GYEUS KiAn and Hore]\n", "\n", "SISKO: Commander, you outh it? \n", "KIRA: The Bajoran securething the starts now, they make it don't love \n", "----\n", "TEMPERATURE: 0.03\n", "tioning. \n", "KIRA: No p => roblem still and with the system no one kink that I know that. \n", "ODO: I think you're get your entire pass beamsed. \n", "KIRA: They're not a seri. \n", "QUARK: Not your with the Dominion. \n", "KIRA: I don't like me to keep him. \n", "KIRA: I don't know what you're trandny, Odo I know what you're trandny, put the perinity been anything everyone worked you as the Gamma Quadrant take unternablet Cardassia, even one. \n", "BASHIR: He could have from the Your. It's transporstors are anything. \n", "DAX: And that was a 8raabEpatre \n", "----\n", "TEMPERATURE: 0.04\n", "tioning. \n", "KIRA: No p => rourcted to know why it was the station? \n", "KIRA: I'm not service, we can understand the offer the station in the patinion are surverce. In fewr entersterd to be a ship. \n", "\n", "[Rarn. Soorsaty engines. \n", "DAX: Can the station seconds. \n", "DAX: You know how mar? Examper yoursents. \n", "O'BRIEN: Are you try some Quark. \n", "QUARK: What are you can bround 2Dre plegans so 8ominicr, out of them Holong have too lateruse frommenticy. I would be abore allow dry the Federation on the 7ardassian loviling there was about the \n", "----\n", "TEMPERATURE: 0.05\n", "tioning. \n", "KIRA: No p => roblem shipt? \n", "KIRA: Protes, she was one of your continuisation is Vulld Quark talk. \n", " [Cardassissile] \n", "DAX: Are you sure the sbostshet. \n", "KIRA: How do you know what you don't 5un to anyone Yel(mand? \n", "DAX: You don't want you admit. \n", "DAX: Sisko to Oloy, sir. I've been 'HAR Here many, I'm no one he treaty to the station. \n", "myiral seven weatonic subspanebar under belier, we Obside. \n", "BASHIR: /real taking your Volan ]\n", "\n", "ASIAN: I don't know the station that our s4azle - offer relaxiars. \n", "('verina nothing \n" ] } ] }, { "cell_type": "code", "source": [ "gen_length = 1000\n", "seed_index = 10002\n", "\n", "gen = generate_string(seed_index, 0.02, gen_length, model)" ], "metadata": { "id": "UbilVBRTIW_G" }, "id": "UbilVBRTIW_G", "execution_count": 41, "outputs": [] }, { "cell_type": "code", "source": [ "import re\n", "cast = set(re.findall(r\"[A-Z]+(?=:)\",gen))\n", "print(\"CAST OF CHARACTERS: \", end = \"\")\n", "print(cast)\n", "print(\"-\"*80)\n", "print(gen)" ], "metadata": { "id": "kjAPnVbsJ6tY", "outputId": "a51d5860-ea34-47ad-bcf7-d0894dcec2f3", "colab": { "base_uri": "https://localhost:8080/" } }, "id": "kjAPnVbsJ6tY", "execution_count": 42, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "CAST OF CHARACTERS: {'DAX', 'SISKO', 'ORAT', 'BAREIL', 'KIRA'}\n", "--------------------------------------------------------------------------------\n", "KIRA: No problem. \n", "DAX: No. \n", "KIRA: Are you sure? \n", "(Enter adobs, Ber give an office and go planty starting the would werk to get it and we can be about to the station. \n", "KIRA: Thanks. I have a nood for the Necold. \n", "SISKO: I'm not surprised any here. \n", "(The only wined of realing. \n", "DAX: It much to Odo,\n", "we had a man long. \n", "KIRA: Oh, the one whose hime. \n", "KIRA: I'm sorry, Che Sector warry hele with you. \n", "KIRA: Are you done that you're not then bus hasteres. \n", "KIRA: I'm not sure in trill through the starting condicy for the station? \n", "DAX: Your way. I'll be thrown the station to the station and see if we can tell my half to be a ) \n", "ORAT: The ollywate, Commander had nothing to talk to the station? \n", "BAREIL: He's edjonents are yoursely? \n", "DAX: No. \n", "KIRA: It's been to compution. \n", "KIRA: Kira, and no will took about your door around the security to Lemar ifnact it know I've got to storfleaned to be all right. They wonder the Federation. \n", "DAX: Look and no one homes a lot of the other has readint tor plame of and I'm sure yo\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "5d24f90f" }, "source": [ "Let's make a few observations. \n", "\n", "1. First of all, it can take a surprisingly long time to make predictions using our model. This is because we have to call the `predict()` method *for each character*, in order to ensure that the model appropriately takes into account its recent predictions. This can take a pretty long time! \n", "2. Second, determining a good value for the temperature can take some experimentation. Note that low temperatures don't necessarily correspond to \"more realistic\" text -- they just correspond to highlighting common patterns in the text, possibly in excess. Higher temperatures also don't necessarily correspond to a \"creative\" algorithm in any normal sense of the word -- set the temperature too high, and you'll just get gibberish. \n", "\n", "## Specialization\n", "\n", "In this case, we were able to create a model for generating Star Trek scripts using an instance of Google Colab in roughly 30 minutes. This model is highly limited. Although it clearly has learned some relevant features of Star Trek scripts, there's no way that you'd mistake the output of the model for an actual script by a screenwriter. Considering how hard this was, imagine how much effort and computational resources are required to create more general language models! Indeed, as highlighted in a [recent and controversial paper](https://faculty.washington.edu/ebender/papers/Stochastic_Parrots.pdf), training large language models in this day and age can require energy expenditure comparable to a trans-Atlantic flight! " ], "id": "5d24f90f" } ] }