{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "40fe1c0748661d409405f09e9250edcf", "grade": false, "grade_id": "cell-f67b416b0411edc1", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "# Part 1: Sequence Modelling" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "ed5be3ed6978c8f54e5301bbb9d0d0bd", "grade": false, "grade_id": "cell-a328195514e4147d", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "5d3c83086d3c94835d3f8510df7f8d0f", "grade": false, "grade_id": "cell-52dc2d0ecf866f90", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "# Execute this code block to install dependencies when running on colab\n", "try:\n", " import torch\n", "except:\n", " from os.path import exists\n", " from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag\n", " platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())\n", " cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\\.\\([0-9]*\\)\\.\\([0-9]*\\)$/cu\\1\\2/'\n", " accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'\n", "\n", " !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision\n", "\n", "try: \n", " import torchbearer\n", "except:\n", " !pip install torchbearer" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "e9c1849d2839e6c1ebac8b563fe9d0bc", "grade": false, "grade_id": "cell-36906cd59153af5b", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Markov chains\n", "\n", "We'll start our exploration of modelling sequences and building generative models using a 1st order Markov chain. The Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In our case we're going to learn a model over a set of characters from an English language text. 
The events, or states, in our model are the set of possible characters, and we'll learn the probability of moving from one character to the next.\n", "\n", "Let's start by loading the data from the web:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "508d37e774c4619f7c4706291f52c466", "grade": false, "grade_id": "cell-a592d788c1587e58", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "from torchvision.datasets.utils import download_url\n", "import torch\n", "import random\n", "import sys\n", "import io\n", "\n", "# Read the data\n", "download_url('https://s3.amazonaws.com/text-datasets/nietzsche.txt', '.', 'nietzsche.txt', None)\n", "text = io.open('./nietzsche.txt', encoding='utf-8').read().lower()\n", "print('corpus length:', len(text))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "a1b08d3393744a64f2d1991d8ddd3c1a", "grade": false, "grade_id": "cell-f345306b2d88866e", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We now need to iterate over the characters in the text and count the number of times each transition happens:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "8452011964d66c1c591df5d4f49c82b6", "grade": false, "grade_id": "cell-21c1c86b5d26778f", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "transition_counts = dict()\n", "for i in range(0, len(text)-1):\n", " currc = text[i]\n", " nextc = text[i+1]\n", " if currc not in transition_counts:\n", " transition_counts[currc] = dict()\n", " if nextc not in transition_counts[currc]:\n", " transition_counts[currc][nextc] = 0\n", " transition_counts[currc][nextc] += 1" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "73f178f03f644719a7684c942212b488", "grade": false, "grade_id": "cell-f93390ea74e0de36", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "The `transition_counts` dictionary maps each current character to a nested dictionary, which in turn maps each possible next character to a count. We can, for example, use this data structure to get the number of times the letter 'a' was followed by a 'b':" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "d37afa35d97a35848444cbffb3d4369c", "grade": false, "grade_id": "cell-ed4eee0105f3a0cf", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "print(\"Number of transitions from 'a' to 'b': \" + str(transition_counts['a']['b']))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "89b218435d0475de2fd126d6cf654d07", "grade": false, "grade_id": "cell-195d5f1aebd40797", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Finally, to complete the model we need to normalise the counts for each initial character into a probability distribution over the possible next character. 
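That is, for characters $a$ and $b$ we estimate $P(b \\mid a) = \\mathrm{count}(a \\to b) / \\sum_c \\mathrm{count}(a \\to c)$, which is exactly what the code below computes. 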
We'll slightly modify the form in which we're storing these, maintaining a tuple of arrays for each initial character: the first holding the set of possible next characters, and the second holding the corresponding probabilities:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "9dbad32065a0c97c65cc5760b5efa718", "grade": false, "grade_id": "cell-28d68d95d39e3180", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "transition_probabilities = dict()\n", "for currentc, next_counts in transition_counts.items():\n", " values = []\n", " probabilities = []\n", " sumall = 0\n", " for nextc, count in next_counts.items():\n", " values.append(nextc)\n", " probabilities.append(count)\n", " sumall += count\n", " for i in range(0, len(probabilities)):\n", " probabilities[i] /= float(sumall)\n", " transition_probabilities[currentc] = (values, probabilities)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "7e7336b1ef946f10dc63020be2b17ecf", "grade": false, "grade_id": "cell-f79188db65a705aa", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "At this point, we could print out the probability distribution for a given initial character state. For example, to print the distribution for 'a':" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "5eaaf96ae4181ad3a2b7087d9643ff1f", "grade": false, "grade_id": "cell-ee624b4c0ff2c64f", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "for a, b in zip(transition_probabilities['a'][0], transition_probabilities['a'][1]):\n", " print(a, b)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "93b3745091fcb17322253b02af1bc2d9", "grade": false, "grade_id": "cell-9cc2f51ebb3ea728", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "It looks like the most probable letter to follow an 'a' is 'n'.\n", "\n", "__What is the most likely letter to follow the letter 'j'? Write your answer in the block below:__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "bd3cad04582e881ebc595619cef14fd1", "grade": true, "grade_id": "cell-2a3a71268b5a2df9", "locked": false, "points": 1, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "fb10d0a7366b0f59055017a39614e93e", "grade": false, "grade_id": "cell-43b70458c31baaf3", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We mentioned earlier that the Markov model is generative. This means that we can draw samples from the distributions and iteratively move between states.\n", "\n", "Use the following code block to iteratively sample 1000 characters from the model, starting with an initial character 't'. You can use the `torch.multinomial` function to draw a sample from a multinomial distribution; the sampled outcome is returned as an index, which you can then use to select the next character."
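, "\n", "\n", "If you haven't used it before, the following minimal sketch shows how `torch.multinomial` draws an index from a toy distribution (this is only an illustration of the function, not the solution to the exercise):\n", "\n", "```python\n", "probs = torch.tensor([0.1, 0.7, 0.2])  # toy distribution over three outcomes\n", "idx = torch.multinomial(probs, 1)      # tensor containing one sampled index\n", "print(idx.item())                      # prints 1 roughly 70% of the time\n", "```"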
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "4f50c283225208b2043431a1c2da104f", "grade": true, "grade_id": "cell-5be0d1f3fc6e8f65", "locked": false, "points": 4, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "current = 't'\n", "for i in range(0, 1000):\n", " print(current, end='')\n", " # sample the next character based on `current` and store the result in `current`\n", " # YOUR CODE HERE\n", " raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "56c49154c48797d312762e4761085bf7", "grade": false, "grade_id": "cell-de87ac1c9708205f", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "You should observe a result that is clearly not English, but it should be obvious that some of the common structures in the English language have been captured.\n", "\n", "__Rather than building a model based on individual characters, can you implement a model in the following code block that works on words instead?__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "ed71d61ce7a2f648d7507693902da3d7", "grade": true, "grade_id": "cell-a6d4a8a639190296", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "cd494f7c9fd0e838e74a5c70cd5ef773", "grade": false, "grade_id": "cell-d54dcdc1d540df4f", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## RNN-based sequence modelling\n", "\n", "It is possible to build higher-order Markov models that capture longer-term dependencies in the text and achieve higher accuracy; however, this quickly becomes computationally infeasible. Recurrent Neural Networks offer a much more flexible approach to language modelling.\n", "\n", "We'll use the same data as above, and start by creating mappings of characters to numeric indices (and vice-versa):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "ae5b316a7c13ecd23143d4dfa35bee5c", "grade": false, "grade_id": "cell-d152a2cb9707a4c4", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "chars = sorted(list(set(text)))\n", "print('total chars:', len(chars))\n", "char_indices = dict((c, i) for i, c in enumerate(chars))\n", "indices_char = dict((i, c) for i, c in enumerate(chars))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "f5f0f03e31eccfb78d1756139f90e1ac", "grade": false, "grade_id": "cell-46a85547d03a1f30", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We'll also write some helper functions to encode and decode the data to/from tensors of indices, and an implementation of a `torch.utils.data.Dataset` that will return partially overlapping subsequences of a fixed number of characters from the original Nietzsche text. 
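For example, with the values defined below (`maxlen = 40` and `step = 3`), sample 0 pairs characters 0-39 with character 40 as the target, sample 1 pairs characters 3-42 with character 43, and so on. 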
Our model will learn to associate a sequence of characters (the $x$'s) with a single character (the $y$'s):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "fb93c4daba5adbfabf8ef5656cc89543", "grade": false, "grade_id": "cell-64f8b2518a008c54", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "from torch.utils.data import Dataset, DataLoader\n", "from torch import nn\n", "from torch.nn import functional as F\n", "from torch import optim\n", "import random\n", "import sys\n", "import io\n", "\n", "maxlen = 40\n", "step = 3\n", "\n", "\n", "def encode(inp):\n", " # encode the characters in a tensor\n", " x = torch.zeros(maxlen, dtype=torch.long)\n", " for t, char in enumerate(inp):\n", " x[t] = char_indices[char]\n", "\n", " return x\n", "\n", "\n", "def decode(ten):\n", " s = ''\n", " for v in ten:\n", " s += indices_char[int(v)] # int() so both plain ints and tensor elements work as dictionary keys\n", " return s\n", "\n", "\n", "class MyDataset(Dataset):\n", " # cut the text in semi-redundant sequences of maxlen characters\n", " def __len__(self):\n", " return (len(text) - maxlen) // step\n", "\n", " def __getitem__(self, i):\n", " inp = text[i*step: i*step + maxlen]\n", " out = text[i*step + maxlen]\n", "\n", " x = encode(inp)\n", " y = char_indices[out]\n", "\n", " return x, y" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "df5113fc9d6d635dca32463f07fcb7d1", "grade": false, "grade_id": "cell-bdbb9997aa91863b", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We can now define the model. We'll use a simple LSTM followed by a dense layer with a softmax to predict probabilities over the characters in our vocabulary. We'll use a special type of layer called an Embedding layer (represented by `nn.Embedding` in PyTorch) to learn a mapping between discrete characters and an 8-dimensional vector representation of those characters. You'll learn more about Embeddings in the next part of the lab." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "2d79273b7a17d28c4d6c93e3d22927f5", "grade": false, "grade_id": "cell-2aa5e9483ccd6ba4", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "class CharPredictor(nn.Module):\n", " def __init__(self):\n", " super(CharPredictor, self).__init__()\n", " self.emb = nn.Embedding(len(chars), 8)\n", " self.lstm = nn.LSTM(8, 128, batch_first=True)\n", " self.lin = nn.Linear(128, len(chars))\n", "\n", " def forward(self, x):\n", " x = self.emb(x)\n", " lstm_out, _ = self.lstm(x)\n", " out = self.lin(lstm_out[:,-1]) # we want the output at the final timestep (time is dim 1 with batch_first=True)\n", " return out" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "8d9e5e92460bbe5a977021f631362478", "grade": false, "grade_id": "cell-eda65e7ef08f66a5", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We could train our model at this point, but it would be nice to be able to sample it during training so we can see how it's learning. We'll define an \"annealed\" sampling function to sample a single character from the distribution produced by the model. 
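Concretely, for a vector of logits $z$ and temperature $T$, it samples from the distribution $p_i \\propto \\exp(z_i / T)$. 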
The annealed sampling function has a temperature parameter which moderates the probability distribution being sampled: a low temperature forces samples to come from only the most likely characters, whilst higher temperatures allow for more variability in the character that is sampled:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "5aedd2ccd26ef2d6505278a7894671e4", "grade": false, "grade_id": "cell-be521ae6b656234a", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "def sample(logits, temperature=1.0):\n", " # helper function to sample an index from a probability array\n", " logits = logits / temperature\n", " return torch.multinomial(F.softmax(logits, dim=0), 1)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "5057964b1e77f5ba8c92abf3148ce73f", "grade": false, "grade_id": "cell-981d79ec1593b83b", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Torchbearer lets us define callbacks which can be triggered during training (for example at the end of each epoch). Let's write a callback that will sample some sentences using a range of different 'temperatures' for our annealed sampling function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "44f0bad7cc7aa788fa4111d6a55766ca", "grade": false, "grade_id": "cell-788b63e92e811aa8", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "import torchbearer\n", "from torchbearer import Trial\n", "from torchbearer.callbacks.decorators import on_end_epoch\n", "\n", "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", "\n", "@on_end_epoch\n", "def create_samples(state):\n", " with torch.no_grad():\n", " epoch = -1\n", " if state is not None:\n", " epoch = state[torchbearer.EPOCH]\n", "\n", " print()\n", " print('----- Generating text after Epoch: %d' % epoch)\n", "\n", " start_index = random.randint(0, len(text) - maxlen - 1)\n", " for diversity in [0.2, 0.5, 1.0, 1.2]:\n", " print()\n", " print()\n", " print('----- diversity:', diversity)\n", "\n", " generated = ''\n", " sentence = text[start_index:start_index+maxlen] # seed with a full window of maxlen characters so encode() fills every slot\n", " generated += sentence\n", " print('----- Generating with seed: \"' + sentence + '\"')\n", " print()\n", " sys.stdout.write(generated)\n", "\n", " inputs = encode(sentence).unsqueeze(0).to(device)\n", " for i in range(400):\n", " tag_scores = model(inputs)\n", " c = sample(tag_scores[0], diversity) # sample with the current diversity as the temperature\n", " sys.stdout.write(indices_char[c.item()])\n", " sys.stdout.flush()\n", " inputs[0, 0:inputs.shape[1]-1] = inputs[0, 1:].clone()\n", " inputs[0, inputs.shape[1]-1] = c\n", " print()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "1da7de78addd5726d2a156d5dc983e9c", "grade": false, "grade_id": "cell-2dc6814904f12692", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Now, all the pieces are in place. __Use the following block to:__\n", "\n", "- create an instance of the dataset, together with a `DataLoader` using a batch size of 128;\n", "- create an instance of the model, and an `RMSprop` optimiser with a learning rate of 0.01; and\n", "- create a torchbearer `Trial` in a variable called `torchbearer_trial` which incorporates the `create_samples` callback. 
Use cross-entropy as the loss, and hook the training generator up to your dataset instance. Make sure you move your `Trial` object to the GPU if one is available." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "d31a6ee7d253ba186dc92bebe139ece3", "grade": true, "grade_id": "cell-9c17e699a59cd71c", "locked": false, "points": 10, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "bdf13f8a87d1ef999f3c1ee8e083d7fe", "grade": false, "grade_id": "cell-f70caee6c691fd1d", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Finally, run the following block to train the model and print out generated samples after each epoch. We've added a direct call to the `create_samples` callback to print samples before training commences (i.e. with random weights). Be aware this will take some time to run..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "2b52b125ce1a870bc1cc96b8dc130bc9", "grade": false, "grade_id": "cell-ab250fe0c0be1234", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "create_samples.on_end_epoch(None)\n", "torchbearer_trial.run(epochs=10)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "e7794bc1ea4e6f9d9b7bbb920f2f4b8f", "grade": false, "grade_id": "cell-6b611545370451e8", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Looking at the results, it's possible to see that the model behaves a bit like the Markov chain at the first epoch, but as the parameters become better tuned to the data it's clear that the LSTM has been able to model the structure of the language and is able to produce largely legible text.\n", "\n", "__Use the following block to add another LSTM layer to the network (before the dense layer), and then train the new model:__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "0a76447f4d9fc2b75f4060b59961fa2c", "grade": true, "grade_id": "cell-471a1591e08e7879", "locked": false, "points": 7, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "0317b05707ece61dd95a6b1fa4b5b4d5", "grade": false, "grade_id": "cell-873e6f510f67f829", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "__How does the additional layer affect the performance of the model? Provide your answer in the block below:__" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "checksum": "dacc657a6333f7a357e3d06b389f050b", "grade": true, "grade_id": "cell-e44766c257b457e9", "locked": false, "points": 3, "schema_version": 1, "solution": true } }, "source": [ "YOUR ANSWER HERE" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }