{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This lesson was adapted from the end of [lesson 3](https://course.fast.ai/videos/?lesson=3) and beginning of [lesson 4](https://course.fast.ai/videos/?lesson=4) of the latest fast.ai Practical Deep Learning for Coders course. We will cover all the material you need here in this notebook, so no need to have taken the Deep Learning course. Even if you have taken the DL class, we will go slower and get into more detail here!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transfer Learning for Natural Language Modeling\n", "### Constructing a Language Model and a Sentiment Classifier for IMDB movie reviews" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Transfer learning has been widely used with great success in computer vision for several years, but only in the last year or so has it been successfully applied to NLP (beginning with ULMFit, which we will use here and which BERT and GPT-2 later built upon).\n", "\n", "As Sebastian Ruder wrote in [The Gradient](https://thegradient.pub/) last summer, [NLP's ImageNet moment has arrived](https://thegradient.pub/nlp-imagenet/).\n", "\n", "We will first build a language model for IMDB movie reviews. Next we will build a sentiment classifier, which will predict whether a review is negative or positive, based on its text. For both of these tasks, we will use **transfer learning**. Starting with the pre-trained weights from the `wikitext-103` language model, we will tune these weights to specialize to the language of `IMDb` movie reviews. " ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Language Models" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Language modeling can be a fun creative form. 
Research scientist [Janelle Shane blogs](https://aiweirdness.com/) & [tweets](https://twitter.com/JanelleCShane) about her creative AI explorations, which often involve text. For instance, see her:\n", "\n", "- [Why did the neural network cross the road?](https://aiweirdness.com/post/174691534037/why-did-the-neural-network-cross-the-road)\n", "- [Try these neural network-generated recipes at your own risk.](https://aiweirdness.com/post/163878889437/try-these-neural-network-generated-recipes-at-your)\n", "- [D&D character bios - now making slightly more sense](https://aiweirdness.com/post/183471928977/dd-character-bios-now-making-slightly-more)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a GPU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will need to have the fastai library installed for this lesson, and you will want to use a GPU to train your neural net. If you don't have a GPU you can use in your computer (currently, only Nvidia GPUs are fully supported by the main deep learning libraries), no worries! There are a number of cloud options you can consider:\n", "\n", "[GPU Cloud Options](https://course.fast.ai/#using-a-gpu)\n", "\n", "**Reminder: If you are using a cloud GPU, always be sure to shut it down when you are done!!! Otherwise, you could end up with an expensive bill!**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from fastai import *\n", "from fastai.text import *\n", "from scipy.spatial.distance import cosine as dist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that language models can use a lot of GPU, so you may need to decrease batchsize here." 
] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "# bs=192\n", "bs=48\n", "# bs=24" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fix this line: should be `device(0)` instead of `device(2)`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#torch.cuda.set_device(2)\n", "torch.cuda.set_device(0)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## 1. Prepare the IMDb data (on a sample)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "First let's download the dataset we are going to study. The `IMDb` [dataset](http://ai.stanford.edu/~amaas/data/sentiment/) has been curated by Andrew Maas et al. and contains a total of 100,000 reviews on IMDB. 25,000 of them are labelled as positive or negative for training, another 25,000 are labelled for testing (in both cases they are highly polarized). The remaining 50,000 are additional unlabelled reviews (but we will find a use for them nonetheless).\n", "\n", "We'll begin with a sample we've prepared for you, so that things run quickly before going over the full dataset." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "[WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb_sample/data_save.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb_sample/texts.csv')]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "path = untar_data(URLs.IMDB_SAMPLE)\n", "path.ls()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "It only contains one csv file, let's have a look at it."
] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "It contains one line per review, with the label ('negative' or 'positive'), the text and a flag to determine if it should be part of the validation set or the training set. If we ignore this flag, we can create a `DataBunch` containing this data in one line of code:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load and preprocess the data and form a `databunch`\n", "Add workaround for the bug in the `fastai Text API`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "failure count is 3\n", "\n", "Wall time: 54.9 s\n" ] } ], "source": [ "%%time\n", "\n", "# throws `BrokenProcessPool` error sometimes. Keep trying until it works!\n", "count = 0\n", "error = True\n", "while error:\n", "    try:\n", "        # Preprocessing steps\n", "        data_lm = TextDataBunch.from_csv(path, 'texts.csv')\n", "        error = False\n", "        print(f'failure count is {count}\\n')\n", "    except Exception:  # catch any error, but still allow Ctrl-C to interrupt\n", "        # accumulate failure count\n", "        count = count + 1\n", "        print(f'failure count is {count}')" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "By executing this line a process was launched that took a bit of time. Let's dig a bit into it. Images could be fed (almost) directly into a model because they're just a big array of pixel values that are floats between 0 and 1. A text is composed of words, and we can't apply mathematical functions to them directly. We first have to convert them to numbers. This is done in two different steps: tokenization and numericalization. A `TextDataBunch` does all of that behind the scenes for you."
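, "\n",
 "\n",
 "As a rough picture of these two steps, here is a toy sketch in plain Python. It is only an illustration under simplifying assumptions (splitting on whitespace, ids assigned in alphabetical order); fastai's own tokenizer and vocabulary are considerably more elaborate, and the names `texts`, `tokenized`, `stoi` and `ids` are made up for this example:\n",
 "\n",
 "```python\n",
 "# Toy illustration only: real tokenizers do far more than str.split().\n",
 "texts = ['this movie is bad', 'this movie is great']\n",
 "\n",
 "# Step 1, tokenization: split each text into a list of tokens.\n",
 "tokenized = [t.split() for t in texts]\n",
 "\n",
 "# Step 2, numericalization: give every distinct token an integer id.\n",
 "vocab = sorted({tok for toks in tokenized for tok in toks})\n",
 "stoi = {tok: i for i, tok in enumerate(vocab)}\n",
 "ids = [[stoi[tok] for tok in toks] for toks in tokenized]\n",
 "\n",
 "print(tokenized[0])\n",
 "print(ids[0])\n",
 "```\n",
 "\n",
 "The model only ever sees the integer ids; the mapping is kept so that ids can be turned back into text."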
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### Tokenization" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The first step of processing we make texts go through is to split the raw sentences into words, or more exactly tokens. The easiest way to do this would be to split the string on spaces, but we can be smarter:\n", "\n", "- we need to take care of punctuation\n", "- some words are contractions of two different words, like isn't or don't\n", "- we may need to clean some parts of our texts, if there's HTML code for instance\n", "\n", "To see what the tokenizer has done behind the scenes, let's have a look at a few texts in a batch." ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The texts are truncated at 100 tokens for more readability. We can see that it did more than just split on space and punctuation symbols: \n", "- the \"'s\" are grouped together in one token\n", "- the contractions are separated like this: \"did\", \"n't\"\n", "- content has been cleaned of any HTML symbols and lower cased\n", "- there are several special tokens (all those that begin with xx), to replace unknown tokens (see below) or to introduce different text fields (here we only have one)." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### Numericalization" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Once we have extracted tokens from our texts, we convert them to integers by creating a list of all the words used. We only keep the ones that appear at least twice, with a maximum vocabulary size of 60,000 (by default), and replace the ones that don't make the cut by the unknown token `UNK`.\n", "\n", "The correspondence from ids to tokens is stored in the `vocab` attribute of our datasets, in a list called `itos` (for int to string)."
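, "\n",
 "\n",
 "The rule just described can be sketched in a few lines of plain Python. This is a simplified illustration, not fastai's implementation: `build_vocab` and `numericalize` are hypothetical helpers, and the only special token modelled is `xxunk` (the real vocabulary reserves several more, as the listing below shows):\n",
 "\n",
 "```python\n",
 "from collections import Counter\n",
 "\n",
 "UNK = 'xxunk'  # the unknown token; it gets id 0 in this sketch\n",
 "\n",
 "def build_vocab(tokenized_texts, max_vocab=60000, min_freq=2):\n",
 "    # Hypothetical helper: keep at most max_vocab of the most frequent\n",
 "    # tokens, and only those seen at least min_freq times.\n",
 "    counts = Counter(tok for toks in tokenized_texts for tok in toks)\n",
 "    itos = [UNK] + [tok for tok, c in counts.most_common(max_vocab) if c >= min_freq]\n",
 "    stoi = {tok: i for i, tok in enumerate(itos)}\n",
 "    return itos, stoi\n",
 "\n",
 "def numericalize(tokens, stoi):\n",
 "    # Tokens that did not make the cut fall back to the id of the unknown token.\n",
 "    return [stoi.get(tok, 0) for tok in tokens]\n",
 "\n",
 "docs = [['a', 'good', 'movie'], ['a', 'bad', 'movie']]\n",
 "itos, stoi = build_vocab(docs)\n",
 "print(itos)\n",
 "print(numericalize(['a', 'great', 'movie'], stoi))\n",
 "```\n",
 "\n",
 "In this toy corpus only `a` and `movie` occur twice, so every other word, including one never seen before such as `great`, is mapped to the unknown token."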
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "['xxunk',\n", " 'xxpad',\n", " 'xxbos',\n", " 'xxeos',\n", " 'xxfld',\n", " 'xxmaj',\n", " 'xxup',\n", " 'xxrep',\n", " 'xxwrep',\n", " 'the']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_lm.vocab.itos[:10]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "And if we look at what's in our datasets, we'll see the tokenized text as a representation:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "Text xxbos xxmaj this movie is so bad , i knew how it ends right after this little girl killed the first person . xxmaj very bad acting very bad plot very bad movie \n", " \n", " do yourself a favour and xxup don't watch it 1 / 10" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_lm.train_ds[0][0]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "But the underlying data is all numbers" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "array([ 2, 5, 21, 31, 16, 52, 107, 10, 19, 669], dtype=int64)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_lm.train_ds[0][0].data[:10]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### Alternative approach: with the `data block API`" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We can use the data block API with NLP and have a lot more flexibility than what the default factory methods offer. 
In the previous example for instance, the data was randomly split between train and validation instead of reading the third column of the csv.\n", "\n", "With the data block API though, we have to manually call the tokenize and numericalize steps. This allows more flexibility, and if you're not using the defaults from fastai, the various arguments to pass will appear in the step where they're relevant, so the code will be more readable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load and preprocess the data and form a `datablock`\n", "Add workaround for the bug in the `fastai Text API`" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "failure count is 0\n", "\n", "Wall time: 15.5 s\n" ] } ], "source": [ "%%time\n", "\n", "# throws `BrokenProcessPool` error sometimes. Keep trying until it works!\n", "count = 0\n", "error = True\n", "while error:\n", "    try:\n", "        # Preprocessing steps\n", "        data = (TextList.from_csv(path, 'texts.csv', cols='text')\n", "                .split_from_df(col=2)\n", "                .label_from_df(cols=0)\n", "                .databunch())\n", "        error = False\n", "        print(f'failure count is {count}\\n')\n", "    except Exception:  # catch any error, but still allow Ctrl-C to interrupt\n", "        # accumulate failure count\n", "        count = count + 1\n", "        print(f'failure count is {count}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Transfer Learning
\n", "### We are going to create an `IMDb` language model starting with the pretrained weights from the `wikitext-103` language model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's grab the full `IMDb` dataset for what follows." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/data_clas.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/data_lm.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/data_save.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/finetuned.pth'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/finetuned_enc.pth'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/imdb.vocab'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/imdb_textlist_class'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/ld.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/ll_clas.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/ll_lm.pkl'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/lm_databunch'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/models'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/pretrained'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/README'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/test'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/tmp_clas'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/tmp_lm'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/unsup'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/vocab_lm.pkl')]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "path = untar_data(URLs.IMDB)\n", "path.ls()" ] }, { "cell_type": 
"code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train/labeledBow.feat'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train/neg'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train/pos'),\n", " WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train/unsupBow.feat')]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(path/'train').ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reviews are in a training and test set following an imagenet structure. The only difference is that there is an `unsup` folder in `train` that contains the unlabelled data.\n", "\n", "We're not going to train a model that classifies the reviews from scratch. Like in computer vision, we'll use a model pretrained on a bigger dataset (a cleaned subset of wikipedia called [wikitext-103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip)). That model has been trained to guess what the next word, its input being all the previous words. It has a recurrent structure and a hidden state that is updated each time it sees a new word. This hidden state thus contains information about the sentence up to that point.\n", "\n", "We are going to use that 'knowledge' of the English language to build our classifier, but first, like for computer vision, we need to fine-tune the pretrained model to our particular dataset. Because the English of the reviews left by people on IMDB isn't the same as the English of wikipedia, we'll need to adjust a little bit the parameters of our model. Plus there might be some words extremely common in that dataset that were barely present in wikipedia, and therefore might no be part of the vocabulary the model was trained on." 
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### More about WikiText-103" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We will be using the `WikiText-103` dataset created by [Stephen Merity](https://smerity.com/) to pre-train a language model.\n", "\n", "To quote [Stephen's post](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/):\n", "\n", "*The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.*\n", "\n", "*Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers - all of which are removed in PTB. As it is composed of full articles, the dataset is well suited for models that can take advantage of long term dependencies.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Download wikitext-103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip). Unzip it into the `.fastai/data/` folder on your computer." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 2A. Package the `IMDb` data into a language model `databunch`" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "This is where the unlabelled data is going to be useful to us, as we can use it to fine-tune our model. 
Let's create our data object with the data block API (this takes a few minutes).\n", "\n", "We'll use a special kind of `TextDataBunch` for the language model, one that ignores the labels (that's why we put 0 everywhere), shuffles the texts at each epoch before concatenating them all together (only for training; we don't shuffle the validation set), and sends batches that read that text in order, with targets that are the next word in the sentence.\n", "\n", "Add a `try-except` wrapper as a workaround for the bug in the `fastai Text API`" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "failure count is 0\n", "\n", "Wall time: 2.79 s\n" ] } ], "source": [ "%%time\n", "\n", "# throws `BrokenProcessPool` error sometimes. Keep trying until it works!\n", "count = 0\n", "error = True\n", "while error:\n", "    try:\n", "        # Preprocessing steps\n", "        data_lm = (TextList.from_folder(path)\n", "                   #Inputs: all the text files in path\n", "                   .filter_by_folder(include=['train', 'test', 'unsup'])\n", "                   # notebook 3-logreg-nb-imbd used .split_by_folder instead of .filter_by_folder\n", "                   # and this took less time to run. Can we do the same here?\n", "                   #We may have other temp folders that contain text files so we only keep what's in train, test and unsup\n", "                   .split_by_rand_pct(0.1, seed=42))\n", "        #We randomly split and keep 10% (10,000 reviews) for validation\n", "        #.label_for_lm()\n", "        #We want to make a language model so we label accordingly\n", "        #.databunch(bs=bs, num_workers=1))\n", "        error = False\n", "        print(f'failure count is {count}\\n')\n", "    except Exception:  # catch any error, but still allow Ctrl-C to interrupt\n", "        # accumulate failure count\n", "        count = count + 1\n", "        print(f'failure count is {count}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### I got faster results when I do the last two steps in a separate cell:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "failure count is 4\n", "\n", "Wall time: 14min 18s\n" ] } ], "source": [ "%%time\n", "\n", "# throws `BrokenProcessPool` error sometimes. Keep trying until it works!\n", "count = 0\n", "error = True\n", "while error:\n", "    try:\n", "        # Preprocessing steps\n", "        # the next step is the bottleneck\n", "        data_lm = (data_lm.label_for_lm()\n", "                   #We want to make a language model so we label accordingly\n", "                   .databunch(bs=bs, num_workers=1))\n", "        error = False\n", "        print(f'failure count is {count}\\n')\n", "    except Exception:  # catch any error, but still allow Ctrl-C to interrupt\n", "        # accumulate failure count\n", "        count = count + 1\n", "        print(f'failure count is {count}')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "(60000, 90000)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(data_lm.vocab.itos),len(data_lm.train_ds)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n",
 "<table>\n",
 "  <thead>\n",
 "    <tr><th>idx</th><th>text</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><td>0</td><td>forgotten until much later , by which time i did not care . xxmaj the character we should really care about is a very cocky , overconfident xxmaj ashton xxmaj kutcher . xxmaj the problem is he comes off as kid who thinks he 's better than anyone else around him and shows no signs of a cluttered closet . xxmaj his only obstacle appears to be winning over xxmaj</td></tr>\n",
 "    <tr><td>1</td><td>of an xxunk bad b - movie . xxmaj there are worse movies than this one ( xxmaj titanic for example ) , but this definitely shares the pile of steaming crap movies . \\n \\n xxup ok this was apparently shot in xxmaj kansas xxmaj city , which explains why everyone is so lame . xxmaj the main guy looks like xxmaj steve xxmaj guttenberg , and is</td></tr>\n",
 "    <tr><td>2</td><td>, then this'll be okay for you .. xxmaj however , if you ca n't stand a plot being thrown at you which remains unresolved by the time the credits roll , you should go watch something else . xxwrep 5 xxbos i am not quite sure what to say / think about this movie . xxmaj it is definitely not the worst in the series ( there</td></tr>\n",
 "    <tr><td>3</td><td>tension for such a famous race and anyway the race - off at the end seems like another xxunk . \\n \\n xxmaj actually i 'd have given it another mark if they 'd stuck to the alternative title \" xxmaj those xxmaj magnificent xxmaj men in xxmaj their xxmaj jaunty xxmaj xxunk \" but in truth the animated series \" xxmaj wacky xxmaj races \" did this so</td></tr>\n",
 "    <tr><td>4</td><td>though . xxmaj rest of the cast are average at best . xxmaj overall xxmaj not worth your time or money . * 1 / 2 out of 5 xxwrep 5 xxbos xxup the xxup protector . xxmaj you hear the name . xxmaj you think , \" ah , it 's a crappy xxmaj hong xxmaj kong movie . \" xxmaj guess what - it 's not</td></tr>\n",
 "  </tbody>\n",
 "</table>
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data_lm.show_batch()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### Save the `databunch` for next time." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "data_lm.save()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load the saved data" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "data_lm = load_data(path, 'lm_databunch', bs=bs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2B. The **Transfer Learning** step.\n", "#### This is where the magic happens!\n", "#### The `AWD_LSTM` object contains the pretrained weights and the neural net architecture of the `wikitext-103` language model. These will be downloaded the first time you execute the following line, and stored in `~/.fastai/models/` (or elsewhere if you specified different paths in your config file). 
\n", "\n", "We import these into the `language_model_learner` object for our `IMDb` language model as follows:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [], "source": [ "learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get the `IMDb` language model `vocabulary`" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "vocab = data_lm.vocab" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "35639" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vocab.stoi[\"stingray\"]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'stingray'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vocab.itos[vocab.stoi[\"stingray\"]]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'xxunk'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vocab.itos[vocab.stoi[\"mobula\"]]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "awd = learn_lm.model[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get the `IMDb` language model `encoder`. Recall that the `encoder` translates tokens into numerical vectors in the space defined by the `IMDb` vocabulary." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "enc = learn_lm.model[0].encoder" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([60000, 400])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "enc.weight.size()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### Difference in vocabulary between IMDB and Wikipedia language models" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We are going to load `wiki_itos` (the index-to-string list) from the `wikitext 103` language model. We will compare the vocabularies of `wikitext-103` and `IMDB`. It is to be expected that the two sets have some different vocabulary words, and that is no problem for transfer learning!" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "#wiki_itos = pickle.load(open(Config().model_path()/'wt103-1/itos_wt103.pkl', 'rb'))\n", "wiki_itos = pickle.load(open(Config().model_path()/'wt103-fwd/itos_wt103.pkl', 'rb'))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['xxunk',\n", " 'xxpad',\n", " 'xxbos',\n", " 'xxeos',\n", " 'xxfld',\n", " 'xxmaj',\n", " 'xxup',\n", " 'xxrep',\n", " 'xxwrep',\n", " 'the']" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wiki_itos[:10]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "60000" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(wiki_itos)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "60000" ] }, "execution_count": 30, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "len(vocab.itos)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "hidden": true }, "outputs": [], "source": [ "i, unks = 0, []\n", "while len(unks) < 50:\n", " if data_lm.vocab.itos[i] not in wiki_itos: unks.append((i,data_lm.vocab.itos[i]))\n", " i += 1" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "hidden": true }, "outputs": [], "source": [ "wiki_words = set(wiki_itos)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "hidden": true }, "outputs": [], "source": [ "imdb_words = set(vocab.itos)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "hidden": true }, "outputs": [], "source": [ "wiki_not_imbdb = wiki_words.difference(imdb_words)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "hidden": true }, "outputs": [], "source": [ "imdb_not_wiki = imdb_words.difference(wiki_words)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "hidden": true }, "outputs": [], "source": [ "wiki_not_imdb_list = []\n", "\n", "for i in range(100):\n", " word = wiki_not_imbdb.pop()\n", " wiki_not_imdb_list.append(word)\n", " wiki_not_imbdb.add(word)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['fustat',\n", " 'sterilization',\n", " 'konkani',\n", " 'tvrđa',\n", " 'bruner',\n", " 'brachiopods',\n", " 'aud',\n", " 'i-68',\n", " 'kaafjord',\n", " 'maximian',\n", " '48th',\n", " 'morphy',\n", " 'freyja',\n", " 'maenas',\n", " 'perthshire']" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wiki_not_imdb_list[:15]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "hidden": true }, "outputs": [], "source": [ "imdb_not_wiki_list = []\n", "\n", "for i in range(100):\n", " word = imdb_not_wiki.pop()\n", " imdb_not_wiki_list.append(word)\n", " imdb_not_wiki.add(word)" ] }, { "cell_type": "code", 
"execution_count": 39, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "['sceptic',\n", " 'paquin',\n", " 'louche',\n", " 'sodom',\n", " 'atlantean',\n", " 'kron',\n", " 'gourd',\n", " 'tolwyn',\n", " 'black-',\n", " 'catty',\n", " 'bandito',\n", " 'skulduggery',\n", " 'johnathon',\n", " 'fiendishly',\n", " 'malachi']" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "imdb_not_wiki_list[:15]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "All words that appear in the `IMDB` vocab, but not the `wikitext-103` vocab, will be initialized to the same random vector in a model. As the model trains, we will learn their weights." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vocab.stoi[\"modernisation\"]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"modernisation\" in wiki_words" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "25365" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vocab.stoi[\"30-something\"]" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(False, True)" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"30-something\" in wiki_words, \"30-something\" in imdb_words" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "16735" ] }, "execution_count": 44, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "vocab.stoi[\"linklater\"]" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(False, True)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"linklater\" in wiki_words, \"linklater\" in imdb_words" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "(True, True)" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"house\" in wiki_words, \"house\" in imdb_words" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(enc.weight[vocab.stoi[\"30-something\"], :], \n", " enc.weight[vocab.stoi[\"linklater\"], :])" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(enc.weight[vocab.stoi[\"30-something\"], :], \n", " enc.weight[vocab.stoi[\"house\"], :])" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "hidden": true }, "outputs": [], "source": [ "new_word_vec = enc.weight[vocab.stoi[\"linklater\"], :]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### Generating fake movie review-like text with the **untrained** `IMDb` language model" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "hidden": true }, "outputs": [], "source": [ "TEXT = \"The color of the sky is\"\n", "N_WORDS = 40\n", "N_SENTENCES = 2" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "hidden": true, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "The color of the sky is hues in the National Portrait Gallery , London , which was originally the Royal Academy of Art . This display has since been referred to as the Venus Jubilee\n", "The color of the sky is called black , in the form of a white cloth . The color of the body varies through the color of the colours , but the black and blue areas of the wings are black . The white\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "hidden": true }, "outputs": [], "source": [ "TEXT = \"I hated this movie\"\n", "N_WORDS = 30\n", "N_SENTENCES = 2" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I hated this movie . The first film in the genre , Don ' t Look Now , was what the Academy Awards called a \" apologized \"\n", "I hated this movie by a Dutch composer . This was a musical method often used by Latin music , which was used by the original French film La\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I hated this movie , \" Blaxplotation of the Lake \" , and other films from the era , and all of them were not pfetten on film . It was\n", "I hated this movie = \n", " \n", " The film is the third film in the Harry Potter series and the third film in the Harry Potter series . 
The\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "hidden": true }, "outputs": [], "source": [ "doc(LanguageLearner.predict)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Lowering the `temperature` will make the texts less randomized." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I hated this movie by John Lennon , and he said he wanted to do it for the American public . He said that he was \" a little bit\n", "I hated this movie . It was a film that was a success . It was a success , and it was the first film to be released in the United\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.10) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "hidden": true }, "outputs": [], "source": [ "doc(LanguageLearner.predict)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I hated this movie by Michael Jackson , and he said he was \" a fan of the Academy Award for Best Picture \" . He said\n", "I hated this movie by John Lennon , and the film was released in the United States on November 11 , 1999 . The film was released on\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.10) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2C. Training (fine-tuning) the `IMDb` language model\n", "#### Starting with the `wikitext-103` pretrained weights, we'll fine-tune the model to \"learn\" the structure in the \"language\" of IMDb movie reviews." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Choose an appropriate learning rate." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.\n" ] } ], "source": [ "learn_lm.lr_find()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "learn_lm.recorder.plot(skip_end=15)" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "lr = 1e-3\n", "lr *= bs/48" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use mixed-precision training if your GPU supports it; otherwise, omit this step" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LanguageLearner(data=TextLMDataBunch;\n", "\n", "Train: LabelList (90000 items)\n", "x: LMTextList\n", " xxwrep 18 xxbos xxmaj once again xxmaj mr. xxmaj costner has dragged out a movie for far longer than necessary . xxmaj aside from the terrific sea rescue sequences , of which there are very few i just did not care about any of the characters . xxmaj most of us have ghosts in the closet , and xxmaj costner 's character are realized early on , and then forgotten until much later , by which time i did not care . xxmaj the character we should really care about is a very cocky , overconfident xxmaj ashton xxmaj kutcher . xxmaj the problem is he comes off as kid who thinks he 's better than anyone else around him and shows no signs of a cluttered closet . xxmaj his only obstacle appears to be winning over xxmaj costner . xxmaj finally when we are well past the half way point of this stinker , xxmaj costner tells us all about xxmaj kutcher 's ghosts . xxmaj we are told why xxmaj kutcher is driven to be the best with no prior inkling or foreshadowing . xxmaj no magic here , it was all i could do to keep from turning it off an hour in ., xxwrep 18 xxbos xxmaj this is an example of why the majority of action films are the same . xxmaj generic and boring , there 's really nothing worth watching here . 
a complete waste of the then barely - tapped talents of xxmaj ice - t and xxmaj ice xxmaj cube , who 've each proven many times over that they are capable of acting , and acting well . xxmaj do n't bother with this one , go see xxmaj new xxmaj jack xxmaj city , xxmaj ricochet or watch xxmaj new xxmaj york xxmaj undercover for xxmaj ice - t , or xxmaj boyz n the xxmaj hood , xxmaj higher xxmaj learning or xxmaj friday for xxmaj ice xxmaj cube and see the real deal . xxmaj ice - t 's horribly cliched dialogue alone makes this film grate at the teeth , and i 'm still wondering what the heck xxmaj bill xxmaj paxton was doing in this film ? xxmaj and why the heck does he always play the exact same character ? xxmaj from xxmaj aliens onward , every film i 've seen with xxmaj bill xxmaj paxton has him playing the exact same irritating character , and at least in xxmaj aliens his character died , which made it somewhat gratifying ... \n", " \n", " xxmaj overall , this is second - rate action trash . xxmaj there are countless better films to see , and if you really want to see this one , watch xxmaj judgement xxmaj night , which is practically a carbon copy but has better acting and a better script . xxmaj the only thing that made this at all worth watching was a decent hand on the camera - the cinematography was almost refreshing , which comes close to making up for the horrible film itself - but not quite . 4 / 10 ., xxwrep 18 xxbos xxmaj first of all i hate those moronic rappers , who could'nt act if they had a gun pressed against their foreheads . xxmaj all they do is curse and shoot each other and acting like xxunk version of gangsters . \n", " \n", " xxmaj the movie does n't take more than five minutes to explain what is going on before we 're already at the warehouse xxmaj there is not a single sympathetic character in this movie , except for the homeless guy , who is also the only one with half a brain . 
\n", " \n", " xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are both hill billies and xxmaj xxunk character is just as much a villain as the gangsters . i did'nt like him right from the start . \n", " \n", " xxmaj the movie is filled with pointless violence and xxmaj walter xxmaj hills specialty : people falling through windows with glass flying everywhere . xxmaj there is pretty much no plot and it is a big problem when you root for no - one . xxmaj everybody dies , except from xxmaj paxton and the homeless guy and everybody get what they deserve . \n", " \n", " xxmaj the only two black people that can act is the homeless guy and the junkie but they 're actors by profession , not annoying ugly brain dead rappers . \n", " \n", " xxmaj stay away from this crap and watch 48 hours 1 and 2 instead . xxmaj at lest they have characters you care about , a sense of humor and nothing but real actors in the cast ., xxwrep 18 xxbos xxmaj not even the xxmaj beatles could write songs everyone liked , and although xxmaj walter xxmaj hill is no mop - top he 's second to none when it comes to thought provoking action movies . xxmaj the nineties came and social platforms were changing in music and film , the emergence of the xxmaj rapper turned movie star was in full swing , the acting took a back seat to each man 's overpowering regional accent and transparent acting . xxmaj this was one of the many ice - t movies i saw as a kid and loved , only to watch them later and cringe . xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are firemen with basic lives until a burning building tenant about to go up in flames hands over a map with gold implications . i hand it to xxmaj walter for quickly and neatly setting up the main characters and location . xxmaj but i fault everyone involved for turning out xxmaj lame - o performances . 
xxmaj ice - t and cube must have been red hot at this time , and while i 've enjoyed both their careers as rappers , in my opinion they fell flat in this movie . xxmaj it 's about ninety minutes of one guy ridiculously turning his back on the other guy to the point you find yourself locked in multiple states of disbelief . xxmaj now this is a movie , its not a documentary so i wo nt waste my time recounting all the stupid plot twists in this movie , but there were many , and they led nowhere . i got the feeling watching this that everyone on set was xxunk of confused and just playing things off the cuff . xxmaj there are two things i still enjoy about it , one involves a scene with a needle and the other is xxmaj sadler 's huge 45 pistol . xxmaj bottom line this movie is like domino 's pizza . xxmaj yeah ill eat it if i 'm hungry and i do n't feel like cooking , xxmaj but i 'm well aware it tastes like crap . 3 stars , meh ., xxwrep 18 xxbos xxmaj brass pictures ( movies is not a fitting word for them ) really are somewhat brassy . xxmaj their alluring visual qualities are reminiscent of expensive high class xxup tv commercials . xxmaj but unfortunately xxmaj brass pictures are feature films with the pretense of wanting to entertain viewers for over two hours ! xxmaj in this they fail miserably , their undeniable , but rather soft and flabby than steamy , erotic qualities non withstanding . \n", " \n", " xxmaj senso ' 45 is a remake of a film by xxmaj luchino xxmaj visconti with the same title and xxmaj alida xxmaj valli and xxmaj farley xxmaj granger in the lead . xxmaj the original tells a story of senseless love and lust in and around xxmaj venice during the xxmaj italian wars of independence . xxmaj brass moved the action from the 19th into the 20th century , 1945 to be exact , so there are xxmaj mussolini murals , men in black shirts , xxmaj german uniforms or the tattered garb of the partisans . 
xxmaj but it is just window dressing , the historic context is completely negligible . \n", " \n", " xxmaj anna xxmaj xxunk plays the attractive aristocratic woman who falls for the amoral xxup ss guy who always puts on too much lipstick . xxmaj she is an attractive , versatile , well trained xxmaj italian actress and clearly above the material . xxmaj her wide range of facial expressions ( xxunk boredom , loathing , delight , fear , hate ... and ecstasy ) are the best reason to watch this picture and worth two stars . xxmaj she endures this basically trashy stuff with an astonishing amount of dignity . i wish some really good parts come along for her . xxmaj she really deserves it .\n", "y: LMLabelList\n", ",,,,\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Valid: LabelList (10000 items)\n", "x: LMTextList\n", "xxbos xxup cavite is an example of ultimate independent film , with a very short budget , a very simple concept , an exotic locale , a minimal cast , and a hand - held camera . \n", " \n", " xxmaj the story is simple : xxmaj adam ( xxmaj ian xxmaj gamazon ) is called home to the xxmaj phillipines because of a family crisis . xxmaj instead of his family picking him up , he finds himself forced to follow instructions of a man claiming to have his family . xxmaj there 's no clear reason for the abduction , or what makes xxmaj adam a target ; all xxmaj adam really knows is that his every move is watched , and the kidnappers have no regard for their victims . \n", " \n", " xxmaj as xxmaj adam follows the obscure instructions , and the obstacles in his way , the audience ca n't help but be caught up in his plight . xxmaj the hand held camera and jumpy editing style enhances the sense of desperation and time . xxmaj the scenes of urban xxmaj phillipines , particular the markets and the squatter holdings are a vivid cacophony . 
\n", " \n", " xxmaj co - directors and xxmaj co - writers xxmaj neill xxmaj xxunk xxmaj xxunk and xxmaj ian xxmaj gamazon have done an outstanding job of making the most out of limited means . xxmaj the economy of the film makes it both intimate and discomfiting , as xxmaj adam is an everyman who only wants his family safe and instead is completely at the whim of an omniscient tormentor . \n", " \n", " xxup cavite is an absolute must for anyone who has an interest in film , as storytelling , in it 's structure , and as an art form .,xxbos xxmaj decades ago , a crate filled with weapons grade plutonium crashes on an island and soaks into the ground . xxmaj today , a team of military men are sent to track down a notorious terrorist ( of ambiguous national origin ) and they track him to this polluted island . xxmaj when their raft is destroyed , the team must spend the night on shore , but soon discover that the plutonium has done something awful to the island -- it has called forth hundreds of bloodthirsty velociraptors . \n", " \n", " xxmaj let me start this with a lesson : do n't lend a movie to your friend before you 've seen it , especially if you are supposed to be reviewing it for the internet 's finest horror movie site . xxmaj it took me almost a year to get this film back , and the person who borrowed it still had not watched it ( though we ended up seeing it together ) . xxmaj and a second , more important , lesson : when you do watch this , keep your expectations as low as humanly possible . xxmaj because this film ranks among the worst i 've ever seen . \n", " \n", " xxmaj my acting in 8th grade was more convincing than the seasoned actors who appear in this film ( xxmaj lorenzo xxmaj lamas , xxmaj stephen xxmaj bauer ) . xxmaj line delivery is very fake , and the words themselves are poorly scripted . xxmaj the opening words come from a man checking out his gun 's scope : \" xxmaj boom . xxmaj dead bad guys . \" xxmaj yes , that 's pure genius at work . 
xxmaj the only conversation with any depth has two main characters explaining their histories . xxmaj but it , too , seems unnatural and a poor attempt to provide character background and to fill time . xxmaj we did n't need to know anything about their histories , so why bore us with it ? xxmaj and if you think the conversations are bad , you ai n't seen nothing yet . \n", " \n", " xxmaj the lighting is atrocious . i generally do n't notice lighting , but my friend ( a former film school student ) was practically vomiting in rage at the way more often than not shadows fell on the actors ' faces and the light would be in the background , focused on nothing in particular . xxmaj most lighting looks like a spotlight in a dim room , and many of the scenes involve a deep , subterranean cavern -- which you 'd then expect to be poorly lit , but had lights coming from all sorts of random angles . xxmaj do n't ask me why . \n", " \n", " xxmaj the plot was pretty bad . xxmaj some films can take the idea of military men chasing a terrorist and make a convincing film out of it . xxmaj cat and mouse stories are riveting . xxmaj well , not here . xxmaj the terrorist is really not even part of the story , just an excuse to go to the island . xxmaj and the raptors ? xxmaj and the allosaurus ? xxmaj sure , they came from the plutonium that soaked into the ground . xxmaj but if that makes sense to you , please explain it to me because i have no clue how radiation brings dinosaurs back from millions of years of extinction . \n", " \n", " xxmaj by far the worst part of \" xxmaj raptor xxmaj island \" is the animation of the raptors . xxmaj that 's right -- the selling point of the film is the worst aspect . xxmaj the animation is n't just bad , it 's subpar . i ca n't even express the hilarity of cartoons this cheesy . xxmaj and when they get shot ? xxmaj red splats like one would see in an old video game . xxmaj even the airplane , helicopter and xxmaj navy ship are cartoons ... 
how hard is it to get a model plane ? xxmaj please do n't see \" xxmaj raptor xxmaj island \" unless you need a good laugh or want to get sickeningly drunk . xxmaj sure , you probably want to see it before you see \" xxmaj raptor xxmaj island 2 \" ( which seems to be getting better reviews ) . xxmaj but just avoiding it entirely is your best bet . xxmaj the closest thing i can compare it to is \" xxmaj pinata : xxmaj survival xxmaj island \" , and unfortunately this film makes the pinata look good by comparison . xxmaj you have been warned .,xxbos xxmaj this is one creepy underrated xxmaj gem with chilling performances and a fantastic xxunk xxmaj all the characters are great , and the story was awesome , plus i thought the ending was really xxunk xxmaj the plot was great , and it never bored me , plus while the child actors were bad , they gave me the xxunk xxmaj this happened to be on the space channel a while ago , so i decided to check it out and tape it , i read some good reviews from fellow horror fans , i must say i agree with them , it 's very creepy , and suspenseful , plus xxmaj strother xxmaj martin , was fantastic in his role , as the xxmaj satan worshiper . xxmaj it has tons of creepy atmosphere , and it keeps you guessing throughout , plus all the characters were very likable , and you really start to root for xxmaj ben and his xxunk xxmaj it has plenty of disturbing moments , and the film really shocked me at times , plus , it 's extremely well made on a low xxunk xxmaj this is one creepy underrated gem , with chilling performances and a fantastic finale ! , i highly recommend this one!. xxmaj the xxmaj direction is very good!. xxmaj bernard mceveety does a very good job here , with great camera work , creating a lot of creepy atmosphere , and keeping the film at a very fast pace!. xxmaj ther is a little bit of blood and gore . 
xxmaj we get a severed leg , lots of bloody corpses , bloody slit throat , slicing and dicing , decapitation , and an impaling . xxmaj the xxmaj acting is xxunk xxmaj strother xxmaj martin is fantastic here ! as the xxmaj satan worshiper , he is extremely creepy , very convincing , was quite chilling , was extremely intense , seemed to be enjoying himself , and just did a fantastic job xxunk xxmaj charles xxmaj bateman is great as the xxmaj dad , he was very caring , very likable , and gave a good show ! , i liked him lots . xxup l.q. xxmaj jones is awesome as the xxmaj sheriff , he was funny , on top of things , looked very young , had a cool character , and just did an awesome job xxunk xxmaj ahna xxmaj capri is good as the girlfriend and did what she had to do pretty well . xxmaj charles xxmaj robinson overacted to the extreme as the xxmaj priest and did n't convince me one bit ! , and that laugh of his was especially bad . xxmaj geri xxmaj reischl is actually decent as the daughter , she was somewhat likable , and only got on my nerves a couple times , i rather liked her . xxmaj alvy xxmaj moore was goofy , but very likable in his role as xxmaj tobey i dug him!. xxmaj rest of the cast do good . xxmaj overall i highly recommend it!. * * * 1 / 2 out of 5,xxbos xxmaj aside from the gross factual inaccuracies of this movie ( ie king xxmaj richard having a son and that son replacing xxmaj john on the throne ) this movie was a sweet love tale full of adventure . i 'm sure that this movie will appeal to the younger generation of girls and boys .,xxbos xxmaj the special effects again are superb but this is xxmaj finding xxmaj nemo in reverse i.e the parents get taken away and xxmaj pi ( xxmaj nemo ) is left behind , if you have n't seen xxmaj finding xxmaj nemo you will like it . \n", " \n", " xxmaj it becomes very clear at the beginning of the film what the plot is and takes a long time to reach the end ! 
\n", " \n", " xxmaj there 's is nothing new or nail biting in this one and there is no humour at all , which is very disappointing . \n", " \n", " xxmaj all in all very disappointing as a follow up movie to xxmaj shark xxmaj tale which there really are no similarities with it should have been called xxmaj finding xxmaj nemo 2 xxrep 56 !\n", "y: LMLabelList\n", ",,,,\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Test: None, model=SequentialRNN(\n", " (0): AWD_LSTM(\n", " (encoder): Embedding(60000, 400, padding_idx=1)\n", " (encoder_dp): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (rnns): ModuleList(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (2): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " )\n", " (input_dp): RNNDropout()\n", " (hidden_dps): ModuleList(\n", " (0): RNNDropout()\n", " (1): RNNDropout()\n", " (2): RNNDropout()\n", " )\n", " )\n", " (1): LinearDecoder(\n", " (decoder): Linear(in_features=400, out_features=60000, bias=True)\n", " (output_dp): RNNDropout()\n", " )\n", "), opt_func=functools.partial(, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb'), model_dir='models', callback_fns=[functools.partial(, add_time=True, silent=False)], callbacks=[RNNTrainer\n", "learn: ...\n", "alpha: 2.0\n", "beta: 1.0, MixedPrecision\n", "learn: ...\n", "loss_scale: 65536\n", "max_noskip: 1000\n", "dynamic: True\n", "clip: None\n", "flat_master: False\n", "max_scale: 16777216\n", "loss_fp32: True], layer_groups=[Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 1152, 
batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): Embedding(60000, 400, padding_idx=1)\n", " (1): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (2): LinearDecoder(\n", " (decoder): Linear(in_features=400, out_features=60000, bias=True)\n", " (output_dp): RNNDropout()\n", " )\n", ")], add_time=True, silent=False)" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn_lm.to_fp16()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The first step in fine-tuning is to train only the last layer of the model. \n", "This takes about a half-hour on an NVIDIA RTX-2070 GPU" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
epoch  train_loss  valid_loss  accuracy  time
0      4.109810    3.432369    0.357143  27:48
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn_lm.fit_one_cycle(1, lr*10, moms=(0.8,0.7))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since this is relatively slow to train, we will save our weights:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "learn_lm.save('fit_1')" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LanguageLearner(data=TextLMDataBunch;\n", "\n", "Train: LabelList (90000 items)\n", "x: LMTextList\n", " xxwrep 18 xxbos xxmaj once again xxmaj mr. xxmaj costner has dragged out a movie for far longer than necessary . xxmaj aside from the terrific sea rescue sequences , of which there are very few i just did not care about any of the characters . xxmaj most of us have ghosts in the closet , and xxmaj costner 's character are realized early on , and then forgotten until much later , by which time i did not care . xxmaj the character we should really care about is a very cocky , overconfident xxmaj ashton xxmaj kutcher . xxmaj the problem is he comes off as kid who thinks he 's better than anyone else around him and shows no signs of a cluttered closet . xxmaj his only obstacle appears to be winning over xxmaj costner . xxmaj finally when we are well past the half way point of this stinker , xxmaj costner tells us all about xxmaj kutcher 's ghosts . xxmaj we are told why xxmaj kutcher is driven to be the best with no prior inkling or foreshadowing . xxmaj no magic here , it was all i could do to keep from turning it off an hour in ., xxwrep 18 xxbos xxmaj this is an example of why the majority of action films are the same . xxmaj generic and boring , there 's really nothing worth watching here . a complete waste of the then barely - tapped talents of xxmaj ice - t and xxmaj ice xxmaj cube , who 've each proven many times over that they are capable of acting , and acting well . 
xxmaj do n't bother with this one , go see xxmaj new xxmaj jack xxmaj city , xxmaj ricochet or watch xxmaj new xxmaj york xxmaj undercover for xxmaj ice - t , or xxmaj boyz n the xxmaj hood , xxmaj higher xxmaj learning or xxmaj friday for xxmaj ice xxmaj cube and see the real deal . xxmaj ice - t 's horribly cliched dialogue alone makes this film grate at the teeth , and i 'm still wondering what the heck xxmaj bill xxmaj paxton was doing in this film ? xxmaj and why the heck does he always play the exact same character ? xxmaj from xxmaj aliens onward , every film i 've seen with xxmaj bill xxmaj paxton has him playing the exact same irritating character , and at least in xxmaj aliens his character died , which made it somewhat gratifying ... \n", " \n", " xxmaj overall , this is second - rate action trash . xxmaj there are countless better films to see , and if you really want to see this one , watch xxmaj judgement xxmaj night , which is practically a carbon copy but has better acting and a better script . xxmaj the only thing that made this at all worth watching was a decent hand on the camera - the cinematography was almost refreshing , which comes close to making up for the horrible film itself - but not quite . 4 / 10 ., xxwrep 18 xxbos xxmaj first of all i hate those moronic rappers , who could'nt act if they had a gun pressed against their foreheads . xxmaj all they do is curse and shoot each other and acting like xxunk version of gangsters . \n", " \n", " xxmaj the movie does n't take more than five minutes to explain what is going on before we 're already at the warehouse xxmaj there is not a single sympathetic character in this movie , except for the homeless guy , who is also the only one with half a brain . \n", " \n", " xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are both hill billies and xxmaj xxunk character is just as much a villain as the gangsters . i did'nt like him right from the start . 
\n", " \n", " xxmaj the movie is filled with pointless violence and xxmaj walter xxmaj hills specialty : people falling through windows with glass flying everywhere . xxmaj there is pretty much no plot and it is a big problem when you root for no - one . xxmaj everybody dies , except from xxmaj paxton and the homeless guy and everybody get what they deserve . \n", " \n", " xxmaj the only two black people that can act is the homeless guy and the junkie but they 're actors by profession , not annoying ugly brain dead rappers . \n", " \n", " xxmaj stay away from this crap and watch 48 hours 1 and 2 instead . xxmaj at lest they have characters you care about , a sense of humor and nothing but real actors in the cast ., xxwrep 18 xxbos xxmaj not even the xxmaj beatles could write songs everyone liked , and although xxmaj walter xxmaj hill is no mop - top he 's second to none when it comes to thought provoking action movies . xxmaj the nineties came and social platforms were changing in music and film , the emergence of the xxmaj rapper turned movie star was in full swing , the acting took a back seat to each man 's overpowering regional accent and transparent acting . xxmaj this was one of the many ice - t movies i saw as a kid and loved , only to watch them later and cringe . xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are firemen with basic lives until a burning building tenant about to go up in flames hands over a map with gold implications . i hand it to xxmaj walter for quickly and neatly setting up the main characters and location . xxmaj but i fault everyone involved for turning out xxmaj lame - o performances . xxmaj ice - t and cube must have been red hot at this time , and while i 've enjoyed both their careers as rappers , in my opinion they fell flat in this movie . xxmaj it 's about ninety minutes of one guy ridiculously turning his back on the other guy to the point you find yourself locked in multiple states of disbelief . 
xxmaj now this is a movie , its not a documentary so i wo nt waste my time recounting all the stupid plot twists in this movie , but there were many , and they led nowhere . i got the feeling watching this that everyone on set was xxunk of confused and just playing things off the cuff . xxmaj there are two things i still enjoy about it , one involves a scene with a needle and the other is xxmaj sadler 's huge 45 pistol . xxmaj bottom line this movie is like domino 's pizza . xxmaj yeah ill eat it if i 'm hungry and i do n't feel like cooking , xxmaj but i 'm well aware it tastes like crap . 3 stars , meh ., xxwrep 18 xxbos xxmaj brass pictures ( movies is not a fitting word for them ) really are somewhat brassy . xxmaj their alluring visual qualities are reminiscent of expensive high class xxup tv commercials . xxmaj but unfortunately xxmaj brass pictures are feature films with the pretense of wanting to entertain viewers for over two hours ! xxmaj in this they fail miserably , their undeniable , but rather soft and flabby than steamy , erotic qualities non withstanding . \n", " \n", " xxmaj senso ' 45 is a remake of a film by xxmaj luchino xxmaj visconti with the same title and xxmaj alida xxmaj valli and xxmaj farley xxmaj granger in the lead . xxmaj the original tells a story of senseless love and lust in and around xxmaj venice during the xxmaj italian wars of independence . xxmaj brass moved the action from the 19th into the 20th century , 1945 to be exact , so there are xxmaj mussolini murals , men in black shirts , xxmaj german uniforms or the tattered garb of the partisans . xxmaj but it is just window dressing , the historic context is completely negligible . \n", " \n", " xxmaj anna xxmaj xxunk plays the attractive aristocratic woman who falls for the amoral xxup ss guy who always puts on too much lipstick . xxmaj she is an attractive , versatile , well trained xxmaj italian actress and clearly above the material . 
xxmaj her wide range of facial expressions ( xxunk boredom , loathing , delight , fear , hate ... and ecstasy ) are the best reason to watch this picture and worth two stars . xxmaj she endures this basically trashy stuff with an astonishing amount of dignity . i wish some really good parts come along for her . xxmaj she really deserves it .\n", "y: LMLabelList\n", ",,,,\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Valid: LabelList (10000 items)\n", "x: LMTextList\n", "xxbos xxup cavite is an example of ultimate independent film , with a very short budget , a very simple concept , an exotic locale , a minimal cast , and a hand - held camera . \n", " \n", " xxmaj the story is simple : xxmaj adam ( xxmaj ian xxmaj gamazon ) is called home to the xxmaj phillipines because of a family crisis . xxmaj instead of his family picking him up , he finds himself forced to follow instructions of a man claiming to have his family . xxmaj there 's no clear reason for the abduction , or what makes xxmaj adam a target ; all xxmaj adam really knows is that his every move is watched , and the kidnappers have no regard for their victims . \n", " \n", " xxmaj as xxmaj adam follows the obscure instructions , and the obstacles in his way , the audience ca n't help but be caught up in his plight . xxmaj the hand held camera and jumpy editing style enhances the sense of desperation and time . xxmaj the scenes of urban xxmaj phillipines , particular the markets and the squatter holdings are a vivid cacophony . \n", " \n", " xxmaj co - directors and xxmaj co - writers xxmaj neill xxmaj xxunk xxmaj xxunk and xxmaj ian xxmaj gamazon have done an outstanding job of making the most out of limited means . xxmaj the economy of the film makes it both intimate and discomfiting , as xxmaj adam is an everyman who only wants his family safe and instead is completely at the whim of an omniscient tormentor . 
\n", " \n", " xxup cavite is an absolute must for anyone who has an interest in film , as storytelling , in it 's structure , and as an art form .,xxbos xxmaj decades ago , a crate filled with weapons grade plutonium crashes on an island and soaks into the ground . xxmaj today , a team of military men are sent to track down a notorious terrorist ( of ambiguous national origin ) and they track him to this polluted island . xxmaj when their raft is destroyed , the team must spend the night on shore , but soon discover that the plutonium has done something awful to the island -- it has called forth hundreds of bloodthirsty velociraptors . \n", " \n", " xxmaj let me start this with a lesson : do n't lend a movie to your friend before you 've seen it , especially if you are supposed to be reviewing it for the internet 's finest horror movie site . xxmaj it took me almost a year to get this film back , and the person who borrowed it still had not watched it ( though we ended up seeing it together ) . xxmaj and a second , more important , lesson : when you do watch this , keep your expectations as low as humanly possible . xxmaj because this film ranks among the worst i 've ever seen . \n", " \n", " xxmaj my acting in 8th grade was more convincing than the seasoned actors who appear in this film ( xxmaj lorenzo xxmaj lamas , xxmaj stephen xxmaj bauer ) . xxmaj line delivery is very fake , and the words themselves are poorly scripted . xxmaj the opening words come from a man checking out his gun 's scope : \" xxmaj boom . xxmaj dead bad guys . \" xxmaj yes , that 's pure genius at work . xxmaj the only conversation with any depth has two main characters explaining their histories . xxmaj but it , too , seems unnatural and a poor attempt to provide character background and to fill time . xxmaj we did n't need to know anything about their histories , so why bore us with it ? xxmaj and if you think the conversations are bad , you ai n't seen nothing yet . 
\n", " \n", " xxmaj the lighting is atrocious . i generally do n't notice lighting , but my friend ( a former film school student ) was practically vomiting in rage at the way more often than not shadows fell on the actors ' faces and the light would be in the background , focused on nothing in particular . xxmaj most lighting looks like a spotlight in a dim room , and many of the scenes involve a deep , subterranean cavern -- which you 'd then expect to be poorly lit , but had lights coming from all sorts of random angles . xxmaj do n't ask me why . \n", " \n", " xxmaj the plot was pretty bad . xxmaj some films can take the idea of military men chasing a terrorist and make a convincing film out of it . xxmaj cat and mouse stories are riveting . xxmaj well , not here . xxmaj the terrorist is really not even part of the story , just an excuse to go to the island . xxmaj and the raptors ? xxmaj and the allosaurus ? xxmaj sure , they came from the plutonium that soaked into the ground . xxmaj but if that makes sense to you , please explain it to me because i have no clue how radiation brings dinosaurs back from millions of years of extinction . \n", " \n", " xxmaj by far the worst part of \" xxmaj raptor xxmaj island \" is the animation of the raptors . xxmaj that 's right -- the selling point of the film is the worst aspect . xxmaj the animation is n't just bad , it 's subpar . i ca n't even express the hilarity of cartoons this cheesy . xxmaj and when they get shot ? xxmaj red splats like one would see in an old video game . xxmaj even the airplane , helicopter and xxmaj navy ship are cartoons ... how hard is it to get a model plane ? xxmaj please do n't see \" xxmaj raptor xxmaj island \" unless you need a good laugh or want to get sickeningly drunk . xxmaj sure , you probably want to see it before you see \" xxmaj raptor xxmaj island 2 \" ( which seems to be getting better reviews ) . xxmaj but just avoiding it entirely is your best bet . 
xxmaj the closest thing i can compare it to is \" xxmaj pinata : xxmaj survival xxmaj island \" , and unfortunately this film makes the pinata look good by comparison . xxmaj you have been warned .,xxbos xxmaj this is one creepy underrated xxmaj gem with chilling performances and a fantastic xxunk xxmaj all the characters are great , and the story was awesome , plus i thought the ending was really xxunk xxmaj the plot was great , and it never bored me , plus while the child actors were bad , they gave me the xxunk xxmaj this happened to be on the space channel a while ago , so i decided to check it out and tape it , i read some good reviews from fellow horror fans , i must say i agree with them , it 's very creepy , and suspenseful , plus xxmaj strother xxmaj martin , was fantastic in his role , as the xxmaj satan worshiper . xxmaj it has tons of creepy atmosphere , and it keeps you guessing throughout , plus all the characters were very likable , and you really start to root for xxmaj ben and his xxunk xxmaj it has plenty of disturbing moments , and the film really shocked me at times , plus , it 's extremely well made on a low xxunk xxmaj this is one creepy underrated gem , with chilling performances and a fantastic finale ! , i highly recommend this one!. xxmaj the xxmaj direction is very good!. xxmaj bernard mceveety does a very good job here , with great camera work , creating a lot of creepy atmosphere , and keeping the film at a very fast pace!. xxmaj ther is a little bit of blood and gore . xxmaj we get a severed leg , lots of bloody corpses , bloody slit throat , slicing and dicing , decapitation , and an impaling . xxmaj the xxmaj acting is xxunk xxmaj strother xxmaj martin is fantastic here ! 
as the xxmaj satan worshiper , he is extremely creepy , very convincing , was quite chilling , was extremely intense , seemed to be enjoying himself , and just did a fantastic job xxunk xxmaj charles xxmaj bateman is great as the xxmaj dad , he was very caring , very likable , and gave a good show ! , i liked him lots . xxup l.q. xxmaj jones is awesome as the xxmaj sheriff , he was funny , on top of things , looked very young , had a cool character , and just did an awesome job xxunk xxmaj ahna xxmaj capri is good as the girlfriend and did what she had to do pretty well . xxmaj charles xxmaj robinson overacted to the extreme as the xxmaj priest and did n't convince me one bit ! , and that laugh of his was especially bad . xxmaj geri xxmaj reischl is actually decent as the daughter , she was somewhat likable , and only got on my nerves a couple times , i rather liked her . xxmaj alvy xxmaj moore was goofy , but very likable in his role as xxmaj tobey i dug him!. xxmaj rest of the cast do good . xxmaj overall i highly recommend it!. * * * 1 / 2 out of 5,xxbos xxmaj aside from the gross factual inaccuracies of this movie ( ie king xxmaj richard having a son and that son replacing xxmaj john on the throne ) this movie was a sweet love tale full of adventure . i 'm sure that this movie will appeal to the younger generation of girls and boys .,xxbos xxmaj the special effects again are superb but this is xxmaj finding xxmaj nemo in reverse i.e the parents get taken away and xxmaj pi ( xxmaj nemo ) is left behind , if you have n't seen xxmaj finding xxmaj nemo you will like it . \n", " \n", " xxmaj it becomes very clear at the beginning of the film what the plot is and takes a long time to reach the end ! \n", " \n", " xxmaj there 's is nothing new or nail biting in this one and there is no humour at all , which is very disappointing . 
\n", " \n", " xxmaj all in all very disappointing as a follow up movie to xxmaj shark xxmaj tale which there really are no similarities with it should have been called xxmaj finding xxmaj nemo 2 xxrep 56 !\n", "y: LMLabelList\n", ",,,,\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Test: None, model=SequentialRNN(\n", " (0): AWD_LSTM(\n", " (encoder): Embedding(60000, 400, padding_idx=1)\n", " (encoder_dp): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (rnns): ModuleList(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (2): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " )\n", " (input_dp): RNNDropout()\n", " (hidden_dps): ModuleList(\n", " (0): RNNDropout()\n", " (1): RNNDropout()\n", " (2): RNNDropout()\n", " )\n", " )\n", " (1): LinearDecoder(\n", " (decoder): Linear(in_features=400, out_features=60000, bias=True)\n", " (output_dp): RNNDropout()\n", " )\n", "), opt_func=functools.partial(, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb'), model_dir='models', callback_fns=[functools.partial(, add_time=True, silent=False)], callbacks=[RNNTrainer\n", "learn: ...\n", "alpha: 2.0\n", "beta: 1.0, MixedPrecision\n", "learn: ...\n", "loss_scale: 2097152.0\n", "max_noskip: 1000\n", "dynamic: True\n", "clip: None\n", "flat_master: False\n", "max_scale: 16777216\n", "loss_fp32: True], layer_groups=[Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 400, 
batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): Embedding(60000, 400, padding_idx=1)\n", " (1): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (2): LinearDecoder(\n", " (decoder): Linear(in_features=400, out_features=60000, bias=True)\n", " (output_dp): RNNDropout()\n", " )\n", ")], add_time=True, silent=False)" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn_lm.load('fit_1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### To complete the fine-tuning, we unfreeze all the weights and retrain\n", "Adopting the `wikitext-103` weights as initial values, our neural network will adjust them via optimization, finding new values that are specialized to the \"language\" of `IMDb` movie reviews." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "learn_lm.unfreeze()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fine-tuning the model takes ~30 minutes per epoch on an NVIDIA RTX 2070 GPU with `bs=48`.
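
The loss reported during fine-tuning is a cross-entropy (the learner shown above uses `CrossEntropyLoss`), so it can be read as a perplexity by exponentiating it. Here is a minimal sketch, assuming the reported loss is the mean natural-log cross-entropy per token; the helper name `perplexity` is hypothetical and is not part of fastai:

```python
import math

def perplexity(cross_entropy_loss):
    # Perplexity is exp(mean cross-entropy), assuming the loss is measured in nats.
    return math.exp(cross_entropy_loss)

# Validation losses after the first and the last epoch of the training run in this section.
for epoch, valid_loss in [(0, 3.145828), (9, 2.994531)]:
    ppl = perplexity(valid_loss)
    print(f'epoch {epoch}: valid_loss={valid_loss:.3f}, perplexity={ppl:.1f}')
```

Under that assumption, the final validation loss corresponds to a perplexity of roughly 20, which is about as uncertain as choosing uniformly among 20 candidate tokens at each step.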
\n", "Note the relatively low value of accuracy, which did not improve significantly beyond `epoch 4`." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
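<table>\n",
"<tr><th>epoch</th><th>train_loss</th><th>valid_loss</th><th>accuracy</th><th>time</th></tr>\n",
"<tr><td>0</td><td>3.824557</td><td>3.145828</td><td>0.428571</td><td>30:49</td></tr>\n",
"<tr><td>1</td><td>3.785742</td><td>3.277813</td><td>0.442857</td><td>30:39</td></tr>\n",
"<tr><td>2</td><td>3.753288</td><td>3.111610</td><td>0.400000</td><td>30:37</td></tr>\n",
"<tr><td>3</td><td>3.676771</td><td>3.163547</td><td>0.357143</td><td>30:31</td></tr>\n",
"<tr><td>4</td><td>3.605899</td><td>3.077933</td><td>0.457143</td><td>30:21</td></tr>\n",
"<tr><td>5</td><td>3.567794</td><td>3.045043</td><td>0.457143</td><td>30:18</td></tr>\n",
"<tr><td>6</td><td>3.490480</td><td>3.072821</td><td>0.414286</td><td>30:27</td></tr>\n",
"<tr><td>7</td><td>3.434875</td><td>3.052906</td><td>0.400000</td><td>30:18</td></tr>\n",
"<tr><td>8</td><td>3.370924</td><td>2.990984</td><td>0.442857</td><td>30:10</td></tr>\n",
"<tr><td>9</td><td>3.336241</td><td>2.994531</td><td>0.457143</td><td>30:21</td></tr>\n",
"</table>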
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn_lm.fit_one_cycle(10, lr, moms=(0.8,0.7))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save the fine-tuned **language model** and the **encoder**\n", "We have to save not just the `fine-tuned` **IMDb language model** but also its **encoder**. The **language model** is the part that tries to guess the next word. The **encoder** is the part that's responsible for creating and updating the hidden state. \n", "\n", "In the next part we will build a **sentiment classifier** for the IMDb movie reviews. To do this we will need the **encoder** from the **IMDb language model** that we built." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "learn_lm.save('fine_tuned')" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "learn_lm.save_encoder('fine_tuned_enc')" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### Load the saved **model** and its **encoder**" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "LanguageLearner(data=TextLMDataBunch;\n", "\n", "Train: LabelList (90000 items)\n", "x: LMTextList\n", " xxwrep 18 xxbos xxmaj once again xxmaj mr. xxmaj costner has dragged out a movie for far longer than necessary . xxmaj aside from the terrific sea rescue sequences , of which there are very few i just did not care about any of the characters . xxmaj most of us have ghosts in the closet , and xxmaj costner 's character are realized early on , and then forgotten until much later , by which time i did not care . xxmaj the character we should really care about is a very cocky , overconfident xxmaj ashton xxmaj kutcher . xxmaj the problem is he comes off as kid who thinks he 's better than anyone else around him and shows no signs of a cluttered closet . 
xxmaj his only obstacle appears to be winning over xxmaj costner . xxmaj finally when we are well past the half way point of this stinker , xxmaj costner tells us all about xxmaj kutcher 's ghosts . xxmaj we are told why xxmaj kutcher is driven to be the best with no prior inkling or foreshadowing . xxmaj no magic here , it was all i could do to keep from turning it off an hour in ., xxwrep 18 xxbos xxmaj this is an example of why the majority of action films are the same . xxmaj generic and boring , there 's really nothing worth watching here . a complete waste of the then barely - tapped talents of xxmaj ice - t and xxmaj ice xxmaj cube , who 've each proven many times over that they are capable of acting , and acting well . xxmaj do n't bother with this one , go see xxmaj new xxmaj jack xxmaj city , xxmaj ricochet or watch xxmaj new xxmaj york xxmaj undercover for xxmaj ice - t , or xxmaj boyz n the xxmaj hood , xxmaj higher xxmaj learning or xxmaj friday for xxmaj ice xxmaj cube and see the real deal . xxmaj ice - t 's horribly cliched dialogue alone makes this film grate at the teeth , and i 'm still wondering what the heck xxmaj bill xxmaj paxton was doing in this film ? xxmaj and why the heck does he always play the exact same character ? xxmaj from xxmaj aliens onward , every film i 've seen with xxmaj bill xxmaj paxton has him playing the exact same irritating character , and at least in xxmaj aliens his character died , which made it somewhat gratifying ... \n", " \n", " xxmaj overall , this is second - rate action trash . xxmaj there are countless better films to see , and if you really want to see this one , watch xxmaj judgement xxmaj night , which is practically a carbon copy but has better acting and a better script . xxmaj the only thing that made this at all worth watching was a decent hand on the camera - the cinematography was almost refreshing , which comes close to making up for the horrible film itself - but not quite . 
4 / 10 ., xxwrep 18 xxbos xxmaj first of all i hate those moronic rappers , who could'nt act if they had a gun pressed against their foreheads . xxmaj all they do is curse and shoot each other and acting like xxunk version of gangsters . \n", " \n", " xxmaj the movie does n't take more than five minutes to explain what is going on before we 're already at the warehouse xxmaj there is not a single sympathetic character in this movie , except for the homeless guy , who is also the only one with half a brain . \n", " \n", " xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are both hill billies and xxmaj xxunk character is just as much a villain as the gangsters . i did'nt like him right from the start . \n", " \n", " xxmaj the movie is filled with pointless violence and xxmaj walter xxmaj hills specialty : people falling through windows with glass flying everywhere . xxmaj there is pretty much no plot and it is a big problem when you root for no - one . xxmaj everybody dies , except from xxmaj paxton and the homeless guy and everybody get what they deserve . \n", " \n", " xxmaj the only two black people that can act is the homeless guy and the junkie but they 're actors by profession , not annoying ugly brain dead rappers . \n", " \n", " xxmaj stay away from this crap and watch 48 hours 1 and 2 instead . xxmaj at lest they have characters you care about , a sense of humor and nothing but real actors in the cast ., xxwrep 18 xxbos xxmaj not even the xxmaj beatles could write songs everyone liked , and although xxmaj walter xxmaj hill is no mop - top he 's second to none when it comes to thought provoking action movies . xxmaj the nineties came and social platforms were changing in music and film , the emergence of the xxmaj rapper turned movie star was in full swing , the acting took a back seat to each man 's overpowering regional accent and transparent acting . 
xxmaj this was one of the many ice - t movies i saw as a kid and loved , only to watch them later and cringe . xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are firemen with basic lives until a burning building tenant about to go up in flames hands over a map with gold implications . i hand it to xxmaj walter for quickly and neatly setting up the main characters and location . xxmaj but i fault everyone involved for turning out xxmaj lame - o performances . xxmaj ice - t and cube must have been red hot at this time , and while i 've enjoyed both their careers as rappers , in my opinion they fell flat in this movie . xxmaj it 's about ninety minutes of one guy ridiculously turning his back on the other guy to the point you find yourself locked in multiple states of disbelief . xxmaj now this is a movie , its not a documentary so i wo nt waste my time recounting all the stupid plot twists in this movie , but there were many , and they led nowhere . i got the feeling watching this that everyone on set was xxunk of confused and just playing things off the cuff . xxmaj there are two things i still enjoy about it , one involves a scene with a needle and the other is xxmaj sadler 's huge 45 pistol . xxmaj bottom line this movie is like domino 's pizza . xxmaj yeah ill eat it if i 'm hungry and i do n't feel like cooking , xxmaj but i 'm well aware it tastes like crap . 3 stars , meh ., xxwrep 18 xxbos xxmaj brass pictures ( movies is not a fitting word for them ) really are somewhat brassy . xxmaj their alluring visual qualities are reminiscent of expensive high class xxup tv commercials . xxmaj but unfortunately xxmaj brass pictures are feature films with the pretense of wanting to entertain viewers for over two hours ! xxmaj in this they fail miserably , their undeniable , but rather soft and flabby than steamy , erotic qualities non withstanding . 
\n", " \n", " xxmaj senso ' 45 is a remake of a film by xxmaj luchino xxmaj visconti with the same title and xxmaj alida xxmaj valli and xxmaj farley xxmaj granger in the lead . xxmaj the original tells a story of senseless love and lust in and around xxmaj venice during the xxmaj italian wars of independence . xxmaj brass moved the action from the 19th into the 20th century , 1945 to be exact , so there are xxmaj mussolini murals , men in black shirts , xxmaj german uniforms or the tattered garb of the partisans . xxmaj but it is just window dressing , the historic context is completely negligible . \n", " \n", " xxmaj anna xxmaj xxunk plays the attractive aristocratic woman who falls for the amoral xxup ss guy who always puts on too much lipstick . xxmaj she is an attractive , versatile , well trained xxmaj italian actress and clearly above the material . xxmaj her wide range of facial expressions ( xxunk boredom , loathing , delight , fear , hate ... and ecstasy ) are the best reason to watch this picture and worth two stars . xxmaj she endures this basically trashy stuff with an astonishing amount of dignity . i wish some really good parts come along for her . xxmaj she really deserves it .\n", "y: LMLabelList\n", ",,,,\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Valid: LabelList (10000 items)\n", "x: LMTextList\n", "xxbos xxup cavite is an example of ultimate independent film , with a very short budget , a very simple concept , an exotic locale , a minimal cast , and a hand - held camera . \n", " \n", " xxmaj the story is simple : xxmaj adam ( xxmaj ian xxmaj gamazon ) is called home to the xxmaj phillipines because of a family crisis . xxmaj instead of his family picking him up , he finds himself forced to follow instructions of a man claiming to have his family . 
xxmaj there 's no clear reason for the abduction , or what makes xxmaj adam a target ; all xxmaj adam really knows is that his every move is watched , and the kidnappers have no regard for their victims . \n", " \n", " xxmaj as xxmaj adam follows the obscure instructions , and the obstacles in his way , the audience ca n't help but be caught up in his plight . xxmaj the hand held camera and jumpy editing style enhances the sense of desperation and time . xxmaj the scenes of urban xxmaj phillipines , particular the markets and the squatter holdings are a vivid cacophony . \n", " \n", " xxmaj co - directors and xxmaj co - writers xxmaj neill xxmaj xxunk xxmaj xxunk and xxmaj ian xxmaj gamazon have done an outstanding job of making the most out of limited means . xxmaj the economy of the film makes it both intimate and discomfiting , as xxmaj adam is an everyman who only wants his family safe and instead is completely at the whim of an omniscient tormentor . \n", " \n", " xxup cavite is an absolute must for anyone who has an interest in film , as storytelling , in it 's structure , and as an art form .,xxbos xxmaj decades ago , a crate filled with weapons grade plutonium crashes on an island and soaks into the ground . xxmaj today , a team of military men are sent to track down a notorious terrorist ( of ambiguous national origin ) and they track him to this polluted island . xxmaj when their raft is destroyed , the team must spend the night on shore , but soon discover that the plutonium has done something awful to the island -- it has called forth hundreds of bloodthirsty velociraptors . \n", " \n", " xxmaj let me start this with a lesson : do n't lend a movie to your friend before you 've seen it , especially if you are supposed to be reviewing it for the internet 's finest horror movie site . xxmaj it took me almost a year to get this film back , and the person who borrowed it still had not watched it ( though we ended up seeing it together ) . 
xxmaj and a second , more important , lesson : when you do watch this , keep your expectations as low as humanly possible . xxmaj because this film ranks among the worst i 've ever seen . \n", " \n", " xxmaj my acting in 8th grade was more convincing than the seasoned actors who appear in this film ( xxmaj lorenzo xxmaj lamas , xxmaj stephen xxmaj bauer ) . xxmaj line delivery is very fake , and the words themselves are poorly scripted . xxmaj the opening words come from a man checking out his gun 's scope : \" xxmaj boom . xxmaj dead bad guys . \" xxmaj yes , that 's pure genius at work . xxmaj the only conversation with any depth has two main characters explaining their histories . xxmaj but it , too , seems unnatural and a poor attempt to provide character background and to fill time . xxmaj we did n't need to know anything about their histories , so why bore us with it ? xxmaj and if you think the conversations are bad , you ai n't seen nothing yet . \n", " \n", " xxmaj the lighting is atrocious . i generally do n't notice lighting , but my friend ( a former film school student ) was practically vomiting in rage at the way more often than not shadows fell on the actors ' faces and the light would be in the background , focused on nothing in particular . xxmaj most lighting looks like a spotlight in a dim room , and many of the scenes involve a deep , subterranean cavern -- which you 'd then expect to be poorly lit , but had lights coming from all sorts of random angles . xxmaj do n't ask me why . \n", " \n", " xxmaj the plot was pretty bad . xxmaj some films can take the idea of military men chasing a terrorist and make a convincing film out of it . xxmaj cat and mouse stories are riveting . xxmaj well , not here . xxmaj the terrorist is really not even part of the story , just an excuse to go to the island . xxmaj and the raptors ? xxmaj and the allosaurus ? xxmaj sure , they came from the plutonium that soaked into the ground . 
xxmaj but if that makes sense to you , please explain it to me because i have no clue how radiation brings dinosaurs back from millions of years of extinction . \n", " \n", " xxmaj by far the worst part of \" xxmaj raptor xxmaj island \" is the animation of the raptors . xxmaj that 's right -- the selling point of the film is the worst aspect . xxmaj the animation is n't just bad , it 's subpar . i ca n't even express the hilarity of cartoons this cheesy . xxmaj and when they get shot ? xxmaj red splats like one would see in an old video game . xxmaj even the airplane , helicopter and xxmaj navy ship are cartoons ... how hard is it to get a model plane ? xxmaj please do n't see \" xxmaj raptor xxmaj island \" unless you need a good laugh or want to get sickeningly drunk . xxmaj sure , you probably want to see it before you see \" xxmaj raptor xxmaj island 2 \" ( which seems to be getting better reviews ) . xxmaj but just avoiding it entirely is your best bet . xxmaj the closest thing i can compare it to is \" xxmaj pinata : xxmaj survival xxmaj island \" , and unfortunately this film makes the pinata look good by comparison . xxmaj you have been warned .,xxbos xxmaj this is one creepy underrated xxmaj gem with chilling performances and a fantastic xxunk xxmaj all the characters are great , and the story was awesome , plus i thought the ending was really xxunk xxmaj the plot was great , and it never bored me , plus while the child actors were bad , they gave me the xxunk xxmaj this happened to be on the space channel a while ago , so i decided to check it out and tape it , i read some good reviews from fellow horror fans , i must say i agree with them , it 's very creepy , and suspenseful , plus xxmaj strother xxmaj martin , was fantastic in his role , as the xxmaj satan worshiper . 
xxmaj it has tons of creepy atmosphere , and it keeps you guessing throughout , plus all the characters were very likable , and you really start to root for xxmaj ben and his xxunk xxmaj it has plenty of disturbing moments , and the film really shocked me at times , plus , it 's extremely well made on a low xxunk xxmaj this is one creepy underrated gem , with chilling performances and a fantastic finale ! , i highly recommend this one!. xxmaj the xxmaj direction is very good!. xxmaj bernard mceveety does a very good job here , with great camera work , creating a lot of creepy atmosphere , and keeping the film at a very fast pace!. xxmaj ther is a little bit of blood and gore . xxmaj we get a severed leg , lots of bloody corpses , bloody slit throat , slicing and dicing , decapitation , and an impaling . xxmaj the xxmaj acting is xxunk xxmaj strother xxmaj martin is fantastic here ! as the xxmaj satan worshiper , he is extremely creepy , very convincing , was quite chilling , was extremely intense , seemed to be enjoying himself , and just did a fantastic job xxunk xxmaj charles xxmaj bateman is great as the xxmaj dad , he was very caring , very likable , and gave a good show ! , i liked him lots . xxup l.q. xxmaj jones is awesome as the xxmaj sheriff , he was funny , on top of things , looked very young , had a cool character , and just did an awesome job xxunk xxmaj ahna xxmaj capri is good as the girlfriend and did what she had to do pretty well . xxmaj charles xxmaj robinson overacted to the extreme as the xxmaj priest and did n't convince me one bit ! , and that laugh of his was especially bad . xxmaj geri xxmaj reischl is actually decent as the daughter , she was somewhat likable , and only got on my nerves a couple times , i rather liked her . xxmaj alvy xxmaj moore was goofy , but very likable in his role as xxmaj tobey i dug him!. xxmaj rest of the cast do good . xxmaj overall i highly recommend it!. 
* * * 1 / 2 out of 5,xxbos xxmaj aside from the gross factual inaccuracies of this movie ( ie king xxmaj richard having a son and that son replacing xxmaj john on the throne ) this movie was a sweet love tale full of adventure . i 'm sure that this movie will appeal to the younger generation of girls and boys .,xxbos xxmaj the special effects again are superb but this is xxmaj finding xxmaj nemo in reverse i.e the parents get taken away and xxmaj pi ( xxmaj nemo ) is left behind , if you have n't seen xxmaj finding xxmaj nemo you will like it . \n", " \n", " xxmaj it becomes very clear at the beginning of the film what the plot is and takes a long time to reach the end ! \n", " \n", " xxmaj there 's is nothing new or nail biting in this one and there is no humour at all , which is very disappointing . \n", " \n", " xxmaj all in all very disappointing as a follow up movie to xxmaj shark xxmaj tale which there really are no similarities with it should have been called xxmaj finding xxmaj nemo 2 xxrep 56 !\n", "y: LMLabelList\n", ",,,,\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Test: None, model=SequentialRNN(\n", " (0): AWD_LSTM(\n", " (encoder): Embedding(60000, 400, padding_idx=1)\n", " (encoder_dp): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (rnns): ModuleList(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (2): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " )\n", " (input_dp): RNNDropout()\n", " (hidden_dps): ModuleList(\n", " (0): RNNDropout()\n", " (1): RNNDropout()\n", " (2): RNNDropout()\n", " )\n", " )\n", " (1): LinearDecoder(\n", " (decoder): Linear(in_features=400, out_features=60000, bias=True)\n", " (output_dp): RNNDropout()\n", " )\n", "), opt_func=functools.partial(, betas=(0.9, 0.99)), loss_func=FlattenedLoss of 
CrossEntropyLoss(), metrics=[], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb'), model_dir='models', callback_fns=[functools.partial(, add_time=True, silent=False)], callbacks=[RNNTrainer\n", "learn: ...\n", "alpha: 2.0\n", "beta: 1.0, MixedPrecision\n", "learn: ...\n", "loss_scale: 2097152.0\n", "max_noskip: 1000\n", "dynamic: True\n", "clip: None\n", "flat_master: False\n", "max_scale: 16777216\n", "loss_fp32: True], layer_groups=[Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): Embedding(60000, 400, padding_idx=1)\n", " (1): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (2): LinearDecoder(\n", " (decoder): Linear(in_features=400, out_features=60000, bias=True)\n", " (output_dp): RNNDropout()\n", " )\n", ")], add_time=True, silent=False)" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn_lm.load('fine_tuned')" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Now that we've trained our model, different representations have been learned for the words that were in `IMDb` but not `wikitext-103` (remember that at the beginning we had initialized them all to the same thing):" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "hidden": true }, "outputs": [], "source": [ "enc = learn_lm.model[0].encoder" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"np.allclose(enc.weight[vocab.stoi[\"30-something\"], :], \n", " enc.weight[vocab.stoi[\"linklater\"], :])" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(enc.weight[vocab.stoi[\"30-something\"], :], new_word_vec)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### Generate movie review-like text, with the **fine-tuned** ` IMDb` language model\n", "Compare these texts to the ones generated with the **untrained** `IMDb model` in part **2A**. Do they seem qualitatively better?" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "How good is our fine-tuned IMDb language model? Well let's try to see what it predicts when given a phrase that might appear in a movie review." ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "hidden": true }, "outputs": [], "source": [ "TEXT = \"i liked this movie because\"\n", "N_WORDS = 40\n", "N_SENTENCES = 2" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "i liked this movie because it gave a new perspective on the subject . The story of how to kill a dog was actually a good concept that i found to be very funny . 
The whole movie is shot with a sense\n", "i liked this movie because it was fun to see Fred Macmurray - a good actor - and this movie had a good cast like Gene Hackman , Gene Hackman , Catherine o'hara , William Hurt\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "hidden": true }, "outputs": [], "source": [ "TEXT = \"This movie was\"\n", "N_WORDS = 30\n", "N_SENTENCES = 2" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This movie was just awful . i mean , they had a lot of money , i mean , they wanted to make money with a $ 35 million budget . And\n", "This movie was very pleasant to watch , but it was just too dull . It took us a few minutes to catch the Mexican guy , but he seemed to\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "hidden": true }, "outputs": [], "source": [ "TEXT = \"I hated this movie\"\n", "N_WORDS = 40\n", "N_SENTENCES = 2" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I hated this movie , if only it ever made it to DVD . If you are planning to see this : 1 ) i would recommend that you buy this movie instead of the movie . 2 . It is\n", "I hated this movie . It was a little too distant from the book , but the acting was good and the at least some scenes were funny . 
The story line was pretty good , but it was n't the best\n" ] } ], "source": [ "print(\"\\n\".join(learn_lm.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### Risks of language models" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We will talk about ethical concerns raised by very accurate language models in lesson 7, but here are a few brief notes:\n", "\n", "In reference to [OpenAI's GPT-2](https://www.theverge.com/2019/2/14/18224704/ai-machine-learning-language-models-read-write-openai-gpt2), Jeremy Howard said: *I’ve been trying to warn people about this for a while. We have the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter.*\n", "\n", "For a small example, consider when completely incorrect (but reasonable-sounding) ML-generated answers were [posted to StackOverflow](https://meta.stackoverflow.com/questions/384596/completely-incorrect-machine-learning-generated-answers?stw=2)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Building an `IMDb Sentiment Classifier`\n", "#### We'll now use **transfer learning** to create a `classifier`, again starting from the pretrained weights of the `wikitext-103` language model. We'll also need the `IMDb language model` **encoder** that we saved previously. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3A. Load and preprocess the data, and form a `databunch`\n", "Using fastai's flexible API, we will now create a different kind of `databunch` object, one that is suitable for a **classifier** rather than for a **language model** (as we did in **2A**). 
This time we'll keep the labels for the `IMDb` movie reviews data. \n", "\n", "We wrap the preprocessing in a `try`/`except` retry loop as a workaround for a bug in the fastai text API.\n", "\n", "Here the batch size is decreased from 48 to 8, to avoid a `CUDA out of memory error`; your hardware may be able to handle a larger batch, in which case training will likely be faster.\n", "\n", "Again, this takes a bit of time." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "bs=8" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "failure count is 5\n", "\n", "Wall time: 9min 15s\n" ] } ], "source": [ "%%time\n", "\n", "# Sometimes throws a `BrokenProcessPool` error. Keep trying until it works!\n", "# The progress bar has to complete three consecutive steps. Why three?\n", "# This can fail many times (nearly 100 in some runs). With a bare `except:` the loop\n", "# also swallowed KeyboardInterrupt, so we catch `Exception` to keep Ctrl-C working.\n", "count = 0\n", "error = True\n", "while error:\n", "    try:\n", "        # Preprocessing steps\n", "        data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)\n", "                     # grab all the text files in path\n", "                     .split_by_folder(valid='test')\n", "                     # split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)\n", "                     .label_from_folder(classes=['neg', 'pos']))\n", "                     # label them all with their folders\n", "                     # .databunch(bs=bs, num_workers=1))\n", "        error = False\n", "        print(f'failure count is {count}\\n')\n", "\n", "    except Exception:  # catch ordinary errors, but let KeyboardInterrupt through\n", "        # accumulate failure count\n", "        count = count + 1\n", "        print(f'failure count is {count}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Form the preprocessed data into a `databunch`" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "data_clas = data_clas.databunch(bs=bs, num_workers=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save the databunch (since it took so long 
to make) and load it" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "data_clas.save('imdb_textlist_class')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "data_clas = load_data(path, 'imdb_textlist_class', bs=bs, num_workers=1)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table><tr><th>text</th><th>target</th></tr>
<tr><td>xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules</td><td>pos</td></tr>
<tr><td>xxbos xxmaj some have praised _ xxunk _ as a xxmaj disney adventure for adults . i do n't think so -- at least not for thinking adults . \\n \\n xxmaj this script suggests a beginning as a live - action movie , that struck someone as the type of crap you can not sell to adults anymore . xxmaj the \" crack staff \" of many older</td><td>neg</td></tr>
<tr><td>xxbos xxmaj this movie was recently released on xxup dvd in the xxup us and i finally got the chance to see this hard - to - find gem . xxmaj it even came with original theatrical previews of other xxmaj italian horror classics like \" xxup xxunk \" and \" xxup beyond xxup the xxup darkness \" . xxmaj unfortunately , the previews were the best thing about this</td><td>neg</td></tr>
<tr><td>xxbos xxmaj there are many adaptations of xxmaj charlotte xxmaj brontë 's classic novel \" xxmaj jane xxmaj eyre \" , and taking into consideration the numerous reviews written about them there is also a lively discussion on which of them is the best . xxmaj the short film adaptations all suffer from the fact that it is simply not possible to cram the whole plot of the novel into</td><td>pos</td></tr>
<tr><td>xxbos xxmaj title : xxmaj zombie 3 ( 1988 ) \\n \\n xxmaj directors : xxmaj mostly xxmaj lucio xxmaj fulci , but also xxmaj claudio xxmaj fragasso and xxmaj bruno xxmaj mattei \\n \\n xxmaj cast : xxmaj xxunk xxunk , xxmaj massimo xxmaj xxunk , xxmaj beatrice xxmaj ring , xxmaj deran xxmaj xxunk \\n \\n xxmaj review : \\n \\n xxmaj to review</td><td>neg</td></tr></table>
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data_clas.show_batch()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3B. Create a model to **classify** the `IMDb` reviews, and load the **encoder** we saved before.\n", "#### Freeze the weights for all but the last layer and find a good value for the learning rate. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "learn_c = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.3).to_fp16()\n", "learn_c.load_encoder('fine_tuned_enc')\n", "learn_c.freeze()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.\n" ] } ], "source": [ "learn_c.lr_find()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "learn_c.recorder.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3C. Training and fine-tuning the `IMDb sentiment classifier`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train for one cycle, save intermediate result" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
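<table><tr><th>epoch</th><th>train_loss</th><th>valid_loss</th><th>accuracy</th><th>time</th></tr>
<tr><td>0</td><td>0.287057</td><td>0.202339</td><td>0.932720</td><td>06:29</td></tr></table>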
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn_c.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "learn_c.save('first')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "learn_c.load('first');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unfreeze last two layers and train for one cycle, save intermediate result." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
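<table><tr><th>epoch</th><th>train_loss</th><th>valid_loss</th><th>accuracy</th><th>time</th></tr>
<tr><td>0</td><td>0.253536</td><td>0.164451</td><td>0.943320</td><td>07:25</td></tr></table>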
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn_c.freeze_to(-2)\n", "learn_c.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2), moms=(0.8,0.7))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "learn_c.save('2nd')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unfreeze the last three layers, and train for one cycle, and save intermediate result.\n", "At this point we've already beaten the 2017 (pre-transfer learning) state of the art!" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
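<table><tr><th>epoch</th><th>train_loss</th><th>valid_loss</th><th>accuracy</th><th>time</th></tr>
<tr><td>0</td><td>0.207790</td><td>0.151074</td><td>0.947320</td><td>09:21</td></tr></table>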
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn_c.freeze_to(-3)\n", "learn_c.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3), moms=(0.8,0.7))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "learn_c.save('3rd')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RNNLearner(data=TextClasDataBunch;\n", "\n", "Train: LabelList (25000 items)\n", "x: TextList\n", "xxbos xxmaj story of a man who has unnatural feelings for a pig . xxmaj starts out with a opening scene that is a terrific example of absurd comedy . a formal orchestra audience is turned into an insane , violent mob by the crazy xxunk of it 's singers . xxmaj unfortunately it stays absurd the xxup whole time with no general narrative eventually making it just too off putting . xxmaj even those from the era should be turned off . xxmaj the cryptic dialogue would make xxmaj shakespeare seem easy to a third grader . xxmaj on a technical level it 's better than you might think with some good cinematography by future great xxmaj vilmos xxmaj zsigmond . xxmaj future stars xxmaj sally xxmaj kirkland and xxmaj frederic xxmaj forrest can be seen briefly .,xxbos xxmaj airport ' 77 starts as a brand new luxury 747 plane is loaded up with valuable paintings & such belonging to rich businessman xxmaj philip xxmaj stevens ( xxmaj james xxmaj stewart ) who is flying them & a bunch of xxup vip 's to his estate in preparation of it being opened to the public as a museum , also on board is xxmaj stevens daughter xxmaj julie ( xxmaj kathleen xxmaj quinlan ) & her son . 
xxmaj the luxury jetliner takes off as planned but mid - air the plane is hi - jacked by the co - pilot xxmaj chambers ( xxmaj robert xxmaj foxworth ) & his two accomplice 's xxmaj banker ( xxmaj monte xxmaj markham ) & xxmaj wilson ( xxmaj michael xxmaj pataki ) who knock the passengers & crew out with sleeping gas , they plan to steal the valuable cargo & land on a disused plane strip on an isolated island but while making his descent xxmaj chambers almost hits an oil rig in the xxmaj ocean & loses control of the plane sending it crashing into the sea where it sinks to the bottom right bang in the middle of the xxmaj bermuda xxmaj triangle . xxmaj with air in short supply , water leaking in & having flown over 200 miles off course the problems mount for the survivor 's as they await help with time fast running out ... \n", " \n", " xxmaj also known under the slightly different tile xxmaj airport 1977 this second sequel to the smash - hit disaster thriller xxmaj airport ( 1970 ) was directed by xxmaj jerry xxmaj jameson & while once again like it 's predecessors i ca n't say xxmaj airport ' 77 is any sort of forgotten classic it is entertaining although not necessarily for the right reasons . xxmaj out of the three xxmaj airport films i have seen so far i actually liked this one the best , just . xxmaj it has my favourite plot of the three with a nice mid - air hi - jacking & then the crashing ( did n't he see the oil rig ? 
) & sinking of the 747 ( maybe the makers were trying to cross the original xxmaj airport with another popular disaster flick of the period xxmaj the xxmaj poseidon xxmaj adventure ( 1972 ) ) & submerged is where it stays until the end with a stark dilemma facing those trapped inside , either suffocate when the air runs out or drown as the 747 floods or if any of the doors are opened & it 's a decent idea that could have made for a great little disaster flick but bad unsympathetic character 's , dull dialogue , lethargic set - pieces & a real lack of danger or suspense or tension means this is a missed opportunity . xxmaj while the rather sluggish plot keeps one entertained for 108 odd minutes not that much happens after the plane sinks & there 's not as much urgency as i thought there should have been . xxmaj even when the xxmaj navy become involved things do n't pick up that much with a few shots of huge ships & helicopters flying about but there 's just something lacking here . xxmaj george xxmaj kennedy as the jinxed airline worker xxmaj joe xxmaj patroni is back but only gets a couple of scenes & barely even says anything preferring to just look worried in the background . \n", " \n", " xxmaj the home video & theatrical version of xxmaj airport ' 77 run 108 minutes while the xxup us xxup tv versions add an extra hour of footage including a new opening credits sequence , many more scenes with xxmaj george xxmaj kennedy as xxmaj patroni , flashbacks to flesh out character 's , longer rescue scenes & the discovery or another couple of dead bodies including the navigator . xxmaj while i would like to see this extra footage i am not sure i could sit through a near three hour cut of xxmaj airport ' 77 . xxmaj as expected the film has dated badly with horrible fashions & interior design choices , i will say no more other than the toy plane model effects are n't great either . 
xxmaj along with the other two xxmaj airport sequels this takes pride of place in the xxmaj razzie xxmaj award 's xxmaj hall of xxmaj shame although i can think of lots of worse films than this so i reckon that 's a little harsh . xxmaj the action scenes are a little dull unfortunately , the pace is slow & not much excitement or tension is generated which is a shame as i reckon this could have been a pretty good film if made properly . \n", " \n", " xxmaj the production values are alright if nothing spectacular . xxmaj the acting is n't great , two time xxmaj oscar winner xxmaj jack xxmaj lemmon has said since it was a mistake to star in this , one time xxmaj oscar winner xxmaj james xxmaj stewart looks old & frail , also one time xxmaj oscar winner xxmaj lee xxmaj grant looks drunk while xxmaj sir xxmaj christopher xxmaj lee is given little to do & there are plenty of other familiar faces to look out for too . \n", " \n", " xxmaj airport ' 77 is the most disaster orientated of the three xxmaj airport films so far & i liked the ideas behind it even if they were a bit silly , the production & bland direction does n't help though & a film about a sunken plane just should n't be this boring or lethargic . xxmaj followed by xxmaj the xxmaj concorde ... xxmaj airport ' 79 ( 1979 ) .,xxbos xxmaj this film lacked something i could n't put my finger on at first : charisma on the part of the leading actress . xxmaj this inevitably translated to lack of chemistry when she shared the screen with her leading man . xxmaj even the romantic scenes came across as being merely the actors at play . xxmaj it could very well have been the director who miscalculated what he needed from the actors . i just do n't know . \n", " \n", " xxmaj but could it have been the screenplay ? xxmaj just exactly who was the chef in love with ? xxmaj he seemed more enamored of his culinary skills and restaurant , and ultimately of himself and his youthful exploits , than of anybody or anything else . 
xxmaj he never convinced me he was in love with the princess . \n", " \n", " i was disappointed in this movie . xxmaj but , do n't forget it was nominated for an xxmaj oscar , so judge for yourself .,xxbos xxmaj sorry everyone , , , i know this is supposed to be an \" art \" film , , but wow , they should have handed out guns at the screening so people could blow their brains out and not watch . xxmaj although the scene design and photographic direction was excellent , this story is too painful to watch . xxmaj the absence of a sound track was brutal . xxmaj the l xxrep 4 o xxrep 5 n g shots were too long . xxmaj how long can you watch two people just sitting there and talking ? xxmaj especially when the dialogue is two people complaining . i really had a hard time just getting through this film . xxmaj the performances were excellent , but how much of that dark , sombre , uninspired , stuff can you take ? xxmaj the only thing i liked was xxmaj maureen xxmaj stapleton and her red dress and dancing scene . xxmaj otherwise this was a ripoff of xxmaj bergman . xxmaj and i 'm no fan f his either . i think anyone who says they enjoyed 1 1 / 2 hours of this is , , well , lying .,xxbos xxmaj when i was little my parents took me along to the theater to see xxmaj interiors . xxmaj it was one of many movies i watched with my parents , but this was the only one we walked out of . xxmaj since then i had never seen xxmaj interiors until just recently , and i could have lived out the rest of my life without it . xxmaj what a pretentious , ponderous , and painfully boring piece of 70 's wine and cheese tripe . xxmaj woody xxmaj allen is one of my favorite directors but xxmaj interiors is by far the worst piece of crap of his career . 
xxmaj in the unmistakable style of xxmaj ingmar xxmaj berman , xxmaj allen gives us a dark , angular , muted , insight in to the lives of a family wrought by the psychological damage caused by divorce , estrangement , career , love , non - love , xxunk , whatever . xxmaj the film , intentionally , has no comic relief , no music , and is drenched in shadowy pathos . xxmaj this film style can be best defined as expressionist in nature , using an improvisational method of dialogue to illicit a \" more pronounced depth of meaning and truth \" . xxmaj but xxmaj woody xxmaj allen is no xxmaj ingmar xxmaj bergman . xxmaj the film is painfully slow and dull . xxmaj but beyond that , i simply had no connection with or sympathy for any of the characters . xxmaj instead i felt only contempt for this parade of shuffling , whining , nicotine stained , martyrs in a perpetual quest for identity . xxmaj amid a backdrop of cosmopolitan affluence and baked xxmaj brie intelligentsia the story looms like a fart in the room . xxmaj everyone speaks in affected platitudes and elevated language between cigarettes . xxmaj everyone is \" lost \" and \" struggling \" , desperate to find direction or understanding or whatever and it just goes on and on to the point where you just want to slap all of them . xxmaj it 's never about resolution , it 's only about interminable introspective babble . xxmaj it is nothing more than a psychological drama taken to an extreme beyond the audience 's ability to connect . xxmaj woody xxmaj allen chose to make characters so immersed in themselves we feel left out . xxmaj and for that reason i found this movie painfully self indulgent and spiritually draining . i see what he was going for but his insistence on promoting his message through xxmaj prozac prose and distorted film techniques jettisons it past the point of relevance . i highly recommend this one if you 're feeling a little too happy and need something to remind you of death . 
xxmaj otherwise , let 's just pretend this film never happened .\n", "y: CategoryList\n", "neg,neg,neg,neg,neg\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Valid: LabelList (25000 items)\n", "x: TextList\n", "xxbos xxmaj once again xxmaj mr. xxmaj costner has dragged out a movie for far longer than necessary . xxmaj aside from the terrific sea rescue sequences , of which there are very few i just did not care about any of the characters . xxmaj most of us have ghosts in the closet , and xxmaj costner 's character are realized early on , and then forgotten until much later , by which time i did not care . xxmaj the character we should really care about is a very cocky , overconfident xxmaj ashton xxmaj kutcher . xxmaj the problem is he comes off as kid who thinks he 's better than anyone else around him and shows no signs of a cluttered closet . xxmaj his only obstacle appears to be winning over xxmaj costner . xxmaj finally when we are well past the half way point of this stinker , xxmaj costner tells us all about xxmaj kutcher 's ghosts . xxmaj we are told why xxmaj kutcher is driven to be the best with no prior inkling or foreshadowing . xxmaj no magic here , it was all i could do to keep from turning it off an hour in .,xxbos xxmaj this is an example of why the majority of action films are the same . xxmaj generic and boring , there 's really nothing worth watching here . a complete waste of the then barely - tapped talents of xxmaj ice - t and xxmaj ice xxmaj cube , who 've each proven many times over that they are capable of acting , and acting well . xxmaj do n't bother with this one , go see xxmaj new xxmaj jack xxmaj city , xxmaj ricochet or watch xxmaj new xxmaj york xxmaj undercover for xxmaj ice - t , or xxmaj boyz n the xxmaj hood , xxmaj higher xxmaj learning or xxmaj friday for xxmaj ice xxmaj cube and see the real deal . 
xxmaj ice - t 's horribly cliched dialogue alone makes this film grate at the teeth , and i 'm still wondering what the heck xxmaj bill xxmaj paxton was doing in this film ? xxmaj and why the heck does he always play the exact same character ? xxmaj from xxmaj aliens onward , every film i 've seen with xxmaj bill xxmaj paxton has him playing the exact same irritating character , and at least in xxmaj aliens his character died , which made it somewhat gratifying ... \n", " \n", " xxmaj overall , this is second - rate action trash . xxmaj there are countless better films to see , and if you really want to see this one , watch xxmaj judgement xxmaj night , which is practically a carbon copy but has better acting and a better script . xxmaj the only thing that made this at all worth watching was a decent hand on the camera - the cinematography was almost refreshing , which comes close to making up for the horrible film itself - but not quite . 4 / 10 .,xxbos xxmaj first of all i hate those moronic rappers , who could'nt act if they had a gun pressed against their foreheads . xxmaj all they do is curse and shoot each other and acting like xxunk version of gangsters . \n", " \n", " xxmaj the movie does n't take more than five minutes to explain what is going on before we 're already at the warehouse xxmaj there is not a single sympathetic character in this movie , except for the homeless guy , who is also the only one with half a brain . \n", " \n", " xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are both hill billies and xxmaj xxunk character is just as much a villain as the gangsters . i did'nt like him right from the start . \n", " \n", " xxmaj the movie is filled with pointless violence and xxmaj walter xxmaj hills specialty : people falling through windows with glass flying everywhere . xxmaj there is pretty much no plot and it is a big problem when you root for no - one . 
xxmaj everybody dies , except from xxmaj paxton and the homeless guy and everybody get what they deserve . \n", " \n", " xxmaj the only two black people that can act is the homeless guy and the junkie but they 're actors by profession , not annoying ugly brain dead rappers . \n", " \n", " xxmaj stay away from this crap and watch 48 hours 1 and 2 instead . xxmaj at lest they have characters you care about , a sense of humor and nothing but real actors in the cast .,xxbos xxmaj not even the xxmaj beatles could write songs everyone liked , and although xxmaj walter xxmaj hill is no mop - top he 's second to none when it comes to thought provoking action movies . xxmaj the nineties came and social platforms were changing in music and film , the emergence of the xxmaj rapper turned movie star was in full swing , the acting took a back seat to each man 's overpowering regional accent and transparent acting . xxmaj this was one of the many ice - t movies i saw as a kid and loved , only to watch them later and cringe . xxmaj bill xxmaj paxton and xxmaj william xxmaj sadler are firemen with basic lives until a burning building tenant about to go up in flames hands over a map with gold implications . i hand it to xxmaj walter for quickly and neatly setting up the main characters and location . xxmaj but i fault everyone involved for turning out xxmaj lame - o performances . xxmaj ice - t and cube must have been red hot at this time , and while i 've enjoyed both their careers as rappers , in my opinion they fell flat in this movie . xxmaj it 's about ninety minutes of one guy ridiculously turning his back on the other guy to the point you find yourself locked in multiple states of disbelief . xxmaj now this is a movie , its not a documentary so i wo nt waste my time recounting all the stupid plot twists in this movie , but there were many , and they led nowhere . i got the feeling watching this that everyone on set was xxunk of confused and just playing things off the cuff . 
xxmaj there are two things i still enjoy about it , one involves a scene with a needle and the other is xxmaj sadler 's huge 45 pistol . xxmaj bottom line this movie is like domino 's pizza . xxmaj yeah ill eat it if i 'm hungry and i do n't feel like cooking , xxmaj but i 'm well aware it tastes like crap . 3 stars , meh .,xxbos xxmaj brass pictures ( movies is not a fitting word for them ) really are somewhat brassy . xxmaj their alluring visual qualities are reminiscent of expensive high class xxup tv commercials . xxmaj but unfortunately xxmaj brass pictures are feature films with the pretense of wanting to entertain viewers for over two hours ! xxmaj in this they fail miserably , their undeniable , but rather soft and flabby than steamy , erotic qualities non withstanding . \n", " \n", " xxmaj senso ' 45 is a remake of a film by xxmaj luchino xxmaj visconti with the same title and xxmaj alida xxmaj valli and xxmaj farley xxmaj granger in the lead . xxmaj the original tells a story of senseless love and lust in and around xxmaj venice during the xxmaj italian wars of independence . xxmaj brass moved the action from the 19th into the 20th century , 1945 to be exact , so there are xxmaj mussolini murals , men in black shirts , xxmaj german uniforms or the tattered garb of the partisans . xxmaj but it is just window dressing , the historic context is completely negligible . \n", " \n", " xxmaj anna xxmaj xxunk plays the attractive aristocratic woman who falls for the amoral xxup ss guy who always puts on too much lipstick . xxmaj she is an attractive , versatile , well trained xxmaj italian actress and clearly above the material . xxmaj her wide range of facial expressions ( xxunk boredom , loathing , delight , fear , hate ... and ecstasy ) are the best reason to watch this picture and worth two stars . xxmaj she endures this basically trashy stuff with an astonishing amount of dignity . i wish some really good parts come along for her . 
xxmaj she really deserves it .\n", "y: CategoryList\n", "neg,neg,neg,neg,neg\n", "Path: C:\\Users\\cross-entropy\\.fastai\\data\\imdb;\n", "\n", "Test: None, model=SequentialRNN(\n", " (0): MultiBatchEncoder(\n", " (module): AWD_LSTM(\n", " (encoder): Embedding(60000, 400, padding_idx=1)\n", " (encoder_dp): EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", " (rnns): ModuleList(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (2): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " )\n", " (input_dp): RNNDropout()\n", " (hidden_dps): ModuleList(\n", " (0): RNNDropout()\n", " (1): RNNDropout()\n", " (2): RNNDropout()\n", " )\n", " )\n", " )\n", " (1): PoolingLinearClassifier(\n", " (layers): Sequential(\n", " (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (1): Dropout(p=0.12, inplace=False)\n", " (2): Linear(in_features=1200, out_features=50, bias=True)\n", " (3): ReLU(inplace=True)\n", " (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (5): Dropout(p=0.1, inplace=False)\n", " (6): Linear(in_features=50, out_features=2, bias=True)\n", " )\n", " )\n", "), opt_func=functools.partial(, betas=(0.9, 0.99)), loss_func=FlattenedLoss of CrossEntropyLoss(), metrics=[], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb'), model_dir='models', callback_fns=[functools.partial(, add_time=True, silent=False)], callbacks=[RNNTrainer\n", "learn: ...\n", "alpha: 2.0\n", "beta: 1.0, MixedPrecision\n", "learn: ...\n", "loss_scale: 65536\n", "max_noskip: 1000\n", "dynamic: True\n", "clip: None\n", "flat_master: False\n", "max_scale: 16777216\n", "loss_fp32: True], layer_groups=[Sequential(\n", " (0): Embedding(60000, 400, padding_idx=1)\n", " (1): 
EmbeddingDropout(\n", " (emb): Embedding(60000, 400, padding_idx=1)\n", " )\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(400, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 1152, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): WeightDropout(\n", " (module): LSTM(1152, 400, batch_first=True)\n", " )\n", " (1): RNNDropout()\n", "), Sequential(\n", " (0): PoolingLinearClassifier(\n", " (layers): Sequential(\n", " (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (1): Dropout(p=0.12, inplace=False)\n", " (2): Linear(in_features=1200, out_features=50, bias=True)\n", " (3): ReLU(inplace=True)\n", " (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (5): Dropout(p=0.1, inplace=False)\n", " (6): Linear(in_features=50, out_features=2, bias=True)\n", " )\n", " )\n", ")], add_time=True, silent=False)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn_c.load('3rd')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unfreeze all the layers, train for two cycles, and save the result." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: at this step I encountered a `CUDA error: unspecified launch failure`. This is a known (and unsolved) problem with PyTorch when using an LSTM. https://github.com/pytorch/pytorch/issues/27837\n", "\n", "Nothing to do but try again... and it worked on the second try." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
epoch  train_loss  valid_loss  accuracy  time
0      0.205716    0.144805    0.948720  12:00
1      0.132235    0.142493    0.949320  12:01
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn_c.unfreeze()\n", "learn_c.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The state of the art for this dataset in 2017 was 94.1% accuracy, and with about 94.9% we have crushed it!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save the IMDb classifier model" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "learn_c.save('clas')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's look at a few examples, just to check that the classifier is working as we think it should. \n", "The three outputs of the model prediction are the label (`pos` or `neg`), the index of the predicted class, and the class probability estimates for `neg` and `pos`, which measure the model's confidence in its prediction. As we'd expect, the model is extremely confident that the first review is `pos` and quite confident that the second review is `neg`, so it passes the test with flying colors. 
" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\cross-entropy\\Anaconda3\\envs\\fastai\\lib\\site-packages\\fastai\\torch_core.py:83: UserWarning: Tensor is int32: upgrading to int64; for better performance use int64 input\n", " warn('Tensor is int32: upgrading to int64; for better performance use int64 input')\n" ] }, { "data": { "text/plain": [ "(Category pos, tensor(1), tensor([0.0161, 0.9839]))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn_c.predict(\"I really loved that movie, it was awesome!\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\cross-entropy\\Anaconda3\\envs\\fastai\\lib\\site-packages\\fastai\\torch_core.py:83: UserWarning: Tensor is int32: upgrading to int64; for better performance use int64 input\n", " warn('Tensor is int32: upgrading to int64; for better performance use int64 input')\n" ] }, { "data": { "text/plain": [ "(Category neg, tensor(0), tensor([0.9698, 0.0302]))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn_c.predict(\"I didn't really love that movie, and I didn't think it was awesome.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now that we've built the model, here is the part where you get to have some fun!! Take the model for a spin, try out your own examples!!" 
] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Appendix: Language Model Zoo" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "fast.ai alumni have applied ULMFit to dozens of different languages, and have beaten the state of the art (SOTA) in Thai, Polish, German, Indonesian, Hindi, and Malay.\n", "\n", "They share tips and best practices in [this forum thread](https://forums.fast.ai/t/language-model-zoo-gorilla/14623) in case you are interested in getting involved!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }