{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson 13 - NLP with Deep Learning\n",
"\n",
"> An introduction to Deep Learning and its applications in NLP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lvwerra/dslectures/blob/master/notebooks/lesson13_nlp-deep.ipynb)[![slides](https://img.shields.io/static/v1?label=slides&message=2021-lesson13.pdf&color=blue&logo=Google-drive)](https://drive.google.com/open?id=1glpOcwvG0vyuRvTo0nmOaXDqOfVmADy9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note: Make sure you are connected to a GPU machine when running the Colab notebook by clicking on `Runtime -> Change runtime type` and set hardware type to GPU."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning objectives\n",
"In this lecture we cover the basics of Deep Learning and its application to NLP. The learning goals are:\n",
"* The basics of transfer learning\n",
"* Preprocess data with the fastai data loader\n",
"* Train a text classifier with the fastai library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"* Practical Deep Learning for Coders - Lesson 4: Deep Learning 2019 by fastai [[video](https://www.youtube.com/watch?v=qqt3aMPB81c)]\n",
"\n",
"This notebooks follows fastai's excellent tutotrial in this video and the original notebook can be found [here](https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-imdb.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Homework\n",
"As homework read the references, work carefully through the notebook and solve the exercises. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"Last time we built a sentiment classifier with `scikit-learn` and achieved around 85% accuracy on the test set. This is already pretty good, but can we do better? We are still wrong 15/100 times. It turns out we can if we use deep learning.\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"Deep learning uses an architecture that is modeled after the brain and uses networks of artificial neurons to mimic its behaviour. These models are much bigger than the models we encountered so far and can have millions to billions of parameters. Training these models and adjusting the parameters is also more challenging, and generally requires much more data and compute. A lot of computations are easily parallelizable, which is a strength of modern GPUs. Therefore we will run this notebook on a GPU that enables much faster training than a CPU.\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"Since we don't have much training data on the IMDb dataset for deep learning standards we use transfer learning to still achieve high accuracy in predicting the sentiment of the movie reviews."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transfer learning\n",
"Training deep learning models requires a lot of data. It is not uncommon to train models on millions of images or gigabytes of text data to achieve good results. Most real-world problems don't have that amount of labeled data ready, and not all companies or individuals who want to train a model can afford to hire people to label data for them.\n",
"\n",
"For many years this has been very challenging. Fortunately, it has been solved for image based models a couple of years ago and recently also for NLP. One approach that helps train models with limited labeled data is called **transfer learning**.\n",
"\n",
"The idea is that once a model is trained on a large dataset for a specific task (e.g., classifying houses vs. planes), the model has learned certain features of the data that can be reused for another task. Such features could be how to detect edges or textures in images. If these features are useful for another task, then we can train the model on new data without requiring as many labels as if we were training it from scratch."
]
},
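{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same idea can be written down in a few lines of code. The following sketch uses `torchvision` (not otherwise needed in this lesson) purely as an illustration: it loads an image model pretrained on ImageNet, freezes the layers that learned general features, and replaces only the final layer for a new two-class task:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"from torchvision import models\n",
"\n",
"# A minimal transfer learning sketch (illustrative, not part of this lesson's pipeline):\n",
"model = models.resnet18(pretrained=True)\n",
"# freeze the layers that learned general features such as edges and textures\n",
"for param in model.parameters():\n",
"    param.requires_grad = False\n",
"# replace the task-specific head with a fresh two-class output layer\n",
"model.fc = nn.Linear(model.fc.in_features, 2)"
]
},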
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ULMFiT\n",
"For transfer learning in NLP Jeremy Howard and Sebastian Ruder came up with a similar approach called `ULMFiT` (Universal Language Model Fine-tuning for Text Classification) for texts. The central theme of the approach is language modeling."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Language modeling\n",
"In language modeling the goal is to predict the next word based on the previous word in a text. An example:\n",
"\n",
"`Yesterday I discovered a really nice cinema so I went there to watch a ____ .`\n",
"\n",
"The task of the model is to fill the blank. The advantage of language modeling is that it does not require any labels, but to achieve good results, the model needs to learn a lot about language. In this example, the model needs to understand that one watches movies in cinemas. The same goes for sentiment and other topics. With `ULMFiT` one can train a language model and then use it for classifications tasks in three steps."
]
},
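{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the training signal concrete, here is a tiny plain-Python sketch (ours, not fastai's) of how language-modeling examples are built: the target is simply the next token, so the labels come for free from the text itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Each prefix of the text becomes an input, the following word its label.\n",
"tokens = \"i went to the cinema to watch a movie\".split()\n",
"for i in range(1, len(tokens)):\n",
"    context, target = tokens[:i], tokens[i]\n",
"    print(\" \".join(context), \"->\", target)"
]
},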
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Three steps\n",
"The three steps are visualised in the following figure:\n",
"
\n",
"\n",
"
\n",
"\n",
"1. **Language model (wiki)**: A language model is trained on a large dataset. Wikipedia is a common choice for this task as it includes many topics, and the text is of high quality. This step usually takes the most time on the order of days. In this step, the model learns the general structure of language.\n",
"\n",
"2. **Language model (domain)**: The language model trained on Wikipedia might be missing some aspects of the domain we are interested in. If we want to do sentiment classification, Wikipedia does not offer much insight since Wikipedia articles are generally of neutral sentiment. Therefore, we continue training the language model on the text we are interested in. This step still takes several hours.\n",
"\n",
"3. **Classifier (domain)**: Now that the language model works well on the text we are interested in, it is time to build a classifier. We do this by adapting the output of the network to yield classes instead of words. This step only takes a couple of minutes to an hour to complete.\n",
"\n",
"The power of this approach is that you only need little labeled data for the last step and only need to go through the expensive first step once. Even the second step can be reused on the dataset if you, for example, build a sentiment classifier and additionally a topic classifier. This can be done in minutes and allows us to achieve great results with little time and resources."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other methods\n",
"Today, there are many approaches in NLP that use transfer learning such as Google's BERT. These models cost on the order of $100'000 to pretrain (1. step) and massive computational facilities. Fortunately, most of these models are shared and can then be fine-tuned on small machines."
]
},
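{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of how such shared models are loaded, here is a minimal sketch using the Hugging Face `transformers` library (an assumption for this example; the library is not used elsewhere in this lesson). Fine-tuning would then proceed on your own labeled data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
"\n",
"# Download a shared pretrained BERT model with a fresh two-class head.\n",
"tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n",
"model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-uncased\", num_labels=2)\n",
"\n",
"inputs = tokenizer(\"I really loved that movie!\", return_tensors=\"pt\")\n",
"# The head is untrained, so these logits are meaningless until fine-tuning.\n",
"print(model(**inputs).logits)"
]
},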
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The `fastai` library\n",
"The `fastai` library wraps around the deep learning framework `PyTorch` and has a lot of functionality built in to achieve great results quickly. The library abstracts a lot of functionality, so it can be difficult to follow initially. To get a better understanding, we highly recommend the [fastai course](https://course.fast.ai/). In this lesson, we will use the library to build a world-class classifier with just a few lines of code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we need to make sure that the Colab notebook has all the right libraries installed. We use the [r.txt](https://drive.google.com/file/d/1pDime9zgq66wDXgMlcS_CYrcrFb_bGxI/view) file that contains all the necessary libraries. To upload it use the folder icon on the left in colab. After you run the following step make sure to restart the notebook with `Runtime -> Restart Runtime`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -r r.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai.text import *\n",
"from dslectures.core import get_dataset"
]
},
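{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since training will run on a GPU, it is worth a quick sanity check that PyTorch can see one (`torch` is installed as a fastai dependency):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Should print True on a Colab GPU runtime.\n",
"print(torch.cuda.is_available())"
]
},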
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download data\n",
"The fastai library includes a lot of datasets including, the IMDb dataset we already know. Similar to our `download_dataset()` function we can do this with fastai's `untar_data()` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"path = untar_data(URLs.IMDB)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data structure\n",
"Looking at the downloaded folder, we can see that there are several files and folders. The relevant ones for our case are `test`, `train`, and `unsup`. The `train` and `test` folders split the data the same way we split it in the previous lecture. The new `unsup` (for unsupervised) folder contains 50k movie reviews that are not classified. We can't use them for training a classifier, but we can use them to fine-tune the language model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#8) [Path('/Users/leandro/.fastai/data/imdb/test'),Path('/Users/leandro/.fastai/data/imdb/tmp_clas'),Path('/Users/leandro/.fastai/data/imdb/imdb.vocab'),Path('/Users/leandro/.fastai/data/imdb/unsup'),Path('/Users/leandro/.fastai/data/imdb/README'),Path('/Users/leandro/.fastai/data/imdb/tmp_lm'),Path('/Users/leandro/.fastai/data/imdb/data_lm.pkl'),Path('/Users/leandro/.fastai/data/imdb/train')]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"path.ls()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the training folder, we find two folders for the negative and positive movie reviews."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#4) [Path('/Users/leandro/.fastai/data/imdb/train/neg'),Path('/Users/leandro/.fastai/data/imdb/train/pos'),Path('/Users/leandro/.fastai/data/imdb/train/unsupBow.feat'),Path('/Users/leandro/.fastai/data/imdb/train/labeledBow.feat')]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(path/'train/').ls()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In both folders, we find many files, each containing one movie review. This is exactly the same data we used last time. It is just arranged in a different structure. We don't need to load all these files manually - the fastai library does this automatically. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(#10) [Path('/Users/leandro/.fastai/data/imdb/train/neg/1821_4.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/10402_1.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/1062_4.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/9056_1.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/5392_3.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/2682_3.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/3351_4.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/399_2.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/10447_1.txt'),Path('/Users/leandro/.fastai/data/imdb/train/neg/10096_1.txt')]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(path/'train/neg').ls()[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tune language model\n",
"\n",
"> Note: Fine-tuning the language model takes around 4h. You can skip this step and download the fine-tuned model in the *Load language model* section of the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preprocess data for language modeling\n",
"In the last lecture, we implemented our own function to preprocess the texts and tokenize them. In principle, we could do the same here, but fastai comes with built-in functions to take care of this. In addtion, we can specify which folders to use and what percentage to split off for validation. The batch size `bs` specifies how many samples the model is optimised for at each step. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bs=48"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_lm = (TextList.from_folder(path)\n",
" #Inputs: all the text files in path\n",
" .filter_by_folder(include=['train', 'test', 'unsup']) \n",
" #We may have other temp folders that contain text files so we only keep what's in train and test\n",
" .split_by_rand_pct(0.1)\n",
" #We randomly split and keep 10% (10,000 reviews) for validation\n",
" .label_for_lm() \n",
" #We want to do a language model so we label accordingly\n",
" .databunch(bs=bs))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to the vectorizer vocabulary, we can have a look at the encoding scheme. The `itos` (stands for id-to-string) object tells us which token is encoded at which position."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"60000"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(data_lm.vocab.itos)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the vocabulary contains XXX tokens."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['xxunk',\n",
" 'xxpad',\n",
" 'xxbos',\n",
" 'xxeos',\n",
" 'xxfld',\n",
" 'xxmaj',\n",
" 'xxup',\n",
" 'xxrep',\n",
" 'xxwrep',\n",
" 'the',\n",
" '.',\n",
" ',',\n",
" 'and',\n",
" 'a',\n",
" 'of',\n",
" 'to',\n",
" 'is',\n",
" 'it',\n",
" 'in',\n",
" 'i']"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_lm.vocab.itos[:20]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the first few positions are reserved for special tokens starting with `xx`. The token `xxunk` is used for a word that is not in the dictionary. The `xxbos` and `xxeos` identify the beginning and the end of a string. So if the first entry in the encoding vector is 1 this means that the token is `xxunk`. If the third entry is 1 then the token is `xxbos`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also look at a processed text. We notice that the token `xxmaj` is used frequently. It signifies that the first letter of the following word is capitalised."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text [ 2 5 1869 5 ... 12 5 17310 10]"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_lm.train_ds[0][0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With fastai we can also sample a batch from the dataset and display the sample in a dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
" \n",
"
\n",
"
idx
\n",
"
text
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
, but one gets the impression in watching the film that it was not pulled off as well as it could have been . xxmaj the fact that it is cluttered by a rather uninteresting subplot and mostly uninteresting kidnappers really muddles things . xxmaj the movie is worth a xxunk if for nothing more than entertaining performances by xxmaj rickman , xxmaj thompson , and xxmaj holbrook . xxbos
\n",
"
\n",
"
\n",
"
1
\n",
"
gosling 's smile or contrived moralizing . xxmaj after the first 45 minutes however , the script blossomed into a watch - able albeit not completely entertaining or thought - provoking . xxmaj the highlights certainly include both xxmaj gosling and xxmaj morse 's acting , xxmaj gosling being an up - and - coming star , and xxmaj morse being an extremely well - established character actor with a
\n",
"
\n",
"
\n",
"
2
\n",
"
. xxmaj their glares and facial ticks would not cut any mustard with the boys from the hood . xxmaj unfortunately for all good people , these xxmaj charmed boys were sent to xxmaj xxunk 's xxmaj reform xxmaj school for the warlocks that could n't get into xxmaj harry xxmaj potter 's class . \\n \\n xxmaj so what was the movie all about ? xxmaj three teenagers
\n",
"
\n",
"
\n",
"
3
\n",
"
to be ) is around , and a string pulling bed sheets up and down . xxmaj oooh , can you feel the chills ? xxmaj the xxup dvd quality is that of a xxup vhs transfer ( which actually helps the film more than hurts it ) . xxmaj the dubbing is below even the lowest \" bad xxmaj italian movie \" standards and i gave it one star
\n",
"
\n",
"
\n",
"
4
\n",
"
as to xxmaj jack xxmaj lemmon , he plays his part so straight , he can hardly dip and glide when dancing . xxmaj and as mentioned , xxmaj dyan xxmaj cannon is outstandingly attractive as another swindler sailing with her mother who thinks xxmaj walter is rich , while he thinks she is rich . xxmaj elaine xxmaj stritch plays xxmaj dyan 's mother , another retread from the
\n",
"
\n",
" \n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_lm.show_batch()"
]
},
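{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the `xxmaj` convention concrete, here is a toy version of the rule in plain Python (our own illustration; fastai's real tokenizer is more involved):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Lower-case every word and insert 'xxmaj' before words that were capitalised.\n",
"def add_caps_tokens(words):\n",
"    out = []\n",
"    for word in words:\n",
"        if word[:1].isupper():\n",
"            out.append(\"xxmaj\")\n",
"        out.append(word.lower())\n",
"    return out\n",
"\n",
"print(add_caps_tokens(\"Alan Rickman stars in this movie\".split()))\n",
"# ['xxmaj', 'alan', 'xxmaj', 'rickman', 'stars', 'in', 'this', 'movie']"
]
},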
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Representation\n",
"When we vectorized the texts in the last lecture, we represented them as count vectors. The architecture we use in this lecture allows for each word to be processed sequentially and thus conserving the order information. Therefore we encode the text with **one-hot encodings**. However, storing the information as vectors would not be very memory efficient; one entry is 1 and all the other entries are 0. It is more efficient to just store the information on which entry is 1 and then create the vector when we need it.\n",
"\n",
"We can look at the data representation of the example we printed above. Each number represents a word in the vocabulary and specifies the entry in the one-hot encoding that is set to one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 2, 5, 1869, 5, 9777, 205, 5, 3608, 5, 4537])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_lm.train_ds[0][0].data[:10]"
]
},
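{
"cell_type": "markdown",
"metadata": {},
"source": [
"These ids are all that needs to be stored; the one-hot vectors can be created on the fly. A small numpy sketch (our own illustration) shows why this matters for memory:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"vocab_size = 60000\n",
"ids = np.array([2, 5, 1869, 5, 9777])       # compact: one id per token\n",
"one_hot = np.zeros((len(ids), vocab_size))  # what the model conceptually sees\n",
"one_hot[np.arange(len(ids)), ids] = 1\n",
"print(ids.nbytes, one_hot.nbytes)           # 40 bytes vs. 2,400,000 bytes"
]
},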
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the `itos` we can translate it back to tokens:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"xxbos\n",
"xxmaj\n",
"alan\n",
"xxmaj\n",
"rickman\n",
"&\n",
"xxmaj\n",
"emma\n",
"xxmaj\n",
"thompson\n"
]
}
],
"source": [
"for i in data_lm.train_ds[0][0].data[:10]:\n",
" print(data_lm.vocab.itos[i])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model\n",
"We will train a variant of a model called LSTM (long short-term memory) network. This is a neural network with a feedback loop. That means that when fed a sequence of tokens, it feeds back its output for the next prediction. With this the model has a mechanism remembering the past inputs. This is especially useful when dealing with sequential data such as texts, where the sequence of words and characters carries important meaning.\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load pretrained model\n",
"Training the model on Wikipedia takes a day or two. Fortunately, people have already trained the model and shared it in the fastai library. Therefore, we can just load the pretrained langauge model. When we load it, we also pass the dataset it will be trained on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading https://s3.amazonaws.com/fast-ai-modelzoo/wt103-fwd.tgz\n"
]
},
{
"data": {
"text/html": [],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, model_dir=\"../data/\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Learning rate finder\n",
"The learning rate is a key parameter when training models in deep learning. It specifies how strongly we update the model parameters. If the learning rate is too small, the training takes forever. If the learning rate is too big, we will never converge to a minimum.\n",
"\n",
"With the `lr_find()` function, we can explore how the loss function behaves with regards to the value of the learning rate:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.\n"
]
}
],
"source": [
"learn.lr_find()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZgAAAEGCAYAAABYV4NmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8GearUAAAgAElEQVR4nO3deXzV9Zn3/9eVhCRAFtYQIEAAkUWQRUARd6jivnSzrrVVb6e2te1M2+nMPXN37k5n2vEex3a0Vdu6dKzVcao/17rUgqi4sIMsshOWrASykv36/XFOaMQkRDjfs+X9fDzy4Jzv93POuT6ck1zn8/1s5u6IiIhEWkqsAxARkeSkBCMiIoFQghERkUAowYiISCCUYEREJBBpsQ4gkoYMGeKFhYWxDkNEJGGsXLmywt2HBvHcSZVgCgsLWbFiRazDEBFJGGa2O6jn1iUyEREJhBKMiIgEQglGREQCoQQjIiKBUIIREZFAKMGIiEgglGBERCQQSjAiIgns9Y2lPPjm9liH0SklGBGRBPbHD4t5bNmuWIfRKSUYEZEEVlrdwLDczFiH0SklGBGRBFZS1UB+jhKMiIhEWGl1I8OUYEREJJJqG1uobWwhX5fIREQkkkqqGgB0iUxERCKrtDqUYHSJTEREIupIC0aXyEREJJJKjrRgMmIcSeeUYEREElRpdQPZmWn0S4/PzYmVYEREElQ8z4EBJRgRkYRVWt0Qt/0vEIUEY2apZrbazF7s5NxAM3vWzNaZ2QdmNrXDuUVm9pGZbTOzvw06ThGRRFNS3RC3I8ggOi2Yu4BNXZz7O2CNu58K3AT8DEJJCbgfuBiYAnzJzKZEIVYRkYTQ0tpGeU1j771EZmYFwKXAr7soMgV4A8DdNwOFZjYMmAtsc/cd7t4EPAlcGWSsIiKJ5EBdE21O3C50CcG3YO4Fvge0dXF+LXANgJnNBcYABcBIYE+HcnvDxz7BzG43sxVmtqK8vDxScYuIxLV4n8UPASYYM7sMKHP3ld0U+wkw0MzWAN8AVgMtgHVS1jt7And/yN1nu/vsoUOHnmjYIiIJoX0OTDwnmCAHT88HrjCzS4BMIMfMHnf3G9oLuHs1cAuAmRmwM/zTDxjV4bkKgP0BxioiklCOLBOTG5+TLCHAFoy7/8DdC9y9ELgW+HPH5AJgZgPMLD1891ZgaTjpLAcmmNnY8PlrgeeDilVEJNGUVDWQlmIM6R+/CSbq0z/N7A4Ad38AmAz81sxagY3AV8PnWszs68CrQCrwsLtviHasIiLxqqS6gbzsDFJSOutRiA9RSTDuvgRYEr79QIfj7wITunjMy8DLUQhPRCThxPNWye00k19EJAHF+zIxoAQjIpKQ4nmr5HZKMCIiCSbet0pupwQjIpJgEmGSJSjBiIgknHjfKrmdEoyISIKJ962S2ynBiIgkmERYJgaUYEREEk5pdQM5mWn0TU+NdSjdUoIREUkwJVXxvdFYOyUYEZEEE+9bJbdTghERSTDxvlVyOyUYEZEEkghbJbdTghERSSAVtfG/VXI7JRgRkQRSmiBDlEEJRkQkoSTKHBhQghERSSiJsFVyOyUYEZEEkghbJbdTghERSSCJsFVyOyUYEZEEkghbJbdTghERSSDFVQ0MV4IREZFIcndKqhrIz+kb61B6RAlGRCRB1DS2UN/USn4CjCADJRgRkYTxl43G1IIREZEIak8w6oMREZGIOtKCSYBZ/KAEIyKSMIrDCSYvR30wIiISQSXVDQzJSicjLb63Sm6nBCMikiBKqg4nxEZj7ZRgREQSRCJNsoQoJBgzSzWz1Wb2Yifncs3sBTNba2YbzOyWDud2mdl6M1tjZiuCjlNEJN6VVjeQn0AJJi0Kr3EXsAnI6eTcncBGd7/czIYCH5nZ79y9KXz+fHeviEKMIiJxraG5lYP1zQkzggwCbsGYWQFwKfDrLoo4kG1mBmQBlUBLkDGJiCSiRJtkCcFfIrsX+B7Q1sX5+4DJwH5gPXCXu7eXdeA1M1tpZrcHHKeISFxr38lSfTCAmV0GlLn7ym6KXQSsAUYAM4D7zKz9Utp8d58FXAzcaWbndPE6t5vZCjNbUV5eHsEaiIjEj/YWjEaRhcwHrjCzXcCTwAVm9vhRZW4BnvGQbcBOYBKAu+8P/1sGPAvM7exF3P0hd5/t7rOHDh0aTE1ERGKs+MglMiUY3P0H7l7g7oXAtcCf3f2Go4oVAQsAzGwYMBHYYWb9zSw7fLw/cCHwYVCxiojEu9LqBrIz08jKiMbYrMiIeqRmdgeAuz8A/Ah41MzWAwZ8390rzGwc8Gyo75804Al3fyXasYqIxIviqsMJNYIMopRg3H0JsCR8+4EOx/cTap0cXX4HMD0asYmIJIKSqsSaAwOayS8ikhBKqhNrFj8owYiIxL3m1jbKahoT7hKZEoyISJwrr2nEPbEmWYISjIhI3EvESZagBCMiEvcScZIlKMGIiMS99kmWasGIiEhElVY3kJGWwoB+fWIdyqeiBCMiEueKw3NgwpPPE4YSjIhInCutaki4IcqgBCMiEveKqw8nXP8LKMGIiMQ1d6e0qpFhSjAiIhJJlXVNNLW2MVyXyEREJJKKE3Cr5HZKMCIicawkATcaa6cEIyISxxJ1mRhQghERiWslVQ2kphhDsjJiHcqnpgQjIhLHiqsayMvOIDUlsSZZghKMiEhcK61OvJ0s2ynBiIjEsf2HEnOSJSjBiIjErdY2Z+/Bw4wa1C/WoRwXJRgRkThVUt1AU2sbYwb1j3Uox0UJRkQkThUdqAdgtFowIiISSXsqQwlmzGAlGBERiaDdlXWkpZg6+UVEJLKKKg8zcmBf0lIT8091YkYtItILFB2oS9j+F1CCERGJW0WV9Qk7RBmUYERE4lJ1QzMH65sZowQjIiKRlOhDlEEJRkQkLrUPUR6doEOUIQoJxsxSzWy1mb3YyblcM3vBzNaa2QYzu6XDuUVm9pGZbTOzvw06ThGReLI7nGDUB9O9u4BNXZy7E9jo7tOB84B/N7N0M0sF7gcuBqYAXzKzKVGIVUQkLhRV1jOwXx9yMvvEOpTj1qMEY2b9zSwlfPtkM7vCzI5ZazMrAC4Fft1FEQeyzcyALKASaAHmAtvcfYe7NwFPAlf2JFYRkWRQdKCe0YMTcw2ydj1twSwFMs1sJPAGcAvwaA8edy/wPaCti/P3AZOB/cB64C53bwNGAns6lNsbPvYJZna7ma0wsxXl5eU9CElEJP4VVdYndAc/9DzBmLvXA9cA/+nuVxO6dNX1A8wuA8rcfWU3xS4C1gAjgBnAfWaWA3S2dZt39gTu/pC7z3b32UOHDu1BVURE4ltLaxv7Dh1O6CHK8CkSjJnNA64HXgofSzvGY+YDV5jZLkKXuC4ws8ePKnML8IyHbAN2ApMItVhGdShXQKiVIyKS9PYfaqC1zXtNC+ZbwA+AZ919g5mNAxZ39wB3/4G7F7h7IXAt8Gd3v+GoYkXAAgAzGwZMBHYAy4EJZjbWzNLDj3++h7GKiCS0oiQYogzHb
oUA4O5vAm8ChDv7K9z9m8fzgmZ2R/g5HwB+BDxqZusJXRb7vrtXhMt9HXgVSAUedvcNx/N6IiKJZndlHZDYkyyhhwnGzJ4A7gBagZVArpnd4+539+Tx7r4EWBK+/UCH4/uBC7t4zMvAyz15fhGRZFJUWU96agrDchJzmf52Pb1ENsXdq4GrCP3RHw3cGFhUIiK9WNGBegoG9SU1pbPxTomjpwmmT3jey1XAc+7eTBejukRE5MQkwxBl6HmCeRDYBfQHlprZGKA6qKBERHord6foQH3CD1GGnnfy/xz4eYdDu83s/GBCEhHpvQ7VN1PT2JLQa5C16+lSMblmdk/7jHkz+3dCrRkREYmgI0OUe0uCAR4GaoAvhH+qgUeCCkpEpLdqX0V5TIKvQwY9vEQGjHf3z3a4/09mtiaIgEREerM9R5bp7xvjSE5cT1swh83srPY7ZjYfOBxMSCIivdfuA3UMzc6gX3pPv//Hr57W4A7gt2aWG75/ELg5mJBERHqvZBmiDD1swbj72vCmYKcCp7r7TOCCQCMTEemFkmWIMnzKHS3dvTo8ox/gOwHEIyLSa9U3tbC/qiEpOvjhxLZMTuw1DERE4syO8tAilycPy4pxJJFxIglGS8WIiETQ1rIaAE7KS44E020nv5nV0HkiMSDxx9CJiMSRbWW1pKVY0lwi6zbBuHt2tAIREenttpbWMmZwP9LTTuTiUvxIjlqIiCSBbeW1TMhLnu/1SjAiInGgqaWN3Qfqk6b/BZRgRETiwq4DdbS2OROSZAQZKMGIiMSFraW1AIwfqgQjIiIRtK2sFjMlGBERibCtZTUUDOxL3/TUWIcSMUowIiJxYFtZco0gAyUYEZGYa21zdlTUMSGJRpCBEoyISMztqaynqaWN8UowIiISSVvLQiPI1IIREZGIal/kUi0YERGJqG1lteTnZJKT2SfWoUSUEoyISIxtK6tNqiVi2inBiIjEkLsrwYiISOTtr2qgvqk1KRNMt/vBRIKZpQIrgH3uftlR574LXN8hlsnAUHevNLNdQA3QCrS4++ygYxURibZtSTqCDKKQYIC7gE1AztEn3P1u4G4AM7sc+La7V3Yocr67V0QhRhGRmNhaGhpBNmFYcs3ih4AvkZlZAXAp8OseFP8S8Psg4xERiTfby2sZ1D+dQf3TYx1KxAXdB3Mv8D2grbtCZtYPWAT8ocNhB14zs5Vmdns3j73dzFaY2Yry8vJIxCwiEjVbS5Ozgx8CTDBmdhlQ5u4re1D8cuCdoy6PzXf3WcDFwJ1mdk5nD3T3h9x9trvPHjp06IkHLiISJe7O1iQdQQbBtmDmA1eEO+ufBC4ws8e7KHstR10ec/f94X/LgGeBucGFKiISfSXVDVQdbuZkJZhPx91/4O4F7l5IKIH82d1vOLqcmeUC5wLPdTjW38yy228DFwIfBhWriEgsrC46BMDM0QNjHEkwojGK7GPM7A4Ad38gfOhq4DV3r+tQbBjwrJlBKMYn3P2VqAYqIhKw1UUHSU9LYfLwTwyyTQpRSTDuvgRYEr79wFHnHgUePerYDmB6NGITEYmV1UWHmDYyl/S05Jzznpy1EhGJc00tbazbV8XMUQNiHUpglGBERGJgU3E1TS1tzBqTnP0voAQjIhITq4sOAjBztFowIiISQav3HCI/J5PhuX1jHUpglGBERGJgVdHBpG69gBKMiEjUldc0sqfysBKMiIhE1po9oQmWs5J0gmU7JRgRkShbXXSQtBRj6sjcWIcSKCUYEZEoW1V0kCkjcsjskxrrUAKlBCMiEkUtrW2s25vcEyzbKcGIiETRltJa6ptak3aBy46UYEREomj1nuSfYNlOCUZEJIpW7T7E4P7pjB7UL9ahBE4JRkQkilbvCU2wDG9HktSUYEREouRQfRM7yut6Rf8LKMGIiETNsu0HAJhTOCjGkUSHEoyISJQs3lxGTmYas3pBBz8owYiIREVbm7NkSznnnDyUtNTe8ae3d9RSRCTGNhZXU17TyPkT82IdStQowYiIRMHizWUAnDtxaIwjiR4lGBGRKPjzR2VML8hlSFZGrEOJGiUYEZGAVdY1sWbPIc7rRZfHQAlGEoy7xzoEkU9t6ZZy3OGCSb0rwaTFOgCRnvrj+mL+4bkNpKbApPwcJg3PZnJ+DmeOH0xeTmaswxPp0uKPyhjcP51pSb7/y9GUYCTu1TW28E8vbOC/V+xl2shcJgzLYnNxDe9uP0BTaxtpKcaFpwzj+tPHcOb4wb1iCQ5JHK1tzptbyrlgUh4pKb3rs6kEI3FtzZ5DfOvJ1eyurOfO88fzrYUn0yc8h6C5tY2tpbU8u3ovT6/cy8vrSxg3pD/nThzKsJxMhmZlkJeTwcT8bPKy1cKR2Fiz5xCH6pt71fDkdkowEpcamlv52RtbeWjpDvJzMnnytjM4fdzgj5Xpk5rClBE5TBkxhb++cCIvry/m9x8U8dTyPdQ3tR4pl5piXDw1n1vmFzJr9MAjLZy2NmfXgTpKqhoYn5dFXnaGWj8ScUs+KiPF4JwJvWd4cjslGIk7H+ys5G//sI4dFXV8YXYBf3/pFHL79un2MZl9UrlmVgHXzCoAoLaxhbLqBspqGnljUylPLt/Di+uKmTYylzPHD2ZjcTVr9xyiuqHlyHMM7NeHifnZTB2RyzWzCpgyIifQekrvsPijMk4bM5Dcft1/hpORJdOonNmzZ/uKFStiHYYcB3dna1ktjy3bxe/eL6JgYF9+cs2pnDVhSESev66xhWdW7+PRd3ay60A9k/KzmT5qADMKBjB8QCbby2r5qLSGzSU1bNhfTVNLG6eNGchN88awaGo+GWnJvXe6BKOsuoG5//IG31s0ka+dd1Ksw+mUma1099lBPLdaMBIzbW3Oit0HeW1DCa9vKmX3gXrM4Cvzx/I3F51Mv/TIfTz7Z6Rx4xljuOH00bS0+ZF+nHZnd7h8UVXfzNMr9/C794u468k1DMlK55sLJnDd3NG9Zg0piYynV+4F4DOTh8U4ktgIvAVjZqnACmCfu1921LnvAteH76YBk4Gh7l5pZouAnwGpwK/d/SfHei21YBJDY0srz63Zz0NLd7CtrJb01BTmjR/MZ6YMY+HkYeTnxkeHfFub8872Cn6xeDvv7jjApPxs/vHyKZw5PjKtKkluDc2tnPXTxUwZkcNvvzI31uF0KdFbMHcBm4BPXNB297uBuwHM7HLg2+HkkgrcD3wG2AssN7Pn3X1jFOKVgByobeTplXt55J2dlFY3Mnl4Dvd8YToXnpJPVkb8NaZTUoyzJwzlrJOG8OqGEv75pU1c96v3uWRaPjfNK+S0MQM/0RISafeHVXupqG3kjnPHxTqUmAn0t9rMCoBLgR8D3zlG8S8Bvw/fngtsc/cd4ed5ErgSUIKJM3WNLZTXNFJW00hZTQNtDsOyM8jPzWRYTiaVdU28uqGEVz4sYfmuStoc5p80mLs/N52zJwxJiFFbZsaiqcM5b2IeDy3dwS+XbOfl9SXkZKZxzslDWTA5j3MmDGVwJ2tMbS6p5plV+6hvamHmqIHMGD2AsYP797r5
EL1Na5vzq6U7OLUgl3lHjX7sTYL+2ngv8D0gu7tCZtYPWAR8PXxoJLCnQ5G9wOldPPZ24HaA0aNHn2C40lNV9c3c+tvlLN91sEflJw7L5usXTOCSaflMyk/M0VmZfVL55oIJ3DK/kHe2VfDGpjIWf1TGi+uKMYOpI3I5e8IQzjppCNvKa3l6xV7W76uiT6qRkZbK4+8VAZDbtw/zxg3mi3NHcc6EoaR2SDb1TS38aVMZO8pruXz6CMYPzYpVdeUEvLqhhF0H6vnF9bMS4ktUUAJLMGZ2GVDm7ivN7LxjFL8ceMfdK9sf3kmZTjuL3P0h4CEI9cEcZ7jyKVTVN3Pjw++zubiGuxZMYPSgfuTlZJCXnYkZlFY3UFrdSGl1A31SjYWThzEuif5QZmf2YdHU4SyaOpy2Nmf9viqWbiln6dZyHly6g18s2Q7AlOE5/J/Lp3DljJHk9u3D9vJa1hQdYlXRQV7fWMorG0oYOaAvX5g9ion5Wby0voQ/bSzlcHNoDs+9f9rK2ROGcPO8Qs6flPexRCTxy9158M3tFA7ux0Wn5Mc6nJgKrJPfzP4VuBFoATIJ9cE84+43dFL2WeBpd38ifH8e8EN3vyh8/wcA7v6v3b2mOvmDV3W4mRt/E0ouv7xhFgt66eiYrtQ0NLN8VyV52ZlM7WbdqaaWNl7fWMqTy4t4a2sFEJqHc8m04Vw+fQTjhvTnqeWhkWwl1Q2MGtSXW84cyxfnjKJ/lPurDtQ28tL6YraWhlpVcwoH9upv5ceybHsF1/3qfX589VSuP31MrMM5piA7+aMyDybcgvmbo0eRhc/lAjuBUe5eFz6WBmwBFgD7gOXAde6+obvXUYLpuTV7DvHD5zfwtfPGc2EX37K2ldXQ1OIMzc5gUP90ahtbuPE377OpuJoHbjhNySVCig7Us+/QYWYXfnLQQHNrKBE9/PZOVuw+SG7fPtxwxmhunlcY6AKf9U0tvLahlP9vzT7e2lpBa5uTnppCU2sbk4fncPO8MVw5YyR907ueH1RR28jAfum9ruV108MfsHF/NW9//3wy+8T//KmkSjBmdgeAuz8QPvdlYJG7X3vUYy4h1IeTCjzs7j8+1usowfRMSVUDV9z3NuW1jbjD7eeM47sXTTzyx62suoEfv7yJ59bsP/KYFAv1QTS3tim5xMjK3Qf59Vs7eGVDCX1SUrhixgi+Mn/sx1YcONzUyovr9vPMqn0cOtxMWoqRkmKkpRh9+6SSlZFGVmYaWRlpFAzsy6wxAzllRA4Zaam4O2v2HOK/V+zhhbXF1Da2MCI3kytmjOSqmSMYPagfz63Zz2PLdrG5pIbcvn24euZIrp076ki/mrvz9rYKHlq6g7e2VlA4uB9/dd54rp5ZQHpa8o+427i/mkt+/hbfvWgid54fnxMrj5bwCSZajjfBuDtNrW29YrZ2Q3MrX3jwXbaX1fLU/5rHU8v38F/v7WZO4UDuvXYmr20o4Z7XttDY0sb/OnccU4bnUF7bSHlNIwfqmrhs2nDOPEnzQGJpV0UdD7+zk6dX7OVwcyvzxg3m2rmjWF10iGdW7aW6oYVxQ/ozbmgWbe60tDmtbW0cbmqltrGF2oYWahpaqGkMLZOTnprC1JE51Da2sKW0lr59Urlk2nA+P7uAuYWDPjHizd35YGcl//Xebl7bUEpTaxszRg1gwaQ8XlpfzOaSGoZmZ/C50wp4e2sF6/dVMTw3k9vOHsd1p49OiG/1x+t7/7OWF9YW894PFiTM0jBKMD10PAnmcFMrC+95k2vnjOIbCyYEFFl8cHe++eQaXly3n1/dOJuFU0KtkOfX7ucHf1jH4eZW2hzOnjCE/3vlVMYO6R/jiKU7VfXN/H55EY8t20VxVWhAxaKpw7lu7mjOGDfomP0kZdUNrCo6yKqiQ6zafRAzuHpmAZdPH052Zs/+OFbWNfHs6n08+UERW8tqmZCXxW3njOPKGSOOtIre2lrBfYu38cHOSj4zZRgP3XhaUvbhHKxr4ox/fYPPnlbAv1w9Ldbh9JgSTA8dbwvmyvveBjOeu3N+AFHFj/sXb+PuVz/qtPm+vbyWe17fwqXThnPx1Pyk/AOQrJpb21ix6yAThmXFbL93d6e4qoHhuZldfnYeWrqdf3l5M/90xSncfGZhdAOMgvb6vfKtsxNqKH6QCSb5L4r2wILJw1i75xBlNQ2xDiUwr3xYzN2vfsSVM0bwtfPGf+L8+KFZ3H/dLC6ZNlzJJcH0CS+1E6vkAqHJqCMG9O32s3Pb2eO4YFIeP35pExv2V0UxuuC1tjn/9d5u5o4dlFDJJWhKMMCCyaGNgJZsLo9xJMFYvquSbz65hpmjB/DTz56qBCIxYWb8v89PZ2D/PnzjidXUNbYc+0EJ4s0tZeypPMzN8wpjHUpcUYIhNCFuRG4mf9pUGutQIm5raQ23PraCggF9+c3Nc5K6g1Xi36D+6fzs2pnsOlDHPzz3YazDiZjHlu1mWE4GF56i0ZUdKcEQ+mZ1weQ83tpaQUNz67EfkCBKqhq4+eEPSE9L4bGvzGVQ//RYhyTCGeMG840LJvDMqn38aukO2toSux94Z0Udb24p57q5Y7T46VH0vxG2YPIwDje38u6OA7EOJSKqG5r58iMfUHW4mUe+PIdRg/rFOiSRI765YEKoP+blTVzzy2Ws23so1iEdt8ff201aivGluaNiHUrcUYIJmzduMP3SU3kjCS6TFR2o5/O/fJdtZbU8cONp3S5ZIhILqSnGb26ezT1fmM7eg4e58v53+Ltn11NanVgDbeqbWvjvFXu4eNrwQFdWSFTxtwlHjGT2SeWsk4bw501l+JUe9x3hDc2tVDc0k5f98Q/1su0VfO13q3CHR26Z87GdGkXiiZlxzawCFk4Zxr2vb+Wxd3fxxPtFjBzQl2kjczl1VC5zCwcxa/TAuN3e4NnV+6hpaOGmefG/5lgsKMF0sHDyMF7bWMrG4mpOGRG/3/p3VtRx/a/eY39VA6cW5HLRKfksmprPO9sq+KcXNjJ2SH9+fdNsCjVRUhJATmYf/vHyKVx3+mgWby5j7d5DrN9XxSsbSgAYkZvJZdNHcMX0EZwyIiduvvw1trTyi8XbmT5qALPHDIx1OHFJCaaD8yeFhiu/saksbhPMRyU13PCb92ltc+5aMIE3t5Rz96sfcferHwGwYFIe9147o8czsUXixUl5WZyU95dtHQ7VN/HmlnKeW7Ofh9/eyUNLdzApP5tvLZzARafEfjLwU8v3sO/QYX7y2WkxjyVeaSb/Ua68/x1w57mvnxWhqCJn/d4qbnz4fTLSUvjdradzUl5oH7fiqsO8vrGU1BTj2jmje93qtZL8DtY18ccPS3j4nZ1sK6tl2shcvnvRxJjtitrQ3Mo5/7aYwiH9eer2MxI6wQQ5k18tmKMsnJTHv7++hbLqhph32rW2OQfrm6iobWRraS1/98x6cvv14Ylbz2D04L+MChue25ebNMFLktjA/ulcd/povjhnFM+u3sd/vL6Fmx7+gNljBnL59BEsmJxHwcDojZR8/L3dlNU08p9fmpnQySVoasE
cpX257Z9+dhpfnBObLZgPN7Vy8yMfsCK8h327cUP68/itpzNiQN+YxCUSLxpbWnlq+R4eXbaLHeV1QGhb7otOGcYd542nX3pw353rGls4+98Wc8qIHP7rq53u5J5Q1IKJosnDsykY2JcnPtjD508bFZPRK//80kaW76rk1rPGUjCwH0OyMhiSlc60gtxAf3FEEkVGWio3zSvkpnmF7Kyo441NpbyxqYz/XLyNt7dV8PCX5zCgXzATix9dtovKuib++sKJgTx/MtE8mKOYGd9eeDJrwxsvRdvrG0v53ftF3H72OP7+0incfGYhl546nNPHDVZyEenE2CH9ufXscfz+9jP45fWz+HBfNV988L1A5tRUHW7mwTe3s3ByHjNGDYj48ycbJZhOXDNrJHMLB/GTVzZTWdcUtdctq27g+39YxykjcvjOhSdH7XVFksWiqcN59JY57D1Yz+ceWMauin7wdJsAAAr/SURBVLqIPXd1QzN/9+x6qhta+PZn9PvZE0ownTAzfnTVVGobWvjpHzdH5TXb2py/fnot9U0t/OzaGb1id02RIJx50hCeuO0Mahta+NwD7/Lw2zs/0Zppa3PW7jnEvX/awj2vfcTj7+3m9Y2lrNt7iNpOVnl+dUMJC//9Tf64vpjvfObkuJ3GEG90zaULE/Oz+epZY3lw6Q6+MKeA08YMCvT1Hlm2i7e2VvDPV009MvxYRI7P9FEDePqOM/n2U2v4vy9u5EcvbWRu4SAuPCWfnRW1vL6xlNLqRtq7WDsOpkkxmJifw6zRA5g5eiBvbCrljx+WMHl4Dr++eTanFujSWE9pFFk36hpbWHjPmwzol84LX59PdUMLL60v5oU1+6msb2Lh5GFcPDWfUwtyP9VQRXdnS2kty3dVsnxXJSt2HWTfocMsnJzHr26arWGPIhG0rayWF9ft54W1+9leXkffPqmce/JQLjxlGBdMyiMrI40DdU2UVDVQUt3ApuJqVu4+yJqiQ9Q0tpCelsK3Fk7gtrPHJeVqydoyuYcinWAA/ri+mL/63SomD89ha2kNLW3OhLwshuVk8t6OA7S0OSMH9OWCSXlMK8hlyvAcJgzL+tglrpbWNspqGlm2/QBvby3n7W0HqKhtBCAvO4M5hYOYXTiQz88eRVaGGpUiQXB39lQeJi8no0f7IrW2OdvKasnt24f83ORdyFIJpoeCSDDuzp1PrGLd3iouO3UEV84YwaT8bMyMQ/VN/GlTGa98WMyy7QeobwrtJZOWEto+9nBzK7UNLRzusMfMkKx05p80hPknDeGMsYMZNaj7bWZFRIKkBNNDQSSYnmprc3ZX1rNxfzWbiqspqqynf0YqWRlpZGX0YUC/PswpHMSk/Oy4XRlWRHofTbRMACkpxtgh/Rk7pD+Xnjo81uGIiMRc8vVYiYhIXFCCERGRQCjBiIhIIJRgREQkEEowIiISCCUYEREJhBKMiIgEQglGREQCkVQz+c2sHNh91OFcoOoYxzreP9btIUDFCYTZWTw9LfNp63L0/fbbyVSXjrdPpD4nUpeuzulz9pdjem96FuuxygTx3kx092CWcHf3pP4BHjrWsY73j3UbWBHpeHpa5tPWpZs6JE1dIlWfE6mLPmfdf8703iTve3Osn95wieyFHhx74VPejnQ8PS3zaety9P0XuihzvOKhLj2N41hOpC5dndPnLDL03nR/PJbvTbeS6hJZNJjZCg9oYbhoS6a6QHLVJ5nqAslVn2SqCwRbn97Qgom0h2IdQAQlU10gueqTTHWB5KpPMtUFAqyPWjAiIhIItWBERCQQSjAiIhKIXp1gzOxhMyszsw+P47Gnmdl6M9tmZj+3Dvsem9kXzGyjmW0wsyciG3WX8US8Lmb2ZTMrN7M14Z9bIx95lzEF8t6Ez3/OzNzMotJRG9B7c0f4+Boze9vMpkQ+8k7jCaIu3wn/vqwzszfMbEzkI+8ypiDqc46ZrTKzFjP7XOSj/kQcx12HLp7vZjPbGv65ucPxsWb2fvj4U2aWfswnC2r8cyL8AOcAs4APj+OxHwDzAAP+CFwcPj4BWA0MDN/PS+C6fBm4L1nem/C5bGAp8B4wO1HrAuR0KHMF8EoC1+V8oF/49l8BTyXy5wwoBE4Ffgt8Ll7rACwBCo86NgjYEf53YPh2+9+y/wauDd9+APirY71Gr27BuPtSoLLjMTMbb2avmNlKM3vLzCYd/TgzG07oF/xdD/1v/xa4Knz6NuB+dz8Yfo2yYGsRElBdYibA+vwI+DegIcDwPyaIurh7dYei/YGojNYJqC6L3b0+XPQ9oCDYWvxFQPXZ5e7rgLYoVOG469CFi4DX3b0y/DfsdWBRuHV2AfA/4XKP0YO/E706wXThIeAb7n4a8DfALzopMxLY2+H+3vAxgJOBk83sHTN7z8wWBRpt9060LgCfDV+6+B8zGxVcqD1yQvUxs5nAKHd/MehAe+CE3xszu9PMthNKmN8MMNZjicTnrN1XCbUGYimS9YmVntShMyOBPR3ut9drMHDI3VuOOt6ttB6H2wuYWRZwJvB0h8v2GZ0V7eRY+zfINEKXyc4j9E3sLTOb6u6HIhtt9yJUlxeA37t7o5ndQehbywWRjrUnTrQ+ZpYC/Aehy34xFaH3Bne/H7jfzK4D/jdwcyflAxWpuoSf6wZgNnBuJGP8NCJZn1jprg5mdgtwV/jYScDLZtYE7HT3q+m6XsdVXyWYj0shlKVndDxoZqnAyvDd54Ff8vFmfAGwP3x7L/CeuzcDO83sI0IJZ3mQgXfihOvi7gc6HP8V8NPAoj22E61PNjAVWBL+pcsHnjezK9x9RcCxHy0Sn7OOngyXjYWI1MXMFgJ/D5zr7o2BRty9SL83sdBpHQDc/RHgEQAzWwJ82d13dSiyl9CX43YFhPpqKoABZpYWbsX0rL5Bd0DF+w+hDrkPO9xfBnw+fNuA6V08bjlwBn/p4LskfHwR8Fj49hBCzc3BCVqX4R3KXE0ocSbse3NUmSVEqZM/oPdmQocylxPggoVRqMtMYHvHOiXD5wx4lCh08h9vHei6k38noQ7+geHbg8LnnubjnfxfO2ZcsXhD4+UH+D1QDDQTytxfBcYCrwBrgY3AP3bx2NnAh+FfjPv4y6oIBtwTfuz69jckQevyr8CG8OMXA5MS+b05qswSojeKLIj35mfh92ZN+L05JYHr8iegNFyXNcDzifw5A+aEn6sOOABsiMc60EmCCR//CrAt/HNLh+PjCI2c20Yo2WQcKzYtFSMiIoHQKDIREQmEEoyIiARCCUZERAKhBCMiIoFQghERkUAowUhSM7PaKL/esgg9z3lmVmVmq81ss5n9vx485iqL0qrKIj2hBCPyKZhZt6tfuPuZEXy5t9x9JqGJiJeZ2fxjlL8KUIKRuKGlYqTXMbPxwP3AUKAeuM3dN5vZ5YTW9EonNEHuencvNbMfAiMIzZauMLMtwGhCE89GA/e6+8/Dz13r7llmdh7wQ0JLbEwltMzIDe7uZnYJocm4FcAqYJy7X9ZVvO5+2MzW8JdFO28Dbg/HuQ24EZhBaNn+c83sfw
OfDT/8E/U8gf86kU9FLRjpjbpaafZt4Ixwq+FJ4HsdHnMacKW7Xxe+P4nQ0uZzgf9jZn06eZ2ZwLcItSrGAfPNLBN4kNDeIWcR+uPfLTMbSGg9u6XhQ8+4+xx3nw5sAr7q7ssIrZH1XXef4e7bu6mnSFSoBSO9yjFWyy0Angrv9ZFOaB2mds+7++EO91/y0KKMjWZWBgzj48u3A3zg7nvDr7uGUAuoFtjh7u3P/XtCrZHOnG1m64CJwE/cvSR8fKqZ/TMwAMgCXv2U9RSJCiUY6W26XGkW+E/gHnd/vsMlrnZ1R5XtuOJvK53/LnVWprNlz7vylrtfZmYnA2+b2bPuvobQIopXuftaM/syH1/9tl139RSJCl0ik17FQztB7jSzzwNYyPTw6VxgX/h2UHurbAbGmVlh+P4Xj/UAd99CaOHR74cPZQPF4cty13coWhM+d6x6ikSFEowku35mtrfDz3cI/VH+qpmtJbQi8ZXhsj8kdEnpLUId8BEXvsz2NeAVM3ub0CrCVT146APAOWY2FvgH4H1C29l27LR/EvhueGjzeLqup0hUaDVlkSgzsyx3rw3vc34/sNXd/yPWcYlEmlowItF3W7jTfwOhy3IPxjgekUCoBSMiIoFQC0ZERAKhBCMiIoFQghERkUAowYiISCCUYEREJBD/P7/7mc9ITbRiAAAAAElFTkSuQmCC\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"learn.recorder.plot(skip_end=15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First of all, we see that if we choose the learning rate too big, the loss function starts to increase. We want to avoid this at all costs. So we want to find the spot where the loss function decreases the steepest with the largest learning rate. In this case, a good value is `1e-2`. The first parameter determines how many epochs we train. One epoch corresponds to one pass through the training set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
" \n",
"
\n",
"
epoch
\n",
"
train_loss
\n",
"
valid_loss
\n",
"
accuracy
\n",
"
time
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
4.145812
\n",
"
4.027751
\n",
"
0.294872
\n",
"
24:40
\n",
"
\n",
" \n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deep learning models learn more and more abstractions with each layer. The first layers of an image model might learn about edges and textures in an image, and as you progress through the layers, you can see how the model combines edges to eyes or ears and eventually combines these to faces. Therefore the last few layers are usually the ones that are very task-specific while the others contain general information.\n",
"\n",
"For this reason, we usually start by just tuning the last few layers because we don't want to lose that information and then, in the end, fine-tune the whole model. This is what we did above: we just trained the last few layers. Now to get the best possible performance, we want to train the whole model. The `unfreeze()` function enables the training of the whole model. We train the model for 10 more epochs with a slightly lower learning rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.unfreeze()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
" \n",
"
\n",
"
epoch
\n",
"
train_loss
\n",
"
valid_loss
\n",
"
accuracy
\n",
"
time
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
3.883628
\n",
"
3.846353
\n",
"
0.312791
\n",
"
25:13
\n",
"
\n",
"
\n",
"
1
\n",
"
3.830023
\n",
"
3.806326
\n",
"
0.319647
\n",
"
25:14
\n",
"
\n",
"
\n",
"
2
\n",
"
3.823724
\n",
"
3.776663
\n",
"
0.323472
\n",
"
25:14
\n",
"
\n",
"
\n",
"
3
\n",
"
3.790490
\n",
"
3.747104
\n",
"
0.327012
\n",
"
25:14
\n",
"
\n",
"
\n",
"
4
\n",
"
3.708543
\n",
"
3.720774
\n",
"
0.330030
\n",
"
25:14
\n",
"
\n",
"
\n",
"
5
\n",
"
3.685525
\n",
"
3.700337
\n",
"
0.332860
\n",
"
25:14
\n",
"
\n",
"
\n",
"
6
\n",
"
3.594311
\n",
"
3.683683
\n",
"
0.334876
\n",
"
25:14
\n",
"
\n",
"
\n",
"
7
\n",
"
3.559871
\n",
"
3.672719
\n",
"
0.336452
\n",
"
25:14
\n",
"
\n",
"
\n",
"
8
\n",
"
3.545992
\n",
"
3.668409
\n",
"
0.337152
\n",
"
25:14
\n",
"
\n",
"
\n",
"
9
\n",
"
3.494757
\n",
"
3.668425
\n",
"
0.337193
\n",
"
25:14
\n",
"
\n",
" \n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save language model\n",
"Since the step above took about 4h we want to save the progress, so we don't have to repeat the step when we restart the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# uncomment if you want to fine-tune the language mode\n",
"# data_lm.path = Path('')\n",
"# data_lm.save('data_lm.pkl')\n",
"# learn.save('fine_tuned')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load language model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"get_dataset(\"fine_tuned.pth\")\n",
"get_dataset(\"data_lm.pkl\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_lm = load_data(Path(\"../data/\"), 'data_lm.pkl', bs=bs)\n",
"data_lm.path = path\n",
"\n",
"learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, model_dir=\"../data/\")\n",
"learn.path = Path(\"\")\n",
"learn.load('fine_tuned');\n",
"learn.path = path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Text generation\n",
"The objective of a language model is to predict the next word based on a sequence of words. We can use the trained model to generate some movie reviews:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"TEXT = \"I liked this movie because\"\n",
"N_WORDS = 40\n",
"N_SENTENCES = 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"I liked this movie because it had a nice twist , i liked it . i like all of the characters in the movie , the same in each one . Displayed refreshing to see . The story and acting were really good\n",
"I liked this movie because it was about a person playing far , the same way . It had some good moments of humor , but it really did n't make an epic film . i really did n't understand why the audience was\n"
]
}
],
"source": [
"print(\"\\n\".join(learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 1\n",
"Generate a few movie reviews with different input texts. Post the funniest review on the Teams channel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Encoder\n",
"As mentioned previously, the last layers of a deep learning model are usually the most task-specific. In the case of language modeling, the last layer predicts the next word in a sequence. We want to do text classification, however, so we don't need that layer. Therefore, we discard the last layer and only save what is called the encoder. In the next step, we add a new layer on top of the encoder for text classification."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.save_encoder('fine_tuned_enc')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train classifier\n",
"In this section, we will use the fine-tuned language model and build a text classifier on top of it. The procedure is very similar to the language model fine-tuning but needs some minor adjustments for text classification."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preprocess data for classification\n",
"Preprocessing the data follows similar steps as for language modeling. The main differences are that 1) we don't want a random train/valid split, but the official one and 2) we want to label each element with its sentiment based on the folder name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)\n",
" #grab all the text files in path\n",
" .split_by_folder(valid='test')\n",
" #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)\n",
" .label_from_folder(classes=['neg', 'pos'])\n",
" #label them all with their folders\n",
" .databunch(bs=bs))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we display a batch we see that the tokens look the same with the addition of a label column:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
" \n",
"
\n",
"
text
\n",
"
target
\n",
"
\n",
" \n",
" \n",
"
\n",
"
xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules
\n",
"
pos
\n",
"
\n",
"
\n",
"
xxbos xxmaj by now you 've probably heard a bit about the new xxmaj disney dub of xxmaj miyazaki 's classic film , xxmaj laputa : xxmaj castle xxmaj in xxmaj the xxmaj sky . xxmaj during late summer of 1998 , xxmaj disney released \" xxmaj kiki 's xxmaj delivery xxmaj service \" on video which included a preview of the xxmaj laputa dub saying it was due out
\n",
"
pos
\n",
"
\n",
"
\n",
"
xxbos xxmaj some have praised xxunk xxmaj lost xxmaj xxunk as a xxmaj disney adventure for adults . i do n't think so -- at least not for thinking adults . \\n \\n xxmaj this script suggests a beginning as a live - action movie , that struck someone as the type of crap you can not sell to adults anymore . xxmaj the \" crack staff \" of
\n",
"
neg
\n",
"
\n",
"
\n",
"
xxbos xxmaj by 1987 xxmaj hong xxmaj kong had given the world such films as xxmaj sammo xxmaj hung 's ` xxmaj encounters of the xxmaj spooky xxmaj kind ' xxmaj chow xxmaj yun xxmaj fat in xxmaj john xxmaj woo 's iconic ` a xxmaj better xxmaj tomorrow ' , ` xxmaj zu xxmaj warriors ' and the classic ` xxmaj mr xxmaj vampire ' . xxmaj jackie xxmaj
\n",
"
pos
\n",
"
\n",
"
\n",
"
xxbos xxmaj to be a xxmaj buster xxmaj keaton fan is to have your heart broken on a regular basis . xxmaj most of us first encounter xxmaj keaton in one of the brilliant feature films from his great period of independent production : ' xxmaj the xxmaj general ' , ' xxmaj the xxmaj navigator ' , ' xxmaj sherlock xxmaj jnr ' . xxmaj we recognise him as
\n",
"
neg
\n",
"
\n",
" \n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data_clas.show_batch()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load model\n",
"We create a text classifier model and load the pretrained encoder part from the fine-tuned language model into it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, model_dir=\"../data/\")\n",
"learn.load_encoder('fine_tuned_enc');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find the learning rate\n",
"Again, we need to find the best learning rate for training the classifier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"learn.unfreeze()\n",
"learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that after the last optimisation step the model can predict the sentiment on the test set with **94%** accuracy. This is roughly 10% better than our Naïve Bayes model from the last lecture. In other words, this model makes **3 times** fewer mistakes than the Naïve Bayes model!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make predictions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Category pos, tensor(1), tensor([3.5403e-06, 1.0000e+00]))"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"learn.predict(\"I really loved that movie, it was awesome!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 2\n",
"Experiment with the trained classifier and see if you can fool it. Can you find a pattern that fools it consistently?"
]
},
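{
"cell_type": "markdown",
"metadata": {},
"source": [
"A few probes to get you started (hypothetical examples; negation, contrast, and sarcasm are classic failure modes of sentiment models):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# `learn` is the classifier trained above.\n",
"probes = [\n",
"    \"This movie was not bad at all.\",\n",
"    \"I expected to hate it, but I ended up loving it.\",\n",
"    \"The best thing about this film is that it eventually ends.\",\n",
"]\n",
"for text in probes:\n",
"    print(learn.predict(text)[0], \"<-\", text)"
]
},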
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}