## NLP model creation and training

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai.text import * 


The main class here is [`RNNLearner`](/text.learner.html#RNNLearner). There are also some utility functions to help create and update text models.

## Quickly get a learner

In [None]:
show_doc(language_model_learner)

<h4 id="language_model_learner"><code>language_model_learner</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L189" class="source_link">[source]</a></h4>

> <code>language_model_learner</code>(**`data`**:[`DataBunch`](/basic_data.html#DataBunch), **`arch`**, **`config`**:`dict`=***`None`***, **`drop_mult`**:`float`=***`1.0`***, **`pretrained`**:`bool`=***`True`***, **`pretrained_fnames`**:`OptStrTuple`=***`None`***, **\*\*`learn_kwargs`**) → `LanguageLearner`

Create a [`Learner`](/basic_train.html#Learner) with a language model from `data` and `arch`.  

The model used is given by `arch` and `config`. It can be:

- an [`AWD_LSTM`](/text.models.awd_lstm.html#AWD_LSTM) ([Merity et al.](https://arxiv.org/abs/1708.02182))
- a [`Transformer`](/text.models.transformer.html#Transformer) decoder ([Vaswani et al.](https://arxiv.org/abs/1706.03762))
- a [`TransformerXL`](/text.models.transformer.html#TransformerXL) ([Dai et al.](https://arxiv.org/abs/1901.02860))

Each of them has a default config for language modelling, stored in <code>{lower_case_class_name}_lm_config</code> (for instance `awd_lstm_lm_config`), which you can copy and modify if you want to change the default parameters. At this stage, only the AWD LSTM supports `pretrained=True`, but we hope to add more pretrained models soon. `drop_mult` is applied to all the dropout probabilities in the `config`, and `learn_kwargs` are passed to the [`Learner`](/basic_train.html#Learner) initialization.
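As a rough illustration of what `drop_mult` does, here is a minimal plain-Python sketch that scales every dropout probability in a config dict. It assumes the dropout entries are the keys ending in `_p`; `scale_dropouts` and the values in `example_config` are hypothetical, chosen for illustration rather than copied from the library defaults.

```python
def scale_dropouts(config, drop_mult):
    """Return a copy of `config` with every dropout probability multiplied by `drop_mult`.

    Assumption: dropout entries are the keys whose names end in '_p'.
    The caller's dict is left untouched.
    """
    return {key: (value * drop_mult if key.endswith('_p') else value)
            for key, value in config.items()}

# Illustrative config (hypothetical values, not necessarily the library defaults).
example_config = {'emb_sz': 400, 'n_layers': 3, 'qrnn': False,
                  'output_p': 0.1, 'hidden_p': 0.15, 'input_p': 0.25,
                  'embed_p': 0.02, 'weight_p': 0.2}

halved = scale_dropouts(example_config, 0.5)
print(halved['input_p'])  # 0.125
```

Only the probabilities change; sizes and flags such as `emb_sz` or `qrnn` pass through unchanged, which is why a single multiplier is enough to make the whole model more or less regularized.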

In [None]:
jekyll_note("Using QRNN (set the `qrnn` flag in the config of the AWD LSTM) requires CUDA to be installed (the same version PyTorch is using).")

<div markdown="span" class="alert alert-info" role="alert"><i class="fa fa-info-circle"></i> <b>Note: </b>Using QRNN (set the `qrnn` flag in the config of the AWD LSTM) requires CUDA to be installed (the same version PyTorch is using).</div>

In [None]:
path = untar_data(URLs.IMDB_SAMPLE)
data = TextLMDataBunch.from_csv(path, 'texts.csv')
learn = language_model_learner(data, AWD_LSTM, drop_mult=0.5)

In [None]:
show_doc(text_classifier_learner)

<h4 id="text_classifier_learner"><code>text_classifier_learner</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L249" class="source_link">[source]</a></h4>

> <code>text_classifier_learner</code>(**`data`**:[`DataBunch`](/basic_data.html#DataBunch), **`arch`**:`Callable`, **`bptt`**:`int`=***`70`***, **`max_len`**:`int`=***`1400`***, **`config`**:`dict`=***`None`***, **`pretrained`**:`bool`=***`True`***, **`drop_mult`**:`float`=***`1.0`***, **`lin_ftrs`**:`Collection`\[`int`\]=***`None`***, **`ps`**:`Collection`\[`float`\]=***`None`***, **\*\*`learn_kwargs`**) → `TextClassifierLearner`

Create a [`Learner`](/basic_train.html#Learner) with a text classifier from `data` and `arch`.  

Here again, the backbone of the model is determined by `arch` and `config`. The input texts are fed into that model in successive chunks of `bptt` tokens, and only the last `max_len` activations are kept. This gives us the backbone of our model. The head then consists of:
- a layer that concatenates the final outputs of the RNN with the maximum and average of all the intermediate outputs (on the sequence length dimension),
- blocks of ([`nn.BatchNorm1d`](https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm1d), [`nn.Dropout`](https://pytorch.org/docs/stable/nn.html#torch.nn.Dropout), [`nn.Linear`](https://pytorch.org/docs/stable/nn.html#torch.nn.Linear), [`nn.ReLU`](https://pytorch.org/docs/stable/nn.html#torch.nn.ReLU)) layers.
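The concatenation performed by the first layer of the head can be sketched in plain Python for a single sequence. `concat_pool` is a hypothetical helper; it works on lists rather than tensors and ignores padding, which a batched implementation would have to handle.

```python
def concat_pool(outputs):
    """Concatenate [last output, max over time, mean over time] for one sequence.

    `outputs` is a list of per-timestep activation vectors (lists of floats),
    all of the same length; the result is three times that length.
    """
    if not outputs:
        raise ValueError("need at least one timestep")
    last = list(outputs[-1])
    columns = list(zip(*outputs))  # one tuple per hidden unit, across time
    max_pool = [max(col) for col in columns]
    avg_pool = [sum(col) / len(outputs) for col in columns]
    return last + max_pool + avg_pool

# Three timesteps of a 2-dimensional hidden state.
acts = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
print(concat_pool(acts))  # [2.0, 4.0, 3.0, 4.0, 2.0, 0.6666666666666666]
```

This is why the first linear block sees three times the hidden size of the backbone's final layer.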

The blocks are defined by the `lin_ftrs` and `ps` arguments. Specifically, the first block has a number of inputs inferred from the backbone arch, the last one has a number of outputs equal to `data.c` (which contains the number of classes of the data), and the intermediate blocks have numbers of inputs/outputs determined by `lin_ftrs` (each block has as many inputs as the previous block has outputs). The dropouts all have the same value `ps` if you pass a float, or the corresponding values if you pass a list. The default is an intermediate hidden size of 50 (which makes two blocks: model_activation -> 50 -> n_classes) with a dropout of 0.1.
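The size bookkeeping just described (input size from the backbone, `lin_ftrs` in the middle, `data.c` outputs at the end, one dropout per block) can be sketched as a helper that only lists the block shapes. `head_layout` is hypothetical, and the library may pick the first block's dropout differently (for example from the model config), so treat the dropout values as an assumption of the sketch.

```python
def head_layout(n_backbone_out, n_classes, lin_ftrs=None, ps=0.1):
    """List the (n_in, n_out, dropout) triple of every BatchNorm/Dropout/Linear(/ReLU) block.

    n_backbone_out: size of the pooled backbone output (inferred from the arch).
    lin_ftrs: sizes of the intermediate layers (default [50]).
    ps: one float used for every block, or one value per block.
    """
    lin_ftrs = [50] if lin_ftrs is None else list(lin_ftrs)
    sizes = [n_backbone_out] + lin_ftrs + [n_classes]
    n_blocks = len(sizes) - 1
    if isinstance(ps, (int, float)):
        ps = [float(ps)] * n_blocks  # same dropout everywhere
    if len(ps) != n_blocks:
        raise ValueError(f"expected {n_blocks} dropout values, got {len(ps)}")
    # Each block's input size is the previous block's output size.
    return [(sizes[i], sizes[i + 1], ps[i]) for i in range(n_blocks)]

# A backbone whose pooled output has 1200 features, and 2 classes.
print(head_layout(1200, 2))  # [(1200, 50, 0.1), (50, 2, 0.1)]
```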

In [None]:
path = untar_data(URLs.IMDB_SAMPLE)
data = TextClasDataBunch.from_csv(path, 'texts.csv')
learn = text_classifier_learner(data, AWD_LSTM, drop_mult=0.5)

In [None]:
show_doc(RNNLearner)

<h2 id="RNNLearner"><code>class</code> <code>RNNLearner</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L43" class="source_link">[source]</a></h2>

> <code>RNNLearner</code>(**`data`**:[`DataBunch`](/basic_data.html#DataBunch), **`model`**:[`Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module), **`split_func`**:`OptSplitFunc`=***`None`***, **`clip`**:`float`=***`None`***, **`alpha`**:`float`=***`2.0`***, **`beta`**:`float`=***`1.0`***, **`metrics`**=***`None`***, **\*\*`learn_kwargs`**) :: [`Learner`](/basic_train.html#Learner)

Basic class for a [`Learner`](/basic_train.html#Learner) in NLP.  

Handles the whole creation from `data` and a `model` for text data. The `split_func` is used to split the model into layer groups for gradual unfreezing and differential learning rates. If `clip` is set, gradient clipping with that value is applied. `alpha` and `beta` are passed to create an instance of [`RNNTrainer`](/callbacks.rnn.html#RNNTrainer). This class can be used for a language model or an RNN classifier. It also handles the conversion of weights from a pretrained model, as well as saving or loading the encoder.
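`alpha` and `beta` weight two extra loss terms. Assuming `RNNTrainer` follows the activation regularization (AR) and temporal activation regularization (TAR) of Merity et al., with AR applied to the dropped-out outputs of the last RNN layer and TAR to its raw outputs, the penalty can be sketched in plain Python as below. `ar_tar_penalty` is a hypothetical helper working on lists of per-timestep vectors, not the library's code.

```python
def ar_tar_penalty(dropped_out, raw_out, alpha=2.0, beta=1.0):
    """Extra loss from AR and TAR (assumed scheme, see lead-in).

    dropped_out: per-timestep activation vectors of the last layer, after dropout.
    raw_out: per-timestep activation vectors of the last layer, before dropout.
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # AR: penalise large activations (mean of squares over timesteps and units).
    ar = mean(x * x for step in dropped_out for x in step)
    # TAR: penalise large changes between consecutive timesteps.
    tar = 0.0
    if len(raw_out) > 1:
        tar = mean((b - a) ** 2
                   for prev, cur in zip(raw_out[:-1], raw_out[1:])
                   for a, b in zip(prev, cur))
    return alpha * ar + beta * tar

dropped = [[1.0, 0.0], [0.0, 1.0]]
raw = [[1.0, 0.0], [0.0, 1.0]]
print(ar_tar_penalty(dropped, raw))  # 2.0  (2 * 0.5 for AR + 1 * 1.0 for TAR)
```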

In [None]:
show_doc(RNNLearner.get_preds)

<h4 id="RNNLearner.get_preds"><code>get_preds</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L77" class="source_link">[source]</a></h4>

> <code>get_preds</code>(**`ds_type`**:[`DatasetType`](/basic_data.html#DatasetType)=***`<DatasetType.Valid: 2>`***, **`with_loss`**:`bool`=***`False`***, **`n_batch`**:`Optional`\[`int`\]=***`None`***, **`pbar`**:`Union`\[`MasterBar`, `ProgressBar`, `NoneType`\]=***`None`***, **`ordered`**:`bool`=***`False`***) → `List`\[`Tensor`\]

Return predictions and targets on the valid, train, or test set, depending on `ds_type`.  

If `ordered=True`, returns the predictions in the order of the dataset; otherwise they are in the order of the sampler (from the longest text to the shortest). The other arguments are passed to [`Learner.get_preds`](/basic_train.html#Learner.get_preds).
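Reordering with `ordered=True` amounts to inverting the sampler's permutation. A minimal sketch, assuming the sampler can report the dataset indices in the order it visits them; `restore_dataset_order` and the toy texts are hypothetical:

```python
def restore_dataset_order(preds, sampler_indices):
    """Undo a sampler's reordering.

    preds[k] is the prediction for dataset item sampler_indices[k]; the result
    has the prediction for item i at position i.
    """
    ordered = [None] * len(preds)
    for pred, dataset_idx in zip(preds, sampler_indices):
        ordered[dataset_idx] = pred
    return ordered

# A sampler that visits texts from the longest to the shortest.
texts = ["a b", "a b c d", "a"]
sampler_indices = sorted(range(len(texts)), key=lambda i: len(texts[i].split()), reverse=True)
preds_in_sampler_order = [f"pred({texts[i]})" for i in sampler_indices]
print(restore_dataset_order(preds_in_sampler_order, sampler_indices))
# ['pred(a b)', 'pred(a b c d)', 'pred(a)']
```

With tensors, the same inversion is usually written as indexing the predictions with the argsort of the sampler indices.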

### Loading and saving

In [None]:
show_doc(RNNLearner.load_encoder)

<h4 id="RNNLearner.load_encoder"><code>load_encoder</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L61" class="source_link">[source]</a></h4>

> <code>load_encoder</code>(**`name`**:`str`)

Load the encoder `name` from the model directory.  

In [None]:
show_doc(RNNLearner.save_encoder)

<h4 id="RNNLearner.save_encoder"><code>save_encoder</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L55" class="source_link">[source]</a></h4>

> <code>save_encoder</code>(**`name`**:`str`)

Save the encoder to `name` inside the model directory.  

In [None]:
show_doc(RNNLearner.load_pretrained)

<h4 id="RNNLearner.load_pretrained"><code>load_pretrained</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L68" class="source_link">[source]</a></h4>

> <code>load_pretrained</code>(**`wgts_fname`**:`str`, **`itos_fname`**:`str`, **`strict`**:`bool`=***`True`***)

Load a pretrained model and adapt it to the data vocabulary.  

Opens the weights in `wgts_fname` and the dictionary in `itos_fname`, then adapts the pretrained weights to the vocabulary of the `data`. Both files should be in the models directory of `learner.path` (`learner.model_dir`).

## Utility functions

In [None]:
show_doc(convert_weights)

<h4 id="convert_weights"><code>convert_weights</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L27" class="source_link">[source]</a></h4>

> <code>convert_weights</code>(**`wgts`**:`Weights`, **`stoi_wgts`**:`Dict`\[`str`, `int`\], **`itos_new`**:`StrList`) → `Weights`

Convert the model `wgts` to go with a new vocabulary.  

Uses the dictionary `stoi_wgts` (mapping word to id) of the pretrained weights to map them to the new vocabulary `itos_new` (mapping id to word).
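The core of the vocabulary conversion is a row lookup: each word of the new vocabulary takes the embedding row it had in the pretrained vocabulary. The sketch below shows that idea on plain lists. `remap_embeddings` is hypothetical, it ignores everything else a real state dict contains, and giving unseen words the mean of the pretrained rows is an assumption of the sketch.

```python
def remap_embeddings(old_rows, stoi_old, itos_new):
    """Build an embedding matrix for `itos_new` from rows trained with vocabulary `stoi_old`.

    old_rows: list of embedding vectors, indexed by the old word ids.
    stoi_old: dict word -> old id.
    itos_new: list of words, position = new id.
    Words missing from the old vocabulary get the mean of all old rows (assumption).
    """
    dim = len(old_rows[0])
    mean_row = [sum(row[j] for row in old_rows) / len(old_rows) for j in range(dim)]
    new_rows = []
    for word in itos_new:
        old_id = stoi_old.get(word)
        new_rows.append(list(old_rows[old_id]) if old_id is not None else list(mean_row))
    return new_rows

old = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
stoi_old = {"the": 0, "movie": 1, "great": 2}
print(remap_embeddings(old, stoi_old, ["movie", "film", "the"]))
# [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
```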

## Get predictions

In [None]:
show_doc(LanguageLearner, title_level=3)

<h3 id="LanguageLearner"><code>class</code> <code>LanguageLearner</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L107" class="source_link">[source]</a></h3>

> <code>LanguageLearner</code>(**`data`**:[`DataBunch`](/basic_data.html#DataBunch), **`model`**:[`Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module), **`split_func`**:`OptSplitFunc`=***`None`***, **`clip`**:`float`=***`None`***, **`alpha`**:`float`=***`2.0`***, **`beta`**:`float`=***`1.0`***, **`metrics`**=***`None`***, **\*\*`learn_kwargs`**) :: [`RNNLearner`](/text.learner.html#RNNLearner)

Subclass of RNNLearner for predictions.  

In [None]:
show_doc(LanguageLearner.predict)

<h4 id="LanguageLearner.predict"><code>predict</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L110" class="source_link">[source]</a></h4>

> <code>predict</code>(**`text`**:`str`, **`n_words`**:`int`=***`1`***, **`no_unk`**:`bool`=***`True`***, **`temperature`**:`float`=***`1.0`***, **`min_p`**:`float`=***`None`***, **`sep`**:`str`=***`' '`***, **`decoder`**=***`'decode_spec_tokens'`***)

Return the `n_words` that come after `text`.  

If `no_unk=True` the unknown token is never picked. Words are taken randomly with the distribution of probabilities returned by the model. If `min_p` is not `None`, that value is the minimum probability to be considered in the pool of words. Lowering `temperature` will make the texts less randomized. 
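The sampling rules above can be sketched for a single step as follows. This assumes `probs` is already a probability distribution over the vocabulary (e.g. after a softmax) and applies the temperature as `p ** (1 / temperature)`; `sample_next`, its arguments and the toy vocabulary are hypothetical, and the library's exact order of operations may differ.

```python
import random

def sample_next(probs, temperature=1.0, min_p=None, unk_idx=None, rng=random):
    """Pick the next token id from `probs` (one probability per token id)."""
    weights = list(probs)
    if unk_idx is not None:   # no_unk=True: the unknown token can never be drawn
        weights[unk_idx] = 0.0
    if min_p is not None:     # drop every candidate below the threshold
        weights = [w if w >= min_p else 0.0 for w in weights]
    if temperature != 1.0:    # T < 1 sharpens the distribution, T > 1 flattens it
        weights = [w ** (1.0 / temperature) for w in weights]
    if sum(weights) == 0:
        raise ValueError("no candidate token left to sample")
    # Draw one id with probability proportional to its weight.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

vocab = ["xxunk", "the", "movie", "was"]
probs = [0.4, 0.3, 0.2, 0.1]
# With the unknown token removed and min_p=0.25, only "the" is left.
print(vocab[sample_next(probs, min_p=0.25, unk_idx=0, rng=random.Random(0))])  # the
```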

In [None]:
show_doc(LanguageLearner.beam_search)

<h4 id="LanguageLearner.beam_search"><code>beam_search</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L128" class="source_link">[source]</a></h4>

> <code>beam_search</code>(**`text`**:`str`, **`n_words`**:`int`, **`top_k`**:`int`=***`10`***, **`beam_sz`**:`int`=***`1000`***, **`temperature`**:`float`=***`1.0`***, **`sep`**:`str`=***`' '`***, **`decoder`**=***`'decode_spec_tokens'`***)

Return the `n_words` that come after `text` using beam search.  
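Beam search keeps several candidate continuations instead of committing to one word at a time. Below is a generic sketch on a hypothetical bigram table: each beam is extended with its `top_k` most likely tokens, and only the `beam_sz` best sequences by summed log-probability survive each step. The parameter names mirror the signature above, but this is not the library's implementation, which works on model outputs and may combine the parameters differently.

```python
import math

def beam_search(next_log_probs, start, n_steps, beam_sz=3, top_k=2):
    """Return the highest-scoring continuation of `start` after `n_steps` tokens.

    next_log_probs(seq) -> dict token -> log-probability of that token following seq.
    """
    beams = [(0.0, list(start))]
    for _ in range(n_steps):
        candidates = []
        for score, seq in beams:
            scored = next_log_probs(seq)
            best = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            for token, logp in best:
                candidates.append((score + logp, seq + [token]))
        # Keep only the beam_sz best sequences overall.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_sz]
    return beams[0][1]

# Hypothetical bigram "model": probability of the next token given the last one.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
}

def next_log_probs(seq):
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}

print(beam_search(next_log_probs, ["<s>"], 2))  # ['<s>', 'b', 'z']
```

Here the greedy first choice `a` (probability 0.6) caps the total at 0.6 * 0.5 = 0.3, while `b` followed by `z` scores 0.4 * 0.9 = 0.36, so the wider search finds the better sequence.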

## Basic functions to get a model

In [None]:
show_doc(get_language_model)

<h4 id="get_language_model"><code>get_language_model</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L175" class="source_link">[source]</a></h4>

> <code>get_language_model</code>(**`arch`**:`Callable`, **`vocab_sz`**:`int`, **`config`**:`dict`=***`None`***, **`drop_mult`**:`float`=***`1.0`***)

Create a language model from `arch` and its `config`.  

In [None]:
show_doc(get_text_classifier)

<h4 id="get_text_classifier"><code>get_text_classifier</code><a href="https://github.com/fastai/fastai/blob/master/fastai/text/learner.py#L233" class="source_link">[source]</a></h4>

> <code>get_text_classifier</code>(**`arch`**:`Callable`, **`vocab_sz`**:`int`, **`n_class`**:`int`, **`bptt`**:`int`=***`70`***, **`max_len`**:`int`=***`1400`***, **`config`**:`dict`=***`None`***, **`drop_mult`**:`float`=***`1.0`***, **`lin_ftrs`**:`Collection`\[`int`\]=***`None`***, **`ps`**:`Collection`\[`float`\]=***`None`***) → [`Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)

Create a text classifier from `arch` and its `config`.  

This model uses an encoder built from `arch` with `config`. The encoder is fed the sequence in successive chunks of size `bptt`, and only the last `max_len` outputs are kept for the pooling layers.
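Feeding a long text chunk by chunk while keeping only the late outputs can be sketched with a toy stateful encoder. `encode_in_chunks` and `RunningSumEncoder` are hypothetical stand-ins: the toy encoder's "hidden state" is a running sum, which carries across chunks the way an RNN's hidden state would. Keeping whole chunks that overlap the last `max_len` positions is an assumption of the sketch, so slightly more than `max_len` outputs may be kept.

```python
class RunningSumEncoder:
    """Toy stand-in for an RNN: its 'hidden state' is the running sum of token ids."""

    def __init__(self):
        self.state = 0

    def __call__(self, chunk):
        outputs = []
        for tok in chunk:
            self.state += tok          # state persists across calls, like an RNN
            outputs.append(self.state)
        return outputs


def encode_in_chunks(tokens, encoder, bptt=70, max_len=1400):
    """Run `encoder` over `tokens` in chunks of `bptt`, keeping only late outputs."""
    n = len(tokens)
    kept = []
    for start in range(0, n, bptt):
        chunk = tokens[start:start + bptt]
        outputs = encoder(chunk)       # always run, so the state sees the whole text
        if start + len(chunk) > n - max_len:   # chunk overlaps the last max_len positions
            kept.extend(outputs)
    return kept

print(encode_in_chunks(list(range(10)), RunningSumEncoder(), bptt=3, max_len=4))
# [21, 28, 36, 45]
```

Even though only the outputs for the last four tokens are kept, their values depend on every earlier token, which is the point of running the encoder over the whole text.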

The decoder uses a concatenation of the last outputs, a `MaxPooling` of all the outputs and an `AveragePooling` of all the outputs. It then uses a list of `BatchNorm`, `Dropout`, `Linear`, `ReLU` blocks (with no `ReLU` in the last one), using a first layer size of `3*emb_sz` and then following the sizes in `lin_ftrs`. The dropout probabilities are read from `ps`.

Note that the model returns a list of three things: the actual output first, then the intermediate hidden states before and after dropout (used by the [`RNNTrainer`](/callbacks.rnn.html#RNNTrainer)). Most loss functions expect a single output, so you should use a Callback to remove the other two if you're not using [`RNNTrainer`](/callbacks.rnn.html#RNNTrainer).
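If you are not using `RNNTrainer`, an alternative to a Callback, assuming you control the loss function, is to wrap it so it only sees the first element of the model's output. `first_output_only` is a hypothetical helper, not part of fastai:

```python
def first_output_only(loss_func):
    """Wrap `loss_func` so it only sees the first element of (output, raw_hidden, dropped_hidden)."""
    def wrapped(model_output, target):
        if isinstance(model_output, (list, tuple)):
            model_output = model_output[0]   # discard the two lists of hidden states
        return loss_func(model_output, target)
    return wrapped

def squared_error(pred, target):
    return (pred - target) ** 2

loss = first_output_only(squared_error)
print(loss((3.0, "raw hidden states", "dropped hidden states"), 1.0))  # 4.0
```

The wrapper passes plain (non-tuple) outputs through unchanged, so the same loss also works for models that return a single tensor.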
