{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training tweaks for an RNN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [],
   "source": [
    "from fastai.gen_doc.nbdoc import *\n",
    "from fastai.callbacks.rnn import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This callback groups together a few tweaks for training RNNs properly. They all come from [this article](https://arxiv.org/abs/1708.02182) by Stephen Merity et al.\n",
    "\n",
    "**Activation Regularization (AR):** on top of weight decay, we apply a similar form of regularization: we add to the loss a scaled sum of the squares of the outputs (with dropout applied) of the various layers of the RNN. Where weight decay pushes the network to learn small weights, AR pushes it to produce small activations.\n",
    "\n",
    "**Temporal Activation Regularization (TAR):** we also add to the loss a scaled sum of the squares of `h_(t+1) - h_t`, where `h_t` is the output (before dropout is applied) of one layer of the RNN at time step `t` (word `t` of the sentence). This encourages the model to produce activations that don't vary too fast between two consecutive words of the sentence."
   ]
  },
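  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch in symbols, the two penalties described above could be written as follows for a single layer. The notation is an assumption made for illustration and is not taken from the library: $\\tilde{h}_t$ is the layer output at time step $t$ with dropout applied, $h_t$ is the same output before dropout, $T$ is the sequence length, $\\mathcal{L}$ is the loss without these penalties, and $\\alpha$ and $\\beta$ are the scales for AR and TAR.\n",
    "\n",
    "$$\\mathcal{L}_{\\text{reg}} = \\mathcal{L} + \\alpha \\sum_{t=1}^{T} \\lVert \\tilde{h}_t \\rVert_2^2 + \\beta \\sum_{t=1}^{T-1} \\lVert h_{t+1} - h_t \\rVert_2^2$$\n",
    "\n",
    "The text does not say whether the sums are normalized (for example, averaged over elements). Any such normalization would only rescale the effective values of $\\alpha$ and $\\beta$."
   ]
  },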
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h2 id=\"RNNTrainer\"><code>class</code> <code>RNNTrainer</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/callbacks/rnn.py#L8\" class=\"source_link\">[source]</a></h2>\n",
       "\n",
       "> <code>RNNTrainer</code>(**`learn`**, **`alpha`**:`float`=***`0.0`***, **`beta`**:`float`=***`0.0`***) :: [`LearnerCallback`](/basic_train.html#LearnerCallback)\n",
       "\n",
       "[`Callback`](/callback.html#Callback) that regroups lr adjustment to seq_len, AR and TAR.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(RNNTrainer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a [`Callback`](/callback.html#Callback) that adds the RNN training tweaks to `learn` for training on data with `bptt`. `alpha` is the scale for AR and `beta` is the scale for TAR. Both default to `0.0`, so neither penalty contributes to the loss unless you set it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Callback methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You don't call these yourself - they're called by fastai's [`Callback`](/callback.html#Callback) system automatically to enable the class's functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"RNNTrainer.on_epoch_begin\"><code>on_epoch_begin</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/callbacks/rnn.py#L15\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>on_epoch_begin</code>(**\\*\\*`kwargs`**)\n",
       "\n",
       "Reset the hidden state of the model.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(RNNTrainer.on_epoch_begin)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"RNNTrainer.on_loss_begin\"><code>on_loss_begin</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/callbacks/rnn.py#L19\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>on_loss_begin</code>(**`last_output`**:`Tuple`\\[`Tensor`, `Tensor`, `Tensor`\\], **\\*\\*`kwargs`**)\n",
       "\n",
       "Save the extra outputs for later and only returns the true output.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(RNNTrainer.on_loss_begin)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fastai RNNs return a `last_output` that is a tuple of three elements: the true output (which is returned) and the hidden states before and after dropout (which are saved internally for the next function)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": true
   },
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<h4 id=\"RNNTrainer.on_backward_begin\"><code>on_backward_begin</code><a href=\"https://github.com/fastai/fastai/blob/master/fastai/callbacks/rnn.py#L24\" class=\"source_link\">[source]</a></h4>\n",
       "\n",
       "> <code>on_backward_begin</code>(**`last_loss`**:`Rank0Tensor`, **`last_input`**:`Tensor`, **\\*\\*`kwargs`**)\n",
       "\n",
       "Apply AR and TAR to `last_loss`.  "
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_doc(RNNTrainer.on_backward_begin)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Undocumented Methods - Methods moved below this line will intentionally be hidden"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New Methods - Please document or move to the undocumented section"
   ]
  }
 ],
 "metadata": {
  "jekyll": {
   "keywords": "fastai",
   "summary": "Implementation of a callback for RNN training",
   "title": "callbacks.rnn"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}