{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deep Q Learning From Demonstrations (DQfD)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have seen a lot about DQN. We started off with vanilla DQN and then we saw various\n",
    "improvements such as double DQN, dueling network architecture, prioritized experience\n",
    "replay. We have also learned to build DQN to play atari games. We stored the agent's\n",
    "interactions with the environment in the experience buffer and made the agent to learn\n",
    "from those experiences. But the problem we encountered is that it took us lot of time for training. For learning in simulated environments it is fine but when we make our agent to\n",
    "learn in the real world environment it will cause a lot of problems. So, to overcome this\n",
    "researchers from Google's DeepMind introduced an improvement over DQN called Deep Q\n",
    "learning from demonstrations (DQfD).\n",
    "\n",
    "If we already have some demonstrations data then we can directly add those\n",
    "demonstrations to the experience replay buffer. For an example, consider an agent learning\n",
    "to play atari games, if we have already some demonstration data which tells our agent\n",
    "which state is better, which action gives good reward in a state then the agent can directly\n",
    "makes use of this data for learning. Even a small amount of demonstrations will increase\n",
    "the agent's performance and also minimizes the training time. Since these demonstrations\n",
    "data will be added directly to the prioritized experience replay buffer, the amount of data\n",
    "the agent can use from demonstration data and amount of data the agent can use from its\n",
    "own interaction for learning will be controlled by prioritized experience replay buffer as the\n",
    "experience will be prioritized.\n",
    "\n",
    "<br>\n",
    "\n",
    "Loss functions in DQfD will be the sum of various losses. In order to prevent our agent\n",
    "from overfitting to the demonstration data, we compute l2 regularization loss over the\n",
    "network weights. We compute TD loss as usual and also supervised loss to see how our\n",
    "agent is learning from the demonstration data. Authors of this paper experimented DQfD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " with various environments and the performance of DQfD is better and faster than\n",
    "prioritized dueling double deep q networks.\n",
    "You can check this video to see how DQfD learned to play private eye game <a href='https://\n",
    "youtu.be/4IFZvqBHsFY'>https://\n",
    "youtu.be/4IFZvqBHsFY</a>"
   ]
  }
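  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the combined loss described above, written as a weighted sum. The symbols $J_{TD}$, $J_E$, $J_{L2}$ and the weights $\\lambda_1$, $\\lambda_2$ are notation assumed here for illustration and are not taken from this text:\n",
    "\n",
    "$$J(Q) = J_{TD}(Q) + \\lambda_1 J_E(Q) + \\lambda_2 J_{L2}(Q)$$\n",
    "\n",
    "One common way to write the supervised term is as a large-margin loss, assuming $a_E$ is the demonstrator's action in state $s$ and $l(a_E, a)$ is a margin that is zero when $a = a_E$ and positive otherwise:\n",
    "\n",
    "$$J_E(Q) = \\max_{a}\\left[Q(s, a) + l(a_E, a)\\right] - Q(s, a_E)$$\n",
    "\n",
    "This term is zero only when the Q value of the demonstrator's action exceeds that of every other action by at least the margin, which is what pushes the agent to imitate the demonstrations. The published DQfD loss also includes an n-step TD term, which is omitted here because this section does not cover it."
   ]
  }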
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:anaconda]",
   "language": "python",
   "name": "conda-env-anaconda-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}