{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quantum Circuit Builder v0\n", "\n", "In this notebook we rely on IBM *qiskit* [1], OpenAI *gym* [2] and the library *stable-baselines* [3] to setup a quantum game and have some artificial reinforcement learning agent play and learn them.\n", "\n", "We setup a very simple game, *qcircuit-v0*, and we compare the performances of different agents playing it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "First of all, let us setup the packages necessary for this simulation as explained in [Setup.ipynb](Setup.ipynb).\n", "\n", "Next, let us import some basic libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import numpy as np\n", "import gym\n", "\n", "from IPython.display import display" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the game\n", "\n", "The game we will run is provided in **gym-qcircuit** [4], and it is implemented complying with the standard OpenAI gym interface. \n", "\n", "The game is a simple *quantum circuit building* game: given a fixed number of qubits and a desired final state for these qubits, the objective is to design a quantum circuit that takes the given qubits to the desired final state. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import qcircuit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The module **qcircuit** offers two versions of the game:\n", "- *qcircuit-v0*: it presents the player with a single qubit, and it requires to design a simple circuit setting this qubit in a perfect superposition.\n", "- *qcircuit-v1*: a slightly more challenging scenario where the player is presented with two qubits and he/she is requested to design a circuit setting the qubits in the state $\\frac{1}{\\sqrt{2}}\\left|00\\right\\rangle +\\frac{1}{\\sqrt{2}}\\left|11\\right\\rangle $.\n", "\n", "Details on the implementation of these games are available at https://github.com/FMZennaro/gym-qcircuit/blob/master/qcircuit/envs/qcircuit_env.py." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## qcircuit-v0\n", "We start loading the first scenario and run agents on it." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "env = gym.make('qcircuit-v0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The game *qcircuit-v0* is *completely observed*, and both its *state space* and *action space* are small.\n", "\n", "Remember that a single qubit is described by $\\alpha\\left|0\\right\\rangle +\\beta\\left|1\\right\\rangle$, where $\\alpha, \\beta$ are complex numbers and $\\left|0\\right\\rangle, \\left|1\\right\\rangle$ are the measurement axes. The state space is then described by four real numbers between -1 and 1 representing the real and complex part of $\\alpha, \\beta$.\n", "\n", "An agent plays the game interacting with a quantum circuit, adding and removing standard gates. In this version of the game there are only three actions available: add an *X gate*, add a *Hadamard gate*, or remove the last inserted gate.\n", "\n", "Again, details on the implementation of the state space and the action space are available at https://github.com/FMZennaro/gym-qcircuit/blob/master/qcircuit/envs/qcircuit_env.py." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random agent\n", "First, we simply run a random agent. 
This allows us to test out the game and see its evolution.\n", "\n", "A random agent selects a possible action from the action space at random and executes it. Given the limited amount of actions (including the possibility of undoing actions by removing a gate), and the simple objective, the random agent should be able to land on the right circuit in a limited amount of actions. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJEAAAB7CAYAAAB0B2LHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAFOElEQVR4nO3dT2gUZxyH8e/sJiY2EpIYgqCiaERIcAMKHtoG8dB2i6ke0iRV9KAQU6uHgLT10j/UNrFJWntoD+lNaJWNYcNeXKShmEZzMEH8l4p7MEUFQQQjColtdqeHUotYmzU/l3fWPh+YywxkfoSHd2aZbMbzfd8XYBByPQDyHxHBjIhgRkQwIyKYERHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgVuB6gHww76ekk/P+/tqbTs77rFiJYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgFpiIMpmMenp6tGrVKhUXF6uurk5DQ0NavXq1du/e7Xq8rPlTU/qj6R1lhk//s+/hQ82079fMZ1/Iz2QcTpcbgYlo165dOnjwoNra2pRMJtXc3KytW7fq2rVrWrdunevxsubNn6/Q241K/3hMvu/LT6eV/rxTKixU+MAH8kKB+ZU/N4F4dnb06FEdOXJEp06d0oYNGyRJGzdu1Llz5xSPx/MqIkkKbX5Lmf64/NNnlDk7Jv/OHRX0dMmbV+h6tJwIRESdnZ2KRqOPAvpbdXW1CgsLtWbNGkeTzY03v1ihpkalu7+WystU8M1X8kpecj1WzjhfW2/evKnLly+rqanpiWPXr19XbW2tioqKcnZ+z/Nm3eZselrhlmZ55eU5my2XW7YCEZEkLVq06LH9U1NTGhoayrtLmSRlBn9WJtYn743XlR5I6EV/84XziCorKyVJqVTqsf1dXV26deuW1q5dm9Pz+74/6/YsMmdHlf72O4U/+Ujh996VJifl/zKcs9lyuWXL+T3RihUrFIlE1NHRoYqKCi1evFj9/f06ceKEJOXVSpQZ/1XpjkMKv79fochf93Ghpkalfzgmr/7VF/KTmRSAlSgUCun48eOqra3Vnj17tHPnTlVWVmrv3r0qKChQJBJxPWJW/InflP74U4XbWhV65eVH+0ObG6R79+a8GuUDL6ivqtqxY4cuXLigixcvuh6FP4+dhfOV6GnGxsby6lL2fxbIiB48eKBUKpXzm2o8H85vrP/NggULlE6nXY+BLAVyJUJ+ISKYERHMiAhmRAQzIoIZEcEssI89kD9YiWBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoIZEcGMiGAWqIgSiYQaGhpUVVWloqIiLVu2TNu2bdOlS5dcj4b/EIivUc/MzGj79u2KxWJasmSJNm3apNLSUqVSKSWTSSUSCUWjUddj4ikC8T8b9+3bp1gsptbWVh0+fFglJSWPjt24cUNlZWU5O/eBL7/P2c/Od4c+zO49c84jGh4eVm9vr6LRqHp7e594McnSpUsdTYZsOb+cNTY2Kh6P6/z586qrq3M5CubIeUSlpaVauHChJiYmnJyfy9nTZXs5c/rpbHJyUvfv39fy5ctdjgEjpyvR3bt3VVFRoZqaGo2Pj7saA0ZOV6Ly8nKtXLlSV65c0eDg4BPHr1696mAqPCvn90R9fX1qaWlROBzWli1bVF1drdu3b2tkZEQ1NTUaGBhwOR6y4DwiSTp58qS6u7s1Ojqq6elpVVVVaf369Wpvb1d9fb3r8TCLQESE/BaoZ2fIT0QEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKC2Z+VCsVwnKYocAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAL4AAAB7CAYAAADKUTqaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAGZUlEQVR4nO3df0zUdRzH8df37hAQRH51sIRQxFwwjk3NtpI5/yhPRd0iIU3bdAGabOFcpX9ULgsMKP2j1mj1h5s/BjIY/8gs1iCMNo85Uch5saOhjSAX58AA5e7bHy7azZIDPD5f7v16bPfP57j7voHnPny/twM0Xdd1EAljUj0AkQoMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CSSRfUAc8G87xqVHPfeixtm9PiS049pkCk48drsH3M6uOOTSAyfRGL4JBLDJ5EYPonE8Ekkhk8iMXwSieGTSIYJ3+v1orKyEsuWLUNYWBiysrLQ0tKC5cuXo7CwUPV4ftNHRnB/26vwtl78d21sDOMlBzH+4cfQvV6F0z3a18VPorP5G581Xdfx5RtR6HbUK5oqMAwT/p49e3D06FEUFRWhsbEReXl52L59O1wuF1auXKl6PL9p4eEwvZILz+mz0HUduscDz0dlQEgIzIfegWYyzJfcx/Cfv+Guuw9PPJXls35nwIV7o0NISF2laLLAMMR7dc6cOYOTJ0+iubkZa9euBQCsW7cOly9fRl1d3ZwKHwBMWzbDW1sH/eKP8F5qh377NiyV5dDmhage7X/1uxzQTGbEJWX4rN/u7cD8hQlYEJesaLLAMET4ZWVlsNvtE9H/Iy0tDSEhIcjMzFQ02fRo4WEwbcuFp+IzICYalhOfQouYr3qsR+p3ORCT+DQs88J91v/o7YB1SXDt9oABwr916xY6Oztx4MCBh+7r7e1FRkYGQkNDA3Z8TdMm/ZiQb89P78lHR2HOz4MWEzOth/sz26O8dcr/f1rf73LA3d+Nqr3xPuv3x4axavNhv59npjPPlK779zkbInwASExM9FkfGRlBS0sLNm7cqGKsGfE2fQ9vdQ209S/BU98AbcN65UFMpr+nHc+9fATPrHndZ/304UwkBOGOr/xKKz7+wQ7jdDp91svLy9HX14cVK1YE9Pi6rk96mwrvJQc8n38B8wfvwfzmXsDthv5Da8Bmexxzu3/vxtjdQaTY1mNBXNLEzXN/FGN/uWGdwoXtTGeerc9Z+Y6fmpoKm82G0tJSxMbGYtGiRaitrcX58w9OL+bSha2362d4So/B/PZBmGwPrktM23LhOXUWWvYaw76i0+9ywBI6/6FXdPp+aUNkXDIiFiYomixwlH8nTCYTzp07h4yMDOzbtw+7d+9GfHw89u/fD4vFApvNpnpEv+g9v8Lz/hGYiwpgeuH5iXXTlhzgzp1p7/qzod/lQMKSZ2Ey++6Dfd0/BeVpDgBo+lR/ls+SXbt2oaOjA1evXlU9Cn/1cAr4q4cz1N7ePqdOc2huMWT4w8PDcDqdAb+wJbmUX9z+l8jISHg8HtVjUBAz5I5PFGgMn0Ri+CQSwyeRGD6JxPBJJIZPIhn2LQtEgcQdn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0QyVPgNDQ3IycmB1WpFaGgoUlJSsGPHDly7dk31aBRkDPEnBMfHx7Fz505UV1cjKSkJmzZtQlRUFJxOJxobG9HQ0AC73a56TAoihvgfWMXFxaiurkZBQQGOHz+OiIiIiftu3ryJ6OjogB370CdfBey5afYde7fQr49THn5rayuqqqpgt9tRVVUFTdN87k9OTlY0GQUz5ac6ubm5qKurw5UrV5CVlTX5A4geA+XhR0VFIS4uDj09PUqOz1Od4OLvqY7SV3XcbjeGhoawePFilWOQQEp3/MHBQcTGxiI9PR1dXV2qxiCBlO74MTExWLp0Ka5fv46mpqaH7r9x44aCqUgC5ef4NTU1yM/Ph9lsxtatW5GWloaBgQG0tbUhPT0d9fX1KsejIKU8fAC4cOECKioq4HA4MDo6CqvVitWrV6OkpATZ2dmqx6MgZIjwiWabod6rQzRbGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJpL8BrWtIpPjnpSwAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "env.reset()\n", "display(env.render())\n", "\n", "done = False\n", "while(not done):\n", " obs, _, done, info = env.step(env.action_space.sample())\n", " display(info['circuit_img'])\n", " \n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PPO2 Agent\n", "\n", "We now run a *PPO2* agent, a more sophisticated agent picked from the library of *stable_baselines*.\n", "\n", "First we import the agent." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:\n", "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", "For more information, please see:\n", " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", " * https://github.com/tensorflow/addons\n", " * https://github.com/tensorflow/io (for I/O related ops)\n", "If you depend on functionality not listed there, please file an issue.\n", "\n" ] } ], "source": [ "from stable_baselines.common.policies import MlpPolicy\n", "from stable_baselines.common.vec_env import DummyVecEnv\n", "from stable_baselines import PPO2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we train it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:66: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/policies.py:115: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/input.py:25: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/policies.py:562: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use keras.layers.flatten instead.\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:332: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Please use `layer.__call__` method instead.\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/a2c/utils.py:156: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/distributions.py:323: The name tf.random_uniform is deprecated. 
Please use tf.random.uniform instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/distributions.py:324: The name tf.log is deprecated. Please use tf.math.log instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:193: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:201: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:209: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:243: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:245: The name tf.summary.merge_all is deprecated. 
Please use tf.compat.v1.summary.merge_all instead.\n", "\n", "--------------------------------------\n", "| approxkl | 7.973169e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000982 |\n", "| fps | 22 |\n", "| n_updates | 1 |\n", "| policy_entropy | 1.0985414 |\n", "| policy_loss | -0.0070484644 |\n", "| serial_timesteps | 128 |\n", "| time_elapsed | 3.58e-06 |\n", "| total_timesteps | 128 |\n", "| value_loss | 3996.909 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 3.2882737e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.000217 |\n", "| fps | 30 |\n", "| n_updates | 2 |\n", "| policy_entropy | 1.0981455 |\n", "| policy_loss | -0.004966901 |\n", "| serial_timesteps | 256 |\n", "| time_elapsed | 5.61 |\n", "| total_timesteps | 256 |\n", "| value_loss | 4248.403 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 6.612671e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000358 |\n", "| fps | 28 |\n", "| n_updates | 3 |\n", "| policy_entropy | 1.0974414 |\n", "| policy_loss | -0.0062959143 |\n", "| serial_timesteps | 384 |\n", "| time_elapsed | 9.82 |\n", "| total_timesteps | 384 |\n", "| value_loss | 3907.8992 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.0456845e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000337 |\n", "| fps | 30 |\n", "| n_updates | 4 |\n", "| policy_entropy | 1.0968205 |\n", "| policy_loss | -0.002253918 |\n", "| serial_timesteps | 512 |\n", "| time_elapsed | 14.4 |\n", "| total_timesteps | 512 |\n", "| value_loss | 4101.919 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00016963946 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -1.24e-05 |\n", "| fps | 28 |\n", "| n_updates | 5 |\n", "| policy_entropy | 1.095028 |\n", "| policy_loss | -0.0103989765 |\n", "| serial_timesteps | 640 |\n", "| time_elapsed | 18.6 |\n", "| total_timesteps | 640 |\n", "| value_loss | 4050.2324 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00015043674 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.00013 |\n", "| fps | 39 |\n", "| n_updates | 6 |\n", "| policy_entropy | 1.0920858 |\n", "| policy_loss | -0.009303682 |\n", "| serial_timesteps | 768 |\n", "| time_elapsed | 23 |\n", "| total_timesteps | 768 |\n", "| value_loss | 3945.5684 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00032779737 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.00017 |\n", "| fps | 28 |\n", "| n_updates | 7 |\n", "| policy_entropy | 1.0865046 |\n", "| policy_loss | -0.01436338 |\n", "| serial_timesteps | 896 |\n", "| time_elapsed | 26.3 |\n", "| total_timesteps | 896 |\n", "| value_loss | 3870.6 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0006958895 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.000924 |\n", "| fps | 37 |\n", "| n_updates | 8 |\n", "| policy_entropy | 1.0740579 |\n", "| policy_loss | -0.019546235 |\n", "| serial_timesteps | 1024 |\n", "| time_elapsed | 30.8 |\n", "| total_timesteps | 1024 |\n", "| value_loss | 3722.3884 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00084322505 |\n", "| clipfrac | 0.0 |\n", "| 
explained_variance | 0.000482 |\n", "| fps | 25 |\n", "| n_updates | 9 |\n", "| policy_entropy | 1.0548321 |\n", "| policy_loss | -0.025074812 |\n", "| serial_timesteps | 1152 |\n", "| time_elapsed | 34.2 |\n", "| total_timesteps | 1152 |\n", "| value_loss | 3957.3313 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------\n", "| approxkl | 0.0018638866 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000548 |\n", "| fps | 35 |\n", "| n_updates | 10 |\n", "| policy_entropy | 1.01555 |\n", "| policy_loss | -0.04078684 |\n", "| serial_timesteps | 1280 |\n", "| time_elapsed | 39.3 |\n", "| total_timesteps | 1280 |\n", "| value_loss | 4177.0903 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0021850825 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.00265 |\n", "| fps | 22 |\n", "| n_updates | 11 |\n", "| policy_entropy | 0.9594531 |\n", "| policy_loss | -0.039690077 |\n", "| serial_timesteps | 1408 |\n", "| time_elapsed | 42.9 |\n", "| total_timesteps | 1408 |\n", "| value_loss | 4124.8867 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0033041902 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000456 |\n", "| fps | 34 |\n", "| n_updates | 12 |\n", "| policy_entropy | 0.8843033 |\n", "| policy_loss | -0.04696823 |\n", "| serial_timesteps | 1536 |\n", "| time_elapsed | 48.7 |\n", "| total_timesteps | 1536 |\n", "| value_loss | 4201.283 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0030925882 |\n", "| clipfrac | 0.021484375 |\n", "| explained_variance | 0.000952 |\n", "| fps | 34 |\n", "| n_updates | 13 |\n", "| policy_entropy | 0.7641637 |\n", "| policy_loss | -0.05795412 |\n", "| serial_timesteps | 1664 |\n", "| time_elapsed | 52.5 |\n", "| total_timesteps | 1664 |\n", "| value_loss | 4440.92 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0038732863 |\n", "| clipfrac | 0.04296875 |\n", "| explained_variance | 0.00518 |\n", "| fps | 22 |\n", "| n_updates | 14 |\n", "| policy_entropy | 0.6456456 |\n", "| policy_loss | -0.061438754 |\n", "| serial_timesteps | 1792 |\n", "| time_elapsed | 56.2 |\n", "| total_timesteps | 1792 |\n", "| value_loss | 4367.134 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.003097237 |\n", "| clipfrac | 0.041015625 |\n", "| explained_variance | 0.00306 |\n", "| fps | 33 |\n", "| n_updates | 15 |\n", "| policy_entropy | 0.51307905 |\n", "| policy_loss | -0.057217635 |\n", "| serial_timesteps | 1920 |\n", "| time_elapsed | 61.7 |\n", "| total_timesteps | 1920 |\n", "| value_loss | 4417.79 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0021201954 |\n", "| clipfrac | 0.03125 |\n", "| explained_variance | -0.0133 |\n", "| fps | 33 |\n", "| n_updates | 16 |\n", "| policy_entropy | 0.40890202 |\n", "| policy_loss | -0.04748139 |\n", "| serial_timesteps | 2048 |\n", "| time_elapsed | 65.5 |\n", "| total_timesteps | 2048 |\n", "| value_loss | 4432.298 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0014529026 |\n", "| clipfrac | 0.01953125 |\n", "| explained_variance | 0.00501 |\n", "| fps | 17 |\n", "| n_updates | 17 |\n", "| policy_entropy | 
0.31277794 |\n", "| policy_loss | -0.03971738 |\n", "| serial_timesteps | 2176 |\n", "| time_elapsed | 69.3 |\n", "| total_timesteps | 2176 |\n", "| value_loss | 4414.0176 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0011339688 |\n", "| clipfrac | 0.01953125 |\n", "| explained_variance | 0.00757 |\n", "| fps | 32 |\n", "| n_updates | 18 |\n", "| policy_entropy | 0.24806914 |\n", "| policy_loss | -0.025814183 |\n", "| serial_timesteps | 2304 |\n", "| time_elapsed | 76.6 |\n", "| total_timesteps | 2304 |\n", "| value_loss | 4313.063 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0007391007 |\n", "| clipfrac | 0.01171875 |\n", "| explained_variance | 0.0316 |\n", "| fps | 23 |\n", "| n_updates | 19 |\n", "| policy_entropy | 0.19068442 |\n", "| policy_loss | -0.02404321 |\n", "| serial_timesteps | 2432 |\n", "| time_elapsed | 80.5 |\n", "| total_timesteps | 2432 |\n", "| value_loss | 4365.6025 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0008076045 |\n", "| clipfrac | 0.01171875 |\n", "| explained_variance | -0.0131 |\n", "| fps | 29 |\n", "| n_updates | 20 |\n", "| policy_entropy | 0.14820632 |\n", "| policy_loss | -0.026867293 |\n", "| serial_timesteps | 2560 |\n", "| time_elapsed | 85.9 |\n", "| total_timesteps | 2560 |\n", "| value_loss | 4335.884 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 5.5856257e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0232 |\n", "| fps | 27 |\n", "| n_updates | 21 |\n", "| policy_entropy | 0.11489105 |\n", "| policy_loss | -0.0046238033 |\n", "| serial_timesteps | 2688 |\n", "| time_elapsed | 90.3 |\n", "| total_timesteps | 2688 |\n", "| value_loss | 4331.92 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.1058022e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 17 |\n", "| n_updates | 22 |\n", "| policy_entropy | 0.09650411 |\n", "| policy_loss | -0.0025608484 |\n", "| serial_timesteps | 2816 |\n", "| time_elapsed | 94.9 |\n", "| total_timesteps | 2816 |\n", "| value_loss | 4296.74 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.2838517e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0345 |\n", "| fps | 28 |\n", "| n_updates | 23 |\n", "| policy_entropy | 0.08733705 |\n", "| policy_loss | -0.004682667 |\n", "| serial_timesteps | 2944 |\n", "| time_elapsed | 102 |\n", "| total_timesteps | 2944 |\n", "| value_loss | 4259.8 |\n", "--------------------------------------\n", "---------------------------------------\n", "| approxkl | 0.000102847254 |\n", "| clipfrac | 0.001953125 |\n", "| explained_variance | -0.0375 |\n", "| fps | 29 |\n", "| n_updates | 24 |\n", "| policy_entropy | 0.07257957 |\n", "| policy_loss | -0.0053370036 |\n", "| serial_timesteps | 3072 |\n", "| time_elapsed | 107 |\n", "| total_timesteps | 3072 |\n", "| value_loss | 4229.9087 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.1270785e-07 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 32 |\n", "| n_updates | 25 |\n", "| policy_entropy | 0.06345923 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 3200 |\n", "| time_elapsed | 111 |\n", "| total_timesteps | 3200 |\n", "| value_loss | 
4198.5474 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 5.508119e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0257 |\n", "| fps | 28 |\n", "| n_updates | 26 |\n", "| policy_entropy | 0.06236457 |\n", "| policy_loss | -0.0010745144 |\n", "| serial_timesteps | 3328 |\n", "| time_elapsed | 115 |\n", "| total_timesteps | 3328 |\n", "| value_loss | 4156.753 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------------------------------------\n", "| approxkl | 1.636944e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 15 |\n", "| n_updates | 27 |\n", "| policy_entropy | 0.055864867 |\n", "| policy_loss | -0.0025804834 |\n", "| serial_timesteps | 3456 |\n", "| time_elapsed | 119 |\n", "| total_timesteps | 3456 |\n", "| value_loss | 4127.7583 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.8690836e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -1.19e-07 |\n", "| fps | 29 |\n", "| n_updates | 28 |\n", "| policy_entropy | 0.0511804 |\n", "| policy_loss | -0.0035447306 |\n", "| serial_timesteps | 3584 |\n", "| time_elapsed | 128 |\n", "| total_timesteps | 3584 |\n", "| value_loss | 4091.4973 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 7.4687875e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 29 |\n", "| policy_entropy | 0.047582556 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 3712 |\n", "| time_elapsed | 132 |\n", "| total_timesteps | 3712 |\n", "| value_loss | 4069.115 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.1875264e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 29 |\n", "| n_updates | 30 |\n", "| policy_entropy | 0.045673173 |\n", "| policy_loss | -0.0009450745 |\n", "| serial_timesteps | 3840 |\n", "| time_elapsed | 136 |\n", "| total_timesteps | 3840 |\n", "| value_loss | 4034.147 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 8.003526e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 31 |\n", "| n_updates | 31 |\n", "| policy_entropy | 0.04141441 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 3968 |\n", "| time_elapsed | 141 |\n", "| total_timesteps | 3968 |\n", "| value_loss | 4008.5713 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 2.251936e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 31 |\n", "| n_updates | 32 |\n", "| policy_entropy | 0.03956447 |\n", "| policy_loss | -0.00078786956 |\n", "| serial_timesteps | 4096 |\n", "| time_elapsed | 145 |\n", "| total_timesteps | 4096 |\n", "| value_loss | 3974.9924 |\n", "---------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.080667e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 32 |\n", "| n_updates | 33 |\n", "| policy_entropy | 0.035701253 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4224 |\n", "| time_elapsed | 149 |\n", "| total_timesteps | 4224 |\n", "| value_loss | 3948.7913 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.4656681e-09 |\n", "| clipfrac | 0.0 
|\n", "| explained_variance | nan |\n", "| fps | 12 |\n", "| n_updates | 34 |\n", "| policy_entropy | 0.03424095 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4352 |\n", "| time_elapsed | 153 |\n", "| total_timesteps | 4352 |\n", "| value_loss | 3919.9714 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.521696e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 35 |\n", "| policy_entropy | 0.034051728 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4480 |\n", "| time_elapsed | 163 |\n", "| total_timesteps | 4480 |\n", "| value_loss | 3890.7466 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 5.814285e-07 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0895 |\n", "| fps | 29 |\n", "| n_updates | 36 |\n", "| policy_entropy | 0.03464252 |\n", "| policy_loss | -0.00025901757 |\n", "| serial_timesteps | 4608 |\n", "| time_elapsed | 168 |\n", "| total_timesteps | 4608 |\n", "| value_loss | 3858.5444 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.0398925e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 2.98e-07 |\n", "| fps | 30 |\n", "| n_updates | 37 |\n", "| policy_entropy | 0.030399777 |\n", "| policy_loss | -0.0020691967 |\n", "| serial_timesteps | 4736 |\n", "| time_elapsed | 172 |\n", "| total_timesteps | 4736 |\n", "| value_loss | 3828.6802 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.3587065e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 38 |\n", "| policy_entropy | 0.02762542 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4864 |\n", "| time_elapsed | 176 |\n", "| total_timesteps | 4864 |\n", "| value_loss | 3804.909 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 5.955296e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 25 |\n", "| n_updates | 39 |\n", "| policy_entropy | 0.026657818 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4992 |\n", "| time_elapsed | 181 |\n", "| total_timesteps | 4992 |\n", "| value_loss | 3777.6287 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.6384682e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 40 |\n", "| policy_entropy | 0.026530726 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 5120 |\n", "| time_elapsed | 186 |\n", "| total_timesteps | 5120 |\n", "| value_loss | 3750.984 |\n", "--------------------------------------\n", "------------------------------------\n", "| approxkl | 6.78427e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 27 |\n", "| n_updates | 41 |\n", "| policy_entropy | 0.026557334 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 5248 |\n", "| time_elapsed | 190 |\n", "| total_timesteps | 5248 |\n", "| value_loss | 3724.9565 |\n", "------------------------------------\n", "----------------------------------------\n", "| approxkl | 1.6480338e-07 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0964 |\n", "| fps | 12 |\n", "| n_updates | 42 |\n", "| policy_entropy | 0.027045451 |\n", "| policy_loss | -0.000119969714 |\n", "| serial_timesteps | 5376 |\n", "| time_elapsed | 195 |\n", "| total_timesteps 
| 5376 |\n", "| value_loss | 3695.2075 |\n", "----------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.0748278e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 1.19e-07 |\n", "| fps | 31 |\n", "| n_updates | 43 |\n", "| policy_entropy | 0.023529774 |\n", "| policy_loss | -0.0020988667 |\n", "| serial_timesteps | 5504 |\n", "| time_elapsed | 205 |\n", "| total_timesteps | 5504 |\n", "| value_loss | 3668.289 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------------------------------------\n", "| approxkl | 9.843582e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0959 |\n", "| fps | 29 |\n", "| n_updates | 44 |\n", "| policy_entropy | 0.021820018 |\n", "| policy_loss | -0.0019928538 |\n", "| serial_timesteps | 5632 |\n", "| time_elapsed | 209 |\n", "| total_timesteps | 5632 |\n", "| value_loss | 3643.6006 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.6955488e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0952 |\n", "| fps | 29 |\n", "| n_updates | 45 |\n", "| policy_entropy | 0.019816618 |\n", "| policy_loss | -0.0025166627 |\n", "| serial_timesteps | 5760 |\n", "| time_elapsed | 213 |\n", "| total_timesteps | 5760 |\n", "| value_loss | 3617.57 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 8.778779e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 46 |\n", "| policy_entropy | 0.017607767 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 5888 |\n", "| time_elapsed | 218 |\n", "| total_timesteps | 5888 |\n", "| value_loss | 3594.7239 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.3879365e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 47 |\n", "| policy_entropy | 0.01695734 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6016 |\n", "| time_elapsed | 222 |\n", "| total_timesteps | 6016 |\n", "| value_loss | 3569.6123 |\n", "--------------------------------------\n", "---------------------------------------\n", "| approxkl | 7.391328e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -1.19e-07 |\n", "| fps | 30 |\n", "| n_updates | 48 |\n", "| policy_entropy | 0.016850296 |\n", "| policy_loss | -4.8967544e-05 |\n", "| serial_timesteps | 6144 |\n", "| time_elapsed | 226 |\n", "| total_timesteps | 6144 |\n", "| value_loss | 3541.0596 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.9383868e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0493 |\n", "| fps | 30 |\n", "| n_updates | 49 |\n", "| policy_entropy | 0.01639788 |\n", "| policy_loss | -0.004105901 |\n", "| serial_timesteps | 6272 |\n", "| time_elapsed | 230 |\n", "| total_timesteps | 6272 |\n", "| value_loss | 3514.154 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.8241103e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 50 |\n", "| policy_entropy | 0.013761687 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6400 |\n", "| time_elapsed | 234 |\n", "| total_timesteps | 6400 |\n", "| value_loss | 3496.9583 |\n", "--------------------------------------\n", "-------------------------------------\n", 
"| approxkl | 4.921573e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 51 |\n", "| policy_entropy | 0.012798915 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6528 |\n", "| time_elapsed | 239 |\n", "| total_timesteps | 6528 |\n", "| value_loss | 3473.544 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.2990284e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 10 |\n", "| n_updates | 52 |\n", "| policy_entropy | 0.0126367025 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6656 |\n", "| time_elapsed | 244 |\n", "| total_timesteps | 6656 |\n", "| value_loss | 3449.3325 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 4.424428e-15 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| n_updates | 53 |\n", "| policy_entropy | 0.01261734 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6784 |\n", "| time_elapsed | 256 |\n", "| total_timesteps | 6784 |\n", "| value_loss | 3425.9346 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 3.002179e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 54 |\n", "| policy_entropy | 0.012623467 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6912 |\n", "| time_elapsed | 261 |\n", "| total_timesteps | 6912 |\n", "| value_loss | 3402.3008 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.8433507e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 55 |\n", "| policy_entropy | 0.012635817 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7040 |\n", "| time_elapsed | 266 |\n", "| total_timesteps | 7040 |\n", "| value_loss | 3379.5684 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 4.046746e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 56 |\n", "| policy_entropy | 0.01264801 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7168 |\n", "| time_elapsed | 270 |\n", "| total_timesteps | 7168 |\n", "| value_loss | 3356.5557 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.0507368e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 57 |\n", "| policy_entropy | 0.0126599725 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7296 |\n", "| time_elapsed | 274 |\n", "| total_timesteps | 7296 |\n", "| value_loss | 3333.3027 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.2151704e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 58 |\n", "| policy_entropy | 0.012672037 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7424 |\n", "| time_elapsed | 279 |\n", "| total_timesteps | 7424 |\n", "| value_loss | 3310.2527 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 3.842457e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 31 |\n", "| n_updates | 59 |\n", "| policy_entropy | 0.012684286 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7552 |\n", "| time_elapsed | 283 |\n", "| 
total_timesteps | 7552 |\n", "| value_loss | 3287.4731 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 5.241878e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 27 |\n", "| n_updates | 60 |\n", "| policy_entropy | 0.012696676 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7680 |\n", "| time_elapsed | 287 |\n", "| total_timesteps | 7680 |\n", "| value_loss | 3264.962 |\n", "-------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------\n", "| approxkl | 4.985035e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 61 |\n", "| policy_entropy | 0.012709266 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7808 |\n", "| time_elapsed | 291 |\n", "| total_timesteps | 7808 |\n", "| value_loss | 3242.7092 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 3.615873e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -3.58e-07 |\n", "| fps | 26 |\n", "| n_updates | 62 |\n", "| policy_entropy | 0.012707609 |\n", "| policy_loss | -3.6573038e-06 |\n", "| serial_timesteps | 7936 |\n", "| time_elapsed | 296 |\n", "| total_timesteps | 7936 |\n", "| value_loss | 3217.091 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 3.4761847e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 63 |\n", "| policy_entropy | 0.011906338 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8064 |\n", "| time_elapsed | 301 |\n", "| total_timesteps | 8064 |\n", "| value_loss | 3198.923 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 9.972234e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 64 |\n", "| policy_entropy | 0.0114687495 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8192 |\n", "| time_elapsed | 305 |\n", "| total_timesteps | 8192 |\n", "| value_loss | 3177.3672 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 9.6275965e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0722 |\n", "| fps | 8 |\n", "| n_updates | 65 |\n", "| policy_entropy | 0.011702726 |\n", "| policy_loss | -5.7652127e-05 |\n", "| serial_timesteps | 8320 |\n", "| time_elapsed | 310 |\n", "| total_timesteps | 8320 |\n", "| value_loss | 3153.4082 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.8454681e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 66 |\n", "| policy_entropy | 0.010539748 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8448 |\n", "| time_elapsed | 326 |\n", "| total_timesteps | 8448 |\n", "| value_loss | 3134.8647 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 5.293721e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 67 |\n", "| policy_entropy | 0.0102127725 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8576 |\n", "| time_elapsed | 331 |\n", "| total_timesteps | 8576 |\n", "| value_loss | 3113.911 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 9.471275e-13 |\n", 
"| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 68 |\n", "| policy_entropy | 0.010160062 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8704 |\n", "| time_elapsed | 335 |\n", "| total_timesteps | 8704 |\n", "| value_loss | 3093.1445 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.822244e-14 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 69 |\n", "| policy_entropy | 0.010157408 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8832 |\n", "| time_elapsed | 340 |\n", "| total_timesteps | 8832 |\n", "| value_loss | 3072.5557 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 1.745386e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 70 |\n", "| policy_entropy | 0.010164159 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8960 |\n", "| time_elapsed | 345 |\n", "| total_timesteps | 8960 |\n", "| value_loss | 3052.1367 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 1.7796338e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0652 |\n", "| fps | 25 |\n", "| n_updates | 71 |\n", "| policy_entropy | 0.0104862265 |\n", "| policy_loss | -1.4320016e-05 |\n", "| serial_timesteps | 9088 |\n", "| time_elapsed | 349 |\n", "| total_timesteps | 9088 |\n", "| value_loss | 3029.3086 |\n", "---------------------------------------\n", "-------------------------------------\n", "| approxkl | 2.333519e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 72 |\n", "| policy_entropy | 0.009473039 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9216 |\n", "| time_elapsed | 354 |\n", "| total_timesteps | 9216 |\n", "| value_loss | 3011.7778 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.870047e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 73 |\n", "| policy_entropy | 0.009099268 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9344 |\n", "| time_elapsed | 359 |\n", "| total_timesteps | 9344 |\n", "| value_loss | 2990.942 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.6176166e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 27 |\n", "| n_updates | 74 |\n", "| policy_entropy | 0.009037095 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9472 |\n", "| time_elapsed | 363 |\n", "| total_timesteps | 9472 |\n", "| value_loss | 2970.5623 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.3288071e-15 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 75 |\n", "| policy_entropy | 0.009030421 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9600 |\n", "| time_elapsed | 368 |\n", "| total_timesteps | 9600 |\n", "| value_loss | 2949.9497 |\n", "--------------------------------------\n", "---------------------------------------\n", "| approxkl | 1.1359422e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0324 |\n", "| fps | 27 |\n", "| n_updates | 76 |\n", "| policy_entropy | 0.009321555 |\n", "| policy_loss | -6.9826376e-05 |\n", "| serial_timesteps | 9728 |\n", "| time_elapsed | 373 |\n", "| 
total_timesteps | 9728 |\n", "| value_loss | 2924.0024 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.3084932e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 77 |\n", "| policy_entropy | 0.008434594 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9856 |\n", "| time_elapsed | 378 |\n", "| total_timesteps | 9856 |\n", "| value_loss | 2909.5547 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------\n", "| approxkl | 6.620775e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 78 |\n", "| policy_entropy | 0.008057287 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9984 |\n", "| time_elapsed | 382 |\n", "| total_timesteps | 9984 |\n", "| value_loss | 2889.783 |\n", "-------------------------------------\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env = DummyVecEnv([lambda: env])\n", "modelPPO2 = PPO2(MlpPolicy, env, verbose=1)\n", "modelPPO2.learn(total_timesteps=10000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last, we test it by letting it play the game." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJEAAAB7CAYAAAB0B2LHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAFO0lEQVR4nO3dT2iTdxzH8U//uDhbo226tMx21Vona2gC1rnLRDwtqEOYB9HpQJkysbDqZfMg7KSbCu42MraDMIWqtPSiCB5ahOyQIlYrYiwtVEfXdqwZrbNOa3YYE0Lnmu5r+D3J3i/I5Ze2z/fwzu95Qto+Rel0Oi3AoNj1AMh/RAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoJZqesB8kHbWTfH/fpDN8edL3YimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYeSaiZ8+e6dSpU1q1apUWLlyoSCSinp4erV69Wvv373c93rx81/q6+ru/z1hLp9P65mO/BhKdjqbKHc987LF37151dnbq6NGjamlpUTwe144dOzQ+Pq7Dhw+7Hi9rU7/+pIepEb32RiRj/bexQf0xPanqhrWOJssdT0R07tw5nTlzRt3d3dqwYYMkaePGjbp+/bo6OjrU0tLieMLsjQ4mVFRcokBtKGP9l+E+LVpSrcWBOkeT5Y4nTmfHjx9XNBp9HtDfGhsbtWDBAjU3NzuabP5GBxOqqHlTpa+8mrE+Ptyn4IrC24UkD+xEDx48UH9/vw4dOjTrueHhYYVCIfl8vpwdv6ioaM6v+fSH7G86MDqYUGp0QLFPqjLWnzye0tr3j7z02XIp25steCIiSaqpqclYf/TokXp6erRp0yYXY/1no0O9eueDL/TWux9lrJ890qzqAt2JnJ/Oqqr+esUmk8mM9RMnTmhkZERr1qzJ6fHT6fScj2ylfh7Q44cTqg+/p8WB2uePmSfTevx7SsF5XlRnM1suH9lyvhM1NDQoHA7r2LFjqqys1LJly3Tx4kVdunRJkvLuorrUt2jWO7ORe3GVB+pUtqTa0WS55XwnKi4u1oULFxQKhXTgwAHt2bNHVVVVOnjwoEpLSxUOh12PmLXRwYSqV7yt4pLM1+bIwI8FeyqTpCKv3qpq9+7d6uvr082bN12Pwq/HzsH5TvQivb29eXUq+z/zZERTU1NKJpM5v6jGy+H8wvqflJeXa2ZmxvUYyJIndyLkFyKCGRHBjIhgRkQwIyKYERHMPPuxB/IHOxHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMPBVRV1eXtmzZomAwKJ/Pp/r6eu3cuVO3bt1yPRr+hSf+jPrp06fatWuX2tvbVVtbq82bN8vv9yuZTOry5cvq6upSNBp1PSZewBP/s7G1tVXt7e3at2+fTp8+rbKysufP3b9/X0uXLs3ZsT//6tuc/ex89+Vn2d1nznlE165dUywWUzQaVSwWm3VTlLq6wru1U6Fxfjrbtm2bOjo6dOPGDUUikbm/AZ7jPCK/369AIKChoSEnx+d09mLZns6cvjtLpVKanJzU8uXLXY4BI6c70cTEhCorK9XU1KTbt2+7GgNGTneiiooKrVy5Unfu3NHVq1dnPX/37l0HU2G+nF8TnT9/Xtu3b1dJSYm2bt2qxsZGjY2NKR6Pq6mpSZ2dhXcL8ELjPCJJunLlik6ePKlEIqHp6WkFg0GtW7dObW1tWr9+vevxMAdPRIT85qnPzpCfiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRASzPwHPJLsuj1leygAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs = env.reset()\n", "display(env.render())\n", "\n", "for _ in range(1):\n", " action, _states = modelPPO2.predict(obs)\n", " obs, _, done, info = env.step(action)\n", " display(info[0]['circuit_img'])\n", " \n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the agent easily learned the optimal circuit. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A2C Agent\n", "\n", "For comparison, we now run an *A2C* agent, another agent from the library of *stable_baselines*.\n", "\n", "First we import the agent." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from stable_baselines import A2C" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We train it." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:312: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:312: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/a2c/a2c.py:159: The name tf.train.RMSPropOptimizer is deprecated. Please use tf.compat.v1.train.RMSPropOptimizer instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Call initializer instance with the dtype argument instead of passing it to the constructor\n", "---------------------------------\n", "| explained_variance | 0.0313 |\n", "| fps | 7 |\n", "| nupdates | 1 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 5 |\n", "| value_loss | 9.92e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0.00121 |\n", "| fps | 12 |\n", "| nupdates | 100 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 500 |\n", "| value_loss | 7.91e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0.000339 |\n", "| fps | 16 |\n", "| nupdates | 200 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 1000 |\n", "| value_loss | 7.81e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | -0.0287 |\n", "| fps | 18 |\n", "| nupdates | 300 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 1500 |\n", "| value_loss | 9.76e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | -0.00107 |\n", "| fps | 20 |\n", "| nupdates | 400 |\n", "| policy_entropy | 1.09 |\n", "| total_timesteps | 2000 |\n", "| value_loss | 7.64e+03 |\n", "---------------------------------\n", "----------------------------------\n", "| explained_variance | -1.19e-07 |\n", "| fps | 21 |\n", "| nupdates | 500 |\n", "| policy_entropy | 1.07 |\n", "| total_timesteps | 2500 
|\n", "| value_loss | 9.46e+03 |\n", "----------------------------------\n", "---------------------------------\n", "| explained_variance | -0.00371 |\n", "| fps | 21 |\n", "| nupdates | 600 |\n", "| policy_entropy | 1.06 |\n", "| total_timesteps | 3000 |\n", "| value_loss | 3.73e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 23 |\n", "| nupdates | 700 |\n", "| policy_entropy | 1 |\n", "| total_timesteps | 3500 |\n", "| value_loss | 8.85e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 23 |\n", "| nupdates | 800 |\n", "| policy_entropy | 0.939 |\n", "| total_timesteps | 4000 |\n", "| value_loss | 7.97e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 23 |\n", "| nupdates | 900 |\n", "| policy_entropy | 0.8 |\n", "| total_timesteps | 4500 |\n", "| value_loss | 6.86e+03 |\n", "---------------------------------\n", "----------------------------------\n", "| explained_variance | -1.19e-07 |\n", "| fps | 23 |\n", "| nupdates | 1000 |\n", "| policy_entropy | 0.661 |\n", "| total_timesteps | 5000 |\n", "| value_loss | 3.6e+03 |\n", "----------------------------------\n", "---------------------------------\n", "| explained_variance | -1.57 |\n", "| fps | 24 |\n", "| nupdates | 1100 |\n", "| policy_entropy | 0.554 |\n", "| total_timesteps | 5500 |\n", "| value_loss | 5.12e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 24 |\n", "| nupdates | 1200 |\n", "| policy_entropy | 0.375 |\n", "| total_timesteps | 6000 |\n", "| value_loss | 4.35e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1300 |\n", "| policy_entropy | 0.264 |\n", "| total_timesteps | 6500 |\n", "| value_loss | 3.79e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1400 |\n", "| policy_entropy | 0.172 |\n", "| total_timesteps | 7000 |\n", "| value_loss | 3.24e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1500 |\n", "| policy_entropy | 0.159 |\n", "| total_timesteps | 7500 |\n", "| value_loss | 2.74e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1600 |\n", "| policy_entropy | 0.122 |\n", "| total_timesteps | 8000 |\n", "| value_loss | 2.28e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1700 |\n", "| policy_entropy | 0.103 |\n", "| total_timesteps | 8500 |\n", "| value_loss | 1.86e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 23 |\n", "| nupdates | 1800 |\n", "| policy_entropy | 0.0669 |\n", "| total_timesteps | 9000 |\n", "| value_loss | 1.49e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 23 |\n", "| nupdates | 1900 |\n", "| policy_entropy | 0.0509 |\n", "| total_timesteps | 9500 |\n", "| value_loss | 1.16e+03 |\n", 
"---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 2000 |\n", "| policy_entropy | 0.0564 |\n", "| total_timesteps | 10000 |\n", "| value_loss | 870 |\n", "---------------------------------\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "modelA2C = A2C(MlpPolicy, env, verbose=1)\n", "modelA2C.learn(total_timesteps=10000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we test it by letting it play the game." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJEAAAB7CAYAAAB0B2LHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAFO0lEQVR4nO3dT2iTdxzH8U//uDhbo226tMx21Vona2gC1rnLRDwtqEOYB9HpQJkysbDqZfMg7KSbCu42MraDMIWqtPSiCB5ahOyQIlYrYiwtVEfXdqwZrbNOa3YYE0Lnmu5r+D3J3i/I5Ze2z/fwzu95Qto+Rel0Oi3AoNj1AMh/RAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoJZqesB8kHbWTfH/fpDN8edL3YimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYeSaiZ8+e6dSpU1q1apUWLlyoSCSinp4erV69Wvv373c93rx81/q6+ru/z1hLp9P65mO/BhKdjqbKHc987LF37151dnbq6NGjamlpUTwe144dOzQ+Pq7Dhw+7Hi9rU7/+pIepEb32RiRj/bexQf0xPanqhrWOJssdT0R07tw5nTlzRt3d3dqwYYMkaePGjbp+/bo6OjrU0tLieMLsjQ4mVFRcokBtKGP9l+E+LVpSrcWBOkeT5Y4nTmfHjx9XNBp9HtDfGhsbtWDBAjU3NzuabP5GBxOqqHlTpa+8mrE+Ptyn4IrC24UkD+xEDx48UH9/vw4dOjTrueHhYYVCIfl8vpwdv6ioaM6v+fSH7G86MDqYUGp0QLFPqjLWnzye0tr3j7z02XIp25steCIiSaqpqclYf/TokXp6erRp0yYXY/1no0O9eueDL/TWux9lrJ890qzqAt2JnJ/Oqqr+esUmk8mM9RMnTmhkZERr1qzJ6fHT6fScj2ylfh7Q44cTqg+/p8WB2uePmSfTevx7SsF5XlRnM1suH9lyvhM1NDQoHA7r2LFjqqys1LJly3Tx4kVdunRJkvLuorrUt2jWO7ORe3GVB+pUtqTa0WS55XwnKi4u1oULFxQKhXTgwAHt2bNHVVVVOnjwoEpLSxUOh12PmLXRwYSqV7yt4pLM1+bIwI8FeyqTpCKv3qpq9+7d6uvr082bN12Pwq/HzsH5TvQivb29eXUq+z/zZERTU1NKJpM5v6jGy+H8wvqflJeXa2ZmxvUYyJIndyLkFyKCGRHBjIhgRkQwIyKYERHMPPuxB/IHOxHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMPBVRV1eXtmzZomAwKJ/Pp/r6eu3cuVO3bt1yPRr+hSf+jPrp06fatWuX2tvbVVtbq82bN8vv9yuZTOry5cvq6upSNBp1PSZewBP/s7G1tVXt7e3at2+fTp8+rbKysufP3b9/X0uXLs3ZsT//6tuc/ex89+Vn2d1nznlE165dUywWUzQaVSwWm3VTlLq6wru1U6Fxfjrbtm2bOjo6dOPGDUUikbm/AZ7jPCK/369AIKChoSEnx+d09mLZns6cvjtLpVKanJzU8uXLXY4BI6c70cTEhCorK9XU1KTbt2+7GgNGTneiiooKrVy5Unfu3NHVq1dnPX/37l0HU2G+nF8TnT9/Xtu3b1dJSYm2bt2qxsZGjY2NKR6Pq6mpSZ2dhXcL8ELjPCJJunLlik6ePKlEIqHp6WkFg0GtW7dObW1tWr9+vevxMAdPRIT85qnPzpCfiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRASzPwHPJLsuj1leygAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs = env.reset()\n", "display(env.render())\n", "\n", "for _ in range(1):\n", " action, _states = modelA2C.predict(obs)\n", " obs, _, done, info = env.step(action)\n", " display(info[0]['circuit_img'])\n", " \n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison of the agents\n", "\n", "Finally, we compare the agents quantitavely by contrasting their average reward computed running 1000 episodes of the game. We rely on the *evaluation* module that provides simple and standard routines to evaluate the agents." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import evaluation\n", "n_episodes = 1000\n", "\n", "PPO2_perf, _ = evaluation.evaluate_model(modelPPO2, env, num_steps=n_episodes)\n", "A2C_perf, _ = evaluation.evaluate_model(modelA2C, env, num_steps=n_episodes)\n", "\n", "env = gym.make('qcircuit-v0')\n", "rand_perf, _ = evaluation.evaluate_random(env, num_steps=n_episodes)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean performance of random agent (out of 1000 episodes): 97.674\n", "Mean performance of PPO2 agent (out of 1000 episodes): 99.9\n", "Mean performance of A2C agent (out of 1000 episodes): 99.893\n" ] } ], "source": [ "print('Mean performance of random agent (out of {0} episodes): {1}'.format(n_episodes,rand_perf))\n", "print('Mean performance of PPO2 agent (out of {0} episodes): {1}'.format(n_episodes,PPO2_perf))\n", "print('Mean performance of A2C agent (out of {0} episodes): {1}'.format(n_episodes,A2C_perf))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected the reinforcement learning agents (PPO2, A2C) learned to play the game optimally. The random agent is still able to play and reach a solution given the small state and action space available; its average reward, however, is clearly inferior; on average it takes the random agent two and half more actions (or guesses) than PPO2/A2C per episode to reach the solution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "[1] IBM qiskit, https://qiskit.org/\n", "\n", "[2] OpenAI gym, http://gym.openai.com/docs/\n", "\n", "[3] stable-baselines, https://github.com/hill-a/stable-baselines\n", "\n", "[4] gym-qcircuit, https://github.com/FMZennaro/gym-qcircuit" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }