{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quantum Circuit Builder v0\n", "\n", "In this notebook we rely on IBM *qiskit* [1], OpenAI *gym* [2] and the library *stable-baselines* [3] to setup a quantum game and have some artificial reinforcement learning agent play and learn them.\n", "\n", "We setup a very simple game, *qcircuit-v0*, and we compare the performances of different agents playing it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "First of all, let us setup the packages necessary for this simulation as explained in [Setup.ipynb](Setup.ipynb).\n", "\n", "Next, let us import some basic libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import numpy as np\n", "import gym\n", "\n", "from IPython.display import display" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the game\n", "\n", "The game we will run is provided in **gym-qcircuit** [4], and it is implemented complying with the standard OpenAI gym interface. \n", "\n", "The game is a simple *quantum circuit building* game: given a fixed number of qubits and a desired final state for these qubits, the objective is to design a quantum circuit that takes the given qubits to the desired final state. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import qcircuit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The module **qcircuit** offers two versions of the game:\n", "- *qcircuit-v0*: it presents the player with a single qubit, and it requires to design a simple circuit setting this qubit in a perfect superposition.\n", "- *qcircuit-v1*: a slightly more challenging scenario where the player is presented with two qubits and he/she is requested to design a circuit setting the qubits in the state $\\frac{1}{\\sqrt{2}}\\left|00\\right\\rangle +\\frac{1}{\\sqrt{2}}\\left|11\\right\\rangle $.\n", "\n", "Details on the implementation of these games are available at https://github.com/FMZennaro/gym-qcircuit/blob/master/qcircuit/envs/qcircuit_env.py." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## qcircuit-v0\n", "We start loading the first scenario and run agents on it." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "env = gym.make('qcircuit-v0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The game *qcircuit-v0* is *completely observed*, and both its *state space* and *action space* are small.\n", "\n", "Remember that a single qubit is described by $\\alpha\\left|0\\right\\rangle +\\beta\\left|1\\right\\rangle$, where $\\alpha, \\beta$ are complex numbers and $\\left|0\\right\\rangle, \\left|1\\right\\rangle$ are the measurement axes. The state space is then described by four real numbers between -1 and 1 representing the real and complex part of $\\alpha, \\beta$.\n", "\n", "An agent plays the game interacting with a quantum circuit, adding and removing standard gates. In this version of the game there are only three actions available: add an *X gate*, add a *Hadamard gate*, or remove the last inserted gate.\n", "\n", "Again, details on the implementation of the state space and the action space are available at https://github.com/FMZennaro/gym-qcircuit/blob/master/qcircuit/envs/qcircuit_env.py." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random agent\n", "First, we simply run a random agent. 
This allows us to test out the game and see its evolution.\n", "\n", "A random agent selects a possible action from the action space at random and executes it. Given the limited amount of actions (including the possibility of undoing actions by removing a gate), and the simple objective, the random agent should be able to land on the right circuit in a limited amount of actions. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJEAAAB7CAYAAAB0B2LHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAFOElEQVR4nO3dT2gUZxyH8e/sJiY2EpIYgqCiaERIcAMKHtoG8dB2i6ke0iRV9KAQU6uHgLT10j/UNrFJWntoD+lNaJWNYcNeXKShmEZzMEH8l4p7MEUFQQQjColtdqeHUotYmzU/l3fWPh+YywxkfoSHd2aZbMbzfd8XYBByPQDyHxHBjIhgRkQwIyKYERHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgVuB6gHww76ekk/P+/tqbTs77rFiJYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgFpiIMpmMenp6tGrVKhUXF6uurk5DQ0NavXq1du/e7Xq8rPlTU/qj6R1lhk//s+/hQ82079fMZ1/Iz2QcTpcbgYlo165dOnjwoNra2pRMJtXc3KytW7fq2rVrWrdunevxsubNn6/Q241K/3hMvu/LT6eV/rxTKixU+MAH8kKB+ZU/N4F4dnb06FEdOXJEp06d0oYNGyRJGzdu1Llz5xSPx/MqIkkKbX5Lmf64/NNnlDk7Jv/OHRX0dMmbV+h6tJwIRESdnZ2KRqOPAvpbdXW1CgsLtWbNGkeTzY03v1ihpkalu7+WystU8M1X8kpecj1WzjhfW2/evKnLly+rqanpiWPXr19XbW2tioqKcnZ+z/Nm3eZselrhlmZ55eU5my2XW7YCEZEkLVq06LH9U1NTGhoayrtLmSRlBn9WJtYn743XlR5I6EV/84XziCorKyVJqVTqsf1dXV26deuW1q5dm9Pz+74/6/YsMmdHlf72O4U/+Ujh996VJifl/zKcs9lyuWXL+T3RihUrFIlE1NHRoYqKCi1evFj9/f06ceKEJOXVSpQZ/1XpjkMKv79fochf93Ghpkalfzgmr/7VF/KTmRSAlSgUCun48eOqra3Vnj17tHPnTlVWVmrv3r0qKChQJBJxPWJW/InflP74U4XbWhV65eVH+0ObG6R79+a8GuUDL6ivqtqxY4cuXLigixcvuh6FP4+dhfOV6GnGxsby6lL2fxbIiB48eKBUKpXzm2o8H85vrP/NggULlE6nXY+BLAVyJUJ+ISKYERHMiAhmRAQzIoIZEcEssI89kD9YiWBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoIZEcGMiGAWqIgSiYQaGhpUVVWloqIiLVu2TNu2bdOlS5dcj4b/EIivUc/MzGj79u2KxWJasmSJNm3apNLSUqVSKSWTSSUSCUWjUddj4ikC8T8b9+3bp1gsptbWVh0+fFglJSWPjt24cUNlZWU5O/eBL7/P2c/Od4c+zO49c84jGh4eVm9vr6LRqHp7e594McnSpUsdTYZsOb+cNTY2Kh6P6/z586qrq3M5CubIeUSlpaVauHChJiYmnJyfy9nTZXs5c/rpbHJyUvfv39fy5ctdjgEjpyvR3bt3VVFRoZqaGo2Pj7saA0ZOV6Ly8nKtXLlSV65c0eDg4BPHr1696mAqPCvn90R9fX1qaWlROBzWli1bVF1drdu3b2tkZEQ1NTUaGBhwOR6y4DwiSTp58qS6u7s1Ojqq6elpVVVVaf369Wpvb1d9fb3r8TCLQESE/BaoZ2fIT0QEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKC2Z+VCsVwnKYocAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAL4AAAB7CAYAAADKUTqaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAGZUlEQVR4nO3df0zUdRzH8df37hAQRH51sIRQxFwwjk3NtpI5/yhPRd0iIU3bdAGabOFcpX9ULgsMKP2j1mj1h5s/BjIY/8gs1iCMNo85Uch5saOhjSAX58AA5e7bHy7azZIDPD5f7v16bPfP57j7voHnPny/twM0Xdd1EAljUj0AkQoMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CSSRfUAc8G87xqVHPfeixtm9PiS049pkCk48drsH3M6uOOTSAyfRGL4JBLDJ5EYPonE8Ekkhk8iMXwSieGTSIYJ3+v1orKyEsuWLUNYWBiysrLQ0tKC5cuXo7CwUPV4ftNHRnB/26vwtl78d21sDOMlBzH+4cfQvV6F0z3a18VPorP5G581Xdfx5RtR6HbUK5oqMAwT/p49e3D06FEUFRWhsbEReXl52L59O1wuF1auXKl6PL9p4eEwvZILz+mz0HUduscDz0dlQEgIzIfegWYyzJfcx/Cfv+Guuw9PPJXls35nwIV7o0NISF2laLLAMMR7dc6cOYOTJ0+iubkZa9euBQCsW7cOly9fRl1d3ZwKHwBMWzbDW1sH/eKP8F5qh377NiyV5dDmhage7X/1uxzQTGbEJWX4rN/u7cD8hQlYEJesaLLAMET4ZWVlsNvtE9H/Iy0tDSEhIcjMzFQ02fRo4WEwbcuFp+IzICYalhOfQouYr3qsR+p3ORCT+DQs88J91v/o7YB1SXDt9oABwr916xY6Oztx4MCBh+7r7e1FRkYGQkNDA3Z8TdMm/ZiQb89P78lHR2HOz4MWEzOth/sz26O8dcr/f1rf73LA3d+Nqr3xPuv3x4axavNhv59npjPPlK779zkbInwASExM9FkfGRlBS0sLNm7cqGKsGfE2fQ9vdQ209S/BU98AbcN65UFMpr+nHc+9fATPrHndZ/304UwkBOGOr/xKKz7+wQ7jdDp91svLy9HX14cVK1YE9Pi6rk96mwrvJQc8n38B8wfvwfzmXsDthv5Da8Bmexxzu3/vxtjdQaTY1mNBXNLEzXN/FGN/uWGdwoXtTGeerc9Z+Y6fmpoKm82G0tJSxMbGYtGiRaitrcX58w9OL+bSha2362d4So/B/PZBmGwPrktM23LhOXUWWvYaw76i0+9ywBI6/6FXdPp+aUNkXDIiFiYomixwlH8nTCYTzp07h4yMDOzbtw+7d+9GfHw89u/fD4vFApvNpnpEv+g9v8Lz/hGYiwpgeuH5iXXTlhzgzp1p7/qzod/lQMKSZ2Ey++6Dfd0/BeVpDgBo+lR/ls+SXbt2oaOjA1evXlU9Cn/1cAr4q4cz1N7ePqdOc2huMWT4w8PDcDqdAb+wJbmUX9z+l8jISHg8HtVjUBAz5I5PFGgMn0Ri+CQSwyeRGD6JxPBJJIZPIhn2LQtEgcQdn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0QyVPgNDQ3IycmB1WpFaGgoUlJSsGPHDly7dk31aBRkDPEnBMfHx7Fz505UV1cjKSkJmzZtQlRUFJxOJxobG9HQ0AC73a56TAoihvgfWMXFxaiurkZBQQGOHz+OiIiIiftu3ryJ6OjogB370CdfBey5afYde7fQr49THn5rayuqqqpgt9tRVVUFTdN87k9OTlY0GQUz5ac6ubm5qKurw5UrV5CVlTX5A4geA+XhR0VFIS4uDj09PUqOz1Od4OLvqY7SV3XcbjeGhoawePFilWOQQEp3/MHBQcTGxiI9PR1dXV2qxiCBlO74MTExWLp0Ka5fv46mpqaH7r9x44aCqUgC5ef4NTU1yM/Ph9lsxtatW5GWloaBgQG0tbUhPT0d9fX1KsejIKU8fAC4cOECKioq4HA4MDo6CqvVitWrV6OkpATZ2dmqx6MgZIjwiWabod6rQzRbGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJJIZPIjF8Eonhk0gMn0Ri+CQSwyeRGD6JxPBJpL8BrWtIpPjnpSwAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "env.reset()\n", "display(env.render())\n", "\n", "done = False\n", "while(not done):\n", " obs, _, done, info = env.step(env.action_space.sample())\n", " display(info['circuit_img'])\n", " \n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PPO2 Agent\n", "\n", "We now run a *PPO2* agent, a more sophisticated agent picked from the library of *stable_baselines*.\n", "\n", "First we import the agent." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:\n", "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", "For more information, please see:\n", " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", " * https://github.com/tensorflow/addons\n", " * https://github.com/tensorflow/io (for I/O related ops)\n", "If you depend on functionality not listed there, please file an issue.\n", "\n" ] } ], "source": [ "from stable_baselines.common.policies import MlpPolicy\n", "from stable_baselines.common.vec_env import DummyVecEnv\n", "from stable_baselines import PPO2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we train it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:66: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/policies.py:115: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/input.py:25: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/policies.py:562: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use keras.layers.flatten instead.\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:332: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Please use `layer.__call__` method instead.\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/a2c/utils.py:156: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/distributions.py:323: The name tf.random_uniform is deprecated. 
Please use tf.random.uniform instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/distributions.py:324: The name tf.log is deprecated. Please use tf.math.log instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:193: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:201: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:209: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:243: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/ppo2/ppo2.py:245: The name tf.summary.merge_all is deprecated. 
Please use tf.compat.v1.summary.merge_all instead.\n", "\n", "--------------------------------------\n", "| approxkl | 7.973169e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000982 |\n", "| fps | 22 |\n", "| n_updates | 1 |\n", "| policy_entropy | 1.0985414 |\n", "| policy_loss | -0.0070484644 |\n", "| serial_timesteps | 128 |\n", "| time_elapsed | 3.58e-06 |\n", "| total_timesteps | 128 |\n", "| value_loss | 3996.909 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 3.2882737e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.000217 |\n", "| fps | 30 |\n", "| n_updates | 2 |\n", "| policy_entropy | 1.0981455 |\n", "| policy_loss | -0.004966901 |\n", "| serial_timesteps | 256 |\n", "| time_elapsed | 5.61 |\n", "| total_timesteps | 256 |\n", "| value_loss | 4248.403 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 6.612671e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000358 |\n", "| fps | 28 |\n", "| n_updates | 3 |\n", "| policy_entropy | 1.0974414 |\n", "| policy_loss | -0.0062959143 |\n", "| serial_timesteps | 384 |\n", "| time_elapsed | 9.82 |\n", "| total_timesteps | 384 |\n", "| value_loss | 3907.8992 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.0456845e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000337 |\n", "| fps | 30 |\n", "| n_updates | 4 |\n", "| policy_entropy | 1.0968205 |\n", "| policy_loss | -0.002253918 |\n", "| serial_timesteps | 512 |\n", "| time_elapsed | 14.4 |\n", "| total_timesteps | 512 |\n", "| value_loss | 4101.919 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00016963946 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -1.24e-05 |\n", "| fps | 28 |\n", "| n_updates | 5 |\n", "| policy_entropy | 1.095028 |\n", "| policy_loss | -0.0103989765 |\n", "| serial_timesteps | 640 |\n", "| time_elapsed | 18.6 |\n", "| total_timesteps | 640 |\n", "| value_loss | 4050.2324 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00015043674 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.00013 |\n", "| fps | 39 |\n", "| n_updates | 6 |\n", "| policy_entropy | 1.0920858 |\n", "| policy_loss | -0.009303682 |\n", "| serial_timesteps | 768 |\n", "| time_elapsed | 23 |\n", "| total_timesteps | 768 |\n", "| value_loss | 3945.5684 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00032779737 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.00017 |\n", "| fps | 28 |\n", "| n_updates | 7 |\n", "| policy_entropy | 1.0865046 |\n", "| policy_loss | -0.01436338 |\n", "| serial_timesteps | 896 |\n", "| time_elapsed | 26.3 |\n", "| total_timesteps | 896 |\n", "| value_loss | 3870.6 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0006958895 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.000924 |\n", "| fps | 37 |\n", "| n_updates | 8 |\n", "| policy_entropy | 1.0740579 |\n", "| policy_loss | -0.019546235 |\n", "| serial_timesteps | 1024 |\n", "| time_elapsed | 30.8 |\n", "| total_timesteps | 1024 |\n", "| value_loss | 3722.3884 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 0.00084322505 |\n", "| clipfrac | 0.0 |\n", "| 
explained_variance | 0.000482 |\n", "| fps | 25 |\n", "| n_updates | 9 |\n", "| policy_entropy | 1.0548321 |\n", "| policy_loss | -0.025074812 |\n", "| serial_timesteps | 1152 |\n", "| time_elapsed | 34.2 |\n", "| total_timesteps | 1152 |\n", "| value_loss | 3957.3313 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------\n", "| approxkl | 0.0018638866 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000548 |\n", "| fps | 35 |\n", "| n_updates | 10 |\n", "| policy_entropy | 1.01555 |\n", "| policy_loss | -0.04078684 |\n", "| serial_timesteps | 1280 |\n", "| time_elapsed | 39.3 |\n", "| total_timesteps | 1280 |\n", "| value_loss | 4177.0903 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0021850825 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0.00265 |\n", "| fps | 22 |\n", "| n_updates | 11 |\n", "| policy_entropy | 0.9594531 |\n", "| policy_loss | -0.039690077 |\n", "| serial_timesteps | 1408 |\n", "| time_elapsed | 42.9 |\n", "| total_timesteps | 1408 |\n", "| value_loss | 4124.8867 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0033041902 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.000456 |\n", "| fps | 34 |\n", "| n_updates | 12 |\n", "| policy_entropy | 0.8843033 |\n", "| policy_loss | -0.04696823 |\n", "| serial_timesteps | 1536 |\n", "| time_elapsed | 48.7 |\n", "| total_timesteps | 1536 |\n", "| value_loss | 4201.283 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0030925882 |\n", "| clipfrac | 0.021484375 |\n", "| explained_variance | 0.000952 |\n", "| fps | 34 |\n", "| n_updates | 13 |\n", "| policy_entropy | 0.7641637 |\n", "| policy_loss | -0.05795412 |\n", "| serial_timesteps | 1664 |\n", "| time_elapsed | 52.5 |\n", "| total_timesteps | 1664 |\n", "| value_loss | 4440.92 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0038732863 |\n", "| clipfrac | 0.04296875 |\n", "| explained_variance | 0.00518 |\n", "| fps | 22 |\n", "| n_updates | 14 |\n", "| policy_entropy | 0.6456456 |\n", "| policy_loss | -0.061438754 |\n", "| serial_timesteps | 1792 |\n", "| time_elapsed | 56.2 |\n", "| total_timesteps | 1792 |\n", "| value_loss | 4367.134 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.003097237 |\n", "| clipfrac | 0.041015625 |\n", "| explained_variance | 0.00306 |\n", "| fps | 33 |\n", "| n_updates | 15 |\n", "| policy_entropy | 0.51307905 |\n", "| policy_loss | -0.057217635 |\n", "| serial_timesteps | 1920 |\n", "| time_elapsed | 61.7 |\n", "| total_timesteps | 1920 |\n", "| value_loss | 4417.79 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0021201954 |\n", "| clipfrac | 0.03125 |\n", "| explained_variance | -0.0133 |\n", "| fps | 33 |\n", "| n_updates | 16 |\n", "| policy_entropy | 0.40890202 |\n", "| policy_loss | -0.04748139 |\n", "| serial_timesteps | 2048 |\n", "| time_elapsed | 65.5 |\n", "| total_timesteps | 2048 |\n", "| value_loss | 4432.298 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0014529026 |\n", "| clipfrac | 0.01953125 |\n", "| explained_variance | 0.00501 |\n", "| fps | 17 |\n", "| n_updates | 17 |\n", "| policy_entropy | 
0.31277794 |\n", "| policy_loss | -0.03971738 |\n", "| serial_timesteps | 2176 |\n", "| time_elapsed | 69.3 |\n", "| total_timesteps | 2176 |\n", "| value_loss | 4414.0176 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0011339688 |\n", "| clipfrac | 0.01953125 |\n", "| explained_variance | 0.00757 |\n", "| fps | 32 |\n", "| n_updates | 18 |\n", "| policy_entropy | 0.24806914 |\n", "| policy_loss | -0.025814183 |\n", "| serial_timesteps | 2304 |\n", "| time_elapsed | 76.6 |\n", "| total_timesteps | 2304 |\n", "| value_loss | 4313.063 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0007391007 |\n", "| clipfrac | 0.01171875 |\n", "| explained_variance | 0.0316 |\n", "| fps | 23 |\n", "| n_updates | 19 |\n", "| policy_entropy | 0.19068442 |\n", "| policy_loss | -0.02404321 |\n", "| serial_timesteps | 2432 |\n", "| time_elapsed | 80.5 |\n", "| total_timesteps | 2432 |\n", "| value_loss | 4365.6025 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 0.0008076045 |\n", "| clipfrac | 0.01171875 |\n", "| explained_variance | -0.0131 |\n", "| fps | 29 |\n", "| n_updates | 20 |\n", "| policy_entropy | 0.14820632 |\n", "| policy_loss | -0.026867293 |\n", "| serial_timesteps | 2560 |\n", "| time_elapsed | 85.9 |\n", "| total_timesteps | 2560 |\n", "| value_loss | 4335.884 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 5.5856257e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0232 |\n", "| fps | 27 |\n", "| n_updates | 21 |\n", "| policy_entropy | 0.11489105 |\n", "| policy_loss | -0.0046238033 |\n", "| serial_timesteps | 2688 |\n", "| time_elapsed | 90.3 |\n", "| total_timesteps | 2688 |\n", "| value_loss | 4331.92 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.1058022e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 17 |\n", "| n_updates | 22 |\n", "| policy_entropy | 0.09650411 |\n", "| policy_loss | -0.0025608484 |\n", "| serial_timesteps | 2816 |\n", "| time_elapsed | 94.9 |\n", "| total_timesteps | 2816 |\n", "| value_loss | 4296.74 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.2838517e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0345 |\n", "| fps | 28 |\n", "| n_updates | 23 |\n", "| policy_entropy | 0.08733705 |\n", "| policy_loss | -0.004682667 |\n", "| serial_timesteps | 2944 |\n", "| time_elapsed | 102 |\n", "| total_timesteps | 2944 |\n", "| value_loss | 4259.8 |\n", "--------------------------------------\n", "---------------------------------------\n", "| approxkl | 0.000102847254 |\n", "| clipfrac | 0.001953125 |\n", "| explained_variance | -0.0375 |\n", "| fps | 29 |\n", "| n_updates | 24 |\n", "| policy_entropy | 0.07257957 |\n", "| policy_loss | -0.0053370036 |\n", "| serial_timesteps | 3072 |\n", "| time_elapsed | 107 |\n", "| total_timesteps | 3072 |\n", "| value_loss | 4229.9087 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.1270785e-07 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 32 |\n", "| n_updates | 25 |\n", "| policy_entropy | 0.06345923 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 3200 |\n", "| time_elapsed | 111 |\n", "| total_timesteps | 3200 |\n", "| value_loss | 
4198.5474 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 5.508119e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0257 |\n", "| fps | 28 |\n", "| n_updates | 26 |\n", "| policy_entropy | 0.06236457 |\n", "| policy_loss | -0.0010745144 |\n", "| serial_timesteps | 3328 |\n", "| time_elapsed | 115 |\n", "| total_timesteps | 3328 |\n", "| value_loss | 4156.753 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------------------------------------\n", "| approxkl | 1.636944e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 15 |\n", "| n_updates | 27 |\n", "| policy_entropy | 0.055864867 |\n", "| policy_loss | -0.0025804834 |\n", "| serial_timesteps | 3456 |\n", "| time_elapsed | 119 |\n", "| total_timesteps | 3456 |\n", "| value_loss | 4127.7583 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.8690836e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -1.19e-07 |\n", "| fps | 29 |\n", "| n_updates | 28 |\n", "| policy_entropy | 0.0511804 |\n", "| policy_loss | -0.0035447306 |\n", "| serial_timesteps | 3584 |\n", "| time_elapsed | 128 |\n", "| total_timesteps | 3584 |\n", "| value_loss | 4091.4973 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 7.4687875e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 29 |\n", "| policy_entropy | 0.047582556 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 3712 |\n", "| time_elapsed | 132 |\n", "| total_timesteps | 3712 |\n", "| value_loss | 4069.115 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.1875264e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 29 |\n", "| n_updates | 30 |\n", "| policy_entropy | 0.045673173 |\n", "| policy_loss | -0.0009450745 |\n", "| serial_timesteps | 3840 |\n", "| time_elapsed | 136 |\n", "| total_timesteps | 3840 |\n", "| value_loss | 4034.147 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 8.003526e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 31 |\n", "| n_updates | 31 |\n", "| policy_entropy | 0.04141441 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 3968 |\n", "| time_elapsed | 141 |\n", "| total_timesteps | 3968 |\n", "| value_loss | 4008.5713 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 2.251936e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 0 |\n", "| fps | 31 |\n", "| n_updates | 32 |\n", "| policy_entropy | 0.03956447 |\n", "| policy_loss | -0.00078786956 |\n", "| serial_timesteps | 4096 |\n", "| time_elapsed | 145 |\n", "| total_timesteps | 4096 |\n", "| value_loss | 3974.9924 |\n", "---------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.080667e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 32 |\n", "| n_updates | 33 |\n", "| policy_entropy | 0.035701253 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4224 |\n", "| time_elapsed | 149 |\n", "| total_timesteps | 4224 |\n", "| value_loss | 3948.7913 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.4656681e-09 |\n", "| clipfrac | 0.0 
|\n", "| explained_variance | nan |\n", "| fps | 12 |\n", "| n_updates | 34 |\n", "| policy_entropy | 0.03424095 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4352 |\n", "| time_elapsed | 153 |\n", "| total_timesteps | 4352 |\n", "| value_loss | 3919.9714 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.521696e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 35 |\n", "| policy_entropy | 0.034051728 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4480 |\n", "| time_elapsed | 163 |\n", "| total_timesteps | 4480 |\n", "| value_loss | 3890.7466 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 5.814285e-07 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0895 |\n", "| fps | 29 |\n", "| n_updates | 36 |\n", "| policy_entropy | 0.03464252 |\n", "| policy_loss | -0.00025901757 |\n", "| serial_timesteps | 4608 |\n", "| time_elapsed | 168 |\n", "| total_timesteps | 4608 |\n", "| value_loss | 3858.5444 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.0398925e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 2.98e-07 |\n", "| fps | 30 |\n", "| n_updates | 37 |\n", "| policy_entropy | 0.030399777 |\n", "| policy_loss | -0.0020691967 |\n", "| serial_timesteps | 4736 |\n", "| time_elapsed | 172 |\n", "| total_timesteps | 4736 |\n", "| value_loss | 3828.6802 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.3587065e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 38 |\n", "| policy_entropy | 0.02762542 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4864 |\n", "| time_elapsed | 176 |\n", "| total_timesteps | 4864 |\n", "| value_loss | 3804.909 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 5.955296e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 25 |\n", "| n_updates | 39 |\n", "| policy_entropy | 0.026657818 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 4992 |\n", "| time_elapsed | 181 |\n", "| total_timesteps | 4992 |\n", "| value_loss | 3777.6287 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.6384682e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 40 |\n", "| policy_entropy | 0.026530726 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 5120 |\n", "| time_elapsed | 186 |\n", "| total_timesteps | 5120 |\n", "| value_loss | 3750.984 |\n", "--------------------------------------\n", "------------------------------------\n", "| approxkl | 6.78427e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 27 |\n", "| n_updates | 41 |\n", "| policy_entropy | 0.026557334 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 5248 |\n", "| time_elapsed | 190 |\n", "| total_timesteps | 5248 |\n", "| value_loss | 3724.9565 |\n", "------------------------------------\n", "----------------------------------------\n", "| approxkl | 1.6480338e-07 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0964 |\n", "| fps | 12 |\n", "| n_updates | 42 |\n", "| policy_entropy | 0.027045451 |\n", "| policy_loss | -0.000119969714 |\n", "| serial_timesteps | 5376 |\n", "| time_elapsed | 195 |\n", "| total_timesteps 
| 5376 |\n", "| value_loss | 3695.2075 |\n", "----------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.0748278e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | 1.19e-07 |\n", "| fps | 31 |\n", "| n_updates | 43 |\n", "| policy_entropy | 0.023529774 |\n", "| policy_loss | -0.0020988667 |\n", "| serial_timesteps | 5504 |\n", "| time_elapsed | 205 |\n", "| total_timesteps | 5504 |\n", "| value_loss | 3668.289 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--------------------------------------\n", "| approxkl | 9.843582e-06 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0959 |\n", "| fps | 29 |\n", "| n_updates | 44 |\n", "| policy_entropy | 0.021820018 |\n", "| policy_loss | -0.0019928538 |\n", "| serial_timesteps | 5632 |\n", "| time_elapsed | 209 |\n", "| total_timesteps | 5632 |\n", "| value_loss | 3643.6006 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.6955488e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0952 |\n", "| fps | 29 |\n", "| n_updates | 45 |\n", "| policy_entropy | 0.019816618 |\n", "| policy_loss | -0.0025166627 |\n", "| serial_timesteps | 5760 |\n", "| time_elapsed | 213 |\n", "| total_timesteps | 5760 |\n", "| value_loss | 3617.57 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 8.778779e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 46 |\n", "| policy_entropy | 0.017607767 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 5888 |\n", "| time_elapsed | 218 |\n", "| total_timesteps | 5888 |\n", "| value_loss | 3594.7239 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.3879365e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 47 |\n", "| policy_entropy | 0.01695734 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6016 |\n", "| time_elapsed | 222 |\n", "| total_timesteps | 6016 |\n", "| value_loss | 3569.6123 |\n", "--------------------------------------\n", "---------------------------------------\n", "| approxkl | 7.391328e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -1.19e-07 |\n", "| fps | 30 |\n", "| n_updates | 48 |\n", "| policy_entropy | 0.016850296 |\n", "| policy_loss | -4.8967544e-05 |\n", "| serial_timesteps | 6144 |\n", "| time_elapsed | 226 |\n", "| total_timesteps | 6144 |\n", "| value_loss | 3541.0596 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.9383868e-05 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0493 |\n", "| fps | 30 |\n", "| n_updates | 49 |\n", "| policy_entropy | 0.01639788 |\n", "| policy_loss | -0.004105901 |\n", "| serial_timesteps | 6272 |\n", "| time_elapsed | 230 |\n", "| total_timesteps | 6272 |\n", "| value_loss | 3514.154 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.8241103e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 50 |\n", "| policy_entropy | 0.013761687 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6400 |\n", "| time_elapsed | 234 |\n", "| total_timesteps | 6400 |\n", "| value_loss | 3496.9583 |\n", "--------------------------------------\n", "-------------------------------------\n", 
"| approxkl | 4.921573e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 51 |\n", "| policy_entropy | 0.012798915 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6528 |\n", "| time_elapsed | 239 |\n", "| total_timesteps | 6528 |\n", "| value_loss | 3473.544 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.2990284e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 10 |\n", "| n_updates | 52 |\n", "| policy_entropy | 0.0126367025 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6656 |\n", "| time_elapsed | 244 |\n", "| total_timesteps | 6656 |\n", "| value_loss | 3449.3325 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 4.424428e-15 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| n_updates | 53 |\n", "| policy_entropy | 0.01261734 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6784 |\n", "| time_elapsed | 256 |\n", "| total_timesteps | 6784 |\n", "| value_loss | 3425.9346 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 3.002179e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 54 |\n", "| policy_entropy | 0.012623467 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 6912 |\n", "| time_elapsed | 261 |\n", "| total_timesteps | 6912 |\n", "| value_loss | 3402.3008 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.8433507e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 55 |\n", "| policy_entropy | 0.012635817 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7040 |\n", "| time_elapsed | 266 |\n", "| total_timesteps | 7040 |\n", "| value_loss | 3379.5684 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 4.046746e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 56 |\n", "| policy_entropy | 0.01264801 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7168 |\n", "| time_elapsed | 270 |\n", "| total_timesteps | 7168 |\n", "| value_loss | 3356.5557 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.0507368e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 57 |\n", "| policy_entropy | 0.0126599725 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7296 |\n", "| time_elapsed | 274 |\n", "| total_timesteps | 7296 |\n", "| value_loss | 3333.3027 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 4.2151704e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 58 |\n", "| policy_entropy | 0.012672037 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7424 |\n", "| time_elapsed | 279 |\n", "| total_timesteps | 7424 |\n", "| value_loss | 3310.2527 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 3.842457e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 31 |\n", "| n_updates | 59 |\n", "| policy_entropy | 0.012684286 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7552 |\n", "| time_elapsed | 283 |\n", "| 
total_timesteps | 7552 |\n", "| value_loss | 3287.4731 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 5.241878e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 27 |\n", "| n_updates | 60 |\n", "| policy_entropy | 0.012696676 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7680 |\n", "| time_elapsed | 287 |\n", "| total_timesteps | 7680 |\n", "| value_loss | 3264.962 |\n", "-------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------\n", "| approxkl | 4.985035e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 61 |\n", "| policy_entropy | 0.012709266 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 7808 |\n", "| time_elapsed | 291 |\n", "| total_timesteps | 7808 |\n", "| value_loss | 3242.7092 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 3.615873e-10 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -3.58e-07 |\n", "| fps | 26 |\n", "| n_updates | 62 |\n", "| policy_entropy | 0.012707609 |\n", "| policy_loss | -3.6573038e-06 |\n", "| serial_timesteps | 7936 |\n", "| time_elapsed | 296 |\n", "| total_timesteps | 7936 |\n", "| value_loss | 3217.091 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 3.4761847e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 63 |\n", "| policy_entropy | 0.011906338 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8064 |\n", "| time_elapsed | 301 |\n", "| total_timesteps | 8064 |\n", "| value_loss | 3198.923 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 9.972234e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 64 |\n", "| policy_entropy | 0.0114687495 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8192 |\n", "| time_elapsed | 305 |\n", "| total_timesteps | 8192 |\n", "| value_loss | 3177.3672 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 9.6275965e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0722 |\n", "| fps | 8 |\n", "| n_updates | 65 |\n", "| policy_entropy | 0.011702726 |\n", "| policy_loss | -5.7652127e-05 |\n", "| serial_timesteps | 8320 |\n", "| time_elapsed | 310 |\n", "| total_timesteps | 8320 |\n", "| value_loss | 3153.4082 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.8454681e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 66 |\n", "| policy_entropy | 0.010539748 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8448 |\n", "| time_elapsed | 326 |\n", "| total_timesteps | 8448 |\n", "| value_loss | 3134.8647 |\n", "--------------------------------------\n", "-------------------------------------\n", "| approxkl | 5.293721e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 29 |\n", "| n_updates | 67 |\n", "| policy_entropy | 0.0102127725 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8576 |\n", "| time_elapsed | 331 |\n", "| total_timesteps | 8576 |\n", "| value_loss | 3113.911 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 9.471275e-13 |\n", 
"| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 68 |\n", "| policy_entropy | 0.010160062 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8704 |\n", "| time_elapsed | 335 |\n", "| total_timesteps | 8704 |\n", "| value_loss | 3093.1445 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.822244e-14 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 69 |\n", "| policy_entropy | 0.010157408 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8832 |\n", "| time_elapsed | 340 |\n", "| total_timesteps | 8832 |\n", "| value_loss | 3072.5557 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 1.745386e-13 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 70 |\n", "| policy_entropy | 0.010164159 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 8960 |\n", "| time_elapsed | 345 |\n", "| total_timesteps | 8960 |\n", "| value_loss | 3052.1367 |\n", "-------------------------------------\n", "---------------------------------------\n", "| approxkl | 1.7796338e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0652 |\n", "| fps | 25 |\n", "| n_updates | 71 |\n", "| policy_entropy | 0.0104862265 |\n", "| policy_loss | -1.4320016e-05 |\n", "| serial_timesteps | 9088 |\n", "| time_elapsed | 349 |\n", "| total_timesteps | 9088 |\n", "| value_loss | 3029.3086 |\n", "---------------------------------------\n", "-------------------------------------\n", "| approxkl | 2.333519e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 72 |\n", "| policy_entropy | 0.009473039 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9216 |\n", "| time_elapsed | 354 |\n", "| total_timesteps | 9216 |\n", "| value_loss | 3011.7778 |\n", "-------------------------------------\n", "-------------------------------------\n", "| approxkl | 6.870047e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 73 |\n", "| policy_entropy | 0.009099268 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9344 |\n", "| time_elapsed | 359 |\n", "| total_timesteps | 9344 |\n", "| value_loss | 2990.942 |\n", "-------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.6176166e-12 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 27 |\n", "| n_updates | 74 |\n", "| policy_entropy | 0.009037095 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9472 |\n", "| time_elapsed | 363 |\n", "| total_timesteps | 9472 |\n", "| value_loss | 2970.5623 |\n", "--------------------------------------\n", "--------------------------------------\n", "| approxkl | 1.3288071e-15 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 26 |\n", "| n_updates | 75 |\n", "| policy_entropy | 0.009030421 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9600 |\n", "| time_elapsed | 368 |\n", "| total_timesteps | 9600 |\n", "| value_loss | 2949.9497 |\n", "--------------------------------------\n", "---------------------------------------\n", "| approxkl | 1.1359422e-08 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | -0.0324 |\n", "| fps | 27 |\n", "| n_updates | 76 |\n", "| policy_entropy | 0.009321555 |\n", "| policy_loss | -6.9826376e-05 |\n", "| serial_timesteps | 9728 |\n", "| time_elapsed | 373 |\n", "| 
total_timesteps | 9728 |\n", "| value_loss | 2924.0024 |\n", "---------------------------------------\n", "--------------------------------------\n", "| approxkl | 2.3084932e-09 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 30 |\n", "| n_updates | 77 |\n", "| policy_entropy | 0.008434594 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9856 |\n", "| time_elapsed | 378 |\n", "| total_timesteps | 9856 |\n", "| value_loss | 2909.5547 |\n", "--------------------------------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "-------------------------------------\n", "| approxkl | 6.620775e-11 |\n", "| clipfrac | 0.0 |\n", "| explained_variance | nan |\n", "| fps | 28 |\n", "| n_updates | 78 |\n", "| policy_entropy | 0.008057287 |\n", "| policy_loss | 0.0 |\n", "| serial_timesteps | 9984 |\n", "| time_elapsed | 382 |\n", "| total_timesteps | 9984 |\n", "| value_loss | 2889.783 |\n", "-------------------------------------\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env = DummyVecEnv([lambda: env])\n", "modelPPO2 = PPO2(MlpPolicy, env, verbose=1)\n", "modelPPO2.learn(total_timesteps=10000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last, we test it by letting it play the game." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJEAAAB7CAYAAAB0B2LHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAFO0lEQVR4nO3dT2iTdxzH8U//uDhbo226tMx21Vona2gC1rnLRDwtqEOYB9HpQJkysbDqZfMg7KSbCu42MraDMIWqtPSiCB5ahOyQIlYrYiwtVEfXdqwZrbNOa3YYE0Lnmu5r+D3J3i/I5Ze2z/fwzu95Qto+Rel0Oi3AoNj1AMh/RAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoJZqesB8kHbWTfH/fpDN8edL3YimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYeSaiZ8+e6dSpU1q1apUWLlyoSCSinp4erV69Wvv373c93rx81/q6+ru/z1hLp9P65mO/BhKdjqbKHc987LF37151dnbq6NGjamlpUTwe144dOzQ+Pq7Dhw+7Hi9rU7/+pIepEb32RiRj/bexQf0xPanqhrWOJssdT0R07tw5nTlzRt3d3dqwYYMkaePGjbp+/bo6OjrU0tLieMLsjQ4mVFRcokBtKGP9l+E+LVpSrcWBOkeT5Y4nTmfHjx9XNBp9HtDfGhsbtWDBAjU3NzuabP5GBxOqqHlTpa+8mrE+Ptyn4IrC24UkD+xEDx48UH9/vw4dOjTrueHhYYVCIfl8vpwdv6ioaM6v+fSH7G86MDqYUGp0QLFPqjLWnzye0tr3j7z02XIp25steCIiSaqpqclYf/TokXp6erRp0yYXY/1no0O9eueDL/TWux9lrJ890qzqAt2JnJ/Oqqr+esUmk8mM9RMnTmhkZERr1qzJ6fHT6fScj2ylfh7Q44cTqg+/p8WB2uePmSfTevx7SsF5XlRnM1suH9lyvhM1NDQoHA7r2LFjqqys1LJly3Tx4kVdunRJkvLuorrUt2jWO7ORe3GVB+pUtqTa0WS55XwnKi4u1oULFxQKhXTgwAHt2bNHVVVVOnjwoEpLSxUOh12PmLXRwYSqV7yt4pLM1+bIwI8FeyqTpCKv3qpq9+7d6uvr082bN12Pwq/HzsH5TvQivb29eXUq+z/zZERTU1NKJpM5v6jGy+H8wvqflJeXa2ZmxvUYyJIndyLkFyKCGRHBjIhgRkQwIyKYERHMPPuxB/IHOxHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMPBVRV1eXtmzZomAwKJ/Pp/r6eu3cuVO3bt1yPRr+hSf+jPrp06fatWuX2tvbVVtbq82bN8vv9yuZTOry5cvq6upSNBp1PSZewBP/s7G1tVXt7e3at2+fTp8+rbKysufP3b9/X0uXLs3ZsT//6tuc/ex89+Vn2d1nznlE165dUywWUzQaVSwWm3VTlLq6wru1U6Fxfjrbtm2bOjo6dOPGDUUikbm/AZ7jPCK/369AIKChoSEnx+d09mLZns6cvjtLpVKanJzU8uXLXY4BI6c70cTEhCorK9XU1KTbt2+7GgNGTneiiooKrVy5Unfu3NHVq1dnPX/37l0HU2G+nF8TnT9/Xtu3b1dJSYm2bt2qxsZGjY2NKR6Pq6mpSZ2dhXcL8ELjPCJJunLlik6ePKlEIqHp6WkFg0GtW7dObW1tWr9+vevxMAdPRIT85qnPzpCfiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRASzPwHPJLsuj1leygAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs = env.reset()\n", "display(env.render())\n", "\n", "for _ in range(1):\n", " action, _states = modelPPO2.predict(obs)\n", " obs, _, done, info = env.step(action)\n", " display(info[0]['circuit_img'])\n", " \n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the agent easily learned the optimal circuit. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A2C Agent\n", "\n", "For comparison, we now run an *A2C* agent, another agent from the library of *stable_baselines*.\n", "\n", "First we import the agent." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from stable_baselines import A2C" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We train it." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:312: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/common/tf_util.py:312: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/stable_baselines/a2c/a2c.py:159: The name tf.train.RMSPropOptimizer is deprecated. Please use tf.compat.v1.train.RMSPropOptimizer instead.\n", "\n", "WARNING:tensorflow:From /home/fmzennaro/miniconda2_1/envs/quantumgymstable/lib/python3.7/site-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Call initializer instance with the dtype argument instead of passing it to the constructor\n", "---------------------------------\n", "| explained_variance | 0.0313 |\n", "| fps | 7 |\n", "| nupdates | 1 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 5 |\n", "| value_loss | 9.92e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0.00121 |\n", "| fps | 12 |\n", "| nupdates | 100 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 500 |\n", "| value_loss | 7.91e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0.000339 |\n", "| fps | 16 |\n", "| nupdates | 200 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 1000 |\n", "| value_loss | 7.81e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | -0.0287 |\n", "| fps | 18 |\n", "| nupdates | 300 |\n", "| policy_entropy | 1.1 |\n", "| total_timesteps | 1500 |\n", "| value_loss | 9.76e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | -0.00107 |\n", "| fps | 20 |\n", "| nupdates | 400 |\n", "| policy_entropy | 1.09 |\n", "| total_timesteps | 2000 |\n", "| value_loss | 7.64e+03 |\n", "---------------------------------\n", "----------------------------------\n", "| explained_variance | -1.19e-07 |\n", "| fps | 21 |\n", "| nupdates | 500 |\n", "| policy_entropy | 1.07 |\n", "| total_timesteps | 2500 
|\n", "| value_loss | 9.46e+03 |\n", "----------------------------------\n", "---------------------------------\n", "| explained_variance | -0.00371 |\n", "| fps | 21 |\n", "| nupdates | 600 |\n", "| policy_entropy | 1.06 |\n", "| total_timesteps | 3000 |\n", "| value_loss | 3.73e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 23 |\n", "| nupdates | 700 |\n", "| policy_entropy | 1 |\n", "| total_timesteps | 3500 |\n", "| value_loss | 8.85e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 23 |\n", "| nupdates | 800 |\n", "| policy_entropy | 0.939 |\n", "| total_timesteps | 4000 |\n", "| value_loss | 7.97e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 23 |\n", "| nupdates | 900 |\n", "| policy_entropy | 0.8 |\n", "| total_timesteps | 4500 |\n", "| value_loss | 6.86e+03 |\n", "---------------------------------\n", "----------------------------------\n", "| explained_variance | -1.19e-07 |\n", "| fps | 23 |\n", "| nupdates | 1000 |\n", "| policy_entropy | 0.661 |\n", "| total_timesteps | 5000 |\n", "| value_loss | 3.6e+03 |\n", "----------------------------------\n", "---------------------------------\n", "| explained_variance | -1.57 |\n", "| fps | 24 |\n", "| nupdates | 1100 |\n", "| policy_entropy | 0.554 |\n", "| total_timesteps | 5500 |\n", "| value_loss | 5.12e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | 0 |\n", "| fps | 24 |\n", "| nupdates | 1200 |\n", "| policy_entropy | 0.375 |\n", "| total_timesteps | 6000 |\n", "| value_loss | 4.35e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1300 |\n", "| policy_entropy | 0.264 |\n", "| total_timesteps | 6500 |\n", "| value_loss | 3.79e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1400 |\n", "| policy_entropy | 0.172 |\n", "| total_timesteps | 7000 |\n", "| value_loss | 3.24e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1500 |\n", "| policy_entropy | 0.159 |\n", "| total_timesteps | 7500 |\n", "| value_loss | 2.74e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1600 |\n", "| policy_entropy | 0.122 |\n", "| total_timesteps | 8000 |\n", "| value_loss | 2.28e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 1700 |\n", "| policy_entropy | 0.103 |\n", "| total_timesteps | 8500 |\n", "| value_loss | 1.86e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 23 |\n", "| nupdates | 1800 |\n", "| policy_entropy | 0.0669 |\n", "| total_timesteps | 9000 |\n", "| value_loss | 1.49e+03 |\n", "---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 23 |\n", "| nupdates | 1900 |\n", "| policy_entropy | 0.0509 |\n", "| total_timesteps | 9500 |\n", "| value_loss | 1.16e+03 |\n", 
"---------------------------------\n", "---------------------------------\n", "| explained_variance | nan |\n", "| fps | 24 |\n", "| nupdates | 2000 |\n", "| policy_entropy | 0.0564 |\n", "| total_timesteps | 10000 |\n", "| value_loss | 870 |\n", "---------------------------------\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "modelA2C = A2C(MlpPolicy, env, verbose=1)\n", "modelA2C.learn(total_timesteps=10000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we test it by letting it play the game." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGMAAAB7CAYAAABgvj5jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAD1UlEQVR4nO3czyt8exzH8ddclyGZmJGUETFSI5T5AyxsTlFqSiJSChsrm3tLtnyvlPXcvcVITbPBwsJQs8DCz+RsFEqxIJSNfO7K/Sb3dt3FOZ9XzetRZ3NOzXnXs885p6lzAsYYA6Hwi+0B5CfFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQUQwiikFEMYgoBhHFIKIYRBSDiGIQoYnx/v6OpaUltLS0oLS0FJ2dncjlcmhtbcXk5KTt8Xzxq+0BPoyPjyOTyWBubg6JRAL5fB5DQ0O4v7/HzMyM7fH8YQisrKwYAGZ7e/vT/mQyaQCY/f19S5P5i+IytbCwAMdx0N3d/Wl/LBZDcXEx2tvbLU3mL+sxbm5ucHp6ioGBgS/Hrq6u0NbWhmAw6Nn5A4GA59t3UcQAgNra2k/7X19fkcvlkEgkbIxlhfUY1dXVAADXdT/tX1xcxO3tLbq6ujw9vzHG8+27rD9NNTU1oaOjA/Pz8wiHw6irq8Pa2hrW19cBoKBWRsD8n3QecV0XU1NT2NvbQyQSwdjYGCoqKjA7O4unpyeUlZXZHtEXFDH+yejoKI6OjnB8fGx7FN9Yv2f8m4ODg4K6RAGkMV5eXuC6ruc3bza0l6lCRLkyCpViEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUgwhVjGw2i76+PtTU1CAYDKKhoQHDw8M4OTmxPZovKF4je3t7w8jICNLpNKLRKHp7exEKheC6LjY2NpDNZuE4ju0xPWf9pXwAmJ6eRjqdxsTEBJaXl1FeXv73sevra1RWVnp27t//+NOz3/7w47fvfS/Leozd3V2kUik4joNUKvXlwyf19fWWJrPA848o/YePb0odHh7aHsU66/eMUCiESCSCy8tLK+dnukxZfZp6fHzE8/MzGhsbbY5Bw+rKeHh4QDgcRjwex9nZma0xaFhdGVVVVWhubsb5+Tm2tra+HL+4uLAwlT3W7xmrq6sYHBxEUVER+vv7EYvFcHd3h3w+j3g8jkwmY3M8f9l8eviwublpenp6TCgUMiUlJSYajZpkMml2dnZsj+Yr6ytDfqL6b6rQKQYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSCiGEQUg4hiEFEMIopBRDGIKAYRxSDyF6IOl5yTF0T5AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJEAAAB7CAYAAAB0B2LHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAFO0lEQVR4nO3dT2iTdxzH8U//uDhbo226tMx21Vona2gC1rnLRDwtqEOYB9HpQJkysbDqZfMg7KSbCu42MraDMIWqtPSiCB5ahOyQIlYrYiwtVEfXdqwZrbNOa3YYE0Lnmu5r+D3J3i/I5Ze2z/fwzu95Qto+Rel0Oi3AoNj1AMh/RAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRAQzIoJZqesB8kHbWTfH/fpDN8edL3YimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYeSaiZ8+e6dSpU1q1apUWLlyoSCSinp4erV69Wvv373c93rx81/q6+ru/z1hLp9P65mO/BhKdjqbKHc987LF37151dnbq6NGjamlpUTwe144dOzQ+Pq7Dhw+7Hi9rU7/+pIepEb32RiRj/bexQf0xPanqhrWOJssdT0R07tw5nTlzRt3d3dqwYYMkaePGjbp+/bo6OjrU0tLieMLsjQ4mVFRcokBtKGP9l+E+LVpSrcWBOkeT5Y4nTmfHjx9XNBp9HtDfGhsbtWDBAjU3NzuabP5GBxOqqHlTpa+8mrE+Ptyn4IrC24UkD+xEDx48UH9/vw4dOjTrueHhYYVCIfl8vpwdv6ioaM6v+fSH7G86MDqYUGp0QLFPqjLWnzye0tr3j7z02XIp25steCIiSaqpqclYf/TokXp6erRp0yYXY/1no0O9eueDL/TWux9lrJ890qzqAt2JnJ/Oqqr+esUmk8mM9RMnTmhkZERr1qzJ6fHT6fScj2ylfh7Q44cTqg+/p8WB2uePmSfTevx7SsF5XlRnM1suH9lyvhM1NDQoHA7r2LFjqqys1LJly3Tx4kVdunRJkvLuorrUt2jWO7ORe3GVB+pUtqTa0WS55XwnKi4u1oULFxQKhXTgwAHt2bNHVVVVOnjwoEpLSxUOh12PmLXRwYSqV7yt4pLM1+bIwI8FeyqTpCKv3qpq9+7d6uvr082bN12Pwq/HzsH5TvQivb29eXUq+z/zZERTU1NKJpM5v6jGy+H8wvqflJeXa2ZmxvUYyJIndyLkFyKCGRHBjIhgRkQwIyKYERHMPPuxB/IHOxHMiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMPBVRV1eXtmzZomAwKJ/Pp/r6eu3cuVO3bt1yPRr+hSf+jPrp06fatWuX2tvbVVtbq82bN8vv9yuZTOry5cvq6upSNBp1PSZewBP/s7G1tVXt7e3at2+fTp8+rbKysufP3b9/X0uXLs3ZsT//6tuc/ex89+Vn2d1nznlE165dUywWUzQaVSwWm3VTlLq6wru1U6Fxfjrbtm2bOjo6dOPGDUUikbm/AZ7jPCK/369AIKChoSEnx+d09mLZns6cvjtLpVKanJzU8uXLXY4BI6c70cTEhCorK9XU1KTbt2+7GgNGTneiiooKrVy5Unfu3NHVq1dnPX/37l0HU2G+nF8TnT9/Xtu3b1dJSYm2bt2qxsZGjY2NKR6Pq6mpSZ2dhXcL8ELjPCJJunLlik6ePKlEIqHp6WkFg0GtW7dObW1tWr9+vevxMAdPRIT85qnPzpCfiAhmRAQzIoIZEcGMiGBGRDAjIpgREcyICGZEBDMighkRwYyIYEZEMCMimBERzIgIZkQEMyKCGRHBjIhgRkQwIyKYERHMiAhmRASzPwHPJLsuj1leygAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs = env.reset()\n", "display(env.render())\n", "\n", "for _ in range(1):\n", " action, _states = modelA2C.predict(obs)\n", " obs, _, done, info = env.step(action)\n", " display(info[0]['circuit_img'])\n", " \n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison of the agents\n", "\n", "Finally, we compare the agents quantitavely by contrasting their average reward computed running 1000 episodes of the game. We rely on the *evaluation* module that provides simple and standard routines to evaluate the agents." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import evaluation\n", "n_episodes = 1000\n", "\n", "PPO2_perf, _ = evaluation.evaluate_model(modelPPO2, env, num_steps=n_episodes)\n", "A2C_perf, _ = evaluation.evaluate_model(modelA2C, env, num_steps=n_episodes)\n", "\n", "env = gym.make('qcircuit-v0')\n", "rand_perf, _ = evaluation.evaluate_random(env, num_steps=n_episodes)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean performance of random agent (out of 1000 episodes): 97.674\n", "Mean performance of PPO2 agent (out of 1000 episodes): 99.9\n", "Mean performance of A2C agent (out of 1000 episodes): 99.893\n" ] } ], "source": [ "print('Mean performance of random agent (out of {0} episodes): {1}'.format(n_episodes,rand_perf))\n", "print('Mean performance of PPO2 agent (out of {0} episodes): {1}'.format(n_episodes,PPO2_perf))\n", "print('Mean performance of A2C agent (out of {0} episodes): {1}'.format(n_episodes,A2C_perf))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected the reinforcement learning agents (PPO2, A2C) learned to play the game optimally. The random agent is still able to play and reach a solution given the small state and action space available; its average reward, however, is clearly inferior; on average it takes the random agent two and half more actions (or guesses) than PPO2/A2C per episode to reach the solution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "[1] IBM qiskit, https://qiskit.org/\n", "\n", "[2] OpenAI gym, http://gym.openai.com/docs/\n", "\n", "[3] stable-baselines, https://github.com/hill-a/stable-baselines\n", "\n", "[4] gym-qcircuit, https://github.com/FMZennaro/gym-qcircuit" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 2 }