{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ctpoSTL9oGq7" }, "source": [ "# Gym environments" ] }, { "cell_type": "markdown", "metadata": { "id": "PsWSoHfooGrJ" }, "source": [ "## Installing gym\n", "\n", "In this course, we will mostly address RL environments available in the **OpenAI Gym** framework.\n", "\n", "It provides a multitude of RL problems, from simple text-based problems with a few dozen states (Gridworld, Taxi) to continuous control problems (Cartpole, Pendulum) to Atari games (Breakout, Space Invaders) to complex robotics simulators (Mujoco).\n", "\n", "However, `gym` has not been maintained by OpenAI since September 2022. We will instead use the `gymnasium` library, which is maintained and further developed by the Farama Foundation.\n", "\n", "You can install gymnasium and its dependencies using:\n", "\n", "```bash\n", "pip install -U gymnasium pygame moviepy\n", "pip install gymnasium[classic_control]\n", "pip install gymnasium[box2d]\n", "```\n", "\n", "For this exercise and the following ones, we will focus on simple environments whose installation is straightforward: toy text, classic control and box2d. More complex environments based on Atari games or the Mujoco physics simulator are described in the last (optional) section of this notebook, as they require additional dependencies.\n", "\n", "On Colab, `gym` cannot open graphical windows to visualize the environments, as this is not possible in the browser. We will see a workaround that produces videos instead. 
Running that cell in Colab should allow you to run the simplest environments:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "YxLOmLKIoGrN" }, "outputs": [], "source": [ "try:\n", " import google.colab\n", " IN_COLAB = True\n", "except ImportError:\n", " IN_COLAB = False\n", "\n", "if IN_COLAB:\n", " !pip install -U gymnasium pygame moviepy\n", " !pip install gymnasium[box2d]\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gym version: 0.26.3\n" ] } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import os\n", "\n", "import gymnasium as gym\n", "print(\"gym version:\", gym.__version__)\n", "\n", "from moviepy.editor import ImageSequenceClip, ipython_display\n", "\n", "class GymRecorder(object):\n", " \"\"\"\n", " Simple wrapper over moviepy to generate a .gif with the frames of a gym environment.\n", " \n", " The environment must have the render_mode `rgb_array_list`.\n", " \"\"\"\n", " def __init__(self, env):\n", " self.env = env\n", " self._frames = []\n", "\n", " def record(self, frames):\n", " \"To be called at the end of an episode.\"\n", " for frame in frames:\n", " self._frames.append(np.array(frame))\n", "\n", " def make_video(self, filename):\n", " \"Generates the gif video.\"\n", " # Create the output directory (including parent directories) if needed\n", " directory = os.path.dirname(os.path.abspath(filename))\n", " os.makedirs(directory, exist_ok=True)\n", " self.clip = ImageSequenceClip(list(self._frames), fps=self.env.metadata[\"render_fps\"])\n", " self.clip.write_gif(filename, fps=self.env.metadata[\"render_fps\"], loop=0)\n", " # Empty the frame buffer for the next video\n", " self._frames = []" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interacting with an environment\n", "\n", "A gym environment is created using:\n", "\n", "```python\n", "env = gym.make('CartPole-v1', render_mode=\"human\")\n", "```\n", "\n", "where 'CartPole-v1' should be replaced by the environment you want 
to interact with. The following cell lists the environments available to you (including the different versions)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CartPole-v0\n", "CartPole-v1\n", "MountainCar-v0\n", "MountainCarContinuous-v0\n", "Pendulum-v1\n", "Acrobot-v1\n", "LunarLander-v2\n", "LunarLanderContinuous-v2\n", "BipedalWalker-v3\n", "BipedalWalkerHardcore-v3\n", "CarRacing-v2\n", "Blackjack-v1\n", "FrozenLake-v1\n", "FrozenLake8x8-v1\n", "CliffWalking-v0\n", "Taxi-v3\n", "Reacher-v2\n", "Reacher-v4\n", "Pusher-v2\n", "Pusher-v4\n", "InvertedPendulum-v2\n", "InvertedPendulum-v4\n", "InvertedDoublePendulum-v2\n", "InvertedDoublePendulum-v4\n", "HalfCheetah-v2\n", "HalfCheetah-v3\n", "HalfCheetah-v4\n", "Hopper-v2\n", "Hopper-v3\n", "Hopper-v4\n", "Swimmer-v2\n", "Swimmer-v3\n", "Swimmer-v4\n", "Walker2d-v2\n", "Walker2d-v3\n", "Walker2d-v4\n", "Ant-v2\n", "Ant-v3\n", "Ant-v4\n", "Humanoid-v2\n", "Humanoid-v3\n", "Humanoid-v4\n", "HumanoidStandup-v2\n", "HumanoidStandup-v4\n", "GymV26Environment-v0\n" ] } ], "source": [ "for env_id in gym.envs.registry.keys():\n", " print(env_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `render_mode` argument defines how you will see the environment:\n", "\n", "* `None` (default): no rendering, which avoids wasting computational resources while training a DRL algorithm.\n", "* `rgb_array_list`: `env.render()` returns the list of numpy arrays corresponding to the frames generated so far. Useful for generating videos.\n", "* `ansi`: string representation of each state. Only available for the \"Toy text\" environments.\n", "* `human`: graphical window displaying the environment live." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main advantage of gym(nasium) is that all problems share a common interface defined by the class `gym.Env`. 
There are only three methods that have to be used when interacting with an environment:\n", "\n", "* `state, info = env.reset()` restarts the environment and returns an initial state $s_0$ (together with an info dictionary).\n", "\n", "* `state, reward, terminal, truncated, info = env.step(action)` takes an action $a_t$ and returns:\n", " * the new state $s_{t+1}$, \n", " * the reward $r_{t+1}$, \n", " * two boolean flags indicating whether the new state is terminal (won/lost) or the episode was truncated (timeout),\n", " * a dictionary containing additional info for debugging (you can ignore it most of the time).\n", "\n", "* `env.render()` renders the current state of the MDP. Since gym 0.25, rendering happens automatically: in `human` mode, the window is updated at each step, and in `rgb_array_list` mode, the frames are collected at each step, so `env.render()` is only needed to retrieve them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this interface, we can interact with the environment in a standardized way:\n", "\n", "* We first create the environment.\n", "* For a fixed number of episodes:\n", " * We pick an initial state with `reset()`.\n", " * Until the episode is terminated or truncated:\n", " * We select an action using our RL algorithm or randomly.\n", " * We take that action (`step()`), observe the new state and the reward.\n", " * We go into the new state.\n", "\n", "The following cell shows how to interact with the CartPole environment using a random policy. Note that it will only work on your computer, not in Colab."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "env = gym.make('CartPole-v1', render_mode=\"human\")\n", "\n", "for episode in range(10):\n", " state, info = env.reset()\n", " done = False\n", " while not done:\n", " # Select an action randomly\n", " action = env.action_space.sample()\n", " \n", " # Sample a single transition\n", " next_state, reward, terminal, truncated, info = env.step(action)\n", " \n", " # Move to the next state\n", " state = next_state\n", "\n", " # End of the episode\n", " done = terminal or truncated\n", "\n", "env.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On Colab (or whenever you want to record videos of the episodes instead of watching them live), you need to create the environment with the render mode `rgb_array_list`. \n", "\n", "You then create a `GymRecorder` object (defined at the beginning of this notebook):\n", "\n", "```python\n", "recorder = GymRecorder(env)\n", "```\n", "\n", "At the end of each episode, you tell the recorder to record all frames generated during the episode. `env.render()` returns a list of (height, width, 3) numpy arrays, which are accumulated by the environment during the episode and flushed when `env.reset()` is called.\n", "\n", "```python\n", "recorder.record(env.render())\n", "```\n", "\n", "You can then generate a gif at the end of the simulation with:\n", "\n", "```python\n", "recorder.make_video('videos/CartPole-v1.gif')\n", "```\n", "\n", "Finally, you can display the gif in the notebook by calling the following **on the very last line of the cell**:\n", "\n", "```python\n", "ipython_display('videos/CartPole-v1.gif')\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MoviePy - Building file videos/CartPole-v1.gif with imageio.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " \r" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "env = gym.make('CartPole-v1', render_mode=\"rgb_array_list\")\n", "recorder = GymRecorder(env)\n", "\n", "for episode in range(10):\n", " state, info = env.reset()\n", "\n", " done = False\n", " while not done:\n", " # Select an action randomly\n", " action = env.action_space.sample()\n", " \n", " # Sample a single transition\n", " next_state, reward, terminal, truncated, info = env.step(action)\n", " \n", " # Move to the next state\n", " state = next_state\n", "\n", " # End of the episode\n", " done = terminal or truncated\n", "\n", " # Record at the end of the episode \n", " recorder.record(env.render())\n", "\n", "recorder.make_video('videos/CartPole-v1.gif')\n", "ipython_display('videos/CartPole-v1.gif', autoplay=1, loop=0)" ] }, { "cell_type": "markdown", "metadata": { "id": "WENOr5atoGr1" }, "source": [ "Each environment defines its state space (`env.observation_space`) and action space (`env.action_space`). \n", "\n", "State and action spaces can be:\n", "\n", "* discrete (`gym.spaces.Discrete(nb_states)`), with states being integers between 0 and `nb_states - 1`.\n", "\n", "* feature-based (`gym.spaces.Box(low=0, high=255, shape=(SCREEN_HEIGHT, SCREEN_WIDTH, 3))`), e.g. for pixel frames.\n", "\n", "* continuous. 
For example, for two joints of a robotic arm limited between -180 and 180 degrees:\n", "\n", "```python\n", "gym.spaces.Box(-180.0, 180.0, (2, ))\n", "```\n", "\n", "You can sample a state or action randomly from these spaces:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-113.33049 83.84796]\n" ] } ], "source": [ "action_space = gym.spaces.Box(-180.0, 180.0, (2, ))\n", "action = action_space.sample()\n", "print(action)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sampling the action space is particularly useful for exploration. We use it here to perform random (but valid) actions:\n", "\n", "```python\n", "action = env.action_space.sample()\n", "```\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Create a method `random_interaction(env, number_episodes, recorder=None)` that takes as arguments:\n", "\n", "* The environment.\n", "* The number of episodes to be performed.\n", "* An optional `GymRecorder` object that records the frames of the environment if it is not None (`if recorder is not None:`). Otherwise, do nothing.\n", "\n", "The method should return the list of undiscounted returns ($\\gamma=1$, i.e. just the sum of rewards obtained during each episode) for all episodes."
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def random_interaction(env, number_episodes, recorder=None):\n", "\n", " returns = []\n", "\n", " # Sample episodes\n", " for episode in range(number_episodes):\n", "\n", " # Sample the initial state\n", " state, info = env.reset()\n", "\n", " return_episode = 0.0\n", " done = False\n", " while not done:\n", "\n", " # Select an action randomly\n", " action = env.action_space.sample()\n", " \n", " # Sample a single transition\n", " next_state, reward, terminal, truncated, info = env.step(action)\n", "\n", " # Update the return\n", " return_episode += reward\n", " \n", " # Move to the next state\n", " state = next_state\n", "\n", " # End of the episode\n", " done = terminal or truncated\n", "\n", " # Record at the end of the episode\n", " if recorder is not None:\n", " recorder.record(env.render())\n", "\n", " returns.append(return_episode)\n", "\n", " return returns\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Use that method to visualize all the available simple environments for a few episodes:\n", "\n", "* CartPole-v1\n", "* MountainCar-v0\n", "* Pendulum-v1\n", "* Acrobot-v1\n", "* LunarLander-v2\n", "* BipedalWalker-v3\n", "* CarRacing-v2\n", "* Blackjack-v1\n", "* FrozenLake-v1\n", "* CliffWalking-v0\n", "* Taxi-v3\n", "\n", "If you run many episodes (beware: CarRacing and Taxi have very long episodes with a random policy), plot the obtained returns to see how they vary.\n", "\n", "If you managed to install the Mujoco and Atari dependencies, feel free to visualize them too." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "MoviePy - Building file videos/CartPole-v1.gif with imageio.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " \r" ] }, { "data": { "text/html": [ "