{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training an Agent to Walk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let us learn how to train a robotic agent to walk using Gym, along with some fundamentals.\n",
    "The strategy is that the agent is rewarded X points when it moves forward, and Y points are\n",
    "deducted when it fails to move, so the agent learns to walk in the course of maximizing its\n",
    "reward.\n",
    "\n",
    "First, we import the library; then we create a simulation instance with the make function.\n",
    "\n",
    "OpenAI Gym provides an environment called BipedalWalker-v2 for training\n",
    "robotic agents on simple terrain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import gym\n",
    "env = gym.make('BipedalWalker-v2')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, for each episode (one agent-environment interaction from the initial state to the\n",
    "terminal state), we initialize the environment using the reset method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "for episode in range(100):\n",
    "    observation = env.reset()\n",
    "\n",
    "    # Render the environment on each step\n",
    "    for i in range(10000):\n",
    "        env.render()\n",
    "\n",
    "        # We choose an action by sampling a random action from the environment's\n",
    "        # action space. Every environment has an action space that contains all\n",
    "        # of its valid actions.\n",
    "        action = env.action_space.sample()\n",
    "\n",
    "        # For each step, we record the observation, reward, done flag, and info\n",
    "        observation, reward, done, info = env.step(action)\n",
    "\n",
    "        # When done is True, we print the time steps taken for the episode and\n",
    "        # break out of the current episode\n",
    "        if done:\n",
    "            print(\"{} timesteps taken for the Episode\".format(i+1))\n",
    "            break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The agent learns by trial and error, and over time it starts selecting the actions that\n",
    "yield the maximum reward."
   ]
  },
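  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the loop above only samples random actions, so it never actually improves. As a\n",
    "small taste of trial-and-error learning, the cell below is a minimal sketch of our own (not\n",
    "part of Gym's API) that searches over random linear policies and keeps the best one found\n",
    "so far. It assumes NumPy is installed and uses the same step interface as above; a real\n",
    "training run would use a proper reinforcement learning algorithm instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# A minimal random-search sketch (an illustration, not Gym's training API):\n",
    "# sample random linear policies and keep the one with the highest episode return.\n",
    "import numpy as np\n",
    "\n",
    "def run_episode(env, weights, max_steps=2000):\n",
    "    \"\"\"Roll out one episode with a linear policy; return the total reward.\"\"\"\n",
    "    observation = env.reset()\n",
    "    total_reward = 0.0\n",
    "    for _ in range(max_steps):\n",
    "        # Map the observation to an action and clip it to the valid range\n",
    "        action = np.clip(weights.dot(observation),\n",
    "                         env.action_space.low, env.action_space.high)\n",
    "        observation, reward, done, info = env.step(action)\n",
    "        total_reward += reward\n",
    "        if done:\n",
    "            break\n",
    "    return total_reward\n",
    "\n",
    "best_reward, best_weights = -np.inf, None\n",
    "for trial in range(20):\n",
    "    # One weight per (action dimension, observation dimension) pair\n",
    "    weights = np.random.randn(env.action_space.shape[0],\n",
    "                              env.observation_space.shape[0])\n",
    "    reward = run_episode(env, weights)\n",
    "    if reward > best_reward:\n",
    "        best_reward, best_weights = reward, weights\n",
    "        print(\"Trial {}: new best total reward {:.2f}\".format(trial, best_reward))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:universe]",
   "language": "python",
   "name": "conda-env-universe-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}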