{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Recode: Robots that can adapt like animals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recode the arm experiment of the article \"Robot that can adapt like animals\" ([10.1038/nature14422](http://dx.doi.org/10.1038/nature14422)) by Antoine Cully, Jeff Clune, Danesh Tarapore and Jean-Baptiste Mouret. The article is available [on the Nature website](http://www.nature.com/nature/journal/v521/n7553/full/nature14422.html), and a preprint [is available here](http://www.isir.upmc.fr/files/2015ACLI3468.pdf). The authors have made the [C++ code used for the experiments](http://pages.isir.upmc.fr/~mouret/code/ite_source_code.tar.gz) in the article available, but it was not necessary to consult it to code this Python implementation. The [supplementary information](http://www.nature.com/nature/journal/v521/n7553/extref/nature14422-s1.pdf) document, however, was instrumental to it. This code is available on the [recode github repository](https://github.com/humm/recode), and is published under the [OpenScience License](http://fabien.benureau.com/openscience.html).\n", "\n", "The article introduces a new method for robots to adapt their behavior when facing hardware failure, such as a motor malfunctioning. The method is remarquable because it does not need to diagnose the problem, features a significantly faster adaptation than previous techniques, and relies on the experience gained from a long babbling phase (many different motions are tested) on the intact robot. \n", "\n", "We won't attempt to summarize or re-explain futher the aims behind the experiments; we assume the reader is familiar with the article. Moreover, we only implement the arm experiment here, not the hexapod one. The main differences between this code and the one presented in the article are:\n", "0. We employ a kinematic, planar simulation for the robotic arm, in place of both the paper's simulation and the real robot.\n", "0. We do not filter self-collisions of the arm.\n", "0. We do restrict the working area of the arm to the camera field of view.\n", "\n", "The code is divided in two parts: one implementing the MAP-Elites algorithm and another implementing the M-BOA optimization algorithm. The code depends on the [numpy](http://www.numpy.org/), and the [bokeh](http://bokeh.pydata.org) library for the figures. The comments from the paper's pseudocode (Supplementary Figure 1) have been inserted into the code when appropriate. They are prefixed with a double \"`##`\" sign.\n", "\n", "The code is optimized for comprehension, not efficiency. Obvious optimization can be made, but they would reduce clarity. A citable version of this notebook is available at [figshare](https://dx.doi.org/10.6084/m9.figshare.5334187). You can contact me for questions or remarks at `fabien.benureau@gmail.com`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random, math\n", "import numpy as np\n", "import graphs\n", "\n", "random.seed(0) # reproducible results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MAP-Elites" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The robotic arm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The arm has 8 joints, each with a range of $\\pm \\pi/2$, and a length of 62 cm. The `arm2d()` function computes the position of the end effector (in meters) given a set of angles, expressed as normalized values between 0 and 1. All the angles values manipulated outside of this function are between 0 and 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ARM_DIM = 8\n", "\n", "def arm2d_rad(angles):\n", " \"\"\"Return the position of the end effector. Accepts angles in radians.\"\"\"\n", " u, v, sum_a, length = 0, 0, 0, 0.62/len(angles)\n", " for a in angles:\n", " sum_a += a\n", " u, v = u + length*math.sin(sum_a), v + length*math.cos(sum_a) # zero pose is at x=0,y=1\n", " return u, v\n", "\n", "def arm2d(angles):\n", " \"\"\"Return the position of the end effector. Accept angles in [0, 1]\"\"\"\n", " angles = [math.pi*(a-0.5) for a in angles]\n", " return arm2d_rad(angles)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The performance measure for MAP-Elites is the opposite of the variance between joints. Given $p_0, ..., p_8$ the values of the angles, the performance is:\n", "$$\\textrm{performance(angles)} = -\\frac{1}{8}\\sum_0^8 (p_i - m)^2 \\,\\,\\,\\,\\textrm{ with } m \\textrm{ the average value of the angles: }\\,\\,\\,\\, m = \\sum_0^8 p_i$$\n", "Let's remark that the performance in general depends on the value of the angles and on the value of the behavior (returned by `arm2d(angles)`), but it does not here.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def performance(angles):\n", " \"\"\"Performance based on the variance of the angles\"\"\"\n", " m = sum(angles)/len(angles)\n", " return -sum((a - m)**2 for a in angles)/len(angles)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the rest of the code, a set of 8 angle values will be called a *controller*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Populating the performance map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The performance map keeps track of the best controllers (here, angle values) for all different observed behaviors. It discretizes the behavioral space into a grid, and for each cell of the grid, keeps only the topmost performing controller. In this implementation we keep three separate python dictionaries for the controllers, behavior and performance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ctrl_map = {}\n", "behv_map = {}\n", "perf_map = {}\n", "all_coos = [] # keeping track of all the non-empty cells to quickly choose a random one" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To populate the performance matrix, $I$ simulations are done. First, a number (variable `B`) of random controllers is tried. And then, for the remaining of the $I$ simulations, mutations of those controllers are tried." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def map_elites(I=200000, B=400):\n", " \"\"\"Populate the performance map\"\"\"\n", " for i in range(I):\n", " if i < B:\n", " c = [random.random() for _ in range(ARM_DIM)] ## the first 400 controllers are generated randomly.\n", " else: ## the next controllers are generated using the map.\n", " rand_coo = random.choice(all_coos)\n", " c_prime = ctrl_map[rand_coo] ## Randomly select a controller c in the map.\n", " c = perturb(c_prime) ## Create a randomly modified copy of c.\n", " behavior = arm2d(c) ## Simulate the controller and record its behavioral descriptor.\n", " p = performance(c) ## Record its performance.\n", " add_mat(c, behavior, p) # Update the performance map.\n", "\n", "RES = 200 # number of row and columns in the behavioral grid\n", "\n", "def add_mat(ctrl, behavior, perf):\n", " \"\"\"Update the performance map if necessary\"\"\"\n", " x, y = behavior\n", " coo = (int((x+0.7)/0.7*RES/2), int((y+0.7)/0.7*RES/2)) # coo is the discretized coordinate of a behavior.\n", " perf_old = perf_map.get(coo, float('-inf'))\n", " if perf_old < perf: ## If the cell is empty or if perf is better than the current stored performance.\n", " if not coo in ctrl_map:\n", " all_coos.append(coo)\n", " ctrl_map[coo] = ctrl ## Associate the controller with its behavioral descriptor.\n", " perf_map[coo] = perf ## Store the performance of c′ in the behavior-performance map according\n", " behv_map[coo] = behavior ## to its behavioral descriptor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far, what is missing is the `perturb()` function (line 9). The random modification of an existing controller is done using a polynomial mutation operator (see *Multi-Objective Optimization Using Evolutionary Algorithms* by K. Deb\n", "(2001), p. 120). For a value $c_i$, whose extremum values are 0 and 1, and given $r_i$, a random value in [0, 1], the mutation goes:\n", "$$c_i' = c_i + \\delta_i \\textrm{ with } \\delta_i = \\begin{cases} (2r_i)^{1/(\\eta_m+1)} + 1 &\\mbox{if } r < 0.5 \\\\\n", "1 - (2(1-r_i))^{1/(\\eta_m+1)} & \\mbox{if } r_i \\geq 0.5. \\end{cases}$$\n", "The value of $\\eta_m$ is fixed to 10." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ETA_M = 10.0\n", "\n", "def mutate(c_i):\n", " \"\"\"Polynomial mutation operator (see Deb (2001) p. 120)\"\"\"\n", " r_i = random.random()\n", " if r_i < 0.5:\n", " delta_i = (2*r_i)**(1/(ETA_M + 1)) - 1\n", " else:\n", " delta_i = 1 - (2*(1 - r_i))**(1/(ETA_M + 1))\n", " return min(1.0, max(0.0, c_i + delta_i))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When creating a random perturbation of a controller (i.e. a vector of 8 values in [0,1]), each value has a 12.5% chance to mutate. Therefore:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MUTATION_RATE = 0.125\n", "\n", "def perturb(c):\n", " \"\"\"Return a random perturbation of the controller\"\"\"\n", " return [mutate(c_i) if random.random() < MUTATION_RATE else c_i for c_i in c]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now run the MAP-Elites algorithm. Using this implementation (in 2015), 2 million simulations will take of the order of one minute, depending on your hardware. The original article does 20 million simulations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "I = 2000000 # number of simulation\n", "B = 400 # bootstrapping\n", "\n", "map_elites(I=I, B=B)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Visualizating the map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use [bokeh](http://bokeh.pydata.org/en/latest/) for visualizing the performance map. The plotting code is in the `graph.py` file. The colors are displayed on a logarithmic scale." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import graphs\n", "graphs.variance_map(perf_map, RES)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compared to the results of the article (Extended Data Figure 7.c), this performance map differs in the center because we do not prevent self-collisions; they make high-performing postures impossible in the center." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## M-BOA Optimization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A broken arm\n", "Fourteen different damage conditions are explored in the article (Extended Data Figure 7.b). Joints can either be stuck at 45°, or have a permanent offset of 45°. For the latter, we assume the offset does not change the range of angles the controller accepts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DMG_COND = 3 # change this for different damage condition\n", "\n", "# for each case, the format is (