{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# 04 - Full waveform inversion with Devito and Dask" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this tutorial we show how [Devito](http://www.opesci.org/devito-public) and [scipy.optimize.minimize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) are used with [Dask](https://dask.pydata.org/en/latest/#dask) to perform [full waveform inversion](https://www.slim.eos.ubc.ca/research/inversion) (FWI) on distributed memory parallel computers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## scipy.optimize.minimize \n", "\n", "In this tutorial we use [scipy.optimize.minimize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) to solve the FWI gradient-based minimization problem, rather than the simple gradient descent algorithm used in the previous tutorial.\n", "\n", "```python\n", "scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)\n", "```\n", "\n", "> Minimization of scalar function of one or more variables.\n", ">\n", "> In general, the optimization problems are of the form:\n", ">\n", "> minimize f(x) subject to\n", ">\n", "> g_i(x) >= 0, i = 1,...,m\n", "> h_j(x) = 0, j = 1,...,p\n", "> where x is a vector of one or more variables. g_i(x) are the inequality constraints. h_j(x) are the equality constraints.\n", "\n", "[scipy.optimize.minimize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) provides a wide variety of methods for solving minimization problems depending on the context. 
Here we are going to focus on using L-BFGS via [scipy.optimize.minimize(method='L-BFGS-B')](https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html#optimize-minimize-lbfgsb).\n", "\n", "```python\n", "scipy.optimize.minimize(fun, x0, args=(), method='L-BFGS-B', jac=None, bounds=None, tol=None, callback=None, options={'disp': None, 'maxls': 20, 'iprint': -1, 'gtol': 1e-05, 'eps': 1e-08, 'maxiter': 15000, 'ftol': 2.220446049250313e-09, 'maxcor': 10, 'maxfun': 15000})\n", "```\n", "\n", "The argument `fun` is a callable function that returns the misfit between the simulated and the observed data. If `jac` is a Boolean and is `True`, `fun` is assumed to return the gradient along with the objective function, as is the case here when applying the adjoint-state method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is Dask?\n", "\n", "> [Dask](https://dask.pydata.org/en/latest/#dask) is a flexible parallel computing library for analytic computing.\n", ">\n", "> Dask is composed of two components:\n", ">\n", "> * Dynamic task scheduling optimized for computation...\n", "> * “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. 
These parallel collections run on top of the dynamic task schedulers.\n", ">\n", "> Dask emphasizes the following virtues:\n", "> \n", "> * Familiar: Provides parallelized NumPy array and Pandas DataFrame objects\n", "> * Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.\n", "> * Native: Enables distributed computing in Pure Python with access to the PyData stack.\n", "> * Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms\n", "> * Scales up: Runs resiliently on clusters with 1000s of cores\n", "> * Scales down: Trivial to set up and run on a laptop in a single process\n", "> * Responsive: Designed with interactive computing in mind it provides rapid feedback and diagnostics to aid humans\n", "\n", "**We are going to use it here to parallelise the computation of the functional and gradient as this is the vast bulk of the computational expense of FWI and it is trivially parallel over data shots.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up (synthetic) data\n", "In a real-world scenario we would work with collected seismic data; in this tutorial we know the actual solution, and we also use the workers to generate the synthetic data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "#NBVAL_IGNORE_OUTPUT\n", "\n", "# Set up inversion parameters.\n", "param = {'t0': 0.,\n", "         'tn': 1000.,  # Simulation lasts 1 second (1000 ms)\n", "         'f0': 0.010,  # Source peak frequency is 10Hz (0.010 kHz)\n", "         'nshots': 5,  # Number of shots to create gradient from\n", "         'm_bounds': (0.08, 0.25),  # Set the min and max slowness\n", "         'shape': (101, 101),  # Number of grid points (nx, nz).\n", "         'spacing': (10., 10.),  # Grid spacing in m. 
The domain size is now 1km by 1km.\n", "         'origin': (0, 0),  # Need origin to define relative source and receiver locations.\n", "         'nbl': 40}  # Thickness of the absorbing boundary layer (nbl), in grid points.\n", "\n", "import numpy as np\n", "\n", "import scipy\n", "from scipy import signal, optimize\n", "\n", "from devito import Grid\n", "\n", "from distributed import Client, LocalCluster, wait\n", "\n", "import cloudpickle as pickle\n", "\n", "# Import acoustic solver, source and receiver modules.\n", "from examples.seismic import Model, demo_model, AcquisitionGeometry, Receiver\n", "from examples.seismic.acoustic import AcousticWaveSolver\n", "\n", "# Import convenience function for plotting results\n", "from examples.seismic import plot_image\n", "\n", "def get_true_model():\n", "    ''' Define the test phantom; in this case we are using\n", "    a simple circle so we can easily see what is going on.\n", "    '''\n", "    return demo_model('circle-isotropic', vp=3.0, vp_background=2.5,\n", "                      origin=param['origin'], shape=param['shape'],\n", "                      spacing=param['spacing'], nbl=param['nbl'])\n", "\n", "def get_initial_model():\n", "    '''The initial guess for the subsurface model.\n", "    '''\n", "    # Make sure both models are on the same grid\n", "    grid = get_true_model().grid\n", "    return demo_model('circle-isotropic', vp=2.5, vp_background=2.5,\n", "                      origin=param['origin'], shape=param['shape'],\n", "                      spacing=param['spacing'], nbl=param['nbl'],\n", "                      grid=grid)\n", "\n", "def wrap_model(x, astype=None):\n", "    '''Wrap a flat array as a subsurface model.\n", "    '''\n", "    model = get_initial_model()\n", "    if astype:\n", "        model.vp = x.astype(astype).reshape(model.vp.data.shape)\n", "    else:\n", "        model.vp = x.reshape(model.vp.data.shape)\n", "    return model\n", "\n", "def load_model(filename):\n", "    \"\"\" Returns the current model. 
This is used by the\n", "    worker to get the current model.\n", "    \"\"\"\n", "    # Use a context manager so the file handle is always closed.\n", "    with open(filename, \"rb\") as f:\n", "        pkl = pickle.load(f)\n", "\n", "    return pkl['model']\n", "\n", "def dump_model(filename, model):\n", "    ''' Dump model to disk.\n", "    '''\n", "    with open(filename, \"wb\") as f:\n", "        pickle.dump({'model': model}, f)\n", "\n", "def load_shot_data(shot_id, dt):\n", "    ''' Load shot data from disk, resampling to the model time step.\n", "    '''\n", "    with open(\"shot_%d.p\" % shot_id, \"rb\") as f:\n", "        pkl = pickle.load(f)\n", "\n", "    return pkl['geometry'].resample(dt), pkl['rec'].resample(dt)\n", "\n", "def dump_shot_data(shot_id, rec, geometry):\n", "    ''' Dump shot data to disk.\n", "    '''\n", "    with open('shot_%d.p' % shot_id, \"wb\") as f:\n", "        pickle.dump({'rec': rec, 'geometry': geometry}, f)\n", "\n", "def generate_shotdata_i(param):\n", "    \"\"\" Inversion crime alert! Here the worker is creating the\n", "    'observed' data using the real model. For a real case\n", "    the worker would be reading seismic data from disk.\n", "    \"\"\"\n", "    true_model = get_true_model()\n", "    shot_id = param['shot_id']\n", "\n", "    src_coordinates = np.empty((1, len(param['shape'])))\n", "    src_coordinates[0, :] = [30, param['shot_id']*1000./(param['nshots']-1)]\n", "\n", "    # Number of receiver locations per shot.\n", "    nreceivers = 101\n", "\n", "    # Set up receiver data and geometry.\n", "    rec_coordinates = np.empty((nreceivers, len(param['shape'])))\n", "    rec_coordinates[:, 1] = np.linspace(0, true_model.domain_size[0], num=nreceivers)\n", "    rec_coordinates[:, 0] = 980. 
# 20m from the right end\n", "\n", "    # Geometry\n", "    geometry = AcquisitionGeometry(true_model, rec_coordinates, src_coordinates,\n", "                                   param['t0'], param['tn'], src_type='Ricker',\n", "                                   f0=param['f0'])\n", "    # Set up solver.\n", "    solver = AcousticWaveSolver(true_model, geometry, space_order=4)\n", "\n", "    # Generate synthetic receiver data from true model.\n", "    true_d, _, _ = solver.forward(vp=true_model.vp)\n", "\n", "    dump_shot_data(shot_id, true_d, geometry)\n", "\n", "def generate_shotdata(param):\n", "    # Define work list\n", "    work = [dict(param) for i in range(param['nshots'])]\n", "    for i in range(param['nshots']):\n", "        work[i]['shot_id'] = i\n", "        generate_shotdata_i(work[i])\n", "\n", "    # Map worklist to cluster\n", "    futures = client.map(generate_shotdata_i, work)\n", "\n", "    # Wait for all futures\n", "    wait(futures)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Operator `Forward` run in 0.02 s\n", "Operator `Forward` run in 0.06 s\n", "Operator `Forward` run in 0.02 s\n", "Operator `Forward` run in 0.03 s\n", "Operator `Forward` run in 0.04 s\n" ] } ], "source": [ "#NBVAL_IGNORE_OUTPUT\n", "\n", "# Start Dask cluster\n", "cluster = LocalCluster(n_workers=2, death_timeout=600)\n", "client = Client(cluster)\n", "\n", "# Generate shot data.\n", "generate_shotdata(param)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dask specifics\n", "\n", "Previously we defined a function to calculate the individual contribution to the functional and gradient for each shot, which was then used in a loop over all shots. However, when using distributed frameworks such as Dask we instead think in terms of creating a worklist which gets *mapped* onto the worker pool. The sum reduction is also performed in parallel. 
For now, however, we assume that `scipy.optimize.minimize` itself runs on the *master* process; this is a reasonable simplification because the computational cost of calculating (f, g) far exceeds the other compute costs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because we want to be able to use standard reduction operators such as sum on (f, g), we first define it as a type so that we can define the `__add__` and `__radd__` methods." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Define a type to store the functional and gradient.\n", "class fg_pair:\n", "    def __init__(self, f, g):\n", "        self.f = f\n", "        self.g = g\n", "\n", "    def __add__(self, other):\n", "        f = self.f + other.f\n", "        g = self.g + other.g\n", "\n", "        return fg_pair(f, g)\n", "\n", "    def __radd__(self, other):\n", "        # sum() starts from the integer 0, so 0 + fg_pair must return the pair itself.\n", "        if other == 0:\n", "            return self\n", "        else:\n", "            return self.__add__(other)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create operators for gradient based inversion\n", "To perform the inversion we are going to use [scipy.optimize.minimize(method='L-BFGS-B')](https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html#optimize-minimize-lbfgsb).\n", "\n", "First we define the functional, ```f```, and gradient, ```g```, operator (i.e. the function ```fun```) for a single shot of data. This is the work that is going to be performed by the worker on a unit of data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from devito import Function\n", "\n", "# Create FWI gradient kernel for a single shot\n", "def fwi_gradient_i(param):\n", "    # Load the current model and the shot data for this worker.\n", "    # Note, unlike the serial example the model is not passed in\n", "    # as an argument. 
Broadcasting large datasets is considered\n", "    # a programming anti-pattern and at the time of writing\n", "    # it only worked reliably with Dask master. Therefore,\n", "    # the model is communicated via a file.\n", "    model0 = load_model(param['model'])\n", "\n", "    dt = model0.critical_dt\n", "\n", "    geometry, rec = load_shot_data(param['shot_id'], dt)\n", "    geometry.model = model0\n", "    # Set up solver.\n", "    solver = AcousticWaveSolver(model0, geometry, space_order=4)\n", "\n", "    # Compute simulated data and full forward wavefield u0\n", "    d, u0, _ = solver.forward(save=True)\n", "\n", "    # Compute the data misfit (residual) and objective function\n", "    residual = Receiver(name='rec', grid=model0.grid,\n", "                        time_range=geometry.time_axis,\n", "                        coordinates=geometry.rec_positions)\n", "\n", "    residual.data[:] = d.data[:residual.shape[0], :] - rec.data[:residual.shape[0], :]\n", "    f = .5*np.linalg.norm(residual.data.flatten())**2\n", "\n", "    # Compute gradient using the adjoint-state method. Note, this\n", "    # backpropagates the data misfit through the model.\n", "    grad = Function(name=\"grad\", grid=model0.grid)\n", "    solver.gradient(rec=residual, u=u0, grad=grad)\n", "\n", "    # Copying here to avoid a (probably overzealous) destructor deleting\n", "    # the gradient before Dask has had a chance to communicate it.\n", "    g = np.array(grad.data[:])\n", "\n", "    # Return the objective functional and gradient.\n", "    return fg_pair(f, g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the global functional-gradient operator. This does the following:\n", "* Maps the worklist (shots) to the workers so that the individual contributions to (f, g) are computed.\n", "* Sums the individual contributions to (f, g) and returns the result." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def fwi_gradient(model, param):\n", "    # Dump a copy of the current model for the workers\n", "    # to pick up when they are ready.\n", "    param['model'] = \"model_0.p\"\n", "    dump_model(param['model'], wrap_model(model))\n", "\n", "    # Define work list\n", "    work = [dict(param) for i in range(param['nshots'])]\n", "    for i in range(param['nshots']):\n", "        work[i]['shot_id'] = i\n", "\n", "    # Distribute worklist to workers.\n", "    fgi = client.map(fwi_gradient_i, work, retries=1)\n", "\n", "    # Perform reduction.\n", "    fg = client.submit(sum, fgi).result()\n", "\n", "    # L-BFGS in scipy expects a flat array in 64-bit floats.\n", "    return fg.f, -fg.g.flatten().astype(np.float64)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## FWI with L-BFGS-B\n", "Equipped with a function to calculate the functional and gradient, we are finally ready to define the optimization function." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from scipy import optimize\n", "\n", "# Define bounding box constraints on the solution.\n", "def apply_box_constraint(vp):\n", "    # Maximum possible 'realistic' velocity is 3.5 km/sec\n", "    # Minimum possible 'realistic' velocity is 2 km/sec\n", "    return np.clip(vp, 2.0, 3.5)\n", "\n", "# Many optimization methods in scipy.optimize.minimize accept a callback\n", "# function that can operate on the solution after every iteration. 
Here\n", "# we use this to apply box constraints and to monitor the true relative\n", "# solution error.\n", "relative_error = []\n", "def fwi_callbacks(x):\n", "    # Apply boundary constraint\n", "    x.data[:] = apply_box_constraint(x)\n", "\n", "    # Calculate true relative error\n", "    true_x = get_true_model().vp.data.flatten()\n", "    relative_error.append(np.linalg.norm((x-true_x)/true_x))\n", "\n", "def fwi(model, param, ftol=0.1, maxiter=5):\n", "    result = optimize.minimize(fwi_gradient,\n", "                               model.vp.data.flatten().astype(np.float64),\n", "                               args=(param, ), method='L-BFGS-B', jac=True,\n", "                               callback=fwi_callbacks,\n", "                               options={'ftol':ftol,\n", "                                        'maxiter':maxiter,\n", "                                        'disp':True})\n", "\n", "    return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now apply our FWI function and have a look at the result." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " fun: 211.63605994835484\n", " hess_inv: <32761x32761 LbfgsInvHessProduct with dtype=float64>\n", " jac: array([-6.69854363e-12, -3.05454377e-11, -7.75552331e-11, ...,\n", " -1.31146288e-10, -5.26459223e-11, -1.16629224e-11])\n", " message: b'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT'\n", " nfev: 6\n", " nit: 5\n", " status: 1\n", " success: False\n", " x: array([2.5, 2.5, 2.5, ..., 2.5, 2.5, 2.5])\n" ] } ], "source": [ "#NBVAL_IGNORE_OUTPUT\n", "\n", "model0 = get_initial_model()\n", "\n", "# Baby steps\n", "result = fwi(model0, param)\n", "\n", "# Print out results of optimizer.\n", "print(result)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#NBVAL_SKIP\n", "\n", "# Show what the update does to the model\n", "from examples.seismic import plot_image, plot_velocity\n", "\n", "model0.vp = result.x.astype(np.float32).reshape(model0.vp.data.shape)\n", "plot_velocity(model0)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#NBVAL_SKIP\n", "\n", "# Plot percentage error\n", "plot_image(100*np.abs(model0.vp.data-get_true_model().vp.data)/get_true_model().vp.data, vmax=15, cmap=\"hot\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#NBVAL_SKIP\n", "import matplotlib.pyplot as plt\n", "\n", "# Plot objective function decrease\n", "plt.figure()\n", "plt.loglog(relative_error)\n", "plt.xlabel('Iteration number')\n", "plt.ylabel('True relative error')\n", "plt.title('Convergence')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is part of the tutorial \"Optimised Symbolic Finite Difference Computation with Devito\" presented at the Intel® HPC Developer Conference 2017." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 1 }