{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Ordinary Differential Equations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A significant portion of processes can be described by differential equations: let it be evolution of physical systems, medical conditions of a patient, fundamental properties of markets, etc. Such data is sequential and continuous in its nature, meaning that observations are merely realizations of some continuously changing state.\n", "\n", "There is also another type of sequential data that is discrete – NLP data, for example: its state changes discretely, from one symbol to another, from one word to another.\n", "\n", "Today both these types are normally processed using recurrent neural networks. They are, however, essentially different in their nature, and it seems that they should be treated differently.\n", "\n", "At the last NIPS conference a very interesting [paper](https://arxiv.org/abs/1806.07366) was presented that attempts to tackle this problem. Authors propose a very promising approach, which they call **Neural Ordinary Differential Equations**.\n", "\n", "Here I tried to reproduce and summarize the results of original paper, making it a little easier to familiarize yourself with the idea. As I believe, this new architecture may soon be, among convolutional and recurrent networks, in a toolbox of any data scientist." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagine a problem: there is a process following an unknown ODE and some (noisy) observations along its trajectory\n", "\n", "$$\n", "\\frac{dz}{dt} = f(z(t), t) \\tag{1}\n", "$$\n", "$$\n", "\\{(z_0, t_0),(z_1, t_1),...,(z_M, t_M)\\} - \\text{observations}\n", "$$\n", "\n", "Is it possible to find an approximation $\\widehat{f}(z, t, \\theta)$ of dynamics function $f(z, t)$?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, consider a somewhat simpler task: there are only 2 observations, at the beginning and at the end of the trajectory, $(z_0, t_0), (z_1, t_1)$. One starts the evolution of the system from $z_0, t_0$ for time $t_1 - t_0$ with some parameterized dynamics function using any ODE initial value solver. After that, one ends up being at some new state $\\hat{z_1}, t_1$, compares it with the observation $z_1$, and tries to minimize the difference by varying the parameters $\\theta$.\n", "\n", "Or, more formally, consider optimizing the following loss function $L(\\hat{z_1})$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "L(z(t_1)) = L \\Big( z(t_0) + \\int_{t_0}^{t_1} f(z(t), t, \\theta)dt \\Big) = L \\big( \\text{ODESolve}(z(t_0), f, t_0, t_1, \\theta) \\big) \\tag{2}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
{ "cell_type": "markdown", "metadata": {}, "source": [
 "*[figure omitted]*\n",
 "\n",
 "Figure 1: Continuous backpropagation of the gradient requires solving the augmented ODE backwards in time. Arrows represent adjusting the backpropagated gradients with gradients from the observations.\n",
 "\n",
 "*Figure from the original paper*"
] },