"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import ipywidgets as wd\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"from matplotlib import cm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let $f:\\mathbb{R}^m \\to \\mathbb{R}$ be a differentiable function. Gradient descent is a class of methods to find the minimum\n",
"$$ \\min_{x \\in \\mathbb{R}^m}{f(x)}$$\n",
"\n",
"One method to find a minimum is to find a zero of the gradient $\\operatorname{grad} f$, for example via Newton's method. However, this requires the computation of the Hessian of $f$, which can be quite costly. Gradient descent is an alternative method, which only requires the computation of the gradient of $f$, but not the Hessian."
]
},
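{
"cell_type": "markdown",
"metadata": {},
"source": [
"For contrast, here is a minimal sketch of Newton's method applied to $\\operatorname{grad} f$. The callables `df` (gradient) and `d2f` (Hessian) are assumptions made for illustration, not part of this notebook's setup; the point is only that every Newton step involves a linear solve with the $m \\times m$ Hessian, which gradient descent avoids."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def newton_on_gradient(x0, df, d2f, N):\n",
"    # Sketch: find a zero of grad f via Newton's method. `df` and `d2f`\n",
"    # are assumed callables returning the gradient and the Hessian.\n",
"    x = x0.astype(float)\n",
"    for _ in range(N):\n",
"        # Each step solves the m x m system  H_f(x) s = grad f(x).\n",
"        x = x - np.linalg.solve(d2f(x), df(x))\n",
"    return x"
]
},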
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Algorithm\n",
"The basic idea of gradient descent is simple: Start with an arbitrary guess $x_0 \\in \\mathbb{R}^m$ and then recursively descent into the direction negative gradient:\n",
"$$ x_{n+1} = x_n - \\alpha_n \\operatorname{grad} f(x_n)$$\n",
"Here, the $\\alpha_n \\in \\mathbb{R}$ are a choice of *step sizes*. This method is motivated by the below mathematical background."
]
},
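{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, consider the quadratic $f(x) = \\|x\\|^2$ used in the example below, with $\\operatorname{grad} f(x) = 2x$. A constant step size $\\alpha_n = \\alpha$ gives the explicit recursion\n",
"$$ x_{n+1} = x_n - 2\\alpha x_n = (1 - 2\\alpha)\\,x_n, \\qquad \\text{hence} \\qquad x_n = (1 - 2\\alpha)^n x_0,$$\n",
"which converges to the minimizer $0$ exactly when $0 < \\alpha < 1$."
]
},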
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mathematical Background\n",
"The key principles behind gradient descent rely on the following simple insights:\n",
"\n",
"**Lemma**: Let $f:\\mathbb{R}^m \\to \\mathbb{R}$ be a differentiable function and $x_0 \\in \\mathbb{R}^m$. \n",
"1. For any $v \\in \\mathbb{R}^m$, the directional derivative $\\partial _v f$ satisfies\n",
"\\begin{align*}\n",
" \\lim_{t \\searrow 0}\\frac{f(x_0 + tv)}{t} = \\partial_v f(x_0) =\\nabla f(x_0) v = \\langle \\nabla f(x_0)^{\\top}, v \\rangle \n",
" = \\langle \\operatorname{grad} f(x_0), v \\rangle \n",
"\\end{align*}\n",
"2. The negative gradient is the direction of steepest descent, i.e.\n",
"\\begin{align*}\n",
" \\underset{\\|v\\|=1}{\\operatorname{argmin}}\\partial_v f(x_0) = - \\frac{\\nabla f(x_0)}{\\|\\nabla f(x_0)\\|},\n",
"\\end{align*}\n",
"provided $\\nabla f(x_0) \\neq 0$.\n",
"\n",
"**Proof**: The first claim is the definition of directional derivative and an application of the chain rule. To see the second claim, denote by $\\vartheta$ the angle between $\\nabla f(x_0)$ and a $v \\in \\mathbb{R}^m$, $\\|v\\|=1$. By definition of the angle\n",
"$$ \\cos(\\vartheta) = \\langle \\nabla f(x_0)^{\\top}, v \\rangle \\|\\nabla f(x_0)\\| ,$$\n",
"thus\n",
"$$\\partial_v f(x_0) = \\langle \\nabla f(x_0)^{\\top}, v \\rangle = \\frac{\\cos(\\vartheta)}{\\|\\nabla f(x_0)\\|}, $$\n",
"which becomes smallest when $\\cos(\\vartheta)=-1$, i.e. when $v = -\\frac{\\nabla f(x_0)}{\\|\\nabla f(x_0)\\|}$.\n",
"\n",
"**Interpretation:** This means that at any point $x_0 \\in \\mathbb{R}^m$ descending an infinitesimal step into the direction of $-\\operatorname{grad} f(x_0)$ decreases $f$ most. The problem with that view is that 1) in reality, one needs to chose a finite step size and 2) this only yields a local minimum of $f$."
]
},
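{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first identity of the lemma can be checked numerically with a one-sided difference quotient. This is a small sketch; the test function, the point $x_0$, the direction $v$, and the step sizes $t$ are arbitrary choices made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check: (f(x0 + t v) - f(x0)) / t  ->  <grad f(x0), v>  as t -> 0,\n",
"# using f(x) = ||x||^2 with grad f(x) = 2x as a concrete test case.\n",
"f_test = lambda x: np.dot(x, x)\n",
"grad_f_test = lambda x: 2 * x\n",
"\n",
"x0 = np.array([1.0, 2.0])\n",
"v = np.array([1.0, 0.0])  # an arbitrary unit direction\n",
"for t in [1e-1, 1e-3, 1e-6]:\n",
"    quotient = (f_test(x0 + t * v) - f_test(x0)) / t\n",
"    print(t, quotient, np.dot(grad_f_test(x0), v))"
]
},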
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def f(x):\n",
" return x[0]**2 + x[1]**2\n",
"\n",
"def df(x):\n",
" return 2*(x[0]+x[1])\n",
"\n",
"def gradient_descent_sequence(x0, f, df, N):\n",
" X = np.zeros((N+1, x0.shape[0]))\n",
" X[0] = x0\n",
" alpha=0.1\n",
" for n in range(N):\n",
" X[n+1] = X[n] - alpha * df(X[n])\n",
" return X"
]
},
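{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick non-interactive check that the iterates approach the minimizer at the origin. The starting point and the number of steps are arbitrary choices; with $\\alpha = 0.1$, each step shrinks the distance to $0$ by the factor $|1 - 2\\alpha| = 0.8$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run 10 iterations from (1, 1); the last iterate should be close to 0,\n",
"# with norm  0.8**10 * ||(1, 1)||  up to floating point error.\n",
"X_demo = gradient_descent_sequence(np.array([1.0, 1.0]), f, df, 10)\n",
"print(X_demo[-1], np.linalg.norm(X_demo[-1]))"
]
},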
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4bae68a51a404ae08486c7669289a11c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ec07f46d7b5c4f9d8a9da13719ecc23b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(IntSlider(value=1, description='N', max=20, min=1), Output()), _dom_classes=('widget-int…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x = np.linspace(-1, 1, 20)\n",
"y = np.linspace(-1, 1, 20)\n",
"Z = np.array([[f(np.array([xx, yy])) for xx in x] for yy in y])\n",
"\n",
"X, Y = np.meshgrid(x, y)\n",
"\n",
"fig = plt.figure()\n",
"ax = fig.add_subplot(111, projection='3d')\n",
"\n",
"@wd.interact(N=wd.IntSlider(min=1, max=20, value=1))\n",
"def plot_gradient_descent(N):\n",
" ax.clear()\n",
" ax.plot_wireframe(X, Y, Z, alpha=0.8, cmap=cm.coolwarm)\n",
" Xgd = gradient_descent_sequence(np.array([1, 1]), f, df, N)\n",
" ax.plot(Xgd[:, 0], Xgd[:, 1], np.array([f(xx) for xx in Xgd]), marker='x', color='k')\n",
" plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "heroku",
"language": "python",
"name": "heroku"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}