{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Here we review the optimizers used in machine learning.\n", "# Gradient Descent" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "Let's use a simple dataset of salaries for developers and machine learning engineers in five Chinese cities in 2019." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "\n", "# Developer salary in Beijing, Shanghai, Hangzhou, Shenzhen and Guangzhou in 2019\n", "x = [13854, 12213, 11009, 10655, 9503]\n", "# Column vector of shape (5, 1), scaled down by 10000 so gradients stay well-conditioned\n", "x = np.reshape(x, (5, 1)) / 10000.0\n", "\n", "# Machine learning engineer salary in the same five cities\n", "y = [21332, 20162, 19138, 18621, 18016]\n", "y = np.reshape(y, (5, 1)) / 10000.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions\n", "Model:\n", "$$y=ax+b+\\varepsilon$$\n", "Cost function, where $\\hat{y}_i = ax_i+b$ is the prediction for sample $i$:\n", "$$J(a,b)=\\frac{1}{2n}\\sum_{i=1}^{n}(y_i-\\hat{y}_i)^2$$\n", "Update rule (the optimizer), with learning rate $\\alpha$:\n", "$$\\theta = \\theta - \\alpha \\frac{\\partial J}{\\partial \\theta}$$\n", "For univariate linear regression this becomes:\n", "$$a = a - \\alpha \\frac{\\partial J}{\\partial a}$$\n", "$$b = b - \\alpha \\frac{\\partial J}{\\partial b}$$\n", "\n", "Here $\\frac{\\partial J}{\\partial a}$ and $\\frac{\\partial J}{\\partial b}$ are:\n", "\n", "$$ \\frac{\\partial J}{\\partial a} = \\frac{1}{n}\\sum_{i=1}^{n}x_i(\\hat{y}_i-y_i)$$\n", "\n", "\n", "$$ \\frac{\\partial J}{\\partial b} = \\frac{1}{n}\\sum_{i=1}^{n}(\\hat{y}_i-y_i)$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def model(a, b, x):\n", " # Linear prediction y_hat = a*x + b\n", " return a*x + b\n", "\n", "def cost_function(a, b, x, y):\n", " # Half the mean squared error over all samples\n", " n = len(y)\n", " return 0.5/n * (np.square(y-a*x-b)).sum()\n", "\n", "# Despite its name, sgd performs full-batch gradient descent: each step uses all n samples.\n", "def sgd(a,b,x,y):\n", " n = 
5\n", " alpha = 1e-1 # learning rate\n", " y_hat = model(a,b,x)\n", " da = (1.0/n) * ((y_hat-y)*x).sum() # dJ/da\n", " db = (1.0/n) * ((y_hat-y).sum()) # dJ/db\n", " a = a - alpha*da\n", " b = b - alpha*db\n", " return a, b\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def iterate_sgd(a,b,x,y,times):\n", " # Run a fixed number of update steps\n", " for i in range(times):\n", "  a,b = sgd(a,b,x,y)\n", "\n", " # Report the fitted parameters and cost, then plot the fit against the data\n", " y_hat=model(a,b,x)\n", " cost = cost_function(a, b, x, y)\n", " print(a,b,cost)\n", " plt.scatter(x,y)\n", " plt.plot(x,y_hat)\n", " plt.show()\n", " return a,b, cost" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.950768563083351 0.8552812669346652 0.00035532090622957674\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "a=0\n", "b=0\n", "_, _, sgd_cost = iterate_sgd(a,b,x,y,100)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.00035532090622957674" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sgd_cost" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After 100 iterations, the regression is almost done. We record the cost such that in the following exploration of other optimizers, we will be able to compare iterations to reach the same cost." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def iterate(a, b, x, y, target_cost, func):\n", " i=0\n", " for i in range(1000):\n", " a,b = func(a,b,x,y)\n", " cost = cost_function(a, b, x, y)\n", " if cost