{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Concepts\n", "- The gradient is a vector with the same dimension as the parameter vector. Put simply, the gradient is the multivariable counterpart of the derivative: take the partial derivative with respect to each variable, separate the results with commas, and enclose them in parentheses, which shows that the gradient is a vector.\n", "\n", "- Procedure:\n", "    - ① Take the partial derivative with respect to each parameter to obtain $\\nabla f$;\n", "    - ② Choose an initial parameter vector, a learning rate $\\eta$, and a threshold;\n", "    - ③ Update: next parameter vector = current parameter vector $- \\eta \\nabla f$;\n", "    - ④ Repeat until $\\lVert \\nabla f \\rVert \\le$ threshold; the parameter vector found at that point is (approximately) a local optimum." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Implementation from Scratch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Single-Variable Function\n", " $f(x)=3x^2+5x$" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def loss_function(x):\n", "    # f(x) = 3x^2 + 5x\n", "    return 3*(x**2)+5*x\n", "\n", "def det_function(x):\n", "    # Derivative: f'(x) = 6x + 5\n", "    return 6*x+5" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def get_GD(od_f=None,f=None,x_0=None,eta=0.001,threshold=0):\n", "    \"\"\"Gradient descent on od_f, whose derivative is f, starting from x_0.\"\"\"\n", "    x_all=[]\n", "    od_f_all=[]\n", "    det_f_all=[]\n", "    count_n=0\n", "    while True:\n", "        count_n+=1\n", "        y=od_f(x_0)\n", "        # Evaluate the derivative at the current x\n", "        det_f=f(x_0)\n", "        od_f_all.append(y)\n", "        x_all.append(x_0)\n", "        det_f_all.append(det_f)\n", "        # Compute the next point\n", "        x_0=x_0-eta*det_f\n", "        # Stop once the derivative is close enough to zero (abs also covers negative slopes)\n", "        if abs(det_f)<=threshold:\n", "            break\n", "\n", "    return x_all,od_f_all,det_f_all,count_n\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "x,y,dety,n=get_GD(loss_function,det_function,x_0=1,eta=0.1,threshold=0.0001)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_x = np.linspace(-3,1.3,300)\n", "plot_y = loss_function(plot_x)\n", "plt.plot(plot_x,plot_y)\n", "plt.plot(x,y, color='red', marker='o')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multivariable Linear Function\n", "- $f(x)=\\beta_{0}+\\beta_{1}x_{1}+\\beta_{2}x_{2}+\\dots+\\beta_{n}x_{n}$\n", "- Vectorized gradient of the loss $J(\\theta)=\\frac{1}{m}\\sum_{i=1}^{m}\\left(y^{(i)}-X_{b}^{(i)}\\theta\\right)^{2}$: $\\nabla J(\\theta)=\\frac{2}{m} X_{b}^{T}(X_{b}\\theta-y)$\n", "- The code is wrapped as a method of the linear regression class" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):\n", "    \"\"\"Train the Linear Regression model on X_train, y_train using gradient descent\"\"\"\n", "    assert X_train.shape[0] == y_train.shape[0], \\\n", "        \"the size of X_train must be equal to the size of y_train\"\n", "\n", "    def J(theta, X_b, y):\n", "        # Mean squared error loss\n", "        try:\n", "            return np.sum((y - X_b.dot(theta)) ** 2) / len(y)\n", "        except Exception:\n", "            return float('inf')\n", "\n", "    def dJ(theta, X_b, y):\n", "        # Vectorized gradient of J\n", "        return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(y)\n", "\n", "    def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):\n", "\n", "        theta = initial_theta\n", "        cur_iter = 0\n", "\n", "        while cur_iter < n_iters:\n", "            gradient = dJ(theta, X_b, y)\n", "            last_theta = theta\n", "            theta = theta - eta * gradient\n", "            # Stop when the loss no longer changes noticeably\n", "            if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):\n", "                break\n", "\n", "            cur_iter += 1\n", "\n", "        return theta\n", "\n", "    X_b = np.hstack([np.ones((len(X_train), 1)), X_train])\n", "    initial_theta = np.zeros(X_b.shape[1])\n", "    self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)\n", "\n", "    self.intercept_ = self._theta[0]\n", "    self.coef_ = self._theta[1:]\n", "\n", "    return self\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Verify our code on a small example\n", "# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version\n", "import numpy as np\n", "from sklearn import datasets\n", "\n", "boston = datasets.load_boston()\n", "X = boston.data\n", "y = boston.target\n", "\n", "X = X[y < 50.0]\n", "y = y[y < 50.0]" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "D:\\tool\\anaconda\\lib\\site-packages\\numpy\\core\\fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce\n", " return ufunc.reduce(obj, axis, dtype, out, **passkwargs)\n", "E:\\Data analysis\\machinelearning\\机器学习小组二期\\代码\\myML_Algorithm\\LinearRegression.py:43: RuntimeWarning: overflow encountered in square\n", " return np.sum((y - X_b.dot(theta)) ** 2) / len(y)\n", "E:\\Data analysis\\machinelearning\\机器学习小组二期\\代码\\myML_Algorithm\\LinearRegression.py:60: RuntimeWarning: invalid value encountered in double_scalars\n", " if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):\n" ] }, { "data": { "text/plain": [ "array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from 
myML_Algorithm.LinearRegression import linearRegression as LR\n", "from myML_Algorithm.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)\n", "\n", "lr = LR()\n", "lr.fit_gd(X_train, y_train)\n", "lr.coef_" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[6.3200e-03, 1.8000e+01, 2.3100e+00, 0.0000e+00, 5.3800e-01,\n", " 6.5750e+00, 6.5200e+01, 4.0900e+00, 1.0000e+00, 2.9600e+02,\n", " 1.5300e+01, 3.9690e+02, 4.9800e+00],\n", " [2.7310e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,\n", " 6.4210e+00, 7.8900e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,\n", " 1.7800e+01, 3.9690e+02, 9.1400e+00],\n", " [2.7290e-02, 0.0000e+00, 7.0700e+00, 0.0000e+00, 4.6900e-01,\n", " 7.1850e+00, 6.1100e+01, 4.9671e+00, 2.0000e+00, 2.4200e+02,\n", " 1.7800e+01, 3.9283e+02, 4.0300e+00]])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`lr.coef_` is all NaN because the features are on very different scales: with a fixed learning rate, the updates driven by the large-valued features overflow and the iteration diverges (see the overflow warnings above). The fix is to scale the features, so standardize the data before running gradient descent." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-1.04042202 0.83093351 -0.24794356 0.01179456 -1.35034756 2.25074\n", " -0.66384353 -2.53568774 2.25572406 -2.34011572 -1.76565394 0.70923397\n", " -2.72677064]\n" ] } ], "source": [ "# After standardization\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "standardScaler = StandardScaler()\n", "standardScaler.fit(X_train)\n", "X_train_std = standardScaler.transform(X_train)\n", "\n", "lr.fit_gd(X_train_std, y_train)\n", "print(lr.coef_)\n", "X_test_std = standardScaler.transform(X_test)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stochastic Gradient Descent\n", "- In practice, datasets are often very large, and batch gradient descent becomes slow because every step uses all samples. Stochastic gradient descent updates the parameters from one randomly chosen sample at a time, which makes each step much cheaper. The noisy updates can help it move past shallow local optima, but it is not guaranteed to reach the global optimum; a decaying learning rate is used so that the updates settle down." ] }, { "cell_type": "code", 
"execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1.04042202, 0.83093351, -0.24794356, 0.01179456, -1.35034756,\n", " 2.25074 , -0.66384353, -2.53568774, 2.25572406, -2.34011572,\n", " -1.76565394, 0.70923397, -2.72677064])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def fit_sgd(self, X_train, y_train, n_iters=50, t0=5, t1=50):\n", "    \"\"\"Train the Linear Regression model on X_train, y_train using stochastic gradient descent\"\"\"\n", "    assert X_train.shape[0] == y_train.shape[0], \\\n", "        \"the size of X_train must be equal to the size of y_train\"\n", "    assert n_iters >= 1\n", "\n", "    def dJ_sgd(theta, X_b_i, y_i):\n", "        # Gradient estimated from a single sample\n", "        return X_b_i * (X_b_i.dot(theta) - y_i) * 2.\n", "\n", "    def sgd(X_b, y, initial_theta, n_iters=5, t0=5, t1=50):\n", "\n", "        def learning_rate(t):\n", "            # The learning rate decays as the step count t grows\n", "            return t0 / (t + t1)\n", "\n", "        theta = initial_theta\n", "        m = len(X_b)\n", "        for i_iter in range(n_iters):\n", "            # Shuffle the data, then read it in order; this is equivalent to sampling at random\n", "            indexes = np.random.permutation(m)\n", "            X_b_new = X_b[indexes,:]\n", "            y_new = y[indexes]\n", "            for i in range(m):\n", "                gradient = dJ_sgd(theta, X_b_new[i], y_new[i])\n", "                theta = theta - learning_rate(i_iter * m + i) * gradient\n", "\n", "        return theta\n", "\n", "    X_b = np.hstack([np.ones((len(X_train), 1)), X_train])\n", "    initial_theta = np.random.randn(X_b.shape[1])\n", "    self._theta = sgd(X_b, y_train, initial_theta, n_iters, t0, t1)\n", "    self.intercept_ = self._theta[0]\n", "    self.coef_ = self._theta[1:]\n", "    return self" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SGD in scikit-learn" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 168 ms\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "D:\\tool\\anaconda\\lib\\site-packages\\sklearn\\linear_model\\stochastic_gradient.py:166: FutureWarning: max_iter and tol parameters have been added in SGDRegressor in 
0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.\n", " FutureWarning)\n" ] }, { "data": { "text/plain": [ "0.8032440859338719" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import SGDRegressor\n", "\n", "sgd_reg = SGDRegressor() # defaults to 5 iterations in this scikit-learn version (see the warning)\n", "%time sgd_reg.fit(X_train_std, y_train)\n", "sgd_reg.score(X_test_std, y_test)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 3.99 ms\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "D:\\tool\\anaconda\\lib\\site-packages\\sklearn\\linear_model\\stochastic_gradient.py:152: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead.\n", " DeprecationWarning)\n" ] }, { "data": { "text/plain": [ "0.8127865478262196" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Increase the number of iterations\n", "# n_iter is deprecated (see the warning); newer scikit-learn versions use max_iter instead\n", "sgd_reg = SGDRegressor(n_iter=100)\n", "%time sgd_reg.fit(X_train_std, y_train)\n", "sgd_reg.score(X_test_std, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training is very fast! Increasing the number of iterations improves the test score here, from about 0.803 to about 0.813." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 
4, "nbformat_minor": 2 }