{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 一、[线性回归](code/1-LinearRegression)\n",
    "- [全部代码](../code/1-LinearRegression/LinearRegression.py)\n",
    "\n",
    "### 1、代价函数\n",
    "- ![J(\\theta ) = \\frac{1}{{2{\\text{m}}}}\\sum\\limits_{i = 1}^m {{{({h_\\theta }({x^{(i)}}) - {y^{(i)}})}^2}} ](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=J%28%5Ctheta%20%29%20%3D%20%5Cfrac%7B1%7D%7B%7B2%7B%5Ctext%7Bm%7D%7D%7D%7D%5Csum%5Climits_%7Bi%20%3D%201%7D%5Em%20%7B%7B%7B%28%7Bh_%5Ctheta%20%7D%28%7Bx%5E%7B%28i%29%7D%7D%29%20-%20%7By%5E%7B%28i%29%7D%7D%29%7D%5E2%7D%7D%20)\n",
    "- 其中:\n",
    "![{h_\\theta }(x) = {\\theta _0} + {\\theta _1}{x_1} + {\\theta _2}{x_2} + ...](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7Bh_%5Ctheta%20%7D%28x%29%20%3D%20%7B%5Ctheta%20_0%7D%20%2B%20%7B%5Ctheta%20_1%7D%7Bx_1%7D%20%2B%20%7B%5Ctheta%20_2%7D%7Bx_2%7D%20%2B%20...)\n",
    "\n",
    "- 下面就是要求出theta,使代价最小,即代表我们拟合出来的方程距离真实值最近\n",
    "- 共有m条数据,其中![{{{({h_\\theta }({x^{(i)}}) - {y^{(i)}})}^2}}](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7B%7B%7B%28%7Bh_%5Ctheta%20%7D%28%7Bx%5E%7B%28i%29%7D%7D%29%20-%20%7By%5E%7B%28i%29%7D%7D%29%7D%5E2%7D%7D)代表我们要拟合出来的方程到真实值距离的平方,平方的原因是因为可能有负值,正负可能会抵消\n",
    "- 前面有系数`2`的原因是下面求梯度是对每个变量求偏导,`2`可以消去\n",
    "\n",
    "- 实现代码:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 计算代价函数\n",
    "def computerCost(X,y,theta):\n",
    "    m = len(y)\n",
    "    J = 0\n",
    "    \n",
    "    J = (np.transpose(X*theta-y))*(X*theta-y)/(2*m) #计算代价J\n",
    "    return J"
   ]
  },
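  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- A minimal sanity check of `computerCost` (the toy data below is hypothetical): `X` carries a leading column of ones for theta(0), and `X`, `y`, `theta` are `np.matrix` objects so that `*` means matrix multiplication. With an exact fit the cost should be 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Hypothetical toy data: 3 examples, 1 feature, plus the column of ones\n",
    "X_toy = np.matrix([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])\n",
    "y_toy = np.matrix([[1.0], [2.0], [3.0]])\n",
    "theta_toy = np.matrix([[0.0], [1.0]])   # y = 0 + 1*x fits the data exactly\n",
    "print computerCost(X_toy, y_toy, theta_toy)   # expect [[ 0.]]"
   ]
  },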
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 注意这里的X是真实数据前加了一列1,因为有theta(0)\n",
    "\n",
    "### 2、梯度下降算法\n",
    "- 代价函数对![{{\\theta _j}}](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7B%7B%5Ctheta%20_j%7D%7D)求偏导得到:   \n",
    "![\\frac{{\\partial J(\\theta )}}{{\\partial {\\theta _j}}} = \\frac{1}{m}\\sum\\limits_{i = 1}^m {[({h_\\theta }({x^{(i)}}) - {y^{(i)}})x_j^{(i)}]} ](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%5Cfrac%7B%7B%5Cpartial%20J%28%5Ctheta%20%29%7D%7D%7B%7B%5Cpartial%20%7B%5Ctheta%20_j%7D%7D%7D%20%3D%20%5Cfrac%7B1%7D%7Bm%7D%5Csum%5Climits_%7Bi%20%3D%201%7D%5Em%20%7B%5B%28%7Bh_%5Ctheta%20%7D%28%7Bx%5E%7B%28i%29%7D%7D%29%20-%20%7By%5E%7B%28i%29%7D%7D%29x_j%5E%7B%28i%29%7D%5D%7D%20)\n",
    "- 所以对theta的更新可以写为:   \n",
    "![{\\theta _j} = {\\theta _j} - \\alpha \\frac{1}{m}\\sum\\limits_{i = 1}^m {[({h_\\theta }({x^{(i)}}) - {y^{(i)}})x_j^{(i)}]} ](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7B%5Ctheta%20_j%7D%20%3D%20%7B%5Ctheta%20_j%7D%20-%20%5Calpha%20%5Cfrac%7B1%7D%7Bm%7D%5Csum%5Climits_%7Bi%20%3D%201%7D%5Em%20%7B%5B%28%7Bh_%5Ctheta%20%7D%28%7Bx%5E%7B%28i%29%7D%7D%29%20-%20%7By%5E%7B%28i%29%7D%7D%29x_j%5E%7B%28i%29%7D%5D%7D%20)\n",
    "- 其中![\\alpha ](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%5Calpha%20)为学习速率,控制梯度下降的速度,一般取**0.01,0.03,0.1,0.3.....**\n",
    "- 为什么梯度下降可以逐步减小代价函数\n",
    " - 假设函数`f(x)`\n",
    " - 泰勒展开:`f(x+△x)=f(x)+f'(x)*△x+o(△x)`\n",
    " - 令:`△x=-α*f'(x)`   ,即负梯度方向乘以一个很小的步长`α`\n",
    " - 将`△x`代入泰勒展开式中:`f(x+x)=f(x)-α*[f'(x)]²+o(△x)`\n",
    " - 可以看出,`α`是取得很小的正数,`[f'(x)]²`也是正数,所以可以得出:`f(x+△x)<=f(x)`\n",
    " - 所以沿着**负梯度**放下,函数在减小,多维情况一样。\n",
    "- 实现代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 梯度下降算法\n",
    "def gradientDescent(X,y,theta,alpha,num_iters):\n",
    "    m = len(y)      \n",
    "    n = len(theta)\n",
    "    \n",
    "    temp = np.matrix(np.zeros((n,num_iters)))   # 暂存每次迭代计算的theta,转化为矩阵形式\n",
    "    \n",
    "    \n",
    "    J_history = np.zeros((num_iters,1)) #记录每次迭代计算的代价值\n",
    "    \n",
    "    for i in range(num_iters):  # 遍历迭代次数    \n",
    "        h = np.dot(X,theta)     # 计算内积,matrix可以直接乘\n",
    "        temp[:,i] = theta - ((alpha/m)*(np.dot(np.transpose(X),h-y)))   #梯度的计算\n",
    "        theta = temp[:,i]\n",
    "        J_history[i] = computerCost(X,y,theta)      #调用计算代价函数\n",
    "        print '.',      \n",
    "    return theta,J_history "
   ]
  },
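  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- A minimal numeric sketch of the Taylor argument above (the function `f(x)=x²`, the starting point, and the step size are hypothetical choices): each step along the negative gradient should not increase `f`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Hypothetical 1-D example: f(x) = x**2, so f'(x) = 2x\n",
    "f = lambda x: x**2\n",
    "df = lambda x: 2*x\n",
    "\n",
    "alpha = 0.1     # small positive step size\n",
    "x = 5.0\n",
    "for k in range(5):\n",
    "    x_new = x - alpha*df(x)        # step along the negative gradient\n",
    "    print f(x), '->', f(x_new)     # f decreases at every step\n",
    "    x = x_new"
   ]
  },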
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3、均值归一化\n",
    "- 目的是使数据都缩放到一个范围内,便于使用梯度下降算法\n",
    "- ![{x_i} = \\frac{{{x_i} - {\\mu _i}}}{{{s_i}}}](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7Bx_i%7D%20%3D%20%5Cfrac%7B%7B%7Bx_i%7D%20-%20%7B%5Cmu%20_i%7D%7D%7D%7B%7B%7Bs_i%7D%7D%7D)\n",
    "- 其中 ![{{\\mu _i}}](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7B%7B%5Cmu%20_i%7D%7D) 为所有此feture数据的平均值\n",
    "- ![{{s_i}}](http://chart.apis.google.com/chart?cht=tx&chs=1x0&chf=bg,s,FFFFFF00&chco=000000&chl=%7B%7Bs_i%7D%7D)可以是**最大值-最小值**,也可以是这个feature对应的数据的**标准差**\n",
    "- 实现代码:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 归一化feature\n",
    "def featureNormaliza(X):\n",
    "    X_norm = np.array(X)            #将X转化为numpy数组对象,才可以进行矩阵的运算\n",
    "    #定义所需变量\n",
    "    mu = np.zeros((1,X.shape[1]))   \n",
    "    sigma = np.zeros((1,X.shape[1]))\n",
    "    \n",
    "    mu = np.mean(X_norm,0)          # 求每一列的平均值(0指定为列,1代表行)\n",
    "    sigma = np.std(X_norm,0)        # 求每一列的标准差\n",
    "    for i in range(X.shape[1]):     # 遍历列\n",
    "        X_norm[:,i] = (X_norm[:,i]-mu[i])/sigma[i]  # 归一化\n",
    "    \n",
    "    return X_norm,mu,sigma"
   ]
  },
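  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- A short sketch of how the returned `mu` and `sigma` are reused (the sample values here are hypothetical): a new example must be scaled with the statistics learned from the training set, not with its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Hypothetical training matrix with two features\n",
    "X_demo = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0]])\n",
    "X_demo_norm, mu, sigma = featureNormaliza(X_demo)\n",
    "\n",
    "x_new = np.array([1650.0, 3.0])    # a new example to predict on\n",
    "x_new_norm = (x_new - mu)/sigma    # scale it with the training mu and sigma\n",
    "print x_new_norm"
   ]
  },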
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 注意预测的时候也需要均值归一化数据\n",
    "\n",
    "### 4、最终运行结果\n",
    "- 代价随迭代次数的变化   \n",
    "![enter description here](../../images/LinearRegression_01.png)\n",
    "\n",
    "\n",
    "### 5、使用scikit-learn库中的线性模型实现\n",
    "- [全部代码](../../code/1-LinearRegression/LinearRegression_scikit-learn.py)\n",
    "- 导入包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn import linear_model\n",
    "from sklearn.preprocessing import StandardScaler    #引入缩放的包"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 归一化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    " # 归一化操作\n",
    "scaler = StandardScaler()   \n",
    "scaler.fit(X)\n",
    "x_train = scaler.transform(X)\n",
    "x_test = scaler.transform(np.array([1650,3]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 线性模型拟合"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 线性模型拟合\n",
    "model = linear_model.LinearRegression()\n",
    "model.fit(x_train, y)"
   ]
  }
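  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Prediction (a minimal sketch using the scaled `x_test` from above; `predict` is the standard scikit-learn estimator API)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Predict on the scaled test example\n",
    "result = model.predict(x_test)\n",
    "print result"
   ]
  }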
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}