{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "bOChJSNXtC9g"
},
"source": [
"# 线性回归"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "OLIxEDq6VhvZ"
},
"source": [
"
\n",
"\n",
"在这节课上我们将学习线性回归。 我们将先理解它背后的数学基础原理再用python去实现它。 我们还将通过方法去讲解线性模型。\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "VoMq0eFRvugb"
},
"source": [
"# 概述"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-qHciBsX93ej"
},
"source": [
"
\n",
"\n",
"$\\hat{y} = XW$\n",
"\n",
"*where*:\n",
"* $\\hat{y}$ = 预测值 | $\\in \\mathbb{R}^{NX1}$ ($N$ 是样本的个数)\n",
"* $X$ = 输入 | $\\in \\mathbb{R}^{NXD}$ ($D$ 是特征的个数)\n",
"* $W$ = 权重 | $\\in \\mathbb{R}^{DX1}$ "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "QAgr7Grv9pb6"
},
"source": [
"* **目标:** 通过线性模型的输入 $X$ 去预测 $\\hat{y}$。模型将会寻找一条最优的线使得我们的预测值和目标值最为接近。训练数据 $(X, y)$ 用来训练这个模型并且通过随机梯度下降(SGD)学习权重 $W$。\n",
"* **优点:**\n",
" * 计算简单。\n",
" * 解释性强。\n",
" * 可用于连续(continuous)和无序的类别(categorical)特征。\n",
"* **缺点:**\n",
" * 线性模型只能用于线性可分的数据(针对于分类任务).\n",
" * 但是通常来讲不会用于分类任务,仅仅用于回归问题。\n",
"* **其他:** 当然你也可以使用线性回归去做二分类任务,如果预测出的连续数值高于一个阈值它就属于一个特定的分类。但是我们在未来的课程中将会介绍可用于做二分类任务更好的模型,所以我们本次课程只会集中在怎么用线性回归去做回归任务。\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xP7XD24-09Io"
},
"source": [
"# 训练"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "476yPgTM1BKJ"
},
"source": [
"*步骤*: \n",
"1. 随机初始化模型的权重$W$。\n",
"2. 将输入值 $X$ 传入模型并且得到预测值$\\hat{y}$。\n",
"3. 通过损失函数来计算预测值$\\hat{y}$和真实值$\\hat{y}$之间的差距,从而得到损失值$J$。普遍在线性回归中用到的损失函数是均方误差(MSE)。这个函数计算出预测值和真实值之间的差距的平方($\\frac{1}{2}$ 没有数学意义,只是在求导的时候可以正好和平方抵消,方便计算)。\n",
" * $MSE = J(\\theta) = \\frac{1}{2}\\sum_{i}(\\hat{y}_i - y_i)^2$\n",
"4. 计算出对于模型权重的损失梯度$J(\\theta)$\n",
" * $J(\\theta) = \\frac{1}{2}\\sum_{i}(\\hat{y}_i - y_i)^2 = \\frac{1}{2}\\sum_{i}(X_iW - y_i)^2 $\n",
" * $\\frac{\\partial{J}}{\\partial{W}} = X(\\hat{y} - y)$\n",
"4. 我们使用学习率$\\alpha$和一个优化方法(比如随机梯度下降),通过反向传播来更新权重$W$。 一个简单的比方就是梯度可以告诉你在哪个方向上增加数值,然后通过减法来使得损失值$J(\\theta)$越来越小。\n",
" * $W = W- \\alpha\\frac{\\partial{J}}{\\partial{W}}$\n",
"5. 重复2 - 4步直到模型表现最好(也可以说直到损失收敛)。"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "jvJKjkMeJP4Q"
},
"source": [
"# 数据"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "RuPl9qlSJTIY"
},
"source": [
"我们将自己创建一些假数据应用在线性回归上。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "HRXD7LqVJZ43"
},
"outputs": [],
"source": [
"from argparse import Namespace\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "NFsKg-Z6IWqG"
},
"outputs": [],
"source": [
"# 参数\n",
"args = Namespace(\n",
" seed=1234,\n",
" data_file=\"sample_data.csv\",\n",
" num_samples=100,\n",
" train_size=0.75,\n",
" test_size=0.25,\n",
" num_epochs=100,\n",
")\n",
"\n",
"# 设置随机种子来保证实验结果的可重复性。\n",
"np.random.seed(args.seed)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "NWux2lcoIWss"
},
"outputs": [],
"source": [
"# 生成数据\n",
"def generate_data(num_samples):\n",
" X = np.array(range(num_samples))\n",
" y = 3.65*X + 10\n",
" return X, y"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"colab_type": "code",
"id": "2mb2SjSQIWvF",
"outputId": "3aa66ef6-3c88-40fd-f53a-a77f93c8e052"
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" X | \n",
" y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.0 | \n",
" 10.00 | \n",
"
\n",
" \n",
" 1 | \n",
" 1.0 | \n",
" 13.65 | \n",
"
\n",
" \n",
" 2 | \n",
" 2.0 | \n",
" 17.30 | \n",
"
\n",
" \n",
" 3 | \n",
" 3.0 | \n",
" 20.95 | \n",
"
\n",
" \n",
" 4 | \n",
" 4.0 | \n",
" 24.60 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" X y\n",
"0 0.0 10.00\n",
"1 1.0 13.65\n",
"2 2.0 17.30\n",
"3 3.0 20.95\n",
"4 4.0 24.60"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 生成随机数据\n",
"X, y = generate_data(args.num_samples)\n",
"data = np.vstack([X, y]).T\n",
"df = pd.DataFrame(data, columns=['X', 'y'])\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"colab_type": "code",
"id": "6LwVVOkiLfBN",
"outputId": "ab2ecddb-bbeb-4117-d1cf-29312846da16"
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEICAYAAABRSj9aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAGY1JREFUeJzt3X2QHPV95/H3RytZFg+xDAisR8s2wjb2lUVuD3DI+Qhw4eFckXNlA04Og0NOSYwqJiEOwq4KxHVO8JmAScEpkY2NcNkWROZBRRFzGMwRcoAtgcKDMRclxrDsIgms5cGWMZK+90f/lrSW2Z2e3emZ6Z7Pq2pqZ3q6Z35Ni89+9jc9M4oIzMysvmZ0ewBmZlYuB72ZWc056M3Mas5Bb2ZWcw56M7Oac9CbmdWcg96sBJLOkXRvC+s/KemkMsdk/ctBbx0j6UxJD0j6qaTt6fonJKnbYxtP0t2Sfrfb42hEUkg6vNvjsOpw0FtHSLoAuBL4AvAW4DDg94HjgDd0eCwzO/l8Zt3moLfSSXoT8FngExGxISJeisxDEfHbEfFKWm+2pMskPSVpm6S/kTQn3Xe8pCFJF6S/BkYkfTz3HEW2vVDSs8BXJb1Z0q2Sdkjama4vSut/DviPwFWSXpZ0VVr+Lkl3SPqJpCcknZ57/oMlbZT0oqTvAe9o8t/kLEk/lvS8pM+Mu+9oSfdJGk37eZWkN6T77kmr/VMa2xmT7YsZOOitM94PzAZuabLe54EjgOXA4cBC4M9y978FeFNafi5wtaQ3t7DtQcBbgZVk//a/mm4vAXYBVwFExGeAfwBWRcQBEbFK0v7AHcA3gEOBjwL/S9J70uNfDfwcmA/8Tro0JOlIYA1wFrAAOBjIB/Me4I+AQ8j+250IfCKN7QNpnfelsV0/2b6YARARvvhS6gX4b8Cz45b9X2CULJQ+AAj4KfCO3DrvB36Urh+f1p2Zu387cGzBbX8BvHGSMS4HduZu3w38bu72GcA/jNvmb4GLgQHgVeBdufv+Arh3guf6M2B97vb+aXwnTbD++cBNudsBHF50X3zxxXOV1gnPA4dImhkRuwEi4lcAJA2RNdJ5wH7A5txrsyIL0dceZ2z75GfAAQW33RERP3/tTmk/4ArgFGDsr4IDJQ1ExJ4G+/BW4BhJo7llM4GvpeefCTydu+/Hjf9TAFmLf23diPippOdzYzsCuBwYTPs1E9g80YNNYV+sz3jqxjrhPuAVYMUk6zxH1tjfExFz0+VNEXFAgccvsu34j2m9AHgncExE/BLZXxWQ/YJotP7TwP/JPf7cyKZO/gDYAewGFufWXzLJeEfy66agPjh3/xrgh8CyNLZP58bVSLN9sT7noLfSRcQo8Odkc9oflnSApBmSlpNNWxARe4EvAVdIOhRA0kJJJxd4/KlseyDZL4dRSQeRTcHkbQPenrt9K3BEehF1Vrr8B0nvTq35RuASSfulOfizJ3nuDcAHJf1qepH1s+z7/+KBwIvAy5LeBfxBk7E12xfrcw5664iI+J/AHwN/Sja3vo1sjvtCsvl60vWtwP2SXgS+Q9ZUi2h12y8Cc8j+Grgf+Pa4+68EPpzOYvnriHgJ+HXgTGAYeJbsBeDZaf1VZNNIzwLXkr042lBEPAacR/bC7giwExjKrfInwG8BL5H9Art+3ENcAqxLZ+WcXmBfrM8pwl88YmZWZ270ZmY156A3M6s5B72ZWc056M3Maq4n3jB1yCGHxNKlS7s9DDOzStm8efNzETGv2Xo9EfRLly5l06ZN3R6GmVmlSJrsHdiv8dSNmVnNOejNzGrOQW9mVnMOejOzmnPQm5nVXE+cdWNm1m9ufugZvnD7EwyP7mLB3Dl86uR38qGjFpbyXA56M7MOu/mhZ7joxkfY9Wr2vTDPjO7iohsfASgl7B30ZmYdMtbinxnd9br7dr26hy/c/oSD3sysqsa3+EaGG/wCaAcHvZlZiSZr8eMtmDunlDE46M3MSlKkxY+ZM2uAT51c9AvVWuOgNzNrs1ZaPMBCn3VjZlYdrbb4v/yv/660gB/joDczm6b8OfEzJPYU+C7uslt8noPezGwaxjf4ZiHfqRaf56A3M5uCVufhobMtPs9Bb2bWolbm4aE7LT6vadBLeiNwDzA7rb8hIi6WdC3wn4AX0qrnRMQWSQKuBE4DfpaWP1jG4M3MOqmVFj8gsTei9M+xKaJIo38FOCEiXpY0C7hX0t+n+z4VERvGrX8qsCxdjgHWpJ9mZpXVi2fTFNU06CMigJfTzVnpMtmrDSuA69J290uaK2l+RIxMe7RmZh3Wa+fET0Whz6OXNCBpC7AduCMiHkh3fU7Sw5KukDQ7LVsIPJ3bfCgtG/+YKyVtkrRpx44d09gFM7NyjLX4IiE/Z9YAXzxjOf+4+oSeCnko+GJsROwBlkuaC9wk6b3ARcCzwBuAtcCFwGcBNXqIBo+5Nm3H4OBg85NOzcw6pA4tPq+ls24iYlTS3cApEXFZWvyKpK8Cf5JuDwGLc5stAoanO1Azs06o8lz8RIqcdTMPeDWF/BzgJODzY/Pu6SybDwGPpk02AqskrSd7EfYFz8+bWa+rW4vPK9Lo5wPrJA2QzenfEBG3Sror/RIQsAX4/bT+bWSnVm4lO73y4+0ftplZ+9SxxecVOevmYeCoBstPmGD9AM6b/tDMzMpV5xaf53fGmllfqnuLz3PQm1lf6ZcWn+egN7O+0U8tPs9Bb2a1148tPs9Bb2a11q8tPs9Bb2a11O8tPs9Bb2a14xa/Lwe9mdWGW3xjDnozqwW3+Ik56M2s0tzim3PQm1llucUX46A3s8pxi2+Ng97MKsUtvnUOejPreWMNfnh0FzMk9kTzL6Xr9xaf56A3s542vsE3C3m3+Ndz0JtZT2p1Hh7c4ifioDezntPKPDy4xTfjoDezntFKix+Q2BvBArf4pop8OfgbgXuA2Wn9DRFxsaS3AeuBg4AHgbMi4heSZgPXAf8eeB44IyKeLGn8ZlYTPpumPEUa/SvACRHxsqRZwL2S/h74Y+CKiFgv6W+Ac4E16efOiDhc0pnA54EzShq/mVWcz4kvX5EvBw/g5XRzVroEcALwW2n5OuASsqBfka4DbACukqT0OGZmr3GL74xCc/SSBoDNwOHA1cC/AKMRsTutMgSM/ddfCDwNEBG7Jb0AHAw8N+4xVwIrAZYsWTK9vTCzSnGL76xCQR8Re4DlkuYCNwHvbrRa+qlJ7ss/5lpgLcDg4KDbvlmfcIvvvJbOuomIUUl3A8cCcyXNTK1+ETCcVhsCFgNDkmYCbwJ+0r4hm1kVucV3T5GzbuYBr6aQnwOcRPYC63eBD5OdeXM2cEvaZGO6fV+6/y7Pz5v1N7f47irS6OcD69I8/Qzghoi4VdIPgPWS/gfwEHBNWv8a4GuStpI1+TNLGLeZVYBbfG8octbNw8BRDZb/K3B0g+U/Bz7SltGZWWW5xfcOvzPWzNrKLb73OOjNrG3c4nuTg97Mps0tvrc56M1sWtzie5+D3symxC2+Ohz0ZtYyt/hqcdCbWWFu8dXkoDezQtziq8tBb2aTcouvPge9mU3ILb4eHPRm9jpu8fXioDezfbjF14+D3swAt/g6c9CbmVt8zTnozfrUWIMfHt3FDIk9Bb4fyC2+mhz0Zn1ofINvFvJu8dXmoDfrI63Ow4NbfB046M36RCvz8OAWXyczmq0gabGk70p6XNJjkj6Zll8i6RlJW9LltNw2F0naKukJSSeXuQNmNrmbH3qG4y69i/Ov39I05AckRNbiHfL1UaTR7wYuiIgHJR0IbJZ0R7rvioi4LL+ypCPJvhD8PcAC4DuSjoiIYjXCzNrGZ9MYFPty8BFgJF1/SdLjwGT/ElYA6yPiFeBHkraSfYn4fW0Yr5kV4HPiLa+lOXpJS4GjgAeA44BVkj4GbCJr/TvJfgncn9tsiMl/MZhZG7nF23hN5+jHSDoA+BZwfkS8CKwB3gEsJ2v8fzW2aoPNX3fulqSVkjZJ2rRjx46WB25m+2plLh48D99PCjV6SbPIQv7rEXEjQERsy93/JeDWdHMIWJzbfBEwPP4xI2ItsBZgcHCw+Ts1zGxCbvE2maZBL0nANcDjEXF5bvn8NH8P8JvAo+n6RuAbki4nezF2GfC9to7azADPxVsxRRr9ccBZwCOStqRlnwY+Kmk52bTMk8DvAUTEY5JuAH5AdsbOeT7jxqz93OKtqCJn3dxL43n32ybZ5nPA56YxLjObgFu8tcrvjDWrELd4mwoHvVkFuMXbdDjozXqcW7xNl4PerEe5xVu7OOjNepBbvLWTg96sh7jFWxkc9GY9wi3eyuKgN+syt3grm4PerIvc4q0THPRmXeAWb53koDfrMLd46zQHvVmHuMVbtzjozTrALd66yUFvViK3eOsFDnqzkrjFW69w0Ju10ViDHx7dxQyJPdH8WzLd4q1sDnqzNhnf4JuFvFu8dYqD3myaWp2HB7d46ywHvdk0tDIPD27x1h1Ng17SYuA64C3AXmBtRFwp6SDgemAp2ZeDnx4ROyUJuBI4DfgZcE5EPFjO8M26o5UWPyCxN4IFbvHWJUUa/W7ggoh4UNKBwGZJdwDnAHdGxKWSVgOrgQuBU4Fl6XIMsCb9NKsFn01jVdM06CNiBBhJ11+S9DiwEFgBHJ9WWwfcTRb0K4DrIiKA+yXNlTQ/PY5ZZfmceKuqluboJS0FjgIeAA4bC++IGJF0aFptIfB0brOhtGyfoJe0ElgJsGTJkikM3axz3OKtygoHvaQDgG8B50fEi9lUfONVGyx73XlmEbEWWAswODjY/GRjsy5wi7c6KBT0kmaRhfzXI+LGtHjb2JSMpPnA9rR8CFic23wRMNyuAZt1ilu81UWRs24EXAM8HhGX5+7aCJwNXJp+3pJbvkrSerIXYV/w/LxViVu81U2RRn8ccBbwiKQtadmnyQL+BknnAk8BH0n33UZ2auVWstMrP97WEZuVyC3e6qjIWTf30njeHeDEBusHcN40x2XWUW7xVmd+Z6z1Pbd4qzsHvfUtt3jrFw5660tu8dZPHPTWV9zirR856K1vuMVbv3LQW+25xVu/c9BbrbnFmznorabc4s3+jYPeasct3mxfDnqrDbd4s8Yc9FYLbvFmE3PQW6W5xZs156C3ynKLNyvGQW+VMtbgh0d3MUNiTzT/cjK3eOt3DnqrjPENvlnIu8WbZRz01vNanYcHt3izPAe99bRW5uHBLd6sEQe99aRWWvyAxN4IFrjFmzVU5MvBvwJ8ENgeEe9Nyy4B/juwI6326Yi4Ld13EXAusAf4w4i4vYRxW435bBqz9irS6K8FrgKuG7f8ioi4LL9A0pHAmcB7gAXAdyQdERHF/u62vuZz4s3KUeTLwe+RtLTg460A1kfEK8CPJG0Fjgbum/IIrS+4xZuVZzpz9KskfQzYBFwQETuBhcD9uXWG0rLXkbQSWAmwZMmSaQzDqswt3qx8M6a43RrgHcByYAT4q7RcDdZteLJzRKyNiMGIGJw3b94Uh2FVNtbii4T8nFkDfPGM5fzj6hMc8mYtmlKjj4htY9clfQm4Nd0cAhbnVl0EDE95dFZLbvFmnTWloJc0PyJG0s3fBB5N1zcC35B0OdmLscuA7017lFYbnos367wip1d+EzgeOETSEHAxcLyk5WTTMk8CvwcQEY9JugH4AbAbOM9n3Bi4xZt1k6LAh0KVbXBwMDZt2tTtYVhJ3OLNyiFpc0QMNlvP74y10rjFm/UGB72Vwi3erHc46K2t3OLNeo+D3trGLd6sNznobdrc4s16m4PepsUt3qz3OehtStzizarDQW8tc4s3qxYHvRXmFm9WTQ56K8Qt3qy6HPQ2Kbd4s+pz0NuE3OLN6sFBb6/jFm9WLw5624dbvFn9OOjttQY/PLqLGRJ7Cnx0tVu8WXU46Pvc+AbfLOTd4s2qx0Hfp1qdhwe3eLOqctD3oVbm4cEt3qzqHPR9pJUWPyCxN4IFbvFmlVfky8G/AnwQ2B4R703LDgKuB5aSfTn46RGxU5KAK4HTgJ8B50TEg+UM3Vrhs2nM+teMAutcC5wybtlq4M6IWAbcmW4DnAosS5eVwJr2DNOm6uaHnuG4S+/i/Ou3FAr5hXPnOOTNaqZpo4+IeyQtHbd4BXB8ur4OuBu4MC2/LiICuF/SXEnzI2KkXQO24tzizQymPkd/2Fh4R8SIpEPT8oXA07n1htKy1wW9pJVkrZ8lS5ZMcRjWiN/ZamZ57X4xVg2WNTwxOyLWAmsBBgcHm79Dxwpxizez8aYa9NvGpmQkzQe2p+VDwOLceouA4ekM0IpxizeziUw16DcCZwOXpp+35JavkrQeOAZ4wfPz5XOLN7PJFDm98ptkL7weImkIuJgs4G+QdC7wFPCRtPptZKdWbiU7vfLjJYzZErd4MyuiyFk3H53grhMbrBvAedMdlDXnFm9mRfmdsRXjFm9mrXLQV4hbvJlNhYO+AtzizWw6HPQ9zi3ezKbLQd+j3OLNrF0c9D3ILd7M2slB30Pc4s2sDA76HuEWb2ZlcdB3mVu8mZXNQd9FbvFm1gkO+i5wizezTnLQd5hbvJl1moO+A8Ya/PDoLmZI7Inm37PiFm9m7eKgL9n4Bt8s5N3izazdHPQlaXUeHtzizawcDvoStDIPD27xZlYuB30btdLiByT2RrDALd7MSuagbxOfTWNmvWpaQS/pSeAlYA+wOyIGJR0EXA8sBZ4ETo+IndMbZu/yOfFm1uva0eh/LSKey91eDdwZEZdKWp1uX9iG5+k5bvFmVgVlTN2sAI5P19cBd1OzoHeLN7MqmW7QB/C/JQXwtxGxFjgsIkYAImJE0qGNNpS0ElgJsGTJkmkOo3Pc4s2saqYb9MdFxHAK8zsk/bDohumXwlqAwcHB5m8V7TK3eDOrqmkFfUQMp5/bJd0EHA1skzQ/tfn5wPY2jLOr3OLNrMqmHPSS9gdmRMRL6fqvA58FNgJnA5emn7e0Y6Dd4BZvZnUwnUZ/GHCTpLHH+UZEfFvS94EbJJ0LPAV8ZPrD7Dy3eDOriykHfUT8K/C+BsufB06czqC6yS3ezOrG74zNcYs3szpy0OMWb2b11vdB7xZvZnXXt0HvFm9m/aIvg94t3sz6SV8FvVu8mfWjvgl6t3gz61e1D3q3eDPrd7UOerd4M7OaBr1bvJnZv6ld0LvFm5ntqzZB7xZvZtZYLYLeLd7MbGKVDfqxBj88uosZEnui+ZdUucWbWT+qZNCPb/DNQt4t3sz6WSWD/gu3P1Fomgbc4s3MKhn0wwVecHWLNzPLzOj2AKZiwdw5DZcPSIisxTvkzcwypTV6SacAVwIDwJcj4tJ2PfanTn7n686ycYM3M2uslKCXNABcDfxnYAj4vqSNEfGDdjz+WJiPnXWzwPPwZmYTKqvRHw1sTV8gjqT1wAqgLUEPWdg72M3Mmitrjn4h8HTu9lBa9hpJKyVtkrRpx44dJQ3DzMzKCno1WLbPye4RsTYiBiNicN68eSUNw8zMygr6IWBx7vYiYLik5zIzs0mUFfTfB5ZJepukNwBnAhtLei4zM5tEKS/GRsRuSauA28lOr/xKRDxWxnOZmdnkFAU+DKz0QUg7gB9PcfNDgOfaOJyq6Mf97sd9hv7c737cZ2h9v98aEU1f5OyJoJ8OSZsiYrDb4+i0ftzvftxn6M/97sd9hvL2u5IfgWBmZsU56M3Maq4OQb+22wPokn7c737cZ+jP/e7HfYaS9rvyc/RmZja5OjR6MzObhIPezKzmKh30kk6R9ISkrZJWd3s8ZZC0WNJ3JT0u6TFJn0zLD5J0h6R/Tj/f3O2xlkHSgKSHJN2abr9N0gNpv69P77yuDUlzJW2Q9MN0zN/fD8da0h+lf9+PSvqmpDfW8VhL+oqk7ZIezS1reHyV+euUbw9L+uWpPm9lgz73mfenAkcCH5V0ZHdHVYrdwAUR8W7gWOC8tJ+rgTsjYhlwZ7pdR58EHs/d/jxwRdrvncC5XRlVea4Evh0R7wLeR7bvtT7WkhYCfwgMRsR7yd5Nfyb1PNbXAqeMWzbR8T0VWJYuK4E1U33SygY9uc+8j4hfAGOfeV8rETESEQ+m6y+R/Y+/kGxf16XV1gEf6s4IyyNpEfBfgC+n2wJOADakVWq135J+CfgAcA1ARPwiIkbpg2NN9nEscyTNBPYDRqjhsY6Ie4CfjFs80fFdAVwXmfuBuZLmT+V5qxz0TT/zvm4kLQWOAh4ADouIEch+GQCHdm9kpfki8KfA3nT7YGA0Inan23U75m8HdgBfTdNVX5a0PzU/1hHxDHAZ8BRZwL8AbKbexzpvouPbtoyrctA3/cz7OpF0APAt4PyIeLHb4ymbpA8C2yNic35xg1XrdMxnAr8MrImIo4CfUrNpmkbSnPQK4G3AAmB/smmL8ep0rIto27/3Kgd933zmvaRZZCH/9Yi4MS3eNvZnXPq5vVvjK8lxwG9IepJsWu4EsoY/N/15D/U75kPAUEQ8kG5vIAv+uh/rk4AfRcSOiHgVuBH4Fep9rPMmOr5ty7gqB31ffOZ9mpe+Bng8Ii7P3bURODtdPxu4pdNjK1NEXBQRiyJiKdmxvSsifhv4LvDhtFqt9jsingWelvTOtOhEsu9ZrvWxJpuyOVbSfunf+9h+1/ZYjzPR8d0IfCydfXMs8MLYFE/LIqKyF+A04P8B/wJ8ptvjKWkff5Xsz7WHgS3pchrZfPWdwD+nnwd1e6wl/jc4Hrg1XX878D1gK/B3wOxuj6/N+7oc2JSO983Am/vhWAN/DvwQeBT4GjC7jsca+CbZ6xCvkjX2cyc6vmRTN1enfHuE7KykKT2vPwLBzKzmqjx1Y2ZmBTjozcxqzkFvZlZzDnozs5pz0JuZ1ZyD3sys5hz0ZmY19/8BQ400FTRzri8AAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 画散点图\n",
"plt.title(\"Generated data\")\n",
"plt.scatter(x=df[\"X\"], y=df[\"y\"])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Qwn29SjK-XCg"
},
"source": [
"# Scikit-learn 实现方法"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-kSEp8MY-y9C"
},
"source": [
"**注意**: `LinearRegression`类在Scikit-learn中使用的是正规方程法来做的拟合。然而,我们将会使用Scikit-learn中的随机梯度下降`SGDRegressor`类来拟合数据。我们使用这个优化方法是因为在未来的几节课程中我们也会使用到它。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "uKmBKodpgHEE"
},
"outputs": [],
"source": [
"# 调包\n",
"from sklearn.linear_model.stochastic_gradient import SGDRegressor\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 85
},
"colab_type": "code",
"id": "WuUQwD72NVAE",
"outputId": "c46cd73c-7ca4-4d57-ee4b-0fc636f5cde5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train: (75, 1)\n",
"y_train: (75,)\n",
"X_test: (25, 1)\n",
"y_test: (25,)\n"
]
}
],
"source": [
"# 划分数据到训练集和测试集\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" df[\"X\"].values.reshape(-1, 1), df[\"y\"], test_size=args.test_size, \n",
" random_state=args.seed)\n",
"print (\"X_train:\", X_train.shape)\n",
"print (\"y_train:\", y_train.shape)\n",
"print (\"X_test:\", X_test.shape)\n",
"print (\"y_test:\", y_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "MJVs6JF7trja"
},
"source": [
"我们需要标准化我们的数据(零均值和单位方差),以便正确使用SGD并在速度上优化。"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"colab_type": "code",
"id": "VlOYPD5GRjRC",
"outputId": "9d63c0d3-da44-487c-fb8c-c9879f2d97d3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mean: [8.22952817e-17] -1.5617137213060536e-16\n",
"std: [1.] 0.9999999999999999\n"
]
}
],
"source": [
"# 标准化训练集数据 (mean=0, std=1) \n",
"X_scaler = StandardScaler().fit(X_train)\n",
"y_scaler = StandardScaler().fit(y_train.values.reshape(-1,1))\n",
"\n",
"# 在训练集和测试集上进行标准化操作\n",
"standardized_X_train = X_scaler.transform(X_train)\n",
"standardized_y_train = y_scaler.transform(y_train.values.reshape(-1,1)).ravel()\n",
"standardized_X_test = X_scaler.transform(X_test)\n",
"standardized_y_test = y_scaler.transform(y_test.values.reshape(-1,1)).ravel()\n",
"\n",
"\n",
"# 检查\n",
"print (\"mean:\", np.mean(standardized_X_train, axis=0), \n",
" np.mean(standardized_y_train, axis=0)) # mean 应该是 ~0\n",
"print (\"std:\", np.std(standardized_X_train, axis=0), \n",
" np.std(standardized_y_train, axis=0)) # std 应该是 1"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "CiE3oLCkOCEa"
},
"outputs": [],
"source": [
"# 初始化模型\n",
"lm = SGDRegressor(loss=\"squared_loss\", penalty=\"none\", max_iter=args.num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 102
},
"colab_type": "code",
"id": "sGH_pQaDOb49",
"outputId": "ddfff892-314a-4d19-d8a3-549b5e6c555a"
},
"outputs": [
{
"data": {
"text/plain": [
"SGDRegressor(alpha=0.0001, average=False, epsilon=0.1, eta0=0.01,\n",
" fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',\n",
" loss='squared_loss', max_iter=100, n_iter=None, penalty='none',\n",
" power_t=0.25, random_state=None, shuffle=True, tol=None, verbose=0,\n",
" warm_start=False)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 训练\n",
"lm.fit(X=standardized_X_train, y=standardized_y_train)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "fA8VkVVGPkTr"
},
"outputs": [],
"source": [
"# 预测 (还未标准化)\n",
"pred_train = (lm.predict(standardized_X_train) * np.sqrt(y_scaler.var_)) + y_scaler.mean_\n",
"pred_test = (lm.predict(standardized_X_test) * np.sqrt(y_scaler.var_)) + y_scaler.mean_"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "T8Ws-khqJuNr"
},
"source": [
"# 评估"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Y2pha3VRWd2D"
},
"source": [
"有很多种方法可以来评估我们模型表现的好坏。"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "abGgfBbLVjJ_"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "RKm8IiP7O66e",
"outputId": "4fd58928-36da-4d96-9222-9658e1a0bd58"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"train_MSE: 0.00, test_MSE: 0.00\n"
]
}
],
"source": [
"# 训练和测试集上的均方误差 MSE\n",
"train_mse = np.mean((y_train - pred_train) ** 2)\n",
"test_mse = np.mean((y_test - pred_test) ** 2)\n",
"print (\"train_MSE: {0:.2f}, test_MSE: {1:.2f}\".format(train_mse, test_mse))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "TegkJM2-YKEq"
},
"source": [
"除了使用MSE,如果我们只有一个特征向量,我们可以把他们可视化出来直观的评估模型。"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 335
},
"colab_type": "code",
"id": "gH5N-U7YQVgn",
"outputId": "4a1ce429-18c5-45e8-e701-2cc7affcb94b"
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3AAAAE/CAYAAAAHeyFHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3Xm8VVX5+PHPw6Dc0rrOyUXCgUoLRUWzTDPNKQfQ0kxzVvo2fPNXauBcVoqSmWlfk9LUcqjM0EgjxzRyHnJGTUkBUzQxzQsyrN8f59xz973cCe6wz7n78369eN299t7nsjgc7sOz11rPipQSkiRJkqTqNyDvDkiSJEmSusYETpIkSZJqhAmcJEmSJNUIEzhJkiRJqhEmcJIkSZJUI0zgJEmSJKlGmMBJVSQiBkbEWxExPO++SJIkqfqYwEndUE62mn4tjYjGTPug5f1+KaUlKaVVUkov9EZ/JUnqaz0dKzPf9+6I+GJP9lWqBYPy7oBUy1JKqzQdR8Qs4KiU0s3t3R8Rg1JKi/uib5IkVYPljZWSOuYInNSLIuJ7EfHriLgqIt4EvhgRHys/NZwfES9FxI8jYnD5/kERkSJiRLn9q/L1GyPizYi4KyLWz/GPJElSjyovHzglIp6LiFcj4oqIqC9fe3dEXB0R/y7HzXsiYrWIOAfYCvh5eSTvnHz/FFLfMYGTet8+wJXAe4FfA4uBY4A1gW2B3YAvdfD6A4FTgNWBF4Dv9mZnJUnqY8cDuwCfAIYBi4Bzy9eOojRjrIFS3Pwa8E5K6VjgPkqjeauU21IhmMBJve+vKaU/pJSWppQaU0r3pZTuSSktTik9B0wBPtnB669JKd2fUloEXAGM7pNeS5LUN74ETEwpzU0pLQC+A3w+IoJSMrcWsGE5bt6XUvpvnp2V8uYaOKn3vZhtRMSHgHOALYF3Ufp3eE8Hr/9X5vhtYJX2bpQkqZaUk7T1gBsiImUuDQDWAC4G3gdcExGrAJcDp6SUlvR5Z6Uq4Qic1PtSq/ZFwGPARiml9wCnAtHnvZIkKWcppQTMAXZMKdVnfg1JKb2aUlqYUjo1pfQhYHtgP+CAppfn1W8pTyZwUt9bFXgD+G9EbEzH698kServfgpMioj1ACJi7YjYq3z86YjYJCIGAP+htI68afTtZWCDPDos5ckETup7xwKHAm9SGo37db7dkSQpV2cDNwO3lis2/w3YonytAbiOUsx8DLgB+E352rnAIRHxekSc3bddlvITpZFrSZIkSVK1cwROkiRJkmqECZwkSZIk1QgTOEmSJEmqESZwkiRJklQjTOAkSZIkqUYMyrsDAGuuuWYaMWJE3t2QJPWBBx544NWU0lp596NWGCMlqRi6Gh+rIoEbMWIE999/f97dkCT1gYj4Z959qCXGSEkqhq7GR6dQSpIkSVKNMIGTJEmSpBphAidJkiRJNcIETpIkSZJqhAmcJEmSJNUIEzhJkiRJqhEmcJIkSZJUI6piHzhJUvWa+tAcJk+fydz5jQytr+P4XT/IuM0b8u6WJEm5yyNGmsBJkto19aE5nHDtozQuWgLAnPmNnHDtowAmcZKkQssrRjqFUpLUrsnTZ5YCU0oMWbQAgMZFS5g8fWbOPZMkKV9NMXLlxe8wYGkpieuLGGkCJ0lq19z5jQyb/y9mnb0XT/3wcy3OS5JUZHPnN3LSrT9n5jn7svXsx1uc701OoZQkteuEB3/H+Jt+AcCmx1xdOT+0vi6vLkmSlL///Ifnz9oTgPM/9nnuHr5p5VJvx0gTOEnSMouwJ3xyOHt/fCTjgau23IMTPv3lyr11gwdy/K4fzK+zkiT1odYx8tyBz7D1t/4HgN2+8nOeWvV9lXv7IkaawElSwbVehL3BQzPY+4SdShcffpi6pWvSYBVKSVIBZWNkpKVcfM4RfOjVf/Lvj2zO6o88wP88PNcqlJKkvpUtVDL1l8cy+qWn+Wf9+/ji8Zdz52abMQ4rTkqSiqkpRm706gvcfPFXADjys6fw1JgdmBHBuM0b+jxGmsBJUsHNnd/INi88wtVXnQjA/9vzWKZ++FPEf97JuWeSJOVr7vxGZpXXugFs8o3f8vZKdUSOxbxM4CSp4B4/93O8653SFgHbfennvFhfmstvoRJJUqHNnl0pVAIwYsK0ynGeMdIETpIKovUi7BO3WZs9PjWKd5WvZwOThUokSYV24IFw1VUAfOmA7zD9/VtWLuUdI90HTpIKoGkR9pz5jSTg0GsvYI9PjSpdvPxypj44m4b6OgJoqK/jzH1Hue5NklQYUx+aw7aTbmXDb10PEZXkjaVL2f1bR1ZVjHQETpIKIFuoZNbZe1XOb/+9P3PHwTtbqGQFRcQQ4A5gZUox9ZqU0mkRcSnwSeCN8q2HpZQejogAzgM+A7xdPv9g3/dcktSk6SHnJx+7gxlTzwTgmtG7MuiSixmXU6GSjpjASVIBzJ3fyGcfvYVzbjgXgNs22JLD9/sO8aaFSrppIbBjSumtiBgM/DUibixfOz6ldE2r+3cHRpZ/fRS4sPxVkpSTydNn8uT3dq+0t/rqL5m3ymo0TJ9ZVYlbk04TOJ8uSlLtyy7C3vPQH/HY+zYCLFTSXSmlBLxVbg4u/0odvGQscHn5dXdHRH1ErJtSeqmXuypJastTTzGjae9TWq4Hn5tjpcmOdGUEzqeLklQjWhcqOWWL97LbrmMq1y1U0vMiYiDwALAR8JOU0j0R8WXg+xFxKnALMDGltBBoAF7MvHx2+ZwJnCT1tbo6WFCqwnz6jkdzyVZjW1yu1oecnRYxSSUr9HQxpXQ3UB8R63a/q5KkjrQuVDLl3KObk7dvftNCJb0kpbQkpTQaGAZsHREfAU4APgRsBawOTCjfHm19i9YnImJ8RNwfEffPmzevl3ouSQW1ZEmpUEk5eZt6/wtc9fF9W9xSzQ85u7QGzqeLklT92itUssN3b+T2k3ezUEkvSynNj4jbgd1SSj8on14YEb8Ajiu3ZwPrZV42DJjbxveaAkwBGDNmTEcPTSVJy+PMM+HEE0vHa60Fr7zCOIABA1rMYDl+1w9WbczsUgKXUloCjI6IeuD3maeL/wJWohRkJgCnsxxPF4HxAMOHD1+hzkuSmqdNzpnfyP/OuIpj/3pF5dqICdOIt5bk2Lv+LSLWAhaVk7c64NPAWU3r2srrwscBj5Vfcj3wtYi4mtLygjdc/yZJfSQyacozz8BGG1Wa1VZpsiPLVYXSp4uSVF2apk02LlrCrEyhkoP3P507198CqN45/P3EusBl5ZkqA4DfpJSmRcSt5eQugIeB/ynffwOlIl/PUir0dXgOfZakYnnqKdh44+Z2qu3UoytVKH26KElVJFuoZEAE68/7Jzdf/JXKdQuV9J2U0iPA5m2c37Gd+xPw1d7ulySp7F3vgsZyNckf/hC+8Y18+9MDujIC59NFSaoS2RE3gH9M2qNy7ek1hrPLUf9XaTdU+Rx+SZJ6zZIlMGhQy/aATus31oROEzifLkpS9WivUMnG37iGxpWGVNoN9XXMmNjmj2lJkvq3SZPghBNKx2usAa++mm9/ethyrYGTJOUjW6jkl1efzHb/fLhyLTtlEpw2KUkqsA4KlfQXJnCSVOXaK1Ry8i5f4VebfwaAgREsTanqSx9LktQrZs6ED32ouV3jhUo6YgInSVWodaGSzV58gmuvOL5yvXWhEjflliQV1iqrwH//Wzo+5xz45jfz7U8vM4GTpCrTUaESaJm8WahEklQk2Qecw96zEneetEvzxX5UqKQjJnCSVGXaK1Sy6TFX858hq1TaFiqRJBVJ9gHnl+/+LRP+chkAC+tXY+XX/51z7/qOCZwkVZm58xu55WdfYsN/z6mcs1CJJKmosoW8gBbrwXc4+iIWbbARM/LqXA5M4CSpyjyfCUw/2O6LXPDxAwALlUiSiic76jbqpWf4w+XNG3E3PdyMcmJXFCZwklQt7r4bPvaxStNCJZKkomtaVpAddbti9G6ctOvXKu2h9XV5dC03JnCSVA2y+9YAUx+cTUN5kbYjbpKkonr5328xa/LYSnuD469j6YCBlXYRlxSYwElSnlJqWTHr9dehvp5xYMImSSq2I47g2V/8otJsvR68qJWYTeAkqQ9lyx/fdslXGDHvheaL/XjTUUmSOpKNj0Pr65hxwk6Va/sedT4PrrF+pV30ZQX9f6MESaoSTQux58xv5Pmz9qwkb09++TiTN0lSYWXj44f/9WyL5I2UOOQr+9BQX0dQGnUrcvIGjsBJUp+ZPH0mm/7jYX591QmVcyMmTCvt55ZjvyRJylNbhUqu3GxXfnLAt5hBaUlBkRO21kzgJKmPtHiiSPNc/rkFK38sSVJWe4VKirY9QFeZwElSb2tVqGT0169kft17Ku2ilT+WJKniE5/g2RnN81CyhUqMj20zgZOkHpZdiP34ufvxrneanyBufPKNNC5aUmkXsfyxJElAiy10Dj74bO4cukmlbXxsn0VMJKkHtS5U0pS8/eMLR0JKnLnvKBdiS5KK7S9/abn/aUp89hsHGh+7yBE4Seqm7IjbgAg+/twD/PI3p1auZwuVuBBbklRo2cRtu+3gjjsA4+PyMIGTpG5oGnFrmhb5j0l7tLhuoRJJUlFlH3Cut+pK3HHyLs0XFy2CQaYiK8J3TZK6oan0MSkx6+y9Kue3/NqveO3d9ZW2C7ElSUWSfcD56ysm8NHZjzdfdO/TbjGBk6RumDu/scW+NdCygha4EFuSVDxt7e22/4GTmDNqK/c+7SYTOEnqhuczgem2Dbbk8P2+A8DACJamxND6Oo7f9YPO65ck9XvZKZM7/OM+Zlzzncq1poeb7u3WfSZwkrQi/vAH2HvvSjM76lY3eKDVsyRJhZKdMpkddXv1Xe9lzP9eUWm7pKD7TOAkaXllK2gBUx+cTUP5iaMjbpKkIpo8fSaLFixk1g/GVc5tdNxUFg9sTjdcUtAzTOAkqatSggGZ7TPnzoV112UcmLAVVEQMAe4AVqYUU69JKZ0WEesDVwOrAw8CB6eU3omIlYHLgS2B14DPp5Rm5dJ5SepBM07YqUU7OzMlwAecPajTBM7gJKmosnP5n29VqMQKWipbCOyYUnorIgYDf42IG4FvAuemlK6OiJ8CRwIXlr++nlLaKCIOAM4CPp9X5yWpR2RmpvzvXsfzh00+WWk31NcxY+KOefSq3xrQ+S2V4LQZMBrYLSK2oRR0zk0pjQRepxSUIBOcgHPL90lSTWmayz+nVfL28sc+afKmilTyVrk5uPwrATsC15TPXwY0zSkaW25Tvr5TRKs5uZJUK3796xbJ28Yn39gieXPKZO/odAQupZSA9oLTgeXzlwHfpvR0cWz5GErB6YKIiPL3kaSqlR1xGxDBzk/9lZ9OPbNyfcSEaaUniTn2UdUnIgYCDwAbAT8B/gHMTyktLt8yG2iaM9QAvAiQUlocEW8AawCv9mmnJam7Wj97SokzM3HUKZO9p0tr4AxOkvq7bPUsgH9M2qPF9aa5/HMtf6xWUkpLgNERUQ/8Hti4rdvKX9sabVvmAWdEjAfGAwwfPryHeipJPWDRIlhppeb2woWV9rjNG0zY+kCXEjiDk6T+rmnDUVJi1tl7Vc5/7Mu/4KX3rFVpW/5Y7UkpzY+I24FtgPqIGFR+0DkMmFu+bTawHjA7IgYB7wX+3cb3mgJMARgzZowzWCRVhzZG3dT3urIGriKlNB+4nUxwKl9qKzjRWXBKKY1JKY1Za621Wl+WpD4x9aE5bDvpVubMb2TWWXu2SN5GTJjWInlzLr9ai4i1yg83iYg64NPAk8BtwOfKtx0KXFc+vr7cpnz9VpcYSKoJ2eTtiitM3nLUaQJncJLUX2ULlWQ3HX3xvetUpkwOjCAoVdFyc261YV3gtoh4BLgPuCmlNA2YAHwzIp6ltIzg4vL9FwNrlM9/E5iYQ58lqet++9uWyVtKcOCB7d+vXteVKZTrApeV18ENAH6TUpoWEU8AV0fE94CHaBmcflkOTv8GDuiFfkvSCmldqGTfv/+ZyTeeV7me3bembvBAkzZ1KKX0CLB5G+efA7Zu4/wCYL8+6JokdZ9TJqtSV6pQGpwk9QtdLVQCpRE3q2dJkgqpg0Ilyl+XiphIUn/QXqGS7cf/jBdWW7fSdtNRSVJhOepW9UzgJPV7TdMmW691g5ajbmChEklSgWWTt1/+Er74xfz6onaZwEnq17LTJttL3gZGsDQlNx2VJBVGdk34gS/ey/evPL35oqNuVc0ETlK/Nnn6TPa7eyqn33xR5ZyFSiRJRdbRw02Tt+pnAiepX5txwk4t2hYqkSQV3eTpM3ln4TvMmjy2cm7kcb9n7TXew4wc+6WuMYGT1H9l5vLvdOSF/GPN9SptC5VIkorqz6fuwbsXLai0mx5uzp3fmFeXtBxM4CT1C9m5/M+3mg6y8ck3VrYOAAuVSJIKLIJ3lw+/uvcE/rjxdpVLQ+vr8umTlsuAvDsgSd3VNJd/ThvJGylx5r6jaKivIyiNvLnmTZJUONOmtZiZsvHJN7ZI3ny4WTscgZNUs7LbAxx+/3WcdsvPKtdGTJhWmiYJjNu8wYRNklRcbeztdmZm5opVmGuLCZykmtSV7QGcyy9JKrTFi2Hw4Ob2woWw0kqADzdrmQmcpJo0efrMZZK3nY/4Cc+s9f5K27n8kqTCWnVVeOut5rbbA/QbJnCSakZHhUqy2wOAc/klSQWWnTJ59dXw+c/n1xf1OBM4STWhK1Mmm7i/mySpKLIPNz/7r7/zg8tOar7oqFu/ZAInqSZMnj6T/e+eynduvqhyrq1RNytMSpKKoqOHmyZv/ZcJnKSaMOOEnVq0s8lbgBW0JEmFM3n6TBYufIdZk8dWzo087vesvcZ7mJFjv9S7TOAkVb/MXP5djriAp9caUWk31NcxY+KOOXRKkqR8/em0PVn1neaKy1ZhLgYTOElVpaNCJRuffCONi5ZU2hYqkSQVVgSrlg//d6/j+cMmn6xcsgpz/zYg7w5IUpOmufxz2kjeSIkz9x1FQ30dQWnkzfVukqTCueGGFjNTNj75xhbJmw83+z9H4CTlrmnUbc78Rg5+cBrfvemnlWsjJkwrTZPETUclSQWX3R4ASg83MzNXXA9eDCZwknLVle0BnMsvSSq0JUtgUOa/7QsWwMorAz7cLCITOEm5mjx95jLJ226Hn89Ta69faTuXX5JUJNn14H8/7wDes+Ct5otuD1B4JnCSctXR9gDgXH5JUrG0NzPl/u+fz5gTv5Zjz1QtTOAk5afVXP7WyVuDc/klSQUzefpMRj33d35z5cTKuRETptEwoM693QSYwEnqI9npIF956iaOv+68yrW2tgewwqQkqYiyM1MWxwA2+tb1gOvB1cwETlKv66hQiRW0JElimUIlI4/7PYsGDq60XQ+uJiZwknpdW4VKdj/8x/znAx92ewDVrIhYD7gceB+wFJiSUjovIr4NHA3MK996YkrphvJrTgCOBJYAX08pTe/zjkuqPh/9KNx7b6W58ck3sqjVzBTXg6tJpwmcAUrSishOmWy9KXfTWrdwOohq22Lg2JTSgxGxKvBARNxUvnZuSukH2ZsjYhPgAODDwFDg5oj4QEppCZKKK7se/PrrYa+9nJmiDnVlBM4AJWm5dGVvN3A6iGpbSukl4KXy8ZsR8STQ0f+wxgJXp5QWAs9HxLPA1sBdvd5ZSdXnzjth++2b25ntAZyZoo50msAZoCR1VdOo25z5jRz48I2cMf0nlWtuD6D+LCJGAJsD9wDbAl+LiEOA+yk9BH2dUuy8O/Oy2bQTTyNiPDAeYPjw4b3Wb0k5yY66DR0Kc+bk1xfVnAHLc3OrAAWlAPVIRFwSEauVzzUAL2Ze1maAiojxEXF/RNw/b9681pcl1ZimUbc58xuZddae7SZvQWl7AKtMqr+IiFWA3wH/L6X0H+BCYENgNKUHoOc03drGy9vckTelNCWlNCalNGattdbqhV5LysWSJS2TtwULTN603LpcxKR1gIqIC4HvUgo+36UUoI6giwEqpTQFmAIwZswYt5SXalxbhUr2OOw8Hl9nw0q7ob6OGRN3zKN7Uq+IiMGUYuMVKaVrAVJKL2eu/wxoeoIxG1gv8/JhwNw+6qqkvH3sY3B3ZhA++d9frZguJXAGKElt6UqhkiZOmVR/ExEBXAw8mVL6Yeb8uuXlBwD7AI+Vj68HroyIH1JaIz4SuBdJ/V8bhUqkFdWVKpQGKEnL6GqhEiiNvFlBS/3QtsDBwKMR8XD53InAFyJiNKXZJ7OALwGklB6PiN8AT1AqEPZVC3xJ/VwHhUqkFdWVETgDlKRlTJ4+k4Nm/JaTb7ukcq6tUTfXuqm/Sin9lbaXDdzQwWu+D3y/1zolKVftzkxZd12Y64Q09YyuVKE0QElaxowTdmrRbl2oxH1rJElF0jQzZcE7i3j+7L0r56+/+x/s/dENcuyZ+psuFzGRpIrMXP59vziZBxs2rrQtVCJJKqLJ02fyq0u+wZZzn6qcGzFhGg23zTKBU48ygZPUoY4KlWx88o00LmqeIW2hEklSUWVnphy978ncNHIbAObOb8yrS+qnlmsfOEnFkt3brXXyRkqcue8oGurr3NtNklRcM2a0mJkyYsK0SvIGpSUFUk9yBE5SuyZPn8ln7/0D3/vz/1XOjZgwrTRNEhi3eYMJmySpUNqbmdK45jps8T+XgjNT1MtM4CS1q71CJU4HkSQVUWeFSs7MJHcW81JvMYGT1LbMdJBxB5/Dw0ObnyA6HUSSVESTp8/ke1Mn89nHbq2cyxYqcWaK+oIJnCQLlUiS1AXZmSkHfv57/G3EaMCZKepbFjGRCs5CJZIkdeLhh5cpVNKUvIEzU9S3HIGTCm7y9Jnsf/dUvnPzRZVzFiqRJBVZezNT5o35ONvvdoqFSpQrEzip4CxUIklSs/YKlVx3z3OM3Xp9C5UodyZwUpFlpoN87qCzuH/Yhyttp4NIkopo8vSZnD71HPZ77ObKuRETptFw6/OM3Xp9Z6YodyZwUkFYqESSpM5lZ6Z84YDvc9f7NwOcmaLqYRETqQAsVCJJUiceeWSZQiVNyRs4M0XVwxE4qR9rGnWbM7+RAx7+E5OmX1C5ZqESSZLKMonbq1tuw3a7n2ahElUtEzipn2oadWtctIRZrUbdLFQiSRKwdCkMHNjcXriQNVdayUIlqmomcFI/NXn6zGWSt30PmsyDwzautJ0OIkkqiqmtkrIr757C+6/7dfMNKVUOnZmiamYCJ/VT7W0P0MTpIJKkosjOSoFWMfLmm2Gnndp5pVR9LGIi9UeZufywbPJmoRJJUpE0zUr54LxZLWambHvmLSZvqjmOwEk1Ljsl5PDn/sqpv51UudbW9gAmbpKkopk7v7FF4nbPsA/z+YPOIlwLrhpkAifVsI4KlZCSi7AlSUqpxRY6Gx03lcUDS/8Fdi24apEJnFTD2ipUMu7gc5i3yWi3B5Ak6Ywz4KSTKs3skgLXgqtWmcBJNSY7ZbL1ptxNgckpIZKkwsuuB7/3XqYOGkqDs1LUD5jASTWkK3u7gVNCJEkF9uyzMHJkc7u8PcA4MGFTv2ACJ9WAplG3OfMbGff4bfxo2jmVa24PIElSWXbU7Utfgp/+NL++SL3EBE6qcl0ddQtwSogkqZhSggGZ3bHeeQcGD86vP1Iv6nQfuIhYLyJui4gnI+LxiDimfH71iLgpIp4pf12tfD4i4scR8WxEPBIRW/T2H0Lqz9oqVLLPF3/QInlrqK/j+Ul7MGPijiZvUh8yRkpVYNKklslbSiZv6te6MgK3GDg2pfRgRKwKPBARNwGHAbeklCZFxERgIjAB2B0YWf71UeDC8ldJXdSVQiVNnDIp5coYKeWpVaESttoqv75IfaTTEbiU0ksppQfLx28CTwINwFjgsvJtl1FaG0r5/OWp5G6gPiLW7fGeS/1U05TJOV1I3hrq69yYW8qRMVLKyT/+0TJ5S8nkTYWxXGvgImIEsDlwD7BOSuklKAWwiFi7fFsD8GLmZbPL517qbmelIpg8fSZ7PvAnJt94XuVcW6NuJm5SdenJGBkR44HxAMOHD+/VfkvVLjsrZWh9HTNO2Kn54vjxcNFF+XVOykGXE7iIWAX4HfD/Ukr/iexTj1a3tnEutfH9DE5SG1oEJixUItWCno6RKaUpwBSAMWPGLHNdKopsIS9SahkjFy6ElVbKr3NSTrqUwEXEYEqB6YqU0rXl0y9HxLrlJ4vrAq+Uz88G1su8fBgwt/X3NDhJbcj8p2//Aydx73ofqbQb6uuYMXHHPHolqQO9ESMllTQV8jrjT+dz4N+nV85ve+YtzDB5U0F1pQplABcDT6aUfpi5dD1waPn4UOC6zPlDypW2tgHeaJpGIqnZ1IfmsO2kW1l/4h9LiVsmedv45BtbJG8WKpGqkzFS6l1z5zcy66w9K8nb2IPPYcSEacyd35hzz6T8dGUEblvgYODRiHi4fO5EYBLwm4g4EngB2K987QbgM8CzwNvA4T3aY6kf6GhvN1LizFbz/Z0yKVUtY6TUWx59tEUxr+ySgqH1dXn0SKoKnSZwKaW/0vacfYCdWp9IKSXgq93sl9QvNS3EnjO/kb2f+As//sPkyrURE6aVpkkC4zZvMGGTaoAxUuolmVkpDw7bhH0POrvSdlaKim65qlBKWnEdjbo1PVV0SogkqdBSarkp94IFvPDEqzQ4K0WqMIGT+kjTQuxs8vbZg87mgWGbVNpOCZEkFdb//E/LLQFSqcads1KklkzgpF6U3bums025nRIiSSqs7NYb06fDLrvk1xepypnASb2kK1MmmzQ4JUSSVESPPQajRjW3kztLSZ0xgZN6WLZQyT6P3cq5f2yuLN7WqNuZ+44ycZMkFU921G2bbeCuu/Lri1RDTOCkHtTVUbcAF2JLkoqpjUIlrLxyfv2RaowJnNSD2ipU8oUDvs9d79+s0m6or2PGxB3z6J4kSfn68pfhpz9tbjtlUlpuJnBSN1moRJKkLshOmfzTn2DXXfPri1TDTOCkbrBQiSRJnXj8cfjIR5rbjrpJ3WICJ62AbKGSPZ+8gwuuP7tyzUIlkiSVZUfdPvpRuPvu/Poi9RMmcNJyslCJJEnAgjMqAAAgAElEQVSdaF2opLERhgzJrz9SP2ICJy2ntguVnMFd79+00rZQiSSpsL7yFbjwwua2UyalHmUCJ3WBhUokSeqC7JTJG2+E3XbLry9SP2UCJ3XCQiWSJHXiiSfgwx9ubjvqJvUaEzipHdlCJZ979GZ+cMOPKtcsVCJJUll21G2rreDee/Pri1QAJnBSGyxUIklSJyxUIuXCBE5qQ1uFSg773GncvuFWlbaFSiRJhfXVr8L//V9z2ymTUp8xgZPaMOOEnVq0LVQiSSqqbCGvofV1LWOkhUqkPmcCJ7WWncuPhUokScWVXVKw0asvcPNZX2m+6KiblAsTOBVa9qniEc/dySm/PatybeOTb6Rx0ZJK20IlkqSiaWtJwSPv24gvH3MRM3Lsl1RkJnAqrI4KlZASZ7aaMuKomySpaOa+/jazzt6r0v7gsdeycNBKxPzGHHslFZsJnAqrraeKh3/uNJ7ecntmAOM2bzBhkyQV15578vwf/1hpZpcUDK2vy6NHkjCBU4G1V6jEp4qSpKLpqFDJceO+xTUf3L7StpCXlK8Bnd8i9UMdFCrxqaIkqUialhTMmd/IJv96tuUDzpT4xKlfp6G+jqBUyMv14FK+HIFTsVxyCRx5ZKXZVqESnypKkoqkrSUFCwcOYsfvTXdJgVSFOh2Bi4hLIuKViHgsc+7bETEnIh4u//pM5toJEfFsRMyMiF17q+PScotokbyREmfuO8qnipJWmDFStWzqQ3PYdtKtzHn97RbJ24e+eQ0fPG4qc11SIFWlrozAXQpcAFze6vy5KaUfZE9ExCbAAcCHgaHAzRHxgZTSEqQ8ZadMXnst7LMP4FNFSd12KcZI1aCmaZOXXXosW89+onLeJQVS9es0gUsp3RERI7r4/cYCV6eUFgLPR8SzwNbAXSvcQ2k5ZRdiP9/G9gCS1FOMkapVk6fP5Mnv7V5pf2u3r/ObzXaptF1SIFWv7hQx+VpEPFKePrJa+VwD8GLmntnlc1KfyC7ENnmTlCNjpKrXvfe2KFQyYsK0FsmbSwqk6raiRUwuBL4LpPLXc4AjgGjj3jb/1xwR44HxAMOHD1/BbkglTaNuc+Y3Mvbx2zhv2jmVayMmTKOhvo4ZOfZPUqEYI1W9OqjCDKXkbcbEHfuyR5KW0wolcCmll5uOI+JnQNO//tnAeplbhwFz2/keU4ApAGPGjHFoRCusadStdQUtaA5MLsSW1FeMkapKKcGA5olXf5jxNN+68R9gJWap5qzQFMqIWDfT3Adoqr51PXBARKwcEesDI4F7u9dFqWNtlT8+et+TXYgtKRfGSFWd7bdvkbyREnt9fKSVmKUa1ekIXERcBewArBkRs4HTgB0iYjSlqR+zgC8BpJQej4jfAE8Ai4GvWl1LvaGjQiWtp4P4RFFSbzFGquplp0z+7Gdw1FGVppWYpdoUqQoKO4wZMybdf//9eXdDNaIrUyabNNTXcfyuHzRASVUkIh5IKY3Jux+1whipFXLffbD11s3tKvj/nqSOdTU+rmgREyk3k6fPZOe/38qP/zC5cq6tUTengkiSCqlVoRKTN6l/MYFTzcmWPoaWyVtQWu/mqJskqQiySwqGvncIM078dPPFN9+EVVbJr3OSeoUJnGpL5qnil/Y5kekf+HilbeljSVKRZJcUXH3lRLZ58bHmi466Sf2WCZyqVkeFSjY++UYaLX0sSSqwtqowT9z1a9y5wz7ufSr1YyZwqkodFSohJc7MThlxyqQkqYBWeeZJZl3ytUq7aUlBuPep1K+ZwKkqTZ4+k+0fu5OLpp5ROTdiwrTSNEksfSxJKrgIppcP572rnq3+91eVS+59KvVvJnCqSu0VKpnrU0VJUsF0VKhki2/9nn/H4ErbJQVS/2cCp+qTKVRy9L4nc9PIbSptnypKkooku6Tg7Bt+xP6P3tx8MSVOdUmBVDgmcMqVhUokSWpfW4VKvrr3BB7+2C4uKZAKygROubFQiSRJHRs463lmXXRUpW2hEkkmcOpzTaNuc+Y3suOz93LJ706vXLNQiSRJZRHcUT688/2jOfiA71UuuaRAKi4TOPWpjkbdLFQiSRKlTbgHDKg0R0+YyvzMf9lcUiAV24DOb5F6Tltz+Y/47KmV5A18qihJKrDTT2+RvJES3/78GBrq6wigob6OM/cd5QwVqcAcgVOv66hQSTZxA58qSpIKLFOFmT//GXbeGXBJgaSWTODUq7oyZbJJg4VKJElF9NxzsOGGze2U8uuLpKpnAqdeNXn6TLZ8+n5+9ZtTKufaGnVzOogkqZCyo2477QQ339z+vZKECZx62YwTdmrRziZvAW4PIEkqjOySgqHvHcKMEz/dfPHtt6HONeCSOmcCp96Teap42Oe+ze0bjqm0G+rrmDFxxzx6JUlSn8suKfjfGVdx7F+vaL7olElJy8EETj2io0IlG598I42LllTaFiqRJBVNW1WYD97/dJ7bfFtm5NgvSbXHBE7d1lGhElLizOyUEadMSpIKaNHsOcz6ySGVdtOSgnDvU0nLyQRO3TZ5+ky2eOYBrvj1yZVzIyZMK02TxPLHkqSC23JL7n3wQQAuHjOW7+50dOWSe59KWl4mcOq29gqVzPWpoiSpYDoqVLLpxOv4TxpYabukQNKKGJB3B1TjWhQqOa1FlUmfKkqSiqRpScGc+Y2Me+zWllUmU+L0/bekob6OoFTMyy10JK0IR+DUZRYqkSSpfW0VKhl78Dm8uslolxRI6jEmcOoSC5VIktSxN/81j1nnHVBpW6hEUm/oNIGLiEuAPYFXUkofKZ9bHfg1MAKYBeyfUno9IgI4D/gM8DZwWErpwd7puvpC06jbnPmNfOyff+eqq0+qXLNQiSRJZRMn8sh5ZwFw/O7H8NtNd65cckmBpJ7UlRG4S4ELgMsz5yYCt6SUJkXExHJ7ArA7MLL866PAheWvqkEdjbpZqESSfMgpSptwD2guKTDqhD/w5tLm9eEuKZDU0zotYpJSugP4d6vTY4HLyseXAeMy5y9PJXcD9RGxbk91Vn2rrbn8h1uoRJKyLgV2a3Wu6SHnSOCWchtaPuQcT+khp2rZHXc0J2877wwp8d39NrdQiaRetaJr4NZJKb0EkFJ6KSLWLp9vAF7M3De7fO6lFe+i+lJHhUqyiRv4VFGSUkp3RMSIVqfHAjuUjy8Dbqc0S6XykBO4OyLqI2LdpniqGlNfD2+8UTp+5hnYaCPAJQWSel9PFzGJNs6lNm+MGE/pCSTDhw/v4W5oRXRlymSTBguVSFJ7fMjZn82fD6ut1txObf43R5J6zYomcC83PTUsT5F8pXx+NrBe5r5hwNy2vkFKaQowBWDMmDH+9MtRtlDJNi88wtVXnVi51taom9NBJGmF+JCz1p1wAkyaVDq+5BI4/PB8+yOpkFY0gbseOBSYVP56Xeb81yLiakrFS95wakh16+qoW4DbA0hS1/iQs79pVaiEd96BwYPz64+kQuvKNgJXUZrLv2ZEzAZOo5S4/SYijgReAPYr334Dpepaz1KqsOWjqSrXVqGSIz57KrdutHWl3VBfx4yJO+bRPUmqRT7k7E/uvBO23750/OlPw0035dsfSYXXaQKXUvpCO5d2auPeBHy1u51S77JQiST1DB9y9nOrrVZa8wYtCpVIUp56uoiJqpyFSiSp5/iQs5+yUImkKmYCVxDZQiVbzn6C313xrco1C5VIklSWLVRy8cVwxBH59keSWjGBKwALlUiS1AkLlUiqESZwBdBWoZLDPvdtbt9wTKVtoRJJUmH99a+w3Xal4512gptvzrc/ktQBE7h+ykIlkiR1wRprwL//XTp++mkYOTLf/khSJ0zg+iELlUiS1AkLlUiqUSZw/Ui2UMlmc2dy3S+PrVyzUIkkSWUnnghnnlk6tlCJpBpjAtdPWKhEkqROtC5UsnAhrLRSfv2RpBVgAtdPtFWo5OD9T+fO9beotC1UIkkqLAuVSOonTOBqmIVKJEnqgjXXhNdeKx1bqERSjTOBq1EWKpEkqRMWKpHUD5nA1ZhsoZJNX3qa6y//ZuWahUokSSo76SQ444zS8c9/DkcemW9/JKmHmMDVEAuVSJLUBRHNxxYqkdTPmMDVkLYKlRyy33e4Y4MtK20LlUiSCmvGDPjEJ0rHn/oU3Hprvv2RpF5gAlflLFQiSVIXZAuVzJwJH/hAvv2RpF5iAlfFLFQiSVIn3ngD6uub2xYqkdTPmcBVoWyhkg/Mm8WfL/la5ZqFSiRJKjv5ZPj+90vHP/sZHHVUvv2RpD5gAldlLFQiSVIXWKhEUkGZwFWB7Dq3AREsSalF8rb/gZO4d72PVNoWKpEkFVa2UMkOO8Btt+XaHUnqayZwOcuOuAHcdNHRbPD63Mp1C5VIklS29towb17p2EIlkgrKBC5nTVsDABYqkSSpLRYqkaQKE7icZAuVDH/9Je6YcnTlmoVKJEkqs1CJJLVgApeDrhQqGRjB0pQsVCJJKi4LlUjSMkzg+khnhUr2OuRcHl13JOCImySp4P72N9h229Lx9tvDX/6Sb38kqYqYwPWB1oVKLrj2++z+9N8q17NTJl3nJkkqtHXWgVdeKR0/9RR80MJdkpTVrQQuImYBbwJLgMUppTERsTrwa2AEMAvYP6X0eve6WdvaK1TyxNrr85nDz6+03R5AklRYFiqRpC4Z0APf41MppdEppTHl9kTglpTSSOCWcruQpj40h20n3cqc+Y2s8+arLZK3EROmtUje3B5AklRYp5zSnLxNmWLyJkkd6I0plGOBHcrHlwG3AxN64fepahYqkaRic5ZKF1moRJKWS3cTuAT8OSIScFFKaQqwTkrpJYCU0ksRsXZ3O1krOitUstORF/KPNdcDLFQiSQXxqZTSq5l20yyVSRExsdwu3ENOwEIlkrSCupvAbZtSmltO0m6KiKe6+sKIGA+MBxg+fHg3u5G/1oVKvvOnC/jiwzdWrluoRJKEs1RKLFQiSSusWwlcSmlu+esrEfF7YGvg5YhYtzz6ti7wSjuvnQJMARgzZkzNT3Zvr1DJX9bfgkP3P73StlCJ1P8sWrSI2bNns2DBgry7UlWGDBnCsGHDGDx4cN5dyYuzVFqzUIlUeMbM7sfHFU7gIuLdwICU0pvl412A04HrgUOBSeWv163o71FL5s5vZI3/zueBC75YOZcddQMLlUj91ezZs1l11VUZMWIEkV3PU2ApJV577TVmz57N+uuvn3d38uIslaxTT4Xvfrd0fNFFMH58vv2RlIuix8yeiI/dGYFbB/h9+Y0fBFyZUvpTRNwH/CYijgReAPbrxu9RM563UIlUWAsWLChsIGpPRLDGGmswb968vLuSG2epZFioRFJZ0WNmT8THFU7gUkrPAZu1cf41YKcV7lGVyxYqqSRlWwyrXN9+/M94YbV1AQuVSEVS1EDUkSK/J0WepZKNkzu/8RxTfvr10oXttoM77si3c5KqQpHjA3T/z98b2wj0W60Llew37eeMO+Gq5usPzmbJ9JlENrkzeZPUB1ZZZRXeeuutvLuhZoWcpZKNk/decDBr/7e0Q8LNv7udT+/7yZx7J0kltR4zTeCWQ3uFSv645a7scf+fGAcmbJKkws5SmTx9JksaFzDrnH0q50ZMmEbD00v4dI79kqT+xASuC5qmg8yZ38h7G9/k7z/+QuXaiAnTCGCP/LonSRW33347p512Guussw4PP/ww++67L6NGjeK8886jsbGRqVOnsuGGG+bdTfVTw/9+DzOuPhGAE3f9KleO3h0oFfqSpGpTqzHTBK4T2ekgs9opVDK0vi6PrklSm/7+97/z5JNPsvrqq7PBBhtw1FFHce+993Leeedx/vnn86Mf/SjvLqo/2mMPrrrhBl5eZXU+/uVfsGTAwMol46SkalWLMdMErg3ZBdgDIliSUovk7eNfvoS57ylt3ePWAJKW0RuLs5djv6ytttqKddctFVPacMMN2WWXXQAYNWoUt912W8/3TcX24otQ3urgwVN/wEFLP8yS8nIDME5K6oQxc7kNyLsD1aZpxG3O/EYSsMuTd7ZI3kZMmFZJ3hrq66wyKWlZKfX8r+Ww8sorV44HDBhQaQ8YMIDFixf36B9VBXfOOZXkjXnz2OI7x3LmvqNoqK8jME5K6gJj5nJzBK6V9gqVnLLz//DLLZrbDfV1zJi4Y5/3T5Kk3C1cCEOGlI4POwx+8YvKpXGbN5iwSVIvMoEryxYqqXtnAU+e+7nKtaa1bk2cDiJJKqzbb4dPfap0fN99MGZMrt2RpKKJtJzDjL1hzJgx6f7778/t988WKjn7hh+x/6M3A3DHiM055PPfBWBgBEtTcn83Sct48skn2XjjjfPuRlVq672JiAdSSv6vv4vyjpEt7LEH3HADvO99pbVvg3wOLGn5GDNLuhMfC/uTt7NCJaO/fiXz694DlEbcnMMvSSqsTKESfv5zOPLIfPsjSQVWyAQuO+IGsP68f3LzxV+pXM9OmWxwxE2SVGTnnAPHHVc6fuUVWGutfPsjSQVXyAQuW6jk2l8eyxZzZwIw7uBzeHho89o2C5VIkgpr4UJ417tg6dJlCpVIkvJTqAQuW6hk4NIl3POTQ1jz7TcAC5VIklRhoRJJqlqFSeCy0yY3felprr/8mwB8/gtncs/wUYCFSiRJYq+9YNo0C5VIUpXq9z+Vs6Nu0Ly326IBA/nwN67hnUGDAQuVSJIKzkIlklQTBuTdgd7UNOo2Z34jG776YiV5u2mjjzLy+OsqyVtDfZ3JmySpuM45pzl5e+UVkzdJhTdr1iyuvPLKFX79GWec0YO9aalfJ3BNxUp+98vjuOXiLwOlQiVHf/aUyj1NhUpM3iQVQXcC0sc//vEe7o1yt3AhDBxYqjJ5yCGQklUmJQkTuNy8/O+3uO/8L7Ll3KeAUqGSbJVJC5VIysPUh+aw7aRbWX/iH9l20q1MfWhOn/3eHQWkxYsXd/jav/3tb73RJeXl9tthyJBSlcn77oPLLsu7R5K0jJ6OmaeccgrnnXdepX3SSSfx4x//eJn7Jk6cyJ133sno0aM599xzWbJkCccffzxbbbUVm266KRdddBEAL730Ettvvz2jR4/mIx/5CHfeeScTJ06ksbGR0aNHc9BBB3Wrv23pv2vg7ruPZyePBVoWKmni/m6S8tB6H8o58xs54dpHAbr18+iUU05hzTXX5JhjjgFKAWmdddbh61//eov7Jk6cyJNPPsno0aM59NBDWW211fjjH//IggUL+O9//8v111/P2LFjef3111m0aBHf+973GDu29LN0lVVW4a233uL222/n29/+NmuuuSaPPfYYW265Jb/61a+IiBXuv/pYU6GStdeGOXMsVCKpKvVGzDzyyCPZd999OeaYY1i6dClXX30199577zL3TZo0iR/84AdMm1aqVD9lyhTe+973ct9997Fw4UK23XZbdtllF6699lp23XVXTjrpJJYsWcLbb7/NdtttxwUXXMDDDz+8gn/yjvWLn9hNhUrmzm9kaH0dV/3tpwz/w29ZOmgQo4/7Hf9JAyv3WqxEUp6y+1A2aVy0hMnTZ3br59KKBqRLL72Uu+66i0ceeYTVV1+dxYsX8/vf/573vOc9vPrqq2yzzTbsvffeyyRnDz30EI8//jhDhw5l2223ZcaMGXziE59Y4f6rd7SOj6eMXpXddt+6dNFCJZKqXG/EzBEjRrDGGmvw0EMP8fLLL7P55puzxhprdPq6P//5zzzyyCNcc801ALzxxhs888wzbLXVVhxxxBEsWrSIcePGMXr06BXq1/Ko+QQum5mv/vYbzCgXKnn0m6cy6pzvcHqr4OWom6Q8zS1XxO3q+a5a0YAEsPPOO7P66qsDkFLixBNP5I477mDAgAHMmTOHl19+mfe9730tXrP11lszbNgwAEaPHs2sWbNM4KpM6yfXu/35SnY74eLSxVdeca2bpKrXWzHzqKOO4tJLL+Vf//oXRxxxRJdek1Li/PPPZ9ddd13m2h133MEf//hHDj74YI4//ngOOeSQbvWvMzWfwDVl5p//+3TO+tP5AGzz5UsZuNZ6zKA0vGrCJqlaDK2vq2xr0vp8d61IQAJ497vfXTm+4oormDdvHg888ACDBw9mxIgRLFiwYJnXrLzyypXjgQMHdrp+Tn2vKT6utHgRT/zwswxKS/ndR3bkhwedxAyTN0k1oLdi5j777MOpp57KokWL2l0Xvuqqq/Lmm29W2rvuuisXXnghO+64I4MHD+bpp5+moaGBV199lYaGBo4++mj++9//8uCDD3LIIYcwePBgFi1axODBg7vV17bUfALXlIGf9afzuWXDrTjyc6cBEN3MzCWpNxy/6wdbjIpAzxVUWpGA1Nobb7zB2muvzeDBg7ntttv45z//2e1+KR9N8XGH5+5nUFrK3of8kEfW/YDxUVLN6K2YudJKK/GpT32K+vp6Bg4c2OY9m266KYMGDWKzzTbjsMMO45hjjmHWrFlsscUWpJRYa621mDp1KrfffjuTJ09m8ODBrLLKKlx++eUAjB8/nk033ZQtttiCK664olv9ba3mE7imzHzEhGnLnJekatM0I6A3pnavSEBabbXVWlw/6KCD2GuvvRgzZgyjR4/mQx/6ULf7pXw0xcc/f+BjLWKk8VFSreitmLl06VLuvvtufvvb37Z7z+DBg7nllltanDvjjDOW2R7g0EMP5dBDD13m9WeddRZnnXVWt/rZnppP4HrzabYk9Ybemtq9ogHpsMMOqxyvueaa3HXXXW2+9q233gJghx12YIcddqicv+CCC1a80+o1xkdJ/UFPx8wnnniCPffck3322YeRI0f22PftS72WwEXEbsB5wEDg5ymlSb3x+/Tm02xJqhX9ISCpZxkfJWlZm2yyCc8991yl/eijj3LwwQe3uGfllVfmnnvu6euudVmvJHARMRD4CbAzMBu4LyKuTyk90Ru/n4VKJBVdfwhI6nnGR0nq2KhRo3ptv7be0lsjcFsDz6aUngOIiKuBsUCvJHCSpJZqMSBJkqTODeil79sAvJhpzy6fk6R+KaWUdxeqju9J2yJit4iYGRHPRsTEvPsjSX2t6PGhu3/+3krgoo1zLXoaEeMj4v6IuH/evHm91A1J6n1DhgzhtddeK3xAykop8dprrzFkyJC8u1JVMksMdgc2Ab4QEZvk2ytJ6jtFj5k9ER97awrlbGC9THsYMDd7Q0ppCjAFYMyYMcX8G5TULwwbNozZs2fjw6iWhgwZwrBhw/LuRrVxiYGkQjNmdj8+9lYCdx8wMiLWB+YABwAH9tLvJUm5Gjx4MOuvv37e3VBtaGuJwUdz6osk9TljZvf1SgKXUlocEV8DplPaRuCSlNLjvfF7SZJUQzpdYgClZQbAeIDhw4f3dp8kSTWk1/aBSyndANzQW99fkqQa1OkSA3CZgSSpfb1VxESSJC2rssQgIlaitMTg+pz7JEmqIVENFWAiYh7wzx74VmsCr/bA9+lPfE/a5vvSNt+Xtvm+tG1F35f3p5TW6unO1IqI+AzwI5qXGHy/k/tXNEb6uW2f7037fG/a53vTMd+f9nX1velSfKyKBK6nRMT9KaUxefejmvietM33pW2+L23zfWmb70t18++nfb437fO9aZ/vTcd8f9rX0++NUyglSZIkqUaYwEmSJElSjehvCdyUvDtQhXxP2ub70jbfl7b5vrTN96W6+ffTPt+b9vnetM/3pmO+P+3r0femX62BkyRJkqT+rL+NwEmSJElSv9UvEriI2C0iZkbEsxExMe/+5CUi1ouI2yLiyYh4PCKOKZ9fPSJuiohnyl9Xy7uvfS0iBkbEQxExrdxePyLuKb8nvy7vx1Q4EVEfEddExFPlz83Hiv55iYhvlP/9PBYRV0XEkKJ+XiLikoh4JSIey5xr8/MRJT8u/xx+JCK2yK/nxWZMbGZc7JzxsX3GyPYZK1vq63hZ8wlcRAwEfgLsDmwCfCEiNsm3V7lZDBybUtoY2Ab4avm9mAjcklIaCdxSbhfNMcCTmfZZwLnl9+R14MhcepW/84A/pZQ+BGxG6T0q7OclIhqArwNjUkofobRP1wEU9/NyKbBbq3PtfT52B0aWf40HLuyjPirDmLgM42LnjI/tM0a2wVjZpkvpw3hZ8wkcsDXwbErpuZTSO8DVwNic+5SLlNJLKaUHy8dvUvpB00Dp/bisfNtlwLh8epiPiBgG7AH8vNwOYEfgmvIthXtPACLiPcD2wMUAKaV3UkrzKfjnBRgE1EXEIOBdwEsU9POSUroD+Her0+19PsYCl6eSu4H6iFi3b3qqDGNihnGxY8bH9hkjO2WszOjreNkfErgG4MVMe3b5XKFFxAhgc+AeYJ2U0ktQCmbA2vn1LBc/Ar4FLC231wDmp5QWl9tF/cxsAMwDflGePvPziHg3Bf68pJTmAD8AXqAUjN4AHsDPS1Z7nw9/FlcH/x7aYVxsk/GxfcbIdhgru6zX4mV/SOCijXOFLq0ZEasAvwP+X0rpP3n3J08RsSfwSkrpgezpNm4t4mdmELAFcGFKaXPgvxRwKkhWeX76WGB9YCjwbkpTHVor4uelM/67qg7+PbTBuLgs42OnjJHtMFZ2W7f/nfWHBG42sF6mPQyYm1NfchcRgykFqStSSteWT7/cNDRb/vpKXv3LwbbA3hExi9JUoh0pPXGsLw/7Q3E/M7OB2Smle8rtaygFqyJ/Xj4NPJ9SmpdSWgRcC3wcPy9Z7X0+/FlcHfx7aMW42C7jY8eMke0zVnZNr8XL/pDA3QeMLFe+WYnSIsrrc+5TLspz1y8Gnkwp/TBz6Xrg0PLxocB1fd23vKSUTkgpDUspjaD02bg1pXQQcBvwufJthXpPmqSU/gW8GBEfLJ/aCXiCAn9eKE0H2SYi3lX+99T0nhT+85LR3ufjeuCQcnWtbYA3mqaOqE8ZEzOMi+0zPnbMGNkhY2XX9Fq87BcbeexdYz4AAADzSURBVEfEZyg9NRoIXJJS+n7OXcpFRHwCuBN4lOb57CdSmu//G2A4pX90+6WUWi+07PciYgfguJTSnhGxAaUnjqsDDwFfTCktzLN/eYiI0ZQWr68EPAccTunBTmE/LxHxHeDzlKrXPQQcRWlueuE+LxFxFbADsCbwMnAaMJU2Ph/lIH4BpSpcbwOHp5Tuz6PfRWdMbGZc7BrjY9uMke0zVrbU1/GyXyRwkiRJklQE/WEKpSRJkiQVggmcJEmSJNUIEzhJkiRJqhEmcJIkSZJUI0zgJEmSJKlGmMBJkiRJUo0wgZMkSZKkGmECJ0mSJEk14v8DrjwHLr2LjZQAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 图例大小\n",
"plt.figure(figsize=(15,5))\n",
"\n",
"# 画出训练数据\n",
"plt.subplot(1, 2, 1)\n",
"plt.title(\"Train\")\n",
"plt.scatter(X_train, y_train, label=\"y_train\")\n",
"plt.plot(X_train, pred_train, color=\"red\", linewidth=1, linestyle=\"-\", label=\"lm\")\n",
"plt.legend(loc='lower right')\n",
"\n",
"# 画出测试数据\n",
"plt.subplot(1, 2, 2)\n",
"plt.title(\"Test\")\n",
"plt.scatter(X_test, y_test, label=\"y_test\")\n",
"plt.plot(X_test, pred_test, color=\"red\", linewidth=1, linestyle=\"-\", label=\"lm\")\n",
"plt.legend(loc='lower right')\n",
"\n",
"# 显示图例\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xAP1EoQi86XB"
},
"source": [
"# 推论"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 159
},
"colab_type": "code",
"id": "K2yfNk3d8-Vj",
"outputId": "aef548b1-d0a1-4481-a9fe-4c975f278a3f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[10.00362276 13.65354177 17.30346078]\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" X | \n",
" y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.0 | \n",
" 10.00 | \n",
"
\n",
" \n",
" 1 | \n",
" 1.0 | \n",
" 13.65 | \n",
"
\n",
" \n",
" 2 | \n",
" 2.0 | \n",
" 17.30 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" X y\n",
"0 0.0 10.00\n",
"1 1.0 13.65\n",
"2 2.0 17.30"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 传入我们自己的输入值\n",
"X_infer = np.array((0, 1, 2), dtype=np.float32)\n",
"standardized_X_infer = X_scaler.transform(X_infer.reshape(-1, 1))\n",
"pred_infer = (lm.predict(standardized_X_infer) * np.sqrt(y_scaler.var_)) + y_scaler.mean_\n",
"print (pred_infer)\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "PHH0fYp_BYC5"
},
"source": [
"# 可解释性"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "OhXo8CbPBZ-G"
},
"source": [
"线性回归有很强的可解释性。每一个特征都有一个系数来控制对输出值y的影响大小。我们可以这样解释这个系数: 如果我们把x增加1, 我们将把y增加 $W$ (~3.65)。\n",
"\n",
"**注意**: 因为我们在梯度下降时标准化了我们的输入和输出,我们需要对我们的系数和截距做一个反标准化。过程可见下方。"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"colab_type": "code",
"id": "JZxnrDuCBbK9",
"outputId": "adbe06d5-449a-4aa0-ebbb-ef5fdd5ef9e9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[3.64992205]\n",
"[10.00362489]\n"
]
}
],
"source": [
"# 未标准化系数\n",
"coef = lm.coef_ * (y_scaler.scale_/X_scaler.scale_)\n",
"intercept = lm.intercept_ * y_scaler.scale_ + y_scaler.mean_ - np.sum(coef*X_scaler.mean_)\n",
"print (coef) # ~3.65\n",
"print (intercept) # ~10"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "yVmIP13u9s33"
},
"source": [
"### 非标准化系数的证明:\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ViDPSLbR9v4B"
},
"source": [
"注意我们的X和y都已经标准化了。\n",
"\n",
"$\\frac{\\mathbb{E}[y] - \\hat{y}}{\\sigma_y} = W_0 + \\sum_{j=1}^{k}W_jz_j$\n",
"\n",
"$z_j = \\frac{x_j - \\bar{x}_j}{\\sigma_j}$\n",
"\n",
"$ \\hat{y}_{scaled} = \\frac{\\hat{y}_{unscaled} - \\bar{y}}{\\sigma_y} = \\hat{W_0} + \\sum_{j=1}^{k} \\hat{W}_j (\\frac{x_j - \\bar{x}_j}{\\sigma_j}) $\n",
"\n",
"$\\hat{y}_{unscaled} = \\hat{W}_0\\sigma_y + \\bar{y} - \\sum_{j=1}^{k} \\hat{W}_j(\\frac{\\sigma_y}{\\sigma_j})\\bar{x}_j + \\sum_{j=1}^{k}(\\frac{\\sigma_y}{\\sigma_j})x_j $\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rToCXKqeJcvj"
},
"source": [
"# 正则化"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "L4GFv8xRJmOZ"
},
"source": [
"正规化有助于减少过拟合。下方是L2正则化(ridge regression)。有很多正则化的方法他们都可以使我们的模型减少过拟合。对于L2正则化, 我们会减小那些值很大的权重。数值很大的权重将会使模型更加看中它们的特征,但是我们希望的是模型会公平的对待所有的特征而不是仅仅权重很大的几个。 当然还有其他的正则化方法比如L1(lasso regression),它对于我们想创建更加稀疏的数据模型有好处,因为它会使得一些权重变成0,或者我们可以结合L2和L1正则化方法。\n",
"\n",
"**注意**: 正则化不仅仅用于线性回归。它可以用于任何常规模型以及我们以后将会学到的模型。"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "D_OcpRxF-Oj7"
},
"source": [
"* $ J(\\theta) = = \\frac{1}{2}\\sum_{i}(X_iW - y_i)^2 + \\frac{\\lambda}{2}\\sum\\sum W^2$\n",
"* $ \\frac{\\partial{J}}{\\partial{W}} = X (\\hat{y} - y) + \\lambda W $\n",
"* $W = W- \\alpha\\frac{\\partial{J}}{\\partial{W}}$\n",
"where:\n",
" * $\\lambda$ 是正则化系数"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "HHaazL9f8QZX"
},
"outputs": [],
"source": [
"# 初始化带有L2正则化的模型\n",
"lm = SGDRegressor(loss=\"squared_loss\", penalty='l2', alpha=1e-2, \n",
" max_iter=args.num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 102
},
"colab_type": "code",
"id": "VTIUZLbGZP4e",
"outputId": "e284d26b-6091-4ce9-b40c-5b9d675f0837"
},
"outputs": [
{
"data": {
"text/plain": [
"SGDRegressor(alpha=0.01, average=False, epsilon=0.1, eta0=0.01,\n",
" fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',\n",
" loss='squared_loss', max_iter=100, n_iter=None, penalty='l2',\n",
" power_t=0.25, random_state=None, shuffle=True, tol=None, verbose=0,\n",
" warm_start=False)"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 训练\n",
"lm.fit(X=standardized_X_train, y=standardized_y_train)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "ORwkUqcuZhbX"
},
"outputs": [],
"source": [
"# 预测 (还未标准化)\n",
"pred_train = (lm.predict(standardized_X_train) * np.sqrt(y_scaler.var_)) + y_scaler.mean_\n",
"pred_test = (lm.predict(standardized_X_test) * np.sqrt(y_scaler.var_)) + y_scaler.mean_"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "IWCvYxBxZhd5",
"outputId": "a38bb230-eb97-43c1-d43d-8dead30dedcb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"train_MSE: 1.09, test_MSE: 1.15\n"
]
}
],
"source": [
"# 训练集和测试集的MSE\n",
"train_mse = np.mean((y_train - pred_train) ** 2)\n",
"test_mse = np.mean((y_test - pred_test) ** 2)\n",
"print (\"train_MSE: {0:.2f}, test_MSE: {1:.2f}\".format(\n",
" train_mse, test_mse))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "mdNX2W5eh2ma"
},
"source": [
"正则化对于我们在做的这个数据帮助很少,因为我们在创建数据的时候用的就是一个线性的函数。但是对于现实中的数据,正则化就可以帮助我们构建更好的模型。"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"colab_type": "code",
"id": "C2mrVS4UZp3Q",
"outputId": "5ba6159b-0791-4093-9ed9-8925386edf36"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[3.61384419]\n",
"[11.67785083]\n"
]
}
],
"source": [
"# 未标准化系数\n",
"coef = lm.coef_ * (y_scaler.scale_/X_scaler.scale_)\n",
"intercept = lm.intercept_ * y_scaler.scale_ + y_scaler.mean_ - (coef*X_scaler.mean_)\n",
"print (coef) # ~3.65\n",
"print (intercept) # ~10"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "V74lNFE5v5pQ"
},
"source": [
"# 类别变量"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2r6Xhyg7v5vX"
},
"source": [
"在我们的例子中,特征用的是连续的数值,那么假设我们要用类别的特征变量呢?一种选择就是使用独热编码来处理类别变量,这种方法用Pandas很容易实现,你可以用和上面相同的步骤来训练你的模型"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"colab_type": "code",
"id": "unhcIOfMxQEQ",
"outputId": "ecf36ce0-28af-4381-b61b-2592a3c1b289"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" favorite_letter | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" a | \n",
"
\n",
" \n",
" 1 | \n",
" b | \n",
"
\n",
" \n",
" 2 | \n",
" c | \n",
"
\n",
" \n",
" 3 | \n",
" a | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" favorite_letter\n",
"0 a\n",
"1 b\n",
"2 c\n",
"3 a"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 创建类别特征\n",
"cat_data = pd.DataFrame(['a', 'b', 'c', 'a'], columns=['favorite_letter'])\n",
"cat_data.head()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"colab_type": "code",
"id": "m4eQmJdrxQGr",
"outputId": "247aaac2-afcb-4899-e415-91d3fe169fbf"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" favorite_letter_a | \n",
" favorite_letter_b | \n",
" favorite_letter_c | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" favorite_letter_a favorite_letter_b favorite_letter_c\n",
"0 1 0 0\n",
"1 0 1 0\n",
"2 0 0 1\n",
"3 1 0 0"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dummy_cat_data = pd.get_dummies(cat_data) #独热编码 one-hot encoding,与dummy变量不同要注意。\n",
"dummy_cat_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "B5R8x-KyiBWJ"
},
"source": [
"现在你可以拼接上连续特征变量来训练线性模型。"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "eVOXoCRsokzp"
},
"source": [
"# TODO"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "4c7ttuUwfeLA"
},
"source": [
"- 多项式回归\n",
"- 一个简单的用正规方程的例子(sklearn.linear_model.LinearRegression)来分析优点和缺点,并且和随机梯度下降线性回归做对比。"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "04_Linear_Regression",
"provenance": [],
"toc_visible": true,
"version": "0.3.2"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}