{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 线性回归和线性分类器"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 介绍"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"本次实验简述了最小二乘法、最大似然估计、逻辑回归、正则化、验证和学习曲线的基本概念,搭建了基于逻辑回归的线性模型并进行正则化,通过分析 IBMD 数据集的二元分类问题和一个 XOR 问题阐述逻辑回归的优缺点。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 知识点"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 回归\n",
"- 线性分类\n",
"- 逻辑回归的正则化\n",
"- 逻辑回归的优缺点\n",
"- 验证和学习曲线"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 最小二乘法"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在开始学习线性模型之前,简要介绍一下线性回归,首先指定一个模型将因变量 $y$ 和特征联系起来,对线性模型而言,依赖函数的形式如下:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$y = w_0 + \\sum_{i=1}^m w_i x_i$$ "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果为每项观测加上一个虚维度 $x_0 = 1$(比如偏置),那么就可以把 $w_0$ 整合进求和项中,改写为一个略微紧凑的形式:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$y = \\sum_{i=0}^m w_i x_i = \\textbf{w}^\\text{T} \\textbf{x}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果有一个特征观测矩阵,其中矩阵的行是数据集中的观测,那么需要在左边加上一列。由此,线性模型可以定义为:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\textbf y = \\textbf X \\textbf w + \\epsilon$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"其中:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- $\\textbf y \\in \\mathbb{R}^n$:因变量(目标变量)。\n",
"- $w$:模型的参数向量(在机器学习中,这些参数经常被称为权重)。\n",
"- $\\textbf X$:观测及其特征矩阵,大小为 n 行、m+1 列(包括左侧的虚列),其秩的大小为 $\\text{rank}\\left(\\textbf X\\right) = m + 1 $。\n",
"- $\\epsilon $:一个变量,用来表示随机、不可预测模型的错误。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"上述表达式亦可这样写:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ y_i = \\sum_{j=0}^m w_j X_{ij} + \\epsilon_i$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"模型具有如下限制(否则它就不是线性回归了):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 随机误差的期望为零:$\\forall i: \\mathbb{E}\\left[\\epsilon_i\\right] = 0 $;\n",
"- 随机误差具有相同的有限方差,这一性质称为等分散性:$\\forall i: \\text{Var}\\left(\\epsilon_i\\right) = \\sigma^2 < \\infty $;\n",
"- 随机误差不相关:$\\forall i \\neq j: \\text{Cov}\\left(\\epsilon_i, \\epsilon_j\\right) = 0 $."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"权重 $w_i$ 的估计 $\\widehat{w}_i$ 满足如下条件时,称其为线性:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\widehat{w}_i = \\omega_{1i}y_1 + \\omega_{2i}y_2 + \\cdots + \\omega_{ni}y_n$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"其中对于 $\\forall\\ k\\ $,$\\omega_{ki}$ 仅依赖于 $X$ 中的样本。由于寻求最佳权重的解是一个线性估计,这一模型被称为线性回归。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"再引入一项定义:当期望值等于估计参数的真实值时,权重估计被称为无偏(unbiased):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\mathbb{E}\\left[\\widehat{w}_i\\right] = w_i$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"计算这些权重的方法之一是普通最小二乘法(OLS)。OLS 可以最小化因变量实际值和模型给出的预测值之间的均方误差:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl}\\mathcal{L}\\left(\\textbf X, \\textbf{y}, \\textbf{w} \\right) &=& \\frac{1}{2n} \\sum_{i=1}^n \\left(y_i - \\textbf{w}^\\text{T} \\textbf{x}_i\\right)^2 \\\\\n",
"&=& \\frac{1}{2n} \\left\\| \\textbf{y} - \\textbf X \\textbf{w} \\right\\|_2^2 \\\\\n",
"&=& \\frac{1}{2n} \\left(\\textbf{y} - \\textbf X \\textbf{w}\\right)^\\text{T} \\left(\\textbf{y} - \\textbf X \\textbf{w}\\right)\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了解决这一优化问题,需要计算模型参数的导数。将导数设为零,然后求解关于 $\\textbf w$ 的等式,倘若不熟悉矩阵求导,可以参考下面的 4 个式子:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{array}{rcl} \n",
"\\frac{\\partial}{\\partial \\textbf{X}} \\textbf{X}^{\\text{T}} \\textbf{A} &=& \\textbf{A} \\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{array}{rcl} \\frac{\\partial}{\\partial \\textbf{X}} \\textbf{X}^{\\text{T}} \\textbf{A} \\textbf{X} &=& \\left(\\textbf{A} + \\textbf{A}^{\\text{T}}\\right)\\textbf{X} \\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{array}{rcl}\\frac{\\partial}{\\partial \\textbf{A}} \\textbf{X}^{\\text{T}} \\textbf{A} \\textbf{y} &=& \\textbf{X}^{\\text{T}} \\textbf{y} \\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{array}{rcl} \\frac{\\partial}{\\partial \\textbf{X}} \\textbf{A}^{-1} &=& -\\textbf{A}^{-1} \\frac{\\partial \\textbf{A}}{\\partial \\textbf{X}} \\textbf{A}^{-1} \n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"现在开始计算模型参数的导数:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \\frac{\\partial \\mathcal{L}}{\\partial \\textbf{w}} &=& \\frac{\\partial}{\\partial \\textbf{w}} \\frac{1}{2n} \\left( \\textbf{y}^{\\text{T}} \\textbf{y} -2\\textbf{y}^{\\text{T}} \\textbf{X} \\textbf{w} + \\textbf{w}^{\\text{T}} \\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w}\\right) \\\\\n",
"&=& \\frac{1}{2n} \\left(-2 \\textbf{X}^{\\text{T}} \\textbf{y} + 2\\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w}\\right)\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \\frac{\\partial \\mathcal{L}}{\\partial \\textbf{w}} = 0 &\\Leftrightarrow& \\frac{1}{2n} \\left(-2 \\textbf{X}^{\\text{T}} \\textbf{y} + 2\\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w}\\right) = 0 \\\\\n",
"&\\Leftrightarrow& -\\textbf{X}^{\\text{T}} \\textbf{y} + \\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w} = 0 \\\\\n",
"&\\Leftrightarrow& \\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w} = \\textbf{X}^{\\text{T}} \\textbf{y} \\\\\n",
"&\\Leftrightarrow& \\textbf{w} = \\left(\\textbf{X}^{\\text{T}} \\textbf{X}\\right)^{-1} \\textbf{X}^{\\text{T}} \\textbf{y}\n",
"\\end{array}$$"
]
},
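{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "下面用 NumPy 在随机生成的数据上验证这一正规方程 $\\textbf{w} = \\left(\\textbf{X}^{\\text{T}} \\textbf{X}\\right)^{-1} \\textbf{X}^{\\text{T}} \\textbf{y}$(一个最小示意,数据与真实权重均为假设):"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "rng = np.random.RandomState(0)\n",
  "n, m = 100, 3\n",
  "# 观测矩阵:左侧加上全为 1 的虚列(对应偏置 w_0)\n",
  "X = np.hstack([np.ones((n, 1)), rng.randn(n, m)])\n",
  "w_true = np.array([2.0, 1.0, -3.0, 0.5])\n",
  "y = X @ w_true + 0.1 * rng.randn(n)\n",
  "\n",
  "# 正规方程:w = (X^T X)^{-1} X^T y(用 solve 代替显式求逆,数值上更稳定)\n",
  "w_ols = np.linalg.solve(X.T @ X, X.T @ y)\n",
  "print(w_ols)"
 ]
},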
{
"cell_type": "markdown",
"metadata": {},
"source": [
"基于上述的定义和条件,可以说,根据高斯-马尔可夫定理,模型参数的 OLS 估计是所有线性无偏估计中最优的,即通过 OLS 估计可以获得最低的方差。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"有人可能会问,为何选择最小化均方误差而不是其他指标?因为若不选择最小化均方误差,那么就不满足高斯-马尔可夫定理的条件,得到的估计将不再是最佳的线性无偏估计。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"最大似然估计是解决线性回归问题一种常用方法,下面介绍它的概念。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 最大似然估计"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"首先举一个简单的例子,我们想做一个试验判定人们是否记得简单的甲醇化学式 $CH_3OH$。首先调查了 400 人,发现只有 117 个人记得甲醇的化学式。那么,直接将 $\\frac{117}{400} \\approx 29\\%$ 作为估计下一个受访者知道甲醇化学式的概率是较为合理的。这个直观的估计就是一个最大似然估计。为什么会这么估计呢?回忆下伯努利分布的定义:如果一个随机变量只有两个值(1 和 0,相应的概率为 $\\theta$ 和 $1 - \\theta$),那么该随机变量满足伯努利分布,遵循以下概率分布函数:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ p\\left(\\theta, x\\right) = \\theta^x \\left(1 - \\theta\\right)^\\left(1 - x\\right), x \\in \\left\\{0, 1\\right\\}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"这一分布正是我们所需要的,分布参数 $\\theta$ 就是「某个人知道甲醇化学式」的概率估计。在 400 个独立试验中,试验的结果记为 $\\textbf{x} = \\left(x_1, x_2, \\ldots, x_{400}\\right)$。写下数据的似然,即观测的可能性,比如正好观测到 117 个随机变量 $x = 1$ 和 283 个随机变量 $x = 0$ 的可能性:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" $$ p(\\textbf{x}; \\theta) = \\prod_{i=1}^{400} \\theta^{x_i} \\left(1 - \\theta\\right)^{\\left(1 - x_i\\right)} = \\theta^{117} \\left(1 - \\theta\\right)^{283}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"接着,将最大化这一 $\\theta$ 的表达式。一般而言,为了简化计算,并不最大化似然 $p(\\textbf{x}; \\theta)$,转而最大化其对数(这种变换不影响最终答案):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\log p(\\textbf{x}; \\theta) = \\log \\prod_{i=1}^{400} \\theta^{x_i} \\left(1 - \\theta\\right)^{\\left(1 - x_i\\right)} = $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ = \\log \\theta^{117} \\left(1 - \\theta\\right)^{283} = 117 \\log \\theta + 283 \\log \\left(1 - \\theta\\right)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了找到最大化上式的 $\\theta$ 值,将上式对 $\\theta$ 求导,并令其为零,求解所得等式:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\frac{\\partial \\log p(\\textbf{x}; \\theta)}{\\partial \\theta} = \\frac{\\partial}{\\partial \\theta} \\left(117 \\log \\theta + 283 \\log \\left(1 - \\theta\\right)\\right) = \\frac{117}{\\theta} - \\frac{283}{1 - \\theta};$$"
]
},
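{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "可以用 NumPy 在网格上数值验证:对数似然恰好在 $\\theta = \\frac{117}{400} = 0.2925$ 附近取得最大值(一个最小示意):"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "theta = np.linspace(0.01, 0.99, 9801)  # theta 的取值网格\n",
  "log_lik = 117 * np.log(theta) + 283 * np.log(1 - theta)  # 对数似然\n",
  "theta_hat = theta[np.argmax(log_lik)]  # 使对数似然最大的网格点\n",
  "print(theta_hat, 117 / 400)"
 ]
},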
{
"cell_type": "markdown",
"metadata": {},
"source": [
"由上可知,我们的直观估计正好是最大似然估计。现在将这一推理过程应用到线性回归问题上,尝试找出均方误差背后的道理。为此,需要从概率论的角度来看线性回归。我们的模型和之前是一样的:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\textbf y = \\textbf X \\textbf w + \\epsilon$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"不过,现在假定随机误差符合均值为零的 [ 正态分布](https://baike.baidu.com/item/正态分布/829892?fr=aladdin):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\epsilon_i \\sim \\mathcal{N}\\left(0, \\sigma^2\\right)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"据此改写模型:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"y_i &=& \\sum_{j=1}^m w_j X_{ij} + \\epsilon_i \\\\\n",
"&\\sim& \\sum_{j=1}^m w_j X_{ij} + \\mathcal{N}\\left(0, \\sigma^2\\right) \\\\\n",
"p\\left(y_i \\mid \\textbf X; \\textbf{w}\\right) &=& \\mathcal{N}\\left(\\sum_{j=1}^m w_j X_{ij}, \\sigma^2\\right)\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"由于样本是独立抽取的(误差不相关是高斯-马尔可夫定理的条件之一),数据的似然看起来会是密度函数 $p\\left(y_i\\right)$ 的积。转化为对数形式:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\log p\\left(\\textbf{y}\\mid \\textbf X; \\textbf{w}\\right) &=& \\log \\prod_{i=1}^n \\mathcal{N}\\left(\\sum_{j=1}^m w_j X_{ij}, \\sigma^2\\right) \\\\\n",
"&=& \\sum_{i=1}^n \\log \\mathcal{N}\\left(\\sum_{j=1}^m w_j X_{ij}, \\sigma^2\\right) \\\\\n",
"&=& -\\frac{n}{2}\\log 2\\pi\\sigma^2 -\\frac{1}{2\\sigma^2} \\sum_{i=1}^n \\left(y_i - \\textbf{w}^\\text{T} \\textbf{x}_i\\right)^2\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"想要找到最大似然假设,即需要最大化表达式 $p\\left(\\textbf{y} \\mid \\textbf X; \\textbf{w}\\right)$ 以得到 $\\textbf{w}_{\\text{ML}}$,这和最大化其对数是一回事。注意,当针对某个参数最大化函数时,可以丢弃所有不依赖这一参数的变量:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\textbf{w}_{\\text{ML}} &=& \\arg \\max_{\\textbf w} p\\left(\\textbf{y}\\mid \\textbf X; \\textbf{w}\\right) = \\arg \\max_{\\textbf w} \\log p\\left(\\textbf{y}\\mid \\textbf X; \\textbf{w}\\right)\\\\\n",
"&=& \\arg \\max_{\\textbf w} -\\frac{n}{2}\\log 2\\pi\\sigma^2 -\\frac{1}{2\\sigma^2} \\sum_{i=1}^n \\left(y_i - \\textbf{w}^{\\text{T}} \\textbf{x}_i\\right)^2 \\\\\n",
"&=& \\arg \\max_{\\textbf w} -\\frac{1}{2\\sigma^2} \\sum_{i=1}^n \\left(y_i - \\textbf{w}^{\\text{T}} \\textbf{x}_i\\right)^2 \\\\\n",
"&=& \\arg \\min_{\\textbf w} \\mathcal{L}\\left(\\textbf X, \\textbf{y}, \\textbf{w} \\right)\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"所以,当测量误差服从正态(高斯)分布的情况下, 最小二乘法等价于极大似然估计。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 偏置-方差分解"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"下面讨论线性回归预测的误差性质(可以推广到机器学习算法上),上文提到:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 目标变量的真值 $y$ 是确定性函数 $f\\left(\\textbf{x}\\right)$ 和随机误差 $\\epsilon$ 之和:$y = f\\left(\\textbf{x}\\right) + \\epsilon$。\n",
"- 误差符合均值为零、方差一致的正态分布:$\\epsilon \\sim \\mathcal{N}\\left(0, \\sigma^2\\right)$。\n",
"- 目标变量的真值亦为正态分布:$y \\sim \\mathcal{N}\\left(f\\left(\\textbf{x}\\right), \\sigma^2\\right)$。\n",
"- 试图使用一个协变量线性函数逼近一个未知的确定性函数 $f\\left(\\textbf{x}\\right)$,这一协变量线性函数是函数空间中估计函数 $f$ 的一点,即均值和方差的随机变量。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"因此,点 $\\textbf{x}$ 的误差可分解为:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\text{Err}\\left(\\textbf{x}\\right) &=& \\mathbb{E}\\left[\\left(y - \\widehat{f}\\left(\\textbf{x}\\right)\\right)^2\\right] \\\\\n",
"&=& \\mathbb{E}\\left[y^2\\right] + \\mathbb{E}\\left[\\left(\\widehat{f}\\left(\\textbf{x}\\right)\\right)^2\\right] - 2\\mathbb{E}\\left[y\\widehat{f}\\left(\\textbf{x}\\right)\\right] \\\\\n",
"&=& \\mathbb{E}\\left[y^2\\right] + \\mathbb{E}\\left[\\widehat{f}^2\\right] - 2\\mathbb{E}\\left[y\\widehat{f}\\right] \\\\\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了简洁,省略函数的参数,分别考虑每个变量。根据公式 $\\text{Var}\\left(z\\right) = \\mathbb{E}\\left[z^2\\right] - \\mathbb{E}\\left[z\\right]^2$ 可以分解前两项为:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\mathbb{E}\\left[y^2\\right] &=& \\text{Var}\\left(y\\right) + \\mathbb{E}\\left[y\\right]^2 = \\sigma^2 + f^2\\\\\n",
"\\mathbb{E}\\left[\\widehat{f}^2\\right] &=& \\text{Var}\\left(\\widehat{f}\\right) + \\mathbb{E}\\left[\\widehat{f}\\right]^2 \\\\\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"注意:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\text{Var}\\left(y\\right) &=& \\mathbb{E}\\left[\\left(y - \\mathbb{E}\\left[y\\right]\\right)^2\\right] \\\\\n",
"&=& \\mathbb{E}\\left[\\left(y - f\\right)^2\\right] \\\\\n",
"&=& \\mathbb{E}\\left[\\left(f + \\epsilon - f\\right)^2\\right] \\\\\n",
"&=& \\mathbb{E}\\left[\\epsilon^2\\right] = \\sigma^2\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\mathbb{E}[y] = \\mathbb{E}[f + \\epsilon] = \\mathbb{E}[f] + \\mathbb{E}[\\epsilon] = f$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"接着处理和的最后一项。由于误差和目标变量相互独立,所以可以将它们分离,写为:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\mathbb{E}\\left[y\\widehat{f}\\right] &=& \\mathbb{E}\\left[\\left(f + \\epsilon\\right)\\widehat{f}\\right] \\\\\n",
"&=& \\mathbb{E}\\left[f\\widehat{f}\\right] + \\mathbb{E}\\left[\\epsilon\\widehat{f}\\right] \\\\\n",
"&=& f\\mathbb{E}\\left[\\widehat{f}\\right] + \\mathbb{E}\\left[\\epsilon\\right] \\mathbb{E}\\left[\\widehat{f}\\right] = f\\mathbb{E}\\left[\\widehat{f}\\right]\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"最后,将上述公式合并为:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\text{Err}\\left(\\textbf{x}\\right) &=& \\mathbb{E}\\left[\\left(y - \\widehat{f}\\left(\\textbf{x}\\right)\\right)^2\\right] \\\\\n",
"&=& \\sigma^2 + f^2 + \\text{Var}\\left(\\widehat{f}\\right) + \\mathbb{E}\\left[\\widehat{f}\\right]^2 - 2f\\mathbb{E}\\left[\\widehat{f}\\right] \\\\\n",
"&=& \\left(f - \\mathbb{E}\\left[\\widehat{f}\\right]\\right)^2 + \\text{Var}\\left(\\widehat{f}\\right) + \\sigma^2 \\\\\n",
"&=& \\text{Bias}\\left(\\widehat{f}\\right)^2 + \\text{Var}\\left(\\widehat{f}\\right) + \\sigma^2\n",
"\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"由此,从上等式可知,任何线性模型的预测误差由三部分组成:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 偏差(bias): $\\text{Bias}\\left(\\widehat{f}\\right)$ 度量了学习算法的期望输出与真实结果的偏离程度, 刻画了算法的拟合能力,偏差偏高表示预测函数与真实结果差异很大。\n",
"- 方差(variance): $\\text{Var}\\left(\\widehat{f}\\right)$ 代表「同样大小的不同的训练数据集训练出的模型」与「这些模型的期望输出值」之间的差异。训练集变化导致性能变化,方差偏高表示模型很不稳定。\n",
"- 不可消除的误差(irremovable error): $\\sigma^2$ 刻画了当前任务任何算法所能达到的期望泛化误差的下界,即刻画了问题本身的难度。"
]
},
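{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "可以用一个小型蒙特卡洛模拟直观感受这一分解:反复抽取训练集,用一次多项式(直线)拟合非线性的真实函数 $f(x) = \\sin x$,统计预测值在某一点上的偏差与方差(示意代码,数据与各项设定均为假设):"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "rng = np.random.RandomState(42)\n",
  "sigma, x0 = 0.3, 2.0  # 噪声标准差与考察点\n",
  "f = np.sin  # 真实(未知)函数\n",
  "preds = []\n",
  "for _ in range(500):  # 反复抽取训练集并拟合直线\n",
  "    x = rng.uniform(0, 2 * np.pi, 30)\n",
  "    y = f(x) + sigma * rng.randn(30)\n",
  "    coef = np.polyfit(x, y, deg=1)\n",
  "    preds.append(np.polyval(coef, x0))\n",
  "preds = np.array(preds)\n",
  "bias2 = (preds.mean() - f(x0)) ** 2  # 偏差的平方\n",
  "var = preds.var()  # 方差\n",
  "print(bias2, var, sigma ** 2)"
 ]
},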
{
"cell_type": "markdown",
"metadata": {},
"source": [
"尽管无法消除 $\\sigma^2$,但我们可以影响前两项。理想情况下,希望同时消除偏差和方差(见下图中左上),但是在实践中,常常需要在偏置和不稳定(高方差)间寻找平衡。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"一般而言,当模型的计算量增加时(例如,自由参数的数量增加了),估计的方差(分散程度)也会增加,但偏置会下降,这可能会导致过拟合现象。另一方面,如果模型的计算量太少(例如,自由参数过低),这可能会导致欠拟合现象。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"高斯-马尔可夫定理表明:在线性模型参数估计问题中,OLS 估计是最佳的线性无偏估计。这意味着,如果存在任何无偏线性模型 g,可以确信 $Var\\left(\\widehat{f}\\right) \\leq Var\\left(g\\right)$。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 线性回归的正则化"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"低偏置和低方差往往是不可兼得的,所以在一些情形下,会为了稳定性(降低模型的方差)而导致模型的偏置 $\\text{Var}\\left(\\widehat{f}\\right)$ 提高。高斯-马尔可夫定理成立的条件之一就是矩阵 $\\textbf{X}$ 是满秩的,否则 OLS 的解 $\\textbf{w} = \\left(\\textbf{X}^\\text{T} \\textbf{X}\\right)^{-1} \\textbf{X}^\\text{T} \\textbf{y}$ 就不存在,因为逆矩阵 $\\left(\\textbf{X}^\\text{T} \\textbf{X}\\right)^{-1}$ 不存在,此时矩阵 $\\textbf{X}^\\text{T} \\textbf{X}$ 被称为奇异矩阵或退化矩阵。这类问题被称为病态问题,必须加以矫正,也就是说,矩阵 $\\textbf{X}^\\text{T} \\textbf{X}$ 需要变成非奇异矩阵(这正是这一过程叫做正则化的原因)。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们常常能在这类数据中观察到所谓的多重共线性:两个或更多特征高度相关,也就是矩阵 $\\textbf{X}$ 的列之间存在类似线性依赖的关系(又不完全是线性依赖)。例如,在「基于特征预测房价」这一问题中,属性「含阳台的面积」和「不含阳台的面积」会有一个接近线性依赖的关系。数学上,包含这类数据的矩阵 $\\textbf{X}^\\text{T} \\textbf{X}$ 被称为可逆矩阵,但由于多重共线性,一些本征值(特征值)会接近零。在 $\\textbf{X}^\\text{T} \\textbf{X}$ 的逆矩阵中,因为其本征值为 $\\frac{1}{\\lambda_i}$,所以有些本征值会变得特别大。本征值这种巨大的数值波动会导致模型参数估计的不稳定,即在训练数据中加入一组新的观测会导致完全不同的解。为了解决上述问题,有一种正则化的方法称为吉洪诺夫(Tikhonov)正则化,大致上是在均方误差中加上一个新变量:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \\mathcal{L}\\left(\\textbf{X}, \\textbf{y}, \\textbf{w} \\right) &=& \\frac{1}{2n} \\left\\| \\textbf{y} - \\textbf{X} \\textbf{w} \\right\\|_2^2 + \\left\\| \\Gamma \\textbf{w}\\right\\|^2\\end{array}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"吉洪诺夫矩阵常常表达为单位矩阵乘上一个系数:$\\Gamma = \\frac{\\lambda}{2} E$。在这一情形下,最小化均方误差问题变为一个 L2 正则化问题。若对新的损失函数求导,设所得函数为零,据 $\\textbf{w}$ 重整等式,便得到了这一问题的解:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\begin{array}{rcl} \n",
"\\textbf{w} &=& \\left(\\textbf{X}^{\\text{T}} \\textbf{X} + \\lambda \\textbf{E}\\right)^{-1} \\textbf{X}^{\\text{T}} \\textbf{y}\n",
"\\end{array}$$"
]
},
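{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "下面用 NumPy 构造一个多重共线性的例子:两列特征几乎线性相关时 $\\textbf{X}^\\text{T} \\textbf{X}$ 的条件数巨大,直接求解很不稳定,而加上 $\\lambda \\textbf{E}$ 后便可以稳定求解(数据为随机生成,仅作示意):"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "rng = np.random.RandomState(1)\n",
  "n = 200\n",
  "x1 = rng.randn(n)\n",
  "x2 = x1 + 1e-6 * rng.randn(n)  # 与 x1 几乎线性相关(多重共线性)\n",
  "X = np.column_stack([np.ones(n), x1, x2])\n",
  "y = 3 + x1 + 2 * x2 + 0.1 * rng.randn(n)\n",
  "\n",
  "print(np.linalg.cond(X.T @ X))  # 条件数巨大,说明矩阵病态\n",
  "\n",
  "lam = 1.0\n",
  "w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)  # 岭回归解\n",
  "print(w_ridge)"
 ]
},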
{
"cell_type": "markdown",
"metadata": {},
"source": [
"这类回归被称为岭回归(ridge regression)。岭为对角矩阵,在 $\\textbf{X}^\\text{T} \\textbf{X}$ 矩阵上加上这一对角矩阵,以确保能得到一个正则矩阵。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"这样的解降低了方差,但增加了偏置,因为参数的正则向量也被最小化了,这导致解朝零移动。在下图中,OLS 解为白色虚线的交点,蓝点表示岭回归的不同解。可以看到,通过增加正则化参数 $\\lambda$,使解朝零移动。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 线性分类"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"线性分类器背后的基本思路是,目标分类的值可以被特征空间中的一个超平面分开。如果这可以无误差地达成,那么训练集被称为线性可分。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"上面已经介绍了线性回归和普通最小二乘法(OLS)。现在考虑一个二元分类问题,将目标分类记为「+1」(正面样本)和「-1」(负面样本)。最简单的线性分类器可以通过回归定义:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ a(\\textbf{x}) = \\text{sign}(\\textbf{w}^\\text{T}\\textbf x)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"其中:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" - $\\textbf{x}$ 是特征向量(包括标识)。\n",
" - $\\textbf{w}$ 是线性模型中的权重向量(偏置为 $w_0$)。\n",
" - $\\text{sign}(\\bullet)$ 是符号函数,返回参数的符号。\n",
" - $a(\\textbf{x})$ 是分类 $\\textbf{x}$ 的分类器。"
]
},
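{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "用 NumPy 演示这一分类器(权重与样本均为假设的数值):"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "w = np.array([-1.0, 2.0, 3.0])  # 假设的权重向量(首位 w0 为偏置)\n",
  "x = np.array([1.0, 0.5, 0.2])  # 样本特征,首位为虚特征 x0 = 1\n",
  "\n",
  "a = np.sign(w @ x)  # w^T x = -1 + 1 + 0.6 = 0.6 > 0\n",
  "print(a)  # 1.0,预测为正类"
 ]
},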
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 基于逻辑回归的线性分类器"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"逻辑回归是线性分类器的一个特殊情形,但逻辑回归有一个额外的优点:它可以预测样本 $\\textbf{x}_\\text{i}$ 为分类「+」的概率 $p_+$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ p_+ = P\\left(y_i = 1 \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right) $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"逻辑回归不仅能够预测样本是「+1」还是「-1」,还能预测其分别是「+1」和「-1」的概率是多少。对于很多业务问题(比如,信用评分问题)而言,这是一个非常重要的优点。下面是一个预测贷款违约概率的例子。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"银行选择一个阈值 $p_*$ 以预测贷款违约的概率(上图中阈值为0.15),超过阈值就不批准贷款。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了预测概率 $p_+ \\in [0,1]$,使用 OLS 构造线性预测:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$b(\\textbf{x}) = \\textbf{w}^\\text{T} \\textbf{x} \\in \\mathbb{R}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了将所得结果转换为 [0,1] 区间内的概率,逻辑回归使用下列函数进行转换: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\sigma(z) = \\frac{1}{1 + \\exp^{-z}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"使用 Matplotlib 库画出上面这个函数。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"import numpy as np\n",
"import seaborn as sns\n",
"from matplotlib import pyplot as plt\n",
"\n",
"%matplotlib inline\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Sigmoid function')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xl8VPW9//HXJwlJWMImYd8FUUQUjFat1h3BWrGLVqu31nq17a3db+/PLj+vtff2drntr9dW27rValvRWq1YacX1UqvIIohsSliTCGGLBAhZ5/P745ykY5yQCTA5s7yfj0eYs3zPnM+cGeYz3+/3nPM1d0dERAQgL+oAREQkfSgpiIhIGyUFERFpo6QgIiJtlBRERKSNkoKIiLRRUpC0Z2ZXm9n8dNuvmb1oZv/cwTozs1+bWY2ZLUpdlAn3/Rczu7Y79ynZw3SdgqQDMzsT+CFwPNACrAG+7O6LIw3sIMzsReC37n5PgnVnAQ8Bk9x9fwpjuBWY4O7XpGofklsKog5AxMz6An8GPgc8AhQCZwENUcZ1mMYAm1KZEERSQc1Hkg6OAXD3h9y9xd0PuPt8d18BYGafMrOXWgub2Qwze9PM9pjZnWb2v63NOGHZv5vZ/zOzd8xsg5mdES6vMLPt8U0rZtbPzB4wsx1mttnMvm1meR3s90IzWxvu9+eAJXoxZnY9cA9wupntM7PvtH+usJyb2YRw+n4zu8PMnjKzvWb2qpkdHVf2eDN7xsx2m1m1mX3TzGYC3wQ+Hu7n9bBsW7OWmeWFr2lz+NofMLN+4bqxYQzXmtkWM9tpZt865HdRsoKSgqSDt4AWM/uNmc0yswEdFTSzQcCjwDeAo4A3gTPaFXsfsCJc/3tgDnAKMAG4Bvi5mfUJy/4M6AeMB84GPglc18F+HwO+DQwC1gPvTxSju98LfBZ4xd37uPu/d3YAQlcC3wEGAOXAf4b7LgGeBf4KDA9fx3Pu/lfge8DD4X5OTPCcnwr/zg1fYx/g5+3KnAlMAs4HbjGz45KMV7KQkoJEzt1rCb6YHLgb2GFmc81sSILiFwOr3P0xd28Gbge2tSuz0d1/7e4twMPAKOA2d29w9/lAIzDBzPIJvoi/4e573X0T8GPgnw6y30fdvQn4aYL9Hq7H3X1R+Lp+B5wULr8E2ObuP3b3+jDWV5N8zquBn7j7BnffR5BMrzSz+Kbj74S1s9eB14FEyUVyhJKCpAV3X+Pun3L3kcAUgl/EP01QdDhQEbedA5XtylTHTR8Iy7Vf1ofgF38PYHPcus3AiCT3W5Gg3OGITzJ1YYwQJLX1h/icw3nv6ysA4hNuR/uVHKSkIGnH3dcC9xMkh/a2AiNbZ8zM4ue7aCfQRNAp3Go0UNXBfke12++oBOU6sh/oFbf90C5sW0HQ9JNIZ6cPvs17X18z706cIm2UFCRyZnasmX3NzEaG86OAq4CFCYo/BZxgZpeFTSCfB7ryBdsmbF56BPhPMysxszHAV4HfdrDf483sI+F+v9jF/b4ebn+SmRUDt3Zh2z8Dw8zsy2ZWFMb6vnBdNTC2tXM8gYeAr5jZuLAfpbUPorkL+5ccoqQg6WAvQefwq2a2nyAZrAS+1r6gu+8ELie4pmEXMBlYwqGfvvoFgl/xG4CXCDqm7zvIfr8f7nci8Pdkd+LubwG3EXQYrwv3ley2e4ELgQ8RNPWsI+g4BvhD+LjLzF5LsPl9wIPAAmAjUE/wmkUS0sVrktHCX8iVwNXu/kLU8YhkOtUUJOOY2UVm1t/MigjO0zcSNzWJSBcpKUgmOp3gbJydBE0ql7n7gWhDEskOaj4SEZE2qimIiEibjLsh3qBBg3zs2LFRhyEiklGWLl26091LOyuXcUlh7NixLFmyJOowREQyiplt7ryUmo9ERCSOkoKIiLRRUhARkTZKCiIi0kZJQURE2qQsKZjZfeHwfys7WG9mdruZlZvZCjObnq
pYREQkOamsKdwPzDzI+lkEd5qcCNwI/CKFsYiISBJSdp2Cuy8ws7EHKTIbeCAcwWpheIOzYe6+NVUxiUh2cHcaW2LUN8VoaGqhoTlGQ3OMlpjTHGt99OCxxRMub2oJ5ltiHoxU5OA4MQcPp92DfTnBspiHy8IY4svF4qYBYq3P2xZzu9cQtzZ+3XtuPBS38vzjhnDiqP6HffwOJsqL10bw7uEMK8Nl70kKZnYjQW2C0aNHd0twIpIasZjzzoEmdu5rYOfeBnbub2T3vgb2NTSzt76Z2vpm9tY3tc3vq2+mvrmF+qaWIAk0B0kgl27bZhY8Du5bnNVJIWnufhdwF0BZWVkOfRREMk8s5lTWHGDjrv1U7K6jsuYAFTV1VO6uY+ueenbvb6Q5lvi/cVFBHiXFPehbXECf4gJKigsY1KcXPXvkU9wjn6KCvLbHonbzhQV59MjPIz/PKMiz8DGcz7fEy/OMPDPMgi/etmn+sax1Os8MI25ZHuG8kRdXDuKfJ1jfytq93rhV7yoXpSiTQhXvHuN2JInHxhWRNNXYHGPV23tYXvEOa7fuZW31XtZV76WusaWtTI98Y0T/nowa2ItJQ0sY1Kco+CspYlCfQkr7FDGwdyElxT0oLNAJkVGLMinMBW4yszkEQzHuUX+CSHprbI6xZNNuFqzbydLNu1lRuYeG5hgAA3sXMmlICVeUjeLYoSWML+3DqIE9GVxSTH5eevwKls6lLCmY2UPAOcAgM6sE/h3oAeDuvwTmARcD5UAdcF2qYhGRQ7e3vomnV1Xz7OpqXirfyb6GZnrkG8cP78c/nTaGk8cMYPqYAQwuKUqbJhA5dKk8++iqTtY78PlU7V9EDl1LzPnft7bz2GtVPLO6mobmGMP6FfOhE4dz7qRS3j9hEL2LMqJLUrpI76qItNlb38TDiyu4/+VNVNYcYECvHlxRNooPTx/BtFH9VRPIAUoKIkJtfRN3L9jAr/++iX0NzZwydgDfvPg4LjhuiDp/c4ySgkgOa2hu4f6/b+LOF9ez50ATHzxhGJ85ezxTR6b2XHhJX0oKIjnqlfW7+Paf3mD9jv2cM6mUf50xiSkj+kUdlkRMSUEkx+ytb+K2J1fzh6WVjBrYk19fdwrnThocdViSJpQURHLI8op3+OJDy6isqeNz5xzNF8+bSM/C/KjDkjSipCCSIx58ZRPfeXI1Q/oW8/BnTueUsQOjDknSkJKCSJZraonxnSdX8duFW7jguMH8+IqT6NezR9RhSZpSUhDJYnWNzXzmwaX8bd1OPnfO0Xx9xiTydMsJOQglBZEstedAE5++fzHLttTwo49N5fKyUZ1vJDlPSUEkC71T18jV97zKW9V7ueMT05l1wrCoQ5IMoaQgkmXqGpu57v7FrKvex12fLNPpptIlun5dJIs0Nsf47G9f4/WKd7j9qmlKCNJlqimIZAl35+Y/rmDBWzv44UenMnPK0KhDkgykmoJIlrj3pY08tqyKr1xwDFecok5lOTRKCiJZYMFbO/jevDXMmjKUL5w3IepwJIMpKYhkuKp3DvCFh5ZxzJAS/vvyE3UdghwWJQWRDNYSc74yZznNLTF+ec3JGg1NDps+QSIZ7M4Xylm0aTc/ueJExg7qHXU4kgVUUxDJUMu21PDT59Yx+6ThfHjaiKjDkSyhpCCSgRqaW/j6oysY2reY7142RWMnyxGj5iORDPSLF9dTvn0fv77uFPoW646ncuSopiCSYcq37+XOF9Zz6YnDdcWyHHFKCiIZxN355mMr6VWUzy0fmhx1OJKFlBREMsiTK7ayaNNubp55LIP6FEUdjmQhJQWRDFHf1ML3563h+OF9NTaCpIySgkiGuHvBBt7eU8//vWQy+bpqWVJESUEkA1TX1nPni+uZNWUop40/KupwJIspKYhkgNufW0dzLMY3Zh0XdSiS5ZQURNJcxe46Hl5cwcdPGcXoo3pFHY5kOSUFkTR3+3PryMszbjp3YtShSA5IaVIws5lm9q
aZlZvZzQnWjzazF8xsmZmtMLOLUxmPSKbZuHM/jy2r4pr3jWFov+Kow5EckLKkYGb5wB3ALGAycJWZtb/a5tvAI+4+DbgSuDNV8YhkotufW0dhfh6fO+foqEORHJHKmsKpQLm7b3D3RmAOMLtdGQf6htP9gLdTGI9IRtmyq44nlldxzWmjKS3RhWrSPVKZFEYAFXHzleGyeLcC15hZJTAP+EKiJzKzG81siZkt2bFjRypiFUk7d/9tAwV5efzzWeOjDkVySNQdzVcB97v7SOBi4EEze09M7n6Xu5e5e1lpaWm3BynS3Xbua+CRJRV8eNoIhvRVX4J0n1QmhSog/lr8keGyeNcDjwC4+ytAMTAohTGJZITfvLyJxpYYN56tWoJ0r1QmhcXARDMbZ2aFBB3Jc9uV2QKcD2BmxxEkBbUPSU7b39DMA69sZsbkIRxd2ifqcCTHpCwpuHszcBPwNLCG4CyjVWZ2m5ldGhb7GnCDmb0OPAR8yt09VTGJZII/vlbJngNNfOZsnXEk3S+lI6+5+zyCDuT4ZbfETa8G3p/KGEQyibvzm5c3ceLIfkwfPSDqcCQHRd3RLCJxXirfyfod+7n2jLFRhyI5SklBJI385uXNHNW7kA9OHRZ1KJKjlBRE0kTF7jqeW1vNVaeOpqggP+pwJEcpKYikiQcXbibPjKtPGx11KJLDlBRE0kB9UwsPL67gouOHMKxfz6jDkRympCCSBp5etY09B5q4+n1jog5FcpySgkgaeHhxBaMG9uR0DbUpEVNSEInY5l37eXn9Lq44eRR5eRZ1OJLjlBREIvaHJZXkGXysbGTUoYgoKYhEqbklxh+WVnD2MaXqYJa0oKQgEqEF63ZQXdvAx08Z1XlhkW6gpCASoUcWV3JU70LOO3ZI1KGIAEoKIpF5p66R59du59KThlNYoP+Kkh70SRSJyLw3ttHYEuMj09TBLOlDSUEkIn9aXsXRpb2ZMqJv1KGItFFSEIlAZU0dizbu5sPTRmCmaxMkfSgpiETgieVvAzD7pBERRyLybkoKIt3M3Xl8WRVlYwYwamCvqMMReRclBZFuturtWsq37+OyaaolSPpRUhDpZk8sr6JHvvHBEzS6mqQfJQWRbhSLOU++vpWzjyllQO/CqMMReQ8lBZFu9NqWGrbV1nPJ1OFRhyKSkJKCSDd66o2tFBbkcf5xg6MORSQhJQWRbhKLOfPeCJqOSop7RB2OSEJKCiLd5LUtNVTXNnDJVHUwS/pSUhDpJv9oOtIdUSV9KSmIdIPWpqNzjimlT1FB1OGIdEhJQaQbtDYdfVBNR5LmlBREusGfV6jpSDKDkoJIisVizl9WqulIMkNSScHMBpvZh83s82b2aTM71cw63dbMZprZm2ZWbmY3d1DmCjNbbWarzOz3XX0BIuluqZqOJIMc9GeLmZ0L3AwMBJYB24Fi4DLgaDN7FPixu9cm2DYfuAO4EKgEFpvZXHdfHVdmIvAN4P3uXmNmuqJHss5fV25T05FkjM7qshcDN7j7lvYrzKwAuITgS/+PCbY9FSh39w1h+TnAbGB1XJkbgDvcvQbA3bd3+RWIpDF3Z/7qbZw5YZCajiQjHLQJyN2/nighhOua3f1P7p4oIQCMACri5ivDZfGOAY4xs7+b2UIzm5noiczsRjNbYmZLduzYcbCQRdLK2m17qdh9gBmTVUuQzJBsn0KLmX3f4sYNNLPXjsD+C4CJwDnAVcDdZta/fSF3v8vdy9y9rLS09AjsVqR7zF9VjRlqOpKMkezZR6vCsvPNbGC4rLOBZauAUXHzI8Nl8SqBue7e5O4bgbcIkoRIVpi/ehsnjx5AaUlR1KGIJCXZpNDs7v8G3AP8zcxOBryTbRYDE81snJkVAlcCc9uV+RNBLQEzG0TQnLQhyZhE0lplTR2r3q5lxvGqJUjmSLbnywDc/WEzWwX8Hhh9sA3cvdnMbgKeBvKB+9x9lZndBixx97nhuhlmthpoAb7u7rsO8bWIpJ
VnVlcDcOHkoRFHIpK8ZJPCP7dOuPtKMzuL4Eyig3L3ecC8dstuiZt24Kvhn0hWmb+qmmOG9GHcoN5RhyKStIM2H5nZmQDuvjR+ubvvcfcHzKyvmU1JZYAimahmfyOLNu1mhmoJkmE6qyl81Mx+CPwVWArsILh4bQJwLjAG+FpKIxTJQM+v3U5LzNWfIBnnoEnB3b8Snm30UeByYBhwAFgD/MrdX0p9iCKZ5+lV2xjat5gTRvSLOhSRLum0T8HddwN3h38i0okDjS0sWLeDK8pGEXdpj0hG6OzeRwftAHb3nxzZcEQy39/W7aC+Kab+BMlIndUUSsLHScAp/OM6gw8Bi1IVlEgmm7+6mpLiAt43fmDnhUXSTGd9Ct8BMLMFwHR33xvO3wo8lfLoRDJMc0uM59ZUc/6xg+mRr+FKJPMk+6kdAjTGzTeGy0QkzpLNNdTUNTHjeDUdSWZK9uK1B4BFZvZ4OH8ZcH9KIhLJYPNXVVNYkMcHjtGNGyUzJZUU3P0/zewvwFnhouvcfVnqwhLJPBo7QbJBZ2cf9XX32vBahU3hX+u6geHpqiICrNm6l8qaA9x07oSoQxE5ZJ39nPk9wehqSwnuihp/0rUD41MUl0jGmb96m8ZOkIzX2dlHl4SP47onHJHMNX9VtcZOkIyXdMOnmV0KfCCcfdHd/5yakEQyT8XuOlZvreWbFx8bdSgihyXZ4Ti/D3wJWB3+fcnMvpfKwEQyicZOkGyRbE3hYuAkd48BmNlvgGXAN1MVmEgmmb96m8ZOkKzQlUsu+8dN69aPIqGa/Y0s2ribCyerg1kyX7I1hf8ClpnZCwRnIH0AuDllUYlkkOfWbifmcJGuYpYskOzFaw+Z2YsEN8UD+D/uvi1lUYlkkPkaO0GySFeaj1qv2y8AzjCzj6QgHpGM0jp2wozjh2jsBMkKSdUUzOw+YCqwCoiFix14LEVxiWQEjZ0g2SbZPoXT3H1ySiMRyUAaO0GyTbLNR6+YmZKCSByNnSDZqCu3zn7FzLYBDQRnILm7T01ZZCJpTmMnSDZKNincC/wT8Ab/6FMQyWkaO0GyUbJJYYe7z+28mEhu0NgJkq2S/TQvM7PfA08SNB8B4O46+0hyksZOkGyVbFLoSZAMZsQt0ympkrM0doJkq2SvaL4u1YGIZBKNnSDZKtmL125PsHgPsMTdnziyIYmkN42dINks2ZOri4GTgHXh31RgJHC9mf00RbGJpKVn12jsBMleySaFqcC57v4zd/8ZcAFwLPBh3t3P8C5mNtPM3jSzcjPr8K6qZvZRM3MzK+tK8CJRmL+qWmMnSNZKNikMAPrEzfcGBrp7C3FnI8Uzs3zgDmAWMBm4KtFV0WZWQjCq26tdiFskEjX7G1m0SWMnSPZKNin8EFhuZr82s/sJRl37kZn1Bp7tYJtTgXJ33+DujcAcYHaCct8FfgDUdylykQg8s6aalpjrBniStZJKCu5+L3AG8CfgceBMd7/H3fe7+9c72GwEUBE3Xxkua2Nm04FR7v7UwfZvZjea2RIzW7Jjx45kQhZJib+8sZUR/XsydaTGTpDsdNCkYGbHho/TgWEEX/IVwNBw2SEzszzgJ8DXOivr7ne5e5m7l5WW6pYCEo09B5p4qXwnF58wVGMnSNbq7JTUrwI3Aj+OW+Zx0+cdZNsqYFTc/MhwWasSYArwYvgfbCgw18wudfclncQl0u2eW1NNU4sz64RhUYcikjIHrSm4+43h5C+A2e5+LvACwTUK/9rJcy8GJprZODMrBK4E2u6f5O573H2Qu49197HAQkAJQdLWvDe2MrxfMdNG9Y86FJGUSbaj+dvuXmtmZxLUDu4hSBQdcvdm4CbgaWAN8Ii7rzKz28zs0sMJWqS77a1vYsFbO5k5ZZiajiSrJXvvo5bw8YPA3e7+lJn9R2cbufs8YF67Zbd0UPacJGMR6XbPr91OY0uMi0/QWUeS3ZKtKVSZ2a+AjwPzzKyoC9uKZLx5b2xlcEkR00
cPiDoUkZRK9ov9CoJmoIvc/R1gINDRqagiWWV/QzMvvrmDWVOGkpenpiPJbsneJbWOuNtku/tWYGuqghJJJy+8uZ2G5pjOOpKcoCYgkU7Me2Mrg/oUccrYgVGHIpJySgoiB7G/oZnn125n5pQh5KvpSHKAkoLIQTyzupr6phizTxrReWGRLKCkIHIQTyyvYkT/npyss44kRygpiHRg174GFqzbyYdOHK6zjiRnKCmIdGDeym20xJzZJw2POhSRbqOkINKBucurOGZIH44dWhJ1KCLdRklBJIHKmjoWb6ph9kkjdK8jySlKCiIJPPl6cG3mpSeq6Uhyi5KCSAJPLK9i+uj+jBrYK+pQRLqVkoJIOyur9rB2214um6ZrEyT3KCmItPPo0koK8/PUdCQ5SUlBJE5jc4wnlldx4fFD6N+rMOpwRLqdkoJInOfWVFNT18TlJ4+MOhSRSCgpiMT5w9JKhvQt4qyJpVGHIhIJJQWR0Pbaev73rR18ZPpI3RFVcpaSgkjo8WVVtMRcTUeS05QURIBYzJmzuIKyMQMYX9on6nBEIqOkIAK8vH4XG3fu5+rTRkcdikiklBREgN8u3MzA3oXMmqJxmCW3KSlIztu2p55n1lRzedlIinvkRx2OSKSUFCTnzVm8hZg7V586JupQRCKnpCA5raklxkOLtvCBiaWMPko3vxNRUpCc9uzqaqprG7jmNNUSREBJQXLcvS9tZNTAnpx37OCoQxFJC0oKkrNe21LDks01fPr943QFs0hISUFy1j1/20Df4gKuKBsVdSgiaSOlScHMZprZm2ZWbmY3J1j/VTNbbWYrzOw5M1PDrnSLLbvq+OvKbXzifWPoXVQQdTgiaSNlScHM8oE7gFnAZOAqM5vcrtgyoMzdpwKPAj9MVTwi8e77+0by84xPnTE26lBE0koqawqnAuXuvsHdG4E5wOz4Au7+grvXhbMLAd2JTFJu574G5izewodOHM7QfsVRhyOSVlKZFEYAFXHzleGyjlwP/CXRCjO70cyWmNmSHTt2HMEQJRfdvWADjc0xPn/uhKhDEUk7adHRbGbXAGXAjxKtd/e73L3M3ctKSzX4iRy6XfsaeOCVzXzoxOEcrbuhirxHKnvYqoD40zpGhsvexcwuAL4FnO3uDSmMR4R7XtpIfXMLXzhPtQSRRFJZU1gMTDSzcWZWCFwJzI0vYGbTgF8Bl7r79hTGIkLN/kYeeHkTl0wdzoTBJVGHI5KWUpYU3L0ZuAl4GlgDPOLuq8zsNjO7NCz2I6AP8AczW25mczt4OpHDdscL5RxoUi1B5GBSeoK2u88D5rVbdkvc9AWp3L9Iq4rddTzwymY+dvJIjhmiWoJIR9Kio1kk1f57/pvk5cFXLjwm6lBE0pqSgmS9Nyr38MTyt7n+zHEM69cz6nBE0pqSgmQ1d+e7T61mYO9CPnv20VGHI5L2lBQkqz32WhWLNu7m6xdNoqS4R9ThiKQ9JQXJWnvqmvjevDVMG92fj+tOqCJJ0e0hJWv9aP5aauoaeeD6U8nTeAkiSVFNQbLSkk27+d2rW/jUGeM4fni/qMMRyRhKCpJ19jc089VHXmfkgJ58dYZOQRXpCjUfSdb53rw1VNTU8fCNp9NHA+iIdIlqCpJVXnhzO797dQs3nDWeU8cNjDockYyjpCBZ4+13DvC1R15n0pASvqorl0UOiZKCZIXG5hj/8rvXaGyOcec10ynukR91SCIZSQ2ukhX+46nVLK94h19cPV2D54gcBtUUJOP95uVNPPDKZm44axyzThgWdTgiGU1JQTLa06u2ceuTq7hw8hBunnVc1OGIZDwlBclYSzfv5osPLePEkf25/cpp5OuqZZHDpqQgGWnp5t188t5FDO/fk3uvLaNnoTqWRY4EJQXJOK0JYXDfYh664TSO6lMUdUgiWUNJQTLKc2uqueaeICHMufE0hvYrjjokkayipCAZ48GFm7nhgSVMGNyHhz9zGkP6KiGIHGm6TkHSXn
1TC//x1Gp+u3AL5x87mJ99Yhq9CvXRFUkF/c+StLZlVx3/8vulrKyq5TMfGM/XL5pEQb4quCKpoqQgaSkWcx5cuJkf/HUtBXnG3Z8s48LJQ6IOSyTrKSlI2llXvZdvPv4GizfV8IFjSvmvj5zAiP49ow5LJCcoKUja2L63np8+u445i7ZQUtyD/778RD46fQRmuihNpLsoKUjktu2p596XNvC7V7fQ2Bzjk6eP5YvnT2Rg78KoQxPJOUoKEgl3Z2VVLQ8u3MTjy6poiTkfnDqcr1wwkfG6y6lIZJQUpFvt2NvAUyve5uEllazZWktRQR5XnjKaG84az+ijekUdnkjOU1KQlHJ3Nuzcz/NrtvP0qm0s3VKDO0wZ0Zfvzj6eS08cQb9ePaIOU0RCSgpyRDW3xNiwcz9LN9fwyvpdLNywi+17GwA4blhfvnT+RGZOGcqxQ/tGHKmIJKKkIIfE3amubWDjzv1s3LmfNVtrWfn2HtZsraW+KQZAaUkRp48/itOPPoozJwxi1EA1D4mku5QmBTObCfwPkA/c4+7fb7e+CHgAOBnYBXzc3TelMibpXFNLjD0Hmti5r4Hq2gaqa+vZXltPdW0D22rrqdhdx+ZddRxoamnbpqSogMnD+/KJU8cwZURfpo7sz9GlvXU6qUiGSVlSMLN84A7gQqASWGxmc919dVyx64Ead59gZlcCPwA+nqqYMo270xxzWmLBY3NLrOP5Fqc5Fsw3Ncc40NRCfVOM+qYW6pta2uYPNLXQEM7vb2hhz4Emag80BY/1wWNdY0vCePr36sGQkmJGDOjJ+ycMYuyg3ow7qjdjB/VieL+e5GmQG5GMl8qawqlAubtvADCzOcBsID4pzAZuDacfBX5uZubufqSDeWRxBb9asB4AD/9xgi/e1p25g+PBY1wErWVal7WVaVvmcdsneM7W+bbt3/2c3m57HFo8+LJPhaKCPHoW5tOrRz59e/agX88ejDmqV9t069+gPkUM6VvEkL7FlJYUUdxDA9mIZLtUJoURQEXcfCXwvo7KuHuzme0BjgJ2xhcysxuBGwFGjx59SMEM6F0YdG6GP2YteN7wsW1x2zIMwqm29dZ+WVjw3dsHZdo/J4m2b3seayvbut+CPCP4w2thAAAHX0lEQVQ/L3gsyM/7x3y+UZD33vnWsvn5RmF+HsU98inukUfPHvkU98hveywqyNMvehHpUEZ0NLv7XcBdAGVlZYf08/nCyUN0QzURkU6k8h7EVcCouPmR4bKEZcysAOhH0OEsIiIRSGVSWAxMNLNxZlYIXAnMbVdmLnBtOP0x4PlU9CeIiEhyUtZ8FPYR3AQ8TXBK6n3uvsrMbgOWuPtc4F7gQTMrB3YTJA4REYlISvsU3H0eMK/dslvipuuBy1MZg4iIJE/jGoqISBslBRERaaOkICIibZQURESkjWXaGaBmtgPYfIibD6Ld1dJpQnF1jeLqunSNTXF1zeHENcbdSzsrlHFJ4XCY2RJ3L4s6jvYUV9corq5L19gUV9d0R1xqPhIRkTZKCiIi0ibXksJdUQfQAcXVNYqr69I1NsXVNSmPK6f6FERE5OByraYgIiIHoaQgIiJtsi4pmNnlZrbKzGJmVtZu3TfMrNzM3jSzizrYfpyZvRqWezi87feRjvFhM1se/m0ys+UdlNtkZm+E5ZYc6TgS7O9WM6uKi+3iDsrNDI9huZnd3A1x/cjM1prZCjN73Mz6d1CuW45XZ6/fzIrC97g8/CyNTVUscfscZWYvmNnq8PP/pQRlzjGzPXHv7y2JnisFsR30fbHA7eHxWmFm07shpklxx2G5mdWa2Zfblem242Vm95nZdjNbGbdsoJk9Y2brwscBHWx7bVhmnZldm6hMl7h7Vv0BxwGTgBeBsrjlk4HXgSJgHLAeyE+w/SPAleH0L4HPpTjeHwO3dLBuEzCoG4/drcC/dlImPzx244HC8JhOTnFcM4CCcPoHwA+iOl7JvH7gX4BfhtNXAg93w3
s3DJgeTpcAbyWI6xzgz931eUr2fQEuBv5CMELtacCr3RxfPrCN4OKuSI4X8AFgOrAybtkPgZvD6ZsTfe6BgcCG8HFAOD3gcGLJupqCu69x9zcTrJoNzHH3BnffCJQDp8YXsGAw5fOAR8NFvwEuS1Ws4f6uAB5K1T5S4FSg3N03uHsjMIfg2KaMu8939+ZwdiHBKH5RSeb1zyb47EDwWTrfWgfqThF33+rur4XTe4E1BGOgZ4LZwAMeWAj0N7Nh3bj/84H17n6od0o4bO6+gGBMmXjxn6OOvosuAp5x993uXgM8A8w8nFiyLikcxAigIm6+kvf+pzkKeCfuCyhRmSPpLKDa3dd1sN6B+Wa21MxuTGEc8W4Kq/D3dVBdTeY4ptKnCX5VJtIdxyuZ199WJvws7SH4bHWLsLlqGvBqgtWnm9nrZvYXMzu+m0Lq7H2J+jN1JR3/MIvieLUa4u5bw+ltQKJB5o/4sUvpIDupYmbPAkMTrPqWuz/R3fEkkmSMV3HwWsKZ7l5lZoOBZ8xsbfiLIiVxAb8Avkvwn/i7BE1bnz6c/R2JuFqPl5l9C2gGftfB0xzx45VpzKwP8Efgy+5e2271awRNJPvC/qI/ARO7Iay0fV/CPsNLgW8kWB3V8XoPd3cz65brBzIyKbj7BYewWRUwKm5+ZLgs3i6CqmtB+AsvUZkjEqOZFQAfAU4+yHNUhY/bzexxgqaLw/rPlOyxM7O7gT8nWJXMcTzicZnZp4BLgPM9bExN8BxH/HglkMzrby1TGb7P/Qg+WyllZj0IEsLv3P2x9uvjk4S7zzOzO81skLun9MZvSbwvKflMJWkW8Jq7V7dfEdXxilNtZsPcfWvYnLY9QZkqgr6PViMJ+lMPWS41H80FrgzPDBlHkPEXxRcIv2xeAD4WLroWSFXN4wJgrbtXJlppZr3NrKR1mqCzdWWiskdKu3bcD3ewv8XARAvO0iokqHrPTXFcM4F/Ay5197oOynTX8Urm9c8l+OxA8Fl6vqNEdqSEfRb3Amvc/ScdlBna2rdhZqcS/P9PabJK8n2ZC3wyPAvpNGBPXLNJqnVYW4/ieLUT/znq6LvoaWCGmQ0Im3tnhMsOXXf0rHfnH8GXWSXQAFQDT8et+xbBmSNvArPils8DhofT4wmSRTnwB6AoRXHeD3y23bLhwLy4OF4P/1YRNKOk+tg9CLwBrAg/kMPaxxXOX0xwdsv6boqrnKDddHn498v2cXXn8Ur0+oHbCJIWQHH42SkPP0vju+EYnUnQ7Lci7jhdDHy29XMG3BQem9cJOuzP6Ia4Er4v7eIy4I7weL5B3FmDKY6tN8GXfL+4ZZEcL4LEtBVoCr+/rifoh3oOWAc8CwwMy5YB98Rt++nws1YOXHe4seg2FyIi0iaXmo9ERKQTSgoiItJGSUFERNooKYiISBslBRERaaOkICIibZQURESkjZKCyGEys8/G3XN/o5m9EHVMIodKF6+JHCHhvYeeB37o7k9GHY/IoVBNQeTI+R+C+xwpIUjGysi7pIqkm/AurmMI7pcjkrHUfCRymMzsZIKRsc7yYPQrkYyl5iORw3cTwRi5L4SdzfdEHZDIoVJNQURE2qimICIibZQURESkjZKCiIi0UVIQEZE2SgoiItJGSUFERNooKYiISJv/D+qYWerdJydsAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def sigma(z):\n",
" return 1.0 / (1 + np.exp(-z))\n",
"\n",
"\n",
"xx = np.linspace(-10, 10, 1000)\n",
"plt.plot(xx, [sigma(x) for x in xx])\n",
"plt.xlabel(\"z\")\n",
"plt.ylabel(\"sigmoid(z)\")\n",
"plt.title(\"Sigmoid function\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Denote the probability of an event $X$ by $P(X)$. Then the odds ratio $OR(X)$ is given by $\\frac{P(X)}{1-P(X)}$, the ratio of the probability that the event occurs to the probability that it does not. Probabilities and odds clearly carry the same information, but $P(X)$ ranges from 0 to 1 while $OR(X)$ ranges from 0 to $\\infty$. If we take the logarithm, then clearly $\\log{OR(X)} \\in \\mathbb{R}$, and this is the quantity used with OLS."
]
},
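As a quick numeric sanity check (a minimal sketch; the probability value 0.8 is just an assumed example), the relationship between a probability, the odds, and the log-odds can be verified directly:

```python
import numpy as np

p = 0.8                  # assumed probability P(X)
odds = p / (1 - p)       # OR(X) = P(X) / (1 - P(X))
log_odds = np.log(odds)  # log OR(X): an unbounded real number

# The sigmoid maps the log-odds back to the original probability
p_back = 1 / (1 + np.exp(-log_odds))
print(odds, log_odds, p_back)
```

Note that `odds` lives in $(0, \infty)$ while `log_odds` can take any real value, which is what makes it a suitable target for a linear model.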
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see how logistic regression makes its predictions:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$p_+ = P\\left(y_i = 1 \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now suppose that we have somehow obtained the weights $\\textbf{w}$, i.e. the model is already trained. Logistic regression then predicts in the following steps:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 1. Compute:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$w_{0}+w_{1}x_1 + w_{2}x_2 + ... = \\textbf{w}^\\text{T}\\textbf{x}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The equation $\\textbf{w}^\\text{T}\\textbf{x} = 0$ defines a hyperplane that separates the samples into two classes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 2. Compute the log of the odds ratio, $\\log(OR_{+})$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\log(OR_{+}) = \\textbf{w}^\\text{T}\\textbf{x}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 3. Given the odds $OR_{+}$ of assigning a sample to the class \"+\", compute the probability $p_{+}$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ p_{+} = \\frac{OR_{+}}{1 + OR_{+}} = \\frac{e^{\\textbf{w}^\\text{T}\\textbf{x}}}{1 + e^{\\textbf{w}^\\text{T}\\textbf{x}}} = \\frac{1}{1 + e^{-\\textbf{w}^\\text{T}\\textbf{x}}} = \\sigma(\\textbf{w}^\\text{T}\\textbf{x})$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The right-hand side of the expression above is just the sigmoid function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So logistic regression predicts the probability of assigning a sample to the class \"+\" (given the model's features and weights) by applying a sigmoid transformation to the linear combination of the weight vector and the feature vector:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ p_+(\\textbf{x}_\\text{i}) = P\\left(y_i = 1 \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right) = \\sigma(\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}). $$"
]
},
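The three prediction steps above can be sketched in a few lines (the weights `w` and the sample `x` are made up for illustration; in practice they would come from training):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1 + np.exp(-z))

w = np.array([0.5, 1.0, -2.0])  # assumed weights; w[0] is the bias
x = np.array([1.0, 2.0, 0.5])   # sample with the dummy feature x0 = 1

score = w @ x           # Step 1: the linear combination w^T x
log_odds = score        # Step 2: this value is also log(OR+)
p_plus = sigma(score)   # Step 3: probability of the "+" class
print(score, p_plus)
```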
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's see how the model is trained. Once again we will rely on maximum likelihood estimation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Maximum Likelihood Estimation and Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's see how the logistic regression optimization problem, i.e. minimizing the logistic loss, follows from maximum likelihood estimation (MLE). We have already seen the logistic regression model for assigning a sample to the class \"+\":"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ p_+(\\textbf{x}_\\text{i}) = P\\left(y_i = 1 \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right) = \\sigma(\\textbf{w}^T\\textbf{x}_\\text{i})$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The corresponding expression for the class \"-\" is:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ p_-(\\textbf{x}_\\text{i}) = P\\left(y_i = -1 \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right) = 1 - \\sigma(\\textbf{w}^T\\textbf{x}_\\text{i}) = \\sigma(-\\textbf{w}^T\\textbf{x}_\\text{i}) $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These two expressions can be combined into one:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ P\\left(y = y_i \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right) = \\sigma(y_i\\textbf{w}^T\\textbf{x}_\\text{i})$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The expression $M(\\textbf{x}_\\text{i}) = y_i\\textbf{w}^T\\textbf{x}_\\text{i}$ is called the classification margin of the sample $\\textbf{x}_\\text{i}$. If the margin is non-negative, the model classifies $\\textbf{x}_\\text{i}$ correctly; if it is negative, $\\textbf{x}_\\text{i}$ is misclassified. Note that the margin is defined only for samples in the training set, i.e. those whose true label $y_i$ is known."
]
},
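Margins are straightforward to compute for a toy training set (the weights, points, and labels below are invented for illustration; labels are in $\{-1, +1\}$):

```python
import numpy as np

w = np.array([1.0, -1.0])   # assumed weight vector
X = np.array([[2.0, 0.0],   # far on the positive side of w^T x = 0
              [0.1, 0.0],   # very close to the hyperplane
              [0.0, 3.0]])  # far on the negative side
y = np.array([1, -1, 1])    # true labels

margins = y * (X @ w)       # M(x_i) = y_i * w^T x_i
print(margins)              # positive -> correct, negative -> misclassified
```

Here the first sample is classified correctly with high confidence, the second is barely misclassified, and the third is confidently wrong, i.e. a candidate outlier.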
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To understand exactly why this holds, we need the geometric interpretation of a linear classifier. First, recall a classic introductory problem from linear algebra: find the distance from the point with radius vector $\\textbf{x}_A$ to the plane $\\textbf{w}^\\text{T}\\textbf{x} = 0$, that is:"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"$$\\rho(\\textbf{x}_A, \\textbf{w}^\\text{T}\\textbf{x} = 0) = \\frac{|\\textbf{w}^\\text{T}\\textbf{x}_A|}{||\\textbf{w}||}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Answer:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure: geometric derivation of the distance from a point to the plane.)*"
]
},
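The formula can be checked numerically by projecting the point onto the plane (the vectors below are arbitrary examples):

```python
import numpy as np

w = np.array([3.0, 4.0])    # normal vector of the plane w^T x = 0
x_a = np.array([2.0, 1.0])  # an arbitrary point

dist = abs(w @ x_a) / np.linalg.norm(w)

# Cross-check: drop x_a onto the plane and measure the displacement
x_proj = x_a - (w @ x_a) / (w @ w) * w
assert np.isclose(w @ x_proj, 0.0)  # the projection lies on the plane
print(dist, np.linalg.norm(x_a - x_proj))  # both equal (3*2 + 4*1)/5 = 2.0
```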
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the answer, we can see that the larger the absolute value of $\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}$, the farther the point $\\textbf{x}_\\text{i}$ lies from the plane $\\textbf{w}^\\text{T}\\textbf{x} = 0$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hence, the expression $M(\\textbf{x}_\\text{i}) = y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}$ measures how confident the model is in its classification of the sample $\\textbf{x}_\\text{i}$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- If the margin is large in absolute value and positive, the label is correct and the sample lies far from the separating hyperplane, i.e. the model is confident in this classification. Point $x_3$ in the figure below.\n",
"- If the margin is large in absolute value and negative, the label is wrong and the sample lies far from the separating hyperplane, so the sample is likely an outlier (for example, it may be mislabeled in the training set). Point $x_1$ in the figure below.\n",
"- If the margin is small in absolute value, the sample lies close to the separating hyperplane, and the sign of the margin determines whether it is classified correctly. Points $x_2$ and $x_4$ in the figure below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure: points $x_1, x_2, x_3, x_4$ relative to the separating hyperplane.)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's compute the likelihood of the dataset, i.e. the probability of observing the given vector $\\textbf{y}$ for the dataset $\\textbf{X}$. Assuming the samples are independent, we can write:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ P\\left(\\textbf{y} \\mid \\textbf{X}, \\textbf{w}\\right) = \\prod_{i=1}^{\\ell} P\\left(y = y_i \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right),$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"where $\\ell$ is the length (number of rows) of the dataset $\\textbf{X}$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Taking the logarithm of this expression simplifies the computation:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\log P\\left(\\textbf{y} \\mid \\textbf{X}, \\textbf{w}\\right) = \\log \\prod_{i=1}^{\\ell} P\\left(y = y_i \\mid \\textbf{x}_\\text{i}, \\textbf{w}\\right) = \\log \\prod_{i=1}^{\\ell} \\sigma(y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}) = $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ = \\sum_{i=1}^{\\ell} \\log \\sigma(y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}) = \\sum_{i=1}^{\\ell} \\log \\frac{1}{1 + e^{-y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}}} = - \\sum_{i=1}^{\\ell} \\log (1 + e^{-y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}})$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Maximizing the likelihood is equivalent to minimizing the following expression:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\mathcal{L_{\\log}} (\\textbf X, \\textbf{y}, \\textbf{w}) = \\sum_{i=1}^{\\ell} \\log (1 + e^{-y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}}).$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the logistic loss function. Rewriting it in terms of the classification margin $M(\\textbf{x}_\\text{i})$ gives $L(M) = \\log (1 + e^{-M})$."
]
},
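The loss can be implemented directly and cross-checked against scikit-learn's `log_loss` (which expects labels in $\{0, 1\}$ and probabilities rather than margins, hence the conversions; the random data and weights below are made up purely for the check):

```python
import numpy as np
from sklearn.metrics import log_loss

def logistic_loss(X, y, w):
    """L_log(X, y, w) = sum_i log(1 + e^(-y_i w^T x_i)), with y_i in {-1, +1}."""
    margins = y * (X @ w)
    return np.sum(np.log(1 + np.exp(-margins)))

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.choice([-1, 1], size=20)
w = rng.randn(3)

manual = logistic_loss(X, y, w)
proba = 1 / (1 + np.exp(-(X @ w)))  # p_+ for every sample
reference = log_loss((y + 1) // 2, proba, normalize=False, labels=[0, 1])
print(np.isclose(manual, reference))
```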
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot this function alongside the zero-one loss, which penalizes the model with a constant value of 1 for each misclassification: $L_{1/0}(M) = [M < 0]$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure: the logistic loss $L(M)$ and the zero-one loss $L_{1/0}(M)$ as functions of the margin $M$.)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The figure above reflects the following idea: since we cannot directly minimize the number of errors in a classification problem (at least not with gradient methods, because the zero-one loss is not differentiable at 0 and its gradient is zero elsewhere), we can instead minimize an upper bound on it. For the logistic loss function, the following holds:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\mathcal{L_{1/0}} (\\textbf X, \\textbf{y}, \\textbf{w}) = \\sum_{i=1}^{\\ell} [M(\\textbf{x}_\\text{i}) < 0] \\leq \\sum_{i=1}^{\\ell} \\log (1 + e^{-y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}}) = \\mathcal{L_{\\log}} (\\textbf X, \\textbf{y}, \\textbf{w}), $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"where $\\mathcal{L_{1/0}} (\\textbf X, \\textbf{y}, \\textbf{w})$ is simply the number of errors that logistic regression with weights $\\textbf{w}$ makes on the dataset $(\\textbf X, \\textbf{y})$. Thus, by lowering the upper bound $\\mathcal{L_{\\log}}$ on the number of classification errors, we can lower the error count itself."
]
},
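A quick numeric check of the bound on a handful of made-up margins:

```python
import numpy as np

# Example margins M(x_i) for four samples (invented for illustration)
margins = np.array([2.0, -1.5, 0.3, -3.0])

zero_one = np.sum(margins < 0)                  # L_{1/0}: 2 errors
log_sum = np.sum(np.log(1 + np.exp(-margins)))  # L_log, about 5.43

print(zero_one, log_sum)
assert zero_one <= log_sum  # the log-loss sum bounds the error count
```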
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### $L_2$ Regularization of Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$L_2$ regularization of logistic regression works much the same way as in ridge regression. Instead of minimizing $\\mathcal{L_{\\log}} (\\textbf X, \\textbf{y}, \\textbf{w})$, we minimize:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\mathcal{J}(\\textbf X, \\textbf{y}, \\textbf{w}) = \\mathcal{L_{\\log}} (\\textbf X, \\textbf{y}, \\textbf{w}) + \\lambda |\\textbf{w}|^2$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In logistic regression, it is common to use the inverse of the regularization coefficient, $C = \\frac{1}{\\lambda}$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\widehat{\\textbf w} = \\arg \\min_{\\textbf{w}} \\mathcal{J}(\\textbf X, \\textbf{y}, \\textbf{w}) = \\arg \\min_{\\textbf{w}}\\ (C\\sum_{i=1}^{\\ell} \\log (1 + e^{-y_i\\textbf{w}^\\text{T}\\textbf{x}_\\text{i}})+ |\\textbf{w}|^2)$$ "
]
},
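In scikit-learn, this is exactly the `C` parameter of `LogisticRegression` used below; the regularized objective itself can be evaluated by hand (toy numbers; for simplicity the penalty here covers all weights, whereas implementations usually leave the bias unpenalized):

```python
import numpy as np

def regularized_objective(X, y, w, C=1.0):
    """C * sum_i log(1 + e^(-y_i w^T x_i)) + |w|^2, with y_i in {-1, +1}."""
    margins = y * (X @ w)
    return C * np.sum(np.log(1 + np.exp(-margins))) + np.sum(w ** 2)

X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1, -1])
w = np.array([0.5, 0.5])

# A larger C emphasizes the data-fit term, i.e. weaker regularization
print(regularized_objective(X, y, w, C=0.1))
print(regularized_objective(X, y, w, C=100.0))
```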
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's build some intuition about regularization from an example."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### An Example of Regularized Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How does regularization affect classification quality? We will use the \"microchip test\" dataset from Andrew Ng's machine learning course, apply logistic regression with polynomial features, and vary the regularization parameter $C$. First, we will look at how regularization affects the classifier's decision boundary and examine underfitting and overfitting. Then we will choose a near-optimal value of the regularization parameter via cross-validation and grid search."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"from matplotlib import pyplot as plt\n",
"from sklearn.linear_model import LogisticRegression, LogisticRegressionCV\n",
"from sklearn.model_selection import (GridSearchCV, StratifiedKFold,\n",
" cross_val_score)\n",
"from sklearn.preprocessing import PolynomialFeatures\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the data with Pandas' `read_csv()` method. The dataset contains 118 microchips (samples) with the results of two quality-control tests (two numeric features) and information on whether the chip went into production. The features are already centered, i.e. each column has had its mean subtracted, so the average test values are zero."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 118 entries, 0 to 117\n",
"Data columns (total 3 columns):\n",
"test1 118 non-null float64\n",
"test2 118 non-null float64\n",
"released 118 non-null int64\n",
"dtypes: float64(2), int64(1)\n",
"memory usage: 2.8 KB\n"
]
}
],
"source": [
"# Load the dataset\n",
"data = pd.read_csv(\n",
" \"../../data/microchip_tests.txt\", header=None, names=(\"test1\", \"test2\", \"released\")\n",
")\n",
"# Show some basic info about the dataset\n",
"data.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look at the first and last five rows."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" test1 test2 released\n",
"0 0.051267 0.69956 1\n",
"1 -0.092742 0.68494 1\n",
"2 -0.213710 0.69225 1\n",
"3 -0.375000 0.50219 1\n",
"4 -0.513250 0.46564 1"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" test1 test2 released\n",
"113 -0.720620 0.538740 0\n",
"114 -0.593890 0.494880 0\n",
"115 -0.484450 0.999270 0\n",
"116 -0.006336 0.999270 0\n",
"117 0.632650 -0.030612 0"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.tail(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Separate the features and the target class labels."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"X = data.iloc[:, :2].values\n",
"y = data.iloc[:, 2].values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the data: orange points correspond to defective chips, blue points to normal ones."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZQAAAEWCAYAAABBvWFzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJztvXm8XGWV7/39JSSEwyCZgEiSc0KTVsZMR16GdkaIfZHAi+YGjsogxqGx+3O59hWNreJLBBwug9JN5wIy5EgY+lWxbWUIQdtGlIMmBsKQkDAkMoRAIhAIIWfdP/auUKlUnao6teda389nf2rvZz9777V3Ve31PGutZz0yMxzHcRynVYakLYDjOI5TDFyhOI7jOJHgCsVxHMeJBFcojuM4TiS4QnEcx3EiwRWK4ziOEwmuUJxMI2k3ST+TtEnSLRGc7xeSTo9Ctgav1yXJJO1SY/9XJF2VlDxRIukVSQdEfM6J4XmHDlDHJB0Y5XWdaHCFUiAk7SrpaklPSnpZ0lJJHx6g/hmSfhPRtZ+QdGwU56rgo8C+wGgz+1irJzOzD5vZda2LFQ1m9i0zOzvOa0j6hqSFUZ/XzPYws9XhNa6VdEEE53wqPO+28Lz3SGrp+Uj6a0m3SHohbJj8SdK5AymtGudZIOlRSf2SzmhFpqLiCqVY7AI8DbwXeBvwVeBmSV0pytQqncBjZvZmWgLU6l042UfSXwG/I/hfHGZmbwM+BnQDezZ5umXA54E/RCpkkTAzXwq8AH8CTqlSfhDwOrANeAXYGJbvCnwXeAp4DrgS2C3cNwb4d2Aj8CLwnwSNkhuAfuC18Fz/CxgBLAQ2hPXvB/atIeNBwD1hvYeAE8Py84E3gK3heT9V5dhvALeE13oZWA78NfBl4HmCF8lxZfXvAc4u2/408HB47Apgelj+BPCl8PltIVDWVeUM6+8GfA94EtgE/CYs6wIMOD18pi8A8yrkXxiul+rOBf4MPAN8sazuEUAf8Jfwu/nfDf4Gtl+j0Wcf7hsN/Cy83v3ABcBvyvYbcGAo79bwu3oF+FmV65wPfD9cHwa8Cnyn7Nm9Dowqewa7APMJfp+vh+f9Qdl1PwusDOW+AlCN+1sI/Dzi/9RvgDPS/m9ncUldAF9i/HIDU9HrwDtr7D+j/AURll0C3Bb+ufcMXygXhvsuJFAww8Ll3aU/cvgCPrbsPJ8Jj+0AhgIzgL2qyDAMWAV8BRgOfIDg5f6OcH/Nl2HZ/teB48OX0PXAGmBeeO5PA2vK6t9DqFAIWqrrgHcBCl+OnWX3sxSYEL7w6sl5RXju/cP7PZpAOZdekP8nPM8UAgV1UOX9ldW9EdgdOAxYX3quwG+BT4TrewBHNvg7qPoMG7inReHSARxMoJx3Uijh+rXABQPI8AFgebh+NPA48LuyfcsqnsEuld9XxXX/HdgbmBg+o5k1rvsscGad57NxgOW8KvVdodRY3ORVUCQNA3qB68zskQaPEUFr83+Y2Ytm9jLwLWBOWGUrMI7gpbvVzP7Twn9YFbYStHAPNLNtZvaAmf2lSr0jCV6OF5nZG2Z2N8HL4tQGbxXgP83sdgvMYrcAY8PzbSV4IXZJ2rvKcWcD3zaz+y1glZk9Wbb/cjN72sxeG0hOSUOAs4B/MLN14f3ea2Zbys51vpm9ZmbLCEwnUwa4n/PN7FUzWw78sOxZbAUOlDTGzF4xs/uaeEbVGOiehgKnAF83s81mtgJoxff0W2CypNHAe4Crgf0l7UFgov1Vk+e7yMw2mtlTwBJgao16owl6ejUxs70HWC5qUq62xhVKAQlfcDcQmCDOaeLQsQSt0QckbZS0EfhlWA7wHYIW7R2SVks6b4Bz3QDcDiyS9GdJ3w6VXCVvB542s/6ysicJWvqN8lzZ+mvACxY6dcNtCF6clUwgaCnX4ukG5RxDYOIb6FzPlq1vriFPtes+GV4b4FME5rxHJN0v6YQBzt
EIA93TWN7yyVWTqylCpdxHoDzeQ6BA7gWOYXAKpdHnuYGgEeQkgCuUghH2Mq4mMHedErbSa1HZu3iB4AV8SFkL7W1mtgeAmb1sZv/TzA4ATgTOlfTBaucKezDnm9nBBCaOE4BPVpHhz8CEUAmWmEhgioqbp4G/GmB/+T0NJOcLBGa3gc7VDBMqrvFnADNbaWanAvsAFwO3Stq9hesMdE/rgTeB8TXkqqSRtOW/IjBvTSPwyfyKwFR5BPDrFs47EHcR9LRqEoYp11q+0uL12wpXKMXjXwgcrR8JW4UD8RwwXtJwgLCl+n+ASyTtAyBpf0nHh+snSDowVFqbCBym/WXn2j4mQdL7JR0Wmk7+QmCuKW8Jl/gdQQvzf0kaJul9wEcITFVxcxXwRUkzFHCgpM4adWvKGT63a4D/LentkoZKOkrSroOU658kdUg6BDgTuAlA0scljQ2vtzGsW+2ZVmOIpBFly6517mkb8P8D3whleSfVGwQldvj+a/Cr8BwrzOwNQv8IgY9rfQvnHYivA0dL+o6k/QDC73lhyQxqQZhyreVbpRNJGi5pBIG/bVj4HP0dWoY/jAIRvgw/Q2BPfrasldVT45C7CSJ7npX0Qlj2JQKz1n2S/kLQwntHuG9yuP0KgU38n81sSbjvQuCroansi8B+wK0EyuRhgpfJDZUChC+WjwAfJmjp/zPwyUb9Pq1gZrcQRBL9iMAZ/ROCYIRqdevJ+UWCCLP7CSLgLmbw/69fEXwHi4HvmtkdYflM4CFJrwCXAXNKjYbwe373AOc8laD3WVoeb+CeziEIP3+W4Lu7kSCgoBpXAweH3/9PatS5lyAwodQbWUHQs6vVOyG8z49KeknS5QPUq4qZPQ4cReDsf0jSJuDfCMxvLzd5ujsInt3RwIJw/T3NylRkShE6juOkTDheaA0wzFIcd1MLSRcD+5lZYpkGnHzhPRTHcaoi6Z2SDg/NgUcQBAX8OG25nOziI4Adx6nFngRmrrcT+DK+B/w0VYmcTOMmL8dxHCcS3OTlOI7jREJbmbzGjBljXV1daYvhOI6TKx544IEXzGxsvXptpVC6urro6+tLWwzHcZxcIenJ+rXc5OU4juNEhCsUx3EcJxJcoTiO4ziR4ArFcRzHiQRXKI7jOE4kuEJxHMdxIsEViuM4jhMJrlAcx3GcSHCF4rQ3a3rhJ13woyHB55retCVynNzSViPlHWcH1vTC7+fCts3B9uYng22ASbXmJHMcpxbeQ3Hal2Xz3lImJbZtDsqLiPfGnJjxHorTvmx+qrnyPOO9MScBvIfitC8dE5srzzPt1htzUsEVitO+TJkPQzt2LBvaEZQXjXbqjTmp4QrFaV8m9cARC6CjE1DwecSCYpqA2qk35qSG+1Cc9mZSTzEVSCVT5u/oQ4Hi9sac1PAeiuO0A+3UG3NSw3sojtMutEtvzEkN76E4juM4kZCqQpF0jaTnJT1YY78kXS5plaQ/SZpetu90SSvD5fTkpC4gPuCtcfxZOU5N0u6hXAvMHGD/h4HJ4TIX+BcASaOArwP/D3AE8HVJI2OVtKiUBrxtfhKwtwa8FflFOVil0I7PynGaIFWFYma/Bl4coMos4HoLuA/YW9I44HjgTjN70cxeAu5kYMXk1KLdBry1ohTa7Vk5TpOk3UOpx/7A02Xba8OyWuU7IWmupD5JfevXr49N0NzSbgPeWlEK7fasHKdJsq5QWsbMFphZt5l1jx07Nm1xskeGBrz19kJXFwwZEnz2xmFJakUpZOhZOU4WybpCWQdMKNseH5bVKneaJSPpR3p7Ye5cePJJMAs+586NQam0ohQy8qwcJ6tkXaHcBnwyjPY6EthkZs8AtwPHSRoZOuOPC8ucZsnIgLd582BzhSVq8+agPFJaUQoZeVaOk1VkZuldXLoReB8wBniOIHJrGICZXSlJwA8IHO6bgTPNrC889izgK+Gp5pvZD+tdr7u72/r6+qK+DScChgwJeiaVSNDfH/HF1v
QGPpPNTwU9kynzs68U8iizUxgkPWBm3XXrpalQksYVSkxE8LLr6grMXJV0dsITT0QiZbJEqQAq5zKBoFflvSMnIRpVKFk3eTlZJ6KxGfPnQ0eFJaqjIyjPHVGPV/FwZScnuEJpR6Ic7R3Ry66nBxYsCHokUvC5YEFQnjuiVgAeruzkBE8O2W5EPRVshC+7np6cKpBKolYAHRPD3k6VcifAfUyZwHso7UbUrWcfm7EzUT8TD1ceGE+JkxlcobQbUbee/WW3M1E/Ew9XHhj3MWUGN3m1G1GbT0ovNTc3vEUcz8TnMqmN+5gygyuUdiOOqWD9Zbcz/kySw31MmcFNXu2Gm0+couFm18zgPZR2xFvPTpFws2tmcIXiOE7jZDU81xtJmcBNXo5ThURS6ecND8916uAKxSkUUSiCxFLp5w0Pz3Xq4ArFKQxRKYLEUunnDQ/PdergCsVJjajNSlEpgqdqvB9rlTdNlLnU4jhfLTwrglMHVyhOKsRhVopKEUys8X6sVd4UUfshkvRreHiuUwdXKE4qxGFWikoRxJpKP2o/RJJ+DR/D5NQhVYUiaaakRyWtknRelf2XSFoaLo9J2li2b1vZvtuSldxplTjMSlEpglhT6Ufth0jarzGpB056Ak7rDz4Ho0ySMtE5iZPaOBRJQ4ErgA8Ba4H7Jd1mZitKdczsf5TV/wIwrewUr5nZ1KTkdaJl4sTqMzS2YlYqvfDnzQsU08SJgTIZjCKILZV+1GlC8pZ2JOrpE5xMkWYP5QhglZmtNrM3gEXArAHqnwrcmIhkTuzEZVbq6QmmDO7vDz4zN79K1H6IvPk1PPS40KSpUPYHni7bXhuW7YSkTmAScHdZ8QhJfZLuk3RSfGI6cVCoGRqbIWo/RN78GlGZ6NxslknyknplDnCrmW0rK+s0s3WSDgDulrTczB6vPFDSXGAuwMRIwnScqCjMDI3NEnWakDylHYnCROdms8ySZg9lHTChbHt8WFaNOVSYu8xsXfi5GriHHf0r5fUWmFm3mXWPHTu2VZnzRVytOG8dOoMlChOdm80yS5oK5X5gsqRJkoYTKI2dorUkvRMYCfy2rGykpF3D9THAMcCKymPbmrjGJ6SYz2mggZBZyr2VJVnqkbisUZjofMR+ZknN5GVmb0o6B7gdGApcY2YPSfom0GdmJeUyB1hkZlZ2+EHAv0rqJ1CKF5VHh+WGODO3DtSKa+UacZ23DqWBkKWxK6WBkCVq7RvIpNbbG01EWKNyZs28l5qsrZro8hbZ1kZox/d0senu7ra+vr60xQiotAND0PWPyqH6oyFAte9WwRiCrJ23Dl1d1cOMOzuDz1r7nnii+vkqX6YQRJm1GhgwkJy1ZEmLPMm6A3H/d5ydkPSAmXXXq+cj5dMibjtwXHmXUsrnNNBAyMEMkowrAWTsecAiJGpZEzOfZSWyzX2JO+EKJS3itgPHNT4hpXEPA6VVGUzKlbhe/LHmAYuYKGVNPOV/FCP2W8HnhqmKK5S0iLulH1crLqXW4UADIefPh2HDdtw3bNjAgyTjevHHmgcsYqKUte1S/nukWVVcoaRFEi39JlpxTZkrUmgd1hsIKe1Yv3K7kjhH6udlwGaUsmbW1BeXWcojzariTvk0ycj83HE5qJNisM7lOKK82pVMOvjjdN7/pKtGpFln0MgqGI065V2hONl8GTTBkCGB3b4SKcjp5ZQRUyMmk42SOF/6bRZp5lFeTsNk1lzRIHlyhKdKjI7kTJr64jRLZSXSLGO4QskpUYZo5v2FnCdHeKrE7EhOLNNzo36RJAJf0ow0yyCuUHJI1CGaeX8hZ7J1nEWK4EhuppeVt9T+BcAVSg6JOkSzCC/kzM+D0gCxDwys0TJf+9LEXOQdA5rrZblZKnHcKZ9D3AldPBJxaldxJL+6pYNPX7WAG+/tieeaUZNS6p92x53yBSbvPg9nZxIZGFjRYl/7UucOyiSWa0ZNSql/nMZwhZJD8u7zSIp6JqQspZlPLN
KuzJE88QtP7KBMYrtmlLhfJNO4QkmSiEbtFsHnETf1AhcSzz1Vh1q9yyFD4lN4uezpul8k07gPJSnabCBU2tQbrJm1wZzVfCiVRO3fyORgRCeTuA8la3gyuUSpZ0LK2mDOyl7n0KE716nl3xis6c57uk7UeA8lKTw6JVHy1kOppNFIPu9lOEmQix6KpJmSHpW0StJ5VfafIWm9pKXhcnbZvtMlrQyX05OVfBAUPDolSw5uqB+4kPXAhkb9G22XNj4pfPKswWFmqSwE88g/DhwADAeWAQdX1DkD+EGVY0cBq8PPkeH6yHrXnDFjhqXG6oVmizrMenlrWdQRlLdyzh93mvUq+GzlXC2wcKFZR4dZ0KYOlo6OoDxNFi406+w0k4LPSnnq7U+TRp+ptGOd0iKlI3erZOI7ieO/mnOAPmvgvZ6ayUvSUcA3zOz4cPvLAGZ2YVmdM4BuMzun4thTgfeZ2WfC7X8F7jGzGwe6ZuoDG6PM9JohJ3/WzUd5pZH0+kV69pkx37VZavpGyIPJa3/g6bLttWFZJadI+pOkWyVNaPJYJM2V1Cepb/369VHIPXiiTCaXISd/qw7urJnLskIj6WSybrprhsyY74qQ8ywlsh7l9TOgy8wOB+4Ermv2BGa2wMy6zax77NixkQuYGoP80cfx8m5lPEPWxoPkjSJFamUm8q7g/s44SVOhrAMmlG2PD8u2Y2YbzGxLuHkVMKPRYwvPIH70cb28W2klZ6ZVmmOKkBgTMjTQ0kfjD5o0Fcr9wGRJkyQNB+YAt5VXkDSubPNE4OFw/XbgOEkjJY0EjgvL2odB/Ojjenm30krOTKvUSZ00zHdVe+w+Gn/wNOK5j2sB/hZ4jCDaa15Y9k3gxHD9QuAhggiwJcA7y449C1gVLmc2cr1Uo7zioMkoryxGBHV2VpepszM9mZz0SDLKK6vRiVmErEd5pUHqUV4pk8WIoMxE9jhtRxb/D1klD1FeTsJkMSKoSE5lJ1+4uTV6XKG0EVl9eefJqewhzsUhM0EABcIVSpuRp5d31vAQ52KRxR573nGF4jgN4iHOxSKrPfY84055x2mQRjMAO07RcKe840SM29ydyClYVmNXKE6uSNMp7jZ3J1JKCV43PwlY8Pn7ublWKq5QnNyQtlPcbe5OpGQowWtUuELJCR6umg2nuEfJOZFRwKzGu6QtgFOfytHkpZY5tNcLzQeiOYWiY2KNeVfy65TzHkqjpOg8y0LLvFHi7Em5U9wpFAXMauwKpRFSdp7lpWUet4/DneJOoShgVmMfh9IIKU8JmpckdknI2ci0uI7jRIuPQ4mSwTjPIjSR5aVlnkRPyp3ijpNdXKE0QrOzI0ZsIstLuKr7OJy2omCDEqOgpkKRtKek/0/SDyXNrtj3/fhFyxDNOs9iiC/PQ8s8Lz0pJz9kNly+gIMSo2CgHso1wG7Az4EzJN0kaVi475jYJcsSzTrPChhf3gh56Uk5+SDtgawDUsBBiVFQ0ykvaamZTS3b/jpwLMHc7ovNbHrLF5dmApcBQ4GrzOyiiv3nAmcDbwLrgbPM7Mlw3zZgeVj1KTM7sd71EksOmbIT33GKQKaDUX40BKj27hScVrxMoVE45UdI2r7fzM4HrgV+DYyKQMChwBXAh4GDgVMlHVxR7Y9At5kdDtwKfLts32tmNjVc6iqTRClgfLnjJE2mw+Wb9au2CQMplJ8DHywvMLOrgfOorpqb5QhglZmtNrM3gEXArIrrLTGzUr/yPmB8BNeNnwLGlztO0mQ6yMMbjVWpqVDM7H+a2Z1Vyn9uZpMiuPb+wNNl22vDslp8CvhF2fYISX2S7pN0Uq2DJM0N6/WtX7++NYmbYVJPYN46rT/4zKkyyaxT1Ck8mQ7y8EZjVXIRNizp40A38J2y4s7QpncacKmkv6p2rJktMLNuM+seO3ZsAtJmk8Eohkw7RZ3Ck/kgj4I0GqMkTYWyDphQtj0+LNsBSccC84ATzWxLqdzM1oWfq4F7gG
lxCptnBqsY8pRDzCkmeQiXd96irkKRtFNG4mplg+B+YLKkSZKGA3OA2yquMw34VwJl8nxZ+UhJu4brYwjCmFdEIFMhGaxiyLRT1HGczNFID+X3DZY1hZm9CZwD3A48DNxsZg9J+qakUtTWd4A9gFskLZVUUjgHAX2SlgFLgIvMzBVKDQarGDLtFHUcJ3PU7GlI2gcYB+wm6TBA4a69gI5axzWDmf0H8B8VZV8rWz+2xnH3AodFIUM7MHFi9Xj+eoph/vwd52GB9J2inhzScbLLQD2U/wb8gMC3cUXZ8hXgn+IXzYmKwUbLZM0p6kECTiUehZgt6qavlzTbzG5OSJ5YSWykfAYpQss+0yOnncSpnMkUgoZSpiLBCkKjI+UbUSjnANeb2V8kXQlMB75sZoujETU52lmhFIEhQ4KeSSVSEAXktBfewEiOKOdDmRsqk+MIfCqfZscUKI6TCB4k4JTjUYjZoxGFUmoT/i1BT2VZg8c5TqRkeuS0kzjewMgejSiGZZL+AzgB+IWkPYgml5fjNEXWggScdPEGRvZoxIcyFJhBkMjxxXAg4QQz+2MSAkaJ+1Acp1gUIdgkFtb0BnOzbH4qyIA8ZX5LqWEa9aHUHfFuZtskHQB8CJhPMOmWm7wcx0mdnh5XIDtRmk2yNAFYaTZJiD3fWCOpV34AvB/4eFj0KnBlnEI5juO0La3OVZ/ibJKN5OQ62symS/ojQGj2Gh6zXI7jOO1HFL2LFKcgb8R0tTWcudEAJI0GPOrfcRwnaqLoXaQ4m2RNhVKWUfgK4N+AsZLOB34DXBy7ZI7jOO1GFL2LFGeTHKiH8nsAM7se+CrwXeAl4GNmtih2ydqBVm2ljuMUiyh6FynOJjmQQillF8bMHjKzy8zsUjN7MHap2oGSrXTzk4C9ZSvNiVLxpHyOEwNR9S5Smk1yIIUyVtK5tZZEpCsyKUZitEoUWX9dITlOFXI+V/1ACmUoweRWe9ZYnFZIMRKjVVqdGtjT0BcfbzA0QaXpG3I7V/1ACuUZM/ummZ1fbYni4pJmSnpU0ipJ51XZv6ukm8L9v5PUVbbvy2H5o5KOj0KeRInAVprWn7bVpHw+V32x8QZDE+Tc9F1JQz6UOAhTulwBfBg4GDhV0sEV1T4FvGRmBwKXEEaXhfXmAIcAM4F/Ds+XH1q0lab5p201KZ9niS023mBoghybvqsxkEL5YMzXPoIgP9hqM3sDWATMqqgzC7guXL8V+KAkheWLzGyLma0BVoXnyw8t2krT/NO2mpRvsArJzSj5wBsMTZBj03c1aioUM3sx5mvvDzxdtr02LKtax8zeBDYBoxs8FgBJcyX1Sepbv359RKJHRAuRGGn+aVvN+jsYheRmlPzgaeWbIMVBiHFQ+CSPZrbAzLrNrHvs2LFpixMZaf9pe3qCWfH6+4PPZhL0DUYhuRklPySeVj7P47lSHIQYB2kqlHXAhLLt8WFZ1TrhyP23ARsaPLbQ5H0uiGYVkptR8kOi89bk3amd8zDhSurOhxLbhQMF8RiBr2YdcD9wmpk9VFbn74DDzOyzkuYA/6+ZzZZ0CPAjAr/J24HFwGQz2zbQNYs2H0o7zQXh84c7VflJV6hMKujoDMzITiREOad8LIQ+kXOA24GHgZvN7CFJ35R0YljtamC0pFXAucB54bEPATcDK4BfAn9XT5kUirCL36MhPHFpF/2P9zZtdsoMDZor8t4jc2KiYE7tvJNaDyUNCtFDqUxvDYHNNY/d5CbvpZ16ZE6DeA8lETLfQ2lLonAeFiluvcl7aSUQwCkoWXVq5zlQoAUamWDLiYKopuUsUhe/SPfipEPpvxPh/Oktk+IUvGnjPZSkiKpnUaS49SLdi5MeKWXWrUmRrAhN4golKaJqjWe1iz8YinQvjlOijXverlCSIqrWeJHi1ot0L45Too173u5DSYop86tHNA2mNT6pJ56X7pre5G3Rcd2L46RFlP/1nOE9lKTIems87yOOnbYnM8lDs/
5fjxEfh+IEeDy/k2NKyUPL8711dMSY8qXN8HEoTnO0sSMxLjLTYm4DPHloNnAfihPQMbFGD6X4jsQ4qGwxl9Ltg7eY48CTh2YD76E4AR7CG2mPwlvMOxJ3b63V6Ry8NxkRZtY2y4wZM8wZgNULzX7cadar4HP1wlgus3ChWWenmRR8LoznMk2xcKFZR4dZMH1XsHR0DF42acdzlRYpWrnzQNTPNuprJCFf3gH6rIF3bOov+SQXVyjpk9U/b2dndQXQ2ZmN8+WZpJ7FYBsq/l3Vp1GF4iavLFLgxHLNmoKSMkVEbYP3dPtvkZR/Y7DJQxPzvxT4f13CFUrWKPh4kGb+vEnOIx/1lMqJzlqYcdKerroeichX8P91CVcoWaPgieWa+fMm6diOo0eRtXT7aTmes95bS0S+gv+vS7hCyRoFHw/SzJ83yVDQovcokuztVZLEs21FWSby3Rf8f10ilZHykkYBNwFdwBPAbDN7qaLOVOBfgL2AbcB8M7sp3Hct8F5gU1j9DDNbWu+6uRgp3wYj1hudedHnkY+OIj/LXIySz/n/Ousj5c8DFpvZZGBxuF3JZuCTZnYIMBO4VNLeZfv/0cymhktdZZIazTri8jQeZJBOxkZNQVk3leSJIg/8y8WYnzz9r1sgLYUyC7guXL8OOKmygpk9ZmYrw/U/A88DYxOTMAoG44jLcGK5crPC38/q5c1743UytmqK8MFqb5F1x3gr5EJZZvh/HSVpmbw2mtne4bqAl0rbNeofQaB4DjGz/tDkdRSwhbCHY2Zbahw7F5gLMHHixBlPVuv3x0XOu7nlVJoV1lzaRdfY7N5bLswgCVLk51Fkc15WSN3kJekuSQ9WWWaV1wsHzdTUapLGATcAZ5pZf1j8ZeCdwLuAUcCXah1vZgvMrNvMuseOTbiDUyBHXKVZYeKYbN9bLswgCVLkoAM3jWaH2JJDmtmxtfZJek7SODN7JlQYz9eotxfwc2Cemd1Xdu5nwtUtkn4IfDFC0aMjDwkXG5xUq9J88NQLE2v0ULJxb7kwgyRMT08xFEglpXtqJNDDiZe0fCi3AaeH66cDP62sIGk48GPgejO7tWLfuPBTBP6XB2OVdrBk3RHXhI+n0tb+lZvn8+qW7N7Trq3UAAAU5klEQVRbkX0Gzs5kbcxPu5KWQrkI+JCklcCx4TaSuiVdFdaZDbwHOEPS0nCZGu7rlbQcWA6MAS5IVvwGybojronBVpVmhRvv7eGc6xfwimXz3twM4jjJ4zM2tjM/GkJ195XgtP6dShsdP5IV8iav42SVRp3yrlDamQJFoTmOEx+pR3k5OSDrPh7HcXKFK5R2Jus+HsdxcoXPKd/uTOopngJpMBTacSKnzX97rlCcYlEKhS5Fr5VCoaGt/thOCvhvz01eTsFok3knnAzivz1XKE7BGCDdjSeLdGKlQKmWBosrFKdY1Ej98opNjGWCKVdSznZqpR3KSDqiJHCFkmcGOR9Jbq/bCDVCob9y8/zIk0WmOQtipRyZVGpZ/p3EgYfh+8DG3FLpAITgxxt32G9a122GKpE2Q/6qh2o/dSnI/zQYspA2PbNp6fPwO4mDgkZ5+Uj5KhRKoaQ1yj2no+vjePkPGULkSqpZsqDUqpLT34lTHR8pX3TScgDmzfEYml3WzB/Ck5d1cerRb5ldWk0WmYWMxplN05+334kTCa5Q8kpaDsA8OR7L0vNLxsQxT3LVp+dy2tG9kUwwlYWMxllQalXJ0+/EiQxXKHklLQdgnhyPVcYFdAzfTO8/zotkzowszIKYBaVWlTz9TpzIcIWSV9LKw5Wn/F8JmF3SntgpC0qtKnn6nTiR4U55p7i4Y9hxIsGd8o7jZhfHSZRUFIqkUZLulLQy/BxZo962sul/bysrnyTpd5JWSbopnH/ecXbEzS6Okyhp9VDOAxab2WRgcbhdjdfMbGq4nFhWfjFwiZkdCLwEfCpecZ3cMqknMG+d1h98ZlSZZHa0u+M0QVoKZRZwXb
h+HXBSowdKEvAB4NbBHO84WSMrKVzSItfKtN3Sy9QhLYWyr5k9E64/C+xbo94ISX2S7pNUUhqjgY1m9ma4vRbYv9aFJM0Nz9G3fv36SIR3nCiZN4/I84xlgUYURdzKNFZlVTbOCeyt+U/aWKnEFuUl6S5gvyq75gHXmdneZXVfMrOd/CiS9jezdZIOAO4GPghsAu4LzV1ImgD8wswOrSeTR3k5WSQLKVyiptEcY3Gmjok9z1kbRRFmOpeXpEeB95nZM5LGAfeY2TvqHHMt8O/AvwHrgf3M7E1JRwHfMLPj6123mkLZunUra9eu5fXXXx/k3TgjRoxg/PjxDBs2LG1Rcklm83G1QKP3FKcyjf25/mgIUO39qcBnVyAaVShpTQF8G3A6cFH4+dPKCmHk12Yz2yJpDHAM8G0zM0lLgI8Ci2od3yhr165lzz33pKuri8A94zSDmbFhwwbWrl3LpEmT0hYnl8yfX70lnfpo9zJ6ewMT3FNPBWld5s8fuJXfaI6xiROrv/SjSB0TeZ6zykzCw0bB1g0712vj9DJp+VAuAj4kaSVwbLiNpG5JV4V1DgL6JC0DlgAXmdmKcN+XgHMlrSLwqVw9WEFef/11Ro8e7cpkkEhi9OjR3sNrgcyOdg8ZjJ+j0RxjcaaOiTTPWTV/ybaXQRW98jYf55SKQjGzDWb2QTObbGbHmtmLYXmfmZ0drt9rZoeZ2ZTw8+qy41eb2RFmdqCZfczMtrQijyuT1ijk80s4emeHFC5LeunZPblr70TFvf/u5t6mgwYaVRRxKtNIlVW1+eL734Bhe/k4pzLSMnk5TnapnByqFL0D8b8s0rx2jetfeNJcXngBbrx3x+sPZDoqKYQBzWShCalHT9FzafSTUTUkQ6PUyv/2xovw0RcGLWPR8NQrGWDo0KFMnTqVQw89lI985CNs3Lix7jF77LFHApJl57qJUq01um1zUJ6Ba8caClvl+rvvuplvzd753uuZjgZMnJlQyG1kyTs9HX9DuEJpkjj+zLvtthtLly7lwQcfZNSoUVxxxRWtn9QZPGlODlXn2rEPgqxx/Yljdixv2c+RptIeDJ4XriFcoTRBEiOajzrqKNatW7d9+zvf+Q7vete7OPzww/n6179e9ZhadU466SRmzJjBIYccwoIFCwDYtm0bZ5xxBoceeiiHHXYYl1xyCQCPP/44M2fOZMaMGbz73e/mkUceAWDNmjUcddRRHHbYYXz1q1+N7kazTJqt0TrXjn0QZI3rb2ZitH6OvM3o6HnhGsIVShPE/Wfetm0bixcv5sQTg7Rld9xxBytXruT3v/89S5cu5YEHHuDXv/71DscMVOeaa67hgQceoK+vj8svv5wNGzawdOlS1q1bx4MPPsjy5cs588wzAZg7dy7f//73eeCBB/jud7/L5z//eQD+4R/+gc997nMsX76ccePGRXOjWSfN1mida8c+5W+N6+9x9Pxo533JowkpJ3nh0sQVShPE9Wd+7bXXmDp1Kvvttx/PPfccH/rQh4BAWdxxxx1MmzaN6dOn88gjj7By5codjh2ozuWXX86UKVM48sgjefrpp1m5ciUHHHAAq1ev5gtf+AK//OUv2WuvvXjllVe49957+djHPsbUqVP5zGc+wzPPBJlx/uu//otTTz0VgE984hOt3ehgSCNXUpqt0TrXjn3K36Tu3U1IxcTM2maZMWOGVbJixYqdymrR2WkWGLt2XDo7Gz5FVXbffXczM3v11Vftb/7mb+yyyy4zM7Nzzz3XrrzyygGPqVVnyZIldswxx9irr75qZmbvfe97bcmSJWZm9vLLL9utt95qs2bNsjPPPNM2bdpk++23X9XrjBo1yrZu3WpmZps2bdp+3UqaeY4Ns3qh2aIOs17eWhZ1BOVtysKFZh0dO/7+OjqC8tyxeqHZjzvNehV8tvH3mnWAPmvgHes9lCaIe/7ujo4OLr/8cr73ve/x5ptvcvzxx3PNNdfwyiuvALBu3Tqef/75HY
6pVWfTpk2MHDmSjo4OHnnkEe677z4AXnjhBfr7+znllFO44IIL+MMf/sBee+3FpEmTuOWWW4CgkbFs2TIAjjnmGBYtWgRAb9JpYPPmuE2ArA+CbAo3IRUOH4fSBJHGtddg2rRpHH744dx444184hOf4OGHH+aoo44CgpDdhQsXss8++2yvf9xxx1WtM3PmTK688koOOugg3vGOd3DkkUcCgcI588wz6Q8TJV144YVAoCw+97nPccEFF7B161bmzJnDlClTuOyyyzjttNO4+OKLmTVrVnQ32gh5c9wmRE9PThWIU3jafk75hx9+mIMOOigliYpDLM+xjbK5Ok6W8TnlnfzjjlvHyRWuUJzs4rH/jpMr3IfiZJtJPa5AHCcneA/FcRzHiQRXKI7jOE4kuEJxHCcfpJE1wWmKVBSKpFGS7pS0MvwcWaXO+yUtLVtel3RSuO9aSWvK9k1N/i6io5S+vrQ8McgJr9/3vvdRCov+1re+FaGEjpMyCaW7d1ojrR7KecBiM5sMLA63d8DMlpjZVDObCnwA2AzcUVblH0v7zWxpIlJDLK2kUvr60tLV1dXyOV2hZBBvYQ8ez5qQC9JSKLOA68L164CT6tT/KPALM9tcp168JNhKeuKJJ3j3u9/N9OnTmT59Ovfeey8A99xzDyeccML2eueccw7XXnvtDseed9552xNO9vT08LWvfY1LL710+/558+Zx2WWXRS6zMwBZaGHnWaF51oRckJZC2dfMngnXnwX2rVN/DnBjRdl8SX+SdImkXWsdKGmupD5JfevXr29BZGJrJZVe/lOnTuXkk08GYJ999uHOO+/kD3/4AzfddBN///d/3/D5Lrroou29nt7eXs466yyuv/56APr7+1m0aBEf//jHW5LZaZK0W9hZUGitkMd0921IbONQJN0F7Fdl1w7/IDMzSTXzv0gaBxwG3F5W/GUCRTQcWAB8CfhmtePNbEFYh+7u7tbyzMTUSiq9/MvZunUr55xzDkuXLmXo0KE89thjgz5/V1cXo0eP5o9//CPPPfcc06ZNY/To0S3J7DRJ2i3sgRRaHsb5TJm/41z34FkTMkhsCsXMjq21T9JzksaZ2TOhwni+Vl1gNvBjM9tadu5S72aLpB8CX4xE6Hp0TKyRWyr6VtIll1zCvvvuy7Jly+jv72fEiBEA7LLLLtsTOwK8/vrrDZ3v7LPP5tprr+XZZ5/lrLPOilxepw4J/naqkrZCa5WS0ls2L5C5Y2KgTPKgDNuItExetwGnh+unAz8doO6pVJi7QiWEJBH4Xx6MQcadSTC31KZNmxg3bhxDhgzhhhtuYNu2bQB0dnayYsUKtmzZwsaNG1m8eHHV44cNG8bWrdt1MCeffDK//OUvuf/++zn++OMjl9epQ9p5yYpgMvJ095knLYVyEfAhSSuBY8NtJHVLuqpUSVIXMAH4VcXxvZKWA8uBMcAFCcicaG6pz3/+81x33XVMmTKFRx55hN133x2ACRMmMHv2bA499FBmz57NtGnTqh4/d+5cDj/8cHrCPOfDhw/n/e9/P7Nnz2bo0KGRy+vUIe28ZGkrNKct8PT1bZK+vr+/n+nTp3PLLbcwefLkyM/fLs8x16zpdZORMygaTV/vySHbgBUrVnDCCSdw8sknx6JMnJzgiTadmHGF0gYcfPDBrF69Om0xHMcpOJ7Li2AOdWfw+PNzHAdcoTBixAg2bNjgL8VBYmZs2LBhe1iz4zjtS9ubvMaPH8/atWtpeRR9GzNixAjGjx+fthiO46RM2yuUYcOGMWnSpLTFcBzHyT1tb/JyHMdxosEViuM4jhMJrlAcx3GcSGirkfKS1gNVMvTFzhjghRSu2whZlg2yLV+WZYNsy5dl2SDb8qUhW6eZja1Xqa0USlpI6mskbUEaZFk2yLZ8WZYNsi1flmWDbMuXZdnc5OU4juNEgisUx3EcJxJcoSTDgrQFGIAsywbZli/LskG25cuybJBt+TIrm/tQHMdxnEjwHo
rjOI4TCa5QHMdxnEhwhRIRkkZJulPSyvBzZJU675e0tGx5XdJJ4b5rJa0p2zc1SdnCetvKrn9bWfkkSb+TtErSTZKGRyVbo/JJmirpt5IekvQnSf+9bF/kz07STEmPhvd8XpX9u4bPYlX4bLrK9n05LH9U0vGtyjII2c6VtCJ8TosldZbtq/odJyzfGZLWl8lxdtm+08PfwUpJp6cg2yVlcj0maWPZvlifnaRrJD0v6cEa+yXp8lD2P0maXrYv1ufWMGbmSwQL8G3gvHD9PODiOvVHAS8CHeH2tcBH05QNeKVG+c3AnHD9SuBzScsH/DUwOVx/O/AMsHcczw4YCjwOHAAMB5YBB1fU+TxwZbg+B7gpXD84rL8rMCk8z9CEZXt/2e/qcyXZBvqOE5bvDOAHVY4dBawOP0eG6yOTlK2i/heAaxJ8du8BpgMP1tj/t8AvAAFHAr9L4rk1s3gPJTpmAdeF69cBJ9Wp/1HgF2a2OVapApqVbTuSBHwAuHUwxzdIXfnM7DEzWxmu/xl4Hqg7cneQHAGsMrPVZvYGsCiUsZbMtwIfDJ/VLGCRmW0xszXAqvB8iclmZkvKflf3AUnOLdDIs6vF8cCdZvaimb0E3AnMTFG2U4EbI7z+gJjZrwkambWYBVxvAfcBe0saR/zPrWFcoUTHvmb2TLj+LLBvnfpz2PnHOj/syl4iadcUZBshqU/SfSVTHDAa2Ghmb4bba4H9I5StGfkAkHQEQQvz8bLiKJ/d/sDTZdvV7nl7nfDZbCJ4Vo0cG7ds5XyKoFVbotp3HCWNyndK+H3dKmlCk8fGLRuhmXAScHdZcdzPrh615I/7uTVM28+H0gyS7gL2q7JrXvmGmZmkmvHYYaviMOD2suIvE7xMhxPEmX8J+GbCsnWa2TpJBwB3S1pO8KJsmYif3Q3A6WbWHxa39OyKiqSPA93Ae8uKd/qOzezx6meIjZ8BN5rZFkmfIejpfSBhGeoxB7jVzLaVlWXh2WUaVyhNYGbH1ton6TlJ48zsmfCl9/wAp5oN/NjMtpadu9RC3yLph8AXk5bNzNaFn6sl3QNMA/6NoGu9S9gSHw+sa0a2qOSTtBfwc2Be2OUvnbulZ1eFdcCEsu1q91yqs1bSLsDbgA0NHhu3bEg6lkBZv9fMtpTKa3zHUb4U68pnZhvKNq8i8KGVjn1fxbH3JClbGXOAvysvSODZ1aOW/HE/t4Zxk1d03AaUoitOB346QN2dbLPhi7TkszgJqBrpEZdskkaWTEWSxgDHACss8PotIfD51Dw+AfmGAz8msCHfWrEv6md3PzBZQXTbcIKXS2VUT7nMHwXuDp/VbcAcBVFgk4DJwO9blKcp2SRNA/4VONHMni8rr/odRyhbo/KNK9s8EXg4XL8dOC6UcyRwHDv24mOXLZTvnQTO7d+WlSXx7OpxG/DJMNrrSGBT2JiK+7k1ThqRAEVcCOzni4GVwF3AqLC8G7iqrF4XQYtiSMXxdwPLCV6GC4E9kpQNODq8/rLw81Nlxx9A8FJcBdwC7Jr0swM+DmwFlpYtU+N6dgQRNY8RtEDnhWXfJHhJA4wIn8Wq8NkcUHbsvPC4R4EPx/BbqyfbXcBzZc/ptnrfccLyXQg8FMqxBHhn2bFnhc90FXBm0rKF298ALqo4LvZnR9DIfCb8na8l8H99FvhsuF/AFaHsy4HupJ5bo4unXnEcx3EiwU1ejuM4TiS4QnEcx3EiwRWK4ziOEwmuUBzHcZxIcIXiOI7jRIIrFMeJCEmjy7LRPitpXdl2wxmaJZ0lqVpWAST9dwWZhPsVYUZqx4kCHynvOBFhwQjwqQCSvkGQnfa7gzjVWcAfCNLJVLKcYPDmNYMU03FiwxWK4yRAOEfF3xHkG7sXOIfAQvBDAiUkgjxkz4XbN0l6DTjCgsy4AJjZivB8icrvOI3gCsVxYkbSocDJwNFm9qakBQRpPx4HxpjZYWG9vc1so6QvAOeY2dL0pHac5nGF4jjxcyzwLq
Av7FnsRpBu/HbgHZIuJ0h6eUdqEjpOBLhCcZz4EcHMf/+00w7pcODDBOawU4C5CcvmOJHhUV6OEz93AbPDLLWlaLCJksYCMrNbgK8RTP8K8DKwZzqiOs7g8R6K48SMmS2XdD5wl6QhBNlkPwtsA64O0+4bwcRgEDjqr6rmlJf0MeASgumPb5fUZ2b/LcHbcZyaeLZhx3EcJxLc5OU4juNEgisUx3EcJxJcoTiO4ziR4ArFcRzHiQRXKI7jOE4kuEJxHMdxIsEViuM4jhMJ/xeWgnxJ7hIItwAAAABJRU5ErkJggg==\n",
"text/plain": [
"