{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 基础概念" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 均值回归局限\n", "\n", "只有满足古典假定,估计量才具有优良性质:BLUE\n", "\n", "<div align=center><img src=\"https://gitee.com/lei940324/picture/raw/master/img/2020/0619/205104.png\" width=\"683\" ></div>\n", "\n", "## 为什么需要分位数回归\n", "\n", "在迄今为止的同归模型中,我们着重考察解释变量 x 对被解释变量 y 的条件期望 E $(y | \\boldsymbol{x})$ 的 影响,实际上是均值回归。但我们真正关心的是 x 对整个条件分布 $y | x$ 的影响,而条件期望 $E(y| \\boldsymbol{x})$ 只是刻画条件分布 $y | \\boldsymbol{x}$ 集中趋势的一个指标而已。如果条件分布 $y | \\boldsymbol{x}$ 不是对称分布,则条件期望 E( $y | \\boldsymbol{x}$ )很难反映整个条件分布的全貌。如果能够估计出条件分布 $y | x$ 的若干重要的条件分位数,比如中位数、1/4 分位数 ,3/4 分位数,就能对条件分布 $y | \\boldsymbol{x}$ 有更全面的认识。另一方面, 使用 OLS 的古典“均值同归”,由于最小化的目标函数为残差平方和 $\\left(\\sum_{i=1}^{n} e_{i}^{2}\\right),$ 故容易受极端值的影响。 \n", "\n", "为此, Koenker and Bassett( 1978 ) 提出“分位数同归”(Quantile Regression,简记 QR ) ,使用残差绝对值的加权平均(比如 $\\sum_{i=1}^{n}\\left|e_{i}\\right|$ ) 作为最小化的目标函数,故不易受极端值影响,较为稳健。更重要的是,分位数回归还能提供关于条件分布 $y | \\boldsymbol{x}$ 的全面信息。\n", "\n", "<div align=center><img src=\"https://gitee.com/lei940324/picture/raw/master/img/2020/0619/205146.png\" width=\"750\" ></div>\n", "\n", "## 原理\n", "\n", "### 模型表示\n", "\n", "$Q_{t}(y | x)=x^{T} \\beta_{\\tau}$\n", "\n", "其中 $\\tau$ 为分位点, $\\beta_{\\tau}$ 为依赖于分位点的回归系数\n", "\n", "### 损失函数\n", "\n", "* 二次损失:生成均值 $\n", " E(Y)=\\underset{y}{\\operatorname{argmin}} E(Y-y)^{2}$\n", "* 绝对值损失:生成中位数 $A L(u)=|u|\n", " $\n", "* 非对称绝对值损失:生成分位数 $\\quad \\rho_{\\tau}(u)=u(\\tau-I(u<0)) \\quad Q_{\\tau}(Y)=\\underset{y}{\\operatorname{argmin}} E \\rho_{\\tau}(Y-y)$\n", "\n", "<div align=center><img src=\"https://gitee.com/lei940324/picture/raw/master/img/2020/0619/205807.png\" width=\"547\" ></div>\n", "\n", "### 估计方法\n", "\n", "由于分位数回归的目标函数带有绝对值,不可微分,故通常使用线性规划。\n", "\n", "详情可参考:[statsmodels 官方文档](https://www.statsmodels.org/stable/generated/statsmodels.regression.quantile_regression.QuantReg.html#statsmodels.regression.quantile_regression.QuantReg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 使用 statsmodels 库实现\n", "### 第一种方法" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table class=\"simpletable\">\n", "<caption>QuantReg Regression Results</caption>\n", "<tr>\n", " <th>Dep. Variable:</th> <td>data['hs300']</td> <th> Pseudo R-squared: </th> <td> 0.7878</td>\n", "</tr>\n", "<tr>\n", " <th>Model:</th> <td>QuantReg</td> <th> Bandwidth: </th> <td> 57.44</td>\n", "</tr>\n", "<tr>\n", " <th>Method:</th> <td>Least Squares</td> <th> Sparsity: </th> <td> 149.5</td>\n", "</tr>\n", "<tr>\n", " <th>Date:</th> <td>Sat, 11 Jul 2020</td> <th> No. Observations: </th> <td> 460</td> \n", "</tr>\n", "<tr>\n", " <th>Time:</th> <td>22:06:40</td> <th> Df Residuals: </th> <td> 458</td> \n", "</tr>\n", "<tr>\n", " <th> </th> <td> </td> <th> Df Model: </th> <td> 1</td> \n", "</tr>\n", "</table>\n", "<table class=\"simpletable\">\n", "<tr>\n", " <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n", "</tr>\n", "<tr>\n", " <th>Intercept</th> <td> -173.7703</td> <td> 45.949</td> <td> -3.782</td> <td> 0.000</td> <td> -264.066</td> <td> -83.474</td>\n", "</tr>\n", "<tr>\n", " <th>data['sz']</th> <td> 1.2845</td> <td> 0.016</td> <td> 82.112</td> <td> 0.000</td> <td> 1.254</td> <td> 1.315</td>\n", "</tr>\n", "</table><br/><br/>The condition number is large, 3.55e+04. This might indicate that there are<br/>strong multicollinearity or other numerical problems." ], "text/plain": [ "<class 'statsmodels.iolib.summary.Summary'>\n", "\"\"\"\n", " QuantReg Regression Results \n", "==============================================================================\n", "Dep. Variable: data['hs300'] Pseudo R-squared: 0.7878\n", "Model: QuantReg Bandwidth: 57.44\n", "Method: Least Squares Sparsity: 149.5\n", "Date: Sat, 11 Jul 2020 No. Observations: 460\n", "Time: 22:06:40 Df Residuals: 458\n", " Df Model: 1\n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -173.7703 45.949 -3.782 0.000 -264.066 -83.474\n", "data['sz'] 1.2845 0.016 82.112 0.000 1.254 1.315\n", "==============================================================================\n", "\n", "The condition number is large, 3.55e+04. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n", "\"\"\"" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.formula.api as smf\n", "\n", "data = pd.read_excel('../数据/上证指数与沪深300.xlsx')\n", "\n", "mod = smf.quantreg(\"data['hs300'] ~ data['sz']\", data)\n", "res = mod.fit(q=0.3)\n", "res.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 第二种方法" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table class=\"simpletable\">\n", "<caption>QuantReg Regression Results</caption>\n", "<tr>\n", " <th>Dep. Variable:</th> <td>hs300</td> <th> Pseudo R-squared: </th> <td> 0.7878</td>\n", "</tr>\n", "<tr>\n", " <th>Model:</th> <td>QuantReg</td> <th> Bandwidth: </th> <td> 57.44</td>\n", "</tr>\n", "<tr>\n", " <th>Method:</th> <td>Least Squares</td> <th> Sparsity: </th> <td> 149.5</td>\n", "</tr>\n", "<tr>\n", " <th>Date:</th> <td>Sat, 11 Jul 2020</td> <th> No. Observations: </th> <td> 460</td> \n", "</tr>\n", "<tr>\n", " <th>Time:</th> <td>22:06:40</td> <th> Df Residuals: </th> <td> 458</td> \n", "</tr>\n", "<tr>\n", " <th> </th> <td> </td> <th> Df Model: </th> <td> 1</td> \n", "</tr>\n", "</table>\n", "<table class=\"simpletable\">\n", "<tr>\n", " <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n", "</tr>\n", "<tr>\n", " <th>const</th> <td> -173.7703</td> <td> 45.949</td> <td> -3.782</td> <td> 0.000</td> <td> -264.066</td> <td> -83.474</td>\n", "</tr>\n", "<tr>\n", " <th>sz</th> <td> 1.2845</td> <td> 0.016</td> <td> 82.112</td> <td> 0.000</td> <td> 1.254</td> <td> 1.315</td>\n", "</tr>\n", "</table><br/><br/>The condition number is large, 3.55e+04. This might indicate that there are<br/>strong multicollinearity or other numerical problems." ], "text/plain": [ "<class 'statsmodels.iolib.summary.Summary'>\n", "\"\"\"\n", " QuantReg Regression Results \n", "==============================================================================\n", "Dep. Variable: hs300 Pseudo R-squared: 0.7878\n", "Model: QuantReg Bandwidth: 57.44\n", "Method: Least Squares Sparsity: 149.5\n", "Date: Sat, 11 Jul 2020 No. Observations: 460\n", "Time: 22:06:40 Df Residuals: 458\n", " Df Model: 1\n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -173.7703 45.949 -3.782 0.000 -264.066 -83.474\n", "sz 1.2845 0.016 82.112 0.000 1.254 1.315\n", "==============================================================================\n", "\n", "The condition number is large, 3.55e+04. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n", "\"\"\"" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statsmodels.regression.quantile_regression as qr\n", "import statsmodels.api as sm\n", " \n", "X = sm.add_constant(data['sz'])\n", "mod = qr.QuantReg(data['hs300'], X)\n", "res = mod.fit(q=0.3)\n", "res.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## matlab实现\n", "可以参考 matlab 代码:[分位数回归](https://github.com/lei940324/econometrics/tree/master/matlab代码/分位数回归/分布滞后模型回归)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }