{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 基础概念" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 均值回归局限\n", "\n", "只有满足古典假定,估计量才具有优良性质:BLUE\n", "\n", "
\n", "\n", "## 为什么需要分位数回归\n", "\n", "在迄今为止的同归模型中,我们着重考察解释变量 x 对被解释变量 y 的条件期望 E $(y | \\boldsymbol{x})$ 的 影响,实际上是均值回归。但我们真正关心的是 x 对整个条件分布 $y | x$ 的影响,而条件期望 $E(y| \\boldsymbol{x})$ 只是刻画条件分布 $y | \\boldsymbol{x}$ 集中趋势的一个指标而已。如果条件分布 $y | \\boldsymbol{x}$ 不是对称分布,则条件期望 E( $y | \\boldsymbol{x}$ )很难反映整个条件分布的全貌。如果能够估计出条件分布 $y | x$ 的若干重要的条件分位数,比如中位数、1/4 分位数 ,3/4 分位数,就能对条件分布 $y | \\boldsymbol{x}$ 有更全面的认识。另一方面, 使用 OLS 的古典“均值同归”,由于最小化的目标函数为残差平方和 $\\left(\\sum_{i=1}^{n} e_{i}^{2}\\right),$ 故容易受极端值的影响。 \n", "\n", "为此, Koenker and Bassett( 1978 ) 提出“分位数同归”(Quantile Regression,简记 QR ) ,使用残差绝对值的加权平均(比如 $\\sum_{i=1}^{n}\\left|e_{i}\\right|$ ) 作为最小化的目标函数,故不易受极端值影响,较为稳健。更重要的是,分位数回归还能提供关于条件分布 $y | \\boldsymbol{x}$ 的全面信息。\n", "\n", "
\n", "\n", "## 原理\n", "\n", "### 模型表示\n", "\n", "$Q_{t}(y | x)=x^{T} \\beta_{\\tau}$\n", "\n", "其中 $\\tau$ 为分位点, $\\beta_{\\tau}$ 为依赖于分位点的回归系数\n", "\n", "### 损失函数\n", "\n", "* 二次损失:生成均值 $\n", " E(Y)=\\underset{y}{\\operatorname{argmin}} E(Y-y)^{2}$\n", "* 绝对值损失:生成中位数 $A L(u)=|u|\n", " $\n", "* 非对称绝对值损失:生成分位数 $\\quad \\rho_{\\tau}(u)=u(\\tau-I(u<0)) \\quad Q_{\\tau}(Y)=\\underset{y}{\\operatorname{argmin}} E \\rho_{\\tau}(Y-y)$\n", "\n", "
\n", "\n", "### 估计方法\n", "\n", "由于分位数回归的目标函数带有绝对值,不可微分,故通常使用线性规划。\n", "\n", "详情可参考:[statsmodels 官方文档](https://www.statsmodels.org/stable/generated/statsmodels.regression.quantile_regression.QuantReg.html#statsmodels.regression.quantile_regression.QuantReg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 使用 statsmodels 库实现\n", "### 第一种方法" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
QuantReg Regression Results
Dep. Variable: data['hs300'] Pseudo R-squared: 0.7878
Model: QuantReg Bandwidth: 57.44
Method: Least Squares Sparsity: 149.5
Date: Sat, 11 Jul 2020 No. Observations: 460
Time: 22:06:40 Df Residuals: 458
Df Model: 1
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
Intercept -173.7703 45.949 -3.782 0.000 -264.066 -83.474
data['sz'] 1.2845 0.016 82.112 0.000 1.254 1.315


The condition number is large, 3.55e+04. This might indicate that there are
strong multicollinearity or other numerical problems." ], "text/plain": [ "\n", "\"\"\"\n", " QuantReg Regression Results \n", "==============================================================================\n", "Dep. Variable: data['hs300'] Pseudo R-squared: 0.7878\n", "Model: QuantReg Bandwidth: 57.44\n", "Method: Least Squares Sparsity: 149.5\n", "Date: Sat, 11 Jul 2020 No. Observations: 460\n", "Time: 22:06:40 Df Residuals: 458\n", " Df Model: 1\n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept -173.7703 45.949 -3.782 0.000 -264.066 -83.474\n", "data['sz'] 1.2845 0.016 82.112 0.000 1.254 1.315\n", "==============================================================================\n", "\n", "The condition number is large, 3.55e+04. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n", "\"\"\"" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.formula.api as smf\n", "\n", "data = pd.read_excel('../数据/上证指数与沪深300.xlsx')\n", "\n", "mod = smf.quantreg(\"data['hs300'] ~ data['sz']\", data)\n", "res = mod.fit(q=0.3)\n", "res.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 第二种方法" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
QuantReg Regression Results
Dep. Variable: hs300 Pseudo R-squared: 0.7878
Model: QuantReg Bandwidth: 57.44
Method: Least Squares Sparsity: 149.5
Date: Sat, 11 Jul 2020 No. Observations: 460
Time: 22:06:40 Df Residuals: 458
Df Model: 1
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err t P>|t| [0.025 0.975]
const -173.7703 45.949 -3.782 0.000 -264.066 -83.474
sz 1.2845 0.016 82.112 0.000 1.254 1.315


The condition number is large, 3.55e+04. This might indicate that there are
strong multicollinearity or other numerical problems." ], "text/plain": [ "\n", "\"\"\"\n", " QuantReg Regression Results \n", "==============================================================================\n", "Dep. Variable: hs300 Pseudo R-squared: 0.7878\n", "Model: QuantReg Bandwidth: 57.44\n", "Method: Least Squares Sparsity: 149.5\n", "Date: Sat, 11 Jul 2020 No. Observations: 460\n", "Time: 22:06:40 Df Residuals: 458\n", " Df Model: 1\n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -173.7703 45.949 -3.782 0.000 -264.066 -83.474\n", "sz 1.2845 0.016 82.112 0.000 1.254 1.315\n", "==============================================================================\n", "\n", "The condition number is large, 3.55e+04. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n", "\"\"\"" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statsmodels.regression.quantile_regression as qr\n", "import statsmodels.api as sm\n", " \n", "X = sm.add_constant(data['sz'])\n", "mod = qr.QuantReg(data['hs300'], X)\n", "res = mod.fit(q=0.3)\n", "res.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## matlab实现\n", "可以参考 matlab 代码:[分位数回归](https://github.com/lei940324/econometrics/tree/master/matlab代码/分位数回归/分布滞后模型回归)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }