{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Granger因果检验\n", "## 模型介绍\n", "经济学中常常需要确定因果关系究竟是从 x 到 y,还是从 y 到 x,亦或双向因果关系。格兰杰提出的检验方法基于以下思想。如果 x 是 y 的因,但 y 不是 x 的因,则 x 的过去值可以帮助预测 y 的未来值,但 y 的过去值却不能帮助预测 x 的未来值。考虑以下时间序列模型:\n", "\n", "$$y_{t}=\\gamma+\\sum_{m=1}^{p} \\alpha_{m} y_{t-m}+\\sum_{m=1}^{p} \\beta_{m} x_{t-m}+\\varepsilon_{t}\\tag{1}$$\n", "\n", "其中,滞后阶数 $p$ 可根据“信息准则”或“由大到小的序贯 $t$ 规则”来确定。检验原假设“ $H_{0}: \\beta_{1}=$ $=\\beta_{p}=0 \",$ 即 $x$ 的过去值对预测 $y$ 的未来值没有帮助。如果拒绝 $H_{0},$ 则称 $x$ 是 $y$ 的“格兰杰因\"( Granger cause)。将以上回归模型中 x 与 y 的位置互换,则可以检验 $y$ 是否为 $x$ 的格兰杰因。 在实际操作中,常将( $x, y$ )构 成一个二元 VAR 系统,然后在 VAR 的框架进行格兰杰因果检验。\n", "\n", "需要指出的是,格兰杰因果关系并非真正意义上的因果关系。它充其量只是一种动态相关关系,表明的是一个变量是否对另一变量有\"预测能力\"( predictability)。从某种意义上来说,它顶多是因果关系的必要条件(如果不考虑非线性的因果关系)。另外,格兰杰因果关系也可能由第三个变量所引起。在经济学的实证研究中,由于通常不可能进行\"控制实验\",能够最有说服力地说明因果关系的当属随机实验与自然实验。\n", "\n", "另外,格兰杰因果检验仅适用于平稳序列,或者有协整关系的单位根过程。对于不存在协整关系的单位根变量,则只能先差分,得到平稳序列后再进行格兰杰因果检验。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 原理讲解\n", "### wald 检验介绍\n", "自回归分布滞后模型做 wald 检验\n", "\n", "对于线性同归模型,检验原假设 $H_{0}: \\boldsymbol{\\beta}=\\boldsymbol{\\beta}_{0}$ 其中 $\\boldsymbol{\\beta}_{K \\times 1}$ 为未知参数 $, \\boldsymbol{\\beta}_{0}$ 已知,共有 $K$ 个约束。\n", "\n", "沃尔德检验:通过研究 $\\boldsymbol{\\beta}$ 的无约束估计量 $\\hat{\\boldsymbol{\\beta}}_{v}$ 与 $\\boldsymbol{\\beta}_{0}$ 的距离来进行检验。其基本思想是,如果 $H_{0}$ 正确,则 $\\left(\\hat{\\boldsymbol{\\beta}}_{v}-\\boldsymbol{\\beta}_{0}\\right)$ 的绝对值不应该很大。沃尔德统计量为:\n", "\n", "$$W \\equiv\\left(\\hat{\\boldsymbol{\\beta}}_{v}-\\boldsymbol{\\beta}_{0}\\right)^{\\prime}\\left[\\operatorname{Var}\\left(\\hat{\\boldsymbol{\\beta}}_{v}\\right)\\right]^{-1}\\left(\\hat{\\boldsymbol{\\beta}}_{v}-\\boldsymbol{\\beta}_{0}\\right) \\stackrel{d}{\\longrightarrow} \\chi^{2}(K)\\tag{2}$$\n", "\n", "其中,$K$ 为约束条件的个数(即解释变量的个数)。\n", "\n", "### Granger 因果检验\n", "$H_{0}: \\quad \\beta_{1}=\\beta_{2}=0$\n", "\n", "$H_{0}:\\left(\\begin{array}{c}\\beta_{1} \\\\ \\beta_{2}\\end{array}\\right)=\\underbrace{\\left(\\begin{array}{cccc}0 & 0 & 1 & 0 \\\\ 0 & 0 & 0 & 1\\end{array}\\right)}_{\\mathrm{R}}\\left(\\begin{array}{c}\\gamma \\\\ \\alpha_{1} \\\\ \\beta_{1} \\\\ \\beta_{2}\\end{array}\\right)=\\left(\\begin{array}{c}0 \\\\ 0\\end{array}\\right)$\n", "\n", "$H_0\\text{:}\\underset{m\\times K}{\\underbrace{R}}\\underset{K\\times 1}{\\underbrace{\\beta }}=\\underset{m\\times 1}{\\underbrace{r}}$\n", "\n", "根据沃尔德检验原理,由于 $\\hat{\\boldsymbol{\\beta}}$ 是 $\\beta$ 的估计量,故如果 $H_{0}$ 成立, 则 $(\\boldsymbol{R} \\hat{\\boldsymbol{\\beta}}-\\boldsymbol{r})$ 应比较接近 0 向量\n", "\n", "这种接近程度可用其二次型来衡量,比如\n", "\n", "$$(R \\hat{\\beta}-r)^{\\prime}[\\operatorname{Var}(R \\hat{\\beta}-r)]^{-1}(R \\hat{\\beta}-r)\\tag{3}$$\n", "\n", "其中,$(\\boldsymbol{R} \\hat{\\boldsymbol{\\beta}}-\\boldsymbol{r})$ 的协方差矩阵可写为\n", "\n", "$\n", "\\begin{aligned}\n", "\\text{Var}\\left( \\boldsymbol{R\\hat{\\beta}}-\\boldsymbol{r} \\right) &=\\text{Var}\\left( \\boldsymbol{R\\hat{\\beta}} \\right) \\,\\, \\left( \\,\\,\\text{去掉常数,方差不变} \\right)\\\\\n", "&=\\boldsymbol{R}\\text{Var}\\left( \\boldsymbol{\\hat{\\beta}} \\right) \\boldsymbol{R}^{\\prime}\\,\\, \\left( \\,\\,\\text{夹心估计量的公式} \\right)\\\\\n", "&=\\sigma ^2\\boldsymbol{R}\\left( \\boldsymbol{X}^{\\prime}\\boldsymbol{X} \\right) ^{-1}\\boldsymbol{R}^{\\prime}\\,\\,\\left( \\,\\,\\text{因}为 \\text{Var}\\left( \\boldsymbol{\\hat{\\beta}} \\right) =\\sigma ^2\\left( \\boldsymbol{X}^{\\prime}\\boldsymbol{X} \\right) ^{-1} \\right)\\\\\n", "\\end{aligned}\n", "$\n", "\n", "其中, $\\sigma^{2}$ 可由 $s^{2}$ 来估计。" ] }, { 
"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'<F test: F=array([[7.30924536]]), p=0.000751239087419008, df_denom=453, df_num=2>'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "import pandas as pd\n", "import statsmodels.api as sm\n", "\n", "data = pd.read_excel('../数据/上证指数与沪深300.xlsx')\n", "Y = data['hs300']\n", "X = data['sz']\n", "\n", "def lag_list(Y, X, p=1, q=1):\n", " '''\n", " 待估计方程:y = c + y(-1) +....+y(-p) + x(-1) + ... + x(-q)\n", " 获取自回归分布滞后模型的估计向量\n", "\n", " Parameters\n", " ----------\n", " Y : 被估计变量\n", " X : 估计变量\n", " p : ADL 模型 Y 的滞后阶数,标量默认为1\n", " q : ADL 模型 X 的滞后阶数,标量默认为1\n", "\n", " Returns\n", " -------\n", " ADLy : ADL 模型被解释变量\n", " ADLx : ADL 模型解释变量\n", "\n", " '''\n", " ADLx = pd.DataFrame()\n", " T = len(Y)\n", " ADLy = list(Y[max(p, q):T])\n", " for i in range(1, p+1):\n", " name = f'y_{i}'\n", " ADLx[name] = list(Y[max(p, q)-i:T-i])\n", " for i in range(1, q+1):\n", " name = f'x_{i}'\n", " ADLx[name] = list(X[max(p, q)-i:T-i])\n", " return ADLy, ADLx\n", "\n", "p = 2\n", "q = 2\n", "ADLy, ADLx = lag_list(Y, X ,p, q)\n", "ADLx = sm.add_constant(ADLx)\n", "mod = sm.OLS(ADLy, ADLx)\n", "res = mod.fit()\n", "# res.summary()\n", "\n", "wald = ''\n", "for i, value in enumerate(ADLx):\n", " if i > p:\n", " wald += f'{value}='\n", "wald = wald + '0'\n", "wald = str(res.f_test(wald))\n", "walds = float(re.findall('array\\(\\[\\[(.*?)\\]\\]', wald)[0])\n", "wald" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## matlab实现\n", "```matlab\n", "function [beta,wald,wald_P,AIC,p,q,cov_mat] = Granger(Y,X)\n", "\n", "%待估计方程:y = c + y(-1) +....+y(-p) + x(-1) + ... + x(-q)\n", "% 获取Granger因果检验wald值与p值\n", "% X - 解释变量,列向量\n", "% Y - 被解释变量,列向量\n", "% p - ADL模型Y的滞后阶数,标量\n", "% q - ADL模型X的滞后阶数,标量\n", "% wald - wald统计量\n", "% wald_P - wald统计量的P值\n", "% AIC - 最小AIC值\n", "\n", "% 1.计算各滞后阶的AIC值\n", "AIC = zeros(5,5);\n", "for p = 1:5\n", " for q = 1:5\n", " [ADLy,ADLx] = ADLxx(Y,X,p,q);\n", " [~,~,resid] = regress(ADLy,ADLx);\n", " T = length(ADLy);\n", " AIC(p,q) = log(resid'*resid/T)+2*(p+q+1)/T;\n", " end\n", "end\n", "\n", "% 2.找到最小的AIC值所对应的阶数\n", "[p,q] = find(AIC == min(min(AIC)));\n", "[ADLy,ADLx] = ADLxx(Y,X,p,q);\n", "[beta,~,resid] = regress(ADLy,ADLx);\n", " \n", "% 3.计算wald统计量\n", "R = [zeros(q,1+p),eye(q)];\n", "% 3.1 计算协方差矩阵\n", "T = length(ADLy);\n", "nu = T-(1+p+q);\n", "siga2 = sum(resid.^2)/nu;\n", "cov_mat = siga2*inv(ADLx'*ADLx);\n", "Rcov_mat = siga2*R*inv(ADLx'*ADLx)*R';\n", "% 3.2 计算wald\n", "wald = (R*beta)'*inv(Rcov_mat)*(R*beta);\n", "wald_P = 1 - chi2cdf(wald,q);\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }