{ "cells": [ { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "E:\\Anaconda\\lib\\site-packages\\statsmodels\\compat\\pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n", " from pandas import Int64Index as NumericIndex\n" ] } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import matplotlib as mpl\n", "import pandas as pd\n", "import sympy as sy\n", "import statsmodels.api as sm\n", "\n", "from sympy import init_printing\n", "init_printing() \n", "\n", "plt.style.use('ggplot')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This chapter focuses on the sampling distribution of the OLS estimator. We would like to know whether OLS provides unbiased, consistent and efficient estimates." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Unbiasedness of OLS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this chapter, we assume the regression model has the form\n", "\n", "$$\n", "\\boldsymbol{y} = \\boldsymbol{X\\beta} +\\boldsymbol{u}, \\quad \\boldsymbol{u}\\sim \\text{IID}(\\boldsymbol{0},\\sigma^2\\mathbf{I}) \n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The notation $\\text{IID}$ means **independently and identically distributed**, and $\\sigma^2\\mathbf{I}$ is the **covariance matrix** of the disturbances $\\boldsymbol{u}$. Because this matrix is diagonal, the disturbances are uncorrelated with one another. We also assume that all diagonal elements are equal to the same $\\sigma^2$, which is **homoscedasticity**.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If an estimator $\\hat{\\theta}$ of the true parameter $\\theta_0$ is unbiased, it satisfies the condition\n", "\n", "$$\n", "E(\\hat{\\theta}) -\\theta_0 = 0\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To show that the OLS estimator is unbiased, we denote by $\\boldsymbol{\\beta}_0$ the true parameter in the _data generating process_ (DGP); that is, we would like to show that $E(\\boldsymbol{\\hat{\\beta}})=\\boldsymbol{\\beta}_0$.\n", "\n", "To find the conditions that make OLS unbiased, we substitute the DGP into the OLS formula\n", "\n", "\\begin{align} \n", "\\hat{\\boldsymbol{\\beta}}&= (\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^T\\boldsymbol{y}\\\\\n", "&=(\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^T(\\boldsymbol{X}\\boldsymbol{\\beta}_0+\\boldsymbol{u})\\\\\n", "& = \\boldsymbol{\\beta}_0 + (\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^T\\boldsymbol{u}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we assume $\\boldsymbol{X}$ to be _nonstochastic_. Together with the assumption $E(\\boldsymbol{u}) = \\boldsymbol{0}$, we obtain" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\begin{align}\n", "E(\\boldsymbol{\\hat{\\beta}}) &= \\boldsymbol{\\beta}_0 + E\\left[(\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^T\\boldsymbol{u}\\right]\\\\\n", " &= \\boldsymbol{\\beta}_0 + (\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^TE(\\boldsymbol{u})\\\\\n", " &=\\boldsymbol{\\beta}_0\n", "\\end{align}\n", "However, the nonstochastic assumption is restrictive; it is plausible mostly for cross-section data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Second, we can assume $\\boldsymbol{X}$ is **exogenous**, i.e. $E(\\boldsymbol{u}|\\boldsymbol{X}) = \\boldsymbol{0}$, which also gives $E(\\boldsymbol{\\hat{\\beta}})=\\boldsymbol{\\beta}_0$. Loosely speaking, exogeneity means that the randomness of $\\boldsymbol{X}$ has nothing to do with $\\boldsymbol{u}$."
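] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exogeneity case can be sketched with the law of iterated expectations, a standard result that the text above does not spell out. Conditioning on $\\boldsymbol{X}$ lets $(\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^T$ be treated as fixed inside the inner expectation:\n", "\n", "\\begin{align}\n", "E(\\boldsymbol{\\hat{\\beta}}|\\boldsymbol{X}) &= \\boldsymbol{\\beta}_0 + (\\boldsymbol{X}^T\\boldsymbol{X})^{-1}\\boldsymbol{X}^TE(\\boldsymbol{u}|\\boldsymbol{X}) = \\boldsymbol{\\beta}_0\\\\\n", "E(\\boldsymbol{\\hat{\\beta}}) &= E\\big[E(\\boldsymbol{\\hat{\\beta}}|\\boldsymbol{X})\\big] = \\boldsymbol{\\beta}_0\n", "\\end{align}"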
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is an even weaker version of exogeneity, $E(u_t|\\boldsymbol{X}_t) = 0$, which only requires $u_t$ to have zero mean conditional on the regressors of the same observation. It does not rule out the possibility that $u_t$ is related to regressors from other periods, such as $X_{t-1}$ or $X_{t+1}$. It is called the **predeterminedness condition**, and it is suitable for time series data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, for time series data, OLS should be used with caution. It is well known that OLS is not suitable for $\\text{ARMA}$ models, because the regressors include lagged values of the dependent variable $\\boldsymbol{y}$.\n", "\n", "For $\\text{VAR}$ models, OLS is common practice, because a $\\text{VAR}$ does not distinguish between endogenous and exogenous variables." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "