{ "cells": [ { "cell_type": "markdown", "id": "7206877b", "metadata": { "kernel": "SoS" }, "source": [ "# Causal Graphs and Omitted Variables\n", "\n", "The issue of causality creates a large, but not necessarily insurmountable, gap between theoretical and empirical work.\n", "\n", "## Structural Models\n", "\n", "Consider a set of variables $\\{Y,X\\}$ for $Y$ a vector of length $N$ and $X$ a matrix with dimension $N \\times K$. Let the vector $U$ denote random, unobservable noise with mean zero. The unit of observation is indexed by $i$. A **structural model** specifies a function $g()$ such that\n", "\\begin{equation}\n", " g(Y,X,U|\\theta)=0. \\label{structural}\n", "\\end{equation}\n", "\n", "If there is a unique solution for $Y$ given $\\{X,U\\}$, we can write\n", "\\begin{equation}\n", " Y = h(X,U|\\beta) \\label{reduced}\n", "\\end{equation}\n", "where $h()$ is referred to as the **reduced form** of $g()$. The reduced form parameters, $\\beta$, are functions of $\\theta$.\n", "\n", "A theoretical model will yield either $\\eqref{structural}$ or $\\eqref{reduced}$. If the random, unobservable noise $U$ is additively separable from $X$ under $h()$ then we can write the reduced form equation as\n", "\\begin{equation}\n", " Y = f(X|\\beta) + U. \\label{reduced2}\n", "\\end{equation}\n", "When $X$ and $U$ are uncorrelated, $\\beta$ can be estimated econometrically. When $X$ and $U$ are correlated, $X$ is **endogenous** with $U$ and parameters $\\beta$ cannot necessarily be consistently estimated. Linear regression, for instance, will always fail to produce consistent estimates. Later sections of this chapter will discuss special cases in which a consistent estimate for $\\beta$ can be achieved despite the correlation between $X$ and $U$.\n", "\n", "Many theory models describe an error process such that the reduced form can be written as in $\\eqref{reduced2}$. The simplest form of $f()$ is a linear function such that the reduced form can be written as\n", "\\begin{equation}\n", " Y = X \\beta + U. \\label{ols}\n", "\\end{equation}\n", "\n", "A model is **identified** if there is a unique parameter vector $\\theta$ that satisfies equation $\\eqref{structural}$. Some empiricists have co-opted the word \"identification\" to mean something about causality, a problem pointed out by [Kahn and Whited](https://academic.oup.com/rcfs/article/7/1/1/4590088). As an example of an identification problem, suppose that the structural model $\\eqref{structural}$ has the reduced form given in $\\eqref{ols}$. If $X$ is not of full rank, an infinite number of $\\beta$ vectors can be specified to satisfy the reduced form equation. Instead, if $X$ has full rank, the vector $\\beta$ is identified (has a unique value)." ] }, { "cell_type": "markdown", "id": "02e7aac2", "metadata": { "kernel": "SoS" }, "source": [ "Any model has a graphical representation. See [Pearl (2009)](http://bayes.cs.ucla.edu/BOOK-2K/) for a treatise on causal graphical modeling." 
] }, { "cell_type": "code", "execution_count": 25, "id": "6a7acef7", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "%3\r\n", "\r\n", "\r\n", "x\r\n", "\r\n", "x\r\n", "\r\n", "\r\n", "y\r\n", "\r\n", "y\r\n", "\r\n", "\r\n", "x->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "u\r\n", "\r\n", "u\r\n", "\r\n", "\r\n", "u->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from causalgraphicalmodels import CausalGraphicalModel\n", "ols = CausalGraphicalModel(\n", " nodes=[\"x\", \"y\", \"u\"],\n", " edges=[\n", " (\"x\", \"y\"), \n", " (\"u\", \"y\")\n", " ]\n", ")\n", "ols.draw() # str(graphviz.Source(ols.draw().source)) yields the raw graph data" ] }, { "cell_type": "markdown", "id": "fa771d91", "metadata": { "kernel": "SoS" }, "source": [ "### Simultaneity\n", "\n", "Suppose that a model \"exogenizes\" (assumes as given) covariate $X^{(j)}$ (the $j^{th}$ column of $X$) when, in reality, $X^{(j)}$ and $Y$ are jointly determined. Under a linear reduced form with separable errors, joint determination implies a set of simultaneous equations such as\n", "\\begin{equation}\n", " Y = X^{(j)} \\beta^{(j)} + X^{(-j)} \\beta^{(-j)} + U \\label{sem1}\n", "\\end{equation}\n", "and\n", "\\begin{equation}\n", " X^{(j)} = Y \\phi + X^{(-j)} \\varphi + U' \\label{sem2}\n", "\\end{equation}\n", "where ${(-j)}$ indicates removal of column/element $j$ and $U'$ is random, mean zero noise that is independent of $U$. Insert $\\eqref{sem1}$ into $\\eqref{sem2}$ and to solve for $X^{(j)}$:\n", "\\begin{equation}\n", " X^{(j)} = X^{(-j)} \\frac{\\varphi + \\phi\\beta^{(-j)}}{1-\\beta^{(j)}\\phi} + V \\frac{1}{1-\\beta^{(j)}\\phi} + \\frac{\\phi}{1-\\beta^{(j)}\\phi} + U. \\label{sem3}\n", "\\end{equation}\n", "Inserting $\\eqref{sem3}$ back in to $\\eqref{sem1}$ reveals a problem: $\\beta^{(j)}$ cannot be consistently estimated because $X^{(j)}$ is correlated with $U$." ] }, { "cell_type": "code", "execution_count": 2, "id": "a9f48653", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "G\r\n", "\r\n", "\r\n", "x\r\n", "\r\n", "x\r\n", "\r\n", "\r\n", "y\r\n", "\r\n", "y\r\n", "\r\n", "\r\n", "x->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "y->x\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "u\r\n", "\r\n", "u\r\n", "\r\n", "\r\n", "u->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import graphviz\n", "simultaneity = graphviz.Digraph('G')\n", "simultaneity.edge('x','y')\n", "simultaneity.edge('u','y')\n", "simultaneity.edge('y','x')\n", "simultaneity" ] }, { "cell_type": "markdown", "id": "3833c25e", "metadata": { "kernel": "Python3" }, "source": [ "### Reverse Causality\n", "\n", "Consider a theoretical model that implies a reduced form equation as in $\\eqref{ols}$. If that equation has no issues with endogeneity (see below), parameters $\\beta$ will have a causal intepretation. However, an empiricist who foolishly estimates the model\n", "\\begin{equation}\n", " X^{(j)} = Y\\phi + X^{(-j)}\\varphi + V\n", "\\end{equation}\n", "will produce parameter estimates that are econometrically consistent but not economically causal. This is a reverse causality problem. 
If $\\beta^{(j)}$ is the causal effect of $X^{(j)}$ on $Y$ then $\\phi$ cannot be the causal effect of $Y$ on $X^{(j)}$." ] }, { "cell_type": "code", "execution_count": 3, "id": "5ea4b2c6", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "G\r\n", "\r\n", "\r\n", "y\r\n", "\r\n", "y\r\n", "\r\n", "\r\n", "x\r\n", "\r\n", "x\r\n", "\r\n", "\r\n", "y->x\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "u\r\n", "\r\n", "u\r\n", "\r\n", "\r\n", "u->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reverse = graphviz.Digraph('G')\n", "reverse.edge('y','x')\n", "reverse.edge('u','y')\n", "reverse" ] }, { "cell_type": "markdown", "id": "21892374", "metadata": { "kernel": "SoS" }, "source": [ "## Endogeneity\n", "\n", "A theorist carefully specifies a set of assumptions about the world, such as the utility function of an agent in a principal-agent problem. Under these assumptions, a maximization problem is solved given a set of constraints. As stated above, the theory model will specify an equation that looks something like $\\eqref{structural}$ or $\\eqref{reduced}$. On occasion, that theory will yield a reduced form that simplifies to $\\eqref{ols}$. The equation in $\\eqref{ols}$ is special because it describes a linear relationship between $Y$ and $X$ that can be estimated by Ordinary Least Squares (OLS) regression.\n", "\n", "However, even if a theory posits a reduced form that looks like $Y = X\\beta + U$, this *does not* give the empiricist carte blanche to run a linear regression on data $\\{Y, X\\}$ and interpret $\\beta$ in a *causal* manner. That is, an empiricist cannot run OLS and interpret $\\beta^{(j)} = \\partial Y / \\partial X^{(j)}$ as indicating that a one unit increase in covariate $j$ *causes* a $\\beta^{(j)}$ increase in $Y$. Why? The reduced form equation is only correct if the theory is correct. Theories are rigorous approximations of reality, but they are only approximations. Any theoretical paper will impose assumptions that may not hold in practice (note that some of the assumptions may be implicit). Often, real-world data $X$ is not additively separable from $U$ because the theory excludes certain elements of reality. This critique *does not* imply that theories are not useful. A good theory model can reveal incredibly powerful insights about the real world." ] }, { "cell_type": "markdown", "id": "5e8d69a5", "metadata": { "kernel": "SoS" }, "source": [ "### Omitted Variable Bias\n", "\n", "An empiricist uses data $\\{Y,X\\}$ observed in the real world. Theoretical assumptions may rule out certain features of reality. Denote these \"omitted features\" as data $Z$. The real-world data generating process for $Y$ may be\n", "\\begin{equation}\n", " Y = X \\beta + Z \\gamma + U \\label{truedgp}\n", "\\end{equation}\n", "which can be written as\n", "\\begin{equation}\n", " Y = X \\beta + V. \\label{endog}\n", "\\end{equation}\n", "An empiricist might regress $Y$ on $X$ following a theoretical description of a relationship that looks like $\\eqref{ols}$. However, if the actual relationship is given by $\\eqref{endog}$, the estimated $\\beta$ parameters will be incorrect if $X$ and $Z$ are correlated because this implies that $X$ and $V$ are correlated.\n", "\n", "Often, an empiricist is concerned about omitted variable bias for a specific right hand side variable. 
Suppose that we partition $X$ from $\eqref{endog}$ into components $\{D,W\}$ where $D$ is a vector of data for a variable of interest and $W$ is a matrix containing the remaining columns of $X$. Assuming the data generating process in $\eqref{truedgp}$, the desired regression is\n", "\begin{equation}\n", " Y = \tau D + W \pi + Z \gamma + U \label{ovb1}\n", "\end{equation}\n", "and the regression available to the empiricist is\n", "\begin{equation}\n", " Y = \tau D + W \pi + V. \label{ovb2}\n", "\end{equation}\n", "\n", "Graphically, $Z$ shares a correlation with $D$ and also has a causal effect on $Y$. There are three possibilities, drawn below: $Z$ causes $D$, the two share an unobserved confounder $E$, or $D$ causes $Z$." ] }, { "cell_type": "code", "execution_count": 4, "id": "33aa4adb", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "text/plain": [ "<figure: three causal graphs -- case 1: z -> d; case 2: e -> z and e -> d; case 3: d -> z; in every case d, w, z, and u point into y>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ovb = graphviz.Digraph('G')\n", "with ovb.subgraph(name='case 1') as a:\n", " a.node('u1', _attributes={'style': 'dashed'})\n", " a.node('z1', _attributes={'style': 'dashed'})\n", " a.edge('d1', 'y1')\n", " a.edge('w1', 'y1')\n", " a.edge('u1', 'y1', _attributes={'style': 'dashed'})\n", " a.edge('z1', 'd1', _attributes={'style': 'dashed'})\n", " a.edge('z1', 'y1', _attributes={'style': 'dashed'})\n", "with ovb.subgraph(name='case 2') as b:\n", " b.node('z2', _attributes={'style': 'dashed'})\n", " b.node('u2', _attributes={'style': 'dashed'})\n", " b.node('e2', _attributes={'style': 'dashed'})\n", " b.edge('d2', 'y2')\n", " b.edge('w2', 'y2')\n", " b.edge('u2', 'y2', _attributes={'style': 'dashed'})\n", " b.edge('e2', 'z2', _attributes={'style': 'dashed'})\n", " b.edge('e2', 'd2', _attributes={'style': 'dashed'})\n", " b.edge('z2', 'y2', _attributes={'style': 'dashed'})\n", "with ovb.subgraph(name='case 3') as c:\n", " c.node('z3', _attributes={'style': 'dashed'})\n", " c.node('u3', _attributes={'style': 'dashed'})\n", " c.edge('d3', 'y3')\n",
" c.edge('w3', 'y3')\n", " c.edge('u3', 'y3', _attributes={'style': 'dashed'})\n", " c.edge('d3', 'z3', _attributes={'style': 'dashed'})\n", " c.edge('z3', 'y3', _attributes={'style': 'dashed'})\n", "ovb" ] }, { "cell_type": "markdown", "id": "486298d1", "metadata": { "kernel": "Python3" }, "source": [ "In the first scenario, there is a backdoor path from $D$ to $Y$. The path is $D \leftarrow Z \rightarrow Y$. Likewise in scenario two, there is a backdoor path $D \leftarrow E \rightarrow Z \rightarrow Y$. Under these circumstances, it is impossible to estimate the causal effect of $D$ on $Y$ without observing $Z$ (or, in scenario two, $E$)." ] }, { "cell_type": "code", "execution_count": 5, "id": "5d6f135b", "metadata": { "kernel": "Python3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['d', 'z', 'y']]\n", "[['d', 'e', 'z', 'y']]\n", "[]\n" ] } ], "source": [ "ovb1 = CausalGraphicalModel(\n", " nodes=[\"d\", \"y\", \"w\", \"u\", \"z\"],\n", " edges=[\n", " ('u','y'),\n", " ('z','y'),\n", " ('z','d'),\n", " ('d','y'),\n", " ('w','y')\n", " ]\n", ")\n", "ovb2 = CausalGraphicalModel(\n", " nodes=[\"d\", \"y\", \"w\", \"u\", \"z\", \"e\"],\n", " edges=[\n", " ('e','z'),\n", " ('e','d'),\n", " ('u','y'),\n", " ('z','y'),\n", " ('d','y'),\n", " ('w','y')\n", " ]\n", ")\n", "ovb3 = CausalGraphicalModel(\n", " nodes=[\"d\", \"y\", \"w\", \"u\", \"z\"],\n", " edges=[\n", " ('d','z'),\n", " ('d','y'),\n", " ('z','y'),\n", " ('u','y'),\n", " ('w','y')\n", " ]\n", ")\n", "print(ovb1.get_all_backdoor_paths(\"d\", \"y\"))\n", "print(ovb2.get_all_backdoor_paths(\"d\", \"y\"))\n", "print(ovb3.get_all_backdoor_paths(\"d\", \"y\"))" ] }, { "cell_type": "markdown", "id": "cba95afe", "metadata": { "kernel": "Python3" }, "source": [ "In the third scenario, there is no backdoor path. We can estimate the causal *total* effect of $D$ on $Y$. However, without observing the path $D \rightarrow Z \rightarrow Y$ we cannot estimate the moderating effect of $Z$. This scenario satisfies the \"classic\" endogeneity critique in economics: (1) $Z$ is unobserved, (2) $Z$ affects $Y$, (3) $Z$ is correlated with $D$. Yet, omission of $Z$ as a regressor *does not* imply that the $\tau$ coefficient estimated in $\eqref{ovb2}$ is biased.\n", "\n", "The problem with the third scenario is that there is a threat to the **external validity** of the estimated $\tau$. The $\tau$ estimated from model $\eqref{ovb2}$ yields the average total effect of $D$ conditional on an unobserved $Z$. Suppose $\tau > 0$ and that the channel through $Z$ dampens the effect of $D$, meaning that $\gamma < 0$ in the true model $\eqref{ovb1}$. If our estimation sample happens to have data points with low values of $Z$, the estimated $\tau$ coefficient from model $\eqref{ovb2}$ will capture the total effect of $D$ under very limited moderation. Thus, the $\tau$ estimate will be close to the true $\tau$ in $\eqref{ovb1}$. However, we could not safely extrapolate our measured effect of $D$ on $Y$ to data in which $Z$ is, on average, larger in absolute magnitude than it is in the estimation sample. Our extrapolation would overstate predictions for $Y$ because we would not have correctly measured the moderating power of $Z$.\n", "\n", "An interesting application of this third form of omitted variable bias is in the context of spillover effects, as shown below. Suppose that firms 1 and 2 are each potentially exposed to a treatment. The firm-level treatment status is captured in $x$.
The group-level treatment, $z$, is dependent upon the treatment statuses of each firm. If there are spillover effects of treatment (to either treated or to untreated units), a path from $x$ to $y$ via $z$ exists. Empirical analysis that ignores $z$ will estimate the *total* causal effect of $x$ on $y$ subject to the average level of spillovers within the sample (Hudgens & Halloran 2008). Under such a model, the causal estimate would have limited external validity because it would only be applicable to contexts with a similar group-level treatment intensity." ] }, { "cell_type": "code", "execution_count": 6, "id": "2147af51", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "text/plain": [ "<figure: causal graph -- x1 -> y1, u1 -> y1, x2 -> y2, u2 -> y2, with x1 -> z, x2 -> z, z -> y1, and z -> y2>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spillovers = graphviz.Digraph('G')\n", "spillovers.node('z', _attributes={'style': 'dashed'})\n", "with spillovers.subgraph(name='firm 1') as a:\n", " a.edge('u1', 'y1', _attributes={'style': 'dashed'})\n", " a.edge('x1', 'y1')\n", "with spillovers.subgraph(name='firm 2') as b:\n", " b.edge('x2', 'y2')\n", " b.edge('u2', 'y2', _attributes={'style': 'dashed'})\n", "spillovers.edge('x1', 'z', _attributes={'style': 'dashed'})\n", "spillovers.edge('z', 'y1', _attributes={'style': 'dashed'})\n", "spillovers.edge('x2', 'z', _attributes={'style': 'dashed'})\n", "spillovers.edge('z', 'y2', _attributes={'style': 'dashed'})\n", "spillovers.node('u1', _attributes={'style': 'dashed'})\n", "spillovers.node('u2', _attributes={'style': 'dashed'})\n", "spillovers" ] }, { "cell_type": "markdown", "id": "63a17ca0", "metadata": { "kernel": "Python3" }, "source": [ "**One Omitted Variable (extremely unlikely)**\n", "\n", "For the moment, assume that $Z$ is a vector (a single omitted variable). Let $\hat{\tau}_{\text{res}}$ be the estimate of $\tau$ from $\eqref{ovb2}$, $\hat{\tau}$ be the (hypothetical) estimate of $\tau$ from $\eqref{ovb1}$, $\hat{\gamma}$ be the (hypothetical) estimate of $\gamma$ from $\eqref{ovb1}$, and $\hat{\varphi}$ be the estimated coefficient on $D$ in the (hypothetical) regression\n", "\begin{equation}\n", " Z = \varphi D + W \phi + E \label{ovb_aux}\n", "\end{equation}\n", "where $E$ is random, mean zero noise. The omitted variable bias formula for $\tau$ is given by\n", "\begin{equation}\n", " \hat{\tau}_{\text{res}} = \hat{\tau} + \hat{\gamma}\hat{\varphi}. \label{ovb}\n", "\end{equation}\n", "To assess the severity of omitted variable bias on the estimate $\hat{\tau}_{\text{res}}$, one could apply economic arguments about expected magnitudes of $\hat{\gamma}$ and $\hat{\varphi}$. This approach is limited for a number of reasons.
One obvious issue is that $Z$ is assumed to be a single omitted factor; realistically, any empirical model will omit many unobservable features. Another important limitation of the utility of $\eqref{ovb}$ is that parameter estimates scale inversely with the scale of the data. For instance, if one multiplies $D$ by $10$ and re-runs $\eqref{ovb_aux}$ the estimate of $\hat{\varphi}$ will be $0.1$ times the size; omitted variable bias does not \"shrink\" away by manipulating the scale of the data because estimates for $\tau$ will also be $0.1$ times as large. The dependence of parameter scale on the scale of the data means that it is sometimes difficult to argue what a \"reasonable\" parameter size might be.\n", "\n", "There are $R^2$-based approaches to omitted variable bias analysis that are invariant to parameter scale and to the dimension of the unobserved variable space. These are more promising avenues for OVB diagnostics." ] }, { "cell_type": "markdown", "id": "40266741", "metadata": { "kernel": "SoS" }, "source": [ "#### Oster's $\delta$\n", "\n", "[Emily Oster](https://emilyoster.net/) has some interesting ideas about omitted variable bias in her [JBES 2017](https://www.tandfonline.com/doi/abs/10.1080/07350015.2016.1227711) article. Define $\hat{R}^2_{res}$ to be the $R^2$ from $\eqref{ovb2}$, $\hat{R}^2$ to be the hypothetical $R^2$ from $\eqref{ovb1}$, and $\{\hat{\tau}_0, \hat{R}^2_0\}$ to be the parameter estimate and $R^2$ from the model:\n", "\begin{equation}\n", " Y = \tau D + O \label{ovb4}\n", "\end{equation}\n", "Note that $\hat{R}^2$ can be less than one if there is measurement error in $Y$.\n", "\n", "There is always a $\delta$ that satisfies\n", "\begin{equation}\n", " \delta \frac{cov(D,W\pi)}{var(W\pi)} = \frac{cov(D, Z\gamma)}{var(Z\gamma)} \label{oster}\n", "\end{equation}\n", "even in the case of $Z$ being a matrix (multiple omitted variables).\n", "\n", "A consistent estimator for $\tau$ is given by\n", "\begin{equation}\n", " \hat{\tau} = \hat{\tau}_{\text{res}} - \frac{\delta(\hat{\tau}_0 - \hat{\tau}_{\text{res}})(\hat{R}^2 - \hat{R}^2_{res})}{\hat{R}^2_{res}-\hat{R}^2_0} \label{oster2}\n", "\end{equation}\n", "\n", "Using either an assumed value of $\hat{R}^2=1$ or arguments to bound $\hat{R}^2$ to some value below $1$, one can then solve for the value of $\delta$ that would render $\hat{\tau}$ equal to $0$. A $\delta$ of 2, for instance, implies that the omitted data $Z$ needs to be twice as important as the data in $W$ for $\hat{\tau}$ to be $0$. Given data limitations, the omitted variables one can imagine may or may not satisfy that threshold, and this point needs to be argued economically (rather than statistically). Extremely large values for $\delta$ (e.g. 100?) should leave one fairly comfortable that omitted variable bias is not artificially producing statistically significant results for $\hat{\tau}$ as estimated by $\eqref{ovb2}$. Note that the summary of Oster's paper here simplifies and skips over some details for the sake of brevity and presents only the crux of the argument. Code to conduct analysis as done in [Oster (2017)](https://www.tandfonline.com/doi/abs/10.1080/07350015.2016.1227711) is available in her accompanying Stata package: *psacalc*.
The package is available via SSC, meaning it can be installed with the command" ] }, { "cell_type": "code", "execution_count": 7, "id": "076eec8e", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "checking psacalc consistency and verifying not already installed...\n", "all files already exist and are up to date.\n" ] } ], "source": [ "ssc install psacalc" ] }, { "cell_type": "markdown", "id": "3a9486ff", "metadata": { "kernel": "Stata" }, "source": [ "Using a classic example data set (car prices), we can check for omitted variable bias.\n", "\n", "Regress car price on an indicator for whether the car is foreign in origin, the car's fuel efficiency, weight, headroom, and trunk space assuming the reduced form model in $\\eqref{ols}$." ] }, { "cell_type": "code", "execution_count": 8, "id": "5b7b2e5f", "metadata": { "kernel": "Stata", "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "(1978 automobile data)\n", "\n", "\n", " Source | SS df MS Number of obs = 74\n", "-------------+---------------------------------- F(5, 68) = 15.07\n", " Model | 333779438 5 66755887.5 Prob > F = 0.0000\n", " Residual | 301285958 68 4430675.86 R-squared = 0.5256\n", "-------------+---------------------------------- Adj R-squared = 0.4907\n", " Total | 635065396 73 8699525.97 Root MSE = 2104.9\n", "\n", "------------------------------------------------------------------------------\n", " price | Coefficient Std. err. t P>|t| [95% conf. interval]\n", "-------------+----------------------------------------------------------------\n", " foreign | 3654.777 677.486 5.39 0.000 2302.875 5006.679\n", " mpg | 14.4324 73.55266 0.20 0.845 -132.3397 161.2044\n", " weight | 3.78137 .677797 5.58 0.000 2.428848 5.133893\n", " headroom | -615.6944 390.0197 -1.58 0.119 -1393.967 162.5776\n", " trunk | -11.80202 91.57615 -0.13 0.898 -194.5394 170.9353\n", " _cons | -4641.084 3394.703 -1.37 0.176 -11415.11 2132.94\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "sysuse auto.dta, clear\n", "reg price foreign mpg weight headroom trunk" ] }, { "cell_type": "markdown", "id": "8cf3166e", "metadata": { "kernel": "Stata" }, "source": [ "Note that the coefficient on `weight` is statistically different from zero. Use the `psacalc` command to check for omitted variable bias." ] }, { "cell_type": "code", "execution_count": 9, "id": "30b6d90b", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " ---- Bound Estimate ----\n", "-------------+----------------------------------------------------------------\n", "delta | 0.30310\n", "-------------+----------------------------------------------------------------\n", "\n", " ---- Inputs from Regressions ----\n", " | Coeff. R-Squared\n", "-------------+----------------------------------------------------------------\n", "Uncontrolled | 2.04406 0.290\n", "Controlled | 3.78137 0.526\n", "-------------+----------------------------------------------------------------\n", "\n", " ---- Other Inputs ----\n", "-------------+----------------------------------------------------------------\n", "R_max | 1.000\n", "Beta | 0.000000\n", "Unr. 
Controls| \n", "-------------+----------------------------------------------------------------\n" ] } ], "source": [ "psacalc delta weight" ] }, { "cell_type": "markdown", "id": "13b9cb8e", "metadata": { "kernel": "Stata" }, "source": [ "The $\delta$ estimate of $.30$ implies that omitted variables need only be $30\%$ as important as the included variables to potentially explain away the estimated coefficient on `weight`." ] }, { "cell_type": "markdown", "id": "67a9f797", "metadata": { "kernel": "Stata" }, "source": [ "**Cinelli and Hazlett**\n", "\n", "Cinelli and Hazlett derive a function for the relative bias (that is, the omitted variable bias scaled by the observed estimate):\n", "\begin{equation}\n", " \Biggl|\frac{\hat{\gamma}\hat{\varphi}}{\hat{\tau}_{\text{res}}}\Biggr| = f(R^2_{Y\sim Z|D,W},R^2_{D\sim Z|W},R^2_{Y\sim D|W}). \label{relativebias}\n", "\end{equation}\n", "The first two inputs to the function in $\eqref{relativebias}$, the partial $R^2$ of regressing $Y$ on $Z$ controlling for $\{D,W\}$ and the partial $R^2$ of regressing $D$ on $Z$ controlling for $W$, are hypothetical. The third input, the partial $R^2$ of regressing $Y$ on $D$ controlling for $W$, is calculable. An analysis of the two hypothetical inputs is made easy with `sensemakr`. Begin by loading the package, installing it if necessary." ] }, { "cell_type": "code", "execution_count": 10, "id": "a69d7097", "metadata": { "kernel": "R" }, "outputs": [], "source": [ "if(!suppressMessages(require(\"sensemakr\"))) {\n", " install.packages(\"sensemakr\")\n", " suppressMessages(library(sensemakr))\n", "}" ] }, { "cell_type": "markdown", "id": "be2ea7dd", "metadata": { "kernel": "R" }, "source": [ "The example data in [Cinelli and Hazlett (2020)](https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12348) deals with the genocide in Darfur. In their data, variable $D$ is an indicator for whether an individual was directly harmed by the violence in Sudan. Variable $Y$ is that individual's surveyed preference for peace. Variables in $W$ include such things as the individual's age and occupation. There is a concern that omitted variable bias affects the estimation of $\tau$. The following code loads the data and runs the regression specified in $\eqref{ovb2}$." ] }, { "cell_type": "code", "execution_count": 11, "id": "9c6a065b", "metadata": { "kernel": "R" }, "outputs": [], "source": [ "# load the data\n", "data('darfur')\n", "# run regression\n", "model <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +\n", " pastvoted + hhsize_darfur + female + village, data = darfur)" ] }, { "cell_type": "markdown", "id": "7598745e", "metadata": { "kernel": "R" }, "source": [ "Now pass the model's estimates to the `sensemakr` package for sensitivity analysis." ] }, { "cell_type": "code", "execution_count": 12, "id": "5a64ffe2", "metadata": { "kernel": "R" }, "outputs": [], "source": [ "# runs sensemakr for sensitivity analysis\n", "sensitivity <- sensemakr(model = model, \n", " treatment = \"directlyharmed\",\n", " benchmark_covariates = \"female\",\n", " kd = 1:3)" ] }, { "cell_type": "markdown", "id": "3f0a34c5", "metadata": { "kernel": "R" }, "source": [ "One can plot the output, `sensitivity`, to see how the coefficient $\hat{\tau}_{\text{res}}$ is affected by omitted variable bias as given in $\eqref{relativebias}$."
] }, { "cell_type": "code", "execution_count": 13, "id": "ee48ed56", "metadata": { "kernel": "R" }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAAACVBMVEUAAAD/AAD///9nGWQe\nAAAACXBIWXMAABJ0AAASdAHeZh94AAAgAElEQVR4nO2djZqjKhAFGd//oXcTTeJfImoDp5uq\n797dzCTKAamAaDZpAIDbpNYBACKASAAGIBKAAYgEYAAiARiASAAGIBKAAYgEYAAiARiASAAG\nIBKAAYgEYAAiARiASAAGIBKAAYgEYAAiARiASAAGIBKAAYgEYAAiARiASAAGIBKAAcVFSgCB\naCdS6QIA6oFIAAYgEoABiARgACIBGIBIAAYgEoABiARgACIBGIBIAAYgEoABiARgACIBGIBI\nAAYgEoABiARgACIBGIBIAAYgEoABiARgACIBGIBIAAYgEoABiARgACIBGOBFpOtx/v4ub3pM\n7XeDhu8+3RVt1EERKQdEils0Ii0JJVJ/3blh0adKjS9SWToyqeUh0DcJke6BSOHLzgOR7tGR\nSJj0C0S6ByLFLzsLRLpH/VyYJFkkIt2kpyGpqcTiJiHSTRApbtmPIn98qeXmteeesKKeSCUv\nJPU1t+us7Om7YbMK7kKkUJdkO+vMTcseRyNEmigqUl8mNZ1h159Fp/xyEekuXYnUl0knPEKk\n2yBS2MLHxYb81557wgpE8lRkn2Wn3DU7RLoPInVR9hGIdJu+5naYtA8i3aazIakvk6b1huMZ\nXg8iFQaRApedNg+OXpj9hBVhROrNpD4tRqTyIFLgwl8Ld4hUns5E6qzw9L7n7uhlJ5+wApG8\nldlp4dzZUAtECl34q8Dfg1IXIhVe/8ak2IWP50jpd8mIZAAiBS88jf8jUrxLso37cmcmpWOT\nuhAp4pDUW19uWnrGAjgiWdDfkNSy8DYiHbxzIZIF3YnUW+nHF5IQyQJE6qn0XbyIJL5sh0l9\nlP69XEQyocmB7bMvK5S+Qx8iFQeRuip9B0QyoUORer6StQMimdAmXc+Dglp/QCQbOhySGg8K\nYh0CkWxgSOqt+BWIZAMi9Vb8ik5EKr7+3ePcrvPilyCSEYjUW/FLEMmIHud2nRe/AJGswKT6\npQuZ1IlIDEkUXxZEsqJLkVqX37r6HxDJCkTqr/gZiGRFo2Pauie3Lr9t8R8QyYxOTWpbfPPy\nX/QiUgUQqcfyXyCSGX2K1H35E4hkBiJ1Wf4EItmBSV2WP4JIdrQSqXXT9F7+E0Syo1XE1k3T\ne/lP4om0/bf8nj9/Xf82rGinc7vmAVqX/yCcSGnz4vEHRIoboHX5D6KK9PFpHKGmcSo9hqfX\nt4K+frU3il2i17ld8wCty/8VwatIrxen9yajLf9HpDTO8tLs2eXD22BSp+WHFCmtvs1m+unv\nL70fv/eGSDECtC4/pEjDa7q2+GGa4Q0zkd6/SjlfW20e0hDhflSp+OYt0ItI02pD2v6fDOva\n7HDq9qNOyvcjUu7r16c+X0XavvA+iNRtgHIiHe2gvkhpI9Jq3eE+mNRrAHuR0oxLJd98/edk\naNpkdj60nNal9BHKaJKNSL0GKDAipcVf50s2en0T+hWpfYK2AUpM7WYDwZWSjV7fBETqNUCR\nc6T3LOpSyUavbwMmdRqg0GJDxlIYIkUpWSVB04tJpVbtEiJ1U7JMgogiHb89xBQJkzoNEO6C\n7Hcq/ItcAyL1GqCoSE2uI623eV8jetz/vf0EhTU9i9Q+gmDrF4mUcq/W2gR6y/L843lrw/JO\nBkyKFkFv1TTC1G4p0oBI8SMgkvnr5w6tRFo9a4viPL1eguYRWgUocmdD1tytqUivdIgULUIg\nkdLmgUkBXkakvud27TM0CmAvUtp9eL8ARJIuWidCmwQ9ifR+vHnCFETqMQEi2YNJHSYIf470\nuQg7f1xwsaF3kQQytEgQedXueJMyLd6yJ7XvxQIRooh0s2Sj1+dsUqiSDEmtE9SPEFqko21K\n1RGRWoNIdq/foc79393P7RQyVI+ASAVApOYgktnrd/irZJLmZ54rIpChdoSuROpjSBLoxR2a\nhEglaNqPBDqxQgZEMttgCyLVQiBE3QiIVAJEEgiBSFYbbKklEiYphKgaAZGK0LYbCXRiiTWP\nmhH6EqkaiKQQApFsNmgJJimEqBgBkcqASBIh6mVApDIgkkQKRLLYoCmYJJGiWgREKgQiSaRA\nJIMN9qi1/t36UpJAHx40TKqUoTuRqpnEkKSRApFub7BHLyK1Ln5CIUWdDIhUisazK4UuPGjE\nqJIBkYqBSINGDES6ucEe3YjUuvgXCjFqZPAv0rn9IFJlFGIgUsYG6e/Ujmr9sw0PMOmBQowK\nGZyLlNLf31mTzpZ7mdZdqHX5EwoxymfwLdJDo4dJCsdqS+tUrcufUIiBSD83mDw6OyhVg7nd\nA4m3uXbd2YFIH49ETUKkJxI5SofwLBIjknj5LxRyINKvDZK0R+17UOvyXyjkKDzD9C3SZJLC\ncdqldbDW5b+QyIFIv7Y4vfxdc/27fQdqHmBCIkfREN5FOntBFpGaIJEDkX5ucXY/NUVq34Ga\nB5iQyFEyhH+RzoJILZC4mFSyNfoTibldEySCINLFLXZhSGqCRJByIRCpLM27T/MALySCINK1\nLXbpSySBBBMSQYqFQKTCtO8+7RNMSAQpterRoUh1aR+6fYIJjSCIJHIgztI+dfsEExpByqRA\npNK0T90+wYRGEETSOA5nEUgtEGFEI0iRFIhUnPax2yd4oZGkRApEKk772O0TvNBIgkhGVF3/\nVug8AhEmNJIUSNGnSJ1dSVKI8EIjin0KRCqPQN8RiDAhksQ8BiKVR6HvKGQYEUliHQORyqPQ\ndRQyjIgkQSQDKosk0XcUMoyIJDGOgUgVUOg6ChlGRJJ0LJJZpppfSfFAoutIhHgiksQ2Rpci\nMSS1RSSKaYw+RaqNQnCFDC9EsljGQKQqKCRXyPBCJIthDESqgkJyhQwvRLIgkjckkkuEGNH4\nZ+4smwSR6iARXSLEhEgWsxiIVAeJ6BIhJkSyIJIzNKJrpBgRyWIVo1ORal9IEuk2GilGVLIY\n5ehWJExqjUoWmxyditRgSNLoNxopRkSy2BwZRKqFRrfRSDGikgWRbtCtSBoxRlSyWORApFqI\n9BqRGE9UsiDSdeqLJNJtNFJMqIQxGKcRqRoivUYkxohMmNtBehWpASLpRWJMqKRBJEeIxBeJ\nMaIS5vbkDpHqoRJfJccDmSyI5AiR/CIxRmTW42/mcCWSVhc4j0p8lRwjKmnuKY1IFVGJr5Jj\nRCYNIl2iwQK4TJ9RyTEik+ZOEESqiUqXUckxIpMGka7Qs0g6QZ7IpLkRBJGqEqHLFEAmzfUg\nPYvEkKSCTprL
SfoViSFJCJ00V5MgUl1keoxMkBGZOIh0mr5FEkryRCbOxSCIVBmdDiOTZEQm\nzrUgHYvUBp0q6CR5IhMHkXygUwedJE9k4lwaqxGpNjp10EnyRCcOInlAqA5CUR7oxLH8iA8i\nlUKnEjpJRnTynE+CSNURqoRQlAc6cRDpFI0WwD33l7Lo5Dl9kHJESv/3al/FbkVS6i6tA6zR\nCVRApPT6zxZEEkAoyohOoJNJjkVKs/8tubQ/2xCNRBLqLUpRnrid93YtEkOSUpQRnUCIlA9D\nklKUEZ1Ap0ZHX+dIiGSOUJQRpUAnsmSu2hWYuiKSBkpZnsgFysLXdaQoIkl1FqUsT+QC5dC3\nSM1QqodSlidygXLIOUeaqFWy+UaKKFVEKcsToTXwbPJW7aqWbL6RIkoVUcoyIpLIdtUOkUog\nVRGpME9EEj1j5E3GEKkRUjWRCvNEJFEa55kZafKuI5UAkXSQCjOiEim9/8h43e8nWGwoglRV\npMI8EUmUZn9mvPDXE7Gnds2uJKn0lAmtNINKoGyPEAmRnmileSCxBp47sUOkdiKJ9V2tNE8U\nImWf03S/2MCQNKKV5olUpCOhMkak2IsNDEkTWmmeiEX6Gaf7e+0QaUIrzYhEpryTHGcilThJ\nYm43IhbngUQkO5GEPo/EkFQQsTgPNJbuck5ushcbzGukIlJD1CqjlmcQiZSjc/7yt8Rig0bD\nmiFWG7E4TyQyIZI4YrURi/NEI9NxCkRqilh1xOI8kch0PLnjHKkpatWROLlfoZhpC6t2TZGr\njlygQTPThu6vIzVd/9brJIpv/4qZNiASIi2QCzRoZlrDOVLLexsGwU4iF2jQzLSCVbvGQ5Je\nJ5ELNHiY3SESIq2QC/RAMtQcREKkFZrv/pKhZhQ7RzpcMUekEb0eopdoEA01o8B1pPT6ztnf\n2yDShF4X0Us0iIb6YL/8/VQoYxS7KJJ4e15Br0p6iQbRUG/sz5HS58UFRBJvzkvoVUkv0QPN\nVBOIJIBenfQSPdBMNXIkUkqn//GTj4KIlIdenfQSPdBMNZI/Ip3e59V/vyhr56EQrJNgpEE1\n1RNv99qVydV43U6xgwhGGpTXmhDpQdvb7SR7rWCkB6KxsqZ21/+BSD/nSAxJGwQjPRCNlT8i\nmVQgpQuLFwVybECkDYKRnojmyp/aadxrF1Ukwe6hmGlApPySi2x2QGuRNLuHZChRwUuIlDd3\nkxKpuUm+OkdbJGPlLzac36Wn60itRZLsHZrv/aJtdfqJE3v0s2rXHsl6SYbSjIVIKkhWTDKU\n5FCZI9LFmV0ZkVSP7X0kKyYZahDMlSFSxp3c+1sWOUfSa0IrJCsmGWoQzJV/06rIqp1eE5oh\nWTPJUINerhIi3Sy50Hb6SNZMMtQDsWCI9KL5Arha15jQTDWoBStwjnS35ELbHdFeJLGu8UJw\niWxEKpf9qt3tkgttdwQifUM0llYu++tIt0sutN0RAiJpdY0PorGkciHSGwGTlHrGDNFYUsEQ\n6Y2ASEo9Y45oLKVgiPRGQSShnrFANJZQMER6g0jfEY016CQLINJ2TbHIP6hXB9VwqrlkkmUu\nfxeIayVS2vyy0L1JVZANR7Df5F2QTcIXZCeRPj6NI9Q0TqXH8DQNWe9f7Y1iKojGks2lkizv\nFqH065XWJV/ZcCH7O20aZ3lp9uzyoSKisXRvcNBIFl6kYZkekS4jG0wiWQyRppnb7KfXBzkW\n6d+/uvMP6xVHNZduMIVk/s6RdjYcf7MUafV49v/3ykgsgCv0in1kgykky1y1E7ppdWf9+/Xn\nemo3rAU6FAmTfiEbTOA0KUekyiWf3fCjz+9zpHHG9/McSUMk3Q4rG6x9NP8iTec7Mzdm50PL\nad34svR5dg0iHUCy0+W7EckSEZGa94rvCCdrGy1nseHWt0ecL7nYhseInCQpd9fWAb6jLlKp\nfIoiqQxJdNcrNI2GSAtERBLuru3Xx77TMhsiSSJcR+FoLbPlXZCtWnKxDT0hXEnhaA2zHYh0\n/6sqz5dcbENPKFeSbGcK9rf8vb4+9Hr8eheYXVlygHJMsp0od3OOJDMifdlyfj/DMH/85Wd1\nlGMKZ2u14BBfpEV+RDKBbPnFvrqj3jnS/pZp+cdMpMV0717JVVGOKZ2tSTiHy99nRfr8MDXx\nz5JVriSJd9bWAX7SIp3DxYaTIg2bvw9EwqQMlLMh0p0trUQSGpKke6vyHQ4twsUXaWeK93Pv\niJQH4fJKDCVS2nvBF4REorNepnq4WCJ9LsLOH6/+PipZyCT66mVqpwsjUv7+jl4oJJJ2Z5UO\nVztdhkg+riPFFEm7s0ovOFROF+Y6UvYOpQ/+Bu20pDsuTFgk8eNninhVtePVTIdI2ohXlXiH\nRaXjlxQqueim7hCvK/GOSvosJ8stNqgfPVvEK6u94FCv9Twuf6v3LVvUK6udT0ik9e1qxUsu\numkeSgvg4j1VPV+tEROR9pC6lCTeU+Xz1Yl3JFKf50iIdAby/Sqk6+VvLZH0e6p4wBrxWGzY\nRcsk8X4qHxCR7DfNREsk9Y4qH7DCkHl0HWl2llSr5KKb5qJlkng/HfQTthsX+h6RxESS76f6\nCUvncymS/FGzR7/G6gkL58tYtdO7107+oBVAvsrqAZuLNKh9q/nNbZ2iX2X1hGVXHLKndkqL\nDfLHrAAOqiwfsWTAPJEYkdrjoM7yEQsGzJvaVS258La5iK3bOeimDiKWC8hiwzcQ6TzyEVuK\nxIikgnw31b/rrlxAh+dIqVKXkjNJvZM+0M9Ye4Ilu2qX/lKnIjnopR4ylkno7jpS+vurZJKe\nSA56qYeMdc9URBcbHh4xJEkjH7KRSKW4VMDTo/8mdSqSfid9IB+yRPfxJdLk0WNM6hQPFXeQ\nseKZiqJIb48wSRoHGc0juhKJEcnBlZoHDkJaR/QlUt1zJE1cVN1ByPoilfmk+dVVu3rL36q4\nqLuDkLYRj0T6GGTt0p0Lsh6OUzF81N1BStOIByKlnd+VLjlnw1qHSXAF3EUfHVyktIzo7BzJ\nZOMTaIrkoI8OPkyyy4hIP5EUyUMXfeAgZlWRnlMplcUGm62z0RTJQxd94CCmWb/OuNduKHNa\n4kIkUZMc9NAHLmIahUSk3/xh0g08xESkOmiK5KKLDj7WRWwyIpJPvNTfRU6LkHl3NhgVlldy\nha0D4KUBPIxJFo3pdPnbTT8qhpsGcBH0fkhE8oqbFnAR9HZIh7cIGWwdAheTpgcugt4N6e+m\nVYutY+CmDVwELSzSMGh9jMJq8zNoLoAPTjroAxdB73Vxr+dIVUVSNclF/3ziImlhkUq1ASLd\nx0X/fOIi6Z2QiHSMsEgu+ucTF1FvhMy8+7sAfkRSNql1gHx8RL2cMmNESmVWG1yJhEn38RH1\nakoWG3JAJAt8ZL2YEpGc4+LcY8JH1Gspc0R6TOvs2wCRjPDTEk6kvxQzb7Ehyd397aj7lMZT\nS3jKeo685e8Sn6O
4u7+4x+QkTt7nRzxlPQUi+cdVS7gKewJECoCrpvAQ9kJGt+dItQ+I7AL4\nAw+d84WHmeiY8dS108xVO53vkDXb/iS6l5IeOOicHxyETensNMztdSREmuOgb85wkTa9/8h/\n+aknrHAnkrJJLrrmBwdxp8ndudd/fyLNuBstt+RK259FWiQXZx4z9NOeHJDy7/7WW2xgbrdA\nv2/OkPf+rEcnPo+kNiIxJC1R75or1OOeXahGpGzERZLvmivU487/1Z+MrIgUB2cN4iRuypuH\n+j1H8nIgKuKsReTjpsVfWa/9+YTmBVn941Ad+VP4FT7y5t3l4Pc6EiJt8dYkHvK+LigdmIRI\nofDWJvp5Pxdmf5uESGdQX7hz0DFXyAd+nScdBc1ZbNC8swGR9vBx2jHDQd6s7wfLX/625v5+\n6x8DfZFc9MwF+upn3eWASKdwYJJ8v1yjHziZTO0QaYYDkRx0zBUOxqTjhNkXZM1xKZIHk+S7\n5QZ/ibdkjEiqiw2NRMIke/THpEMcL3+36TAORPLYL/0lXoFIEXHYMg4jL0CkkDhsGoeR53CO\nFBKPTeMx84fsEUnvYxTem74oDk+TfB/O/Kmd3ojku+UL49Ikh5lfIFJUXDaOy9BPEOk8HlbA\nB6ed0mXoB/mLDdVKrrqLKyBSOVyGfuB5+budSE5M8tgrXYYenIvEkHSAy07pU6UckUT/8ROj\nfVzBi0kuu6TP1Nl3f5+o3Vu7n9sgUgWcvrs7TJ3/eaTsyj1emI4/VehbJDcmtQ5wDX+x7UWa\njUZhRfIzJDnskk/cxS4m0vPzuZdKPoG75m6A0zbyNr2zP0eaGYhICjhtJGexC6zazS7lXir5\nBM4auxFOW8lX7BLXkTazQusCTPfRAU6byVXsA5GOTnRKlFx9J+Hx2kqezpMQ6TJuFu589cgF\nfnIfTe1ufRlz8aldW5E8mdQ6wEX85M5f/jYpzfpb0hEpEz89combsdT3TauNRcKkCjhRCZGu\ng0l1cJE854Ls2alY3gY2IjU2qWXpZ3Hy1r6lxGcPzClwjpS5pU3jIFI+DrrjFxwktxcp7T48\nUfLV0uAIv42lPyYhUk/4bS355Nk3rV7ZIyKp4be51MekjBHp7GJDP+dIDnHcXtoqFblptd6q\nneeO0QTX7aUc3vl1JO3GVUT7ff0A4ewFPo90t+Qmu7mMsyXw9g12B+Hs2YsN5nUII5I7k1o3\n2Q10s+cvf1tXIYhI/oYkgTa7gapKiHQXf0NS+za7g2h4RLqNP5Fk39az0AzPOdJ9PJrUOsAt\nFNO7X7UTaFWHIom+recimN79dSQBkXziu93kVEKkXnHebmrx3Z8jybWoG+Te1M8hFt/9qh0i\nXcZ7y0nlRyQbPC44qL2pn0Ypvn+RNJrTpUgibXcdoXcC/+dIIp3Bp0kabXcDmQr4v44k0pY+\nRVJ6T7+GSgX8L3+LiOTVJJXmu4xIfkSywqtIKu13GY0xKYBIIj3B4V3gExrtdwMFlRAJArRf\n+yogEki8o9+keRUORErluikiCeG/CVvXAJHggf82bDwmHU3tjL8YLKfkhnvqmOZzo/s0rUL+\nLULVSm64p56J0IoN6xBhsUGoD7hdAh9CjEkN6xDgFiEtkRybJNSO12lVhwA3rUp1AExqTaM6\nBPgYhdbx9y2SUktepU0lEMkY30OSVFNepkUlQogkdfgxqT0NBqUQ50haR9+3SFpteZnqtQix\nahfk4IsQozFrD0ohriMFOfYqhFhyqF0NRIItQdqzZjViiBTlyMsQpD0rDkqIBHsEaVBEarkz\nC5wv3Qm26DWqjUmIVAb3IkVZcqhVj+8ijUve5VJEF8m9SXqNepEq9fg5IhX8fKz1nuUOegCR\n9Br1IjUGpd9Tu4RIV4kwJOm16lXKV+TgHKmgy8FFwiQpig9KP86Rfj9frmSBvZkQQaQwSw7F\naxJk1U5RpCDEadmiNTkU6X2/qtLHKNbbpp8nc3G6QhPiNF/JQelIpPcSuNLHKNYNcrBOH6cn\ntCFQ+zU4U0mLP6U+RpFWej8veaX3ha/xh/cTk2clPgnSC4GarlhV8kQqUfztqd3nKtfcq/T6\nf/GqMoNqP6DS4W5/P1Fw6a6USPOfh+YihVi6G2K9CdXtzvoifd5bXnO3tBbp/asi/+hyDiGu\nJj0INCYVqUsgkWaPl/+nQjXIIoxJsQalemtn+iKtpnZbkdbnSG3AJEmqXc15LzMofhvFN5HS\nRqTVukMLwogUanpnXZkDkQpyV6SZG5u17tf/44Xa9Hm2BXGGpFhjkq1KPkWqs0Mr4ogUa0wy\n7TKIBKeI1cp2bwyIBOeINigZVSeQSJhUCVTa28vpJ6xAJLdEa2iL+vhc/q61R9gn2JhkUR9G\npEoEWgR/IN3WF7hdH88ira8Pva4kpfkDlWP+F80kjWY14259MkRSndqtrsoO74uviyctSjIh\nmEgRVbp9s83PJ9KyV5pRVKRFYpUDHs0kmYa1oopI9e7yO7P9T5Hew6jIAQ8nkkrDmnFjTMoS\nqciQVFKktPuC5mCSPJdVCiXSkIb1EIRIZYl2onS5s+QsNiASfCecStcWHXKWv9PdFY1zJZ/Z\nfl+k3fkdlCJe817o7X6vI2WJJLXYEJbA7ZutlHOR3hdk0+zx2yyxVbu4hJvevUjZncevSLs7\n2N2p2FEOd2X2QUyVTpwYHIiUhs99qxbJckq+swNEaoZYK99lnOK8HuW9/twTVtwvYGcP2XK1\nJKZJcs18k5TODEiZ15FO7DCbMg3v4nAGNSnc9A6R1IlpkpfWz2Z++fToXeJIJE8f7Cu3V3OC\nihRszSG9K3Tc//NHJGsK7dfHkYwqUjCV5h92O3rp4ROIBKcIeQQO3x8QCayJNSgt/jp83a8n\nfPX4SAfRLYEOwuu7Hw9fd/iEr8UGVweRMyUvvJfejl5x4gkrECnsKviDUCpl1AWRWhLZJFcH\n4j4550i+pnauDmBskzwdibtkrdqlXy80L1l2xyUIbZKvQ3GPbJGc3CJUcsdFCC1SR4NS3nWk\nEkNSOZFcHbzYInk7GtcJKJKzISk8fRyOvAuyiATX6WJQyln+TkU+a4JI/dCBSvGuI5XddSGi\nnyk5PCQnQSQJYq+CP4j3+dklhyIVawBEmhPfJJeHJZ/DT8j+ek2ZksX3XYguTHJ4XHI5ECn9\nflGRki+/8NYmzenBJJcHJg99kdLf+eJdHq8eRIo7KMmLlP6/U3diUh8EXXVQF+nh0QWTQh6r\nKIQ8OD5EOm1SyGMVhoiDkrhIk0enTfJ7oLo4Uwp4rnQkUtt/IPLtUT8m/XWxejeEU0n8zoar\nI5Ljw9SNSY6P0Q7iIr0WG7pat+vGJM8HaY26SBeXv30fo25ECjQoyYt06YLsid0r0o9Ivo/T\nHH2RriaJcoSiE2RQciCS6P7BigJLwvVBJFDA/dGKK5L/Y9PTqZL7GR4i6dLPFaUnvo9XYJGc\nH5mhP5M8HzBEkqYrkV
wvOyCSNH0NSU+cHrXIInk9JnP6E8npDA+RQA6Pxy20SC6PCLg8V0Ik\nD3Q4wfM2w0MkD3S2Dj7h6ujFFsnXsfhBpyY5GpUQyQldmuRIpeAiYZJ7nKgUXaRAJnWLi2EJ\nkUAfByohErhAXSVEckavp0rqRzK8SOLtf5q/PlfCH0hP8DoQSbj1L9GvScouxRcp2pA09K2S\n6hujvUi5/1x4vfbQbPk79CySqEoFRqTMLREJriKoUompXd6mFdtCr9nhJnInS0XOkbK2RSS4\nhZZLHSw2VC6rJn2fK0mphEie6VykQehsCZFc0/VC+IjIsFRUJJHl79qF1QSTBo1hqe6IlHIv\nMtkXXLO0qiDSoDAs9TG1CzwkwURjlXoRKbpJzPAaD0tFLshmzd0QyZKObwqf006lkrcI/d5F\n7SoHNwmVRloNSwVuWs3cR/X6RjeJFbyJJi4hUiQQaaSBSh2J1INJ8KK2S/2cIyFSb1RVqZtV\nu6H5lYZ6MMObqDgsdXMdqVmZDWDR4UOte2gQKSIshc+pYlJfImFSn1QYlhApKoi0pPCh70yk\nnkyCJWWHpd5EwqSuKScTIgWHk6U1ZVTqTqTeTPpj3WFDiWEJkeKDSVvMVepPpA5NYgVvB+PT\npQ5F6tEk2MVQph5F6tYkpng7GKmESB3BydIuJsMSInUFKu1zX6UuRernAxUbWAz/wmxYutQ7\n+hSpdektQaQD0qXegUgAc57j0fnu0atIrYtvDyPThrT849Qkr1uRmpffGs6WNjwvK81EOjPJ\n61ek9gFag0ob0nxm97Mran0AAAnhSURBVHqYNS71LFLzBM3BpA3LAempVtaFpo5FEkgAenyG\noLdImZudfMIKgW4sEEEEBqY37/vvppld5v14XYskkUECTpeWpDQfkHJM6lskjRAK8Pm/Ba8z\npcUk76dPnYskkkICTFqxXg3/OTQhEsAPcm906F0kFsG3MDDNYbEhE5UcOnC6dAFEYkzawMrD\neRBpkIqiAiadBJEGqSjgFER6oJQFXIJIT6TCaMH5UhaINKKVRom/P9YeMkCkCdbufoBJhyDS\nC7U84ApEesOYBNdBpA96iQRhlrcPIs1gTDqGtYd9EGmBZCg5MGkLIi2QDAUOQKQlzO7OwdA0\ngUhrUOkMnDBNINIG2WCaoNITRNrCmHQSREKkfZSzKdOxUIi0C4PSJTqe5iHSPtrpZOnXJET6\nAmPSVfo0CZG+gko36cooRPqBg4jKdHVXHiL9gkHpHh2ZhEi/8ZFSGURqVbIWDEpGBB+cEOkQ\nVDIh+DQPkTJwFFWbwCYhUg4MSsbEG50QKQ9faeWJ94F1RMqEQcmaWCYhUjaoVIwARiHSCRxG\n9kGAiR4inYFBqRTuTUKkc6AS7IJIZ/Ga2xEeZ3qIdJq8b7mG63hcHEekK3jO7gZEulmyBxiV\nKuJieEKkq6BSLVzM9BDpOv5r4IiPSZpKIdINEjO8BmiOT4h0kyDVcIWiSYh0F0altohIhUgG\noFJDRGZ6iGQCKjVlZlIrpRDJCBYeNGg1PiGSIQGr5I9GUz1EsoRRSYP5NadKUiGSMUzxlKg3\nPCFSAQJXzSOL8alUIYhUgsS4pEjJ8QmRSoFKivyVWinXFin9fGXae2rzm4bdGZekMR2gnIuU\n8cu2fZk5njALk24q5USk9Hp3T5+/pxEpvV74fCbN/ny/qjG45IGbn3ryIlIaFtaM4mx/9f5r\n8ar2MDA54PsAlSGYF5HmWyztmVszbEUSGJEmcMkTf/MR6u/530/ciTTN3T6/X07tXo8Wr5IB\nlxzxMekl0y+8ibSd4s1tmU/tBkGRBlzySECRdv52JtIwcLnWG/5Fmg9C60WG+WJD+hikudiw\nBZv88Dd4P0d6z4QWJ0Kzv9e/GgbB5e+vIJMPIohUYydtwSV5/v6ObydCJAEYmdQ5vkzrWqRQ\nvQ+bXONapHgkdHIKIgmCTP5AJFEYmnyBSMpgkxtcifS6j+798/KzFYtb7OKQOHFygCeR5jct\nDLPHizuB5rcIhQKfpIkmkujtdXYgkyaORFreKzT8ECm0SQODkyJRREoLg/roY/gkRASRZvem\ndiXSREIoAUKI9Pm7R5FeIFRLYom0tKpP8KkJEUT6rHgj0geEqoo3kT4XYeeP01ytK/sOTMKo\nGjgSKX8L+sweCaUKgkgdglL2eBIpdxO6Ry4JpaxwJRKUA6XugUiwgmHqCogE30kLWqeRBpEg\nF6z6ASLBFRirViAS3CfhFSKBOT16hUhQmrSmdaASIBJUZ2NWALkQCRRwrxYigSh745auXogE\nrtjXy7ozpZ1HuVvkPmEFIoEdX/y65tjjnwBJq0eH25x+wgpEgir8cOxLH1x+5Hr+EdIfpZx+\nwgpEgtZ8cWsjUkZfRSSAFTsiHfZWRAJYsRXp+FSphEh5Z3mIBKJsz5GGw/5aQKS0eWBcAEBR\nNERKuw8NCwAozGzNYbX68H2T008chsjcByKBFzIuRymKhGLgDsFzJOEbqgC+ILhqh0jgD73r\nSNL3+ALsIyeS9s3yAPsUFenKYoP4x04Adqk7Ih3feVvqAyYARVGb2iESuERMpFufxwJohuDy\nN4A/BC/IAvhD8RYhAHcgEoABiARgAOdIAAawagdggNh1JACfIBKAAYgEYAAiARiASAAGIBKA\nAYgEYAAiARiASAAGIBKAAYgEYAAiARiASAAGIBKAAYgEYAAiARjQUCSAQDQT6TtOxiofMX2k\nDBwTkY7wEdNHysAxEekIHzF9pAwcE5GO8BHTR8rAMRHpCB8xfaQMHBORjvAR00fKwDER6Qgf\nMX2kDBwTkY7wEdNHysAxEekIHzF9pAwcE5GO8BHTR8rAMRHpCB8xfaQMHBORjvAR00fKwDGd\n1AxAG0QCMACRAAxAJAADEAnAAEQCMACRAAxAJAADEAnAAEQCMACRAAxAJAADEAnAAEQCMACR\nAAxAJAADEAnAAEQCMACRAAxAJAADGoi0+LamH1/d1JhVMtGY68Z0ErNdkp+sk53IWb9KaV7q\n4gcpVslEj73HxnQSczh30KvXKM2LXfwgxSpZkgzpsjHT7LEW6wY8ddARaZ9lsqQZcq/9FHNu\nYyqm3L55ItJ9NskUQ7oVSXOijEgF8CqSh5iqiw2bWQgi3QeRDPEY8+ypHCLt41QkyZQeZ6Cn\neyYi7eOji27XmSTxKNLEyY0rgkiGbE+PJfF6zKVHpOVJ3MkzuoqskzlIqRnxwTxm8nPMtUV6\nL9qk+Q96LGKqHvp5yrOTkZr4PObiIgHEA5EADEAkAAMQCcAARAIwAJEADEAkAAMQCcAARAIw\nAJEADEAkAAMQCcAARAIwAJEADEAkAAMQCcAARAIwAJEADEAkAAMQCcAARAIwAJEADEAkAAMQ\nCcAARAIwAJEADEAkAAMQqR66/+Y13IZDWw3db2GA+3Bkq0JzR4UjWxWaOyoc2ZrQ2mHh0FaE\nxo4Lx/YcP78Wb+9rE2cb7
G11uJB3vNK384KD77799oV0dIbr0Hbn+PpFmJvfpPVfewZeseT4\nJTnfg7r3Hbm7L/2R4DhcP92rn5ra8PXbRQ9E+tLQiBSFfmpqw8KLaYhJ0/zt9f3yaf7F3V9E\nml7zGaPe3wP8mgY+H6T36z6Fvp+cnkirzZbhFjler5me/exn/fMiV5p2u3r1Ypuf8Xugl3pa\nMffi3a/T6hdTb99ssNzNYgxI6//TbC/r/ae9Vy5/WP+xFmn16r1dDasn9quYdjdcxO+Cfmpq\nw7zbvR5tO/tWpK1Hi46Wdp9IXwpaurp5MB+IFgnS9JKt+CtRd6u6X9rOm8cmYQ/0U1MbVqt2\nKc16zUeH9OlDu8t8vkVaVBGRRvqpqQ3z9np1pqVIi99uevV8L4YivWyd9/j3actr1+n156FI\nc/nnFZufsu2KtM7RT/fqp6Y2pPXDtUhrR9J6s+2mw/rviyPSMIuy/fGMSOusy90uit0Zkb7t\nKTT91NQGZZGWI9L70VWRdp7YKW1HJEYkOGQt0uYcafnb9V/zbdP812nn/11JtwXubbb54/33\n/qtXP6fVZttNh91zpP34XdBPTW2Yt9fr2tHUsT5XdNKsD63HnM+2y99uLsQM8/f45WWj2Ss2\n15HmTy2LWNnw3nIYhs3Pq/WU2WsWVVxusxef60jgl/T1BygFzRwQRKoPzRyR9ZkcFId2BjAA\nkQAMQCQAAxAJwIB/XcM6MmXuUB0AAAAASUVORK5CYII=", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "plot(sensitivity)" ] }, { "cell_type": "markdown", "id": "b1756e8d", "metadata": { "kernel": "R" }, "source": [ "The estimate $\\hat{\\tau}_{\\text{res}}$ is plotted as a triangular point at the coordinate $(0,0)$. The two axes correspond to the two hypothetical $R^2$ values given in $\\eqref{relativebias}$. For any combination of partial $R^2$ values along the red, dashed contour line, the omitted variable bias is sufficient to imply that the \"true\" estimate from running the model specified in $\\eqref{ovb1}$ of $\\hat{\\tau}$ would be zero. The red diamonds specify effects equal to $1,$ $2,$ or $3$ times the magnitude of the variable \"female\" (one of the covariates in $W$). The authors argue that whether an individual is female is strongly predictive of experiencing violence and of having a preference for peace. Thus, the fact that omitted variables would need to be more than three times as important (in a partial $R^2$ sense) as the \"female\" variable is indicative that omitted variable bias is unlikely to explain away the results (i.e., set the relative bias to $1$, which implies that $\\hat{\\tau}$ would be $0$ if estimable)." ] }, { "cell_type": "markdown", "id": "056d88af", "metadata": { "kernel": "R" }, "source": [ "**A Capital Structure Example**\n", "\n", "Let's start with the conclusion of Modigliani and Miller:\n", "\\begin{equation}\n", " V^L = V^U + \\tau D. \\label{MM}\n", "\\end{equation}\n", "\n", "Suppose that there is a function $f:A\\mapsto V^U$ which maps the asset value of a firm, $A$, to the firm's unlevered value.\n", "For simplicity, we'll take $f()$ to be a polynomial function $f(A)=\\sum_{l=1}^L \\beta_l A^l$ of order $L$.\n", "Per the theory of Modigliani and Miller, we would thus expect that in the regression\n", "\\begin{equation}\n", " V^L = \\sum_{l=1}^L \\beta_l A^l + \\gamma \\tau D + U \\label{MMreg}\n", "\\end{equation}\n", "\n", "the coefficient $\\hat{\\gamma}$ will be one.\n", "\n", "Step 1: fire up WRDS" ] }, { "cell_type": "code", "execution_count": 14, "id": "c586e315", "metadata": { "kernel": "Python3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading library list...\n", "Done\n" ] } ], "source": [ "import wrds\n", "conn = wrds.Connection(wrds_username='nordlund')" ] }, { "cell_type": "markdown", "id": "c1c9307a", "metadata": { "kernel": "Python3" }, "source": [ "Step 2: pull the requisite data" ] }, { "cell_type": "code", "execution_count": 15, "id": "2cf5bdb7", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value_leveredassets
count4.575700e+044.575700e+04
mean4.928903e+032.037475e+03
std2.714116e+041.622491e+04
min7.245600e-031.000000e-03
25%6.783682e+012.223700e+01
50%4.070055e+021.432660e+02
75%1.925432e+036.975930e+02
max1.823719e+061.020934e+06
\n", "
" ], "text/plain": [ " value_levered assets\n", "count 4.575700e+04 4.575700e+04\n", "mean 4.928903e+03 2.037475e+03\n", "std 2.714116e+04 1.622491e+04\n", "min 7.245600e-03 1.000000e-03\n", "25% 6.783682e+01 2.223700e+01\n", "50% 4.070055e+02 1.432660e+02\n", "75% 1.925432e+03 6.975930e+02\n", "max 1.823719e+06 1.020934e+06" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "funda = conn.raw_sql('''\n", " select distinct\n", " a.gvkey, a.fyear, a.naicsh,\n", " a.csho*a.prcc_f + a.dltt + a.dlc as value_levered, \n", " a.at-a.gdwl-a.intan as assets, \n", " (a.dltt + a.dlc)*b.bcg_mtrint as tauD\n", " from comp.funda as a\n", " inner join compa.marginal_tax as b\n", " on a.gvkey = b.gvkey and a.fyear = b.year\n", " and a.indfmt = 'INDL'\n", " and a.datafmt = 'STD'\n", " and a.popsrc = 'D'\n", " and a.consol = 'C'\n", " and a.fyear >= 2000\n", " and a.at > 0\n", " and a.dltt+a.dlc < a.at\n", "''')\n", "funda = funda[(funda['assets'] > 0) & (funda['value_levered'] >= funda['assets'])].copy()\n", "funda[['value_levered','assets']].describe()" ] }, { "cell_type": "markdown", "id": "aff3b020", "metadata": { "kernel": "Python3" }, "source": [ "Step 3: pass it to Stata" ] }, { "cell_type": "code", "execution_count": 16, "id": "6bf366eb", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "naicsh was double now str6\n", "\n", "\n", "sector: all characters numeric; replaced as int\n", "(97 missing values generated)\n", "\n", "(113 observations deleted)\n", "\n", "\n", "\n", " Variable | Obs Mean Std. dev. Min Max\n", "-------------+---------------------------------------------------------\n", " index | 45,644 39768.64 22074.86 1 77136\n", " gvkey | 0\n", " fyear | 45,644 2007.388 4.746293 2000 2016\n", "value_leve~d | 45,644 4925.796 27161.97 .0072456 1823719\n", " assets | 45,644 2036.945 16242.77 .001 1020934\n", "-------------+---------------------------------------------------------\n", " taud | 45,644 322.4741 3819.172 0 303969.9\n", " sector | 45,644 412.4792 133.7177 111 999\n" ] } ], "source": [ "%get funda --from Python3\n", "\n", "tostring naicsh, replace\n", "gen sector = substr(naicsh,1,3)\n", "destring sector, replace\n", "drop if sector < 100 | sector > 999\n", "drop naicsh\n", "\n", "su *" ] }, { "cell_type": "markdown", "id": "6190a90f", "metadata": { "kernel": "Stata" }, "source": [ "Step 4: polynomial regression - it's reasonable to suppose that $\\vec{\\beta}$ would differ for different industries/sectors." ] }, { "cell_type": "code", "execution_count": 17, "id": "3964f3d3", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "gamma is estimated to be 3.765, and the model's R-squared is .8320000000000001\n", "\n", "\n", " ( 1) taud = 1\n", "\n", " F( 1, 45456) = 1902.15\n", " Prob > F = 0.0000\n" ] } ], "source": [ "qui reg value_levered i.sector#c.assets i.sector#c.assets#c.assets taud\n", "local r2_observed = e(r2)\n", "local tau = _b[taud]\n", "\n", "di \"gamma is estimated to be `=round(`tau',.001)', and the model's R-squared is `=round(`r2_observed',.001)'\"\n", "test _b[taud] == 1" ] }, { "cell_type": "markdown", "id": "3d05c872", "metadata": { "kernel": "Stata" }, "source": [ "Note that $\\hat{\\gamma}$ is statistically significantly different from $1$." 
] }, { "cell_type": "markdown", "id": "518bd847", "metadata": { "kernel": "Stata" }, "source": [ "Step 5: calculate Oster's $\\delta$" ] }, { "cell_type": "code", "execution_count": 18, "id": "d222b5db", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "Oster's delta is .12\n", "\n", "The omitted variables must explain .097 percent of the variation in y, relative \n", "> to the other observables which explain .8190000000000001\n" ] } ], "source": [ "qui psacalc delta taud, beta(1)\n", "local delta = r(delta)\n", "\n", "qui reg value_levered i.sector#c.assets i.sector#c.assets#c.assets, vce(robust)\n", "local r2_o = e(r2)\n", "\n", "di \"Oster's delta is `=round(`delta',.01)'\"\n", "di \"The omitted variables must explain `=round(`r2_o'*`delta',.001)' percent of the variation in y, relative to the other observables which explain `=round(`r2_o',.001)'\"" ] }, { "cell_type": "markdown", "id": "e6349ac7", "metadata": { "kernel": "Stata" }, "source": [ "### Collider Bias (aka Bad Controls)\n", "\n", "Believe it or not, adding data to your model is not always good. In the following graph, there is no backdoor path from $D$ to $Y$ if we exclude $E$ because the unobserved path $D \\rightarrow E \\leftarrow Z \\rightarrow Y$ will \"break\" at the collider, $E$. However, if we include $E$ as a variable in the model, the path $D \\rightarrow E \\leftarrow Z \\rightarrow Y$ is opened because the collider is controlled for. The path is still a backdoor because $Z$ is unobserved. Hence, including $E$ in the regression will *add* bias to the estimate of the effect of $D$.\n", "\n", "In this context $E$ is a \"bad\" control with respect to $D$. This is in spite of the fact that $E$ has a causal effect on $Y$. For this reason, \"collider bias\" is perhaps a less confusing than \"bad control\"." 
] }, { "cell_type": "code", "execution_count": 19, "id": "80d68f44", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "G\r\n", "\r\n", "\r\n", "e\r\n", "\r\n", "e\r\n", "\r\n", "\r\n", "y\r\n", "\r\n", "y\r\n", "\r\n", "\r\n", "e->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "z\r\n", "\r\n", "z\r\n", "\r\n", "\r\n", "z->e\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "z->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "u\r\n", "\r\n", "u\r\n", "\r\n", "\r\n", "u->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "d\r\n", "\r\n", "d\r\n", "\r\n", "\r\n", "d->e\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "d->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "w\r\n", "\r\n", "w\r\n", "\r\n", "\r\n", "w->y\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cbias = graphviz.Digraph('G')\n", "cbias.node('e', _attributes={'style': 'dashed'})\n", "cbias.node('z', _attributes={'style': 'dashed'})\n", "cbias.node('u', _attributes={'style': 'dashed'})\n", "cbias.edge('d', 'y')\n", "cbias.edge('w', 'y')\n", "cbias.edge('u', 'y', _attributes={'style': 'dashed'})\n", "cbias.edge('z', 'e', _attributes={'style': 'dashed'})\n", "cbias.edge('d', 'e', _attributes={'style': 'dashed'})\n", "cbias.edge('z', 'y', _attributes={'style': 'dashed'})\n", "cbias.edge('e', 'y')\n", "cbias" ] }, { "cell_type": "markdown", "id": "8c970b8e", "metadata": { "kernel": "Python3" }, "source": [ "Consider the following graphs and Stata analysis (both examples borrowed from Scott Cunningham's *Causal Inference: The Mixtape*).\n", "\n", "**Pay Discrimination**\n", "\n", "Consider the graphical model of discrimination shown below." ] }, { "cell_type": "code", "execution_count": 20, "id": "29f74182", "metadata": { "kernel": "Python3" }, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "G\r\n", "\r\n", "\r\n", "ability\r\n", "\r\n", "ability\r\n", "\r\n", "\r\n", "occupation\r\n", "\r\n", "occupation\r\n", "\r\n", "\r\n", "ability->occupation\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "income\r\n", "\r\n", "income\r\n", "\r\n", "\r\n", "ability->income\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "occupation->income\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "female\r\n", "\r\n", "female\r\n", "\r\n", "\r\n", "discrimination\r\n", "\r\n", "discrimination\r\n", "\r\n", "\r\n", "female->discrimination\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "discrimination->occupation\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "discrimination->income\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import graphviz\n", "collider = graphviz.Digraph('G')\n", "collider.node('ability', _attributes={'style': 'dashed'})\n", "collider.edge('ability','occupation', _attributes={'style': 'dashed'})\n", "collider.edge('ability','income', _attributes={'style': 'dashed'})\n", "collider.edge('female','discrimination')\n", "collider.edge('discrimination','occupation')\n", "collider.edge('discrimination','income')\n", "collider.edge('occupation','income')\n", "collider" ] }, { "cell_type": "markdown", "id": "d285fb1d", "metadata": { "kernel": "Python3" }, "source": [ "Note that $\\text{female}$ is the only node pointing to $\\text{discrimination}$. 
This implies that we are working under a simplified model in which only females are discriminated against (when, in fact, discrimination along race, sexuality, religion, or other features may occur). This simplifying assumption means that we can include $\text{female}$ in our regression and fully control for discrimination. Suppose we have the series of equations:\n", "\\begin{equation}\n", " \\text{discrimination} = \\text{female} + \\epsilon_1 \\label{collider1}\n", "\\end{equation}\n", "\\begin{equation}\n", " \\text{occupation} = \\text{ability} - \\text{discrimination} + \\epsilon_2 \\label{collider2}\n", "\\end{equation}\n", "\\begin{equation}\n", " \\text{income} = \\text{ability} + \\text{occupation} - \\text{discrimination} + \\epsilon_3 \\label{collider3}\n", "\\end{equation}\n", "where $\epsilon_1 \perp (\epsilon_2, \epsilon_3)$ and $(\epsilon_2, \epsilon_3)' \sim MVN(\vec{0}, \mathrm{diag}(\sigma_2, \sigma_3))$, with $\sigma_2$ and $\sigma_3$ denoting variances. The $\beta$ coefficients in all three equations are one in absolute magnitude. Plugging $\eqref{collider2}$ into $\eqref{collider3}$, we have\n", "\\begin{equation}\n", " \\text{income} = 2\\times\\text{ability} - 2\\times\\text{discrimination} + \\epsilon_4 \\label{collider4}\n", "\\end{equation}\n", "where $\epsilon_4 = \epsilon_2 + \epsilon_3 \sim N(0,\sigma_2 + \sigma_3)$.\n", "\n", "The Stata code to generate data according to this model is shown below." ] }, { "cell_type": "code", "execution_count": 21, "id": "a5886375", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Number of observations (_N) was 0, now 10,000.\n", "\n", "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "clear *\n", "set obs 10000\n", "set seed 0\n", "\n", "gen byte female = runiform() > .5\n", "gen ability = rnormal()\n", "gen byte discrimination = female + (female == 0)*(runiform() > .99) - (runiform() > .99)\n", "gen occupation = ability - discrimination + rnormal()\n", "gen income = ability + occupation - discrimination + rnormal()" ] }, { "cell_type": "markdown", "id": "bb11108a", "metadata": { "kernel": "Stata" }, "source": [ "Because $\text{ability}$ and $\text{female}$ are independent, the total effect of discrimination is given by regressing $\text{income}$ on $\text{female}$. Given $\eqref{collider4}$, we would expect a $\beta$ coefficient of -2 on $\text{female}$." ] }, { "cell_type": "code", "execution_count": 22, "id": "d150f349", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Source | SS df MS Number of obs = 10,000\n", "-------------+---------------------------------- F(1, 9998) = 1503.12\n", " Model | 9214.27486 1 9214.27486 Prob > F = 0.0000\n", " Residual | 61288.8213 9,998 6.13010815 R-squared = 0.1307\n", "-------------+---------------------------------- Adj R-squared = 0.1306\n", " Total | 70503.0962 9,999 7.05101472 Root MSE = 2.4759\n", "\n", "------------------------------------------------------------------------------\n", " income | Coefficient Std. err. t P>|t| [95% conf. 
interval]\n", "-------------+----------------------------------------------------------------\n", " female | -1.91983 .0495184 -38.77 0.000 -2.016896 -1.822764\n", " _cons | .0137958 .0349587 0.39 0.693 -.0547303 .0823219\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "reg income female" ] }, { "cell_type": "markdown", "id": "921e5c8d", "metadata": { "kernel": "Stata" }, "source": [ "However, if we control for $\text{occupation}$, there is now a backdoor path $\text{discrimination} \rightarrow \text{occupation} \leftarrow \text{ability} \rightarrow \text{income}$. Remember, backdoor paths are harmless as long as they contain a collider (e.g. $\rightarrow \text{occupation} \leftarrow$) that is not controlled for. Once we add the collider to the model, the path is opened and we lose the ability to derive a causal estimate of the effect of $\text{discrimination}$ on $\text{income}$ because $\{\text{occupation}\}$ does not block the non-causal path it opens; we would need to control for $\{\text{ability, occupation}\}$ to block it, which identifies the direct (rather than total) effect of discrimination." ] }, { "cell_type": "code", "execution_count": 23, "id": "6893a613", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Source | SS df MS Number of obs = 10,000\n", "-------------+---------------------------------- F(2, 9997) = 17649.43\n", " Model | 54942.7483 2 27471.3742 Prob > F = 0.0000\n", " Residual | 15560.3478 9,997 1.55650173 R-squared = 0.7793\n", "-------------+---------------------------------- Adj R-squared = 0.7793\n", " Total | 70503.0962 9,999 7.05101472 Root MSE = 1.2476\n", "\n", "------------------------------------------------------------------------------\n", " income | Coefficient Std. err. t P>|t| [95% conf. interval]\n", "-------------+----------------------------------------------------------------\n", " female | -.4786197 .0263307 -18.18 0.000 -.5302332 -.4270062\n", " occupation | 1.513416 .0088296 171.40 0.000 1.496108 1.530723\n", " _cons | .0008055 .0176157 0.05 0.964 -.0337249 .0353358\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "reg income female occupation" ] }, { "cell_type": "markdown", "id": "9b4d2330", "metadata": { "kernel": "Stata" }, "source": [ "If $\text{ability}$ were in fact observable, then we could identify the direct effect of $\text{discrimination}$ as $-1$." ] }, { "cell_type": "code", "execution_count": 24, "id": "fd265518", "metadata": { "kernel": "Stata" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Source | SS df MS Number of obs = 10,000\n", "-------------+---------------------------------- F(3, 9996) = 19267.67\n", " Model | 60108.4159 3 20036.1386 Prob > F = 0.0000\n", " Residual | 10394.6803 9,996 1.03988398 R-squared = 0.8526\n", "-------------+---------------------------------- Adj R-squared = 0.8525\n", " Total | 70503.0962 9,999 7.05101472 Root MSE = 1.0197\n", "\n", "------------------------------------------------------------------------------\n", " income | Coefficient Std. err. t P>|t| [95% conf. 
interval]\n", "-------------+----------------------------------------------------------------\n", " female | -.970721 .0226261 -42.90 0.000 -1.015073 -.9263693\n", " occupation | 1.013039 .0101236 100.07 0.000 .9931952 1.032884\n", " ability | 1.011533 .0143519 70.48 0.000 .9834006 1.039666\n", " _cons | -.0072966 .014399 -0.51 0.612 -.0355215 .0209283\n", "------------------------------------------------------------------------------\n" ] } ], "source": [ "reg income female occupation ability" ] } ], "metadata": { "kernelspec": { "display_name": "SoS", "language": "sos", "name": "sos" }, "language_info": { "codemirror_mode": "sos", "file_extension": ".sos", "mimetype": "text/x-sos", "name": "sos", "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", "pygments_lexer": "sos" }, "sos": { "kernels": [ [ "Python3", "python3", "Python3", "#FFD91A", { "name": "ipython", "version": 3 } ], [ "R", "ir", "R", "#DCDCDA", "r" ], [ "Stata", "stata", "Stata", "#CAE8F3", "stata" ] ], "panel": { "displayed": true, "height": 0 }, "version": "0.22.3" } }, "nbformat": 4, "nbformat_minor": 5 }