{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Models, response schedules, and estimators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook summarizes some probability distributions and models related to them, and draws a distinction between a model and a response schedule."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some common probability distributions\n",
    "\n",
    "### Discrete\n",
    "\n",
    "+ Bernoulli: distribution of a single trial that can result in \"success\" (1) or \"failure\" (0). A random variable $X$ has the Bernoulli($p$) distribution iff \n",
    "$$\\Pr \\{ X=1 \\} = p, \\;\\; \\Pr \\{X=0\\} = 1-p.$$\n",
    "\n",
    "+ Binomial: distribution of the number of successes in $n$ independent Bernoulli($p$) trials. Special case: Bernoulli ($n=1$). A random variable $X$ has a Binomial($n,p$) distribution iff \n",
    "$$\\Pr \\{X=j\\} =  {{n}\\choose{j}}p^j(1-p)^{n-j}, \\; j=0, 1, \\ldots, n.$$\n",
    "\n",
    "+ Geometric: distribution of the number of trials until the 1st success in independent Bernoulli($p$) trials. A random variable $X$ has a Geometric($p$) distribution iff \n",
    "$$ \\Pr \\{X=j\\} = (1-p)^{j-1}p, \\;\\; j=1, 2, \\ldots .$$\n",
    "\n",
    "+ Negative binomial: distribution of the number of trials until the $k$th success in independent Bernoulli($p$) trials. Special case: geometric ($k=1$). A random variable $X$ has a Negative Binomial distribution with parameters $p$ and $k$ distribution iff \n",
    "$$\\Pr \\{X=j\\} = {{j-1}\\choose{k-1}}(1-p)^{j-k}p^k, \\;\\; j=k, k+1, \\ldots .$$\n",
    "\n",
    "+ Poisson: limit of Binomial as $n \\rightarrow \\infty$ and $p \\rightarrow 0$, with $np= \\lambda$. A random variable $X$ has a Poisson($\\lambda$) distribution iff\n",
    "$$ \\Pr \\{X=j\\} = e^{-\\lambda} \\frac{\\lambda^j}{j!}, \\;\\;j=0, 1, \\ldots .$$\n",
    "\n",
    "+ Hypergeometric: number of \"good\" items in a simple random sample of size $n$ from a population of $N$ items of which $G$ are good. A random variable $X$ has a hypergeometric distribution with parameters $N$, $G$, and $n$ iff\n",
    "$$ \\Pr \\{X = j,\\; j = 1, \\ldots, k\\} = \\frac{{{G}\\choose{j}}{{N-G}\\choose{n-j}}}{{N}\\choose{n}}, \\;\\; j = \\max(0,n-(N-G)), \\ldots, \\min(n, G).$$\n",
    "\n",
    "+ Multinomial: joint distribution of the number of values in each of $k \\ge 2$ categories\n",
    "for $n$ IID draws with probability $\\pi_j$ of selecting value $j$ in each draw. Special cases: uniform distribution on $k$ outcomes ($n=1$, $\\pi_j = 1/k$), binomial ($k=2$). A random vector $(X_1, \\ldots, X_k)$ has a multinomial joint distribution with parameters $n$\n",
    "and $\\{\\pi_j\\}_{j=1}^k$ iff\n",
    "$$ \\Pr \\{X_j = x_j \\} = \\prod_{j=1}^k \\pi_j^{x_j} \\frac{n!}{x_1!x_2! \\cdots x_j!}, \\;\\; x_j \\ge 0,\\;\\; \\sum_{j=1}^k x_j = n.$$\n",
    "\n",
    "+ Multi-hypergeometric: joint distribution of the number of values in each of $k \\ge 2$ categories for $n$ draws without replacement from a finite population of $N$ items of\n",
    "which $N_j$ are in category $j$. Special case: hypergeometric ($k = 2$). A random vector $(X_1, \\ldots, X_k)$ has a multi-hypergeometric joint distribution with parameters $\\{N_j\\}_{j=1}^k$ iff\n",
    "$$ \\Pr \\{ X_j = x_j,\\; j = 1, \\ldots, k \\} = \\frac{{{N_1}\\choose{x_1}} \\cdots {{N_k}\\choose{x_k}}}{{{N}\\choose{n}}}, \\;\\; x_j \\ge 0;\\;\\; \\sum_j x_j = n; \\;\\; \\sum_j N_j = N.$$\n",
    "\n",
    "### Continuous\n",
    "\n",
    "+ Uniform on a domain $\\mathbf{S}$. A random variable $X$ has a Uniform distribution on $\\mathbf{S}$ iff\n",
    "$$ \\Pr \\{X \\in A\\} = \\frac{\\int_{A \\cap S} dx}{\\int_{S} dx}.$$\n",
    "(Here and below, $A$ needs to be a Lebesgue-measurable set; we will not worry about measurability.)\n",
    "\n",
    "+ Normal. A random variable $X$ has a normal distribution with mean $\\mu$ and variance $\\sigma^2$ iff\n",
    "$$ \\Pr \\{ X \\in A \\} = \\int_A \\frac{1}{\\sqrt{2 \\pi} \\sigma} e^{-(x-\\mu)^2/(2\\sigma^2)} dx.$$\n",
    "\n",
    "+ Distributions derived from the normal: Student's t, F, chi-square\n",
    "\n",
    "+ Exponential. A random variable $X$ has an exponential distribution with rate $\\lambda$ \n",
    "(mean $\\lambda^{-1}$) iff\n",
    "$$ \\Pr \\{ X \\in A \\} = \\int_{A \\cap [0, \\infty)} \\lambda e^{-\\lambda x} dx.$$\n",
    "\n",
    "+ Gamma. A random variable $X$ has a Gamma distribution with shape parameter $\\alpha$\n",
    "and rate parameter $\\beta$ iff\n",
    "$$ \\Pr \\{ X \\in A \\} = \\int_{A \\cap [0, \\infty)}\\frac{\\beta ^{\\alpha }}{\\Gamma (\\alpha )}x^{\\alpha -1}e^{-\\beta x} dx.$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's a model?\n",
    "\n",
    "An expression for the probability distribution of data $X$, usually \"indexed\" by a (possibly abstract, possibly infinite-dimensional) parameter, often relating some observables (_independent variables_, _covariates_, _explanatory variables_, _predictors_) to others (_dependent variables_, _response variables_, _data_, ...).\n",
    "\n",
    "$$\n",
    "   X \\sim \\mathbb{P}_\\theta, \\;\\; \\theta \\in \\Theta.\n",
    "$$\n",
    "\n",
    "### Examples\n",
    "\n",
    "+ coins and 0-1 boxes\n",
    "    - number of heads in 1 toss\n",
    "    - number of heads in $n$ tosses\n",
    "    - number of tosses to the first head\n",
    "    - number of tosses to the $k$th head\n",
    "    \n",
    "+ draws without replacement\n",
    "    - boxes of numbers\n",
    "    - boxes of categories\n",
    "\n",
    "+ radioactive decay\n",
    "\n",
    "+ Hooke's Law, Ohm's Law, Boyle's Law\n",
    "\n",
    "+ Conjoint analysis\n",
    "\n",
    "+ avian-turbine interactions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Some models\n",
    "\n",
    "+ Linear regression \n",
    "\n",
    "+ Linear probability model\n",
    "\n",
    "+ Logit\n",
    "\n",
    "+ Probit\n",
    "\n",
    "+ Multinomial logit\n",
    "\n",
    "+ Poisson regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Response schedules and causal inference\n",
    "\n",
    "A response schedule is an assertion about how Nature generated the data: it says how one variable would respond if you intervened and changed the value of other variables.\n",
    "\n",
    "Regression is about _conditional expectation_: the expected value of\n",
    "the response variable for cases _selected_ on the basis of the values of the predictor variables.\n",
    "\n",
    "Causal inference is about _intervention_: what would happen if the values of the predictor variables were exogenously set to some values.\n",
    "\n",
    "Response schedules connect _selection_ to _intervention_.\n",
    "\n",
    "For conditioning to give the same result as intervention, the model has to be a response schedule, and the response schedule has to be correct.\n",
    "\n",
    "+ Linear: a model for real-valued outcomes. $Y_X = X\\beta + \\epsilon$. Nature picks $X$, multiplies by $\\beta$, adds $\\epsilon$. $X$ and $\\epsilon$ are independent.\n",
    "\n",
    "    - Good examples (for suitable ranges of $X$ and suitable instrumental error): Hooke's law, Ohm's law, Boyle's law\n",
    "    - Bad examples: most (if not all) applications in social science, including econometrics.\n",
    "\n",
    "+ Linear probability model: a model for binary outcomes. $\\Pr \\{Y_j = 1 | X \\} = X_j\\beta + \\epsilon$, where the components of $\\epsilon$ are IID with mean zero. Not guaranteed to give probabilities between 0 and 1 when fitted to data.\n",
    "\n",
    "+ Logit: a model for binary outcomes. Logistic distribution function is $\\Lambda(x) = e^x/(1+e^x)$. The logit function is $\\mathrm{logit} p \\equiv \\log [p/(1-p)]$, also called the _log odds ratio_. The logit model is that $\\{Y_j\\}$ are independent with $\\Pr \\{Y_j = 1 | X \\} = \\Lambda(X_j \\beta)$.  Equivalently, $\\mathrm{logit} \\Pr(Y_j=1 | X) = X_j \\beta$. Also equivalently, the _latent variable_ formulation\n",
    "$$ Y_j = \\begin{cases} 1, & X_j\\beta + U_j \\ge 0\\\\ 0, & \\mathrm{otherwise,} \\end{cases}$$\n",
    "where $\\{U_j \\}$ are IID random variables with the logistic distribution, and are independent of $X$.\n",
    "\n",
    "+ Probit: a model for binary outcomes. Let $\\Phi$ denote the standard normal cdf. The probit model is that $\\{Y_j\\}$ are independent with $\\Pr \\{Y_j = 1 | X) = \\Phi(X_j \\beta)$. Equivalently, the latent variable formulation\n",
    "$$ Y_j = \\begin{cases} 1, & X_j\\beta + U_j \\ge 0\\\\ 0, &\\mathrm{otherwise,} \\end{cases}$$\n",
    "where $\\{U_j \\}$ are IID random variables with the standard normal distribution, and are independent of $X$.\n",
    "\n",
    "+ Multinomial logit: a model for categorical outcomes. Suppose there are $K$ categories.\n",
    "The multinomial logit model is that $\\{Y_j\\}$ are independent with\n",
    "$$ \\Pr \\{Y_j = k | X \\} = \\begin{cases} \\frac{e^{X_j \\beta_k}}{1 + \\sum_{\\ell=1}^{K-1}e^{X_j \\beta_\\ell}}, & k=1, \\ldots, K-1 \\\\ \\frac{1}{1 + \\sum_{\\ell=1}^{K-1}e^{X_j \\beta_\\ell}}, & k=K.\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "+ Poisson regression: a model for non-negative counts. The model is that $\\{Y_j\\}$ are independent Poisson random variables with corresponding rates $\\{\\lambda_j\\}$ and that\n",
    "$$ \\log \\lambda_j | X = X_j \\beta.$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}