{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Primer on Bayesian Methods for Multilevel Modeling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hierarchical or multilevel modeling is a generalization of regression modeling.\n", "\n", "*Multilevel models* are regression models in which the constituent model parameters are given **probability models**. This implies that model parameters are allowed to **vary by group**.\n", "\n", "Observational units are often naturally **clustered**. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.\n", "\n", "A *hierarchical model* is a particular multilevel model where parameters are nested within one another.\n", "\n", "Some multilevel structures are not hierarchical. \n", "\n", "* e.g. \"country\" and \"year\" are not nested, but may represent separate, but overlapping, clusters of parameters\n", "\n", "We will motivate this topic using an environmental epidemiology example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Radon contamination (Gelman and Hill 2006)\n", "\n", "Radon is a radioactive gas that enters homes through contact points with the ground. It is a carcinogen that is the primary cause of lung cancer in non-smokers. Radon levels vary greatly from household to household.\n", "\n", "![radon](images/how_radon_enters.jpg)\n", "\n", "The EPA did a study of radon levels in 80,000 houses. Two important predictors:\n", "\n", "* measurement in basement or first floor (radon higher in basements)\n", "* county uranium level (positive correlation with radon levels)\n", "\n", "We will focus on modeling radon levels in Minnesota.\n", "\n", "The hierarchy in this example is households within county. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data organization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we import the data from a local file, and extract Minnesota's data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns; sns.set_context('notebook')\n", "import warnings\n", "warnings.filterwarnings(\"ignore\", module=\"mkl_fft\")\n", "warnings.filterwarnings(\"ignore\", module=\"matplotlib\")\n", "\n", "# Import radon data\n", "srrs2 = pd.read_csv('../data/srrs2.dat')\n", "srrs2.columns = srrs2.columns.map(str.strip)\n", "srrs_mn = srrs2[srrs2.state=='MN'].copy()\n", "\n", "RANDOM_SEEDS = [20090425, 19700903]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, obtain the county-level predictor, uranium, by combining two variables." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "srrs_mn['fips'] = srrs_mn.stfips*1000 + srrs_mn.cntyfips\n", "cty = pd.read_csv('../data/cty.dat')\n", "cty_mn = cty[cty.st=='MN'].copy()\n", "cty_mn['fips'] = 1000*cty_mn.stfips + cty_mn.ctfips" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the merge method to combine home- and county-level information in a single DataFrame." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "srrs_mn = srrs_mn.merge(cty_mn[['fips', 'Uppm']], on='fips')\n", "srrs_mn = srrs_mn.drop_duplicates(subset='idnum')\n", "u = np.log(srrs_mn.Uppm)\n", "\n", "n = len(srrs_mn)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
" ], "text/plain": [ " idnum state state2 stfips zip region typebldg floor room basement \\\n", "0 5081 MN MN 27 55735 5 1 1 3 N \n", "1 5082 MN MN 27 55748 5 1 0 4 Y \n", "2 5083 MN MN 27 55748 5 1 0 4 Y \n", "3 5084 MN MN 27 56469 5 1 0 4 Y \n", "4 5085 MN MN 27 55011 3 1 0 4 Y \n", "\n", " ... stopdt activity pcterr adjwt dupflag zipflag cntyfips \\\n", "0 ... 12288 2.2 9.7 1146.499190 1 0 1 \n", "1 ... 12088 2.2 14.5 471.366223 0 0 1 \n", "2 ... 21188 2.9 9.6 433.316718 0 0 1 \n", "3 ... 123187 1.0 24.3 461.623670 0 0 1 \n", "4 ... 13088 3.1 13.8 433.316718 0 0 3 \n", "\n", " county fips Uppm \n", "0 AITKIN 27001 0.502054 \n", "1 AITKIN 27001 0.502054 \n", "2 AITKIN 27001 0.502054 \n", "3 AITKIN 27001 0.502054 \n", "4 ANOKA 27003 0.428565 \n", "\n", "[5 rows x 27 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srrs_mn.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need a lookup table (dict) for each unique county, for indexing." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "srrs_mn.county = srrs_mn.county.map(str.strip)\n", "mn_counties = srrs_mn.county.unique()\n", "counties = mn_counties.shape[0]\n", "county_lookup = dict(zip(mn_counties, range(counties)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, create local copies of variables." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "county = srrs_mn['county_code'] = srrs_mn.county.replace(county_lookup).values\n", "radon = srrs_mn.activity\n", "srrs_mn['log_radon'] = log_radon = np.log(radon + 0.1).values\n", "floor_measure = srrs_mn.floor.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Distribution of radon levels in MN (log scale):" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAEBCAYAAACe6Rn8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAD8xJREFUeJzt3X+s3XV9x/Hni5KJGYtVoCpcoLpRXAR1BQMmVDcNuqyQDf8gdgEWk2Up/iD8sURmFruZkHWMLURpBmYxMFGiWQia4SIxm7EVXIzyY2qgxLVAwSlQrwlGmLbv/XG/dcfjLZxzv+f0nNvP85HcnHs+3+/3831/m9vzOp/vz1QVkqQ2HTPrAiRJs2MISFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWrYsbMuYDlJXgK8Gfg+cGDG5UjSarEGeDXwjap6fpQF5jIEWAqAnbMuQpJWqU3ArlFmnNcQ+D7Azp07WVhYmHUtkrQq7Nu3j02bNkH3GTqKeQ2BAwALCwusX79+xqVI0qoz8m50DwxLUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktSweb1OQNIy1l9z11jz792+eUqV6GjhSECSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhI4VAkuuT7ElSSc7q2k5I8sUkDyd5MMkdSU4aWOb8JA8k2Z3k7iTrprURkqSVGXUkcCfwVuDRgbYCrquqM6vqDcD3gO0ASQLcBry/qjYAXz00TZI0P0YKgaraVVWPD7Xtr6qvDDR9HTi9+/1c4Lmq2tW9vwm4tGetkqQJm8hDZZIcA1wJfKFrOo2BUUNVPZ3kmCSvqKr9Q8uuBdYOdbkwibokSS9sUk8W+zjwLHDjCpa9Gtg2oTokSWPoHQJJrgfOAC6uqoNd82P8/64hkpwI1PAooHMDcMtQ2wKws29tkqQX1isEklwLnANsrqrnByZ9E3hpkgu64wJbgc8t10dVLQKLQ/32KUuSNKKRQiDJx4B3A68CvpzkGZYO9H4Y2A3c031w76mqS6rqYJLLgZuTHAfsBS6bQv2SpB5GCoGqugq4aplJh/3KXlX3AGevsC5J0hHgFcOS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1bFLPGJY0pvXX3DXrEiRHApLUMkNAkhrm7iBpQty9o9XIkYAkNcwQkKSGGQKS1DBDQJIa9qIhkOT6JHuSVJKzBto3JLk3ye7u9YxRpkmS5scoI4E7gbcCjw613wTsqKoNwA7g5hGnSZLmxIueIlpVuwCS/KItyTpgI3Bh13Q7cGOSk4AcblpVPTXcf5K1wNqh5oXxNkOStBIrvU7gVOCJqjoAUFUHkjzZtecFpv1KCABXA9tWWIckqYd5uFjsBuCWobYFYOeRL0WS2rLSEHgcOCXJmu6b/hrg5K49LzDtV1TVIrA42Da460mSND0rOkW0qn4I3A9s6Zq2APdV1VMvNK1vsZKkyRrlFNGPJdnH0i6aLyf5TjdpK/DBJLuBD3bvGWGaJGlOjHJ20FXAVcu0PwScd5hlDjtNkjQ/vGJYkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDXvTxktLRYv01d401/97tm6dUiTQ/HAlIUsMMAUlqmCEgSQ3rHQJJLkpyX5L7kzyY5N1d+4Yk9ybZ3b2e0b9cSdIk9QqBJAE+BVxeVW8CLgNuTXIMcBOwo6o2ADuAm/sWK0marEmcHXQQeFn3+1rg+8CJwEbgwq79duDGJCdV1VMTWKekKfAMqvb0CoGqqiSXAp9P8hPgN4DNwKnAE1V1oJvvQJInu/ZfCoEka1kKj0ELfeqSJI2m7+6gY4G/AP6wqk4HLgY+Cxw/RjdXA3uGfnb2qUuSNJq+B4bfBJxcVV8D6F5/AjwHnJJkDUD3ejLw+DJ93AC8ZuhnU8+6JEkj6HtMYB+wkOTMqno4yW8DrwIeAe4HtgC3da/3LXc8oKoWgcXBtqXjzZKkaet7TOB/klwJ/EuSg13ze6tqf5KtLJ0p9BHgR8AVPWuVJE1Y77ODqurTwKeXaX8IOK9v/5Kk6fGKYUlqmCEgSQ0zBCSpYYaAJDXMh8pIR7FxbwOh9jgSkKSGORKQDsNv0WqBIwFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwbyGnV8gZvUn+OBCSpYYaAJDXM3UGSVmzcXXJ7t2+eUiVaqd4jgSTHJfnHJI8k+a8kn+jaNyS5N8nu7vWM/uVKkiZpEiOB64DngA1VVUle2bXfBOyoqtuSXAbcDLx9AuuTJE1IrxBIcjxwBbBQVQVQVT9Isg7YCFzYzXo7cGOSk6rqqaE+1gJrh7pe6FOXJGk0fUcCvwk8A2xL8nvAs8BfAj8FnqiqAwBVdSDJk8CpwFNDfVwNbOtZhyRpBfoeEzgWeC1wX1WdC3wIuAM4fow+bgBeM/SzqWddkqQR9B0JPAr8nKXdPVTVfyZ5mqWRwClJ1nSjgDXAycDjwx1U1SKwONiWpGdZkqRR9BoJVNXTwH/Q7ftPsgFYB+wG7ge2dLNuYWm0MLwrSJI0Q5M4O2gr8Mkkfw/8DLi8qhaTbAVuTfIR4EcsHUCWJM2R3iFQVf8N/O4y7Q8B5/XtX5I0Pd42QpIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNWxiIZBkW5JKclb3/vwkDyTZneTuJOsmtS5J0mRMJASSbATOBx7r3ge4DXh/VW0Avgpsn8S6JEmT0zsEkrwE2AG8D6iu+Vzguara1b2/Cbi077okSZN17AT6+ChwW1XtWRoAAHAa8OihN1X1dJJjkryiqvYPLpxkLbB2qM+FCdQlSXoRvUIgyVuANwPX9OjmamBbnzokSSvTd3fQ24DXAXuS7GXpG/yXgN8CTj80U5ITgRoeBXRuAF4z9LOpZ12SpBH0GglU1XYGDvh2QXAR8F3gz5Jc0B0X2Ap87jB9LAKLg20Du5UkSVM0iWMCv6KqDia5HLg5yXHAXuCyaaxLkrRyEw2Bqlo/8Ps9wNmT7F+SNFleMSxJDZvK7iBJWs76a+4aa/692zdPqRId4khAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDfLyk5sK4jx2UNBmOBCSpYYaAJDWsVwgkOSHJF5M8nOTBJHckOambdn6SB5LsTnJ3knWTKVmSNCl9RwIFXFdVZ1bVG4DvAduTBLgNeH9VbQC+CmzvuS5J0oT1CoGq2l9VXxlo+jpwOnAu8FxV7erabwIu7bMuSdLkTezsoCTHAFcCXwBOAx49NK2qnk5yTJJXVNX+oeXWAmuHuluYVF2SpMOb5IHhjwPPAjeOudzVwJ6hn50TrEuSdBgTGQkkuR44A7i4qg4meYyl3UKHpp8I1PAooHMDcMtQ2wIGwarmef/S6tA7BJJcC5wDbK6q57vmbwIvTXJBd1xgK/C55ZavqkVgcajPvmVJkkbQKwSSvB74MLAbuKf78N5TVZckuRy4OclxwF7gsp61SpImrFcIVNV3gGW/tlfVPcDZffqXJE2XVwxLUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhvlkMY3EK4A1Cyv5u9u7ffMUKjl6ORKQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcyLxSQdVca9wKz1i8scCUhSwxwJNMrbQEhLWh85OBKQpIYZApLUMENAkhpmCEhSw47KA8OtH+iRpFFNNQSSbABuBU4AngGuqKpHprnOo4VBJs2naZ9Zd6T/L097d9BNwI6q2gDsAG6e8vokSWOY2kggyTpgI3Bh13Q7cGOSk6rqqYH51gJrhxY/HWDfvn0rWvfPf/yDsebfu3fvWPNf8Lf/Ptb8uz709rHmh/G3YeHKT469DknzZ9zPo0EDn5lrRl0mVbXiFb5gx8k5wD9X1esH2r4LXFZV3xpo+ytg21SKkKQ2baqqXaPMOA8Hhm8Abhlq+zXgtcAjwIEjWMsCsBPYBKxsGDJ7bsP8OBq2w22YD6Nuwxrg1cA3Ru14miHwOHBKkjVVdSDJGuDkrv0XqmoRWFxm+d1TrG1ZSQ79uq+q9h7p9U+C2zA/jobtcBvmw5jb8L1x+p7ageGq+iFwP7Cla9oC3Dd4PECSNFvT3h20Fbg1yUeAHwFXTHl9kqQxTDUEquoh4LxprkOStHLeNuKXLQJ/zfLHKFYLt2F+HA3b4TbMh6ltw9ROEZUkzT9HApLUMENAkhpmCCwjyY4kDyV5IMnXkpw765rGleSyJA8m+XmSD8y6nnEk2ZDk3iS7u9czZl3TOJJcn2RPkkpy1qzrWYkkJyT5YpKHu7+jO5KcNOu6xpXkzu7/8X1JdiZ506xrWqkk26bxN2UILO/fgLOr6o3A3wCfnXE9K3E/8B7gM7MuZAVW+40H7wTeCjw660J6KOC6qjqzqt7A0gVI22dc00r8SVW9sap+B7geWJU32UqyETgfeGzSfRsCy6iqf62qn3Vv7wUWkqyqf6uq+nZVfRc4OOtaxjFw48Hbu6bbgY2r6VtoVe2qqsdffM75VVX7q+orA01fp7ux42pSVT8eePsyVtn/B4AkL2Hpy9D7WArniZqHewfNuw8Ad1XVqvvjWaVOBZ6oqgMA3S1Hnuzavdp8BrovQFcCX5h1LSuR5J+AdwIBfn/G5azER4HbqmrPwO0jJqbJEEjyLeC0w0x+5aEPoCTvAf6YpaH9XBl1G6QJ+DjwLHDjrAtZiar6U4AklwN/B/zBbCsaXZK3AG8GrpnWOpoMgara+GLzJLkEuBZ4R1WNd3P/I2CUbVilRrrxoI6MJNcDZwAXr/bRcFV9KsknkpxQVc/Mup4RvQ14HXBoFLAAfCnJe6vq7kmsYFXt5z5SklwE/APwrtV618HVyhsPzo8k1wLnAH9UVc/Pup5xJTk+yakD7y8G9nc/q0JVba+qk6tqfVWtZ+k20u+aVACAVwwvK8lTwP/yy/ug37GKvj2QZAtLQ9+Xs7QtPwHe2R0snmtJXsfSs6lfTnfjwap6eLZVjS7Jx4B3A68CngaeGXy40mqQ5PXAt1m6pftPu+Y9VXXJ7KoaT5JXAp8Hfp2l55LsB/588KFWq02SvcBFVfXtifVpCEhSu9wdJEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWrY/wFlFUVYfSIImgAAAABJRU5ErkJggg==\n", "text/plain": [ "