{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LMM with multiple crossed of nested random effects\n",
    "\n",
    "Often the experimental design or the data suggests a linear mixed model whose random effects are associated with multiple grouping factors.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Crossed random effects\n",
    "\n",
    "An inclusion of multiple random effects terms which correspond to multiple grouping factors is often refered to as *crossed random effect*. A good coverage of linear mixed models with crossed random effects can be found in [Chapter 2](http://lme4.r-forge.r-project.org/book/Ch2.pdf) of Douglas Bates' lme4 book.\n",
    "\n",
    "A simple example of nested random effects from Bates' book is the following model. The $i$th observation of *diameter* in the $j$th *sample* from the $k$th *plate* is modeled as:\n",
    "\n",
    "$$diameter_{ijk} = Intercept + SampleIntercept_{j} + \n",
    "PlateIntercept_{k} + RandomError_{ij},$$\n",
    "\n",
    "where *Intercept* is the overall average, and *SampleIntercept* as well as *PlateIntercept* are random intercept terms, due to the *sample* and *plate* that a particular observation comes from.\n",
    "\n",
    "In `mixed_models` we would fit such a model with:\n",
    "\n",
    "```Ruby\n",
    "LMM.from_formula(formula: \"diameter ~ 1 + (1 | Sample) + (1 | Plate)\", \n",
    "                 data: penicillin)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example with simulated data\n",
    "\n",
    "As an example we fit a linear mixed model, which can be written as\n",
    "\n",
    "$$y = \\beta_{0} + \\beta_{1} \\cdot x + b_{0} + b_{1} \\cdot x + c_{0} + c_{1} \\cdot x + \\epsilon,$$\n",
    "\n",
    "where $y$ is the response and $x$ is a predictor variable; $\\beta_{0}$ and $\\beta_{1}$ are the fixed intercept and slope coefficients; $b_{0}$ and $b_{1}$ are *random* intercept and slope coefficients due to factor $g$; $c_{0}$ and $c_{1}$ are *random* intercept and slope coefficients due to factor $h$.\n",
    "\n",
    "The simulated data set is loaded, and its first five rows are displayed with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th colspan=\"5\">Daru::DataFrame:47127301961640  rows: 5  cols: 4</th></tr><tr><th></th><th>g</th><th>h</th><th>x</th><th>y</th></tr><tr><td>0</td><td>1</td><td>1</td><td>1.71742040246789</td><td>0.202546206520008</td></tr><tr><td>1</td><td>2</td><td>3</td><td>0.223744902239436</td><td>0.840573625427331</td></tr><tr><td>2</td><td>3</td><td>1</td><td>-1.11598926418025</td><td>-0.998332155138107</td></tr><tr><td>3</td><td>1</td><td>2</td><td>-0.15562952641427</td><td>-0.0145985318440115</td></tr><tr><td>4</td><td>1</td><td>2</td><td>-0.108919415063593</td><td>0.722443338784882</td></tr></table>"
      ],
      "text/plain": [
       "\n",
       "#<Daru::DataFrame:47127301961640 @name = 38fb0dad-dcc4-4071-a041-d329b961aaf8 @size = 5>\n",
       "                    g          h          x          y \n",
       "         0          1          1 1.71742040 0.20254620 \n",
       "         1          2          3 0.22374490 0.84057362 \n",
       "         2          3          1 -1.1159892 -0.9983321 \n",
       "         3          1          2 -0.1556295 -0.0145985 \n",
       "         4          1          2 -0.1089194 0.72244333 \n"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "require 'mixed_models'\n",
    "# we pass `headers: true` to `#from_csv`, because\n",
    "# mixed_models expects that all variable names in the data frame are ruby Symbols\n",
    "df = Daru::DataFrame.from_csv \"../spec/data/crossed_effects_data.csv\", headers: true\n",
    "df.head 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we fit a linear mixed model in `mixed_models`, and display the estimated correlation structure of the random effects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th colspan=\"5\">Daru::DataFrame:47127296310540  rows: 4  cols: 4</th></tr><tr><th></th><th>g</th><th>g_x</th><th>h</th><th>h_x</th></tr><tr><td>g</td><td>0.7539785718983487</td><td>0.999999998031062</td><td></td><td></td></tr><tr><td>g_x</td><td>0.999999998031062</td><td>0.7490861098771483</td><td></td><td></td></tr><tr><td>h</td><td></td><td></td><td>0.5638620447671433</td><td>0.9999999971404151</td></tr><tr><td>h_x</td><td></td><td></td><td>0.9999999971404151</td><td>0.38533198068098046</td></tr></table>"
      ],
      "text/plain": [
       "\n",
       "#<Daru::DataFrame:47127296310540 @name = 75ce6314-b05d-43bf-b00f-3c8fc68b5ea8 @size = 4>\n",
       "                    g        g_x          h        h_x \n",
       "         g 0.75397857 0.99999999        nil        nil \n",
       "       g_x 0.99999999 0.74908610        nil        nil \n",
       "         h        nil        nil 0.56386204 0.99999999 \n",
       "       h_x        nil        nil 0.99999999 0.38533198 \n"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mod = LMM.from_formula(formula: \"y ~ x + (x|g) + (x|h)\", data: df, reml: false)\n",
    "mod.ran_ef_summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the crossed random effects corresponding to the grouping factors $g$ and $h$ form uncorrelated blocks in the correlation matrix. That is, crossed random effects are assumed to be independent by the model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, we can use all of the model attributes, diagnostics and inference methods described in other `mixed_models` tutorials for this model as well. \n",
    "\n",
    "For example, we can test for the significance of the fixed slope effect, using the bootstrap approach with the following line of code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0989010989010989"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p_value = mod.fix_ef_p(variable: :x, method: :bootstrap, nsim: 1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or we can use the likelihood ratio test instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.04871184664935746"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "alternative_p_value = mod.fix_ef_p(variable: :x, method: :lrt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the p-value obtained by LRT is barely below a significance level of 5%, and since the bootstrap method is typically more accurate than LRT and produces a rather high p-value here, we conclude that the data does not show enough evidence for statistical significance of predictor $x$ as a fixed effect."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Nested random effects\n",
    "\n",
    "The grouping factors of random effects terms can be *nested* in each other. We refer to such random effects structures as *nested* random effects (even though strictly speaking not the random effects but the corresponding grouping factors are nested). As for crossed random effects, a good reference for linear mixed models with nested random effects is [Chapter 2](http://lme4.r-forge.r-project.org/book/Ch2.pdf) of Douglas Bates' lme4 book.\n",
    "\n",
    "For example, consider an experiment where we measure the bone volume of each digit in each foot of a number of mice (i.e. digit is nested within foot, which is nested within mouse).\n",
    "The $i$th observation of *volume* in the $m$th *digit* of the $k$th *foot* of the $j$th *mouse* can be modeled as:\n",
    "\n",
    "$$volume_{ijkm} = Intercept +  MouseIntercept_{j} + FootIntercept_{kj} + RandomError_{ijkm},$$\n",
    "\n",
    "i.e. the random effect *foot* only appears as nested within *mouse* (i.e. the intercept due to foot 1 in mouse 1 is different than the intercept due to foot 1 in mouse 2).\n",
    "\n",
    "In `mixed_models` we could fit such a model with:\n",
    "\n",
    "```Ruby\n",
    "LMM.from_formula(formula: \"volume ~ 1 + (1 | mouse) + (1 | mouse:foot)\",\n",
    "                 data: bone_data)\n",
    "```\n",
    "\n",
    "__Remark:__ In the `R` package `lme4`, instead of the formula \"`volume ~ 1 + (1|mouse) + (1|mouse:foot)`\" a shorter equivalent formula \"`volume ~ 1 + (1|mouse/foot)`\" can be used to fit the model. However, the formula parser in `mixed_models` currently does not support the shortcut notation `/`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example with simulated data\n",
    "\n",
    "As an example we fit a linear mixed model with nested random effects to the following data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th colspan=\"5\">Daru::DataFrame:47127301459180  rows: 5  cols: 4</th></tr><tr><th></th><th>a</th><th>b</th><th>x</th><th>y</th></tr><tr><td>0</td><td>a3</td><td>b1</td><td>0.388425310194731</td><td>5.10364866735101</td></tr><tr><td>1</td><td>a3</td><td>b2</td><td>0.446223000551612</td><td>6.23307061450375</td></tr><tr><td>2</td><td>a3</td><td>b1</td><td>1.54993657118302</td><td>12.2050404173393</td></tr><tr><td>3</td><td>a3</td><td>b1</td><td>1.52786614599715</td><td>12.0067595454774</td></tr><tr><td>4</td><td>a3</td><td>b2</td><td>0.760112121512708</td><td>8.20054527384668</td></tr></table>"
      ],
      "text/plain": [
       "\n",
       "#<Daru::DataFrame:47127301459180 @name = baf7edf8-b4d9-4eed-b230-1a23a4e58bd6 @size = 5>\n",
       "                    a          b          x          y \n",
       "         0         a3         b1 0.38842531 5.10364866 \n",
       "         1         a3         b2 0.44622300 6.23307061 \n",
       "         2         a3         b1 1.54993657 12.2050404 \n",
       "         3         a3         b1 1.52786614 12.0067595 \n",
       "         4         a3         b2 0.76011212 8.20054527 \n"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = Daru::DataFrame.from_csv(\"../spec/data/nested_effects_with_slope_data.csv\", headers: true)\n",
    "df.head 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We consider the following model:\n",
    "\n",
    "* We take `y` to be the response and `x` its predictor.\n",
    "* We consider the factor `b` to be nested within the factor `a`.\n",
    "* We assume that the intercept varies due to variable `a`; that is, a different (random) intercept term for each level of `a`.\n",
    "* Moreover, we assume that the intercept varies due to the factor `b` which is nested in `a`; that is, different (random) intercept for each combination of levels of `a` and `b`.\n",
    "\n",
    "We fit this model in `mixed_models`, and display the estimated random effects correlation structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><th colspan=\"3\">Daru::DataFrame:47127301931880  rows: 2  cols: 2</th></tr><tr><th></th><th>a</th><th>a_and_b</th></tr><tr><td>a</td><td>1.3410830040769561</td><td></td></tr><tr><td>a_and_b</td><td></td><td>0.9769750031499026</td></tr></table>"
      ],
      "text/plain": [
       "\n",
       "#<Daru::DataFrame:47127301931880 @name = 22ed9ffb-8435-4076-9f8d-5d81365c6085 @size = 2>\n",
       "                    a    a_and_b \n",
       "         a 1.34108300        nil \n",
       "   a_and_b        nil 0.97697500 \n"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mod = LMM.from_formula(formula: \"y ~ x + (1|a) + (1|a:b)\", data: df, reml: false)\n",
    "mod.ran_ef_summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the standard deviations of the effect of `a` and of the nested effect of `b` and `a` are of comparable magnitude."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use all methods available in `LMM` to look at various parameter estimates or to do statistical inference.\n",
    "\n",
    "For example, we can test the nested random effect for significance, in order to decide whether we should drop that term from the model to reduce model complexity. We can use the Chi squared based likelihood ratio test as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0050606262424956515"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p_val = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b], method: :lrt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Where the nested grouping factor is supplied as an Array `[:a, :b]`.\n",
    "\n",
    "The p-value is small, suggesting that we probably should keep the term `(1|a:b)` in the model formula. To be more sure we can perform a bootstrap based hypothesis test as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.000999000999000999"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p_val_boot = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b], \n",
    "                           method: :bootstrap, nsim: 1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bootstrap p-value also support the above conclusion."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Ruby 2.3.1",
   "language": "ruby",
   "name": "ruby"
  },
  "language_info": {
   "file_extension": ".rb",
   "mimetype": "application/x-ruby",
   "name": "ruby",
   "version": "2.3.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}