{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear Mixed Models\n",
    "\n",
    "\n",
    "\n",
    "Consider a study that compares the effect of a new drug to a placebo."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|        |   | Drug      |   |   |   | Placebo   |   |    |\n",
    "|--------|---|-----------|---|---|---|-----------|---|----|\n",
    "|        |   | Age Group |   |   |   | Age Group |   |    |\n",
    "|--------|---|-----------|---|---|---|-----------|---|----|\n",
    "|        | A | B         | C |   | A | B         | C |    |\n",
    "| Gender |   |           |   |   |   |           |   |    |\n",
    "| Male   |   |           |   |   |   |           |   |    |\n",
    "| Female |   |           |   |   |   |           |   |    |\n",
    "|        |   |           |   |   |   |           |   |    |\n",
    "\n",
    "Table 1: Study of effect if a drug over placebo for different age groups for males and females\n",
    "\n",
    "\n",
    "|        |   | School X  |   |   |   | School Y  |   |    |\n",
    "|--------|---|-----------|---|---|---|-----------|---|----|\n",
    "|        |   | Section   |   |   |   | Section   |   |    |\n",
    "|--------|---|-----------|---|---|---|-----------|---|----|\n",
    "|        | A | B         | C |   | A | B         | C |    |\n",
    "| Gender |   |           |   |   |   |           |   |    |\n",
    "| Male   |   |           |   |   |   |           |   |    |\n",
    "| Female |   |           |   |   |   |           |   |    |\n",
    "|        |   |           |   |   |   |           |   |    |\n",
    "\n",
    "Table 2: Study of marks scored for different schools over "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The definition for fixed effects and random-effects is a bit [hazy](http://andrewgelman.com/2005/01/25/why_i_dont_use/), but here we refer to the fixed effects arising due to a factor \n",
    "\n",
    "### Terminology\n",
    "-----------------------------\n",
    "**Factor**: Classifications identifying the source of datum. So 'Drug', 'Placebo' and 'Gender' are the three factors in the above study.\n",
    "\n",
    "**Levels**: Individual classes of the factors. So 'Male' and 'Female' are levels of 'Gender' factor and 'A', 'B', 'C' are the levels of 'gender' factor. 'Drug' and 'Placebo' are levels of 'Type of drug administered'\n",
    "\n",
    "**Crossed Factors**: All levels of 'Age' factor appear for both levels of the 'Type of drug' factorand hence is a 'crossed' factor.\n",
    "\n",
    "**Nested Factors**: In the table summarizing school wise grades, the section A under school X is comnpletely independent of section A under school Y. In fact, it does not matter what particular section we refer to, as long as we have 3 sections for both. Since the levels of the 'section' factor appear only once, 'section' is a nested factor within the 'type of school'.\n",
    "\n",
    "\n",
    "**Effects**: Extent to which different levels of a factor affect the variable of interest\n",
    "\n",
    "In order to define our meaning of fixed and random effects,, we refer to fixed effects as the effects that can be attributed to a finite set of levels of a factor. The random effects arise due to the levels of factor being infinite. So if the performance of two schools is being considered and we also record the student ids along with their gender and scores, the gender in itself will give rise to a fixed effect for there are just two levels: male and female. But, the student ids being levels of the student identifier factor can have infinite levels, because the student can be sampled randomly.\n",
    "\n",
    "**Fixed Effects** are simply the effects arising due to the levels of a fixed covariate, a factor whose all levels are of interest such that they represent specific conditions that are used to define specific contrasts\n",
    "\n",
    "**Random Effects** arise out of **random factors**, a factor whose levels were randomly sampled from a population of levels. Thus, all possible levels of **random factors** cannot be part of the study and hence only an inference can be made about the entire population of levels given a random sample of the levels\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Matrix Specification\n",
    "\n",
    "A linear mixed model is described by the following model:\n",
    "\n",
    "$$Y= \\underbrace{X\\beta}_\\text{Fixed} + \\underbrace{Zu + \\epsilon}_\\text{Random}$$\n",
    "\n",
    "where \n",
    "\n",
    "$n$ = Number of observations\n",
    "\n",
    "$p$ = Number of fixed effect covariates\n",
    "\n",
    "$q$ = Number of random effect covariates\n",
    "\n",
    "$Y$ = response column vector $(n \\times 1)$\n",
    "\n",
    "$X$ = $n \\times p$ model matrix representing the known values of the $p$ fixed-effect covariates \n",
    "      \n",
    "$\\beta$ = $p \\times 1$ fixed-effect parameter column vector\n",
    "\n",
    "$Z$ = $n \\times q$ model matrix for representing the known values of the $q$ random-effect covariates\n",
    "\n",
    "$u$ = $q \\times 1$ fixed-effect parameter column vector     \n",
    "\n",
    "$\\epsilon$ = Residual column vector\n",
    "\n",
    "\n",
    "By definition the random-effects are random variables and the $q$ random-effects parameter inthe $u$ column vector follow a multivarite normal distribution:\n",
    "\n",
    "$$u \\sim N(0,D)$$\n",
    "\n",
    "where $D$ is a $q \\times q$ variance-covariance matrix\n",
    "\n",
    "\n",
    "$\\epsilon$ column vector constitutes the residuals associated with the $n$ observations. \n",
    "\n",
    "Residuals and random-effects parameter vector $u$ are assumed to be independent and $\\epsilon$ is assumed to be normally distributed:\n",
    "\n",
    "$$\\epsilon \\sim N(0,R)$$ \n",
    "\n",
    "where $R$ is a $n\\times n$ variance-covariance matrix\n",
    "\n",
    "\n",
    "Our aim is to estimate the column vectors $\\beta$ , $u$ (by estimating for the $D$ matrix) and $\\epsilon$(by estimating the $R$ matrix)\n",
    "\n",
    "It is possible to represent the above model in a marginailised form:\n",
    "\n",
    "$$Y= \\underbrace{X\\beta}_\\text{Fixed} + \\underbrace{\\epsilon^*}_\\text{Random}$$\n",
    " \n",
    "where $\\epsilon^* \\sim N(0, ZDZ'+R)$ which helps to relax the constraints on both $D$ and $R$ being positive definite to $ZDZ'+R$ being positive definite.\n",
    "\n",
    "and hence $Y \\sim N(X\\beta, ZDZ'+R)$\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import statsmodels.api as sm\n",
    "import pandas\n",
    "import statsmodels.formula.api as smf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Penicillin.csv', index_col=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "          diameter plate sample\n",
      "count   144.000000   144    144\n",
      "unique         NaN    24      6\n",
      "top            NaN     x      F\n",
      "freq           NaN     6     24\n",
      "mean     22.972222   NaN    NaN\n",
      "std       2.031034   NaN    NaN\n",
      "min      18.000000   NaN    NaN\n",
      "25%      22.000000   NaN    NaN\n",
      "50%      23.000000   NaN    NaN\n",
      "75%      24.000000   NaN    NaN\n",
      "max      27.000000   NaN    NaN\n"
     ]
    }
   ],
   "source": [
    "print data.describe(include='all')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['diameter' 'plate' 'sample']\n"
     ]
    }
   ],
   "source": [
    "print (data.columns.values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "count     144\n",
      "unique      6\n",
      "top         F\n",
      "freq       24\n",
      "Name: sample, dtype: object\n",
      "count     144\n",
      "unique     24\n",
      "top         x\n",
      "freq        6\n",
      "Name: plate, dtype: object\n"
     ]
    }
   ],
   "source": [
    "print data['sample'].describe()\n",
    "print data['plate'].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We take the Penicillin growth example from the 'lme4' manual. Diameter refers to the diameter of growth inhibition in a culturing experiment. There are 24 plates and 6 types of samples, each plate e\n",
    "The diameter varies with the plate and sample type.\n",
    "Clearly plate and sample are categorical variables with 24 and 6 levels each. The type of plate here is assumed to be sampled randomly(we are not interested in studying what effect a particular plate say A has, but are more interested in studying the effect of plate ). This is an example of crossed random effect. Crossed because each sample is used in each plate. \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"21\" halign=\"left\">diameter</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>plate</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "      <th>f</th>\n",
       "      <th>g</th>\n",
       "      <th>h</th>\n",
       "      <th>i</th>\n",
       "      <th>j</th>\n",
       "      <th>...</th>\n",
       "      <th>o</th>\n",
       "      <th>p</th>\n",
       "      <th>q</th>\n",
       "      <th>r</th>\n",
       "      <th>s</th>\n",
       "      <th>t</th>\n",
       "      <th>u</th>\n",
       "      <th>v</th>\n",
       "      <th>w</th>\n",
       "      <th>x</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sample</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td>...</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td>...</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td>...</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>D</th>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td>...</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>E</th>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td>...</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td>...</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>6 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       diameter                            ...                              \n",
       "plate         a  b  c  d  e  f  g  h  i  j ...  o  p  q  r  s  t  u  v  w  x\n",
       "sample                                     ...                              \n",
       "A             1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1\n",
       "B             1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1\n",
       "C             1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1\n",
       "D             1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1\n",
       "E             1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1\n",
       "F             1  1  1  1  1  1  1  1  1  1 ...  1  1  1  1  1  1  1  1  1  1\n",
       "\n",
       "[6 rows x 24 columns]"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table = data.pivot_table(['plate'],  rows='sample', columns='plate', aggfunc=len)\n",
    "table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Loading required package: lattice\n",
       "Loading required package: Matrix\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%load_ext rpy2.ipython\n",
    "%R library(lme4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'data.frame':\t144 obs. of  3 variables:\n",
       " $ diameter: num  27 23 26 23 23 21 27 23 26 23 ...\n",
       " $ plate   : Factor w/ 24 levels \"a\",\"b\",\"c\",\"d\",..: 1 1 1 1 1 1 2 2 2 2 ...\n",
       " $ sample  : Factor w/ 6 levels \"A\",\"B\",\"C\",\"D\",..: 1 2 3 4 5 6 1 2 3 4 ...\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%R str(Penicillin)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "       <td>Model:</td>       <td>MixedLM</td> <td>Dependent Variable:</td> <td>diameter</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>No. Observations:</td>   <td>144</td>         <td>Method:</td>         <td>REML</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "     <td>No. Groups:</td>      <td>24</td>          <td>Scale:</td>         <td>1.4191</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Min. group size:</td>     <td>6</td>        <td>Likelihood:</td>     <td>-306.6280</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Max. group size:</td>     <td>6</td>        <td>Converged:</td>         <td>Yes</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Mean group size:</td>    <td>6.0</td>            <td></td>               <td></td>     \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>       <th>Coef.</th> <th>Std.Err.</th>    <th>z</th>    <th>P>|z|</th> <th>[0.025</th> <th>0.975]</th>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>22.972</td>   <td>0.179</td>  <td>128.476</td> <td>0.000</td> <td>22.622</td> <td>23.323</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>plate RE</th>   <td>0.095</td>     <td></td>        <td></td>       <td></td>       <td></td>       <td></td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>sample RE</th>  <td>2.614</td>     <td></td>        <td></td>       <td></td>       <td></td>       <td></td>   \n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary2.Summary'>\n",
       "\"\"\"\n",
       "         Mixed Linear Model Regression Results\n",
       "=======================================================\n",
       "Model:            MixedLM Dependent Variable: diameter \n",
       "No. Observations: 144     Method:             REML     \n",
       "No. Groups:       24      Scale:              1.4191   \n",
       "Min. group size:  6       Likelihood:         -306.6280\n",
       "Max. group size:  6       Converged:          Yes      \n",
       "Mean group size:  6.0                                  \n",
       "-------------------------------------------------------\n",
       "            Coef.  Std.Err.    z    P>|z| [0.025 0.975]\n",
       "-------------------------------------------------------\n",
       "Intercept   22.972    0.179 128.476 0.000 22.622 23.323\n",
       "plate RE     0.095                                     \n",
       "sample RE    2.614                                     \n",
       "=======================================================\n",
       "\n",
       "\"\"\""
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Crossed Random Effects\n",
    "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Penicillin.csv', index_col=0)\n",
    "\n",
    "vcf = {\"plate\" : \"1 + C(plate)\", \"sample\" : \"0 + C(sample)\"}\n",
    "model = sm.MixedLM.from_formula('diameter ~ 1', groups=\"plate\", vc_formula=vcf, data=data)\n",
    "result = model.fit()\n",
    "result.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same example in R:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Linear mixed model fit by REML ['lmerMod']\n",
       "Formula: diameter ~ 1 + (1 | plate) + (1 | sample) \n",
       "   Data: Penicillin \n",
       "\n",
       "REML criterion at convergence: 330.8606 \n",
       "\n",
       "Random effects:\n",
       " Groups   Name        Variance Std.Dev.\n",
       " plate    (Intercept) 0.7169   0.8467  \n",
       " sample   (Intercept) 3.7309   1.9316  \n",
       " Residual             0.3024   0.5499  \n",
       "Number of obs: 144, groups: plate, 24; sample, 6\n",
       "\n",
       "Fixed effects:\n",
       "            Estimate Std. Error t value\n",
       "(Intercept)  22.9722     0.8086   28.41\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%R print (summary(fm <- lmer(diameter ~ 1 + (1|plate) + (1|sample), Penicillin)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dyestuff example\n",
    "\n",
    "The dyestuff dataset is a self-sufficient dataset to understand fixed and random effects. The dataset involves batch to batch variability in the strength of dye. We have 6 batches and 5 samples for each batch, totalling to 30 observations. It is easy ti imagine two sources of variability here, the one within-batch and the other one across-batch. We use mixed models to quantify the batch-batch variability:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "       <td>Model:</td>       <td>MixedLM</td> <td>Dependent Variable:</td>   <td>Yield</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>No. Observations:</td>   <td>30</td>          <td>Method:</td>         <td>REML</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "     <td>No. Groups:</td>       <td>6</td>          <td>Scale:</td>        <td>2451.2239</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Min. group size:</td>     <td>5</td>        <td>Likelihood:</td>     <td>-159.8271</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Max. group size:</td>     <td>5</td>        <td>Converged:</td>         <td>Yes</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Mean group size:</td>    <td>5.0</td>            <td></td>               <td></td>     \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>        <th>Coef.</th>  <th>Std.Err.</th>    <th>z</th>   <th>P>|z|</th>  <th>[0.025</th>   <th>0.975]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>1527.500</td>  <td>19.384</td>  <td>78.802</td> <td>0.000</td> <td>1489.508</td> <td>1565.492</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>groups RE</th> <td>1764.171</td>  <td>31.658</td>     <td></td>      <td></td>        <td></td>         <td></td>    \n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary2.Summary'>\n",
       "\"\"\"\n",
       "           Mixed Linear Model Regression Results\n",
       "==========================================================\n",
       "Model:             MixedLM  Dependent Variable:  Yield    \n",
       "No. Observations:  30       Method:              REML     \n",
       "No. Groups:        6        Scale:               2451.2239\n",
       "Min. group size:   5        Likelihood:          -159.8271\n",
       "Max. group size:   5        Converged:           Yes      \n",
       "Mean group size:   5.0                                    \n",
       "----------------------------------------------------------\n",
       "           Coef.   Std.Err.   z    P>|z|  [0.025   0.975] \n",
       "----------------------------------------------------------\n",
       "Intercept 1527.500   19.384 78.802 0.000 1489.508 1565.492\n",
       "groups RE 1764.171   31.658                               \n",
       "==========================================================\n",
       "\n",
       "\"\"\""
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Dyestuff.csv', index_col=0)\n",
    "model = smf.mixedlm('Yield ~ 1', groups=\"Batch\", data=data)\n",
    "result = model.fit()\n",
    "result.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Same example in R:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Linear mixed model fit by REML ['lmerMod']\n",
       "Formula: Yield ~ 1 + (1 | Batch) \n",
       "   Data: Dyestuff \n",
       "\n",
       "REML criterion at convergence: 319.6543 \n",
       "\n",
       "Random effects:\n",
       " Groups   Name        Variance Std.Dev.\n",
       " Batch    (Intercept) 1764     42.00   \n",
       " Residual             2451     49.51   \n",
       "Number of obs: 30, groups: Batch, 6\n",
       "\n",
       "Fixed effects:\n",
       "            Estimate Std. Error t value\n",
       "(Intercept)  1527.50      19.38    78.8\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%R print( summary(fm <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###Dyestuff2 '\n",
    "\n",
    "Same as the dyestuff example, but with batch-batch variability missing:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       " Batch     Yield       \n",
       " A:5   Min.   :-0.892  \n",
       " B:5   1st Qu.: 2.765  \n",
       " C:5   Median : 5.365  \n",
       " D:5   Mean   : 5.666  \n",
       " E:5   3rd Qu.: 8.151  \n",
       " F:5   Max.   :13.434  \n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%R print(summary(Dyestuff2))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Linear mixed model fit by REML ['lmerMod']\n",
       "Formula: Yield ~ 1 + (1 | Batch) \n",
       "   Data: Dyestuff2 \n",
       "\n",
       "REML criterion at convergence: 161.8283 \n",
       "\n",
       "Random effects:\n",
       " Groups   Name        Variance Std.Dev.\n",
       " Batch    (Intercept)  0.00    0.000   \n",
       " Residual             13.81    3.716   \n",
       "Number of obs: 30, groups: Batch, 6\n",
       "\n",
       "Fixed effects:\n",
       "            Estimate Std. Error t value\n",
       "(Intercept)   5.6656     0.6784   8.352\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "\n",
    "%R print( summary(fm <- lmer(Yield ~ 1 + (1|Batch), Dyestuff2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The variance for batch random effect is 0, thus there is no random effect. It is as good as a linear model:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "Call:\n",
       "lm(formula = Yield ~ 1, data = Dyestuff2)\n",
       "\n",
       "Residuals:\n",
       "    Min      1Q  Median      3Q     Max \n",
       "-6.5576 -2.9006 -0.3006  2.4854  7.7684 \n",
       "\n",
       "Coefficients:\n",
       "            Estimate Std. Error t value Pr(>|t|)    \n",
       "(Intercept)   5.6656     0.6784   8.352 3.32e-09 ***\n",
       "---\n",
       "Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1\n",
       "\n",
       "Residual standard error: 3.716 on 29 degrees of freedom\n",
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%R print (summary(fm <- lm(Yield ~1, Dyestuff2)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python2.7/dist-packages/statsmodels-0.7.0-py2.7-linux-x86_64.egg/statsmodels/regression/mixed_linear_model.py:1912: ConvergenceWarning: The MLE may be on the boundary of the parameter space.\n",
      "  warnings.warn(msg, ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "       <td>Model:</td>       <td>MixedLM</td> <td>Dependent Variable:</td>   <td>Yield</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>No. Observations:</td>   <td>30</td>          <td>Method:</td>         <td>REML</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "     <td>No. Groups:</td>       <td>6</td>          <td>Scale:</td>         <td>13.8063</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Min. group size:</td>     <td>5</td>        <td>Likelihood:</td>     <td>-80.9141</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Max. group size:</td>     <td>5</td>        <td>Converged:</td>         <td>Yes</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Mean group size:</td>    <td>5.0</td>            <td></td>               <td></td>    \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>      <th>Coef.</th> <th>Std.Err.</th>   <th>z</th>   <th>P>|z|</th> <th>[0.025</th> <th>0.975]</th>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>5.666</td>   <td>0.678</td>  <td>8.352</td> <td>0.000</td>  <td>4.336</td>  <td>6.995</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>groups RE</th> <td>0.000</td>   <td>1.235</td>    <td></td>      <td></td>       <td></td>       <td></td>   \n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary2.Summary'>\n",
       "\"\"\"\n",
       "         Mixed Linear Model Regression Results\n",
       "======================================================\n",
       "Model:            MixedLM Dependent Variable: Yield   \n",
       "No. Observations: 30      Method:             REML    \n",
       "No. Groups:       6       Scale:              13.8063 \n",
       "Min. group size:  5       Likelihood:         -80.9141\n",
       "Max. group size:  5       Converged:          Yes     \n",
       "Mean group size:  5.0                                 \n",
       "------------------------------------------------------\n",
       "              Coef. Std.Err.   z   P>|z| [0.025 0.975]\n",
       "------------------------------------------------------\n",
       "Intercept     5.666    0.678 8.352 0.000  4.336  6.995\n",
       "groups RE     0.000    1.235                          \n",
       "======================================================\n",
       "\n",
       "\"\"\""
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Dyestuff2.csv', index_col=0)\n",
    "vcf = {\"batch\" : \"0 + C(Batch)\"}\n",
    "model = sm.MixedLM.from_formula('Yield ~ 1', groups=\"Batch\",  data=data)\n",
    "result = model.fit()\n",
    "result.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pastes\n",
    "The experiment consists of strength of chemical pastes packaged in different casks, at two time points. There are 10 batches, with 3 casks sampled randomly and two tests performed. In total 10x3x2 observations.\n",
    "\n",
    "So there are 30 samples with levels 'A:a', 'A:b', 'A:c' and so on, 3 samples for the 10 batches(This sort of naming saves us from the trouble of using different variable names for these 30 levels) The 'a', 'b', 'c' refer to the levels of the cask. But notice that cask 'a' of batch 'A' has no relation to cask 'a' of batch 'B' and so on. 'a', 'b', 'c' differentiate the casks within a batch but have no relation to other batches otherwise.\n",
    "\n",
    "\n",
    "This is an example of nested random effects because each level of sample occurs with one and only one level of the batch(compare it with the pivot table we had earlier where all the entries were one) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"10\" halign=\"left\">strength</th>\n",
       "      <th>...</th>\n",
       "      <th colspan=\"10\" halign=\"left\">cask</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sample</th>\n",
       "      <th>A:a</th>\n",
       "      <th>A:b</th>\n",
       "      <th>A:c</th>\n",
       "      <th>B:a</th>\n",
       "      <th>B:b</th>\n",
       "      <th>B:c</th>\n",
       "      <th>C:a</th>\n",
       "      <th>C:b</th>\n",
       "      <th>C:c</th>\n",
       "      <th>D:a</th>\n",
       "      <th>...</th>\n",
       "      <th>G:c</th>\n",
       "      <th>H:a</th>\n",
       "      <th>H:b</th>\n",
       "      <th>H:c</th>\n",
       "      <th>I:a</th>\n",
       "      <th>I:b</th>\n",
       "      <th>I:c</th>\n",
       "      <th>J:a</th>\n",
       "      <th>J:b</th>\n",
       "      <th>J:c</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>batch</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>D</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>E</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>G</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>H</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>I</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>J</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 60 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       strength                                     ... cask                  \\\n",
       "sample      A:a A:b A:c B:a B:b B:c C:a C:b C:c D:a ...  G:c H:a H:b H:c I:a   \n",
       "batch                                               ...                        \n",
       "A             2   2   2 NaN NaN NaN NaN NaN NaN NaN ...  NaN NaN NaN NaN NaN   \n",
       "B           NaN NaN NaN   2   2   2 NaN NaN NaN NaN ...  NaN NaN NaN NaN NaN   \n",
       "C           NaN NaN NaN NaN NaN NaN   2   2   2 NaN ...  NaN NaN NaN NaN NaN   \n",
       "D           NaN NaN NaN NaN NaN NaN NaN NaN NaN   2 ...  NaN NaN NaN NaN NaN   \n",
       "E           NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...  NaN NaN NaN NaN NaN   \n",
       "F           NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...  NaN NaN NaN NaN NaN   \n",
       "G           NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...    2 NaN NaN NaN NaN   \n",
       "H           NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...  NaN   2   2   2 NaN   \n",
       "I           NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...  NaN NaN NaN NaN   2   \n",
       "J           NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...  NaN NaN NaN NaN NaN   \n",
       "\n",
       "                            \n",
       "sample I:b I:c J:a J:b J:c  \n",
       "batch                       \n",
       "A      NaN NaN NaN NaN NaN  \n",
       "B      NaN NaN NaN NaN NaN  \n",
       "C      NaN NaN NaN NaN NaN  \n",
       "D      NaN NaN NaN NaN NaN  \n",
       "E      NaN NaN NaN NaN NaN  \n",
       "F      NaN NaN NaN NaN NaN  \n",
       "G      NaN NaN NaN NaN NaN  \n",
       "H      NaN NaN NaN NaN NaN  \n",
       "I        2   2 NaN NaN NaN  \n",
       "J      NaN NaN   2   2   2  \n",
       "\n",
       "[10 rows x 60 columns]"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Pastes.csv', index_col=0)\n",
    "\n",
    "table = data.pivot_table(['batch', 'sample'],  rows='batch', columns='sample', aggfunc=len)\n",
    "table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "       <td>Model:</td>       <td>MixedLM</td> <td>Dependent Variable:</td> <td>strength</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>No. Observations:</td>   <td>60</td>          <td>Method:</td>         <td>REML</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "     <td>No. Groups:</td>      <td>10</td>          <td>Scale:</td>         <td>0.6780</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Min. group size:</td>     <td>6</td>        <td>Likelihood:</td>     <td>-123.4954</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Max. group size:</td>     <td>6</td>        <td>Converged:</td>         <td>Yes</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <td>Mean group size:</td>    <td>6.0</td>            <td></td>               <td></td>     \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>       <th>Coef.</th> <th>Std.Err.</th>    <th>z</th>   <th>P>|z|</th> <th>[0.025</th> <th>0.975]</th>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>60.053</td>   <td>0.677</td>  <td>88.714</td> <td>0.000</td> <td>58.727</td> <td>61.380</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>batch RE</th>   <td>1.658</td>   <td>2.901</td>     <td></td>      <td></td>       <td></td>       <td></td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>sample RE</th>  <td>8.433</td>   <td>4.349</td>     <td></td>      <td></td>       <td></td>       <td></td>   \n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary2.Summary'>\n",
       "\"\"\"\n",
       "         Mixed Linear Model Regression Results\n",
       "=======================================================\n",
       "Model:            MixedLM Dependent Variable: strength \n",
       "No. Observations: 60      Method:             REML     \n",
       "No. Groups:       10      Scale:              0.6780   \n",
       "Min. group size:  6       Likelihood:         -123.4954\n",
       "Max. group size:  6       Converged:          Yes      \n",
       "Mean group size:  6.0                                  \n",
       "-------------------------------------------------------\n",
       "             Coef.  Std.Err.   z    P>|z| [0.025 0.975]\n",
       "-------------------------------------------------------\n",
       "Intercept    60.053    0.677 88.714 0.000 58.727 61.380\n",
       "batch RE      1.658    2.901                           \n",
       "sample RE     8.433    4.349                           \n",
       "=======================================================\n",
       "\n",
       "\"\"\""
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Pastes.csv', index_col=0)\n",
    "vcf = {\"batch\" : \"0 + C(batch)\", \"sample\" : \"0 + C(sample)\"}\n",
    "model = sm.MixedLM.from_formula('strength ~ 1', groups=\"batch\", vc_formula=vcf, data=data)\n",
    "result = model.fit()\n",
    "result.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In R: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Linear mixed model fit by REML ['lmerMod']\n",
       "Formula: strength ~ 1 + (1 | batch) + (1 | sample) \n",
       "   Data: Pastes \n",
       "\n",
       "REML criterion at convergence: 246.9907 \n",
       "\n",
       "Random effects:\n",
       " Groups   Name        Variance Std.Dev.\n",
       " sample   (Intercept) 8.434    2.9041  \n",
       " batch    (Intercept) 1.657    1.2874  \n",
       " Residual             0.678    0.8234  \n",
       "Number of obs: 60, groups: sample, 30; batch, 10\n",
       "\n",
       "Fixed effects:\n",
       "            Estimate Std. Error t value\n",
       "(Intercept)  60.0533     0.6769   88.72\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%R print( summary(fm <- lmer(strength ~ 1 + (1|batch) + (1|sample), Pastes)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}