{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Mixed Models\n", "\n", "\n", "\n", "Consider a study that compares the effect of a new drug to a placebo." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | Drug | | | | Placebo | | |\n", "|--------|---|-----------|---|---|---|-----------|---|----|\n", "| | | Age Group | | | | Age Group | | |\n", "|--------|---|-----------|---|---|---|-----------|---|----|\n", "| | A | B | C | | A | B | C | |\n", "| Gender | | | | | | | | |\n", "| Male | | | | | | | | |\n", "| Female | | | | | | | | |\n", "| | | | | | | | | |\n", "\n", "Table 1: Study of effect if a drug over placebo for different age groups for males and females\n", "\n", "\n", "| | | School X | | | | School Y | | |\n", "|--------|---|-----------|---|---|---|-----------|---|----|\n", "| | | Section | | | | Section | | |\n", "|--------|---|-----------|---|---|---|-----------|---|----|\n", "| | A | B | C | | A | B | C | |\n", "| Gender | | | | | | | | |\n", "| Male | | | | | | | | |\n", "| Female | | | | | | | | |\n", "| | | | | | | | | |\n", "\n", "Table 2: Study of marks scored for different schools over " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The definition for fixed effects and random-effects is a bit [hazy](http://andrewgelman.com/2005/01/25/why_i_dont_use/), but here we refer to the fixed effects arising due to a factor \n", "\n", "### Terminology\n", "-----------------------------\n", "**Factor**: Classifications identifying the source of datum. So 'Drug', 'Placebo' and 'Gender' are the three factors in the above study.\n", "\n", "**Levels**: Individual classes of the factors. So 'Male' and 'Female' are levels of 'Gender' factor and 'A', 'B', 'C' are the levels of 'gender' factor. 'Drug' and 'Placebo' are levels of 'Type of drug administered'\n", "\n", "**Crossed Factors**: All levels of 'Age' factor appear for both levels of the 'Type of drug' factorand hence is a 'crossed' factor.\n", "\n", "**Nested Factors**: In the table summarizing school wise grades, the section A under school X is comnpletely independent of section A under school Y. In fact, it does not matter what particular section we refer to, as long as we have 3 sections for both. Since the levels of the 'section' factor appear only once, 'section' is a nested factor within the 'type of school'.\n", "\n", "\n", "**Effects**: Extent to which different levels of a factor affect the variable of interest\n", "\n", "In order to define our meaning of fixed and random effects,, we refer to fixed effects as the effects that can be attributed to a finite set of levels of a factor. The random effects arise due to the levels of factor being infinite. So if the performance of two schools is being considered and we also record the student ids along with their gender and scores, the gender in itself will give rise to a fixed effect for there are just two levels: male and female. But, the student ids being levels of the student identifier factor can have infinite levels, because the student can be sampled randomly.\n", "\n", "**Fixed Effects** are simply the effects arising due to the levels of a fixed covariate, a factor whose all levels are of interest such that they represent specific conditions that are used to define specific contrasts\n", "\n", "**Random Effects** arise out of **random factors**, a factor whose levels were randomly sampled from a population of levels. Thus, all possible levels of **random factors** cannot be part of the study and hence only an inference can be made about the entire population of levels given a random sample of the levels\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrix Specification\n", "\n", "A linear mixed model is described by the following model:\n", "\n", "$$Y= \\underbrace{X\\beta}_\\text{Fixed} + \\underbrace{Zu + \\epsilon}_\\text{Random}$$\n", "\n", "where \n", "\n", "$n$ = Number of observations\n", "\n", "$p$ = Number of fixed effect covariates\n", "\n", "$q$ = Number of random effect covariates\n", "\n", "$Y$ = response column vector $(n \\times 1)$\n", "\n", "$X$ = $n \\times p$ model matrix representing the known values of the $p$ fixed-effect covariates \n", " \n", "$\\beta$ = $p \\times 1$ fixed-effect parameter column vector\n", "\n", "$Z$ = $n \\times q$ model matrix for representing the known values of the $q$ random-effect covariates\n", "\n", "$u$ = $q \\times 1$ fixed-effect parameter column vector \n", "\n", "$\\epsilon$ = Residual column vector\n", "\n", "\n", "By definition the random-effects are random variables and the $q$ random-effects parameter inthe $u$ column vector follow a multivarite normal distribution:\n", "\n", "$$u \\sim N(0,D)$$\n", "\n", "where $D$ is a $q \\times q$ variance-covariance matrix\n", "\n", "\n", "$\\epsilon$ column vector constitutes the residuals associated with the $n$ observations. \n", "\n", "Residuals and random-effects parameter vector $u$ are assumed to be independent and $\\epsilon$ is assumed to be normally distributed:\n", "\n", "$$\\epsilon \\sim N(0,R)$$ \n", "\n", "where $R$ is a $n\\times n$ variance-covariance matrix\n", "\n", "\n", "Our aim is to estimate the column vectors $\\beta$ , $u$ (by estimating for the $D$ matrix) and $\\epsilon$(by estimating the $R$ matrix)\n", "\n", "It is possible to represent the above model in a marginailised form:\n", "\n", "$$Y= \\underbrace{X\\beta}_\\text{Fixed} + \\underbrace{\\epsilon^*}_\\text{Random}$$\n", " \n", "where $\\epsilon^* \\sim N(0, ZDZ'+R)$ which helps to relax the constraints on both $D$ and $R$ being positive definite to $ZDZ'+R$ being positive definite.\n", "\n", "and hence $Y \\sim N(X\\beta, ZDZ'+R)$\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import statsmodels.api as sm\n", "import pandas\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Penicillin.csv', index_col=0)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " diameter plate sample\n", "count 144.000000 144 144\n", "unique NaN 24 6\n", "top NaN x F\n", "freq NaN 6 24\n", "mean 22.972222 NaN NaN\n", "std 2.031034 NaN NaN\n", "min 18.000000 NaN NaN\n", "25% 22.000000 NaN NaN\n", "50% 23.000000 NaN NaN\n", "75% 24.000000 NaN NaN\n", "max 27.000000 NaN NaN\n" ] } ], "source": [ "print data.describe(include='all')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['diameter' 'plate' 'sample']\n" ] } ], "source": [ "print (data.columns.values)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "count 144\n", "unique 6\n", "top F\n", "freq 24\n", "Name: sample, dtype: object\n", "count 144\n", "unique 24\n", "top x\n", "freq 6\n", "Name: plate, dtype: object\n" ] } ], "source": [ "print data['sample'].describe()\n", "print data['plate'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We take the Penicillin growth example from the 'lme4' manual. Diameter refers to the diameter of growth inhibition in a culturing experiment. There are 24 plates and 6 types of samples, each plate e\n", "The diameter varies with the plate and sample type.\n", "Clearly plate and sample are categorical variables with 24 and 6 levels each. The type of plate here is assumed to be sampled randomly(we are not interested in studying what effect a particular plate say A has, but are more interested in studying the effect of plate ). This is an example of crossed random effect. Crossed because each sample is used in each plate. \n", "\n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
diameter
plateabcdefghij...opqrstuvwx
sample
A 1 1 1 1 1 1 1 1 1 1... 1 1 1 1 1 1 1 1 1 1
B 1 1 1 1 1 1 1 1 1 1... 1 1 1 1 1 1 1 1 1 1
C 1 1 1 1 1 1 1 1 1 1... 1 1 1 1 1 1 1 1 1 1
D 1 1 1 1 1 1 1 1 1 1... 1 1 1 1 1 1 1 1 1 1
E 1 1 1 1 1 1 1 1 1 1... 1 1 1 1 1 1 1 1 1 1
F 1 1 1 1 1 1 1 1 1 1... 1 1 1 1 1 1 1 1 1 1
\n", "

6 rows × 24 columns

\n", "
" ], "text/plain": [ " diameter ... \n", "plate a b c d e f g h i j ... o p q r s t u v w x\n", "sample ... \n", "A 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1\n", "B 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1\n", "C 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1\n", "D 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1\n", "E 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1\n", "F 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1\n", "\n", "[6 rows x 24 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table = data.pivot_table(['plate'], rows='sample', columns='plate', aggfunc=len)\n", "table" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Loading required package: lattice\n", "Loading required package: Matrix\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%load_ext rpy2.ipython\n", "%R library(lme4)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'data.frame':\t144 obs. of 3 variables:\n", " $ diameter: num 27 23 26 23 23 21 27 23 26 23 ...\n", " $ plate : Factor w/ 24 levels \"a\",\"b\",\"c\",\"d\",..: 1 1 1 1 1 1 2 2 2 2 ...\n", " $ sample : Factor w/ 6 levels \"A\",\"B\",\"C\",\"D\",..: 1 2 3 4 5 6 1 2 3 4 ...\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%R str(Penicillin)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Model: MixedLM Dependent Variable: diameter
No. Observations: 144 Method: REML
No. Groups: 24 Scale: 1.4191
Min. group size: 6 Likelihood: -306.6280
Max. group size: 6 Converged: Yes
Mean group size: 6.0
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 22.972 0.179 128.476 0.000 22.622 23.323
plate RE 0.095
sample RE 2.614
" ], "text/plain": [ "\n", "\"\"\"\n", " Mixed Linear Model Regression Results\n", "=======================================================\n", "Model: MixedLM Dependent Variable: diameter \n", "No. Observations: 144 Method: REML \n", "No. Groups: 24 Scale: 1.4191 \n", "Min. group size: 6 Likelihood: -306.6280\n", "Max. group size: 6 Converged: Yes \n", "Mean group size: 6.0 \n", "-------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975]\n", "-------------------------------------------------------\n", "Intercept 22.972 0.179 128.476 0.000 22.622 23.323\n", "plate RE 0.095 \n", "sample RE 2.614 \n", "=======================================================\n", "\n", "\"\"\"" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Crossed Random Effects\n", "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Penicillin.csv', index_col=0)\n", "\n", "vcf = {\"plate\" : \"1 + C(plate)\", \"sample\" : \"0 + C(sample)\"}\n", "model = sm.MixedLM.from_formula('diameter ~ 1', groups=\"plate\", vc_formula=vcf, data=data)\n", "result = model.fit()\n", "result.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same example in R:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Linear mixed model fit by REML ['lmerMod']\n", "Formula: diameter ~ 1 + (1 | plate) + (1 | sample) \n", " Data: Penicillin \n", "\n", "REML criterion at convergence: 330.8606 \n", "\n", "Random effects:\n", " Groups Name Variance Std.Dev.\n", " plate (Intercept) 0.7169 0.8467 \n", " sample (Intercept) 3.7309 1.9316 \n", " Residual 0.3024 0.5499 \n", "Number of obs: 144, groups: plate, 24; sample, 6\n", "\n", "Fixed effects:\n", " Estimate Std. Error t value\n", "(Intercept) 22.9722 0.8086 28.41\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%R print (summary(fm <- lmer(diameter ~ 1 + (1|plate) + (1|sample), Penicillin)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dyestuff example\n", "\n", "The dyestuff dataset is a self-sufficient dataset to understand fixed and random effects. The dataset involves batch to batch variability in the strength of dye. We have 6 batches and 5 samples for each batch, totalling to 30 observations. It is easy ti imagine two sources of variability here, the one within-batch and the other one across-batch. We use mixed models to quantify the batch-batch variability:\n" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Model: MixedLM Dependent Variable: Yield
No. Observations: 30 Method: REML
No. Groups: 6 Scale: 2451.2239
Min. group size: 5 Likelihood: -159.8271
Max. group size: 5 Converged: Yes
Mean group size: 5.0
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 1527.500 19.384 78.802 0.000 1489.508 1565.492
groups RE 1764.171 31.658
" ], "text/plain": [ "\n", "\"\"\"\n", " Mixed Linear Model Regression Results\n", "==========================================================\n", "Model: MixedLM Dependent Variable: Yield \n", "No. Observations: 30 Method: REML \n", "No. Groups: 6 Scale: 2451.2239\n", "Min. group size: 5 Likelihood: -159.8271\n", "Max. group size: 5 Converged: Yes \n", "Mean group size: 5.0 \n", "----------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975] \n", "----------------------------------------------------------\n", "Intercept 1527.500 19.384 78.802 0.000 1489.508 1565.492\n", "groups RE 1764.171 31.658 \n", "==========================================================\n", "\n", "\"\"\"" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Dyestuff.csv', index_col=0)\n", "model = smf.mixedlm('Yield ~ 1', groups=\"Batch\", data=data)\n", "result = model.fit()\n", "result.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same example in R:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Linear mixed model fit by REML ['lmerMod']\n", "Formula: Yield ~ 1 + (1 | Batch) \n", " Data: Dyestuff \n", "\n", "REML criterion at convergence: 319.6543 \n", "\n", "Random effects:\n", " Groups Name Variance Std.Dev.\n", " Batch (Intercept) 1764 42.00 \n", " Residual 2451 49.51 \n", "Number of obs: 30, groups: Batch, 6\n", "\n", "Fixed effects:\n", " Estimate Std. Error t value\n", "(Intercept) 1527.50 19.38 78.8\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%R print( summary(fm <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Dyestuff2 '\n", "\n", "Same as the dyestuff example, but with batch-batch variability missing:\n", "\n" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ " Batch Yield \n", " A:5 Min. :-0.892 \n", " B:5 1st Qu.: 2.765 \n", " C:5 Median : 5.365 \n", " D:5 Mean : 5.666 \n", " E:5 3rd Qu.: 8.151 \n", " F:5 Max. :13.434 \n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%R print(summary(Dyestuff2))\n" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Linear mixed model fit by REML ['lmerMod']\n", "Formula: Yield ~ 1 + (1 | Batch) \n", " Data: Dyestuff2 \n", "\n", "REML criterion at convergence: 161.8283 \n", "\n", "Random effects:\n", " Groups Name Variance Std.Dev.\n", " Batch (Intercept) 0.00 0.000 \n", " Residual 13.81 3.716 \n", "Number of obs: 30, groups: Batch, 6\n", "\n", "Fixed effects:\n", " Estimate Std. Error t value\n", "(Intercept) 5.6656 0.6784 8.352\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "%R print( summary(fm <- lmer(Yield ~ 1 + (1|Batch), Dyestuff2)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variance for batch random effect is 0, thus there is no random effect. It is as good as a linear model:\n" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\n", "Call:\n", "lm(formula = Yield ~ 1, data = Dyestuff2)\n", "\n", "Residuals:\n", " Min 1Q Median 3Q Max \n", "-6.5576 -2.9006 -0.3006 2.4854 7.7684 \n", "\n", "Coefficients:\n", " Estimate Std. Error t value Pr(>|t|) \n", "(Intercept) 5.6656 0.6784 8.352 3.32e-09 ***\n", "---\n", "Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1\n", "\n", "Residual standard error: 3.716 on 29 degrees of freedom\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%R print (summary(fm <- lm(Yield ~1, Dyestuff2)))" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python2.7/dist-packages/statsmodels-0.7.0-py2.7-linux-x86_64.egg/statsmodels/regression/mixed_linear_model.py:1912: ConvergenceWarning: The MLE may be on the boundary of the parameter space.\n", " warnings.warn(msg, ConvergenceWarning)\n" ] }, { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Model: MixedLM Dependent Variable: Yield
No. Observations: 30 Method: REML
No. Groups: 6 Scale: 13.8063
Min. group size: 5 Likelihood: -80.9141
Max. group size: 5 Converged: Yes
Mean group size: 5.0
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 5.666 0.678 8.352 0.000 4.336 6.995
groups RE 0.000 1.235
" ], "text/plain": [ "\n", "\"\"\"\n", " Mixed Linear Model Regression Results\n", "======================================================\n", "Model: MixedLM Dependent Variable: Yield \n", "No. Observations: 30 Method: REML \n", "No. Groups: 6 Scale: 13.8063 \n", "Min. group size: 5 Likelihood: -80.9141\n", "Max. group size: 5 Converged: Yes \n", "Mean group size: 5.0 \n", "------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975]\n", "------------------------------------------------------\n", "Intercept 5.666 0.678 8.352 0.000 4.336 6.995\n", "groups RE 0.000 1.235 \n", "======================================================\n", "\n", "\"\"\"" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Dyestuff2.csv', index_col=0)\n", "vcf = {\"batch\" : \"0 + C(Batch)\"}\n", "model = sm.MixedLM.from_formula('Yield ~ 1', groups=\"Batch\", data=data)\n", "result = model.fit()\n", "result.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pastes\n", "The experiment consists of strength of chemical pastes packaged in different casks, at two time points. There are 10 batches, with 3 casks sampled randomly and two tests performed. In total 10x3x2 observations.\n", "\n", "So there are 30 samples with levels 'A:a', 'A:b', 'A:c' and so on, 3 samples for the 10 batches(This sort of naming saves us from the trouble of using different variable names for these 30 levels) The 'a', 'b', 'c' refer to the levels of the cask. But notice that cask 'a' of batch 'A' has no relation to cask 'a' of batch 'B' and so on. 'a', 'b', 'c' differentiate the casks within a batch but have no relation to other batches otherwise.\n", "\n", "\n", "This is an example of nested random effects because each level of sample occurs with one and only one level of the batch(compare it with the pivot table we had earlier where all the entries were one) \n" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
strength...cask
sampleA:aA:bA:cB:aB:bB:cC:aC:bC:cD:a...G:cH:aH:bH:cI:aI:bI:cJ:aJ:bJ:c
batch
A 2 2 2NaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
BNaNNaNNaN 2 2 2NaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
CNaNNaNNaNNaNNaNNaN 2 2 2NaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
DNaNNaNNaNNaNNaNNaNNaNNaNNaN 2...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
ENaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
FNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
GNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN... 2NaNNaNNaNNaNNaNNaNNaNNaNNaN
HNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaN 2 2 2NaNNaNNaNNaNNaNNaN
INaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaN 2 2 2NaNNaNNaN
JNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN...NaNNaNNaNNaNNaNNaNNaN 2 2 2
\n", "

10 rows × 60 columns

\n", "
" ], "text/plain": [ " strength ... cask \\\n", "sample A:a A:b A:c B:a B:b B:c C:a C:b C:c D:a ... G:c H:a H:b H:c I:a \n", "batch ... \n", "A 2 2 2 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN \n", "B NaN NaN NaN 2 2 2 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN \n", "C NaN NaN NaN NaN NaN NaN 2 2 2 NaN ... NaN NaN NaN NaN NaN \n", "D NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 ... NaN NaN NaN NaN NaN \n", "E NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN \n", "F NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN \n", "G NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 2 NaN NaN NaN NaN \n", "H NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN 2 2 2 NaN \n", "I NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 2 \n", "J NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN \n", "\n", " \n", "sample I:b I:c J:a J:b J:c \n", "batch \n", "A NaN NaN NaN NaN NaN \n", "B NaN NaN NaN NaN NaN \n", "C NaN NaN NaN NaN NaN \n", "D NaN NaN NaN NaN NaN \n", "E NaN NaN NaN NaN NaN \n", "F NaN NaN NaN NaN NaN \n", "G NaN NaN NaN NaN NaN \n", "H NaN NaN NaN NaN NaN \n", "I 2 2 NaN NaN NaN \n", "J NaN NaN 2 2 2 \n", "\n", "[10 rows x 60 columns]" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Pastes.csv', index_col=0)\n", "\n", "table = data.pivot_table(['batch', 'sample'], rows='batch', columns='sample', aggfunc=len)\n", "table" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Model: MixedLM Dependent Variable: strength
No. Observations: 60 Method: REML
No. Groups: 10 Scale: 0.6780
Min. group size: 6 Likelihood: -123.4954
Max. group size: 6 Converged: Yes
Mean group size: 6.0
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 60.053 0.677 88.714 0.000 58.727 61.380
batch RE 1.658 2.901
sample RE 8.433 4.349
" ], "text/plain": [ "\n", "\"\"\"\n", " Mixed Linear Model Regression Results\n", "=======================================================\n", "Model: MixedLM Dependent Variable: strength \n", "No. Observations: 60 Method: REML \n", "No. Groups: 10 Scale: 0.6780 \n", "Min. group size: 6 Likelihood: -123.4954\n", "Max. group size: 6 Converged: Yes \n", "Mean group size: 6.0 \n", "-------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975]\n", "-------------------------------------------------------\n", "Intercept 60.053 0.677 88.714 0.000 58.727 61.380\n", "batch RE 1.658 2.901 \n", "sample RE 8.433 4.349 \n", "=======================================================\n", "\n", "\"\"\"" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pandas.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/lme4/Pastes.csv', index_col=0)\n", "vcf = {\"batch\" : \"0 + C(batch)\", \"sample\" : \"0 + C(sample)\"}\n", "model = sm.MixedLM.from_formula('strength ~ 1', groups=\"batch\", vc_formula=vcf, data=data)\n", "result = model.fit()\n", "result.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In R: " ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Linear mixed model fit by REML ['lmerMod']\n", "Formula: strength ~ 1 + (1 | batch) + (1 | sample) \n", " Data: Pastes \n", "\n", "REML criterion at convergence: 246.9907 \n", "\n", "Random effects:\n", " Groups Name Variance Std.Dev.\n", " sample (Intercept) 8.434 2.9041 \n", " batch (Intercept) 1.657 1.2874 \n", " Residual 0.678 0.8234 \n", "Number of obs: 60, groups: sample, 30; batch, 10\n", "\n", "Fixed effects:\n", " Estimate Std. Error t value\n", "(Intercept) 60.0533 0.6769 88.72\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%R print( summary(fm <- lmer(strength ~ 1 + (1|batch) + (1|sample), Pastes)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }