{ "metadata": { "language": "Julia", "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Examples of fitting linear mixed-effects models" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Simple examples using data from the `lme4` package for `R`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `lme4` package for [R](http://www.r-project.org) includes several data sets, chosen to illustrate various types of models.\n", "\n", "The simplest model fits are those with a single, simple, scalar random-effects term. That is, there is only a _main random effects_ term." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A model for the `Dyestuff` data" ] }, { "cell_type": "code", "collapsed": false, "input": [ "using MixedModels,RDatasets # load packages\n", "ds = dataset(\"lme4\",\"Dyestuff\")" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
BatchYield
1A1545
2A1440
3A1440
4A1520
5A1580
6B1540
7B1555
8B1490
9B1560
10B1495
11C1595
12C1550
13C1605
14C1510
15C1560
16D1445
17D1440
18D1595
19D1465
20D1545
21E1595
22E1630
23E1515
24E1635
25E1625
26F1520
27F1455
28F1450
29F1480
30F1445
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "30x2 DataFrame\n", "| Row | Batch | Yield |\n", "|-----|-------|-------|\n", "| 1 | \"A\" | 1545 |\n", "| 2 | \"A\" | 1440 |\n", "| 3 | \"A\" | 1440 |\n", "| 4 | \"A\" | 1520 |\n", "| 5 | \"A\" | 1580 |\n", "| 6 | \"B\" | 1540 |\n", "| 7 | \"B\" | 1555 |\n", "| 8 | \"B\" | 1490 |\n", "| 9 | \"B\" | 1560 |\n", "| 10 | \"B\" | 1495 |\n", "| 11 | \"C\" | 1595 |\n", "\u22ee\n", "| 19 | \"D\" | 1465 |\n", "| 20 | \"D\" | 1545 |\n", "| 21 | \"E\" | 1595 |\n", "| 22 | \"E\" | 1630 |\n", "| 23 | \"E\" | 1515 |\n", "| 24 | \"E\" | 1635 |\n", "| 25 | \"E\" | 1625 |\n", "| 26 | \"F\" | 1520 |\n", "| 27 | \"F\" | 1455 |\n", "| 28 | \"F\" | 1450 |\n", "| 29 | \"F\" | 1480 |\n", "| 30 | \"F\" | 1445 |" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only covariate is the categorical variable `Batch` denoting the batch of an intermediate product from which the dye was created. We are not interested in the \"effect\" of a particular batch on the yield of dyestuff so much as we are interested in the variability due to batch." ] }, { "cell_type": "code", "collapsed": false, "input": [ "m1 = fit(lmm(Yield ~ 1 + (1|Batch),ds))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Yield ~ 1 + (1 | Batch)\n", "\n", " logLik: -163.663530, deviance: 327.327060\n", "\n", " Variance components:\n", " Variance Std.Dev.\n", " Batch 1388.331690 37.260323\n", " Residual 2451.250503 49.510105\n", " Number of obs: 30; levels of grouping factors: 6\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 1527.5 17.6945 86.326\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first time an `lmm` model is fit takes longer than subsequent fits because several methods must be compiled. Subsequent fits are quite fast. There is a second argument, `verbose`, to the `fit` method. It defaults to `false`. We can trace the progress of the iterations by setting it to `true`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "@time fit(lmm(Yield ~ 1 + (1|Batch), ds),true);" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f_1: 327.76702, [1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ ".0]\n", "f_2: 328.63496, [0.428326]\n", "f_3: 327.33773, [0.787132]\n", "f_4: 328.27031, [0.472809]\n", "f_5: 327.33282, [0.727955]\n", "f_6: 327.32706, [0.752783]\n", "f_7: 327.32706, [0.752599]\n", "f_8: 327.32706, [0.752355]\n", "f_9: 327.32706, [0.752575]\n", "f_10: 327.32706, [0.75258]\n", "FTOL_REACHED\n", "elapsed time: 0.212709603 seconds (5417164 bytes allocated)\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convergence is fast, as this is a 1-dimensional optimization and the gradient of the objective function is available." ] }, { "cell_type": "code", "collapsed": false, "input": [ "MixedModels.hasgrad(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "true" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parameter being optimized is the single free parameter in the _relative covariance factor_, $\\Lambda$. This matrix is $6\\times 6$ and block diagonal with each diagonal block being a copy of" ] }, { "cell_type": "code", "collapsed": false, "input": [ "m1.\u03bb" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "1-element Array{Any,1}:\n", " PDScalF(0.7525801695835216,1)" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The single element in `m1.`$\\lambda$ is the ratio of the standard deviation of the random effects to the residual standard deviation.\n", "\n", "Most of the extractors from the [lme4](https://github.com/lme4/lme4) package for [R](http://www.r-project.org) apply to these model fits." ] }, { "cell_type": "code", "collapsed": false, "input": [ "fixef(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "1-element Array{Float64,1}:\n", " 1527.5" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "ranef(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "1-element Array{Any,1}:\n", " 1x6 Array{Float64,2}:\n", " -16.6282 0.369516 26.9747 -21.8014 53.5798 -42.4943" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "deviance(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "327.32705988113764" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A model for the `DyeStuff2` data showing a singular fit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Dyestuff2` data are simulated to show a fit on the boundary of the parameter space." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ds2 = dataset(\"lme4\",\"Dyestuff2\");\n", "@time m2 = fit(lmm(Yield ~ 1 + (1|Batch), ds2),true)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f_1: 170.9011, [" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.0]\n", "f_2: 163.97764, [0.281722]\n", "f_3: 162.87304, [0.0]\n", "f_4: 162.87304, [0.0]\n", "f_5: 162.87304, [0.0]\n", "XTOL_REACHED\n", "elapsed time: 0.087887694 seconds (1538732 bytes allocated)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Yield ~ 1 + (1 | Batch)\n", "\n", " logLik: -81.436518, deviance: 162.873037\n", "\n", " Variance components:\n", " Variance Std.Dev.\n", " Batch 0.000000 0.000000\n", " Residual 13.346099 3.653231\n", " Number of obs: 30; levels of grouping factors: 6\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 5.6656 0.666986 8.49433\n" ] } ], "prompt_number": 9 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A model with a single, vector-valued, random-effects term" ] }, { "cell_type": "code", "collapsed": false, "input": [ "slp = dataset(\"lme4\",\"sleepstudy\");\n", "m3 = fit(lmm(Reaction ~ 1+Days + (1+Days|Subject),slp),true)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f_1: 1784.6423, [" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.0,0.0,1.0]\n", "f_2: 1792.09158, [1.04647,-0.384052,0.159046]\n", "f_3: 1759.76629, [1.00506,-0.0847897,0.418298]\n", "f_4: 1787.91236, [1.26209,0.662287,0.0]\n", "f_5: 1770.2265, [1.04773,0.323752,0.0]\n", "f_6: 1755.6188, [1.00967,0.0107469,0.150327]\n", "f_7: 1762.85008, [0.991808,0.14307,0.446863]\n", "f_8: 1753.29754, [1.0048,0.0534958,0.272807]\n", "f_9: 1752.5881, [1.00312,0.0418443,0.252944]\n", "f_10: 1767.54407, [0.99451,0.00122224,0.0940196]\n", "f_11: 1752.21061, [1.00224,0.0373251,0.232541]\n", "f_12: 1758.83812, [0.988744,0.0206109,0.123804]\n", "f_13: 1752.13481, [1.00085,0.0355465,0.220038]\n", "f_14: 1752.02982, [0.980566,0.015964,0.234045]\n", "f_15: 1759.88963, [0.971299,0.0118275,0.120811]\n", "f_16: 1751.98896, [0.979624,0.0155451,0.221269]\n", "f_17: 1751.98436, [0.97888,0.015535,0.224081]\n", "f_18: 1751.96796, [0.968696,0.0144745,0.223608]\n", "f_19: 1752.08826, [0.867754,0.0226905,0.23397]\n", "f_20: 1751.9463, [0.943112,0.0163709,0.226009]\n", "f_21: 1754.13022, [0.834535,0.0100178,0.166328]\n", "f_22: 1751.94908, [0.930934,0.0157232,0.218808]\n", "f_23: 1751.94123, [0.938201,0.0161115,0.223087]\n", "f_24: 1751.96529, [0.894427,0.0196256,0.223832]\n", "f_25: 1751.93978, [0.930555,0.0167127,0.223213]\n", "f_26: 1751.93974, [0.930506,0.019329,0.222173]\n", "f_27: 1751.94419, [0.913941,0.018183,0.222681]\n", "f_28: 1751.93955, [0.927971,0.0191544,0.22225]\n", "f_29: 1751.93979, [0.933502,0.0174017,0.2225]\n", "f_30: 1751.93942, [0.92986,0.0185533,0.222336]\n", "f_31: 1751.9544, [0.903287,0.0173483,0.222935]\n", "f_32: 1751.93944, [0.927141,0.0184317,0.222396]\n", "f_33: 1751.93939, [0.928786,0.0185053,0.222359]\n", "f_34: 1751.93935, [0.929171,0.0180663,0.222656]\n", "f_35: 1751.93935, [0.929702,0.0182395,0.222667]\n", "f_36: 1751.93934, [0.929337,0.0181204,0.222659]\n", "f_37: 1751.93935, [0.928905,0.0183732,0.222647]\n", "f_38: 1751.93934, [0.929269,0.0181603,0.222657]\n", "f_39: 1751.93935, [0.929106,0.0181461,0.222618]\n", "f_40: 1751.93934, [0.929236,0.0181575,0.22265]\n", "f_41: 1751.93934, [0.929197,0.0181927,0.222625]\n", "f_42: 1751.93934, [0.929229,0.018164,0.222645]\n", "f_43: 1751.93934, [0.929146,0.0181729,0.22267]\n", "f_44: 1751.93934, [0.929221,0.0181649,0.222647]\n", "f_45: 1751.93934, [0.929226,0.0181643,0.222646]\n", "FTOL_REACHED\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Reaction ~ 1 + Days + ((1 + Days) | Subject)\n", "\n", " logLik: -875.969672, deviance: 1751.939344\n", "\n", " Variance components:\n", " Variance Std.Dev. Corr.\n", " Subject 565.516376 23.780588\n", " 32.682265 5.716840 0.08\n", " Residual 654.940901 25.591813\n", " Number of obs: 180; levels of grouping factors: 18\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 251.405 6.63228 37.9063\n", "Days 10.4673 1.50224 6.96779\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this model $\\Lambda$ is $36\\times 36$ and consists of 18 diagonal blocks of the form" ] }, { "cell_type": "code", "collapsed": false, "input": [ "m3.\u03bb" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "1-element Array{Any,1}:\n", " PDLCholF(Cholesky{Float64} with factor:\n", "2x2 Triangular{Float64,Array{Float64,2},:L,false}:\n", " 0.929226 0.0 \n", " 0.0181643 0.222646)" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This model also provides the gradient for the optimization" ] }, { "cell_type": "code", "collapsed": false, "input": [ "MixedModels.hasgrad(m3)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "true" ] } ], "prompt_number": 12 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A model with independent random effects for intercept and slope" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we specify more than one random effects term for the same grouping factor, these are amalgamated into a single term when fitting." ] }, { "cell_type": "code", "collapsed": false, "input": [ "m4 = fit(lmm(Reaction ~ 1+Days + (1|Subject) + (0+Days|Subject), slp),true)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f_1: 1784.6423, [" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.0,1.0]\n", "f_2: 1755.16173, [1.04647,0.159046]\n", "f_3: 1804.58124, [0.192941,1.05905]\n", "f_4: 1795.3958, [0.437654,1.05905]\n", "f_5: 1781.60206, [0.812816,0.891382]\n", "f_6: 1765.95434, [1.00957,0.544762]\n", "f_7: 1754.16459, [1.03851,0.30914]\n", "f_8: 1752.98638, [1.03629,0.276642]\n", "f_9: 1753.26874, [1.02498,0.18224]\n", "f_10: 1752.17734, [1.03219,0.236848]\n", "f_11: 1752.66367, [0.976269,0.192399]\n", "f_12: 1752.10695, [1.01955,0.226392]\n", "f_13: 1752.26292, [0.891397,0.248476]\n", "f_14: 1752.03612, [0.977546,0.232736]\n", "f_15: 1756.75776, [0.777162,0.149161]\n", "f_16: 1752.02373, [0.950054,0.220357]\n", "f_17: 1752.00507, [0.947215,0.228948]\n", "f_18: 1752.00331, [0.946874,0.227205]\n", "f_19: 1752.00389, [0.941836,0.226055]\n", "f_20: 1752.00326, [0.945839,0.226969]\n", "f_21: 1752.00336, [0.945545,0.226456]\n", "f_22: 1752.00326, [0.94581,0.226917]\n", "f_23: 1752.00326, [0.945808,0.226927]\n", "f_24: 1752.00326, [0.945842,0.226939]\n", "f_25: 1752.00326, [0.945812,0.226928]\n", "f_26: 1752.00326, [0.945811,0.226928]\n", "FTOL_REACHED\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Reaction ~ 1 + Days + (1 | Subject) + ((0 + Days) | Subject)\n", "\n", " logLik: -876.001628, deviance: 1752.003255\n", "\n", " Variance components:\n", " Variance Std.Dev. Corr.\n", " Subject 584.250055 24.171265\n", " 33.633148 5.799409 0.00\n", " Residual 653.116005 25.556134\n", " Number of obs: 180; levels of grouping factors: 18\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 251.405 6.70767 37.4802\n", "Days 10.4673 1.51931 6.88948\n" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "@time fit(lmm(Reaction ~ 1+Days + (1|Subject) + (0+Days|Subject), slp));" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: 0.005619764 seconds (702960 bytes allocated)\n" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "MixedModels.hasgrad(m4)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "true" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "m4.\u03bb" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "1-element Array{Any,1}:\n", " PDDiagF(2x2 Diagonal{Float64}:\n", " 0.945811 0.0 \n", " 0.0 0.226928)" ] } ], "prompt_number": 16 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A model with random effects terms for two crossed grouping factors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Penicillin` data are from an assay of the potency of penicillin batches using an old method in which samples from the batch were applied to several plates of agar spiked with a bacterium and the diameter of the cleared region measured. Each of the 6 samples was applied to each of the 24 plates." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pen = dataset(\"lme4\",\"Penicillin\");\n", "m5 = fit(lmm(Diameter ~ 1 + (1|Plate) + (1|Sample),pen),true)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f_1: 364.62678, [" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.0,1.0]\n", "f_2: 337.90847, [" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.77431,1.88085]\n", "f_3: 333.965, [1.3028,2.43847]\n", "f_4: 333.07671, [1.71277,2.68316]\n", "f_5: 332.45223, [1.53153,2.7824]\n", "f_6: 332.40848, [1.41984,3.13078]\n", "f_7: 332.81251, [1.76305,3.15165]\n", "f_8: 332.19876, [1.54801,3.13536]\n", "f_9: 332.19519, [1.53628,3.14215]\n", "f_10: 332.19049, [1.5254,3.21805]\n", "f_11: 332.2817, [1.62101,3.21628]\n", "f_12: 332.18835, [1.53724,3.21786]\n", "f_13: 332.18857, [1.54153,3.21977]\n", "f_14: 332.18835, [1.53767,3.21805]\n", "f_15: 332.18835, [1.53754,3.2182]\n", "f_16: 332.18835, [1.53778,3.22006]\n", "f_17: 332.18839, [1.53587,3.21986]\n", "f_18: 332.18835, [1.53759,3.22004]\n", "f_19: 332.18835, [1.5376,3.22002]\n", "f_20: 332.18835, [1.5376,3.21975]\n", "f_21: 332.18835, [1.53748,3.21976]\n", "f_22: 332.18835, [1.53759,3.21975]\n", "f_23: 332.18835, [1.53759,3.21975]\n", "f_24: 332.18835, [1.53759,3.21975]\n", "f_25: 332.18835, [1.53759,3.21975]\n", "FTOL_REACHED\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Diameter ~ 1 + (1 | Plate) + (1 | Sample)\n", "\n", " logLik: -166.094174, deviance: 332.188349\n", "\n", " Variance components:\n", " Variance Std.Dev.\n", " Plate 0.714992 0.845572\n", " Sample 3.135184 1.770645\n", " Residual 0.302425 0.549932\n", " Number of obs: 144; levels of grouping factors: 24, 6\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 22.9722 0.744595 30.852\n" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "@time fit(lmm(Diameter ~ 1 + (1|Plate) + (1|Sample),pen));" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "0.028766902 seconds (7223940 bytes allocated)\n" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "MixedModels.hasgrad(m5)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "true" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "m5.\u03bb" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "2-element Array{Any,1}:\n", " PDScalF(1.53759361082603,1) \n", " PDScalF(3.219751759198035,1)" ] } ], "prompt_number": 20 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A model with random effects for nested factors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Pastes` data provides the strength of samples of paste from each of three casks sampled from each of 10 batches." ] }, { "cell_type": "code", "collapsed": false, "input": [ "psts = dataset(\"lme4\",\"Pastes\");\n", "m6 = fit(lmm(Strength ~ 1 + (1|Sample) + (1|Batch),psts),true)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "f_1: 274.8397, [" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.0,1.0]\n", "f_2: 258.56865, [1.75,1.0]\n", "f_3: 277.39618, [1.0,1.75]\n", "f_4: 300.92218, [0.25,1.0]\n", "f_5: 279.7496, [1.0,0.25]\n", "f_6: 250.93944, [2.49601,1.07724]\n", "f_7: 249.5404, [2.74867,1.15646]\n", "f_8: 248.64833, [2.98474,1.63052]\n", "f_9: 248.59997, [2.99408,1.54877]\n", "f_10: 248.43446, [3.0687,1.54124]\n", "f_11: 248.29936, [3.14346,1.53526]\n", "f_12: 248.15509, [3.26562,1.58041]\n", "f_13: 248.20737, [3.38606,1.81135]\n", "f_14: 248.05426, [3.38994,1.54163]\n", "f_15: 248.05921, [3.49885,1.61304]\n", "f_16: 248.0504, [3.43672,1.57657]\n", "f_17: 248.04172, [3.43144,1.54874]\n", "f_18: 248.02318, [3.4418,1.49304]\n", "f_19: 248.01406, [3.42417,1.38112]\n", "f_20: 248.0061, [3.44879,1.38228]\n", "f_21: 247.99886, [3.47775,1.35361]\n", "f_22: 247.99762, [3.48518,1.34275]\n", "f_23: 247.99844, [3.48937,1.30221]\n", "f_24: 247.99559, [3.50242,1.3319]\n", "f_25: 247.99468, [3.51952,1.32081]\n", "f_26: 247.99472, [3.52475,1.31303]\n", "f_27: 247.9945, [3.52317,1.32736]\n", "f_28: 247.99448, [3.5253,1.33384]\n", "f_29: 247.99448, [3.52484,1.33222]\n", "f_30: 247.99447, [3.52599,1.33059]\n", "f_31: 247.99447, [3.52678,1.32968]\n", "f_32: 247.99447, [3.52749,1.32946]\n", "f_33: 247.99447, [3.52669,1.32894]\n", "f_34: 247.99447, [3.52689,1.32988]\n", "f_35: 247.99447, [3.52689,1.32992]\n", "FTOL_REACHED\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Strength ~ 1 + (1 | Sample) + (1 | Batch)\n", "\n", " logLik: -123.997233, deviance: 247.994466\n", "\n", " Variance components:\n", " Variance Std.Dev.\n", " Sample 8.433617 2.904069\n", " Batch 1.199179 1.095070\n", " Residual 0.678002 0.823409\n", " Number of obs: 60; levels of grouping factors: 30, 10\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 60.0533 0.642136 93.5212\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "@time fit(lmm(Strength ~ 1 + (1|Sample) + (1|Batch),psts));" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "0.015186255 seconds (2387896 bytes allocated)\n" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "typeof(m6)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "LinearMixedModel{PLSNested} (constructor with 1 method)" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "At present the gradient is not available for models with nested grouping factors, but it will be." ] }, { "cell_type": "code", "collapsed": false, "input": [ "MixedModels.hasgrad(m6)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "false" ] } ], "prompt_number": 24 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Multiple grouping factors neither nested not completely crossed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `InstEval` data are several years of instructor evaluations at ETH-Zurich. The student id is `S` and the instructor is `D`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "inst = dataset(\"lme4\",\"InstEval\");\n", "m7 = fit(lmm(Y ~ Dept*Service + (1|S) + (1|D), inst))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 25, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Y ~ Dept * Service + (1 | S) + (1 | D)\n", "\n", " logLik: -118792.776708, deviance: 237585.553415\n", "\n", " Variance components:\n", " Variance Std.Dev.\n", " S 0.105418 0.324681\n", " D 0.258416 0.508347\n", " Residual 1.384728 1.176745\n", " Number of obs: 73421; levels of grouping factors: 2972, 1128\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 3.22961 0.064053 50.4209\n", "Dept5 0.129536 0.101294 1.27882\n", "Dept10 -0.176751 0.0881352 -2.00545\n", "Dept12 0.0517102 0.0817524 0.632522\n", "Dept6 0.0347319 0.085621 0.405647\n", "Dept7 0.14594 0.0997984 1.46235\n", "Dept4 0.151689 0.0816897 1.85689\n", "Dept8 0.104206 0.118751 0.877517\n", "Dept9 0.0440401 0.0962985 0.457329\n", "Dept14 0.0517546 0.0986029 0.524879\n", "Dept1 0.0466719 0.101942 0.457828\n", "Dept3 0.0563461 0.0977925 0.57618\n", "Dept11 0.0596536 0.100233 0.59515\n", "Dept2 0.00556281 0.110867 0.0501756\n", "Service1 0.252025 0.0686507 3.67112\n", "Dept5&Service1 -0.180757 0.123179 -1.46744\n", "Dept10&Service1 0.0186492 0.110017 0.169512\n", "Dept12&Service1 -0.282269 0.0792937 -3.55979\n", "Dept6&Service1 -0.494464 0.0790278 -6.25683\n", "Dept7&Service1 -0.392054 0.110313 -3.55403\n", "Dept4&Service1 -0.278547 0.0823727 -3.38154\n", "Dept8&Service1 -0.189526 0.111449 -1.70056\n", "Dept9&Service1 -0.499868 0.0885423 -5.64553\n", "Dept14&Service1 -0.497162 0.0917162 -5.42065\n", "Dept1&Service1 -0.24042 0.0982071 -2.4481\n", "Dept3&Service1 -0.223013 0.0890548 -2.50422\n", "Dept11&Service1 -0.516997 0.0809077 -6.38997\n", "Dept2&Service1 -0.384773 0.091843 -4.18946\n" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not too long ago this would have been considered a \"large\" model to fit." ] }, { "cell_type": "code", "collapsed": false, "input": [ "@time fit(lmm(Y ~ Dept*Service + (1|S) + (1|D), inst));" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "5.873610568 seconds (333351324 bytes allocated, 8.24% gc time)\n" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "typeof(m7)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "LinearMixedModel{PLSDiag{Int32}} (constructor with 1 method)" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the first model for which the sparse matrix representation of $\\Lambda'Z'Z\\Lambda$ and the sparse Cholesky factor is used. At present [SuiteSparse](http://suitesparse.com) is used for the factorization but we are considering alternatives. It is challenging to reduce the amount of copying done by SuiteSparse" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A large \"Multilevel\" (nested grouping factors) model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some disciplines models with random effects are all called [Multilevel](http://www.bristol.ac.uk/cmm/learning/multilevel-models/) or \"Hierarchical Linear\" models. Most of the software to fit such models assumes that grouping factors are nested. Some of the data sets used in a review of software for fitting such models is available in the `mlmRev` package for [R](http://www.r-project.org)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "chem = dataset(\"mlmRev\",\"Chem97\");\n", "@time m8 = fit(lmm(Score ~ 1+GCSECnt+Gender+Age + (1|School) + (1|Lea),chem))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.459777227 seconds (203399116 bytes allocated, 21.25% gc time)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: Score ~ 1 + GCSECnt + Gender + Age + (1 | School) + (1 | Lea)\n", "\n", " logLik: -70498.755697, deviance: 140997.511394\n", "\n", " Variance components:\n", " Variance Std.Dev.\n", " School 1.131754 1.063839\n", " Lea 0.018988 0.137798\n", " Residual 5.041983 2.245436\n", " Number of obs: 31022; levels of grouping factors: 2410, 131\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 5.97244 0.0348207 171.52\n", "GCSECnt 2.56977 0.0171238 150.07\n", "GenderF -0.744255 0.0302938 -24.5679\n", "Age -0.0376029 0.00382113 -9.84078\n" ] } ], "prompt_number": 28 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Models with vector-valued random effects for crossed factors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In experimental psychology and especially in psycholinguistics it is common to observe characteristics, such as reaction time, on several subjects exposed to a number of items. Only recently has it been possible to fit models with random effects for `Subject` and `Item`. These blocking factors are crossed or nearly crossed. The experimental factors are often in a 2-level factorial design.\n", "\n", "A recent paper indicated that it is important to use a \"maximal\" set of random effects for the blocking factors when analyzing such data. Such models usually end up severely overparameterized and are difficult to fit. The optimization problem has many free variables and the optimum is poorly defined and usually on the boundary of the parameter space." ] }, { "cell_type": "code", "collapsed": false, "input": [ "bs10 = MixedModels.rdata(\"bs10\");\n", "names(bs10)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "12-element Array{Symbol,1}:\n", " :SubjID\n", " :ItemID\n", " :Spkr \n", " :Filler\n", " :ms \n", " :d \n", " :d2 \n", " :Spkr2 \n", " :dif \n", " :SF \n", " :F \n", " :S " ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "@time m9 = fit(lmm(dif ~ 1+S+F+SF + (1+S+F+SF|SubjID) + (1+S+F+SF|ItemID), bs10))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "1.691779479 seconds (243729412 bytes allocated, 22.98% gc time)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: dif ~ 1 + S + F + SF + ((1 + S + F + SF) | SubjID) + ((1 + S + F + SF) | ItemID)\n", "\n", " logLik: -515.477648, deviance: 1030.955295\n", "\n", " Variance components:\n", " Variance Std.Dev. Corr.\n", " SubjID 0.013393 0.115730\n", " 0.011216 0.105907 -0.56\n", " 0.003373 0.058075 0.99 0.99\n", " 0.005192 0.072053 -0.13 -0.13 -0.13\n", " ItemID 0.000305 0.017459\n", " 0.000079 0.008904 -1.00\n", " 0.000136 0.011647 1.00 1.00\n", " 0.000083 0.009125 -1.00 -1.00 -1.00\n", " Residual 0.127779 0.357462\n", " Number of obs: 1104; levels of grouping factors: 92, 12\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 0.039221 0.0169329 2.31626\n", "S -0.0174094 0.0156289 -1.11392\n", "F 0.0174819 0.0127947 1.36633\n", "SF -0.0322645 0.0133832 -2.41082\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that there are 10 parameters to estimate in the covariance factor for `ItemID` and only 12 distict levels of that factor. The covariance matrices for the random effects are both apparently 4-dimensional but in fact are singular. The `SubjID` random effects lie in a 2-dimensional subspace and the `ItemID` random effects in a 1-dimensional subspace." ] }, { "cell_type": "code", "collapsed": false, "input": [ "m9.\u03bb" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "2-element Array{Any,1}:\n", " PDLCholF(Cholesky{Float64} with factor:\n", "4x4 Triangular{Float64,Array{Float64,2},:L,false}:\n", " 0.323755 0.0 0.0 0.0\n", " -0.166342 0.24517 0.0 0.0\n", " 0.161121 -0.0208524 1.2824e-6 0.0\n", " -0.0265681 0.199809 -1.86487e-6 0.0) \n", " PDLCholF(Cholesky{Float64} with factor:\n", "4x4 Triangular{Float64,Array{Float64,2},:L,false}:\n", " 0.048841 0.0 0.0 0.0\n", " -0.0249089 9.75571e-7 0.0 0.0\n", " 0.0325814 -7.88902e-8 0.0 0.0\n", " -0.0255267 -3.63047e-8 4.95277e-10 0.0)" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a 20-dimensional optimization problem on the profiled log-likelihood. Other examples involve 70 or more nonlinear parameters." ] }, { "cell_type": "code", "collapsed": false, "input": [ "kb07 = MixedModels.rdata(\"kb07\")\n", "names(kb07)'" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 32, "text": [ "1x15 Array{Symbol,2}:\n", " :SubjID :ItemID :Spkr :Prec :CogLoad \u2026 :S :P :C :SP :SC :PC :SPC" ] } ], "prompt_number": 32 }, { "cell_type": "code", "collapsed": false, "input": [ "@time m10 = fit(lmm(RTtrunc ~ 1+S+P+C+SP+SC+PC+SPC + (1+S+P+C+SP+SC+PC+SPC|SubjID) + (1+S+P+C+SP+SC+PC+SPC|ItemID), kb07))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "elapsed time: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "266.277216762 seconds (26724434372 bytes allocated, 16.81% gc time)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 33, "text": [ "Linear mixed model fit by maximum likelihood\n", "Formula: RTtrunc ~ 1 + S + P + C + SP + SC + PC + SPC + ((1 + S + P + C + SP + SC + PC + SPC) | SubjID) + ((1 + S + P + C + SP + SC + PC + SPC) | ItemID)\n", "\n", " logLik: -14293.158811, deviance: 28586.317622\n", "\n", " Variance components:\n", " Variance Std.Dev. Corr.\n", " SubjID 90769.700667 301.280103\n", " 20729.176368 143.976305 -0.43\n", " 22176.490819 148.917732 -0.47 -0.47\n", " 30348.287532 174.207599 0.21 0.21 0.21\n", " 141271.253273 375.860683 0.20 0.20 0.20 0.20\n", " 29142.540830 170.711865 0.47 0.47 0.47 0.47 0.47\n", " 118762.228836 344.618962 -0.10 -0.10 -0.10 -0.10 -0.10 -0.10\n", " 243331.145812 493.286069 -0.48 -0.48 -0.48 -0.48 -0.48 -0.48 -0.48\n", " ItemID 129824.695548 360.311942\n", " 7421.991851 86.150983 -0.34\n", " 249574.280629 499.574099 -0.68 -0.68\n", " 11792.165997 108.591740 0.20 0.20 0.20\n", " 16683.129834 129.163191 0.57 0.57 0.57 0.57\n", " 25917.846598 160.990207 0.28 0.28 0.28 0.28 0.28\n", " 75203.218971 274.232053 0.08 0.08 0.08 0.08 0.08 0.08\n", " 308511.015871 555.437680 0.04 0.04 0.04 0.04 0.04 0.04 0.04\n", " Residual 399612.544911 632.149148\n", " Number of obs: 1790; levels of grouping factors: 56, 32\n", "\n", " Fixed-effects parameters:\n", " Estimate Std.Error z value\n", "(Intercept) 2180.63 76.8192 28.3865\n", "S -133.98 38.6681 -3.46487\n", "P -667.763 95.3327 -7.00455\n", "C 157.974 42.4683 3.71981\n", "SP 88.6069 81.342 1.08931\n", "SC -75.6977 70.02 -1.08109\n", "PC 21.048 89.6846 0.234689\n", "SPC -191.608 168.155 -1.13947\n" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "m10.\u03bb" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ "2-element Array{Any,1}:\n", " PDLCholF(Cholesky{Float64} with factor:\n", "8x8 Triangular{Float64,Array{Float64,2},:L,false}:\n", " 0.476597 0.0 0.0 \u2026 0.0 0.0 0.0\n", " -0.0989296 0.205149 0.0 0.0 0.0 0.0\n", " -0.111553 -0.0341986 0.204649 0.0 0.0 0.0\n", " 0.0590232 -0.0332038 0.156032 0.0 0.0 0.0\n", " 0.116584 -0.443693 -0.377608 0.0 0.0 0.0\n", " 0.127793 -0.097419 0.0203143 \u2026 5.74227e-57 0.0 0.0\n", " -0.0556877 0.0506364 -0.0536056 5.96388e-57 4.04582e-144 0.0\n", " -0.371105 0.174335 -0.520435 -1.03749e-56 -3.00707e-144 0.0)\n", " PDLCholF(Cholesky{Float64} with factor:\n", "8x8 Triangular{Float64,Array{Float64,2},:L,false}:\n", " 0.569979 0.0 0.0 \u2026 0.0 0.0 0.0\n", " -0.0467818 0.128002 0.0 0.0 0.0 0.0\n", " -0.534095 -0.576095 0.0860161 0.0 0.0 0.0\n", " 0.0347278 0.00688954 -0.0257139 0.0 0.0 0.0\n", " 0.116132 -0.123402 -0.0684817 0.0 0.0 0.0\n", " 0.0713495 0.0186925 -0.0655391 \u2026 4.9444e-5 0.0 0.0\n", " 0.0339327 -0.0951286 0.402184 -4.56546e-5 0.0 0.0\n", " 0.03904 -0.431245 -0.061637 -3.57477e-5 1.3111e-121 0.0) " ] } ], "prompt_number": 34 } ], "metadata": {} } ] }