{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Parametric bootstrap in Julia"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Fabrizzio Sanchez](http://www.stat.wisc.edu/node/2034) suggested bootstrapping a statistical model as a non-trivial task on which to demonstrate the speed of [Julia](http://www.stat.wisc.edu).\n",
    "\n",
    "In this notebook I will demonstrate a _parametric bootstrap_, which means that samples are drawn from the probability model with parameters set at their estimated values from the original model, and the model refit.\n",
    "\n",
    "A simple example is a _linear model_ which can be fit using the `GLM` package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: Base.String is deprecated, use AbstractString instead.\n",
      "  likely near /home/bates/.julia/v0.4/RDatasets/src/dataset.jl:1\n",
      "WARNING: Base.String is deprecated, use AbstractString instead.\n",
      "  likely near /home/bates/.julia/v0.4/RDatasets/src/dataset.jl:1\n",
      "WARNING: Base.String is deprecated, use AbstractString instead.\n",
      "  likely near /home/bates/.julia/v0.4/RDatasets/src/datasets.jl:1\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"data-frame\"><tr><th></th><th>Country</th><th>SR</th><th>Pop15</th><th>Pop75</th><th>DPI</th><th>DDPI</th></tr><tr><th>1</th><td>Australia</td><td>11.43</td><td>29.35</td><td>2.87</td><td>2329.68</td><td>2.87</td></tr><tr><th>2</th><td>Austria</td><td>12.07</td><td>23.32</td><td>4.41</td><td>1507.99</td><td>3.93</td></tr><tr><th>3</th><td>Belgium</td><td>13.17</td><td>23.8</td><td>4.43</td><td>2108.47</td><td>3.82</td></tr><tr><th>4</th><td>Bolivia</td><td>5.75</td><td>41.89</td><td>1.67</td><td>189.13</td><td>0.22</td></tr><tr><th>5</th><td>Brazil</td><td>12.88</td><td>42.19</td><td>0.83</td><td>728.47</td><td>4.56</td></tr><tr><th>6</th><td>Canada</td><td>8.79</td><td>31.72</td><td>2.85</td><td>2982.88</td><td>2.43</td></tr><tr><th>7</th><td>Chile</td><td>0.6</td><td>39.74</td><td>1.34</td><td>662.86</td><td>2.67</td></tr><tr><th>8</th><td>China</td><td>11.9</td><td>44.75</td><td>0.67</td><td>289.52</td><td>6.51</td></tr><tr><th>9</th><td>Colombia</td><td>4.98</td><td>46.64</td><td>1.06</td><td>276.65</td><td>3.08</td></tr><tr><th>10</th><td>Costa Rica</td><td>10.78</td><td>47.64</td><td>1.14</td><td>471.24</td><td>2.8</td></tr><tr><th>11</th><td>Denmark</td><td>16.85</td><td>24.42</td><td>3.93</td><td>2496.53</td><td>3.99</td></tr><tr><th>12</th><td>Ecuador</td><td>3.59</td><td>46.31</td><td>1.19</td><td>287.77</td><td>2.19</td></tr><tr><th>13</th><td>Finland</td><td>11.24</td><td>27.84</td><td>2.37</td><td>1681.25</td><td>4.32</td></tr><tr><th>14</th><td>France</td><td>12.64</td><td>25.06</td><td>4.7</td><td>2213.82</td><td>4.52</td></tr><tr><th>15</th><td>Germany</td><td>12.55</td><td>23.31</td><td>3.35</td><td>2457.12</td><td>3.44</td></tr><tr><th>16</th><td>Greece</td><td>10.67</td><td>25.62</td><td>3.1</td><td>870.85</td><td>6.28</td></tr><tr><th>17</th><td>Guatamala</td><td>3.01</td><td>46.05</td><td>0.87</td><td>289.71</td><td>1.48</td></tr><tr><th>18</th><td>Honduras</td><td>7.7</td><td>47.32</td><td>0.58</td><td>232.44</td><td>3.19</td></tr><tr><th>19</th><td>Iceland</td><td>1.27</td><td>34.03</td><td>3.08</td><td>1900.1</td><td>1.12</td></tr><tr><th>20</th><td>India</td><td>9.0</td><td>41.31</td><td>0.96</td><td>88.94</td><td>1.54</td></tr><tr><th>21</th><td>Ireland</td><td>11.34</td><td>31.16</td><td>4.19</td><td>1139.95</td><td>2.99</td></tr><tr><th>22</th><td>Italy</td><td>14.28</td><td>24.52</td><td>3.48</td><td>1390.0</td><td>3.54</td></tr><tr><th>23</th><td>Japan</td><td>21.1</td><td>27.01</td><td>1.91</td><td>1257.28</td><td>8.21</td></tr><tr><th>24</th><td>Korea</td><td>3.98</td><td>41.74</td><td>0.91</td><td>207.68</td><td>5.81</td></tr><tr><th>25</th><td>Luxembourg</td><td>10.35</td><td>21.8</td><td>3.73</td><td>2449.39</td><td>1.57</td></tr><tr><th>26</th><td>Malta</td><td>15.48</td><td>32.54</td><td>2.47</td><td>601.05</td><td>8.12</td></tr><tr><th>27</th><td>Norway</td><td>10.25</td><td>25.95</td><td>3.67</td><td>2231.03</td><td>3.62</td></tr><tr><th>28</th><td>Netherlands</td><td>14.65</td><td>24.71</td><td>3.25</td><td>1740.7</td><td>7.66</td></tr><tr><th>29</th><td>New Zealand</td><td>10.67</td><td>32.61</td><td>3.17</td><td>1487.52</td><td>1.76</td></tr><tr><th>30</th><td>Nicaragua</td><td>7.3</td><td>45.04</td><td>1.21</td><td>325.54</td><td>2.48</td></tr><tr><th>&vellip;</th><td>&vellip;</td><td>&vellip;</td><td>&vellip;</td><td>&vellip;</td><td>&vellip;</td><td>&vellip;</td></tr></table>"
      ],
      "text/plain": [
       "50x6 DataFrames.DataFrame\n",
       "| Row | Country          | SR    | Pop15 | Pop75 | DPI     | DDPI  |\n",
       "|-----|------------------|-------|-------|-------|---------|-------|\n",
       "| 1   | \"Australia\"      | 11.43 | 29.35 | 2.87  | 2329.68 | 2.87  |\n",
       "| 2   | \"Austria\"        | 12.07 | 23.32 | 4.41  | 1507.99 | 3.93  |\n",
       "| 3   | \"Belgium\"        | 13.17 | 23.8  | 4.43  | 2108.47 | 3.82  |\n",
       "| 4   | \"Bolivia\"        | 5.75  | 41.89 | 1.67  | 189.13  | 0.22  |\n",
       "| 5   | \"Brazil\"         | 12.88 | 42.19 | 0.83  | 728.47  | 4.56  |\n",
       "| 6   | \"Canada\"         | 8.79  | 31.72 | 2.85  | 2982.88 | 2.43  |\n",
       "| 7   | \"Chile\"          | 0.6   | 39.74 | 1.34  | 662.86  | 2.67  |\n",
       "| 8   | \"China\"          | 11.9  | 44.75 | 0.67  | 289.52  | 6.51  |\n",
       "| 9   | \"Colombia\"       | 4.98  | 46.64 | 1.06  | 276.65  | 3.08  |\n",
       "| 10  | \"Costa Rica\"     | 10.78 | 47.64 | 1.14  | 471.24  | 2.8   |\n",
       "| 11  | \"Denmark\"        | 16.85 | 24.42 | 3.93  | 2496.53 | 3.99  |\n",
       "⋮\n",
       "| 39  | \"Sweden\"         | 6.86  | 21.44 | 4.54  | 3299.49 | 3.01  |\n",
       "| 40  | \"Switzerland\"    | 14.13 | 23.49 | 3.73  | 2630.96 | 2.7   |\n",
       "| 41  | \"Turkey\"         | 5.13  | 43.42 | 1.08  | 389.66  | 2.96  |\n",
       "| 42  | \"Tunisia\"        | 2.81  | 46.12 | 1.21  | 249.87  | 1.13  |\n",
       "| 43  | \"United Kingdom\" | 7.81  | 23.27 | 4.46  | 1813.93 | 2.01  |\n",
       "| 44  | \"United States\"  | 7.56  | 29.81 | 3.43  | 4001.89 | 2.45  |\n",
       "| 45  | \"Venezuela\"      | 9.22  | 46.4  | 0.9   | 813.39  | 0.53  |\n",
       "| 46  | \"Zambia\"         | 18.56 | 45.25 | 0.56  | 138.33  | 5.14  |\n",
       "| 47  | \"Jamaica\"        | 7.72  | 41.12 | 1.73  | 380.47  | 10.23 |\n",
       "| 48  | \"Uruguay\"        | 9.24  | 28.13 | 2.72  | 766.54  | 1.88  |\n",
       "| 49  | \"Libya\"          | 8.89  | 43.69 | 2.07  | 123.58  | 16.71 |\n",
       "| 50  | \"Malaysia\"       | 4.71  | 47.2  | 0.66  | 242.69  | 5.08  |"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using GLM, RDatasets\n",
    "ls = dataset(\"datasets\",\"LifeCycleSavings\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}},Float64}:\n",
       "\n",
       "Coefficients:\n",
       "                 Estimate   Std.Error   t value Pr(>|t|)\n",
       "(Intercept)       28.5661     7.35452   3.88416   0.0003\n",
       "Pop15           -0.461193    0.144642  -3.18851   0.0026\n",
       "Pop75             -1.6915      1.0836    -1.561   0.1255\n",
       "DPI          -0.000336902 0.000931107 -0.361829   0.7192\n",
       "DDPI             0.409695    0.196197   2.08818   0.0425\n"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m1 = lm(SR ~ Pop15 + Pop75 + DPI + DDPI, ls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will take the fitted values and the estimate of the scale parameter from this model as the mean and scale parameter of a multivariate normal distribution from which to simulate new response vectors.\n",
    "\n",
    "Getting the fitted values turned out to be more complicated than I thought it would be because I hadn't written the appropriate method.  For now we will use the somewhat mysterious extractor `m1.model.rr.mu` to get this.\n",
    "\n",
    "We will use the `IsoNormal` type of multivariate normal distribution from the `Distributions` package for simulating the responses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "ename": "LoadError",
     "evalue": "LoadError: fitted is not defined for DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}},Float64}.\nwhile loading In[6], in expression starting on line 1",
     "output_type": "error",
     "traceback": [
      "LoadError: fitted is not defined for DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}},Float64}.\nwhile loading In[6], in expression starting on line 1",
      "",
      " in error at ./error.jl:21",
      " in fitted at /home/bates/.julia/v0.4/StatsBase/src/statmodels.jl:18"
     ]
    }
   ],
   "source": [
    "fitted(m1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "ename": "LoadError",
     "evalue": "LoadError: MethodError: `convert` has no method matching convert(::Type{PDMats.ScalMat{Float64}}, ::Float64)\nThis may have arisen from a call to the constructor PDMats.ScalMat{Float64}(...),\nsince type constructors fall back to convert methods.\nClosest candidates are:\n  PDMats.ScalMat{T<:AbstractFloat}(::Any, !Matched::Any, !Matched::Any)\n  call{T}(::Type{T}, ::Any)\n  convert{T}(::Type{T}, !Matched::T)\nwhile loading In[3], in expression starting on line 1",
     "output_type": "error",
     "traceback": [
      "LoadError: MethodError: `convert` has no method matching convert(::Type{PDMats.ScalMat{Float64}}, ::Float64)\nThis may have arisen from a call to the constructor PDMats.ScalMat{Float64}(...),\nsince type constructors fall back to convert methods.\nClosest candidates are:\n  PDMats.ScalMat{T<:AbstractFloat}(::Any, !Matched::Any, !Matched::Any)\n  call{T}(::Type{T}, ::Any)\n  convert{T}(::Type{T}, !Matched::T)\nwhile loading In[3], in expression starting on line 1",
      "",
      " in call at /home/bates/.julia/v0.4/Distributions/src/multivariate/mvnormal.jl:64"
     ]
    }
   ],
   "source": [
    "d = IsoNormal(m1.model.rr.mu,scale(m1.model))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the coefficients."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A simple method of obtaining the coefficients from fitting the model matrix to a new simulated response is the `lm` method for a model matrix and a response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "50x5 Array{Float64,2}:\n",
       " 1.0  29.35  2.87  2329.68   2.87\n",
       " 1.0  23.32  4.41  1507.99   3.93\n",
       " 1.0  23.8   4.43  2108.47   3.82\n",
       " 1.0  41.89  1.67   189.13   0.22\n",
       " 1.0  42.19  0.83   728.47   4.56\n",
       " 1.0  31.72  2.85  2982.88   2.43\n",
       " 1.0  39.74  1.34   662.86   2.67\n",
       " 1.0  44.75  0.67   289.52   6.51\n",
       " 1.0  46.64  1.06   276.65   3.08\n",
       " 1.0  47.64  1.14   471.24   2.8 \n",
       " 1.0  24.42  3.93  2496.53   3.99\n",
       " 1.0  46.31  1.19   287.77   2.19\n",
       " 1.0  27.84  2.37  1681.25   4.32\n",
       " ⋮                               \n",
       " 1.0  21.44  4.54  3299.49   3.01\n",
       " 1.0  23.49  3.73  2630.96   2.7 \n",
       " 1.0  43.42  1.08   389.66   2.96\n",
       " 1.0  46.12  1.21   249.87   1.13\n",
       " 1.0  23.27  4.46  1813.93   2.01\n",
       " 1.0  29.81  3.43  4001.89   2.45\n",
       " 1.0  46.4   0.9    813.39   0.53\n",
       " 1.0  45.25  0.56   138.33   5.14\n",
       " 1.0  41.12  1.73   380.47  10.23\n",
       " 1.0  28.13  2.72   766.54   1.88\n",
       " 1.0  43.69  2.07   123.58  16.71\n",
       " 1.0  47.2   0.66   242.69   5.08"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = copy(m1.model.pp.X) # copy the X matrix from the predictor field, pp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearModel{DensePredQR{Float64}}:\n",
       "\n",
       "Coefficients:        Estimate  Std.Error  t value Pr(>|t|)\n",
       "x1       28.5583    8.82106  3.23751   0.0023\n",
       "x2     -0.445019   0.173485 -2.56518   0.0137\n",
       "x3      -2.39471    1.29968 -1.84255   0.0720\n",
       "x4   0.000373479 0.00111678 0.334426   0.7396\n",
       "x5      0.449379    0.23532  1.90965   0.0626\n",
       "\n"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "lm(X,rand(d))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose that we want the coefficient estimates, $\\widehat{\\beta}$, and the scale parameter estimate, $\\widehat{\\sigma}$, from, say, 10000 simulated responses for this model.\n",
    "\n",
    "For a single fit we can extract both of these and concatenate them as a vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6-element Array{Float64,1}:\n",
       " 30.153     \n",
       " -0.509894  \n",
       " -1.85234   \n",
       " -0.00174832\n",
       "  0.737757  \n",
       "  4.1995    "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sm1 = lm(X,rand(d))\n",
    "[coef(sm1), scale(sm1)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's express this as a function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "samp1 (generic function with 1 method)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "samp1(X::Matrix,d) = (sm = lm(X,rand(d)); [coef(sm), scale(sm)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6-element Array{Float64,1}:\n",
       " 38.8541    \n",
       " -0.657691  \n",
       " -2.45837   \n",
       " -0.00162179\n",
       "  0.277361  \n",
       "  3.63202   "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "samp1(X,d)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating the parametric bootstrap sample"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The simplest way to create a parametric bootstrap sample of these parameter estimates is as a comprehension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10000-element Array{Any,1}:\n",
       " [28.5583,-0.445019,-2.39471,0.000373479,0.449379,4.56095]  \n",
       " [35.944,-0.621564,-2.35589,-0.0012479,0.598529,4.14837]    \n",
       " [21.9009,-0.343171,-0.86067,-9.93608e-5,0.338381,4.10657]  \n",
       " [33.1855,-0.601662,-1.77152,-0.000105169,0.398659,3.27316] \n",
       " [17.9789,-0.231693,-0.386903,-0.000723807,0.371328,3.69092]\n",
       " [31.442,-0.516094,-2.32955,0.000473851,0.274775,3.83084]   \n",
       " [17.6364,-0.292026,-0.440675,-0.000368788,0.828189,3.94549]\n",
       " [33.2162,-0.51244,-2.8588,8.82325e-5,0.373106,4.17757]     \n",
       " [12.508,-0.148361,-0.244755,-0.000232026,0.877372,4.6156]  \n",
       " [33.1391,-0.538018,-3.09807,-0.000228105,0.362138,3.62479] \n",
       " [29.9126,-0.476695,-1.169,-0.000781248,0.175824,4.16407]   \n",
       " [30.0718,-0.479997,-2.0634,-0.000215239,0.455825,4.23013]  \n",
       " [21.1611,-0.370234,-1.25906,0.000697624,0.827424,3.91826]  \n",
       " ⋮                                                          \n",
       " [29.7568,-0.503739,-1.68198,-9.8295e-5,0.514151,4.23564]   \n",
       " [40.0399,-0.673688,-2.69232,-0.00111253,0.312692,3.84436]  \n",
       " [26.1465,-0.357683,-1.81777,-0.000553048,0.157578,4.28691] \n",
       " [32.3954,-0.548962,-2.91014,0.000659737,0.51586,2.50244]   \n",
       " [23.1756,-0.327187,-0.69978,7.07523e-5,0.20963,3.70527]    \n",
       " [29.086,-0.471942,-1.49713,-0.000662799,0.399184,2.99092]  \n",
       " [37.6333,-0.625821,-3.35728,0.000467882,0.255695,4.19263]  \n",
       " [28.9012,-0.483732,-2.56316,0.000625508,0.708579,3.96592]  \n",
       " [26.5054,-0.439392,-2.0225,0.000442202,0.61712,4.27044]    \n",
       " [17.5618,-0.255279,-1.47908,0.00119981,0.885358,3.06848]   \n",
       " [21.5199,-0.329395,-1.27995,0.000940134,0.233601,3.5673]   \n",
       " [28.1422,-0.431297,-1.82568,-0.000431981,0.423766,3.95991] "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "bss = [samp1(X,d) for i in 1:10000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "elapsed time: 2.263262407 seconds (329511888 bytes allocated, 16.73% gc time)\n"
     ]
    }
   ],
   "source": [
    "@time [samp1(X,d) for i in 1:10000];"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we see, this produces the sample in the form of an array of vectors, as opposed to, say, a matrix.  Furthermore this process is not blazingly fast, allocates a lot of memory and spends about 1/6 of the time in garbage collection.\n",
    "\n",
    "What is happening here is that all the work of creating the model object and factoring the model matrix is (unnecessarily) being done for every sample."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the factorization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As indicated by the rather long name of the object type for the fitted models, a dense QR decomposition is used to obtain the coefficients."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "QRCompactWY{Float64} (constructor with 1 method)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "qrd = m1.model.pp.qr;\n",
    "typeof(qrd)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5-element Array{Float64,1}:\n",
       " 28.5583     \n",
       " -0.445019   \n",
       " -2.39471    \n",
       "  0.000373479\n",
       "  0.449379   "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "y1 = rand(d)\n",
    "β = qrd\\y1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see we get the same coefficient estimates using the pre-computed QR decomposition of the model matrix.  To get $\\widehat{\\sigma}$ we could use these coefficient estimates, generate the fitted values and residuals and assumulate the sum-of-squared residuals. However, there is a faster way using the vector $\\bf Q^\\prime y$.  The coefficient estimates satisfy $\\bf R\\widehat{\\beta}=Q_1^\\prime y$ and the residual sum of squares is $\\|\\bf Q_2^\\prime y\\|^2$ where $\\bf Q_1$ is the first $p$ columns of the $n\\times n$ matrix $\\bf Q$ and $\\bf Q_2$ is the last $n-p$ columns.\n",
    "\n",
    "This method would not be practical for large $n$ if we needed to generate and store $\\bf Q$ explicitly.  But we don't."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "QRCompactWYQ{Float64} (constructor with 1 method)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Q = qrd[:Q]\n",
    "typeof(Q)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5x5 Triangular{Float64,Array{Float64,2},:U,false}:\n",
       " -7.07107  -248.121   -16.214    -7825.96  -26.5702  \n",
       "  0.0        64.0621   -8.20847  -5244.98   -0.960775\n",
       "  0.0         0.0      -3.77617  -1659.93    0.871339\n",
       "  0.0         0.0       0.0       4224.22   -5.12174 \n",
       "  0.0         0.0       0.0          0.0   -19.3819  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "R = Triangular(qrd[:R],:U)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create both $\\widehat{\\sigma}$ and $\\widehat{\\beta}$ in a simple function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "betasigmahat (generic function with 1 method)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function betasigmahat(qrd,y::Vector)\n",
    "    n,p = size(qrd)\n",
    "    length(y) == n || throw(DimensionMismatch(\"\"))\n",
    "    qty = qrd[:Q]'y\n",
    "    sigmahat = sqrt(Base.sumabs2(sub(qty,(p+1):n))/(n-p))\n",
    "    Triangular(qrd[:R],:U)\\sub(qty,1:p), sigmahat\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "([28.5583,-0.445019,-2.39471,0.000373479,0.449379],4.560948146915455)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "betasigmahat(qrd,y1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can wrap all this in a function to create the bootstrap sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "boot (generic function with 1 method)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function boot(m::LinearModel,nsim::Integer)\n",
    "    qrd = m.pp.qr\n",
    "    qmat = qrd[:Q]\n",
    "    rmat = Triangular(qrd[:R],:U)\n",
    "    n,p = size(qrd)\n",
    "    df = float(n-p)\n",
    "    d = IsoNormal(m.rr.mu, scale(m))\n",
    "    betas = Array(Float64,(size(qrd,2),nsim))\n",
    "    sigmas = Array(Float64,nsim)\n",
    "    for j in 1:nsim\n",
    "        qty = qmat'rand(d)\n",
    "        betas[:,j] = rmat\\sub(qty,1:p)\n",
    "        sigmas[j] = sqrt(Base.sumabs2(sub(qty,(p+1):n))/df)\n",
    "    end\n",
    "    betas, sigmas\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x1 Array{Float64,2}:\n",
       " 28.5583     \n",
       " -0.445019   \n",
       " -2.39471    \n",
       "  0.000373479\n",
       "  0.449379   ,\n",
       "\n",
       "[4.56095])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "boot(m1.model,1)  # check that the results are calculated properly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x10000 Array{Float64,2}:\n",
       " 28.5583       35.944      21.9009      …  21.5199       28.1422     \n",
       " -0.445019     -0.621564   -0.343171       -0.329395     -0.431297   \n",
       " -2.39471      -2.35589    -0.86067        -1.27995      -1.82568    \n",
       "  0.000373479  -0.0012479  -9.93608e-5      0.000940134  -0.000431981\n",
       "  0.449379      0.598529    0.338381        0.233601      0.423766   ,\n",
       "\n",
       "[4.56095,4.14837,4.10657,3.27316,3.69092,3.83084,3.94549,4.17757,4.6156,3.62479  …  4.28691,2.50244,3.70527,2.99092,4.19263,3.96592,4.27044,3.06848,3.5673,3.95991])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "boot(m1.model,10000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "elapsed time: 0.105056267 seconds (24635424 bytes allocated)\n"
     ]
    }
   ],
   "source": [
    "gc();@time boot(m1.model,10000);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reducing memory allocation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are doing much better in the boot function except that we are allocating a lot of memory.  We can avoid this by using mutating functions.  For example each call to `rand(d)` allocates a vector of length `n` which eventually must be garbage collected.  This is not a terrible burden if $n=50$ but when $n=100000$ you don't want to have this wasted allocation occurring `nsim` times.\n",
    "\n",
    "The mutating version of `rand` avoids this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "boot1 (generic function with 1 method)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function boot1(m::LinearModel,nsim::Integer)\n",
    "    qrd = m.pp.qr\n",
    "    qmat = qrd[:Q]\n",
    "    rmat = Triangular(qrd[:R],:U)\n",
    "    n,p = size(qrd)\n",
    "    df = float(n-p)\n",
    "    d = IsoNormal(m.rr.mu, scale(m))\n",
    "    betas = Array(Float64,(size(qrd,2),nsim))\n",
    "    sigmas = Array(Float64,nsim)\n",
    "    y = Array(Float64,n)\n",
    "    for j in 1:nsim\n",
    "        qty = qmat'rand!(d,y)\n",
    "        betas[:,j] = rmat\\sub(qty,1:p)\n",
    "        sigmas[j] = sqrt(Base.sumabs2(sub(qty,(p+1):n))/df)\n",
    "    end\n",
    "    betas, sigmas\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x1 Array{Float64,2}:\n",
       " 28.5583     \n",
       " -0.445019   \n",
       " -2.39471    \n",
       "  0.000373479\n",
       "  0.449379   ,\n",
       "\n",
       "[4.56095])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "boot1(m1.model,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "elapsed time: 0.103414702 seconds (20153008 bytes allocated)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x10000 Array{Float64,2}:\n",
       " 28.5583       35.944      21.9009      …  21.5199       28.1422     \n",
       " -0.445019     -0.621564   -0.343171       -0.329395     -0.431297   \n",
       " -2.39471      -2.35589    -0.86067        -1.27995      -1.82568    \n",
       "  0.000373479  -0.0012479  -9.93608e-5      0.000940134  -0.000431981\n",
       "  0.449379      0.598529    0.338381        0.233601      0.423766   ,\n",
       "\n",
       "[4.56095,4.14837,4.10657,3.27316,3.69092,3.83084,3.94549,4.17757,4.6156,3.62479  …  4.28691,2.50244,3.70527,2.99092,4.19263,3.96592,4.27044,3.06848,3.5673,3.95991])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321);gc();@time boot1(m1.model,10000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have cut down on the memory allocated (24 Mb total down to 20 Mb total) but only by about 1/6.  The other allocations are occurring in the multiplication by $\\bf Q$ and the solutions with respect to $\\bf R$. There are mutating versions of both these operations.  A multiplication of the form $\\bf Q^\\prime y$ ends up calling the relevant method for `At_mul_B`. The general function is `Ac_mul_B` indicating the conjugate transpose. The two functions are identical for real matrices and we will use the more general name `Ac_mul_B` and its mutating version `Base.Ac_mul_B!`.  Similarly, `R\\qty` ends up calling `A_ldiv_B` with mutating version `Base.A_ldiv_B!`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "boot2 (generic function with 1 method)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function boot2(m::LinearModel,nsim::Integer)\n",
    "    qrd = m.pp.qr\n",
    "    qmat = qrd[:Q]\n",
    "    rmat = Triangular(qrd[:R],:U)\n",
    "    n,p = size(qrd)\n",
    "    df = float(n-p)\n",
    "    d = IsoNormal(m.rr.mu, scale(m))\n",
    "    betas = Array(Float64,(size(qrd,2),nsim))\n",
    "    sigmas = Array(Float64,nsim)\n",
    "    y = Array(Float64,n)\n",
    "    for j in 1:nsim\n",
    "        Base.Ac_mul_B!(qmat,rand!(d,y))\n",
    "        Base.A_ldiv_B!(rmat,copy!(sub(betas,:,j),sub(y,1:p)))\n",
    "        sigmas[j] = sqrt(Base.sumabs2(sub(y,(p+1):n))/df)\n",
    "    end\n",
    "    betas, sigmas\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x1 Array{Float64,2}:\n",
       " 28.5583     \n",
       " -0.445019   \n",
       " -2.39471    \n",
       "  0.000373479\n",
       "  0.449379   ,\n",
       "\n",
       "[4.56095])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "boot2(m1.model,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "elapsed time: 0.099737582 seconds (14835392 bytes allocated)\n"
     ]
    }
   ],
   "source": [
    "srand(1234321); gc(); @time samp2 = boot2(m1.model,10000);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It doesn't look like we have saved a lot but remember that we are allocating 4 Mb of that 14 Mb for the result."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preallocating the result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common idiom in Julia is to create a mutating version of a function that overwrites one or more arguments and a non-mutating version that allocates the object to be overwritten and calls the mutating version.\n",
    "\n",
    "We first create the mutating `boot2!` method then define the non-mutation version to allocate the storage and call the mutating version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "boot2 (generic function with 1 method)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function boot2!(betas::Matrix, sigmas::Vector, m::LinearModel)\n",
    "    qrd = m.pp.qr\n",
    "    n,p = size(qrd)\n",
    "    size(betas,2) == length(sigmas) && size(betas,1) == p || throw(DimensionMismatch(\"\"))\n",
    "    qmat = qrd[:Q]\n",
    "    rmat = Triangular(qrd[:R],:U)\n",
    "    df = float(n-p)\n",
    "    d = IsoNormal(m.rr.mu, scale(m))\n",
    "    y = Array(Float64,n)\n",
    "    for j in 1:length(sigmas)\n",
    "        Base.Ac_mul_B!(qmat,rand!(d,y))\n",
    "        Base.A_ldiv_B!(rmat,copy!(sub(betas,:,j),sub(y,1:p)))\n",
    "        sigmas[j] = sqrt(Base.sumabs2(sub(y,(p+1):n))/df)\n",
    "    end\n",
    "    betas, sigmas\n",
    "end\n",
    "function boot2(m::LinearModel, nsim::Integer)\n",
    "    boot2!(Array(Float64,(size(m.pp.qr,2),nsim)),Array(Float64,nsim),m)\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x1 Array{Float64,2}:\n",
       " 28.5583     \n",
       " -0.445019   \n",
       " -2.39471    \n",
       "  0.000373479\n",
       "  0.449379   ,\n",
       "\n",
       "[4.56095])"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "boot2!(zeros(5,1),zeros(1),m1.model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "elapsed time: 0.099889196 seconds (14355200 bytes allocated)\n"
     ]
    }
   ],
   "source": [
    "betas = zeros(5,10000)\n",
    "sigmas = zeros(10000)\n",
    "srand(1234321); gc(); @time boot2!(betas,sigmas,m1.model);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So despite all these heroic efforts we seem to be stuck allocating about 14 Mb in this case."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Profiling to find the memory allocation bottlenecks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We should profile to see what is happening.  To get a reasonable execution time we up the number of samples from 10,000 to 100,000."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "betas = zeros(5,100_000)\n",
    "sigmas = zeros(100_000)\n",
    "srand(1234321)\n",
    "gc()\n",
    "@profile boot2!(betas,sigmas,m1.model);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "using ProfileView"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point we would typically run `ProfileViews.view()` but the resulting plot doesn't display well in an IJulia notebook.  Also, when `@profile` is called through IJulia there are several functions specific to the IJulia interface that show up.\n",
    "\n",
    "Running this in the REPL does produce a profile _flame plot_ which shows, as expected, that essentially all the time is being used in the inner loop.  Roughly 1/3 in the `Ac_mul_B!` call, nearly 2/3 in the `A_ldiv_B!` call and a small amount in the `sumabs2` call.\n",
    "\n",
    "Furthermore, only a small fraction of the time in the `A_ldiv_B!` call is actually spent in the calculation.  Most of the time is spent in the `sub` function.  I misunderstood how it worked.  This frequently happens to me, which is why `@profile` and `ProfileView.view()` are so valuable.\n",
    "\n",
    "It is likely that `view` from the `ArrayViews` package will be faster and allocate less memory.  (Because the capabilities in `ArrayViews` are likely to break existing code, they will not be introduced into the base system until after release 0.3.0 is out.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "boot3 (generic function with 1 method)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using ArrayViews\n",
    "function boot3!(betas::Matrix, sigmas::Vector, m::LinearModel)\n",
    "    qrd = m.pp.qr\n",
    "    n,p = size(qrd)\n",
    "    size(betas,2) == length(sigmas) && size(betas,1) == p || throw(DimensionMismatch(\"\"))\n",
    "    qmat = qrd[:Q]\n",
    "    rmat = Triangular(qrd[:R],:U)\n",
    "    df = float(n-p)\n",
    "    d = IsoNormal(m.rr.mu, scale(m))\n",
    "    y = Array(Float64,n)\n",
    "    for j in 1:length(sigmas)\n",
    "        Base.Ac_mul_B!(qmat,rand!(d,y))\n",
    "        Base.A_ldiv_B!(rmat,copy!(view(betas,:,j),view(y,1:p)))\n",
    "        sigmas[j] = sqrt(Base.sumabs2(view(y,(p+1):n))/df)\n",
    "    end\n",
    "    betas, sigmas\n",
    "end\n",
    "function boot3(m::LinearModel, nsim::Integer)\n",
    "    boot3!(Array(Float64,(size(m.pp.qr,2),nsim)),Array(Float64,nsim),m)\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(\n",
       "5x1 Array{Float64,2}:\n",
       " 28.5583     \n",
       " -0.445019   \n",
       " -2.39471    \n",
       "  0.000373479\n",
       "  0.449379   ,\n",
       "\n",
       "[4.56095])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srand(1234321)\n",
    "boot3!(zeros(5,1),[0.],m1.model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "elapsed time: 0.062589559 seconds (5361040 bytes allocated)\n"
     ]
    }
   ],
   "source": [
    "betas = zeros(5,10_000)\n",
    "sigmas = zeros(10_000)\n",
    "srand(1234321)\n",
    "gc()\n",
    "@time boot3!(betas,sigmas,m1.model);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The profile indicates that we are unlikely to make this much faster as almost all the time is being spent in the BLAS or LAPACK functions, plus the simulation from the multivariate normal.  We are still allocating over 5 Mb during the execution but I'm not sure exactly where this is occuring."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parallel execution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to allowing memory allocation to be more finely controlled, having a mutating version of the simulation function can help when parallelizing the simulation using distributed arrays or memory-mapped arrays.\n",
    "\n",
    "A memory-mapped array corresponds to a stored file.  Numbers are stored in their native binary representation.\n",
    "\n",
    "Suppose that we were simulating a huge number of replications for this model or for more complex models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Array{Float64,2}"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nsim = 10_000_000\n",
    "n, p = size(m1.model.pp.X)\n",
    "s = open(\"/tmp/mmapbetas.bin\",\"w+\")\n",
    "betas = mmap_array(Float64,(p,nsim),s)\n",
    "typeof(betas)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 0.4.2",
   "language": "julia",
   "name": "julia-0.4"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.4.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}