{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Authors:** Andrej Gajdoš, Jozef Hanč, Martina Hančová <br> *[Faculty of Science](https://www.upjs.sk/en/faculty-of-science/?prefferedLang=EN), P. J. Šafárik University in Košice, Slovakia* <br> email: [andrej.gajdos@student.upjs.sk](mailto:andrej.gajdos@student.upjs.sk)\n",
    "***\n",
    "**<font size=6 color=brown> MLE, REMLE for tourism</font>** "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font size=4> R-based computational tools - </font> **<font size=4> fdslrm, nlme, MMEinR(mixed), sommer</font>**  \n",
    "<a id=table_of_contents></a>\n",
    "###  Table of Contents \n",
    "* [Data and model](#data_and_model) - data and model description of empirical data\n",
    "* [MLE](#mle) - computation by standard packages *nlme*, *MMEinR(mixed)* and *fdslrm*\n",
    "* [REMLE](#remle) - computation by standard package *nlme*, *MMEinR(mixed)*, *sommer* and *fdslrm*\n",
    "* [References](#references)\n",
    "\n",
    "**To get back to the table of contents, use <font color=brown>the Home key</font>.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=data_and_model></a>\n",
    "# <font color=brown>Data and model </font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this econometric FDSLRM application, we consider the time series data set, called \n",
    "`visnights`, representing *total quarterly visitor nights* (in millions) from \n",
    "1998-2016 in one of the regions of Australia -- inner zone of Victoria state. The number of time \n",
    "series observations is $n=76$. The data was adapted from _Hyndman, 2018_.\n",
    "\n",
    "The Gaussian orthogonal FDSLRM fitting the tourism data has the following form:\n",
    "\n",
    "$ \n",
    "\\begin{array}{rl}\n",
    "& X(t) & \\!  = \\! & \\beta_1+\\beta_2\\cos\\left(\\tfrac{2\\pi t}{76}\\right)+\\beta_3\\sin\\left(\\tfrac{2\\pi t\\cdot 2}{76}\\right) + \\\\\n",
    "&      &  & +Y_1\\cos\\left(\\tfrac{2\\pi t\\cdot 19 }{76}\\right)+Y_2\\sin\\left(\\tfrac{2\\pi t\\cdot 19}{76}\\right) +Y_3\\cos\\left(\\tfrac{2\\pi t\\cdot 38}{76}\\right) +w(t), \\, t\\in \\mathbb{N},\n",
    "\\end{array}\n",
    "$ \n",
    "\n",
    "where $\\boldsymbol{\\beta}=(\\beta_1,\\,\\beta_2,\\,\\beta_3)' \\in \\mathbb{R}^3, \\mathbf{Y} = (Y_1, Y_2, Y_3)' \\sim \\mathcal{N}_3(\\boldsymbol{0}, \\mathrm{D}), w(t) \\sim \\mathcal{iid}\\, \\mathcal{N} (0, \\sigma_0^2), \\boldsymbol{\\nu}=(\\sigma_0^2, \\sigma_1^2, \\sigma_2^2, \\sigma_3^2) \\in \\mathbb{R}_{+}^4$.\n",
    "\n",
    "We identified the given and most parsimonious structure of the FDSLRM using an iterative process of the model building and selection based on exploratory tools of *spectral analysis* and *residual diagnostics* (for details see our Jupyter notebook `tourism.ipynb`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.373000Z",
     "start_time": "2019-05-23T16:00:01.311Z"
    }
   },
   "outputs": [],
   "source": [
    "# data - time series observation\n",
    "x <- read.csv2('tourism.csv', header = FALSE)\n",
    "x <- x[,2]\n",
    "x <- as.numeric(as.vector(x[-1]))\n",
    "t <- 1:length(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.443000Z",
     "start_time": "2019-05-23T16:00:01.314Z"
    }
   },
   "outputs": [],
   "source": [
    "# auxilliary functions to create design matrices F, V and the projection matrix M_F\n",
    "makeF <- function(times, freqs) {\n",
    "\n",
    "        n <- length(times)\n",
    "        f <- length(freqs)\n",
    "        c1 <- matrix(1, n)\n",
    "        F <- matrix(c1, n, 1)\n",
    "\n",
    "        for (i in 1:f) {\n",
    "                F <- cbind(F, cos(2 * pi * freqs[i] * times))\n",
    "                F <- cbind(F, sin(2 * pi * freqs[i] * times))\n",
    "        }\n",
    "\n",
    "        return(F)\n",
    "}\n",
    "\n",
    "makeV <- function(times, freqs) {\n",
    "\n",
    "        V <- makeF(times, freqs)\n",
    "        V <- V[, -1]\n",
    "\n",
    "        return(V)\n",
    "}\n",
    "\n",
    "makeM_F <- function(F) {\n",
    "\n",
    "        n <- nrow(F)\n",
    "        c1 <- rep(1, n)\n",
    "        I <- diag(c1)\n",
    "        M_F <- I - F %*% solve((t(F) %*% F)) %*% t(F)\n",
    "\n",
    "        return(M_F)\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.496000Z",
     "start_time": "2019-05-23T16:00:01.316Z"
    }
   },
   "outputs": [],
   "source": [
    "# model parameters\n",
    "n <- 76 \n",
    "k <- 3 \n",
    "l <- 3\n",
    "\n",
    "# model - design matrices F, V\n",
    "F <- makeF(t, c(1/76, 2/76))\n",
    "F <- F[,-c(3,4)]\n",
    "\n",
    "V <- makeV(t, c(19/76, 38/76))\n",
    "V <- V[,-4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=mle></a>\n",
    "# <font color=brown> MLE</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## nlme: Linear and Nonlinear Mixed Effects Models\n",
    "* _Purpose_: R package to fit and compare Gaussian linear and nonlinear mixed-effects models \n",
    "* _Version_: 3.1-140, 2019\n",
    "* _URL_: https://CRAN.R-project.org/package=nlme]\n",
    "* _Key tool_: `lme` function for fitting LMM\n",
    "* _Computational parameters_ of `lme`:\n",
    ">   * <font size=2> `maxIter` - maximum number of iterations  (default: 50).\n",
    "    * `msMaxIter` - maximum number of iterations for the optimization step (default: 50).\n",
    "    * `tolerance` - tolerance for the convergence criterion (default: 1e-6).\n",
    "    * `niterEM` - number of iterations for the EM algorithm used to refine the initial estimates of the random effects (default: 25).\n",
    "    * `msMaxEval` - maximum number of evaluations of the objective function permitted for optimizer nlminb (default: 200).\n",
    "    * `msTol` - tolerance for the convergence criterion on the first iteration when optim is used (default is 1e-7). \n",
    "    * `.relStep` - relative step for numerical derivatives calculations. Default is `.Machine$double.eps^(1/3)` while `.Machine$double.eps = 2.220446e-16`. \n",
    "    * `opt` - the optimizer to be used, either \"nlminb\" (the default) or \"optim\".\n",
    "    * `optimMethod`  - the optimization method, a version of quasi-Newton method to be used with the optim optimizer (default:\"BFGS\", alternative: \"L-BFGS-B\")\n",
    "    * `minAbsParApVar` - numeric value - minimum absolute parameter value in the approximate variance calculation (default: 0.05). <font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.590000Z",
     "start_time": "2019-05-23T16:00:01.319Z"
    }
   },
   "outputs": [],
   "source": [
    "library(nlme)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.634000Z",
     "start_time": "2019-05-23T16:00:01.322Z"
    }
   },
   "outputs": [],
   "source": [
    "REMLE_nlme <- function(X, F, V, method) {\n",
    "    g <- rep(1,length(X))\n",
    "    d <- data.frame(g,F,V,X)\n",
    "    colnames(d) <- c(\"g\",\"f1\",\"f2\",\"f3\",\"v1\",\"v2\",\"v3\",\"X\")\n",
    "    \n",
    "    result <- NULL\n",
    "    \n",
    "    if(method == \"ML\") {\n",
    "        result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3)),data=d,method=\"ML\")\n",
    "    } else {\n",
    "        result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3)),data=d,method=\"REML\")\n",
    "    }\n",
    "    \n",
    "    return(as.vector(c(result$sigma^2, diag(getVarCov(result)))))\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.669000Z",
     "start_time": "2019-05-23T16:00:01.325Z"
    }
   },
   "outputs": [],
   "source": [
    "# auxilliary function to calculate the norm of a vector\n",
    "norm_vec <- function(x) sqrt(sum(x^2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:01.745000Z",
     "start_time": "2019-05-23T16:00:01.329Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.103243075 0.001188691 0.227594308 0.020914576\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.250793009894722"
      ],
      "text/latex": [
       "0.250793009894722"
      ],
      "text/markdown": [
       "0.250793009894722"
      ],
      "text/plain": [
       "[1] 0.250793"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "MLEnlme <- REMLE_nlme(x, F, V, \"ML\")\n",
    "print(MLEnlme)\n",
    "norm_vec(MLEnlme)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## fdslrm: Time series analysis and forecasting using LMM\n",
    "    \n",
    "* _Purpose_: R package for modeling and prediction of time series using linear mixed models.\n",
    "* _Version_: 0.1.0, 2019\n",
    "* _Depends_: kableExtra, IRdisplay, MASS, Matrix, car, nlme, stats, forecast, fpp2, matrixcalc, sommer, gnm, pracma, CVXR\n",
    "* _Maintainer_: Andrej Gajdoš\n",
    "* _Authors_: Andrej Gajdoš, Jozef Hanč, Martina Hančová\n",
    "* _URL_: https://github.com/fdslrm/R-package\n",
    "* _Installation_: Run jupyter notebook `00 installation fdslrm.ipynb` once before the first run of any R-based Jupyter notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (nlme)\n",
    "function `fitDiagFDSLRM` via parameter `\"lme\"` implements `lme` function from `nlme`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*<font size=2>Important note:</font>* \n",
    "<font size=2> A brief help on R functions of our fdslrm package designed to work with FDSLRM is in the modelling notebooks. After our testing, the most reliable way to install our fdslrm package in a Binder repository is its direct loading from GitHub. The standard installation of our fdslrm package as in the case of any R package on GitHub works without any problems in a local installation using Anaconda R distribution or CRAN distribution.</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:05.264000Z",
     "start_time": "2019-05-23T16:00:01.333Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "SHA-1 hash of file is f56b9d53e72a8575947a467930a2bdddb5b500ad\n"
     ]
    }
   ],
   "source": [
    "# loading all fdslrm functions as an R script from GiHub\n",
    "devtools::source_url(\"https://github.com/fdslrm/fdslrmAllinOne/blob/master/fdslrmAllinOne.R?raw=TRUE\")\n",
    "initialFDSLRM()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:07.391000Z",
     "start_time": "2019-05-23T16:00:01.335Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n",
    "MLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
    "                            include_random_eff = c(1,1,1,0), fit_function = \"lme\", var_estim_method = \"ML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:07.435000Z",
     "start_time": "2019-05-23T16:00:01.337Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.103243075 0.001188691 0.227594308 0.020914576\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.250793009894722"
      ],
      "text/latex": [
       "0.250793009894722"
      ],
      "text/markdown": [
       "0.250793009894722"
      ],
      "text/plain": [
       "[1] 0.250793"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))\n",
    "Norm(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MMEinR\n",
    "R version of Witkovský's MATLAB function `mixed` https://www.mathworks.com/matlabcentral/fileexchange/200\n",
    "* _Purpose_: R function to estimate parameters of a linear mixed model (LMM) with a simple variance components structure\n",
    "* _URL_: https://github.com/fdslrm/MMEinR\n",
    "* _Computational parameters_ of `MMEinR (mixed)`:\n",
    ">* <font size=2> tolerance for the convergence criterion (`epss = 1e-8`)\n",
    ">* iterative method solving Henderson's mixed model equations <font>\n",
    ">* function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:08.666000Z",
     "start_time": "2019-05-23T16:00:01.339Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "SHA-1 hash of file is 36b3b4cd1a1f98c5fb36b541fc6b3956f6663aba\n",
      "Warning message in mixed(y, X, Z, dim, s20, method):\n",
      "\"Estimated variance components are negative or zeros!\""
     ]
    }
   ],
   "source": [
    "# load function 'MMEinR'\n",
    "devtools::source_url(\"https://github.com/fdslrm/MMEinR/blob/master/MMEinR.R?raw=TRUE\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:08.742000Z",
     "start_time": "2019-05-23T16:00:01.342Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.001188715 0.227589326 0.020914669 0.103243088\n",
      "[1] 0.2507885\n"
     ]
    }
   ],
   "source": [
    "MLEmixed <- mixed(x, F, V, c(1,1,1), c(1,1,1,1), 1)\n",
    "print(MLEmixed$s2)\n",
    "print(norm_vec(MLEmixed$s2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (MMEinR)\n",
    "function `fitDiagFDSLRM` via parameter `\"mixed\"` implements R version of MATLAB `mixed` function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:08.806000Z",
     "start_time": "2019-05-23T16:00:01.344Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n",
    "MLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
    "                             include_random_eff = c(1,1,1,0), fit_function = \"mixed\", var_estim_method = \"ML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:08.848000Z",
     "start_time": "2019-05-23T16:00:01.346Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.103243088 0.001188715 0.227589326 0.020914669\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.250788501950155"
      ],
      "text/latex": [
       "0.250788501950155"
      ],
      "text/markdown": [
       "0.250788501950155"
      ],
      "text/plain": [
       "[1] 0.2507885"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))\n",
    "norm_vec(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=remle></a>\n",
    "# <font color=brown> REMLE</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## nlme"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:08.905000Z",
     "start_time": "2019-05-23T16:00:01.350Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.107667792 0.001072223 0.227470576 0.020857001\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.252529959856975"
      ],
      "text/latex": [
       "0.252529959856975"
      ],
      "text/markdown": [
       "0.252529959856975"
      ],
      "text/plain": [
       "[1] 0.25253"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "REMLEnlme <- REMLE_nlme(x, F, V, \"REML\")\n",
    "print(REMLEnlme)\n",
    "norm_vec(REMLEnlme)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (nlme)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.092000Z",
     "start_time": "2019-05-23T16:00:01.352Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n",
    "REMLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
    "                              include_random_eff = c(1,1,1,0), fit_function = \"lme\", var_estim_method = \"REML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.133000Z",
     "start_time": "2019-05-23T16:00:01.355Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.107667792 0.001072223 0.227470576 0.020857001\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.252529959856975"
      ],
      "text/latex": [
       "0.252529959856975"
      ],
      "text/markdown": [
       "0.252529959856975"
      ],
      "text/plain": [
       "[1] 0.25253"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))\n",
    "norm_vec(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MMEinR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.203000Z",
     "start_time": "2019-05-23T16:00:01.358Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.00107228 0.22747289 0.02085645 0.10766779\n",
      "[1] 0.252532\n"
     ]
    }
   ],
   "source": [
    "REMLEmixed <- mixed(x, F, V, c(1,1,1), c(1,1,1,1), 2)\n",
    "print(REMLEmixed$s2)\n",
    "print(norm_vec(REMLEmixed$s2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  fdslrm (MMEinR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.277000Z",
     "start_time": "2019-05-23T16:00:01.361Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n",
    "REMLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
    "                               include_random_eff = c(1,1,1,0), fit_function = \"mixed\", var_estim_method = \"REML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.319000Z",
     "start_time": "2019-05-23T16:00:01.365Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.10766779 0.00107228 0.22747289 0.02085645\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.252531993886652"
      ],
      "text/latex": [
       "0.252531993886652"
      ],
      "text/markdown": [
       "0.252531993886652"
      ],
      "text/plain": [
       "[1] 0.252532"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))\n",
    "norm_vec(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## sommer: Solving Mixed Model Equations in R\n",
    "* _Purpose_: R package - structural multivariate-univariate linear mixed model solver for multiple random effects and estimation of unknown variance-covariance structures (i.e. heterogeneous and unstructured variance models).   \n",
    "<font size=2> Designed for genomic prediction and genome wide association studies (GWAS), particularly focused in the $p$ > $n$ problem (more coefficients than observations) to include multiple known relationship matrices and estimate complex unknown covariance structures. Spatial models can be fitted using the two-dimensional spline functionality in sommer. </font>\n",
    "\n",
    "* _Version_: 3.9.3, 2019\n",
    "* _URL_: https://CRAN.R-project.org/package=sommer\n",
    "* _Key tool_: `mmer` function for fitting\n",
    "* _computational parameters_ of `mmer`:\n",
    ">   * <font size=2> `iters` - maximum number of iterations (default: 20).\n",
    "    * `tolpar` - tolerance for the convergence criterion for the change in loglikelihood (default: 1e-03)\n",
    "    * `tolparinv` - tolerance parameter for matrix inverse used when singularities are encountered (default:1e-06). \n",
    "    * `method` - this refers to the method or algorithm to be used for estimating variance components (default: NR, direct-inversion Newton-Raphson method, alternative: AI,  Average Information) \n",
    "    * function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools<font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.360000Z",
     "start_time": "2019-05-23T16:00:01.368Z"
    }
   },
   "outputs": [],
   "source": [
    "library(sommer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.397000Z",
     "start_time": "2019-05-23T16:00:01.370Z"
    }
   },
   "outputs": [],
   "source": [
    "REMLE_sommer <- function(X, F, V) {\n",
    "    f1 <- F[,1]\n",
    "    f2 <- F[,2]\n",
    "    f3 <- F[,3]\n",
    "    v1 <- V[,1]\n",
    "    v2 <- V[,2]\n",
    "    v3 <- V[,3]\n",
    "\n",
    "    \n",
    "    DT <- data.frame(X, f1, f2, f3, v1, v2, v3)\n",
    "\n",
    "    suppressWarnings(result_new <- mmer(fixed = X~f2+f3, random = ~ vs(ds(v1),1)+vs(ds(v2),1)+vs(ds(v3),1), data = DT, \n",
    "                                        verbose = FALSE))\n",
    "    \n",
    "    output <- as.vector(unlist(result_new$sigma))\n",
    "    \n",
    "    return(output)\n",
    "\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.538000Z",
     "start_time": "2019-05-23T16:00:01.373Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.001072577 0.227469441 0.020856440 0.107667700\n",
      "[1] 0.2525289\n"
     ]
    }
   ],
   "source": [
    "REMLEsommer <- REMLE_sommer(x, F, V)\n",
    "print(REMLEsommer)\n",
    "print(norm_vec(REMLEsommer))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (sommer)\n",
    "function `fitDiagFDSLRM` via parameter `\"mmer\"` implements `mmer` function from `sommer`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.659000Z",
     "start_time": "2019-05-23T16:00:01.375Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[34mfixed-effect model matrix is rank deficient so dropping 1 columns / coefficients\n",
      "\u001b[39m"
     ]
    }
   ],
   "source": [
    "# fit particular FDSLRM employing REML estimation of variances (by 'mmer' function)\n",
    "REMLEfitsommer <- suppressWarnings(fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), \n",
    "                                                 freq_random = c(19/76, 38/76), include_random_eff = c(1,1,1,0), \n",
    "                                                 fit_function = \"mmer\", var_estim_method = \"REML\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:09.716000Z",
     "start_time": "2019-05-23T16:00:01.378Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.107667700 0.001072577 0.227469441 0.020856440\n",
      "[1] 0.2525289\n"
     ]
    }
   ],
   "source": [
    "print(as.vector(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))\n",
    "print(norm_vec(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=references></a>\n",
    "# <font color=brown> References </font>\n",
    "This notebook belongs to suplementary materials of the paper submitted to Statistical Papers and available at <https://arxiv.org/abs/1905.07771>.\n",
    "\n",
    "* Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). [Estimating variance components in time series\n",
    "\tlinear regression models using empirical BLUPs and convex optimization](https://arxiv.org/abs/1905.07771), https://arxiv.org/, 2019.\n",
    "\n",
    "### Abstract of the paper\n",
    "\n",
    "We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. \n",
    "\n",
    "The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems $-$ theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a \n",
    "practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $ n $ observed time series values, we also discovered a new algorithm of order $\\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. \n",
    "\n",
    "We illustrate our results on three real data sets $-$ electricity consumption, tourism and cyber security $-$ which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Hyndman R.J., Athanasopoulos G. (2018). [Forecasting: Principles and Practice](https://otexts.org/fpp2/) (2nd Edition), OTexts, Monash University, Australia. Data in R package fpp2 version 2.3. https://CRAN.R-project.org/package=fpp2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.5.1"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}