{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Authors:** Andrej Gajdoš, Jozef Hanč, Martina Hančová
*[Faculty of Science](https://www.upjs.sk/en/faculty-of-science/?prefferedLang=EN), P. J. Šafárik University in Košice, Slovakia*
email: [andrej.gajdos@student.upjs.sk](mailto:andrej.gajdos@student.upjs.sk)\n",
"***\n",
"** MLE, REMLE for tourism** "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" R-based computational tools - ** fdslrm, nlme, MMEinR(mixed), sommer** \n",
"\n",
"### Table of Contents \n",
"* [Data and model](#data_and_model) - data and model description of empirical data\n",
"* [MLE](#mle) - computation by standard packages *nlme*, *MMEinR(mixed)* and *fdslrm*\n",
"* [REMLE](#remle) - computation by standard package *nlme*, *MMEinR(mixed)*, *sommer* and *fdslrm*\n",
"* [References](#references)\n",
"\n",
"**To get back to the table of contents, use the Home key.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# Data and model "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this econometric FDSLRM application, we consider the time series data set, called \n",
"`visnights`, representing *total quarterly visitor nights* (in millions) from \n",
"1998-2016 in one of the regions of Australia -- inner zone of Victoria state. The number of time \n",
"series observations is $n=76$. The data was adapted from _Hyndman, 2018_.\n",
"\n",
"The Gaussian orthogonal FDSLRM fitting the tourism data has the following form:\n",
"\n",
"$ \n",
"\\begin{array}{rl}\n",
"& X(t) & \\! = \\! & \\beta_1+\\beta_2\\cos\\left(\\tfrac{2\\pi t}{76}\\right)+\\beta_3\\sin\\left(\\tfrac{2\\pi t\\cdot 2}{76}\\right) + \\\\\n",
"& & & +Y_1\\cos\\left(\\tfrac{2\\pi t\\cdot 19 }{76}\\right)+Y_2\\sin\\left(\\tfrac{2\\pi t\\cdot 19}{76}\\right) +Y_3\\cos\\left(\\tfrac{2\\pi t\\cdot 38}{76}\\right) +w(t), \\, t\\in \\mathbb{N},\n",
"\\end{array}\n",
"$ \n",
"\n",
"where $\\boldsymbol{\\beta}=(\\beta_1,\\,\\beta_2,\\,\\beta_3)' \\in \\mathbb{R}^3, \\mathbf{Y} = (Y_1, Y_2, Y_3)' \\sim \\mathcal{N}_3(\\boldsymbol{0}, \\mathrm{D}), w(t) \\sim \\mathcal{iid}\\, \\mathcal{N} (0, \\sigma_0^2), \\boldsymbol{\\nu}=(\\sigma_0^2, \\sigma_1^2, \\sigma_2^2, \\sigma_3^2) \\in \\mathbb{R}_{+}^4$.\n",
"\n",
"We identified the given and most parsimonious structure of the FDSLRM using an iterative process of the model building and selection based on exploratory tools of *spectral analysis* and *residual diagnostics* (for details see our Jupyter notebook `tourism.ipynb`)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.373000Z",
"start_time": "2019-05-23T16:00:01.311Z"
}
},
"outputs": [],
"source": [
"# data - time series observation\n",
"x <- read.csv2('tourism.csv', header = FALSE)\n",
"x <- x[,2]\n",
"x <- as.numeric(as.vector(x[-1]))\n",
"t <- 1:length(x)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.443000Z",
"start_time": "2019-05-23T16:00:01.314Z"
}
},
"outputs": [],
"source": [
"# auxilliary functions to create design matrices F, V and the projection matrix M_F\n",
"makeF <- function(times, freqs) {\n",
"\n",
" n <- length(times)\n",
" f <- length(freqs)\n",
" c1 <- matrix(1, n)\n",
" F <- matrix(c1, n, 1)\n",
"\n",
" for (i in 1:f) {\n",
" F <- cbind(F, cos(2 * pi * freqs[i] * times))\n",
" F <- cbind(F, sin(2 * pi * freqs[i] * times))\n",
" }\n",
"\n",
" return(F)\n",
"}\n",
"\n",
"makeV <- function(times, freqs) {\n",
"\n",
" V <- makeF(times, freqs)\n",
" V <- V[, -1]\n",
"\n",
" return(V)\n",
"}\n",
"\n",
"makeM_F <- function(F) {\n",
"\n",
" n <- nrow(F)\n",
" c1 <- rep(1, n)\n",
" I <- diag(c1)\n",
" M_F <- I - F %*% solve((t(F) %*% F)) %*% t(F)\n",
"\n",
" return(M_F)\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.496000Z",
"start_time": "2019-05-23T16:00:01.316Z"
}
},
"outputs": [],
"source": [
"# model parameters\n",
"n <- 76 \n",
"k <- 3 \n",
"l <- 3\n",
"\n",
"# model - design matrices F, V\n",
"F <- makeF(t, c(1/76, 2/76))\n",
"F <- F[,-c(3,4)]\n",
"\n",
"V <- makeV(t, c(19/76, 38/76))\n",
"V <- V[,-4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# MLE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## nlme: Linear and Nonlinear Mixed Effects Models\n",
"* _Purpose_: R package to fit and compare Gaussian linear and nonlinear mixed-effects models \n",
"* _Version_: 3.1-140, 2019\n",
"* _URL_: https://CRAN.R-project.org/package=nlme]\n",
"* _Key tool_: `lme` function for fitting LMM\n",
"* _Computational parameters_ of `lme`:\n",
"> * `maxIter` - maximum number of iterations (default: 50).\n",
" * `msMaxIter` - maximum number of iterations for the optimization step (default: 50).\n",
" * `tolerance` - tolerance for the convergence criterion (default: 1e-6).\n",
" * `niterEM` - number of iterations for the EM algorithm used to refine the initial estimates of the random effects (default: 25).\n",
" * `msMaxEval` - maximum number of evaluations of the objective function permitted for optimizer nlminb (default: 200).\n",
" * `msTol` - tolerance for the convergence criterion on the first iteration when optim is used (default is 1e-7). \n",
" * `.relStep` - relative step for numerical derivatives calculations. Default is `.Machine$double.eps^(1/3)` while `.Machine$double.eps = 2.220446e-16`. \n",
" * `opt` - the optimizer to be used, either \"nlminb\" (the default) or \"optim\".\n",
" * `optimMethod` - the optimization method, a version of quasi-Newton method to be used with the optim optimizer (default:\"BFGS\", alternative: \"L-BFGS-B\")\n",
" * `minAbsParApVar` - numeric value - minimum absolute parameter value in the approximate variance calculation (default: 0.05). "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.590000Z",
"start_time": "2019-05-23T16:00:01.319Z"
}
},
"outputs": [],
"source": [
"library(nlme)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.634000Z",
"start_time": "2019-05-23T16:00:01.322Z"
}
},
"outputs": [],
"source": [
"REMLE_nlme <- function(X, F, V, method) {\n",
" g <- rep(1,length(X))\n",
" d <- data.frame(g,F,V,X)\n",
" colnames(d) <- c(\"g\",\"f1\",\"f2\",\"f3\",\"v1\",\"v2\",\"v3\",\"X\")\n",
" \n",
" result <- NULL\n",
" \n",
" if(method == \"ML\") {\n",
" result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3)),data=d,method=\"ML\")\n",
" } else {\n",
" result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3)),data=d,method=\"REML\")\n",
" }\n",
" \n",
" return(as.vector(c(result$sigma^2, diag(getVarCov(result)))))\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.669000Z",
"start_time": "2019-05-23T16:00:01.325Z"
}
},
"outputs": [],
"source": [
"# auxilliary function to calculate the norm of a vector\n",
"norm_vec <- function(x) sqrt(sum(x^2))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:01.745000Z",
"start_time": "2019-05-23T16:00:01.329Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.103243075 0.001188691 0.227594308 0.020914576\n"
]
},
{
"data": {
"text/html": [
"0.250793009894722"
],
"text/latex": [
"0.250793009894722"
],
"text/markdown": [
"0.250793009894722"
],
"text/plain": [
"[1] 0.250793"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"MLEnlme <- REMLE_nlme(x, F, V, \"ML\")\n",
"print(MLEnlme)\n",
"norm_vec(MLEnlme)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## fdslrm: Time series analysis and forecasting using LMM\n",
" \n",
"* _Purpose_: R package for modeling and prediction of time series using linear mixed models.\n",
"* _Version_: 0.1.0, 2019\n",
"* _Depends_: kableExtra, IRdisplay, MASS, Matrix, car, nlme, stats, forecast, fpp2, matrixcalc, sommer, gnm, pracma, CVXR\n",
"* _Maintainer_: Andrej Gajdoš\n",
"* _Authors_: Andrej Gajdoš, Jozef Hanč, Martina Hančová\n",
"* _URL_: https://github.com/fdslrm/R-package\n",
"* _Installation_: Run jupyter notebook `00 installation fdslrm.ipynb` once before the first run of any R-based Jupyter notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (nlme)\n",
"function `fitDiagFDSLRM` via parameter `\"lme\"` implements `lme` function from `nlme`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Important note:* \n",
" A brief help on R functions of our fdslrm package designed to work with FDSLRM is in the modelling notebooks. After our testing, the most reliable way to install our fdslrm package in a Binder repository is its direct loading from GitHub. The standard installation of our fdslrm package as in the case of any R package on GitHub works without any problems in a local installation using Anaconda R distribution or CRAN distribution."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:05.264000Z",
"start_time": "2019-05-23T16:00:01.333Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"SHA-1 hash of file is f56b9d53e72a8575947a467930a2bdddb5b500ad\n"
]
}
],
"source": [
"# loading all fdslrm functions as an R script from GiHub\n",
"devtools::source_url(\"https://github.com/fdslrm/fdslrmAllinOne/blob/master/fdslrmAllinOne.R?raw=TRUE\")\n",
"initialFDSLRM()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:07.391000Z",
"start_time": "2019-05-23T16:00:01.335Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n",
"MLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
" include_random_eff = c(1,1,1,0), fit_function = \"lme\", var_estim_method = \"ML\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:07.435000Z",
"start_time": "2019-05-23T16:00:01.337Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.103243075 0.001188691 0.227594308 0.020914576\n"
]
},
{
"data": {
"text/html": [
"0.250793009894722"
],
"text/latex": [
"0.250793009894722"
],
"text/markdown": [
"0.250793009894722"
],
"text/plain": [
"[1] 0.250793"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))\n",
"Norm(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MMEinR\n",
"R version of Witkovský's MATLAB function `mixed` https://www.mathworks.com/matlabcentral/fileexchange/200\n",
"* _Purpose_: R function to estimate parameters of a linear mixed model (LMM) with a simple variance components structure\n",
"* _URL_: https://github.com/fdslrm/MMEinR\n",
"* _Computational parameters_ of `MMEinR (mixed)`:\n",
">* tolerance for the convergence criterion (`epss = 1e-8`)\n",
">* iterative method solving Henderson's mixed model equations \n",
">* function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:08.666000Z",
"start_time": "2019-05-23T16:00:01.339Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"SHA-1 hash of file is 36b3b4cd1a1f98c5fb36b541fc6b3956f6663aba\n",
"Warning message in mixed(y, X, Z, dim, s20, method):\n",
"\"Estimated variance components are negative or zeros!\""
]
}
],
"source": [
"# load function 'MMEinR'\n",
"devtools::source_url(\"https://github.com/fdslrm/MMEinR/blob/master/MMEinR.R?raw=TRUE\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:08.742000Z",
"start_time": "2019-05-23T16:00:01.342Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.001188715 0.227589326 0.020914669 0.103243088\n",
"[1] 0.2507885\n"
]
}
],
"source": [
"MLEmixed <- mixed(x, F, V, c(1,1,1), c(1,1,1,1), 1)\n",
"print(MLEmixed$s2)\n",
"print(norm_vec(MLEmixed$s2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (MMEinR)\n",
"function `fitDiagFDSLRM` via parameter `\"mixed\"` implements R version of MATLAB `mixed` function"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:08.806000Z",
"start_time": "2019-05-23T16:00:01.344Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n",
"MLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
" include_random_eff = c(1,1,1,0), fit_function = \"mixed\", var_estim_method = \"ML\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:08.848000Z",
"start_time": "2019-05-23T16:00:01.346Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.103243088 0.001188715 0.227589326 0.020914669\n"
]
},
{
"data": {
"text/html": [
"0.250788501950155"
],
"text/latex": [
"0.250788501950155"
],
"text/markdown": [
"0.250788501950155"
],
"text/plain": [
"[1] 0.2507885"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))\n",
"norm_vec(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# REMLE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## nlme"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:08.905000Z",
"start_time": "2019-05-23T16:00:01.350Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.107667792 0.001072223 0.227470576 0.020857001\n"
]
},
{
"data": {
"text/html": [
"0.252529959856975"
],
"text/latex": [
"0.252529959856975"
],
"text/markdown": [
"0.252529959856975"
],
"text/plain": [
"[1] 0.25253"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"REMLEnlme <- REMLE_nlme(x, F, V, \"REML\")\n",
"print(REMLEnlme)\n",
"norm_vec(REMLEnlme)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (nlme)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.092000Z",
"start_time": "2019-05-23T16:00:01.352Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n",
"REMLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
" include_random_eff = c(1,1,1,0), fit_function = \"lme\", var_estim_method = \"REML\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.133000Z",
"start_time": "2019-05-23T16:00:01.355Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.107667792 0.001072223 0.227470576 0.020857001\n"
]
},
{
"data": {
"text/html": [
"0.252529959856975"
],
"text/latex": [
"0.252529959856975"
],
"text/markdown": [
"0.252529959856975"
],
"text/plain": [
"[1] 0.25253"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))\n",
"norm_vec(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MMEinR"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.203000Z",
"start_time": "2019-05-23T16:00:01.358Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.00107228 0.22747289 0.02085645 0.10766779\n",
"[1] 0.252532\n"
]
}
],
"source": [
"REMLEmixed <- mixed(x, F, V, c(1,1,1), c(1,1,1,1), 2)\n",
"print(REMLEmixed$s2)\n",
"print(norm_vec(REMLEmixed$s2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (MMEinR)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.277000Z",
"start_time": "2019-05-23T16:00:01.361Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n",
"REMLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n",
" include_random_eff = c(1,1,1,0), fit_function = \"mixed\", var_estim_method = \"REML\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.319000Z",
"start_time": "2019-05-23T16:00:01.365Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.10766779 0.00107228 0.22747289 0.02085645\n"
]
},
{
"data": {
"text/html": [
"0.252531993886652"
],
"text/latex": [
"0.252531993886652"
],
"text/markdown": [
"0.252531993886652"
],
"text/plain": [
"[1] 0.252532"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))\n",
"norm_vec(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## sommer: Solving Mixed Model Equations in R\n",
"* _Purpose_: R package - structural multivariate-univariate linear mixed model solver for multiple random effects and estimation of unknown variance-covariance structures (i.e. heterogeneous and unstructured variance models). \n",
" Designed for genomic prediction and genome wide association studies (GWAS), particularly focused in the $p$ > $n$ problem (more coefficients than observations) to include multiple known relationship matrices and estimate complex unknown covariance structures. Spatial models can be fitted using the two-dimensional spline functionality in sommer. \n",
"\n",
"* _Version_: 3.9.3, 2019\n",
"* _URL_: https://CRAN.R-project.org/package=sommer\n",
"* _Key tool_: `mmer` function for fitting\n",
"* _computational parameters_ of `mmer`:\n",
"> * `iters` - maximum number of iterations (default: 20).\n",
" * `tolpar` - tolerance for the convergence criterion for the change in loglikelihood (default: 1e-03)\n",
" * `tolparinv` - tolerance parameter for matrix inverse used when singularities are encountered (default:1e-06). \n",
" * `method` - this refers to the method or algorithm to be used for estimating variance components (default: NR, direct-inversion Newton-Raphson method, alternative: AI, Average Information) \n",
" * function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.360000Z",
"start_time": "2019-05-23T16:00:01.368Z"
}
},
"outputs": [],
"source": [
"library(sommer)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.397000Z",
"start_time": "2019-05-23T16:00:01.370Z"
}
},
"outputs": [],
"source": [
"REMLE_sommer <- function(X, F, V) {\n",
" f1 <- F[,1]\n",
" f2 <- F[,2]\n",
" f3 <- F[,3]\n",
" v1 <- V[,1]\n",
" v2 <- V[,2]\n",
" v3 <- V[,3]\n",
"\n",
" \n",
" DT <- data.frame(X, f1, f2, f3, v1, v2, v3)\n",
"\n",
" suppressWarnings(result_new <- mmer(fixed = X~f2+f3, random = ~ vs(ds(v1),1)+vs(ds(v2),1)+vs(ds(v3),1), data = DT, \n",
" verbose = FALSE))\n",
" \n",
" output <- as.vector(unlist(result_new$sigma))\n",
" \n",
" return(output)\n",
"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.538000Z",
"start_time": "2019-05-23T16:00:01.373Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.001072577 0.227469441 0.020856440 0.107667700\n",
"[1] 0.2525289\n"
]
}
],
"source": [
"REMLEsommer <- REMLE_sommer(x, F, V)\n",
"print(REMLEsommer)\n",
"print(norm_vec(REMLEsommer))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (sommer)\n",
"function `fitDiagFDSLRM` via parameter `\"mmer\"` implements `mmer` function from `sommer`"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.659000Z",
"start_time": "2019-05-23T16:00:01.375Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34mfixed-effect model matrix is rank deficient so dropping 1 columns / coefficients\n",
"\u001b[39m"
]
}
],
"source": [
"# fit particular FDSLRM employing REML estimation of variances (by 'mmer' function)\n",
"REMLEfitsommer <- suppressWarnings(fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), \n",
" freq_random = c(19/76, 38/76), include_random_eff = c(1,1,1,0), \n",
" fit_function = \"mmer\", var_estim_method = \"REML\"))"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:09.716000Z",
"start_time": "2019-05-23T16:00:01.378Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.107667700 0.001072577 0.227469441 0.020856440\n",
"[1] 0.2525289\n"
]
}
],
"source": [
"print(as.vector(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))\n",
"print(norm_vec(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# References \n",
"This notebook belongs to suplementary materials of the paper submitted to Statistical Papers and available at .\n",
"\n",
"* Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). [Estimating variance components in time series\n",
"\tlinear regression models using empirical BLUPs and convex optimization](https://arxiv.org/abs/1905.07771), https://arxiv.org/, 2019.\n",
"\n",
"### Abstract of the paper\n",
"\n",
"We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. \n",
"\n",
"The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems $-$ theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a \n",
"practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $ n $ observed time series values, we also discovered a new algorithm of order $\\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. \n",
"\n",
"We illustrate our results on three real data sets $-$ electricity consumption, tourism and cyber security $-$ which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Hyndman R.J., Athanasopoulos G. (2018). [Forecasting: Principles and Practice](https://otexts.org/fpp2/) (2nd Edition), OTexts, Monash University, Australia. Data in R package fpp2 version 2.3. https://CRAN.R-project.org/package=fpp2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}