{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Authors:** Andrej Gajdoš, Jozef Hanč, Martina Hančová
*[Faculty of Science](https://www.upjs.sk/en/faculty-of-science/?prefferedLang=EN), P. J. Šafárik University in Košice, Slovakia*
email: [andrej.gajdos@student.upjs.sk](mailto:andrej.gajdos@student.upjs.sk)\n", "***\n", "** MLE, REMLE for tourism** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " R-based computational tools - ** fdslrm, nlme, MMEinR(mixed), sommer** \n", "\n", "### Table of Contents \n", "* [Data and model](#data_and_model) - data and model description of empirical data\n", "* [MLE](#mle) - computation by standard packages *nlme*, *MMEinR(mixed)* and *fdslrm*\n", "* [REMLE](#remle) - computation by standard package *nlme*, *MMEinR(mixed)*, *sommer* and *fdslrm*\n", "* [References](#references)\n", "\n", "**To get back to the table of contents, use the Home key.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# Data and model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this econometric FDSLRM application, we consider the time series data set, called \n", "`visnights`, representing *total quarterly visitor nights* (in millions) from \n", "1998-2016 in one of the regions of Australia -- inner zone of Victoria state. The number of time \n", "series observations is $n=76$. The data was adapted from _Hyndman, 2018_.\n", "\n", "The Gaussian orthogonal FDSLRM fitting the tourism data has the following form:\n", "\n", "$ \n", "\\begin{array}{rl}\n", "& X(t) & \\! = \\! & \\beta_1+\\beta_2\\cos\\left(\\tfrac{2\\pi t}{76}\\right)+\\beta_3\\sin\\left(\\tfrac{2\\pi t\\cdot 2}{76}\\right) + \\\\\n", "& & & +Y_1\\cos\\left(\\tfrac{2\\pi t\\cdot 19 }{76}\\right)+Y_2\\sin\\left(\\tfrac{2\\pi t\\cdot 19}{76}\\right) +Y_3\\cos\\left(\\tfrac{2\\pi t\\cdot 38}{76}\\right) +w(t), \\, t\\in \\mathbb{N},\n", "\\end{array}\n", "$ \n", "\n", "where $\\boldsymbol{\\beta}=(\\beta_1,\\,\\beta_2,\\,\\beta_3)' \\in \\mathbb{R}^3, \\mathbf{Y} = (Y_1, Y_2, Y_3)' \\sim \\mathcal{N}_3(\\boldsymbol{0}, \\mathrm{D}), w(t) \\sim \\mathcal{iid}\\, \\mathcal{N} (0, \\sigma_0^2), \\boldsymbol{\\nu}=(\\sigma_0^2, \\sigma_1^2, \\sigma_2^2, \\sigma_3^2) \\in \\mathbb{R}_{+}^4$.\n", "\n", "We identified the given and most parsimonious structure of the FDSLRM using an iterative process of the model building and selection based on exploratory tools of *spectral analysis* and *residual diagnostics* (for details see our Jupyter notebook `tourism.ipynb`)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.373000Z", "start_time": "2019-05-23T16:00:01.311Z" } }, "outputs": [], "source": [ "# data - time series observation\n", "x <- read.csv2('tourism.csv', header = FALSE)\n", "x <- x[,2]\n", "x <- as.numeric(as.vector(x[-1]))\n", "t <- 1:length(x)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.443000Z", "start_time": "2019-05-23T16:00:01.314Z" } }, "outputs": [], "source": [ "# auxilliary functions to create design matrices F, V and the projection matrix M_F\n", "makeF <- function(times, freqs) {\n", "\n", " n <- length(times)\n", " f <- length(freqs)\n", " c1 <- matrix(1, n)\n", " F <- matrix(c1, n, 1)\n", "\n", " for (i in 1:f) {\n", " F <- cbind(F, cos(2 * pi * freqs[i] * times))\n", " F <- cbind(F, sin(2 * pi * freqs[i] * times))\n", " }\n", "\n", " return(F)\n", "}\n", "\n", "makeV <- function(times, freqs) {\n", "\n", " V <- makeF(times, freqs)\n", " V <- V[, -1]\n", "\n", " return(V)\n", "}\n", "\n", "makeM_F <- function(F) {\n", "\n", " n <- nrow(F)\n", " c1 <- rep(1, n)\n", " I <- diag(c1)\n", " M_F <- I - F %*% solve((t(F) %*% F)) %*% t(F)\n", "\n", " return(M_F)\n", "}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.496000Z", "start_time": "2019-05-23T16:00:01.316Z" } }, "outputs": [], "source": [ "# model parameters\n", "n <- 76 \n", "k <- 3 \n", "l <- 3\n", "\n", "# model - design matrices F, V\n", "F <- makeF(t, c(1/76, 2/76))\n", "F <- F[,-c(3,4)]\n", "\n", "V <- makeV(t, c(19/76, 38/76))\n", "V <- V[,-4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# MLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## nlme: Linear and Nonlinear Mixed Effects Models\n", "* _Purpose_: R package to fit and compare Gaussian linear and nonlinear mixed-effects models \n", "* _Version_: 3.1-140, 2019\n", "* _URL_: https://CRAN.R-project.org/package=nlme]\n", "* _Key tool_: `lme` function for fitting LMM\n", "* _Computational parameters_ of `lme`:\n", "> * `maxIter` - maximum number of iterations (default: 50).\n", " * `msMaxIter` - maximum number of iterations for the optimization step (default: 50).\n", " * `tolerance` - tolerance for the convergence criterion (default: 1e-6).\n", " * `niterEM` - number of iterations for the EM algorithm used to refine the initial estimates of the random effects (default: 25).\n", " * `msMaxEval` - maximum number of evaluations of the objective function permitted for optimizer nlminb (default: 200).\n", " * `msTol` - tolerance for the convergence criterion on the first iteration when optim is used (default is 1e-7). \n", " * `.relStep` - relative step for numerical derivatives calculations. Default is `.Machine$double.eps^(1/3)` while `.Machine$double.eps = 2.220446e-16`. \n", " * `opt` - the optimizer to be used, either \"nlminb\" (the default) or \"optim\".\n", " * `optimMethod` - the optimization method, a version of quasi-Newton method to be used with the optim optimizer (default:\"BFGS\", alternative: \"L-BFGS-B\")\n", " * `minAbsParApVar` - numeric value - minimum absolute parameter value in the approximate variance calculation (default: 0.05). " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.590000Z", "start_time": "2019-05-23T16:00:01.319Z" } }, "outputs": [], "source": [ "library(nlme)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.634000Z", "start_time": "2019-05-23T16:00:01.322Z" } }, "outputs": [], "source": [ "REMLE_nlme <- function(X, F, V, method) {\n", " g <- rep(1,length(X))\n", " d <- data.frame(g,F,V,X)\n", " colnames(d) <- c(\"g\",\"f1\",\"f2\",\"f3\",\"v1\",\"v2\",\"v3\",\"X\")\n", " \n", " result <- NULL\n", " \n", " if(method == \"ML\") {\n", " result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3)),data=d,method=\"ML\")\n", " } else {\n", " result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3)),data=d,method=\"REML\")\n", " }\n", " \n", " return(as.vector(c(result$sigma^2, diag(getVarCov(result)))))\n", "}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.669000Z", "start_time": "2019-05-23T16:00:01.325Z" } }, "outputs": [], "source": [ "# auxilliary function to calculate the norm of a vector\n", "norm_vec <- function(x) sqrt(sum(x^2))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:01.745000Z", "start_time": "2019-05-23T16:00:01.329Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.103243075 0.001188691 0.227594308 0.020914576\n" ] }, { "data": { "text/html": [ "0.250793009894722" ], "text/latex": [ "0.250793009894722" ], "text/markdown": [ "0.250793009894722" ], "text/plain": [ "[1] 0.250793" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "MLEnlme <- REMLE_nlme(x, F, V, \"ML\")\n", "print(MLEnlme)\n", "norm_vec(MLEnlme)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## fdslrm: Time series analysis and forecasting using LMM\n", " \n", "* _Purpose_: R package for modeling and prediction of time series using linear mixed models.\n", "* _Version_: 0.1.0, 2019\n", "* _Depends_: kableExtra, IRdisplay, MASS, Matrix, car, nlme, stats, forecast, fpp2, matrixcalc, sommer, gnm, pracma, CVXR\n", "* _Maintainer_: Andrej Gajdoš\n", "* _Authors_: Andrej Gajdoš, Jozef Hanč, Martina Hančová\n", "* _URL_: https://github.com/fdslrm/R-package\n", "* _Installation_: Run jupyter notebook `00 installation fdslrm.ipynb` once before the first run of any R-based Jupyter notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (nlme)\n", "function `fitDiagFDSLRM` via parameter `\"lme\"` implements `lme` function from `nlme`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Important note:* \n", " A brief help on R functions of our fdslrm package designed to work with FDSLRM is in the modelling notebooks. After our testing, the most reliable way to install our fdslrm package in a Binder repository is its direct loading from GitHub. The standard installation of our fdslrm package as in the case of any R package on GitHub works without any problems in a local installation using Anaconda R distribution or CRAN distribution." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:05.264000Z", "start_time": "2019-05-23T16:00:01.333Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "SHA-1 hash of file is f56b9d53e72a8575947a467930a2bdddb5b500ad\n" ] } ], "source": [ "# loading all fdslrm functions as an R script from GiHub\n", "devtools::source_url(\"https://github.com/fdslrm/fdslrmAllinOne/blob/master/fdslrmAllinOne.R?raw=TRUE\")\n", "initialFDSLRM()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:07.391000Z", "start_time": "2019-05-23T16:00:01.335Z" } }, "outputs": [], "source": [ "# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n", "MLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n", " include_random_eff = c(1,1,1,0), fit_function = \"lme\", var_estim_method = \"ML\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:07.435000Z", "start_time": "2019-05-23T16:00:01.337Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.103243075 0.001188691 0.227594308 0.020914576\n" ] }, { "data": { "text/html": [ "0.250793009894722" ], "text/latex": [ "0.250793009894722" ], "text/markdown": [ "0.250793009894722" ], "text/plain": [ "[1] 0.250793" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))\n", "Norm(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MMEinR\n", "R version of Witkovský's MATLAB function `mixed` https://www.mathworks.com/matlabcentral/fileexchange/200\n", "* _Purpose_: R function to estimate parameters of a linear mixed model (LMM) with a simple variance components structure\n", "* _URL_: https://github.com/fdslrm/MMEinR\n", "* _Computational parameters_ of `MMEinR (mixed)`:\n", ">* tolerance for the convergence criterion (`epss = 1e-8`)\n", ">* iterative method solving Henderson's mixed model equations \n", ">* function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:08.666000Z", "start_time": "2019-05-23T16:00:01.339Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "SHA-1 hash of file is 36b3b4cd1a1f98c5fb36b541fc6b3956f6663aba\n", "Warning message in mixed(y, X, Z, dim, s20, method):\n", "\"Estimated variance components are negative or zeros!\"" ] } ], "source": [ "# load function 'MMEinR'\n", "devtools::source_url(\"https://github.com/fdslrm/MMEinR/blob/master/MMEinR.R?raw=TRUE\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:08.742000Z", "start_time": "2019-05-23T16:00:01.342Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.001188715 0.227589326 0.020914669 0.103243088\n", "[1] 0.2507885\n" ] } ], "source": [ "MLEmixed <- mixed(x, F, V, c(1,1,1), c(1,1,1,1), 1)\n", "print(MLEmixed$s2)\n", "print(norm_vec(MLEmixed$s2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (MMEinR)\n", "function `fitDiagFDSLRM` via parameter `\"mixed\"` implements R version of MATLAB `mixed` function" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:08.806000Z", "start_time": "2019-05-23T16:00:01.344Z" } }, "outputs": [], "source": [ "# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n", "MLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n", " include_random_eff = c(1,1,1,0), fit_function = \"mixed\", var_estim_method = \"ML\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:08.848000Z", "start_time": "2019-05-23T16:00:01.346Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.103243088 0.001188715 0.227589326 0.020914669\n" ] }, { "data": { "text/html": [ "0.250788501950155" ], "text/latex": [ "0.250788501950155" ], "text/markdown": [ "0.250788501950155" ], "text/plain": [ "[1] 0.2507885" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))\n", "norm_vec(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# REMLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## nlme" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:08.905000Z", "start_time": "2019-05-23T16:00:01.350Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.107667792 0.001072223 0.227470576 0.020857001\n" ] }, { "data": { "text/html": [ "0.252529959856975" ], "text/latex": [ "0.252529959856975" ], "text/markdown": [ "0.252529959856975" ], "text/plain": [ "[1] 0.25253" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "REMLEnlme <- REMLE_nlme(x, F, V, \"REML\")\n", "print(REMLEnlme)\n", "norm_vec(REMLEnlme)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (nlme)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.092000Z", "start_time": "2019-05-23T16:00:01.352Z" } }, "outputs": [], "source": [ "# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n", "REMLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n", " include_random_eff = c(1,1,1,0), fit_function = \"lme\", var_estim_method = \"REML\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.133000Z", "start_time": "2019-05-23T16:00:01.355Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.107667792 0.001072223 0.227470576 0.020857001\n" ] }, { "data": { "text/html": [ "0.252529959856975" ], "text/latex": [ "0.252529959856975" ], "text/markdown": [ "0.252529959856975" ], "text/plain": [ "[1] 0.25253" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))\n", "norm_vec(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MMEinR" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.203000Z", "start_time": "2019-05-23T16:00:01.358Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.00107228 0.22747289 0.02085645 0.10766779\n", "[1] 0.252532\n" ] } ], "source": [ "REMLEmixed <- mixed(x, F, V, c(1,1,1), c(1,1,1,1), 2)\n", "print(REMLEmixed$s2)\n", "print(norm_vec(REMLEmixed$s2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (MMEinR)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.277000Z", "start_time": "2019-05-23T16:00:01.361Z" } }, "outputs": [], "source": [ "# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n", "REMLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), freq_random = c(19/76, 38/76), \n", " include_random_eff = c(1,1,1,0), fit_function = \"mixed\", var_estim_method = \"REML\")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.319000Z", "start_time": "2019-05-23T16:00:01.365Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.10766779 0.00107228 0.22747289 0.02085645\n" ] }, { "data": { "text/html": [ "0.252531993886652" ], "text/latex": [ "0.252531993886652" ], "text/markdown": [ "0.252531993886652" ], "text/plain": [ "[1] 0.252532" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))\n", "norm_vec(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## sommer: Solving Mixed Model Equations in R\n", "* _Purpose_: R package - structural multivariate-univariate linear mixed model solver for multiple random effects and estimation of unknown variance-covariance structures (i.e. heterogeneous and unstructured variance models). \n", " Designed for genomic prediction and genome wide association studies (GWAS), particularly focused in the $p$ > $n$ problem (more coefficients than observations) to include multiple known relationship matrices and estimate complex unknown covariance structures. Spatial models can be fitted using the two-dimensional spline functionality in sommer. \n", "\n", "* _Version_: 3.9.3, 2019\n", "* _URL_: https://CRAN.R-project.org/package=sommer\n", "* _Key tool_: `mmer` function for fitting\n", "* _computational parameters_ of `mmer`:\n", "> * `iters` - maximum number of iterations (default: 20).\n", " * `tolpar` - tolerance for the convergence criterion for the change in loglikelihood (default: 1e-03)\n", " * `tolparinv` - tolerance parameter for matrix inverse used when singularities are encountered (default:1e-06). \n", " * `method` - this refers to the method or algorithm to be used for estimating variance components (default: NR, direct-inversion Newton-Raphson method, alternative: AI, Average Information) \n", " * function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.360000Z", "start_time": "2019-05-23T16:00:01.368Z" } }, "outputs": [], "source": [ "library(sommer)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.397000Z", "start_time": "2019-05-23T16:00:01.370Z" } }, "outputs": [], "source": [ "REMLE_sommer <- function(X, F, V) {\n", " f1 <- F[,1]\n", " f2 <- F[,2]\n", " f3 <- F[,3]\n", " v1 <- V[,1]\n", " v2 <- V[,2]\n", " v3 <- V[,3]\n", "\n", " \n", " DT <- data.frame(X, f1, f2, f3, v1, v2, v3)\n", "\n", " suppressWarnings(result_new <- mmer(fixed = X~f2+f3, random = ~ vs(ds(v1),1)+vs(ds(v2),1)+vs(ds(v3),1), data = DT, \n", " verbose = FALSE))\n", " \n", " output <- as.vector(unlist(result_new$sigma))\n", " \n", " return(output)\n", "\n", "}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.538000Z", "start_time": "2019-05-23T16:00:01.373Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.001072577 0.227469441 0.020856440 0.107667700\n", "[1] 0.2525289\n" ] } ], "source": [ "REMLEsommer <- REMLE_sommer(x, F, V)\n", "print(REMLEsommer)\n", "print(norm_vec(REMLEsommer))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (sommer)\n", "function `fitDiagFDSLRM` via parameter `\"mmer\"` implements `mmer` function from `sommer`" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.659000Z", "start_time": "2019-05-23T16:00:01.375Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mfixed-effect model matrix is rank deficient so dropping 1 columns / coefficients\n", "\u001b[39m" ] } ], "source": [ "# fit particular FDSLRM employing REML estimation of variances (by 'mmer' function)\n", "REMLEfitsommer <- suppressWarnings(fitDiagFDSLRM(x, t, freq_mean = c(1/76, 2/76), include_fixed_eff = c(1,0,0,1), \n", " freq_random = c(19/76, 38/76), include_random_eff = c(1,1,1,0), \n", " fit_function = \"mmer\", var_estim_method = \"REML\"))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T16:00:09.716000Z", "start_time": "2019-05-23T16:00:01.378Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.107667700 0.001072577 0.227469441 0.020856440\n", "[1] 0.2525289\n" ] } ], "source": [ "print(as.vector(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))\n", "print(norm_vec(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# References \n", "This notebook belongs to suplementary materials of the paper submitted to Statistical Papers and available at .\n", "\n", "* Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). [Estimating variance components in time series\n", "\tlinear regression models using empirical BLUPs and convex optimization](https://arxiv.org/abs/1905.07771), https://arxiv.org/, 2019.\n", "\n", "### Abstract of the paper\n", "\n", "We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. \n", "\n", "The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems $-$ theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a \n", "practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $ n $ observed time series values, we also discovered a new algorithm of order $\\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. \n", "\n", "We illustrate our results on three real data sets $-$ electricity consumption, tourism and cyber security $-$ which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Hyndman R.J., Athanasopoulos G. (2018). [Forecasting: Principles and Practice](https://otexts.org/fpp2/) (2nd Edition), OTexts, Monash University, Australia. Data in R package fpp2 version 2.3. https://CRAN.R-project.org/package=fpp2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.1" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }