{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Authors:** Andrej Gajdoš, Jozef Hanč, Martina Hančová
*[Faculty of Science](https://www.upjs.sk/en/faculty-of-science/?prefferedLang=EN), P. J. Šafárik University in Košice, Slovakia*
email: [andrej.gajdos@student.upjs.sk](mailto:andrej.gajdos@student.upjs.sk)\n", "***\n", "** MLE, REMLE for electricity consumption 1** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " R-based computational tools - ** fdslrm, nlme, MMEinR(mixed), sommer** \n", "\n", "### Table of Contents \n", "* [Data and model](#data_and_model) - data and model description of empirical data\n", "* [MLE](#mle) - computation by standard packages *nlme*, *MMEinR(mixed)* and *fdslrm*\n", "* [REMLE](#remle) - computation by standard package *nlme*, *MMEinR(mixed)*, *sommer* and *fdslrm*\n", "* [References](#references)\n", "\n", "**To get back to the table of contents, use the Home key.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# Data and Toy model 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this FDSLRM application, we model the econometric time series data set, representing the hours observations of the consumption of the electric energy in some department store. The number of time series observations is $n=24$. The data was adapted from _Štulajter & Witkovský, 2004_ and the FDSLRM model from _Gajdoš et al 2017_.\n", "\n", "The consumption data can be fitted by the following Gaussian orthogonal FDSLRM:\n", "\n", "$ \n", "\\begin{array}{rl}\n", "& X(t) & \\! = \\! & \\beta_1+\\beta_2\\cos\\left(\\tfrac{2\\pi t}{24}\\right)+\\beta_3\\sin\\left(\\tfrac{2\\pi t}{24}\\right) +\\\\\n", "& & & +Y_1\\cos\\left(\\tfrac{2\\pi t\\cdot 3}{24}\\right)+Y_2\\sin\\left(\\tfrac{2\\pi t\\cdot 3}{24}\\right)+\\\\\n", "& & & +Y_3\\cos\\left(\\tfrac{2\\pi t\\cdot 4}{24}\\right)+Y_4\\sin\\left(\\tfrac{2\\pi t\\cdot 4}{24}\\right)\n", "+w(t), \\, t\\in \\mathbb{N},\n", "\\end{array}\n", "$ \n", "\n", "where $\\boldsymbol{\\beta}=(\\beta_1,\\,\\beta_2,\\,\\beta_3)' \\in \\mathbb{R}^3\\,,\\mathbf{Y} = (Y_1, Y_2, Y_3, Y_4)' \\sim \\mathcal{N}_4(\\boldsymbol{0}, \\mathrm{D})\\,, w(t) \\sim \\mathcal{iid}\\, \\mathcal{N} (0, \\sigma_0^2)\\,, \\boldsymbol{\\nu}= (\\sigma_0^2, \\sigma_1^2, \\sigma_2^2, \\sigma_3^2, \\sigma_4^2) \\in \\mathbb{R}_{+}^5.$ " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:02.780000Z", "start_time": "2019-05-23T15:58:02.737Z" } }, "outputs": [], "source": [ "# data - time series observation\n", "x <- c(40.3,40.7,38.5,37.9,38.6,41.1,45.2,45.7,46.7,46.5,\n", " 45.2,45.1,45.8,46.3,47.5,48.5,49.1,51.7,50.6,48.0,\n", " 44.7,41.2,40.0,40.3)\n", "t <- 1:length(x)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:02.858000Z", "start_time": "2019-05-23T15:58:02.740Z" } }, "outputs": [], "source": [ "# auxilliary functions to create design matrices F, V and the projection matrix M_F\n", "makeF <- function(times, freqs) {\n", "\n", " n <- length(times)\n", " f <- length(freqs)\n", " c1 <- matrix(1, n)\n", " F <- matrix(c1, n, 1)\n", "\n", " for (i in 1:f) {\n", " F <- cbind(F, cos(2 * pi * freqs[i] * times))\n", " F <- cbind(F, sin(2 * pi * freqs[i] * times))\n", " }\n", "\n", " return(F)\n", "}\n", "\n", "makeV <- function(times, freqs) {\n", "\n", " V <- makeF(times, freqs)\n", " V <- V[, -1]\n", "\n", " return(V)\n", "}\n", "\n", "makeM_F <- function(F) {\n", "\n", " n <- nrow(F)\n", " c1 <- rep(1, n)\n", " I <- diag(c1)\n", " M_F <- I - F %*% solve((t(F) %*% F)) %*% t(F)\n", "\n", " return(M_F)\n", "}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:02.913000Z", "start_time": "2019-05-23T15:58:02.743Z" } }, "outputs": [], "source": [ "# model parameters\n", "n <- 24 \n", "k <- 3 \n", "l <- 4\n", "\n", "# model - design matrices F, V\n", "F <- makeF(t, c(1/24))\n", "V <- makeV(t, c(3/24, 4/24))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# MLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## nlme: Linear and Nonlinear Mixed Effects Models\n", "* _Purpose_: R package to fit and compare Gaussian linear and nonlinear mixed-effects models \n", "* _Version_: 3.1-140, 2019\n", "* _URL_: https://CRAN.R-project.org/package=nlme]\n", "* _Key tool_: `lme` function for fitting LMM\n", "* _Computational parameters_ of `lme`:\n", "> * `maxIter` - maximum number of iterations (default: 50).\n", " * `msMaxIter` - maximum number of iterations for the optimization step (default: 50).\n", " * `tolerance` - tolerance for the convergence criterion (default: 1e-6).\n", " * `niterEM` - number of iterations for the EM algorithm used to refine the initial estimates of the random effects (default: 25).\n", " * `msMaxEval` - maximum number of evaluations of the objective function permitted for optimizer nlminb (default: 200).\n", " * `msTol` - tolerance for the convergence criterion on the first iteration when optim is used (default is 1e-7). \n", " * `.relStep` - relative step for numerical derivatives calculations. Default is `.Machine$double.eps^(1/3)` while `.Machine$double.eps = 2.220446e-16`. \n", " * `opt` - the optimizer to be used, either \"nlminb\" (the default) or \"optim\".\n", " * `optimMethod` - the optimization method, a version of quasi-Newton method to be used with the optim optimizer (default:\"BFGS\", alternative: \"L-BFGS-B\")\n", " * `minAbsParApVar` - numeric value - minimum absolute parameter value in the approximate variance calculation (default: 0.05). " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:03.009000Z", "start_time": "2019-05-23T15:58:02.747Z" } }, "outputs": [], "source": [ "library(nlme)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:03.040000Z", "start_time": "2019-05-23T15:58:02.749Z" } }, "outputs": [], "source": [ "REMLE_nlme <- function(X, F, V, method) {\n", " g <- rep(1,length(X))\n", " d <- data.frame(g,F,V,X)\n", " colnames(d) <- c(\"g\",\"f1\",\"f2\",\"f3\",\"v1\",\"v2\",\"v3\",\"v4\",\"X\")\n", " \n", " result <- NULL\n", " \n", " if(method == \"ML\") {\n", " result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3+v4)),data=d,method=\"ML\")\n", " } else {\n", " result <- lme(fixed=X~f2+f3,random=list(g=pdDiag(~-1+v1+v2+v3+v4)),data=d,method=\"REML\")\n", " }\n", " \n", " return(as.vector(c(result$sigma^2, diag(getVarCov(result)))))\n", "}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:03.081000Z", "start_time": "2019-05-23T15:58:02.752Z" } }, "outputs": [], "source": [ "# auxilliary function to calculate the norm of a vector\n", "norm_vec <- function(x) sqrt(sum(x^2))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:03.183000Z", "start_time": "2019-05-23T15:58:02.755Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 2.862039e+00 1.334114e-01 1.624978e+00 5.313318e-09 1.028995e+00\n" ] }, { "data": { "text/html": [ "3.4508628045055" ], "text/latex": [ "3.4508628045055" ], "text/markdown": [ "3.4508628045055" ], "text/plain": [ "[1] 3.450863" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "MLEnlme <- REMLE_nlme(x, F, V, \"ML\")\n", "print(MLEnlme)\n", "norm_vec(MLEnlme)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## fdslrm: Time series analysis and forecasting using LMM\n", " \n", "* _Purpose_: R package for modeling and prediction of time series using linear mixed models.\n", "* _Version_: 0.1.0, 2019\n", "* _Depends_: kableExtra, IRdisplay, MASS, Matrix, car, nlme, stats, forecast, fpp2, matrixcalc, sommer, gnm, pracma, CVXR\n", "* _Maintainer_: Andrej Gajdoš\n", "* _Authors_: Andrej Gajdoš, Jozef Hanč, Martina Hančová\n", "* _URL_: https://github.com/fdslrm/R-package\n", "* _Installation_: Run jupyter notebook `00 installation fdslrm.ipynb` once before the first run of any R-based Jupyter notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (nlme)\n", "function `fitDiagFDSLRM` via parameter `\"lme\"` implements `lme` function from `nlme`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Important note:* \n", " A brief help on R functions of our fdslrm package designed to work with FDSLRM is in the modelling notebooks. After our testing, the most reliable way to install our fdslrm package in a Binder repository is its direct loading from GitHub. The standard installation of our fdslrm package as in the case of any R package on GitHub works without any problems in a local installation using Anaconda R distribution or CRAN distribution." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:07.193000Z", "start_time": "2019-05-23T15:58:02.759Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "SHA-1 hash of file is f56b9d53e72a8575947a467930a2bdddb5b500ad\n" ] } ], "source": [ "# loading all fdslrm functions as an R script from GiHub\n", "devtools::source_url(\"https://github.com/fdslrm/fdslrmAllinOne/blob/master/fdslrmAllinOne.R?raw=TRUE\")\n", "initialFDSLRM()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:09.355000Z", "start_time": "2019-05-23T15:58:02.761Z" } }, "outputs": [], "source": [ "# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n", "MLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/24), freq_random = c(3/24, 4/24), \n", " fit_function = \"lme\", var_estim_method = \"ML\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:09.401000Z", "start_time": "2019-05-23T15:58:02.765Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 2.862039e+00 1.334114e-01 1.624978e+00 5.313318e-09 1.028995e+00\n" ] }, { "data": { "text/html": [ "3.4508628045055" ], "text/latex": [ "3.4508628045055" ], "text/markdown": [ "3.4508628045055" ], "text/plain": [ "[1] 3.450863" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))\n", "norm_vec(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MMEinR\n", "R version of Witkovský's MATLAB function `mixed` https://www.mathworks.com/matlabcentral/fileexchange/200\n", "* _Purpose_: R function to estimate parameters of a linear mixed model (LMM) with a simple variance components structure\n", "* _URL_: https://github.com/fdslrm/MMEinR\n", "* _Computational parameters_ of `MMEinR (mixed)`:\n", ">* tolerance for the convergence criterion (`epss = 1e-8`)\n", ">* iterative method solving Henderson's mixed model equations \n", ">* function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:10.788000Z", "start_time": "2019-05-23T15:58:02.768Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "SHA-1 hash of file is 36b3b4cd1a1f98c5fb36b541fc6b3956f6663aba\n" ] } ], "source": [ "# load function 'MMEinR'\n", "devtools::source_url(\"https://github.com/fdslrm/MMEinR/blob/master/MMEinR.R?raw=TRUE\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:10.884000Z", "start_time": "2019-05-23T15:58:02.771Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in mixed(x, F, V, c(1, 1, 1, 1), c(1, 1, 1, 1, 1), 1):\n", "\"Estimated variance components are negative or zeros!\"" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.1334323 1.6249768 0.0000000 1.0289973 2.8620320\n", "[1] 3.450857\n" ] } ], "source": [ "MLEmixed <- mixed(x, F, V, c(1,1,1,1), c(1,1,1,1,1), 1)\n", "print(MLEmixed$s2)\n", "print(norm_vec(MLEmixed$s2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (MMEinR)\n", "function `fitDiagFDSLRM` via parameter `\"mixed\"` implements R version of MATLAB `mixed` function" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:10.972000Z", "start_time": "2019-05-23T15:58:02.774Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in mixed(x, F, V, dimensions, variances_start, var_est_met):\n", "\"Estimated variance components are negative or zeros!\"" ] } ], "source": [ "# fit particular FDSLRM employing ML estimation of variances (by 'MMEinR (mixed)' function)\n", "MLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/24), freq_random = c(3/24, 4/24), \n", " fit_function = \"mixed\", var_estim_method = \"ML\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.014000Z", "start_time": "2019-05-23T15:58:02.777Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 2.8620320 0.1334323 1.6249768 0.0000000 1.0289973\n" ] }, { "data": { "text/html": [ "3.45085736426117" ], "text/latex": [ "3.45085736426117" ], "text/markdown": [ "3.45085736426117" ], "text/plain": [ "[1] 3.450857" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))\n", "norm_vec(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# REMLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## nlme" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.095000Z", "start_time": "2019-05-23T15:58:02.781Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 3.339037e+00 9.368184e-02 1.585226e+00 2.134273e-09 9.892469e-01\n" ] }, { "data": { "text/html": [ "3.8274663089535" ], "text/latex": [ "3.8274663089535" ], "text/markdown": [ "3.8274663089535" ], "text/plain": [ "[1] 3.827466" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "REMLEnlme <- REMLE_nlme(x, F, V, \"REML\")\n", "print(REMLEnlme)\n", "norm_vec(REMLEnlme)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (nlme)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.206000Z", "start_time": "2019-05-23T15:58:02.784Z" } }, "outputs": [], "source": [ "# fit particular FDSLRM employing REML estimation of variances (by 'lme' function)\n", "REMLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(1/24), freq_random = c(3/24, 4/24), \n", " fit_function = \"lme\", var_estim_method = \"REML\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.253000Z", "start_time": "2019-05-23T15:58:02.787Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 3.339037e+00 9.368184e-02 1.585226e+00 2.134273e-09 9.892469e-01\n" ] }, { "data": { "text/html": [ "3.8274663089535" ], "text/latex": [ "3.8274663089535" ], "text/markdown": [ "3.8274663089535" ], "text/plain": [ "[1] 3.827466" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))\n", "norm_vec(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MMEinR" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.394000Z", "start_time": "2019-05-23T15:58:02.791Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in mixed(x, F, V, c(1, 1, 1, 1), c(1, 1, 1, 1, 1), 2):\n", "\"Estimated variance components are negative or zeros!\"" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1] 9.368188e-02 1.585226e+00 -1.126023e-26 9.892469e-01 3.339037e+00\n", "[1] 3.827466\n" ] } ], "source": [ "REMLEmixed <- mixed(x, F, V, c(1,1,1,1), c(1,1,1,1,1), 2)\n", "print(REMLEmixed$s2)\n", "print(norm_vec(REMLEmixed$s2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (MMEinR)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.488000Z", "start_time": "2019-05-23T15:58:02.794Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message in mixed(x, F, V, dimensions, variances_start, var_est_met):\n", "\"Estimated variance components are negative or zeros!\"" ] } ], "source": [ "# fit particular FDSLRM employing REML estimation of variances (by 'MMEinR (mixed)' function)\n", "REMLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(1/24), freq_random = c(3/24, 4/24), \n", " fit_function = \"mixed\", var_estim_method = \"REML\")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.534000Z", "start_time": "2019-05-23T15:58:02.797Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 3.339037e+00 9.368188e-02 1.585226e+00 -1.126023e-26 9.892469e-01\n" ] }, { "data": { "text/html": [ "3.82746635990518" ], "text/latex": [ "3.82746635990518" ], "text/markdown": [ "3.82746635990518" ], "text/plain": [ "[1] 3.827466" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))\n", "norm_vec(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## sommer: Solving Mixed Model Equations in R\n", "* _Purpose_: R package - structural multivariate-univariate linear mixed model solver for multiple random effects and estimation of unknown variance-covariance structures (i.e. heterogeneous and unstructured variance models). \n", " Designed for genomic prediction and genome wide association studies (GWAS), particularly focused in the $p$ > $n$ problem (more coefficients than observations) to include multiple known relationship matrices and estimate complex unknown covariance structures. Spatial models can be fitted using the two-dimensional spline functionality in sommer. \n", "\n", "* _Version_: 3.9.3, 2019\n", "* _URL_: https://CRAN.R-project.org/package=sommer\n", "* _Key tool_: `mmer` function for fitting\n", "* _computational parameters_ of `mmer`:\n", "> * `iters` - maximum number of iterations (default: 20).\n", " * `tolpar` - tolerance for the convergence criterion for the change in loglikelihood (default: 1e-03)\n", " * `tolparinv` - tolerance parameter for matrix inverse used when singularities are encountered (default:1e-06). \n", " * `method` - this refers to the method or algorithm to be used for estimating variance components (default: NR, direct-inversion Newton-Raphson method, alternative: AI, Average Information) \n", " * function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.571000Z", "start_time": "2019-05-23T15:58:02.800Z" } }, "outputs": [], "source": [ "library(sommer)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.605000Z", "start_time": "2019-05-23T15:58:02.802Z" } }, "outputs": [], "source": [ "REMLE_sommer <- function(X, F, V) {\n", " f1 <- F[,1]\n", " f2 <- F[,2]\n", " f3 <- F[,3]\n", " v1 <- V[,1]\n", " v2 <- V[,2]\n", " v3 <- V[,3]\n", " v4 <- V[,4]\n", " \n", " DT <- data.frame(X, f1, f2, f3, v1, v2, v3, v4)\n", "\n", " suppressWarnings(result_new <- mmer(fixed = X~f2+f3, random = ~ vs(ds(v1),1)+vs(ds(v2),1)+vs(ds(v3),1)+vs(ds(v4),1), \n", " data = DT, verbose = FALSE))\n", " \n", " output <- as.vector(unlist(result_new$sigma))\n", " \n", " return(output)\n", "\n", "}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.690000Z", "start_time": "2019-05-23T15:58:02.805Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 0.09375687 1.58504405 0.00000000 0.98916742 3.33903029\n", "[1] 3.827366\n" ] } ], "source": [ "REMLEsommer <- REMLE_sommer(x, F, V)\n", "print(REMLEsommer)\n", "print(norm_vec(REMLEsommer))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### fdslrm (sommer)\n", "function `fitDiagFDSLRM` via parameter `\"mmer\"` implements `mmer` function from `sommer`" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.763000Z", "start_time": "2019-05-23T15:58:02.808Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mfixed-effect model matrix is rank deficient so dropping 1 columns / coefficients\n", "\u001b[39m" ] } ], "source": [ "# fit particular FDSLRM employing REML estimation of variances (by 'mmer' function)\n", "REMLEfitsommer <- suppressWarnings(fitDiagFDSLRM(x, t, freq_mean = c(1/24), freq_random = c(3/24, 4/24), \n", " fit_function = \"mmer\", var_estim_method = \"REML\"))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2019-05-23T15:58:11.810000Z", "start_time": "2019-05-23T15:58:02.811Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] 3.33903029 0.09375687 1.58504405 0.00000000 0.98916742\n", "[1] 3.827366\n" ] } ], "source": [ "print(as.vector(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))\n", "print(norm_vec(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "# References \n", "This notebook belongs to suplementary materials of the paper submitted to Statistical Papers and available at .\n", "\n", "* Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). [Estimating variance components in time series\n", "\tlinear regression models using empirical BLUPs and convex optimization](https://arxiv.org/abs/1905.07771), https://arxiv.org/, 2019.\n", "\n", "### Abstract of the paper\n", "\n", "We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, one of the steps in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. \n", "\n", "The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems $-$ theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a \n", "practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $ n $ observed time series values, we also discovered a new algorithm of order $\\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. \n", "\n", "We illustrate our results on three real data sets $-$ electricity consumption, tourism and cyber security $-$ which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks.\n", "* Gajdoš, A., Hančová, M., Hanč, J. (2017). [Kriging Methodology and Its Development in Forecasting Econometric Time Series](https://www.czso.cz/csu/czso/statistika-statistics-and-economy-journal-no-12017). _Statistica: Statistics and Economy Journal_, 2017, 1, Vol. 97, pp. 59–73. \n", "$~$\n", "* Štulajter, F., Witkovský, V. (2004). [Estimation of Variances in Orthogonal Finite\n", "Discrete Spectrum Linear Regression Models](https://link.springer.com/article/10.1007/s001840300299). _Metrika_, 2004, vol. 60, num. 2, s. 105–118. ISSN 0026-1335." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.1" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }