{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Authors:** Andrej Gajdoš, Jozef Hanč, Martina Hančová <br> *[Faculty of Science](https://www.upjs.sk/en/faculty-of-science/?prefferedLang=EN), P. J. Šafárik University in Košice, Slovakia* <br> email: [andrej.gajdos@student.upjs.sk](mailto:andrej.gajdos@student.upjs.sk)\n",
    "***\n",
    "**<font size=6 color=brown> MLE, REMLE for cyber attacks</font>**  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font size=4> R-based computational tools - </font> **<font size=4> fdslrm, nlme, MMEinR(mixed), sommer</font>**  \n",
    "<a id=table_of_contents></a>\n",
    "###  Table of Contents \n",
    "* [Data and model](#data_and_model) - data and model description of empirical data\n",
    "* [MLE](#mle) - computation by standard packages *nlme*, *MMEinR(mixed)* and *fdslrm*\n",
    "* [REMLE](#remle) - computation by standard package *nlme*, *MMEinR(mixed)*, *sommer* and *fdslrm*\n",
    "* [References](#references)\n",
    "\n",
    "**To get back to the table of contents, use <font color=brown>the Home key</font>.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=data_and_model></a>\n",
    "# <font color=brown>Data and Model</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This FDSLRM application describes the real time series data set representing total weekly number of cyber attacks against a honeynet -- an unconventional tool which mimics real systems connected to Internet like business or school computers intranets to study methods, tools and goals of cyber attackers. \n",
    "\n",
    "Data, taken from from _Sokol, 2017_ were collected from November 2014 to May 2016 in CZ.NIC honeynet consisting of Kippo honeypots in medium-interaction mode. The number of time series observations is \n",
    "$n=72$. \n",
    "\n",
    "The suitable FDSLRM, after a preliminary logarithmic transformation of \n",
    "data $Z(t) = \\log X(t)$, is Gaussian orthogonal:\n",
    "\n",
    "$ \n",
    "\\begin{array}{rl}\n",
    "& Z(t) & \\!  = \\! &\\beta_1+\\beta_2\\cos\\left(\\tfrac{2\\pi t\\cdot 3}{72}\\right)+\\beta_3\\sin\\left(\\tfrac{2\\pi t\\cdot 3}{72}\\right)+\\beta_4\\sin\\left(\\tfrac{2\\pi t\\cdot 4}{72}\\right) + \\\\\n",
    "&     &     & +Y_1\\sin\\left(\\tfrac{2\\pi\\ t\\cdot 6}{72}\\right)+Y_2\\sin\\left(\\tfrac{2\\pi\\cdot t\\cdot 7}{72}\\right)+w(t), \\, t\\in \\mathbb{N}, \n",
    "\\end{array}\n",
    "$ \n",
    "\n",
    "where $\\boldsymbol{\\beta}=(\\beta_1,\\,\\beta_2,\\,\\beta_3,\\,\\beta_4)' \\in \\mathbb{R}^4, \\mathbf{Y} = (Y_1,Y_2)' \\sim \\mathcal{N}_2(\\boldsymbol{0}, \\mathrm{D}), w(t) \\sim \\mathcal{iid}\\, \\mathcal{N} (0, \\sigma_0^2), \\boldsymbol{\\nu}= (\\sigma_0^2, \\sigma_1^2, \\sigma_2^2) \\in \\mathbb{R}_{+}^3$.\n",
    "\n",
    "We identified the given and most parsimonious structure of the FDSLRM using an iterative process of the model building and selection based on exploratory tools of *spectral analysis* and *residual diagnostics* (for details see our Jupyter notebook `cyberattacks.ipynb`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.052000Z",
     "start_time": "2019-05-23T16:00:42.992Z"
    }
   },
   "outputs": [],
   "source": [
    "# data - time series observation\n",
    "x <- read.csv2('cyberattacks.csv', header = FALSE, sep = \",\")\n",
    "x <- as.numeric(as.vector(x[-1,]))\n",
    "x <- log(x)\n",
    "t <- 1:length(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.127000Z",
     "start_time": "2019-05-23T16:00:42.995Z"
    }
   },
   "outputs": [],
   "source": [
    "# auxilliary functions to create design matrices F, V and the projection matrix M_F\n",
    "makeF <- function(times, freqs) {\n",
    "\n",
    "        n <- length(times)\n",
    "        f <- length(freqs)\n",
    "        c1 <- matrix(1, n)\n",
    "        F <- matrix(c1, n, 1)\n",
    "\n",
    "        for (i in 1:f) {\n",
    "                F <- cbind(F, cos(2 * pi * freqs[i] * times))\n",
    "                F <- cbind(F, sin(2 * pi * freqs[i] * times))\n",
    "        }\n",
    "\n",
    "        return(F)\n",
    "}\n",
    "\n",
    "makeV <- function(times, freqs) {\n",
    "\n",
    "        V <- makeF(times, freqs)\n",
    "        V <- V[, -1]\n",
    "\n",
    "        return(V)\n",
    "}\n",
    "\n",
    "makeM_F <- function(F) {\n",
    "\n",
    "        n <- nrow(F)\n",
    "        c1 <- rep(1, n)\n",
    "        I <- diag(c1)\n",
    "        M_F <- I - F %*% solve((t(F) %*% F)) %*% t(F)\n",
    "\n",
    "        return(M_F)\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.187000Z",
     "start_time": "2019-05-23T16:00:42.997Z"
    }
   },
   "outputs": [],
   "source": [
    "# model parameters\n",
    "n <- 72 \n",
    "k <- 4 \n",
    "l <- 2 \n",
    "\n",
    "# model - design matrices F, V\n",
    "F <- makeF(t, c(3/72, 4/72))\n",
    "F <- F[,-4]\n",
    "\n",
    "V <- makeV(t, c(6/72, 7/72))\n",
    "V <- V[,-c(1,3)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=mle></a>\n",
    "# <font color=brown> MLE</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## nlme: Linear and Nonlinear Mixed Effects Models\n",
    "* _Purpose_: R package to fit and compare Gaussian linear and nonlinear mixed-effects models \n",
    "* _Version_: 3.1-140, 2019\n",
    "* _URL_: https://CRAN.R-project.org/package=nlme]\n",
    "* _Key tool_: `lme` function for fitting LMM\n",
    "* _Computational parameters_ of `lme`:\n",
    ">   * <font size=2> `maxIter` - maximum number of iterations  (default: 50).\n",
    "    * `msMaxIter` - maximum number of iterations for the optimization step (default: 50).\n",
    "    * `tolerance` - tolerance for the convergence criterion (default: 1e-6).\n",
    "    * `niterEM` - number of iterations for the EM algorithm used to refine the initial estimates of the random effects (default: 25).\n",
    "    * `msMaxEval` - maximum number of evaluations of the objective function permitted for optimizer nlminb (default: 200).\n",
    "    * `msTol` - tolerance for the convergence criterion on the first iteration when optim is used (default is 1e-7). \n",
    "    * `.relStep` - relative step for numerical derivatives calculations. Default is `.Machine$double.eps^(1/3)` while `.Machine$double.eps = 2.220446e-16`. \n",
    "    * `opt` - the optimizer to be used, either \"nlminb\" (the default) or \"optim\".\n",
    "    * `optimMethod`  - the optimization method, a version of quasi-Newton method to be used with the optim optimizer (default:\"BFGS\", alternative: \"L-BFGS-B\")\n",
    "    * `minAbsParApVar` - numeric value - minimum absolute parameter value in the approximate variance calculation (default: 0.05). <font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.286000Z",
     "start_time": "2019-05-23T16:00:43.000Z"
    }
   },
   "outputs": [],
   "source": [
    "library(nlme)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.320000Z",
     "start_time": "2019-05-23T16:00:43.002Z"
    }
   },
   "outputs": [],
   "source": [
    "REMLE_nlme <- function(X, F, V, method) {\n",
    "    g <- rep(1,length(X))\n",
    "    d <- data.frame(g,F,V,X)\n",
    "    colnames(d) <- c(\"g\",\"f1\",\"f2\",\"f3\",\"f4\",\"v1\",\"v2\",\"X\")\n",
    "    \n",
    "    result <- NULL\n",
    "    \n",
    "    if(method == \"ML\") {\n",
    "        result <- lme(fixed=X~f2+f3+f4,random=list(g=pdDiag(~-1+v1+v2)),data=d,method=\"ML\")\n",
    "    } else {\n",
    "        result <- lme(fixed=X~f2+f3+f4,random=list(g=pdDiag(~-1+v1+v2)),data=d,method=\"REML\")\n",
    "    }\n",
    "    \n",
    "    return(as.vector(c(result$sigma^2, diag(getVarCov(result)))))\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.357000Z",
     "start_time": "2019-05-23T16:00:43.006Z"
    }
   },
   "outputs": [],
   "source": [
    "# auxilliary function to calculate the norm of a vector\n",
    "norm_vec <- function(x) sqrt(sum(x^2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:43.437000Z",
     "start_time": "2019-05-23T16:00:43.011Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05595104 0.02392048 0.01394113\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.0624264655696255"
      ],
      "text/latex": [
       "0.0624264655696255"
      ],
      "text/markdown": [
       "0.0624264655696255"
      ],
      "text/plain": [
       "[1] 0.06242647"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "MLEnlme <- REMLE_nlme(x, F, V, \"ML\")\n",
    "print(MLEnlme)\n",
    "norm_vec(MLEnlme)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## fdslrm: Time series analysis and forecasting using LMM\n",
    "    \n",
    "* _Purpose_: R package for modeling and prediction of time series using linear mixed models.\n",
    "* _Version_: 0.1.0, 2019\n",
    "* _Depends_: kableExtra, IRdisplay, MASS, Matrix, car, nlme, stats, forecast, fpp2, matrixcalc, sommer, gnm, pracma, CVXR\n",
    "* _Maintainer_: Andrej Gajdoš\n",
    "* _Authors_: Andrej Gajdoš, Jozef Hanč, Martina Hančová\n",
    "* _URL_: https://github.com/fdslrm/R-package\n",
    "* _Installation_: Run jupyter notebook `00 installation fdslrm.ipynb` once before the first run of any R-based Jupyter notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (nlme)\n",
    "function `fitDiagFDSLRM` via parameter `\"lme\"` implements `lme` function from `nlme`\n",
    "detailed explanat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*<font size=2>Important note:</font>* \n",
    "<font size=2> A brief help on R functions of our fdslrm package designed to work with FDSLRM is in the modelling notebooks. After our testing, the most reliable way to install our fdslrm package in a Binder repository is its direct loading from GitHub. The standard installation of our fdslrm package as in the case of any R package on GitHub works without any problems in a local installation using Anaconda R distribution or CRAN distribution.</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:47.230000Z",
     "start_time": "2019-05-23T16:00:43.014Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "SHA-1 hash of file is f56b9d53e72a8575947a467930a2bdddb5b500ad\n"
     ]
    }
   ],
   "source": [
    "# loading all fdslrm functions as an R script from GiHub\n",
    "devtools::source_url(\"https://github.com/fdslrm/fdslrmAllinOne/blob/master/fdslrmAllinOne.R?raw=TRUE\")\n",
    "initialFDSLRM()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:49.341000Z",
     "start_time": "2019-05-23T16:00:43.017Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n",
    "MLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
    "                            include_random_eff = c(0,1,0,1), fit_function = \"lme\", var_estim_method = \"ML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:49.389000Z",
     "start_time": "2019-05-23T16:00:43.019Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05595104 0.02392048 0.01394113\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.0624264655696255"
      ],
      "text/latex": [
       "0.0624264655696255"
      ],
      "text/markdown": [
       "0.0624264655696255"
      ],
      "text/plain": [
       "[1] 0.06242647"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))\n",
    "norm_vec(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MMEinR\n",
    "R version of Witkovský's MATLAB function `mixed` https://www.mathworks.com/matlabcentral/fileexchange/200\n",
    "* _Purpose_: R function to estimate parameters of a linear mixed model (LMM) with a simple variance components structure\n",
    "* _URL_: https://github.com/fdslrm/MMEinR\n",
    "* _Computational parameters_ of `MMEinR (mixed)`:\n",
    ">* <font size=2> tolerance for the convergence criterion (`epss = 1e-8`)\n",
    ">* iterative method solving Henderson's mixed model equations <font>\n",
    ">* function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:50.641000Z",
     "start_time": "2019-05-23T16:00:43.022Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "SHA-1 hash of file is 36b3b4cd1a1f98c5fb36b541fc6b3956f6663aba\n"
     ]
    }
   ],
   "source": [
    "# load function 'MMEinR'\n",
    "devtools::source_url(\"https://github.com/fdslrm/MMEinR/blob/master/MMEinR.R?raw=TRUE\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:50.693000Z",
     "start_time": "2019-05-23T16:00:43.024Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.02392048 0.01394113 0.05595104\n",
      "[1] 0.06242647\n"
     ]
    }
   ],
   "source": [
    "MLEmixed <- mixed(x, F, V, c(1,1), c(1,1,1), 1)\n",
    "print(MLEmixed$s2)\n",
    "print(norm_vec(MLEmixed$s2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (MMEinR)\n",
    "function `fitDiagFDSLRM` via parameter `\"mixed\"` implements R version of MATLAB `mixed` function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:50.750000Z",
     "start_time": "2019-05-23T16:00:43.026Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n",
    "MLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
    "                             include_random_eff = c(0,1,0,1), fit_function = \"mixed\", var_estim_method = \"ML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:50.796000Z",
     "start_time": "2019-05-23T16:00:43.029Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05595104 0.02392048 0.01394113\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.0624264654844781"
      ],
      "text/latex": [
       "0.0624264654844781"
      ],
      "text/markdown": [
       "0.0624264654844781"
      ],
      "text/plain": [
       "[1] 0.06242647"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))\n",
    "norm_vec(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=remle></a>\n",
    "# <font color=brown> REMLE</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## nlme"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:50.851000Z",
     "start_time": "2019-05-23T16:00:43.032Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05934201 0.02382629 0.01384694\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.0654286193615291"
      ],
      "text/latex": [
       "0.0654286193615291"
      ],
      "text/markdown": [
       "0.0654286193615291"
      ],
      "text/plain": [
       "[1] 0.06542862"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "REMLEnlme <- REMLE_nlme(x, F, V, \"REML\")\n",
    "print(REMLEnlme)\n",
    "norm_vec(REMLEnlme)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (nlme)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:50.983000Z",
     "start_time": "2019-05-23T16:00:43.034Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing REML estimation of variances (by 'lme' function)\n",
    "REMLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
    "                              include_random_eff = c(0,1,0,1), fit_function = \"lme\", var_estim_method = \"REML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.028000Z",
     "start_time": "2019-05-23T16:00:43.037Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05934201 0.02382629 0.01384694\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.0654286193615291"
      ],
      "text/latex": [
       "0.0654286193615291"
      ],
      "text/markdown": [
       "0.0654286193615291"
      ],
      "text/plain": [
       "[1] 0.06542862"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))\n",
    "norm_vec(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MMEinR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.096000Z",
     "start_time": "2019-05-23T16:00:43.040Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.02382629 0.01384694 0.05934201\n",
      "[1] 0.06542862\n"
     ]
    }
   ],
   "source": [
    "MLEmixed <- mixed(x, F, V, c(1,1), c(1,1,1), 2)\n",
    "print(MLEmixed$s2)\n",
    "print(norm_vec(MLEmixed$s2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  fdslrm (MMEinR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.190000Z",
     "start_time": "2019-05-23T16:00:43.042Z"
    }
   },
   "outputs": [],
   "source": [
    "# fit particular FDSLRM employing REML estimation of variances (by 'MMEinR (mixed)' function)\n",
    "REMLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
    "                               include_random_eff = c(0,1,0,1), fit_function = \"mixed\", var_estim_method = \"REML\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.230000Z",
     "start_time": "2019-05-23T16:00:43.045Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05934201 0.02382629 0.01384694\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "0.065428619190323"
      ],
      "text/latex": [
       "0.065428619190323"
      ],
      "text/markdown": [
       "0.065428619190323"
      ],
      "text/plain": [
       "[1] 0.06542862"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "print(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))\n",
    "norm_vec(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## sommer: Solving Mixed Model Equations in R\n",
    "* _Purpose_: R package - structural multivariate-univariate linear mixed model solver for multiple random effects and estimation of unknown variance-covariance structures (i.e. heterogeneous and unstructured variance models).   \n",
    "<font size=2> Designed for genomic prediction and genome wide association studies (GWAS), particularly focused in the $p$ > $n$ problem (more coefficients than observations) to include multiple known relationship matrices and estimate complex unknown covariance structures. Spatial models can be fitted using the two-dimensional spline functionality in sommer. </font>\n",
    "\n",
    "* _Version_: 3.9.3, 2019\n",
    "* _URL_: https://CRAN.R-project.org/package=sommer\n",
    "* _Key tool_: `mmer` function for fitting\n",
    "* _computational parameters_ of `mmer`:\n",
    ">   * <font size=2> `iters` - maximum number of iterations (default: 20).\n",
    "    * `tolpar` - tolerance for the convergence criterion for the change in loglikelihood (default: 1e-03)\n",
    "    * `tolparinv` - tolerance parameter for matrix inverse used when singularities are encountered (default:1e-06). \n",
    "    * `method` - this refers to the method or algorithm to be used for estimating variance components (default: NR, direct-inversion Newton-Raphson method, alternative: AI,  Average Information) \n",
    "    * function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools<font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.271000Z",
     "start_time": "2019-05-23T16:00:43.048Z"
    }
   },
   "outputs": [],
   "source": [
    "library(sommer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.316000Z",
     "start_time": "2019-05-23T16:00:43.051Z"
    }
   },
   "outputs": [],
   "source": [
    "REMLE_sommer <- function(X, F, V) {\n",
    "    f1 <- F[,1]\n",
    "    f2 <- F[,2]\n",
    "    f3 <- F[,3]\n",
    "    f4 <- F[,4]\n",
    "    v1 <- V[,1]\n",
    "    v2 <- V[,2]\n",
    "    \n",
    "    DT <- data.frame(X, f1, f2, f3, f4, v1, v2)\n",
    "\n",
    "    suppressWarnings(result_new <- mmer(fixed = X~f2+f3+f4, random = ~ vs(ds(v1),1)+vs(ds(v2),1), data = DT, verbose = FALSE))\n",
    "    \n",
    "    output <- as.vector(unlist(result_new$sigma))\n",
    "    \n",
    "    return(output)\n",
    "\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.419000Z",
     "start_time": "2019-05-23T16:00:43.053Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.02382450 0.01384666 0.05934209\n",
      "[1] 0.06542798\n"
     ]
    }
   ],
   "source": [
    "REMLEsommer <- REMLE_sommer(x, F, V)\n",
    "print(REMLEsommer)\n",
    "print(norm_vec(REMLEsommer))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### fdslrm (sommer)\n",
    "function `fitDiagFDSLRM` via parameter `\"mmer\"` implements `mmer` function from `sommer`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.517000Z",
     "start_time": "2019-05-23T16:00:43.056Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[34mfixed-effect model matrix is rank deficient so dropping 1 columns / coefficients\n",
      "\u001b[39m"
     ]
    }
   ],
   "source": [
    "# fit particular FDSLRM employing REML estimation of variances (by 'mmer' function)\n",
    "REMLEfitsommer <- suppressWarnings(fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), \n",
    "                                                 freq_random = c(6/72, 7/72), include_random_eff = c(0,1,0,1), \n",
    "                                                 fit_function = \"mmer\", var_estim_method = \"REML\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-05-23T16:00:51.559000Z",
     "start_time": "2019-05-23T16:00:43.058Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] 0.05934209 0.02382450 0.01384666\n",
      "[1] 0.06542798\n"
     ]
    }
   ],
   "source": [
    "print(as.vector(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))\n",
    "print(norm_vec(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "<a id=references></a>\n",
    "# <font color=brown> References </font>\n",
    "This notebook belongs to suplementary materials of the paper submitted to Statistical Papers and available at <https://arxiv.org/abs/1905.07771>.\n",
    "\n",
    "* Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). [Estimating variance components in time series\n",
    "\tlinear regression models using empirical BLUPs and convex optimization](https://arxiv.org/abs/1905.07771), https://arxiv.org/, 2019.\n",
    "\n",
    "### Abstract of the paper\n",
    "\n",
    "We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. \n",
    "\n",
    "The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems $-$ theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a \n",
    "practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $ n $ observed time series values, we also discovered a new algorithm of order $\\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. \n",
    "\n",
    "We illustrate our results on three real data sets $-$ electricity consumption, tourism and cyber security $-$ which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Sokol P., Gajdoš, A. (2017). [Prediction of Attacks Against Honeynet Based on Time Series Modeling](https://www.springer.com/gp/book/9783319676203). Silhavy, R., Silhavy, P., & Prokopova, Z. (Eds.). (2017). _Applied Computational Intelligence and Mathematical Methods (Vol. 662)_. Cham: Springer International Publishing, pp. 360-371"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.5.1"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}