{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Authors:** Andrej Gajdoš, Jozef Hanč, Martina Hančová
*[Faculty of Science](https://www.upjs.sk/en/faculty-of-science/?prefferedLang=EN), P. J. Šafárik University in Košice, Slovakia*
email: [andrej.gajdos@student.upjs.sk](mailto:andrej.gajdos@student.upjs.sk)\n",
"***\n",
"** MLE, REMLE for cyber attacks** "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" R-based computational tools - ** fdslrm, nlme, MMEinR(mixed), sommer** \n",
"\n",
"### Table of Contents \n",
"* [Data and model](#data_and_model) - data and model description of empirical data\n",
"* [MLE](#mle) - computation by standard packages *nlme*, *MMEinR(mixed)* and *fdslrm*\n",
"* [REMLE](#remle) - computation by standard package *nlme*, *MMEinR(mixed)*, *sommer* and *fdslrm*\n",
"* [References](#references)\n",
"\n",
"**To get back to the table of contents, use the Home key.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# Data and Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This FDSLRM application describes the real time series data set representing total weekly number of cyber attacks against a honeynet -- an unconventional tool which mimics real systems connected to Internet like business or school computers intranets to study methods, tools and goals of cyber attackers. \n",
"\n",
"Data, taken from from _Sokol, 2017_ were collected from November 2014 to May 2016 in CZ.NIC honeynet consisting of Kippo honeypots in medium-interaction mode. The number of time series observations is \n",
"$n=72$. \n",
"\n",
"The suitable FDSLRM, after a preliminary logarithmic transformation of \n",
"data $Z(t) = \\log X(t)$, is Gaussian orthogonal:\n",
"\n",
"$ \n",
"\\begin{array}{rl}\n",
"& Z(t) & \\! = \\! &\\beta_1+\\beta_2\\cos\\left(\\tfrac{2\\pi t\\cdot 3}{72}\\right)+\\beta_3\\sin\\left(\\tfrac{2\\pi t\\cdot 3}{72}\\right)+\\beta_4\\sin\\left(\\tfrac{2\\pi t\\cdot 4}{72}\\right) + \\\\\n",
"& & & +Y_1\\sin\\left(\\tfrac{2\\pi\\ t\\cdot 6}{72}\\right)+Y_2\\sin\\left(\\tfrac{2\\pi\\cdot t\\cdot 7}{72}\\right)+w(t), \\, t\\in \\mathbb{N}, \n",
"\\end{array}\n",
"$ \n",
"\n",
"where $\\boldsymbol{\\beta}=(\\beta_1,\\,\\beta_2,\\,\\beta_3,\\,\\beta_4)' \\in \\mathbb{R}^4, \\mathbf{Y} = (Y_1,Y_2)' \\sim \\mathcal{N}_2(\\boldsymbol{0}, \\mathrm{D}), w(t) \\sim \\mathcal{iid}\\, \\mathcal{N} (0, \\sigma_0^2), \\boldsymbol{\\nu}= (\\sigma_0^2, \\sigma_1^2, \\sigma_2^2) \\in \\mathbb{R}_{+}^3$.\n",
"\n",
"We identified the given and most parsimonious structure of the FDSLRM using an iterative process of the model building and selection based on exploratory tools of *spectral analysis* and *residual diagnostics* (for details see our Jupyter notebook `cyberattacks.ipynb`)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.052000Z",
"start_time": "2019-05-23T16:00:42.992Z"
}
},
"outputs": [],
"source": [
"# data - time series observation\n",
"x <- read.csv2('cyberattacks.csv', header = FALSE, sep = \",\")\n",
"x <- as.numeric(as.vector(x[-1,]))\n",
"x <- log(x)\n",
"t <- 1:length(x)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.127000Z",
"start_time": "2019-05-23T16:00:42.995Z"
}
},
"outputs": [],
"source": [
"# auxilliary functions to create design matrices F, V and the projection matrix M_F\n",
"makeF <- function(times, freqs) {\n",
"\n",
" n <- length(times)\n",
" f <- length(freqs)\n",
" c1 <- matrix(1, n)\n",
" F <- matrix(c1, n, 1)\n",
"\n",
" for (i in 1:f) {\n",
" F <- cbind(F, cos(2 * pi * freqs[i] * times))\n",
" F <- cbind(F, sin(2 * pi * freqs[i] * times))\n",
" }\n",
"\n",
" return(F)\n",
"}\n",
"\n",
"makeV <- function(times, freqs) {\n",
"\n",
" V <- makeF(times, freqs)\n",
" V <- V[, -1]\n",
"\n",
" return(V)\n",
"}\n",
"\n",
"makeM_F <- function(F) {\n",
"\n",
" n <- nrow(F)\n",
" c1 <- rep(1, n)\n",
" I <- diag(c1)\n",
" M_F <- I - F %*% solve((t(F) %*% F)) %*% t(F)\n",
"\n",
" return(M_F)\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.187000Z",
"start_time": "2019-05-23T16:00:42.997Z"
}
},
"outputs": [],
"source": [
"# model parameters\n",
"n <- 72 \n",
"k <- 4 \n",
"l <- 2 \n",
"\n",
"# model - design matrices F, V\n",
"F <- makeF(t, c(3/72, 4/72))\n",
"F <- F[,-4]\n",
"\n",
"V <- makeV(t, c(6/72, 7/72))\n",
"V <- V[,-c(1,3)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# MLE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## nlme: Linear and Nonlinear Mixed Effects Models\n",
"* _Purpose_: R package to fit and compare Gaussian linear and nonlinear mixed-effects models \n",
"* _Version_: 3.1-140, 2019\n",
"* _URL_: https://CRAN.R-project.org/package=nlme]\n",
"* _Key tool_: `lme` function for fitting LMM\n",
"* _Computational parameters_ of `lme`:\n",
"> * `maxIter` - maximum number of iterations (default: 50).\n",
" * `msMaxIter` - maximum number of iterations for the optimization step (default: 50).\n",
" * `tolerance` - tolerance for the convergence criterion (default: 1e-6).\n",
" * `niterEM` - number of iterations for the EM algorithm used to refine the initial estimates of the random effects (default: 25).\n",
" * `msMaxEval` - maximum number of evaluations of the objective function permitted for optimizer nlminb (default: 200).\n",
" * `msTol` - tolerance for the convergence criterion on the first iteration when optim is used (default is 1e-7). \n",
" * `.relStep` - relative step for numerical derivatives calculations. Default is `.Machine$double.eps^(1/3)` while `.Machine$double.eps = 2.220446e-16`. \n",
" * `opt` - the optimizer to be used, either \"nlminb\" (the default) or \"optim\".\n",
" * `optimMethod` - the optimization method, a version of quasi-Newton method to be used with the optim optimizer (default:\"BFGS\", alternative: \"L-BFGS-B\")\n",
" * `minAbsParApVar` - numeric value - minimum absolute parameter value in the approximate variance calculation (default: 0.05). "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.286000Z",
"start_time": "2019-05-23T16:00:43.000Z"
}
},
"outputs": [],
"source": [
"library(nlme)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.320000Z",
"start_time": "2019-05-23T16:00:43.002Z"
}
},
"outputs": [],
"source": [
"REMLE_nlme <- function(X, F, V, method) {\n",
" g <- rep(1,length(X))\n",
" d <- data.frame(g,F,V,X)\n",
" colnames(d) <- c(\"g\",\"f1\",\"f2\",\"f3\",\"f4\",\"v1\",\"v2\",\"X\")\n",
" \n",
" result <- NULL\n",
" \n",
" if(method == \"ML\") {\n",
" result <- lme(fixed=X~f2+f3+f4,random=list(g=pdDiag(~-1+v1+v2)),data=d,method=\"ML\")\n",
" } else {\n",
" result <- lme(fixed=X~f2+f3+f4,random=list(g=pdDiag(~-1+v1+v2)),data=d,method=\"REML\")\n",
" }\n",
" \n",
" return(as.vector(c(result$sigma^2, diag(getVarCov(result)))))\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.357000Z",
"start_time": "2019-05-23T16:00:43.006Z"
}
},
"outputs": [],
"source": [
"# auxilliary function to calculate the norm of a vector\n",
"norm_vec <- function(x) sqrt(sum(x^2))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:43.437000Z",
"start_time": "2019-05-23T16:00:43.011Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05595104 0.02392048 0.01394113\n"
]
},
{
"data": {
"text/html": [
"0.0624264655696255"
],
"text/latex": [
"0.0624264655696255"
],
"text/markdown": [
"0.0624264655696255"
],
"text/plain": [
"[1] 0.06242647"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"MLEnlme <- REMLE_nlme(x, F, V, \"ML\")\n",
"print(MLEnlme)\n",
"norm_vec(MLEnlme)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## fdslrm: Time series analysis and forecasting using LMM\n",
" \n",
"* _Purpose_: R package for modeling and prediction of time series using linear mixed models.\n",
"* _Version_: 0.1.0, 2019\n",
"* _Depends_: kableExtra, IRdisplay, MASS, Matrix, car, nlme, stats, forecast, fpp2, matrixcalc, sommer, gnm, pracma, CVXR\n",
"* _Maintainer_: Andrej Gajdoš\n",
"* _Authors_: Andrej Gajdoš, Jozef Hanč, Martina Hančová\n",
"* _URL_: https://github.com/fdslrm/R-package\n",
"* _Installation_: Run jupyter notebook `00 installation fdslrm.ipynb` once before the first run of any R-based Jupyter notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (nlme)\n",
"function `fitDiagFDSLRM` via parameter `\"lme\"` implements `lme` function from `nlme`\n",
"detailed explanat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Important note:* \n",
" A brief help on R functions of our fdslrm package designed to work with FDSLRM is in the modelling notebooks. After our testing, the most reliable way to install our fdslrm package in a Binder repository is its direct loading from GitHub. The standard installation of our fdslrm package as in the case of any R package on GitHub works without any problems in a local installation using Anaconda R distribution or CRAN distribution."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:47.230000Z",
"start_time": "2019-05-23T16:00:43.014Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"SHA-1 hash of file is f56b9d53e72a8575947a467930a2bdddb5b500ad\n"
]
}
],
"source": [
"# loading all fdslrm functions as an R script from GiHub\n",
"devtools::source_url(\"https://github.com/fdslrm/fdslrmAllinOne/blob/master/fdslrmAllinOne.R?raw=TRUE\")\n",
"initialFDSLRM()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:49.341000Z",
"start_time": "2019-05-23T16:00:43.017Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing ML estimation of variances (by 'lme' function)\n",
"MLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
" include_random_eff = c(0,1,0,1), fit_function = \"lme\", var_estim_method = \"ML\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:49.389000Z",
"start_time": "2019-05-23T16:00:43.019Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05595104 0.02392048 0.01394113\n"
]
},
{
"data": {
"text/html": [
"0.0624264655696255"
],
"text/latex": [
"0.0624264655696255"
],
"text/markdown": [
"0.0624264655696255"
],
"text/plain": [
"[1] 0.06242647"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))\n",
"norm_vec(as.vector(c(MLEfitnlme$error_variance, diag(MLEfitnlme$rand_eff_variance))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MMEinR\n",
"R version of Witkovský's MATLAB function `mixed` https://www.mathworks.com/matlabcentral/fileexchange/200\n",
"* _Purpose_: R function to estimate parameters of a linear mixed model (LMM) with a simple variance components structure\n",
"* _URL_: https://github.com/fdslrm/MMEinR\n",
"* _Computational parameters_ of `MMEinR (mixed)`:\n",
">* tolerance for the convergence criterion (`epss = 1e-8`)\n",
">* iterative method solving Henderson's mixed model equations \n",
">* function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:50.641000Z",
"start_time": "2019-05-23T16:00:43.022Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"SHA-1 hash of file is 36b3b4cd1a1f98c5fb36b541fc6b3956f6663aba\n"
]
}
],
"source": [
"# load function 'MMEinR'\n",
"devtools::source_url(\"https://github.com/fdslrm/MMEinR/blob/master/MMEinR.R?raw=TRUE\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:50.693000Z",
"start_time": "2019-05-23T16:00:43.024Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.02392048 0.01394113 0.05595104\n",
"[1] 0.06242647\n"
]
}
],
"source": [
"MLEmixed <- mixed(x, F, V, c(1,1), c(1,1,1), 1)\n",
"print(MLEmixed$s2)\n",
"print(norm_vec(MLEmixed$s2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (MMEinR)\n",
"function `fitDiagFDSLRM` via parameter `\"mixed\"` implements R version of MATLAB `mixed` function"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:50.750000Z",
"start_time": "2019-05-23T16:00:43.026Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing ML estimation of variances (by 'mixed' function)\n",
"MLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
" include_random_eff = c(0,1,0,1), fit_function = \"mixed\", var_estim_method = \"ML\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:50.796000Z",
"start_time": "2019-05-23T16:00:43.029Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05595104 0.02392048 0.01394113\n"
]
},
{
"data": {
"text/html": [
"0.0624264654844781"
],
"text/latex": [
"0.0624264654844781"
],
"text/markdown": [
"0.0624264654844781"
],
"text/plain": [
"[1] 0.06242647"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))\n",
"norm_vec(as.vector(c(MLEfitmixed$error_variance, MLEfitmixed$rand_eff_variance)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# REMLE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## nlme"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:50.851000Z",
"start_time": "2019-05-23T16:00:43.032Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05934201 0.02382629 0.01384694\n"
]
},
{
"data": {
"text/html": [
"0.0654286193615291"
],
"text/latex": [
"0.0654286193615291"
],
"text/markdown": [
"0.0654286193615291"
],
"text/plain": [
"[1] 0.06542862"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"REMLEnlme <- REMLE_nlme(x, F, V, \"REML\")\n",
"print(REMLEnlme)\n",
"norm_vec(REMLEnlme)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (nlme)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:50.983000Z",
"start_time": "2019-05-23T16:00:43.034Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing REML estimation of variances (by 'lme' function)\n",
"REMLEfitnlme <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
" include_random_eff = c(0,1,0,1), fit_function = \"lme\", var_estim_method = \"REML\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.028000Z",
"start_time": "2019-05-23T16:00:43.037Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05934201 0.02382629 0.01384694\n"
]
},
{
"data": {
"text/html": [
"0.0654286193615291"
],
"text/latex": [
"0.0654286193615291"
],
"text/markdown": [
"0.0654286193615291"
],
"text/plain": [
"[1] 0.06542862"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))\n",
"norm_vec(as.vector(c(REMLEfitnlme$error_variance, diag(REMLEfitnlme$rand_eff_variance))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MMEinR"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.096000Z",
"start_time": "2019-05-23T16:00:43.040Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.02382629 0.01384694 0.05934201\n",
"[1] 0.06542862\n"
]
}
],
"source": [
"MLEmixed <- mixed(x, F, V, c(1,1), c(1,1,1), 2)\n",
"print(MLEmixed$s2)\n",
"print(norm_vec(MLEmixed$s2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (MMEinR)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.190000Z",
"start_time": "2019-05-23T16:00:43.042Z"
}
},
"outputs": [],
"source": [
"# fit particular FDSLRM employing REML estimation of variances (by 'MMEinR (mixed)' function)\n",
"REMLEfitmixed <- fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), freq_random = c(6/72, 7/72), \n",
" include_random_eff = c(0,1,0,1), fit_function = \"mixed\", var_estim_method = \"REML\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.230000Z",
"start_time": "2019-05-23T16:00:43.045Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05934201 0.02382629 0.01384694\n"
]
},
{
"data": {
"text/html": [
"0.065428619190323"
],
"text/latex": [
"0.065428619190323"
],
"text/markdown": [
"0.065428619190323"
],
"text/plain": [
"[1] 0.06542862"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))\n",
"norm_vec(as.vector(c(REMLEfitmixed$error_variance, REMLEfitmixed$rand_eff_variance)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## sommer: Solving Mixed Model Equations in R\n",
"* _Purpose_: R package - structural multivariate-univariate linear mixed model solver for multiple random effects and estimation of unknown variance-covariance structures (i.e. heterogeneous and unstructured variance models). \n",
" Designed for genomic prediction and genome wide association studies (GWAS), particularly focused in the $p$ > $n$ problem (more coefficients than observations) to include multiple known relationship matrices and estimate complex unknown covariance structures. Spatial models can be fitted using the two-dimensional spline functionality in sommer. \n",
"\n",
"* _Version_: 3.9.3, 2019\n",
"* _URL_: https://CRAN.R-project.org/package=sommer\n",
"* _Key tool_: `mmer` function for fitting\n",
"* _computational parameters_ of `mmer`:\n",
"> * `iters` - maximum number of iterations (default: 20).\n",
" * `tolpar` - tolerance for the convergence criterion for the change in loglikelihood (default: 1e-03)\n",
" * `tolparinv` - tolerance parameter for matrix inverse used when singularities are encountered (default:1e-06). \n",
" * `method` - this refers to the method or algorithm to be used for estimating variance components (default: NR, direct-inversion Newton-Raphson method, alternative: AI, Average Information) \n",
" * function return the estimates of variance parameters in a different order: $(\\tilde{\\nu}_1,\\ldots,\\tilde{\\nu}_l,\\tilde{\\nu}_0)'$ in comparison with other tools"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.271000Z",
"start_time": "2019-05-23T16:00:43.048Z"
}
},
"outputs": [],
"source": [
"library(sommer)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.316000Z",
"start_time": "2019-05-23T16:00:43.051Z"
}
},
"outputs": [],
"source": [
"REMLE_sommer <- function(X, F, V) {\n",
" f1 <- F[,1]\n",
" f2 <- F[,2]\n",
" f3 <- F[,3]\n",
" f4 <- F[,4]\n",
" v1 <- V[,1]\n",
" v2 <- V[,2]\n",
" \n",
" DT <- data.frame(X, f1, f2, f3, f4, v1, v2)\n",
"\n",
" suppressWarnings(result_new <- mmer(fixed = X~f2+f3+f4, random = ~ vs(ds(v1),1)+vs(ds(v2),1), data = DT, verbose = FALSE))\n",
" \n",
" output <- as.vector(unlist(result_new$sigma))\n",
" \n",
" return(output)\n",
"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.419000Z",
"start_time": "2019-05-23T16:00:43.053Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.02382450 0.01384666 0.05934209\n",
"[1] 0.06542798\n"
]
}
],
"source": [
"REMLEsommer <- REMLE_sommer(x, F, V)\n",
"print(REMLEsommer)\n",
"print(norm_vec(REMLEsommer))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### fdslrm (sommer)\n",
"function `fitDiagFDSLRM` via parameter `\"mmer\"` implements `mmer` function from `sommer`"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.517000Z",
"start_time": "2019-05-23T16:00:43.056Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[34mfixed-effect model matrix is rank deficient so dropping 1 columns / coefficients\n",
"\u001b[39m"
]
}
],
"source": [
"# fit particular FDSLRM employing REML estimation of variances (by 'mmer' function)\n",
"REMLEfitsommer <- suppressWarnings(fitDiagFDSLRM(x, t, freq_mean = c(3/72, 4/72), include_fixed_eff = c(1,1,0,1), \n",
" freq_random = c(6/72, 7/72), include_random_eff = c(0,1,0,1), \n",
" fit_function = \"mmer\", var_estim_method = \"REML\"))"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-23T16:00:51.559000Z",
"start_time": "2019-05-23T16:00:43.058Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1] 0.05934209 0.02382450 0.01384666\n",
"[1] 0.06542798\n"
]
}
],
"source": [
"print(as.vector(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))\n",
"print(norm_vec(c(REMLEfitsommer$error_variance, REMLEfitsommer$rand_eff_variance)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***\n",
"\n",
"# References \n",
"This notebook belongs to suplementary materials of the paper submitted to Statistical Papers and available at .\n",
"\n",
"* Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). [Estimating variance components in time series\n",
"\tlinear regression models using empirical BLUPs and convex optimization](https://arxiv.org/abs/1905.07771), https://arxiv.org/, 2019.\n",
"\n",
"### Abstract of the paper\n",
"\n",
"We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. \n",
"\n",
"The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems $-$ theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a \n",
"practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $ n $ observed time series values, we also discovered a new algorithm of order $\\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python(or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. \n",
"\n",
"We illustrate our results on three real data sets $-$ electricity consumption, tourism and cyber security $-$ which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Sokol P., Gajdoš, A. (2017). [Prediction of Attacks Against Honeynet Based on Time Series Modeling](https://www.springer.com/gp/book/9783319676203). Silhavy, R., Silhavy, P., & Prokopova, Z. (Eds.). (2017). _Applied Computational Intelligence and Mathematical Methods (Vol. 662)_. Cham: Springer International Publishing, pp. 360-371"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| [Table of Contents](#table_of_contents) | [Data and model](#data_and_model) | [MLE](#mle) | [REMLE](#remle) | [References](#references) |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}