{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The emh R package" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The emh package allows you to test the efficiency of any univariate zoo time series object in R.\n", "\n", "The package achieves this using the following methodology:\n", "\n", "* Downsample the data into a number of subfrequencies\n", "* Run a suite of statistical tests of randomness on each subfrequency\n", "* Aggregate the results (p values, Z scores, etc.) in a data.frame and return it\n", "\n", "In addition to randomness tests, emh also includes a number of stochastic process models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing the package" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step to using the package is to download and install it using the devtools R package." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "library(devtools)\n", "suppressMessages(install_github(repo=\"stuartgordonreid/emh\", \n", " force = TRUE))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now check that you can load the package:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "suppressMessages(library(emh))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The emh package includes a few functions which allow you to download a bunch of global stock market indices from Quandl.com right off the bat. You can, of course, also pass in your own data. I recommend sticking with zoo objects when using emh because of how the downsampling works." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"DOWNLOADING DATASETS ...\"\n", " |======================================================================| 100%" ] } ], "source": [ "# This may take some time. Use the $ operator to access individual datasets.\n", "global_indices <- emh::data_quandl_downloader(data_quandl_indices())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generating a data.frame with the results from each of the randomness tests is as easy as passing a zoo object into the is_random function in emh. This function will downsample the data into multiple lower frequencies and run a battery of tests on each subfrequency. 
\n", "\n", "Frequencies are specified by the freqs1 and freqs2 arguments:\n", "\n", "* freqs1 - this specifies the _lags_ to use when computing returns\n", "* freqs2 - this specifies time-aware lags. Options are: c(\"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Week\", \"Month\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " |======================================================================| 100%" ] } ], "source": [ "results <- is_random(S = global_indices$'YAHOO/INDEX_SML',\n", " a = 0.99, # Test at the 99% confidence level\n", " freqs1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),\n", " freqs2 = c(\"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Week\", \"Month\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Viewing the results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now view the results (a data.frame object) or plot some interesting statistics." ] }, { "cell_type": 
"code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
Test_Name | Frequency | Sample_Size | Statistic | Two_sided_p | Z_Score | Non_Random |
---|---|---|---|---|---|---|
Independent Runs | (t-1 to t) | 7031 | 3222.000000 | 0.000000 | -6.401018 | TRUE |
Durbin-Watson | (t-1 to t) | 7031 | 2.000085 | 0.997102 | 2.759057 | TRUE |
Ljung-Box | (t-1 to t) | 7031 | 40.018562 | 0.000451 | -3.319694 | TRUE |
Breusch-Godfrey | (t-1 to t) | 7031 | 1.493007 | 0.221750 | -0.766295 | FALSE |
Bartell Rank | (t-1 to t) | 7031 | -4.581588 | 0.000005 | -4.434487 | TRUE |
Variance-Ratio LoMac | (t-1 to t) | 7031 | 0.524304 | 0.612558 | 0.285993 | FALSE |
Independent Runs | (t-2 to t) | 3515 | 1615.000000 | 0.000025 | -4.056609 | TRUE |
Durbin-Watson | (t-2 to t) | 3515 | 1.999914 | 0.997877 | 2.859250 | TRUE |
Ljung-Box | (t-2 to t) | 3515 | 35.686137 | 0.001962 | -2.884263 | TRUE |
Breusch-Godfrey | (t-2 to t) | 3515 | 0.002772 | 0.958014 | 1.728094 | FALSE |
Bartell Rank | (t-2 to t) | 3515 | -1.561292 | 0.118455 | -1.182746 | FALSE |
Variance-Ratio LoMac | (t-2 to t) | 3515 | -0.298505 | 0.427464 | -0.182833 | FALSE |
Independent Runs | (t-3 to t) | 2343 | 1107.000000 | 0.034269 | -1.821457 | FALSE |
Durbin-Watson | (t-3 to t) | 2343 | 2.000828 | 0.983218 | 2.125265 | FALSE |
Ljung-Box | (t-3 to t) | 2343 | 36.372465 | 0.001562 | -2.955341 | TRUE |
Breusch-Godfrey | (t-3 to t) | 2343 | 1.496615 | 0.221193 | -0.768171 | FALSE |
Bartell Rank | (t-3 to t) | 2343 | 0.044739 | 0.964315 | 1.803117 | FALSE |
Variance-Ratio LoMac | (t-3 to t) | 2343 | 2.593742 | 0.889218 | 1.222378 | FALSE |
Independent Runs | (t-4 to t) | 1757 | 861.000000 | 0.467113 | -0.082529 | FALSE |
Durbin-Watson | (t-4 to t) | 1757 | 1.993175 | 0.886997 | 1.210711 | FALSE |
Ljung-Box | (t-4 to t) | 1757 | 18.857366 | 0.220270 | -0.771281 | FALSE |
Breusch-Godfrey | (t-4 to t) | 1757 | 0.064177 | 0.800012 | 0.841665 | FALSE |
Bartell Rank | (t-4 to t) | 1757 | 1.664227 | 0.096067 | -1.304292 | FALSE |
Variance-Ratio LoMac | (t-4 to t) | 1757 | 2.066387 | 0.898717 | 1.274274 | FALSE |
Independent Runs | (t-5 to t) | 1406 | 713.000000 | 0.905517 | 1.313647 | FALSE |
Durbin-Watson | (t-5 to t) | 1406 | 1.995872 | 0.938608 | 1.543194 | FALSE |
Ljung-Box | (t-5 to t) | 1406 | 14.701537 | 0.473122 | -0.067424 | FALSE |
Breusch-Godfrey | (t-5 to t) | 1406 | 0.446644 | 0.503933 | 0.009859 | FALSE |
Bartell Rank | (t-5 to t) | 1406 | 1.576687 | 0.114868 | -1.201041 | FALSE |
Variance-Ratio LoMac | (t-5 to t) | 1406 | 0.509927 | 0.623519 | 0.314736 | FALSE |