{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bayesian Structural Time Series \n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"### Notebook by [Marco Tavora](http://www.marcotavora.me)\n",
"\n",
"\n",
"## Table of contents\n",
"\n",
"1. [Motivation](#Motivation)\n",
"\n",
"1. [Executive Summary of the BSTS Technique](#Executive-Summary-of-the-BSTS-Technique)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Motivation\n",
"[[go back to the top]](#Table-of-contents)\n",
"\n",
"Let us try to estimate the impact of the [Deepwater Horizon oil spill](https://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill) on [BP plc](https://en.wikipedia.org/wiki/BP) (formerly British Petroleum) stock prices using Bayesian structural time series [Sengul (2018)](#References) (see section [BSTS Model](#BSTS-Model) for a slightly more formal explanation). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The oil was photographed from space by a NASA satellite in 24 May of 2010 and is [shown below](https://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill)\n",
"\n",
"
\n",
"
\n",
" \n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Executive Summary of the BSTS Technique\n",
"[[go back to the top]](#Table-of-contents)\n",
"\n",
"In general, structural time series (STS) models (either frequentist or Bayesian) can be written as a system of equations. The simplest possible example of STS model is the *local level model*, given by:\n",
"\n",
"$$\\begin{array}{l}\n",
"{\\mu _{t + 1}} = {\\mu _t} + {\\xi _t},\\\\\n",
"{y_t} = {\\mu _t} + {\\varepsilon _t}\n",
"\\end{array}$$\n",
"\n",
"where ${\\xi _t} \\sim {\\cal N}(0,\\sigma _\\xi ^2)$ and ${\\varepsilon _t} \\sim {\\cal N}(0,\\sigma _\\varepsilon ^2)$.\n",
"Here, the first equation is called *state equation* and ${\\mu _t}$ is an unobserved variable. Though ${\\mu _t}$ is not observed, the second equation, called *observation equation*, which depends on ${\\mu _t}$, contains the variable ${y_t}$ which is based on observed data (roughly speaking, ${\\mu _t}$ can be interpreted as a time-dependent version of the intercept of simple linear regressions).\n",
"\n",
"The general form of the BSTS is more convoluted so let us consider the following case, \n",
"\n",
"$$\\begin{array}{l}\n",
"{y_t} = {\\mu _t} + {\\tau _t} + {\\beta ^T}{{\\vec x}_t} + {\\varepsilon _t}\\\\\n",
"{\\mu _t} = {\\mu _{t - 1}} + {\\delta _{t - 1}} + {\\eta _t},\\,\\,\\,\\,\\,\\,\\,\\,\\,\\\\\n",
"{\\delta _t} = {\\delta _{t - 1}} + {\\omega _t},\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\,\\\\\n",
"{\\tau _t} = - \\sum\\nolimits_{s = 1}^{S - 1} {{\\tau _{t - s}} + {\\gamma _t}} \n",
"\\end{array}$$\n",
"\n",
"where the errors have similar properties as in the local level model. The variables have the following meanings:\n",
"- ${\\tau _t}$ is the seasonal component\n",
"- ${\\delta _t}$ is a random walk with trend\n",
"- ${\\mu _t}$ is the trend component\n",
"- ${{\\vec x}_t}$ is a vector of covariates.\n",
"\n",
"The last variable, namely, ${{\\vec x}_t}$ is of crucial importance in this discussion. It is a vector time series that can be used to predict ${y_t}$ and the coefficients $\\beta$ are estimated the [spike-and-slab](https://projecteuclid.org/download/pdfview_1/euclid.aos/1117114335) Bayesian method of feature selection. To use this system of equations to predict BP stock prices after the oil spill we need to identify seasonal effects, trends and choose ${{\\vec x}_t}$. An important condition that must be obeyed is that ${{\\vec x}_t}$ must (ideally) be highly correlated with ${y_t}$, but cannot have been impacted by the same factors that impacted $y_t$.\n",
"\n",
"Let us apply the `CausalImpact` package without going into much detail now. First let us build "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installing packages\n",
"\n",
"[[go back to the top]](#Table-of-contents)\n",
"\n",
"Some of the packages we need to install are:\n",
"\n",
"- `devtools` to be able to download packages from Github\n",
"- `magrittr` for code readability and maintainability\n",
"- `dplyr` for data manipulation\n",
"- `rga` to obtain data from the Google Analytics (GA) APIs\n",
"\n",
"We need also to provided the `id` corresponding to our view from GA."
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [],
"source": [
"# # install.packages(\"devtools\")\n",
"# # install.packages(\"magrittr\")\n",
"# # install.packages(\"dplyr\")\n",
"# # install.packages(\"curl\")\n",
"# # install_github(\"skardhamar/rga\")\n",
"# # install.packages(c(\"bitops\", \"jsonlite\", \"httr\"),repos='http://cran.us.r-project.org')\n",
"# install.packages(\"tidyverse\")\n",
"# install.packages(\"tidyr\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"library(devtools)\n",
"library(magrittr)\n",
"library(\"dplyr\")\n",
"library(lubridate)\n",
"library(curl)\n",
"library(tidyverse)\n",
"library(tidyr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reading the `csv` file containing the BP stock prices:"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
date | bp | nasdaq |
---|---|---|
2009-01-02 | 47.0 | 15.79 |
2009-01-05 | 48.3 | 16.21 |
2009-01-06 | 48.9 | 16.42 |
2009-01-07 | 48.0 | 16.22 |
2009-01-08 | 48.3 | 15.90 |
2009-01-09 | 47.9 | 16.17 |