{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 8\n", "\n", "## Video 35: Hypothesis Testing II\n", "**Python for the Energy Industry**\n", "\n", "In this lesson we will look at the statistical tool of Granger causality. Working with the same time series data from the previous lesson, we use Granger causality to test whether one time series is predictive of the other at a certain time lag.\n", "\n", "*Remember: when you run this notebook, you will be using more recent data than was used originally. So, you may find different or even contradictory results! This is the nature of working with real data.*\n", "\n", "Start by getting all the data:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# initial imports\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "from dateutil.relativedelta import relativedelta\n", "import vortexasdk as v\n", "\n", "# The cargo unit for the time series (barrels)\n", "TS_UNIT = 'b'\n", "\n", "# The granularity of the time series\n", "TS_FREQ = 'day'\n", "\n", "# datetimes to access last 7 weeks of data\n", "now = datetime.utcnow()\n", "seven_weeks_ago = now - relativedelta(weeks=7)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n", "assert len(crude) == 1\n", "\n", "china = v.Geographies().search('China',exact_term_match=True)[0]['id']\n", "SEA = v.Geographies().search('Southeast Asia',exact_term_match=True)[0]['id']\n", "\n", "SEA_exports = v.CargoTimeSeries().search(\n", " timeseries_frequency=TS_FREQ,\n", " timeseries_unit=TS_UNIT,\n", " filter_time_min=seven_weeks_ago,\n", " filter_time_max=now,\n", " filter_activity=\"loading_end\",\n", " filter_origins=SEA,\n", " filter_destinations=china,\n", ").to_df()\n", "\n", "SEA_exports = 
SEA_exports.rename(columns={'key':'date','value':'SEA_exp'})[['date','SEA_exp']]\n", "\n", "china_imports = v.CargoTimeSeries().search(\n", " timeseries_frequency=TS_FREQ,\n", " timeseries_unit=TS_UNIT,\n", " filter_time_min=seven_weeks_ago,\n", " filter_time_max=now,\n", " filter_activity=\"unloading_start\",\n", " filter_origins=SEA,\n", " filter_destinations=china,\n", ").to_df()\n", "\n", "china_imports = china_imports.rename(columns={'key':'date','value':'china_imp'})[['date','china_imp']]\n", "\n", "combined_df = SEA_exports\n", "combined_df['china_imp'] = china_imports['china_imp']\n", "\n", "# drop rows with NaN values in case the search returns any\n", "combined_df = combined_df.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The input to the Granger causality test is a NumPy array of values with two columns. The test determines whether the second column Granger causes the first. We create two arrays to test both directions of causality:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# test if China imports Granger cause SEA exports\n", "x1 = combined_df[['SEA_exp','china_imp']].values\n", "\n", "# test if SEA exports Granger cause China imports\n", "x2 = combined_df[['china_imp','SEA_exp']].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll now run the Granger causality test on x1. We specify the maximum number of lags, which must be less than 1/3 the size of our data, so we go with 15 lags here. 
The output is a dictionary keyed by lag:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])\n" ] } ], "source": [ "from statsmodels.tsa.stattools import grangercausalitytests\n", "\n", "gc1 = grangercausalitytests(x1,maxlag=15,verbose=False)\n", "print(gc1.keys())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "({'ssr_ftest': (0.11191210577788067, 0.7394982261424798, 46.0, 1), 'ssr_chi2test': (0.11921072137209027, 0.7298921072736446, 1), 'lrtest': (0.11906594393622072, 0.7300497647785229, 1), 'params_ftest': (0.11191210577793151, 0.7394982261424223, 46.0, 1.0)}, [, , array([[0., 1., 0.]])])\n" ] } ], "source": [ "print(gc1[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the test, for each lag, we get the results of a few different types of significance test. The second value in each of these result tuples is the p-value. So, we pick a test (ssr_ftest), and for each lag value we get the p-value:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.7394982261424798, 0.34406244046901324, 0.4379876250464344, 0.5978693386431604, 0.4297746020366883, 0.5913047894842787, 0.6333656452613916, 0.39251634523943296, 0.3460044222271352, 0.42328369935264354, 0.2947044888072325, 0.4534537474070228, 0.45562416494890623, 0.6404053509496745, 0.6127531951186226]\n" ] } ], "source": [ "test = 'ssr_ftest'\n", "lags = range(1,16)\n", "p1 = [gc1[lag][0][test][1] for lag in lags]\n", "print(p1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, none of these p-values is below 0.05, so there is no statistically significant evidence of China imports Granger causing SEA exports. 
We now repeat the test for the other direction of causality, and we plot the p-values for both tests:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "gc2 = grangercausalitytests(x2,maxlag=15,verbose=False)\n", "p2 = [gc2[lag][0][test][1] for lag in lags]\n", "\n", "p_df = pd.DataFrame({'imports -> exports':p1,'exports -> imports':p2},index=lags)\n", "\n", "p_df.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this, we find no statistically significant evidence of Granger causality in either direction between these two time series." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Using the two time series from the exercise in the previous lesson, confirm that there is no statistically significant Granger causality between them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }