{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 7\n",
    "\n",
    "## Video 30: Working with Aggregate Flows\n",
    "**Python for the Energy Industry**\n",
    "\n",
    "As we've seen, the CargoMovements endpoint gives access to data at the level of individual cargos. In the previous two lessons, we looked at how we would aggregate this data. The Vortexa SDK also conveniently offers a way of directly accessing aggregated flows data: the CargoTimeSeries endpoint.\n",
    "\n",
    "[Cargo Time Series documentation.](https://vortechsa.github.io/python-sdk/endpoints/cargo_timeseries/)\n",
    "\n",
    "So how does this work?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initial imports\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from datetime import datetime,date\n",
    "from dateutil.relativedelta import relativedelta\n",
    "import vortexasdk as v"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a couple of important parameters to define when using time series data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The cargo unit for the time series (barrels)\n",
    "TS_UNIT = 'b'\n",
    "\n",
    "# The granularity of the time series\n",
    "TS_FREQ = 'week'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, we also need to define the geography and products of interest. Here we will once again consider crude exports from the US. So we get the corresponding IDs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# datetimes to access last 7 weeks of data\n",
    "now = datetime.utcnow()\n",
    "seven_weeks_ago = now - relativedelta(weeks=7)\n",
    "\n",
    "# Find US ID\n",
    "us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]\n",
    "assert len(us) == 1\n",
    "\n",
    "# Find crude ID\n",
    "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n",
    "assert len(crude) == 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then make a a search call to the CargoTimeSeries endpoint, specifying our date range, geography and product IDs as usual. We also specify the unit (barrels) and granularity (weekly) of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>value</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2020-12-21 00:00:00+00:00</td>\n",
       "      <td>28153014</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2020-12-28 00:00:00+00:00</td>\n",
       "      <td>28800130</td>\n",
       "      <td>48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2021-01-04 00:00:00+00:00</td>\n",
       "      <td>17902289</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2021-01-11 00:00:00+00:00</td>\n",
       "      <td>24413069</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2021-01-18 00:00:00+00:00</td>\n",
       "      <td>19569612</td>\n",
       "      <td>31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2021-01-25 00:00:00+00:00</td>\n",
       "      <td>29752386</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2021-02-01 00:00:00+00:00</td>\n",
       "      <td>22664959</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2021-02-08 00:00:00+00:00</td>\n",
       "      <td>986121</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        key     value  count\n",
       "0 2020-12-21 00:00:00+00:00  28153014     55\n",
       "1 2020-12-28 00:00:00+00:00  28800130     48\n",
       "2 2021-01-04 00:00:00+00:00  17902289     26\n",
       "3 2021-01-11 00:00:00+00:00  24413069     45\n",
       "4 2021-01-18 00:00:00+00:00  19569612     31\n",
       "5 2021-01-25 00:00:00+00:00  29752386     47\n",
       "6 2021-02-01 00:00:00+00:00  22664959     35\n",
       "7 2021-02-08 00:00:00+00:00    986121      1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_exports = v.CargoTimeSeries().search( \n",
    "            filter_activity = 'loading_end',\n",
    "            filter_origins = us,\n",
    "            filter_products = crude, \n",
    "            # Measure in barrels\n",
    "            timeseries_unit = TS_UNIT,\n",
    "            # Look at weekly imports\n",
    "            timeseries_frequency = TS_FREQ,\n",
    "            # Set the date range\n",
    "            filter_time_min = seven_weeks_ago,\n",
    "            filter_time_max = now).to_df()\n",
    "\n",
    "df_exports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the dataframe more readable, we rename the key and value columns, and convert the datetime object to date only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>barrels</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2020-12-21</th>\n",
       "      <td>28153014</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-12-28</th>\n",
       "      <td>28800130</td>\n",
       "      <td>48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-04</th>\n",
       "      <td>17902289</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-11</th>\n",
       "      <td>24413069</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-18</th>\n",
       "      <td>19569612</td>\n",
       "      <td>31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-25</th>\n",
       "      <td>29752386</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-02-01</th>\n",
       "      <td>22664959</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-02-08</th>\n",
       "      <td>986121</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             barrels  count\n",
       "date                       \n",
       "2020-12-21  28153014     55\n",
       "2020-12-28  28800130     48\n",
       "2021-01-04  17902289     26\n",
       "2021-01-11  24413069     45\n",
       "2021-01-18  19569612     31\n",
       "2021-01-25  29752386     47\n",
       "2021-02-01  22664959     35\n",
       "2021-02-08    986121      1"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_exports['key'] = pd.to_datetime(df_exports['key']).dt.date\n",
    "df_exports = df_exports.rename(columns = {'key': 'date', 'value': 'barrels'})\n",
    "df_exports = df_exports.set_index('date')\n",
    "\n",
    "df_exports"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's possible to access a week's data from this DataFrame by indexing it with the date of the start of the week:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "date_to_locate = df_exports.index[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "barrels    28153014\n",
      "count            55\n",
      "Name: 2020-12-21, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# one week's data\n",
    "print(df_exports.loc[date_to_locate])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise\n",
    "\n",
    "Use the CargoTimeSeries endpoint to pull the last 2 weeks' US crude exports data, but with a daily frequency. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}