{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 6\n",
    "\n",
    "## Video 28: Aggregating Cargo Movements Data\n",
    "**Python for the Energy Industry**\n",
    "\n",
    "The CargoMovements endpoint provides granular data about individual cargoes. We can convert this data into 'macro' data about the flows of products by 'aggregating' this data. That is the subject of this lesson. \n",
    "\n",
    "[Cargo Movements documentation](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:22.266492Z",
     "iopub.status.busy": "2021-07-13T11:16:22.265679Z",
     "iopub.status.idle": "2021-07-13T11:16:22.711332Z",
     "shell.execute_reply": "2021-07-13T11:16:22.711866Z"
    }
   },
   "outputs": [],
   "source": [
    "# initial imports\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from datetime import datetime\n",
    "from dateutil.relativedelta import relativedelta\n",
    "import vortexasdk as v"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:22.716012Z",
     "iopub.status.busy": "2021-07-13T11:16:22.715340Z",
     "iopub.status.idle": "2021-07-13T11:16:22.717118Z",
     "shell.execute_reply": "2021-07-13T11:16:22.717603Z"
    }
   },
   "outputs": [],
   "source": [
    "# datetimes to access last 7 weeks of data\n",
    "now = datetime.utcnow()\n",
    "seven_weeks_ago = now - relativedelta(weeks=7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:22.721518Z",
     "iopub.status.busy": "2021-07-13T11:16:22.720870Z",
     "iopub.status.idle": "2021-07-13T11:16:23.397523Z",
     "shell.execute_reply": "2021-07-13T11:16:23.398001Z"
    }
   },
   "outputs": [],
   "source": [
    "# Find US ID\n",
    "us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]\n",
    "assert len(us) == 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:23.402232Z",
     "iopub.status.busy": "2021-07-13T11:16:23.401638Z",
     "iopub.status.idle": "2021-07-13T11:16:23.953026Z",
     "shell.execute_reply": "2021-07-13T11:16:23.953539Z"
    }
   },
   "outputs": [],
   "source": [
    "# Find crude ID\n",
    "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n",
    "assert len(crude) == 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:23.958532Z",
     "iopub.status.busy": "2021-07-13T11:16:23.957914Z",
     "iopub.status.idle": "2021-07-13T11:16:23.959671Z",
     "shell.execute_reply": "2021-07-13T11:16:23.960077Z"
    }
   },
   "outputs": [],
   "source": [
    "# Columns to pull out, and shortened names\n",
    "required_columns = [\n",
    "    # primary vessel info\n",
    "    \"vessels.0.name\",\n",
    "    \"vessels.0.vessel_class\",\n",
    "    # product information and quantity\n",
    "    \"product.group.label\",\n",
    "    \"product.category.label\",\n",
    "    \"quantity\",\n",
    "    \"status\",\n",
    "    # loading info\n",
    "    \"events.cargo_port_load_event.0.location.port.label\",\n",
    "    \"events.cargo_port_load_event.0.end_timestamp\",\n",
    "    # unloading info\n",
    "    \"events.cargo_port_unload_event.0.location.port.label\",\n",
    "    \"events.cargo_port_unload_event.0.location.country.label\",\n",
    "    \"events.cargo_port_unload_event.0.end_timestamp\",\n",
    "]\n",
    "\n",
    "new_labels = [\n",
    "    \"vessel_name\",\n",
    "    \"vessel_class\",\n",
    "    \"product_group\",\n",
    "    \"product_category\",\n",
    "    \"quantity\",\n",
    "    \"status\",\n",
    "    \"loading_port\",\n",
    "    \"loading_finish\",\n",
    "    \"unloading_port\",\n",
    "    \"unloading_country\",\n",
    "    \"unloading_finish\",\n",
    "]\n",
    "\n",
    "relabel = dict(zip(required_columns,new_labels))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Do the search for crude flows out of the US (and not destined for another US location) in the last 7 weeks. Convert this to a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:23.964580Z",
     "iopub.status.busy": "2021-07-13T11:16:23.963819Z",
     "iopub.status.idle": "2021-07-13T11:16:25.013849Z",
     "shell.execute_reply": "2021-07-13T11:16:25.014305Z"
    }
   },
   "outputs": [],
   "source": [
    "cms = v.CargoMovements().search(\n",
    "        filter_activity = 'loading_end',\n",
    "        filter_origins = us,\n",
    "        exclude_destinations = us,\n",
    "        filter_products = crude,\n",
    "        filter_time_min = seven_weeks_ago,\n",
    "        filter_time_max = now,\n",
    "        cm_unit = 'b'\n",
    "    ).to_df(columns=required_columns).rename(relabel,axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.024840Z",
     "iopub.status.busy": "2021-07-13T11:16:25.024142Z",
     "iopub.status.idle": "2021-07-13T11:16:25.027587Z",
     "shell.execute_reply": "2021-07-13T11:16:25.027997Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "228"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# How many associated movements?\n",
    "len(cms)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We add a new column to the DataFrame which gets the week of the year in which loading was completed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.033910Z",
     "iopub.status.busy": "2021-07-13T11:16:25.032650Z",
     "iopub.status.idle": "2021-07-13T11:16:25.035670Z",
     "shell.execute_reply": "2021-07-13T11:16:25.036076Z"
    }
   },
   "outputs": [],
   "source": [
    "cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then use `groupby` to aggregate the data by week, and sum over the numeric data. This shows the total volume (in # of barrels) exported each week."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.040774Z",
     "iopub.status.busy": "2021-07-13T11:16:25.040183Z",
     "iopub.status.idle": "2021-07-13T11:16:25.048212Z",
     "shell.execute_reply": "2021-07-13T11:16:25.048640Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>loading_week</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>16203392</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>19765442</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>21530598</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>21794264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>25614818</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>19651398</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>24615334</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>5623297</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              quantity\n",
       "loading_week          \n",
       "21            16203392\n",
       "22            19765442\n",
       "23            21530598\n",
       "24            21794264\n",
       "25            25614818\n",
       "26            19651398\n",
       "27            24615334\n",
       "28             5623297"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weekly_quantity = cms.groupby('loading_week').sum()\n",
    "weekly_quantity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.059122Z",
     "iopub.status.busy": "2021-07-13T11:16:25.052952Z",
     "iopub.status.idle": "2021-07-13T11:16:25.061677Z",
     "shell.execute_reply": "2021-07-13T11:16:25.062092Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>loading_week</th>\n",
       "      <th>product_category</th>\n",
       "      <th>quantity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>21</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>14457314</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>21</td>\n",
       "      <td>Medium-Sour</td>\n",
       "      <td>1557064</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>22</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>18779321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>22</td>\n",
       "      <td>Medium-Sour</td>\n",
       "      <td>986121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>23</td>\n",
       "      <td>Heavy-Sour</td>\n",
       "      <td>1584078</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>23</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>17238857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>23</td>\n",
       "      <td>Medium-Sour</td>\n",
       "      <td>1959945</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>24</td>\n",
       "      <td>Heavy-Sour</td>\n",
       "      <td>483880</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>24</td>\n",
       "      <td>Light-Sour</td>\n",
       "      <td>1443458</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>24</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>19866926</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>25</td>\n",
       "      <td>Heavy-Sour</td>\n",
       "      <td>298671</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>25</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>24730561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>25</td>\n",
       "      <td>Medium-Sour</td>\n",
       "      <td>357009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>26</td>\n",
       "      <td>Light-Sour</td>\n",
       "      <td>815856</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>26</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>14701277</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>26</td>\n",
       "      <td>Medium-Sour</td>\n",
       "      <td>3447476</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>27</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>23007185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>27</td>\n",
       "      <td>Medium-Sour</td>\n",
       "      <td>1275061</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>28</td>\n",
       "      <td>Light-Sweet</td>\n",
       "      <td>5623297</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    loading_week product_category  quantity\n",
       "0             21      Light-Sweet  14457314\n",
       "1             21      Medium-Sour   1557064\n",
       "2             22      Light-Sweet  18779321\n",
       "3             22      Medium-Sour    986121\n",
       "4             23       Heavy-Sour   1584078\n",
       "5             23      Light-Sweet  17238857\n",
       "6             23      Medium-Sour   1959945\n",
       "7             24       Heavy-Sour    483880\n",
       "8             24       Light-Sour   1443458\n",
       "9             24      Light-Sweet  19866926\n",
       "10            25       Heavy-Sour    298671\n",
       "11            25      Light-Sweet  24730561\n",
       "12            25      Medium-Sour    357009\n",
       "13            26       Light-Sour    815856\n",
       "14            26      Light-Sweet  14701277\n",
       "15            26      Medium-Sour   3447476\n",
       "16            27      Light-Sweet  23007185\n",
       "17            27      Medium-Sour   1275061\n",
       "18            28      Light-Sweet   5623297"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quantity_by_category = cms.groupby(by = ['loading_week','product_category']).sum().reset_index()\n",
    "quantity_by_category"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.073096Z",
     "iopub.status.busy": "2021-07-13T11:16:25.067581Z",
     "iopub.status.idle": "2021-07-13T11:16:25.075476Z",
     "shell.execute_reply": "2021-07-13T11:16:25.075877Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>product_category</th>\n",
       "      <th>Heavy-Sour</th>\n",
       "      <th>Light-Sour</th>\n",
       "      <th>Light-Sweet</th>\n",
       "      <th>Medium-Sour</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>loading_week</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14457314.0</td>\n",
       "      <td>1557064.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18779321.0</td>\n",
       "      <td>986121.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1584078.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>17238857.0</td>\n",
       "      <td>1959945.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>483880.0</td>\n",
       "      <td>1443458.0</td>\n",
       "      <td>19866926.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>298671.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>24730561.0</td>\n",
       "      <td>357009.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>NaN</td>\n",
       "      <td>815856.0</td>\n",
       "      <td>14701277.0</td>\n",
       "      <td>3447476.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>23007185.0</td>\n",
       "      <td>1275061.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5623297.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "product_category  Heavy-Sour  Light-Sour  Light-Sweet  Medium-Sour\n",
       "loading_week                                                      \n",
       "21                       NaN         NaN   14457314.0    1557064.0\n",
       "22                       NaN         NaN   18779321.0     986121.0\n",
       "23                 1584078.0         NaN   17238857.0    1959945.0\n",
       "24                  483880.0   1443458.0   19866926.0          NaN\n",
       "25                  298671.0         NaN   24730561.0     357009.0\n",
       "26                       NaN    815856.0   14701277.0    3447476.0\n",
       "27                       NaN         NaN   23007185.0    1275061.0\n",
       "28                       NaN         NaN    5623297.0          NaN"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quantity_by_category = quantity_by_category.pivot(index = 'loading_week', \n",
    "                                                  columns = 'product_category',\n",
    "                                                  values = 'quantity')\n",
    "quantity_by_category"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is better. However, a number of elements are 'NaN'. Clearly, these represent an absence of any exports of that category in that week - so we replace these with zeroes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.085179Z",
     "iopub.status.busy": "2021-07-13T11:16:25.084471Z",
     "iopub.status.idle": "2021-07-13T11:16:25.086866Z",
     "shell.execute_reply": "2021-07-13T11:16:25.087264Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>product_category</th>\n",
       "      <th>Heavy-Sour</th>\n",
       "      <th>Light-Sour</th>\n",
       "      <th>Light-Sweet</th>\n",
       "      <th>Medium-Sour</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>loading_week</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>14457314.0</td>\n",
       "      <td>1557064.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>18779321.0</td>\n",
       "      <td>986121.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1584078.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>17238857.0</td>\n",
       "      <td>1959945.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>483880.0</td>\n",
       "      <td>1443458.0</td>\n",
       "      <td>19866926.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>298671.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>24730561.0</td>\n",
       "      <td>357009.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>0.0</td>\n",
       "      <td>815856.0</td>\n",
       "      <td>14701277.0</td>\n",
       "      <td>3447476.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>23007185.0</td>\n",
       "      <td>1275061.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>5623297.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "product_category  Heavy-Sour  Light-Sour  Light-Sweet  Medium-Sour\n",
       "loading_week                                                      \n",
       "21                       0.0         0.0   14457314.0    1557064.0\n",
       "22                       0.0         0.0   18779321.0     986121.0\n",
       "23                 1584078.0         0.0   17238857.0    1959945.0\n",
       "24                  483880.0   1443458.0   19866926.0          0.0\n",
       "25                  298671.0         0.0   24730561.0     357009.0\n",
       "26                       0.0    815856.0   14701277.0    3447476.0\n",
       "27                       0.0         0.0   23007185.0    1275061.0\n",
       "28                       0.0         0.0    5623297.0          0.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quantity_by_category = quantity_by_category.fillna(0)\n",
    "quantity_by_category"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way of breaking down the exports data is by receiving country. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.092463Z",
     "iopub.status.busy": "2021-07-13T11:16:25.091892Z",
     "iopub.status.idle": "2021-07-13T11:16:25.095115Z",
     "shell.execute_reply": "2021-07-13T11:16:25.095523Z"
    }
   },
   "outputs": [],
   "source": [
    "quantity_by_destination = cms.groupby('unloading_country').sum()[['quantity']]\n",
    "quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.100828Z",
     "iopub.status.busy": "2021-07-13T11:16:25.100126Z",
     "iopub.status.idle": "2021-07-13T11:16:25.102742Z",
     "shell.execute_reply": "2021-07-13T11:16:25.103160Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quantity</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unloading_country</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>South Korea</th>\n",
       "      <td>19482797</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>15210800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>11880161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Netherlands</th>\n",
       "      <td>11127371</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <td>10571957</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   quantity\n",
       "unloading_country          \n",
       "South Korea        19482797\n",
       "India              15210800\n",
       "Canada             11880161\n",
       "Netherlands        11127371\n",
       "United Kingdom     10571957"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "quantity_by_destination.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We might not want data for all recipient countries in our data, so we can collect those besides the top 10 in an 'other' category."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.109191Z",
     "iopub.status.busy": "2021-07-13T11:16:25.108684Z",
     "iopub.status.idle": "2021-07-13T11:16:25.110306Z",
     "shell.execute_reply": "2021-07-13T11:16:25.110652Z"
    }
   },
   "outputs": [],
   "source": [
    "# grab the top 10\n",
    "top_destination_countries = quantity_by_destination.head(10)\n",
    "\n",
    "# grab the bottom, sum their quantities and put into a DataFrame\n",
    "rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])\n",
    "rest.loc['Other'] = quantity_by_destination[10:].sum().values\n",
    "\n",
    "#  join the two DataFrames\n",
    "top_destination_countries = pd.concat([top_destination_countries, rest]).astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can add a new column containing the percentage of total exports represented by exports to that country:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-13T11:16:25.117100Z",
     "iopub.status.busy": "2021-07-13T11:16:25.113869Z",
     "iopub.status.idle": "2021-07-13T11:16:25.119555Z",
     "shell.execute_reply": "2021-07-13T11:16:25.119116Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>quantity</th>\n",
       "      <th>%</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>South Korea</th>\n",
       "      <td>19482797</td>\n",
       "      <td>14.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>India</th>\n",
       "      <td>15210800</td>\n",
       "      <td>11.54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Canada</th>\n",
       "      <td>11880161</td>\n",
       "      <td>9.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Netherlands</th>\n",
       "      <td>11127371</td>\n",
       "      <td>8.44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>United Kingdom</th>\n",
       "      <td>10571957</td>\n",
       "      <td>8.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>China</th>\n",
       "      <td>8629115</td>\n",
       "      <td>6.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Italy</th>\n",
       "      <td>6716011</td>\n",
       "      <td>5.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>France</th>\n",
       "      <td>6286669</td>\n",
       "      <td>4.77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Taiwan</th>\n",
       "      <td>6243591</td>\n",
       "      <td>4.74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Germany</th>\n",
       "      <td>4338556</td>\n",
       "      <td>3.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Other</th>\n",
       "      <td>31278613</td>\n",
       "      <td>23.74</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                quantity      %\n",
       "South Korea     19482797  14.79\n",
       "India           15210800  11.54\n",
       "Canada          11880161   9.02\n",
       "Netherlands     11127371   8.44\n",
       "United Kingdom  10571957   8.02\n",
       "China            8629115   6.55\n",
       "Italy            6716011   5.10\n",
       "France           6286669   4.77\n",
       "Taiwan           6243591   4.74\n",
       "Germany          4338556   3.29\n",
       "Other           31278613  23.74"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)\n",
    "top_destination_countries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Note: In the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise\n",
    "\n",
    "Make a pivoted table showing how exports to each of the top 5 desination countries varied over the 7 weeks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}