{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 6\n", "\n", "## Video 28: Aggregating Cargo Movements Data\n", "**Python for the Energy Industry**\n", "\n", "The CargoMovements endpoint provides granular data about individual cargoes. We can convert this data into 'macro' data about the flows of products by 'aggregating' this data. That is the subject of this lesson. \n", "\n", "[Cargo Movements documentation](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/js/1d4sm_gs33n2lr86b3p7pyvm0000gn/T/ipykernel_40860/3584211857.py:2: DeprecationWarning: \n", "Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),\n", "(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)\n", "but was not found to be installed on your system.\n", "If this would cause problems for you,\n", "please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466\n", " \n", " import pandas as pd\n" ] } ], "source": [ "# initial imports\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "from dateutil.relativedelta import relativedelta\n", "import vortexasdk as v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# datetimes to access last 7 weeks of data\n", "now = datetime.utcnow()\n", "seven_weeks_ago = now - relativedelta(weeks=7)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Find US ID\n", "us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]\n", "assert len(us) == 1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Find crude ID\n", "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n", "assert len(crude) == 1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Columns to pull out, and shortened names\n", "required_columns = [\n", " # primary vessel info\n", " \"vessels.0.name\",\n", " \"vessels.0.vessel_class\",\n", " # product information and quantity\n", " \"product.group.label\",\n", " \"product.category.label\",\n", " \"quantity\",\n", " \"status\",\n", " # loading info\n", " \"events.cargo_port_load_event.0.location.port.label\",\n", " \"events.cargo_port_load_event.0.end_timestamp\",\n", " # unloading info\n", " \"events.cargo_port_unload_event.0.location.port.label\",\n", " \"events.cargo_port_unload_event.0.location.country.label\",\n", " \"events.cargo_port_unload_event.0.end_timestamp\",\n", "]\n", "\n", "new_labels = [\n", " \"vessel_name\",\n", " \"vessel_class\",\n", " \"product_group\",\n", " \"product_category\",\n", " \"quantity\",\n", " \"status\",\n", " \"loading_port\",\n", " \"loading_finish\",\n", " \"unloading_port\",\n", " \"unloading_country\",\n", " \"unloading_finish\",\n", "]\n", "\n", "relabel = dict(zip(required_columns,new_labels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the search for crude flows out of the US (and not destined for another US location) in the last 7 weeks. 
Convert this to a DataFrame." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "cms = v.CargoMovements().search(\n", " filter_activity = 'loading_end',\n", " filter_origins = us,\n", " exclude_destinations = us,\n", " filter_products = crude,\n", " filter_time_min = seven_weeks_ago,\n", " filter_time_max = now,\n", " cm_unit = 'b'\n", " ).to_df(columns=required_columns).rename(relabel,axis=1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "277" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many associated movements?\n", "len(cms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We add a new column to the DataFrame which gets the week of the year in which loading was completed:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use `groupby` to aggregate the data by week, and sum over the numeric data. This shows the total volume (in # of barrels) exported each week." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
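One caveat worth noting when grouping on the week number alone: ISO week numbers restart at 1 every January, so a window that crosses a year boundary would fold two different weeks into the same bucket. A minimal sketch of a safer key, using only the columns already in `cms` (the `loading_year_week` name is just illustrative):

```python
# Sketch: bucket by ISO year *and* week, in case the 7-week window spans a year boundary.
iso = cms['loading_finish'].dt.isocalendar()
cms['loading_year_week'] = iso.year.astype(str) + '-W' + iso.week.astype(str).str.zfill(2)
cms.groupby('loading_year_week')[['quantity']].sum()
```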
" ], "text/plain": [ " quantity\n", "loading_week \n", "9 6177861\n", "10 26019118\n", "11 36710668\n", "12 28675170\n", "13 27427342\n", "14 17971104\n", "15 31144565\n", "16 22209895" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_quantity = cms.groupby('loading_week').sum(numeric_only=True)\n", "weekly_quantity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
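An aside on the two-key groupby used here: passing `as_index=False` makes pandas return the grouping keys as ordinary columns, which removes the need for a separate `reset_index()` call. A minimal sketch on the same `cms` DataFrame:

```python
# Sketch: as_index=False keeps loading_week and product_category as regular columns.
cms.groupby(['loading_week', 'product_category'], as_index=False)[['quantity']].sum()
```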
" ], "text/plain": [ " loading_week product_category quantity\n", "0 9 Light-Sweet 4898641\n", "1 9 Medium-Sour 1279220\n", "2 10 Heavy-Sour 1292891\n", "3 10 Light-Sweet 23657246\n", "4 10 Medium-Sour 1068981\n", "5 11 Heavy-Sour 1168289\n", "6 11 Light-Sweet 31123847\n", "7 11 Medium-Sour 4418532\n", "8 12 Heavy-Sour 291878\n", "9 12 Light-Sweet 27299960\n", "10 12 Medium-Sour 1083332\n", "11 13 Light-Sweet 26522943\n", "12 13 Medium-Sour 904399\n", "13 14 Heavy-Sour 317250\n", "14 14 Light-Sweet 16403216\n", "15 14 Medium-Sour 775952\n", "16 14 Medium-Sweet 474686\n", "17 15 Heavy-Sour 876933\n", "18 15 Light-Sweet 29190571\n", "19 15 Medium-Sour 1077061\n", "20 16 Light-Sweet 19109858\n", "21 16 Medium-Sour 3100037" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = cms.groupby(by = ['loading_week','product_category']).sum(numeric_only=True).reset_index()\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "product_category Heavy-Sour Light-Sweet Medium-Sour Medium-Sweet\n", "loading_week \n", "9 NaN 4898641.0 1279220.0 NaN\n", "10 1292891.0 23657246.0 1068981.0 NaN\n", "11 1168289.0 31123847.0 4418532.0 NaN\n", "12 291878.0 27299960.0 1083332.0 NaN\n", "13 NaN 26522943.0 904399.0 NaN\n", "14 317250.0 16403216.0 775952.0 474686.0\n", "15 876933.0 29190571.0 1077061.0 NaN\n", "16 NaN 19109858.0 3100037.0 NaN" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.pivot(index = 'loading_week', \n", " columns = 'product_category',\n", " values = 'quantity')\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is better. However, a number of elements are 'NaN'. Clearly, these represent an absence of any exports of that category in that week - so we replace these with zeroes." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
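As an aside, pandas' `pivot_table` can roll the grouping, pivoting and zero-filling into a single call. A sketch assuming the same `cms` DataFrame with `loading_week` already added (`quantity_by_category_alt` is just an illustrative name):

```python
# Sketch: pivot_table combines the groupby, pivot and fillna(0) steps in one call.
quantity_by_category_alt = cms.pivot_table(
    index='loading_week',
    columns='product_category',
    values='quantity',
    aggfunc='sum',
    fill_value=0,
)
quantity_by_category_alt
```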
" ], "text/plain": [ "product_category Heavy-Sour Light-Sweet Medium-Sour Medium-Sweet\n", "loading_week \n", "9 0.0 4898641.0 1279220.0 0.0\n", "10 1292891.0 23657246.0 1068981.0 0.0\n", "11 1168289.0 31123847.0 4418532.0 0.0\n", "12 291878.0 27299960.0 1083332.0 0.0\n", "13 0.0 26522943.0 904399.0 0.0\n", "14 317250.0 16403216.0 775952.0 474686.0\n", "15 876933.0 29190571.0 1077061.0 0.0\n", "16 0.0 19109858.0 3100037.0 0.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.fillna(0)\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way of breaking down the exports data is by receiving country. " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "quantity_by_destination = cms.groupby('unloading_country').sum(numeric_only=True)[['quantity']]\n", "quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " quantity\n", "unloading_country \n", "Netherlands 25729101\n", "South Korea 20622091\n", "China 16348697\n", "Singapore 15721615\n", "United Kingdom 15540232" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_destination.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might not want data for all recipient countries in our data, so we can collect those besides the top 10 in an 'other' category." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# grab the top 10\n", "top_destination_countries = quantity_by_destination.head(10)\n", "\n", "# grab the bottom, sum their quantities and put into a DataFrame\n", "rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])\n", "rest.loc['Other'] = quantity_by_destination[10:].sum().values\n", "\n", "# join the two DataFrames\n", "top_destination_countries = pd.concat([top_destination_countries, rest]).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can add a new column containing the percentage of total exports represented by exports to that country:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
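Before adding the percentage column, note that the top-10 plus 'Other' table built above can also be written more compactly with `nlargest`. A sketch, using only the objects already defined (`top_destination_countries_alt` is an illustrative name):

```python
# Sketch: a more compact route to the top-10 + 'Other' breakdown.
top10 = quantity_by_destination['quantity'].nlargest(10)
other_total = quantity_by_destination['quantity'].sum() - top10.sum()
top_destination_countries_alt = pd.concat(
    [top10, pd.Series({'Other': other_total})]
).to_frame('quantity')
```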
" ], "text/plain": [ "                quantity      %\n", "Netherlands     25729101  13.10\n", "South Korea     20622091  10.50\n", "China           16348697   8.33\n", "Singapore       15721615   8.01\n", "United Kingdom  15540232   7.92\n", "Spain           14611630   7.44\n", "Canada          12589674   6.41\n", "India           11868801   6.05\n", "Malaysia         8395456   4.28\n", "France           6847915   3.49\n", "Other           48060511  24.48" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)\n", "top_destination_countries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: In the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }