{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 6\n", "\n", "## Video 28: Aggregating Cargo Movements Data\n", "**Python for the Energy Industry**\n", "\n", "The CargoMovements endpoint provides granular data about individual cargoes. By aggregating this data, we can produce 'macro' data describing the flows of products. That aggregation is the subject of this lesson.\n", "\n", "[Cargo Movements documentation](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# initial imports\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "from dateutil.relativedelta import relativedelta\n", "import vortexasdk as v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# datetimes to access last 7 weeks of data\n", "now = datetime.utcnow()\n", "seven_weeks_ago = now - relativedelta(weeks=7)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2020-12-10 19:31:56,428 vortexasdk.client — WARNING — You are using vortexasdk version 0.26.0, however version 0.27.1 is available.\n", "You should consider upgrading via the 'pip install vortexasdk --upgrade' command.\n" ] } ], "source": [ "# Find US ID\n", "us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]\n", "assert len(us) == 1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Find crude ID\n", "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n", "assert len(crude) == 1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, 
"outputs": [], "source": [ "# Columns to pull out, and shortened names\n", "required_columns = [\n", " # primary vessel info\n", " \"vessels.0.name\",\n", " \"vessels.0.vessel_class\",\n", " # product information and quantity\n", " \"product.group.label\",\n", " \"product.category.label\",\n", " \"quantity\",\n", " \"status\",\n", " # loading info\n", " \"events.cargo_port_load_event.0.location.port.label\",\n", " \"events.cargo_port_load_event.0.end_timestamp\",\n", " # unloading info\n", " \"events.cargo_port_unload_event.0.location.port.label\",\n", " \"events.cargo_port_unload_event.0.location.country.label\",\n", " \"events.cargo_port_unload_event.0.end_timestamp\",\n", "]\n", "\n", "new_labels = [\n", " \"vessel_name\",\n", " \"vessel_class\",\n", " \"product_group\",\n", " \"product_category\",\n", " \"quantity\",\n", " \"status\",\n", " \"loading_port\",\n", " \"loading_finish\",\n", " \"unloading_port\",\n", " \"unloading_country\",\n", " \"unloading_finish\",\n", "]\n", "\n", "relabel = dict(zip(required_columns,new_labels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the search for crude flows out of the US (and not destined for another US location) in the last 7 weeks. Convert this to a DataFrame." 
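The column-shortening pattern above (`dict(zip(required_columns, new_labels))` fed to `DataFrame.rename`) can be tried in isolation. A minimal, self-contained sketch using made-up column names and a toy one-row frame (not the SDK's real payload):

```python
import pandas as pd

# Hypothetical raw API-style column names and their shortened labels
required_columns = ["vessels.0.name", "product.group.label", "quantity"]
new_labels = ["vessel_name", "product_group", "quantity"]
relabel = dict(zip(required_columns, new_labels))

# A toy frame standing in for the CargoMovements search result
df = pd.DataFrame([["Example Vessel", "Crude", 500000]], columns=required_columns)
df = df.rename(relabel, axis=1)
print(list(df.columns))  # → ['vessel_name', 'product_group', 'quantity']
```

Because `rename` ignores labels that are not present, the same `relabel` dict can be reused even when a search returns only a subset of the requested columns.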
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "cms = v.CargoMovements().search(\n", " filter_activity = 'loading_end',\n", " filter_origins = us,\n", " exclude_destinations = us,\n", " filter_products = crude,\n", " filter_time_min = seven_weeks_ago,\n", " filter_time_max = now,\n", " cm_unit = 'b'\n", " ).to_df(columns=required_columns).rename(relabel,axis=1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "180" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many associated movements?\n", "len(cms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We add a new column to the DataFrame which gets the week of the year in which loading was completed:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use `groupby` to aggregate the data by week, and sum over the numeric data. This shows the total volume (in # of barrels) exported each week." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>quantity</th></tr>\n",
"    <tr><th>loading_week</th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>43</th><td>10466025</td></tr>\n",
"    <tr><th>44</th><td>15567944</td></tr>\n",
"    <tr><th>45</th><td>16264815</td></tr>\n",
"    <tr><th>46</th><td>18324452</td></tr>\n",
"    <tr><th>47</th><td>17003930</td></tr>\n",
"    <tr><th>48</th><td>18275876</td></tr>\n",
"    <tr><th>49</th><td>15558908</td></tr>\n",
"    <tr><th>50</th><td>11001331</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " quantity\n", "loading_week \n", "43 10466025\n", "44 15567944\n", "45 16264815\n", "46 18324452\n", "47 17003930\n", "48 18275876\n", "49 15558908\n", "50 11001331" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_quantity = cms.groupby('loading_week').sum()\n", "weekly_quantity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>loading_week</th><th>product_category</th><th>quantity</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>43</td><td>Light-Sweet</td><td>7775458</td></tr>\n",
"    <tr><th>1</th><td>43</td><td>Medium-Sour</td><td>2690567</td></tr>\n",
"    <tr><th>2</th><td>44</td><td>Light-Sour</td><td>156307</td></tr>\n",
"    <tr><th>3</th><td>44</td><td>Light-Sweet</td><td>15411637</td></tr>\n",
"    <tr><th>4</th><td>45</td><td>Light-Sweet</td><td>12888609</td></tr>\n",
"    <tr><th>5</th><td>45</td><td>Medium-Sour</td><td>2973678</td></tr>\n",
"    <tr><th>6</th><td>46</td><td>Light-Sweet</td><td>17338331</td></tr>\n",
"    <tr><th>7</th><td>46</td><td>Medium-Sour</td><td>986121</td></tr>\n",
"    <tr><th>8</th><td>47</td><td>Light-Sour</td><td>1121194</td></tr>\n",
"    <tr><th>9</th><td>47</td><td>Light-Sweet</td><td>15882736</td></tr>\n",
"    <tr><th>10</th><td>48</td><td>Light-Sweet</td><td>17518824</td></tr>\n",
"    <tr><th>11</th><td>48</td><td>Medium-Sour</td><td>757052</td></tr>\n",
"    <tr><th>12</th><td>49</td><td>Light-Sweet</td><td>13420357</td></tr>\n",
"    <tr><th>13</th><td>49</td><td>Medium-Sour</td><td>2138551</td></tr>\n",
"    <tr><th>14</th><td>50</td><td>Light-Sweet</td><td>10015210</td></tr>\n",
"    <tr><th>15</th><td>50</td><td>Medium-Sour</td><td>986121</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " loading_week product_category quantity\n", "0 43 Light-Sweet 7775458\n", "1 43 Medium-Sour 2690567\n", "2 44 Light-Sour 156307\n", "3 44 Light-Sweet 15411637\n", "4 45 Light-Sweet 12888609\n", "5 45 Medium-Sour 2973678\n", "6 46 Light-Sweet 17338331\n", "7 46 Medium-Sour 986121\n", "8 47 Light-Sour 1121194\n", "9 47 Light-Sweet 15882736\n", "10 48 Light-Sweet 17518824\n", "11 48 Medium-Sour 757052\n", "12 49 Light-Sweet 13420357\n", "13 49 Medium-Sour 2138551\n", "14 50 Light-Sweet 10015210\n", "15 50 Medium-Sour 986121" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = cms.groupby(by = ['loading_week','product_category']).sum().reset_index()\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>product_category</th><th>Light-Sour</th><th>Light-Sweet</th><th>Medium-Sour</th></tr>\n",
"    <tr><th>loading_week</th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>43</th><td>NaN</td><td>7775458.0</td><td>2690567.0</td></tr>\n",
"    <tr><th>44</th><td>156307.0</td><td>15411637.0</td><td>NaN</td></tr>\n",
"    <tr><th>45</th><td>NaN</td><td>12888609.0</td><td>2973678.0</td></tr>\n",
"    <tr><th>46</th><td>NaN</td><td>17338331.0</td><td>986121.0</td></tr>\n",
"    <tr><th>47</th><td>1121194.0</td><td>15882736.0</td><td>NaN</td></tr>\n",
"    <tr><th>48</th><td>NaN</td><td>17518824.0</td><td>757052.0</td></tr>\n",
"    <tr><th>49</th><td>NaN</td><td>13420357.0</td><td>2138551.0</td></tr>\n",
"    <tr><th>50</th><td>NaN</td><td>10015210.0</td><td>986121.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "product_category Light-Sour Light-Sweet Medium-Sour\n", "loading_week \n", "43 NaN 7775458.0 2690567.0\n", "44 156307.0 15411637.0 NaN\n", "45 NaN 12888609.0 2973678.0\n", "46 NaN 17338331.0 986121.0\n", "47 1121194.0 15882736.0 NaN\n", "48 NaN 17518824.0 757052.0\n", "49 NaN 13420357.0 2138551.0\n", "50 NaN 10015210.0 986121.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.pivot(index = 'loading_week', \n", " columns = 'product_category',\n", " values = 'quantity')\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is better. However, a number of elements are 'NaN'. Clearly, these represent an absence of any exports of that category in that week - so we replace these with zeroes." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>product_category</th><th>Light-Sour</th><th>Light-Sweet</th><th>Medium-Sour</th></tr>\n",
"    <tr><th>loading_week</th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>43</th><td>0.0</td><td>7775458.0</td><td>2690567.0</td></tr>\n",
"    <tr><th>44</th><td>156307.0</td><td>15411637.0</td><td>0.0</td></tr>\n",
"    <tr><th>45</th><td>0.0</td><td>12888609.0</td><td>2973678.0</td></tr>\n",
"    <tr><th>46</th><td>0.0</td><td>17338331.0</td><td>986121.0</td></tr>\n",
"    <tr><th>47</th><td>1121194.0</td><td>15882736.0</td><td>0.0</td></tr>\n",
"    <tr><th>48</th><td>0.0</td><td>17518824.0</td><td>757052.0</td></tr>\n",
"    <tr><th>49</th><td>0.0</td><td>13420357.0</td><td>2138551.0</td></tr>\n",
"    <tr><th>50</th><td>0.0</td><td>10015210.0</td><td>986121.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "product_category Light-Sour Light-Sweet Medium-Sour\n", "loading_week \n", "43 0.0 7775458.0 2690567.0\n", "44 156307.0 15411637.0 0.0\n", "45 0.0 12888609.0 2973678.0\n", "46 0.0 17338331.0 986121.0\n", "47 1121194.0 15882736.0 0.0\n", "48 0.0 17518824.0 757052.0\n", "49 0.0 13420357.0 2138551.0\n", "50 0.0 10015210.0 986121.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.fillna(0)\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way of breaking down the exports data is by receiving country. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "quantity_by_destination = cms.groupby('unloading_country').sum()[['quantity']]\n", "quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>quantity</th></tr>\n",
"    <tr><th>unloading_country</th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>India</th><td>17764529</td></tr>\n",
"    <tr><th>China</th><td>17334013</td></tr>\n",
"    <tr><th>Canada</th><td>13019738</td></tr>\n",
"    <tr><th>Netherlands</th><td>10866913</td></tr>\n",
"    <tr><th>United Kingdom</th><td>9787720</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " quantity\n", "unloading_country \n", "India 17764529\n", "China 17334013\n", "Canada 13019738\n", "Netherlands 10866913\n", "United Kingdom 9787720" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_destination.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might not want to track every recipient country individually, so we can collect those outside the top 10 into an 'Other' category." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# grab the top 10\n", "top_destination_countries = quantity_by_destination.head(10)\n", "\n", "# grab the bottom, sum their quantities and put into a DataFrame\n", "rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])\n", "rest.loc['Other'] = quantity_by_destination[10:].sum().values\n", "\n", "# join the two DataFrames\n", "top_destination_countries = pd.concat([top_destination_countries, rest])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can add a new column containing the percentage of total exports represented by exports to that country:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>quantity</th><th>%</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>India</th><td>17764529</td><td>15.55</td></tr>\n",
"    <tr><th>China</th><td>17334013</td><td>15.17</td></tr>\n",
"    <tr><th>Canada</th><td>13019738</td><td>11.40</td></tr>\n",
"    <tr><th>Netherlands</th><td>10866913</td><td>9.51</td></tr>\n",
"    <tr><th>United Kingdom</th><td>9787720</td><td>8.57</td></tr>\n",
"    <tr><th>France</th><td>5914144</td><td>5.18</td></tr>\n",
"    <tr><th>Germany</th><td>5415834</td><td>4.74</td></tr>\n",
"    <tr><th>South Korea</th><td>4871768</td><td>4.26</td></tr>\n",
"    <tr><th>Italy</th><td>4582142</td><td>4.01</td></tr>\n",
"    <tr><th>Taiwan</th><td>3728685</td><td>3.26</td></tr>\n",
"    <tr><th>Other</th><td>20950574</td><td>18.34</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " quantity %\n", "India 17764529 15.55\n", "China 17334013 15.17\n", "Canada 13019738 11.40\n", "Netherlands 10866913 9.51\n", "United Kingdom 9787720 8.57\n", "France 5914144 5.18\n", "Germany 5415834 4.74\n", "South Korea 4871768 4.26\n", "Italy 4582142 4.01\n", "Taiwan 3728685 3.26\n", "Other 20950574 18.34" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)\n", "top_destination_countries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: In the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }
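The exercise combines the two patterns shown in this lesson: `groupby` on two keys, then `pivot`. The shape of the solution can be prototyped on synthetic data before running it against the real Cargo Movements DataFrame. A minimal sketch with invented numbers (not Vortexa data):

```python
import pandas as pd

# Synthetic stand-in for the cms DataFrame built earlier in the lesson
cms = pd.DataFrame({
    "loading_week":      [43, 43, 44, 44, 45],
    "unloading_country": ["India", "China", "India", "China", "India"],
    "quantity":          [100, 200, 150, 250, 300],
})

# Sum quantities per (week, country), then pivot so each country gets a column;
# weeks with no exports to a country become NaN, which we replace with 0
weekly_by_destination = (
    cms.groupby(["loading_week", "unloading_country"])["quantity"].sum()
       .reset_index()
       .pivot(index="loading_week", columns="unloading_country", values="quantity")
       .fillna(0)
)
print(weekly_by_destination)
```

On the real data, you would first restrict `cms` to rows whose `unloading_country` is in the top 5 before grouping and pivoting.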