{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 6\n", "\n", "## Video 28: Aggregating Cargo Movements Data\n", "**Python for the Energy Industry**\n", "\n", "The CargoMovements endpoint provides granular data about individual cargoes. We can convert this data into 'macro' data about the flows of products by 'aggregating' this data. That is the subject of this lesson. \n", "\n", "[Cargo Movements documentation](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/js/1d4sm_gs33n2lr86b3p7pyvm0000gn/T/ipykernel_40860/3584211857.py:2: DeprecationWarning: \n", "Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),\n", "(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)\n", "but was not found to be installed on your system.\n", "If this would cause problems for you,\n", "please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466\n", " \n", " import pandas as pd\n" ] } ], "source": [ "# initial imports\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "from dateutil.relativedelta import relativedelta\n", "import vortexasdk as v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# datetimes to access last 7 weeks of data\n", "now = datetime.utcnow()\n", "seven_weeks_ago = now - relativedelta(weeks=7)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Find US ID\n", "us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]\n", "assert len(us) == 1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Find crude ID\n", "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n", "assert len(crude) == 1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Columns to pull out, and shortened names\n", "required_columns = [\n", " # primary vessel info\n", " \"vessels.0.name\",\n", " \"vessels.0.vessel_class\",\n", " # product information and quantity\n", " \"product.group.label\",\n", " \"product.category.label\",\n", " \"quantity\",\n", " \"status\",\n", " # loading info\n", " \"events.cargo_port_load_event.0.location.port.label\",\n", " \"events.cargo_port_load_event.0.end_timestamp\",\n", " # unloading info\n", " \"events.cargo_port_unload_event.0.location.port.label\",\n", " \"events.cargo_port_unload_event.0.location.country.label\",\n", " \"events.cargo_port_unload_event.0.end_timestamp\",\n", "]\n", "\n", "new_labels = [\n", " \"vessel_name\",\n", " \"vessel_class\",\n", " \"product_group\",\n", " \"product_category\",\n", " \"quantity\",\n", " \"status\",\n", " \"loading_port\",\n", " \"loading_finish\",\n", " \"unloading_port\",\n", " \"unloading_country\",\n", " \"unloading_finish\",\n", "]\n", "\n", "relabel = dict(zip(required_columns,new_labels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the search for crude flows out of the US (and not destined for another US location) in the last 7 weeks. 
Convert this to a DataFrame." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "cms = v.CargoMovements().search(\n", " filter_activity = 'loading_end',\n", " filter_origins = us,\n", " exclude_destinations = us,\n", " filter_products = crude,\n", " filter_time_min = seven_weeks_ago,\n", " filter_time_max = now,\n", " cm_unit = 'b'\n", " ).to_df(columns=required_columns).rename(relabel,axis=1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "277" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many associated movements?\n", "len(cms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We add a new column to the DataFrame which gets the week of the year in which loading was completed:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use `groupby` to aggregate the data by week, and sum over the numeric data. This shows the total volume (in # of barrels) exported each week." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
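One caveat worth noting when grouping on the week number alone: ISO week numbers restart at 1 every January, so a window that crosses a year boundary would fold two different weeks into the same bucket. A minimal sketch of a safer key, using only the columns already in `cms` (the `loading_year_week` name is just illustrative):

```python
# Sketch: bucket by ISO year *and* week, in case the 7-week window spans a year boundary.
iso = cms['loading_finish'].dt.isocalendar()
cms['loading_year_week'] = iso.year.astype(str) + '-W' + iso.week.astype(str).str.zfill(2)
cms.groupby('loading_year_week')[['quantity']].sum()
```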
" ], "text/plain": [ " quantity\n", "loading_week \n", "9 6177861\n", "10 26019118\n", "11 36710668\n", "12 28675170\n", "13 27427342\n", "14 17971104\n", "15 31144565\n", "16 22209895" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_quantity = cms.groupby('loading_week').sum(numeric_only=True)\n", "weekly_quantity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
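An aside on the two-key groupby used here: passing `as_index=False` makes pandas return the grouping keys as ordinary columns, which removes the need for a separate `reset_index()` call. A minimal sketch on the same `cms` DataFrame:

```python
# Sketch: as_index=False keeps loading_week and product_category as regular columns.
cms.groupby(['loading_week', 'product_category'], as_index=False)[['quantity']].sum()
```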
" ], "text/plain": [ " loading_week product_category quantity\n", "0 9 Light-Sweet 4898641\n", "1 9 Medium-Sour 1279220\n", "2 10 Heavy-Sour 1292891\n", "3 10 Light-Sweet 23657246\n", "4 10 Medium-Sour 1068981\n", "5 11 Heavy-Sour 1168289\n", "6 11 Light-Sweet 31123847\n", "7 11 Medium-Sour 4418532\n", "8 12 Heavy-Sour 291878\n", "9 12 Light-Sweet 27299960\n", "10 12 Medium-Sour 1083332\n", "11 13 Light-Sweet 26522943\n", "12 13 Medium-Sour 904399\n", "13 14 Heavy-Sour 317250\n", "14 14 Light-Sweet 16403216\n", "15 14 Medium-Sour 775952\n", "16 14 Medium-Sweet 474686\n", "17 15 Heavy-Sour 876933\n", "18 15 Light-Sweet 29190571\n", "19 15 Medium-Sour 1077061\n", "20 16 Light-Sweet 19109858\n", "21 16 Medium-Sour 3100037" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = cms.groupby(by = ['loading_week','product_category']).sum(numeric_only=True).reset_index()\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "product_category Heavy-Sour Light-Sweet Medium-Sour Medium-Sweet\n", "loading_week \n", "9 NaN 4898641.0 1279220.0 NaN\n", "10 1292891.0 23657246.0 1068981.0 NaN\n", "11 1168289.0 31123847.0 4418532.0 NaN\n", "12 291878.0 27299960.0 1083332.0 NaN\n", "13 NaN 26522943.0 904399.0 NaN\n", "14 317250.0 16403216.0 775952.0 474686.0\n", "15 876933.0 29190571.0 1077061.0 NaN\n", "16 NaN 19109858.0 3100037.0 NaN" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.pivot(index = 'loading_week', \n", " columns = 'product_category',\n", " values = 'quantity')\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is better. However, a number of elements are 'NaN'. Clearly, these represent an absence of any exports of that category in that week - so we replace these with zeroes." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
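As an aside, pandas' `pivot_table` can roll the grouping, pivoting and zero-filling into a single call. A sketch assuming the same `cms` DataFrame with `loading_week` already added (`quantity_by_category_alt` is just an illustrative name):

```python
# Sketch: pivot_table combines the groupby, pivot and fillna(0) steps in one call.
quantity_by_category_alt = cms.pivot_table(
    index='loading_week',
    columns='product_category',
    values='quantity',
    aggfunc='sum',
    fill_value=0,
)
quantity_by_category_alt
```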
" ], "text/plain": [ "product_category Heavy-Sour Light-Sweet Medium-Sour Medium-Sweet\n", "loading_week \n", "9 0.0 4898641.0 1279220.0 0.0\n", "10 1292891.0 23657246.0 1068981.0 0.0\n", "11 1168289.0 31123847.0 4418532.0 0.0\n", "12 291878.0 27299960.0 1083332.0 0.0\n", "13 0.0 26522943.0 904399.0 0.0\n", "14 317250.0 16403216.0 775952.0 474686.0\n", "15 876933.0 29190571.0 1077061.0 0.0\n", "16 0.0 19109858.0 3100037.0 0.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.fillna(0)\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way of breaking down the exports data is by receiving country. " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "quantity_by_destination = cms.groupby('unloading_country').sum(numeric_only=True)[['quantity']]\n", "quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " quantity\n", "unloading_country \n", "Netherlands 25729101\n", "South Korea 20622091\n", "China 16348697\n", "Singapore 15721615\n", "United Kingdom 15540232" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_destination.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might not want data for all recipient countries in our data, so we can collect those besides the top 10 in an 'other' category." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# grab the top 10\n", "top_destination_countries = quantity_by_destination.head(10)\n", "\n", "# grab the bottom, sum their quantities and put into a DataFrame\n", "rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])\n", "rest.loc['Other'] = quantity_by_destination[10:].sum().values\n", "\n", "# join the two DataFrames\n", "top_destination_countries = pd.concat([top_destination_countries, rest]).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can add a new column containing the percentage of total exports represented by exports to that country:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
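Before adding the percentage column, note that the top-10 plus 'Other' table built above can also be written more compactly with `nlargest`. A sketch, using only the objects already defined (`top_destination_countries_alt` is an illustrative name):

```python
# Sketch: a more compact route to the top-10 + 'Other' breakdown.
top10 = quantity_by_destination['quantity'].nlargest(10)
other_total = quantity_by_destination['quantity'].sum() - top10.sum()
top_destination_countries_alt = pd.concat(
    [top10, pd.Series({'Other': other_total})]
).to_frame('quantity')
```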
" ], "text/plain": [ "                quantity      %\n", "Netherlands     25729101  13.10\n", "South Korea     20622091  10.50\n", "China           16348697   8.33\n", "Singapore       15721615   8.01\n", "United Kingdom  15540232   7.92\n", "Spain           14611630   7.44\n", "Canada          12589674   6.41\n", "India           11868801   6.05\n", "Malaysia         8395456   4.28\n", "France           6847915   3.49\n", "Other           48060511  24.48" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)\n", "top_destination_countries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: In the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Make a pivoted table showing how exports to each of the top 5 destination countries varied over the 7 weeks. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }