{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 6\n", "\n", "## Video 28: Aggregating Cargo Movements Data\n", "**Python for the Energy Industry**\n", "\n", "The CargoMovements endpoint provides granular data about individual cargoes. We can convert this data into 'macro' data about the flows of products by 'aggregating' this data. That is the subject of this lesson. \n", "\n", "[Cargo Movements documentation](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:22.266492Z", "iopub.status.busy": "2021-07-13T11:16:22.265679Z", "iopub.status.idle": "2021-07-13T11:16:22.711332Z", "shell.execute_reply": "2021-07-13T11:16:22.711866Z" } }, "outputs": [], "source": [ "# initial imports\n", "import pandas as pd\n", "import numpy as np\n", "from datetime import datetime\n", "from dateutil.relativedelta import relativedelta\n", "import vortexasdk as v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by storing the datetimes and IDs needed to filter for crude flows out of the US in the last 7 weeks." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:22.716012Z", "iopub.status.busy": "2021-07-13T11:16:22.715340Z", "iopub.status.idle": "2021-07-13T11:16:22.717118Z", "shell.execute_reply": "2021-07-13T11:16:22.717603Z" } }, "outputs": [], "source": [ "# datetimes to access last 7 weeks of data\n", "now = datetime.utcnow()\n", "seven_weeks_ago = now - relativedelta(weeks=7)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:22.721518Z", "iopub.status.busy": "2021-07-13T11:16:22.720870Z", "iopub.status.idle": "2021-07-13T11:16:23.397523Z", "shell.execute_reply": "2021-07-13T11:16:23.398001Z" } }, "outputs": [], "source": [ "# Find US ID\n", "us = [g.id for g in v.Geographies().search('united states').to_list() if 'country' in g.layer]\n", "assert len(us) == 1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:23.402232Z", "iopub.status.busy": "2021-07-13T11:16:23.401638Z", "iopub.status.idle": "2021-07-13T11:16:23.953026Z", "shell.execute_reply": "2021-07-13T11:16:23.953539Z" } }, "outputs": [], "source": [ "# Find crude ID\n", "crude = [p.id for p in v.Products().search('crude').to_list() if p.name=='Crude']\n", "assert len(crude) == 1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:23.958532Z", "iopub.status.busy": "2021-07-13T11:16:23.957914Z", "iopub.status.idle": "2021-07-13T11:16:23.959671Z", "shell.execute_reply": "2021-07-13T11:16:23.960077Z" } }, "outputs": [], "source": [ "# Columns to pull out, and shortened names\n", "required_columns = [\n", " # primary vessel info\n", " \"vessels.0.name\",\n", " \"vessels.0.vessel_class\",\n", " # product information and quantity\n", " \"product.group.label\",\n", " \"product.category.label\",\n", " \"quantity\",\n", " \"status\",\n", " # loading info\n", " \"events.cargo_port_load_event.0.location.port.label\",\n", " \"events.cargo_port_load_event.0.end_timestamp\",\n", " # unloading info\n", " \"events.cargo_port_unload_event.0.location.port.label\",\n", " \"events.cargo_port_unload_event.0.location.country.label\",\n", " \"events.cargo_port_unload_event.0.end_timestamp\",\n", "]\n", "\n", "new_labels = [\n", " \"vessel_name\",\n", " \"vessel_class\",\n", " \"product_group\",\n", " \"product_category\",\n", " \"quantity\",\n", " \"status\",\n", " \"loading_port\",\n", " \"loading_finish\",\n", " \"unloading_port\",\n", " \"unloading_country\",\n", " \"unloading_finish\",\n", "]\n", "\n", "relabel = dict(zip(required_columns,new_labels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do the search for crude flows out of the US (and not destined for another US location) in the last 7 weeks. Convert this to a DataFrame." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:23.964580Z", "iopub.status.busy": "2021-07-13T11:16:23.963819Z", "iopub.status.idle": "2021-07-13T11:16:25.013849Z", "shell.execute_reply": "2021-07-13T11:16:25.014305Z" } }, "outputs": [], "source": [ "cms = v.CargoMovements().search(\n", " filter_activity = 'loading_end',\n", " filter_origins = us,\n", " exclude_destinations = us,\n", " filter_products = crude,\n", " filter_time_min = seven_weeks_ago,\n", " filter_time_max = now,\n", " cm_unit = 'b'\n", " ).to_df(columns=required_columns).rename(relabel,axis=1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.024840Z", "iopub.status.busy": "2021-07-13T11:16:25.024142Z", "iopub.status.idle": "2021-07-13T11:16:25.027587Z", "shell.execute_reply": "2021-07-13T11:16:25.027997Z" } }, "outputs": [ { "data": { "text/plain": [ "228" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many associated movements?\n", "len(cms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We add a new column to the DataFrame which gets the week of the year in which loading was completed:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.033910Z", "iopub.status.busy": "2021-07-13T11:16:25.032650Z", "iopub.status.idle": "2021-07-13T11:16:25.035670Z", "shell.execute_reply": "2021-07-13T11:16:25.036076Z" } }, "outputs": [], "source": [ "cms['loading_week'] = cms['loading_finish'].dt.isocalendar().week" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use `groupby` to aggregate the data by week, and sum over the numeric data. This shows the total volume (in # of barrels) exported each week." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.040774Z", "iopub.status.busy": "2021-07-13T11:16:25.040183Z", "iopub.status.idle": "2021-07-13T11:16:25.048212Z", "shell.execute_reply": "2021-07-13T11:16:25.048640Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quantity
loading_week
2116203392
2219765442
2321530598
2421794264
2525614818
2619651398
2724615334
285623297
\n", "
" ], "text/plain": [ " quantity\n", "loading_week \n", "21 16203392\n", "22 19765442\n", "23 21530598\n", "24 21794264\n", "25 25614818\n", "26 19651398\n", "27 24615334\n", "28 5623297" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekly_quantity = cms.groupby('loading_week').sum()\n", "weekly_quantity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also break this information down further. Let's say we wanted to know the weekly exported quantities of each category of crude separately. It's possible to group by multiple categories:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.059122Z", "iopub.status.busy": "2021-07-13T11:16:25.052952Z", "iopub.status.idle": "2021-07-13T11:16:25.061677Z", "shell.execute_reply": "2021-07-13T11:16:25.062092Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
loading_weekproduct_categoryquantity
021Light-Sweet14457314
121Medium-Sour1557064
222Light-Sweet18779321
322Medium-Sour986121
423Heavy-Sour1584078
523Light-Sweet17238857
623Medium-Sour1959945
724Heavy-Sour483880
824Light-Sour1443458
924Light-Sweet19866926
1025Heavy-Sour298671
1125Light-Sweet24730561
1225Medium-Sour357009
1326Light-Sour815856
1426Light-Sweet14701277
1526Medium-Sour3447476
1627Light-Sweet23007185
1727Medium-Sour1275061
1828Light-Sweet5623297
\n", "
" ], "text/plain": [ " loading_week product_category quantity\n", "0 21 Light-Sweet 14457314\n", "1 21 Medium-Sour 1557064\n", "2 22 Light-Sweet 18779321\n", "3 22 Medium-Sour 986121\n", "4 23 Heavy-Sour 1584078\n", "5 23 Light-Sweet 17238857\n", "6 23 Medium-Sour 1959945\n", "7 24 Heavy-Sour 483880\n", "8 24 Light-Sour 1443458\n", "9 24 Light-Sweet 19866926\n", "10 25 Heavy-Sour 298671\n", "11 25 Light-Sweet 24730561\n", "12 25 Medium-Sour 357009\n", "13 26 Light-Sour 815856\n", "14 26 Light-Sweet 14701277\n", "15 26 Medium-Sour 3447476\n", "16 27 Light-Sweet 23007185\n", "17 27 Medium-Sour 1275061\n", "18 28 Light-Sweet 5623297" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = cms.groupby(by = ['loading_week','product_category']).sum().reset_index()\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is not the most intuitive way of displaying the data. We can 'pivot' the table, so that each product category gets its own column:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.073096Z", "iopub.status.busy": "2021-07-13T11:16:25.067581Z", "iopub.status.idle": "2021-07-13T11:16:25.075476Z", "shell.execute_reply": "2021-07-13T11:16:25.075877Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_categoryHeavy-SourLight-SourLight-SweetMedium-Sour
loading_week
21NaNNaN14457314.01557064.0
22NaNNaN18779321.0986121.0
231584078.0NaN17238857.01959945.0
24483880.01443458.019866926.0NaN
25298671.0NaN24730561.0357009.0
26NaN815856.014701277.03447476.0
27NaNNaN23007185.01275061.0
28NaNNaN5623297.0NaN
\n", "
" ], "text/plain": [ "product_category Heavy-Sour Light-Sour Light-Sweet Medium-Sour\n", "loading_week \n", "21 NaN NaN 14457314.0 1557064.0\n", "22 NaN NaN 18779321.0 986121.0\n", "23 1584078.0 NaN 17238857.0 1959945.0\n", "24 483880.0 1443458.0 19866926.0 NaN\n", "25 298671.0 NaN 24730561.0 357009.0\n", "26 NaN 815856.0 14701277.0 3447476.0\n", "27 NaN NaN 23007185.0 1275061.0\n", "28 NaN NaN 5623297.0 NaN" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.pivot(index = 'loading_week', \n", " columns = 'product_category',\n", " values = 'quantity')\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is better. However, a number of elements are 'NaN'. Clearly, these represent an absence of any exports of that category in that week - so we replace these with zeroes." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.085179Z", "iopub.status.busy": "2021-07-13T11:16:25.084471Z", "iopub.status.idle": "2021-07-13T11:16:25.086866Z", "shell.execute_reply": "2021-07-13T11:16:25.087264Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
product_categoryHeavy-SourLight-SourLight-SweetMedium-Sour
loading_week
210.00.014457314.01557064.0
220.00.018779321.0986121.0
231584078.00.017238857.01959945.0
24483880.01443458.019866926.00.0
25298671.00.024730561.0357009.0
260.0815856.014701277.03447476.0
270.00.023007185.01275061.0
280.00.05623297.00.0
\n", "
" ], "text/plain": [ "product_category Heavy-Sour Light-Sour Light-Sweet Medium-Sour\n", "loading_week \n", "21 0.0 0.0 14457314.0 1557064.0\n", "22 0.0 0.0 18779321.0 986121.0\n", "23 1584078.0 0.0 17238857.0 1959945.0\n", "24 483880.0 1443458.0 19866926.0 0.0\n", "25 298671.0 0.0 24730561.0 357009.0\n", "26 0.0 815856.0 14701277.0 3447476.0\n", "27 0.0 0.0 23007185.0 1275061.0\n", "28 0.0 0.0 5623297.0 0.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_category = quantity_by_category.fillna(0)\n", "quantity_by_category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way of breaking down the exports data is by receiving country. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.092463Z", "iopub.status.busy": "2021-07-13T11:16:25.091892Z", "iopub.status.idle": "2021-07-13T11:16:25.095115Z", "shell.execute_reply": "2021-07-13T11:16:25.095523Z" } }, "outputs": [], "source": [ "quantity_by_destination = cms.groupby('unloading_country').sum()[['quantity']]\n", "quantity_by_destination.sort_values(by='quantity',ascending = False, inplace=True)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.100828Z", "iopub.status.busy": "2021-07-13T11:16:25.100126Z", "iopub.status.idle": "2021-07-13T11:16:25.102742Z", "shell.execute_reply": "2021-07-13T11:16:25.103160Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quantity
unloading_country
South Korea19482797
India15210800
Canada11880161
Netherlands11127371
United Kingdom10571957
\n", "
" ], "text/plain": [ " quantity\n", "unloading_country \n", "South Korea 19482797\n", "India 15210800\n", "Canada 11880161\n", "Netherlands 11127371\n", "United Kingdom 10571957" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quantity_by_destination.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might not want data for all recipient countries in our data, so we can collect those besides the top 10 in an 'other' category." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.109191Z", "iopub.status.busy": "2021-07-13T11:16:25.108684Z", "iopub.status.idle": "2021-07-13T11:16:25.110306Z", "shell.execute_reply": "2021-07-13T11:16:25.110652Z" } }, "outputs": [], "source": [ "# grab the top 10\n", "top_destination_countries = quantity_by_destination.head(10)\n", "\n", "# grab the bottom, sum their quantities and put into a DataFrame\n", "rest = pd.DataFrame(index = ['Other'], columns = ['quantity'])\n", "rest.loc['Other'] = quantity_by_destination[10:].sum().values\n", "\n", "# join the two DataFrames\n", "top_destination_countries = pd.concat([top_destination_countries, rest]).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can add a new column containing the percentage of total exports represented by exports to that country:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2021-07-13T11:16:25.117100Z", "iopub.status.busy": "2021-07-13T11:16:25.113869Z", "iopub.status.idle": "2021-07-13T11:16:25.119555Z", "shell.execute_reply": "2021-07-13T11:16:25.119116Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quantity%
South Korea1948279714.79
India1521080011.54
Canada118801619.02
Netherlands111273718.44
United Kingdom105719578.02
China86291156.55
Italy67160115.10
France62866694.77
Taiwan62435914.74
Germany43385563.29
Other3127861323.74
\n", "
" ], "text/plain": [ " quantity %\n", "South Korea 19482797 14.79\n", "India 15210800 11.54\n", "Canada 11880161 9.02\n", "Netherlands 11127371 8.44\n", "United Kingdom 10571957 8.02\n", "China 8629115 6.55\n", "Italy 6716011 5.10\n", "France 6286669 4.77\n", "Taiwan 6243591 4.74\n", "Germany 4338556 3.29\n", "Other 31278613 23.74" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_destination_countries['%'] = round(top_destination_countries['quantity']*100 / top_destination_countries['quantity'].sum(),2)\n", "top_destination_countries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: In the next module, we will see how the Cargo Time Series endpoint offers another approach to this same kind of analysis*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "\n", "Make a pivoted table showing how exports to each of the top 5 desination countries varied over the 7 weeks. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }