{
"cells": [
{
"cell_type": "markdown",
"id": "c69a3906",
"metadata": {},
"source": [
"# Vortexa voyages SDK use case\n",
"This notebook contains code and instructions for generating use cases of Vortexa's Voyages dataset."
]
},
{
"cell_type": "markdown",
"id": "8aee8ec8",
"metadata": {},
"source": [
"## Import required libraries\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c2cc71c4",
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-31T16:52:54.036744Z",
"start_time": "2024-05-31T16:52:53.145498Z"
}
},
"outputs": [],
"source": [
"import vortexasdk as v\n",
"from datetime import datetime\n",
"import pandas as pd\n",
"import numpy as np\n",
"import time\n",
"import plotly.express as px\n",
"import dateutil.relativedelta"
]
},
{
"cell_type": "markdown",
"id": "7aeadb42",
"metadata": {},
"source": [
"## Store Vortexa IDs\n",
"Vortexa's SDK interprets vessels, products and geographies using unique IDs rather than names. The code below demonstrates how to search and save various Vortexa reference IDs."
]
},
{
"cell_type": "markdown",
"id": "d2aeaacf",
"metadata": {},
"source": [
"### Geographies\n",
"Search for geography ids (remove hashtags to search)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "17e1a76a",
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-31T16:52:54.620058Z",
"start_time": "2024-05-31T16:52:54.614789Z"
}
},
"outputs": [],
"source": [
"# full_length_df = v.Geographies().search(term=[\"Mexico East\"]).to_df()\n",
"# print(full_length_df.to_string(index=False))\n",
"\n",
"# Store geography ids\n",
"gom='37c8c4eeb730d1cd41f90ca6bf95c923222b0734b1b0336a475acce821f87ebd'\n",
"nwe='c5460c5a4ece7b64ffc0cc280aeade60d364423e8e062ef4a11494352fe6fdbb'\n",
"usac='2d8f42426b74af03caa9055df1952d22a011f2a210b53b9132955a89fc552433'\n"
]
},
{
"cell_type": "markdown",
"id": "59977f33",
"metadata": {},
"source": [
"### Products\n",
"Search for product ids (remove hashtags to search)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5d1001f5",
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-31T16:52:55.703310Z",
"start_time": "2024-05-31T16:52:55.699882Z"
}
},
"outputs": [],
"source": [
"# product_search = v.Products().search(term=['diesel']).to_df()\n",
"# print (product_search.to_string(index=False))\n",
"\n",
"cpp='b68cbb746f8b9098c50e2ba36bcad83001a53bd362e9031fb49085d02c36659c'\n",
"lpg='364ccbb996c944055b479810a8e74863267885dc1b01407cb0f00ab26dafe1e1'"
]
},
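{
"cell_type": "markdown",
"id": "3fa1b2c0",
"metadata": {},
"source": [
"### Vessels\n",
"Vessels can be looked up in the same way. The sketch below is commented out like the searches above; the search term is an assumption for illustration only. Vessel classes (e.g. the MR2 class used in the worked example later) can also be passed to voyage searches by name."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d4e5f6a",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: search for vessels by name and inspect the matches (term is illustrative)\n",
"# vessel_search = v.Vessels().search(term=['Eagle']).to_df()\n",
"# print(vessel_search.to_string(index=False))"
]
},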
{
"cell_type": "markdown",
"id": "858554fe",
"metadata": {},
"source": [
"## Define main functions\n",
"The below code defines the functions which process Vortexa data into a format which can be visualised."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "535cc41e",
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-31T16:52:57.433884Z",
"start_time": "2024-05-31T16:52:57.351081Z"
}
},
"outputs": [],
"source": [
"# Function for post ballast distirbution\n",
"def post_ballast_distribution(origin, origin_excl, destination, destination_excl, vessel_class, product, product_excl, start_y, start_m, start_d, end_y, end_m, end_d, show_top_x, plot, option):\n",
"\n",
" # set date objects\n",
" start=datetime(start_y, start_m, start_d)\n",
" end=datetime(end_y, end_m, end_d, 23, 59, 59)\n",
" \n",
" # Pull the laden voyages which occurred in the required timeframe\n",
" route = v.VoyagesSearchEnriched().search(\n",
" origins = origin,\n",
" origins_excluded = origin_excl,\n",
" destinations = destination,\n",
" destinations_excluded = destination_excl,\n",
" time_min = start,\n",
" time_max = end,\n",
" vessels = vessel_class,\n",
" products = product,\n",
" products_excluded = product_excl\n",
" )\n",
" \n",
" # Convert to dataframe\n",
" route = pd.DataFrame(route)\n",
" \n",
" # Sort by end_timestamp\n",
" route[\"end_timestamp\"] = pd.to_datetime(route[\"end_timestamp\"])\n",
" route.sort_values(by='end_timestamp', ascending = True, inplace=True)\n",
" \n",
" # Remove null end_timestamps\n",
" route.drop(route[pd.isnull(route['end_timestamp']) == True].index, inplace = True)\n",
" \n",
" # Remove voyages that end past the specified end date\n",
" route = route[(pd.to_datetime(route['end_timestamp']).dt.tz_convert(None) <= pd.to_datetime(end))]\n",
" \n",
" # Remove voyages still in progress (i.e. voyages with no next voyage ID)\n",
" route = route.dropna(subset=['next_voyage_id'])\n",
"\n",
" # Get the next voyage IDs\n",
" next_voyage_id_list = list(route[\"next_voyage_id\"].unique())\n",
" next_voyage_id_list=[x for x in next_voyage_id_list if x != '']\n",
" \n",
" # Get voyages corresponding to the next voyage IDs\n",
" df = v.VoyagesSearchEnriched().search(\n",
" voyage_id = next_voyage_id_list,\n",
" columns = \"all\").to_df()\n",
"\n",
" # Sort them by their start dates (end date of laden voyage/discharge date)\n",
" df[\"START DATE\"] = pd.to_datetime(df[\"START DATE\"])\n",
" df.sort_values(by='START DATE', ascending = True, inplace=True)\n",
" \n",
" # Relabel blank destinations as Undetermined\n",
" df['FINAL DESTINATION SHIPPING REGION']=df['FINAL DESTINATION SHIPPING REGION'].replace([''],'Undetermined')\n",
"\n",
" # Remove laden results\n",
" df=df.loc[df[\"VOYAGE STATUS\"] == 'Ballast']\n",
" \n",
" df.reset_index(drop=True, inplace=True)\n",
" \n",
" # Store the unique destinations\n",
" dests = list(df[\"FINAL DESTINATION SHIPPING REGION\"].unique())\n",
" \n",
" dest_counts = []\n",
" # Count the number of times each ballast destination is declared\n",
" for i in range(len(dests)):\n",
" g = len(df.loc[df['FINAL DESTINATION SHIPPING REGION'] == dests[i]])\n",
" dest_counts.append(g)\n",
"\n",
" # convert counts and destinations list to data frames\n",
" dests = pd.DataFrame(dests)\n",
" dest_counts = pd.DataFrame(dest_counts)\n",
" \n",
" # compile unique destinations and their counts\n",
" ranked = pd.concat([dests, dest_counts], axis = 1)\n",
" ranked.columns = ['Destination', 'Count']\n",
" \n",
" # Sort destinations by highest count\n",
" ranked.sort_values(by='Count', ascending = False, inplace=True)\n",
" \n",
" # Get a list of ranked destinations\n",
" dests = list(ranked[\"Destination\"])\n",
" \n",
" # Convert dates of ballast voyages to months and years for counting purposes\n",
" df[\"months\"] = df['START DATE'].dt.strftime('%m-%Y')\n",
" \n",
" # Get a complete list of dates in month/year format\n",
" dates = list(pd.date_range(start=start, end=end, freq='MS').strftime('%m-%Y'))\n",
" dates_df=pd.DataFrame(dates, columns=['Date'])\n",
" \n",
" # Initialise a data frame for dates\n",
" raw_counts_df=dates_df\n",
" \n",
" # Loop through all destinations\n",
" for j in range(len(dests)):\n",
" \n",
" # initialise a list to store counts\n",
" counts2=[]\n",
" \n",
" # loop through dates\n",
" for i in range(len(dates)):\n",
" \n",
" # count destination occurrences for this date\n",
" g = len(df[(df['FINAL DESTINATION SHIPPING REGION'] == dests[j]) & (df['months'] == dates[i])])\n",
" \n",
" # add to list\n",
" counts2.append(g)\n",
" \n",
" # convert counts to data frame and label it with corresponding destination\n",
" counts2_df=pd.DataFrame(counts2, columns=[dests[j]])\n",
" \n",
" # add counts for this destination to data frame\n",
" raw_counts_df=pd.concat([raw_counts_df, counts2_df], axis=1)\n",
" \n",
" # select count values\n",
" raw_count_vals=raw_counts_df[list(raw_counts_df.columns)[1:]]\n",
" \n",
" # convert counts to percentages\n",
" df_props = raw_count_vals.div(raw_count_vals.sum(axis=1), axis=0)\n",
" \n",
" # add dates to proportions\n",
" df_props=pd.concat([dates_df, df_props], axis=1)\n",
" \n",
" # If you wish to only see the top x destinations, put the rest into 'other'\n",
" if (len(list(raw_counts_df.columns))>(show_top_x + 1)): # if more than x breakdown labels, create another column - can change if required\n",
"\n",
" # Store first x columns\n",
" first_x=list(raw_counts_df.columns)[:(show_top_x + 1)]\n",
"\n",
" # Store the others\n",
" rest=list(raw_counts_df.columns)[(show_top_x + 1):]\n",
"\n",
" # Sum the others\n",
" raw_counts_df['other']=raw_counts_df[rest].sum(axis=1) # other column is sum of everything not in top x\n",
"\n",
" raw_counts_df2=raw_counts_df[first_x + ['other']] # compile\n",
"\n",
" # If you want all split properties to show, set show_top_x to a large number and no 'other' category will be made\n",
" else:\n",
" raw_counts_df2=raw_counts_df\n",
" \n",
" # If you wish to only see the top x destinations, put the rest into 'other'\n",
" if (len(list(df_props.columns))>(show_top_x + 1)): # if more than x breakdown labels, create another column - can change if required\n",
"\n",
" # Store first x columns\n",
" first_x=list(df_props.columns)[:(show_top_x + 1)]\n",
"\n",
" # Store the others\n",
" rest=list(df_props.columns)[(show_top_x + 1):]\n",
"\n",
" # Sum the others\n",
" df_props['other']=df_props[rest].sum(axis=1) # other column is sum of everything not in top x\n",
"\n",
" df_props2=df_props[first_x + ['other']] # compile\n",
"\n",
" # If you want all split properties to show, set show_top_x to a large number and no 'other' category will be made\n",
" else:\n",
" df_props2=df_props\n",
" \n",
" df_props2=df_props2.copy()\n",
" raw_counts_df2=raw_counts_df2.copy()\n",
" \n",
" df_props2['Date']=pd.to_datetime(df_props2['Date'], format='%m-%Y')\n",
" raw_counts_df2['Date']=pd.to_datetime(raw_counts_df2['Date'], format='%m-%Y')\n",
"\n",
" if plot:\n",
" \n",
" if option=='counts':\n",
" \n",
" # Plot ballast distribution data (counts)\n",
" fig = px.bar(\n",
" raw_counts_df2, \n",
" x=\"Date\", \n",
" y=list(raw_counts_df2.columns)[1:],\n",
" labels={\n",
" \"Date\":\"Date\",\n",
" \"value\":\"Number of voyages\"\n",
" }\n",
" )\n",
" fig.update_layout(xaxis_rangeslider_visible = True)\n",
" fig.show()\n",
" \n",
" if option=='proportions':\n",
" \n",
" # Plot ballast distribution data (proportions)\n",
" fig = px.area(\n",
" df_props2, \n",
" x=\"Date\", \n",
" y=list(df_props2.columns)[1:],\n",
" labels={\n",
" \"Date\":\"Date\",\n",
" \"value\":\"Proportion of voyages\"\n",
" }\n",
" )\n",
" fig.update_layout(xaxis_rangeslider_visible = True)\n",
" fig.show()\n",
" \n",
" \n",
" raw_counts_df2['Date']=raw_counts_df2['Date'].dt.strftime('%b-%Y')\n",
" df_props2['Date']=df_props2['Date'].dt.strftime('%b-%Y')\n",
"\n",
" \n",
" return raw_counts_df2, df_props2, df\n",
"\n",
"\n",
"# Helper function to make time blocks of 4 years from a specified start date\n",
"def get_search_blocks(start_y, start_m, start_d, today):\n",
" \n",
" \"\"\"\n",
" Vortexa's API maximum search is 4 years and starts in 2016. \n",
" This function creates a list of tuples splitting up start_date - present into 4-year blocks.\n",
" \"\"\"\n",
" \n",
" blocks=[]\n",
" \n",
" start=datetime(start_y, start_m, start_d)\n",
" end=start + dateutil.relativedelta.relativedelta(years=4) - dateutil.relativedelta.relativedelta(seconds=1)\n",
" \n",
" if end > today:\n",
" blocks.append((start, today))\n",
" \n",
" else:\n",
" blocks.append((start, end))\n",
" \n",
" while end < today:\n",
" start+=dateutil.relativedelta.relativedelta(years=4) \n",
" end+=dateutil.relativedelta.relativedelta(years=4)\n",
" \n",
" if end > today:\n",
" blocks.append((start, today))\n",
" \n",
" else: \n",
" blocks.append((start, end))\n",
" \n",
" \n",
" return blocks\n",
"\n",
"\n",
"# Function for aggregating voyages data and splitting\n",
"def voyages_time_series_with_split(start_y, start_m, start_d, end_y, end_m, end_d, origin, destination, locs, prod, prod_excl, vessel_class, vessel_class_excl, status, freq, option, operator, title, split, plot, plot_type, show_top_x):\n",
" \n",
" today=datetime(end_y, end_m, end_d)\n",
" search_blocks=get_search_blocks(start_y, start_m, start_d, today)\n",
" \n",
" result_dfs=pd.DataFrame()\n",
"\n",
" for block in search_blocks:\n",
"\n",
" time_min=block[0]\n",
" time_max=block[1]\n",
" print(f\"Downloading {option} for period: {time_min} to {time_max}\")\n",
"\n",
" \n",
" # Original query\n",
" result = v.VoyagesTimeseries().search(\n",
" time_min=time_min,\n",
" time_max=time_max,\n",
" origins=origin,\n",
" destinations=destination,\n",
" locations=locs,\n",
" latest_products=prod,\n",
" latest_products_excluded=prod_excl,\n",
" vessels=vessel_class,\n",
" vessels_excluded=vessel_class_excl,\n",
" voyage_status=status,\n",
" breakdown_property=option,\n",
" breakdown_frequency=freq,\n",
" breakdown_split_property=split,\n",
" breakdown_unit_operator=operator,\n",
" ).to_df(columns='all')\n",
"\n",
" # If you wish to split, process the data as follows\n",
" if split != None:\n",
"\n",
" # Break the output down into k data frames, all with date, id, label, value and count columns\n",
" # Stack these on top of each other\n",
"\n",
" breakdown_cols=list(result.columns)[3:]\n",
" cols=['key']+breakdown_cols\n",
" k=int(len(breakdown_cols) / 4)\n",
" result2=result[cols]\n",
"\n",
" # Empty data frame for stacking\n",
" stack=pd.DataFrame()\n",
"\n",
" # Loop through each split property\n",
" for i in range(k):\n",
"\n",
" cols=['key', f'breakdown.{i}.id', f'breakdown.{i}.label', f'breakdown.{i}.value', f'breakdown.{i}.count']\n",
"\n",
" temp=result2[cols]\n",
"\n",
" new_cols=['date', 'id', 'label', 'value', 'count']\n",
"\n",
" temp.columns=new_cols\n",
"\n",
" stack=pd.concat([stack, temp])\n",
"\n",
" # Choose relevant columns from the stacked data frame \n",
" stack2=stack[['date', 'label', 'value']]\n",
"\n",
" # Remove rows with blank labels\n",
" # These are for regions where a 0 value will show, we deal with this later\n",
" result3=stack2[stack2['label']!='']\n",
"\n",
" # Sum each split property and rank them to obtain an order for the data to appear in\n",
" result3=result3.copy()\n",
" result3['value'] = pd.to_numeric(result3['value'])\n",
" sum_per_label=result3.groupby('label')['value'].sum().reset_index()\n",
" sum_per_label.sort_values(by='value', ascending=False, inplace=True)\n",
" labels=list(sum_per_label['label'].unique()) # we use this order\n",
"\n",
" # Sort the result first by split property and then by date\n",
" # This helps us to re-transpose the data later\n",
" result3=result3.sort_values(by=['label', 'date']).copy()\n",
"\n",
" # Create and sort a dates data frame\n",
" dates_df=pd.DataFrame(result3['date'].unique(), columns=['date'])\n",
" dates_df['date']=pd.to_datetime(dates_df['date'])\n",
" dates_df.sort_values(by='date', ascending=True, inplace=True)\n",
"\n",
" # Empty data frame to store split properties' corresponding columns\n",
" store_df=pd.DataFrame()\n",
"\n",
" # First loop through each split property\n",
" for i in range(len(labels)):\n",
"\n",
" # Empty list to store values\n",
" values=[]\n",
"\n",
" # Temporary data frame to work with (only for current split property)\n",
" temp_df=result3[result3['label']==labels[i]]\n",
"\n",
" # Now loop through each date in the temporary data\n",
" for j in range(len(dates_df['date'])):\n",
"\n",
" # Obtain record for date in question\n",
" check=temp_df[temp_df['date']==dates_df['date'][j]]\n",
" \n",
" # If no record, add 0.0 as the value for that split property on that date\n",
" if len(check)==0:\n",
" values.append(0.0)\n",
"\n",
" # If record exists, add its value\n",
" else:\n",
" values.append(check['value'].iloc[0])\n",
"\n",
" # Compile\n",
" values_df=pd.DataFrame(values, columns=[labels[i]])\n",
" store_df=pd.concat([store_df, values_df], axis=1)\n",
"\n",
" # After looping, add date column\n",
" result5=pd.concat([dates_df, store_df], axis=1)\n",
"\n",
"\n",
"\n",
" # If no split, just select and rename relevant columns\n",
" else:\n",
" result5=result[['key', 'value']]\n",
" result5.columns=['date', 'value']\n",
" \n",
" result_dfs=pd.concat([result_dfs, result5])\n",
" \n",
" # If you wish to only show the top x split properties in the plot, put the rest into 'other'\n",
" if (len(list(result_dfs.columns))>(show_top_x + 1)): # if more than x breakdown labels, create another column - can change if required\n",
"\n",
" # Store first x columns\n",
" first_x=list(result_dfs.columns)[:(show_top_x + 1)]\n",
"\n",
" # Store the others\n",
" rest=list(result_dfs.columns)[(show_top_x + 1):]\n",
"\n",
" # Sum the others\n",
" result_dfs['other']=result_dfs[rest].sum(axis=1) # other column is sum of everything not in top x\n",
"\n",
" result_dfs2=result_dfs[first_x + ['other']] # compile\n",
"\n",
" # If you want all split properties to show, set show_top_x to a large number and no 'other' category will be made\n",
" else:\n",
" result_dfs2=result_dfs\n",
" \n",
" # Set units for y axis label if you wish to plot\n",
" if option=='vessel_count':\n",
" y_axis_label='No. of vessels'\n",
" \n",
" elif option=='utilisation':\n",
" y_axis_label=\"No. of vessels\"\n",
"\n",
" elif option=='cargo_quantity':\n",
" y_axis_label=\"tonne-days\"\n",
" \n",
" elif option=='dwt':\n",
" y_axis_label=\"dwt\"\n",
"\n",
" elif option=='cubic_capacity':\n",
" y_axis_label=\"cubic meters\"\n",
"\n",
" elif option=='tonne_miles':\n",
" y_axis_label=\"tonne-miles\"\n",
" \n",
" elif option=='avg_speed':\n",
" y_axis_label=\"knots\"\n",
" \n",
" if plot_type=='area':\n",
" \n",
" if plot: # plot data if desired\n",
"\n",
" fig = px.area(\n",
" result_dfs2, # data to plot\n",
" title=title, # title set as input\n",
" x=\"date\",\n",
" y=list(result_dfs2.columns)[1:],\n",
" labels={\n",
" \"date\":\"Date\",\n",
" \"value\":y_axis_label # unit label\n",
" },\n",
" )\n",
" fig.update_layout(xaxis_rangeslider_visible = True)\n",
" fig.show()\n",
" \n",
" if plot_type=='line':\n",
" \n",
" if plot: # plot data if desired\n",
"\n",
" fig = px.line(\n",
" result_dfs2, # data to plot\n",
" title=title, # title set as input\n",
" x=\"date\",\n",
" y=list(result_dfs2.columns)[1:],\n",
" labels={\n",
" \"date\":\"Date\",\n",
" \"value\":y_axis_label # unit label\n",
" },\n",
" )\n",
" fig.update_layout(xaxis_rangeslider_visible = True)\n",
" fig.show()\n",
" \n",
" if plot_type=='bar':\n",
" \n",
" if plot: # plot data if desired\n",
"\n",
" fig = px.bar(\n",
" result_dfs2, # data to plot\n",
" title=title, # title set as input\n",
" x=\"date\",\n",
" y=list(result_dfs2.columns)[1:],\n",
" labels={\n",
" \"date\":\"Date\",\n",
" \"value\":y_axis_label # unit label\n",
" },\n",
" )\n",
" fig.update_layout(xaxis_rangeslider_visible = True)\n",
" fig.show()\n",
" \n",
" # Reformat dates and rename date column\n",
" result_dfs2=result_dfs2.copy()\n",
" result_dfs2['date']=result_dfs2['date'].dt.strftime('%d-%m-%Y')\n",
" result_dfs2.rename(columns={'date': 'Date'}, inplace=True)\n",
" \n",
" if split==None:\n",
" result_dfs2.rename(columns={'value': title}, inplace=True)\n",
" \n",
" result_dfs2 = result_dfs2.fillna(0)\n",
" \n",
"\n",
" return result_dfs2\n",
"\n",
"\n",
"# function to create a moving average\n",
"def moving_average(data, period, option):\n",
" \n",
" if option=='multiple':\n",
"\n",
" # calculate moving avg\n",
" moving_avg = pd.DataFrame(data.iloc[:, 1:].rolling(window=period, min_periods=1).mean())\n",
"\n",
" # add moving average\n",
" moving_avg_df=pd.concat([data.iloc[0:, 0:1], moving_avg], axis=1)\n",
"\n",
" moving_avg_df.columns=list(data.columns)\n",
" \n",
" elif option=='single':\n",
" \n",
" # calculate moving avg\n",
" moving_avg = pd.DataFrame(data['value'].rolling(window=period, min_periods=1).mean())\n",
" moving_avg.columns=[f'{period}-day moving_avg']\n",
"\n",
" # get all columns\n",
" data_cols=list(data.columns)\n",
"\n",
" # get all columns except vlaue\n",
" date_cols=[x for x in data_cols if x !='value']\n",
"\n",
" # add moving average\n",
" moving_avg_df=pd.concat([data[date_cols], moving_avg], axis=1)\n",
"\n",
" moving_avg_df.rename(columns={f'{period}-day moving_avg':'value'}, inplace=True)\n",
" \n",
"\n",
" return moving_avg_df\n",
"\n",
"# Function for getting freight data\n",
"def voyages_time_series(start_y, start_m, start_d, origin, origin_excl, destination, destination_excl, prod, prod_excl, vessel_class, vessel_class_excl, status, freq, unit, operator):\n",
" \n",
" today=datetime.today()\n",
" search_blocks=get_search_blocks(start_y, start_m, start_d, today)\n",
" \n",
" result_dfs=pd.DataFrame()\n",
"\n",
" for block in search_blocks:\n",
"\n",
" time_min=block[0]\n",
" time_max=block[1]\n",
" print(f\"Downloading freight data for period: {time_min} to {time_max}\")\n",
"\n",
" \n",
" # Original query\n",
" result = v.VoyagesTimeseries().search(\n",
" time_min=time_min,\n",
" time_max=time_max,\n",
" origins=origin,\n",
" origins_excluded=origin_excl,\n",
" destinations=destination,\n",
" destinations_excluded=destination_excl,\n",
" latest_products=prod,\n",
" latest_products_excluded=prod_excl,\n",
" vessels=vessel_class,\n",
" vessels_excluded=vessel_class_excl,\n",
" voyage_status=status,\n",
" breakdown_frequency=freq,\n",
" breakdown_property=unit,\n",
" breakdown_unit_operator=operator\n",
" ).to_df(columns='all')\n",
"\n",
" result2=result[['key', 'value']]\n",
" result2.columns=['date', 'value']\n",
" \n",
" result_dfs=pd.concat([result_dfs, result2])\n",
" \n",
" # Reformat dates and rename date column\n",
" result_dfs=result_dfs.copy()\n",
" result_dfs['date'] = pd.to_datetime(result_dfs['date'])\n",
" result_dfs['string_date']=result_dfs['date'].dt.strftime('%d-%m-%Y')\n",
" result_dfs['dd_mmm']=result_dfs['date'].dt.strftime('%d-%b')\n",
" result_dfs['month']=result_dfs['date'].dt.strftime('%b')\n",
" result_dfs['week_end_timestamp'] = result_dfs['date'] + pd.offsets.Week(weekday=6) \n",
" result_dfs['week_number'] = result_dfs['date'].dt.isocalendar().week\n",
" result_dfs['year']=round(pd.to_numeric(result_dfs['date'].dt.strftime('%Y')), 0)\n",
" result_dfs = result_dfs.fillna(0)\n",
" \n",
" result_dfs=result_dfs[['date', 'week_end_timestamp', 'string_date', 'dd_mmm', 'week_number', 'month', 'year', 'value']]\n",
" \n",
" result_dfs.reset_index(drop=True, inplace=True)\n",
"\n",
" return result_dfs\n",
"\n",
"\n",
"\n",
"# function for obtaining seasonal chart data\n",
"def seasonal_charts(data, freq, start_y):\n",
" \n",
" # Remove leap days for daily time series\n",
" df=data[data['dd_mmm']!='29-Feb']\n",
" df.reset_index(drop=True, inplace=True)\n",
" \n",
" # Set constants\n",
" current_date=datetime.today()\n",
" this_year=current_date.year\n",
" last_year=this_year-1\n",
" stats_end_y=last_year\n",
" stats_start_y=start_y\n",
" \n",
" # Define stats calculating data set and current year dataset\n",
" stats_df=df[(df['year'] >= stats_start_y) & (df['year'] <= stats_end_y)]\n",
" this_year_df=df[df['year']==this_year]\n",
" \n",
" # if frequency is daily, calculate stats on a daily basis\n",
" if freq=='day':\n",
"\n",
" # date range creation - use a non-leap year\n",
" start_date = datetime(2023, 1, 1)\n",
" end_date = datetime(2023, 12, 31)\n",
"\n",
" date_range = pd.DataFrame(pd.date_range(start=start_date, end=end_date, freq='1D'), columns=['Date'])\n",
" date_range['Date']=date_range['Date'].dt.strftime('%d-%b')\n",
" \n",
" # empty lists to store stats\n",
" mins=[]\n",
" maxs=[]\n",
" avgs=[]\n",
" this_year_vals=[]\n",
" \n",
" # loop through dates and calculate stats\n",
" for i in range(len(date_range)):\n",
"\n",
" temp=stats_df[stats_df['dd_mmm']==date_range['Date'][i]]\n",
" \n",
" mn=min(temp['value'])\n",
" mx=max(temp['value'])\n",
" av=temp['value'].mean()\n",
"\n",
" mins.append(mn)\n",
" maxs.append(mx)\n",
" avgs.append(av)\n",
"\n",
" # obtain last year's values\n",
" last_year_df=pd.DataFrame(stats_df[stats_df['year']==last_year]['value'])\n",
" last_year_df.columns=['Last year']\n",
" last_year_df.reset_index(drop=True, inplace=True)\n",
" \n",
" # loop through dates and obtain current year values, if no data yet, add a blank\n",
" for i in range(len(date_range)):\n",
"\n",
" temp=this_year_df[this_year_df['dd_mmm']==date_range['Date'][i]]\n",
"\n",
" if (len(temp)!=0):\n",
"\n",
" add=temp['value'].iloc[0]\n",
"\n",
" this_year_vals.append(add)\n",
"\n",
" elif (len(temp)==0):\n",
"\n",
" this_year_vals.append('')\n",
"\n",
" \n",
" # convert stats to data frames\n",
" mins_df=pd.DataFrame(mins, columns=['Min.'])\n",
" maxs_df=pd.DataFrame(maxs, columns=['Max.'])\n",
" avgs_df=pd.DataFrame(avgs, columns=[f'Average {stats_start_y}-{stats_end_y}'])\n",
" this_year_vals_df=pd.DataFrame(this_year_vals, columns=['Current year'])\n",
" \n",
" # compile data\n",
" seasonal_df=pd.concat([date_range, mins_df, maxs_df, avgs_df, last_year_df, this_year_vals_df], axis=1)\n",
" \n",
" # calculate range\n",
" seasonal_df[f'Range {stats_start_y}-{stats_end_y}']=seasonal_df['Max.']-seasonal_df['Min.']\n",
" \n",
" # compile in desired order\n",
" seasonal_df=seasonal_df[['Date', 'Min.', f'Range {stats_start_y}-{stats_end_y}', f'Average {stats_start_y}-{stats_end_y}', 'Last year', 'Current year']]\n",
"\n",
" \n",
" # if frequency is monthly, calculate stas on a monthly basis\n",
" elif freq=='month':\n",
"\n",
" # date range creation\n",
" start_date = datetime(2023, 1, 1)\n",
" end_date = datetime(2023, 12, 31)\n",
"\n",
" date_range = pd.DataFrame(pd.date_range(start=start_date, end=end_date, freq='1M'), columns=['Date'])\n",
" date_range['Date']=date_range['Date'].dt.strftime('%b')\n",
" \n",
" # empty lists to store various stats\n",
" mins=[]\n",
" maxs=[]\n",
" avgs=[]\n",
" this_year_vals=[]\n",
" \n",
" # loop through dates and calculate stats\n",
" for i in range(len(date_range)):\n",
"\n",
" temp=stats_df[stats_df['month']==date_range['Date'][i]]\n",
"\n",
" mn=min(temp['value'])\n",
" mx=max(temp['value'])\n",
" av=temp['value'].mean()\n",
"\n",
" mins.append(mn)\n",
" maxs.append(mx)\n",
" avgs.append(av)\n",
"\n",
" # obtain previous year's values\n",
" last_year_df=pd.DataFrame(stats_df[stats_df['year']==last_year]['value'])\n",
" last_year_df.columns=['Last year']\n",
" last_year_df.reset_index(drop=True, inplace=True)\n",
" \n",
" # loop through dates and obtain current year values, if not data yet, add a blank\n",
" for i in range(len(date_range)):\n",
"\n",
" temp=this_year_df[this_year_df['month']==date_range['Date'][i]]\n",
"\n",
" if (len(temp)!=0):\n",
"\n",
" add=temp['value'].iloc[0]\n",
"\n",
" this_year_vals.append(add)\n",
"\n",
" elif (len(temp)==0):\n",
"\n",
" this_year_vals.append('')\n",
"\n",
" # convert stats lists to data frames\n",
" mins_df=pd.DataFrame(mins, columns=['Min.'])\n",
" maxs_df=pd.DataFrame(maxs, columns=['Max.'])\n",
" avgs_df=pd.DataFrame(avgs, columns=[f'Average {stats_start_y}-{stats_end_y}'])\n",
" this_year_vals_df=pd.DataFrame(this_year_vals, columns=['Current year'])\n",
" \n",
" # compile data\n",
" seasonal_df=pd.concat([date_range, mins_df, maxs_df, avgs_df, last_year_df, this_year_vals_df], axis=1)\n",
" \n",
" # calculate the range \n",
" seasonal_df[f'Range {stats_start_y}-{stats_end_y}']=seasonal_df['Max.']-seasonal_df['Min.']\n",
" \n",
" # compile in desired order\n",
" seasonal_df=seasonal_df[['Date', 'Min.', f'Range {stats_start_y}-{stats_end_y}', f'Average {stats_start_y}-{stats_end_y}', 'Last year', 'Current year']]\n",
" \n",
" \n",
" return seasonal_df\n",
"\n",
"# Function to plot seasonal chart\n",
"def plot_seasonal(y_min, y_max, data, title):\n",
" df=data\n",
"\n",
" colors = {\n",
" 'Min.': 'white', \n",
" list(df.columns)[2]: 'lightblue', \n",
" list(df.columns)[3]: 'blue', \n",
" 'Last year': 'yellow', \n",
" 'Current year': 'red' \n",
" }\n",
"\n",
" fig = px.area(df, x='Date', y=list(df.columns)[1:3], title=title, color_discrete_map=colors)\n",
"\n",
" # Add line charts for Average, Last year, and Current year\n",
" for column in list(df.columns)[3:6]:\n",
" fig.add_scatter(x=df['Date'], y=df[column], mode='lines', name=column, line=dict(color=colors[column]))\n",
"\n",
" # Set the y-axis range\n",
" fig.update_yaxes(range=[y_min, y_max])\n",
"\n",
" # Show the plot\n",
" fig.show()\n",
" \n",
"# Function to plot and extract seasonal data\n",
"def complete_seasonal_voyages(start_y, start_m, start_d, origin, origin_excl, destination, destination_excl, prod, prod_excl, vessel_class, vessel_class_excl, status, freq, unit, operator, ma_period, plot, title, y_min, y_max):\n",
"\n",
" # Query voyages data\n",
" daily_voyages_ts=voyages_time_series(start_y=start_y, start_m=start_m, start_d=start_d, \n",
" prod=prod, prod_excl=prod_excl, \n",
" vessel_class=vessel_class, vessel_class_excl=vessel_class_excl, \n",
" status=status,\n",
" freq=freq, unit=unit, operator=operator, \n",
" origin=origin, origin_excl=origin_excl,\n",
" destination=destination, destination_excl=destination_excl)\n",
"\n",
"\n",
" if ma_period==None:\n",
" data=seasonal_charts(data=daily_voyages_ts, freq=freq, start_y=start_y)\n",
"\n",
" else:\n",
" # Calculate moving averages\n",
" voyages_ts_x_day_ma=moving_average(data=daily_voyages_ts, period=ma_period, option='single')\n",
" data=seasonal_charts(data=voyages_ts_x_day_ma, freq=freq, start_y=start_y)\n",
" title=title+f' ({ma_period}-{freq} MA)'\n",
"\n",
" if plot:\n",
" plot_seasonal(y_min=y_min, y_max=y_max, \n",
" data=data, \n",
" title=title)\n",
" \n",
" return data"
]
},
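{
"cell_type": "markdown",
"id": "f3d9a0b1",
"metadata": {},
"source": [
"### Example calls (sketch)\n",
"The commented-out calls below sketch how the functions above might be invoked, using the geography and product IDs stored earlier. They are illustrations only: the parameter values (dates, 'laden' status, 'week'/'day' frequencies, 'vessel_count' breakdown, titles and axis limits) are assumptions, not recommendations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7c2e4d8",
"metadata": {},
"outputs": [],
"source": [
"# Weekly laden CPP voyage counts out of the Gulf of Mexico (sketch; no split)\n",
"# ts = voyages_time_series_with_split(\n",
"#     start_y=2020, start_m=1, start_d=1, end_y=2024, end_m=5, end_d=31,\n",
"#     origin=gom, destination=None, locs=None, prod=cpp, prod_excl=None,\n",
"#     vessel_class=None, vessel_class_excl=None, status='laden',\n",
"#     freq='week', option='vessel_count', operator='avg',\n",
"#     title='Laden CPP voyages ex-GoM', split=None, plot=True,\n",
"#     plot_type='line', show_top_x=5)\n",
"\n",
"# Seasonal chart of the same flow with a 7-day moving average\n",
"# (y-axis limits are illustrative)\n",
"# seasonal = complete_seasonal_voyages(\n",
"#     start_y=2019, start_m=1, start_d=1,\n",
"#     origin=gom, origin_excl=None, destination=None, destination_excl=None,\n",
"#     prod=cpp, prod_excl=None, vessel_class=None, vessel_class_excl=None,\n",
"#     status='laden', freq='day', unit='vessel_count', operator='avg',\n",
"#     ma_period=7, plot=True, title='Laden CPP voyages ex-GoM',\n",
"#     y_min=0, y_max=300)"
]
},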
{
"cell_type": "markdown",
"id": "8b81d4ee",
"metadata": {},
"source": [
"## Post laden route ballast distribution\n",
"Due to Vortexa's unique Voyages dataset, which has unique identifiers which are all linked to the next voyage as well as the previous voyage identifiers, we can gain insight into what tankers are doing after they discharge on a specified route. This is valuable for analysts who support freight traiders and who want to plan the best regional positioning of their fleets based on the latest changes in tanker behaviour. Additionally, commodity traders can use this to anticipate increased demand in a region.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "d4b1eb59",
"metadata": {},
"source": [
"### Worked example - TC2\n",
"In this example, we visualise the behaviour of MR2 tankers after discharging CPP on TC2 (Europe-to-USAC)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "92272dc2",
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-31T16:54:52.602624Z",
"start_time": "2024-05-31T16:54:40.335536Z"
}
},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"fillpattern": {
"shape": ""
},
"hovertemplate": "variable=Gulf of Mexico (GoM)
Date=%{x}
Proportion of voyages=%{y}
Date=%{x}
Proportion of voyages=%{y}
Date=%{x}
Proportion of voyages=%{y}
Date=%{x}
Proportion of voyages=%{y}