{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 5\n", "\n", "## Video 22: Filtering by Time\n", "**Python for the Energy Industry**\n", "\n", "## Datetime Objects\n", "\n", "In the 'Cargo Movements Example' video, we saw the `datetime` object used to specify a particular data and time to look for cargo movements. In this lesson we explore in more detail the `datetime` object, and how it is used for filtering. \n", "\n", "When given 3 arguments, a datetime object represents midnight at the beginning of the day specified by `datetime(YYYY,MM,DD)`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2020-11-01 00:00:00\n" ] } ], "source": [ "from datetime import datetime\n", "\n", "# 00:00 November 1st, 2020\n", "print(datetime(2020,11,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Additional arguments represent hours, minutes, and seconds respectively:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2020-11-01 12:00:00\n" ] } ], "source": [ "# 12:00 November 1st, 2020\n", "print(datetime(2020,11,1,12))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2020-11-01 12:30:00\n" ] } ], "source": [ "# 12:30 November 1st, 2020\n", "print(datetime(2020,11,1,12,30))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2020-11-01 12:30:09\n" ] } ], "source": [ "# 12:30:09 November 1st, 2020\n", "print(datetime(2020,11,1,12,30,9))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's straightforward to get the current date/time:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2021-01-24 14:58:59.263863\n" ] } ], "source": [ "print(datetime.utcnow())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84 days, 14:59:24.868841\n" ] } ], "source": [ "print(datetime.utcnow() - datetime(2020,11,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Times up to Now\n", "\n", "Say you want data over a time period stretching from 1 day, or week, or month ago, up to the current time. The `relativedelta` object can be used for this." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2021-01-23 15:00:28.838017\n", "2021-01-17 15:00:28.838017\n", "2020-12-24 15:00:28.838017\n" ] } ], "source": [ "from dateutil.relativedelta import relativedelta\n", "now = datetime.utcnow()\n", "\n", "one_day_ago = now - relativedelta(days=1)\n", "one_week_ago = now - relativedelta(weeks=1)\n", "one_month_ago = now - relativedelta(months=1)\n", "\n", "print(one_day_ago)\n", "print(one_week_ago)\n", "print(one_month_ago)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering\n", "\n", "When pulling Cargo Movements data from the Vortexa API, we are generally only interested in some subset of the data. This may be data from a particular time window, originating or destinated for a particular location, carrying a particular product, a particular vessel, or some combination of these conditions. This is called 'filtering'.\n", "\n", "Filtering by location, product, or vessel is done using the associated IDs that we can access from the relevant endpoints. Filtering by time is a bit different: as you've seen, datetime objects are used for this.\n", "\n", "As a reminder, documentation for the Cargo Movements endpoint can be [found here.](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)\n", "\n", "## Timestamp Filters\n", "\n", "The meaning of `filter_time_min` and `filter_time_max` depends on the `filter_activity` corresponding to these times. The following activities:\n", "- loading_start\n", "- identified_for_loading_at\n", "- storing_start\n", "- storing_end\n", "- unloading_start\n", "- unloading_end\n", "These filters that correspond to an exact timestamp at which the event occured. Filtering on these will give Cargo Movements where the timestamp of the corresponding activity is between `filter_time_min` and `filter_time_max`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2021-01-24 15:02:08,726 vortexasdk.client — WARNING — You are using vortexasdk version 0.28.0, however version 0.28.5 is available.\n", "You should consider upgrading via the 'pip install vortexasdk --upgrade' command.\n", "274\n" ] } ], "source": [ "import vortexasdk as v\n", "\n", "cm_query = v.CargoMovements().search(\n", " filter_activity=\"loading_start\",\n", " filter_time_min=one_day_ago,\n", " filter_time_max=now)\n", "\n", "print(len(cm_query))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that there are 257 Cargo Movements that started loading between midnight and midday on November 1st. Obviously, if the same time is given as both the min and max for a timestamp filter, zero results will be returned:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading from API: 0it [00:00, ?it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "0\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "cm_query = v.CargoMovements().search(\n", " filter_activity=\"loading_end\",\n", " filter_time_min=now,\n", " filter_time_max=now)\n", "\n", "print(len(cm_query))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note: you can of course use specific datetime objects, rather than relative dates, for filtering.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## State Filters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Certain activities correspond to states that last for some time, rather than instantaneous timestamps:\n", "- loading_state\n", "- identified_for_loading_state\n", "- transiting_state\n", "- storing_state\n", "- unloaded_state\n", "- any_activity\n", "When filtering on a state, you will get all Cargo Movements which were in that state at any point between `filter_time_min` and `filter_time_max`. This means even if `filter_time_min` and `filter_time_max` are the same time, you will still get back any Cargo Movements that were in that state at that time:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "427\n" ] } ], "source": [ "cm_query = v.CargoMovements().search(\n", " filter_activity=\"loading_state\",\n", " filter_time_min=now,\n", " filter_time_max=now)\n", "\n", "print(len(cm_query))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Naturally, the number of Cargo Movements returned by a general query like this will become quite large as the filter window is expanded:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading from API: 1000it [00:01, 865.14it/s] \n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "last day: 835\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Loading from API: 4000it [00:04, 883.84it/s] \n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "last week: 3804\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Loading from API: 16500it [00:10, 1501.56it/s] " ] }, { "name": "stdout", "output_type": "stream", "text": [ "last month: 16169\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "cm_query = v.CargoMovements().search(\n", " filter_activity=\"loading_state\",\n", " filter_time_min=one_day_ago,\n", " filter_time_max=now)\n", "\n", "print('last day:',len(cm_query))\n", "\n", "cm_query = v.CargoMovements().search(\n", " filter_activity=\"loading_state\",\n", " filter_time_min=one_week_ago,\n", " filter_time_max=now)\n", "\n", "print('last week:',len(cm_query))\n", "\n", "cm_query = v.CargoMovements().search(\n", " filter_activity=\"loading_state\",\n", " filter_time_min=one_month_ago,\n", " filter_time_max=now)\n", "\n", "print('last month:',len(cm_query))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note of caution: be careful about directly putting `datetime.utcnow()` as the `filter_time_max` argument, or putting `now = datetime.utcnow()` in the same cell as now is passed in the argument. There is a risk that small differences between the time measured on your computer and the Vortexa servers can mean that `now` is thought to be in the future, giving an error!*\n", "\n", "### Exercise\n", "\n", "Create a pandas DataFrame that gives the number of cargos that are being loaded at 00:00UTC on each day of the previous month." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }