# Module 5

## Video 22: Filtering by Time
**Python for the Energy Industry**

## Datetime Objects

In the 'Cargo Movements Example' video, we saw the `datetime` object used to specify a particular date and time to look for cargo movements. In this lesson we explore the `datetime` object in more detail, and how it is used for filtering.

When given 3 arguments, a datetime object represents midnight at the beginning of the day specified by `datetime(YYYY,MM,DD)`:

In [1]:
from datetime import datetime

# 00:00 November 1st, 2020
print(datetime(2020,11,1))

2020-11-01 00:00:00


Additional arguments represent hours, minutes, and seconds respectively:

In [2]:
# 12:00 November 1st, 2020
print(datetime(2020,11,1,12))

2020-11-01 12:00:00


In [3]:
# 12:30 November 1st, 2020
print(datetime(2020,11,1,12,30))

2020-11-01 12:30:00


In [4]:
# 12:30:09 November 1st, 2020
print(datetime(2020,11,1,12,30,9))

2020-11-01 12:30:09
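
The same arguments can also be passed by keyword, and each component can be read back afterwards as an attribute. A short sketch using only the standard library:

```python
from datetime import datetime

# Keyword arguments are equivalent to the positional form
positional = datetime(2020, 11, 1, 12, 30, 9)
keyword = datetime(year=2020, month=11, day=1, hour=12, minute=30, second=9)
print(positional == keyword)  # True

# Each component is available as an attribute
print(keyword.year, keyword.month, keyword.day, keyword.hour)  # 2020 11 1 12

# Impossible dates are rejected rather than silently adjusted
try:
    datetime(2020, 11, 31)  # November has only 30 days
    rejected = False
except ValueError as err:
    rejected = True
    print("invalid date:", err)
```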


It's straightforward to get the current date and time (here in UTC):

In [5]:
print(datetime.utcnow())

2021-01-24 14:58:59.263863

Subtracting one `datetime` from another gives the time elapsed between them:

In [6]:
print(datetime.utcnow() - datetime(2020,11,1))

84 days, 14:59:24.868841
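
The difference printed above is a `timedelta` object. As a small illustration, its `days` attribute and `total_seconds()` method give the elapsed time in more convenient units. The fixed end time below is an example value chosen only so that the result is reproducible:

```python
from datetime import datetime

start = datetime(2020, 11, 1)
end = datetime(2021, 1, 24, 15, 0, 0)  # fixed example time, not the real 'now'

elapsed = end - start  # a timedelta object
print(elapsed)         # 84 days, 15:00:00
print(elapsed.days)    # whole days only: 84

# total_seconds() covers the full interval, so it converts cleanly to hours
elapsed_hours = elapsed.total_seconds() / 3600
print(elapsed_hours)   # 2031.0
```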


## Times up to Now

Say you want data over a time period stretching from one day, one week, or one month ago up to the current time. The `relativedelta` object can be used for this.

In [7]:
from dateutil.relativedelta import relativedelta
now = datetime.utcnow()

one_day_ago = now - relativedelta(days=1)
one_week_ago = now - relativedelta(weeks=1)
one_month_ago = now - relativedelta(months=1)

print(one_day_ago)
print(one_week_ago)
print(one_month_ago)

2021-01-23 15:00:28.838017
2021-01-17 15:00:28.838017
2020-12-24 15:00:28.838017
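
For comparison, the standard library's `timedelta` covers fixed-length offsets such as days and weeks, but it has no notion of a calendar month, which is one reason to reach for `relativedelta`. A minimal sketch, where the fixed `now` is an example value so that the output is reproducible:

```python
from datetime import datetime, timedelta

now = datetime(2021, 1, 24, 15, 0, 0)  # fixed example value

# Days and weeks have a fixed length, so timedelta handles them
one_day_ago = now - timedelta(days=1)
one_week_ago = now - timedelta(weeks=1)
print(one_day_ago)   # 2021-01-23 15:00:00
print(one_week_ago)  # 2021-01-17 15:00:00

# Months vary in length, and timedelta does not accept them
try:
    timedelta(months=1)
    months_supported = True
except TypeError:
    months_supported = False
print("timedelta supports months:", months_supported)  # False
```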


## Filtering

When pulling Cargo Movements data from the Vortexa API, we are generally only interested in some subset of the data. This may be data from a particular time window, originating from or destined for a particular location, carrying a particular product, carried by a particular vessel, or some combination of these conditions. Selecting such a subset is called 'filtering'.

Filtering by location, product, or vessel is done using the associated IDs that we can access from the relevant endpoints. Filtering by time is a bit different: as you've seen, datetime objects are used for this.

As a reminder, documentation for the Cargo Movements endpoint can be [found here.](https://vortechsa.github.io/python-sdk/endpoints/cargo_movements/)

## Timestamp Filters

The meaning of `filter_time_min` and `filter_time_max` depends on the `filter_activity` they are paired with. The following activities correspond to an exact timestamp at which the event occurred:
- loading_start
- loading_end
- identified_for_loading_at
- storing_start
- storing_end
- unloading_start
- unloading_end

Filtering on these will give Cargo Movements where the timestamp of the corresponding activity is between `filter_time_min` and `filter_time_max`.
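
To make the timestamp semantics concrete, here is a plain-Python sketch that does not call the API. The records and the `filter_by_timestamp` helper are made up for illustration, and treating both bounds as inclusive is an assumption rather than documented behaviour:

```python
from datetime import datetime

# Hypothetical records: each has a single timestamp for the chosen activity
movements = [
    {"id": "A", "loading_start": datetime(2020, 11, 1, 3, 0)},
    {"id": "B", "loading_start": datetime(2020, 11, 1, 18, 0)},
    {"id": "C", "loading_start": datetime(2020, 11, 2, 9, 0)},
]

def filter_by_timestamp(records, field, time_min, time_max):
    """Keep records whose timestamp lies within [time_min, time_max]."""
    return [r for r in records if time_min <= r[field] <= time_max]

window = filter_by_timestamp(
    movements, "loading_start",
    datetime(2020, 11, 1), datetime(2020, 11, 2),
)
matched_ids = [r["id"] for r in window]
print(matched_ids)  # ['A', 'B']
```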

In [8]:
import vortexasdk as v

cm_query = v.CargoMovements().search(
 filter_activity="loading_start",
 filter_time_min=one_day_ago,
 filter_time_max=now)

print(len(cm_query))

274


This means that 274 Cargo Movements started loading in the 24 hours up to `now`. If the same time is given as both the min and max for a timestamp filter, zero results will be returned, since no event is likely to fall on that exact instant:

In [9]:
cm_query = v.CargoMovements().search(
 filter_activity="loading_end",
 filter_time_min=now,
 filter_time_max=now)

print(len(cm_query))

0


*Note: you can of course use specific datetime objects, rather than relative dates, for filtering.*

## State Filters

Certain activities correspond to states that last for some time, rather than instantaneous timestamps:
- loading_state
- identified_for_loading_state
- transiting_state
- storing_state
- unloaded_state
- any_activity

When filtering on a state, you will get all Cargo Movements that were in that state at any point between `filter_time_min` and `filter_time_max`. This means that even if `filter_time_min` and `filter_time_max` are the same time, you will still get back any Cargo Movements that were in that state at that time:

In [10]:
cm_query = v.CargoMovements().search(
 filter_activity="loading_state",
 filter_time_min=now,
 filter_time_max=now)

print(len(cm_query))

427
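
The difference from a timestamp filter can be sketched in plain Python, without calling the API. A state has a start and an end, so the test is whether that interval overlaps the filter window. The records, field names, and `filter_by_state` helper below are made up for illustration, and the inclusive comparison is an assumption:

```python
from datetime import datetime

# Hypothetical records: each state lasts from 'start' to 'end'
loading_states = [
    {"id": "A", "start": datetime(2020, 11, 1, 3), "end": datetime(2020, 11, 1, 20)},
    {"id": "B", "start": datetime(2020, 11, 2, 6), "end": datetime(2020, 11, 3, 1)},
]

def filter_by_state(records, time_min, time_max):
    """Keep records whose [start, end] interval overlaps [time_min, time_max]."""
    return [r for r in records if r["start"] <= time_max and r["end"] >= time_min]

# A zero-width window still matches a state that was active at that instant
instant = datetime(2020, 11, 1, 12)
active_ids = [r["id"] for r in filter_by_state(loading_states, instant, instant)]
print(active_ids)  # ['A']
```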


Naturally, the number of Cargo Movements returned by a general query like this will become quite large as the filter window is expanded:

In [11]:
cm_query = v.CargoMovements().search(
 filter_activity="loading_state",
 filter_time_min=one_day_ago,
 filter_time_max=now)

print('last day:',len(cm_query))

cm_query = v.CargoMovements().search(
 filter_activity="loading_state",
 filter_time_min=one_week_ago,
 filter_time_max=now)

print('last week:',len(cm_query))

cm_query = v.CargoMovements().search(
 filter_activity="loading_state",
 filter_time_min=one_month_ago,
 filter_time_max=now)

print('last month:',len(cm_query))

last day: 835
last week: 3804
last month: 16169


*Note of caution: be careful about passing `datetime.utcnow()` directly as the `filter_time_max` argument, or defining `now = datetime.utcnow()` in the same cell in which `now` is passed as that argument. Small differences between the clock on your computer and the clocks on the Vortexa servers can mean that `now` is treated as being in the future, which gives an error!*
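
One way to sidestep this is to set the upper bound slightly in the past. The sketch below is one possible approach, not something the SDK prescribes; the `safe_now` helper and the five-minute margin are hypothetical choices. It builds the naive UTC value with `datetime.now(timezone.utc)` because `datetime.utcnow()` is deprecated from Python 3.12 onwards:

```python
from datetime import datetime, timedelta, timezone

def safe_now(margin=timedelta(minutes=5)):
    """Return naive UTC 'now' minus a safety margin (hypothetical helper)."""
    # Same value as datetime.utcnow(), without the deprecation warning
    utc_now = datetime.now(timezone.utc).replace(tzinfo=None)
    return utc_now - margin

upper_bound = safe_now()
print(upper_bound)
```

`upper_bound` can then be passed as `filter_time_max` in place of `now`.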

### Exercise

Create a pandas DataFrame that gives the number of cargoes that are being loaded at 00:00 UTC on each day of the previous month.
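
As a possible starting point (a hint rather than a full solution), the helper below lists the midnights to query. The `midnights_between` name is hypothetical and the fixed dates are example values; the API call for each timestamp and the DataFrame construction are left to you:

```python
from datetime import datetime, timedelta

def midnights_between(start, end):
    """Return every 00:00 timestamp t with start <= t <= end."""
    # Move to the first midnight at or after 'start'
    current = start.replace(hour=0, minute=0, second=0, microsecond=0)
    if current < start:
        current += timedelta(days=1)
    result = []
    while current <= end:
        result.append(current)
        current += timedelta(days=1)
    return result

days = midnights_between(datetime(2020, 12, 24, 15), datetime(2021, 1, 24, 15))
print(len(days))  # 31
print(days[0], days[-1])
```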