DateTime gotchas¶
This post reworks some examples from https://dev.arie.bovenberg.net/blog/python-datetime-pitfalls/ which show how counter-intuitive some of Python's default date-time handling is at the corner cases.
I will also look at how pandas handles date-times, as I do most of my data wrangling with pandas.
from zoneinfo import ZoneInfo
from datetime import datetime, timedelta, date, timezone, UTC
import pandas as pd
import dateutil.tz  # import the tz submodule explicitly; dateutil.tz.gettz is used later
%load_ext watermark
Time jumps due to daylight saving¶
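Europe moves its clocks forward on the last Sunday in March. So if we go to bed at 10:00 pm on the Saturday and wake up at 7:00 am on the Sunday, we have really slept only 8 hours.
However, the datetime library appears to ignore the clock jump: when two aware datetimes share the same tzinfo, subtraction works on the wall-clock values and no time zone adjustment is made.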
paris = ZoneInfo('Europe/Paris')
# last Sunday in March in Paris, so clock should jump forward
bedtime = datetime(2023,3,25,22, tzinfo=paris)
wake_up = datetime(2023, 3, 26, 7, tzinfo=paris)
sleep= wake_up - bedtime
print(f'{sleep=}')
hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
sleep=datetime.timedelta(seconds=32400)
Hours slept = 9.0
If we print out the two datetime variables in question, we can see that they have a different UTC offset. This shows us that the clocks were changed while we were sleeping.
print(f' {bedtime.utcoffset()=}, {wake_up.utcoffset()=}')
bedtime.utcoffset()=datetime.timedelta(seconds=3600), wake_up.utcoffset()=datetime.timedelta(seconds=7200)
The change in offset is exactly one hour.
wake_up.utcoffset() - bedtime.utcoffset()
datetime.timedelta(seconds=3600)
So now we can correct our time calculation by subtracting the change in UTC offset (if any).
sleep= wake_up - bedtime -(wake_up.utcoffset() - bedtime.utcoffset())
hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
Hours slept = 8.0
We also get the correct answer if there is no clock jumping involved
bedtime = datetime(2023,2,21,22, tzinfo=paris)
wake_up = datetime(2023, 2, 22, 7, tzinfo=paris)
sleep= wake_up - bedtime -(wake_up.utcoffset() - bedtime.utcoffset())
hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
Hours slept = 9.0
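An alternative to adjusting by hand, sketched below, is to convert both date-times to UTC before subtracting. UTC has no clock changes, so the difference is the real elapsed time. The helper name elapsed is my own and not part of the datetime library.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def elapsed(start: datetime, end: datetime) -> timedelta:
    """Real elapsed time between two aware datetimes, computed in UTC."""
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)


paris = ZoneInfo("Europe/Paris")

# Night with the spring clock change
dst_night = elapsed(
    datetime(2023, 3, 25, 22, tzinfo=paris),
    datetime(2023, 3, 26, 7, tzinfo=paris),
)
# Ordinary night, no clock change
normal_night = elapsed(
    datetime(2023, 2, 21, 22, tzinfo=paris),
    datetime(2023, 2, 22, 7, tzinfo=paris),
)
print(dst_night, normal_night)  # 8:00:00 9:00:00
```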
pandas gets it right¶
Pandas has extensive date-time support, and seems to get it right, with no coding from us required.
First the case of a clock change while we sleep.
bedtime = pd.to_datetime(datetime(2023,3,25,22, tzinfo=paris))
wake_up = pd.to_datetime(datetime(2023, 3, 26, 7, tzinfo=paris))
sleep= wake_up - bedtime
print(f'{sleep=}')
hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
sleep=Timedelta('0 days 08:00:00')
Hours slept = 8.0
And now a case with no clock change:
bedtime = pd.to_datetime(datetime(2023,2,21,22, tzinfo=paris))
wake_up = pd.to_datetime(datetime(2023, 2, 22, 7, tzinfo=paris))
sleep= wake_up - bedtime
print(f'{sleep=}')
hours = sleep.total_seconds()/3600
print(f'Hours slept = {hours}')
sleep=Timedelta('0 days 09:00:00')
Hours slept = 9.0
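The same arithmetic should carry over to whole columns, which is how I would normally meet it when data wrangling. A minimal sketch, assuming a small made-up DataFrame (the column names bedtime, wake_up and hours are purely illustrative):

```python
import pandas as pd

# Two nights: one spanning the spring clock change, one ordinary
nights = pd.DataFrame(
    {
        "bedtime": pd.to_datetime(["2023-03-25 22:00", "2023-02-21 22:00"]),
        "wake_up": pd.to_datetime(["2023-03-26 07:00", "2023-02-22 07:00"]),
    }
)

# Attach the Paris time zone to the naive columns
for column in ("bedtime", "wake_up"):
    nights[column] = nights[column].dt.tz_localize("Europe/Paris")

# Subtracting aware columns gives the real elapsed time
nights["hours"] = (nights["wake_up"] - nights["bedtime"]).dt.total_seconds() / 3600
print(nights["hours"].tolist())  # [8.0, 9.0]
```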
Non-existent times¶
If a clock change moves the clock forward, then some wall-clock times become non-existent. For example, consider the case where what would normally be 2:00 AM becomes 3:00 AM.
This makes 2:30 AM an impossible wall-clock time.
By default, Python will happily create these date-times!
# ⚠️ This time does not exist on this date
d = datetime(2023, 3, 26, 2, 30, tzinfo=paris)
d
datetime.datetime(2023, 3, 26, 2, 30, tzinfo=zoneinfo.ZoneInfo(key='Europe/Paris'))
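With the standard library alone, one way to spot such a date-time is to send it on a round trip through UTC and check whether the wall-clock digits survive. This is only a sketch; the helper name wall_time_exists is my own invention.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def wall_time_exists(dt: datetime) -> bool:
    """True if the aware datetime keeps its wall-clock digits after a UTC round trip."""
    round_trip = dt.astimezone(timezone.utc).astimezone(dt.tzinfo)
    # Compare the naive (wall-clock) parts only
    return round_trip.replace(tzinfo=None) == dt.replace(tzinfo=None)


paris = ZoneInfo("Europe/Paris")
skipped = datetime(2023, 3, 26, 2, 30, tzinfo=paris)   # inside the skipped hour
ordinary = datetime(2023, 3, 26, 3, 30, tzinfo=paris)  # after the clock change

print(wall_time_exists(skipped), wall_time_exists(ordinary))  # False True
```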
Converting this Python datetime to a pandas Timestamp results in a legal date-time, half an hour past what is now 3:00 AM.
pd.to_datetime(d)
Timestamp('2023-03-26 03:30:00+0200', tz='Europe/Paris')
Another way to handle this is to create in Python a date-time that has no time zone specified (naive), and then tell pandas to convert this to the time zone required. We can specify a behaviour if the date-time is non-existent, e.g. we can raise an exception if any such date-time is seen.
pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='raise')
---------------------------------------------------------------------------
NonExistentTimeError                      Traceback (most recent call last)
Cell In[13], line 1
----> 1 pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='raise')
File timestamps.pyx:2327, in pandas._libs.tslibs.timestamps.Timestamp.tz_localize()
File tzconversion.pyx:180, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc_single()
File tzconversion.pyx:426, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc()
NonExistentTimeError: 2023-03-26 02:30:00
The other behaviours slide the input date-time backwards or forwards to the closest legal time.
pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='shift_forward')
Timestamp('2023-03-26 03:00:00+0200', tz='Europe/Paris')
pd.to_datetime(datetime(2023,3,26,2,30)).tz_localize(paris, nonexistent='shift_backward')
Timestamp('2023-03-26 01:59:59.999999999+0100', tz='Europe/Paris')
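The nonexistent argument also accepts 'NaT' (mark the value as missing) and a timedelta (shift by a fixed amount). A short sketch on a Series, using three made-up date-times around the clock change:

```python
import pandas as pd

naive = pd.Series(
    pd.to_datetime(["2023-03-26 01:30", "2023-03-26 02:30", "2023-03-26 03:30"])
)

# Replace the impossible 02:30 with a missing value
as_nat = naive.dt.tz_localize("Europe/Paris", nonexistent="NaT")

# Or shift the impossible 02:30 forward by one hour
shifted = naive.dt.tz_localize("Europe/Paris", nonexistent=pd.Timedelta("1h"))

print(as_nat.isna().tolist())  # [False, True, False]
print(shifted.iloc[1])         # 2023-03-26 03:30:00+02:00
```

Shifting by a timedelta keeps the minutes (02:30 becomes 03:30), which is the same result we saw when pandas converted the aware Python datetime directly.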
Duplicated times¶
Daylight saving where clocks go back can create a situation where a given wall-clock time occurs twice in a 24-hour period. Standard Python has a parameter fold that lets you specify the first or the second such time. The default is the first such wall-clock time.
d = datetime(2023,10,29,2,30,tzinfo=paris)
d
datetime.datetime(2023, 10, 29, 2, 30, tzinfo=zoneinfo.ZoneInfo(key='Europe/Paris'))
If we create an ambiguous date-time, by default we get the first one (larger of the two UTC Offsets)
d.utcoffset()
datetime.timedelta(seconds=7200)
If we specify the second wall-clock time, we get the smaller UTC offset.
d2 = datetime(2023,10,29,2,30,tzinfo=paris, fold=1, )
d2
datetime.datetime(2023, 10, 29, 2, 30, fold=1, tzinfo=zoneinfo.ZoneInfo(key='Europe/Paris'))
d2.utcoffset()
datetime.timedelta(seconds=3600)
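The fold parameter also gives us a way to test whether a wall-clock time is ambiguous in the first place: build it with both fold values and compare the UTC offsets. In a repeated hour the first occurrence has the larger offset. The sketch below uses a helper name, is_ambiguous, that is my own and not a standard library function.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def is_ambiguous(naive: datetime, tz: ZoneInfo) -> bool:
    """True if the naive wall-clock time occurs twice in the given zone."""
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    # Repeated hour: the first occurrence (DST) has the larger offset.
    # In a skipped hour the order is reversed, so that case returns False.
    return first > second


paris = ZoneInfo("Europe/Paris")
print(is_ambiguous(datetime(2023, 10, 29, 2, 30), paris))  # True
print(is_ambiguous(datetime(2023, 10, 29, 12, 0), paris))  # False
```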
Again, with pandas we can specify the behaviour we want, including raising an exception if such a date-time is seen. The error message is a little confusing, though, as it suggests using the 'ambiguous' argument, which we have already supplied.
pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous='raise')
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
Cell In[77], line 1
----> 1 pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous='raise')
File timestamps.pyx:2327, in pandas._libs.tslibs.timestamps.Timestamp.tz_localize()
File tzconversion.pyx:180, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc_single()
File tzconversion.pyx:371, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc()
AmbiguousTimeError: Cannot infer dst time from 2023-10-29 02:30:00, try using the 'ambiguous' argument
We can also specify a boolean value that indicates whether the ambiguous time should be treated as DST (True) or as standard time (False).
dst_bool = False
d1 = pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous=dst_bool)
d1
Timestamp('2023-10-29 02:30:00+0100', tz='Europe/Paris')
dst_bool = True
d2 = pd.to_datetime(datetime(2023,10,29,2,30)).tz_localize(paris, ambiguous=dst_bool)
d2
Timestamp('2023-10-29 02:30:00+0200', tz='Europe/Paris')
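For a whole column, the ambiguous argument also accepts 'infer' (work out DST from the order of the values) and 'NaT' (mark ambiguous values as missing). A sketch with made-up readings taken every half hour across the autumn clock change; note that 'infer' relies on the data being in time order with the repeated times present.

```python
import pandas as pd

naive = pd.Series(
    pd.to_datetime(
        [
            "2023-10-29 01:30",
            "2023-10-29 02:00",  # first pass through the repeated hour
            "2023-10-29 02:30",
            "2023-10-29 02:00",  # second pass, after the clocks went back
            "2023-10-29 02:30",
            "2023-10-29 03:00",
        ]
    )
)

# Let pandas work out which pass each ambiguous value belongs to
inferred = naive.dt.tz_localize("Europe/Paris", ambiguous="infer")
offsets_in_hours = [ts.utcoffset().total_seconds() / 3600 for ts in inferred]
print(offsets_in_hours)

# Or give up on the ambiguous values and mark them as missing
as_nat = naive.dt.tz_localize("Europe/Paris", ambiguous="NaT")
print(as_nat.isna().tolist())
```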
pandas also respects the standard Python fold parameter.
d1 = datetime(2023,10,29,2,30,tzinfo=paris)
pd.to_datetime(d1)
Timestamp('2023-10-29 02:30:00+0200', tz='Europe/Paris')
d2 = datetime(2023,10,29,2,30,tzinfo=paris, fold=1, )
pd.to_datetime(d2)
Timestamp('2023-10-29 02:30:00+0100', tz='Europe/Paris')
Comparisons¶
Comparison of date-times can be confusing. As an example, below are six ways of asking "what is the date and time now" (one deprecated).
print(f'{date.today()=}')
print(f'{datetime.today()=}')
print(f'{datetime.now()=}')
print(f'{datetime.utcnow()=}')
print(f'{datetime.now(timezone.utc)=}')
print(f'{datetime.now(UTC)=}')
date.today()=datetime.date(2024, 2, 19)
datetime.today()=datetime.datetime(2024, 2, 19, 17, 24, 59, 687248)
datetime.now()=datetime.datetime(2024, 2, 19, 17, 24, 59, 687248)
datetime.utcnow()=datetime.datetime(2024, 2, 19, 7, 24, 59, 687248)
datetime.now(timezone.utc)=datetime.datetime(2024, 2, 19, 7, 24, 59, 687248, tzinfo=datetime.timezone.utc)
datetime.now(UTC)=datetime.datetime(2024, 2, 19, 7, 24, 59, 687248, tzinfo=datetime.timezone.utc)
C:\Users\donrc\AppData\Local\Temp\ipykernel_24512\1724687794.py:4: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
  print(f'{datetime.utcnow()=}')
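Only the last two forms in the output above carry a tzinfo; the others are naive. Mixing the two kinds in a comparison is a pitfall of its own, as this small sketch shows: equality quietly returns False, while ordering raises a TypeError.

```python
from datetime import datetime, timezone

naive_now = datetime.now()              # local wall-clock time, tzinfo is None
aware_now = datetime.now(timezone.utc)  # taken a moment later, labelled as UTC

print(naive_now.tzinfo, aware_now.tzinfo)

# Equality between naive and aware never raises; it is simply False
equal = naive_now == aware_now

# Ordering between naive and aware is refused outright
try:
    naive_now < aware_now
    ordering_error = None
except TypeError as exc:
    ordering_error = str(exc)

print(equal, ordering_error)
```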
From Arie Bovenberg's blog post, below is an example where we create two apparently different date-times from an ambiguous date-time, only to find they test equal! Apparently, when both date-times share the same tzinfo, the test compares the wall-clock digits only and ignores fold.
# two times one hour apart (due to DST transition)
earlier = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=0)
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)
print(earlier, later)
2023-10-29 02:30:00+02:00 2023-10-29 02:30:00+01:00
earlier.timestamp(), later.timestamp()
(1698539400.0, 1698543000.0)
earlier==later
True
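If we stay with the standard library, one way around this is to convert both sides to UTC before comparing, so that the comparison is about the instant rather than the wall-clock digits. A minimal sketch; same_instant is a name I made up for the helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def same_instant(a: datetime, b: datetime) -> bool:
    """Compare two aware datetimes by the instant they represent."""
    return a.astimezone(timezone.utc) == b.astimezone(timezone.utc)


paris = ZoneInfo("Europe/Paris")
earlier = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=0)
later = datetime(2023, 10, 29, 2, 30, tzinfo=paris, fold=1)

print(earlier == later)              # True: fold is ignored
print(same_instant(earlier, later))  # False: one hour apart
```

Comparing the timestamp() values shown above would work equally well.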
Once again, pandas does the expected thing (no, these date-times are not equal).
t1 = pd.to_datetime(earlier)
t2 = pd.to_datetime(later)
t1 == t2
False
Note that if we swap the time zone information for an equivalent object from a different library, this (ambiguous) date-time tests not-equal to the original!
later2 = later.replace(tzinfo=dateutil.tz.gettz("Europe/Paris"))
later == later2
False
pandas again seems to do the correct thing, even if the time zone information came from objects of two different types.
t3 = pd.to_datetime(later)
t4 = pd.to_datetime(later2)
print(t3,t4)
2023-10-29 02:30:00+01:00 2023-10-29 02:30:00+01:00
t3==t4
True
t3.tzinfo, t4.tzinfo
(zoneinfo.ZoneInfo(key='Europe/Paris'), tzfile('Europe/Paris'))
Conclusion¶
The various datetime pitfalls are certainly something to be aware of, and should be considered in any code reviews of Python apps that deal with dates or times.
It is slightly reassuring that pandas seems to be more reliable in this regard.
Reproducibility¶
%watermark -co
conda environment: c:\Users\donrc\Documents\VisualCodeProjects\DateTimeProject\.conda
%watermark -iv
dateutil: 2.8.2
pandas  : 2.1.4
%watermark
Last updated: 2024-02-19T17:39:41.908759+10:00
Python implementation: CPython
Python version : 3.12.1
IPython version : 8.21.0
Compiler : MSC v.1916 64 bit (AMD64)
OS : Windows
Release : 11
Machine : AMD64
Processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores : 8
Architecture: 64bit