{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring `pandas` Date Offsets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating finance calender lookup tables with the powerful date and time classes in base `pandas`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reference:\n", "\n", "* [Pandas user guide on time series/date functionality](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)\n", "* [Pandas docs on `timedeltas`](https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html)\n", "* [Pandas Reference Docs on Date offsets](https://pandas.pydata.org/docs/reference/offset_frequency.html#)\n", "* [Docs on Period objects](https://pandas.pydata.org/docs/reference/api/pandas.Period.html)\n", "* [Pandas frequency strings aka offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)\n", "* [Pandas Info on the FY5253 offset](https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.1.3\n" ] } ], "source": [ "import datetime\n", "import numpy as np\n", "import pandas as pd\n", "print(pd.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pandas` has a variety of functions that let you create and manipulate date ranges that conform to a fiscal calendar that begins on January 1st of every year and ends on Dec 31st of every year, and each quarter is exactly 3 months long. \n", "\n", "We can see an example of this built-in functionality if we use `pandas` to give us an array of the *last* day in each quarter in 2021 using the `freq='Q'` parameter." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2021-03-31', '2021-06-30', '2021-09-30', '2021-12-31'], dtype='datetime64[ns]', freq='Q-DEC')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "start = datetime.datetime(2021, 1, 1)\n", "end = datetime.datetime(2022, 1, 1)\n", "# Create a DatetimeIndex with freq='Q'\n", "# the 'Q' indicates the last day of each quarter\n", "ts = pd.date_range(start, end, freq='Q')\n", "ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use this index to create a dataframe with columns with various attributes, by converting this stuff to a `period` object:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quarterfiscal_year
2021-03-3112021
2021-06-3022021
2021-09-3032021
2021-12-3142021
\n", "
" ], "text/plain": [ " quarter fiscal_year\n", "2021-03-31 1 2021\n", "2021-06-30 2 2021\n", "2021-09-30 3 2021\n", "2021-12-31 4 2021" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tp = ts.to_period()\n", "\n", "pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, what if you wanted to use `pandas` to handle fiscal calendars that aren't set to this standard? For example, let's say we have a fiscal calendar that starts on June 1st (i.e. Nike). \n", "\n", "Notice that the previous creation of a `pd.date_range` with `freq='Q'` created a DatetimeIndex with a freq of `'Q-DEC'`. While there's a list of ['offset aliases'](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) there isn't much mention of what `'Q-DEC'` is except mentioned briefly in the [Pandas user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#frequency-conversion-and-resampling-with-periodindex):\n", "\n", ">Many organizations define quarters relative to the month in which their fiscal year starts and ends. Thus, first quarter of 2011 could start in 2010 or a few months into 2011. Via anchored frequencies, pandas works for all quarterly frequencies `Q-JAN` through `Q-DEC`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So in other words, the default quarter system `Q-DEC` (and `A-DEC`) means that it ends in December. So if we wanted to have a fiscal calendar that ends in May and starts in June 1st, we'd use `Q-MAY`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2021-02-28', '2021-05-31', '2021-08-31', '2021-11-30'], dtype='datetime64[ns]', freq='Q-MAY')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ts = pd.date_range(start, end, freq='Q-MAY')\n", "ts" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quarterfiscal_year
2021-02-2832021
2021-05-3142021
2021-08-3112022
2021-11-3022022
\n", "
" ], "text/plain": [ " quarter fiscal_year\n", "2021-02-28 3 2021\n", "2021-05-31 4 2021\n", "2021-08-31 1 2022\n", "2021-11-30 2 2022" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tp = ts.to_period()\n", "\n", "pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So that was easy! And instead of creating the date range with just the end quarters, we can create a daily date range and then use the `to_period()` call to do the fiscal year translation for us, to create a *daily* calendar lookup table:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
quarterfiscal_year
2021-12-2832022
2021-12-2932022
2021-12-3032022
2021-12-3132022
2022-01-0132022
\n", "
" ], "text/plain": [ " quarter fiscal_year\n", "2021-12-28 3 2022\n", "2021-12-29 3 2022\n", "2021-12-30 3 2022\n", "2021-12-31 3 2022\n", "2022-01-01 3 2022" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create a DatetimeIndex with all days between 2021 and 2022\n", "ts = pd.date_range(start='2021', end='2022', freq='D')\n", "\n", "tp = ts.to_period(freq='Q-MAY')\n", "df = pd.DataFrame(index=ts, data={'quarter':tp.quarter, 'fiscal_year':tp.qyear})\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the correct fiscal year and fiscal quarters are represented. Nice!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What are Periods?\n", "\n", "Notice in the above example, I converted the DatetimeIndex into something with the `to_period()` call. But what exactly is that? Let's take a look:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PeriodIndex(['2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3', '2021Q3',\n", " '2021Q3', '2021Q3', '2021Q3', '2021Q3',\n", " ...\n", " '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3', '2022Q3',\n", " '2022Q3', '2022Q3', '2022Q3', '2022Q3'],\n", " dtype='period[Q-MAY]', length=366, freq='Q-MAY')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our `DatetimeIndex` was converted into a `PeriodIndex`, which is an index of `Period` objects. Each `Period` object is a custom object that has a variety of helpful attributes, like `quarter`, `month`, etc that we can extract for our lookup calendar if we wanted to." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 52/53 Workweek Calendars\n", "\n", "What if your company follows a 52/53-workweek calendar using the 4-4-5 quarter system (i.e. Intel, Dell)? I want to be able to use `pandas` methods on those, too!\n", "\n", "We can use the `pd.tseries.offsets.FY5253` DateOffset type to find the dates of the beginning of each fiscal year:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2020-12-26', '2021-12-25', '2022-12-31'], dtype='datetime64[ns]', freq='RE-L-DEC-SAT')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "variation = 'last'\n", "\n", "# create an offset for the Intel fiscal calendar, which ends on the last Saturday of December\n", "yoffset = pd.tseries.offsets.FY5253(n=1, weekday=5, startingMonth=12, variation=variation)\n", "\n", "# use the offset to create a date range with intervals at each fiscal year beginning\n", "yoffset_range = pd.date_range('2020', periods=3, freq=yoffset)\n", "yoffset_range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the above result - we've generated a `DatetimeIndex` with the *last* day of each fiscal year.\n", "\n", "Notice the new `freq` string. Creating the timeseries offset object generates a new `freq` string that you can use for the `pd.date_range` function. However, unlike the previous examples you can't convert these to `Period` objects to get the quarter, month, etc (you can try it - it'll return an error).\n", "\n", "So how can we use this to create a lookup table? Well, first off having the beginning dates for each fiscal year is helpful to us because having to worry about leap years and figuring out whether a year is going to be a 52 workweek year vs 53 workweek year is annoying. Trust me, I went down that rabbit hole.\n", "\n", "But with the start/end dates for each fiscal year and the fact that we're following a 4-4-5 convention, we can create this manually:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Create a '1 day' offset:\n", "d1 = pd.tseries.offsets.DateOffset(n = 1)\n", "\n", "# Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel, \n", "# while '2021-12-26' was the first day of fiscal year 2022\n", "yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1\n", "\n", "# iterate over each item in the date range we created earlier:\n", "result = pd.DataFrame()\n", "for i in yoffset_range:\n", " # recall each item is a Timestamp that represents the first day of the fiscal year,\n", " # so create a date_range from beginning to end of the fiscal year\n", " current_range = pd.date_range(i+d1, i+yoffset, freq='D')\n", " interim_df = pd.DataFrame(index = current_range)\n", " # day of year\n", " interim_df['DOY'] = (current_range-current_range[0]).days +1\n", " # fiscal year\n", " interim_df['FY'] = yr\n", " result = result.append(interim_df)\n", " yr += 1" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DOYFY
2021-12-163552021
2021-12-173562021
2021-12-183572021
2021-12-193582021
2021-12-203592021
2021-12-213602021
2021-12-223612021
2021-12-233622021
2021-12-243632021
2021-12-253642021
\n", "
" ], "text/plain": [ " DOY FY\n", "2021-12-16 355 2021\n", "2021-12-17 356 2021\n", "2021-12-18 357 2021\n", "2021-12-19 358 2021\n", "2021-12-20 359 2021\n", "2021-12-21 360 2021\n", "2021-12-22 361 2021\n", "2021-12-23 362 2021\n", "2021-12-24 363 2021\n", "2021-12-25 364 2021" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.query('FY==2021').tail(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's go ahead and add in a few more features:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# workweek in year\n", "result['WW'] = ((result['DOY']-1) //7) + 1\n", "# fiscal quarter\n", "result['FQ'] = np.minimum((result['WW']//13) + 1, 4)\n", "# workweek in quarter\n", "result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)\n", "# fiscal month\n", "result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DOYFYWWFQWWinQFM
2021-12-1635520215141212
2021-12-1735620215141212
2021-12-1835720215141212
2021-12-1935820215241312
2021-12-2035920215241312
2021-12-2136020215241312
2021-12-2236120215241312
2021-12-2336220215241312
2021-12-2436320215241312
2021-12-2536420215241312
\n", "
" ], "text/plain": [ " DOY FY WW FQ WWinQ FM\n", "2021-12-16 355 2021 51 4 12 12\n", "2021-12-17 356 2021 51 4 12 12\n", "2021-12-18 357 2021 51 4 12 12\n", "2021-12-19 358 2021 52 4 13 12\n", "2021-12-20 359 2021 52 4 13 12\n", "2021-12-21 360 2021 52 4 13 12\n", "2021-12-22 361 2021 52 4 13 12\n", "2021-12-23 362 2021 52 4 13 12\n", "2021-12-24 363 2021 52 4 13 12\n", "2021-12-25 364 2021 52 4 13 12" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.query('FY==2021').tail(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! We can summarize the function below and in a [public gist](https://gist.github.com/banditkings/735fe1885a442b3457d208060ac4b970)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "Make a dataframe for a 52/53-workweek accounting period calendar given the last day of the previous year.\n", "Since many companies use this 52/53 workweek for their fiscal periods, this is a useful function to quickly\n", "build custom lookup tables so you can join between datasets!\n", "\"\"\"\n", "\n", "import pandas as pd\n", "from datetime import datetime\n", "import numpy as np\n", "\n", "\n", "def create_5253_calendar(start='2020', periods=3, weekday=5, startingMonth=12, variation='last'):\n", " \"\"\"\n", " Builds a lookup dataframe with calendar date on the index and relevant fiscal dates for a 52/53 workweek\n", " aka 4-4-5 fiscal calendar. \n", " \n", " See https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html for info on the \n", " pandas 52/53 workweek dateoffset object.\n", " \n", " Parameters\n", " ----------\n", " start : str, datetime, or pd.Timestamp object (default : '2020')\n", " Indicate the calendar year (i.e. '2020') or datetime/timestamp for the final output dataframe\n", " periods : int (default : 3\n", " Number of fiscal years ahead to build the output dataframe\n", " weekday : int (default : 5)\n", " Which day of week the fiscal year ends on\n", " See also: https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.FY5253.html\n", " * 0 : Monday\n", " * 1 : Tuesday\n", " * 2 : Wednesday\n", " * 3 : Thursday\n", " * 4 : Friday\n", " * 5 : Saturday\n", " * 6 : Sunday\n", " startingMonth : int (default : 12)\n", " Following the pandas convention, but this actually signals the 'ending month' for the fiscal year. So a fiscal year\n", " that ends on the last Saturday of every December would have a value of 12 here.\n", " variation : str (default : 'last')\n", " Either 'last' or 'nearest' - parameters for the pandas FY5253 offset that specifies the type of 5253 calendar this is.\n", " If it's 'last', then the fiscal year ends on the 'last' `weekday` of the `startingMonth`.\n", " If it's 'nearest', then the fiscal year ends on the 'nearest' `weekday` of the `startingMonth`. \n", " \n", " Returns\n", " -------\n", " pd.DataFrame\n", " Returns a dataframe with a daily DatetimeIndex and columns that map to key fiscal timelines for\n", " a 52/53 workweek calendar, such as DOY (day of fiscal year), WW (workweek in fiscal year), etc\n", " \n", " Examples\n", " --------\n", " >>>create_5253_calendar(start='20', periods=5, weekday=5, startingMonth=12, variation='last')\n", " \n", " \"\"\"\n", " # create an offset for the Intel fiscal calendar, which ends on the last Saturday of December\n", " yoffset = pd.tseries.offsets.FY5253(n=1, weekday=weekday, startingMonth=startingMonth, variation=variation)\n", "\n", " # use the offset to create a date range with intervals at each fiscal year beginning\n", " yoffset_range = pd.date_range(start=start, periods=periods, freq=yoffset) \n", " \n", " # Create a '1 day' offset:\n", " d1 = pd.tseries.offsets.DateOffset(n = 1)\n", "\n", " # Use our business acumen to specify what that first fiscal year was. '2020-12-27' is the first day of fiscal year 2021 for Intel, \n", " # while '2021-12-26' was the first day of fiscal year 2022\n", " yr = (yoffset_range[0] - pd.tseries.offsets.DateOffset(n=8)).year +1\n", "\n", " # iterate over each item in the date range we created earlier:\n", " result = pd.DataFrame()\n", " for i in yoffset_range:\n", " # recall each item is a Timestamp that represents the first day of the fiscal year,\n", " # so create a date_range from beginning to end of the fiscal year\n", " current_range = pd.date_range(i+d1, i+yoffset, freq='D')\n", " interim_df = pd.DataFrame(index = current_range)\n", " # day of year\n", " interim_df['DOY'] = (current_range-current_range[0]).days +1\n", " # fiscal year\n", " interim_df['FY'] = yr\n", " result = result.append(interim_df)\n", " yr += 1\n", "\n", " # workweek in year\n", " result['WW'] = ((result['DOY']-1) //7) + 1\n", " # fiscal quarter\n", " result['FQ'] = np.minimum((result['WW']//13) + 1, 4)\n", " # workweek in quarter\n", " result['WWinQ'] = result['WW'] - ((result['FQ']-1) * 13)\n", " # fiscal month\n", " result['FM'] = ((result['FQ']-1)*3) + np.minimum(((result['WWinQ'] // 4)+1), 3)\n", "\n", " return result" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "64.3 ms ± 8.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%%timeit\n", "create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DOYFYWWFQWWinQFM
2030-12-2436020305241312
2030-12-2536120305241312
2030-12-2636220305241312
2030-12-2736320305241312
2030-12-2836420305241312
\n", "
" ], "text/plain": [ " DOY FY WW FQ WWinQ FM\n", "2030-12-24 360 2030 52 4 13 12\n", "2030-12-25 361 2030 52 4 13 12\n", "2030-12-26 362 2030 52 4 13 12\n", "2030-12-27 363 2030 52 4 13 12\n", "2030-12-28 364 2030 52 4 13 12" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = create_5253_calendar(start='2000', periods=30, weekday=5, startingMonth=12, variation='last')\n", "df.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }