{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with Dates\n", "\n", "This is a tutorial on how to prepare temporal data for use with Lux. To display temporal fields in Lux, the column must be converted into Pandas's [`datetime`](https://docs.python.org/3/library/datetime.html) objects. Lux automatically detects attribute named as `date`, `month`, `year`, `day`, and `time` as a datetime field and recognizes them as temporal data types. If you're temporal attributes do not have these names, read more to find out how to work with temporal data types in Lux." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import lux\n", "from lux.vis.Vis import Vis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting Strings to Datetime objects\n", "To convert column referencing dates/times into [`datetime`](https://docs.python.org/3/library/datetime.html) objects, we use [`pd.to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html), as follows:\n", "\n", "\n", "```\n", "pd.to_datetime(['2020-01-01', '2020-01-15', '2020-02-01'],format=\"%Y-%m-%d\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a toy example, a dataframe might contain a `record_date` attribute as strings of dates:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame({'record_date': ['2020-02-01', '2020-03-01', '2020-04-01', '2020-05-01','2020-06-01',],\n", " 'value': [10.5,15.2,20.3,25.2, 14.2]})\n", "\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the `record_date` attribute is detected as an `object` type as Pandas's data type [`dtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since `record_date` is detected as an object type in Pandas, the `record_date` field is recognized as a `nominal` field in Lux, instead of a `temporal` field:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.data_type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The typing has implications on the generated visualizations, since nominal chart types are displayed as bar charts, whereas temporal fields are plotted as time series line charts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis = Vis([\"record_date\",\"value\"],df)\n", "vis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To fix this, we can convert the `record_date` column into a datetime object by doing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"record_date\"] = pd.to_datetime(df[\"record_date\"],format=\"%Y-%m-%d\")\n", "df[\"record_date\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After changing the Pandas data type to datetime, we see that date field is recognized as temporal fields in Lux." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.data_type" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis.refresh_source(df)\n", "vis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing Trends across Different Timescales\n", "\n", "Lux automatically detects the temporal attribute and plots the visualizations across different timescales to showcase any cyclical patterns. Here, we see that the `Temporal` tab displays the yearly, monthly, and weekly trends for the number of stock records." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from vega_datasets import data\n", "df = data.stocks()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.recommendation[\"Temporal\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Advanced Date Manipulation\n", "You might notice earlier that all the dates in our example dataset are the first of the month. In this case, there may be situations where we only want to list the year and month, instead of the full date. Here, we look at how to handle these cases.\n", "\n", "Below we look at an example stocks dataset that also has `month_date` field with each row representing data for the first of each month." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis = Vis([\"monthdate\",\"price\"],df)\n", "vis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we only want Lux to output the month and the year, we can convert the column to a [`PeriodIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodIndex.html) using [`to_period`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.to_period.html). The `freq` argument specifies the granularity of the output. In this case, we are using 'M' for monthly. You can find more about how to specify time periods [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"monthdate\"] = pd.DatetimeIndex(df[\"monthdate\"]).to_period(freq='M')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis.refresh_source(df)\n", "vis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specifying Intents With Datetime Fields\n", "The string representation seen in the Dataframe can be used to filter out specific dates. \n", "\n", "For example, in the above `stocks` dataset, we converted the date column to a `PeriodIndex`. Now the string representation only shows the granularity we want to see. We can use that string representation to filter the dataframe in Pandas:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[df[\"monthdate\"] == '2008-11']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use the same string representation for specifying an intent in Lux." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis = Vis([\"monthdate=2008-11\",\"price\",\"symbol\"],df)\n", "vis" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }