{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Time Series Datasets\n", "\n", "This notebook shows how to create a time series dataset from a CSV file and then share it on the [🤗 Hub](https://huggingface.co/docs/datasets/index). We will use the GluonTS library to read the CSV into the appropriate format. We start by installing the required libraries:" ], "metadata": { "id": "allWIe5kwPcS" } }, { "cell_type": "code", "source": [ "! pip install -q datasets gluonts orjson" ], "metadata": { "id": "4XnNcdWbwrNo" }, "execution_count": 31, "outputs": [] }, { "cell_type": "markdown", "source": [ "GluonTS comes with a pandas DataFrame-based dataset, so our strategy is to read the CSV file and process it as a `PandasDataset`. We will then iterate over it and convert it to a 🤗 dataset with the appropriate schema for time series. 
So let's get started!\n", "\n", "## `PandasDataset`\n", "\n", "Suppose we are given 10 time series stacked on top of each other in a DataFrame, with an `item_id` column that distinguishes the individual series:" ], "metadata": { "id": "dI1yo_vHw5CV" } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "\n", "url = (\n", " \"https://gist.githubusercontent.com/rsnirwan/a8b424085c9f44ef2598da74ce43e7a3\"\n", " \"/raw/b6fdef21fe1f654787fa0493846c546b7f9c4df2/ts_long.csv\"\n", ")\n", "df = pd.read_csv(url, index_col=0, parse_dates=True)\n", "df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "e9FaT_VpwuI2", "outputId": "8a10c908-41e1-4ca7-b420-01c0810c5c4b" }, "execution_count": 25, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " target item_id\n", "2021-01-01 00:00:00 -1.3378 A\n", "2021-01-01 01:00:00 -1.6111 A\n", "2021-01-01 02:00:00 -1.9259 A\n", "2021-01-01 03:00:00 -1.9184 A\n", "2021-01-01 04:00:00 -1.9168 A" ], "text/html": [ "\n", "
| | target | item_id |
|---|---|---|
| 2021-01-01 00:00:00 | -1.3378 | A |
| 2021-01-01 01:00:00 | -1.6111 | A |
| 2021-01-01 02:00:00 | -1.9259 | A |
| 2021-01-01 03:00:00 | -1.9184 | A |
| 2021-01-01 04:00:00 | -1.9168 | A |