{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "allWIe5kwPcS" }, "source": [ "# Time Series Datasets\n", "\n", "This notebook shows how to create a time series dataset from a CSV file and then share it on the [🤗 hub](https://huggingface.co/docs/datasets/index). We will use the GluonTS library to read the CSV into the appropriate format. We start by installing the required libraries:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "4XnNcdWbwrNo" }, "outputs": [], "source": [ "! pip install -q datasets gluonts orjson" ] }, { "cell_type": "markdown", "metadata": { "id": "dI1yo_vHw5CV" }, "source": [ "GluonTS comes with a pandas DataFrame-based dataset, so our strategy will be to read the CSV file and process it as a `PandasDataset`. We will then iterate over it and convert it to a 🤗 dataset with the appropriate schema for time series. So let's get started!\n", "\n", "## `PandasDataset`\n", "\n", "Suppose we are given 10 time series stacked on top of each other in a DataFrame, with an `item_id` column that distinguishes the different series:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "e9FaT_VpwuI2", "outputId": "8a10c908-41e1-4ca7-b420-01c0810c5c4b" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "| | target | item_id |
|---|---|---|
| 2021-01-01 00:00:00 | -1.3378 | A |
| 2021-01-01 01:00:00 | -1.6111 | A |
| 2021-01-01 02:00:00 | -1.9259 | A |
| 2021-01-01 03:00:00 | -1.9184 | A |
| 2021-01-01 04:00:00 | -1.9168 | A |\n", "