{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "allWIe5kwPcS"
   },
   "source": [
    "# Time Series Datasets\n",
    "\n",
    "This notebook shows how to create a time series dataset from some csv file in order to then share it on the [🤗 hub](https://huggingface.co/docs/datasets/index). We will use the GluonTS library to read the csv into the appropriate format. We start by installing the libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "id": "4XnNcdWbwrNo"
   },
   "outputs": [],
   "source": [
    "! pip install -q datasets gluonts orjson"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dI1yo_vHw5CV"
   },
   "source": [
    "GluonTS comes with a pandas DataFrame based dataset so our strategy will be to read the csv file, and process it as a `PandasDataset`. We will then iterate over it and convert it to a 🤗 dataset with the appropriate schema for time series. So lets get started!\n",
    "\n",
    "## `PandasDataset`\n",
    "\n",
    "Suppose we are given multiple (10) time series stacked on top of each other in a dataframe with an `item_id` column that distinguishes different series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 206
    },
    "id": "e9FaT_VpwuI2",
    "outputId": "8a10c908-41e1-4ca7-b420-01c0810c5c4b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <div id=\"df-90cae087-2afb-413c-85e0-0f6b9a05e059\">\n",
       "    <div class=\"colab-df-container\">\n",
       "      <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>target</th>\n",
       "      <th>item_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2021-01-01 00:00:00</th>\n",
       "      <td>-1.3378</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-01 01:00:00</th>\n",
       "      <td>-1.6111</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-01 02:00:00</th>\n",
       "      <td>-1.9259</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-01 03:00:00</th>\n",
       "      <td>-1.9184</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-01 04:00:00</th>\n",
       "      <td>-1.9168</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-90cae087-2afb-413c-85e0-0f6b9a05e059')\"\n",
       "              title=\"Convert this dataframe to an interactive table.\"\n",
       "              style=\"display:none;\">\n",
       "        \n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "       width=\"24px\">\n",
       "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
       "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
       "  </svg>\n",
       "      </button>\n",
       "      \n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      flex-wrap:wrap;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "      <script>\n",
       "        const buttonEl =\n",
       "          document.querySelector('#df-90cae087-2afb-413c-85e0-0f6b9a05e059 button.colab-df-convert');\n",
       "        buttonEl.style.display =\n",
       "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "        async function convertToInteractive(key) {\n",
       "          const element = document.querySelector('#df-90cae087-2afb-413c-85e0-0f6b9a05e059');\n",
       "          const dataTable =\n",
       "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                     [key], {});\n",
       "          if (!dataTable) return;\n",
       "\n",
       "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "            + ' to learn more about interactive tables.';\n",
       "          element.innerHTML = '';\n",
       "          dataTable['output_type'] = 'display_data';\n",
       "          await google.colab.output.renderOutput(dataTable, element);\n",
       "          const docLink = document.createElement('div');\n",
       "          docLink.innerHTML = docLinkHtml;\n",
       "          element.appendChild(docLink);\n",
       "        }\n",
       "      </script>\n",
       "    </div>\n",
       "  </div>\n",
       "  "
      ],
      "text/plain": [
       "                     target item_id\n",
       "2021-01-01 00:00:00 -1.3378       A\n",
       "2021-01-01 01:00:00 -1.6111       A\n",
       "2021-01-01 02:00:00 -1.9259       A\n",
       "2021-01-01 03:00:00 -1.9184       A\n",
       "2021-01-01 04:00:00 -1.9168       A"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "url = (\n",
    "    \"https://gist.githubusercontent.com/rsnirwan/a8b424085c9f44ef2598da74ce43e7a3\"\n",
    "    \"/raw/b6fdef21fe1f654787fa0493846c546b7f9c4df2/ts_long.csv\"\n",
    ")\n",
    "df = pd.read_csv(url, index_col=0, parse_dates=True)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "s4WCjvBqxi3B"
   },
   "source": [
    "After converting it into a `pd.Dataframe` we can then convert it into GluonTS's `PandasDataset`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "id": "su1i8ZdDxf7c"
   },
   "outputs": [],
   "source": [
    "from gluonts.dataset.pandas import PandasDataset\n",
    "\n",
    "ds = PandasDataset.from_long_dataframe(df, target=\"target\", item_id=\"item_id\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cYnHkLdex_n3"
   },
   "source": [
    "\n",
    "## 🤗 Datasets\n",
    "\n",
    "From here we have to map the pandas dataset's `start` field into a time stamp instead of a `pd.Period`. We do this by defining the following class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "id": "gv5Ytpwrx4FQ"
   },
   "outputs": [],
   "source": [
    "class ProcessStartField():\n",
    "    ts_id = 0\n",
    "    \n",
    "    def __call__(self, data):\n",
    "        data[\"start\"] = data[\"start\"].to_timestamp()\n",
    "        data[\"feat_static_cat\"] = [self.ts_id]\n",
    "        self.ts_id += 1\n",
    "        \n",
    "        return data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "id": "DrhbT1QIyPMA"
   },
   "outputs": [],
   "source": [
    "from gluonts.itertools import Map\n",
    "\n",
    "process_start = ProcessStartField()\n",
    "\n",
    "list_ds = list(Map(process_start, ds))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ug2kLNUPyeyJ"
   },
   "source": [
    "Next we need to define our schema features and create our dataset from this list via the `from_list` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "id": "r1rQUtvGycaF"
   },
   "outputs": [],
   "source": [
    "from datasets import Dataset, Features, Value, Sequence\n",
    "\n",
    "features  = Features(\n",
    "    {    \n",
    "        \"start\": Value(\"timestamp[s]\"),\n",
    "        \"target\": Sequence(Value(\"float32\")),\n",
    "        \"feat_static_cat\": Sequence(Value(\"uint64\")),\n",
    "        # \"feat_static_real\":  Sequence(Value(\"float32\")),\n",
    "        # \"feat_dynamic_real\": Sequence(Sequence(Value(\"uint64\"))),\n",
    "        # \"feat_dynamic_cat\": Sequence(Sequence(Value(\"uint64\"))),\n",
    "        \"item_id\": Value(\"string\"),\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "id": "RrpP2oAxywC8"
   },
   "outputs": [],
   "source": [
    "dataset = Dataset.from_list(list_ds, features=features)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GrbGlDtYzyIf"
   },
   "source": [
    "We can thus use this strategy to [share](https://huggingface.co/docs/datasets/share) the dataset to the hub."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}