{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "taruma-hidrokit-prep-timeseries", "version": "0.3.2", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "_CtZpmBbCbAg", "colab_type": "text" }, "source": [ "# Tutorial `hidrokit.prep.timeseries`\n", "\n", "- **Category**: _data preparation_\n", "- __Purpose__: Reshape a _timeseries_ dataset for use in _machine learning_ / ANN models\n", "- __Documentation__: [readthedocs](https://hidrokit.readthedocs.io/en/stable/prep.html#module-prep.timeseries)\n", "\n", "## Notebook information\n", "\n", "- __notebook name__: `taruma_hidrokit_prep_timeseries`\n", "- __notebook version/date__: `1.0.1`/`20190713`\n", "- __notebook server__: Google Colab\n", "- __hidrokit version__: `0.2.0`\n", "- **python version**: `3.7`\n" ] }, { "cell_type": "markdown", "metadata": { "id": "BPm5qNh_DQjj", "colab_type": "text" }, "source": [ "## Installing hidrokit" ] }, { "cell_type": "code", "metadata": { "id": "aeLepUrl_nxm", "colab_type": "code", "outputId": "87f63d63-589b-41bb-95f2-75ed62ada283", "colab": { "base_uri": "https://localhost:8080/", "height": 255 } }, "source": [ "### Install from PyPI\n", "!pip install hidrokit\n", "\n", "### Install from GitHub\n", "# !pip install git+https://github.com/taruma/hidrokit.git\n", "\n", "### Install from GitHub (latest)\n", "# !pip install git+https://github.com/taruma/hidrokit.git@latest" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Collecting hidrokit\n", " Downloading https://files.pythonhosted.org/packages/43/9d/343d2a413a07463a21dd13369e31d664d6733bbfd46276abef5d804c83d1/hidrokit-0.2.0-py2.py3-none-any.whl\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from hidrokit) (1.16.4)\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from 
hidrokit) (0.24.2)\n", "Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from hidrokit) (3.0.3)\n", "Requirement already satisfied: python-dateutil>=2.5.0 in /usr/local/lib/python3.6/dist-packages (from pandas->hidrokit) (2.5.3)\n", "Requirement already satisfied: pytz>=2011k in /usr/local/lib/python3.6/dist-packages (from pandas->hidrokit) (2018.9)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit) (2.4.0)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit) (1.1.0)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit) (0.10.0)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.5.0->pandas->hidrokit) (1.12.0)\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib->hidrokit) (41.0.1)\n", "Installing collected packages: hidrokit\n", "Successfully installed hidrokit-0.2.0\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "APb9vC-zDaV4", "colab_type": "text" }, "source": [ "## Import libraries" ] }, { "cell_type": "code", "metadata": { "id": "Gx6h8iSxDfQY", "colab_type": "code", "colab": {} }, "source": [ "import numpy as np\n", "import pandas as pd" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Kny2T1itDlz6", "colab_type": "text" }, "source": [ "## Dataset\n", "\n", "The dataset has seven features (a, b, c, d, e, f, g): one row per day of 2019, filled with random values rounded to whole numbers between 0 and 100." ] }, { "cell_type": "code", "metadata": { "id": "xdDdm1pbD-AO", "colab_type": "code", "outputId": "ae1825d2-cf32-4acf-e94d-b327b4b7512b", "colab": { "base_uri": "https://localhost:8080/", "height": 359 } }, "source": [ "# Build the dataset with numpy\n", "\n", "np.random.seed(110891)\n", 
"date_index = pd.date_range('20190101', '20191231')\n", "data = np.random.rand(len(date_index), 7) * 100\n", "columns = 'a b c d e f g'.split()\n", "dataset = pd.DataFrame(\n", " data=data.round(),\n", " columns=columns,\n", " index=date_index.strftime('%Y-%b-%d')\n", ")\n", "dataset.head(10)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>a</th><th>b</th><th>c</th><th>d</th><th>e</th><th>f</th><th>g</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2019-Jan-01</th><td>29.0</td><td>32.0</td><td>26.0</td><td>61.0</td><td>5.0</td><td>22.0</td><td>78.0</td></tr>\n",
"<tr><th>2019-Jan-02</th><td>86.0</td><td>34.0</td><td>80.0</td><td>32.0</td><td>16.0</td><td>17.0</td><td>76.0</td></tr>\n",
"<tr><th>2019-Jan-03</th><td>50.0</td><td>52.0</td><td>72.0</td><td>46.0</td><td>3.0</td><td>18.0</td><td>81.0</td></tr>\n",
"<tr><th>2019-Jan-04</th><td>5.0</td><td>2.0</td><td>86.0</td><td>36.0</td><td>19.0</td><td>9.0</td><td>97.0</td></tr>\n",
"<tr><th>2019-Jan-05</th><td>9.0</td><td>93.0</td><td>7.0</td><td>32.0</td><td>55.0</td><td>62.0</td><td>31.0</td></tr>\n",
"<tr><th>2019-Jan-06</th><td>94.0</td><td>38.0</td><td>87.0</td><td>87.0</td><td>51.0</td><td>100.0</td><td>18.0</td></tr>\n",
"<tr><th>2019-Jan-07</th><td>54.0</td><td>13.0</td><td>23.0</td><td>59.0</td><td>43.0</td><td>66.0</td><td>68.0</td></tr>\n",
"<tr><th>2019-Jan-08</th><td>61.0</td><td>41.0</td><td>96.0</td><td>73.0</td><td>57.0</td><td>44.0</td><td>77.0</td></tr>\n",
"<tr><th>2019-Jan-09</th><td>89.0</td><td>54.0</td><td>40.0</td><td>77.0</td><td>66.0</td><td>51.0</td><td>76.0</td></tr>\n",
"<tr><th>2019-Jan-10</th><td>60.0</td><td>87.0</td><td>62.0</td><td>35.0</td><td>42.0</td><td>47.0</td><td>62.0</td></tr>\n",
"</tbody>\n",
"</table>"
 ], "text/plain": [ " a b c d e f g\n", "2019-Jan-01 29.0 32.0 26.0 61.0 5.0 22.0 78.0\n", "2019-Jan-02 86.0 34.0 80.0 32.0 16.0 17.0 76.0\n", "2019-Jan-03 50.0 52.0 72.0 46.0 3.0 18.0 81.0\n", "2019-Jan-04 5.0 2.0 86.0 36.0 19.0 9.0 97.0\n", "2019-Jan-05 9.0 93.0 7.0 32.0 55.0 62.0 31.0\n", "2019-Jan-06 94.0 38.0 87.0 87.0 51.0 100.0 18.0\n", "2019-Jan-07 54.0 13.0 23.0 59.0 43.0 66.0 68.0\n", "2019-Jan-08 61.0 41.0 96.0 73.0 57.0 44.0 77.0\n", "2019-Jan-09 89.0 54.0 40.0 77.0 66.0 51.0 76.0\n", "2019-Jan-10 60.0 87.0 62.0 35.0 42.0 47.0 62.0" ] }, "metadata": { "tags": [] }, "execution_count": 6 } ] }, { "cell_type": "code", "metadata": { "id": "kpXizaAl73Eq", "colab_type": "code", "outputId": "ab051899-f2ac-43f6-8e2e-3ea5e95e65bb", "colab": { "base_uri": "https://localhost:8080/", "height": 221 } }, "source": [ "# Dataset info\n", "dataset.info()" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Index: 365 entries, 2019-Jan-01 to 2019-Dec-31\n", "Data columns (total 7 columns):\n", "a 365 non-null float64\n", "b 365 non-null float64\n", "c 365 non-null float64\n", "d 365 non-null float64\n", "e 365 non-null float64\n", "f 365 non-null float64\n", "g 365 non-null float64\n", "dtypes: float64(7)\n", "memory usage: 22.8+ KB\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "KysAN-JKFs_c", "colab_type": "text" }, "source": [ "# Function `timeseries.timestep_table()`\n", "\n", "- __Purpose__: Build a _timesteps_ table from a DataFrame\n", "- __Syntax__: `prep.timeseries.timestep_table(dataframe, columns=None, timesteps=2, keep_first=True)`\n", "- __Return__: DataFrame\n", "- __Documentation__: [readthedocs](https://hidrokit.readthedocs.io/en/stable/prep.html#prep.timeseries.timestep_table)" ] }, { "cell_type": "code", "metadata": { "id": "62fs0DjLHJP_", "colab_type": "code", "colab": {} }, "source": [ "from hidrokit.prep import timeseries" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", 
"metadata": { "id": "2HlTGY6B9AlK", "colab_type": "text" }, "source": [ "## Argument: `None`\n", "\n", "With no arguments, the _default_ values apply: _timestep_ columns are created for every column, and the column at time $t_{0}$ is kept. The _default_ `timesteps` is the two preceding rows (here, the two preceding days), so each input column `x` becomes `x_tmin0`, `x_tmin1`, and `x_tmin2`, and the first two dates are dropped." ] }, { "cell_type": "code", "metadata": { "id": "Musp2y418ziX", "colab_type": "code", "outputId": "5c535d80-7db1-4cab-bd3b-707a3bef0a22", "colab": { "base_uri": "https://localhost:8080/", "height": 394 } }, "source": [ "tabel_ts = timeseries.timestep_table(dataset)\n", "tabel_ts.head()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>a_tmin0</th><th>a_tmin1</th><th>a_tmin2</th><th>b_tmin0</th><th>b_tmin1</th><th>b_tmin2</th><th>c_tmin0</th><th>c_tmin1</th><th>c_tmin2</th><th>d_tmin0</th><th>d_tmin1</th><th>d_tmin2</th><th>e_tmin0</th><th>e_tmin1</th><th>e_tmin2</th><th>f_tmin0</th><th>f_tmin1</th><th>f_tmin2</th><th>g_tmin0</th><th>g_tmin1</th><th>g_tmin2</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2019-Jan-03</th><td>50.0</td><td>86.0</td><td>29.0</td><td>52.0</td><td>34.0</td><td>32.0</td><td>72.0</td><td>80.0</td><td>26.0</td><td>46.0</td><td>32.0</td><td>61.0</td><td>3.0</td><td>16.0</td><td>5.0</td><td>18.0</td><td>17.0</td><td>22.0</td><td>81.0</td><td>76.0</td><td>78.0</td></tr>\n",
"<tr><th>2019-Jan-04</th><td>5.0</td><td>50.0</td><td>86.0</td><td>2.0</td><td>52.0</td><td>34.0</td><td>86.0</td><td>72.0</td><td>80.0</td><td>36.0</td><td>46.0</td><td>32.0</td><td>19.0</td><td>3.0</td><td>16.0</td><td>9.0</td><td>18.0</td><td>17.0</td><td>97.0</td><td>81.0</td><td>76.0</td></tr>\n",
"<tr><th>2019-Jan-05</th><td>9.0</td><td>5.0</td><td>50.0</td><td>93.0</td><td>2.0</td><td>52.0</td><td>7.0</td><td>86.0</td><td>72.0</td><td>32.0</td><td>36.0</td><td>46.0</td><td>55.0</td><td>19.0</td><td>3.0</td><td>62.0</td><td>9.0</td><td>18.0</td><td>31.0</td><td>97.0</td><td>81.0</td></tr>\n",
"<tr><th>2019-Jan-06</th><td>94.0</td><td>9.0</td><td>5.0</td><td>38.0</td><td>93.0</td><td>2.0</td><td>87.0</td><td>7.0</td><td>86.0</td><td>87.0</td><td>32.0</td><td>36.0</td><td>51.0</td><td>55.0</td><td>19.0</td><td>100.0</td><td>62.0</td><td>9.0</td><td>18.0</td><td>31.0</td><td>97.0</td></tr>\n",
"<tr><th>2019-Jan-07</th><td>54.0</td><td>94.0</td><td>9.0</td><td>13.0</td><td>38.0</td><td>93.0</td><td>23.0</td><td>87.0</td><td>7.0</td><td>59.0</td><td>87.0</td><td>32.0</td><td>43.0</td><td>51.0</td><td>55.0</td><td>66.0</td><td>100.0</td><td>62.0</td><td>68.0</td><td>18.0</td><td>31.0</td></tr>\n",
"</tbody>\n",
"</table>"
 ], "text/plain": [ " a_tmin0 a_tmin1 a_tmin2 ... g_tmin0 g_tmin1 g_tmin2\n", "2019-Jan-03 50.0 86.0 29.0 ... 81.0 76.0 78.0\n", "2019-Jan-04 5.0 50.0 86.0 ... 97.0 81.0 76.0\n", "2019-Jan-05 9.0 5.0 50.0 ... 31.0 97.0 81.0\n", "2019-Jan-06 94.0 9.0 5.0 ... 18.0 31.0 97.0\n", "2019-Jan-07 54.0 94.0 9.0 ... 68.0 18.0 31.0\n", "\n", "[5 rows x 21 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 11 } ] }, { "cell_type": "markdown", "metadata": { "id": "9-IjTbLI9gsE", "colab_type": "text" }, "source": [ "## Argument: `columns=`\n", "\n", "Selects which columns are expanded into _timestep_ columns; the remaining columns are kept unchanged." ] }, { "cell_type": "code", "metadata": { "id": "pFB0fZPG9pxh", "colab_type": "code", "outputId": "ed9bc011-6fdf-4c9f-fd31-ed544b9d9ccc", "colab": { "base_uri": "https://localhost:8080/", "height": 204 } }, "source": [ "tabel_ts_columns = timeseries.timestep_table(dataset, columns=['a', 'c', 'd'])\n", "tabel_ts_columns.head()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>a_tmin0</th><th>a_tmin1</th><th>a_tmin2</th><th>b</th><th>c_tmin0</th><th>c_tmin1</th><th>c_tmin2</th><th>d_tmin0</th><th>d_tmin1</th><th>d_tmin2</th><th>e</th><th>f</th><th>g</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2019-Jan-03</th><td>50.0</td><td>86.0</td><td>29.0</td><td>52.0</td><td>72.0</td><td>80.0</td><td>26.0</td><td>46.0</td><td>32.0</td><td>61.0</td><td>3.0</td><td>18.0</td><td>81.0</td></tr>\n",
"<tr><th>2019-Jan-04</th><td>5.0</td><td>50.0</td><td>86.0</td><td>2.0</td><td>86.0</td><td>72.0</td><td>80.0</td><td>36.0</td><td>46.0</td><td>32.0</td><td>19.0</td><td>9.0</td><td>97.0</td></tr>\n",
"<tr><th>2019-Jan-05</th><td>9.0</td><td>5.0</td><td>50.0</td><td>93.0</td><td>7.0</td><td>86.0</td><td>72.0</td><td>32.0</td><td>36.0</td><td>46.0</td><td>55.0</td><td>62.0</td><td>31.0</td></tr>\n",
"<tr><th>2019-Jan-06</th><td>94.0</td><td>9.0</td><td>5.0</td><td>38.0</td><td>87.0</td><td>7.0</td><td>86.0</td><td>87.0</td><td>32.0</td><td>36.0</td><td>51.0</td><td>100.0</td><td>18.0</td></tr>\n",
"<tr><th>2019-Jan-07</th><td>54.0</td><td>94.0</td><td>9.0</td><td>13.0</td><td>23.0</td><td>87.0</td><td>7.0</td><td>59.0</td><td>87.0</td><td>32.0</td><td>43.0</td><td>66.0</td><td>68.0</td></tr>\n",
"</tbody>\n",
"</table>"
 ], "text/plain": [ " a_tmin0 a_tmin1 a_tmin2 b ... d_tmin2 e f g\n", "2019-Jan-03 50.0 86.0 29.0 52.0 ... 61.0 3.0 18.0 81.0\n", "2019-Jan-04 5.0 50.0 86.0 2.0 ... 32.0 19.0 9.0 97.0\n", "2019-Jan-05 9.0 5.0 50.0 93.0 ... 46.0 55.0 62.0 31.0\n", "2019-Jan-06 94.0 9.0 5.0 38.0 ... 36.0 51.0 100.0 18.0\n", "2019-Jan-07 54.0 94.0 9.0 13.0 ... 32.0 43.0 66.0 68.0\n", "\n", "[5 rows x 13 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 13 } ] }, { "cell_type": "markdown", "metadata": { "id": "WMjBRiCq-MbC", "colab_type": "text" }, "source": [ "## Argument: `keep_first=`\n", "\n", "If set to `False`, the column at time $t_0$ is not included. In the output recorded here with hidrokit `0.2.0`, the values under `a_tmin1` equal column `a` on the same date (50.0 on 2019-Jan-03), so verify the column labels against your own data." ] }, { "cell_type": "code", "metadata": { "id": "qDLOPOfe-RFk", "colab_type": "code", "outputId": "e20d45cc-13f7-47cd-cbb5-652b3ae68257", "colab": { "base_uri": "https://localhost:8080/", "height": 204 } }, "source": [ "tabel_ts_keep = timeseries.timestep_table(dataset, columns=['a', 'b', 'c'], keep_first=False)\n", "tabel_ts_keep.head()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>a_tmin1</th><th>a_tmin2</th><th>b_tmin1</th><th>b_tmin2</th><th>c_tmin1</th><th>c_tmin2</th><th>d</th><th>e</th><th>f</th><th>g</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2019-Jan-03</th><td>50.0</td><td>86.0</td><td>52.0</td><td>34.0</td><td>72.0</td><td>80.0</td><td>46.0</td><td>3.0</td><td>18.0</td><td>81.0</td></tr>\n",
"<tr><th>2019-Jan-04</th><td>5.0</td><td>50.0</td><td>2.0</td><td>52.0</td><td>86.0</td><td>72.0</td><td>36.0</td><td>19.0</td><td>9.0</td><td>97.0</td></tr>\n",
"<tr><th>2019-Jan-05</th><td>9.0</td><td>5.0</td><td>93.0</td><td>2.0</td><td>7.0</td><td>86.0</td><td>32.0</td><td>55.0</td><td>62.0</td><td>31.0</td></tr>\n",
"<tr><th>2019-Jan-06</th><td>94.0</td><td>9.0</td><td>38.0</td><td>93.0</td><td>87.0</td><td>7.0</td><td>87.0</td><td>51.0</td><td>100.0</td><td>18.0</td></tr>\n",
"<tr><th>2019-Jan-07</th><td>54.0</td><td>94.0</td><td>13.0</td><td>38.0</td><td>23.0</td><td>87.0</td><td>59.0</td><td>43.0</td><td>66.0</td><td>68.0</td></tr>\n",
"</tbody>\n",
"</table>"
 ], "text/plain": [ " a_tmin1 a_tmin2 b_tmin1 b_tmin2 ... d e f g\n", "2019-Jan-03 50.0 86.0 52.0 34.0 ... 46.0 3.0 18.0 81.0\n", "2019-Jan-04 5.0 50.0 2.0 52.0 ... 36.0 19.0 9.0 97.0\n", "2019-Jan-05 9.0 5.0 93.0 2.0 ... 32.0 55.0 62.0 31.0\n", "2019-Jan-06 94.0 9.0 38.0 93.0 ... 87.0 51.0 100.0 18.0\n", "2019-Jan-07 54.0 94.0 13.0 38.0 ... 59.0 43.0 66.0 68.0\n", "\n", "[5 rows x 10 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 15 } ] }, { "cell_type": "markdown", "metadata": { "id": "TpnPsOLa-mHM", "colab_type": "text" }, "source": [ "## Argument: `timesteps=`\n", "\n", "Sets how many preceding rows are included as _timesteps_ columns. Example: build a table that carries information from 4 days back, which drops the first four dates because they lack a full history." ] }, { "cell_type": "code", "metadata": { "id": "kcOXiyCd-1PC", "colab_type": "code", "outputId": "4b91d95a-24e4-44be-cc4e-9c5e8bbe73b0", "colab": { "base_uri": "https://localhost:8080/", "height": 204 } }, "source": [ "tabel_ts_time = timeseries.timestep_table(dataset, columns='a', keep_first=False, timesteps=4)\n", "tabel_ts_time.head()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>a_tmin1</th><th>a_tmin2</th><th>a_tmin3</th><th>a_tmin4</th><th>b</th><th>c</th><th>d</th><th>e</th><th>f</th><th>g</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2019-Jan-05</th><td>9.0</td><td>5.0</td><td>50.0</td><td>86.0</td><td>93.0</td><td>7.0</td><td>32.0</td><td>55.0</td><td>62.0</td><td>31.0</td></tr>\n",
"<tr><th>2019-Jan-06</th><td>94.0</td><td>9.0</td><td>5.0</td><td>50.0</td><td>38.0</td><td>87.0</td><td>87.0</td><td>51.0</td><td>100.0</td><td>18.0</td></tr>\n",
"<tr><th>2019-Jan-07</th><td>54.0</td><td>94.0</td><td>9.0</td><td>5.0</td><td>13.0</td><td>23.0</td><td>59.0</td><td>43.0</td><td>66.0</td><td>68.0</td></tr>\n",
"<tr><th>2019-Jan-08</th><td>61.0</td><td>54.0</td><td>94.0</td><td>9.0</td><td>41.0</td><td>96.0</td><td>73.0</td><td>57.0</td><td>44.0</td><td>77.0</td></tr>\n",
"<tr><th>2019-Jan-09</th><td>89.0</td><td>61.0</td><td>54.0</td><td>94.0</td><td>54.0</td><td>40.0</td><td>77.0</td><td>66.0</td><td>51.0</td><td>76.0</td></tr>\n",
"</tbody>\n",
"</table>"
 ], "text/plain": [ " a_tmin1 a_tmin2 a_tmin3 a_tmin4 ... d e f g\n", "2019-Jan-05 9.0 5.0 50.0 86.0 ... 32.0 55.0 62.0 31.0\n", "2019-Jan-06 94.0 9.0 5.0 50.0 ... 87.0 51.0 100.0 18.0\n", "2019-Jan-07 54.0 94.0 9.0 5.0 ... 59.0 43.0 66.0 68.0\n", "2019-Jan-08 61.0 54.0 94.0 9.0 ... 73.0 57.0 44.0 77.0\n", "2019-Jan-09 89.0 61.0 54.0 94.0 ... 77.0 66.0 51.0 76.0\n", "\n", "[5 rows x 10 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 18 } ] }, { "cell_type": "markdown", "metadata": { "id": "yHQFwa_nCE9p", "colab_type": "text" }, "source": [ "# Changelog\n", "\n", "```\n", "- 20190713 - 1.0.1 - Notebook information\n", "- 20190713 - 1.0.0 - Initial\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "gSU3NrNrCKoi", "colab_type": "text" }, "source": [ "#### Copyright © 2019 [Taruma Sakti Megariansyah](https://taruma.github.io)\n", "\n", "Source code in this notebook is licensed under the [MIT License](https://opensource.org/licenses/MIT). Data in this notebook is licensed under the [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/) license." ] } ] }