{ "cells": [ { "cell_type": "markdown", "id": "34ffe5a3-f269-4630-af59-05af2f656400", "metadata": {}, "source": [ "[](https://colab.research.google.com/drive/1QsYjUX3DgS8ccvDtxEgLeHHmbtPViIqV?usp=drive_link)\n", "\n", "### We recommend using the [Google Colab](https://colab.research.google.com/drive/1QsYjUX3DgS8ccvDtxEgLeHHmbtPViIqV?usp=drive_link) version of the notebook!" ] }, { "cell_type": "markdown", "id": "f48bfa5b-05ca-493e-a892-0eff6dbcb4a6", "metadata": {}, "source": [ "# Convert UCR data to Orion format\n", "\n", "In this notebook we download the UCR data and reformat it\n", "into the format that Orion pipelines expect.\n", "\n", "### Download the data" ] }, { "cell_type": "code", "execution_count": 1, "id": "ac981ead", "metadata": { "id": "ac981ead" }, "outputs": [], "source": [ "# download the dataset archive into memory and unzip it into the working directory\n", "\n", "import io\n", "import os\n", "import urllib.request\n", "import zipfile\n", "\n", "DATA_URL = 'https://www.cs.ucr.edu/~eamonn/time_series_data_2018/UCR_TimeSeriesAnomalyDatasets2021.zip'\n", "\n", "response = urllib.request.urlopen(DATA_URL)\n", "bytes_io = io.BytesIO(response.read())\n", "\n", "with zipfile.ZipFile(bytes_io) as zf:\n", "    zf.extractall()" ] }, { "cell_type": "code", "execution_count": 2, "id": "8c719486", "metadata": { "id": "8c719486" }, "outputs": [], "source": [ "DATA_PATH = os.path.join('AnomalyDatasets_2021',\n", "                         'UCR_TimeSeriesAnomalyDatasets2021',\n", "                         'FilesAreInHere',\n", "                         'UCR_Anomaly_FullData')\n", "\n", "SAVE_TO = 'UCR'\n", "os.makedirs(SAVE_TO, exist_ok=True)" ] }, { "cell_type": "code", "execution_count": 3, "id": "23207f6e", "metadata": { "id": "23207f6e" }, "outputs": [], "source": [ "import csv\n", "import numpy as np\n", "import pandas as pd\n", "from tqdm import tqdm" ] }, { "cell_type": "markdown", "id": "3c5b24d9", "metadata": { "id": "3c5b24d9" }, "source": [ "#### Format\n", "\n", "012_UCR_Anomaly_tiltAPB1_100000_114283_114350.txt\n", "\n", "- `012` Dataset number\n", "- `tiltAPB1` Mnemonic name\n", "- `100000` From 1 to X is 
training data\n", "- `114283` Begin anomaly\n", "- `114350` End anomaly" ] }, { "cell_type": "code", "execution_count": 4, "id": "bd411a2b", "metadata": { "id": "bd411a2b", "outputId": "c69079c0-b034-4a55-8492-3fb96492db47" }, "outputs": [ { "data": { "text/html": [ "
| | timestamp | value |
|---|---|---|
| 0 | 1222819200 | 1990.0 |
| 1 | 1222819500 | 1996.0 |
| 2 | 1222819800 | 1958.0 |
| 3 | 1222820100 | 1958.0 |
| 4 | 1222820400 | 1923.0 |
| | signal | events |
|---|---|---|
| 0 | 183-qtdbSel100MLII | [[1226839200, 1226959200]] |
| 1 | 194-sddb49 | [[1243204200, 1243279200]] |
| 2 | 069-DISTORTEDinsectEPG5 | [[1225369200, 1225369500]] |
| 3 | 023-DISTORTEDGP711MarkerLFM5z5 | [[1225402800, 1225434000]] |
| 4 | 212-Italianpowerdemand | [[1231663200, 1231670400]] |
| ... | ... | ... |
| 245 | 075-DISTORTEDqtdbSel100MLII | [[1226839200, 1226959200]] |
| 246 | 132-InternalBleeding10 | [[1224177000, 1224186000]] |
| 247 | 109-1sddb40 | [[1238419200, 1238605200]] |
| 248 | 176-insectEPG4 | [[1224771600, 1224786600]] |
| 249 | 098-NOISEInternalBleeding16 | [[1224075300, 1224078900]] |
250 rows × 2 columns
\n", "