{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The ROCKET transform\n", "\n", "## Overview\n", "\n", "ROCKET [1] transforms time series using a large number of random convolutional kernels, each with random length, weights, bias, dilation, and padding. From each kernel's feature map, ROCKET computes two features: the maximum value and the proportion of positive values (PPV). A transform with `n_kernels` kernels therefore produces `2 * n_kernels` features per series. The transformed features are then used to train a linear classifier.\n", "\n", "[1] Dempster A, Petitjean F, Webb GI (2019) ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. [arXiv:1910.13051](https://arxiv.org/abs/1910.13051)\n", "\n", "***\n", "\n", "## Contents\n", "\n", "1. Imports\n", "2. Univariate Time Series\n", "3. Multivariate Time Series\n", "4. Pipeline Example\n", "\n", "***\n", "\n", "## 1 Imports\n", "\n", "Import the example datasets, ROCKET, and a classifier (`RidgeClassifierCV` from scikit-learn), as well as NumPy and `make_pipeline` from scikit-learn.\n", "\n", "**Note**: ROCKET compiles (via Numba) on import, which may take a few seconds." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:46.441933Z", "iopub.status.busy": "2020-12-19T14:32:46.441213Z", "iopub.status.idle": "2020-12-19T14:32:46.443225Z", "shell.execute_reply": "2020-12-19T14:32:46.444014Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:32.515016Z", "start_time": "2024-11-25T17:01:32.504509Z" } }, "source": [ "# !pip install --upgrade numba" ], "outputs": [], "execution_count": 33 }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:46.448396Z", "iopub.status.busy": "2020-12-19T14:32:46.447602Z", "iopub.status.idle": "2020-12-19T14:32:51.904418Z", "shell.execute_reply": "2020-12-19T14:32:51.905034Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:33.167188Z", "start_time": "2024-11-25T17:01:33.161609Z" } }, "source": [ "import numpy as np\n", "from sklearn.linear_model import RidgeClassifierCV\n", "from sklearn.pipeline import make_pipeline\n", "\n", "from aeon.datasets import load_basic_motions # multivariate dataset\n", "from aeon.datasets import load_gunpoint # univariate dataset\n", "from aeon.transformations.collection.convolution_based import Rocket" ], "outputs": [], "execution_count": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Univariate Time Series\n", "\n", "We can transform the data using ROCKET and separately fit a classifier, or we can use ROCKET together with a classifier in a pipeline (section 4, below).\n", "\n", "### 2.1 Load the Training Data\n", "For more details on the data set, see the [univariate time series classification\n", "notebook](https://github.com/aeon-toolkit/aeon/tree/main/examples/classification/classification.ipynb)." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:51.908710Z", "iopub.status.busy": "2020-12-19T14:32:51.908101Z", "iopub.status.idle": "2020-12-19T14:32:51.918987Z", "shell.execute_reply": "2020-12-19T14:32:51.919508Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:34.603321Z", "start_time": "2024-11-25T17:01:34.573759Z" } }, "source": [ "X_train, y_train = load_gunpoint(split=\"train\")\n", "# keep only the first 5 training cases\n", "X_train = X_train[:5, :, :]\n", "y_train = y_train[:5]\n", "# shape is (n_cases, n_channels, n_timepoints)\n", "print(X_train.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 1, 150)\n" ] } ], "execution_count": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Initialise ROCKET and Transform the Training Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:51.923023Z", "iopub.status.busy": "2020-12-19T14:32:51.922451Z", "iopub.status.idle": "2020-12-19T14:32:52.164365Z", "shell.execute_reply": "2020-12-19T14:32:52.164864Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:35.852821Z", "start_time": "2024-11-25T17:01:35.837832Z" } }, "source": [ "rocket = Rocket(n_kernels=100) # by default, ROCKET uses 10,000 kernels\n", "rocket.fit(X_train)\n", "X_train_transform = rocket.transform(X_train)\n", "# two features (max and ppv) per kernel, so 100 kernels give 200 columns\n", "print(X_train_transform.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 200)\n" ] } ], "execution_count": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Fit a Classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recommend `RidgeClassifierCV` from scikit-learn for smaller datasets (fewer than approximately 20,000 training examples), and logistic regression trained with stochastic gradient descent for larger datasets." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:52.168847Z", "iopub.status.busy": "2020-12-19T14:32:52.168155Z", "iopub.status.idle": "2020-12-19T14:32:52.284816Z", "shell.execute_reply": "2020-12-19T14:32:52.285506Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:38.060428Z", "start_time": "2024-11-25T17:01:38.038775Z" } }, "source": [ "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n", "classifier.fit(X_train_transform, y_train)" ], "outputs": [ { "data": { "text/plain": [ "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03]))" ], "text/html": [ "
RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]))
" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 37 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 Load and Transform the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:52.289448Z", "iopub.status.busy": "2020-12-19T14:32:52.288717Z", "iopub.status.idle": "2020-12-19T14:32:53.307829Z", "shell.execute_reply": "2020-12-19T14:32:53.308341Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:39.178929Z", "start_time": "2024-11-25T17:01:39.136007Z" } }, "source": [ "X_test, y_test = load_gunpoint(split=\"test\")\n", "X_test_transform = rocket.transform(X_test)" ], "outputs": [], "execution_count": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 Classify the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:53.312125Z", "iopub.status.busy": "2020-12-19T14:32:53.311628Z", "iopub.status.idle": "2020-12-19T14:32:53.409775Z", "shell.execute_reply": "2020-12-19T14:32:53.410342Z" }, "scrolled": true, "ExecuteTime": { "end_time": "2024-11-25T17:01:40.547350Z", "start_time": "2024-11-25T17:01:40.533334Z" } }, "source": [ "classifier.score(X_test_transform, y_test)" ], "outputs": [ { "data": { "text/plain": [ "0.64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "## 3 Multivariate Time Series\n", "\n", "We can use ROCKET in exactly the same way for multivariate time series.\n", "\n", "### 3.1 Load the Training Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:53.413597Z", "iopub.status.busy": "2020-12-19T14:32:53.412786Z", "iopub.status.idle": "2020-12-19T14:32:53.775638Z", "shell.execute_reply": "2020-12-19T14:32:53.776690Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:41.782580Z", 
"start_time": "2024-11-25T17:01:41.767897Z" } }, "source": [ "X_train, y_train = load_basic_motions(split=\"train\")" ], "outputs": [], "execution_count": 40 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Initialise ROCKET and Transform the Training Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:53.794896Z", "iopub.status.busy": "2020-12-19T14:32:53.794345Z", "iopub.status.idle": "2020-12-19T14:32:54.613570Z", "shell.execute_reply": "2020-12-19T14:32:54.614198Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:42.949980Z", "start_time": "2024-11-25T17:01:42.918211Z" } }, "source": [ "rocket = Rocket(n_kernels=100) # by default, ROCKET uses 10,000 kernels\n", "rocket.fit(X_train)\n", "X_train_transform = rocket.transform(X_train)\n", "X_train_transform.shape" ], "outputs": [ { "data": { "text/plain": [ "(40, 200)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Fit a Classifier" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:54.618359Z", "iopub.status.busy": "2020-12-19T14:32:54.617890Z", "iopub.status.idle": "2020-12-19T14:32:54.836560Z", "shell.execute_reply": "2020-12-19T14:32:54.837249Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:44.038154Z", "start_time": "2024-11-25T17:01:44.002549Z" } }, "source": [ "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n", "classifier.fit(X_train_transform, y_train)" ], "outputs": [ { "data": { "text/plain": [ "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03]))" ], "text/html": [ "
RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]))
" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4 Load and Transform the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:54.841004Z", "iopub.status.busy": "2020-12-19T14:32:54.840351Z", "iopub.status.idle": "2020-12-19T14:32:55.906455Z", "shell.execute_reply": "2020-12-19T14:32:55.907064Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:45.150937Z", "start_time": "2024-11-25T17:01:45.106121Z" } }, "source": [ "X_test, y_test = load_basic_motions(split=\"test\")\n", "X_test_transform = rocket.transform(X_test)" ], "outputs": [], "execution_count": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.5 Classify the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:55.910253Z", "iopub.status.busy": "2020-12-19T14:32:55.909743Z", "iopub.status.idle": "2020-12-19T14:32:56.008364Z", "shell.execute_reply": "2020-12-19T14:32:56.008931Z" }, "scrolled": true, "ExecuteTime": { "end_time": "2024-11-25T17:01:46.229312Z", "start_time": "2024-11-25T17:01:46.215072Z" } }, "source": [ "classifier.score(X_test_transform, y_test)" ], "outputs": [ { "data": { "text/plain": [ "0.975" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 44 }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "## 4 Pipeline Example\n", "\n", "We can use ROCKET together with `RidgeClassifierCV` (or another classifier) in a pipeline. 
We can then use the pipeline like a self-contained classifier: a single call to `fit` is enough, with no need to transform the data separately.\n", "\n", "### 4.1 Initialise the Pipeline" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:56.012465Z", "iopub.status.busy": "2020-12-19T14:32:56.011939Z", "iopub.status.idle": "2020-12-19T14:32:56.013801Z", "shell.execute_reply": "2020-12-19T14:32:56.014399Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:47.349648Z", "start_time": "2024-11-25T17:01:47.345129Z" } }, "source": [ "rocket_pipeline = make_pipeline(\n", "    Rocket(), RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n", ")" ], "outputs": [], "execution_count": 45 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 Fit the Pipeline to the Training Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:56.017692Z", "iopub.status.busy": "2020-12-19T14:32:56.017166Z", "iopub.status.idle": "2020-12-19T14:32:56.420648Z", "shell.execute_reply": "2020-12-19T14:32:56.421247Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:49.740497Z", "start_time": "2024-11-25T17:01:48.459632Z" } }, "source": [ "# X_train and y_train are the BasicMotions training data loaded in section 3.1\n", "# y_train must be passed to the pipeline: the transform ignores it,\n", "# but the classifier needs it\n", "rocket_pipeline.fit(X_train, y_train)" ], "outputs": [ { "data": { "text/plain": [ "Pipeline(steps=[('rocket', Rocket()),\n", " ('ridgeclassifiercv',\n", " RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03])))])" ], "text/html": [ "
Pipeline(steps=[('rocket', Rocket()),\n",
       "                ('ridgeclassifiercv',\n",
       "                 RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03])))])
" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 46 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3 Classify the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:56.425026Z", "iopub.status.busy": "2020-12-19T14:32:56.424348Z", "iopub.status.idle": "2020-12-19T14:32:57.602704Z", "shell.execute_reply": "2020-12-19T14:32:57.603291Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:50.960464Z", "start_time": "2024-11-25T17:01:49.763086Z" } }, "source": [ "# X_test and y_test are the BasicMotions test data loaded in section 3.4\n", "rocket_pipeline.score(X_test, y_test)" ], "outputs": [ { "data": { "text/plain": [ "0.975" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 47 } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }