{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The ROCKET transform\n", "\n", "## Overview\n", "\n", "ROCKET [1] transforms time series using random convolutional kernels (random length, weights, bias, dilation, and padding). From each resulting feature map, ROCKET computes two features: the maximum value (max) and the proportion of positive values (PPV), so `n_kernels` kernels yield `2 * n_kernels` features. The transformed features are used to train a linear classifier.\n", "\n", "[1] Dempster A, Petitjean F, Webb GI (2019) ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. [arXiv:1910.13051](https://arxiv.org/abs/1910.13051)\n", "\n", "***\n", "\n", "## Contents\n", "\n", "1. Imports\n", "2. Univariate Time Series\n", "3. Multivariate Time Series\n", "4. Pipeline Example\n", "\n", "***\n", "\n", "## 1 Imports\n", "\n", "Import example data, ROCKET, and a classifier (`RidgeClassifierCV` from scikit-learn), as well as NumPy and `make_pipeline` from scikit-learn.\n", "\n", "**Note**: ROCKET compiles (via Numba) on import, which may take a few seconds." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:46.441933Z", "iopub.status.busy": "2020-12-19T14:32:46.441213Z", "iopub.status.idle": "2020-12-19T14:32:46.443225Z", "shell.execute_reply": "2020-12-19T14:32:46.444014Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:32.515016Z", "start_time": "2024-11-25T17:01:32.504509Z" } }, "source": [ "# !pip install --upgrade numba" ], "outputs": [], "execution_count": 33 }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:46.448396Z", "iopub.status.busy": "2020-12-19T14:32:46.447602Z", "iopub.status.idle": "2020-12-19T14:32:51.904418Z", "shell.execute_reply": "2020-12-19T14:32:51.905034Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:33.167188Z", "start_time": "2024-11-25T17:01:33.161609Z" } }, "source": [ "import numpy as np\n", "from sklearn.linear_model import RidgeClassifierCV\n", "from sklearn.pipeline import make_pipeline\n", "\n", "from aeon.datasets import load_basic_motions # multivariate dataset\n", "from aeon.datasets import load_gunpoint # univariate dataset\n", "from aeon.transformations.collection.convolution_based import Rocket" ], "outputs": [], "execution_count": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Univariate Time Series\n", "\n", "We can transform the data using ROCKET and separately fit a classifier, or we can use ROCKET together with a classifier in a pipeline (section 4, below).\n", "\n", "### 2.1 Load the Training Data\n", "For more details on the data set, see the [univariate time series classification\n", "notebook](https://github.com/aeon-toolkit/aeon/tree/main/examples/classification/classification.ipynb)." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:51.908710Z", "iopub.status.busy": "2020-12-19T14:32:51.908101Z", "iopub.status.idle": "2020-12-19T14:32:51.918987Z", "shell.execute_reply": "2020-12-19T14:32:51.919508Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:34.603321Z", "start_time": "2024-11-25T17:01:34.573759Z" } }, "source": [ "X_train, y_train = load_gunpoint(split=\"train\")\n", "X_train = X_train[:5, :, :]\n", "y_train = y_train[:5]\n", "print(X_train.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 1, 150)\n" ] } ], "execution_count": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Initialise ROCKET and Transform the Training Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:51.923023Z", "iopub.status.busy": "2020-12-19T14:32:51.922451Z", "iopub.status.idle": "2020-12-19T14:32:52.164365Z", "shell.execute_reply": "2020-12-19T14:32:52.164864Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:35.852821Z", "start_time": "2024-11-25T17:01:35.837832Z" } }, "source": [ "rocket = Rocket(n_kernels=100) # by default, ROCKET uses 10,000 kernels\n", "rocket.fit(X_train)\n", "X_train_transform = rocket.transform(X_train)\n", "print(X_train_transform.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(5, 200)\n" ] } ], "execution_count": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Fit a Classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recommend using `RidgeClassifierCV` from scikit-learn for smaller datasets (fewer than approx. 20K training examples), and using logistic regression trained using stochastic gradient descent for larger datasets." 
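, "\n", "\n",
"For the larger-dataset case, the following is a minimal sketch rather than a definitive recipe. It assumes scikit-learn's `SGDClassifier` with `loss='log_loss'` (named `'log'` before scikit-learn 1.1). The synthetic `X_features` and `y` are hypothetical stand-ins for ROCKET output and labels, and the `StandardScaler` step is our assumption, added because SGD is sensitive to feature scale:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import SGDClassifier\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Synthetic stand-in for ROCKET output: 200 cases x 200 features\n",
"# (100 kernels x 2 features each). In practice use rocket.transform(X).\n",
"rng = np.random.default_rng(0)\n",
"X_features = rng.normal(size=(200, 200))\n",
"y = (X_features[:, 0] > 0).astype(int)  # hypothetical labels, illustration only\n",
"\n",
"# loss='log_loss' makes SGDClassifier fit a logistic regression model.\n",
"sgd_classifier = make_pipeline(\n",
"    StandardScaler(),\n",
"    SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-3, random_state=0),\n",
")\n",
"sgd_classifier.fit(X_features, y)\n",
"train_accuracy = sgd_classifier.score(X_features, y)\n",
"print(train_accuracy)\n",
"```\n",
"\n",
"For data too large to hold in memory, `SGDClassifier.partial_fit` can be called on successive batches of transformed features instead of a single `fit`."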
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:32:52.168847Z", "iopub.status.busy": "2020-12-19T14:32:52.168155Z", "iopub.status.idle": "2020-12-19T14:32:52.284816Z", "shell.execute_reply": "2020-12-19T14:32:52.285506Z" }, "ExecuteTime": { "end_time": "2024-11-25T17:01:38.060428Z", "start_time": "2024-11-25T17:01:38.038775Z" } }, "source": [ "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n", "classifier.fit(X_train_transform, y_train)" ], "outputs": [ { "data": { "text/plain": [ "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03]))" ], "text/html": [ "
RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
" 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
" 2.15443469e+02, 1.00000000e+03]))