{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MiniRocket\n", "\n", "MiniRocket [1] transforms input time series using a small, fixed set of convolutional\n", "kernels. MiniRocket uses PPV (proportion of positive values) pooling to compute a single feature for each of the resulting feature maps. The transformed features are then used to train a linear classifier.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1 Univariate Time Series\n", "\n", "### 1.1 Imports\n", "\n", "Import example data, `MiniRocket`, `MiniRocketClassifier`, `MiniRocketRegressor`,\n", "`RidgeClassifierCV` and `StandardScaler` (scikit-learn), and ``numpy``.\n", "\n", "You can use the `MiniRocket` transform directly, in a pipeline, or through the built-in `MiniRocketClassifier` or `MiniRocketRegressor`.\n", "\n", "**Note**: ``MiniRocket`` is compiled by ``numba`` on import. The compiled functions are\n", "cached, so this should only happen once (i.e., the first time you import ``MiniRocket``)." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:03.214929Z", "iopub.status.busy": "2020-10-12T17:43:03.214184Z", "iopub.status.idle": "2020-10-12T17:43:03.216304Z", "shell.execute_reply": "2020-10-12T17:43:03.216990Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:08:58.368462Z", "start_time": "2024-11-25T11:08:58.349939Z" } }, "source": [ "# !pip install --upgrade numba" ], "outputs": [], "execution_count": 1 }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:18.327182Z", "start_time": "2024-11-25T11:08:59.095253Z" } }, "cell_type": "code", "source": [ "import numpy as np\n", "from sklearn.linear_model import RidgeClassifierCV\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "from aeon.classification.convolution_based import MiniRocketClassifier\n", "from aeon.datasets import load_arrow_head # univariate dataset\n", "from aeon.datasets import load_basic_motions # multivariate dataset\n", "from aeon.regression.convolution_based import MiniRocketRegressor\n", "from aeon.transformations.collection.convolution_based import MiniRocket" ], "outputs": [], "execution_count": 2 }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:23.328664Z", "start_time": "2024-11-25T11:10:23.234728Z" } }, "cell_type": "code", "source": [ "X_train, y_train = load_arrow_head(split=\"train\")\n", "minirocket = MiniRocket() # by default, MiniRocket uses ~10_000 kernels\n", "minirocket.fit(X_train)\n", "X_train_transform = minirocket.transform(X_train)\n", "# test shape of transformed training data -> (n_cases, 9_996)\n", "X_train_transform.shape" ], "outputs": [ { "data": { "text/plain": [ "(36, 9996)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 5 }, { "metadata": {}, "cell_type": "markdown", "source": [ "### 1.4 Fit a Classifier" ] }, { "metadata": {}, "cell_type": "markdown", "source": [ "We suggest using `RidgeClassifierCV` (scikit-learn) for 
smaller datasets (fewer than ~10,000 training examples), and logistic regression trained with stochastic gradient descent for larger datasets.\n", "\n", "**Note**: For larger datasets, this means integrating MiniRocket with stochastic gradient descent such that the transform is performed per minibatch, *not* simply replacing `RidgeClassifierCV` with, e.g., `LogisticRegression`.\n", "\n", "**Note**: MiniRocket does not scale the input time series, but its output features may need scaling before they are passed to a downstream model. For example, for `RidgeClassifierCV` we scale the features with scikit-learn's `StandardScaler`." ] }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:26.380394Z", "start_time": "2024-11-25T11:10:26.343196Z" } }, "cell_type": "code", "source": [ "scaler = StandardScaler(with_mean=False)\n", "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n", "X_train_scaled_transform = scaler.fit_transform(X_train_transform)\n", "classifier.fit(X_train_scaled_transform, y_train)" ], "outputs": [ { "data": { "text/plain": [ "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03]))" ], "text/html": [ "
RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]))
" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 6 }, { "metadata": {}, "cell_type": "markdown", "source": [ "Alternatively, use the built-in ``MiniRocketClassifier``, which includes\n", "the scaler and the classifier." ] }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:28.558282Z", "start_time": "2024-11-25T11:10:28.438303Z" } }, "cell_type": "code", "source": [ "mr = MiniRocketClassifier()\n", "mr.fit(X_train, y_train)" ], "outputs": [ { "data": { "text/plain": [ "MiniRocketClassifier()" ], "text/html": [ "
MiniRocketClassifier()
" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 7 }, { "metadata": {}, "cell_type": "markdown", "source": [ "### 1.5 Load and Transform the Test Data" ] }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:36.310828Z", "start_time": "2024-11-25T11:10:36.188643Z" } }, "cell_type": "code", "source": [ "X_test, y_test = load_arrow_head(split=\"test\")\n", "X_test_transform = minirocket.transform(X_test)" ], "outputs": [], "execution_count": 8 }, { "metadata": {}, "cell_type": "markdown", "source": [ "### 1.6 Classify the Test Data" ] }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:40.698690Z", "start_time": "2024-11-25T11:10:40.561965Z" } }, "cell_type": "code", "source": [ "X_test_scaled_transform = scaler.transform(X_test_transform)\n", "# accuracy of RidgeClassifierCV on the scaled MiniRocket features\n", "print(\" Score =\", classifier.score(X_test_scaled_transform, y_test))\n", "# accuracy of the built-in MiniRocketClassifier\n", "print(\" Score = \", mr.score(X_test, y_test))" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Score = 0.8514285714285714\n", " Score = 0.8685714285714285\n" ] } ], "execution_count": 9 }, { "metadata": {}, "cell_type": "markdown", "source": [ "## 2 Multivariate Time Series\n", "\n", "We can use MiniRocket with multivariate time series.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Load the Training Data\n", "\n", "**Note**: Input time series must be *at least* of length 9. Pad shorter time series\n", "using, e.g., `Padder` (`aeon.transformations.collection`)." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:10.054652Z", "iopub.status.busy": "2020-10-12T17:43:10.034190Z", "iopub.status.idle": "2020-10-12T17:43:10.394311Z", "shell.execute_reply": "2020-10-12T17:43:10.394905Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:10:43.874489Z", "start_time": "2024-11-25T11:10:43.846456Z" } }, "source": [ "X_train, y_train = load_basic_motions(split=\"train\")" ], "outputs": [], "execution_count": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Initialise MiniRocket and Transform the Training Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:10.410718Z", "iopub.status.busy": "2020-10-12T17:43:10.410103Z", "iopub.status.idle": "2020-10-12T17:43:11.186318Z", "shell.execute_reply": "2020-10-12T17:43:11.186801Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:10:45.517801Z", "start_time": "2024-11-25T11:10:45.415754Z" } }, "source": [ "mr = MiniRocket()\n", "mr.fit(X_train)\n", "X_train_transform = mr.transform(X_train)" ], "outputs": [], "execution_count": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 Fit a Classifier" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:11.190556Z", "iopub.status.busy": "2020-10-12T17:43:11.190017Z", "iopub.status.idle": "2020-10-12T17:43:11.396461Z", "shell.execute_reply": "2020-10-12T17:43:11.397135Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:10:48.940610Z", "start_time": "2024-11-25T11:10:48.898236Z" } }, "source": [ "scaler = StandardScaler(with_mean=False)\n", "X_train_scaled_transform = scaler.fit_transform(X_train_transform)\n", "\n", "classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))\n", "classifier.fit(X_train_scaled_transform, y_train)" ], "outputs": [ { "data": { "text/plain": [ "RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 
4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03]))" ], "text/html": [ "
RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03]))
" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 Load and Transform the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:11.401025Z", "iopub.status.busy": "2020-10-12T17:43:11.400273Z", "iopub.status.idle": "2020-10-12T17:43:12.450777Z", "shell.execute_reply": "2020-10-12T17:43:12.451162Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:10:54.416071Z", "start_time": "2024-11-25T11:10:54.338117Z" } }, "source": [ "X_test, y_test = load_basic_motions(split=\"test\")\n", "X_test_transform = mr.transform(X_test)" ], "outputs": [], "execution_count": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6 Classify the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:12.494679Z", "iopub.status.busy": "2020-10-12T17:43:12.453795Z", "iopub.status.idle": "2020-10-12T17:43:12.548017Z", "shell.execute_reply": "2020-10-12T17:43:12.548575Z" }, "scrolled": true, "ExecuteTime": { "end_time": "2024-11-25T11:10:57.265809Z", "start_time": "2024-11-25T11:10:57.240878Z" } }, "source": [ "X_test_scaled_transform = scaler.transform(X_test_transform)\n", "classifier.score(X_test_scaled_transform, y_test)" ], "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3 Pipeline Example\n", "\n", "MiniRocket can also be used as a step in a scikit-learn pipeline.\n", "\n", "### 3.2 Initialise the Pipeline" ] }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:10:59.040716Z", "start_time": "2024-11-25T11:10:59.035170Z" } }, "cell_type": "code", "source": [ "from sklearn.linear_model import RidgeClassifierCV\n", "from sklearn.pipeline import make_pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "minirocket_pipeline = make_pipeline(\n", " MiniRocket(),\n", " 
StandardScaler(with_mean=False),\n", " RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),\n", ")" ], "outputs": [], "execution_count": 15 }, { "metadata": {}, "cell_type": "markdown", "source": [ "Alternatively, use the built-in ``MiniRocketClassifier``." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Load and Fit the Training Data\n", "\n", "**Note**: Input time series must be *at least* of length 9. Pad shorter time series\n", "using, e.g., `Padder` (`aeon.transformations.collection`)." ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:12.557100Z", "iopub.status.busy": "2020-10-12T17:43:12.556478Z", "iopub.status.idle": "2020-10-12T17:43:12.885951Z", "shell.execute_reply": "2020-10-12T17:43:12.886625Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:11:02.046780Z", "start_time": "2024-11-25T11:11:01.913155Z" } }, "source": [ "X_train, y_train = load_arrow_head(split=\"train\")\n", "\n", "# it is necessary to pass y_train to the pipeline\n", "# y_train is not used for the transform, but it is used by the classifier\n", "minirocket_pipeline.fit(X_train, y_train)" ], "outputs": [ { "data": { "text/plain": [ "Pipeline(steps=[('minirocket', MiniRocket()),\n", " ('standardscaler', StandardScaler(with_mean=False)),\n", " ('ridgeclassifiercv',\n", " RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n", " 4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n", " 2.15443469e+02, 1.00000000e+03])))])" ], "text/html": [ "
Pipeline(steps=[('minirocket', MiniRocket()),\n",
       "                ('standardscaler', StandardScaler(with_mean=False)),\n",
       "                ('ridgeclassifiercv',\n",
       "                 RidgeClassifierCV(alphas=array([1.00000000e-03, 4.64158883e-03, 2.15443469e-02, 1.00000000e-01,\n",
       "       4.64158883e-01, 2.15443469e+00, 1.00000000e+01, 4.64158883e+01,\n",
       "       2.15443469e+02, 1.00000000e+03])))])
" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4 Load and Classify the Test Data" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-10-12T17:43:12.890535Z", "iopub.status.busy": "2020-10-12T17:43:12.889866Z", "iopub.status.idle": "2020-10-12T17:43:13.897048Z", "shell.execute_reply": "2020-10-12T17:43:13.897624Z" }, "ExecuteTime": { "end_time": "2024-11-25T11:11:31.335457Z", "start_time": "2024-11-25T11:11:31.003725Z" } }, "source": [ "X_test, y_test = load_arrow_head(split=\"test\")\n", "\n", "# predict class labels for the test data with the already fitted pipeline\n", "pred = minirocket_pipeline.predict(X_test)\n", "\n", "# mean accuracy on the test data\n", "minirocket_pipeline.score(X_test, y_test)" ], "outputs": [], "execution_count": 18 }, { "metadata": {}, "cell_type": "markdown", "source": [ "## 4 Time Series Regression\n", "\n", "You can also use MiniRocket for time series regression." ] }, { "metadata": { "ExecuteTime": { "end_time": "2024-11-25T11:11:35.125060Z", "start_time": "2024-11-25T11:11:34.966868Z" } }, "cell_type": "code", "source": [ "from aeon.datasets import load_covid_3month\n", "\n", "X_train, y_train = load_covid_3month(split=\"train\")\n", "X_test, y_test = load_covid_3month(split=\"test\")\n", "mr = MiniRocketRegressor()\n", "mr.fit(X_train, y_train)\n", "mr.score(X_test, y_test)" ], "outputs": [ { "data": { "text/plain": [ "0.1619927701771796" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 19 }, { "metadata": {}, "cell_type": "markdown", "source": [ "### References\n", "\n", "[1] Angus Dempster, Daniel F. Schmidt, Geoffrey I. Webb. MINIROCKET: 
A Very Fast (Almost)\n", "Deterministic Transform for Time Series Classification [arXiv:2012.08791](https://arxiv.org/abs/2012.08791)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "" } ], "metadata": { "kernelspec": { "display_name": "Python 3.10.6 ('env_aeon')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "vscode": { "interpreter": { "hash": "4a026b07681b07f232f4be37689469f2f19c060d7d208c4f2bde2ef874c4e7ae" } } }, "nbformat": 4, "nbformat_minor": 4 }