{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature extraction with tsfresh transformer\n", "\n", "[tsfresh](https://tsfresh.readthedocs.io) is a tool for extracting summary features\n", "from a collection of time series. It is an unsupervised transformation, and as such\n", "can easily be used as a pipeline stage in classification, clustering and regression\n", "in conjunction with a scikit-learn compatible estimator.\n", "\n", "## Preliminaries\n", "You need tsfresh installed to run this notebook. If you have not installed it already, uncomment and run the cell below:" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:39.713903Z", "iopub.status.busy": "2020-12-19T14:30:39.713342Z", "iopub.status.idle": "2020-12-19T14:30:39.715128Z", "shell.execute_reply": "2020-12-19T14:30:39.715641Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:05.457198Z", "start_time": "2024-11-25T14:07:05.449815Z" } }, "source": [ "# !pip install --upgrade tsfresh" ], "outputs": [], "execution_count": 1 }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:39.719083Z", "iopub.status.busy": "2020-12-19T14:30:39.718586Z", "iopub.status.idle": "2020-12-19T14:30:40.743724Z", "shell.execute_reply": "2020-12-19T14:30:40.744213Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:07.829632Z", "start_time": "2024-11-25T14:07:06.056664Z" } }, "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.pipeline import make_pipeline\n", "\n", "from aeon.datasets import load_arrow_head, load_basic_motions\n", "from aeon.transformations.collection.feature_based import TSFresh, TSFreshRelevant" ], "outputs": [], "execution_count": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example data set\n", "\n", "We use the ArrowHead data from the [UCR TSC archive](https://timeseriesclassification.com)\n", "as an example dataset. 
See\n", "the [dataset notebook](https://github.com/aeon-toolkit/aeon/blob/main/examples/datasets/provided_data.ipynb) for more details. To speed up the\n", "notebook, we use only the first 24 cases for training and the next 24 for testing." ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:40.748159Z", "iopub.status.busy": "2020-12-19T14:30:40.747656Z", "iopub.status.idle": "2020-12-19T14:30:40.795200Z", "shell.execute_reply": "2020-12-19T14:30:40.795889Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:09.120656Z", "start_time": "2024-11-25T14:07:09.090118Z" } }, "source": [ "X, y = load_arrow_head()\n", "n_cases = 24\n", "X_train = X[:n_cases, :, :]\n", "y_train = y[:n_cases]\n", "X_test = X[n_cases : 2 * n_cases, :, :]\n", "y_test = y[n_cases : 2 * n_cases]\n", "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(24, 1, 251) (24,) (24, 1, 251) (24,)\n" ] } ], "execution_count": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using tsfresh to extract features\n", "\n", "aeon wraps two versions of the tsfresh feature extractor. The\n", "first is the unsupervised\n", "`TSFresh`, which extracts a large set of summary features for each case (777 per case in the output below). See the\n", "documentation for parameter configuration." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:40.829452Z", "iopub.status.busy": "2020-12-19T14:30:40.828907Z", "iopub.status.idle": "2020-12-19T14:30:53.049755Z", "shell.execute_reply": "2020-12-19T14:30:53.050249Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:16.339473Z", "start_time": "2024-11-25T14:07:11.573523Z" } }, "source": [ "t = TSFresh()\n", "Xt = t.fit_transform(X_train)\n", "Xt2 = t.transform(X_test)\n", "print(f\"Train shape = {Xt.shape} test shape = {Xt2.shape}\")" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train shape = (24, 777) test shape = (24, 777)\n" ] } ], "execution_count": 4 }, { "cell_type": "markdown", "source": [ "The second is the supervised `TSFreshRelevant`, which uses `y` to select the most\n", "relevant features." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "t = TSFreshRelevant()\n", "t.fit(X_train, y_train)\n", "Xt = t.transform(X_test)\n", "Xt.shape" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:07:32.455607Z", "start_time": "2024-11-25T14:07:26.124172Z" } }, "outputs": [ { "data": { "text/plain": [ "(24, 75)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using tsfresh with scikit-learn estimators\n", "\n", "You can use the tsfresh transformers with any scikit-learn compatible estimator.\n" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:53.062147Z", "iopub.status.busy": "2020-12-19T14:30:53.061631Z", "iopub.status.idle": "2020-12-19T14:31:09.307275Z", "shell.execute_reply": "2020-12-19T14:31:09.307781Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:41.090159Z", "start_time": "2024-11-25T14:07:36.403997Z" } }, "source": [ "classifier = make_pipeline(\n", " TSFresh(default_fc_parameters=\"efficient\", show_warnings=False),\n", " 
RandomForestClassifier(),\n", ")\n", "classifier.fit(X_train, y_train)\n", "classifier.score(X_test, y_test)" ], "outputs": [ { "data": { "text/plain": [ "0.625" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 6 }, { "cell_type": "markdown", "source": [ "For convenience and consistency of use, aeon also provides a ready-made TSFresh classifier,\n", "regressor and clusterer." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from aeon.classification.feature_based import TSFreshClassifier\n", "from aeon.clustering.feature_based import TSFreshClusterer\n", "\n", "cls = TSFreshClassifier(relevant_feature_extractor=False)\n", "clst = TSFreshClusterer(n_clusters=2)\n", "\n", "cls.fit(X_train, y_train)\n", "cls.score(X_test, y_test)\n", "clst.fit(X_train)\n", "print(cls.predict(X_test))\n", "print(clst.predict(X_test))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:08:02.405107Z", "start_time": "2024-11-25T14:07:50.878523Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['0' '1' '0' '1' '1' '2' '0' '1' '1' '0' '1' '1' '0' '2' '0' '0' '0' '2'\n", " '2' '1' '0' '0' '0' '0']\n", "[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]\n" ] } ], "execution_count": 7 }, { "cell_type": "markdown", "source": [ "By default, the `TSFreshClassifier` uses the supervised\n", "`TSFreshRelevant` and the scikit-learn `RandomForestClassifier`.\n", "You can\n", "change this through the constructor." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from aeon.classification.sklearn import RotationForestClassifier\n", "\n", "cls = TSFreshClassifier(estimator=RotationForestClassifier(n_estimators=5))\n", "cls.fit(X_train, y_train)\n", "cls.score(X_test, y_test)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:08:13.304452Z", "start_time": "2024-11-25T14:08:06.677532Z" } }, "outputs": [ { "data": { "text/plain": [ 
"0.5833333333333334" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 8 }, { "cell_type": "markdown", "source": [ "By default, the `TSFreshClusterer` uses the unsupervised `TSFresh`\n", "and the `sklearn` clusterer `KMeans` with default parameters (which fits 8 clusters).\n", "You can also configure this through the constructor." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from sklearn.cluster import KMeans\n", "\n", "clst = TSFreshClusterer(estimator=KMeans(n_clusters=3))\n", "clst.fit(X_train)\n", "print(clst.predict(X_test))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:08:38.025066Z", "start_time": "2024-11-25T14:08:33.300907Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 2 0 0 1]\n" ] } ], "execution_count": 9 }, { "cell_type": "markdown", "source": [ "By default, the `TSFreshRegressor` uses the supervised\n", "`TSFreshRelevant` and the scikit-learn `RandomForestRegressor`." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from aeon.datasets import load_covid_3month\n", "from aeon.regression.feature_based import TSFreshRegressor\n", "\n", "reg = TSFreshRegressor(relevant_feature_extractor=False)\n", "X, y = load_covid_3month(split=\"train\")\n", "reg.fit(X, y)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:09:11.745540Z", "start_time": "2024-11-25T14:08:56.573376Z" } }, "outputs": [ { "data": { "text/plain": [ "TSFreshRegressor(relevant_feature_extractor=False)" ], "text/html": [ "
TSFreshRegressor(relevant_feature_extractor=False)