{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature extraction with tsfresh transformer\n", "\n", "[tsfresh](https://tsfresh.readthedocs.io) is a tool for extracting summary features\n", "from a collection of time series. It is an unsupervised transformation, and as such\n", "can easily be used as a pipeline stage in classification, clustering and regression\n", "in conjunction with a scikit-learn compatible estimator.\n", "\n", "## Preliminaries\n", "tsfresh must be installed to run this notebook. To install it, uncomment and run the cell below:" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:39.713903Z", "iopub.status.busy": "2020-12-19T14:30:39.713342Z", "iopub.status.idle": "2020-12-19T14:30:39.715128Z", "shell.execute_reply": "2020-12-19T14:30:39.715641Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:05.457198Z", "start_time": "2024-11-25T14:07:05.449815Z" } }, "source": [ "# !pip install --upgrade tsfresh" ], "outputs": [], "execution_count": 1 }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:39.719083Z", "iopub.status.busy": "2020-12-19T14:30:39.718586Z", "iopub.status.idle": "2020-12-19T14:30:40.743724Z", "shell.execute_reply": "2020-12-19T14:30:40.744213Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:07.829632Z", "start_time": "2024-11-25T14:07:06.056664Z" } }, "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.pipeline import make_pipeline\n", "\n", "from aeon.datasets import load_arrow_head, load_basic_motions\n", "from aeon.transformations.collection.feature_based import TSFresh, TSFreshRelevant" ], "outputs": [], "execution_count": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example data set\n", "\n", "We use the ArrowHead data from the [UCR TSC archive](https://timeseriesclassification.com)\n", "as an example dataset. 
See the\n", "[dataset notebook](https://github.com/aeon-toolkit/aeon/blob/main/examples/datasets/provided_data.ipynb) for more details. To keep the notebook fast, we only use\n", "the first 24 cases for training and the next 24 for testing." ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:40.748159Z", "iopub.status.busy": "2020-12-19T14:30:40.747656Z", "iopub.status.idle": "2020-12-19T14:30:40.795200Z", "shell.execute_reply": "2020-12-19T14:30:40.795889Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:09.120656Z", "start_time": "2024-11-25T14:07:09.090118Z" } }, "source": [ "X, y = load_arrow_head()\n", "n_cases = 24\n", "X_train = X[:n_cases, :, :]\n", "y_train = y[:n_cases]\n", "X_test = X[n_cases : 2 * n_cases, :, :]\n", "y_test = y[n_cases : 2 * n_cases]\n", "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(24, 1, 251) (24,) (24, 1, 251) (24,)\n" ] } ], "execution_count": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using tsfresh to extract features\n", "\n", "There are two TSFresh feature extractors wrapped in aeon. The\n", "first is the unsupervised\n", "`TSFresh`, which by default extracts 777 features per channel. See the\n", "documentation for parameter configuration." 
] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:40.829452Z", "iopub.status.busy": "2020-12-19T14:30:40.828907Z", "iopub.status.idle": "2020-12-19T14:30:53.049755Z", "shell.execute_reply": "2020-12-19T14:30:53.050249Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:16.339473Z", "start_time": "2024-11-25T14:07:11.573523Z" } }, "source": [ "t = TSFresh()\n", "Xt = t.fit_transform(X_train)\n", "Xt2 = t.transform(X_test)\n", "print(f\"Train shape = {Xt.shape} test shape = {Xt2.shape}\")" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train shape = (24, 777) test shape = (24, 777)\n" ] } ], "execution_count": 4 }, { "cell_type": "markdown", "source": [ "The second is `TSFreshRelevant` which uses `y` to select the most\n", "relevant features." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "t = TSFreshRelevant()\n", "t.fit(X_train, y_train)\n", "Xt = t.transform(X_test)\n", "Xt.shape" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:07:32.455607Z", "start_time": "2024-11-25T14:07:26.124172Z" } }, "outputs": [ { "data": { "text/plain": [ "(24, 75)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using tsfresh with scikit estimators\n", "\n", "You can use the tsfresh transformer with any scikit-learn compatible estimator.\n" ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:30:53.062147Z", "iopub.status.busy": "2020-12-19T14:30:53.061631Z", "iopub.status.idle": "2020-12-19T14:31:09.307275Z", "shell.execute_reply": "2020-12-19T14:31:09.307781Z" }, "ExecuteTime": { "end_time": "2024-11-25T14:07:41.090159Z", "start_time": "2024-11-25T14:07:36.403997Z" } }, "source": [ "classifier = make_pipeline(\n", " TSFresh(default_fc_parameters=\"efficient\", show_warnings=False),\n", " 
RandomForestClassifier(),\n", ")\n", "classifier.fit(X_train, y_train)\n", "classifier.score(X_test, y_test)" ], "outputs": [ { "data": { "text/plain": [ "0.625" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 6 }, { "cell_type": "markdown", "source": [ "For convenience and consistency, aeon also provides a ready-made TSFresh classifier,\n", "regressor and clusterer." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from aeon.classification.feature_based import TSFreshClassifier\n", "from aeon.clustering.feature_based import TSFreshClusterer\n", "\n", "cls = TSFreshClassifier(relevant_feature_extractor=False)\n", "clst = TSFreshClusterer(n_clusters=2)\n", "\n", "cls.fit(X_train, y_train)\n", "cls.score(X_test, y_test)\n", "clst.fit(X_train)\n", "print(cls.predict(X_test))\n", "print(clst.predict(X_test))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:08:02.405107Z", "start_time": "2024-11-25T14:07:50.878523Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['0' '1' '0' '1' '1' '2' '0' '1' '1' '0' '1' '1' '0' '2' '0' '0' '0' '2'\n", " '2' '1' '0' '0' '0' '0']\n", "[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]\n" ] } ], "execution_count": 7 }, { "cell_type": "markdown", "source": [ "By default, the `TSFreshClassifier` uses the supervised\n", "`TSFreshRelevant` and the scikit-learn `RandomForestClassifier`.\n", "You can change either through the constructor." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from aeon.classification.sklearn import RotationForestClassifier\n", "\n", "cls = TSFreshClassifier(estimator=RotationForestClassifier(n_estimators=5))\n", "cls.fit(X_train, y_train)\n", "cls.score(X_test, y_test)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:08:13.304452Z", "start_time": "2024-11-25T14:08:06.677532Z" } }, "outputs": [ { "data": { "text/plain": [ 
"0.5833333333333334" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 8 }, { "cell_type": "markdown", "source": [ "By default, the `TSFreshClusterer` uses the unsupervised `TSFresh`\n", "and the `sklearn` clusterer `KMeans` with default parameters (which fits 8 clusters).\n", "You can also configure this through the constructor." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from sklearn.cluster import KMeans\n", "\n", "clst = TSFreshClusterer(estimator=KMeans(n_clusters=3))\n", "clst.fit(X_train)\n", "print(clst.predict(X_test))" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:08:38.025066Z", "start_time": "2024-11-25T14:08:33.300907Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 2 0 0 1]\n" ] } ], "execution_count": 9 }, { "cell_type": "markdown", "source": [ "By default, the `TSFreshRegressor` uses the supervised\n", "`TSFreshRelevant` and the scikit-learn `RandomForestRegressor`." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "from aeon.datasets import load_covid_3month\n", "from aeon.regression.feature_based import TSFreshRegressor\n", "\n", "X, y = load_covid_3month(split=\"train\")\n", "reg = TSFreshRegressor(relevant_feature_extractor=False)\n", "reg.fit(X, y)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2024-11-25T14:09:11.745540Z", "start_time": "2024-11-25T14:08:56.573376Z" } }, "outputs": [ { "data": { "text/plain": [ "TSFreshRegressor(relevant_feature_extractor=False)" ], "text/html": [ "
TSFreshRegressor(relevant_feature_extractor=False)
" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TSFresh with multivariate time series data\n", "\n", "The `TSFresh` transformers and all three estimators can be used with multivariate time\n", "series. The transform calculates the features on each channel independently, then\n", "concatenates the results. The default transform creates `777 * n_channels` features,\n", "so 4662 for the six-channel BasicMotions data used below." ] }, { "cell_type": "code", "metadata": { "execution": { "iopub.execute_input": "2020-12-19T14:31:09.311742Z", "iopub.status.busy": "2020-12-19T14:31:09.311092Z", "iopub.status.idle": "2020-12-19T14:31:09.380791Z", "shell.execute_reply": "2020-12-19T14:31:09.381304Z" }, "scrolled": true, "ExecuteTime": { "end_time": "2024-11-25T14:11:57.583864Z", "start_time": "2024-11-25T14:11:57.545946Z" } }, "source": [ "X_train, y_train = load_basic_motions(split=\"train\")\n", "X_test, y_test = load_basic_motions(split=\"test\")\n", "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)" ], "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(40, 6, 100) (40,) (40, 6, 100) (40,)\n" ] } ], "execution_count": 14 }, { "cell_type": "code", "source": [ "t = TSFresh()\n", "Xt = t.fit_transform(X_train)\n", "Xt.shape" ], "metadata": { "collapsed": false, "pycharm": { "is_executing": true }, "ExecuteTime": { "end_time": "2024-11-25T14:12:19.453228Z", "start_time": "2024-11-25T14:11:58.795027Z" } }, "outputs": [ { "data": { "text/plain": [ "(40, 4662)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 15 } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { 
"name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }