{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Open Machine Learning Course\n", "
Author: Elena Konstantinovna Plaksina, Levka." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##
An overview of tsfresh, a library for generating time series features
\n", "###
Time Series FeatuRe Extraction based on Scalable Hypothesis tests
" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Библиотека используется для извлечения признаков из временных рядов. Практически все признаки, которые могут прийти вам в голову, уже внесены в расчёт этой библиотеки и нет никакого смысла создавать их самому, когда это можно сделать парой строчек кода из библиотеки.\n", "\n", "Извлечённые признаки могут быть использованы для описания или кластеризации временных рядов. Также их можно использовать для задач классификации/регрессии на временных рядах." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Процесс расчёта признаков состоит из двух этапов:\n", "- Расчёт всех возможных признаков\n", "\n", "```python\n", "from tsfresh import extract_features\n", "extracted_features = extract_features(timeseries, column_id=\"id\", column_sort=\"time\")\n", "```\n", "- Отбор релевантных признаков и удаление константных/нулевых признаков\n", "\n", "```python\n", "from tsfresh import select_features\n", "from tsfresh.utilities.dataframe_functions import impute\n", "\n", "impute(extracted_features) # удаление константных признаков\n", "features_filtered = select_features(extracted_features, y) # отбор признаков\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Процедура отбора признаков\n", "#### Стадия 1\n", "Расчёт признаков\n", "#### Стадия 2\n", "Проверка на значимость каждого признака, расчёт p-value\n", "#### Стадия 3\n", "Поправка на множественную проверку гипотез Бенджамини-Иекутиели" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:18:21.873848Z", "start_time": "2018-04-24T18:18:21.870343Z" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Приведём пример генерации признаков на основе датасета Human Activity Recognition" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": 
"2018-04-24T18:35:06.604259Z", "start_time": "2018-04-24T18:35:06.592741Z" } }, "outputs": [], "source": [ "import matplotlib.pylab as plt\n", "%matplotlib inline\n", "from tsfresh.examples.har_dataset import download_har_dataset, load_har_dataset, load_har_classes\n", "import seaborn as sns\n", "from tsfresh import extract_features, extract_relevant_features, select_features\n", "from tsfresh.utilities.dataframe_functions import impute\n", "from tsfresh.feature_extraction import ComprehensiveFCParameters\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import classification_report\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Загрузка и отрисовка данных**" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:32:59.112311Z", "start_time": "2018-04-24T18:32:40.608378Z" } }, "outputs": [], "source": [ "download_har_dataset()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:32:59.373712Z", "start_time": "2018-04-24T18:32:59.113813Z" } }, "outputs": [], "source": [ "df = load_har_dataset()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:35:03.282511Z", "start_time": "2018-04-24T18:35:03.191846Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.title('accelerometer reading')\n", "plt.plot(df.iloc[0,:])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Извлечение признаков**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:37:44.151891Z", "start_time": "2018-04-24T18:37:44.148384Z" } }, "outputs": [], "source": [ "# расчёт только определённого набора параметров, 
заданного в ComprehensiveFCParameters\n", "extraction_settings = ComprehensiveFCParameters()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:40:36.723372Z", "start_time": "2018-04-24T18:40:36.712868Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1\n", "0 0.000181 0\n", "1 0.010139 0\n", "2 0.009276 0\n", "3 0.005066 0\n", "4 0.010810 0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# переформируем данные 500 первых показаний сенсоров column-wise, как этого требует формат библиотеки\n", "N = 500\n", "master_df = pd.DataFrame({0: df[:N].values.flatten(),\n", " 1: np.arange(N).repeat(df.shape[1])})\n", "master_df.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:41:33.544564Z", "start_time": "2018-04-24T18:40:57.405324Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Feature Extraction: 100%|██████████████████████████████████████████| 20/20 [00:34<00:00, 1.74s/it]\n", "WARNING:tsfresh.utilities.dataframe_functions:The columns ['0__fft_coefficient__coeff_65__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_65__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_65__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_65__attr_\"real\"'\n", " '0__fft_coefficient__coeff_66__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_66__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_66__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_66__attr_\"real\"'\n", " '0__fft_coefficient__coeff_67__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_67__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_67__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_67__attr_\"real\"'\n", " '0__fft_coefficient__coeff_68__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_68__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_68__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_68__attr_\"real\"'\n", " '0__fft_coefficient__coeff_69__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_69__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_69__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_69__attr_\"real\"'\n", " '0__fft_coefficient__coeff_70__attr_\"abs\"'\n", " 
'0__fft_coefficient__coeff_70__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_70__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_70__attr_\"real\"'\n", " '0__fft_coefficient__coeff_71__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_71__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_71__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_71__attr_\"real\"'\n", " '0__fft_coefficient__coeff_72__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_72__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_72__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_72__attr_\"real\"'\n", " '0__fft_coefficient__coeff_73__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_73__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_73__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_73__attr_\"real\"'\n", " '0__fft_coefficient__coeff_74__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_74__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_74__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_74__attr_\"real\"'\n", " '0__fft_coefficient__coeff_75__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_75__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_75__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_75__attr_\"real\"'\n", " '0__fft_coefficient__coeff_76__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_76__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_76__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_76__attr_\"real\"'\n", " '0__fft_coefficient__coeff_77__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_77__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_77__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_77__attr_\"real\"'\n", " '0__fft_coefficient__coeff_78__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_78__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_78__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_78__attr_\"real\"'\n", " '0__fft_coefficient__coeff_79__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_79__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_79__attr_\"imag\"'\n", " 
'0__fft_coefficient__coeff_79__attr_\"real\"'\n", " '0__fft_coefficient__coeff_80__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_80__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_80__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_80__attr_\"real\"'\n", " '0__fft_coefficient__coeff_81__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_81__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_81__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_81__attr_\"real\"'\n", " '0__fft_coefficient__coeff_82__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_82__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_82__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_82__attr_\"real\"'\n", " '0__fft_coefficient__coeff_83__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_83__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_83__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_83__attr_\"real\"'\n", " '0__fft_coefficient__coeff_84__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_84__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_84__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_84__attr_\"real\"'\n", " '0__fft_coefficient__coeff_85__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_85__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_85__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_85__attr_\"real\"'\n", " '0__fft_coefficient__coeff_86__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_86__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_86__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_86__attr_\"real\"'\n", " '0__fft_coefficient__coeff_87__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_87__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_87__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_87__attr_\"real\"'\n", " '0__fft_coefficient__coeff_88__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_88__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_88__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_88__attr_\"real\"'\n", " '0__fft_coefficient__coeff_89__attr_\"abs\"'\n", " 
'0__fft_coefficient__coeff_89__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_89__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_89__attr_\"real\"'\n", " '0__fft_coefficient__coeff_90__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_90__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_90__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_90__attr_\"real\"'\n", " '0__fft_coefficient__coeff_91__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_91__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_91__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_91__attr_\"real\"'\n", " '0__fft_coefficient__coeff_92__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_92__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_92__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_92__attr_\"real\"'\n", " '0__fft_coefficient__coeff_93__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_93__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_93__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_93__attr_\"real\"'\n", " '0__fft_coefficient__coeff_94__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_94__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_94__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_94__attr_\"real\"'\n", " '0__fft_coefficient__coeff_95__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_95__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_95__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_95__attr_\"real\"'\n", " '0__fft_coefficient__coeff_96__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_96__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_96__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_96__attr_\"real\"'\n", " '0__fft_coefficient__coeff_97__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_97__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_97__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_97__attr_\"real\"'\n", " '0__fft_coefficient__coeff_98__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_98__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_98__attr_\"imag\"'\n", " 
'0__fft_coefficient__coeff_98__attr_\"real\"'\n", " '0__fft_coefficient__coeff_99__attr_\"abs\"'\n", " '0__fft_coefficient__coeff_99__attr_\"angle\"'\n", " '0__fft_coefficient__coeff_99__attr_\"imag\"'\n", " '0__fft_coefficient__coeff_99__attr_\"real\"'] did not have any finite values. Filling with zeros.\n" ] } ], "source": [ "X = extract_features(master_df, column_id=1, impute_function=impute, default_fc_parameters=extraction_settings)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:42:22.275690Z", "start_time": "2018-04-24T18:42:22.272685Z" } }, "outputs": [ { "data": { "text/plain": [ "'Number of computed features: 794.'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Number of computed features: {}.\".format(X.shape[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Training the classifier**" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:45:36.391311Z", "start_time": "2018-04-24T18:45:36.384788Z" } }, "outputs": [ { "data": { "text/plain": [ "(500,)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = load_har_classes()[:N]\n", "y.shape" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:45:43.110969Z", "start_time": "2018-04-24T18:45:43.100963Z" } }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:45:47.906376Z", "start_time": "2018-04-24T18:45:47.813708Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 1 1.00 1.00 1.00 29\n", " 2 1.00 1.00 1.00 9\n", " 3 1.00 1.00 1.00 14\n", " 4 0.36 0.36 0.36 14\n", " 5 0.26 0.36 0.30 14\n", " 6 0.60 
0.45 0.51 20\n", "\n", "avg / total 0.73 0.71 0.72 100\n", "\n" ] } ], "source": [ "cl = DecisionTreeClassifier()\n", "cl.fit(X_train, y_train)\n", "print(classification_report(y_test, cl.predict(X_test)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Select relevant features for each class separately, treating each class as a binary classification problem, and take their union**" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:49:57.768306Z", "start_time": "2018-04-24T18:49:43.341016Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tsfresh.feature_selection.relevance:Infered classification as machine learning task\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of relevant features for class 5: 216/794\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tsfresh.feature_selection.relevance:Infered classification as machine learning task\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of relevant features for class 4: 202/794\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tsfresh.feature_selection.relevance:Infered classification as machine learning task\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of relevant features for class 6: 188/794\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tsfresh.feature_selection.relevance:Infered classification as machine learning task\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of relevant features for class 1: 216/794\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tsfresh.feature_selection.relevance:Infered classification as machine learning task\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of relevant features for class 3: 220/794\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING:tsfresh.feature_selection.relevance:Infered 
classification as machine learning task\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Number of relevant features for class 2: 166/794\n" ] } ], "source": [ "relevant_features = set()\n", "\n", "for label in y.unique():\n", "    y_train_binary = y_train == label\n", "    X_train_filtered = select_features(X_train, y_train_binary)\n", "    print(\"Number of relevant features for class {}: {}/{}\".format(label, X_train_filtered.shape[1], X_train.shape[1]))\n", "    relevant_features = relevant_features.union(set(X_train_filtered.columns))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:50:01.727717Z", "start_time": "2018-04-24T18:50:01.724211Z" } }, "outputs": [ { "data": { "text/plain": [ "264" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(relevant_features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have reduced the number of features from 794 to 264." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:50:10.141904Z", "start_time": "2018-04-24T18:50:10.136905Z" } }, "outputs": [], "source": [ "X_train_filtered = X_train[list(relevant_features)]\n", "X_test_filtered = X_test[list(relevant_features)]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:50:16.432864Z", "start_time": "2018-04-24T18:50:16.429357Z" } }, "outputs": [ { "data": { "text/plain": [ "((400, 264), (100, 264))" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train_filtered.shape, X_test_filtered.shape" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:50:24.694811Z", "start_time": "2018-04-24T18:50:24.649243Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 1 1.00 1.00 1.00 
29\n", " 2 1.00 1.00 1.00 9\n", " 3 1.00 1.00 1.00 14\n", " 4 0.27 0.29 0.28 14\n", " 5 0.29 0.36 0.32 14\n", " 6 0.62 0.50 0.56 20\n", "\n", "avg / total 0.72 0.71 0.71 100\n", "\n" ] } ], "source": [ "cl = DecisionTreeClassifier()\n", "cl.fit(X_train_filtered, y_train)\n", "print(classification_report(y_test, cl.predict(X_test_filtered)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model quality is practically unchanged (average F1 of 0.72 before selection and 0.71 after), while the model now uses about a third of the features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Comparison with a classifier trained on the raw readings**" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:53:10.252470Z", "start_time": "2018-04-24T18:53:10.248964Z" } }, "outputs": [ { "data": { "text/plain": [ "(500, 128)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_1 = df.iloc[:N,:]\n", "X_1.shape" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:53:11.053575Z", "start_time": "2018-04-24T18:53:11.049569Z" } }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X_1, y, test_size=.2)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-04-24T18:53:17.756822Z", "start_time": "2018-04-24T18:53:17.720767Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 1 0.55 0.58 0.56 19\n", " 2 0.69 0.52 0.59 21\n", " 3 0.75 0.46 0.57 13\n", " 4 0.42 0.57 0.48 14\n", " 5 0.65 0.50 0.56 22\n", " 6 0.20 0.36 0.26 11\n", "\n", "avg / total 0.57 0.51 0.53 100\n", "\n" ] } ], "source": [ "cl = DecisionTreeClassifier()\n", "cl.fit(X_train, y_train)\n", "print(classification_report(y_test, cl.predict(X_test)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
"As we can see, the classifier built on tsfresh features is considerably better than the naive classifier trained on the raw readings (average F1 of 0.71 versus 0.53)." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }