{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction: Getting Started with SVM\n", "In this notebook, we explain to download the dataset and getting started with all the predictive tasks using Support Vector Machine. We will be extracting spectral features, specifically 6 rhythmic features - total power in 6 frequency bands, namely, Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-14 Hz), Beta (14-30 Hz), Low Gamma (30-47 Hz), and High Gamma (47-64 Hz). For preprocessing, we filter EEG first with 0.5 Hz highpass and then remove Artifact with ICA based approach. " ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "
Table of Contents
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T12:52:49.613138Z", "start_time": "2020-05-22T12:52:47.677570Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T12:52:51.109772Z", "start_time": "2020-05-22T12:52:50.271005Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PhyAAt Processing lib Loaded...\n", "Version : 0.0.2\n" ] } ], "source": [ "#!pip install phyaat # if not installed yet\n", "\n", "import phyaat\n", "print('Version :' ,phyaat.__version__)\n", "import phyaat as ph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T12:53:13.791404Z", "start_time": "2020-05-22T12:53:07.545858Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100%[|][##################################################] S1\r" ] } ], "source": [ "# Download dataset of one subject only (subject=1)\n", "# To download data of all the subjects use subject =-1 or for specify for one e.g.subject=10\n", "\n", "dirPath = ph.download_data(baseDir='../PhyAAt_Data', subject=1,verbose=0,overwrite=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Locate the subject's file" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T12:53:53.823637Z", "start_time": "2020-05-22T12:53:53.813091Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total Subjects : 1\n" ] }, { "data": { "text/plain": [ "{'sigFile': '../PhyAAt_Data/phyaat_dataset/Signals/S1/S1_Signals.csv',\n", " 'txtFile': '../PhyAAt_Data/phyaat_dataset/Signals/S1/S1_Textscore.csv'}" ] }, "execution_count": 5, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "baseDir='../PhyAAt_Data' # or dirPath return path from above\n", "\n", "#returns a dictionary containing file names of all the subjects available in baseDir\n", "SubID = ph.ReadFilesPath(baseDir) \n", "\n", "#check files of subject=1\n", "SubID[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading data and preprocessing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Subj (obj) with data of Subject=1" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T12:54:01.271048Z", "start_time": "2020-05-22T12:53:59.548714Z" } }, "outputs": [], "source": [ "# Create a Subj holding dataset of subject=1\n", "\n", "Subj = ph.Subject(SubID[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filtering - removing DC" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T12:59:56.614948Z", "start_time": "2020-05-22T12:59:52.708653Z" } }, "outputs": [], "source": [ "#filtering with highpass filter of cutoff frequency 0.5Hz\n", "\n", "Subj.filter_EEG(band =[0.5],btype='highpass',order=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Artifact removal using ICA [ ~6mins]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:08:24.116527Z", "start_time": "2020-05-22T13:02:02.421303Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ICA Artifact Removal : extended-infomax\n", "100%|####################################################################################################|\n" ] } ], "source": [ "#Remving Artifact using ICA, setting window size to 1280 (10sec), which is larg, but takes less time\n", "\n", "Subj.correct(method='ICA',verbose=1,winsize=128*10) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature Extraction - Rhythmic Features [~2min]" ] }, { 
"cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:12:22.957813Z", "start_time": "2020-05-22T13:09:40.534782Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100%|##################################################|100\\100|Sg - 0\n", "Done..\n", "100%|##################################################|100\\100|Sg - 1\n", "Done..\n", "100%|##################################################|100\\100|Sg - 2\n", "Done..\n", "100%|##################################################|43\\43|Sg - 0\n", "Done..\n", "100%|##################################################|43\\43|Sg - 1\n", "Done..\n", "100%|##################################################|43\\43|Sg - 2\n", "Done..\n", "DataShape: (290, 84) (290, 4) (120, 84) (120, 4)\n" ] } ], "source": [ "# setting task=-1, does extract the features from all the segmensts for all the four tasks and \n", "# returns y_train as (n,4), one coulum for each task. Next time extracting Xy for any particular\n", "# task won't extract the features agains, unless you force it by setting 'redo'=True.\n", "\n", "X_train,y_train,X_test, y_test = Subj.getXy_eeg(task=-1)\n", "\n", "print('DataShape: ',X_train.shape,y_train.shape,X_test.shape, y_test.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictive Modeling with SVM" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:15:13.398531Z", "start_time": "2020-05-22T13:15:13.387190Z" } }, "outputs": [], "source": [ "from sklearn import svm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### T4 Task: LWR classification" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:16:44.863900Z", "start_time": "2020-05-22T13:16:44.850700Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataShape: (290, 84) (290,) (120, 84) (120,)\n", "\n", 
"Class labels : [0 1 2]\n" ] } ], "source": [ "X_train,y_train, X_test,y_test = Subj.getXy_eeg(task=4)\n", "\n", "print('DataShape: ',X_train.shape,y_train.shape,X_test.shape, y_test.shape)\n", "print('\\nClass labels :',np.unique(y_train))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:16:51.506165Z", "start_time": "2020-05-22T13:16:51.464835Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training Accuracy: 0.9551724137931035\n", "Testing Accuracy: 0.875\n" ] } ], "source": [ "# Normalization - SVM works well with normalized features\n", "means = X_train.mean(0)\n", "std = X_train.std(0)\n", "X_train = (X_train-means)/std\n", "X_test = (X_test-means)/std\n", "\n", "\n", "# Training\n", "clf = svm.SVC(kernel='rbf', C=1,gamma='auto')\n", "clf.fit(X_train,y_train)\n", "\n", "# Predition\n", "ytp = clf.predict(X_train)\n", "ysp = clf.predict(X_test)\n", "\n", "# Evaluation\n", "T4_trac = np.mean(y_train==ytp)\n", "T4_tsac = np.mean(y_test==ysp)\n", "print('Training Accuracy:',T4_trac)\n", "print('Testing Accuracy:',T4_tsac)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### T3 Task: Semanticity classification" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:17:15.909905Z", "start_time": "2020-05-22T13:17:15.902393Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataShape: (100, 84) (100,) (43, 84) (43,)\n", "\n", "Class labels : [0 1]\n" ] } ], "source": [ "X_train,y_train, X_test,y_test = Subj.getXy_eeg(task=3)\n", "\n", "print('DataShape: ',X_train.shape,y_train.shape,X_test.shape, y_test.shape)\n", "print('\\nClass labels :',np.unique(y_train))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:17:44.998304Z", "start_time": "2020-05-22T13:17:44.976738Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "Training Accuracy: 0.86\n", "Testing Accuracy: 0.6046511627906976\n" ] } ], "source": [ "# Normalization - SVM works well with normalized features\n", "means = X_train.mean(0)\n", "std = X_train.std(0)\n", "X_train = (X_train-means)/std\n", "X_test = (X_test-means)/std\n", "\n", "\n", "# Training\n", "clf = svm.SVC(kernel='rbf', C=1,gamma='auto')\n", "clf.fit(X_train,y_train)\n", "\n", "# Predition\n", "ytp = clf.predict(X_train)\n", "ysp = clf.predict(X_test)\n", "\n", "\n", "# Evaluation\n", "T3_trac = np.mean(y_train==ytp)\n", "T3_tsac = np.mean(y_test==ysp)\n", "print('Training Accuracy:',T3_trac)\n", "print('Testing Accuracy:',T3_tsac)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### T2 Task: Noise level prediction : Regression" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:24:28.385511Z", "start_time": "2020-05-22T13:24:28.374990Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataShape: (100, 84) (100,) (43, 84) (43,)\n", "\n", "Noise levels : [ -6 -3 0 3 6 1000]\n", "New Noise levels : [-6 -3 0 3 6 10]\n" ] } ], "source": [ "X_train,y_train, X_test,y_test = Subj.getXy_eeg(task=2)\n", "\n", "print('DataShape: ',X_train.shape,y_train.shape,X_test.shape, y_test.shape)\n", "print('\\nNoise levels :',np.unique(y_train))\n", "\n", "#change 1000 dB to 10 dB\n", "y_train[y_train==1000]=10\n", "y_test[y_test==1000]=10\n", "\n", "print('New Noise levels :',np.unique(y_train))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:25:51.464007Z", "start_time": "2020-05-22T13:25:51.441827Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training MAE: 3.9959210189119596\n", "Testing MAE: 4.692983467091375\n" ] } ], "source": [ "# Normalization - SVM works well with normalized features\n", "means = X_train.mean(0)\n", "std = X_train.std(0)\n", "X_train = 
(X_train-means)/std\n", "X_test = (X_test-means)/std\n", "\n", "\n", "# Training\n", "clf = svm.SVR(kernel='rbf', C=1,gamma='auto')\n", "clf.fit(X_train,y_train)\n", "\n", "# Predition\n", "ytp = clf.predict(X_train)\n", "ysp = clf.predict(X_test)\n", "\n", "# Evaluation\n", "T2_tre = np.mean(np.abs(y_train-ytp))\n", "T2_tse = np.mean(np.abs(y_test-ysp))\n", "print('Training MAE:',T2_tre)\n", "print('Testing MAE:',T2_tse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### T1 Task: Attention Level prediction: Regression" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:26:34.815699Z", "start_time": "2020-05-22T13:26:34.803815Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataShape: (100, 84) (100,) (43, 84) (43,)\n", "\n", "Attention levels:\n", " [ 0 7 12 14 15 18 20 22 25 28 33 37 38 42 44 45 46 50\n", " 54 60 62 66 71 72 75 76 80 83 85 87 88 100]\n", "\n", "New Attention levels:\n", " [ 0 10 20 30 40 50 60 70 80 100]\n" ] } ], "source": [ "X_train,y_train, X_test,y_test = Subj.getXy_eeg(task=1)\n", "\n", "print('DataShape: ',X_train.shape,y_train.shape,X_test.shape, y_test.shape)\n", "print('\\nAttention levels:\\n',np.unique(y_train))\n", "\n", "# Round off around 10\n", "\n", "y_train = 10*(y_train//10)\n", "y_test = 10*(y_test//10)\n", "\n", "print('\\nNew Attention levels:\\n',np.unique(y_train))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T13:26:59.732496Z", "start_time": "2020-05-22T13:26:59.714559Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training MAE: 30.318536004889328\n", "Testing MAE: 32.7156301374038\n" ] } ], "source": [ "# Normalization - SVM works well with normalized features\n", "means = X_train.mean(0)\n", "std = X_train.std(0)\n", "X_train = (X_train-means)/std\n", "X_test = (X_test-means)/std\n", "\n", "\n", "# Training\n", "clf = 
svm.SVR(kernel='rbf', C=1,gamma='auto')\n", "clf.fit(X_train,y_train)\n", "\n", "# Predition\n", "ytp = clf.predict(X_train)\n", "ysp = clf.predict(X_test)\n", "\n", "# Evaluation\n", "T1_tre = np.mean(np.abs(y_train-ytp))\n", "T1_tse = np.mean(np.abs(y_test-ysp))\n", "print('Training MAE:',T1_tre)\n", "print('Testing MAE:',T1_tse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All results" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "ExecuteTime": { "end_time": "2020-05-22T17:17:54.322184Z", "start_time": "2020-05-22T17:17:53.813512Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(13,3))\n", "plt.subplot(141)\n", "plt.bar(1, [T1_tre])\n", "plt.bar(2, [T1_tse])\n", "plt.xlim([0,3])\n", "plt.xticks([])\n", "plt.xlabel('T1: Attention Level',fontsize=13)\n", "plt.ylabel('MAE')\n", "\n", "plt.subplot(142)\n", "plt.bar(1, [T2_tre])\n", "plt.bar(2, [T2_tse])\n", "plt.xticks([])\n", "plt.xlabel('T2: Noise Level',fontsize=13)\n", "plt.ylabel('MAE')\n", "\n", "\n", "plt.xlim([0,3])\n", "plt.subplot(143)\n", "plt.bar(1, [T3_trac])\n", "plt.bar(2, [T3_tsac])\n", "plt.xticks([])\n", "plt.xlabel('T3: Semanticity',fontsize=13)\n", "plt.ylabel('Accuracy')\n", "\n", "plt.xlim([0,3])\n", "plt.subplot(144)\n", "plt.bar(1, [T4_trac],label='Training')\n", "plt.bar(2, [T4_tsac],label='Testing')\n", "plt.xlim([0,3])\n", "plt.xticks([])\n", "plt.xlabel('T4: LWR',fontsize=13)\n", "plt.ylabel('Accuracy')\n", "plt.legend(bbox_to_anchor=(1,1))\n", "plt.subplots_adjust(wspace=0.5)\n", "fig.suptitle(\"Predictive Tasks with SVM\", fontsize=\"x-large\")\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", 
"toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "273.188px" }, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }