{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "intro_info_title" ] }, "source": [ "\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "
Stone, Paper or Scissor Game - Train and Classify [Volume 3]
" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "intro_info_tags" ] }, "source": [ "
\n", "
\n", " Difficulty Level: \n", " \n", " \n", " \n", " \n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", "
Tags: train_and_classify ☁ machine-learning ☁ features ☁ selection
\n", "
\n", " \n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "test" ] }, "source": [ "Previous Notebooks that are part of \"Rock, Paper or Scissor Game - Train and Classify\" module\n", "\n", "\n", "Following Notebooks that are part of \"Rock, Paper or Scissor Game - Train and Classify\" module\n", " \n", "\n", "\n", " \n", " \n", " \n", "
\n", " Currently we are in possession of a file containing the feature values for all training examples, as demonstrated on a previously created Jupyter Notebook .\n", "
\n", " However, there is a high risk that some of the extracted features are not useful for our classification system. Remember, a good feature is a parameter that has the ability to separate the different classes of our classification system, i.e, a parameter with a characteristic range of values for each available class.\n", "
\n", " In order to ensure that the training process of our classifier happens in the most efficient way, these redundant or invariant features should be removed.\n", "
\n", " The implicit logic of the last two paragraphs is called Feature Selection, which will be focused at this Jupyter Notebook !\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Starting Point (Setup)

\n", "List of Available Classes:\n", "
\n", "
    \n", "
  1. \"No Action\" [When the hand is relaxed]
  2. \n", "
  3. \"Paper\" [All fingers are extended]
  4. \n", "
  5. \"Stone\" [All fingers are bent]
  6. \n", "
  7. \"Scissor\" [Forefinger and middle finger are extended and the remaining ones are bent]
  8. \n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", " \n", "
\n", " Paper\n", " \n", " Stone\n", " \n", " Scissor\n", "
\n", "\n", "Acquired Data:\n", "
\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Protocol/Feature Extraction

\n", "Extracted Features\n", "\n", "\n", "Formal definition of parameters\n", "
\n", "☝ | Maximum Sample Value of a set of elements is equal to the last element of the sorted set\n", "\n", "☉ | $\\mu = \\frac{1}{N}\\sum_{i=1}^N (sample_i)$\n", "\n", "☆ | $\\sigma = \\sqrt{\\frac{1}{N}\\sum_{i=1}^N(sample_i - \\mu_{signal})^2}$\n", "\n", "☌ | $zcr = \\frac{1}{N - 1}\\sum_{i=1}^{N-1}bin(i)$ \n", "\n", "☇ | $\\sigma_{abs} = \\sqrt{\\frac{1}{N}\\sum_{i=1}^N(|sample_i| - \\mu_{signal_{abs}})^2}$\n", "\n", "☍ | $m = \\frac{\\Delta signal}{\\Delta t}$\n", "\n", "... being $N$ the number of acquired samples (that are part of the signal), $sample_i$ the value of the sample number $i$, $signal_{abs}$ the absolute signal, $\\Delta signal$ is the difference between the y coordinate of two points of the regression curve and $\\Delta t$ the difference between the x (time) coordinate of the same two points of the regression curve.\n", "\n", "... and \n", "\n", "$bin(i)$ a binary function defined as:\n", "\n", "$bin(i) = \\begin{cases} 1, & \\mbox{if } signal_i \\times signal_{i-1} \\leq 0 \\\\ 0, & \\mbox{if } signal_i \\times signal_{i-1}>0 \\end{cases}$\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Feature Selection

\n", "Intro\n", "
\n", "With Feature Selection we will start to use the resources contained inside an extremely useful Python package: scikit-learn \n", "\n", "Like described before, Feature Selection is intended to remove redundant or meaningless parameters which would increase the complexity of the classifier and not always translate into an improved performance. Without this step, the risk of overfitting to the training examples increases, making the classifier less able to categorize a new testing example.\n", "\n", "There are different approaches to feature selection such as filter methods or wrapper methods.\n", "\n", "In the first method (filter methods), a ranking will be attributed to the features, using, for example, the Pearson correlation coefficient to evaluate the impact that the feature under analysis has on the target class of the training example, or the Mutual Information parameter which defines whether two variables convey shared information. \n", "\n", "The least relevant features will be excluded and the classifier will be trained later (for a deeper explanation, please, visit the article of Girish Chandrashekar and Ferat Sahin at ScienceDirect ).\n", "\n", "The second methodology (wrapper methods) is characterised by the fact that the selection phase includes a classification algorithm, and features will be excluded or selected according to the quality of the trained classifier.\n", "\n", "There are also a third major methodology applicable on Feature Selection, including the so called embedded methods. Essentially this methods are a combination of filter and wrapper, being characterised by the simultaneous execution of Feature Selection and Training stages.\n", "\n", "One of the most intuitive Feature Selection methods is Recursive Feature Elimination, which will be used in the current Jupyter Notebook.\n", "\n", "Essentially the steps of this method consists in:\n", "
    \n", "
  1. The original set of training examples is segmented into multiple ($K$) subsets of training examples and test examples
  2. \n", " For each one of the $K$ subsets of training/test examples:\n", "
      \n", "
    1. The training examples are used for training a \"virtual\" classifier (for example, a Support Vector Machine)
    2. \n", "
    3. The test examples are given as inputs to the trained classifier, and the quality of the \"virtual\" classifier is estimated
    4. \n", "
    \n", "
  3. At this point we can estimate the average quality of the $K$ \"virtual\" classifiers and determine the weight of each feature in the training stage
  4. \n", "
  5. The feature with the smallest weight is excluded
  6. \n", "
  7. Repetition of steps 1, 2 and 3 until only one feature remains
  8. \n", "
  9. Finally, when the \"feature elimination\" procedure ends, the set of features that provides a \"virtual\" classifier with the best average quality (step 2) defines the relevant features to be used during our final training stage
  10. \n", "
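\n", "\n", "The compact sketch below translates the recursive elimination loop just described into code. It is only an illustration of the idea (the scikit-learn implementation used in the following sections is more complete) and assumes a generic feature matrix X, a label vector y and a linear estimator that exposes its feature weights through coef_; all names are illustrative.\n", "\n", "```python\n", "import numpy as np\n", "from sklearn.svm import SVC\n", "from sklearn.model_selection import cross_val_score\n", "\n", "def recursive_feature_elimination(X, y, cv=5):\n", "    X = np.asarray(X)\n", "    remaining = list(range(X.shape[1]))  # indices of the features still in play\n", "    history = []  # pairs of (feature subset, average cross-validation score)\n", "    while remaining:\n", "        estimator = SVC(kernel='linear')\n", "        # Cross-validation of a 'virtual' classifier trained on the current feature subset.\n", "        scores = cross_val_score(estimator, X[:, remaining], y, cv=cv)\n", "        history.append((list(remaining), scores.mean()))\n", "        if len(remaining) == 1:\n", "            break\n", "        # The feature with the smallest weight in the fitted model is excluded.\n", "        estimator.fit(X[:, remaining], y)\n", "        weights = np.abs(estimator.coef_).sum(axis=0)\n", "        remaining.pop(int(np.argmin(weights)))\n", "    # The smallest subset reaching the best average score defines the relevant features.\n", "    best_score = max(score for _, score in history)\n", "    return min((subset for subset, score in history if score == best_score), key=len)\n", "```\n", "\n", "This mirrors the logic automated by scikit-learn's RFECV object in sections 4 and 5, although the library organises the cross-validation slightly differently.\n", "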
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

0 - Import of the packages needed for the correct execution of the current Jupyter Notebook

" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide_out" ] }, "outputs": [ { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " 
CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(null);\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error() {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (var i = 0; i < css_urls.length; i++) {\n", " var url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-1.4.0.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force === true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " 
console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(null);\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-1.4.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, 
js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to 
render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(null);\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error() {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (var i = 0; i < css_urls.length; i++) {\n", " var url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error;\n", " element.async = false;\n", " element.src = url;\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-1.4.0.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force === true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " 
console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(null);\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = css_urls.length + js_urls.length;\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n const element = document.createElement(\"link\");\n element.onload = on_load;\n element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n }\n\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n \n var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-1.4.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-1.4.0.min.js\"];\n var css_urls = [];\n \n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n function(Bokeh) {\n \n \n }\n ];\n\n function run_inline_js() {\n \n if (root.Bokeh !== undefined || force === true) {\n \n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(null)).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(css_urls, 
js_urls, function() {\n console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Python package that contains functions specialized on \"Machine Learning\" tasks.\n", "from sklearn.svm import SVC\n", "from sklearn.model_selection import StratifiedKFold\n", "from sklearn.feature_selection import RFECV, RFE\n", "from sklearn.preprocessing import normalize\n", "\n", "# Package dedicated to the manipulation of json files.\n", "from json import loads, dump\n", "\n", "# Package containing a diversified set of function for statistical processing and also provide support to array operations.\n", "from numpy import max, array\n", "\n", "# biosignalsnotebooks own package that supports some functionalities used on the Jupyter Notebooks.\n", "import biosignalsnotebooks as bsnb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

1 - Loading of the dictionary created in Volume 2 of the \"Classification Game\" Jupyter Notebook

\n", "This dictionary contains all the features extracted from our training examples." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Specification of filename and relative path.\n", "relative_path = \"../../signal_samples/classification_game/features\"\n", "filename = \"classification_game_features.json\"\n", "\n", "# Load of data inside file storing it inside a Python dictionary.\n", "with open(relative_path + \"/\" + filename) as file:\n", " features_dict = loads(file.read())" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mDict Keys\u001b[0m\u001b[39m define the class number\n", "\u001b[38;2;232;77;14m\u001b[1mDict Sub-Keys\u001b[0m\u001b[39m define the trial number\n", "\n", "{'0': {'1': [0.002128164580188196, 0.00732421875, 0.3148858143023837, 0.0013299640761190862, 0.00525897736063944, 0.0177154541015625, 0.14585764294049008, 0.0032250390995378314, 0.5878418, 0.004769659606303164, 0.6044, 0.0, 1.4062325168397938e-06], '2': [0.002029433963100043, 0.0075531005859375, 0.3459899981478051, 0.0012865379359157589, 0.00426341220793342, 0.0205078125, 0.24356362289312836, 0.0032271742477853944, 0.5960790740740741, 0.005347679929104084, 0.6175999999999999, 0.0, 1.7843743867526657e-06], '3': [0.004812456585924175, 0.01629638671875, 0.1500312565117733, 0.0027146265743094667, 0.002620978804585002, 0.01263427734375, 0.17816211710773078, 0.0022142130046097727, 0.9737463333333332, 0.008456826821502778, 1.0055999999999998, 0.0, 4.284292720672414e-07], '4': [0.003288393293733703, 0.0120849609375, 0.2182839094577996, 0.001892522517093399, 0.006623739508638536, 0.024169921875, 0.21266888927882086, 0.004546908635468114, 0.4644140350877193, 0.00591195751107905, 0.5671999999999999, 0.0, 5.204352203726982e-07], '5': [0.003974582803167046, 0.01190185546875, 0.18929431376180775, 0.0021560792451752937, 0.015274954840938857, 0.0413360595703125, 0.13502500463048714, 0.008656094595054242, 0.2837205925925926, 0.011517276154508253, 0.3273999999999999, 0.0, 2.2944983716296495e-06]}, '1': {'1': [0.01745991778366743, 0.13330078125, 0.1666919230186392, 0.012498507929444395, 0.008794508626081544, 0.0683441162109375, 0.2518563418699803, 0.006557225017506427, 0.6551165151515151, 0.1461447530029049, 1.5952000000000002, 0.00030307622367025305, -6.71839817994486e-06], '2': [0.01576872398048997, 0.11004638671875, 0.17624797260767705, 0.011874382612913703, 0.007355703497563003, 0.063262939453125, 0.2759055685709137, 0.005353108416775609, 0.664985945945946, 0.22564751043706982, 2.3568, 0.0007208506037123806, 3.0184467432070763e-05], '3': [0.016834862817464734, 0.10711669921875, 0.12385397566261043, 0.010210459675732673, 0.006991805896638525, 0.05291748046875, 0.23737289548258042, 0.005118249131484526, 0.7038323999999999, 0.09012218944433163, 1.2955999999999999, 0.0, 1.7932120498114455e-06], '4': [0.01624700006560064, 0.08184814453125, 0.13230391296718721, 0.010096274630523346, 0.006410319455413151, 0.03900146484375, 0.24232321459905246, 0.004563770456971738, 0.694470701754386, 0.07905537027731246, 1.1703999999999999, 0.0, 1.2355520070555934e-05], '5': [0.020006202433146783, 0.11279296875, 0.1737049068216789, 0.014510097131530179, 0.00870274334484326, 0.0604248046875, 0.2576508804923919, 0.006020127976422262, 0.7650033504273503, 0.18888218438624177, 2.2648, 0.00034193879295606086, -7.484701743981861e-07]}, '2': {'1': [0.03701667312138605, 
0.48175048828125, 0.13714182735628042, 0.028527836988579556, 0.00962890534505662, 0.0972747802734375, 0.2526581366011894, 0.007592204871521871, -0.05535329729729731, 0.389147481076091, 1.12, 0.0016219138583528565, -0.00017734880190042993], '2': [0.05605906972012585, 0.728759765625, 0.15079500769362283, 0.04570180889395357, 0.022246936792071167, 0.2032470703125, 0.28363822875705247, 0.0181818341541995, -0.1621129230769231, 0.4012124197110927, 0.7678, 0.0020516327577363653, -0.00018692108787173123], '3': [0.04336534865689463, 0.39495849609375, 0.2081066853834006, 0.03180753348005212, 0.019210994245302326, 0.104644775390625, 0.2675908054044569, 0.013625052459912818, -0.08241263157894738, 0.33346391541823117, 1.7416, 0.0029829794700824705, -0.00014185053469919516], '4': [0.06487298554636435, 0.8785400390625, 0.19967561722832944, 0.05166313713613995, 0.013633174787322466, 0.128814697265625, 0.30762299513425845, 0.009650441229459253, -0.13143430630630631, 0.3800528622757343, 1.1568, 0.002342764462065237, -0.0001757656867122758], '5': [0.04605573833347097, 0.56634521484375, 0.21383160179872848, 0.03490825767448176, 0.022935690711269243, 0.1263427734375, 0.3056287796557606, 0.01751656809433346, -0.037762759689922494, 0.25814329874621555, 1.0832000000000002, 0.0021708792060784617, -0.00010479693645550052]}, '3': {'1': [0.06666059775795927, 0.36566162109375, 0.11758009432428038, 0.03798634625145099, 0.09589783278118581, 0.936767578125, 0.31566108310294355, 0.0724352171785137, -0.28372952845528454, 0.2879444621409897, 0.52, 0.0017889087656529517, -0.00010216214518629092], '2': [0.028400519040188962, 0.36090087890625, 0.11831082236279707, 0.01845200002389393, 0.06906328567307285, 0.3870391845703125, 0.34416139511027527, 0.050492947208441975, -0.2559360683760684, 0.28748525778615397, 0.5438000000000001, 0.001538724568302274, -0.0001163346319399635], '3': [0.026872172099736805, 0.25762939453125, 0.19161771709795206, 0.021019732851432105, 0.09220916129668498, 0.75457763671875, 0.33910144467375775, 0.07045219784614146, -0.3003303492063492, 0.3241576808540921, 0.696, 0.0014287982219399905, -0.00011106434030924326], '4': [0.03224804078389381, 0.36859130859375, 0.14737561976406224, 0.023877465471875494, 0.08250524718428286, 1.1531982421875, 0.32894511882373056, 0.06375121644479873, -0.27145336752136756, 0.29059513319336294, 1.4832, 0.0018806633612583347, -0.00011592650334266791], '5': [0.024408102937141216, 0.19610595703125, 0.1247592235886798, 0.015037408860519162, 0.06386941034280033, 0.44805908203125, 0.3390131871388354, 0.046133417352076656, -0.3375715851851852, 0.30169159148351743, 0.6961999999999999, 0.0010371906949177656, -9.453002551262363e-05]}}\n" ] } ], "source": [ "from sty import fg, rs\n", "print(fg(98,195,238) + \"\\033[1mDict Keys\\033[0m\" + fg.rs + \" define the class number\")\n", "print(fg(232,77,14) + \"\\033[1mDict Sub-Keys\\033[0m\" + fg.rs + \" define the trial number\\n\")\n", "print(features_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
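\n", "\n", "For clarity, the structure printed above can be navigated as sketched below (class \"0\" and trial \"1\" are simply examples of existing keys):\n", "\n", "```python\n", "# Each entry of the dictionary is the 13-element feature list of one training example.\n", "example_features = features_dict['0']['1']  # class '0' ('No Action'), trial '1'\n", "print(len(example_features))  # 13 features per training example\n", "```\n", "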

2 - Restructuring of \"features_dict\" into a format compatible with the scikit-learn package

\n", "features_dict must be converted to a list, containing inside it a number of sub-lists equal to the number of training examples (in our case 20). In its turn, each sub-list is formed by a number of entries equal to the number of extracted features (13 for our original formulation of the problem)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Initialisation of a list containing our training data and another list containing the labels of each training example.\n", "features_list = []\n", "class_training_examples = []\n", "\n", "# Access each feature list inside dictionary.\n", "list_classes = features_dict.keys()\n", "for class_i in list_classes:\n", " list_trials = features_dict[class_i].keys()\n", " for trial in list_trials:\n", " # Storage of the class label.\n", " class_training_examples += [int(class_i)]\n", " features_list += [features_dict[class_i][trial]]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;232;77;14m\u001b[1m[Number of list entries;Number of sub-list entries]:\u001b[0m\u001b[39m [20; 13]✓\n", "\u001b[38;2;253;196;0m\u001b[1mClass of each training example:\u001b[0m\u001b[39m\n", "[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]\n", "\u001b[38;2;98;195;238m\u001b[1mFeatures List:\u001b[0m\u001b[39m\n", "[[0.002128164580188196, 0.00732421875, 0.3148858143023837, 0.0013299640761190862, 0.00525897736063944, 0.0177154541015625, 0.14585764294049008, 0.0032250390995378314, 0.5878418, 0.004769659606303164, 0.6044, 0.0, 1.4062325168397938e-06], [0.002029433963100043, 0.0075531005859375, 0.3459899981478051, 0.0012865379359157589, 0.00426341220793342, 0.0205078125, 0.24356362289312836, 0.0032271742477853944, 0.5960790740740741, 0.005347679929104084, 0.6175999999999999, 0.0, 1.7843743867526657e-06], [0.004812456585924175, 0.01629638671875, 0.1500312565117733, 0.0027146265743094667, 0.002620978804585002, 0.01263427734375, 0.17816211710773078, 0.0022142130046097727, 0.9737463333333332, 0.008456826821502778, 1.0055999999999998, 0.0, 4.284292720672414e-07], [0.003288393293733703, 0.0120849609375, 0.2182839094577996, 0.001892522517093399, 0.006623739508638536, 0.024169921875, 0.21266888927882086, 0.004546908635468114, 0.4644140350877193, 0.00591195751107905, 0.5671999999999999, 0.0, 5.204352203726982e-07], [0.003974582803167046, 0.01190185546875, 0.18929431376180775, 0.0021560792451752937, 0.015274954840938857, 0.0413360595703125, 0.13502500463048714, 0.008656094595054242, 0.2837205925925926, 0.011517276154508253, 0.3273999999999999, 0.0, 2.2944983716296495e-06], [0.01745991778366743, 0.13330078125, 0.1666919230186392, 0.012498507929444395, 0.008794508626081544, 0.0683441162109375, 0.2518563418699803, 0.006557225017506427, 0.6551165151515151, 0.1461447530029049, 1.5952000000000002, 0.00030307622367025305, -6.71839817994486e-06], [0.01576872398048997, 0.11004638671875, 0.17624797260767705, 0.011874382612913703, 0.007355703497563003, 0.063262939453125, 0.2759055685709137, 0.005353108416775609, 0.664985945945946, 0.22564751043706982, 2.3568, 0.0007208506037123806, 3.0184467432070763e-05], [0.016834862817464734, 0.10711669921875, 0.12385397566261043, 0.010210459675732673, 0.006991805896638525, 0.05291748046875, 0.23737289548258042, 0.005118249131484526, 0.7038323999999999, 0.09012218944433163, 1.2955999999999999, 0.0, 1.7932120498114455e-06], [0.01624700006560064, 0.08184814453125, 0.13230391296718721, 
0.010096274630523346, 0.006410319455413151, 0.03900146484375, 0.24232321459905246, 0.004563770456971738, 0.694470701754386, 0.07905537027731246, 1.1703999999999999, 0.0, 1.2355520070555934e-05], [0.020006202433146783, 0.11279296875, 0.1737049068216789, 0.014510097131530179, 0.00870274334484326, 0.0604248046875, 0.2576508804923919, 0.006020127976422262, 0.7650033504273503, 0.18888218438624177, 2.2648, 0.00034193879295606086, -7.484701743981861e-07], [0.03701667312138605, 0.48175048828125, 0.13714182735628042, 0.028527836988579556, 0.00962890534505662, 0.0972747802734375, 0.2526581366011894, 0.007592204871521871, -0.05535329729729731, 0.389147481076091, 1.12, 0.0016219138583528565, -0.00017734880190042993], [0.05605906972012585, 0.728759765625, 0.15079500769362283, 0.04570180889395357, 0.022246936792071167, 0.2032470703125, 0.28363822875705247, 0.0181818341541995, -0.1621129230769231, 0.4012124197110927, 0.7678, 0.0020516327577363653, -0.00018692108787173123], [0.04336534865689463, 0.39495849609375, 0.2081066853834006, 0.03180753348005212, 0.019210994245302326, 0.104644775390625, 0.2675908054044569, 0.013625052459912818, -0.08241263157894738, 0.33346391541823117, 1.7416, 0.0029829794700824705, -0.00014185053469919516], [0.06487298554636435, 0.8785400390625, 0.19967561722832944, 0.05166313713613995, 0.013633174787322466, 0.128814697265625, 0.30762299513425845, 0.009650441229459253, -0.13143430630630631, 0.3800528622757343, 1.1568, 0.002342764462065237, -0.0001757656867122758], [0.04605573833347097, 0.56634521484375, 0.21383160179872848, 0.03490825767448176, 0.022935690711269243, 0.1263427734375, 0.3056287796557606, 0.01751656809433346, -0.037762759689922494, 0.25814329874621555, 1.0832000000000002, 0.0021708792060784617, -0.00010479693645550052], [0.06666059775795927, 0.36566162109375, 0.11758009432428038, 0.03798634625145099, 0.09589783278118581, 0.936767578125, 0.31566108310294355, 0.0724352171785137, -0.28372952845528454, 0.2879444621409897, 0.52, 0.0017889087656529517, -0.00010216214518629092], [0.028400519040188962, 0.36090087890625, 0.11831082236279707, 0.01845200002389393, 0.06906328567307285, 0.3870391845703125, 0.34416139511027527, 0.050492947208441975, -0.2559360683760684, 0.28748525778615397, 0.5438000000000001, 0.001538724568302274, -0.0001163346319399635], [0.026872172099736805, 0.25762939453125, 0.19161771709795206, 0.021019732851432105, 0.09220916129668498, 0.75457763671875, 0.33910144467375775, 0.07045219784614146, -0.3003303492063492, 0.3241576808540921, 0.696, 0.0014287982219399905, -0.00011106434030924326], [0.03224804078389381, 0.36859130859375, 0.14737561976406224, 0.023877465471875494, 0.08250524718428286, 1.1531982421875, 0.32894511882373056, 0.06375121644479873, -0.27145336752136756, 0.29059513319336294, 1.4832, 0.0018806633612583347, -0.00011592650334266791], [0.024408102937141216, 0.19610595703125, 0.1247592235886798, 0.015037408860519162, 0.06386941034280033, 0.44805908203125, 0.3390131871388354, 0.046133417352076656, -0.3375715851851852, 0.30169159148351743, 0.6961999999999999, 0.0010371906949177656, -9.453002551262363e-05]]\n" ] } ], "source": [ "print(fg(232,77,14) + \"\\033[1m[Number of list entries;Number of sub-list entries]:\\033[0m\" + fg.rs + \" [\" + str(len(features_list)) + \"; \" + str(len(features_list[0])) + \"]\" + u'\\u2713')\n", "print(fg(253,196,0) + \"\\033[1mClass of each training example:\\033[0m\" + fg.rs)\n", "print(class_training_examples)\n", "print(fg(98,195,238) + \"\\033[1mFeatures List:\\033[0m\" + fg.rs)\n", "print(features_list)" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "

2.1 - Normalisation of the feature values, ensuring that the training stage is not affected by scale factors

" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "features_list = normalize(features_list, axis=0, norm=\"max\") # axis=0 specifies that each feature is normalised independently from the others \n", " # and norm=\"max\" defines that the normalization reference value will be the feature maximum value." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.03192537 0.00833681 0.91010092 0.025743 0.05483938 0.01536202\n", " 0.42380594 0.04452308 0.6036909 0.01188812 0.25644942 0.\n", " 0.04658795]\n", " [ 0.03044428 0.00859733 1. 0.02490244 0.04445786 0.01778342\n", " 0.70770175 0.04455256 0.61215026 0.0133288 0.26205024 0.\n", " 0.05911565]\n", " [ 0.07219342 0.0185494 0.43362888 0.05254475 0.02733095 0.01095586\n", " 0.51767025 0.03056818 1. 0.02107818 0.42668024 0.\n", " 0.0141937 ]\n", " [ 0.04933039 0.01375573 0.63089659 0.03663197 0.06907079 0.02095903\n", " 0.6179336 0.06277207 0.47693534 0.01473523 0.24066531 0.\n", " 0.01724182]\n", " [ 0.05962417 0.01354731 0.54710921 0.04173342 0.15928363 0.03584471\n", " 0.39233048 0.11950119 0.29137013 0.02870618 0.13891718 0.\n", " 0.07601586]\n", " [ 0.26192261 0.15172989 0.48178249 0.24192313 0.09170706 0.05926485\n", " 0.73179719 0.09052537 0.67277944 0.3642578 0.67684997 0.10160185\n", " -0.22257799]\n", " [ 0.23655239 0.12526053 0.50940193 0.22984246 0.07670354 0.05485869\n", " 0.80167495 0.07390201 0.68291497 0.56241407 1. 0.24165456\n", " 1. ]\n", " [ 0.25254593 0.1219258 0.35796982 0.1976353 0.0729089 0.04588758\n", " 0.68971389 0.07065968 0.72280878 0.22462463 0.54972845 0.\n", " 0.05940844]\n", " [ 0.24372719 0.09316382 0.38239231 0.19542512 0.0668453 0.03382026\n", " 0.70409761 0.06300486 0.71319468 0.19704118 0.49660557 0.\n", " 0.40933371]\n", " [ 0.30012036 0.12838683 0.50205182 0.28085978 0.09075016 0.05239759\n", " 0.74863388 0.08311051 0.78562899 0.47077851 0.96096402 0.11462995\n", " -0.02479653]\n", " [ 0.55530065 0.54835348 0.39637512 0.55218941 0.10040796 0.08435218\n", " 0.7341269 0.10481372 -0.05684571 0.9699288 0.47522064 0.54372277\n", " -5.87549879]\n", " [ 0.8409626 0.8295123 0.43583632 0.88461157 0.23198581 0.17624643\n", " 0.82414307 0.25100821 -0.16648373 1. 0.32578072 0.68777971\n", " -6.19262501]\n", " [ 0.65053945 0.44956232 0.6014818 0.61567174 0.20032772 0.09074309\n", " 0.77751546 0.18809984 -0.0846346 0.83114056 0.73896809 1.\n", " -4.69945461]\n", " [ 0.97318338 1. 0.57711384 1. 0.14216353 0.11170213\n", " 0.89383353 0.13322858 -0.13497797 0.94726096 0.49083503 0.78537733\n", " -5.82305078]\n", " [ 0.69089897 0.6446436 0.61802828 0.67568986 0.23916798 0.10955859\n", " 0.88803911 0.24182392 -0.0387809 0.64340805 0.45960625 0.72775533\n", " -3.4718829 ]\n", " [ 1. 0.41621509 0.33983669 0.73526983 1. 0.81232137\n", " 0.91718911 1. -0.2913793 0.71768581 0.22063815 0.59970536\n", " -3.38459327]\n", " [ 0.42604657 0.41079617 0.34194868 0.35715988 0.72017567 0.33562242\n", " 1. 
0.69707732 -0.26283649 0.71654127 0.23073659 0.51583478\n", " -3.8541224 ]\n", " [ 0.40311928 0.29324719 0.55382444 0.40686133 0.9615354 0.65433471\n", " 0.98529774 0.97262355 -0.30842771 0.80794528 0.29531568 0.47898359\n", " -3.67951963]\n", " [ 0.48376465 0.41954981 0.42595341 0.46217607 0.86034527 1.\n", " 0.95578738 0.88011355 -0.27877216 0.72429247 0.6293279 0.63046474\n", " -3.84060125]\n", " [ 0.36615488 0.22321801 0.36058621 0.29106651 0.66601516 0.38853604\n", " 0.9850413 0.6368921 -0.34667302 0.75194978 0.29540054 0.34770293\n", " -3.13174402]]\n" ] } ], "source": [ "print(features_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
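\n", "\n", "Side note: with axis=0 and norm=\"max\", each column (feature) is rescaled by its maximum value, as stated in the comment of the cell above. A rough NumPy equivalent on a hypothetical feature matrix (values below are purely illustrative) would be:\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Hypothetical un-normalised feature matrix (2 examples x 3 features).\n", "raw_features = np.array([[0.5, 10.0, 0.1],\n", "                         [0.25, 40.0, -0.8]])\n", "\n", "# Divide each column (feature) by its maximum value.\n", "rescaled = raw_features / np.max(raw_features, axis=0)\n", "print(rescaled)\n", "```\n", "\n", "This also explains why some entries of the matrix printed above fall outside the [-1, 1] range: when a feature combines small positive values with much larger negative ones, dividing by its (positive) maximum leaves the negative entries with magnitudes greater than 1.\n", "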

3 - Selection of a classification algorithm to wrap in our Feature Selection methodology

\n", "A Support Vector Machine shares some principles with k-Nearest Neighbour Classifiers (which we want to use on Jupyter Notebook [volume 4] ), namely the Cartesian logic, given that each example corresponds to a point with a number $N$ of coordinates equivalent to the number of features analysed (13 for our original problem), that is, each feature defines a dimension of the space.\n", "
\n", "Because of this \"contact point\" our \"wrapped\" classifier will be a Support Vector Machine." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Creation of a \"Support Vector Classifier\" supposing that our classes are linearly separable.\n", "svc = SVC(kernel=\"linear\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

4 - Configuration of the Recursive Feature Elimination procedure, taking our previously created \"svc\" object as an input

\n", "Some inputs need to be given:\n", "" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(5), scoring='accuracy')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

5 - Execution of the Recursive Feature Elimination procedure

" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Fit data to the model.\n", "selector = rfecv.fit(features_list, class_training_examples)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RFECV(cv=StratifiedKFold(n_splits=5, random_state=None, shuffle=False),\n", " estimator=SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None,\n", " coef0=0.0, decision_function_shape='ovr', degree=3,\n", " gamma='scale', kernel='linear', max_iter=-1,\n", " probability=False, random_state=None, shrinking=True,\n", " tol=0.001, verbose=False),\n", " min_features_to_select=1, n_jobs=None, scoring='accuracy', step=1,\n", " verbose=0)\n" ] } ], "source": [ "print(selector)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

6 - Get the optimal number of features

\n", "It will be the smallest number that provides the possibility to obtain a highest cross-validation score." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

6.1 - Get the list of average scores of the virtual classifiers (1 per Recursive Feature Elimination iteration)

\n", "The first element of the list refers to the average score of the trained classifiers when the set of features is 1, while the last one corresponds to the case where all features are taken into consideration." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Get list of average score of the virtual classifier\n", "avg_scores = rfecv.grid_scores_" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.4 0.7 0.75 0.8 0.85 0.85 0.9 0.95 0.95 0.95 0.95 0.95 0.95]\n" ] } ], "source": [ "print(avg_scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

6.2 - Identification of the maximum score

" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "max_score = max(avg_scores)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mMaximum Average Score:\u001b[0m \u001b[39m0.95\n" ] } ], "source": [ "print(fg(98,195,238) + \"\\033[1mMaximum Average Score:\\033[0m \" + fg.rs + str(max_score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

6.3 - Identification of the smallest feature set that achieves the maximum score

" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "for nbr_features in range(0, len(avg_scores)):\n", " if avg_scores[nbr_features] == max_score:\n", " optimal_nbr_features = nbr_features + 1\n", " break" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mOptimal Number of Features:\u001b[0m \u001b[39m8\n" ] } ], "source": [ "print(fg(98,195,238) + \"\\033[1mOptimal Number of Features:\\033[0m \" + fg.rs + str(optimal_nbr_features))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"accb0cfd-8a98-46a9-ba63-6c26b07d2fe5\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"rgb(242, 242, 242)\"},\"below\":[{\"id\":\"1011\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1015\",\"type\":\"Grid\"},{\"id\":\"1020\",\"type\":\"Grid\"}],\"height\":200,\"left\":[{\"id\":\"1016\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1037\",\"type\":\"GlyphRenderer\"}],\"sizing_mode\":\"scale_width\",\"title\":null,\"toolbar\":{\"id\":\"1027\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1003\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"1007\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1005\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1009\",\"type\":\"LinearScale\"}},\"id\":\"1001\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1046\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"1017\",\"type\":\"BasicTicker\"},{\"attributes\":{\"data_source\":{\"id\":\"1034\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1035\",\"type\":\"Line\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1036\",\"type\":\"Line\"},\"selection_glyph\":null,\"view\":{\"id\":\"1038\",\"type\":\"CDSView\"}},\"id\":\"1037\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"callback\":null},\"id\":\"1003\",\"type\":\"DataRange1d\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[1,2,3,4,5,6,7,8,9,10,11,12,13],\"y\":[0.4,0.7,0.75,0.8,0.85,0.85,0.9,0.95,0.95,0.95,0.95,0.95,0.95]},\"selected\":{\"id\":\"1045\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1044\",\"type\":\"UnionRenderers\"}},\"id\":\"1034\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1007\",\"type\":\"LinearScale\"},{\"attributes\":{\"callback\":null},\"id\":\"1005\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"1012\",\"type\":\"BasicTicker\"},{\"attributes\":{\"axis_label\":\"Cross validation score (nb of correct classifications)\",\"axis_line_color\":{\"value\":\"rgb(150, 150, 150)\"},\"axis_line_dash\":[2,2],\"formatter\":{\"id\":\"1040\",\"type\":\"BasicTickFormatter\"},\"major_label_text_color\":{\"value\":\"rgb(88, 88, 88)\"},\"major_tick_in\":0,\"major_tick_line_color\":{\"value\":\"white\"},\"major_tick_out\":0,\"minor_tick_line_color\":{\"value\":\"white\"},\"minor_tick_out\":0,\"ticker\":{\"id\":\"1017\",\"type\":\"BasicTicker\"}},\"id\":\"1016\",\"type\":\"LinearAxis\"},{\"attributes\":{\"axis_label\":\"Number of features selected\",\"axis_line_color\":{\"value\":\"white\"},\"formatter\":{\"id\":\"1042\",\"type\":\"BasicTickFormatter\"},\"major_label_text_color\":{\"value\":\"rgb(88, 88, 88)\"},\"major_tick_line_color\":{\"value\":\"white\"},\"minor_tick_line_color\":{\"value\":\"white\"},\"ticker\":{\"id\":\"1012\",\"type\":\"BasicTicker\"}},\"id\":\"1011\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1009\",\"type\":\"LinearScale\"},{\"attributes\":{\"grid_line_color\":\"rgb(150, 150, 
150)\",\"grid_line_dash\":[2,2],\"ticker\":{\"id\":\"1012\",\"type\":\"BasicTicker\"}},\"id\":\"1015\",\"type\":\"Grid\"},{\"attributes\":{\"dimension\":1,\"grid_line_color\":\"rgb(150, 150, 150)\",\"grid_line_dash\":[2,2],\"ticker\":{\"id\":\"1017\",\"type\":\"BasicTicker\"}},\"id\":\"1020\",\"type\":\"Grid\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1036\",\"type\":\"Line\"},{\"attributes\":{\"overlay\":{\"id\":\"1046\",\"type\":\"BoxAnnotation\"}},\"id\":\"1023\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":{\"id\":\"1022\",\"type\":\"WheelZoomTool\"},\"active_tap\":\"auto\",\"logo\":null,\"tools\":[{\"id\":\"1021\",\"type\":\"PanTool\"},{\"id\":\"1022\",\"type\":\"WheelZoomTool\"},{\"id\":\"1023\",\"type\":\"BoxZoomTool\"},{\"id\":\"1025\",\"type\":\"ResetTool\"}]},\"id\":\"1027\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1021\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"1022\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"1025\",\"type\":\"ResetTool\"},{\"attributes\":{\"source\":{\"id\":\"1034\",\"type\":\"ColumnDataSource\"}},\"id\":\"1038\",\"type\":\"CDSView\"},{\"attributes\":{\"line_color\":\"#009EE3\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1035\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"1040\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"1042\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"1044\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"1045\",\"type\":\"Selection\"}],\"root_ids\":[\"1001\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n", " var render_items = [{\"docid\":\"accb0cfd-8a98-46a9-ba63-6c26b07d2fe5\",\"roots\":{\"1001\":\"69140bc6-f0b7-4441-b529-c99a3929cb83\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1001" } }, "output_type": "display_data" } ], "source": [ "bsnb.plot([range(1, len(rfecv.grid_scores_) + 1)], [avg_scores], \n", " y_axis_label=\"Cross validation score (nb of correct classifications)\", x_axis_label=\"Number of features selected\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
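" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a compact alternative to steps 6.2 and 6.3, the optimal number of features can be obtained in a single instruction. The sketch below only assumes that numpy is available (it is already used elsewhere in this notebook); \"argmax\" returns the first index at which the maximum occurs, which matches the \"smallest feature set\" criterion:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Single-instruction equivalent of sections 6.2 and 6.3.\n", "from numpy import argmax\n", "optimal_nbr_features_alt = int(argmax(avg_scores)) + 1\n", "print(optimal_nbr_features_alt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "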

7 - Identification of the set of relevant features, taking into consideration the previously determined optimal number

\n", "It should be repeated the Recursive Feature Elimination procedure with \"RFE\" scikit-learn function, specifying the desired number of target features." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rfe = RFE(estimator=svc, step=1, n_features_to_select=optimal_nbr_features)\n", "\n", "# Fit data to the model.\n", "final_selector = rfe.fit(features_list, class_training_examples)\n", "\n", "# Acception/Rejection Label attributed to each feature.\n", "acception_labels = final_selector.support_" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mRelevant Features (True):\u001b[0m \u001b[39m\n", "[ True False True True True False False True False True True False\n", " True]\n" ] } ], "source": [ "print(fg(98,195,238) + \"\\033[1mRelevant Features (True):\\033[0m \" + fg.rs)\n", "print(acception_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each training array has the following structure/content:\n", "
\n", "\\[$\\sigma_{emg\\,flexor}$, $max_{emg\\,flexor}$, $zcr_{emg\\,flexor}$, $\\sigma_{emg\\,flexor}^{abs}$, $\\sigma_{emg\\,adductor}$, $max_{emg\\,adductor}$, $zcr_{emg\\,adductor}$, $\\sigma_{emg\\,adductor}^{abs}$, $\\mu_{acc\\,z}$, $\\sigma_{acc\\,z}$, $max_{acc\\,z}$, $zcr_{acc\\,z}$, $m_{acc\\,z}$\\] \n", "\n", "So, the relevant features are:\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

8 - Removal of meaningless features from our \"features_list\" list

" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Access each training example and exclude meaningless entries.\n", "final_features_list = []\n", "for example_nbr in range(0, len(features_list)):\n", " final_features_list += [list(array(features_list[example_nbr])[array(acception_labels)])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

9 - Storage of the final list of features (after Recursive Feature Elimination) inside a .json file

" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "filename = \"classification_game_features_final.json\"\n", "\n", "# Generation of .json file in our previously mentioned \"relative_path\".\n", "# [Generation of new file]\n", "with open(relative_path + \"/\" + filename, 'w') as file:\n", " dump({\"features_list_final\": final_features_list, \"class_labels\": class_training_examples}, file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We reach the end of the \"Classification Game\" third volume. After Feature Selection all training examples are ready to be delivered to our classification algorithm in order to participate on the training process.\n", "\n", "If your are feeling your interest increasing, please jump to the next volume \n", "\n", "We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks !" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide_mark", "aux" ] }, "source": [ "**Auxiliary Code Segment (should not be replicated by\n", "the user)**" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [ "hide_both" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".................... CSS Style Applied to Jupyter Notebook .........................\n" ] }, { "data": { "text/html": [ "