{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "intro_info_title" ] }, "source": [ "\n", "\n", "\n", "\n", "
\n", " | Stone, Paper or Scissor Game - Train and Classify [Volume 3] | \n", "
Tags | \n", "train_and_classify☁machine-learning☁features☁selection | \n", "
\n",
" ☌ Currently we are in possession of a file containing the feature values for all training examples, as demonstrated on a previously created Jupyter Notebook .\n",
" \n", " However, there is a high risk that some of the extracted features are not useful for our classification system. Remember, a good feature is a parameter that has the ability to separate the different classes of our classification system, i.e, a parameter with a characteristic range of values for each available class.\n", " \n", " In order to ensure that the training process of our classifier happens in the most efficient way, these redundant or invariant features should be removed.\n", " \n", " The implicit logic of the last two paragraphs is called Feature Selection, which will be focused at this Jupyter Notebook !\n", " | \n",
"
Starting Point (Setup)
\n", "List of Available Classes:\n", "\n", " \n", " | \n", "\n", " \n", " | \n", "\n", " \n", " | \n", "
\n", " Paper\n", " | \n", "\n", " Stone\n", " | \n", "\n", " Scissor\n", " | \n", "
Protocol/Feature Extraction
\n", "Extracted Features\n", "Feature Selection
\n", "Intro\n", "0 - Import of the needed packages for a correct execution of the current Jupyter Notebook
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide_out" ] }, "outputs": [ { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * 
Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " 
events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"1 - Loading of the dictionary created on Volume 2 of \"Classification Game\" Jupyter Notebook
\n", "This dictionary contains all the features extracted from our training examples." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Specification of filename and relative path.\n", "relative_path = \"../../signal_samples/classification_game/features\"\n", "filename = \"classification_game_features.json\"\n", "\n", "# Load of data inside file storing it inside a Python dictionary.\n", "with open(relative_path + \"/\" + filename) as file:\n", " features_dict = loads(file.read())" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mDict Keys\u001b[0m\u001b[39m define the class number\n", "\u001b[38;2;232;77;14m\u001b[1mDict Sub-Keys\u001b[0m\u001b[39m define the trial number\n", "\n", "{'0': {'1': [0.002128164580188196, 0.00732421875, 0.3148858143023837, 0.0013299640761190862, 0.00525897736063944, 0.0177154541015625, 0.14585764294049008, 0.0032250390995378314, 0.5878418, 0.004769659606303164, 0.6044, 0.0, 1.4062325168397938e-06], '2': [0.002029433963100043, 0.0075531005859375, 0.3459899981478051, 0.0012865379359157589, 0.00426341220793342, 0.0205078125, 0.24356362289312836, 0.0032271742477853944, 0.5960790740740741, 0.005347679929104084, 0.6175999999999999, 0.0, 1.7843743867526657e-06], '3': [0.004812456585924175, 0.01629638671875, 0.1500312565117733, 0.0027146265743094667, 0.002620978804585002, 0.01263427734375, 0.17816211710773078, 0.0022142130046097727, 0.9737463333333332, 0.008456826821502778, 1.0055999999999998, 0.0, 4.284292720672414e-07], '4': [0.003288393293733703, 0.0120849609375, 0.2182839094577996, 0.001892522517093399, 0.006623739508638536, 0.024169921875, 0.21266888927882086, 0.004546908635468114, 0.4644140350877193, 0.00591195751107905, 0.5671999999999999, 0.0, 5.204352203726982e-07], '5': [0.003974582803167046, 0.01190185546875, 0.18929431376180775, 
0.0021560792451752937, 0.015274954840938857, 0.0413360595703125, 0.13502500463048714, 0.008656094595054242, 0.2837205925925926, 0.011517276154508253, 0.3273999999999999, 0.0, 2.2944983716296495e-06]}, '1': {'1': [0.01745991778366743, 0.13330078125, 0.1666919230186392, 0.012498507929444395, 0.008794508626081544, 0.0683441162109375, 0.2518563418699803, 0.006557225017506427, 0.6551165151515151, 0.1461447530029049, 1.5952000000000002, 0.00030307622367025305, -6.71839817994486e-06], '2': [0.01576872398048997, 0.11004638671875, 0.17624797260767705, 0.011874382612913703, 0.007355703497563003, 0.063262939453125, 0.2759055685709137, 0.005353108416775609, 0.664985945945946, 0.22564751043706982, 2.3568, 0.0007208506037123806, 3.0184467432070763e-05], '3': [0.016834862817464734, 0.10711669921875, 0.12385397566261043, 0.010210459675732673, 0.006991805896638525, 0.05291748046875, 0.23737289548258042, 0.005118249131484526, 0.7038323999999999, 0.09012218944433163, 1.2955999999999999, 0.0, 1.7932120498114455e-06], '4': [0.01624700006560064, 0.08184814453125, 0.13230391296718721, 0.010096274630523346, 0.006410319455413151, 0.03900146484375, 0.24232321459905246, 0.004563770456971738, 0.694470701754386, 0.07905537027731246, 1.1703999999999999, 0.0, 1.2355520070555934e-05], '5': [0.020006202433146783, 0.11279296875, 0.1737049068216789, 0.014510097131530179, 0.00870274334484326, 0.0604248046875, 0.2576508804923919, 0.006020127976422262, 0.7650033504273503, 0.18888218438624177, 2.2648, 0.00034193879295606086, -7.484701743981861e-07]}, '2': {'1': [0.03701667312138605, 0.48175048828125, 0.13714182735628042, 0.028527836988579556, 0.00962890534505662, 0.0972747802734375, 0.2526581366011894, 0.007592204871521871, -0.05535329729729731, 0.389147481076091, 1.12, 0.0016219138583528565, -0.00017734880190042993], '2': [0.05605906972012585, 0.728759765625, 0.15079500769362283, 0.04570180889395357, 0.022246936792071167, 0.2032470703125, 0.28363822875705247, 0.0181818341541995, -0.1621129230769231, 
0.4012124197110927, 0.7678, 0.0020516327577363653, -0.00018692108787173123], '3': [0.04336534865689463, 0.39495849609375, 0.2081066853834006, 0.03180753348005212, 0.019210994245302326, 0.104644775390625, 0.2675908054044569, 0.013625052459912818, -0.08241263157894738, 0.33346391541823117, 1.7416, 0.0029829794700824705, -0.00014185053469919516], '4': [0.06487298554636435, 0.8785400390625, 0.19967561722832944, 0.05166313713613995, 0.013633174787322466, 0.128814697265625, 0.30762299513425845, 0.009650441229459253, -0.13143430630630631, 0.3800528622757343, 1.1568, 0.002342764462065237, -0.0001757656867122758], '5': [0.04605573833347097, 0.56634521484375, 0.21383160179872848, 0.03490825767448176, 0.022935690711269243, 0.1263427734375, 0.3056287796557606, 0.01751656809433346, -0.037762759689922494, 0.25814329874621555, 1.0832000000000002, 0.0021708792060784617, -0.00010479693645550052]}, '3': {'1': [0.06666059775795927, 0.36566162109375, 0.11758009432428038, 0.03798634625145099, 0.09589783278118581, 0.936767578125, 0.31566108310294355, 0.0724352171785137, -0.28372952845528454, 0.2879444621409897, 0.52, 0.0017889087656529517, -0.00010216214518629092], '2': [0.028400519040188962, 0.36090087890625, 0.11831082236279707, 0.01845200002389393, 0.06906328567307285, 0.3870391845703125, 0.34416139511027527, 0.050492947208441975, -0.2559360683760684, 0.28748525778615397, 0.5438000000000001, 0.001538724568302274, -0.0001163346319399635], '3': [0.026872172099736805, 0.25762939453125, 0.19161771709795206, 0.021019732851432105, 0.09220916129668498, 0.75457763671875, 0.33910144467375775, 0.07045219784614146, -0.3003303492063492, 0.3241576808540921, 0.696, 0.0014287982219399905, -0.00011106434030924326], '4': [0.03224804078389381, 0.36859130859375, 0.14737561976406224, 0.023877465471875494, 0.08250524718428286, 1.1531982421875, 0.32894511882373056, 0.06375121644479873, -0.27145336752136756, 0.29059513319336294, 1.4832, 0.0018806633612583347, -0.00011592650334266791], '5': 
[0.024408102937141216, 0.19610595703125, 0.1247592235886798, 0.015037408860519162, 0.06386941034280033, 0.44805908203125, 0.3390131871388354, 0.046133417352076656, -0.3375715851851852, 0.30169159148351743, 0.6961999999999999, 0.0010371906949177656, -9.453002551262363e-05]}}\n" ] } ], "source": [ "from sty import fg, rs\n", "print(fg(98,195,238) + \"\\033[1mDict Keys\\033[0m\" + fg.rs + \" define the class number\")\n", "print(fg(232,77,14) + \"\\033[1mDict Sub-Keys\\033[0m\" + fg.rs + \" define the trial number\\n\")\n", "print(features_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2 - Restructuring of \"features_dict\" to a compatible format of scikit-learn package
\n", "features_dict must be converted to a list, containing inside it a number of sub-lists equal to the number of training examples (in our case 20). In its turn, each sub-list is formed by a number of entries equal to the number of extracted features (13 for our original formulation of the problem)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Initialisation of a list containing our training data and another list containing the labels of each training example.\n", "features_list = []\n", "class_training_examples = []\n", "\n", "# Access each feature list inside dictionary.\n", "list_classes = features_dict.keys()\n", "for class_i in list_classes:\n", " list_trials = features_dict[class_i].keys()\n", " for trial in list_trials:\n", " # Storage of the class label.\n", " class_training_examples += [int(class_i)]\n", " features_list += [features_dict[class_i][trial]]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;232;77;14m\u001b[1m[Number of list entries;Number of sub-list entries]:\u001b[0m\u001b[39m [20; 13]✓\n", "\u001b[38;2;253;196;0m\u001b[1mClass of each training example:\u001b[0m\u001b[39m\n", "[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]\n", "\u001b[38;2;98;195;238m\u001b[1mFeatures List:\u001b[0m\u001b[39m\n", "[[0.002128164580188196, 0.00732421875, 0.3148858143023837, 0.0013299640761190862, 0.00525897736063944, 0.0177154541015625, 0.14585764294049008, 0.0032250390995378314, 0.5878418, 0.004769659606303164, 0.6044, 0.0, 1.4062325168397938e-06], [0.002029433963100043, 0.0075531005859375, 0.3459899981478051, 0.0012865379359157589, 0.00426341220793342, 0.0205078125, 0.24356362289312836, 0.0032271742477853944, 0.5960790740740741, 0.005347679929104084, 0.6175999999999999, 0.0, 1.7843743867526657e-06], [0.004812456585924175, 0.01629638671875, 0.1500312565117733, 
0.0027146265743094667, 0.002620978804585002, 0.01263427734375, 0.17816211710773078, 0.0022142130046097727, 0.9737463333333332, 0.008456826821502778, 1.0055999999999998, 0.0, 4.284292720672414e-07], [0.003288393293733703, 0.0120849609375, 0.2182839094577996, 0.001892522517093399, 0.006623739508638536, 0.024169921875, 0.21266888927882086, 0.004546908635468114, 0.4644140350877193, 0.00591195751107905, 0.5671999999999999, 0.0, 5.204352203726982e-07], [0.003974582803167046, 0.01190185546875, 0.18929431376180775, 0.0021560792451752937, 0.015274954840938857, 0.0413360595703125, 0.13502500463048714, 0.008656094595054242, 0.2837205925925926, 0.011517276154508253, 0.3273999999999999, 0.0, 2.2944983716296495e-06], [0.01745991778366743, 0.13330078125, 0.1666919230186392, 0.012498507929444395, 0.008794508626081544, 0.0683441162109375, 0.2518563418699803, 0.006557225017506427, 0.6551165151515151, 0.1461447530029049, 1.5952000000000002, 0.00030307622367025305, -6.71839817994486e-06], [0.01576872398048997, 0.11004638671875, 0.17624797260767705, 0.011874382612913703, 0.007355703497563003, 0.063262939453125, 0.2759055685709137, 0.005353108416775609, 0.664985945945946, 0.22564751043706982, 2.3568, 0.0007208506037123806, 3.0184467432070763e-05], [0.016834862817464734, 0.10711669921875, 0.12385397566261043, 0.010210459675732673, 0.006991805896638525, 0.05291748046875, 0.23737289548258042, 0.005118249131484526, 0.7038323999999999, 0.09012218944433163, 1.2955999999999999, 0.0, 1.7932120498114455e-06], [0.01624700006560064, 0.08184814453125, 0.13230391296718721, 0.010096274630523346, 0.006410319455413151, 0.03900146484375, 0.24232321459905246, 0.004563770456971738, 0.694470701754386, 0.07905537027731246, 1.1703999999999999, 0.0, 1.2355520070555934e-05], [0.020006202433146783, 0.11279296875, 0.1737049068216789, 0.014510097131530179, 0.00870274334484326, 0.0604248046875, 0.2576508804923919, 0.006020127976422262, 0.7650033504273503, 0.18888218438624177, 2.2648, 0.00034193879295606086, 
-7.484701743981861e-07], [0.03701667312138605, 0.48175048828125, 0.13714182735628042, 0.028527836988579556, 0.00962890534505662, 0.0972747802734375, 0.2526581366011894, 0.007592204871521871, -0.05535329729729731, 0.389147481076091, 1.12, 0.0016219138583528565, -0.00017734880190042993], [0.05605906972012585, 0.728759765625, 0.15079500769362283, 0.04570180889395357, 0.022246936792071167, 0.2032470703125, 0.28363822875705247, 0.0181818341541995, -0.1621129230769231, 0.4012124197110927, 0.7678, 0.0020516327577363653, -0.00018692108787173123], [0.04336534865689463, 0.39495849609375, 0.2081066853834006, 0.03180753348005212, 0.019210994245302326, 0.104644775390625, 0.2675908054044569, 0.013625052459912818, -0.08241263157894738, 0.33346391541823117, 1.7416, 0.0029829794700824705, -0.00014185053469919516], [0.06487298554636435, 0.8785400390625, 0.19967561722832944, 0.05166313713613995, 0.013633174787322466, 0.128814697265625, 0.30762299513425845, 0.009650441229459253, -0.13143430630630631, 0.3800528622757343, 1.1568, 0.002342764462065237, -0.0001757656867122758], [0.04605573833347097, 0.56634521484375, 0.21383160179872848, 0.03490825767448176, 0.022935690711269243, 0.1263427734375, 0.3056287796557606, 0.01751656809433346, -0.037762759689922494, 0.25814329874621555, 1.0832000000000002, 0.0021708792060784617, -0.00010479693645550052], [0.06666059775795927, 0.36566162109375, 0.11758009432428038, 0.03798634625145099, 0.09589783278118581, 0.936767578125, 0.31566108310294355, 0.0724352171785137, -0.28372952845528454, 0.2879444621409897, 0.52, 0.0017889087656529517, -0.00010216214518629092], [0.028400519040188962, 0.36090087890625, 0.11831082236279707, 0.01845200002389393, 0.06906328567307285, 0.3870391845703125, 0.34416139511027527, 0.050492947208441975, -0.2559360683760684, 0.28748525778615397, 0.5438000000000001, 0.001538724568302274, -0.0001163346319399635], [0.026872172099736805, 0.25762939453125, 0.19161771709795206, 0.021019732851432105, 0.09220916129668498, 
0.75457763671875, 0.33910144467375775, 0.07045219784614146, -0.3003303492063492, 0.3241576808540921, 0.696, 0.0014287982219399905, -0.00011106434030924326], [0.03224804078389381, 0.36859130859375, 0.14737561976406224, 0.023877465471875494, 0.08250524718428286, 1.1531982421875, 0.32894511882373056, 0.06375121644479873, -0.27145336752136756, 0.29059513319336294, 1.4832, 0.0018806633612583347, -0.00011592650334266791], [0.024408102937141216, 0.19610595703125, 0.1247592235886798, 0.015037408860519162, 0.06386941034280033, 0.44805908203125, 0.3390131871388354, 0.046133417352076656, -0.3375715851851852, 0.30169159148351743, 0.6961999999999999, 0.0010371906949177656, -9.453002551262363e-05]]\n" ] } ], "source": [ "print(fg(232,77,14) + \"\\033[1m[Number of list entries;Number of sub-list entries]:\\033[0m\" + fg.rs + \" [\" + str(len(features_list)) + \"; \" + str(len(features_list[0])) + \"]\" + u'\\u2713')\n", "print(fg(253,196,0) + \"\\033[1mClass of each training example:\\033[0m\" + fg.rs)\n", "print(class_training_examples)\n", "print(fg(98,195,238) + \"\\033[1mFeatures List:\\033[0m\" + fg.rs)\n", "print(features_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2.1 - Normalisation of the features values, ensuring that the training stage is not affected by scale factors
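As a rough illustration of this step, the per-feature maximum normalisation can be sketched in plain Python. This is only a sketch: `normalize_by_column_max` and `toy_features` are hypothetical names that are not part of the notebook, which relies on the `normalize` function of scikit-learn instead. The sketch divides each feature by its signed maximum, which is consistent with the normalised values below -1 printed in this notebook, and it assumes that no feature has a maximum of zero.

```python
def normalize_by_column_max(rows):
    '''Divide every feature (column) by its maximum value over all examples.'''
    # zip(*rows) transposes the list of examples into a list of feature columns.
    column_max = [max(column) for column in zip(*rows)]
    return [[value / col_max for value, col_max in zip(row, column_max)]
            for row in rows]


# Hypothetical toy data: 3 training examples with 2 features each.
toy_features = [[0.002, 0.6], [0.004, 1.2], [0.001, -0.3]]
normalised = normalize_by_column_max(toy_features)
print(normalised)  # [[0.5, 0.5], [1.0, 1.0], [0.25, -0.25]]
```

After this operation the largest value of every feature is 1, so no feature dominates the training stage merely because of its scale. As far as we know, newer scikit-learn releases divide by the maximum absolute value instead, which may give different results for features with large negative values.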
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "features_list = normalize(features_list, axis=0, norm=\"max\") # axis=0 specifies that each feature is normalised independently from the others \n", " # and norm=\"max\" defines that the normalization reference value will be the feature maximum value." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.03192537 0.00833681 0.91010092 0.025743 0.05483938 0.01536202\n", " 0.42380594 0.04452308 0.6036909 0.01188812 0.25644942 0.\n", " 0.04658795]\n", " [ 0.03044428 0.00859733 1. 0.02490244 0.04445786 0.01778342\n", " 0.70770175 0.04455256 0.61215026 0.0133288 0.26205024 0.\n", " 0.05911565]\n", " [ 0.07219342 0.0185494 0.43362888 0.05254475 0.02733095 0.01095586\n", " 0.51767025 0.03056818 1. 0.02107818 0.42668024 0.\n", " 0.0141937 ]\n", " [ 0.04933039 0.01375573 0.63089659 0.03663197 0.06907079 0.02095903\n", " 0.6179336 0.06277207 0.47693534 0.01473523 0.24066531 0.\n", " 0.01724182]\n", " [ 0.05962417 0.01354731 0.54710921 0.04173342 0.15928363 0.03584471\n", " 0.39233048 0.11950119 0.29137013 0.02870618 0.13891718 0.\n", " 0.07601586]\n", " [ 0.26192261 0.15172989 0.48178249 0.24192313 0.09170706 0.05926485\n", " 0.73179719 0.09052537 0.67277944 0.3642578 0.67684997 0.10160185\n", " -0.22257799]\n", " [ 0.23655239 0.12526053 0.50940193 0.22984246 0.07670354 0.05485869\n", " 0.80167495 0.07390201 0.68291497 0.56241407 1. 0.24165456\n", " 1. 
]\n", " [ 0.25254593 0.1219258 0.35796982 0.1976353 0.0729089 0.04588758\n", " 0.68971389 0.07065968 0.72280878 0.22462463 0.54972845 0.\n", " 0.05940844]\n", " [ 0.24372719 0.09316382 0.38239231 0.19542512 0.0668453 0.03382026\n", " 0.70409761 0.06300486 0.71319468 0.19704118 0.49660557 0.\n", " 0.40933371]\n", " [ 0.30012036 0.12838683 0.50205182 0.28085978 0.09075016 0.05239759\n", " 0.74863388 0.08311051 0.78562899 0.47077851 0.96096402 0.11462995\n", " -0.02479653]\n", " [ 0.55530065 0.54835348 0.39637512 0.55218941 0.10040796 0.08435218\n", " 0.7341269 0.10481372 -0.05684571 0.9699288 0.47522064 0.54372277\n", " -5.87549879]\n", " [ 0.8409626 0.8295123 0.43583632 0.88461157 0.23198581 0.17624643\n", " 0.82414307 0.25100821 -0.16648373 1. 0.32578072 0.68777971\n", " -6.19262501]\n", " [ 0.65053945 0.44956232 0.6014818 0.61567174 0.20032772 0.09074309\n", " 0.77751546 0.18809984 -0.0846346 0.83114056 0.73896809 1.\n", " -4.69945461]\n", " [ 0.97318338 1. 0.57711384 1. 0.14216353 0.11170213\n", " 0.89383353 0.13322858 -0.13497797 0.94726096 0.49083503 0.78537733\n", " -5.82305078]\n", " [ 0.69089897 0.6446436 0.61802828 0.67568986 0.23916798 0.10955859\n", " 0.88803911 0.24182392 -0.0387809 0.64340805 0.45960625 0.72775533\n", " -3.4718829 ]\n", " [ 1. 0.41621509 0.33983669 0.73526983 1. 0.81232137\n", " 0.91718911 1. -0.2913793 0.71768581 0.22063815 0.59970536\n", " -3.38459327]\n", " [ 0.42604657 0.41079617 0.34194868 0.35715988 0.72017567 0.33562242\n", " 1. 
0.69707732 -0.26283649 0.71654127 0.23073659 0.51583478\n", " -3.8541224 ]\n", " [ 0.40311928 0.29324719 0.55382444 0.40686133 0.9615354 0.65433471\n", " 0.98529774 0.97262355 -0.30842771 0.80794528 0.29531568 0.47898359\n", " -3.67951963]\n", " [ 0.48376465 0.41954981 0.42595341 0.46217607 0.86034527 1.\n", " 0.95578738 0.88011355 -0.27877216 0.72429247 0.6293279 0.63046474\n", " -3.84060125]\n", " [ 0.36615488 0.22321801 0.36058621 0.29106651 0.66601516 0.38853604\n", " 0.9850413 0.6368921 -0.34667302 0.75194978 0.29540054 0.34770293\n", " -3.13174402]]\n" ] } ], "source": [ "print(features_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3 - Selection of a classification algorithm to wrap in our Feature Selection methodology
\n", "A Support Vector Machine shares some principles with k-Nearest Neighbour Classifiers (which we want to use on Jupyter Notebook [volume 4] ), namely the Cartesian logic, given that each example corresponds to a point with a number $N$ of coordinates equivalent to the number of features analysed (13 for our original problem), that is, each feature defines a dimension of the space.\n", "4 - Configuration of the Recursive Feature Elimination procedure given as an input our previously created \"svc\" object
\n", "Some inputs need to be given:\n", "5 - Execution of the Recursive Feature Elimination procedure
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Fit data to the model.\n", "selector = rfecv.fit(features_list, class_training_examples)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RFECV(cv=StratifiedKFold(n_splits=5, random_state=None, shuffle=False),\n", " estimator=SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None,\n", " coef0=0.0, decision_function_shape='ovr', degree=3,\n", " gamma='scale', kernel='linear', max_iter=-1,\n", " probability=False, random_state=None, shrinking=True,\n", " tol=0.001, verbose=False),\n", " min_features_to_select=1, n_jobs=None, scoring='accuracy', step=1,\n", " verbose=0)\n" ] } ], "source": [ "print(selector)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6 - Get the optimal number of features
\n", "It will be the smallest number that provides the possibility to obtain a highest cross-validation score." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6.1 - Get the list of average score of each virtual classifier (1 per Recursive Feature Elimination iteration)
\n", "The first element of the list refers to the average score of the trained classifiers when the set of features is 1, while the last one corresponds to the case where all features are taken into consideration." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Get list of average score of the virtual classifier\n", "avg_scores = rfecv.grid_scores_" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.4 0.7 0.75 0.8 0.85 0.85 0.9 0.95 0.95 0.95 0.95 0.95 0.95]\n" ] } ], "source": [ "print(avg_scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6.2 - Identification of the maximum score
" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "max_score = max(avg_scores)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mMaximum Average Score:\u001b[0m \u001b[39m0.95\n" ] } ], "source": [ "print(fg(98,195,238) + \"\\033[1mMaximum Average Score:\\033[0m \" + fg.rs + str(max_score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6.3 - Identification of the smallest feature set that achieve the maximum score
" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "for nbr_features in range(0, len(avg_scores)):\n", " if avg_scores[nbr_features] == max_score:\n", " optimal_nbr_features = nbr_features + 1\n", " break" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mOptimal Number of Features:\u001b[0m \u001b[39m8\n" ] } ], "source": [ "print(fg(98,195,238) + \"\\033[1mOptimal Number of Features:\\033[0m \" + fg.rs + str(optimal_nbr_features))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", " \n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"accb0cfd-8a98-46a9-ba63-6c26b07d2fe5\":{\"roots\":{\"references\":[{\"attributes\":{\"background_fill_color\":{\"value\":\"rgb(242, 242, 
242)\"},\"below\":[{\"id\":\"1011\",\"type\":\"LinearAxis\"}],\"center\":[{\"id\":\"1015\",\"type\":\"Grid\"},{\"id\":\"1020\",\"type\":\"Grid\"}],\"height\":200,\"left\":[{\"id\":\"1016\",\"type\":\"LinearAxis\"}],\"renderers\":[{\"id\":\"1037\",\"type\":\"GlyphRenderer\"}],\"sizing_mode\":\"scale_width\",\"title\":null,\"toolbar\":{\"id\":\"1027\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"1003\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"1007\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"1005\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"1009\",\"type\":\"LinearScale\"}},\"id\":\"1001\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1046\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"1017\",\"type\":\"BasicTicker\"},{\"attributes\":{\"data_source\":{\"id\":\"1034\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"1035\",\"type\":\"Line\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1036\",\"type\":\"Line\"},\"selection_glyph\":null,\"view\":{\"id\":\"1038\",\"type\":\"CDSView\"}},\"id\":\"1037\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"callback\":null},\"id\":\"1003\",\"type\":\"DataRange1d\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[1,2,3,4,5,6,7,8,9,10,11,12,13],\"y\":[0.4,0.7,0.75,0.8,0.85,0.85,0.9,0.95,0.95,0.95,0.95,0.95,0.95]},\"selected\":{\"id\":\"1045\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1044\",\"type\":\"UnionRenderers\"}},\"id\":\"1034\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1007\",\"type\":\"LinearScale\"},{\"attributes\":{\"callback\":null},\"id\":\"1005\",\"type\":\"DataRange1d\"
},{\"attributes\":{},\"id\":\"1012\",\"type\":\"BasicTicker\"},{\"attributes\":{\"axis_label\":\"Cross validation score (nb of correct classifications)\",\"axis_line_color\":{\"value\":\"rgb(150, 150, 150)\"},\"axis_line_dash\":[2,2],\"formatter\":{\"id\":\"1040\",\"type\":\"BasicTickFormatter\"},\"major_label_text_color\":{\"value\":\"rgb(88, 88, 88)\"},\"major_tick_in\":0,\"major_tick_line_color\":{\"value\":\"white\"},\"major_tick_out\":0,\"minor_tick_line_color\":{\"value\":\"white\"},\"minor_tick_out\":0,\"ticker\":{\"id\":\"1017\",\"type\":\"BasicTicker\"}},\"id\":\"1016\",\"type\":\"LinearAxis\"},{\"attributes\":{\"axis_label\":\"Number of features selected\",\"axis_line_color\":{\"value\":\"white\"},\"formatter\":{\"id\":\"1042\",\"type\":\"BasicTickFormatter\"},\"major_label_text_color\":{\"value\":\"rgb(88, 88, 88)\"},\"major_tick_line_color\":{\"value\":\"white\"},\"minor_tick_line_color\":{\"value\":\"white\"},\"ticker\":{\"id\":\"1012\",\"type\":\"BasicTicker\"}},\"id\":\"1011\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"1009\",\"type\":\"LinearScale\"},{\"attributes\":{\"grid_line_color\":\"rgb(150, 150, 150)\",\"grid_line_dash\":[2,2],\"ticker\":{\"id\":\"1012\",\"type\":\"BasicTicker\"}},\"id\":\"1015\",\"type\":\"Grid\"},{\"attributes\":{\"dimension\":1,\"grid_line_color\":\"rgb(150, 150, 
150)\",\"grid_line_dash\":[2,2],\"ticker\":{\"id\":\"1017\",\"type\":\"BasicTicker\"}},\"id\":\"1020\",\"type\":\"Grid\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1036\",\"type\":\"Line\"},{\"attributes\":{\"overlay\":{\"id\":\"1046\",\"type\":\"BoxAnnotation\"}},\"id\":\"1023\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":{\"id\":\"1022\",\"type\":\"WheelZoomTool\"},\"active_tap\":\"auto\",\"logo\":null,\"tools\":[{\"id\":\"1021\",\"type\":\"PanTool\"},{\"id\":\"1022\",\"type\":\"WheelZoomTool\"},{\"id\":\"1023\",\"type\":\"BoxZoomTool\"},{\"id\":\"1025\",\"type\":\"ResetTool\"}]},\"id\":\"1027\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"1021\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"1022\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"1025\",\"type\":\"ResetTool\"},{\"attributes\":{\"source\":{\"id\":\"1034\",\"type\":\"ColumnDataSource\"}},\"id\":\"1038\",\"type\":\"CDSView\"},{\"attributes\":{\"line_color\":\"#009EE3\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"1035\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"1040\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"1042\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"1044\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"1045\",\"type\":\"Selection\"}],\"root_ids\":[\"1001\"]},\"title\":\"Bokeh Application\",\"version\":\"1.4.0\"}};\n", " var render_items = [{\"docid\":\"accb0cfd-8a98-46a9-ba63-6c26b07d2fe5\",\"roots\":{\"1001\":\"69140bc6-f0b7-4441-b529-c99a3929cb83\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== 
undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1001" } }, "output_type": "display_data" } ], "source": [ "bsnb.plot([range(1, len(rfecv.grid_scores_) + 1)], [avg_scores], \n", " y_axis_label=\"Cross validation score (nb of correct classifications)\", x_axis_label=\"Number of features selected\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7 - Identification of the set of relevant features, taking into consideration the previously determined optimal number
\n", "The Recursive Feature Elimination procedure should now be repeated with the scikit-learn \"RFE\" class, this time specifying the desired number of features to select." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [], "source": [ "rfe = RFE(estimator=svc, step=1, n_features_to_select=optimal_nbr_features)\n", "\n", "# Fit data to the model.\n", "final_selector = rfe.fit(features_list, class_training_examples)\n", "\n", "# Acceptance/rejection label assigned to each feature (True = feature kept).\n", "acception_labels = final_selector.support_" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [ "hide_in" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[38;2;98;195;238m\u001b[1mRelevant Features (True):\u001b[0m \u001b[39m\n", "[ True False True True True False False True False True True False\n", " True]\n" ] } ], "source": [ "print(fg(98,195,238) + \"\\033[1mRelevant Features (True):\\033[0m \" + fg.rs)\n", "print(acception_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each training array has the following structure/content:\n", "8 - Removal of meaningless features from our \"features_list\" list
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Access each training example and exclude meaningless entries.\n", "final_features_list = []\n", "for example_nbr in range(0, len(features_list)):\n", " final_features_list += [list(array(features_list[example_nbr])[array(acception_labels)])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9 - Storage of the final list of features (after Recursive Feature Elimination) inside a .json file
" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "filename = \"classification_game_features_final.json\"\n", "\n", "# Generation of .json file in our previously mentioned \"relative_path\".\n", "# [Generation of new file]\n", "with open(relative_path + \"/\" + filename, 'w') as file:\n", " dump({\"features_list_final\": final_features_list, \"class_labels\": class_training_examples}, file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have reached the end of the third volume of the \"Classification Game\". After Feature Selection, all training examples are ready to be delivered to our classification algorithm in order to take part in the training process.\n", "\n", "If you feel your interest growing, please jump to the next volume \n", "\n", "We hope that you have enjoyed this guide. biosignalsnotebooks is an environment in continuous expansion, so don't stop your journey and learn more with the remaining Notebooks!" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide_mark", "aux" ] }, "source": [ "**Auxiliary Code Segment (should not be replicated by\n", "the user)**" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [ "hide_both" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".................... CSS Style Applied to Jupyter Notebook .........................\n" ] }, { "data": { "text/html": [ "