{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Principal Component Analysis (PCA)\n", "\n", "\n", "Na aprendizagem de máquina não supervisionada temos algoritmos que trabalham com dados que não possuem classificações prévias para treinamento. Assim, o desafio se torna mais subjetivo que na abordagem supervisionada, pois não existe um objetivo claro para ser classificado ou previsto. Por isso, a aprendizagem não supervisionada é muitas vezes usada como parte de uma análise exploratória dos dados. \n", "\n", "Um das formas mais comum de exploração dos dados é através da sua visualização num espaço N-dimensional, no qual cada dimensão representa uma característica.\n", "Ou seja, para dados com duas características usamos um plano e para dados com três características um espaço tridimensional, mas a partir de quatro características temos um problema, pois só conseguimos ver até a terceira dimensão.\n", "\n", "Para conseguir visualizar dados com muitas dimensões, podemos usar a Principal Component Analysis (PCA). Ela é uma técnica que encontra uma representação em menos dimensões para um conjunto de dados, com a preocupação em manter ao máximo a relação entre os dados originais." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A ideia principal da PCA é que num conjunto de dados representado por N dimensões, nem todas elas são interessantes. Em outras palavras, nem todas as características dos exemplos servem para diferenciá-los.\n", "\n", "Por exemplo, usando um conjunto de dados de pessoas com o objetivo de tentar entender as relações entre quem paga ou não paga um empréstimo, provavelmente os dados sobre a altura das pessoas não ajudam a diferenciar bons pagadores de maus pagadores.\n", "Além disso, num conjunto de dados, existem características que importam mais que outras, ainda nesse exemplo, o salário pode influenciar mais que a idade.\n", "\n", "Assim, a PCA procura novas dimensões nos dados que sirvam para diferenciar os dados da melhor forma possível." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dados" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para explicar o algoritmo, inicialmente vamos trabalhar com exemplos que possuem apenas 2 características. Isso vai facilitar a visualização dos dados e o entendimento do processo. Nesse caso usaremos a PCA para reduzir a dimensão do conjunto de dados de duas dimensões para uma dimensão.\n", "\n", "Abaixo temos os pontos que trabalharemos nesse exemplo." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(\"32957f33-6234-43d9-a835-81aac641ebc4\");\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", " }\n", " finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.info(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(js_urls, callback) {\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = js_urls.length;\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var s = document.createElement('script');\n", " s.src = url;\n", " s.async = false;\n", " s.onreadystatechange = s.onload = function() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.log(\"Bokeh: all BokehJS libraries loaded\");\n", " run_callbacks()\n", " }\n", " };\n", " s.onerror = function() {\n", " console.warn(\"failed to load library \" + url);\n", " };\n", " console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.getElementsByTagName(\"head\")[0].appendChild(s);\n", " }\n", " };var element = document.getElementById(\"32957f33-6234-43d9-a835-81aac641ebc4\");\n", " if (element == null) {\n", " console.log(\"Bokeh: ERROR: autoload.js configured with elementid '32957f33-6234-43d9-a835-81aac641ebc4' but no matching script tag was found. \")\n", " return false;\n", " }\n", "\n", " var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-0.13.0.min.js\"];\n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " \n", " function(Bokeh) {\n", " \n", " },\n", " function(Bokeh) {\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if ((root.Bokeh !== undefined) || (force === true)) {\n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }if (force === true) {\n", " display_loaded();\n", " }} else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(\"32957f33-6234-43d9-a835-81aac641ebc4\")).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(js_urls, function() {\n", " console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ], "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"
\\n\"+\n \"

\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"

\\n\"+\n \"\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"32957f33-6234-43d9-a835-81aac641ebc4\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n }\n finally {\n delete root._bokeh_onload_callbacks\n }\n console.info(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(js_urls, callback) {\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = js_urls.length;\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var s = document.createElement('script');\n s.src = url;\n s.async = false;\n s.onreadystatechange = s.onload = function() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: all BokehJS libraries loaded\");\n run_callbacks()\n }\n };\n s.onerror = function() {\n console.warn(\"failed to load library \" + url);\n };\n console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.getElementsByTagName(\"head\")[0].appendChild(s);\n }\n };var element = document.getElementById(\"32957f33-6234-43d9-a835-81aac641ebc4\");\n if (element == null) {\n console.log(\"Bokeh: ERROR: autoload.js configured with elementid '32957f33-6234-43d9-a835-81aac641ebc4' but no matching script tag was found. \")\n return false;\n }\n\n var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-0.13.0.min.js\"];\n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n \n function(Bokeh) {\n \n },\n function(Bokeh) {\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n }\n ];\n\n function run_inline_js() {\n \n if ((root.Bokeh !== undefined) || (force === true)) {\n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(\"32957f33-6234-43d9-a835-81aac641ebc4\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(js_urls, function() {\n console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));" }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"80be6481-69a0-4256-af2b-841e82168f1d\":{\"roots\":{\"references\":[{\"attributes\":{\"plot\":{\"id\":\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"5878ff36-700e-49ce-aa18-9b46a49edfc7\",\"type\":\"BasicTicker\"}},\"id\":\"85289dea-7c88-45d5-a0c2-4098aecd9ca8\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"10c8eed4-e90c-47c7-890d-bdac2cb1dbe3\",\"type\":\"Selection\"},{\"attributes\":{\"formatter\":{\"id\":\"61f1405e-f6bc-4e03-9c1a-22c3cfa66322\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"08fec4e6-bec1-479a-8846-3140b53e57e1\",\"type\":\"BasicTicker\"}},\"id\":\"b99bcdd9-c5a3-4354-9e29-74a0913af95e\",\"type\":\"LinearAxis\"},{\"attributes\":{\"below\":[{\"id\":\"7e7495e3-14cf-4d38-9b9c-1adc33c0d9d3\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"b99bcdd9-c5a3-4354-9e29-74a0913af95e\",\"type\":\"LinearAxis\"}],\"plot_height\":350,\"plot_width\":350,\"renderers\":[{\"id\":\"7e7495e3-14cf-4d38-9b9c-1adc33c0d9d3\",\"type\":\"LinearAxis\"},{\"id\":\"85289dea-7c88-45d5-a0c2-4098aecd9ca8\",\"type\":\"Grid\"},{\"id\":\"b99bcdd9-c5a3-4354-9e29-74a0913af95e\",\"type\":\"LinearAxis\"},{\"id\":\"dd44c910-690d-4c67-9075-e6ee66e99889\",\"type\":\"Grid\"},{\"id\":\"7856690d-f715-4084-91d7-00518c922bf3\",\"type\":\"BoxAnnotation\"},{\"id\":\"4236f3ad-a71c-4399-bc82-fddc93dc657c\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"18870396-a803-41a8-ba85-d50d6c8aa7e2\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"ca3d6b4d-41b2-4500-b454-9f4a98010be3\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"e81296c0-4726-4dd9-8d43-1e8a877e2b92\",\"type\":\"Range1d\"},\"x_scale\":{\"id\":\"f4b87b77-b2b2-476f-81c9-0491c78af78c\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"f5f8145f-67f4-4e37-902f-544f6cf3281a\",\"type\":\"Range1d\"},\"y_scale\":{\"id\":\"74a2b611-2146-4fb7-9f43-86e318a8e517\",\"type\":\"LinearScale\"}},\"id\":\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"08fec4e6-bec1-479a-8846-3140b53e57e1\",\"type\":\"BasicTicker\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"08fec4e6-bec1-479a-8846-3140b53e57e1\",\"type\":\"BasicTicker\"}},\"id\":\"dd44c910-690d-4c67-9075-e6ee66e99889\",\"type\":\"Grid\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.04},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"72165b78-1a32-4d7d-b8d0-394e950ce3d1\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"1ded86d4-12db-4806-88a1-daed74122f87\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"2e6e8e3a-1793-4f8f-85ef-a035a6f75729\",\"type\":\"PanTool\"},{\"id\":\"9bb29dc9-3153-4af6-a021-e8f446fee913\",\"type\":\"WheelZoomTool\"},{\"id\":\"04386d73-0e7d-4fc6-92f5-aa3bf253c0f1\",\"type\":\"BoxZoomTool\"},{\"id\":\"333ed71e-a550-4504-902f-c91ddb3af01d\",\"type\":\"SaveTool\"},{\"id\":\"e420a0fb-3516-41ac-b197-023ef958252c\",\"type\":\"ResetTool\"},{\"id\":\"6210843c-a00f-47cb-bd4a-bcd746f15db7\",\"type\":\"HelpTool\"}]},\"id\":\"ca3d6b4d-41b2-4500-b454-9f4a98010be3\",\"type\":\"Toolbar\"},{\"attributes\":{\"source\":{\"id\":\"f24fffc0-6227-4959-9848-dcca7f8f457c\",\"type\":\"ColumnDataSource\"}},\"id\":\"7616e946-dbc1-4a2e-b7ea-7aa1917979ff\",\"type\":\"CDSView\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.04},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"700f6d89-4003-4145-805c-872a96d34d60\",\"type\":\"Circle\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"18870396-a803-41a8-ba85-d50d6c8aa7e2\",\"type\":\"Title\"},{\"attributes\":{\"callback\":null,\"end\":4},\"id\":\"e81296c0-4726-4dd9-8d43-1e8a877e2b92\",\"type\":\"Range1d\"},{\"attributes\":{},\"id\":\"2e6e8e3a-1793-4f8f-85ef-a035a6f75729\",\"type\":\"PanTool\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[2.5,0.5,2.2,1.9,3.1,2.3,2,1,1.5,1.1],\"y\":[2.4,0.7,2.9,2.2,3,2.7,1.6,1.1,1.6,0.9]},\"selected\":{\"id\":\"10c8eed4-e90c-47c7-890d-bdac2cb1dbe3\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1ded86d4-12db-4806-88a1-daed74122f87\",\"type\":\"UnionRenderers\"}},\"id\":\"f24fffc0-6227-4959-9848-dcca7f8f457c\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"9bb29dc9-3153-4af6-a021-e8f446fee913\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"overlay\":{\"id\":\"7856690d-f715-4084-91d7-00518c922bf3\",\"type\":\"BoxAnnotation\"}},\"id\":\"04386d73-0e7d-4fc6-92f5-aa3bf253c0f1\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"callback\":null,\"end\":4},\"id\":\"f5f8145f-67f4-4e37-902f-544f6cf3281a\",\"type\":\"Range1d\"},{\"attributes\":{},\"id\":\"333ed71e-a550-4504-902f-c91ddb3af01d\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"e420a0fb-3516-41ac-b197-023ef958252c\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"f4b87b77-b2b2-476f-81c9-0491c78af78c\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"6210843c-a00f-47cb-bd4a-bcd746f15db7\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"74a2b611-2146-4fb7-9f43-86e318a8e517\",\"type\":\"LinearScale\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"7856690d-f715-4084-91d7-00518c922bf3\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"data_source\":{\"id\":\"f24fffc0-6227-4959-9848-dcca7f8f457c\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"700f6d89-4003-4145-805c-872a96d34d60\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"72165b78-1a32-4d7d-b8d0-394e950ce3d1\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"7616e946-dbc1-4a2e-b7ea-7aa1917979ff\",\"type\":\"CDSView\"}},\"id\":\"4236f3ad-a71c-4399-bc82-fddc93dc657c\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"formatter\":{\"id\":\"9cbdc1e1-511a-4298-bf2c-89d61148369d\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"5878ff36-700e-49ce-aa18-9b46a49edfc7\",\"type\":\"BasicTicker\"}},\"id\":\"7e7495e3-14cf-4d38-9b9c-1adc33c0d9d3\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"61f1405e-f6bc-4e03-9c1a-22c3cfa66322\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"5878ff36-700e-49ce-aa18-9b46a49edfc7\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"9cbdc1e1-511a-4298-bf2c-89d61148369d\",\"type\":\"BasicTickFormatter\"}],\"root_ids\":[\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"80be6481-69a0-4256-af2b-841e82168f1d\",\"notebook_comms_target\":\"57d55995-e32f-4af8-805e-b59a52eae988\",\"roots\":{\"23fadcd1-8791-4bd3-90f9-9ad69f92c83a\":\"403e1fda-40de-4480-8b11-687d16d1234d\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "23fadcd1-8791-4bd3-90f9-9ad69f92c83a" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[1]>

" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bokeh.io import show, output_notebook\n", "from bokeh.plotting import figure\n", "output_notebook()\n", "\n", "x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1]\n", "y = [2.4, 0.7, 2.9, 2.2, 3, 2.7, 1.6, 1.1, 1.6, 0.9]\n", "\n", "p = figure(plot_width=350, plot_height=350, x_range=(0, 4), y_range=(0, 4))\n", "p.circle(x, y, radius=0.04)\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pré-processamento" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No pré-processamento vamos subtrair, em cada um dos exemplos dos dados, a média de cada dimensão. Para isso, calculamos a média na dimensão x, que chamaremos de $\\overline{x}$ e a média na dimensão y denotada por $\\overline{y}$. Em seguida, para cada um dos exemplos atribuiremos a coordenada $x$ o valor $x-\\overline{x}$ e para a coordenada $y$ o valor $y-\\overline{y}$. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x_mean = np.mean(x)\n", "y_mean = np.mean(y)\n", "\n", "x = [i - x_mean for i in x]\n", "y = [i - y_mean for i in y]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Após esse procedimento, os dados passarão a ter média zero, fazendo com que eles fiquem centralizados em relação à origem. No gráfico abaixo temos os pontos centralizados." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"2853f9e4-4383-4c74-ad95-80c1dafed000\":{\"roots\":{\"references\":[{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"22ffd8c4-7515-4338-8ca6-5e38db53f17b\",\"type\":\"PanTool\"},{\"id\":\"1e66d02c-1f78-4650-abea-a2b0544af8e0\",\"type\":\"WheelZoomTool\"},{\"id\":\"e267e155-42cf-4e2c-bfa6-9ca458a66654\",\"type\":\"BoxZoomTool\"},{\"id\":\"dde13387-1426-43c6-b3c9-6a19c6146c17\",\"type\":\"SaveTool\"},{\"id\":\"c444138e-cc36-4317-ab32-aa4de6788d1a\",\"type\":\"ResetTool\"},{\"id\":\"4588a0e7-261e-497c-8e50-e52df7684b0f\",\"type\":\"HelpTool\"}]},\"id\":\"0c364c6f-005e-436d-a250-9d115ac5b381\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"0f136831-9991-4ea5-bc82-d2010f2f22af\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"data_source\":{\"id\":\"352cf643-0c2a-4736-abe7-1c2d414e4f47\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"68e9a590-d1f7-4d3a-9c40-30e580d2729f\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"c40c01a3-3661-4d1a-8486-71b596c88929\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"8671c23c-dab2-4d4b-bc63-9ca46a96ab08\",\"type\":\"CDSView\"}},\"id\":\"920a5b99-df45-4c0e-a536-cf2da8bb9c59\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"4588a0e7-261e-497c-8e50-e52df7684b0f\",\"type\":\"HelpTool\"},{\"attributes\":{\"callback\":null},\"id\":\"05c2ec7d-1e82-46e4-a3df-007c960d2275\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"c444138e-cc36-4317-ab32-aa4de6788d1a\",\"type\":\"ResetTool\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"68e9a590-d1f7-4d3a-9c40-30e580d2729f\",\"type\":\"Circle\"},{\"attributes\":{\"formatter\":{\"id\":\"0f136831-9991-4ea5-bc82-d2010f2f22af\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"5dfd28dd-d7cd-41b4-a216-33babb436559\",\"type\":\"BasicTicker\"}},\"id\":\"64d54e51-99c3-4c68-a167-049cb8539fb5\",\"type\":\"LinearAxis\"},{\"attributes\":{\"overlay\":{\"id\":\"c492d1b4-dbac-44a6-a4b6-50e325bbe867\",\"type\":\"BoxAnnotation\"}},\"id\":\"e267e155-42cf-4e2c-bfa6-9ca458a66654\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"d83dfe18-2c4e-4814-8279-4943771fee43\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"0448f4b3-00f1-42ad-b82d-69f5015d6d3e\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"5dfd28dd-d7cd-41b4-a216-33babb436559\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"2c11728f-32de-47c9-97fe-5697edc51e6b\",\"type\":\"LinearScale\"},{\"attributes\":{\"plot\":{\"id\":\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"5dfd28dd-d7cd-41b4-a216-33babb436559\",\"type\":\"BasicTicker\"}},\"id\":\"0193a816-7b2f-4789-9571-ad822fae11f8\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null},\"id\":\"278c87c0-ae5f-4ca4-96e8-2b7ed768cd19\",\"type\":\"DataRange1d\"},{\"attributes\":{\"below\":[{\"id\":\"64d54e51-99c3-4c68-a167-049cb8539fb5\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"832bcf19-7ae3-4411-ae27-94f138744e60\",\"type\":\"LinearAxis\"}],\"plot_height\":350,\"plot_width\":350,\"renderers\":[{\"id\":\"64d54e51-99c3-4c68-a167-049cb8539fb5\",\"type\":\"LinearAxis\"},{\"id\":\"0193a816-7b2f-4789-9571-ad822fae11f8\",\"type\":\"Grid\"},{\"id\":\"832bcf19-7ae3-4411-ae27-94f138744e60\",\"type\":\"LinearAxis\"},{\"id\":\"cff5fe61-3c1c-43da-9c6d-d54abf5874f8\",\"type\":\"Grid\"},{\"id\":\"c492d1b4-dbac-44a6-a4b6-50e325bbe867\",\"type\":\"BoxAnnotation\"},{\"id\":\"920a5b99-df45-4c0e-a536-cf2da8bb9c59\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"f3a84283-ae54-44ee-a2f3-230e22e21c55\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"0c364c6f-005e-436d-a250-9d115ac5b381\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"05c2ec7d-1e82-46e4-a3df-007c960d2275\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"0448f4b3-00f1-42ad-b82d-69f5015d6d3e\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"278c87c0-ae5f-4ca4-96e8-2b7ed768cd19\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"2c11728f-32de-47c9-97fe-5697edc51e6b\",\"type\":\"LinearScale\"}},\"id\":\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[0.69,-1.31,0.3900000000000001,0.08999999999999986,1.29,0.48999999999999977,0.18999999999999995,-0.81,-0.31000000000000005,-0.71],\"y\":[0.48999999999999977,-1.2100000000000002,0.9899999999999998,0.29000000000000004,1.0899999999999999,0.79,-0.31000000000000005,-0.81,-0.31000000000000005,-1.0100000000000002]},\"selected\":{\"id\":\"342af5a1-efcf-4f9f-ab1d-19bfb838fc55\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"828596c5-ec19-4656-aca7-f55bb56f1443\",\"type\":\"UnionRenderers\"}},\"id\":\"352cf643-0c2a-4736-abe7-1c2d414e4f47\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"342af5a1-efcf-4f9f-ab1d-19bfb838fc55\",\"type\":\"Selection\"},{\"attributes\":{\"formatter\":{\"id\":\"d83dfe18-2c4e-4814-8279-4943771fee43\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"e00942b3-e24b-41ac-81b6-855a547e9ee1\",\"type\":\"BasicTicker\"}},\"id\":\"832bcf19-7ae3-4411-ae27-94f138744e60\",\"type\":\"LinearAxis\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"c492d1b4-dbac-44a6-a4b6-50e325bbe867\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"dde13387-1426-43c6-b3c9-6a19c6146c17\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"e00942b3-e24b-41ac-81b6-855a547e9ee1\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"828596c5-ec19-4656-aca7-f55bb56f1443\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"f3a84283-ae54-44ee-a2f3-230e22e21c55\",\"type\":\"Title\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"e00942b3-e24b-41ac-81b6-855a547e9ee1\",\"type\":\"BasicTicker\"}},\"id\":\"cff5fe61-3c1c-43da-9c6d-d54abf5874f8\",\"type\":\"Grid\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"c40c01a3-3661-4d1a-8486-71b596c88929\",\"type\":\"Circle\"},{\"attributes\":{\"source\":{\"id\":\"352cf643-0c2a-4736-abe7-1c2d414e4f47\",\"type\":\"ColumnDataSource\"}},\"id\":\"8671c23c-dab2-4d4b-bc63-9ca46a96ab08\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"22ffd8c4-7515-4338-8ca6-5e38db53f17b\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"1e66d02c-1f78-4650-abea-a2b0544af8e0\",\"type\":\"WheelZoomTool\"}],\"root_ids\":[\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"2853f9e4-4383-4c74-ad95-80c1dafed000\",\"notebook_comms_target\":\"fa6f3ffa-acc3-40c5-ba5b-87069bd655fa\",\"roots\":{\"a16b38a8-3f64-4f79-a84b-5a86ebfeaf24\":\"16d9e866-7b4e-49bf-9642-8ccabfdd6430\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "a16b38a8-3f64-4f79-a84b-5a86ebfeaf24" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[3]>

" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = figure(plot_width=350, plot_height=350)\n", "p.circle(x, y, radius=0.035)\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Encontrando as novas dimensões" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uma dimensão ou componente principal pode ser representado por uma reta, plano ou hiperplano no espaço. O gráfico a seguir exibe uma reta que representa o primeiro componente principal encontrado ao executarmos a PCA usando os dados apresentados anteriormente." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"15d67ea5-ef45-4170-9e55-2584252884ff\":{\"roots\":{\"references\":[{\"attributes\":{\"below\":[{\"id\":\"11416728-9245-488c-9337-7e07961049e8\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"facdf69f-b79d-4142-bc2f-8efabdf6909e\",\"type\":\"LinearAxis\"}],\"plot_height\":350,\"plot_width\":350,\"renderers\":[{\"id\":\"11416728-9245-488c-9337-7e07961049e8\",\"type\":\"LinearAxis\"},{\"id\":\"a194fa95-2168-4773-b8ee-f0fba641c099\",\"type\":\"Grid\"},{\"id\":\"facdf69f-b79d-4142-bc2f-8efabdf6909e\",\"type\":\"LinearAxis\"},{\"id\":\"a50beda3-2eec-4fa5-ad94-2edb4d64d227\",\"type\":\"Grid\"},{\"id\":\"8e49c342-5c18-4567-addf-a69abe617ce1\",\"type\":\"BoxAnnotation\"},{\"id\":\"84bd044b-d21e-4b02-affa-084a7a211c46\",\"type\":\"GlyphRenderer\"},{\"id\":\"41c4d4c1-7293-417f-b367-fe8a9533db03\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"b4547538-57fb-4689-8e60-eadf83e7d5d6\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"94b848b2-026f-40f9-a00b-4c32c27e0914\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"59ace931-276c-4fa6-8ea1-2662fcbfcb5f\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"68743752-9306-4365-8e9b-e1db1e77cc5a\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"9aa7a5c8-e6af-47be-b981-31c27124c8b8\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"799d2201-a4e2-4c76-a70a-d9b27264ea4f\",\"type\":\"LinearScale\"}},\"id\":\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"9d9e4c04-2378-400f-8eb8-794f7cee3068\",\"type\":\"Line\"},{\"attributes\":{\"callback\":null},\"id\":\"9aa7a5c8-e6af-47be-b981-31c27124c8b8\",\"type\":\"DataRange1d\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"82213fff-f515-417a-a15a-bcf1699486af\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"799d2201-a4e2-4c76-a70a-d9b27264ea4f\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"8dc68d4b-7d3a-47af-a7d9-a855665a8db7\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"bbc2c6ba-bae9-4a59-9914-4df575edc518\",\"type\":\"HelpTool\"},{\"attributes\":{\"formatter\":{\"id\":\"a84b473b-63fe-425f-ab3f-082a81ed3ccc\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"e53e818d-10ba-4f10-ba01-0e5a6d33844c\",\"type\":\"BasicTicker\"}},\"id\":\"facdf69f-b79d-4142-bc2f-8efabdf6909e\",\"type\":\"LinearAxis\"},{\"attributes\":{\"data_source\":{\"id\":\"a631906a-d444-46e4-b68e-9eee2aedb5b8\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"9d9e4c04-2378-400f-8eb8-794f7cee3068\",\"type\":\"Line\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"82213fff-f515-417a-a15a-bcf1699486af\",\"type\":\"Line\"},\"selection_glyph\":null,\"view\":{\"id\":\"4993c38e-92a1-47f5-958b-fd642c58fccf\",\"type\":\"CDSView\"}},\"id\":\"41c4d4c1-7293-417f-b367-fe8a9533db03\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"6a72c9cb-8194-4dd1-acad-8f5341cb0d76\",\"type\":\"BasicTicker\"},{\"attributes\":{\"source\":{\"id\":\"a631906a-d444-46e4-b68e-9eee2aedb5b8\",\"type\":\"ColumnDataSource\"}},\"id\":\"4993c38e-92a1-47f5-958b-fd642c58fccf\",\"type\":\"CDSView\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"b4547538-57fb-4689-8e60-eadf83e7d5d6\",\"type\":\"Title\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"e53e818d-10ba-4f10-ba01-0e5a6d33844c\",\"type\":\"BasicTicker\"}},\"id\":\"a50beda3-2eec-4fa5-ad94-2edb4d64d227\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1084b342-bf9c-4bd6-8078-a5d3281d0f5f\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"454f7196-1545-41f1-b859-b1597dab5256\",\"type\":\"ResetTool\"},{\"attributes\":{\"plot\":{\"id\":\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"6a72c9cb-8194-4dd1-acad-8f5341cb0d76\",\"type\":\"BasicTicker\"}},\"id\":\"a194fa95-2168-4773-b8ee-f0fba641c099\",\"type\":\"Grid\"},{\"attributes\":{\"overlay\":{\"id\":\"8e49c342-5c18-4567-addf-a69abe617ce1\",\"type\":\"BoxAnnotation\"}},\"id\":\"1a22c6fb-553a-48cc-bb64-da62e65a9e11\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"a42213d0-00c3-4f49-a625-eaff8965fad8\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"60c52021-461b-4e13-8089-cf02c483b87c\",\"type\":\"SaveTool\"},{\"attributes\":{\"callback\":null},\"id\":\"59ace931-276c-4fa6-8ea1-2662fcbfcb5f\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"a84b473b-63fe-425f-ab3f-082a81ed3ccc\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"8e49c342-5c18-4567-addf-a69abe617ce1\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"source\":{\"id\":\"9144cfc5-0ec4-4b2f-9ecf-e2260085cc24\",\"type\":\"ColumnDataSource\"}},\"id\":\"9f34402b-9d7d-4500-aa3d-aa1e8f7f9bd9\",\"type\":\"CDSView\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"bfcc0d85-7ca9-48af-8055-c5a40cdb3a8b\",\"type\":\"Circle\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"8920c462-d844-49fc-9e30-5536db22ca50\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"605f3358-a914-4377-9e8b-71eaf39ec421\",\"type\":\"Selection\"},{\"attributes\":{\"data_source\":{\"id\":\"9144cfc5-0ec4-4b2f-9ecf-e2260085cc24\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"bfcc0d85-7ca9-48af-8055-c5a40cdb3a8b\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"8920c462-d844-49fc-9e30-5536db22ca50\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"9f34402b-9d7d-4500-aa3d-aa1e8f7f9bd9\",\"type\":\"CDSView\"}},\"id\":\"84bd044b-d21e-4b02-affa-084a7a211c46\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"4a6852a1-afca-4574-9c26-a6495c7b4307\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"e53e818d-10ba-4f10-ba01-0e5a6d33844c\",\"type\":\"BasicTicker\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"9c77fcb6-4ecc-4e58-b39d-8ba0332075b2\",\"type\":\"PanTool\"},{\"id\":\"a42213d0-00c3-4f49-a625-eaff8965fad8\",\"type\":\"WheelZoomTool\"},{\"id\":\"1a22c6fb-553a-48cc-bb64-da62e65a9e11\",\"type\":\"BoxZoomTool\"},{\"id\":\"60c52021-461b-4e13-8089-cf02c483b87c\",\"type\":\"SaveTool\"},{\"id\":\"454f7196-1545-41f1-b859-b1597dab5256\",\"type\":\"ResetTool\"},{\"id\":\"bbc2c6ba-bae9-4a59-9914-4df575edc518\",\"type\":\"HelpTool\"}]},\"id\":\"94b848b2-026f-40f9-a00b-4c32c27e0914\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"a1debb21-ac1b-49a0-ae5a-99f27e302e01\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"68743752-9306-4365-8e9b-e1db1e77cc5a\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"9c77fcb6-4ecc-4e58-b39d-8ba0332075b2\",\"type\":\"PanTool\"},{\"attributes\":{\"formatter\":{\"id\":\"8dc68d4b-7d3a-47af-a7d9-a855665a8db7\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"6a72c9cb-8194-4dd1-acad-8f5341cb0d76\",\"type\":\"BasicTicker\"}},\"id\":\"11416728-9245-488c-9337-7e07961049e8\",\"type\":\"LinearAxis\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[-1.3557468,1.3557468],\"y\":[-1.47035732,1.47035732]},\"selected\":{\"id\":\"605f3358-a914-4377-9e8b-71eaf39ec421\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"a1debb21-ac1b-49a0-ae5a-99f27e302e01\",\"type\":\"UnionRenderers\"}},\"id\":\"a631906a-d444-46e4-b68e-9eee2aedb5b8\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[0.69,-1.31,0.3900000000000001,0.08999999999999986,1.29,0.48999999999999977,0.18999999999999995,-0.81,-0.31000000000000005,-0.71],\"y\":[0.48999999999999977,-1.2100000000000002,0.9899999999999998,0.29000000000000004,1.0899999999999999,0.79,-0.31000000000000005,-0.81,-0.31000000000000005,-1.0100000000000002]},\"selected\":{\"id\":\"4a6852a1-afca-4574-9c26-a6495c7b4307\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"1084b342-bf9c-4bd6-8078-a5d3281d0f5f\",\"type\":\"UnionRenderers\"}},\"id\":\"9144cfc5-0ec4-4b2f-9ecf-e2260085cc24\",\"type\":\"ColumnDataSource\"}],\"root_ids\":[\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"15d67ea5-ef45-4170-9e55-2584252884ff\",\"notebook_comms_target\":\"7f508492-e9f7-4b56-8cca-6b47fe1c6687\",\"roots\":{\"c5d2b14e-50a1-47f8-a6b6-c6d1039d0027\":\"e5a3802b-1990-4400-ac58-547beaa699c4\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "c5d2b14e-50a1-47f8-a6b6-c6d1039d0027" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[4]>

" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = figure(plot_width=350, plot_height=350)\n", "p.circle(x, y, radius=0.035)\n", "p.line([-0.6778734*2 , -0.6778734*-2], [-0.73517866*2 , -0.73517866*-2], line_width=2, color=\"black\")\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para representar os pontos que estão no espaço bidimensional numa única dimensão, basta projetá-los no componente principal. A figura abaixo mostra as projeções dos pontos na reta (esquerda) e os pontos representados em apenas uma dimensão (direita)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![title](2dto1d.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A reta que representa o componente principal segue a direção na qual os dados mais variam. Com isso, se projetarmos os pontos do espaço bidimensional nela, então a variância dos pontos na reta é a maior possível.\n", "Também podemos entender a reta encontrada como a reta que minimiza a soma do quadrado das distâncias perpendiculares dos pontos até a reta." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matriz de covariância" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para encontrar os componentes principais, o primeiro passo é calcular a matriz de covariância.\n", "Mas para isso, primeiro precisamos entender o que é a covariância, que é definida como a medida do grau de variação conjunta de duas variáveis. \n", "\n", "Dadas duas variáveis X e Y, a covariância é positiva quando elas tendem a variar no mesmo sentido, isto é, os maiores valores de X estão associados aos maiores valores de Y, assim como os menores valores de X estão associados aos menores valores de Y. No caso contrário, no qual os menores valores de X estão associados aos maiores valores de Y e os maiores valores de X estão associados aos menores valores de Y, a covariância tem valor negativo.\n", "\n", "A fórmula para calcular a covariância com os dados centralizados é:\n", "\n", "$$cov(X, Y) = \\frac{1}{n-1} \\sum_{i}^{n}X_iY_i$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Onde $X_i$ é o valor da variável X do i-ésimo exemplo e $Y_i$ é o valor da variável Y do i-ésimo exemplo." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A função abaixo calcula a covariância de duas variáveis. Os valores das variáveis são passados como listas, o parâmetro x recebe os valores da primeira variável e o parâmetro y da segunda variável. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def covariance(x, y):\n", " result = 0\n", " \n", " for i in range(len(x)):\n", " result += x[i]*y[i]\n", " \n", " return result/(len(x)-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Passando os dados do exemplo para função temos:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6154444444444445" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covariance(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para um dataset com duas características, nós podemos apenas calcular a covariância entre elas. Entretanto, para dados com mais características, que são representados por mais dimensões, temos mais possibilidades.\n", "Por exemplo, com três dimensões (X, Y e Z), é possível calcular $cov(X, Y)$, $cov(X, Z)$ e $cov(Y,Z)$.\n", "\n", "Com isso, a matriz de covariância é utilizada para sumarizar todas as covariâncias calculadas para qualquer número de dimensões. A matriz de covariância tem N linhas e N colunas, sendo N o número de dimensões. Dessa forma, para três dimensões, a matriz tem tamanho 3x3 e seus elementos são:\n", "\n", "$$ C = \n", "\\begin{bmatrix}\n", " cov(x,x) & cov(x,y) & cov(x,z) \\\\\n", " cov(y,x) & cov(y,y) & cov(y,z) \\\\\n", " cov(z,x) & cov(z,y) & cov(z,z) \n", "\\end{bmatrix}\n", "$$\n", "\n", "Note que ao longo da diagonal principal, tem-se a covariância de uma dimensão em relação a ela mesma. Esse valor é chamado de variância. Além disso, é importante ressaltar que $cov(x,y) = cov(y,x)$ para qualquer x e y, assim a matriz é simétrica em relação à diagonal principal.\n", "\n", "A função a seguir calcula a matriz de covariância para N dimensões que são passadas em uma lista como parâmetro." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[0.6165555555555556, 0.6154444444444445],\n", " [0.6154444444444445, 0.7165555555555555]]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def covariance_matrix(dimensions):\n", " covariance_matrix = []\n", "\n", " for i in range(len(dimensions)):\n", " covariance_matrix.append([])\n", " for j in range(len(dimensions)):\n", " covariance_matrix[-1].append(covariance(dimensions[i], dimensions[j]))\n", "\n", " return covariance_matrix\n", "\n", "covariance_matrix([x, y])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Outra forma de calcular a matriz de covariância é utilizando a função `cov()` do numpy:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.61655556, 0.61544444],\n", " [0.61544444, 0.71655556]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.cov(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Autovalores e Autovetores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O próximo passo para encontrar os componentes principais é calcular os autovalores e autovetores da matriz de covariância. Os autovetores encontrados são os vetores utilizados para projetar os dados maximizando a variância. \n", "Anteriormente usamos uma reta para representar uma dimensão, mas lembre-se que podemos definir uma reta a partir de um vetor.\n", "Já os autovalores trazem o valor da variância dos dados projetados no autovetor correspondente.\n", "\n", "Podemos calcular os autovalores e autovetores através da função `eig()` do numpy:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.0490834 1.28402771]\n", "[[-0.73517866 -0.6778734 ]\n", " [ 0.6778734 -0.73517866]]\n" ] } ], "source": [ "from numpy.linalg import eig\n", "\n", "eigenvalues, eigenvectors = eig(np.cov(x, y))\n", "\n", "print(eigenvalues)\n", "print(eigenvectors)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nesse caso, os autovetores são dados pelas colunas da matriz `eigenvectors`, assim para acessar os autovetores vamos usar:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Autovetor 0: [-0.73517866 0.6778734 ]\n", "Autovetor 1: [-0.6778734 -0.73517866]\n" ] } ], "source": [ "print(f\"Autovetor 0: {eigenvectors[:,0]}\")\n", "print(f\"Autovetor 1: {eigenvectors[:,1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para os autovalores, o primeiro elemento é o autovalor associado ao primeiro autovetor, o segundo elemento ao segundo autovetor e assim sucessivamente." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Autovalor 0: 0.04908339893832736\n", "Autovalor 1: 1.2840277121727839\n" ] } ], "source": [ "print(f\"Autovalor 0: {eigenvalues[0]}\")\n", "print(f\"Autovalor 1: {eigenvalues[1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Escolhendo os componentes principais" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Seguindo o exemplo, vamos escolher um autovetor para reduzir o dataset de duas dimensões para uma dimensão. Para isso, vamos escolher o autovetor com o maior autovalor, ou seja, o autovetor [-0.6778734 -0.73517866]." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "principal_component = eigenvectors[:,1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Colocando o vetor no gráfico a partir da origem e com tamanho 1, temos:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"edffd8f3-7be8-42be-afe9-8af35f222bed\":{\"roots\":{\"references\":[{\"attributes\":{},\"id\":\"0f120e1a-8b54-4469-b460-1f4edd431107\",\"type\":\"ResetTool\"},{\"attributes\":{\"callback\":null},\"id\":\"221dc08e-f082-4acb-9fd2-0379cc47f0f7\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"9dac8cd0-c1f9-4398-81a5-5ae94c4c865e\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"dfe7d0ad-1d02-4f67-a0a6-4bea20f2b52e\",\"type\":\"LinearScale\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"881d761f-31e0-48be-a39d-9264e6c25a21\",\"type\":\"BoxAnnotation\"},{\"attributes\":{},\"id\":\"c98ea95f-d41b-4d0e-bf3b-cc7ff4f5d679\",\"type\":\"LinearScale\"},{\"attributes\":{\"plot\":{\"id\":\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"9eb70104-b2b7-44fd-9987-d62bf497bc64\",\"type\":\"BasicTicker\"}},\"id\":\"10d3217b-fdb6-46d6-ba28-7c8c1887edad\",\"type\":\"Grid\"},{\"attributes\":{\"formatter\":{\"id\":\"c8bd0d9e-0769-4629-a9bd-0e545250b155\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"9eb70104-b2b7-44fd-9987-d62bf497bc64\",\"type\":\"BasicTicker\"}},\"id\":\"6c174238-889a-4e5d-8de6-75838242acef\",\"type\":\"LinearAxis\"},{\"attributes\":{\"end\":{\"id\":\"47449e8e-ccd1-4754-b17c-761141a3f660\",\"type\":\"NormalHead\"},\"line_width\":{\"value\":2},\"plot\":{\"id\":\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"source\":null,\"start\":null,\"x_end\":{\"value\":-0.6778733985280117},\"x_start\":{\"value\":0},\"y_end\":{\"value\":-0.735178655544408},\"y_start\":{\"value\":0}},\"id\":\"3420139b-1643-483e-a4ad-706c09e425c6\",\"type\":\"Arrow\"},{\"attributes\":{},\"id\":\"9eb70104-b2b7-44fd-9987-d62bf497bc64\",\"type\":\"BasicTicker\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"4c165865-075d-486b-a77d-769a934c6b86\",\"type\":\"Title\"},{\"attributes\":{\"formatter\":{\"id\":\"82ab2b62-a874-465d-9b8d-4689cc6666a3\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"ce297b61-20ff-4a1d-b994-3cfbb0a78456\",\"type\":\"BasicTicker\"}},\"id\":\"c186057f-7aee-4a1e-9855-82d9434d6649\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"580ffa86-15cb-45de-aaa7-d43f09a6aec0\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"ce297b61-20ff-4a1d-b994-3cfbb0a78456\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"f998b8b5-ee14-4c16-a745-c27e571967b7\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"ce297b61-20ff-4a1d-b994-3cfbb0a78456\",\"type\":\"BasicTicker\"}},\"id\":\"b7d43e54-79da-48a7-8a2e-93e640d6a1bc\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"82ab2b62-a874-465d-9b8d-4689cc6666a3\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"c8bd0d9e-0769-4629-a9bd-0e545250b155\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"plot\":null,\"size\":15},\"id\":\"47449e8e-ccd1-4754-b17c-761141a3f660\",\"type\":\"NormalHead\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[0.69,-1.31,0.3900000000000001,0.08999999999999986,1.29,0.48999999999999977,0.18999999999999995,-0.81,-0.31000000000000005,-0.71],\"y\":[0.48999999999999977,-1.2100000000000002,0.9899999999999998,0.29000000000000004,1.0899999999999999,0.79,-0.31000000000000005,-0.81,-0.31000000000000005,-1.0100000000000002]},\"selected\":{\"id\":\"580ffa86-15cb-45de-aaa7-d43f09a6aec0\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"f998b8b5-ee14-4c16-a745-c27e571967b7\",\"type\":\"UnionRenderers\"}},\"id\":\"adb4308a-8d4b-4d2d-8d3a-9ee8deeeb6db\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"below\":[{\"id\":\"6c174238-889a-4e5d-8de6-75838242acef\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"c186057f-7aee-4a1e-9855-82d9434d6649\",\"type\":\"LinearAxis\"}],\"plot_height\":350,\"plot_width\":350,\"renderers\":[{\"id\":\"6c174238-889a-4e5d-8de6-75838242acef\",\"type\":\"LinearAxis\"},{\"id\":\"10d3217b-fdb6-46d6-ba28-7c8c1887edad\",\"type\":\"Grid\"},{\"id\":\"c186057f-7aee-4a1e-9855-82d9434d6649\",\"type\":\"LinearAxis\"},{\"id\":\"b7d43e54-79da-48a7-8a2e-93e640d6a1bc\",\"type\":\"Grid\"},{\"id\":\"881d761f-31e0-48be-a39d-9264e6c25a21\",\"type\":\"BoxAnnotation\"},{\"id\":\"290c34cf-b913-4c78-99a6-cd07c35c5906\",\"type\":\"GlyphRenderer\"},{\"id\":\"3420139b-1643-483e-a4ad-706c09e425c6\",\"type\":\"Arrow\"}],\"title\":{\"id\":\"4c165865-075d-486b-a77d-769a934c6b86\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"7c3c3e0b-0433-46fc-a953-115ec5e60e4c\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"890314b2-70a7-4df9-8cf9-cdcb8a6ddfab\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"dfe7d0ad-1d02-4f67-a0a6-4bea20f2b52e\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"221dc08e-f082-4acb-9fd2-0379cc47f0f7\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"c98ea95f-d41b-4d0e-bf3b-cc7ff4f5d679\",\"type\":\"LinearScale\"}},\"id\":\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"26a499e5-a9dc-4ea0-88a3-795306a17c92\",\"type\":\"PanTool\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"f710ccb5-145c-45dc-b169-e21be71e9914\",\"type\":\"Circle\"},{\"attributes\":{\"data_source\":{\"id\":\"adb4308a-8d4b-4d2d-8d3a-9ee8deeeb6db\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"f710ccb5-145c-45dc-b169-e21be71e9914\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"a3f99d1a-2b11-4cec-8b97-cb46f217fbe7\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"7b28c110-7204-4884-9efe-27c420e5935a\",\"type\":\"CDSView\"}},\"id\":\"290c34cf-b913-4c78-99a6-cd07c35c5906\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"a3f99d1a-2b11-4cec-8b97-cb46f217fbe7\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"dd422649-690e-4c67-813a-b7024c910fdb\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"26a499e5-a9dc-4ea0-88a3-795306a17c92\",\"type\":\"PanTool\"},{\"id\":\"dd422649-690e-4c67-813a-b7024c910fdb\",\"type\":\"WheelZoomTool\"},{\"id\":\"f7a83350-5a58-497d-9c4b-f0ce79281055\",\"type\":\"BoxZoomTool\"},{\"id\":\"6b66881e-2555-493c-becf-5ed0f359cc9d\",\"type\":\"SaveTool\"},{\"id\":\"0f120e1a-8b54-4469-b460-1f4edd431107\",\"type\":\"ResetTool\"},{\"id\":\"9dac8cd0-c1f9-4398-81a5-5ae94c4c865e\",\"type\":\"HelpTool\"}]},\"id\":\"7c3c3e0b-0433-46fc-a953-115ec5e60e4c\",\"type\":\"Toolbar\"},{\"attributes\":{\"overlay\":{\"id\":\"881d761f-31e0-48be-a39d-9264e6c25a21\",\"type\":\"BoxAnnotation\"}},\"id\":\"f7a83350-5a58-497d-9c4b-f0ce79281055\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"callback\":null},\"id\":\"890314b2-70a7-4df9-8cf9-cdcb8a6ddfab\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"6b66881e-2555-493c-becf-5ed0f359cc9d\",\"type\":\"SaveTool\"},{\"attributes\":{\"source\":{\"id\":\"adb4308a-8d4b-4d2d-8d3a-9ee8deeeb6db\",\"type\":\"ColumnDataSource\"}},\"id\":\"7b28c110-7204-4884-9efe-27c420e5935a\",\"type\":\"CDSView\"}],\"root_ids\":[\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"edffd8f3-7be8-42be-afe9-8af35f222bed\",\"notebook_comms_target\":\"ac41c740-1112-45dc-9e07-163970aebb62\",\"roots\":{\"2f5f6ded-351f-4c0b-81e7-78a0d568aa3a\":\"42e784cd-97ce-4576-a65a-162b6850f9cb\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "2f5f6ded-351f-4c0b-81e7-78a0d568aa3a" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[13]>

" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bokeh.models import Arrow, NormalHead\n", "\n", "vector_size = 1\n", "pcx = principal_component[0]\n", "pcy = principal_component[1]\n", "\n", "p = figure(plot_width=350, plot_height=350)\n", "p.circle(x, y, radius=0.035)\n", "p.add_layout(Arrow(end=NormalHead(fill_color=\"black\", size=15),\n", " x_start=0, y_start=0,\n", " x_end=pcx*vector_size, y_end=pcy*vector_size,\n", " line_width=2))\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O vetor do gráfico acima tem tamanho 1, mas se expandirmos ele nas suas duas direções, encontraremos a mesma reta que minimiza as distâncias quadráticas perpendiculares. O gráfico a seguir mostra a reta em vermelho." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"dfb69e51-de44-4b04-820f-9fba3dbb6188\":{\"roots\":{\"references\":[{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"eacfaef3-86a1-4b83-ae79-0a3bf4889e53\",\"type\":\"Title\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[-1.3557467970560233,1.3557467970560233],\"y\":[-1.470357311088816,1.470357311088816]},\"selected\":{\"id\":\"cf48462e-0b57-4086-ba74-17ff9d162b4a\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"234719c0-124e-4654-b4b1-2d631765c54b\",\"type\":\"UnionRenderers\"}},\"id\":\"270ad8d5-152e-4aaa-b2c2-8d88a8ad6531\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"0ef3e2d2-4c85-4c25-8a65-ec8d755c7703\",\"type\":\"PanTool\"},{\"attributes\":{\"source\":{\"id\":\"71bf6077-0448-4378-af26-41903920061c\",\"type\":\"ColumnDataSource\"}},\"id\":\"eaf8eb8a-a219-4deb-9f1d-778a0f7d8afe\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"eb9964b4-767d-49f1-862e-f5e568d84051\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"overlay\":{\"id\":\"587c0d90-7407-43f7-84f6-9e65fa0f80e0\",\"type\":\"BoxAnnotation\"}},\"id\":\"b3750cc1-9a22-46e0-b54a-069ea2a2d8f8\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"bae6c087-f820-4091-aad7-dbcecd354e5c\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"47a8720f-b544-4934-9f8e-61e7dd447014\",\"type\":\"ResetTool\"},{\"attributes\":{},\"id\":\"6fa4924a-9d61-4f9e-82bf-e2d28ea08108\",\"type\":\"HelpTool\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"587c0d90-7407-43f7-84f6-9e65fa0f80e0\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"plot\":null,\"size\":15},\"id\":\"55d3d13a-0477-467a-8352-16f0f7b44af2\",\"type\":\"NormalHead\"},{\"attributes\":{\"line_color\":\"red\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"0afa1baf-967c-4060-a321-136ace6c0b8b\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"51370f1b-b4a9-4bbe-9807-aae512b2a96f\",\"type\":\"Selection\"},{\"attributes\":{\"plot\":{\"id\":\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"99b99735-574f-42a3-a738-de46a8aefb56\",\"type\":\"BasicTicker\"}},\"id\":\"1db357dc-5c2f-4b37-abc7-47e95ecc0580\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"595eaf07-ad2c-4ad4-b340-38592f2ce150\",\"type\":\"BasicTicker\"},{\"attributes\":{\"formatter\":{\"id\":\"8f09c6ec-ed9f-4ee7-92fd-8f9f271c5113\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"595eaf07-ad2c-4ad4-b340-38592f2ce150\",\"type\":\"BasicTicker\"}},\"id\":\"013fa3bc-44f9-448c-af68-fafc5a46b932\",\"type\":\"LinearAxis\"},{\"attributes\":{\"line_alpha\":0.1,\"line_color\":\"#1f77b4\",\"line_width\":2,\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"839afe2e-0f85-49fa-bc71-93c67d55a85c\",\"type\":\"Line\"},{\"attributes\":{},\"id\":\"cf48462e-0b57-4086-ba74-17ff9d162b4a\",\"type\":\"Selection\"},{\"attributes\":{\"data_source\":{\"id\":\"270ad8d5-152e-4aaa-b2c2-8d88a8ad6531\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"0afa1baf-967c-4060-a321-136ace6c0b8b\",\"type\":\"Line\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"839afe2e-0f85-49fa-bc71-93c67d55a85c\",\"type\":\"Line\"},\"selection_glyph\":null,\"view\":{\"id\":\"efabb731-0f4e-4087-830c-5a31d67345ca\",\"type\":\"CDSView\"}},\"id\":\"a32bf62b-ed04-4658-b6f3-48771d38c70a\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"source\":{\"id\":\"270ad8d5-152e-4aaa-b2c2-8d88a8ad6531\",\"type\":\"ColumnDataSource\"}},\"id\":\"efabb731-0f4e-4087-830c-5a31d67345ca\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"8f09c6ec-ed9f-4ee7-92fd-8f9f271c5113\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"end\":{\"id\":\"55d3d13a-0477-467a-8352-16f0f7b44af2\",\"type\":\"NormalHead\"},\"line_width\":{\"value\":2},\"plot\":{\"id\":\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"source\":null,\"start\":null,\"x_end\":{\"value\":-0.6778733985280117},\"x_start\":{\"value\":0},\"y_end\":{\"value\":-0.735178655544408},\"y_start\":{\"value\":0}},\"id\":\"13e4400c-a192-4d73-9307-1019c16faddf\",\"type\":\"Arrow\"},{\"attributes\":{},\"id\":\"f835e5ab-34f3-413a-a545-8df0193c5ab6\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"callback\":null},\"id\":\"57ec98ab-ad45-49b4-9b83-e62b80a39c0a\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"779bef6b-efd4-4e6a-b171-6902a173b478\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"595eaf07-ad2c-4ad4-b340-38592f2ce150\",\"type\":\"BasicTicker\"}},\"id\":\"cbcca3a8-b830-4dca-b8e9-f3182f7984b0\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[0.69,-1.31,0.3900000000000001,0.08999999999999986,1.29,0.48999999999999977,0.18999999999999995,-0.81,-0.31000000000000005,-0.71],\"y\":[0.48999999999999977,-1.2100000000000002,0.9899999999999998,0.29000000000000004,1.0899999999999999,0.79,-0.31000000000000005,-0.81,-0.31000000000000005,-1.0100000000000002]},\"selected\":{\"id\":\"51370f1b-b4a9-4bbe-9807-aae512b2a96f\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"779bef6b-efd4-4e6a-b171-6902a173b478\",\"type\":\"UnionRenderers\"}},\"id\":\"71bf6077-0448-4378-af26-41903920061c\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"d660f08e-7a95-47d1-ac9e-d0598645c95e\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"99b99735-574f-42a3-a738-de46a8aefb56\",\"type\":\"BasicTicker\"},{\"attributes\":{\"formatter\":{\"id\":\"f835e5ab-34f3-413a-a545-8df0193c5ab6\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"99b99735-574f-42a3-a738-de46a8aefb56\",\"type\":\"BasicTicker\"}},\"id\":\"7771dd2e-8928-4c82-b62e-d5696e6dccd7\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"ef280acc-ddb9-4c65-90ec-9fd3945618bc\",\"type\":\"LinearScale\"},{\"attributes\":{\"callback\":null},\"id\":\"2ea58493-23bd-41cf-8a8b-495e66b1610c\",\"type\":\"DataRange1d\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"0ef3e2d2-4c85-4c25-8a65-ec8d755c7703\",\"type\":\"PanTool\"},{\"id\":\"eb9964b4-767d-49f1-862e-f5e568d84051\",\"type\":\"WheelZoomTool\"},{\"id\":\"b3750cc1-9a22-46e0-b54a-069ea2a2d8f8\",\"type\":\"BoxZoomTool\"},{\"id\":\"bae6c087-f820-4091-aad7-dbcecd354e5c\",\"type\":\"SaveTool\"},{\"id\":\"47a8720f-b544-4934-9f8e-61e7dd447014\",\"type\":\"ResetTool\"},{\"id\":\"6fa4924a-9d61-4f9e-82bf-e2d28ea08108\",\"type\":\"HelpTool\"}]},\"id\":\"4c40a30f-9ea6-4ac1-908d-0bed9ddcd0cf\",\"type\":\"Toolbar\"},{\"attributes\":{},\"id\":\"fe131558-4c4c-4aa8-9cc7-45ed82821f47\",\"type\":\"LinearScale\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"53e910cb-3587-4360-8191-ca6a15c77363\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"234719c0-124e-4654-b4b1-2d631765c54b\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"below\":[{\"id\":\"7771dd2e-8928-4c82-b62e-d5696e6dccd7\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"013fa3bc-44f9-448c-af68-fafc5a46b932\",\"type\":\"LinearAxis\"}],\"plot_height\":350,\"plot_width\":350,\"renderers\":[{\"id\":\"7771dd2e-8928-4c82-b62e-d5696e6dccd7\",\"type\":\"LinearAxis\"},{\"id\":\"1db357dc-5c2f-4b37-abc7-47e95ecc0580\",\"type\":\"Grid\"},{\"id\":\"013fa3bc-44f9-448c-af68-fafc5a46b932\",\"type\":\"LinearAxis\"},{\"id\":\"cbcca3a8-b830-4dca-b8e9-f3182f7984b0\",\"type\":\"Grid\"},{\"id\":\"587c0d90-7407-43f7-84f6-9e65fa0f80e0\",\"type\":\"BoxAnnotation\"},{\"id\":\"b9bfa60a-ddd7-44f9-9732-d4df78fe5ed3\",\"type\":\"GlyphRenderer\"},{\"id\":\"13e4400c-a192-4d73-9307-1019c16faddf\",\"type\":\"Arrow\"},{\"id\":\"a32bf62b-ed04-4658-b6f3-48771d38c70a\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"eacfaef3-86a1-4b83-ae79-0a3bf4889e53\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"4c40a30f-9ea6-4ac1-908d-0bed9ddcd0cf\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"2ea58493-23bd-41cf-8a8b-495e66b1610c\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"ef280acc-ddb9-4c65-90ec-9fd3945618bc\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"57ec98ab-ad45-49b4-9b83-e62b80a39c0a\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"fe131558-4c4c-4aa8-9cc7-45ed82821f47\",\"type\":\"LinearScale\"}},\"id\":\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"data_source\":{\"id\":\"71bf6077-0448-4378-af26-41903920061c\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"d660f08e-7a95-47d1-ac9e-d0598645c95e\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"53e910cb-3587-4360-8191-ca6a15c77363\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"eaf8eb8a-a219-4deb-9f1d-778a0f7d8afe\",\"type\":\"CDSView\"}},\"id\":\"b9bfa60a-ddd7-44f9-9732-d4df78fe5ed3\",\"type\":\"GlyphRenderer\"}],\"root_ids\":[\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"dfb69e51-de44-4b04-820f-9fba3dbb6188\",\"notebook_comms_target\":\"73b8e48c-fb17-4259-9d1c-14f417c522f5\",\"roots\":{\"53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71\":\"271962c4-c91d-4f14-9324-bf70987486e0\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "53a53bf9-8090-45b7-b0e6-a4c4cc7b4e71" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[14]>

" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = figure(plot_width=350, plot_height=350)\n", "p.circle(x, y, radius=0.035)\n", "p.add_layout(Arrow(end=NormalHead(fill_color=\"black\", size=15),\n", " x_start=0, y_start=0,\n", " x_end=pcx*vector_size, y_end=pcy*vector_size,\n", " line_width=2))\n", "p.line([pcx*2 , pcx*-2], [pcy*2 , pcy*-2], line_width=2, color=\"red\")\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Projetando os valores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para projetar um ponto no componente principal calculamos o produto escalar entre o vetor que representa um ponto e o vetor que representa o componente principal. Dessa forma, o ponto (0.69, 0.49) tem o seguinte valor na nova dimensão:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.8279701862010879" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot([0.69, 0.49], principal_component)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fazendo para todos os pontos do exemplo, temos:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "points = np.stack([x,y], axis=1) # criando uma matriz na qual cada linha é um ponto do exemplo\n", "oned_points = []\n", "\n", "for point in points:\n", " oned_points.append(np.dot(point, principal_component))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"145ac3a3-753b-4e6d-9dad-5c0b95192c34\":{\"roots\":{\"references\":[{\"attributes\":{},\"id\":\"24ee452d-d5d6-4582-802e-6f3be36ca2f4\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"07fe73e8-e2cb-4182-a69d-8a3a93dbd369\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"7ab774dd-6e26-40f3-9bd0-af985c4ad00c\",\"type\":\"HelpTool\"},{\"attributes\":{\"formatter\":{\"id\":\"2a52ad86-4d0d-4fb4-9018-a9bc0a3b44de\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c391242f-04dc-4cd6-948d-7026655b11ba\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"07fe73e8-e2cb-4182-a69d-8a3a93dbd369\",\"type\":\"BasicTicker\"}},\"id\":\"ea12e6e3-b51a-4784-9174-43a3318f9488\",\"type\":\"LinearAxis\"},{\"attributes\":{\"callback\":null,\"data\":{\"x\":[-0.8279701862010878,1.7775803252804292,-0.9921974944148884,-0.2742104159753993,-1.6758014186445398,-0.9129491031588078,0.09910943749844434,1.1445721637986601,0.4380461367624502,1.2238205550547405]},\"selected\":{\"id\":\"45c43356-97ef-4ab4-90b4-aa7495c016a9\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"39a16159-29a4-4ac6-b235-2aabad5d98b8\",\"type\":\"UnionRenderers\"}},\"id\":\"eeb9994f-da5f-4067-8a58-bbfb1d07d6c8\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"c391242f-04dc-4cd6-948d-7026655b11ba\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"07fe73e8-e2cb-4182-a69d-8a3a93dbd369\",\"type\":\"BasicTicker\"}},\"id\":\"fae44950-724c-475b-8690-595d0813ef8e\",\"type\":\"Grid\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"value\":0}},\"id\":\"4577710b-e481-4155-92d5-21e85ea18488\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"55b42167-8c11-4f82-96d7-b2575a8e8391\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"a6b11a75-4c91-4363-95ee-6474c2e824ec\",\"type\":\"LinearScale\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"88baac70-7706-444f-9fd9-e3e133d710bc\",\"type\":\"Title\"},{\"attributes\":{\"below\":[{\"id\":\"3561e1cb-9bb8-4d27-97b4-372efcb10ab0\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"ea12e6e3-b51a-4784-9174-43a3318f9488\",\"type\":\"LinearAxis\"}],\"plot_height\":250,\"plot_width\":350,\"renderers\":[{\"id\":\"3561e1cb-9bb8-4d27-97b4-372efcb10ab0\",\"type\":\"LinearAxis\"},{\"id\":\"eca5d31d-b662-4858-a15b-2002a11539a1\",\"type\":\"Grid\"},{\"id\":\"ea12e6e3-b51a-4784-9174-43a3318f9488\",\"type\":\"LinearAxis\"},{\"id\":\"fae44950-724c-475b-8690-595d0813ef8e\",\"type\":\"Grid\"},{\"id\":\"03d2e2d0-8afd-4271-89b0-96daf72b9fa6\",\"type\":\"BoxAnnotation\"},{\"id\":\"76135b87-859f-4e93-8be9-41ec369fdabf\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"88baac70-7706-444f-9fd9-e3e133d710bc\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"703d3748-c4bd-4b78-b62e-f929b50e5040\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"9804aea3-78c0-40fd-b805-fbf3005a4ecb\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"632d65e6-a022-46ae-b9df-968710f59d37\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"ea0a22e6-499f-4846-8d87-5d166d9cf648\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"a6b11a75-4c91-4363-95ee-6474c2e824ec\",\"type\":\"LinearScale\"}},\"id\":\"c391242f-04dc-4cd6-948d-7026655b11ba\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"data_source\":{\"id\":\"eeb9994f-da5f-4067-8a58-bbfb1d07d6c8\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"63eadcdd-8ac5-451f-b217-1d69d8018784\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"4577710b-e481-4155-92d5-21e85ea18488\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"df40c384-b6ae-4ead-84fc-c05d57a36ec2\",\"type\":\"CDSView\"}},\"id\":\"76135b87-859f-4e93-8be9-41ec369fdabf\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"overlay\":{\"id\":\"03d2e2d0-8afd-4271-89b0-96daf72b9fa6\",\"type\":\"BoxAnnotation\"}},\"id\":\"c46ee3f2-2e7c-45a7-acc0-65f99e4cc728\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"2a52ad86-4d0d-4fb4-9018-a9bc0a3b44de\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"cd63fe52-78d3-4132-9703-2ac513c13fc6\",\"type\":\"PanTool\"},{\"id\":\"55b42167-8c11-4f82-96d7-b2575a8e8391\",\"type\":\"WheelZoomTool\"},{\"id\":\"c46ee3f2-2e7c-45a7-acc0-65f99e4cc728\",\"type\":\"BoxZoomTool\"},{\"id\":\"92661870-a080-4c4a-b58f-35c39872449b\",\"type\":\"SaveTool\"},{\"id\":\"e71111a6-c818-4f53-a8c4-579c043f2d8e\",\"type\":\"ResetTool\"},{\"id\":\"7ab774dd-6e26-40f3-9bd0-af985c4ad00c\",\"type\":\"HelpTool\"}]},\"id\":\"703d3748-c4bd-4b78-b62e-f929b50e5040\",\"type\":\"Toolbar\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.035},\"x\":{\"field\":\"x\"},\"y\":{\"value\":0}},\"id\":\"63eadcdd-8ac5-451f-b217-1d69d8018784\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"cd63fe52-78d3-4132-9703-2ac513c13fc6\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"5bf925d8-334a-460c-b9c2-724e015194f2\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"92661870-a080-4c4a-b58f-35c39872449b\",\"type\":\"SaveTool\"},{\"attributes\":{\"source\":{\"id\":\"eeb9994f-da5f-4067-8a58-bbfb1d07d6c8\",\"type\":\"ColumnDataSource\"}},\"id\":\"df40c384-b6ae-4ead-84fc-c05d57a36ec2\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"e71111a6-c818-4f53-a8c4-579c043f2d8e\",\"type\":\"ResetTool\"},{\"attributes\":{\"callback\":null},\"id\":\"9804aea3-78c0-40fd-b805-fbf3005a4ecb\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"45c43356-97ef-4ab4-90b4-aa7495c016a9\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"39a16159-29a4-4ac6-b235-2aabad5d98b8\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"plot\":{\"id\":\"c391242f-04dc-4cd6-948d-7026655b11ba\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"24ee452d-d5d6-4582-802e-6f3be36ca2f4\",\"type\":\"BasicTicker\"}},\"id\":\"eca5d31d-b662-4858-a15b-2002a11539a1\",\"type\":\"Grid\"},{\"attributes\":{\"callback\":null},\"id\":\"ea0a22e6-499f-4846-8d87-5d166d9cf648\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"632d65e6-a022-46ae-b9df-968710f59d37\",\"type\":\"LinearScale\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"03d2e2d0-8afd-4271-89b0-96daf72b9fa6\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"formatter\":{\"id\":\"5bf925d8-334a-460c-b9c2-724e015194f2\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"c391242f-04dc-4cd6-948d-7026655b11ba\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"24ee452d-d5d6-4582-802e-6f3be36ca2f4\",\"type\":\"BasicTicker\"}},\"id\":\"3561e1cb-9bb8-4d27-97b4-372efcb10ab0\",\"type\":\"LinearAxis\"}],\"root_ids\":[\"c391242f-04dc-4cd6-948d-7026655b11ba\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"145ac3a3-753b-4e6d-9dad-5c0b95192c34\",\"notebook_comms_target\":\"5a249266-1643-4cec-8c81-d563602682fc\",\"roots\":{\"c391242f-04dc-4cd6-948d-7026655b11ba\":\"f40fa2fd-87b9-4d06-8e04-3b2aebfc05d1\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "c391242f-04dc-4cd6-948d-7026655b11ba" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[17]>

" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = figure(plot_width=350, plot_height=250)\n", "p.circle(oned_points, 0, radius=0.035)\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Como esperado, os pontos estão todos representados em uma única dimensão. Notem que as distâncias entre os pontos continuam parecidas com as distâncias entre eles no gráfico de duas dimensões." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dados com 3 dimensões ou mais" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O PCA é especialmente importante na visualização de dados com 3 ou mais dimensões. No exemplo anterior, usamos um dataset com dados bidimensionais e, usando o PCA, representamos eles em uma dimensão. A seguir todo o algoritmo vai ser executado para o conhecido dataset de flores íris. Esse dataset contém dados sobre 150 flores, cada uma delas representada por quatro características." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(150, 4)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn import datasets\n", "\n", "dataset = datasets.load_iris()\n", "iris_data = dataset.data\n", "iris_data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Com 4 dimensões, não é possível visualizar o dataset. Assim, vamos executar o PCA para representar os dados utilizando menos dimensões." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O primeiro passo é o pré-processamento para centralizar os dados." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(150, 4)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "means = iris_data.mean(axis=1, keepdims=True) # calculando a média de cada uma das flores\n", "iris_data = iris_data - means\n", "iris_data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Em seguida, calculamos a matriz de covariância." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.07877002, 0.03580045, -0.06437248, -0.05019799],\n", " [ 0.03580045, 0.94506532, -0.97777517, -0.0030906 ],\n", " [-0.06437248, -0.97777517, 1.04399329, -0.00184564],\n", " [-0.05019799, -0.0030906 , -0.00184564, 0.05513423]])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cov_matrix = np.cov(iris_data, rowvar=False)\n", "cov_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Com a matriz em mãos, podemos calcular os autovalores e autovetores." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.97623957e+00 1.18269289e-01 -2.87303072e-16 2.84540080e-02]\n", "[[ 0.03760102 0.78022871 0.5 0.3739376 ]\n", " [ 0.68827047 -0.10694172 0.5 -0.5146331 ]\n", " [-0.72447773 -0.05992256 0.5 -0.47068174]\n", " [-0.00139375 -0.61336443 0.5 0.61137725]]\n" ] } ], "source": [ "eigenvalues, eigenvectors = eig(cov_matrix)\n", "\n", "print(eigenvalues)\n", "print(eigenvectors)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analisando os autovalores, pode-se perceber que o primeiro possui o maior valor, então, se quisermos representar os dados das flores íris usando uma dimensão, projetaremos eles no primeiro autovetor.\n", "Entretanto, nesse exemplo, podemos usar mais dimensões para representar os pontos em duas ou três dimensões.\n", "\n", "Para reduzir para duas dimensões, escolheremos os dois maiores autovalores e os seus autovetores correspondentes. Observando os valores calculados, tem-se que os dois maiores autovalores são os dois primeiros. Com isso, vamos projetar os dados num plano representado pelos dois primeiros autovetores, assim exibindo os dados num espaço bidimensional." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para projetar um ponto no espaço bidimensional multiplicamos o vetor do ponto pela matriz contendo os dois primeiros autovetores." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2.55 0.95 -1.15 -2.35]\n" ] }, { "data": { "text/plain": [ "array([-0.6166535 , 2.28788285])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_iris = iris_data[0]\n", "print(first_iris)\n", "\n", "np.matmul([eigenvectors[0], eigenvectors[1]], first_iris)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fazendo o cálculo para todos os pontos:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "twod_x = []\n", "twod_y = []\n", "\n", "for point in iris_data:\n", " new_point = np.matmul([eigenvectors[0], eigenvectors[1]], point)\n", " twod_x.append(new_point[0])\n", " twod_y.append(new_point[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O dataset além de trazer as características das flores, também tem a classificação de cada uma.\n", "São três tipos diferentes e no gráfico abaixo cada tipo tem uma cor específica.\n", "\n", "Usando o resultado do PCA, veja os pontos representados em duas dimensões:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"d28e4ca1-a16f-4722-bd9e-8f0a0422130f\":{\"roots\":{\"references\":[{\"attributes\":{},\"id\":\"01430d7a-d29c-4cd5-b8d1-75ef376e48b4\",\"type\":\"ResetTool\"},{\"attributes\":{\"source\":{\"id\":\"1a824b4d-2d12-4f58-9232-f7fa3c4f0d84\",\"type\":\"ColumnDataSource\"}},\"id\":\"3f3da774-60f5-4e7c-b311-9fce41c8af91\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"9fe694d0-4dc2-431e-85a6-ad9522be023a\",\"type\":\"LinearScale\"},{\"attributes\":{},\"id\":\"cb87a7ab-1eb5-4219-950b-bc8903693f09\",\"type\":\"HelpTool\"},{\"attributes\":{},\"id\":\"97209a1f-976f-44e4-837e-8a43b2cfe12f\",\"type\":\"BasicTicker\"},{\"attributes\":{},\"id\":\"58c8719c-11a1-420c-a168-b6b38f625dc3\",\"type\":\"LinearScale\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":{\"value\":0.5},\"fill_color\":{\"value\":\"lightgrey\"},\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":{\"value\":1.0},\"line_color\":{\"value\":\"black\"},\"line_dash\":[4,4],\"line_width\":{\"value\":2},\"plot\":null,\"render_mode\":\"css\",\"right_units\":\"screen\",\"top_units\":\"screen\"},\"id\":\"1e039b1d-bb43-4917-9644-8084e2698e78\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"formatter\":{\"id\":\"ad7146ca-d89a-4118-ae5a-3ab553871606\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"f3670b70-835a-4b79-ab6d-66f346dea703\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"a34c000d-cc53-45b0-8507-b62532280788\",\"type\":\"BasicTicker\"}},\"id\":\"74860cea-ee69-411b-9087-926f8339d69b\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"7b428786-300e-46d0-b52d-c1126353bd31\",\"type\":\"WheelZoomTool\"},{\"attributes\":{},\"id\":\"a34c000d-cc53-45b0-8507-b62532280788\",\"type\":\"BasicTicker\"},{\"attributes\":{\"overlay\":{\"id\":\"1e039b1d-bb43-4917-9644-8084e2698e78\",\"type\":\"BoxAnnotation\"}},\"id\":\"28d418db-7773-4455-87e7-db0b5fe6fcad\",\"type\":\"BoxZoomTool\"},{\"attributes\":{\"callback\":null,\"data\":{\"fill_color\":[\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\"],\"line_color\":[\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"red\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"blue\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\",\"green\"],\"x\":[-0.6166534950399016,-0.7182287701371253,-0.5774090491282895,-0.5591920215140949,-0.542390726084498,-0.5760243843756071,-0.4646121987900592,-0.6061422843276787,-0.5612870507955632,-0.6698938425708592,-0.6530925471412623,-0.5213683046600519,-0.674794265662578,-0.5052413094543534,-0.7231602606607581,-0.5283948244651534,-0.6068476519834918,-0.6215539181316214,-0.7224548930049468,-0.5066620385119227,-0.7448669760550471,-0.5472911491762178,-0.4190776681610713,-0.679694688754297,-0.49825085395413926,-0.7413512177160104,-0.6082373136091463,-0.6474817595207577,-0.690916263995305,-0.554291598422376,-0.6285543673777795,-0.7700794560424282,-0.428209210993585,-0.5206890075631813,-0.6698938425708592,-0.7007171101787423,-0.7784956374731833,-0.6698938425708592,-0.5332641801249582,-0.6446763657105063,-0.5907256536507649,-0.8267994976075355,-0.46180680497980625,-0.5823094722200084,-0.48073919399575765,-0.684595111846017,-0.4940557985182315,-0.5311691508434901,-0.6145584657584341,-0.649576788802226,-1.2605002233669391,-1.0496077919656366,-1.247183618844465,-1.0530874859997865,-1.223350806736796,-0.9129831263927077,-0.9648338122980098,-0.8253937591688846,-1.2163553543636094,-0.8071767315546904,-0.9914309570380724,-0.9515122109025627,-1.2667853112113419,-1.020879553639219,-0.908072709555043,-1.2037441174969468,-0.8127925160481662,-1.003367893680837,-1.3298265049257385,-1.0180691629559946,-0.8485212036207433,-1.1056485364338726,-1.2303512559829548,-1.0468073950283552,-1.162404642303868,-1.2009387236866946,-1.318640993989617,-1.207939172932853,-1.002657529152053,-1.0467974012824115,-1.0229695860477133,-1.0257749798579663,-1.0285803736682173,-1.0327804259770965,-0.7357243532825112,-0.8289145143808929,-1.1855270898827526,-1.330536869454523,-0.8338149374726123,-0.9816301108546345,-0.9101777325824546,-0.9928566829686145,-1.056603244338822,-0.8996565281242873,-0.9332951832883685,-0.8597427788617482,-0.900371889526045,-1.0853364795382117,-0.8947511081595954,-0.9438063940005915,-0.9087620003978596,-0.9704135324865995,-1.312324838713302,-1.0481970566540113,-1.093726590410026,-1.451054527313643,-0.7314982304146899,-1.3795971521684918,-1.3298364986716837,-1.1406768532236087,-1.066409087395232,-1.1862063869796233,-1.2275458621727033,-1.015943066242613,-0.9591869603726186,-1.0271646414836222,-1.0972423487490628,-1.2009537143056117,-1.6191867544643002,-1.2142292576502274,-1.1890117807898752,-0.877928739044029,-1.5484397438479316,-1.1735951501129602,-1.0664140842682062,-1.256994458773849,-1.107038198059529,-0.989340924629578,-1.1371610948845723,-1.334062621539505,-1.4741719780195544,-1.2913384815937428,-1.142061517976292,-1.107753559461285,-1.0587132642392072,-1.537918539389763,-0.8989561573414478,-1.0229795797936592,-0.9585126601487219,-1.2380570728849252,-1.1602785455904863,-1.2709753697742765,-0.9704135324865995,-1.1350660656031046,-1.0860157766350822,-1.2219300776792261,-1.2422471314478605,-1.1301606456384132,-0.8709332866708434,-0.896861128059981],\"y\":[2.2878828503879265,2.302871355155058,2.1079963083050686,2.149863433823506,2.2083616314587315,2.328651988612069,1.9738154336788951,2.293917366914765,2.0544346399023867,2.379473102370799,2.4379713000060246,2.2204306645124086,2.3138424009144396,1.9330462954547536,2.474527406122486,2.296657921558717,2.1853215526456733,2.2222521489315668,2.5831232203915793,2.183500068226514,2.584221207685875,2.142730930002372,1.8463925727044244,2.2482116994580803,2.3279284914872056,2.4291962288352336,2.1984885729936456,2.378375115076504,2.3674040693171223,2.215494135279866,2.2950153542090606,2.381294586789958,2.2948364371395584,2.3324905305503165,2.379473102370799,2.2361426664044037,2.470688864184239,2.379473102370799,1.9937404676785704,2.348577022611743,2.1317598842429906,2.15680036454471,1.9440173412141353,2.0423656068487093,2.2611998027365505,2.182580998001721,2.2849633786744725,2.0891692615996895,2.3833116443090465,2.2829463211553835,3.795908677573611,3.330654823952186,3.772145101635689,3.0145703497240657,3.520593341569632,3.1787448899149293,3.2571681215498294,2.6078146937378675,3.681652836947111,2.6496818192563047,2.8335858203469124,2.9995818449569347,3.5096222958102508,3.3785564659974625,2.7767301900613433,3.549293446740098,2.9431007048407984,3.3118277772468065,3.469951144880404,3.1149356728777287,2.9679622680730158,3.218220467744846,3.5933565468471254,3.534679432142399,3.464835698578359,3.519495354275337,3.821868228100123,3.59225855955283,3.1866008908609267,3.0670340308324535,3.0493049714213694,3.07910306388613,3.108901156350891,3.385688969818596,2.833781393446843,2.9966623732434803,3.591160572258535,3.5951780312662844,2.931031671787121,2.964847223259631,3.1489467974501695,3.3178622937736453,3.169595328574707,2.6873359126670624,3.041448970475372,3.087154637932058,3.046385499707916,3.3555163871844034,2.519143913468449,3.0354144539485337,3.132315725333381,3.079477554055562,3.870867857439694,3.5478464524903712,3.441446612809869,4.394994398865776,2.553529528210322,4.345271272401341,3.9375965461903495,3.5855005459011284,3.272156626316961,3.4791007062206267,3.563558454382365,2.9730777143750613,2.7264624835415483,3.0922700842341024,3.59647159166051,4.220963456240255,4.525336731553701,3.5397948784444435,3.5088987986853875,2.808000759989832,4.6008404914751475,3.3467413160136124,3.505979326971934,4.108529100032915,3.2313874880928184,3.1628373149230047,3.430475567050488,4.21784841142687,4.28749657189098,4.354046343572132,3.3648448655941285,3.5904370751336714,3.7756346366185047,4.139229606692041,3.0297544275911275,3.516950372731315,3.072345050234428,3.5575239378555272,3.322977740075691,3.318764707968012,3.079477554055562,3.525904360971608,3.2434565211464963,3.2701395687978723,3.3666663500132867,3.3577123617729945,2.9690602553673116,3.1251832215122475]},\"selected\":{\"id\":\"b34da002-64b9-4164-af01-183236e0eb96\",\"type\":\"Selection\"},\"selection_policy\":{\"id\":\"ed04cbe1-642b-4670-b197-bdd7a6f91607\",\"type\":\"UnionRenderers\"}},\"id\":\"1a824b4d-2d12-4f58-9232-f7fa3c4f0d84\",\"type\":\"ColumnDataSource\"},{\"attributes\":{\"below\":[{\"id\":\"74860cea-ee69-411b-9087-926f8339d69b\",\"type\":\"LinearAxis\"}],\"left\":[{\"id\":\"6f563174-945e-4678-bb07-caac5ca29a7f\",\"type\":\"LinearAxis\"}],\"plot_height\":350,\"plot_width\":350,\"renderers\":[{\"id\":\"74860cea-ee69-411b-9087-926f8339d69b\",\"type\":\"LinearAxis\"},{\"id\":\"2d175b30-ea39-4755-8bf1-40a17616ec76\",\"type\":\"Grid\"},{\"id\":\"6f563174-945e-4678-bb07-caac5ca29a7f\",\"type\":\"LinearAxis\"},{\"id\":\"83507100-2154-46d7-a4ec-75b2f52fa87a\",\"type\":\"Grid\"},{\"id\":\"1e039b1d-bb43-4917-9644-8084e2698e78\",\"type\":\"BoxAnnotation\"},{\"id\":\"d4d1fe31-4aa6-4c9c-b628-2bbc3f1d556e\",\"type\":\"GlyphRenderer\"}],\"title\":{\"id\":\"d2bb90b5-61be-42c5-a69c-8c9f584b19b2\",\"type\":\"Title\"},\"toolbar\":{\"id\":\"553d2c0d-09a7-4763-8587-689d380c818d\",\"type\":\"Toolbar\"},\"x_range\":{\"id\":\"7d1008b4-5806-4bea-a68b-268e092b2ef1\",\"type\":\"DataRange1d\"},\"x_scale\":{\"id\":\"58c8719c-11a1-420c-a168-b6b38f625dc3\",\"type\":\"LinearScale\"},\"y_range\":{\"id\":\"58035f40-c3c9-42c3-a13a-d2e917f4df1a\",\"type\":\"DataRange1d\"},\"y_scale\":{\"id\":\"9fe694d0-4dc2-431e-85a6-ad9522be023a\",\"type\":\"LinearScale\"}},\"id\":\"f3670b70-835a-4b79-ab6d-66f346dea703\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{\"dimension\":1,\"plot\":{\"id\":\"f3670b70-835a-4b79-ab6d-66f346dea703\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"97209a1f-976f-44e4-837e-8a43b2cfe12f\",\"type\":\"BasicTicker\"}},\"id\":\"83507100-2154-46d7-a4ec-75b2f52fa87a\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"c87b5268-be91-4cce-80ff-5db680b24f80\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"fill_color\":{\"field\":\"fill_color\"},\"line_color\":{\"field\":\"line_color\"},\"radius\":{\"units\":\"data\",\"value\":0.01},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"04edf5f4-ac0c-4cb2-b8cb-173bbfd25fad\",\"type\":\"Circle\"},{\"attributes\":{},\"id\":\"23e74f8f-90aa-4b9a-89d8-adadc7019a15\",\"type\":\"PanTool\"},{\"attributes\":{\"plot\":{\"id\":\"f3670b70-835a-4b79-ab6d-66f346dea703\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"a34c000d-cc53-45b0-8507-b62532280788\",\"type\":\"BasicTicker\"}},\"id\":\"2d175b30-ea39-4755-8bf1-40a17616ec76\",\"type\":\"Grid\"},{\"attributes\":{\"active_drag\":\"auto\",\"active_inspect\":\"auto\",\"active_multi\":null,\"active_scroll\":\"auto\",\"active_tap\":\"auto\",\"tools\":[{\"id\":\"23e74f8f-90aa-4b9a-89d8-adadc7019a15\",\"type\":\"PanTool\"},{\"id\":\"7b428786-300e-46d0-b52d-c1126353bd31\",\"type\":\"WheelZoomTool\"},{\"id\":\"28d418db-7773-4455-87e7-db0b5fe6fcad\",\"type\":\"BoxZoomTool\"},{\"id\":\"ecbd30cd-93b5-44ad-a876-da1417a32b28\",\"type\":\"SaveTool\"},{\"id\":\"01430d7a-d29c-4cd5-b8d1-75ef376e48b4\",\"type\":\"ResetTool\"},{\"id\":\"cb87a7ab-1eb5-4219-950b-bc8903693f09\",\"type\":\"HelpTool\"}]},\"id\":\"553d2c0d-09a7-4763-8587-689d380c818d\",\"type\":\"Toolbar\"},{\"attributes\":{\"formatter\":{\"id\":\"c87b5268-be91-4cce-80ff-5db680b24f80\",\"type\":\"BasicTickFormatter\"},\"plot\":{\"id\":\"f3670b70-835a-4b79-ab6d-66f346dea703\",\"subtype\":\"Figure\",\"type\":\"Plot\"},\"ticker\":{\"id\":\"97209a1f-976f-44e4-837e-8a43b2cfe12f\",\"type\":\"BasicTicker\"}},\"id\":\"6f563174-945e-4678-bb07-caac5ca29a7f\",\"type\":\"LinearAxis\"},{\"attributes\":{},\"id\":\"ed04cbe1-642b-4670-b197-bdd7a6f91607\",\"type\":\"UnionRenderers\"},{\"attributes\":{\"callback\":null},\"id\":\"7d1008b4-5806-4bea-a68b-268e092b2ef1\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"b34da002-64b9-4164-af01-183236e0eb96\",\"type\":\"Selection\"},{\"attributes\":{},\"id\":\"ecbd30cd-93b5-44ad-a876-da1417a32b28\",\"type\":\"SaveTool\"},{\"attributes\":{\"callback\":null},\"id\":\"58035f40-c3c9-42c3-a13a-d2e917f4df1a\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"ad7146ca-d89a-4118-ae5a-3ab553871606\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{\"data_source\":{\"id\":\"1a824b4d-2d12-4f58-9232-f7fa3c4f0d84\",\"type\":\"ColumnDataSource\"},\"glyph\":{\"id\":\"04edf5f4-ac0c-4cb2-b8cb-173bbfd25fad\",\"type\":\"Circle\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"9318bd07-d64a-468f-b3f5-974b273a9697\",\"type\":\"Circle\"},\"selection_glyph\":null,\"view\":{\"id\":\"3f3da774-60f5-4e7c-b311-9fce41c8af91\",\"type\":\"CDSView\"}},\"id\":\"d4d1fe31-4aa6-4c9c-b628-2bbc3f1d556e\",\"type\":\"GlyphRenderer\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"radius\":{\"units\":\"data\",\"value\":0.01},\"x\":{\"field\":\"x\"},\"y\":{\"field\":\"y\"}},\"id\":\"9318bd07-d64a-468f-b3f5-974b273a9697\",\"type\":\"Circle\"},{\"attributes\":{\"plot\":null,\"text\":\"\"},\"id\":\"d2bb90b5-61be-42c5-a69c-8c9f584b19b2\",\"type\":\"Title\"}],\"root_ids\":[\"f3670b70-835a-4b79-ab6d-66f346dea703\"]},\"title\":\"Bokeh Application\",\"version\":\"0.13.0\"}};\n", " var render_items = [{\"docid\":\"d28e4ca1-a16f-4722-bd9e-8f0a0422130f\",\"notebook_comms_target\":\"2cec8ddc-2325-4086-bbb6-a0d3037f8a24\",\"roots\":{\"f3670b70-835a-4b79-ab6d-66f346dea703\":\"0ac06000-2cf8-426b-bd59-382c37229b65\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " clearInterval(timer);\n", " }\n", " attempts++;\n", " if (attempts > 100) {\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\")\n", " clearInterval(timer);\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ], "application/vnd.bokehjs_exec.v0+json": "" }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "f3670b70-835a-4b79-ab6d-66f346dea703" } }, "output_type": "display_data" }, { "data": { "text/html": [ "

<Bokeh Notebook handle for In[24]>

" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "colors = [\"red\", \"blue\", \"green\"]\n", "\n", "p = figure(plot_width=350, plot_height=350)\n", "p.circle(twod_x, twod_y, radius=0.010, color=[colors[x] for x in dataset.target])\n", "show(p, notebook_handle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para finalizar, vamos representar os dados em 3 dimensões. Para isso, multiplicamos cada ponto pelos autovetores com os três maiores autovalores." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "threed_x = []\n", "threed_y = []\n", "threed_z = []\n", "\n", "for point in iris_data:\n", " new_point = np.matmul([eigenvectors[0], eigenvectors[1], eigenvectors[3]], point)\n", " threed_x.append(new_point[0])\n", " threed_y.append(new_point[1])\n", " threed_z.append(new_point[2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Os pontos em 3 dimensões podem ser vistos no gráfico abaixo:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "linkText": "Export to plot.ly", "plotlyServerURL": "https://plot.ly", "showLink": false }, "data": [ { "marker": { "color": [ "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "red", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green", "green" ], "size": 5 }, "mode": "markers", "type": "scatter3d", "uid": "ef269267-3912-41b8-b108-b44aefc800f9", "x": [ -0.6166534950399016, -0.7182287701371253, -0.5774090491282895, -0.5591920215140949, -0.542390726084498, -0.5760243843756071, -0.4646121987900592, -0.6061422843276787, -0.5612870507955632, -0.6698938425708592, -0.6530925471412623, -0.5213683046600519, -0.674794265662578, -0.5052413094543534, -0.7231602606607581, -0.5283948244651534, -0.6068476519834918, -0.6215539181316214, -0.7224548930049468, -0.5066620385119227, -0.7448669760550471, -0.5472911491762178, -0.4190776681610713, -0.679694688754297, -0.49825085395413926, -0.7413512177160104, -0.6082373136091463, -0.6474817595207577, -0.690916263995305, -0.554291598422376, -0.6285543673777795, -0.7700794560424282, -0.428209210993585, -0.5206890075631813, -0.6698938425708592, -0.7007171101787423, -0.7784956374731833, -0.6698938425708592, -0.5332641801249582, -0.6446763657105063, -0.5907256536507649, -0.8267994976075355, -0.46180680497980625, -0.5823094722200084, -0.48073919399575765, -0.684595111846017, -0.4940557985182315, -0.5311691508434901, -0.6145584657584341, -0.649576788802226, -1.2605002233669391, -1.0496077919656366, -1.247183618844465, -1.0530874859997865, -1.223350806736796, -0.9129831263927077, -0.9648338122980098, -0.8253937591688846, -1.2163553543636094, -0.8071767315546904, -0.9914309570380724, -0.9515122109025627, -1.2667853112113419, -1.020879553639219, -0.908072709555043, -1.2037441174969468, -0.8127925160481662, -1.003367893680837, -1.3298265049257385, -1.0180691629559946, -0.8485212036207433, -1.1056485364338726, -1.2303512559829548, -1.0468073950283552, -1.162404642303868, -1.2009387236866946, -1.318640993989617, -1.207939172932853, -1.002657529152053, -1.0467974012824115, -1.0229695860477133, -1.0257749798579663, -1.0285803736682173, -1.0327804259770965, -0.7357243532825112, -0.8289145143808929, -1.1855270898827526, -1.330536869454523, -0.8338149374726123, -0.9816301108546345, -0.9101777325824546, -0.9928566829686145, -1.056603244338822, -0.8996565281242873, -0.9332951832883685, -0.8597427788617482, -0.900371889526045, -1.0853364795382117, -0.8947511081595954, -0.9438063940005915, -0.9087620003978596, -0.9704135324865995, -1.312324838713302, -1.0481970566540113, -1.093726590410026, -1.451054527313643, -0.7314982304146899, -1.3795971521684918, -1.3298364986716837, -1.1406768532236087, -1.066409087395232, -1.1862063869796233, -1.2275458621727033, -1.015943066242613, -0.9591869603726186, -1.0271646414836222, -1.0972423487490628, -1.2009537143056117, -1.6191867544643002, -1.2142292576502274, -1.1890117807898752, -0.877928739044029, -1.5484397438479316, -1.1735951501129602, -1.0664140842682062, -1.256994458773849, -1.107038198059529, -0.989340924629578, -1.1371610948845723, -1.334062621539505, -1.4741719780195544, -1.2913384815937428, -1.142061517976292, -1.107753559461285, -1.0587132642392072, -1.537918539389763, -0.8989561573414478, -1.0229795797936592, -0.9585126601487219, -1.2380570728849252, -1.1602785455904863, -1.2709753697742765, -0.9704135324865995, -1.1350660656031046, -1.0860157766350822, -1.2219300776792261, -1.2422471314478605, -1.1301606456384132, -0.8709332866708434, -0.896861128059981 ], "y": [ 2.2878828503879265, 2.302871355155058, 2.1079963083050686, 2.149863433823506, 2.2083616314587315, 2.328651988612069, 1.9738154336788951, 2.293917366914765, 2.0544346399023867, 2.379473102370799, 2.4379713000060246, 2.2204306645124086, 2.3138424009144396, 1.9330462954547536, 2.474527406122486, 2.296657921558717, 2.1853215526456733, 2.2222521489315668, 2.5831232203915793, 2.183500068226514, 2.584221207685875, 2.142730930002372, 1.8463925727044244, 2.2482116994580803, 2.3279284914872056, 2.4291962288352336, 2.1984885729936456, 2.378375115076504, 2.3674040693171223, 2.215494135279866, 2.2950153542090606, 2.381294586789958, 2.2948364371395584, 2.3324905305503165, 2.379473102370799, 2.2361426664044037, 2.470688864184239, 2.379473102370799, 1.9937404676785704, 2.348577022611743, 2.1317598842429906, 2.15680036454471, 1.9440173412141353, 2.0423656068487093, 2.2611998027365505, 2.182580998001721, 2.2849633786744725, 2.0891692615996895, 2.3833116443090465, 2.2829463211553835, 3.795908677573611, 3.330654823952186, 3.772145101635689, 3.0145703497240657, 3.520593341569632, 3.1787448899149293, 3.2571681215498294, 2.6078146937378675, 3.681652836947111, 2.6496818192563047, 2.8335858203469124, 2.9995818449569347, 3.5096222958102508, 3.3785564659974625, 2.7767301900613433, 3.549293446740098, 2.9431007048407984, 3.3118277772468065, 3.469951144880404, 3.1149356728777287, 2.9679622680730158, 3.218220467744846, 3.5933565468471254, 3.534679432142399, 3.464835698578359, 3.519495354275337, 3.821868228100123, 3.59225855955283, 3.1866008908609267, 3.0670340308324535, 3.0493049714213694, 3.07910306388613, 3.108901156350891, 3.385688969818596, 2.833781393446843, 2.9966623732434803, 3.591160572258535, 3.5951780312662844, 2.931031671787121, 2.964847223259631, 3.1489467974501695, 3.3178622937736453, 3.169595328574707, 2.6873359126670624, 3.041448970475372, 3.087154637932058, 3.046385499707916, 3.3555163871844034, 2.519143913468449, 3.0354144539485337, 3.132315725333381, 3.079477554055562, 3.870867857439694, 3.5478464524903712, 3.441446612809869, 4.394994398865776, 2.553529528210322, 4.345271272401341, 3.9375965461903495, 3.5855005459011284, 3.272156626316961, 3.4791007062206267, 3.563558454382365, 2.9730777143750613, 2.7264624835415483, 3.0922700842341024, 3.59647159166051, 4.220963456240255, 4.525336731553701, 3.5397948784444435, 3.5088987986853875, 2.808000759989832, 4.6008404914751475, 3.3467413160136124, 3.505979326971934, 4.108529100032915, 3.2313874880928184, 3.1628373149230047, 3.430475567050488, 4.21784841142687, 4.28749657189098, 4.354046343572132, 3.3648448655941285, 3.5904370751336714, 3.7756346366185047, 4.139229606692041, 3.0297544275911275, 3.516950372731315, 3.072345050234428, 3.5575239378555272, 3.322977740075691, 3.318764707968012, 3.079477554055562, 3.525904360971608, 3.2434565211464963, 3.2701395687978723, 3.3666663500132867, 3.3577123617729945, 2.9690602553673116, 3.1251832215122475 ], "z": [ -2.597986800474729, -2.204117499819917, -2.364096158239889, -2.202620340380499, -2.6591838680287125, -2.720460967792789, -2.4127383739531743, -2.474095505927343, -2.06759132122174, -2.2890071438259736, -2.745570671474187, -2.4114012789339405, -2.240284895902596, -2.2902642066351135, -3.129799406957043, -3.2020541669123084, -2.870799061040805, -2.5492645525513513, -2.7330958518435806, -2.7329357874233953, -2.4491458666661305, -2.6104616201053354, -2.7593025539139333, -2.191562647979218, -2.2986477089979287, -2.1415033050366077, -2.3390664867685835, -2.5729571290034237, -2.536789732920745, -2.251342588303877, -2.1901455207498928, -2.426870417443384, -3.0641908932948976, -3.1644696436003037, -2.2890071438259736, -2.4393452370739896, -2.6857907311495284, -2.2890071438259736, -2.1789277639284266, -2.486650357768042, -2.5742942240226565, -1.6264969320829687, -2.3264316027177925, -2.315373910316511, -2.5338754462520017, -2.142840400055841, -2.7440735120347686, -2.313956783087186, -2.733015819633488, -2.4379281098446643, -0.7903169828872939, -0.7414346705437316, -0.5801189171045265, -0.2500408418320719, -0.42139732149369524, -0.4559875259268629, -0.678740443550329, -0.6577220571367226, -0.6051485885758319, -0.4962462392773328, -0.3001001847746821, -0.6439101424868835, -0.3852299254110164, -0.4560675581369556, -0.8554453032888835, -0.7916540779065271, -0.49349201702877443, -0.6912952953910285, 0.021194227084493922, -0.5451285516208955, -0.41974009763409226, -0.6941295498496789, -0.0622782896922367, -0.47976013458902744, -0.6927924548304467, -0.7053473066711453, -0.4326150783151611, -0.34622827486968666, -0.46995950499688716, -0.8304956640276712, -0.4964063036975178, -0.5827130749328997, -0.6690198461682814, -0.048226278412119794, -0.4683823133473766, -0.7899968540469233, -0.630178260047137, -0.20014156330964666, -0.741274606123546, -0.3975446806214375, -0.369680754691481, -0.5674040008436427, -0.5576834034615945, -0.5965249895827385, -0.4824343246274938, -0.7649671825756179, -0.6424930152575583, -0.6676827511490486, -0.8206150022254374, -0.6063256191748791, 0.248358590816122, 0.12305016903941046, 0.13670201926906456, 0.05197243968319365, 0.22316885492463168, 0.3370194232495983, 0.060596038676287045, 0.18951558446023264, 0.37192975652313676, -0.048306310622212356, -0.2848711428955176, 0.122890104619225, 0.024028481543144853, 0.2942465842808488, 0.2929094892616162, -0.050980500660678385, -0.0844737067048913, -0.17924401251318023, 0.8296703147703977, 0.23422654732591242, 0.03658333338384345, 0.04796115462549588, 0.4608306855868918, -0.06361538471146944, -0.10950337817619604, -0.13193889181912866, -0.16239697557745758, -0.25976143921412, 0.2593362510073105, -0.15704859550052674, 0.22426585331358662, -0.4145517819773459, 0.3080584989306877, -0.20836500125227686, 0.10344890985512933, 0.2339864506956344, -0.024453669749954776, -0.1456707742588752, -0.28479111068542545, -0.0998628130042406, 0.14658268107129757, -0.11517188709349807, 0.12305016903941046, 0.12430723184855108, 0.08538561351731433, 0.021274259294586872, 0.17019522531327758, -0.09978278079414793, -0.1357901124566413, -0.1594826889087143 ] } ], "layout": { "height": 400, "margin": { "b": 10, "l": 10, "pad": 10, "r": 10, "t": 10 }, "width": 800 } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly\n", "import plotly.graph_objs as go\n", "plotly.offline.init_notebook_mode()\n", "\n", "plot = go.Scatter3d(x=threed_x, y=threed_y, z=threed_z, mode='markers',\n", " marker=dict(size=5, color=[colors[x] for x in dataset.target]))\n", "\n", "layout = go.Layout(width=800, height=400, margin=go.layout.Margin(l=10, r=10, b=10, t=10, pad=10))\n", "\n", "data = [plot]\n", "\n", "plotly.offline.iplot({'data': data, 'layout': layout})" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }