{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Network Tour of Data Science\n", "###       Xavier Bresson, Winter 2016/17\n", "## Assignment 1 : Unsupervised Clustering with the Normalized Association" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Load libraries\n", "\n", "# Math\n", "import numpy as np\n", "\n", "# Visualization \n", "%matplotlib notebook \n", "import matplotlib.pyplot as plt\n", "plt.rcParams.update({'figure.max_open_warning': 0})\n", "from mpl_toolkits.axes_grid1 import make_axes_locatable\n", "from scipy import ndimage\n", "\n", "# Print output of LFR code\n", "import subprocess\n", "\n", "# Sparse matrix\n", "import scipy.sparse\n", "import scipy.sparse.linalg\n", "\n", "# 3D visualization\n", "import pylab\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import pyplot\n", "\n", "# Import data\n", "import scipy.io\n", "\n", "# Import functions in lib folder\n", "import sys\n", "sys.path.insert(1, 'lib')\n", "\n", "# Import helper functions\n", "%load_ext autoreload\n", "%autoreload 2\n", "from lib.utils import construct_kernel\n", "from lib.utils import compute_kernel_kmeans_EM\n", "from lib.utils import compute_purity\n", "\n", "# Import distance function\n", "import sklearn.metrics.pairwise\n", "\n", "# Remove warnings\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1:** Write down the mathematical relationship between Normalized Cut (NCut) and Normalized Association (NAssoc) for K clusters. It is not necessary to provide details.\n", "\n", "The Normalized Cut problem is defined as:

\n", "$$\n", "\\min_{\\{S_k\\}}\\ NCut(\\{S_k\\}) := \\sum_{k=1}^K \\frac{Cut(S_k,S_k^c)}{Vol(S_k)} \\ \\textrm{ s.t. } \\ \\cup_{k=1}^{K} S_k = V, \\ S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k' \\quad\\quad\\quad(1)\n", "$$\n", "\n", "and the Normalized Association problem is defined as:

\n", "$$\n", "\\max_{\\{S_k\\}}\\ NAssoc(\\{S_k\\}):= \\sum_{k=1}^K \\frac{Assoc(S_k,S_k)}{Vol(S_k)} \\ \\textrm{ s.t. } \\ \\cup_{k=1}^{K} S_k = V, \\ S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k' .\n", "$$\n", "\n", "\n", "We may rewrite the Cut operator and the Volume operator with the Assoc operator as:

\n", "$$\n", "Vol(S_k) = \\sum_{i\\in S_k, j\\in V} W_{ij} \\\\\n", "Assoc(S_k,S_k) = \\sum_{i\\in S_k, j\\in S_k} W_{ij} \\\\\n", "Cut(S_k,S_k^c) = \\sum_{i\\in S_k, j\\in S_k^c=V\\setminus S_k} W_{ij} = \\sum_{i\\in S_k, j\\in V} W_{ij} - \\sum_{i\\in S_k, j\\in S_k} W_{ij} = Vol(S_k) - Assoc(S_k,S_k) \n", "$$\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer to Q1:** Your answer here.\n", "\n", "We have

\n", "$$\n", "\\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = \\frac{Vol(S_k) - Assoc(S_k,S_k)}{Vol(S_k)} = 1- \\frac{Assoc(S_k,S_k)}{Vol(S_k)}\n", "$$\n", "\n", "and

\n", "\n", "$$\n", "\\sum_{k=1}^K \\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = K - \\sum_{k=1}^K \\frac{Assoc(S_k,S_k)}{Vol(S_k)}\n", "$$\n", "\n", "The relationship between Normalized Cut (NCut) and Normalized Association (NAssoc) for K clusters is thus

\n", "\n", "$$\n", "NCut(\\{S_k\\}) = K - NAssoc(\\{S_k\\}).\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 2:** Using the relationship between NCut and NAssoc from Q1, it is therefore equivalent to maximize NAssoc by minimizing or maximizing NCut? That is \n", "\n", "$$\n", "\\max_{\\{S_k\\}}\\ NAssoc(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n", "$$\n", "\n", "$$\n", "\\Updownarrow\n", "$$\n", "\n", "$$\n", "\\min_{\\{S_k\\}}\\ NCut(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n", "$$\n", "\n", "or\n", "\n", "$$\n", "\\max_{\\{S_k\\}}\\ NCut(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n", "$$\n", "\n", "It is not necessary to provide details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer to Q2:** Your answer here.\n", "\n", "As $\\min F \\Leftrightarrow \\max -F$, we have equivalence between the max NAssoc problem:\n", "\n", "$$\n", "\\max_{\\{S_k\\}}\\ NAssoc(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n", "$$\n", "\n", "and the min NCut problem:\n", "\n", "$$\n", "\\min_{\\{S_k\\}}\\ NCut(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 3:** Solving the NCut problem in Q2 is NP-hard => let us consider a spectral relaxation of NCut. Write down the Spectral Matrix A of NCut that satisfies the equivalent functional optimization problem of Q2: \n", "\n", "$$\n", "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K \\textrm{ and } Y \\in Ind_S, \\quad\\quad\\quad(3)\n", "$$\n", "\n", "where\n", "\n", "$$\n", "Y \\in Ind_S \\ \\textrm{ reads as } \\ Y_{ik} = \n", "\\left\\{\n", "\\begin{array}{ll}\n", "\\big(\\frac{D_{ii}}{Vol(S_k)}\\big)^{1/2} & \\textrm{if} \\ i \\in S_k\\\\\n", "0 & \\textrm{otherwise}\n", "\\end{array}\n", "\\right..\n", "$$\n", "\n", "and\n", "\n", "$$\n", "A=???\n", "$$\n", "\n", "It is not necessary to provide details.\n", "\n", "*Hint:* Let us introduce the indicator matrix $F$ of the clusters $S_k$ such that:\n", "\n", "$$\n", "F_{ik} = \n", "\\left\\{\n", "\\begin{array}{ll}\n", "1 & \\textrm{if} \\ i \\in S_k\\\\\n", "0 & \\textrm{otherwise}\n", "\\end{array}\n", "\\right..\n", "$$\n", "\n", "We may rewrite the Cut operator and the Volume operator with $F$ as:\n", "\n", "$$\n", "Vol(S_k) = \\sum_{i\\in S_k, j\\in V} W_{ij} = F_{\\cdot,k}^\\top D F_{\\cdot,k}\\\\\n", "Cut(S_k,S_k^c) = \\sum_{i\\in S_k, j\\in V} W_{ij} - \\sum_{i\\in S_k, j\\in S_k} W_{ij} = F_{\\cdot,k}^\\top D F_{\\cdot,k} - F_{\\cdot,k}^\\top W F_{\\cdot,k} = F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} \\quad\n", "$$\n", "\n", "We thus have\n", "\n", "$$\n", "\\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = \\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } \n", "$$\n", "\n", "\n", "Set $\\hat{F}_{\\cdot,k}=D^{1/2}F_{\\cdot,k}$ and observe that\n", "\n", "$$\n", "\\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top D^{-1/2}(D - W)D^{-1/2} \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top \\hat{F}_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top (I - D^{-1/2}WD^{-1/2}) \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top \\hat{F}_{\\cdot,k} } ,\n", "$$\n", "\n", "with $L_N=I - D^{-1/2}WD^{-1/2}$ is the normalized graph Laplacian. Set $Y_{\\cdot,k}=\\frac{\\hat{F}_{\\cdot,k}}{\\|\\hat{F}_{\\cdot,k}\\|_2}$:\n", "\n", "$$\n", "\\frac{ \\hat{F}_{\\cdot,k}^\\top L_N \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top \\hat{F}_{\\cdot,k} } = Y_{\\cdot,k}^\\top L_N Y_{\\cdot,k} \\quad\\quad\\quad(2)\n", "$$\n", "\n", "\n", "Using (2), we can rewrite (1) as a functional optimization problem:\n", "\n", "$$\n", "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K \\textrm{ and } Y \\in Ind_S,\n", "$$\n", "\n", "where\n", "\n", "\n", "$$\n", "Y \\in Ind_S \\ \\textrm{ reads as } \\ Y_{ik} = \n", "\\left\\{\n", "\\begin{array}{ll}\n", "\\big(\\frac{D_{ii}}{Vol(S_k)}\\big)^{1/2} & \\textrm{if} \\ i \\in S_k\\\\\n", "0 & \\textrm{otherwise}\n", "\\end{array}\n", "\\right..\n", "$$\n", "\n", "and\n", "\n", "$$\n", "A=???\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer to Q3:** Let us introduce the indicator matrix $F$ of the clusters $S_k$ such that:\n", "\n", "$$\n", "F_{ik} = \n", "\\left\\{\n", "\\begin{array}{ll}\n", "1 & \\textrm{if} \\ i \\in S_k\\\\\n", "0 & \\textrm{otherwise}\n", "\\end{array}\n", "\\right..\n", "$$\n", "\n", "We may rewrite the Cut operator and the Volume operator with $F$ as:\n", "\n", "$$\n", "Vol(S_k) = \\sum_{i\\in S_k, j\\in V} W_{ij} = F_{\\cdot,k}^\\top D F_{\\cdot,k}\\\\\n", "Cut(S_k,S_k^c) = \\sum_{i\\in S_k, j\\in V} W_{ij} - \\sum_{i\\in S_k, j\\in S_k} W_{ij} = F_{\\cdot,k}^\\top D F_{\\cdot,k} - F_{\\cdot,k}^\\top W F_{\\cdot,k} = F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} \\quad\n", "$$\n", "\n", "We thus have\n", "\n", "$$\n", "\\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = \\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } \n", "$$\n", "\n", "\n", "Set $\\hat{F}_{\\cdot,k}=D^{1/2}F_{\\cdot,k}$ and observe that\n", "\n", "$$\n", "\\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top D^{-1/2}(D - W)D^{-1/2} \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top \\hat{F}_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top (I - D^{-1/2}WD^{-1/2}) \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top \\hat{F}_{\\cdot,k} } ,\n", "$$\n", "\n", "where $L_N=I - D^{-1/2}WD^{-1/2}$ is the normalized graph Laplacian. Set $Y_{\\cdot,k}=\\frac{\\hat{F}_{\\cdot,k}}{\\|\\hat{F}_{\\cdot,k}\\|_2}$, we have:\n", "\n", "$$\n", "\\frac{ \\hat{F}_{\\cdot,k}^\\top L_N \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top \\hat{F}_{\\cdot,k} } = Y_{\\cdot,k}^\\top L_N Y_{\\cdot,k} \\quad\\quad\\quad(2)\n", "$$\n", "\n", "\n", "Using (2), we can rewrite (1) as a functional optimization problem:\n", "\n", "$$\n", "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K \\textrm{ and } Y \\in Ind_S,\n", "$$\n", "\n", "where\n", "\n", "$$\n", "Y \\in Ind_S \\ \\textrm{ reads as } \\ Y_{ik} = \n", "\\left\\{\n", "\\begin{array}{ll}\n", "\\big(\\frac{D_{ii}}{Vol(S_k)}\\big)^{1/2} & \\textrm{if} \\ i \\in S_k\\\\\n", "0 & \\textrm{otherwise}\n", "\\end{array}\n", "\\right..\n", "$$\n", "\n", "and\n", "\n", "$$\n", "A=L_N=I-D^{-1/2}WD^{-1/2}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 4:** Drop the cluster indicator constraint $Y\\in Ind_S$ in Q3, how do you compute the solution $Y^\\star$ of (3)? Why the first column of $Y^\\star$ is not relevant for clustering?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer to Q4:** Your answer here.\n", "\n", "Dropping the constraint $Y\\in Ind_S$ in (3) leads to a standard spectral relaxation problem:\n", "\n", "$$\n", "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K,\n", "$$\n", "\n", "which solution $Y^\\star$ is given by the $K$ smallest eigenvectors/eigenvalues of $A=L_N=I-D^{-1/2}WD^{-1/2}$. Note that the first column of $Y^\\star$ is the constant signal $y_1=\\frac{1}{\\sqrt{n}}1_{n\\times 1}$ associated to the smallest eigenvalue of $L_N$, which has value $\\lambda_1=0$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 5:** Plot in 3D the 2nd, 3rd, 4th columns of $Y^\\star$.
\n", "Hint: Compute the degree matrix $D$.
\n", "Hint: You may use function *D_sqrt_inv = scipy.sparse.diags(d_sqrt_inv.A.squeeze(), 0)* for creating $D^{-1/2}$.
\n", "Hint: You may use function *I = scipy.sparse.identity(d.size, dtype=W.dtype)* for creating a sparse identity matrix.
\n", "Hint: You may use function *lamb, U = scipy.sparse.linalg.eigsh(A, k=4, which='SM')* to perform the eigenvalue decomposition of A.
\n", "Hint: You may use function *ax.scatter(Xdisp, Ydisp, Zdisp, c=Cgt)* for 3D visualization." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of nodes = 2000\n", "Number of classes = 10\n" ] } ], "source": [ "# Load dataset: W is the Adjacency Matrix and Cgt is the ground truth clusters\n", "mat = scipy.io.loadmat('datasets/mnist_2000_graph.mat')\n", "W = mat['W']\n", "n = W.shape[0]\n", "Cgt = mat['Cgt'] - 1; Cgt = Cgt.squeeze()\n", "nc = len(np.unique(Cgt))\n", "print('Number of nodes =',n)\n", "print('Number of classes =',nc);" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.34828146e-05 2.81922200e-02 3.81623410e-02 5.08202654e-02]\n" ] }, { "data": { "application/javascript": [ "/* Put everything inside the global mpl namespace */\n", "window.mpl = {};\n", "\n", "mpl.get_websocket_type = function() {\n", " if (typeof(WebSocket) !== 'undefined') {\n", " return WebSocket;\n", " } else if (typeof(MozWebSocket) !== 'undefined') {\n", " return MozWebSocket;\n", " } else {\n", " alert('Your browser does not have WebSocket support.' +\n", " 'Please try Chrome, Safari or Firefox ≥ 6. ' +\n", " 'Firefox 4 and 5 are also supported but you ' +\n", " 'have to enable WebSockets in about:config.');\n", " };\n", "}\n", "\n", "mpl.figure = function(figure_id, websocket, ondownload, parent_element) {\n", " this.id = figure_id;\n", "\n", " this.ws = websocket;\n", "\n", " this.supports_binary = (this.ws.binaryType != undefined);\n", "\n", " if (!this.supports_binary) {\n", " var warnings = document.getElementById(\"mpl-warnings\");\n", " if (warnings) {\n", " warnings.style.display = 'block';\n", " warnings.textContent = (\n", " \"This browser does not support binary websocket messages. \" +\n", " \"Performance may be slow.\");\n", " }\n", " }\n", "\n", " this.imageObj = new Image();\n", "\n", " this.context = undefined;\n", " this.message = undefined;\n", " this.canvas = undefined;\n", " this.rubberband_canvas = undefined;\n", " this.rubberband_context = undefined;\n", " this.format_dropdown = undefined;\n", "\n", " this.image_mode = 'full';\n", "\n", " this.root = $('
');\n", " this._root_extra_style(this.root)\n", " this.root.attr('style', 'display: inline-block');\n", "\n", " $(parent_element).append(this.root);\n", "\n", " this._init_header(this);\n", " this._init_canvas(this);\n", " this._init_toolbar(this);\n", "\n", " var fig = this;\n", "\n", " this.waiting = false;\n", "\n", " this.ws.onopen = function () {\n", " fig.send_message(\"supports_binary\", {value: fig.supports_binary});\n", " fig.send_message(\"send_image_mode\", {});\n", " fig.send_message(\"refresh\", {});\n", " }\n", "\n", " this.imageObj.onload = function() {\n", " if (fig.image_mode == 'full') {\n", " // Full images could contain transparency (where diff images\n", " // almost always do), so we need to clear the canvas so that\n", " // there is no ghosting.\n", " fig.context.clearRect(0, 0, fig.canvas.width, fig.canvas.height);\n", " }\n", " fig.context.drawImage(fig.imageObj, 0, 0);\n", " };\n", "\n", " this.imageObj.onunload = function() {\n", " this.ws.close();\n", " }\n", "\n", " this.ws.onmessage = this._make_on_message_function(this);\n", "\n", " this.ondownload = ondownload;\n", "}\n", "\n", "mpl.figure.prototype._init_header = function() {\n", " var titlebar = $(\n", " '
');\n", " var titletext = $(\n", " '
');\n", " titlebar.append(titletext)\n", " this.root.append(titlebar);\n", " this.header = titletext[0];\n", "}\n", "\n", "\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "\n", "mpl.figure.prototype._root_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "mpl.figure.prototype._init_canvas = function() {\n", " var fig = this;\n", "\n", " var canvas_div = $('
');\n", "\n", " canvas_div.attr('style', 'position: relative; clear: both; outline: 0');\n", "\n", " function canvas_keyboard_event(event) {\n", " return fig.key_event(event, event['data']);\n", " }\n", "\n", " canvas_div.keydown('key_press', canvas_keyboard_event);\n", " canvas_div.keyup('key_release', canvas_keyboard_event);\n", " this.canvas_div = canvas_div\n", " this._canvas_extra_style(canvas_div)\n", " this.root.append(canvas_div);\n", "\n", " var canvas = $('');\n", " canvas.addClass('mpl-canvas');\n", " canvas.attr('style', \"left: 0; top: 0; z-index: 0; outline: 0\")\n", "\n", " this.canvas = canvas[0];\n", " this.context = canvas[0].getContext(\"2d\");\n", "\n", " var rubberband = $('');\n", " rubberband.attr('style', \"position: absolute; left: 0; top: 0; z-index: 1;\")\n", "\n", " var pass_mouse_events = true;\n", "\n", " canvas_div.resizable({\n", " start: function(event, ui) {\n", " pass_mouse_events = false;\n", " },\n", " resize: function(event, ui) {\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " stop: function(event, ui) {\n", " pass_mouse_events = true;\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " });\n", "\n", " function mouse_event_fn(event) {\n", " if (pass_mouse_events)\n", " return fig.mouse_event(event, event['data']);\n", " }\n", "\n", " rubberband.mousedown('button_press', mouse_event_fn);\n", " rubberband.mouseup('button_release', mouse_event_fn);\n", " // Throttle sequential mouse events to 1 every 20ms.\n", " rubberband.mousemove('motion_notify', mouse_event_fn);\n", "\n", " rubberband.mouseenter('figure_enter', mouse_event_fn);\n", " rubberband.mouseleave('figure_leave', mouse_event_fn);\n", "\n", " canvas_div.on(\"wheel\", function (event) {\n", " event = event.originalEvent;\n", " event['data'] = 'scroll'\n", " if (event.deltaY < 0) {\n", " event.step = 1;\n", " } else {\n", " event.step = -1;\n", " }\n", " mouse_event_fn(event);\n", " });\n", "\n", " canvas_div.append(canvas);\n", " canvas_div.append(rubberband);\n", "\n", " this.rubberband = rubberband;\n", " this.rubberband_canvas = rubberband[0];\n", " this.rubberband_context = rubberband[0].getContext(\"2d\");\n", " this.rubberband_context.strokeStyle = \"#000000\";\n", "\n", " this._resize_canvas = function(width, height) {\n", " // Keep the size of the canvas, canvas container, and rubber band\n", " // canvas in synch.\n", " canvas_div.css('width', width)\n", " canvas_div.css('height', height)\n", "\n", " canvas.attr('width', width);\n", " canvas.attr('height', height);\n", "\n", " rubberband.attr('width', width);\n", " rubberband.attr('height', height);\n", " }\n", "\n", " // Set the figure to an initial 600x600px, this will subsequently be updated\n", " // upon first draw.\n", " this._resize_canvas(600, 600);\n", "\n", " // Disable right mouse context menu.\n", " $(this.rubberband_canvas).bind(\"contextmenu\",function(e){\n", " return false;\n", " });\n", "\n", " function set_focus () {\n", " canvas.focus();\n", " canvas_div.focus();\n", " }\n", "\n", " window.setTimeout(set_focus, 100);\n", "}\n", "\n", "mpl.figure.prototype._init_toolbar = function() {\n", " var fig = this;\n", "\n", " var nav_element = $('
')\n", " nav_element.attr('style', 'width: 100%');\n", " this.root.append(nav_element);\n", "\n", " // Define a callback function for later on.\n", " function toolbar_event(event) {\n", " return fig.toolbar_button_onclick(event['data']);\n", " }\n", " function toolbar_mouse_event(event) {\n", " return fig.toolbar_button_onmouseover(event['data']);\n", " }\n", "\n", " for(var toolbar_ind in mpl.toolbar_items) {\n", " var name = mpl.toolbar_items[toolbar_ind][0];\n", " var tooltip = mpl.toolbar_items[toolbar_ind][1];\n", " var image = mpl.toolbar_items[toolbar_ind][2];\n", " var method_name = mpl.toolbar_items[toolbar_ind][3];\n", "\n", " if (!name) {\n", " // put a spacer in here.\n", " continue;\n", " }\n", " var button = $('');\n", " button.click(method_name, toolbar_event);\n", " button.mouseover(tooltip, toolbar_mouse_event);\n", " nav_element.append(button);\n", " }\n", "\n", " // Add the status bar.\n", " var status_bar = $('');\n", " nav_element.append(status_bar);\n", " this.message = status_bar[0];\n", "\n", " // Add the close button to the window.\n", " var buttongrp = $('
');\n", " var button = $('');\n", " button.click(function (evt) { fig.handle_close(fig, {}); } );\n", " button.mouseover('Stop Interaction', toolbar_mouse_event);\n", " buttongrp.append(button);\n", " var titlebar = this.root.find($('.ui-dialog-titlebar'));\n", " titlebar.prepend(buttongrp);\n", "}\n", "\n", "mpl.figure.prototype._root_extra_style = function(el){\n", " var fig = this\n", " el.on(\"remove\", function(){\n", "\tfig.close_ws(fig, {});\n", " });\n", "}\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(el){\n", " // this is important to make the div 'focusable\n", " el.attr('tabindex', 0)\n", " // reach out to IPython and tell the keyboard manager to turn it's self\n", " // off when our div gets focus\n", "\n", " // location in version 3\n", " if (IPython.notebook.keyboard_manager) {\n", " IPython.notebook.keyboard_manager.register_events(el);\n", " }\n", " else {\n", " // location in version 2\n", " IPython.keyboard_manager.register_events(el);\n", " }\n", "\n", "}\n", "\n", "mpl.figure.prototype._key_event_extra = function(event, name) {\n", " var manager = IPython.notebook.keyboard_manager;\n", " if (!manager)\n", " manager = IPython.keyboard_manager;\n", "\n", " // Check for shift+enter\n", " if (event.shiftKey && event.which == 13) {\n", " this.canvas_div.blur();\n", " // select the cell after this one\n", " var index = IPython.notebook.find_cell_index(this.cell_info[0]);\n", " IPython.notebook.select(index + 1);\n", " }\n", "}\n", "\n", "mpl.figure.prototype.handle_save = function(fig, msg) {\n", " fig.ondownload(fig, null);\n", "}\n", "\n", "\n", "mpl.find_output_cell = function(html_output) {\n", " // Return the cell and output element which can be found *uniquely* in the notebook.\n", " // Note - this is a bit hacky, but it is done because the \"notebook_saving.Notebook\"\n", " // IPython event is triggered only after the cells have been serialised, which for\n", " // our purposes (turning an active figure into a static one), is too late.\n", " var cells = IPython.notebook.get_cells();\n", " var ncells = cells.length;\n", " for (var i=0; i= 3 moved mimebundle to data attribute of output\n", " data = data.data;\n", " }\n", " if (data['text/html'] == html_output) {\n", " return [cell, data, j];\n", " }\n", " }\n", " }\n", " }\n", "}\n", "\n", "// Register the function which deals with the matplotlib target/channel.\n", "// The kernel may be null if the page has been refreshed.\n", "if (IPython.notebook.kernel != null) {\n", " IPython.notebook.kernel.comm_manager.register_target('matplotlib', mpl.mpl_figure_comm);\n", "}\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Your code here\n", "\n", "# Construct Spectal Matrix A\n", "d = W.sum(axis=0) + 1e-6 # degree vector\n", "d = 1.0 / np.sqrt(d)\n", "Dinv = scipy.sparse.diags(d.A.squeeze(), 0)\n", "I = scipy.sparse.identity(d.size, dtype=W.dtype)\n", "A = I - Dinv* (W* Dinv)\n", "\n", "# Compute K smallest eigenvectors/eigenvalues of A\n", "lamb, U = scipy.sparse.linalg.eigsh(A, k=4, which='SM')\n", "\n", "# Sort eigenvalue from smallest to largest values\n", "idx = lamb.argsort() # increasing order\n", "lamb, U = lamb[idx], U[:,idx]\n", "print(lamb)\n", "\n", "# Y*\n", "Y = U\n", "\n", "# Plot in 3D the 2nd, 3rd, 4th columns of Y*\n", "Xdisp = Y[:,1]\n", "Ydisp = Y[:,2]\n", "Zdisp = Y[:,3]\n", "\n", "# 2D Visualization\n", "plt.figure(14)\n", "size_vertex_plot = 10\n", "plt.scatter(Xdisp, Ydisp, s=size_vertex_plot*np.ones(n), c=Cgt)\n", "plt.title('2D Visualization')\n", "plt.show()\n", "\n", "# 3D Visualization\n", "fig = pylab.figure(15)\n", "ax = Axes3D(fig)\n", "ax.scatter(Xdisp, Ydisp, Zdisp, c=Cgt)\n", "pylab.title('3D Visualization')\n", "pyplot.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 6:** Solve the unsupervised clustering problem for MNIST following the popular technique of [Ng, Jordan, Weiss, “On Spectral Clustering: Analysis and an algorithm”, 2002], i.e.
\n", "(1) Compute $Y^\\star$? solution of Q4.
\n", "(2) Normalize the rows of $Y^\\star$? with the L2-norm.
\n", "Hint: You may use function X = ( X.T / np.sqrt(np.sum(X**2,axis=1)+1e-10) ).T for the L2-normalization of the rows of X.
\n", "(3) Run standard K-Means on normalized $Y^\\star$? to get the clusters, and compute the clustering accuracy. You should get more than 50% accuracy. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Your code here\n", "# Normalize the rows of Y* with the L2 norm, i.e. ||y_i||_2 = 1\n", "Y = ( Y.T / np.sqrt(np.sum(Y**2,axis=1)+1e-10) ).T " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Construct Linear Kernel\n", "accuracy standard kmeans= 55.65\n" ] } ], "source": [ "# Your code here\n", "# Run standard K-Means\n", "Ker = construct_kernel(Y,'linear') # Compute linear Kernel for standard K-Means\n", "Theta = np.ones(n) # Equal weight for each data\n", "[C_kmeans,En_kmeans] = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n", "acc= compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy standard kmeans=',acc)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }