{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Network Tour of Data Science\n",
    "### &nbsp; &nbsp; &nbsp; Xavier Bresson, Winter 2016/17\n",
    "## Assignment 1 : Unsupervised Clustering with the Normalized Association"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Load libraries\n",
    "\n",
    "# Math\n",
    "import numpy as np\n",
    "\n",
    "# Visualization \n",
    "%matplotlib notebook \n",
    "import matplotlib.pyplot as plt\n",
    "plt.rcParams.update({'figure.max_open_warning': 0})\n",
    "from mpl_toolkits.axes_grid1 import make_axes_locatable\n",
    "from scipy import ndimage\n",
    "\n",
    "# Print output of LFR code\n",
    "import subprocess\n",
    "\n",
    "# Sparse matrix\n",
    "import scipy.sparse\n",
    "import scipy.sparse.linalg\n",
    "\n",
    "# 3D visualization\n",
    "import pylab\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "from matplotlib import pyplot\n",
    "\n",
    "# Import data\n",
    "import scipy.io\n",
    "\n",
    "# Import functions in lib folder\n",
    "import sys\n",
    "sys.path.insert(1, 'lib')\n",
    "\n",
    "# Import helper functions\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "from lib.utils import construct_kernel\n",
    "from lib.utils import compute_kernel_kmeans_EM\n",
    "from lib.utils import compute_purity\n",
    "\n",
    "# Import distance function\n",
    "import sklearn.metrics.pairwise\n",
    "\n",
    "# Remove warnings\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 1:** Write down the mathematical relationship between Normalized Cut (NCut) and Normalized Association (NAssoc) for K clusters. It is not necessary to provide details.\n",
    "\n",
    "The Normalized Cut problem is defined as:<br><br>\n",
    "$$\n",
    "\\min_{\\{S_k\\}}\\   NCut(\\{S_k\\}) := \\sum_{k=1}^K \\frac{Cut(S_k,S_k^c)}{Vol(S_k)} \\ \\textrm{ s.t. } \\ \\cup_{k=1}^{K} S_k = V, \\ S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k' \\quad\\quad\\quad(1)\n",
    "$$\n",
    "\n",
    "and the Normalized Association problem is defined as:<br><br>\n",
    "$$\n",
    "\\max_{\\{S_k\\}}\\   NAssoc(\\{S_k\\}):= \\sum_{k=1}^K \\frac{Assoc(S_k,S_k)}{Vol(S_k)} \\ \\textrm{ s.t. } \\ \\cup_{k=1}^{K} S_k = V, \\ S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k' .\n",
    "$$\n",
    "\n",
    "\n",
    "We may rewrite the Cut operator and the Volume operator with the Assoc operator as:<br><br>\n",
    "$$\n",
    "Vol(S_k) = \\sum_{i\\in S_k, j\\in V} W_{ij} \\\\\n",
    "Assoc(S_k,S_k) = \\sum_{i\\in S_k, j\\in S_k} W_{ij} \\\\\n",
    "Cut(S_k,S_k^c) = \\sum_{i\\in S_k, j\\in S_k^c=V\\setminus S_k} W_{ij} = \\sum_{i\\in S_k, j\\in V} W_{ij} - \\sum_{i\\in S_k, j\\in S_k} W_{ij}  =  Vol(S_k) - Assoc(S_k,S_k) \n",
    "$$\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer to Q1:** Your answer here.\n",
    "\n",
    "We  have<br><br>\n",
    "$$\n",
    "\\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = \\frac{Vol(S_k) - Assoc(S_k,S_k)}{Vol(S_k)} = 1- \\frac{Assoc(S_k,S_k)}{Vol(S_k)}\n",
    "$$\n",
    "\n",
    "and<br><br>\n",
    "\n",
    "$$\n",
    "\\sum_{k=1}^K \\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = K - \\sum_{k=1}^K \\frac{Assoc(S_k,S_k)}{Vol(S_k)}\n",
    "$$\n",
    "\n",
    "The relationship between Normalized Cut (NCut) and Normalized Association (NAssoc) for K clusters is thus<br><br>\n",
    "\n",
    "$$\n",
    "NCut(\\{S_k\\}) = K - NAssoc(\\{S_k\\}).\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 2:** Using the relationship between NCut and NAssoc from Q1, it is therefore equivalent to maximize NAssoc by minimizing or maximizing NCut? That is \n",
    "\n",
    "$$\n",
    "\\max_{\\{S_k\\}}\\  NAssoc(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\Updownarrow\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\min_{\\{S_k\\}}\\  NCut(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n",
    "$$\n",
    "\n",
    "or\n",
    "\n",
    "$$\n",
    "\\max_{\\{S_k\\}}\\  NCut(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n",
    "$$\n",
    "\n",
    "It is not necessary to provide details."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer to Q2:** Your answer here.\n",
    "\n",
    "As $\\min F \\Leftrightarrow \\max -F$, we have equivalence between the max NAssoc problem:\n",
    "\n",
    "$$\n",
    "\\max_{\\{S_k\\}}\\  NAssoc(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n",
    "$$\n",
    "\n",
    "and the min NCut problem:\n",
    "\n",
    "$$\n",
    "\\min_{\\{S_k\\}}\\  NCut(\\{S_k\\}) \\ \\textrm{ s.t. } \\cup_{k=1}^{K} S_k = V, \\quad S_k \\cap S_{k'}=\\emptyset, \\ \\forall k \\not= k'\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 3:** Solving the NCut problem in Q2 is NP-hard => let us consider a spectral relaxation of NCut. Write down the Spectral Matrix A of NCut that satisfies the equivalent functional optimization problem of Q2: \n",
    "\n",
    "$$\n",
    "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K \\textrm{ and } Y \\in Ind_S, \\quad\\quad\\quad(3)\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "$$\n",
    "Y \\in Ind_S \\ \\textrm{ reads as } \\ Y_{ik} = \n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    "\\big(\\frac{D_{ii}}{Vol(S_k)}\\big)^{1/2} & \\textrm{if}  \\ i \\in S_k\\\\\n",
    "0 & \\textrm{otherwise}\n",
    "\\end{array}\n",
    "\\right..\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "A=???\n",
    "$$\n",
    "\n",
    "It is not necessary to provide details.\n",
    "\n",
    "*Hint:* Let us introduce the indicator matrix $F$ of the clusters $S_k$ such that:\n",
    "\n",
    "$$\n",
    "F_{ik} = \n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    "1 & \\textrm{if}  \\ i \\in S_k\\\\\n",
    "0 & \\textrm{otherwise}\n",
    "\\end{array}\n",
    "\\right..\n",
    "$$\n",
    "\n",
    "We may rewrite the Cut operator and the Volume operator with $F$ as:\n",
    "\n",
    "$$\n",
    "Vol(S_k) = \\sum_{i\\in S_k, j\\in V} W_{ij} = F_{\\cdot,k}^\\top D F_{\\cdot,k}\\\\\n",
    "Cut(S_k,S_k^c) = \\sum_{i\\in S_k, j\\in V} W_{ij} - \\sum_{i\\in S_k, j\\in S_k} W_{ij}   = F_{\\cdot,k}^\\top D F_{\\cdot,k} - F_{\\cdot,k}^\\top W F_{\\cdot,k} = F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} \\quad\n",
    "$$\n",
    "\n",
    "We thus have\n",
    "\n",
    "$$\n",
    "\\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = \\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } \n",
    "$$\n",
    "\n",
    "\n",
    "Set $\\hat{F}_{\\cdot,k}=D^{1/2}F_{\\cdot,k}$ and observe that\n",
    "\n",
    "$$\n",
    "\\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top D^{-1/2}(D - W)D^{-1/2} \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top  \\hat{F}_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top (I - D^{-1/2}WD^{-1/2}) \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top  \\hat{F}_{\\cdot,k} } ,\n",
    "$$\n",
    "\n",
    "with $L_N=I - D^{-1/2}WD^{-1/2}$ is the normalized graph Laplacian. Set $Y_{\\cdot,k}=\\frac{\\hat{F}_{\\cdot,k}}{\\|\\hat{F}_{\\cdot,k}\\|_2}$:\n",
    "\n",
    "$$\n",
    "\\frac{ \\hat{F}_{\\cdot,k}^\\top L_N \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top  \\hat{F}_{\\cdot,k} } = Y_{\\cdot,k}^\\top L_N Y_{\\cdot,k} \\quad\\quad\\quad(2)\n",
    "$$\n",
    "\n",
    "\n",
    "Using (2), we can rewrite (1) as a functional optimization problem:\n",
    "\n",
    "$$\n",
    "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K \\textrm{ and } Y \\in Ind_S,\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "\n",
    "$$\n",
    "Y \\in Ind_S \\ \\textrm{ reads as } \\ Y_{ik} = \n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    "\\big(\\frac{D_{ii}}{Vol(S_k)}\\big)^{1/2} & \\textrm{if}  \\ i \\in S_k\\\\\n",
    "0 & \\textrm{otherwise}\n",
    "\\end{array}\n",
    "\\right..\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "A=???\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer to Q3:** Let us introduce the indicator matrix $F$ of the clusters $S_k$ such that:\n",
    "\n",
    "$$\n",
    "F_{ik} = \n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    "1 & \\textrm{if}  \\ i \\in S_k\\\\\n",
    "0 & \\textrm{otherwise}\n",
    "\\end{array}\n",
    "\\right..\n",
    "$$\n",
    "\n",
    "We may rewrite the Cut operator and the Volume operator with $F$ as:\n",
    "\n",
    "$$\n",
    "Vol(S_k) = \\sum_{i\\in S_k, j\\in V} W_{ij} = F_{\\cdot,k}^\\top D F_{\\cdot,k}\\\\\n",
    "Cut(S_k,S_k^c) = \\sum_{i\\in S_k, j\\in V} W_{ij} - \\sum_{i\\in S_k, j\\in S_k} W_{ij}   = F_{\\cdot,k}^\\top D F_{\\cdot,k} - F_{\\cdot,k}^\\top W F_{\\cdot,k} = F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} \\quad\n",
    "$$\n",
    "\n",
    "We thus have\n",
    "\n",
    "$$\n",
    "\\frac{Cut(S_k,S_k^c)}{Vol(S_k)} = \\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } \n",
    "$$\n",
    "\n",
    "\n",
    "Set $\\hat{F}_{\\cdot,k}=D^{1/2}F_{\\cdot,k}$ and observe that\n",
    "\n",
    "$$\n",
    "\\frac{ F_{\\cdot,k}^\\top (D - W) F_{\\cdot,k} }{ F_{\\cdot,k}^\\top D F_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top D^{-1/2}(D - W)D^{-1/2} \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top  \\hat{F}_{\\cdot,k} } = \\frac{ \\hat{F}_{\\cdot,k}^\\top (I - D^{-1/2}WD^{-1/2}) \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top  \\hat{F}_{\\cdot,k} } ,\n",
    "$$\n",
    "\n",
    "where $L_N=I - D^{-1/2}WD^{-1/2}$ is the normalized graph Laplacian. Set $Y_{\\cdot,k}=\\frac{\\hat{F}_{\\cdot,k}}{\\|\\hat{F}_{\\cdot,k}\\|_2}$, we have:\n",
    "\n",
    "$$\n",
    "\\frac{ \\hat{F}_{\\cdot,k}^\\top L_N \\hat{F}_{\\cdot,k} }{ \\hat{F}_{\\cdot,k}^\\top  \\hat{F}_{\\cdot,k} } = Y_{\\cdot,k}^\\top L_N Y_{\\cdot,k} \\quad\\quad\\quad(2)\n",
    "$$\n",
    "\n",
    "\n",
    "Using (2), we can rewrite (1) as a functional optimization problem:\n",
    "\n",
    "$$\n",
    "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K \\textrm{ and } Y \\in Ind_S,\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "$$\n",
    "Y \\in Ind_S \\ \\textrm{ reads as } \\ Y_{ik} = \n",
    "\\left\\{\n",
    "\\begin{array}{ll}\n",
    "\\big(\\frac{D_{ii}}{Vol(S_k)}\\big)^{1/2} & \\textrm{if}  \\ i \\in S_k\\\\\n",
    "0 & \\textrm{otherwise}\n",
    "\\end{array}\n",
    "\\right..\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "A=L_N=I-D^{-1/2}WD^{-1/2}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 4:** Drop the cluster indicator constraint $Y\\in Ind_S$ in Q3, how do you compute the solution $Y^\\star$ of (3)? Why the first column of $Y^\\star$ is not relevant for clustering?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Answer to Q4:** Your answer here.\n",
    "\n",
    "Dropping the constraint $Y\\in Ind_S$ in (3) leads to a standard spectral relaxation problem:\n",
    "\n",
    "$$\n",
    "\\min_{Y}\\ tr( Y^\\top A Y) \\ \\textrm{ s.t. } \\ Y^\\top Y = I_K,\n",
    "$$\n",
    "\n",
    "which solution $Y^\\star$ is given by the $K$ smallest eigenvectors/eigenvalues of $A=L_N=I-D^{-1/2}WD^{-1/2}$. Note that the first column of $Y^\\star$ is the constant signal $y_1=\\frac{1}{\\sqrt{n}}1_{n\\times 1}$ associated to the smallest eigenvalue of $L_N$, which has value $\\lambda_1=0$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 5:** Plot in 3D the 2nd, 3rd, 4th columns of $Y^\\star$. <br>\n",
    "Hint: Compute the degree matrix $D$.<br>\n",
    "Hint: You may use function *D_sqrt_inv = scipy.sparse.diags(d_sqrt_inv.A.squeeze(), 0)* for creating $D^{-1/2}$.<br>\n",
    "Hint: You may use function *I = scipy.sparse.identity(d.size, dtype=W.dtype)* for creating a sparse identity matrix.<br>\n",
    "Hint: You may use function *lamb, U = scipy.sparse.linalg.eigsh(A, k=4, which='SM')* to perform the eigenvalue decomposition of A.<br> \n",
    "Hint: You may use function *ax.scatter(Xdisp, Ydisp, Zdisp, c=Cgt)* for 3D visualization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Load dataset: W is the Adjacency Matrix and Cgt is the ground truth clusters\n",
    "mat = scipy.io.loadmat('datasets/mnist_2000_graph.mat')\n",
    "W = mat['W']\n",
    "n = W.shape[0]\n",
    "Cgt = mat['Cgt'] - 1; Cgt = Cgt.squeeze()\n",
    "nc = len(np.unique(Cgt))\n",
    "print('Number of nodes =',n)\n",
    "print('Number of classes =',nc);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Your code here\n",
    "\n",
    "# Construct Spectal Matrix A\n",
    "d = W.sum(axis=0) + 1e-6 # degree vector\n",
    "d = 1.0 / np.sqrt(d)\n",
    "Dinv = scipy.sparse.diags(d.A.squeeze(), 0)\n",
    "I = scipy.sparse.identity(d.size, dtype=W.dtype)\n",
    "A = I - Dinv* (W* Dinv)\n",
    "\n",
    "# Compute K smallest eigenvectors/eigenvalues of A\n",
    "lamb, U = scipy.sparse.linalg.eigsh(A, k=4, which='SM')\n",
    "\n",
    "# Sort eigenvalue from smallest to largest values\n",
    "idx = lamb.argsort() # increasing order\n",
    "lamb, U = lamb[idx], U[:,idx]\n",
    "print(lamb)\n",
    "\n",
    "# Y*\n",
    "Y = U\n",
    "\n",
    "# Plot in 3D the 2nd, 3rd, 4th columns of Y*\n",
    "Xdisp = Y[:,1]\n",
    "Ydisp = Y[:,2]\n",
    "Zdisp = Y[:,3]\n",
    "\n",
    "# 2D Visualization\n",
    "plt.figure(14)\n",
    "size_vertex_plot = 10\n",
    "plt.scatter(Xdisp, Ydisp, s=size_vertex_plot*np.ones(n), c=Cgt)\n",
    "plt.title('2D Visualization')\n",
    "plt.show()\n",
    "\n",
    "# 3D Visualization\n",
    "fig = pylab.figure(15)\n",
    "ax = Axes3D(fig)\n",
    "ax.scatter(Xdisp, Ydisp, Zdisp, c=Cgt)\n",
    "pylab.title('3D Visualization')\n",
    "pyplot.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question 6:** Solve the unsupervised clustering problem for MNIST following the popular technique of [Ng, Jordan, Weiss, “On Spectral Clustering: Analysis and an algorithm”, 2002], i.e. <br>\n",
    "(1) Compute $Y^\\star$? solution of Q4. <br>\n",
    "(2) Normalize the rows of $Y^\\star$? with the L2-norm. <br>\n",
    "Hint: You may use function X = ( X.T / np.sqrt(np.sum(X**2,axis=1)+1e-10) ).T  for the L2-normalization of the rows of X.<br>\n",
    "(3) Run standard K-Means on normalized $Y^\\star$? to get the clusters, and compute the clustering accuracy. You should get more than 50% accuracy. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Your code here\n",
    "# Normalize the rows of Y* with the L2 norm, i.e. ||y_i||_2 = 1\n",
    "Y = ( Y.T / np.sqrt(np.sum(Y**2,axis=1)+1e-10) ).T "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Your code here\n",
    "# Run standard K-Means\n",
    "Ker = construct_kernel(Y,'linear') # Compute linear Kernel for standard K-Means\n",
    "Theta = np.ones(n) # Equal weight for each data\n",
    "[C_kmeans,En_kmeans] = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n",
    "acc= compute_purity(C_kmeans,Cgt,nc)\n",
    "print('accuracy standard kmeans=',acc)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}