{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# A Network Tour of Data Science\n", "###       Xavier Bresson, Winter 2016/17\n", "## Exercise 4 - Code 2 : Unsupervised Learning\n", "## Unsupervised Clustering with Kernel K-Means " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Load libraries\n", "\n", "# Math\n", "import numpy as np\n", "\n", "# Visualization \n", "%matplotlib notebook \n", "import matplotlib.pyplot as plt\n", "plt.rcParams.update({'figure.max_open_warning': 0})\n", "from mpl_toolkits.axes_grid1 import make_axes_locatable\n", "from scipy import ndimage\n", "\n", "# Print output of LFR code\n", "import subprocess\n", "\n", "# Sparse matrix\n", "import scipy.sparse\n", "import scipy.sparse.linalg\n", "\n", "# 3D visualization\n", "import pylab\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import pyplot\n", "\n", "# Import data\n", "import scipy.io\n", "\n", "# Import functions in lib folder\n", "import sys\n", "sys.path.insert(1, 'lib')\n", "\n", "# Import helper functions\n", "%load_ext autoreload\n", "%autoreload 2\n", "from lib.utils import construct_kernel\n", "from lib.utils import compute_kernel_kmeans_EM\n", "from lib.utils import compute_kernel_kmeans_spectral\n", "from lib.utils import compute_purity\n", "\n", "# Import distance function\n", "import sklearn.metrics.pairwise\n", "\n", "# Remove warnings\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of data = 2000\n", "Data dimensionality = 784\n", "Number of classes = 10\n" ] } ], "source": [ "# Load MNIST raw data images\n", "mat = scipy.io.loadmat('datasets/mnist_raw_data.mat')\n", "X = mat['Xraw']\n", "n = X.shape[0]\n", "d = X.shape[1]\n", "Cgt = mat['Cgt'] - 1; Cgt = 
Cgt.squeeze()\n", "nc = len(np.unique(Cgt))\n", "print('Number of data =',n)\n", "print('Data dimensionality =',d);\n", "print('Number of classes =',nc);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1a:** What is the clustering accuracy of standard/linear K-Means?
\n", "Hint: You may use the function *Ker=construct_kernel(X,'linear')* to compute the\n", "linear kernel, *[C_kmeans, En_kmeans]=compute_kernel_kmeans_EM(n_classes,Ker,Theta,10)* with *Theta=np.ones(n)* to run the standard K-Means algorithm, and *accuracy = compute_purity(C_computed,C_solution,n_clusters)*, which returns the\n", "accuracy." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Construct Linear Kernel\n", "accuracy standard kmeans= 13.200000000000001\n" ] } ], "source": [ "# Your code here\n", "Ker = construct_kernel(X,'linear') # Compute linear kernel for standard K-Means\n", "Theta = np.ones(n) # Equal weight for each data point\n", "[C_kmeans,En_kmeans] = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy standard kmeans=',acc)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "**Question 1b:** What is the clustering accuracy for the kernel K-Means algorithm with
\n", "(1) Gaussian Kernel for the EM approach and the Spectral approach?
\n", "(2) Polynomial Kernel for the EM approach and the Spectral approach?
\n", "Hint: You may use the functions *Ker=construct_kernel(X,'gaussian')* and *Ker=construct_kernel(X,'polynomial',[1,0,2])* to compute the non-linear kernels.
\n", "Hint: You may use the functions *C_kmeans,__ = compute_kernel_kmeans_EM(K,Ker,Theta,10)* for the EM kernel K-Means algorithm and *C_kmeans,__ = compute_kernel_kmeans_spectral(K,Ker,Theta,10)* for the Spectral kernel K-Means algorithm.
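
As a rough illustration of what these hints rely on, here is a minimal NumPy sketch of a Gaussian kernel, a polynomial kernel, and an unweighted EM-style kernel K-Means loop. It is not the code in the lib folder: the function names, the sigma parameter, the init_labels argument, and the reading of [1,0,2] as [a,b,c] in (a x.y + b)^c are all assumptions made for this sketch.

```python
import numpy as np

def gaussian_kernel_sketch(X, sigma=1.0):
    # Pairwise squared Euclidean distances, then exp(-d^2 / sigma^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / sigma ** 2)

def polynomial_kernel_sketch(X, a=1.0, b=0.0, c=2):
    # (a * <x, y> + b) ** c, with [1,0,2] read as [a, b, c] (assumption)
    return (a * (X @ X.T) + b) ** c

def kernel_kmeans_sketch(Ker, n_clusters, n_iter=10, init_labels=None, seed=0):
    # Unweighted kernel K-Means (equivalent to Theta = ones): alternate between
    # computing feature-space distances to cluster means and reassigning points.
    n = Ker.shape[0]
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(n_clusters, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for k in range(n_clusters):
            members = labels == k
            size = members.sum()
            if size == 0:
                continue  # an empty cluster stays unreachable
            # ||phi(x_i) - m_k||^2 = K_ii - 2 mean_j K_ij + mean_jl K_jl
            dist[:, k] = (np.diag(Ker)
                          - 2.0 * Ker[:, members].sum(axis=1) / size
                          + Ker[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

With the linear kernel X @ X.T this loop reduces to standard K-Means, which is why the same EM routine can serve Questions 1a and 1b. Like any EM-style clustering it only reaches a local optimum, so results depend on the initial labels.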
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Construct Gaussian Kernel\n", "accuracy non-linear kmeans with EM= 61.050000000000004\n", "Construct Linear Kernel\n", "accuracy non-linear kmeans with SPECTRAL= 52.1\n" ] } ], "source": [ "# Your code here\n", "Ker = construct_kernel(X,'gaussian') # Compute Gaussian Kernel\n", "Theta = np.ones(n) # Equal weight for each data\n", "\n", "C_kmeans,_ = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with EM=',acc)\n", "\n", "C_kmeans,_ = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with SPECTRAL=',acc)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Construct Polynomial Kernel\n", "accuracy non-linear kmeans with EM= 49.95\n", "Construct Linear Kernel\n", "accuracy non-linear kmeans with SPECTRAL= 50.849999999999994\n" ] } ], "source": [ "# Your code here\n", "Ker = construct_kernel(X,'polynomial',[1,0,2])\n", "Theta = np.ones(n) # Equal weight for each data\n", "\n", "C_kmeans, En_kmeans = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with EM=',acc)\n", "\n", "[C_kmeans,En_kmeans] = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with SPECTRAL=',acc)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "**Question 1c:** What is the clustering accuracy for the kernel K-Means algorithm with
\n", "(1) KNN_Gaussian Kernel for the EM approach and the Spectral approach?
\n", "(2) KNN_Cosine_Binary Kernel for the EM approach and the Spectral approach?
\n", "You can test with the value KNN_kernel=50.
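
To make the idea of a k-nearest-neighbour kernel concrete, here is a small NumPy sketch that keeps Gaussian weights only between each point and its k nearest neighbours and then symmetrizes the result. It is an assumption about what such a kernel may look like, not the construct_kernel implementation: the function name, the choice of scale (mean squared distance to the k-th neighbour), and the max-symmetrization are all hypothetical.

```python
import numpy as np

def knn_gaussian_kernel_sketch(X, k):
    # Hypothetical helper, not the lib.utils implementation.
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Exclude each point from its own neighbour list
    search = d2.copy()
    np.fill_diagonal(search, np.inf)
    idx = np.argsort(search, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Kernel scale: mean squared distance to the k-th neighbour (assumption)
    sigma2 = np.mean(d2[np.arange(n), idx[:, -1]])
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    # Keep an edge if either endpoint selected the other
    return np.maximum(W, W.T)

# Two groups of three points on a line; no edge should cross the gap
X_demo = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
W_demo = knn_gaussian_kernel_sketch(X_demo, k=2)
print(np.count_nonzero(W_demo, axis=1))
```

Because most entries are zero, a matrix like this is usually stored in sparse form for large n; the dense array here is only for readability.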
\n", "Hint: You may use functions *Ker = construct_kernel(X,'kNN_gaussian',KNN_kernel)*\n", "and *Ker = construct_kernel(X,'kNN_cosine_binary',KNN_kernel)* to compute the\n", "non-linear kernels." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Construct kNN Gaussian Kernel\n", "accuracy non-linear kmeans with EM= 54.55\n", "Construct Linear Kernel\n", "accuracy non-linear kmeans with SPECTRAL= 58.650000000000006\n" ] } ], "source": [ "# Your code here\n", "KNN_kernel = 50\n", "Ker = construct_kernel(X,'kNN_gaussian',KNN_kernel)\n", "Theta = np.ones(n) # Equal weight for each data\n", "\n", "C_kmeans,_ = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with EM=',acc)\n", "\n", "C_kmeans,_ = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with SPECTRAL=',acc)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Construct kNN Cosine Binary Kernel\n", "accuracy non-linear kmeans with EM= 58.550000000000004\n", "Construct Linear Kernel\n", "accuracy non-linear kmeans with SPECTRAL= 60.35\n" ] } ], "source": [ "# Your code here\n", "KNN_kernel = 50\n", "Ker = construct_kernel(X,'kNN_cosine_binary',KNN_kernel)\n", "Theta = np.ones(n) # Equal weight for each data\n", "\n", "C_kmeans,_ = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with EM=',acc)\n", "\n", "C_kmeans,_ = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n", "acc = compute_purity(C_kmeans,Cgt,nc)\n", "print('accuracy non-linear kmeans with SPECTRAL=',acc)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], 
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }