{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# A Network Tour of Data Science\n",
"### Xavier Bresson, Winter 2016/17\n",
"## Exercise 4 - Code 2 : Unsupervised Learning\n",
"## Unsupervised Clustering with Kernel K-Means "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Load libraries\n",
"\n",
"# Math\n",
"import numpy as np\n",
"\n",
"# Visualization \n",
"%matplotlib notebook \n",
"import matplotlib.pyplot as plt\n",
"plt.rcParams.update({'figure.max_open_warning': 0})\n",
"from mpl_toolkits.axes_grid1 import make_axes_locatable\n",
"from scipy import ndimage\n",
"\n",
"# Print output of LFR code\n",
"import subprocess\n",
"\n",
"# Sparse matrix\n",
"import scipy.sparse\n",
"import scipy.sparse.linalg\n",
"\n",
"# 3D visualization\n",
"import pylab\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"from matplotlib import pyplot\n",
"\n",
"# Import data\n",
"import scipy.io\n",
"\n",
"# Import functions in lib folder\n",
"import sys\n",
"sys.path.insert(1, 'lib')\n",
"\n",
"# Import helper functions\n",
"%load_ext autoreload\n",
"%autoreload 2\n",
"from lib.utils import construct_kernel\n",
"from lib.utils import compute_kernel_kmeans_EM\n",
"from lib.utils import compute_kernel_kmeans_spectral\n",
"from lib.utils import compute_purity\n",
"\n",
"# Import distance function\n",
"import sklearn.metrics.pairwise\n",
"\n",
"# Remove warnings\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of data = 2000\n",
"Data dimensionality = 784\n",
"Number of classes = 10\n"
]
}
],
"source": [
"# Load MNIST raw data images\n",
"mat = scipy.io.loadmat('datasets/mnist_raw_data.mat')\n",
"X = mat['Xraw']\n",
"n = X.shape[0]\n",
"d = X.shape[1]\n",
"Cgt = mat['Cgt'] - 1; Cgt = Cgt.squeeze()\n",
"nc = len(np.unique(Cgt))\n",
"print('Number of data =',n)\n",
"print('Data dimensionality =',d);\n",
"print('Number of classes =',nc);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Question 1a:** What is the clustering accuracy of standard/linear K-Means?
\n",
"Hint: You may use functions *Ker=construct_kernel(X,'linear')* to compute the\n",
"linear kernel and *[C_kmeans, En_kmeans]=compute_kernel_kmeans_EM(n_classes,Ker,Theta,10)* with *Theta= np.ones(n)* to run the standard K-Means algorithm, and *accuracy = compute_purity(C_computed,C_solution,n_clusters)* that returns the\n",
"accuracy."
]
},
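{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the link between kernel K-Means and standard K-Means concrete, here is a sketch of the distance that the kernel trick evaluates, assuming uniform weights (*Theta = np.ones(n)*). The symbols are introduced here for illustration: $\\phi$ is the feature map, $K_{ij} = \\langle\\phi(x_i),\\phi(x_j)\\rangle$ the kernel matrix, $C_k$ the $k$-th cluster and $m_k$ its mean in feature space. The exact computation inside *compute_kernel_kmeans_EM* may differ.\n",
"\n",
"$$\\|\\phi(x_i)-m_k\\|^2 = K_{ii} - \\frac{2}{|C_k|}\\sum_{j\\in C_k} K_{ij} + \\frac{1}{|C_k|^2}\\sum_{j,l\\in C_k} K_{jl}, \\qquad m_k = \\frac{1}{|C_k|}\\sum_{j\\in C_k}\\phi(x_j)$$\n",
"\n",
"With a linear kernel of the form $K_{ij} = x_i^\\top x_j$, $\\phi$ is the identity and this expression reduces to the squared Euclidean distance $\\|x_i - m_k\\|^2$ used by standard K-Means."
]
},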
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Construct Linear Kernel\n",
"accuracy standard kmeans= 13.200000000000001\n"
]
}
],
"source": [
"# Your code here\n",
"Ker = construct_kernel(X,'linear') # Compute linear Kernel for standard K-Means\n",
"Theta = np.ones(n) # Equal weight for each data\n",
"[C_kmeans,En_kmeans] = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n",
"acc= compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy standard kmeans=',acc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"**Question 1b:** What is the clustering accuracy for the kernel K-Means algorithm with
\n",
"(1) Gaussian Kernel for the EM approach and the Spectral approach?
\n",
"(2) Polynomial Kernel for the EM approach and the Spectral approach?
\n",
"Hint: You may use functions *Ker=construct_kernel(X,'gaussian')* and *Ker=construct_kernel(X,'polynomial',[1,0,2])* to compute the non-linear kernels
\n",
"Hint: You may use functions *C_kmeans,__ = compute_kernel_kmeans_EM(K,Ker,Theta,10)* for the EM kernel KMeans algorithm and *C_kmeans,__ = compute_kernel_kmeans_spectral(K,Ker,Theta,10)* for the Spectral kernel K-Means algorithm.
"
]
},
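{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the usual textbook forms of the Gaussian and polynomial kernels are sketched below. The bandwidth $\\sigma$ and the constants $a$, $b$, $c$ are notation assumed here. How *construct_kernel* sets or normalizes them, including the meaning of *[1,0,2]*, should be checked in *lib/utils.py*.\n",
"\n",
"$$K^{\\text{gaussian}}_{ij} = \\exp\\left(-\\frac{\\|x_i - x_j\\|^2}{\\sigma^2}\\right), \\qquad K^{\\text{polynomial}}_{ij} = \\left(a\\, x_i^\\top x_j + b\\right)^c$$\n",
"\n",
"The spectral approach is commonly derived by relaxing the cluster indicator matrix to a real-valued matrix $Y \\in \\mathbb{R}^{n \\times n_c}$ with orthonormal columns, where $n_c$ is the number of clusters and uniform weights (*Theta = np.ones(n)*) are assumed:\n",
"\n",
"$$\\max_{Y^\\top Y = I_{n_c}} \\operatorname{tr}\\left(Y^\\top K Y\\right)$$\n",
"\n",
"A maximizer is given by the $n_c$ leading eigenvectors of the kernel matrix $K$, whose rows are then typically clustered with standard K-Means."
]
},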
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Construct Gaussian Kernel\n",
"accuracy non-linear kmeans with EM= 61.050000000000004\n",
"Construct Linear Kernel\n",
"accuracy non-linear kmeans with SPECTRAL= 52.1\n"
]
}
],
"source": [
"# Your code here\n",
"Ker = construct_kernel(X,'gaussian') # Compute Gaussian Kernel\n",
"Theta = np.ones(n) # Equal weight for each data\n",
"\n",
"C_kmeans,_ = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with EM=',acc)\n",
"\n",
"C_kmeans,_ = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with SPECTRAL=',acc)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Construct Polynomial Kernel\n",
"accuracy non-linear kmeans with EM= 49.95\n",
"Construct Linear Kernel\n",
"accuracy non-linear kmeans with SPECTRAL= 50.849999999999994\n"
]
}
],
"source": [
"# Your code here\n",
"Ker = construct_kernel(X,'polynomial',[1,0,2])\n",
"Theta = np.ones(n) # Equal weight for each data\n",
"\n",
"C_kmeans, En_kmeans = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with EM=',acc)\n",
"\n",
"[C_kmeans,En_kmeans] = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with SPECTRAL=',acc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"**Question 1c:** What is the clustering accuracy for the kernel K-Means algorithm with
\n",
"(1) KNN_Gaussian Kernel for the EM approach and the Spectral approach?
\n",
"(2) KNN_Cosine_Binary Kernel for the EM approach and the Spectral approach?
\n",
"You can test for the value KNN_kernel=50.
\n",
"Hint: You may use functions *Ker = construct_kernel(X,'kNN_gaussian',KNN_kernel)*\n",
"and *Ker = construct_kernel(X,'kNN_cosine_binary',KNN_kernel)* to compute the\n",
"non-linear kernels."
]
},
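{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common way to build a kNN kernel, sketched here as an assumption rather than as what *construct_kernel* necessarily does, keeps for each point only the similarities to its $k$ nearest neighbours ($k$ = *KNN_kernel*) and then symmetrizes the result. The notation is introduced here for illustration: $\\mathcal{N}_k(i)$ is the set of the $k$ nearest neighbours of $x_i$, $d$ a distance and $\\sigma$ a scale parameter.\n",
"\n",
"$$W_{ij} = \\begin{cases} \\exp\\left(-\\dfrac{d(x_i,x_j)^2}{\\sigma^2}\\right) & \\text{if } j \\in \\mathcal{N}_k(i) \\\\ 0 & \\text{otherwise} \\end{cases}, \\qquad K = \\frac{1}{2}\\left(W + W^\\top\\right)$$\n",
"\n",
"In a binary variant, the nonzero weights are replaced by 1, and the neighbours can be selected with the cosine distance $d(x_i,x_j) = 1 - \\dfrac{x_i^\\top x_j}{\\|x_i\\|\\,\\|x_j\\|}$. The resulting matrix is sparse, with on the order of $k$ nonzero entries per row."
]
},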
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Construct kNN Gaussian Kernel\n",
"accuracy non-linear kmeans with EM= 54.55\n",
"Construct Linear Kernel\n",
"accuracy non-linear kmeans with SPECTRAL= 58.650000000000006\n"
]
}
],
"source": [
"# Your code here\n",
"KNN_kernel = 50\n",
"Ker = construct_kernel(X,'kNN_gaussian',KNN_kernel)\n",
"Theta = np.ones(n) # Equal weight for each data\n",
"\n",
"C_kmeans,_ = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with EM=',acc)\n",
"\n",
"C_kmeans,_ = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with SPECTRAL=',acc)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Construct kNN Cosine Binary Kernel\n",
"accuracy non-linear kmeans with EM= 58.550000000000004\n",
"Construct Linear Kernel\n",
"accuracy non-linear kmeans with SPECTRAL= 60.35\n"
]
}
],
"source": [
"# Your code here\n",
"KNN_kernel = 50\n",
"Ker = construct_kernel(X,'kNN_cosine_binary',KNN_kernel)\n",
"Theta = np.ones(n) # Equal weight for each data\n",
"\n",
"C_kmeans,_ = compute_kernel_kmeans_EM(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with EM=',acc)\n",
"\n",
"C_kmeans,_ = compute_kernel_kmeans_spectral(nc,Ker,Theta,10)\n",
"acc = compute_purity(C_kmeans,Cgt,nc)\n",
"print('accuracy non-linear kmeans with SPECTRAL=',acc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}