{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Network Tour of Data Science\n", "###       Xavier Bresson, Winter 2016/17\n", "## Exercise 4 - Code 1 : Graph Science\n", "## Construct Network of Text Documents " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Load libraries\n", "\n", "# Math\n", "import numpy as np\n", "\n", "# Visualization \n", "%matplotlib notebook \n", "import matplotlib.pyplot as plt\n", "plt.rcParams.update({'figure.max_open_warning': 0})\n", "from mpl_toolkits.axes_grid1 import make_axes_locatable\n", "from scipy import ndimage\n", "\n", "# Print output of LFR code\n", "import subprocess\n", "\n", "# Sparse matrix\n", "import scipy.sparse\n", "import scipy.sparse.linalg\n", "\n", "# 3D visualization\n", "import pylab\n", "from mpl_toolkits.mplot3d import Axes3D\n", "from matplotlib import pyplot\n", "\n", "# Import data\n", "import scipy.io\n", "\n", "# Import functions in lib folder\n", "import sys\n", "sys.path.insert(1, 'lib')\n", "\n", "# Import helper functions\n", "%load_ext autoreload\n", "%autoreload 2\n", "from lib.utils import compute_ncut\n", "from lib.utils import reindex_W_with_classes\n", "from lib.utils import nldr_visualization\n", "from lib.utils import construct_knn_graph\n", "from lib.utils import compute_purity\n", "\n", "# Import distance function\n", "import sklearn.metrics.pairwise\n", "\n", "# Remove warnings\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of data = 2000\n", "Data dimensionality = 7939\n", "Number of classes = 5\n" ] } ], "source": [ "# Load 10 classes of 4,000 text documents\n", "mat = scipy.io.loadmat('datasets/20news_5classes_raw_data.mat')\n", "X = mat['X']\n", "n = X.shape[0]\n", "d = X.shape[1]\n", "Cgt = mat['Cgt'] - 1; Cgt = Cgt.squeeze()\n", "nc = len(np.unique(Cgt))\n", "print('Number of data =',n)\n", "print('Data dimensionality =',d);\n", "print('Number of classes =',nc);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1a:** Compute the k-NN graph (k=10) with L2/Euclidean distance
\n", "Hint: You may use the function *W=construct_knn_graph(X,k,'euclidean')*" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "**Question 1b:** Plot the adjacency matrix of the graph.
\n", "Hint: Use function *plt.spy(W, markersize=1)*" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1c:** Reindex the adjacency matrix of the graph w.r.t. ground\n", "truth communities. Plot the reindexed adjacency matrix of the graph.
\n", "Hint: You may use the function *[W_reindex,C_classes_reindex]=reindex_W_with_classes(W,C_classes)*." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 1d:** Perform graph clustering with NCut technique. What is\n", "the clustering accuracy of the NCut solution? What is the clustering\n", "accuracy of a random partition? Reindex the adjacency matrix of the\n", "graph w.r.t. NCut communities. Plot the reindexed adjacency matrix of\n", "the graph.
\n", "Hint: You may use function *C_ncut, accuracy = compute_ncut(W,C_solution,n_clusters)* that performs Ncut clustering.
\n", "Hint: You may use function *accuracy = compute_purity(C_computed,C_solution,n_clusters)* that returns the\n", "accuracy of a computed partition w.r.t. the ground truth partition. A\n", "random partition can be generated with the function *np.random.randint*." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 2a:** Compute the k-NN graph (k=10) with Cosine distance.
\n", "Answer to questions 1b-1d for this graph.
\n", "Hint: You may use function *W=construct_knn_graph(X,10,'cosine')*." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Reload data matrix\n", "X = mat['X']" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n", "# Compute the k-NN graph with Cosine distance\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question 2b:** Visualize the adjacency matrix with the non-linear reduction\n", "technique in 2D and 3D.
\n", "Hint: You may use function *[X,Y,Z] = nldr_visualization(W)*.
\n", "Hint: You may use function *plt.scatter(X,Y,c=Cncut)* for 2D visualization and *ax.scatter(X,Y,Z,c=Cncut)* for 3D visualization." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [], "source": [ "# Your code here\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }