{ "metadata": { "name": "", "signature": "sha256:08acdc95576cea5611166dfd98b79831bba6174792407fa04399b8d729e402f3" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Multi-Label Classification with Shogun Machine Learning Toolbox" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Abinash Panda (github: [abinashpanda](https://github.com/abinashpanda))" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Thanks Thoralf Klein for taking time to help me on this project! ;)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook presents training of [multi-label classification](http://en.wikipedia.org/wiki/Multi-label_classification) using [structured SVM](http://en.wikipedia.org/wiki/Structured_support_vector_machine) presented in shogun. We would be using MultilabelModel for multi-label classfication.\n", "\n", "We begin with brief introduction to Multi-Label Structured Prediction [1] followed by corresponding API in Shogun. Then we are going to implement a toy example (for illustration) before getting to the real one. Finally, we evaluate the multi-label classification on [well-known datasets](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html) [2]. We showed that SHOGUNs [3] implementation delivers same accuracy as scikit-learn and same or better training time." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Multi-Label Structured Prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multi-Label Structured Prediction combines the aspects of multi-label prediction and structured output. Structured prediction typically involves an input $\\mathbf{x}$ (can be structured) and a structured output $\\mathbf{y}$. Given a training set $\\{(x^i, y^i)\\}_{i=1,...,n} \\subset \\mathcal{X} \\times \\mathbb{P}(\\mathcal{Y})$ where $\\mathcal{Y}$ is a structured output set of potentially very large size (in this case $\\mathcal{Y} = \\{y_1, y_2, ...., y_q\\}$ where $q$ is total number of possible classes). A joint feature map $\\psi(x, y)$ is defined to incorporate structure information into the labels. \n", "\n", "The joint feature map $\\psi(x, y)$ for ```MultilabelModel``` is defined as $\\psi(x, y) \\rightarrow x \\otimes y$ where $\\otimes$ is the tensor product.\n", "\n", "We formulate the prediction as: \n", "$h(x) = \\{y \\in \\mathcal{Y} : f(x, y) > 0\\}$\n", "\n", "The compatibility function, $f(x, y)$, acts on individual inputs and outputs, as in single-label prediction, but the prediction step consists of collecting all outputs of positive scores instead of finding the outputs of maximal score." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Multi-Label Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we are going to compare the performance of two multi-label models:\n", "* ```MultilabelModel model``` : with **constant entry term $0$** in joint feature vector to **not** model bias term.\n", "* ```MultilabelModel model_with_bias``` : with **constant entry $1$** in the joint feature vector to model bias term.\n", "\n", "The joint feature vector are:\n", "* ```model```$\\leftrightarrow \\psi(x, y) = [x || 0] \\otimes y$.\n", "* ```model_with_bias```$\\leftrightarrow \\psi(x, y) = [x || 1] \\otimes y$.\n", "\n", "For comparision of the two models, we are going to perform on the datasets with binary labels. " ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Experiment 1 : Binary Label Data" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Generation of some synthetic data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First of all we create some synthetic data for our toy example. We add some static offset to the data to compare the models with/without threshold." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function\n", "\n", "try:\n", " from sklearn.datasets import make_classification\n", "except ImportError:\n", " import pip\n", " pip.main(['install', '--user', 'scikit-learn'])\n", " from sklearn.datasets import make_classification\n", " \n", "import numpy as np\n", "\n", "X, Y = make_classification(n_samples=1000,\n", " n_features=2,\n", " n_informative=2,\n", " n_redundant=0,\n", " n_clusters_per_class=2)\n", "\n", "# adding some static offset to the data\n", "X = X + 1" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Preparation of data and model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create a multi-label model in shogun, we'll first create an instance of [MultilabelModel](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CMultilabelModel.html) and initialize it by the features and labels. The labels should be [MultilabelSOLables](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CMultilabelSOLabels.html). It should be [initialized](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CMultilabelSOLabels.html#a276b3e36a5b15d5185c913be160ac81c) by providing with the ```n_labels``` (number of examples) and ```n_classes``` (total number of classes) and then individually adding a label using [set\\_sparse\\_label()](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CMultilabelSOLabels.html#a5a08cc53a1d06c4c797206fbbfaf13b3) method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from modshogun import RealFeatures, MultilabelSOLabels, MultilabelModel\n", "\n", "def create_features(X, constant):\n", " features = RealFeatures(\n", " np.c_[X, constant * np.ones(X.shape[0])].T)\n", " \n", " return features\n", "from modshogun import MultilabelSOLabels\n", "\n", "def create_labels(Y, n_classes):\n", " try:\n", " n_samples = Y.shape[0]\n", " except AttributeError:\n", " n_samples = len(Y)\n", " \n", " labels = MultilabelSOLabels(n_samples, n_classes)\n", " for i, sparse_label in enumerate(Y):\n", " try:\n", " sparse_label = sorted(sparse_label)\n", " except TypeError:\n", " sparse_label = [sparse_label]\n", " labels.set_sparse_label(i, np.array(sparse_label, dtype=np.int32))\n", " \n", " return labels\n", "\n", "def split_data(X, Y, ratio):\n", " num_samples = X.shape[0]\n", " train_samples = int(ratio * num_samples)\n", " return (X[:train_samples], Y[:train_samples],\n", " X[train_samples:], Y[train_samples:])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "X_train, Y_train, X_test, Y_test = split_data(X, Y, 0.9)\n", "\n", "feats_0 = create_features(X_train, 0)\n", "feats_1 = create_features(X_train, 1)\n", "labels = create_labels(Y_train, 2)\n", "\n", "model = MultilabelModel(feats_0, labels)\n", "model_with_bias = MultilabelModel(feats_1, labels)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Training and Evaluation of Structured Machines with/without Threshold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Shogun, several solvers and online solvers have been implemented for SO-Learning. Let's try to train the model using an online solver [StochasticSOSVM](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CStochasticSOSVM.html)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from modshogun import StochasticSOSVM, DualLibQPBMSOSVM, StructuredAccuracy, LabelsFactory\n", "from time import time\n", "\n", "sgd = StochasticSOSVM(model, labels)\n", "sgd_with_bias = StochasticSOSVM(model_with_bias, labels)\n", "\n", "start = time()\n", "sgd.train()\n", "print(\">>> Time taken for SGD *without* threshold tuning = %f\" % (time() - start))\n", "start = time()\n", "sgd_with_bias.train()\n", "print(\">>> Time taken for SGD *with* threshold tuning = %f\" % (time() - start))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For measuring accuracy in multi-label classification, *Jaccard Similarity Coefficients* $\\big(J(A, B) = \\frac{|A \\cap B|}{|A \\cup B|}\\big)$ is used : \n", "$Accuracy = \\frac{1}{p}\\sum_{i=1}^{p}\\frac{ |Y_i \\cap h(x_i)|}{|Y_i \\cup h(x_i)|}$ \n", "This is available in [MultilabelAccuracy](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CMultilabelAccuracy.html) for ```MultilabelLabels``` and [StructuredAccuracy](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CStructuredAccuracy.html) for ```MultilabelSOLabels```." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def evaluate_machine(machine,\n", " X_test,\n", " Y_test,\n", " n_classes,\n", " bias):\n", " if bias:\n", " feats_test = create_features(X_test, 1)\n", " else:\n", " feats_test = create_features(X_test, 0)\n", " \n", " test_labels = create_labels(Y_test, n_classes)\n", " \n", " out_labels = LabelsFactory.to_structured(machine.apply(feats_test))\n", " evaluator = StructuredAccuracy()\n", " jaccard_similarity_score = evaluator.evaluate(out_labels, test_labels)\n", " \n", " return jaccard_similarity_score " ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print(\">>> Accuracy of SGD *without* threshold tuning = %f \" % evaluate_machine(sgd, X_test, Y_test, 2, False))\n", "print(\">>> Accuracy of SGD *with* threshold tuning = %f \" %evaluate_machine(sgd_with_bias, X_test, Y_test, 2, True))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Plotting the *Data* along with the *Boundary* " ] }, { "cell_type": "code", "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "def get_parameters(weights):\n", " return -weights[0]/weights[1], -weights[2]/weights[1]\n", "\n", "def scatter_plot(X, y):\n", " zeros_class = np.where(y == 0)\n", " ones_class = np.where(y == 1)\n", " plt.scatter(X[zeros_class, 0], X[zeros_class, 1], c='b', label=\"Negative Class\")\n", " plt.scatter(X[ones_class, 0], X[ones_class, 1], c='r', label=\"Positive Class\")\n", " \n", "def plot_hyperplane(machine_0,\n", " machine_1,\n", " label_0,\n", " label_1,\n", " title,\n", " X, y):\n", " scatter_plot(X, y)\n", " x_min, x_max = np.min(X[:, 0]) - 0.5, np.max(X[:, 0]) + 0.5\n", " y_min, y_max = np.min(X[:, 1]) - 0.5, np.max(X[:, 1]) + 0.5\n", " xx = np.linspace(x_min, x_max, 1000)\n", " \n", " m_0, c_0 = get_parameters(machine_0.get_w()) \n", " m_1, c_1 = get_parameters(machine_1.get_w())\n", " yy_0 = m_0 * xx + c_0\n", " yy_1 = m_1 * xx + c_1\n", " plt.plot(xx, yy_0, \"k--\", label=label_0)\n", " plt.plot(xx, yy_1, \"g-\", label=label_1)\n", " \n", " plt.xlim((x_min, x_max))\n", " plt.ylim((y_min, y_max))\n", " plt.grid()\n", " plt.legend(loc=\"best\")\n", " plt.title(title)\n", " plt.show()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "fig = plt.figure(figsize=(10, 10))\n", "plot_hyperplane(sgd, sgd_with_bias,\n", " \"Boundary for machine *without* bias for class 0\",\n", " \"Boundary for machine *with* bias for class 0\",\n", " \"Binary Classification using SO-SVM with/without threshold tuning\",\n", " X, Y)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see from the above plot that ```sgd_with_bias``` can produce better classification boundary. The model without threshold tuning is crossing origin of space, while the one with threshold tuning is crossing $(1,1)$ *(the constant we have added earlier)*." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from modshogun import SparseMultilabel_obtain_from_generic\n", "\n", "def plot_decision_plane(machine,\n", " title,\n", " X, y, bias):\n", " plt.figure(figsize=(24, 8))\n", " plt.suptitle(title)\n", " plt.subplot(1, 2, 1)\n", " x_min, x_max = np.min(X[:, 0]) - 0.5, np.max(X[:, 0]) + 0.5\n", " y_min, y_max = np.min(X[:, 1]) - 0.5, np.max(X[:, 1]) + 0.5\n", " xx = np.linspace(x_min, x_max, 200)\n", " yy = np.linspace(y_min, y_max, 200)\n", " x_mesh, y_mesh = np.meshgrid(xx, yy)\n", "\n", " if bias:\n", " feats = create_features(np.c_[x_mesh.ravel(), y_mesh.ravel()], 1)\n", " else:\n", " feats = create_features(np.c_[x_mesh.ravel(), y_mesh.ravel()], 0)\n", " out_labels = machine.apply(feats)\n", " z = []\n", " for i in range(out_labels.get_num_labels()):\n", " label = SparseMultilabel_obtain_from_generic(out_labels.get_label(i)).get_data()\n", " if label.shape[0] == 1:\n", " # predicted a single label\n", " z.append(label[0])\n", " elif label.shape[0] == 2:\n", " # predicted both the classes\n", " z.append(2)\n", " elif label.shape[0] == 0:\n", " # predicted none of the class\n", " z.append(3)\n", " z = np.array(z)\n", " z = z.reshape(x_mesh.shape)\n", " c = plt.pcolor(x_mesh, y_mesh, z, cmap=plt.cm.gist_heat)\n", " scatter_plot(X, y)\n", " plt.xlim((x_min, x_max))\n", " plt.ylim((y_min, y_max))\n", " plt.colorbar(c)\n", " plt.title(\"Decision Surface\")\n", " plt.legend(loc=\"best\")\n", "\n", " plt.subplot(1, 2, 2)\n", " weights = machine.get_w()\n", " m_0, c_0 = get_parameters(weights[:3])\n", " m_1, c_1 = get_parameters(weights[3:])\n", " yy_0 = m_0 * xx + c_0\n", " yy_1 = m_1 * xx + c_1\n", " plt.plot(xx, yy_0, \"r--\", label=\"Boundary for class 0\")\n", " plt.plot(xx, yy_1, \"g-\", label=\"Boundary for class 1\")\n", " plt.title(\"Hyper planes for different classes\")\n", " plt.legend(loc=\"best\")\n", " plt.xlim((x_min, x_max))\n", " plt.ylim((y_min, y_max))\n", " \n", " plt.show()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plot_decision_plane(sgd,\"Model *without* Threshold Tuning\", X, Y, False)\n", "plot_decision_plane(sgd_with_bias,\"Model *with* Threshold Tuning\", X, Y, True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see from the above plots of decision surface, the **black** region corresponds to the region of **negative (label = $0$)** class, where as the **red** region corresponds to the **positive (label = $1$)**. But along with that there are some regions (although very small) of white surface and orange surface. The **white** surface corresponds to the region **not classified to any label**, whereas the **orange** region correspond to the region classified to **both the labels**. The reason for existence of these type of surface is that the above boundaries for both the class don't overlap exactly with each other (illustrated above). So, there are some regions for which both the compatibility function $f(x, 0) > 0$ as well as $f(x, 1) > 0$ (predicted both the labels) and there are some regions where both the compatibility function $f(x, 0) < 0$ and $f(x, 1) < 0$ (predicted none of the labels)." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Experiment 2 : Multi-Label Data" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Loading of data from LibSVM File" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def load_data(file_name):\n", " input_file = open(file_name)\n", " lines = input_file.readlines()\n", " n_samples = len(lines)\n", " n_features = len(lines[0].split()) - 1\n", " Y = []\n", " X = []\n", " for line in lines:\n", " data = line.split()\n", " Y.append(map(int, data[0].split(\",\")))\n", " feats = []\n", " for feat in data[1:]:\n", " feats.append(float(feat.split(\":\")[1]))\n", " X.append(feats)\n", " X = np.array(X)\n", " n_classes = max(max(label) for label in Y) + 1\n", " return X, Y, n_samples, n_features, n_classes" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Training and Evaluation of Structured Machines with/without Threshold" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def test_multilabel_data(train_file,\n", " test_file):\n", " X_train, Y_train, n_samples, n_features, n_classes = load_data(train_file)\n", "\n", " X_test, Y_test, n_samples, n_features, n_classes = load_data(test_file)\n", "\n", " # create features and labels\n", " multilabel_feats_0 = create_features(X_train, 0)\n", " multilabel_feats_1 = create_features(X_train, 1)\n", " multilabel_labels = create_labels(Y_train, n_classes)\n", "\n", " # create multi-label model\n", " multilabel_model = MultilabelModel(multilabel_feats_0, multilabel_labels)\n", " multilabel_model_with_bias = MultilabelModel(multilabel_feats_1, multilabel_labels)\n", " \n", " # initializing machines for SO-learning\n", " multilabel_sgd = StochasticSOSVM(multilabel_model, multilabel_labels)\n", " multilabel_sgd_with_bias = StochasticSOSVM(multilabel_model_with_bias, multilabel_labels)\n", " \n", " start = time()\n", " multilabel_sgd.train()\n", " t1 = time() - start\n", " multilabel_sgd_with_bias.train()\n", " t2 = time() - start - t1\n", " \n", " return (evaluate_machine(multilabel_sgd,\n", " X_test, Y_test,\n", " n_classes, False), t1,\n", " evaluate_machine(multilabel_sgd_with_bias,\n", " X_test, Y_test,\n", " n_classes, True), t2)\n", " " ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Comparision with scikit-learn's implementation" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.multiclass import OneVsRestClassifier\n", "from sklearn.svm import SVC\n", "from sklearn.metrics import jaccard_similarity_score\n", "from sklearn.preprocessing import LabelBinarizer\n", "\n", "def sklearn_implementation(train_file,\n", " test_file):\n", " label_binarizer = LabelBinarizer()\n", "\n", " X_train, Y_train, n_samples, n_features, n_classes = load_data(train_file)\n", " X_test, Y_test, n_samples, n_features, n_classes = load_data(test_file)\n", "\n", " clf = OneVsRestClassifier(SVC(kernel='linear'))\n", " start = time()\n", " clf.fit(X_train, label_binarizer.fit_transform(Y_train))\n", " t1 = time() - start\n", " return (jaccard_similarity_score(label_binarizer.fit_transform(Y_test),\n", " clf.predict(X_test)), t1)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "def print_table(train_file,\n", " test_file,\n", " caption):\n", " acc_0, t1, acc_1, t2 = test_multilabel_data(train_file,\n", " test_file)\n", " sk_acc, sk_t1 = sklearn_implementation(train_file,\n", " test_file)\n", " result = '''\n", " \\t\\t%s\n", " Machine\\t\\t\\t\\tAccuracy\\tTrain-time\\n\n", " SGD *without* threshold tuning \\t%f \\t%f\n", " SGD *with* threshold tuning \\t%f \\t%f\n", " scikit-learn's implementation \\t%f \\t%f\n", " ''' % (caption, acc_0, t1, acc_1, t2,\n", " sk_acc, sk_t1)\n", " print(result)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "[Yeast Multi-Label Data](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html) \\[2\\]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print_table(\"../../../data/multilabel/yeast_train.svm\",\n", " \"../../../data/multilabel/yeast_test.svm\",\n", " \"Yeast dataset\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "[Scene Multi-Label Data](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html) \\[2\\]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print_table(\"../../../data/multilabel/scene_train\",\n", " \"../../../data/multilabel/scene_test\",\n", " \"Scene dataset\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see that the accuracy of the machine *with* threshold tuning is comparable to that of *scikit-learn's implementation*. A possible explanation of that is : for multi-label classification using scikit-learn, we have used ```OneVsRestClassifier``` strategy. This strategy fits one classifier per class. It also support multi-label classification. It is initiated using an estimator, for eg. in our case:\n", "
\n",
      "clf = OneVsRestClassifier(SVC(kernel='linear'))\n",
      "
\n", "the estimator is ```SVC(kernel=\"linear\")``` a support vector machine for classification using linear kernel. So, the ```OneVsRestClassifier``` would train a number of estimator (one for each class). The ```SVC``` estimator learns the weight ($w$) as well as the thresholds/bias($b$). \n", "\n", "In the shogun implementation, the structured machines only learn the weights($w$) and there is no threshold or bias. So, to model the threshold to we have to add an constant entry to the joint feature vector. \n", "\n", "Thus the machines with constant entry have the same accuracy as that of scikit-learn implementation." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[1] C. Lampert. [Maximum Margin Multi-Label Structured Prediction](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2011_0207.pdf), NIPS 2011\n", "\n", "[2] [LIBSVM Data: Multi-label Classification](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html)\n", "\n", "[3] [Shogun Machine Learning Toolbox](http://shogun-toolbox.org/)" ] } ], "metadata": {} } ] }