{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Lab 07 - \"Supervised Machine Learning Support Vector Classification\"\n", "\n", "Chartered Financial Data Scientist (CFDS), Autumn Term 2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, we will use a classification technique referred to as **Support Vector Machine (SVM)**. Please recall that SVMs correspond to the class of **discriminative** classifiers as distinguished in the following illustration: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "(Inspired by: 'Machine Learning - A Probabilistic Perspective', Kevin P. Murphy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The *discriminative* **Support Vector Machine (SVM)** classifier is a supervised machine learning model that learns an optimal separating $n$-dimensional hyperplane to distinguish different observations of training data according to their corresponding class labels. Until recently (prior to the advent of deep learning approaches), SVMs have been used in a variety of applications such as isolated handwritten digit recognition[2], object recognition[3], speaker identification[4], face detection in images[5], and text categorisation[6]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This third lab builds in part on the excellent SVM tutorial **\"A Tutorial on Support Vector Machines for Pattern Recognition\"** developed by Christopher J.C. Burges. The original tutorial is available under the following URL: https://link.springer.com/article/10.1023/A:1009715923555. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As always, please don't hesitate to ask your questions either during the lab or send us an email (using our\n", "fds.ai email addresses)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lab Objectives:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After today's lab, you should be able to:\n", "\n", "> 1. Understand how a **Support Vector Machine (SVM)** classifier can be trained and evaluated.\n", "> 2. Understand the impact of selected **SVM hyperparameters** and distinct kernel functions. \n", "> 3. Design and extract **handcrafted features** from a set of arbitrary images. \n", "> 4. Train and evaluate discriminative **machine learning models** using Python's `scikit-learn` library.\n", "> 5. Understand how to **evaluate** and **interpret** the classification results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we start, let's watch a motivational video:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import YouTubeVideo\n", "# OpenAI: \"Solving Rubik's Cube with a Robot Hand\"\n", "# YouTubeVideo('x4O8pojMF0w', width=800, height=600)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup of the Analysis Environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to the previous labs, we need to import a couple of Python libraries that allow for data analysis and data visualisation. In this lab, we will use the `Pandas`, `Numpy`, `Scikit-Learn`, `Matplotlib` and the `Seaborn` library. 
Let's import the libraries by the execution of the statements below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import the numpy, scipy and pandas data science library\n", "import pandas as pd\n", "import numpy as np\n", "import scipy as sp\n", "from scipy.stats import norm\n", "\n", "# import sklearn data and data pre-processing libraries\n", "from sklearn import datasets\n", "from sklearn.preprocessing import MinMaxScaler\n", "from sklearn.model_selection import train_test_split\n", "\n", "# import torchvision library\n", "import torchvision\n", "\n", "# import sklearn HOG feature library\n", "from skimage.feature import hog\n", "\n", "# import sklearn support vector classifier (svc) library\n", "from sklearn.svm import SVC\n", "\n", "# import sklearn classification evaluation library\n", "from sklearn import metrics\n", "from sklearn.metrics import classification_report, confusion_matrix\n", "\n", "# import matplotlib data visualization library\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Enable inline Jupyter notebook plotting:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ignore potential library warnings:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the 'Seaborn' plotting style in all subsequent visualisations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.style.use('seaborn')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set random seed of all our experiments:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random_seed = 42" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Support Vector Machine (SVM) Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1. Dataset Download and Data Assessment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **Iris Dataset** is a classic and straightforward dataset often used as a \"Hello World\" example in multi-class classification. This data set consists of measurements taken from three different types of iris flowers (referred to as **Classes**), namely the Iris Setosa, the Iris Versicolour, and, the Iris Virginica) and their respective measured petal and sepal length (referred to as **Features**)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "(Source: http://www.lac.inpe.br/~rafael.santos/Docs/R/CAP394/WholeStory-Iris.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In total, the dataset consists of **150 samples** (50 samples taken per class) as well as their corresponding **4 different measurements** taken for each sample. Please, find below the list of the individual measurements:\n", "\n", ">- `Sepal length (cm)`\n", ">- `Sepal width (cm)`\n", ">- `Petal length (cm)`\n", ">- `Petal width (cm)`\n", "\n", "Further details of the dataset can be obtained from the following publication: *Fisher, R.A. 
\"The use of multiple measurements in taxonomic problems\" Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to Mathematical Statistics\" (John Wiley, NY, 1950).\"*\n", "\n", "Let's load the dataset and conduct a preliminary data assessment: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris = datasets.load_iris()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print and inspect the names of the four features contained in the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris.feature_names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Determine and print the feature dimensionality of the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris.data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Determine and print the class label dimensionality of the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris.target.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print and inspect the names of the three classes contained in the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "iris.target_names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's briefly envision how the feature information of the dataset is collected and presented in the data:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect the top ten feature rows of the Iris Dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.DataFrame(iris.data, columns=iris.feature_names).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also inspect the top ten class labels of the Iris Dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.DataFrame(iris.target, columns=[\"class\"]).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now conduct a more in-depth data assessment. Therefore, we plot the feature distributions of the Iris dataset according to their respective class memberships as well as the features pairwise relationships." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pls. note that we use Python's **Seaborn** library to create such a plot referred to as **Pairplot**. The Seaborn library is a powerful data visualisation library based on the Matplotlib. It provides a great interface for drawing informative statistical graphics (https://seaborn.pydata.org). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "plt.figure(figsize=(10, 10))\n", "\n", "# load the dataset also available in seaborn\n", "iris_plot = sns.load_dataset(\"iris\")\n", "\n", "# plot a pairplot of the distinct feature distributions\n", "sns.pairplot(iris_plot, diag_kind='hist', hue='species');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can be observed from the created Pairplot, that most of the feature measurements that correspond to flower class \"setosa\" exhibit a nice **linear separability** from the feature measurements of the remaining flower classes. 
In contrast, the flower classes \"versicolor\" and \"virginica\" exhibit a commingled and **non-linear separability** across all the measured feature distributions of the Iris Dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2. Dataset Pre-Processing and Train-/Test-Split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To understand and evaluate the performance of any trained **supervised machine learning** model, it is good practice to divide the dataset into a **training set** (the fraction of data records solely used for training purposes) and an **evaluation set** (the fraction of data records solely used for evaluation purposes). Please note that the **evaluation set** will never be shown to the model as part of the training process." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We set the fraction of evaluation records to **30%** of the original dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eval_fraction = 0.3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Randomly split the dataset into a training set and an evaluation set using sklearn's `train_test_split` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 70% training and 30% evaluation\n", "x_train, x_eval, y_train, y_eval = train_test_split(iris.data, iris.target, test_size=eval_fraction, random_state=random_seed, stratify=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate the dimensionality of the training dataset $x^{train}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_train.shape, y_train.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate the dimensionality of the evaluation dataset $x^{eval}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_eval.shape, y_eval.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3. Support Vector Machine (SVM) Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's suppose we are given $l$ observations. Each observation consists of a pair: a vector $x_{i} \\in \\mathbb{R}^{n}, i=1, ..., l$ and the associated \"truth\" $y_{i}$, provided by a trusted source. In the context of a face detection task, $x_{i}$ might be a vector of pixel values (e.g. $n$=256 for a 16x16 pixel image), and $y_{i}$ would be $1$ if the image contains a face, and $-1$ otherwise. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.2. Linear Support Vector Machine (SVM) Classifiers - The Linearly Separable Case" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we have some hyperplane which separates the positive from the negative examples, referred to as a \"separating hyperplane\". The points $x$ which lie on the hyperplane satisfy the following equation $w \\cdot x + b = 0$, where $w$ is normal to the hyperplane, $|b|/||w||$ is the perpendicular distance from the hyperplane to the origin, and $||w||$ is the Euclidean norm of $w$. Let $d_{+}$ ($d_{-}$) be the shortest distance from the separating hyperplane to the closest positive (negative) example. We define the \"margin\" of a separating hyperplane to be $d_{+} + d_{-}$. 
In the context of the linearly separable case, the support vector algorithm simply looks for the separating hyperplane with the maximum margin. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Linear separating hyperplanes $H_{1}$, $H_{2}$, and $H^{*}$ for the separable case. The support vectors that constitute $H_{1}$, $H_{2}$ are circled.\n", "\n", "(Source: https://link.springer.com/article/10.1023/A:1009715923555)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose that all the training data satisfies the following constraints: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ x_{i} \\cdot w + b \\geq + 1, y_{i} = +1 $$\n", "\n", "$$ x_{i} \\cdot w + b \\leq - 1, y_{i} = -1 $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can be combined into one set of inequalities: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$y_{i}(x_{i} \\cdot w + b) - 1 \\geq 0, \\forall_{i}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now consider the points for which the equality $x_{i} \\cdot w + b = + 1$ holds. These points lie on a hyperplane $H_{1}: x_{i} \\cdot w + b = + 1$ with normal $w$ and perpendicular distance from the origin $|1-b|/||w||$. Similarly, the points for which the equality $x_{i} \\cdot w + b = - 1$ holds lie on the hyperplane $H_{2}: x_{i} \\cdot w + b = -1$, with normal again $w$, and perpendicular distance from the origin $|-1-b|/||w||$. Hence $d_{+} = d_{-} = 1 / ||w||$ and the margin is simply $2/||w||$. Note that $H_{1}$ and $H_{2}$ are parallel and that no training points $x_{i}$ fall between them. Thus we can find a pair of hyperplanes which correspond to a maximum margin by minimizing $||w||^{2}$, subject to the constraint $y_{i}(x_{i} \\cdot w + b) - 1 \\geq 0$. Those training points $x_{i}$ which wind up lying on one of the hyperplanes $H_{1}$, $H_{2}$, and whose removal would change the solution found, are referred to as **\"support vectors\"**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A \"Primal\" Optimization Objective Formulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As discussed in the lecture, we can reformulate the objective of finding such a max-margin separating hyperplane as a Lagrangian optimization objective. Thereby, we introduce a set of positive Lagrange multipliers $\\alpha_{i}, i=1, ..., l$ which turns the search for a max-margin separating hyperplane into solving the following Lagrangian:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$L_{P} = \\frac{1}{2}||w||^{2} - \\sum_{i=1}^{l} \\alpha_{i}y_{i}(x_{i} \\cdot w + b) + \\sum_{i=1}^{l}\\alpha_{i}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We must now minimize $L_{P}$, referred to as the **\"primal\"**, with respect to $w$, $b$. Thereby, \n", "\n", "> 1. the minimization of the first term $\\frac{1}{2}||w||^{2}$ maximizes the margin of the separating hyperplane, \n", "> 2. the maximization of the second term $\\sum_{i=1}^{l} \\alpha_{i}y_{i}(x_{i} \\cdot w + b)$ maximizes the number of correctly classified training samples,\n", "> 3. the minimization of the third term $\\sum_{i=1}^{l}\\alpha_{i}$ minimizes the number of support vectors. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Minimization of $L_{P}$ is a convex quadratic programming problem, since the objective function is itself convex, and those points for which $\\alpha_{i} > 0$ that satisfy the constraints also form a convex set. Again, those points are called \"support vectors\", and lie on one of the hyperplanes $H_{1}$, $H_{2}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A \"Dual\" Optimization Objective Formulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Requiring that the gradient of $L_{P}$ with respect to $w$ and $b$ vanishes results in the conditions that $w = \\sum_{i=1}^{l} \\alpha_{i}y_{i}x_{i}$ and $\\sum_{i=1}^{l}\\alpha_{i}y_{i} = 0$. Using those conditions, the above shown Lagrangian can be reformulated to derive its **\"dual\"** formulation:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$L_{D} = \\sum_{i=1}^{l}\\alpha_{i} - \\frac{1}{2} \\sum_{i,j=1}^{l} \\alpha_{i}\\alpha_{j}y_{i}y_{j}(x_{i} \\cdot x_{j})$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that solving the dual formulation doesn't depend on $w$ anymore. It only depends on the samples $x_{i} \\in \\mathbb{R}^{n}, i=1, ..., l$ of the training dataset as well as the associated labels $y_{i}$. This indicates that the optimal separating hyperplane $H^{*}$ becomes a linear function of the data. Note also that formulating the problem, as above, with $b=0$ requires that all hyperplanes contain the origin. However, this is a mild restriction for high dimensional spaces since it amounts to reducing the number of degrees of freedom by one." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.3. Training of a Linear Support Vector Machine (SVM) Classifier using Python's Scikit-Learn Library" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Luckily, the `Scikit-Learn` (https://scikit-learn.org) machine learning library provides a variety of machine learning algorithms that can be easily interfaced using the Python programming language. Among others, the library contains a variety of supervised classification algorithms such as the **Support Vector Machine (SVM)** classifier. The SVM classifier can be trained \"off-the-shelf\" to solve the dual Lagrangian $L_{D}$ optimization objective formulated above. Let's instantiate one of the SVM classifiers available in `Scikit-Learn` to learn a linear separating hyperplane:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the Support Vector Machine classifier\n", "svm = SVC(kernel='linear', random_state=random_seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train or fit the SVM classifier using the training dataset features and labels:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# train / fit the Support Vector Machine classifier\n", "svm.fit(x_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.4. Evaluation of the trained Support Vector Machine Classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After fitting the training data, the optimal separating hyperplane $H^{*}$ learned by the SVM model can then be used to predict the corresponding class labels $y_{i}'$ of so far unknown observations $x_{i}'$. 
We will utilize the trained model to predict the class labels of the remaining observations contained in the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = svm.predict(x_eval)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the class labels $y_{i}'$ **predicted** by the SVM classifier on the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As well as the **true** class labels $y_{i}$ as contained in the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_eval" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, comparing the **true** and **predicted** class labels looks encouraging. Let's determine the exact **prediction accuracy** that the trained model $h$ was able to achieve on the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Model classification accuracy: {}%'.format(str(metrics.accuracy_score(y_eval, y_pred) * 100)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Determine the number of **misclassified** data samples in the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Number of mislabeled points out of a total {} points: {}'.format(x_eval.shape[0], np.sum(y_eval != y_pred)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the field of machine learning and in particular the field of statistical classification, a **confusion matrix**, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm. In the plot created below, each row of the matrix represents the instances that the classifier predicted per class, while each column represents the instances of the true or actual class:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "(Source: https://en.wikipedia.org/wiki/Confusion_matrix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Determine and plot the **confusion matrix** of the individual predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the prediction confusion matrix\n", "mat = confusion_matrix(y_eval, y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the **confusion matrix** of the individual predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "plt.figure(figsize=(5, 5))\n", "\n", "# plot confusion matrix heatmap\n", "sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False, cmap='YlOrRd_r', xticklabels=iris.target_names, yticklabels=iris.target_names)\n", "\n", "# add plot axis labels\n", "plt.xlabel('[true class label $y_{i}$]')\n", "plt.ylabel('[predicted class label $y_{i}\\'$]')\n", "\n", "# add plot title\n", "plt.title('SVM Predictions - Confusion Matrix');" ] }
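, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to the confusion matrix, we can summarise the per-class **precision**, **recall** and **F1-score** using the `classification_report` function imported above. The following short sketch simply reuses the predictions $y_{i}'$ obtained on the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# print per-class precision, recall and f1-score of the evaluation predictions\n", "print(classification_report(y_eval, y_pred, target_names=iris.target_names))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.5. 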
Prediction of Classes of Unknown Iris Flower Observations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**First unknown iris flower:** Now that we have trained and evaluated our SVM classifier let's apply it to two so far unknown or unseen **iris flower** observations. The first **iris flower** observation $x^{s1}$ exhibits the following observed feature values: $x^{s1} = \\{x_{sl}=5.8, x_{sw}=3.5, x_{pl}=1.5, x_{pw}=0.25\\}$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "(Source: https://de.wikipedia.org/wiki/Schwertlilien)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's convert those measurements into a feature vector $x^{s1}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init features of the first unknown iris flower observation \n", "sepal_length = 5.8 \n", "sepal_width = 3.5\n", "petal_length = 1.5\n", "petal_width = 0.25\n", "\n", "# create the observation feature vector\n", "x_s1_feature_vector = [sepal_length, sepal_width, petal_length, petal_width]\n", "\n", "# print the feature vector\n", "print(x_s1_feature_vector)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now use our trained SVM model $h$ to predict the class $c^{*}$ of the unknown iris flower $x^{s1}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine class label prediction of the first unknown observation\n", "class_prediction_sample_1 = svm.predict([x_s1_feature_vector])\n", "\n", "# convert predicted class label to class name\n", "print(iris.target_names[class_prediction_sample_1[0]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build an intuition of the distinct iris flower class distributions including the current iris flower observation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "plt.figure(figsize=(10, 10))\n", "\n", "# load the dataset also available in seaborn\n", "iris_plot = sns.load_dataset('iris')\n", "\n", "# add preliminary label to unknown feature observation\n", "x_s1_feature_vector.append('observation s1')\n", "\n", "# add observation to the iris dataset\n", "iris_plot = iris_plot.append(pd.DataFrame([x_s1_feature_vector], columns=iris_plot.columns))\n", "\n", "# plot a pairplot of the distinct feature distributions\n", "sns.pairplot(iris_plot, diag_kind='hist', hue='species');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, the feature distributions of the feature values observable for the unknown iris flower $x^{s1}$ exhibit a high likelihood of beeing of class **setosa**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Second unknown iris flower:** Let's apply the learned SVM model to a second unknown or unseen **iris flower** observations. 
The second **iris flower** observation $x^{s2}$ exhibits the following observed feature values $x^{s2} = \\{x_{1}=7.8, x_{2}=2.3, x_{3}=6.4, x_{4}=2.5\\}$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "(Source: https://de.wikipedia.org/wiki/Schwertlilien)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's again convert those measurements into a feature vector $x^{s2}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init features of the second unknown iris flower observation \n", "sepal_length = 7.8\n", "sepal_width = 2.3\n", "petal_length = 6.4\n", "petal_width = 2.5\n", "\n", "# create the observation feature vector\n", "x_s2_feature_vector = [sepal_length, sepal_width, petal_length, petal_width]\n", "\n", "# print the feature vector\n", "print(x_s2_feature_vector)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the trained SVM model $h$ to predict the class $c^{*}$ of the unknown iris flower $x^{s2}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine class label prediction of the second unknown observation\n", "class_prediction_sample_2 = svm.predict([x_s2_feature_vector])\n", "\n", "# convert predicted class label to class name\n", "print(iris.target_names[class_prediction_sample_2[0]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, does this look like a reasonable prediction? Let's again try to build an intuition of the prediction derived from the SVM model $h$ based on the distinct iris flower class distributions including $x^{s2}$: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "plt.figure(figsize=(10, 10))\n", "\n", "# load the dataset also available in seaborn\n", "iris_plot = sns.load_dataset(\"iris\")\n", "\n", "# add observations to the iris dataset\n", "iris_plot = iris_plot.append(pd.DataFrame([[7.8, 2.3, 6.4, 2.50, \"observation s2\"]], columns=iris_plot.columns))\n", "\n", "# plot a pairplot of the distinct feature distributions\n", "sns.pairplot(iris_plot, diag_kind='hist', hue='species');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, the feature distributions of the feature values observable for the unknown iris flower $x^{s2}$ exhibit a high likelihood of being of class **virginica**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.6. Linear Support Vector Machine (SVM) Classifiers - The Non-Linearly Separable Case" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, great, we have seen how to apply Support Vector classification to separable data. So how can we extend these ideas to handle non-separable data? To achieve this, we would like to relax the initial constraints $ x_{i} \\cdot w + b \\geq + 1, y_{i} = +1 $ and $ x_{i} \\cdot w + b \\leq - 1, y_{i} = -1 $ when necessary. That is, we introduce a further cost for doing so. This can be done by introducing so-called positive **\"slack variables\"** denoted $\\xi_{i}, i=1, ..., l$ in the Lagrange optimization $L_{P}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Linear separating hyperplanes $H_{1}$, $H_{2}$, and $H^{*}$ for the non-separable case. 
The support vectors that constitute $H_{1}$, $H_{2}$ are circled.\n", "\n", "(Source: https://link.springer.com/article/10.1023/A:1009715923555)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore, the initial constraints become:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ x_{i} \\cdot w + b \\geq + 1 - \\xi_{i}, y_{i} = +1 $$\n", "\n", "$$ x_{i} \\cdot w + b \\leq - 1 + \\xi_{i}, y_{i} = -1 $$\n", "\n", "$$ \\xi_{i} \\geq 0, \\forall i$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus, for an error to occur, the corresponding $\\xi_{i}$ must exceed unity. As a result, $\\sum_{i=1}^{l} \\xi_{i}$ defines an upper bound on the number of training errors." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A \"Primal\" Optimization Objective Formulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A natural way to assign such an extra cost for errors is to add it to the primal Lagrangian objective function $L_{P}$ to be optimized. The Lagrangian therefore becomes:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$L_{P} = \\frac{1}{2}||w||^{2} + C \\sum_{i=1}^{l} \\xi_{i} - \\sum_{i=1}^{l} \\alpha_{i}\\{y_{i}(x_{i} \\cdot w + b) - 1 + \\xi_{i}\\} - \\sum_{i=1}^{l} \\mu_{i} \\xi_{i} $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where $C$ is a parameter that determines the penalty magnitude of errors. Furthermore, the $\\mu_{i}$ are another set of Lagrange multipliers introduced to enforce positivity of the slack variables $\\xi_{i}$. We must now minimize $L_{P}$ with respect to $w$, $b$. Thereby, \n", "\n", "> 1. the minimization of the first term $\\frac{1}{2}||w||^{2}$ maximizes the margin of the separating hyperplane, \n", "> 2. the minimization of the second term $C \\sum_{i=1}^{l} \\xi_{i}$ minimizes the penalty of misclassified training samples,\n", "> 3. the third term $\\sum_{i=1}^{l} \\alpha_{i}\\{y_{i}(x_{i} \\cdot w + b) - 1 + \\xi_{i}\\}$ enforces the relaxed classification constraints through the Lagrange multipliers $\\alpha_{i}$,\n", "> 4. the fourth term $\\sum_{i=1}^{l} \\mu_{i} \\xi_{i}$ enforces the positivity of the slack variables through the Lagrange multipliers $\\mu_{i}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, the penalty term $C$ is a parameter to be chosen by the user. A larger $C$ corresponds to assigning a higher penalty to errors." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A \"Dual\" Optimization Objective Formulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can again derive a dual formulation of the optimization objective using the conditions that $w = \\sum_{i=1}^{l} \\alpha_{i}y_{i}x_{i}$ and $\\sum_{i=1}^{l}\\alpha_{i}y_{i} = 0$, which becomes: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$L_{D} = \\sum_{i=1}^{l}\\alpha_{i} - \\frac{1}{2} \\sum_{i,j=1}^{l} \\alpha_{i}\\alpha_{j}y_{i}y_{j}(x_{i} \\cdot x_{j})$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "subject to $0 \\leq \\alpha_{i} \\leq C$. The only difference in comparison to the optimal hyperplane case is that the $\\alpha_{i}$ now have an upper bound of $C$. Again, the optimal separating hyperplane $H^{*}$ still remains a linear function of the training data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.7. 
Training of a Support Vector Machine (SVM) Classifier Using Different C Parameterizations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect different parametrizations of $C$ and their corresponding impact on the determined support vectors and learned optimal separating hyperplane $H^{*}$. We can obtain the learned support vectors from the model using the `support_vectors_` method available `Scikit-Learn`. Let's again fit a linear SVM to the training data observations $x_{i}$ using a penalty of $C=1$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the Support Vector Machine classifier\n", "svm = SVC(kernel='linear', C=1, random_state=random_seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will train the SVM model on the sepal length $x_1$ and petal length $x_3$ features of the iris flower dataset to seperate flowers of the classes $c_{1}=$ versicolor and $c_{2}=$ virginica:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_train_test = x_train[y_train != 0, :][:, [0,2]]\n", "y_train_test = y_train[y_train != 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's fit the linear SVM model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "svm.fit(x_train_test, y_train_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's briefly glance over the determined support vectors for which $\\alpha_{i} > 0$ and that constitute the learned max-margin separating hyperplane $H^{*}$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "svm.support_vectors_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's visually inspect the maximum margin separating hyperplane $H^{*}$ that was learned by our SVM. Remember, the learned hyperplane was optimized to seperate the features sepal length $x_1$ and petal length $x_3$ of the iris flower classes $c_{1}=$ versicolor and $c_{2}=$ virginica:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "fig = plt.figure(figsize=(6, 6))\n", "ax = fig.add_subplot(111)\n", "\n", "# add grid\n", "ax.grid(linestyle='dotted')\n", "\n", "# plot sepal length vs. petal length and corresponding classes\n", "ax.scatter(x_train[:,0], x_train[:,2], c=y_train, cmap=plt.cm.Set1)\n", "\n", "# highlight the determined support vectors in green\n", "ax.scatter(svm.support_vectors_[:,0], svm.support_vectors_[:,1], s=200, linewidth=1, facecolor='none', edgecolors='k', label='support vectors')\n", "\n", "# determine axis ranges\n", "ax = plt.gca()\n", "xlim = ax.get_xlim()\n", "ylim = ax.get_ylim()\n", "\n", "# create meshgrid to evaluate model\n", "xx = np.linspace(xlim[0], xlim[1], 30)\n", "yy = np.linspace(ylim[0], ylim[1], 30)\n", "YY, XX = np.meshgrid(yy, xx)\n", "xy = np.vstack([XX.ravel(), YY.ravel()]).T\n", "\n", "# determine and plot decision boundary\n", "Z = svm.decision_function(xy).reshape(XX.shape)\n", "ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])\n", "\n", "# add axis legends\n", "ax.set_xlabel(\"[sepal_length]\", fontsize=14)\n", "ax.set_ylabel(\"[petal_length]\", fontsize=14)\n", "\n", "# add plot title\n", "plt.title('Sepal Length vs. 
Petal Length - Decision Boundary', fontsize=14);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, we can observe how the learned 24 support vectors nicely constitute the optimal maximum margin separating hyperplane $H^{*}$. Let's now investigate how different values of $C \\in \\{0.1, 1, 10, 100\\}$ will penalize errors and therefore affect the number of support vectors. Remember, a larger value of $C$ corresponds to assigning a higher penalty to errors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init distinct C values\n", "C_values = [0.1, 1, 10, 100]\n", "\n", "# init SVM models of distinct C values\n", "svm_models = (SVC(kernel='linear', C=C, random_state=random_seed) for C in C_values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's fit the linear SVM models using distinct values of the penalty term $C$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fit the distinct SVM models to the data\n", "svm_models = (model.fit(x_train_test, y_train_test) for model in svm_models)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now again visually inspect the maximum margin separating hyperplane $H^{*}$ that was learned by our SVM when applying different values of $C$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "fig, sub = plt.subplots(2, 2, figsize=(14, 14))\n", "\n", "# iterate over distinct models\n", "for model, ax in zip(svm_models, sub.flatten()):\n", "    \n", "    # add grid\n", "    ax.grid(linestyle='dotted')\n", "\n", "    # plot sepal length vs. petal length and corresponding classes\n", "    ax.scatter(x_train[:,0], x_train[:,2], c=y_train, cmap=plt.cm.Set1)\n", "\n", "    # highlight the determined support vectors\n", "    ax.scatter(model.support_vectors_[:,0], model.support_vectors_[:,1], s=200, linewidth=1, facecolor='none', edgecolors='k', label='support vectors')\n", "\n", "    # determine and plot decision boundary\n", "    Z = model.decision_function(xy).reshape(XX.shape)\n", "    ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])\n", "\n", "    # add axis legends\n", "    ax.set_xlabel(\"[sepal_length]\", fontsize=14)\n", "    ax.set_ylabel(\"[petal_length]\", fontsize=14)\n", "\n", "    # add plot title\n", "    ax.set_title('Decision Boundary, C={}, kernel=\\'{}\\''.format(str(model.C), str(model.kernel)), fontsize=14);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can indeed observe that with increasing $C$ the number of misclassifications as well as the number of support vectors that constitute $H^{*}$ decreases. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.8. Non-Linear Support Vector Machine (SVM) Classifiers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How can the above linear SVMs be generalised to the case where the optimal separating hyperplane $H^{*}$ cannot be formulated as a linear function of the data? This holds for instances when the training data is not linearly separable. Boser, Guyon and Vapnik [7] showed that the so-called **\"kernel trick\"** (introduced by Aizerman[8]) could be used to accomplish this in a surprisingly straightforward way. First notice again, from the training objective's dual formulation, that the only way in which the data appears in the objective is in the form of dot products $x_{i} \\cdot x_{j}$. 
Now suppose we first mapped the data to some other (possibly infinite-dimensional) Euclidean space $\\mathcal{H}$, using a mapping which we will call $\\phi$:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\\phi: \\mathbb{R}^{d} \\mapsto \\mathcal{H}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, of course, the training algorithm would only depend on the data through dot products in $\\mathcal{H}$, i.e. on functions of the form $\\phi(x_{i}) \\cdot \\phi(x_{j})$. Now if there were a **\"kernel function\"** $K$ such that $K(x_{i}, x_{j}) = \\phi(x_{i}) \\cdot \\phi(x_{j})$, we would only need to use $K$ in the training algorithm, and would never need to explicitly even know what $\\phi$ is. One such kernel function is:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$K(x_{i}, x_{j}) = e^{-||x_{i}-x_{j}||^{2} / 2 \\sigma^{2}} $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this particular example, $\\mathcal{H}$ is infinite-dimensional, so it would not be very easy to work with $\\phi$ explicitly. However, if one replaces $x_{i} \\cdot x_{j}$ by $K(x_{i}, x_{j})$ everywhere in the training procedure, the algorithm will happily produce an SVM which lives in an infinite-dimensional space. All considerations of the previous sections still hold, since we are still doing a linear separation, but in a different space. We can again derive a dual formulation of the optimisation objective using the conditions that $w = \\sum_{i=1}^{l} \\alpha_{i}y_{i}x_{i}$ and $\\sum_{i=1}^{l}\\alpha_{i}y_{i} = 0$, which now becomes:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$L_{D} = \\sum_{i=1}^{l}\\alpha_{i} - \\frac{1}{2} \\sum_{i,j=1}^{l} \\alpha_{i}\\alpha_{j}y_{i}y_{j}K(x_{i}, x_{j})$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "subject to $0 \\leq \\alpha_{i} \\leq C$. The only difference in comparison to the linear hyperplane case is that the dot product $x_{i} \\cdot x_{j}$ is now replaced by a kernel function $K(x_{i}, x_{j})$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.3.9. Training of a Support Vector Machine (SVM) Classifier Using Different Kernel Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now train a set of non-linear SVMs and evaluate different kernel functions $K(x_{i}, x_{j})$. We will again train the distinct SVM models on the sepal length $x_1$ and petal length $x_3$ features of the iris flower dataset to separate the distinct flower classes $c_{0}=$ setosa, $c_{1}=$ versicolor and $c_{2}=$ virginica:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_train_kernel = x_train[:, [0, 2]]\n", "y_train_kernel = y_train" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will instantiate several SVM models, each equipped with a different kernel function. Thereby, we will use three of the kernel functions already available in the `Scikit-Learn` library: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> 1. linear kernel function: $x_{i} \\cdot x_{j}$,\n", "> 2. radial-basis kernel function: $\\exp({- \\gamma ||x_{i} - x_{j}||^{2}})$, where $\\gamma$ is specified by the keyword `gamma`, must be greater than 0, and corresponds to $1/(2\\sigma^{2})$ in the formula above (see the short numerical sketch below),\n", "> 3. polynomial kernel function: $(\\gamma \\langle x_{i}, x_{j} \\rangle + r)^{d}$, where $d$ is specified by the keyword `degree` and $r$ by `coef0`."
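] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To connect the radial-basis kernel formula above with its implementation, the following small sketch (an illustration, assuming the `x_train_kernel` array defined above and an arbitrary choice of two samples and of `gamma`) evaluates $K(x_{i}, x_{j}) = \\exp(-\\gamma ||x_{i} - x_{j}||^{2})$ for a single pair of training samples using plain `Numpy`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# pick two training samples (illustrative choice)\n", "x_i, x_j = x_train_kernel[0], x_train_kernel[1]\n", "\n", "# evaluate the rbf kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for gamma = 0.5\n", "gamma = 0.5\n", "k_rbf = np.exp(-gamma * np.sum((x_i - x_j) ** 2))\n", "\n", "# note: gamma corresponds to 1 / (2 * sigma^2) in the kernel formula of section 1.3.8\n", "print(k_rbf)"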
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's instantiate the distinct SVM models accordingly:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the SVM models using distinct kernel functions\n", "svm_models = (SVC(kernel='linear', C=1)\n", "              , SVC(kernel='rbf', gamma=0.1, C=1)\n", "              , SVC(kernel='rbf', gamma=0.2, C=1)\n", "              , SVC(kernel='rbf', gamma=0.5, C=1)\n", "              , SVC(kernel='rbf', gamma=0.7, C=1)\n", "              , SVC(kernel='poly', degree=1, coef0=1.0, C=1)\n", "              , SVC(kernel='poly', degree=2, coef0=1.0, C=1)\n", "              , SVC(kernel='poly', degree=5, coef0=1.0, C=1)\n", "              , SVC(kernel='poly', degree=7, coef0=1.0, C=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's subsequently train the distinct SVM models: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# fit the distinct SVM models to the data\n", "svm_models = (model.fit(x_train_kernel, y_train_kernel) for model in svm_models)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's visually inspect the optimal separating hyperplane $H^{*}$ learned by the distinct kernel functions $K(x_{i}, x_{j})$ to separate the sepal length $x_1$ and petal length $x_3$ features:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the plot\n", "fig, sub = plt.subplots(3, 3, figsize=(14, 14))\n", "\n", "# determine mesh-grid limitations\n", "xlim = [np.min(x_train[:, 0]) - 0.8, np.max(x_train[:, 0]) + 0.8]\n", "ylim = [np.min(x_train[:, 2]) - 0.8, np.max(x_train[:, 2]) + 0.8]\n", "\n", "# create meshgrid to evaluate model\n", "xx = np.linspace(xlim[0], xlim[1], 1000)\n", "yy = np.linspace(ylim[0], ylim[1], 1000)\n", "YY, XX = np.meshgrid(yy, xx)\n", "xy = np.vstack([XX.ravel(), YY.ravel()]).T\n", "\n", "# iterate over distinct models\n", "for model, ax in zip(svm_models, sub.flatten()):\n", "    \n", "    print(model)\n", "    \n", "    # add grid\n", "    ax.grid(linestyle='dotted')\n", "    \n", "    Z = model.predict(xy).reshape(XX.shape)\n", "    ax.contourf(XX, YY, Z, alpha=0.5, cmap=plt.cm.coolwarm)\n", "    \n", "    # plot sepal length vs. petal length and corresponding classes\n", "    ax.scatter(x_train[:,0], x_train[:,2], c=y_train, cmap=plt.cm.Set1)\n", "\n", "    # highlight the determined support vectors\n", "    ax.scatter(model.support_vectors_[:,0], model.support_vectors_[:,1], s=200, linewidth=1, facecolor='none', edgecolors='k', label='support vectors')\n", "    \n", "    # set axis ranges\n", "    ax.set_xlim(xlim)\n", "    ax.set_ylim(ylim)\n", "    \n", "    # add axis legends\n", "    ax.set_xlabel('[sepal_length]', fontsize=10)\n", "    ax.set_ylabel('[petal_length]', fontsize=10)\n", "    \n", "    # add plot title\n", "    ax.set_title('C={}, kernel=\\'{}\\', degree=\\'{}\\', gamma=\\'{}\\''.format(str(model.C), str(model.kernel), str(model.degree), str(model.gamma)), fontsize=10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Histogram of Oriented Gradients (HOG) Feature Extraction and Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1. Dataset Download and Data Assessment " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **MNIST database** (**M**odified **N**ational **I**nstitute of **S**tandards and **T**echnology database) is a large database of handwritten digits that is commonly used for training various image processing systems. 
The database is widely used for training and testing in the field of machine learning. Let's have a brief look into a couple of sample images contained in the dataset:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "(Source: https://en.wikipedia.org/wiki/MNIST_database)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Further details on the dataset can be obtained via: *LeCun, Y., 1998. \"The MNIST database of handwritten digits\", ( http://yann.lecun.com/exdb/mnist/ ).\"*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The MNIST database contains **60,000 training images** and **10,000 evaluation images**. The size of each image is 28 by 28 pixels. The handwritten digits contained in each fixe-sized image have been size-normalized and centred. The MNIST dataset is a great dataset to start with when learning about machine learning techniques and pattern recognition methods on real-world data. It requires minimal efforts on preprocessing and formatting the distinct images." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.1.1. Training Dataset Download and Data Assessment " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's download, transform and inspect the training images of the dataset. Therefore, let's first define the directory in which we aim to store the training data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_path = './data/train_mnist'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's download the training data accordingly:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download and transform training images\n", "mnist_train_data = torchvision.datasets.MNIST(root=train_path, train=True, download=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the downloaded images to `Numpy` arrays: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# convert images and labels to numpy array\n", "mnist_train_data_images = mnist_train_data.data.numpy()\n", "mnist_train_data_labels = mnist_train_data.targets.data.numpy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify the number and dimensionality of training images downloaded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the number of training data images\n", "mnist_train_data_images.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify the number and dimensionality of training labels downloaded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_train_data_labels.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Furthermore, let's visually inspect a randomly sampled training image:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set image id\n", "image_id = 1000\n", "\n", "# obtain image\n", "mnist_train_image = mnist_train_data_images[image_id, :, :]\n", "mnist_train_label = mnist_train_data_labels[image_id]\n", "\n", "# set image plot title \n", "plt.title('Example: {}, Label: {}'.format(str(image_id), str(mnist_train_label)))\n", "\n", "# plot mnist handwritten digit sample\n", "plt.imshow(mnist_train_image, cmap='gray');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.1.2. 
Evaluation Dataset Download and Data Assessment " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now also download, transform and inspect the evaluation images of the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set directory of evaluation images\n", "eval_path = './data/eval_mnist'\n", "\n", "# download and transform evaluation images\n", "mnist_eval_data = torchvision.datasets.MNIST(root=eval_path, train=False, download=True)\n", "\n", "# convert images and labels to numpy array\n", "mnist_eval_data_images = mnist_eval_data.data.numpy()\n", "mnist_eval_data_labels = mnist_eval_data.targets.data.numpy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify the number and dimensionality of evaluation images downloaded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the number of evaluation data images\n", "mnist_eval_data_images.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify the number and dimensionality of evaluation labels downloaded:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_eval_data_labels.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's again visually inspect a sample evaluation image:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set image id\n", "image_id = 1000\n", "\n", "# obtain image\n", "mnist_eval_image = mnist_eval_data_images[image_id, :, :]\n", "mnist_eval_label = mnist_eval_data_labels[image_id]\n", "\n", "# set image plot title \n", "plt.title('Example: {}, Label: {}'.format(str(image_id), str(mnist_eval_label)))\n", "\n", "# plot mnist handwritten digit sample\n", "plt.imshow(mnist_eval_image, cmap='gray');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2. Histogram of Oriented Gradients (HOG) Feature Extraction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **\"Histogram of Oriented Gradients (HOG)\"** is a feature descriptor used in computer vision and image processing originally developed for the purpose of object detection. The technique counts occurrences of gradient orientation in localised portions of an image. Its usage became widespread in 2005 when Navneet Dalal and Bill Triggs, researchers for the French National Institute for Research in Computer Science and Automation (INRIA), presented their supplementary work on HOG descriptors at the Conference on Computer Vision and Pattern Recognition (CVPR) [9]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2.1. Extraction of Image Patches" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the first step of the HOG feature extraction, the images are divided into tiny **\"patches\"**, each consisting of N×N pixels. In general, the patch size is a design choice informed by the scale of features we are looking for and the task we aim to accomplish. To classify the 28x28 MNIST handwritten digit images presented above, we will use patches of size 7x7 pixels, which will nicely divide each image into 4x4=16 image patches. The extraction of such a single 7x7 image patch is shown below: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }
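, { "cell_type": "markdown", "metadata": {}, "source": [ "As a small illustrative sketch (assuming the `mnist_eval_image` sampled above), such a division of a 28x28 image into 16 non-overlapping 7x7 patches can be obtained with a simple `Numpy` reshape:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# split the 28x28 image into a 4x4 grid of 7x7 patches and stack them into a (16, 7, 7) array\n", "patches = mnist_eval_image.reshape(4, 7, 4, 7).transpose(0, 2, 1, 3).reshape(16, 7, 7)\n", "\n", "# verify the number and size of the extracted patches\n", "patches.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2.2. 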
Calculation of Image Patch Gradients" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, in order to determine the distinct values of the HOG features, we calculate the horizontal and vertical gradients of each image patch. This can be achieved by filtering each patch using the two kernels or **\"filter masks\"** as shown below. Thereby, we will obtain, for each filter mask, a corresponding **\"gradient map\"** that records the intensity of pixel value change in the particular direction of the filter mask. As a result, the gradient maps remove a lot of non-discriminative information (e.g., image regions that exhibit a constant colour intensity), but highlight regions of high colour intensity change." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the image gradients obtainable for the horizontal filter-mask or kernel $k_{x}=[-1, 0, 1]$ in the x-direction of the $1000^{th}$ sample image contained in the evaluation dataset. Thereby, dark pixel values correspond to high negative gradient values and light pixel values to high positive gradient values (prior to the determination of the gradients' absolute value):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define the filter masks\n", "kernel_x = np.array([[-1, 0, 1]])\n", "\n", "# determine the horizontal image gradients\n", "g_x = sp.signal.convolve2d(mnist_eval_image, kernel_x) \n", "\n", "# set image plot title \n", "plt.title('Gradients x-Direction, Example: {}, Label: {}'.format(str(image_id), str(mnist_eval_label)))\n", "\n", "# plot mnist handwritten digit sample\n", "plt.imshow(g_x, cmap='gray');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the image gradients obtainable for the vertical filter-mask or kernel $k_{y}=[-1, 0, 1]^{T}$ in the y-direction of the $1000^{th}$ sample image contained in the evaluation dataset. Thereby, dark pixel values correspond to high negative gradient values and light pixel values to high positive gradient values (prior to the determination of the gradients' absolute value):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define the filter masks\n", "kernel_y = np.array([[-1, 0, 1]]).T\n", "\n", "# determine the vertical image gradients\n", "g_y = sp.signal.convolve2d(mnist_eval_image, kernel_y)\n", "\n", "# set image plot title \n", "plt.title('Gradients y-Direction, Example: {}, Label: {}'.format(str(image_id), str(mnist_eval_label)))\n", "\n", "# plot mnist handwritten digit sample\n", "plt.imshow(g_y, cmap='gray');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2.3. Calculation of Gradient Magnitude and Orientation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the gradients in (1) the horizontal or x-direction and (2) the vertical or y-direction are obtained for each pixel, the information is consolidated to derive more general information about the pixel intensity changes within an image. This is accomplished by the derivation of two important gradient attributes, namely:\n", "\n", ">- the **\"magnitude\"** of the gradients given by the gradients' L2-norm: $\\sqrt{g_{x}^{2} + g_{y}^{2}}$,\n", ">- the **\"orientation\"** of the gradients given by the gradients' arctangent: $\\arctan (\\frac{g_{y}}{g_{x}})$.\n", "\n", "We will derive both attributes for each of the pixel values contained in the distinct image patches. 
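" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following minimal sketch illustrates this step (an assumption made for illustration: we recompute both gradient maps of the sample image with `mode='same'` so that they share the 28x28 image shape, and we map the orientations to the range $[0°, 180°)$):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# recompute the gradient maps of the sample image with matching shapes\n", "g_x_same = sp.signal.convolve2d(mnist_eval_image, kernel_x, mode='same')\n", "g_y_same = sp.signal.convolve2d(mnist_eval_image, kernel_y, mode='same')\n", "\n", "# derive the per-pixel gradient magnitude (L2-norm of the gradients)\n", "g_magnitude = np.sqrt(g_x_same ** 2 + g_y_same ** 2)\n", "\n", "# derive the per-pixel gradient orientation in degrees, mapped to [0, 180)\n", "g_orientation = np.degrees(np.arctan2(g_y_same, g_x_same)) % 180.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "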
{ "cell_type": "markdown", "metadata": {}, "source": [ "This results in the gradient magnitude and the gradient orientation map, as shown below: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2.4. Calculation of Histogram of Oriented Gradients (HOG)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a last step, we will determine the HOG descriptors based on the gradient magnitude and the gradient orientation map. To achieve this, we compute the histogram of the gradient orientations binned into $b_{n}, n=1,...,9$ bins. Thereby, the distinct bins correspond to equidistant intervals of possible gradient orientations, e.g. $b_{1}=[0°, 19°], b_{2}=[20°, 39°], b_{3}=[40°, 59°], ..., b_{9}=[160°, 179°].$\n", "\n", "For each pixel of the image patch, the vote (the value that goes into the histogram) is given by the pixel's gradient magnitude $m$ and is split proportionally between the two bins closest to the pixel's gradient orientation $d$, with the closer bin receiving the larger share. For example, a pixel with orientation $d=39°$ and magnitude $m=297$ contributes to the bins starting at $b_{d}=20°$ and $b_{d+1}=40°$ (bin width $w=20°$) according to: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$b_{d} = \\frac{|b_{d+1} - d|}{w} \\times m = \\frac{|40 - 39|}{20} \\times 297 = 14.85$$\n", "\n", "$$b_{d+1} = \\frac{|b_{d} - d|}{w} \\times m = \\frac{|20 - 39|}{20} \\times 297 = 282.15$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once all the values of the gradient maps have been collected, the obtained histogram of gradients is normalized. This normalization is usually done by calculating the $L2$-norm over the distinct bin values, as shown in the following: \n", "\n", "$$||h||_{2} = \\sqrt{b_{1}^{2} + b_{2}^{2} + ... + b_{n}^{2}} = \\sqrt{420^2 + 1110^2 + ... + 787^2} = 2312.9$$\n", "\n", "and by normalizing the distinct bins accordingly to obtain the HOG feature vector of a particular image patch:\n", "\n", "$$ x_{i} = [\\frac{420}{2312.9}, \\frac{1110}{2312.9}, ..., \\frac{787}{2312.9}] = [0.18, 0.48, 0.28, ..., 0.34]$$\n", "\n", "where $i$ denotes the index of the current patch out of the $N=16$ image patches. Ultimately, the HOG feature vectors obtained for all 16 distinct image patches are concatenated into a single combined HOG feature vector of the image." ] }, 
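{ "cell_type": "markdown", "metadata": {}, "source": [ "To make the histogram computation tangible, the following cell sketches how the orientation histogram of a single 7x7 patch could be derived from the magnitude and orientation maps computed above. Please note that this is a simplified sketch: each pixel's full magnitude is assigned to the bin its orientation falls into, i.e. without the proportional vote splitting described above, followed by the L2-normalisation of the resulting histogram:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# select an exemplary 7x7 patch from the magnitude and orientation maps computed above\n", "patch_magnitude = magnitude[7:14, 7:14]\n", "patch_orientation = orientation[7:14, 7:14]\n", "\n", "# accumulate a 9-bin histogram of oriented gradients (bin width: 20 degrees)\n", "histogram = np.zeros(9)\n", "for m, d in zip(patch_magnitude.flatten(), patch_orientation.flatten()):\n", "    histogram[int(d // 20) % 9] += m\n", "\n", "# normalize the histogram by its L2-norm to obtain the patch's HOG feature vector\n", "histogram_normalized = histogram / (np.linalg.norm(histogram) + 1e-12)\n", "histogram_normalized" ] }, 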
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's calculate the HOG feature descriptors for the MNIST images of the training dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# extract the hog features of all training images\n", "\n", "# init list of hog features\n", "mnist_train_data_hog_features = []\n", "mnist_train_data_hog_images = []\n", "\n", "# iterate over all training images\n", "for i, mnist_train_image in enumerate(mnist_train_data_images):\n", "\n", "    # extract hog features of current training image\n", "    train_features, train_image = hog(mnist_train_image, orientations=4, pixels_per_cell=(7, 7), cells_per_block=(1, 1), visualize=True)\n", "\n", "    # collect extracted hog features\n", "    mnist_train_data_hog_features.append(train_features)\n", "    mnist_train_data_hog_images.append(train_image)\n", "\n", "    # case: print image processing status\n", "    if i % 10000 == 0:\n", "\n", "        # print log message\n", "        print('[LOG] {} features of training image {} successfully extracted.'.format(str(len(train_features)), str(i).zfill(5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the completeness of the generated feature vectors derived from the training data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the number of extracted training feature vectors\n", "len(mnist_train_data_hog_features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect a single feature vector:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_train_data_hog_features[1000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the number of features extracted for each MNIST digit image: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(mnist_train_data_hog_features[1000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, we extracted HOG features for 4 orientations from each image, with each image consisting of 16 (4x4) patches of 7x7 pixels. This results in a total of 64 extracted features per image (16 patches x 4 orientations)." ] }, 
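{ "cell_type": "markdown", "metadata": {}, "source": [ "As an additional sanity check (assuming the complete training dataset was processed above), we can stack the collected feature vectors into a single `NumPy` array and confirm the expected dimensionality of 60,000 images x 64 features:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# stack the collected hog feature vectors into a single numpy array\n", "mnist_train_data_hog_features_array = np.asarray(mnist_train_data_hog_features)\n", "\n", "# inspect the dimensionality of the stacked feature array\n", "mnist_train_data_hog_features_array.shape" ] }, 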
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also visualise the HOG features of an exemplary MNIST digit image of the training dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.imshow(mnist_train_data_hog_images[1000], cmap='gray')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's calculate the HOG feature descriptors for the MNIST images of the training dataset: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# extract the hog features of all evaluation images\n", "\n", "# init list of hog features\n", "mnist_eval_data_hog_features = []\n", "mnist_eval_data_hog_images = []\n", "\n", "# iterate over all training images\n", "for i, mnist_eval_image in enumerate(mnist_eval_data_images):\n", " \n", " # extract hog features of current evluation image\n", " eval_features, eval_image = hog(mnist_eval_image, orientations=4, pixels_per_cell=(7, 7), cells_per_block=(1, 1), visualize=True)\n", " \n", " # collect extracted hog features\n", " mnist_eval_data_hog_features.append(eval_features)\n", " mnist_eval_data_hog_images.append(eval_image)\n", " \n", " # case: print image processing status\n", " if i % 1000 == 0:\n", " \n", " # print log message\n", " print('[LOG] {} features of evaluation image {} succesfully extracted.'.format(str(len(eval_features)), str(i).zfill(5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the completeness of the generated feature vectors derived from the evaluation data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(mnist_eval_data_hog_features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also visualise the HOG features of an exemplary MNIST digit image of the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.imshow(mnist_eval_data_hog_images[1000], cmap='gray')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3. History of Oriented Gradients (HOG) Feature Classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.3.1. Training of the Support Vector Machine Classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's instantiate one of the SVM classifiers available in `Scikit-Learn` to learn a linear seperating hyperplane based on the extracted History of Oriented Gradients (HOG) features:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init the Support Vector Machine classifier\n", "svm = SVC(kernel='linear', C=1, random_state=random_seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train or fit the SVM classifier using the extracted training dataset HOG features and training dataset labels:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Perform the training\n", "svm.fit(mnist_train_data_hog_features[0:10000], mnist_train_data_labels[0:10000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.3.2. Evaluation of the trained Support Vector Machine Classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After fitting the HOG features of the training data, the optimal seperating hyperplane $H^{*}$ learned by the SVM model can then be used to predict the corresponding class labels $y_{i}'$ of so far unknown observations $x_{i}'$. 
We will now utilize the trained model to predict the class labels of the observations contained in the evaluation dataset, based on the extracted HOG features:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = svm.predict(mnist_eval_data_hog_features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the class labels $y_{i}'$ **predicted** by the SVM classifier on the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As well as the **true** class labels $y_{i}$ as contained in the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_eval_data_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, comparing the **true** and **predicted** class labels looks encouraging. Let's determine the exact **prediction accuracy** that the optimal separating hyperplane $\\mathcal{H}^{*}$ learned by the SVM was able to achieve on the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Model classification accuracy: {}%'.format(str(metrics.accuracy_score(mnist_eval_data_labels, y_pred) * 100)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Determine the number of **misclassified** data samples in the evaluation dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Number of mislabeled points out of a total {} points: {}'.format(mnist_eval_data_labels.shape[0], np.sum(mnist_eval_data_labels != y_pred)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Determine and plot the **confusion matrix** of the individual predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the prediction confusion matrix\n", "mat = confusion_matrix(mnist_eval_data_labels, y_pred)\n", "\n", "# init the plot\n", "plt.figure(figsize=(5, 5))\n", "\n", "# plot confusion matrix heatmap\n", "sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False, cmap='YlOrRd_r', xticklabels=range(0,10), yticklabels=range(0,10))\n", "\n", "# add plot axis labels\n", "plt.xlabel('[true class label $y_{i}$]')\n", "plt.ylabel('[predicted class label $y_{i}\\'$]')\n", "\n", "# add plot title\n", "plt.title('Confusion Matrix - SVM Predictions');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercises:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recommend you try the following exercises as part of the lab:\n", "\n", "**1. Train and evaluate the prediction accuracy of SVM models trained with different hyperparameters.**\n", "\n", "> Change the kernel function $\\phi$ of the SVM to a polynomial kernel, fit your model and calculate the new classification accuracy on the Iris dataset. Subsequently, repeat a similar experiment with different SVM hyperparameter setups by changing the value of $C$, $\\gamma$ and the kernel function $\\phi$. What patterns can be observed across the distinct hyperparameter setups in terms of classification accuracy? A small starter sketch is provided below." ] }, 
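{ "cell_type": "markdown", "metadata": {}, "source": [ "As one possible starting point, the cell below sketches such an experiment for a single alternative hyperparameter setup. Please note that the sketch is illustrative only: it uses the MNIST HOG features prepared above (the same pattern applies to the Iris data of Section 1), and the chosen kernel, degree, $C$ and $\\gamma$ values are arbitrary examples that you should vary systematically:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init an SVM with an exemplary alternative hyperparameter setup (values are illustrative)\n", "svm_poly = SVC(kernel='poly', degree=3, C=1, gamma='scale', random_state=random_seed)\n", "\n", "# fit the classifier on the same training subset used above\n", "svm_poly.fit(mnist_train_data_hog_features[0:10000], mnist_train_data_labels[0:10000])\n", "\n", "# evaluate the classification accuracy on the evaluation dataset\n", "y_pred_poly = svm_poly.predict(mnist_eval_data_hog_features)\n", "print('Polynomial kernel accuracy: {:.2f}%'.format(metrics.accuracy_score(mnist_eval_data_labels, y_pred_poly) * 100))" ] }, 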
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ***************************************************\n", "# INSERT YOUR CODE HERE\n", "# ***************************************************" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**2. Train and evaluate the prediction accuracy of SVM models using different or additional features.**\n", "\n", "> Fix the hyperparameters of the SVM and evalute the classification accuracy on the MNIST dataset using different features. For example, evaluate the prediction accuracy that can be derived based on a set of Scale-Invariant Feature Transform (SIFT) features. Or the combination of HOG and SIFT features. Will the consideration of additional features improve you classification accuracy?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ***************************************************\n", "# INSERT YOUR CODE HERE\n", "# ***************************************************" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[1] **\"A Tutorial on Support Vector Machines for Pattern Recognition\"**, Burges C. J. C. , Bell Laboratories, Lucent Technologies, Data Mining and Knowledge Discovery, 2, 121-167, 1998.\n", "\n", "[2] **\"Support Vector Networks\"**, Cortes C. and Vapnik V., Machine Learning, 20:273-297, 1995. \n", "\n", "[3] **\"Comparison of View-Based Object Recognition Algorithms Using Realistic 3D Models\"**, Blanz V., Schölkopf B., Bülthoff H., Burges C., Vapnik V., and Vetter T., Artificial Neural Networks - ICANN'96, 251-256, 1996. \n", "\n", "[4] **\"Identifying Speaker with Support Vector Networks\"**, Schmidt M., Interface '96 Proceedings, 1996. \n", "\n", "[5] **\"Training Support Vector Machines: An Application to Face Detection\"**, Osuna E., Freund R., Girosi F., IEEE Conference on Computer Vision and Pattern Recognition, 130-136, 1997.\n", "\n", "[6] **\"Text Categorization With Support Vector Machines\"**, Joachims T., Technical Report, LS VIII Number 23, University of Dortmund, 1997.\n", "\n", "[7] **\"A Training Algorithm for Optimal Margin Classifiers\"**, Boser, B. E., Guyon, I. M., Vapnik V. A., Fifth Annual Workshop on Computational Learning Theory, 1992. \n", "\n", "[8] **\"Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning\"**, Aizermann M. A., Bravermann E. M., Rozoner, L. I., Automation and Remote Control, 25:821-837, 1964.\n", "\n", "[9] **\"Histograms of Oriented Gradients for Human Detection\"**, Dalal N. and Triggs B., IEEE computer society conference on computer vision and pattern recognition (CVPR'05), vol. 1, pp. 886-893. IEEE, 2005." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lab Summary:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, a step by step introduction into (1) **Support Vector Classification** and (2) **Histogram of Oriented Gradients (HOG) features** classification is presented. The code and exercises presented in this lab may serves as a starting point for more complex and tailored programs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may want to execute the content of your lab outside of the Jupyter notebook environment, e.g. on a compute node or a server. The cell below converts the lab notebook into a standalone and executable python script. Pls. 
note that to convert the notebook, you need to install Python's **nbconvert** library and its extensions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# installing the nbconvert library\n", "!pip install nbconvert\n", "!pip install jupyter_contrib_nbextensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now convert the Jupyter notebook into a plain Python script:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!jupyter nbconvert --to script cfds_lab_07.ipynb" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "385px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }