{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmark of various outlier detection models\n", "\n", "### The models are evaluated by ROC, Precision @ n and execution time on 17 benchmark datasets. Each dataset is split into 60% for training and 40% for testing. The full results, averaged over 10 independent trials, can be found [here](https://pyod.readthedocs.io/en/latest/benchmark.html).\n", "\n", "**[PyOD](https://github.com/yzhao062/pyod)** is a comprehensive **Python toolkit** to **identify outlying objects** in \n", "multivariate data with both unsupervised and supervised approaches.\n", "The models covered in this example include:\n", "\n", " 1. Linear Models for Outlier Detection:\n", " 1. **PCA: Principal Component Analysis** (uses the sum of\n", " weighted projected distances to the eigenvector hyperplane \n", " as the outlier scores)\n", " 2. **MCD: Minimum Covariance Determinant** (uses the Mahalanobis distances \n", " as the outlier scores)\n", " 3. **OCSVM: One-Class Support Vector Machines**\n", " \n", " 2. Proximity-Based Outlier Detection Models:\n", " 1. **LOF: Local Outlier Factor**\n", " 2. **CBLOF: Clustering-Based Local Outlier Factor**\n", " 3. **kNN: k Nearest Neighbors** (uses the distance to the kth nearest \n", " neighbor as the outlier score)\n", " 4. **HBOS: Histogram-based Outlier Score**\n", " \n", " 3. Probabilistic Models for Outlier Detection:\n", " 1. **ABOD: Angle-Based Outlier Detection**\n", " \n", " 4. Outlier Ensembles and Combination Frameworks:\n", " 1. **Isolation Forest**\n", " 2. 
**Feature Bagging**\n", "\n", " \n", "The corresponding script can be found at /examples/compare_all_models.py" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import division\n", "from __future__ import print_function\n", "\n", "import os\n", "import sys\n", "from time import time\n", "\n", "# temporary workaround for relative imports in case pyod is not installed\n", "# if pyod is installed, the following line is not needed\n", "sys.path.append(\n", " os.path.abspath(os.path.join(os.path.dirname(\"__file__\"), '..')))\n", "# suppress warnings for clean output\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from scipy.io import loadmat\n", "\n", "from pyod.models.abod import ABOD\n", "from pyod.models.cblof import CBLOF\n", "from pyod.models.feature_bagging import FeatureBagging\n", "from pyod.models.hbos import HBOS\n", "from pyod.models.iforest import IForest\n", "from pyod.models.knn import KNN\n", "from pyod.models.lof import LOF\n", "from pyod.models.mcd import MCD\n", "from pyod.models.ocsvm import OCSVM\n", "from pyod.models.pca import PCA\n", "\n", "from pyod.utils.utility import standardizer\n", "from pyod.utils.utility import precision_n_scores\n", "from sklearn.metrics import roc_auc_score" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "... 
Processing arrhythmia.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.7687, precision @ rank n:0.3571, execution time: 1.126s\n", "Cluster-based Local Outlier Factor ROC:0.7824, precision @ rank n:0.4643, execution time: 0.9786s\n", "Feature Bagging ROC:0.7796, precision @ rank n:0.4643, execution time: 0.6076s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.8511, precision @ rank n:0.5714, execution time: 0.8914s\n", "Isolation Forest ROC:0.8639, precision @ rank n:0.6071, execution time: 0.2507s\n", "K Nearest Neighbors (KNN) ROC:0.782, precision @ rank n:0.5, execution time: 0.1133s\n", "Local Outlier Factor (LOF) ROC:0.7787, precision @ rank n:0.4643, execution time: 0.0742s\n", "Minimum Covariance Determinant (MCD) ROC:0.8228, precision @ rank n:0.4286, execution time: 1.4578s\n", "One-class SVM (OCSVM) ROC:0.7986, precision @ rank n:0.5, execution time: 0.0481s\n", "Principal Component Analysis (PCA) ROC:0.7997, precision @ rank n:0.5, execution time: 0.0642s\n", "\n", "... 
Processing cardio.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.5892, precision @ rank n:0.1918, execution time: 0.3579s\n", "Cluster-based Local Outlier Factor ROC:0.973, precision @ rank n:0.7945, execution time: 0.0983s\n", "Feature Bagging ROC:0.6385, precision @ rank n:0.1781, execution time: 0.8683s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.8373, precision @ rank n:0.4521, execution time: 0.005s\n", "Isolation Forest ROC:0.9502, precision @ rank n:0.6027, execution time: 0.2767s\n", "K Nearest Neighbors (KNN) ROC:0.734, precision @ rank n:0.3562, execution time: 0.1905s\n", "Local Outlier Factor (LOF) ROC:0.588, precision @ rank n:0.1507, execution time: 0.0993s\n", "Minimum Covariance Determinant (MCD) ROC:0.8195, precision @ rank n:0.411, execution time: 0.5835s\n", "One-class SVM (OCSVM) ROC:0.9478, precision @ rank n:0.5342, execution time: 0.0822s\n", "Principal Component Analysis (PCA) ROC:0.9616, precision @ rank n:0.6849, execution time: 0.003s\n", "\n", "... Processing glass.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.6951, precision @ rank n:0.25, execution time: 0.0361s\n", "Cluster-based Local Outlier Factor ROC:0.7957, precision @ rank n:0.25, execution time: 0.0251s\n", "Feature Bagging ROC:0.7073, precision @ rank n:0.25, execution time: 0.0351s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.7073, precision @ rank n:0.0, execution time: 0.005s\n", "Isolation Forest ROC:0.7134, precision @ rank n:0.25, execution time: 0.1584s\n", "K Nearest Neighbors (KNN) ROC:0.8384, precision @ rank n:0.25, execution time: 0.013s\n", "Local Outlier Factor (LOF) ROC:0.7043, precision @ rank n:0.25, execution time: 0.003s\n", "Minimum Covariance Determinant (MCD) ROC:0.8293, precision @ rank n:0.0, execution time: 0.0481s\n", "One-class SVM (OCSVM) ROC:0.6585, precision @ rank n:0.25, execution time: 0.001s\n", "Principal Component Analysis (PCA) ROC:0.686, precision @ rank n:0.25, execution time: 0.001s\n", "\n", "... 
Processing ionosphere.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9181, precision @ rank n:0.8431, execution time: 0.0602s\n", "Cluster-based Local Outlier Factor ROC:0.795, precision @ rank n:0.549, execution time: 0.0381s\n", "Feature Bagging ROC:0.9303, precision @ rank n:0.8039, execution time: 0.0642s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.6052, precision @ rank n:0.3922, execution time: 0.006s\n", "Isolation Forest ROC:0.8486, precision @ rank n:0.5882, execution time: 0.1785s\n", "K Nearest Neighbors (KNN) ROC:0.932, precision @ rank n:0.8824, execution time: 0.0251s\n", "Local Outlier Factor (LOF) ROC:0.9227, precision @ rank n:0.7843, execution time: 0.007s\n", "Minimum Covariance Determinant (MCD) ROC:0.9669, precision @ rank n:0.8627, execution time: 0.0622s\n", "One-class SVM (OCSVM) ROC:0.8257, precision @ rank n:0.6863, execution time: 0.004s\n", "Principal Component Analysis (PCA) ROC:0.7941, precision @ rank n:0.5686, execution time: 0.003s\n", "\n", "... 
Processing letter.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.8783, precision @ rank n:0.4375, execution time: 0.365s\n", "Cluster-based Local Outlier Factor ROC:0.5301, precision @ rank n:0.0312, execution time: 0.1052s\n", "Feature Bagging ROC:0.8947, precision @ rank n:0.4062, execution time: 0.7309s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.6063, precision @ rank n:0.0938, execution time: 0.007s\n", "Isolation Forest ROC:0.6201, precision @ rank n:0.0625, execution time: 0.2416s\n", "K Nearest Neighbors (KNN) ROC:0.8573, precision @ rank n:0.3125, execution time: 0.1905s\n", "Local Outlier Factor (LOF) ROC:0.8765, precision @ rank n:0.3438, execution time: 0.0922s\n", "Minimum Covariance Determinant (MCD) ROC:0.8061, precision @ rank n:0.1875, execution time: 1.3225s\n", "One-class SVM (OCSVM) ROC:0.5927, precision @ rank n:0.125, execution time: 0.0973s\n", "Principal Component Analysis (PCA) ROC:0.5216, precision @ rank n:0.125, execution time: 0.005s\n", "\n", "... Processing lympho.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9831, precision @ rank n:0.0, execution time: 0.0261s\n", "Cluster-based Local Outlier Factor ROC:1.0, precision @ rank n:1.0, execution time: 0.0321s\n", "Feature Bagging ROC:1.0, precision @ rank n:1.0, execution time: 0.0261s\n", "Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.005s\n", "Isolation Forest ROC:1.0, precision @ rank n:1.0, execution time: 0.1795s\n", "K Nearest Neighbors (KNN) ROC:1.0, precision @ rank n:1.0, execution time: 0.011s\n", "Local Outlier Factor (LOF) ROC:1.0, precision @ rank n:1.0, execution time: 0.003s\n", "Minimum Covariance Determinant (MCD) ROC:1.0, precision @ rank n:1.0, execution time: 0.0531s\n", "One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 0.002s\n", "Principal Component Analysis (PCA) ROC:1.0, precision @ rank n:1.0, execution time: 0.001s\n", "\n", "... 
Processing mnist.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.7628, precision @ rank n:0.3367, execution time: 6.8502s\n", "Cluster-based Local Outlier Factor ROC:0.8204, precision @ rank n:0.3605, execution time: 0.9004s\n", "Feature Bagging ROC:0.7157, precision @ rank n:0.3741, execution time: 44.9604s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.5766, precision @ rank n:0.1361, execution time: 0.0411s\n", "Isolation Forest ROC:0.7939, precision @ rank n:0.2721, execution time: 1.6943s\n", "K Nearest Neighbors (KNN) ROC:0.8498, precision @ rank n:0.432, execution time: 6.6327s\n", "Local Outlier Factor (LOF) ROC:0.7195, precision @ rank n:0.3673, execution time: 5.9395s\n", "Minimum Covariance Determinant (MCD) ROC:0.8713, precision @ rank n:0.2653, execution time: 4.1711s\n", "One-class SVM (OCSVM) ROC:0.854, precision @ rank n:0.3946, execution time: 4.3626s\n", "Principal Component Analysis (PCA) ROC:0.8534, precision @ rank n:0.3878, execution time: 0.1433s\n", "\n", "... Processing musk.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.2161, precision @ rank n:0.1, execution time: 1.8951s\n", "Cluster-based Local Outlier Factor ROC:0.9899, precision @ rank n:0.65, execution time: 0.2998s\n", "Feature Bagging ROC:0.473, precision @ rank n:0.125, execution time: 12.2746s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9999, precision @ rank n:0.975, execution time: 0.0582s\n", "Isolation Forest ROC:1.0, precision @ rank n:1.0, execution time: 1.0528s\n", "K Nearest Neighbors (KNN) ROC:0.8009, precision @ rank n:0.175, execution time: 1.7376s\n", "Local Outlier Factor (LOF) ROC:0.4629, precision @ rank n:0.125, execution time: 1.5411s\n", "Minimum Covariance Determinant (MCD) ROC:1.0, precision @ rank n:1.0, execution time: 21.114s\n", "One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.1922s\n", "Principal Component Analysis (PCA) ROC:1.0, precision @ rank n:1.0, execution time: 0.1504s\n", "\n", "... 
Processing optdigits.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.4894, precision @ rank n:0.0152, execution time: 2.243s\n", "Cluster-based Local Outlier Factor ROC:0.5329, precision @ rank n:0.0, execution time: 0.3911s\n", "Feature Bagging ROC:0.5062, precision @ rank n:0.0303, execution time: 11.9945s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.8774, precision @ rank n:0.2121, execution time: 0.0271s\n", "Isolation Forest ROC:0.6735, precision @ rank n:0.0303, execution time: 0.8623s\n", "K Nearest Neighbors (KNN) ROC:0.406, precision @ rank n:0.0, execution time: 1.9502s\n", "Local Outlier Factor (LOF) ROC:0.5277, precision @ rank n:0.0303, execution time: 1.7256s\n", "Minimum Covariance Determinant (MCD) ROC:0.3822, precision @ rank n:0.0, execution time: 1.6714s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "One-class SVM (OCSVM) ROC:0.5171, precision @ rank n:0.0, execution time: 1.4729s\n", "Principal Component Analysis (PCA) ROC:0.526, precision @ rank n:0.0, execution time: 0.0521s\n", "\n", "... 
Processing pendigits.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.667, precision @ rank n:0.0526, execution time: 1.3767s\n", "Cluster-based Local Outlier Factor ROC:0.9172, precision @ rank n:0.1579, execution time: 0.1955s\n", "Feature Bagging ROC:0.4889, precision @ rank n:0.0526, execution time: 3.6236s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9348, precision @ rank n:0.2632, execution time: 0.008s\n", "Isolation Forest ROC:0.9376, precision @ rank n:0.3333, execution time: 0.6768s\n", "K Nearest Neighbors (KNN) ROC:0.7371, precision @ rank n:0.0702, execution time: 0.8352s\n", "Local Outlier Factor (LOF) ROC:0.4965, precision @ rank n:0.0702, execution time: 0.5886s\n", "Minimum Covariance Determinant (MCD) ROC:0.8204, precision @ rank n:0.0877, execution time: 2.274s\n", "One-class SVM (OCSVM) ROC:0.9235, precision @ rank n:0.3158, execution time: 0.9646s\n", "Principal Component Analysis (PCA) ROC:0.9309, precision @ rank n:0.3158, execution time: 0.007s\n", "\n", "... 
Processing pima.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.7163, precision @ rank n:0.5253, execution time: 0.1484s\n", "Cluster-based Local Outlier Factor ROC:0.7661, precision @ rank n:0.6061, execution time: 0.0752s\n", "Feature Bagging ROC:0.6448, precision @ rank n:0.4444, execution time: 0.0982s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.711, precision @ rank n:0.5354, execution time: 0.002s\n", "Isolation Forest ROC:0.6818, precision @ rank n:0.5152, execution time: 0.2176s\n", "K Nearest Neighbors (KNN) ROC:0.7395, precision @ rank n:0.5859, execution time: 0.0572s\n", "Local Outlier Factor (LOF) ROC:0.6574, precision @ rank n:0.4646, execution time: 0.011s\n", "Minimum Covariance Determinant (MCD) ROC:0.7175, precision @ rank n:0.5152, execution time: 0.0491s\n", "One-class SVM (OCSVM) ROC:0.6561, precision @ rank n:0.5051, execution time: 0.01s\n", "Principal Component Analysis (PCA) ROC:0.6762, precision @ rank n:0.5354, execution time: 0.001s\n", "\n", "... 
Processing satellite.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.5653, precision @ rank n:0.3962, execution time: 1.6945s\n", "Cluster-based Local Outlier Factor ROC:0.5548, precision @ rank n:0.345, execution time: 0.3288s\n", "Feature Bagging ROC:0.572, precision @ rank n:0.4, execution time: 7.0868s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.7486, precision @ rank n:0.57, execution time: 0.017s\n", "Isolation Forest ROC:0.6825, precision @ rank n:0.5825, execution time: 0.5966s\n", "K Nearest Neighbors (KNN) ROC:0.6853, precision @ rank n:0.4988, execution time: 1.1901s\n", "Local Outlier Factor (LOF) ROC:0.572, precision @ rank n:0.395, execution time: 0.9465s\n", "Minimum Covariance Determinant (MCD) ROC:0.8055, precision @ rank n:0.6762, execution time: 2.5738s\n", "One-class SVM (OCSVM) ROC:0.6478, precision @ rank n:0.5225, execution time: 1.3165s\n", "Principal Component Analysis (PCA) ROC:0.5923, precision @ rank n:0.465, execution time: 0.022s\n", "\n", "... 
Processing satimage-2.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.8432, precision @ rank n:0.2333, execution time: 1.3987s\n", "Cluster-based Local Outlier Factor ROC:0.9783, precision @ rank n:0.6667, execution time: 0.2396s\n", "Feature Bagging ROC:0.5235, precision @ rank n:0.1667, execution time: 5.8847s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9784, precision @ rank n:0.6, execution time: 0.015s\n", "Isolation Forest ROC:0.9952, precision @ rank n:0.8667, execution time: 0.5154s\n", "K Nearest Neighbors (KNN) ROC:0.9515, precision @ rank n:0.4333, execution time: 0.9897s\n", "Local Outlier Factor (LOF) ROC:0.5257, precision @ rank n:0.1667, execution time: 0.7229s\n", "Minimum Covariance Determinant (MCD) ROC:0.9963, precision @ rank n:0.6667, execution time: 2.0875s\n", "One-class SVM (OCSVM) ROC:0.9997, precision @ rank n:0.9, execution time: 1.0097s\n", "Principal Component Analysis (PCA) ROC:0.9816, precision @ rank n:0.7333, execution time: 0.016s\n", "\n", "... 
Processing shuttle.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.6171, precision @ rank n:0.2003, execution time: 13.0948s\n", "Cluster-based Local Outlier Factor ROC:0.6273, precision @ rank n:0.2025, execution time: 0.7941s\n", "Feature Bagging ROC:0.4725, precision @ rank n:0.0257, execution time: 73.3649s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9871, precision @ rank n:0.9985, execution time: 0.015s\n", "Isolation Forest ROC:0.9976, precision @ rank n:0.9596, execution time: 3.2025s\n", "K Nearest Neighbors (KNN) ROC:0.6507, precision @ rank n:0.212, execution time: 9.6808s\n", "Local Outlier Factor (LOF) ROC:0.5556, precision @ rank n:0.1548, execution time: 10.6754s\n", "Minimum Covariance Determinant (MCD) ROC:0.99, precision @ rank n:0.7395, execution time: 10.3525s\n", "One-class SVM (OCSVM) ROC:0.9934, precision @ rank n:0.956, execution time: 44.1575s\n", "Principal Component Analysis (PCA) ROC:0.9915, precision @ rank n:0.9516, execution time: 0.032s\n", "\n", "... 
Processing vertebral.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.5366, precision @ rank n:0.2143, execution time: 0.0431s\n", "Cluster-based Local Outlier Factor ROC:0.3937, precision @ rank n:0.0, execution time: 0.0311s\n", "Feature Bagging ROC:0.5279, precision @ rank n:0.1429, execution time: 0.029s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.3506, precision @ rank n:0.0, execution time: 0.002s\n", "Isolation Forest ROC:0.3772, precision @ rank n:0.0, execution time: 0.1575s\n", "K Nearest Neighbors (KNN) ROC:0.4573, precision @ rank n:0.0714, execution time: 0.015s\n", "Local Outlier Factor (LOF) ROC:0.4983, precision @ rank n:0.1429, execution time: 0.003s\n", "Minimum Covariance Determinant (MCD) ROC:0.4103, precision @ rank n:0.0714, execution time: 0.0331s\n", "One-class SVM (OCSVM) ROC:0.4686, precision @ rank n:0.0714, execution time: 0.002s\n", "Principal Component Analysis (PCA) ROC:0.4085, precision @ rank n:0.0, execution time: 0.001s\n", "\n", "... Processing vowels.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9616, precision @ rank n:0.6316, execution time: 0.2336s\n", "Cluster-based Local Outlier Factor ROC:0.6496, precision @ rank n:0.1053, execution time: 0.0662s\n", "Feature Bagging ROC:0.9365, precision @ rank n:0.3684, execution time: 0.2577s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.6876, precision @ rank n:0.1579, execution time: 0.003s\n", "Isolation Forest ROC:0.8209, precision @ rank n:0.1579, execution time: 0.2116s\n", "K Nearest Neighbors (KNN) ROC:0.9734, precision @ rank n:0.4737, execution time: 0.1113s\n", "Local Outlier Factor (LOF) ROC:0.9398, precision @ rank n:0.3684, execution time: 0.0301s\n", "Minimum Covariance Determinant (MCD) ROC:0.7243, precision @ rank n:0.1053, execution time: 0.755s\n", "One-class SVM (OCSVM) ROC:0.8163, precision @ rank n:0.2632, execution time: 0.0311s\n", "Principal Component Analysis (PCA) ROC:0.6297, precision @ rank n:0.1579, execution time: 0.002s\n", 
"\n", "... Processing wbc.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.921, precision @ rank n:0.375, execution time: 0.0632s\n", "Cluster-based Local Outlier Factor ROC:0.8906, precision @ rank n:0.375, execution time: 0.0421s\n", "Feature Bagging ROC:0.9271, precision @ rank n:0.375, execution time: 0.0792s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9479, precision @ rank n:0.5, execution time: 0.005s\n", "Isolation Forest ROC:0.9436, precision @ rank n:0.5, execution time: 0.1564s\n", "K Nearest Neighbors (KNN) ROC:0.9444, precision @ rank n:0.5, execution time: 0.0271s\n", "Local Outlier Factor (LOF) ROC:0.9227, precision @ rank n:0.375, execution time: 0.007s\n", "Minimum Covariance Determinant (MCD) ROC:0.9288, precision @ rank n:0.5, execution time: 0.0632s\n", "One-class SVM (OCSVM) ROC:0.9358, precision @ rank n:0.375, execution time: 0.005s\n", "Principal Component Analysis (PCA) ROC:0.9262, precision @ rank n:0.375, execution time: 0.001s\n" ] } ], "source": [ "# Define data file and read X and y\n", "mat_file_list = ['arrhythmia.mat',\n", " 'cardio.mat',\n", " 'glass.mat',\n", " 'ionosphere.mat',\n", " 'letter.mat',\n", " 'lympho.mat',\n", " 'mnist.mat',\n", " 'musk.mat',\n", " 'optdigits.mat',\n", " 'pendigits.mat',\n", " 'pima.mat',\n", " 'satellite.mat',\n", " 'satimage-2.mat',\n", " 'shuttle.mat',\n", " 'vertebral.mat',\n", " 'vowels.mat',\n", " 'wbc.mat']\n", "\n", "# Define nine outlier detection tools to be compared\n", "random_state = np.random.RandomState(42)\n", "\n", "df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',\n", " 'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',\n", " 'OCSVM', 'PCA']\n", "roc_df = pd.DataFrame(columns=df_columns)\n", "prn_df = pd.DataFrame(columns=df_columns)\n", "time_df = pd.DataFrame(columns=df_columns)\n", "\n", "\n", "for mat_file in mat_file_list:\n", " print(\"\\n... 
Processing\", mat_file, '...')\n", " mat = loadmat(os.path.join('data', mat_file))\n", "\n", " X = mat['X']\n", " y = mat['y'].ravel()\n", " outliers_fraction = np.count_nonzero(y) / len(y)\n", " outliers_percentage = round(outliers_fraction * 100, ndigits=4)\n", "\n", " # construct containers for saving results\n", " roc_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n", " prn_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n", " time_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n", "\n", " # 60% data for training and 40% for testing\n", " X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,\n", " random_state=random_state)\n", "\n", " # standardizing data for processing\n", " X_train_norm, X_test_norm = standardizer(X_train, X_test)\n", "\n", " classifiers = {'Angle-based Outlier Detector (ABOD)': ABOD(\n", " contamination=outliers_fraction),\n", " 'Cluster-based Local Outlier Factor': CBLOF(\n", " contamination=outliers_fraction, check_estimator=False,\n", " random_state=random_state),\n", " 'Feature Bagging': FeatureBagging(contamination=outliers_fraction,\n", " random_state=random_state),\n", " 'Histogram-base Outlier Detection (HBOS)': HBOS(\n", " contamination=outliers_fraction),\n", " 'Isolation Forest': IForest(contamination=outliers_fraction,\n", " random_state=random_state),\n", " 'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),\n", " 'Local Outlier Factor (LOF)': LOF(\n", " contamination=outliers_fraction),\n", " 'Minimum Covariance Determinant (MCD)': MCD(\n", " contamination=outliers_fraction, random_state=random_state),\n", " 'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction),\n", " 'Principal Component Analysis (PCA)': PCA(\n", " contamination=outliers_fraction, random_state=random_state),\n", " }\n", "\n", " for clf_name, clf in classifiers.items():\n", " t0 = time()\n", " clf.fit(X_train_norm)\n", " test_scores = 
clf.decision_function(X_test_norm)\n", " t1 = time()\n", " duration = round(t1 - t0, ndigits=4)\n", " time_list.append(duration)\n", "\n", " roc = round(roc_auc_score(y_test, test_scores), ndigits=4)\n", " prn = round(precision_n_scores(y_test, test_scores), ndigits=4)\n", "\n", " print('{clf_name} ROC:{roc}, precision @ rank n:{prn}, '\n", " 'execution time: {duration}s'.format(\n", " clf_name=clf_name, roc=roc, prn=prn, duration=duration))\n", "\n", " roc_list.append(roc)\n", " prn_list.append(prn)\n", "\n", " temp_df = pd.DataFrame(time_list).transpose()\n", " temp_df.columns = df_columns\n", " time_df = pd.concat([time_df, temp_df], axis=0)\n", "\n", " temp_df = pd.DataFrame(roc_list).transpose()\n", " temp_df.columns = df_columns\n", " roc_df = pd.concat([roc_df, temp_df], axis=0)\n", "\n", " temp_df = pd.DataFrame(prn_list).transpose()\n", " temp_df.columns = df_columns\n", " prn_df = pd.concat([prn_df, temp_df], axis=0)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time complexity\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>Data</th><th>#Samples</th><th># Dimensions</th><th>Outlier Perc</th><th>ABOD</th><th>CBLOF</th><th>FB</th><th>HBOS</th><th>IForest</th><th>KNN</th><th>LOF</th><th>MCD</th><th>OCSVM</th><th>PCA</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>arrhythmia</td><td>452</td><td>274</td><td>14.6018</td><td>1.126</td><td>0.9786</td><td>0.6076</td><td>0.8914</td><td>0.2507</td><td>0.1133</td><td>0.0742</td><td>1.4578</td><td>0.0481</td><td>0.0642</td></tr>\n",
"<tr><th>0</th><td>cardio</td><td>1831</td><td>21</td><td>9.6122</td><td>0.3579</td><td>0.0983</td><td>0.8683</td><td>0.005</td><td>0.2767</td><td>0.1905</td><td>0.0993</td><td>0.5835</td><td>0.0822</td><td>0.003</td></tr>\n",
"<tr><th>0</th><td>glass</td><td>214</td><td>9</td><td>4.2056</td><td>0.0361</td><td>0.0251</td><td>0.0351</td><td>0.005</td><td>0.1584</td><td>0.013</td><td>0.003</td><td>0.0481</td><td>0.001</td><td>0.001</td></tr>\n",
"<tr><th>0</th><td>ionosphere</td><td>351</td><td>33</td><td>35.8974</td><td>0.0602</td><td>0.0381</td><td>0.0642</td><td>0.006</td><td>0.1785</td><td>0.0251</td><td>0.007</td><td>0.0622</td><td>0.004</td><td>0.003</td></tr>\n",
"<tr><th>0</th><td>letter</td><td>1600</td><td>32</td><td>6.25</td><td>0.365</td><td>0.1052</td><td>0.7309</td><td>0.007</td><td>0.2416</td><td>0.1905</td><td>0.0922</td><td>1.3225</td><td>0.0973</td><td>0.005</td></tr>\n",
"<tr><th>0</th><td>lympho</td><td>148</td><td>18</td><td>4.0541</td><td>0.0261</td><td>0.0321</td><td>0.0261</td><td>0.005</td><td>0.1795</td><td>0.011</td><td>0.003</td><td>0.0531</td><td>0.002</td><td>0.001</td></tr>\n",
"<tr><th>0</th><td>mnist</td><td>7603</td><td>100</td><td>9.2069</td><td>6.8502</td><td>0.9004</td><td>44.9604</td><td>0.0411</td><td>1.6943</td><td>6.6327</td><td>5.9395</td><td>4.1711</td><td>4.3626</td><td>0.1433</td></tr>\n",
"<tr><th>0</th><td>musk</td><td>3062</td><td>166</td><td>3.1679</td><td>1.8951</td><td>0.2998</td><td>12.2746</td><td>0.0582</td><td>1.0528</td><td>1.7376</td><td>1.5411</td><td>21.114</td><td>1.1922</td><td>0.1504</td></tr>\n",
"<tr><th>0</th><td>optdigits</td><td>5216</td><td>64</td><td>2.8758</td><td>2.243</td><td>0.3911</td><td>11.9945</td><td>0.0271</td><td>0.8623</td><td>1.9502</td><td>1.7256</td><td>1.6714</td><td>1.4729</td><td>0.0521</td></tr>\n",
"<tr><th>0</th><td>pendigits</td><td>6870</td><td>16</td><td>2.2707</td><td>1.3767</td><td>0.1955</td><td>3.6236</td><td>0.008</td><td>0.6768</td><td>0.8352</td><td>0.5886</td><td>2.274</td><td>0.9646</td><td>0.007</td></tr>\n",
"<tr><th>0</th><td>pima</td><td>768</td><td>8</td><td>34.8958</td><td>0.1484</td><td>0.0752</td><td>0.0982</td><td>0.002</td><td>0.2176</td><td>0.0572</td><td>0.011</td><td>0.0491</td><td>0.01</td><td>0.001</td></tr>\n",
"<tr><th>0</th><td>satellite</td><td>6435</td><td>36</td><td>31.6395</td><td>1.6945</td><td>0.3288</td><td>7.0868</td><td>0.017</td><td>0.5966</td><td>1.1901</td><td>0.9465</td><td>2.5738</td><td>1.3165</td><td>0.022</td></tr>\n",
"<tr><th>0</th><td>satimage-2</td><td>5803</td><td>36</td><td>1.2235</td><td>1.3987</td><td>0.2396</td><td>5.8847</td><td>0.015</td><td>0.5154</td><td>0.9897</td><td>0.7229</td><td>2.0875</td><td>1.0097</td><td>0.016</td></tr>\n",
"<tr><th>0</th><td>shuttle</td><td>49097</td><td>9</td><td>7.1511</td><td>13.0948</td><td>0.7941</td><td>73.3649</td><td>0.015</td><td>3.2025</td><td>9.6808</td><td>10.6754</td><td>10.3525</td><td>44.1575</td><td>0.032</td></tr>\n",
"<tr><th>0</th><td>vertebral</td><td>240</td><td>6</td><td>12.5</td><td>0.0431</td><td>0.0311</td><td>0.029</td><td>0.002</td><td>0.1575</td><td>0.015</td><td>0.003</td><td>0.0331</td><td>0.002</td><td>0.001</td></tr>\n",
"<tr><th>0</th><td>vowels</td><td>1456</td><td>12</td><td>3.4341</td><td>0.2336</td><td>0.0662</td><td>0.2577</td><td>0.003</td><td>0.2116</td><td>0.1113</td><td>0.0301</td><td>0.755</td><td>0.0311</td><td>0.002</td></tr>\n",
"<tr><th>0</th><td>wbc</td><td>378</td><td>30</td><td>5.5556</td><td>0.0632</td><td>0.0421</td><td>0.0792</td><td>0.005</td><td>0.1564</td><td>0.0271</td><td>0.007</td><td>0.0632</td><td>0.005</td><td>0.001</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n", "0 arrhythmia 452 274 14.6018 1.126 0.9786 0.6076 \n", "0 cardio 1831 21 9.6122 0.3579 0.0983 0.8683 \n", "0 glass 214 9 4.2056 0.0361 0.0251 0.0351 \n", "0 ionosphere 351 33 35.8974 0.0602 0.0381 0.0642 \n", "0 letter 1600 32 6.25 0.365 0.1052 0.7309 \n", "0 lympho 148 18 4.0541 0.0261 0.0321 0.0261 \n", "0 mnist 7603 100 9.2069 6.8502 0.9004 44.9604 \n", "0 musk 3062 166 3.1679 1.8951 0.2998 12.2746 \n", "0 optdigits 5216 64 2.8758 2.243 0.3911 11.9945 \n", "0 pendigits 6870 16 2.2707 1.3767 0.1955 3.6236 \n", "0 pima 768 8 34.8958 0.1484 0.0752 0.0982 \n", "0 satellite 6435 36 31.6395 1.6945 0.3288 7.0868 \n", "0 satimage-2 5803 36 1.2235 1.3987 0.2396 5.8847 \n", "0 shuttle 49097 9 7.1511 13.0948 0.7941 73.3649 \n", "0 vertebral 240 6 12.5 0.0431 0.0311 0.029 \n", "0 vowels 1456 12 3.4341 0.2336 0.0662 0.2577 \n", "0 wbc 378 30 5.5556 0.0632 0.0421 0.0792 \n", "\n", " HBOS IForest KNN LOF MCD OCSVM PCA \n", "0 0.8914 0.2507 0.1133 0.0742 1.4578 0.0481 0.0642 \n", "0 0.005 0.2767 0.1905 0.0993 0.5835 0.0822 0.003 \n", "0 0.005 0.1584 0.013 0.003 0.0481 0.001 0.001 \n", "0 0.006 0.1785 0.0251 0.007 0.0622 0.004 0.003 \n", "0 0.007 0.2416 0.1905 0.0922 1.3225 0.0973 0.005 \n", "0 0.005 0.1795 0.011 0.003 0.0531 0.002 0.001 \n", "0 0.0411 1.6943 6.6327 5.9395 4.1711 4.3626 0.1433 \n", "0 0.0582 1.0528 1.7376 1.5411 21.114 1.1922 0.1504 \n", "0 0.0271 0.8623 1.9502 1.7256 1.6714 1.4729 0.0521 \n", "0 0.008 0.6768 0.8352 0.5886 2.274 0.9646 0.007 \n", "0 0.002 0.2176 0.0572 0.011 0.0491 0.01 0.001 \n", "0 0.017 0.5966 1.1901 0.9465 2.5738 1.3165 0.022 \n", "0 0.015 0.5154 0.9897 0.7229 2.0875 1.0097 0.016 \n", "0 0.015 3.2025 9.6808 10.6754 10.3525 44.1575 0.032 \n", "0 0.002 0.1575 0.015 0.003 0.0331 0.002 0.001 \n", "0 0.003 0.2116 0.1113 0.0301 0.755 0.0311 0.002 \n", "0 0.005 0.1564 0.0271 0.007 0.0632 0.005 0.001 " ] }, "execution_count": 3, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "print('Time complexity')\n", "time_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analyze the performance of ROC and Precision @ n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ROC Performance\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Data        #Samples  # Dimensions  Outlier Perc  ABOD     CBLOF    FB       HBOS     IForest  KNN      LOF      MCD      OCSVM    PCA
0  arrhythmia  452       274           14.6018       0.7687   0.7824   0.7796   0.8511   0.8639   0.782    0.7787   0.8228   0.7986   0.7997
0  cardio      1831      21            9.6122        0.5892   0.973    0.6385   0.8373   0.9502   0.734    0.588    0.8195   0.9478   0.9616
0  glass       214       9             4.2056        0.6951   0.7957   0.7073   0.7073   0.7134   0.8384   0.7043   0.8293   0.6585   0.686
0  ionosphere  351       33            35.8974       0.9181   0.795    0.9303   0.6052   0.8486   0.932    0.9227   0.9669   0.8257   0.7941
0  letter      1600      32            6.25          0.8783   0.5301   0.8947   0.6063   0.6201   0.8573   0.8765   0.8061   0.5927   0.5216
0  lympho      148       18            4.0541        0.9831   1        1        1        1        1        1        1        1        1
0  mnist       7603      100           9.2069        0.7628   0.8204   0.7157   0.5766   0.7939   0.8498   0.7195   0.8713   0.854    0.8534
0  musk        3062      166           3.1679        0.2161   0.9899   0.473    0.9999   1        0.8009   0.4629   1        1        1
0  optdigits   5216      64            2.8758        0.4894   0.5329   0.5062   0.8774   0.6735   0.406    0.5277   0.3822   0.5171   0.526
0  pendigits   6870      16            2.2707        0.667    0.9172   0.4889   0.9348   0.9376   0.7371   0.4965   0.8204   0.9235   0.9309
0  pima        768       8             34.8958       0.7163   0.7661   0.6448   0.711    0.6818   0.7395   0.6574   0.7175   0.6561   0.6762
0  satellite   6435      36            31.6395       0.5653   0.5548   0.572    0.7486   0.6825   0.6853   0.572    0.8055   0.6478   0.5923
0  satimage-2  5803      36            1.2235        0.8432   0.9783   0.5235   0.9784   0.9952   0.9515   0.5257   0.9963   0.9997   0.9816
0  shuttle     49097     9             7.1511        0.6171   0.6273   0.4725   0.9871   0.9976   0.6507   0.5556   0.99     0.9934   0.9915
0  vertebral   240       6             12.5          0.5366   0.3937   0.5279   0.3506   0.3772   0.4573   0.4983   0.4103   0.4686   0.4085
0  vowels      1456      12            3.4341        0.9616   0.6496   0.9365   0.6876   0.8209   0.9734   0.9398   0.7243   0.8163   0.6297
0  wbc         378       30            5.5556        0.921    0.8906   0.9271   0.9479   0.9436   0.9444   0.9227   0.9288   0.9358   0.9262
\n", "
" ], "text/plain": [ " Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n", "0 arrhythmia 452 274 14.6018 0.7687 0.7824 0.7796 \n", "0 cardio 1831 21 9.6122 0.5892 0.973 0.6385 \n", "0 glass 214 9 4.2056 0.6951 0.7957 0.7073 \n", "0 ionosphere 351 33 35.8974 0.9181 0.795 0.9303 \n", "0 letter 1600 32 6.25 0.8783 0.5301 0.8947 \n", "0 lympho 148 18 4.0541 0.9831 1 1 \n", "0 mnist 7603 100 9.2069 0.7628 0.8204 0.7157 \n", "0 musk 3062 166 3.1679 0.2161 0.9899 0.473 \n", "0 optdigits 5216 64 2.8758 0.4894 0.5329 0.5062 \n", "0 pendigits 6870 16 2.2707 0.667 0.9172 0.4889 \n", "0 pima 768 8 34.8958 0.7163 0.7661 0.6448 \n", "0 satellite 6435 36 31.6395 0.5653 0.5548 0.572 \n", "0 satimage-2 5803 36 1.2235 0.8432 0.9783 0.5235 \n", "0 shuttle 49097 9 7.1511 0.6171 0.6273 0.4725 \n", "0 vertebral 240 6 12.5 0.5366 0.3937 0.5279 \n", "0 vowels 1456 12 3.4341 0.9616 0.6496 0.9365 \n", "0 wbc 378 30 5.5556 0.921 0.8906 0.9271 \n", "\n", " HBOS IForest KNN LOF MCD OCSVM PCA \n", "0 0.8511 0.8639 0.782 0.7787 0.8228 0.7986 0.7997 \n", "0 0.8373 0.9502 0.734 0.588 0.8195 0.9478 0.9616 \n", "0 0.7073 0.7134 0.8384 0.7043 0.8293 0.6585 0.686 \n", "0 0.6052 0.8486 0.932 0.9227 0.9669 0.8257 0.7941 \n", "0 0.6063 0.6201 0.8573 0.8765 0.8061 0.5927 0.5216 \n", "0 1 1 1 1 1 1 1 \n", "0 0.5766 0.7939 0.8498 0.7195 0.8713 0.854 0.8534 \n", "0 0.9999 1 0.8009 0.4629 1 1 1 \n", "0 0.8774 0.6735 0.406 0.5277 0.3822 0.5171 0.526 \n", "0 0.9348 0.9376 0.7371 0.4965 0.8204 0.9235 0.9309 \n", "0 0.711 0.6818 0.7395 0.6574 0.7175 0.6561 0.6762 \n", "0 0.7486 0.6825 0.6853 0.572 0.8055 0.6478 0.5923 \n", "0 0.9784 0.9952 0.9515 0.5257 0.9963 0.9997 0.9816 \n", "0 0.9871 0.9976 0.6507 0.5556 0.99 0.9934 0.9915 \n", "0 0.3506 0.3772 0.4573 0.4983 0.4103 0.4686 0.4085 \n", "0 0.6876 0.8209 0.9734 0.9398 0.7243 0.8163 0.6297 \n", "0 0.9479 0.9436 0.9444 0.9227 0.9288 0.9358 0.9262 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('ROC 
Performance')\n", "roc_df" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Precision @ n Performance\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   Data        #Samples  # Dimensions  Outlier Perc  ABOD     CBLOF    FB       HBOS     IForest  KNN      LOF      MCD      OCSVM    PCA
0  arrhythmia  452       274           14.6018       0.3571   0.4643   0.4643   0.5714   0.6071   0.5      0.4643   0.4286   0.5      0.5
0  cardio      1831      21            9.6122        0.1918   0.7945   0.1781   0.4521   0.6027   0.3562   0.1507   0.411    0.5342   0.6849
0  glass       214       9             4.2056        0.25     0.25     0.25     0        0.25     0.25     0.25     0        0.25     0.25
0  ionosphere  351       33            35.8974       0.8431   0.549    0.8039   0.3922   0.5882   0.8824   0.7843   0.8627   0.6863   0.5686
0  letter      1600      32            6.25          0.4375   0.0312   0.4062   0.0938   0.0625   0.3125   0.3438   0.1875   0.125    0.125
0  lympho      148       18            4.0541        0        1        1        1        1        1        1        1        1        1
0  mnist       7603      100           9.2069        0.3367   0.3605   0.3741   0.1361   0.2721   0.432    0.3673   0.2653   0.3946   0.3878
0  musk        3062      166           3.1679        0.1      0.65     0.125    0.975    1        0.175    0.125    1        1        1
0  optdigits   5216      64            2.8758        0.0152   0        0.0303   0.2121   0.0303   0        0.0303   0        0        0
0  pendigits   6870      16            2.2707        0.0526   0.1579   0.0526   0.2632   0.3333   0.0702   0.0702   0.0877   0.3158   0.3158
0  pima        768       8             34.8958       0.5253   0.6061   0.4444   0.5354   0.5152   0.5859   0.4646   0.5152   0.5051   0.5354
0  satellite   6435      36            31.6395       0.3962   0.345    0.4      0.57     0.5825   0.4988   0.395    0.6762   0.5225   0.465
0  satimage-2  5803      36            1.2235        0.2333   0.6667   0.1667   0.6      0.8667   0.4333   0.1667   0.6667   0.9      0.7333
0  shuttle     49097     9             7.1511        0.2003   0.2025   0.0257   0.9985   0.9596   0.212    0.1548   0.7395   0.956    0.9516
0  vertebral   240       6             12.5          0.2143   0        0.1429   0        0        0.0714   0.1429   0.0714   0.0714   0
0  vowels      1456      12            3.4341        0.6316   0.1053   0.3684   0.1579   0.1579   0.4737   0.3684   0.1053   0.2632   0.1579
0  wbc         378       30            5.5556        0.375    0.375    0.375    0.5      0.5      0.5      0.375    0.5      0.375    0.375
\n", "
" ], "text/plain": [ " Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n", "0 arrhythmia 452 274 14.6018 0.3571 0.4643 0.4643 \n", "0 cardio 1831 21 9.6122 0.1918 0.7945 0.1781 \n", "0 glass 214 9 4.2056 0.25 0.25 0.25 \n", "0 ionosphere 351 33 35.8974 0.8431 0.549 0.8039 \n", "0 letter 1600 32 6.25 0.4375 0.0312 0.4062 \n", "0 lympho 148 18 4.0541 0 1 1 \n", "0 mnist 7603 100 9.2069 0.3367 0.3605 0.3741 \n", "0 musk 3062 166 3.1679 0.1 0.65 0.125 \n", "0 optdigits 5216 64 2.8758 0.0152 0 0.0303 \n", "0 pendigits 6870 16 2.2707 0.0526 0.1579 0.0526 \n", "0 pima 768 8 34.8958 0.5253 0.6061 0.4444 \n", "0 satellite 6435 36 31.6395 0.3962 0.345 0.4 \n", "0 satimage-2 5803 36 1.2235 0.2333 0.6667 0.1667 \n", "0 shuttle 49097 9 7.1511 0.2003 0.2025 0.0257 \n", "0 vertebral 240 6 12.5 0.2143 0 0.1429 \n", "0 vowels 1456 12 3.4341 0.6316 0.1053 0.3684 \n", "0 wbc 378 30 5.5556 0.375 0.375 0.375 \n", "\n", " HBOS IForest KNN LOF MCD OCSVM PCA \n", "0 0.5714 0.6071 0.5 0.4643 0.4286 0.5 0.5 \n", "0 0.4521 0.6027 0.3562 0.1507 0.411 0.5342 0.6849 \n", "0 0 0.25 0.25 0.25 0 0.25 0.25 \n", "0 0.3922 0.5882 0.8824 0.7843 0.8627 0.6863 0.5686 \n", "0 0.0938 0.0625 0.3125 0.3438 0.1875 0.125 0.125 \n", "0 1 1 1 1 1 1 1 \n", "0 0.1361 0.2721 0.432 0.3673 0.2653 0.3946 0.3878 \n", "0 0.975 1 0.175 0.125 1 1 1 \n", "0 0.2121 0.0303 0 0.0303 0 0 0 \n", "0 0.2632 0.3333 0.0702 0.0702 0.0877 0.3158 0.3158 \n", "0 0.5354 0.5152 0.5859 0.4646 0.5152 0.5051 0.5354 \n", "0 0.57 0.5825 0.4988 0.395 0.6762 0.5225 0.465 \n", "0 0.6 0.8667 0.4333 0.1667 0.6667 0.9 0.7333 \n", "0 0.9985 0.9596 0.212 0.1548 0.7395 0.956 0.9516 \n", "0 0 0 0.0714 0.1429 0.0714 0.0714 0 \n", "0 0.1579 0.1579 0.4737 0.3684 0.1053 0.2632 0.1579 \n", "0 0.5 0.5 0.5 0.375 0.5 0.375 0.375 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Precision @ n Performance')\n", "prn_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": 
true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }