{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmark of various outlier detection models\n", "\n", "### The models are evaluated on ROC, Precision @ n, and execution time on 16 benchmark datasets. Each dataset is split 60% for training and 40% for testing.\n", "\n", "**[PyOD](https://github.com/yzhao062/pyod)** is a comprehensive **Python toolkit** to **identify outlying objects** in \n", "multivariate data with both unsupervised and supervised approaches.\n", "The models covered in this example include:\n", "\n", "  1. Linear Models for Outlier Detection:\n", "     1. **PCA: Principal Component Analysis** (use the sum of\n", "        weighted projected distances to the eigenvector hyperplane \n", "        as the outlier scores)\n", "     2. **MCD: Minimum Covariance Determinant** (use the Mahalanobis distances \n", "        as the outlier scores)\n", "     3. **OCSVM: One-Class Support Vector Machines**\n", " \n", "  2. Proximity-Based Outlier Detection Models:\n", "     1. **LOF: Local Outlier Factor**\n", "     2. **CBLOF: Clustering-Based Local Outlier Factor**\n", "     3. **kNN: k Nearest Neighbors** (use the distance to the kth nearest \n", "        neighbor as the outlier score)\n", "     4. **Median kNN** Outlier Detection (use the median distance to k nearest \n", "        neighbors as the outlier score)\n", "     5. **HBOS: Histogram-based Outlier Score**\n", " \n", "  3. Probabilistic Models for Outlier Detection:\n", "     1. **ABOD: Angle-Based Outlier Detection**\n", " \n", "  4. Outlier Ensembles and Combination Frameworks:\n", "     1. **Isolation Forest**\n", "     2. **Feature Bagging**\n", "     3. 
**LSCP**\n", " \n", "The corresponding file can be found at /examples/compare_all_models.py" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import division\n", "from __future__ import print_function\n", "\n", "import os\n", "import sys\n", "from time import time\n", "\n", "# temporary solution for relative imports in case pyod is not installed\n", "# if pyod is installed, no need to use the following line\n", "sys.path.append(\n", "    os.path.abspath(os.path.join(os.path.dirname(\"__file__\"), '..')))\n", "# suppress warnings for clean output\n", "import warnings\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from scipy.io import loadmat\n", "\n", "from pyod.models.abod import ABOD\n", "from pyod.models.cblof import CBLOF\n", "from pyod.models.feature_bagging import FeatureBagging\n", "from pyod.models.hbos import HBOS\n", "from pyod.models.iforest import IForest\n", "from pyod.models.knn import KNN\n", "from pyod.models.lof import LOF\n", "from pyod.models.mcd import MCD\n", "from pyod.models.ocsvm import OCSVM\n", "from pyod.models.pca import PCA\n", "from pyod.models.lscp import LSCP\n", "\n", "from pyod.utils.utility import standardizer\n", "from pyod.utils.utility import precision_n_scores\n", "from sklearn.metrics import roc_auc_score" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "... 
Processing arrhythmia.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.7687, precision @ rank n:0.3571, execution time: 0.1454s\n", "Cluster-based Local Outlier Factor ROC:0.778, precision @ rank n:0.5, execution time: 0.0301s\n", "Feature Bagging ROC:0.7736, precision @ rank n:0.5, execution time: 0.5825s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.8511, precision @ rank n:0.5714, execution time: 0.0622s\n", "Isolation Forest ROC:0.8217, precision @ rank n:0.5, execution time: 0.2477s\n", "K Nearest Neighbors (KNN) ROC:0.782, precision @ rank n:0.5, execution time: 0.0932s\n", "Local Outlier Factor (LOF) ROC:0.7787, precision @ rank n:0.4643, execution time: 0.0681s\n", "Minimum Covariance Determinant (MCD) ROC:0.8228, precision @ rank n:0.4286, execution time: 0.5083s\n", "One-class SVM (OCSVM) ROC:0.7986, precision @ rank n:0.5, execution time: 0.0471s\n", "Principal Component Analysis (PCA) ROC:0.7997, precision @ rank n:0.5, execution time: 0.0602s\n", "Locally Selective Combination (LSCP) ROC:0.7754, precision @ rank n:0.4286, execution time: 2.4836s\n", "\n", "... 
Processing cardio.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.5952, precision @ rank n:0.1884, execution time: 0.399s\n", "Cluster-based Local Outlier Factor ROC:0.8894, precision @ rank n:0.4928, execution time: 0.0261s\n", "Feature Bagging ROC:0.5628, precision @ rank n:0.1594, execution time: 0.8914s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.8227, precision @ rank n:0.4783, execution time: 0.006s\n", "Isolation Forest ROC:0.8953, precision @ rank n:0.4493, execution time: 0.3219s\n", "K Nearest Neighbors (KNN) ROC:0.7442, precision @ rank n:0.2899, execution time: 0.3168s\n", "Local Outlier Factor (LOF) ROC:0.5459, precision @ rank n:0.1594, execution time: 0.1494s\n", "Minimum Covariance Determinant (MCD) ROC:0.7774, precision @ rank n:0.4203, execution time: 0.6467s\n", "One-class SVM (OCSVM) ROC:0.914, precision @ rank n:0.4493, execution time: 0.0983s\n", "Principal Component Analysis (PCA) ROC:0.9323, precision @ rank n:0.5507, execution time: 0.004s\n", "Locally Selective Combination (LSCP) ROC:0.622, precision @ rank n:0.1884, execution time: 5.0424s\n", "\n", "... 
Processing glass.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.8588, precision @ rank n:0.0, execution time: 0.0401s\n", "Cluster-based Local Outlier Factor ROC:0.7765, precision @ rank n:0.0, execution time: 0.012s\n", "Feature Bagging ROC:0.4235, precision @ rank n:0.0, execution time: 0.0281s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.6, precision @ rank n:0.0, execution time: 0.002s\n", "Isolation Forest ROC:0.7765, precision @ rank n:0.0, execution time: 0.1684s\n", "K Nearest Neighbors (KNN) ROC:0.8353, precision @ rank n:0.0, execution time: 0.018s\n", "Local Outlier Factor (LOF) ROC:0.3882, precision @ rank n:0.0, execution time: 0.003s\n", "Minimum Covariance Determinant (MCD) ROC:0.8353, precision @ rank n:0.0, execution time: 0.0431s\n", "One-class SVM (OCSVM) ROC:0.7529, precision @ rank n:0.0, execution time: 0.001s\n", "Principal Component Analysis (PCA) ROC:0.7176, precision @ rank n:0.0, execution time: 0.002s\n", "Locally Selective Combination (LSCP) ROC:0.7529, precision @ rank n:0.0, execution time: 0.2878s\n", "\n", "... 
Processing ionosphere.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9302, precision @ rank n:0.8462, execution time: 0.0792s\n", "Cluster-based Local Outlier Factor ROC:0.8073, precision @ rank n:0.6154, execution time: 0.0271s\n", "Feature Bagging ROC:0.9092, precision @ rank n:0.7692, execution time: 0.0722s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.5869, precision @ rank n:0.4038, execution time: 0.008s\n", "Isolation Forest ROC:0.8734, precision @ rank n:0.7115, execution time: 0.2347s\n", "K Nearest Neighbors (KNN) ROC:0.9358, precision @ rank n:0.8846, execution time: 0.0251s\n", "Local Outlier Factor (LOF) ROC:0.9114, precision @ rank n:0.7692, execution time: 0.006s\n", "Minimum Covariance Determinant (MCD) ROC:0.9576, precision @ rank n:0.9038, execution time: 0.0682s\n", "One-class SVM (OCSVM) ROC:0.8861, precision @ rank n:0.8077, execution time: 0.005s\n", "Principal Component Analysis (PCA) ROC:0.8204, precision @ rank n:0.6154, execution time: 0.002s\n", "Locally Selective Combination (LSCP) ROC:0.9041, precision @ rank n:0.75, execution time: 0.5264s\n", "\n", "... 
Processing letter.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9035, precision @ rank n:0.4255, execution time: 0.3579s\n", "Cluster-based Local Outlier Factor ROC:0.5555, precision @ rank n:0.0851, execution time: 0.021s\n", "Feature Bagging ROC:0.9077, precision @ rank n:0.4894, execution time: 0.751s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.6056, precision @ rank n:0.1915, execution time: 0.008s\n", "Isolation Forest ROC:0.5945, precision @ rank n:0.1064, execution time: 0.2597s\n", "K Nearest Neighbors (KNN) ROC:0.8909, precision @ rank n:0.4043, execution time: 0.1584s\n", "Local Outlier Factor (LOF) ROC:0.8821, precision @ rank n:0.4681, execution time: 0.1203s\n", "Minimum Covariance Determinant (MCD) ROC:0.8144, precision @ rank n:0.1915, execution time: 1.1551s\n", "One-class SVM (OCSVM) ROC:0.5727, precision @ rank n:0.1489, execution time: 0.0852s\n", "Principal Component Analysis (PCA) ROC:0.5104, precision @ rank n:0.1277, execution time: 0.004s\n", "Locally Selective Combination (LSCP) ROC:0.857, precision @ rank n:0.4043, execution time: 4.7737s\n", "\n", "... 
Processing lympho.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9357, precision @ rank n:0.3333, execution time: 0.0231s\n", "Cluster-based Local Outlier Factor ROC:0.9708, precision @ rank n:0.6667, execution time: 0.015s\n", "Feature Bagging ROC:0.924, precision @ rank n:0.3333, execution time: 0.0231s\n", "Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.004s\n", "Isolation Forest ROC:0.9942, precision @ rank n:0.6667, execution time: 0.2065s\n", "K Nearest Neighbors (KNN) ROC:0.9064, precision @ rank n:0.3333, execution time: 0.011s\n", "Local Outlier Factor (LOF) ROC:0.924, precision @ rank n:0.3333, execution time: 0.003s\n", "Minimum Covariance Determinant (MCD) ROC:0.7778, precision @ rank n:0.0, execution time: 0.0491s\n", "One-class SVM (OCSVM) ROC:0.9357, precision @ rank n:0.3333, execution time: 0.002s\n", "Principal Component Analysis (PCA) ROC:0.9649, precision @ rank n:0.6667, execution time: 0.001s\n", "Locally Selective Combination (LSCP) ROC:0.9357, precision @ rank n:0.3333, execution time: 0.2988s\n", "\n", "... 
Processing mnist.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.7978, precision @ rank n:0.3594, execution time: 7.2033s\n", "Cluster-based Local Outlier Factor ROC:0.8477, precision @ rank n:0.3915, execution time: 0.0622s\n", "Feature Bagging ROC:0.7451, precision @ rank n:0.3452, execution time: 57.4463s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.5645, precision @ rank n:0.1174, execution time: 0.0451s\n", "Isolation Forest ROC:0.8154, precision @ rank n:0.3096, execution time: 1.7867s\n", "K Nearest Neighbors (KNN) ROC:0.8643, precision @ rank n:0.4448, execution time: 7.1259s\n", "Local Outlier Factor (LOF) ROC:0.7442, precision @ rank n:0.3523, execution time: 6.4782s\n", "Minimum Covariance Determinant (MCD) ROC:0.8926, precision @ rank n:0.4875, execution time: 2.4916s\n", "One-class SVM (OCSVM) ROC:0.8595, precision @ rank n:0.3915, execution time: 4.6975s\n", "Principal Component Analysis (PCA) ROC:0.8572, precision @ rank n:0.3843, execution time: 0.1494s\n", "Locally Selective Combination (LSCP) ROC:0.7873, precision @ rank n:0.3665, execution time: 191.5348s\n", "\n", "... 
Processing musk.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.2111, precision @ rank n:0.0488, execution time: 2.3532s\n", "Cluster-based Local Outlier Factor ROC:0.9864, precision @ rank n:0.6829, execution time: 0.0361s\n", "Feature Bagging ROC:0.6141, precision @ rank n:0.2195, execution time: 13.4944s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9999, precision @ rank n:0.9756, execution time: 0.0822s\n", "Isolation Forest ROC:0.9997, precision @ rank n:0.9756, execution time: 1.5561s\n", "K Nearest Neighbors (KNN) ROC:0.8224, precision @ rank n:0.2439, execution time: 2.3803s\n", "Local Outlier Factor (LOF) ROC:0.6232, precision @ rank n:0.2195, execution time: 2.0785s\n", "Minimum Covariance Determinant (MCD) ROC:0.9984, precision @ rank n:0.878, execution time: 14.7882s\n", "One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.3396s\n", "Principal Component Analysis (PCA) ROC:0.9999, precision @ rank n:0.9512, execution time: 0.1594s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Locally Selective Combination (LSCP) ROC:0.5304, precision @ rank n:0.1463, execution time: 116.7357s\n", "\n", "... 
Processing optdigits.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.4294, precision @ rank n:0.0149, execution time: 3.1473s\n", "Cluster-based Local Outlier Factor ROC:0.49, precision @ rank n:0.0, execution time: 0.0261s\n", "Feature Bagging ROC:0.4108, precision @ rank n:0.0149, execution time: 14.9847s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.835, precision @ rank n:0.209, execution time: 0.0341s\n", "Isolation Forest ROC:0.7365, precision @ rank n:0.0299, execution time: 0.8773s\n", "K Nearest Neighbors (KNN) ROC:0.3836, precision @ rank n:0.0, execution time: 1.9953s\n", "Local Outlier Factor (LOF) ROC:0.3996, precision @ rank n:0.0149, execution time: 1.7457s\n", "Minimum Covariance Determinant (MCD) ROC:0.3791, precision @ rank n:0.0, execution time: 1.0467s\n", "One-class SVM (OCSVM) ROC:0.532, precision @ rank n:0.0, execution time: 1.4028s\n", "Principal Component Analysis (PCA) ROC:0.525, precision @ rank n:0.0, execution time: 0.0451s\n", "Locally Selective Combination (LSCP) ROC:0.3975, precision @ rank n:0.0149, execution time: 60.2475s\n", "\n", "... 
Processing pendigits.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.6608, precision @ rank n:0.1224, execution time: 1.4017s\n", "Cluster-based Local Outlier Factor ROC:0.934, precision @ rank n:0.2041, execution time: 0.0551s\n", "Feature Bagging ROC:0.3992, precision @ rank n:0.0408, execution time: 4.9451s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9209, precision @ rank n:0.3061, execution time: 0.011s\n", "Isolation Forest ROC:0.9296, precision @ rank n:0.3061, execution time: 0.5765s\n", "K Nearest Neighbors (KNN) ROC:0.7086, precision @ rank n:0.0408, execution time: 0.9656s\n", "Local Outlier Factor (LOF) ROC:0.419, precision @ rank n:0.0408, execution time: 0.6006s\n", "Minimum Covariance Determinant (MCD) ROC:0.8369, precision @ rank n:0.0612, execution time: 1.8158s\n", "One-class SVM (OCSVM) ROC:0.9267, precision @ rank n:0.2449, execution time: 1.0207s\n", "Principal Component Analysis (PCA) ROC:0.9359, precision @ rank n:0.2653, execution time: 0.01s\n", "Locally Selective Combination (LSCP) ROC:0.487, precision @ rank n:0.0408, execution time: 25.4254s\n", "\n", "... 
Processing pima.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.6864, precision @ rank n:0.5847, execution time: 0.1334s\n", "Cluster-based Local Outlier Factor ROC:0.7064, precision @ rank n:0.5678, execution time: 0.0221s\n", "Feature Bagging ROC:0.618, precision @ rank n:0.4746, execution time: 0.0942s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.7031, precision @ rank n:0.5847, execution time: 0.003s\n", "Isolation Forest ROC:0.6704, precision @ rank n:0.5424, execution time: 0.2035s\n", "K Nearest Neighbors (KNN) ROC:0.712, precision @ rank n:0.5847, execution time: 0.0662s\n", "Local Outlier Factor (LOF) ROC:0.6356, precision @ rank n:0.5169, execution time: 0.01s\n", "Minimum Covariance Determinant (MCD) ROC:0.7026, precision @ rank n:0.5678, execution time: 0.0852s\n", "One-class SVM (OCSVM) ROC:0.6297, precision @ rank n:0.5085, execution time: 0.012s\n", "Principal Component Analysis (PCA) ROC:0.6602, precision @ rank n:0.5508, execution time: 0.002s\n", "Locally Selective Combination (LSCP) ROC:0.6538, precision @ rank n:0.5339, execution time: 1.1691s\n", "\n", "... 
Processing satellite.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.5676, precision @ rank n:0.4078, execution time: 1.89s\n", "Cluster-based Local Outlier Factor ROC:0.7307, precision @ rank n:0.4539, execution time: 0.025s\n", "Feature Bagging ROC:0.5645, precision @ rank n:0.4054, execution time: 8.0935s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.7593, precision @ rank n:0.5804, execution time: 0.019s\n", "Isolation Forest ROC:0.6947, precision @ rank n:0.5686, execution time: 0.8693s\n", "K Nearest Neighbors (KNN) ROC:0.6827, precision @ rank n:0.5, execution time: 1.2533s\n", "Local Outlier Factor (LOF) ROC:0.5676, precision @ rank n:0.4066, execution time: 1.0658s\n", "Minimum Covariance Determinant (MCD) ROC:0.7991, precision @ rank n:0.6832, execution time: 2.8646s\n", "One-class SVM (OCSVM) ROC:0.6551, precision @ rank n:0.5355, execution time: 1.3265s\n", "Principal Component Analysis (PCA) ROC:0.5976, precision @ rank n:0.4787, execution time: 0.0271s\n", "Locally Selective Combination (LSCP) ROC:0.5809, precision @ rank n:0.4184, execution time: 34.2882s\n", "\n", "... 
Processing satimage-2.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.8497, precision @ rank n:0.1667, execution time: 1.7396s\n", "Cluster-based Local Outlier Factor ROC:0.9568, precision @ rank n:0.5, execution time: 0.0531s\n", "Feature Bagging ROC:0.4798, precision @ rank n:0.1, execution time: 7.2252s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9948, precision @ rank n:0.7, execution time: 0.0211s\n", "Isolation Forest ROC:0.9997, precision @ rank n:0.9333, execution time: 0.6788s\n", "K Nearest Neighbors (KNN) ROC:0.9693, precision @ rank n:0.4, execution time: 1.1992s\n", "Local Outlier Factor (LOF) ROC:0.4819, precision @ rank n:0.1, execution time: 0.9445s\n", "Minimum Covariance Determinant (MCD) ROC:0.996, precision @ rank n:0.7, execution time: 2.1818s\n", "One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.114s\n", "Principal Component Analysis (PCA) ROC:0.9974, precision @ rank n:0.8333, execution time: 0.0211s\n", "Locally Selective Combination (LSCP) ROC:0.6986, precision @ rank n:0.1, execution time: 33.6747s\n", "\n", "... 
Processing vertebral.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.2428, precision @ rank n:0.0, execution time: 0.0371s\n", "Cluster-based Local Outlier Factor ROC:0.2588, precision @ rank n:0.0, execution time: 0.023s\n", "Feature Bagging ROC:0.2909, precision @ rank n:0.0, execution time: 0.0391s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.1524, precision @ rank n:0.0, execution time: 0.003s\n", "Isolation Forest ROC:0.2449, precision @ rank n:0.0, execution time: 0.1775s\n", "K Nearest Neighbors (KNN) ROC:0.2791, precision @ rank n:0.0, execution time: 0.018s\n", "Local Outlier Factor (LOF) ROC:0.293, precision @ rank n:0.0, execution time: 0.003s\n", "Minimum Covariance Determinant (MCD) ROC:0.3273, precision @ rank n:0.0, execution time: 0.0491s\n", "One-class SVM (OCSVM) ROC:0.2909, precision @ rank n:0.0, execution time: 0.002s\n", "Principal Component Analysis (PCA) ROC:0.2439, precision @ rank n:0.0, execution time: 0.001s\n", "Locally Selective Combination (LSCP) ROC:0.2503, precision @ rank n:0.0, execution time: 0.3359s\n", "\n", "... 
Processing vowels.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.9726, precision @ rank n:0.5, execution time: 0.2637s\n", "Cluster-based Local Outlier Factor ROC:0.574, precision @ rank n:0.0, execution time: 0.0221s\n", "Feature Bagging ROC:0.955, precision @ rank n:0.25, execution time: 0.2787s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.6683, precision @ rank n:0.0625, execution time: 0.004s\n", "Isolation Forest ROC:0.7809, precision @ rank n:0.0625, execution time: 0.2557s\n", "K Nearest Neighbors (KNN) ROC:0.9775, precision @ rank n:0.3125, execution time: 0.1153s\n", "Local Outlier Factor (LOF) ROC:0.9514, precision @ rank n:0.3125, execution time: 0.0351s\n", "Minimum Covariance Determinant (MCD) ROC:0.7081, precision @ rank n:0.0625, execution time: 0.7189s\n", "One-class SVM (OCSVM) ROC:0.8244, precision @ rank n:0.1875, execution time: 0.0441s\n", "Principal Component Analysis (PCA) ROC:0.5585, precision @ rank n:0.0, execution time: 0.002s\n", "Locally Selective Combination (LSCP) ROC:0.9604, precision @ rank n:0.3125, execution time: 2.3161s\n", "\n", "... 
Processing wbc.mat ...\n", "Angle-based Outlier Detector (ABOD) ROC:0.8803, precision @ rank n:0.2, execution time: 0.0632s\n", "Cluster-based Local Outlier Factor ROC:0.9374, precision @ rank n:0.4, execution time: 0.015s\n", "Feature Bagging ROC:0.9224, precision @ rank n:0.2, execution time: 0.0702s\n", "Histogram-base Outlier Detection (HBOS) ROC:0.9415, precision @ rank n:0.4, execution time: 0.008s\n", "Isolation Forest ROC:0.9102, precision @ rank n:0.2, execution time: 0.2055s\n", "K Nearest Neighbors (KNN) ROC:0.9034, precision @ rank n:0.2, execution time: 0.0291s\n", "Local Outlier Factor (LOF) ROC:0.9211, precision @ rank n:0.2, execution time: 0.007s\n", "Minimum Covariance Determinant (MCD) ROC:0.9129, precision @ rank n:0.2, execution time: 0.0602s\n", "One-class SVM (OCSVM) ROC:0.9224, precision @ rank n:0.2, execution time: 0.006s\n", "Principal Component Analysis (PCA) ROC:0.9102, precision @ rank n:0.2, execution time: 0.002s\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Locally Selective Combination (LSCP) ROC:0.9116, precision @ rank n:0.4, execution time: 0.5545s\n" ] } ], "source": [ "# Define data files and read X and y\n", "mat_file_list = ['arrhythmia.mat',\n", "                 'cardio.mat',\n", "                 'glass.mat',\n", "                 'ionosphere.mat',\n", "                 'letter.mat',\n", "                 'lympho.mat',\n", "                 'mnist.mat',\n", "                 'musk.mat',\n", "                 'optdigits.mat',\n", "                 'pendigits.mat',\n", "                 'pima.mat',\n", "                 'satellite.mat',\n", "                 'satimage-2.mat',\n", "#                  'shuttle.mat',\n", "                 'vertebral.mat',\n", "                 'vowels.mat',\n", "                 'wbc.mat']\n", "\n", "# Define eleven outlier detection tools to be compared\n", "random_state = np.random.RandomState(42)\n", "\n", "df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',\n", "              'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',\n", "              'OCSVM', 'PCA', 'LSCP']\n", "roc_df = pd.DataFrame(columns=df_columns)\n", "prn_df = pd.DataFrame(columns=df_columns)\n", "time_df = pd.DataFrame(columns=df_columns)\n", "\n", "# 
initialize a set of detectors for LSCP\n", "detector_list = [LOF(n_neighbors=5), LOF(n_neighbors=10), LOF(n_neighbors=15),\n", "                 LOF(n_neighbors=20), LOF(n_neighbors=25), LOF(n_neighbors=30),\n", "                 LOF(n_neighbors=35), LOF(n_neighbors=40), LOF(n_neighbors=45),\n", "                 LOF(n_neighbors=50)]\n", "\n", "for mat_file in mat_file_list:\n", "    print(\"\\n... Processing\", mat_file, '...')\n", "    mat = loadmat(os.path.join('data', mat_file))\n", "\n", "    X = mat['X']\n", "    y = mat['y'].ravel()\n", "    outliers_fraction = np.count_nonzero(y) / len(y)\n", "    outliers_percentage = round(outliers_fraction * 100, ndigits=4)\n", "\n", "    # construct containers for saving results\n", "    roc_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n", "    prn_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n", "    time_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n", "\n", "    # 60% data for training and 40% for testing\n", "    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,\n", "                                                        random_state=random_state)\n", "\n", "    # standardizing data for processing\n", "    X_train_norm, X_test_norm = standardizer(X_train, X_test)\n", "\n", "    classifiers = {'Angle-based Outlier Detector (ABOD)': ABOD(\n", "        contamination=outliers_fraction),\n", "        'Cluster-based Local Outlier Factor': CBLOF(\n", "            contamination=outliers_fraction, check_estimator=False,\n", "            random_state=random_state),\n", "        'Feature Bagging': FeatureBagging(contamination=outliers_fraction,\n", "                                          check_estimator=False,\n", "                                          random_state=random_state),\n", "        'Histogram-base Outlier Detection (HBOS)': HBOS(\n", "            contamination=outliers_fraction),\n", "        'Isolation Forest': IForest(contamination=outliers_fraction,\n", "                                    random_state=random_state),\n", "        'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),\n", "        'Local Outlier Factor (LOF)': LOF(\n", "            contamination=outliers_fraction),\n", "        'Minimum Covariance Determinant (MCD)': MCD(\n", "            
contamination=outliers_fraction, random_state=random_state),\n", "        'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction,\n", "                                       random_state=random_state),\n", "        'Principal Component Analysis (PCA)': PCA(\n", "            contamination=outliers_fraction, random_state=random_state),\n", "        'Locally Selective Combination (LSCP)': LSCP(\n", "            detector_list, contamination=outliers_fraction,\n", "            random_state=random_state),\n", "    }\n", "\n", "    for clf_name, clf in classifiers.items():\n", "        t0 = time()\n", "        clf.fit(X_train_norm)\n", "        test_scores = clf.decision_function(X_test_norm)\n", "        t1 = time()\n", "        duration = round(t1 - t0, ndigits=4)\n", "        time_list.append(duration)\n", "\n", "        roc = round(roc_auc_score(y_test, test_scores), ndigits=4)\n", "        prn = round(precision_n_scores(y_test, test_scores), ndigits=4)\n", "\n", "        print('{clf_name} ROC:{roc}, precision @ rank n:{prn}, '\n", "              'execution time: {duration}s'.format(\n", "            clf_name=clf_name, roc=roc, prn=prn, duration=duration))\n", "\n", "        roc_list.append(roc)\n", "        prn_list.append(prn)\n", "\n", "    temp_df = pd.DataFrame(time_list).transpose()\n", "    temp_df.columns = df_columns\n", "    time_df = pd.concat([time_df, temp_df], axis=0)\n", "\n", "    temp_df = pd.DataFrame(roc_list).transpose()\n", "    temp_df.columns = df_columns\n", "    roc_df = pd.concat([roc_df, temp_df], axis=0)\n", "\n", "    temp_df = pd.DataFrame(prn_list).transpose()\n", "    temp_df.columns = df_columns\n", "    prn_df = pd.concat([prn_df, temp_df], axis=0)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time complexity\n" ] }, { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>Data</th><th>#Samples</th><th># Dimensions</th><th>Outlier Perc</th><th>ABOD</th><th>CBLOF</th><th>FB</th><th>HBOS</th><th>IForest</th><th>KNN</th><th>LOF</th><th>MCD</th><th>OCSVM</th><th>PCA</th><th>LSCP</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>arrhythmia</td><td>452</td><td>274</td><td>14.6018</td><td>0.1454</td><td>0.0301</td><td>0.5825</td><td>0.0622</td><td>0.2477</td><td>0.0932</td><td>0.0681</td><td>0.5083</td><td>0.0471</td><td>0.0602</td><td>2.4836</td></tr>\n",
 "    <tr><th>0</th><td>cardio</td><td>1831</td><td>21</td><td>9.6122</td><td>0.399</td><td>0.0261</td><td>0.8914</td><td>0.006</td><td>0.3219</td><td>0.3168</td><td>0.1494</td><td>0.6467</td><td>0.0983</td><td>0.004</td><td>5.0424</td></tr>\n",
 "    <tr><th>0</th><td>glass</td><td>214</td><td>9</td><td>4.2056</td><td>0.0401</td><td>0.012</td><td>0.0281</td><td>0.002</td><td>0.1684</td><td>0.018</td><td>0.003</td><td>0.0431</td><td>0.001</td><td>0.002</td><td>0.2878</td></tr>\n",
 "    <tr><th>0</th><td>ionosphere</td><td>351</td><td>33</td><td>35.8974</td><td>0.0792</td><td>0.0271</td><td>0.0722</td><td>0.008</td><td>0.2347</td><td>0.0251</td><td>0.006</td><td>0.0682</td><td>0.005</td><td>0.002</td><td>0.5264</td></tr>\n",
 "    <tr><th>0</th><td>letter</td><td>1600</td><td>32</td><td>6.25</td><td>0.3579</td><td>0.021</td><td>0.751</td><td>0.008</td><td>0.2597</td><td>0.1584</td><td>0.1203</td><td>1.1551</td><td>0.0852</td><td>0.004</td><td>4.7737</td></tr>\n",
 "    <tr><th>0</th><td>lympho</td><td>148</td><td>18</td><td>4.0541</td><td>0.0231</td><td>0.015</td><td>0.0231</td><td>0.004</td><td>0.2065</td><td>0.011</td><td>0.003</td><td>0.0491</td><td>0.002</td><td>0.001</td><td>0.2988</td></tr>\n",
 "    <tr><th>0</th><td>mnist</td><td>7603</td><td>100</td><td>9.2069</td><td>7.2033</td><td>0.0622</td><td>57.4463</td><td>0.0451</td><td>1.7867</td><td>7.1259</td><td>6.4782</td><td>2.4916</td><td>4.6975</td><td>0.1494</td><td>191.535</td></tr>\n",
 "    <tr><th>0</th><td>musk</td><td>3062</td><td>166</td><td>3.1679</td><td>2.3532</td><td>0.0361</td><td>13.4944</td><td>0.0822</td><td>1.5561</td><td>2.3803</td><td>2.0785</td><td>14.7882</td><td>1.3396</td><td>0.1594</td><td>116.736</td></tr>\n",
 "    <tr><th>0</th><td>optdigits</td><td>5216</td><td>64</td><td>2.8758</td><td>3.1473</td><td>0.0261</td><td>14.9847</td><td>0.0341</td><td>0.8773</td><td>1.9953</td><td>1.7457</td><td>1.0467</td><td>1.4028</td><td>0.0451</td><td>60.2475</td></tr>\n",
 "    <tr><th>0</th><td>pendigits</td><td>6870</td><td>16</td><td>2.2707</td><td>1.4017</td><td>0.0551</td><td>4.9451</td><td>0.011</td><td>0.5765</td><td>0.9656</td><td>0.6006</td><td>1.8158</td><td>1.0207</td><td>0.01</td><td>25.4254</td></tr>\n",
 "    <tr><th>0</th><td>pima</td><td>768</td><td>8</td><td>34.8958</td><td>0.1334</td><td>0.0221</td><td>0.0942</td><td>0.003</td><td>0.2035</td><td>0.0662</td><td>0.01</td><td>0.0852</td><td>0.012</td><td>0.002</td><td>1.1691</td></tr>\n",
 "    <tr><th>0</th><td>satellite</td><td>6435</td><td>36</td><td>31.6395</td><td>1.89</td><td>0.025</td><td>8.0935</td><td>0.019</td><td>0.8693</td><td>1.2533</td><td>1.0658</td><td>2.8646</td><td>1.3265</td><td>0.0271</td><td>34.2882</td></tr>\n",
 "    <tr><th>0</th><td>satimage-2</td><td>5803</td><td>36</td><td>1.2235</td><td>1.7396</td><td>0.0531</td><td>7.2252</td><td>0.0211</td><td>0.6788</td><td>1.1992</td><td>0.9445</td><td>2.1818</td><td>1.114</td><td>0.0211</td><td>33.6747</td></tr>\n",
 "    <tr><th>0</th><td>vertebral</td><td>240</td><td>6</td><td>12.5</td><td>0.0371</td><td>0.023</td><td>0.0391</td><td>0.003</td><td>0.1775</td><td>0.018</td><td>0.003</td><td>0.0491</td><td>0.002</td><td>0.001</td><td>0.3359</td></tr>\n",
 "    <tr><th>0</th><td>vowels</td><td>1456</td><td>12</td><td>3.4341</td><td>0.2637</td><td>0.0221</td><td>0.2787</td><td>0.004</td><td>0.2557</td><td>0.1153</td><td>0.0351</td><td>0.7189</td><td>0.0441</td><td>0.002</td><td>2.3161</td></tr>\n",
 "    <tr><th>0</th><td>wbc</td><td>378</td><td>30</td><td>5.5556</td><td>0.0632</td><td>0.015</td><td>0.0702</td><td>0.008</td><td>0.2055</td><td>0.0291</td><td>0.007</td><td>0.0602</td><td>0.006</td><td>0.002</td><td>0.5545</td></tr>\n",
 "  </tbody>\n",
 "</table>
" ], "text/plain": [ " Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n", "0 arrhythmia 452 274 14.6018 0.1454 0.0301 0.5825 \n", "0 cardio 1831 21 9.6122 0.399 0.0261 0.8914 \n", "0 glass 214 9 4.2056 0.0401 0.012 0.0281 \n", "0 ionosphere 351 33 35.8974 0.0792 0.0271 0.0722 \n", "0 letter 1600 32 6.25 0.3579 0.021 0.751 \n", "0 lympho 148 18 4.0541 0.0231 0.015 0.0231 \n", "0 mnist 7603 100 9.2069 7.2033 0.0622 57.4463 \n", "0 musk 3062 166 3.1679 2.3532 0.0361 13.4944 \n", "0 optdigits 5216 64 2.8758 3.1473 0.0261 14.9847 \n", "0 pendigits 6870 16 2.2707 1.4017 0.0551 4.9451 \n", "0 pima 768 8 34.8958 0.1334 0.0221 0.0942 \n", "0 satellite 6435 36 31.6395 1.89 0.025 8.0935 \n", "0 satimage-2 5803 36 1.2235 1.7396 0.0531 7.2252 \n", "0 vertebral 240 6 12.5 0.0371 0.023 0.0391 \n", "0 vowels 1456 12 3.4341 0.2637 0.0221 0.2787 \n", "0 wbc 378 30 5.5556 0.0632 0.015 0.0702 \n", "\n", " HBOS IForest KNN LOF MCD OCSVM PCA LSCP \n", "0 0.0622 0.2477 0.0932 0.0681 0.5083 0.0471 0.0602 2.4836 \n", "0 0.006 0.3219 0.3168 0.1494 0.6467 0.0983 0.004 5.0424 \n", "0 0.002 0.1684 0.018 0.003 0.0431 0.001 0.002 0.2878 \n", "0 0.008 0.2347 0.0251 0.006 0.0682 0.005 0.002 0.5264 \n", "0 0.008 0.2597 0.1584 0.1203 1.1551 0.0852 0.004 4.7737 \n", "0 0.004 0.2065 0.011 0.003 0.0491 0.002 0.001 0.2988 \n", "0 0.0451 1.7867 7.1259 6.4782 2.4916 4.6975 0.1494 191.535 \n", "0 0.0822 1.5561 2.3803 2.0785 14.7882 1.3396 0.1594 116.736 \n", "0 0.0341 0.8773 1.9953 1.7457 1.0467 1.4028 0.0451 60.2475 \n", "0 0.011 0.5765 0.9656 0.6006 1.8158 1.0207 0.01 25.4254 \n", "0 0.003 0.2035 0.0662 0.01 0.0852 0.012 0.002 1.1691 \n", "0 0.019 0.8693 1.2533 1.0658 2.8646 1.3265 0.0271 34.2882 \n", "0 0.0211 0.6788 1.1992 0.9445 2.1818 1.114 0.0211 33.6747 \n", "0 0.003 0.1775 0.018 0.003 0.0491 0.002 0.001 0.3359 \n", "0 0.004 0.2557 0.1153 0.0351 0.7189 0.0441 0.002 2.3161 \n", "0 0.008 0.2055 0.0291 0.007 0.0602 0.006 0.002 0.5545 " ] }, "execution_count": 5, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "print('Time complexity')\n", "time_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Analyze the performance of ROC and Precision @ n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ROC Performance\n" ] }, { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>Data</th><th>#Samples</th><th># Dimensions</th><th>Outlier Perc</th><th>ABOD</th><th>CBLOF</th><th>FB</th><th>HBOS</th><th>IForest</th><th>KNN</th><th>LOF</th><th>MCD</th><th>OCSVM</th><th>PCA</th><th>LSCP</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>arrhythmia</td><td>452</td><td>274</td><td>14.6018</td><td>0.7687</td><td>0.778</td><td>0.7736</td><td>0.8511</td><td>0.8217</td><td>0.782</td><td>0.7787</td><td>0.8228</td><td>0.7986</td><td>0.7997</td><td>0.7754</td></tr>\n",
 "    <tr><th>0</th><td>cardio</td><td>1831</td><td>21</td><td>9.6122</td><td>0.5952</td><td>0.8894</td><td>0.5628</td><td>0.8227</td><td>0.8953</td><td>0.7442</td><td>0.5459</td><td>0.7774</td><td>0.914</td><td>0.9323</td><td>0.622</td></tr>\n",
 "    <tr><th>0</th><td>glass</td><td>214</td><td>9</td><td>4.2056</td><td>0.8588</td><td>0.7765</td><td>0.4235</td><td>0.6</td><td>0.7765</td><td>0.8353</td><td>0.3882</td><td>0.8353</td><td>0.7529</td><td>0.7176</td><td>0.7529</td></tr>\n",
 "    <tr><th>0</th><td>ionosphere</td><td>351</td><td>33</td><td>35.8974</td><td>0.9302</td><td>0.8073</td><td>0.9092</td><td>0.5869</td><td>0.8734</td><td>0.9358</td><td>0.9114</td><td>0.9576</td><td>0.8861</td><td>0.8204</td><td>0.9041</td></tr>\n",
 "    <tr><th>0</th><td>letter</td><td>1600</td><td>32</td><td>6.25</td><td>0.9035</td><td>0.5555</td><td>0.9077</td><td>0.6056</td><td>0.5945</td><td>0.8909</td><td>0.8821</td><td>0.8144</td><td>0.5727</td><td>0.5104</td><td>0.857</td></tr>\n",
 "    <tr><th>0</th><td>lympho</td><td>148</td><td>18</td><td>4.0541</td><td>0.9357</td><td>0.9708</td><td>0.924</td><td>1</td><td>0.9942</td><td>0.9064</td><td>0.924</td><td>0.7778</td><td>0.9357</td><td>0.9649</td><td>0.9357</td></tr>\n",
 "    <tr><th>0</th><td>mnist</td><td>7603</td><td>100</td><td>9.2069</td><td>0.7978</td><td>0.8477</td><td>0.7451</td><td>0.5645</td><td>0.8154</td><td>0.8643</td><td>0.7442</td><td>0.8926</td><td>0.8595</td><td>0.8572</td><td>0.7873</td></tr>\n",
 "    <tr><th>0</th><td>musk</td><td>3062</td><td>166</td><td>3.1679</td><td>0.2111</td><td>0.9864</td><td>0.6141</td><td>0.9999</td><td>0.9997</td><td>0.8224</td><td>0.6232</td><td>0.9984</td><td>1</td><td>0.9999</td><td>0.5304</td></tr>\n",
 "    <tr><th>0</th><td>optdigits</td><td>5216</td><td>64</td><td>2.8758</td><td>0.4294</td><td>0.49</td><td>0.4108</td><td>0.835</td><td>0.7365</td><td>0.3836</td><td>0.3996</td><td>0.3791</td><td>0.532</td><td>0.525</td><td>0.3975</td></tr>\n",
 "    <tr><th>0</th><td>pendigits</td><td>6870</td><td>16</td><td>2.2707</td><td>0.6608</td><td>0.934</td><td>0.3992</td><td>0.9209</td><td>0.9296</td><td>0.7086</td><td>0.419</td><td>0.8369</td><td>0.9267</td><td>0.9359</td><td>0.487</td></tr>\n",
 "    <tr><th>0</th><td>pima</td><td>768</td><td>8</td><td>34.8958</td><td>0.6864</td><td>0.7064</td><td>0.618</td><td>0.7031</td><td>0.6704</td><td>0.712</td><td>0.6356</td><td>0.7026</td><td>0.6297</td><td>0.6602</td><td>0.6538</td></tr>\n",
 "    <tr><th>0</th><td>satellite</td><td>6435</td><td>36</td><td>31.6395</td><td>0.5676</td><td>0.7307</td><td>0.5645</td><td>0.7593</td><td>0.6947</td><td>0.6827</td><td>0.5676</td><td>0.7991</td><td>0.6551</td><td>0.5976</td><td>0.5809</td></tr>\n",
 "    <tr><th>0</th><td>satimage-2</td><td>5803</td><td>36</td><td>1.2235</td><td>0.8497</td><td>0.9568</td><td>0.4798</td><td>0.9948</td><td>0.9997</td><td>0.9693</td><td>0.4819</td><td>0.996</td><td>1</td><td>0.9974</td><td>0.6986</td></tr>\n",
 "    <tr><th>0</th><td>vertebral</td><td>240</td><td>6</td><td>12.5</td><td>0.2428</td><td>0.2588</td><td>0.2909</td><td>0.1524</td><td>0.2449</td><td>0.2791</td><td>0.293</td><td>0.3273</td><td>0.2909</td><td>0.2439</td><td>0.2503</td></tr>\n",
 "    <tr><th>0</th><td>vowels</td><td>1456</td><td>12</td><td>3.4341</td><td>0.9726</td><td>0.574</td><td>0.955</td><td>0.6683</td><td>0.7809</td><td>0.9775</td><td>0.9514</td><td>0.7081</td><td>0.8244</td><td>0.5585</td><td>0.9604</td></tr>\n",
 "    <tr><th>0</th><td>wbc</td><td>378</td><td>30</td><td>5.5556</td><td>0.8803</td><td>0.9374</td><td>0.9224</td><td>0.9415</td><td>0.9102</td><td>0.9034</td><td>0.9211</td><td>0.9129</td><td>0.9224</td><td>0.9102</td><td>0.9116</td></tr>\n",
 "  </tbody>\n",
 "</table>
" ], "text/plain": [ " Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n", "0 arrhythmia 452 274 14.6018 0.7687 0.778 0.7736 \n", "0 cardio 1831 21 9.6122 0.5952 0.8894 0.5628 \n", "0 glass 214 9 4.2056 0.8588 0.7765 0.4235 \n", "0 ionosphere 351 33 35.8974 0.9302 0.8073 0.9092 \n", "0 letter 1600 32 6.25 0.9035 0.5555 0.9077 \n", "0 lympho 148 18 4.0541 0.9357 0.9708 0.924 \n", "0 mnist 7603 100 9.2069 0.7978 0.8477 0.7451 \n", "0 musk 3062 166 3.1679 0.2111 0.9864 0.6141 \n", "0 optdigits 5216 64 2.8758 0.4294 0.49 0.4108 \n", "0 pendigits 6870 16 2.2707 0.6608 0.934 0.3992 \n", "0 pima 768 8 34.8958 0.6864 0.7064 0.618 \n", "0 satellite 6435 36 31.6395 0.5676 0.7307 0.5645 \n", "0 satimage-2 5803 36 1.2235 0.8497 0.9568 0.4798 \n", "0 vertebral 240 6 12.5 0.2428 0.2588 0.2909 \n", "0 vowels 1456 12 3.4341 0.9726 0.574 0.955 \n", "0 wbc 378 30 5.5556 0.8803 0.9374 0.9224 \n", "\n", " HBOS IForest KNN LOF MCD OCSVM PCA LSCP \n", "0 0.8511 0.8217 0.782 0.7787 0.8228 0.7986 0.7997 0.7754 \n", "0 0.8227 0.8953 0.7442 0.5459 0.7774 0.914 0.9323 0.622 \n", "0 0.6 0.7765 0.8353 0.3882 0.8353 0.7529 0.7176 0.7529 \n", "0 0.5869 0.8734 0.9358 0.9114 0.9576 0.8861 0.8204 0.9041 \n", "0 0.6056 0.5945 0.8909 0.8821 0.8144 0.5727 0.5104 0.857 \n", "0 1 0.9942 0.9064 0.924 0.7778 0.9357 0.9649 0.9357 \n", "0 0.5645 0.8154 0.8643 0.7442 0.8926 0.8595 0.8572 0.7873 \n", "0 0.9999 0.9997 0.8224 0.6232 0.9984 1 0.9999 0.5304 \n", "0 0.835 0.7365 0.3836 0.3996 0.3791 0.532 0.525 0.3975 \n", "0 0.9209 0.9296 0.7086 0.419 0.8369 0.9267 0.9359 0.487 \n", "0 0.7031 0.6704 0.712 0.6356 0.7026 0.6297 0.6602 0.6538 \n", "0 0.7593 0.6947 0.6827 0.5676 0.7991 0.6551 0.5976 0.5809 \n", "0 0.9948 0.9997 0.9693 0.4819 0.996 1 0.9974 0.6986 \n", "0 0.1524 0.2449 0.2791 0.293 0.3273 0.2909 0.2439 0.2503 \n", "0 0.6683 0.7809 0.9775 0.9514 0.7081 0.8244 0.5585 0.9604 \n", "0 0.9415 0.9102 0.9034 0.9211 0.9129 0.9224 0.9102 0.9116 " ] }, "execution_count": 6, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "print('ROC Performance')\n", "roc_df" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Precision @ n Performance\n" ] }, { "data": { "text/html": [ "
\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Data</th><th>#Samples</th><th># Dimensions</th><th>Outlier Perc</th><th>ABOD</th><th>CBLOF</th><th>FB</th><th>HBOS</th><th>IForest</th><th>KNN</th><th>LOF</th><th>MCD</th><th>OCSVM</th><th>PCA</th><th>LSCP</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>arrhythmia</td><td>452</td><td>274</td><td>14.6018</td><td>0.3571</td><td>0.5</td><td>0.5</td><td>0.5714</td><td>0.5</td><td>0.5</td><td>0.4643</td><td>0.4286</td><td>0.5</td><td>0.5</td><td>0.4286</td></tr>\n",
"    <tr><th>0</th><td>cardio</td><td>1831</td><td>21</td><td>9.6122</td><td>0.1884</td><td>0.4928</td><td>0.1594</td><td>0.4783</td><td>0.4493</td><td>0.2899</td><td>0.1594</td><td>0.4203</td><td>0.4493</td><td>0.5507</td><td>0.1884</td></tr>\n",
"    <tr><th>0</th><td>glass</td><td>214</td><td>9</td><td>4.2056</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>0</th><td>ionosphere</td><td>351</td><td>33</td><td>35.8974</td><td>0.8462</td><td>0.6154</td><td>0.7692</td><td>0.4038</td><td>0.7115</td><td>0.8846</td><td>0.7692</td><td>0.9038</td><td>0.8077</td><td>0.6154</td><td>0.75</td></tr>\n",
"    <tr><th>0</th><td>letter</td><td>1600</td><td>32</td><td>6.25</td><td>0.4255</td><td>0.0851</td><td>0.4894</td><td>0.1915</td><td>0.1064</td><td>0.4043</td><td>0.4681</td><td>0.1915</td><td>0.1489</td><td>0.1277</td><td>0.4043</td></tr>\n",
"    <tr><th>0</th><td>lympho</td><td>148</td><td>18</td><td>4.0541</td><td>0.3333</td><td>0.6667</td><td>0.3333</td><td>1</td><td>0.6667</td><td>0.3333</td><td>0.3333</td><td>0</td><td>0.3333</td><td>0.6667</td><td>0.3333</td></tr>\n",
"    <tr><th>0</th><td>mnist</td><td>7603</td><td>100</td><td>9.2069</td><td>0.3594</td><td>0.3915</td><td>0.3452</td><td>0.1174</td><td>0.3096</td><td>0.4448</td><td>0.3523</td><td>0.4875</td><td>0.3915</td><td>0.3843</td><td>0.3665</td></tr>\n",
"    <tr><th>0</th><td>musk</td><td>3062</td><td>166</td><td>3.1679</td><td>0.0488</td><td>0.6829</td><td>0.2195</td><td>0.9756</td><td>0.9756</td><td>0.2439</td><td>0.2195</td><td>0.878</td><td>1</td><td>0.9512</td><td>0.1463</td></tr>\n",
"    <tr><th>0</th><td>optdigits</td><td>5216</td><td>64</td><td>2.8758</td><td>0.0149</td><td>0</td><td>0.0149</td><td>0.209</td><td>0.0299</td><td>0</td><td>0.0149</td><td>0</td><td>0</td><td>0</td><td>0.0149</td></tr>\n",
"    <tr><th>0</th><td>pendigits</td><td>6870</td><td>16</td><td>2.2707</td><td>0.1224</td><td>0.2041</td><td>0.0408</td><td>0.3061</td><td>0.3061</td><td>0.0408</td><td>0.0408</td><td>0.0612</td><td>0.2449</td><td>0.2653</td><td>0.0408</td></tr>\n",
"    <tr><th>0</th><td>pima</td><td>768</td><td>8</td><td>34.8958</td><td>0.5847</td><td>0.5678</td><td>0.4746</td><td>0.5847</td><td>0.5424</td><td>0.5847</td><td>0.5169</td><td>0.5678</td><td>0.5085</td><td>0.5508</td><td>0.5339</td></tr>\n",
"    <tr><th>0</th><td>satellite</td><td>6435</td><td>36</td><td>31.6395</td><td>0.4078</td><td>0.4539</td><td>0.4054</td><td>0.5804</td><td>0.5686</td><td>0.5</td><td>0.4066</td><td>0.6832</td><td>0.5355</td><td>0.4787</td><td>0.4184</td></tr>\n",
"    <tr><th>0</th><td>satimage-2</td><td>5803</td><td>36</td><td>1.2235</td><td>0.1667</td><td>0.5</td><td>0.1</td><td>0.7</td><td>0.9333</td><td>0.4</td><td>0.1</td><td>0.7</td><td>1</td><td>0.8333</td><td>0.1</td></tr>\n",
"    <tr><th>0</th><td>vertebral</td><td>240</td><td>6</td><td>12.5</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>0</th><td>vowels</td><td>1456</td><td>12</td><td>3.4341</td><td>0.5</td><td>0</td><td>0.25</td><td>0.0625</td><td>0.0625</td><td>0.3125</td><td>0.3125</td><td>0.0625</td><td>0.1875</td><td>0</td><td>0.3125</td></tr>\n",
"    <tr><th>0</th><td>wbc</td><td>378</td><td>30</td><td>5.5556</td><td>0.2</td><td>0.4</td><td>0.2</td><td>0.4</td><td>0.2</td><td>0.2</td><td>0.2</td><td>0.2</td><td>0.2</td><td>0.2</td><td>0.4</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n", "0 arrhythmia 452 274 14.6018 0.3571 0.5 0.5 \n", "0 cardio 1831 21 9.6122 0.1884 0.4928 0.1594 \n", "0 glass 214 9 4.2056 0 0 0 \n", "0 ionosphere 351 33 35.8974 0.8462 0.6154 0.7692 \n", "0 letter 1600 32 6.25 0.4255 0.0851 0.4894 \n", "0 lympho 148 18 4.0541 0.3333 0.6667 0.3333 \n", "0 mnist 7603 100 9.2069 0.3594 0.3915 0.3452 \n", "0 musk 3062 166 3.1679 0.0488 0.6829 0.2195 \n", "0 optdigits 5216 64 2.8758 0.0149 0 0.0149 \n", "0 pendigits 6870 16 2.2707 0.1224 0.2041 0.0408 \n", "0 pima 768 8 34.8958 0.5847 0.5678 0.4746 \n", "0 satellite 6435 36 31.6395 0.4078 0.4539 0.4054 \n", "0 satimage-2 5803 36 1.2235 0.1667 0.5 0.1 \n", "0 vertebral 240 6 12.5 0 0 0 \n", "0 vowels 1456 12 3.4341 0.5 0 0.25 \n", "0 wbc 378 30 5.5556 0.2 0.4 0.2 \n", "\n", " HBOS IForest KNN LOF MCD OCSVM PCA LSCP \n", "0 0.5714 0.5 0.5 0.4643 0.4286 0.5 0.5 0.4286 \n", "0 0.4783 0.4493 0.2899 0.1594 0.4203 0.4493 0.5507 0.1884 \n", "0 0 0 0 0 0 0 0 0 \n", "0 0.4038 0.7115 0.8846 0.7692 0.9038 0.8077 0.6154 0.75 \n", "0 0.1915 0.1064 0.4043 0.4681 0.1915 0.1489 0.1277 0.4043 \n", "0 1 0.6667 0.3333 0.3333 0 0.3333 0.6667 0.3333 \n", "0 0.1174 0.3096 0.4448 0.3523 0.4875 0.3915 0.3843 0.3665 \n", "0 0.9756 0.9756 0.2439 0.2195 0.878 1 0.9512 0.1463 \n", "0 0.209 0.0299 0 0.0149 0 0 0 0.0149 \n", "0 0.3061 0.3061 0.0408 0.0408 0.0612 0.2449 0.2653 0.0408 \n", "0 0.5847 0.5424 0.5847 0.5169 0.5678 0.5085 0.5508 0.5339 \n", "0 0.5804 0.5686 0.5 0.4066 0.6832 0.5355 0.4787 0.4184 \n", "0 0.7 0.9333 0.4 0.1 0.7 1 0.8333 0.1 \n", "0 0 0 0 0 0 0 0 0 \n", "0 0.0625 0.0625 0.3125 0.3125 0.0625 0.1875 0 0.3125 \n", "0 0.4 0.2 0.2 0.2 0.2 0.2 0.2 0.4 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Precision @ n Performance')\n", "prn_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], 
"metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }