{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Benchmark of various outlier detection models\n",
"\n",
"### The models are evaluated on ROC AUC, precision @ rank n and execution time on 17 benchmark datasets. Each dataset is split 60% for training and 40% for testing.\n",
"\n",
"**[PyOD](https://github.com/yzhao062/Pyod)** is a comprehensive **Python toolkit** to **identify outlying objects** in \n",
"multivariate data with both unsupervised and supervised approaches.\n",
"\n",
"\n",
"1. Linear Models for Outlier Detection:\n",
"    1. **PCA: Principal Component Analysis** (use the sum of weighted\n",
"       projected distances to the eigenvector hyperplane as the outlier\n",
"       scores) [10]\n",
"    2. **MCD: Minimum Covariance Determinant** (use the Mahalanobis\n",
"       distances as the outlier scores) [11, 12]\n",
"    3. **One-Class Support Vector Machines** [3]\n",
"\n",
"2. Proximity-Based Outlier Detection Models:\n",
"    1. **LOF: Local Outlier Factor** [1]\n",
"    2. **CBLOF: Clustering-Based Local Outlier Factor** [15]\n",
"    3. **kNN: k Nearest Neighbors** (use the distance to the kth nearest\n",
"       neighbor as the outlier score)\n",
"    4. **Average kNN** Outlier Detection (use the average distance to k\n",
"       nearest neighbors as the outlier score)\n",
"    5. **Median kNN** Outlier Detection (use the median distance to k\n",
"       nearest neighbors as the outlier score)\n",
"    6. **HBOS: Histogram-based Outlier Score** [5]\n",
"\n",
"3. Probabilistic Models for Outlier Detection:\n",
"    1. **ABOD: Angle-Based Outlier Detection** [7]\n",
"    2. **FastABOD: Fast Angle-Based Outlier Detection using approximation** [7]\n",
"\n",
"4. Outlier Ensembles and Combination Frameworks:\n",
"    1. **Isolation Forest** [2]\n",
"    2. **Feature Bagging** [9]"
]
},
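{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three kNN variants listed above differ only in how they summarize the distances from a point to its k nearest training neighbors: the kth (largest) distance, the mean, or the median. The example below is a minimal brute-force NumPy sketch of that idea, not PyOD's implementation. The helper `knn_outlier_scores` and the toy data are hypothetical. It scores test points against a separate training set, so no point counts as its own neighbor. Scoring the training set itself would require excluding the zero self-distance.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def knn_outlier_scores(X_train, X_test, k=5, method='largest'):\n",
"    # Hypothetical helper: brute-force kNN outlier scores (higher = more outlying).\n",
"    # Pairwise Euclidean distances, shape (n_test, n_train).\n",
"    diff = X_test[:, None, :] - X_train[None, :, :]\n",
"    dist = np.sqrt((diff ** 2).sum(axis=-1))\n",
"    # Keep the k smallest distances of each test point, in ascending order.\n",
"    knn_dist = np.sort(dist, axis=1)[:, :k]\n",
"    if method == 'largest':  # distance to the kth nearest neighbor\n",
"        return knn_dist[:, -1]\n",
"    if method == 'mean':  # average distance to the k nearest neighbors\n",
"        return knn_dist.mean(axis=1)\n",
"    if method == 'median':  # median distance to the k nearest neighbors\n",
"        return np.median(knn_dist, axis=1)\n",
"    raise ValueError('method must be largest, mean or median')\n",
"\n",
"\n",
"rng = np.random.RandomState(42)\n",
"X_train = rng.randn(100, 2)  # inliers around the origin\n",
"X_test = np.array([[0.0, 0.0], [6.0, 6.0]])  # one inlier, one obvious outlier\n",
"for m in ('largest', 'mean', 'median'):\n",
"    print(m, knn_outlier_scores(X_train, X_test, k=5, method=m))\n",
"```\n",
"\n",
"As far as we know, PyOD exposes the same choice through the `method` parameter of `KNN` (`'largest'`, `'mean'`, `'median'`). The benchmark below uses that parameter's default value."
]
},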
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import division\n",
"from __future__ import print_function\n",
"\n",
"import os\n",
"import sys\n",
"from time import time\n",
"\n",
"# temporary solution for relative imports in case pyod is not installed\n",
"# if pyod is installed, no need to use the following line\n",
"sys.path.append(\n",
" os.path.abspath(os.path.join(os.path.dirname(\"__file__\"), '..')))\n",
"# suppress warnings for clean output\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from scipy.io import loadmat\n",
"\n",
"from pyod.models.abod import ABOD\n",
"from pyod.models.cblof import CBLOF\n",
"from pyod.models.feature_bagging import FeatureBagging\n",
"from pyod.models.hbos import HBOS\n",
"from pyod.models.iforest import IForest\n",
"from pyod.models.knn import KNN\n",
"from pyod.models.lof import LOF\n",
"from pyod.models.mcd import MCD\n",
"from pyod.models.ocsvm import OCSVM\n",
"from pyod.models.pca import PCA\n",
"\n",
"from pyod.utils.utility import standardizer\n",
"from pyod.utils.utility import precision_n_scores\n",
"from sklearn.metrics import roc_auc_score"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"... Processing arrhythmia.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.7687, precision @ rank n:0.3571, execution time: 0.171s\n",
"Cluster-based Local Outlier Factor ROC:0.778, precision @ rank n:0.5, execution time: 0.0344s\n",
"Feature Bagging ROC:0.7736, precision @ rank n:0.5, execution time: 0.5254s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.8511, precision @ rank n:0.5714, execution time: 0.2122s\n",
"Isolation Forest ROC:0.8217, precision @ rank n:0.5, execution time: 0.2036s\n",
"K Nearest Neighbors (KNN) ROC:0.782, precision @ rank n:0.5, execution time: 0.0853s\n",
"Local Outlier Factor (LOF) ROC:0.7787, precision @ rank n:0.4643, execution time: 0.0702s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8228, precision @ rank n:0.4286, execution time: 0.5374s\n",
"One-class SVM (OCSVM) ROC:0.7986, precision @ rank n:0.5, execution time: 0.0481s\n",
"Principal Component Analysis (PCA) ROC:0.7997, precision @ rank n:0.5, execution time: 0.0572s\n",
"\n",
"... Processing cardio.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.5668, precision @ rank n:0.209, execution time: 0.5818s\n",
"Cluster-based Local Outlier Factor ROC:0.8987, precision @ rank n:0.5075, execution time: 0.031s\n",
"Feature Bagging ROC:0.5667, precision @ rank n:0.1194, execution time: 0.909s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.8102, precision @ rank n:0.3731, execution time: 0.0469s\n",
"Isolation Forest ROC:0.8726, precision @ rank n:0.3433, execution time: 0.2626s\n",
"K Nearest Neighbors (KNN) ROC:0.7252, precision @ rank n:0.2388, execution time: 0.1524s\n",
"Local Outlier Factor (LOF) ROC:0.5313, precision @ rank n:0.1493, execution time: 0.0898s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7966, precision @ rank n:0.3284, execution time: 0.4314s\n",
"One-class SVM (OCSVM) ROC:0.9055, precision @ rank n:0.3731, execution time: 0.0922s\n",
"Principal Component Analysis (PCA) ROC:0.9237, precision @ rank n:0.4925, execution time: 0.004s\n",
"\n",
"... Processing glass.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.7605, precision @ rank n:0.0, execution time: 0.0681s\n",
"Cluster-based Local Outlier Factor ROC:0.7457, precision @ rank n:0.0, execution time: 0.0192s\n",
"Feature Bagging ROC:0.758, precision @ rank n:0.2, execution time: 0.0156s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.6346, precision @ rank n:0.0, execution time: 0.0156s\n",
"Isolation Forest ROC:0.5556, precision @ rank n:0.0, execution time: 0.1253s\n",
"K Nearest Neighbors (KNN) ROC:0.8198, precision @ rank n:0.0, execution time: 0.0091s\n",
"Local Outlier Factor (LOF) ROC:0.8395, precision @ rank n:0.2, execution time: 0.002s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7728, precision @ rank n:0.0, execution time: 0.0356s\n",
"One-class SVM (OCSVM) ROC:0.5506, precision @ rank n:0.0, execution time: 0.0s\n",
"Principal Component Analysis (PCA) ROC:0.5506, precision @ rank n:0.0, execution time: 0.0s\n",
"\n",
"... Processing ionosphere.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.92, precision @ rank n:0.8333, execution time: 0.0937s\n",
"Cluster-based Local Outlier Factor ROC:0.812, precision @ rank n:0.6111, execution time: 0.0156s\n",
"Feature Bagging ROC:0.9004, precision @ rank n:0.7407, execution time: 0.0625s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.6005, precision @ rank n:0.4259, execution time: 0.0313s\n",
"Isolation Forest ROC:0.8587, precision @ rank n:0.6667, execution time: 0.16s\n",
"K Nearest Neighbors (KNN) ROC:0.9378, precision @ rank n:0.8704, execution time: 0.0157s\n",
"Local Outlier Factor (LOF) ROC:0.9063, precision @ rank n:0.7407, execution time: 0.0156s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9513, precision @ rank n:0.8704, execution time: 0.0468s\n",
"One-class SVM (OCSVM) ROC:0.8497, precision @ rank n:0.7593, execution time: 0.0s\n",
"Principal Component Analysis (PCA) ROC:0.8025, precision @ rank n:0.6481, execution time: 0.0s\n",
"\n",
"... Processing letter.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.8992, precision @ rank n:0.3438, execution time: 0.4973s\n",
"Cluster-based Local Outlier Factor ROC:0.5905, precision @ rank n:0.0625, execution time: 0.0156s\n",
"Feature Bagging ROC:0.8938, precision @ rank n:0.4062, execution time: 0.7252s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.6328, precision @ rank n:0.0312, execution time: 0.0625s\n",
"Isolation Forest ROC:0.6445, precision @ rank n:0.0312, execution time: 0.2632s\n",
"K Nearest Neighbors (KNN) ROC:0.8972, precision @ rank n:0.3438, execution time: 0.1263s\n",
"Local Outlier Factor (LOF) ROC:0.8821, precision @ rank n:0.3125, execution time: 0.0943s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8766, precision @ rank n:0.125, execution time: 0.9454s\n",
"One-class SVM (OCSVM) ROC:0.6071, precision @ rank n:0.0938, execution time: 0.0762s\n",
"Principal Component Analysis (PCA) ROC:0.5265, precision @ rank n:0.0625, execution time: 0.004s\n",
"\n",
"... Processing lympho.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.7155, precision @ rank n:0.0, execution time: 0.0401s\n",
"Cluster-based Local Outlier Factor ROC:0.9914, precision @ rank n:0.5, execution time: 0.017s\n",
"Feature Bagging ROC:0.9483, precision @ rank n:0.5, execution time: 0.023s\n",
"Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.0071s\n",
"Isolation Forest ROC:1.0, precision @ rank n:1.0, execution time: 0.1614s\n",
"K Nearest Neighbors (KNN) ROC:0.9397, precision @ rank n:0.5, execution time: 0.005s\n",
"Local Outlier Factor (LOF) ROC:0.9569, precision @ rank n:0.5, execution time: 0.002s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9483, precision @ rank n:0.5, execution time: 0.031s\n",
"One-class SVM (OCSVM) ROC:0.9655, precision @ rank n:0.5, execution time: 0.002s\n",
"Principal Component Analysis (PCA) ROC:0.9914, precision @ rank n:0.5, execution time: 0.002s\n",
"\n",
"... Processing mnist.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.7747, precision @ rank n:0.384, execution time: 7.7986s\n",
"Cluster-based Local Outlier Factor ROC:0.8431, precision @ rank n:0.365, execution time: 0.067s\n",
"Feature Bagging ROC:0.7246, precision @ rank n:0.3422, execution time: 42.0965s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.5769, precision @ rank n:0.1217, execution time: 0.9069s\n",
"Isolation Forest ROC:0.8033, precision @ rank n:0.2966, execution time: 1.4098s\n",
"K Nearest Neighbors (KNN) ROC:0.8431, precision @ rank n:0.4183, execution time: 5.7465s\n",
"Local Outlier Factor (LOF) ROC:0.7101, precision @ rank n:0.3384, execution time: 5.7854s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9059, precision @ rank n:0.5133, execution time: 2.0246s\n",
"One-class SVM (OCSVM) ROC:0.851, precision @ rank n:0.3802, execution time: 4.2719s\n",
"Principal Component Analysis (PCA) ROC:0.8497, precision @ rank n:0.3688, execution time: 0.125s\n",
"\n",
"... Processing musk.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.2716, precision @ rank n:0.0714, execution time: 2.2028s\n",
"Cluster-based Local Outlier Factor ROC:1.0, precision @ rank n:1.0, execution time: 0.028s\n",
"Feature Bagging ROC:0.6591, precision @ rank n:0.2143, execution time: 10.1455s\n",
"Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.5653s\n",
"Isolation Forest ROC:1.0, precision @ rank n:1.0, execution time: 0.8163s\n",
"K Nearest Neighbors (KNN) ROC:0.8247, precision @ rank n:0.2857, execution time: 1.5487s\n",
"Local Outlier Factor (LOF) ROC:0.6128, precision @ rank n:0.2143, execution time: 1.466s\n",
"Minimum Covariance Determinant (MCD) ROC:1.0, precision @ rank n:0.9762, execution time: 8.3801s\n",
"One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.3012s\n",
"Principal Component Analysis (PCA) ROC:1.0, precision @ rank n:1.0, execution time: 0.1484s\n",
"\n",
"... Processing optdigits.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.4971, precision @ rank n:0.0299, execution time: 2.8534s\n",
"Cluster-based Local Outlier Factor ROC:0.5922, precision @ rank n:0.0, execution time: 0.025s\n",
"Feature Bagging ROC:0.4715, precision @ rank n:0.0448, execution time: 10.828s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.8553, precision @ rank n:0.209, execution time: 0.4581s\n",
"Isolation Forest ROC:0.7033, precision @ rank n:0.0299, execution time: 0.658s\n",
"K Nearest Neighbors (KNN) ROC:0.4029, precision @ rank n:0.0, execution time: 1.7452s\n",
"Local Outlier Factor (LOF) ROC:0.4934, precision @ rank n:0.0448, execution time: 1.9845s\n",
"Minimum Covariance Determinant (MCD) ROC:0.4041, precision @ rank n:0.0, execution time: 1.2785s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"One-class SVM (OCSVM) ROC:0.4808, precision @ rank n:0.0, execution time: 1.5436s\n",
"Principal Component Analysis (PCA) ROC:0.5016, precision @ rank n:0.0, execution time: 0.0537s\n",
"\n",
"... Processing pendigits.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.6957, precision @ rank n:0.1127, execution time: 1.8649s\n",
"Cluster-based Local Outlier Factor ROC:0.9329, precision @ rank n:0.2394, execution time: 0.0669s\n",
"Feature Bagging ROC:0.4277, precision @ rank n:0.0986, execution time: 4.0121s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9322, precision @ rank n:0.3099, execution time: 0.1376s\n",
"Isolation Forest ROC:0.9756, precision @ rank n:0.3944, execution time: 0.5828s\n",
"K Nearest Neighbors (KNN) ROC:0.7694, precision @ rank n:0.1549, execution time: 0.5576s\n",
"Local Outlier Factor (LOF) ROC:0.4056, precision @ rank n:0.0986, execution time: 0.5517s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8413, precision @ rank n:0.0986, execution time: 1.609s\n",
"One-class SVM (OCSVM) ROC:0.9376, precision @ rank n:0.3662, execution time: 0.8713s\n",
"Principal Component Analysis (PCA) ROC:0.9384, precision @ rank n:0.3803, execution time: 0.005s\n",
"\n",
"... Processing pima.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.6623, precision @ rank n:0.4906, execution time: 0.1805s\n",
"Cluster-based Local Outlier Factor ROC:0.7654, precision @ rank n:0.5755, execution time: 0.022s\n",
"Feature Bagging ROC:0.6523, precision @ rank n:0.4811, execution time: 0.0851s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.7016, precision @ rank n:0.5283, execution time: 0.0168s\n",
"Isolation Forest ROC:0.696, precision @ rank n:0.5283, execution time: 0.1829s\n",
"K Nearest Neighbors (KNN) ROC:0.71, precision @ rank n:0.5094, execution time: 0.0156s\n",
"Local Outlier Factor (LOF) ROC:0.6455, precision @ rank n:0.4717, execution time: 0.0156s\n",
"Minimum Covariance Determinant (MCD) ROC:0.6755, precision @ rank n:0.5094, execution time: 0.0468s\n",
"One-class SVM (OCSVM) ROC:0.6024, precision @ rank n:0.4528, execution time: 0.0s\n",
"Principal Component Analysis (PCA) ROC:0.6508, precision @ rank n:0.5, execution time: 0.0s\n",
"\n",
"... Processing satellite.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.5821, precision @ rank n:0.4077, execution time: 2.2945s\n",
"Cluster-based Local Outlier Factor ROC:0.5428, precision @ rank n:0.3153, execution time: 0.0313s\n",
"Feature Bagging ROC:0.5469, precision @ rank n:0.3957, execution time: 6.9375s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.7471, precision @ rank n:0.5612, execution time: 0.2903s\n",
"Isolation Forest ROC:0.7153, precision @ rank n:0.5743, execution time: 0.5562s\n",
"K Nearest Neighbors (KNN) ROC:0.6868, precision @ rank n:0.5072, execution time: 0.8947s\n",
"Local Outlier Factor (LOF) ROC:0.5509, precision @ rank n:0.3993, execution time: 0.8896s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8059, precision @ rank n:0.6906, execution time: 1.679s\n",
"One-class SVM (OCSVM) ROC:0.6564, precision @ rank n:0.5372, execution time: 1.225s\n",
"Principal Component Analysis (PCA) ROC:0.5902, precision @ rank n:0.4712, execution time: 0.0313s\n",
"\n",
"... Processing satimage-2.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.8216, precision @ rank n:0.2, execution time: 1.8979s\n",
"Cluster-based Local Outlier Factor ROC:0.9372, precision @ rank n:0.6, execution time: 0.0468s\n",
"Feature Bagging ROC:0.3658, precision @ rank n:0.04, execution time: 5.081s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9862, precision @ rank n:0.68, execution time: 0.242s\n",
"Isolation Forest ROC:0.9959, precision @ rank n:0.88, execution time: 0.4869s\n",
"K Nearest Neighbors (KNN) ROC:0.9528, precision @ rank n:0.28, execution time: 0.7402s\n",
"Local Outlier Factor (LOF) ROC:0.3717, precision @ rank n:0.0417, execution time: 0.7539s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9949, precision @ rank n:0.52, execution time: 1.4754s\n",
"One-class SVM (OCSVM) ROC:0.9975, precision @ rank n:0.96, execution time: 1.1365s\n",
"Principal Component Analysis (PCA) ROC:0.9901, precision @ rank n:0.8, execution time: 0.0022s\n",
"\n",
"... Processing shuttle.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.6164, precision @ rank n:0.1785, execution time: 19.6385s\n",
"Cluster-based Local Outlier Factor ROC:0.9899, precision @ rank n:0.95, execution time: 0.0781s\n",
"Feature Bagging ROC:0.5342, precision @ rank n:0.0836, execution time: 57.0473s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9847, precision @ rank n:0.9657, execution time: 0.4604s\n",
"Isolation Forest ROC:0.9963, precision @ rank n:0.9022, execution time: 2.8161s\n",
"K Nearest Neighbors (KNN) ROC:0.6409, precision @ rank n:0.2028, execution time: 7.8509s\n",
"Local Outlier Factor (LOF) ROC:0.5373, precision @ rank n:0.1329, execution time: 10.3425s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9898, precision @ rank n:0.7192, execution time: 7.8624s\n",
"One-class SVM (OCSVM) ROC:0.9919, precision @ rank n:0.9559, execution time: 38.0809s\n",
"Principal Component Analysis (PCA) ROC:0.9901, precision @ rank n:0.9507, execution time: 0.0373s\n",
"\n",
"... Processing vertebral.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.3435, precision @ rank n:0.0, execution time: 0.0671s\n",
"Cluster-based Local Outlier Factor ROC:0.504, precision @ rank n:0.0, execution time: 0.0156s\n",
"Feature Bagging ROC:0.4013, precision @ rank n:0.0, execution time: 0.0346s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.4278, precision @ rank n:0.0, execution time: 0.0s\n",
"Isolation Forest ROC:0.3628, precision @ rank n:0.0, execution time: 0.1377s\n",
"K Nearest Neighbors (KNN) ROC:0.3226, precision @ rank n:0.0, execution time: 0.0156s\n",
"Local Outlier Factor (LOF) ROC:0.4382, precision @ rank n:0.0, execution time: 0.0s\n",
"Minimum Covariance Determinant (MCD) ROC:0.3772, precision @ rank n:0.0, execution time: 0.0534s\n",
"One-class SVM (OCSVM) ROC:0.3868, precision @ rank n:0.0, execution time: 0.001s\n",
"Principal Component Analysis (PCA) ROC:0.427, precision @ rank n:0.0, execution time: 0.001s\n",
"\n",
"... Processing vowels.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.9718, precision @ rank n:0.5, execution time: 0.395s\n",
"Cluster-based Local Outlier Factor ROC:0.5681, precision @ rank n:0.0, execution time: 0.0156s\n",
"Feature Bagging ROC:0.9304, precision @ rank n:0.3636, execution time: 0.2297s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.7306, precision @ rank n:0.1818, execution time: 0.0072s\n",
"Isolation Forest ROC:0.7791, precision @ rank n:0.2727, execution time: 0.1966s\n",
"K Nearest Neighbors (KNN) ROC:0.9661, precision @ rank n:0.4545, execution time: 0.0731s\n",
"Local Outlier Factor (LOF) ROC:0.9277, precision @ rank n:0.3182, execution time: 0.0313s\n",
"Minimum Covariance Determinant (MCD) ROC:0.6799, precision @ rank n:0.0455, execution time: 0.5917s\n",
"One-class SVM (OCSVM) ROC:0.7739, precision @ rank n:0.3182, execution time: 0.0313s\n",
"Principal Component Analysis (PCA) ROC:0.6362, precision @ rank n:0.2727, execution time: 0.0156s\n",
"\n",
"... Processing wbc.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.9723, precision @ rank n:0.7273, execution time: 0.1057s\n",
"Cluster-based Local Outlier Factor ROC:0.9658, precision @ rank n:0.6364, execution time: 0.0156s\n",
"Feature Bagging ROC:0.9562, precision @ rank n:0.7273, execution time: 0.0703s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9678, precision @ rank n:0.7273, execution time: 0.0171s\n",
"Isolation Forest ROC:0.9516, precision @ rank n:0.6364, execution time: 0.1392s\n",
"K Nearest Neighbors (KNN) ROC:0.962, precision @ rank n:0.6364, execution time: 0.0156s\n",
"Local Outlier Factor (LOF) ROC:0.9542, precision @ rank n:0.7273, execution time: 0.0179s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9523, precision @ rank n:0.5455, execution time: 0.0541s\n",
"One-class SVM (OCSVM) ROC:0.9555, precision @ rank n:0.6364, execution time: 0.005s\n",
"Principal Component Analysis (PCA) ROC:0.951, precision @ rank n:0.6364, execution time: 0.002s\n"
]
}
],
"source": [
"# Define data file and read X and y\n",
"mat_file_list = ['arrhythmia.mat',\n",
" 'cardio.mat',\n",
" 'glass.mat',\n",
" 'ionosphere.mat',\n",
" 'letter.mat',\n",
" 'lympho.mat',\n",
" 'mnist.mat',\n",
" 'musk.mat',\n",
" 'optdigits.mat',\n",
" 'pendigits.mat',\n",
" 'pima.mat',\n",
" 'satellite.mat',\n",
" 'satimage-2.mat',\n",
" 'shuttle.mat',\n",
" 'vertebral.mat',\n",
" 'vowels.mat',\n",
" 'wbc.mat']\n",
"\n",
"# Seeded random state, shared by the train/test splits and the stochastic detectors\n",
"random_state = np.random.RandomState(42)\n",
"\n",
"df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',\n",
" 'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',\n",
" 'OCSVM', 'PCA']\n",
"roc_df = pd.DataFrame(columns=df_columns)\n",
"prn_df = pd.DataFrame(columns=df_columns)\n",
"time_df = pd.DataFrame(columns=df_columns)\n",
"\n",
"for mat_file in mat_file_list:\n",
"    print(\"\\n... Processing\", mat_file, '...')\n",
"    mat = loadmat(os.path.join('data', mat_file))\n",
"\n",
"    X = mat['X']\n",
"    y = mat['y'].ravel()\n",
"    outliers_fraction = np.count_nonzero(y) / len(y)\n",
"    outliers_percentage = round(outliers_fraction * 100, ndigits=4)\n",
"\n",
"    # construct containers for saving results\n",
"    roc_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n",
"    prn_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n",
"    time_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n",
"\n",
"    # 60% data for training and 40% for testing\n",
"    X_train, X_test, y_train, y_test = train_test_split(\n",
"        X, y, test_size=0.4, random_state=random_state)\n",
"\n",
"    # standardize both splits with statistics fitted on the training split\n",
"    X_train_norm, X_test_norm = standardizer(X_train, X_test)\n",
"\n",
"    # define the ten outlier detectors to be compared\n",
"    classifiers = {\n",
"        'Angle-based Outlier Detector (ABOD)': ABOD(\n",
"            contamination=outliers_fraction),\n",
"        'Cluster-based Local Outlier Factor': CBLOF(\n",
"            contamination=outliers_fraction, check_estimator=False,\n",
"            random_state=random_state),\n",
"        'Feature Bagging': FeatureBagging(\n",
"            contamination=outliers_fraction, check_estimator=False,\n",
"            random_state=random_state),\n",
"        'Histogram-base Outlier Detection (HBOS)': HBOS(\n",
"            contamination=outliers_fraction),\n",
"        'Isolation Forest': IForest(\n",
"            contamination=outliers_fraction, random_state=random_state),\n",
"        'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),\n",
"        'Local Outlier Factor (LOF)': LOF(\n",
"            contamination=outliers_fraction),\n",
"        'Minimum Covariance Determinant (MCD)': MCD(\n",
"            contamination=outliers_fraction, random_state=random_state),\n",
"        'One-class SVM (OCSVM)': OCSVM(\n",
"            contamination=outliers_fraction, random_state=random_state),\n",
"        'Principal Component Analysis (PCA)': PCA(\n",
"            contamination=outliers_fraction, random_state=random_state),\n",
"    }\n",
"\n",
"    for clf_name, clf in classifiers.items():\n",
"        # the timing covers both fitting and scoring the test split\n",
"        t0 = time()\n",
"        clf.fit(X_train_norm)\n",
"        test_scores = clf.decision_function(X_test_norm)\n",
"        t1 = time()\n",
"        duration = round(t1 - t0, ndigits=4)\n",
"        time_list.append(duration)\n",
"\n",
"        roc = round(roc_auc_score(y_test, test_scores), ndigits=4)\n",
"        prn = round(precision_n_scores(y_test, test_scores), ndigits=4)\n",
"\n",
"        print('{clf_name} ROC:{roc}, precision @ rank n:{prn}, '\n",
"              'execution time: {duration}s'.format(\n",
"                  clf_name=clf_name, roc=roc, prn=prn, duration=duration))\n",
"\n",
"        roc_list.append(roc)\n",
"        prn_list.append(prn)\n",
"\n",
"    temp_df = pd.DataFrame(time_list).transpose()\n",
"    temp_df.columns = df_columns\n",
"    time_df = pd.concat([time_df, temp_df], axis=0)\n",
"\n",
"    temp_df = pd.DataFrame(roc_list).transpose()\n",
"    temp_df.columns = df_columns\n",
"    roc_df = pd.concat([roc_df, temp_df], axis=0)\n",
"\n",
"    temp_df = pd.DataFrame(prn_list).transpose()\n",
"    temp_df.columns = df_columns\n",
"    prn_df = pd.concat([prn_df, temp_df], axis=0)"
]
},
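{
"cell_type": "markdown",
"metadata": {},
"source": [
"The precision @ rank n reported above ranks the test points by outlier score and measures what fraction of the n top-ranked points are true outliers. Here n is the number of true outliers in `y_test`. The example below is a minimal NumPy sketch of that definition. `precision_at_n` is a hypothetical helper, not PyOD's `precision_n_scores`. As far as we know, PyOD's version thresholds the scores at a percentile instead, so it may handle tied scores differently.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def precision_at_n(y_true, scores, n=None):\n",
"    # Hypothetical helper: precision among the n highest-scoring samples.\n",
"    y_true = np.asarray(y_true)\n",
"    scores = np.asarray(scores)\n",
"    if n is None:\n",
"        n = int(np.count_nonzero(y_true))  # n = number of true outliers\n",
"    if n == 0:\n",
"        return 0.0\n",
"    # Indices of the n largest scores (higher score = more outlying).\n",
"    top_n = np.argsort(scores)[::-1][:n]\n",
"    return float(np.mean(y_true[top_n]))\n",
"\n",
"\n",
"y_test = np.array([0, 0, 1, 0, 1, 0])\n",
"test_scores = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.3])\n",
"# n = 2; the top-2 samples are indices 1 and 2, and only index 2 is an outlier\n",
"print(precision_at_n(y_test, test_scores))  # 0.5\n",
"```"
]
},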
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Execution time (s)\n"
]
},
{
"data": {
"text/plain": [
" Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n",
"0 arrhythmia 452 274 14.6018 0.171 0.0344 0.5254 \n",
"0 cardio 1831 21 9.6122 0.5818 0.031 0.909 \n",
"0 glass 214 9 4.2056 0.0681 0.0192 0.0156 \n",
"0 ionosphere 351 33 35.8974 0.0937 0.0156 0.0625 \n",
"0 letter 1600 32 6.25 0.4973 0.0156 0.7252 \n",
"0 lympho 148 18 4.0541 0.0401 0.017 0.023 \n",
"0 mnist 7603 100 9.2069 7.7986 0.067 42.0965 \n",
"0 musk 3062 166 3.1679 2.2028 0.028 10.1455 \n",
"0 optdigits 5216 64 2.8758 2.8534 0.025 10.828 \n",
"0 pendigits 6870 16 2.2707 1.8649 0.0669 4.0121 \n",
"0 pima 768 8 34.8958 0.1805 0.022 0.0851 \n",
"0 satellite 6435 36 31.6395 2.2945 0.0313 6.9375 \n",
"0 satimage-2 5803 36 1.2235 1.8979 0.0468 5.081 \n",
"0 shuttle 49097 9 7.1511 19.6385 0.0781 57.0473 \n",
"0 vertebral 240 6 12.5 0.0671 0.0156 0.0346 \n",
"0 vowels 1456 12 3.4341 0.395 0.0156 0.2297 \n",
"0 wbc 378 30 5.5556 0.1057 0.0156 0.0703 \n",
"\n",
" HBOS IForest KNN LOF MCD OCSVM PCA \n",
"0 0.2122 0.2036 0.0853 0.0702 0.5374 0.0481 0.0572 \n",
"0 0.0469 0.2626 0.1524 0.0898 0.4314 0.0922 0.004 \n",
"0 0.0156 0.1253 0.0091 0.002 0.0356 0 0 \n",
"0 0.0313 0.16 0.0157 0.0156 0.0468 0 0 \n",
"0 0.0625 0.2632 0.1263 0.0943 0.9454 0.0762 0.004 \n",
"0 0.0071 0.1614 0.005 0.002 0.031 0.002 0.002 \n",
"0 0.9069 1.4098 5.7465 5.7854 2.0246 4.2719 0.125 \n",
"0 0.5653 0.8163 1.5487 1.466 8.3801 1.3012 0.1484 \n",
"0 0.4581 0.658 1.7452 1.9845 1.2785 1.5436 0.0537 \n",
"0 0.1376 0.5828 0.5576 0.5517 1.609 0.8713 0.005 \n",
"0 0.0168 0.1829 0.0156 0.0156 0.0468 0 0 \n",
"0 0.2903 0.5562 0.8947 0.8896 1.679 1.225 0.0313 \n",
"0 0.242 0.4869 0.7402 0.7539 1.4754 1.1365 0.0022 \n",
"0 0.4604 2.8161 7.8509 10.3425 7.8624 38.0809 0.0373 \n",
"0 0 0.1377 0.0156 0 0.0534 0.001 0.001 \n",
"0 0.0072 0.1966 0.0731 0.0313 0.5917 0.0313 0.0156 \n",
"0 0.0171 0.1392 0.0156 0.0179 0.0541 0.005 0.002 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Execution time (s)')\n",
"time_df"
]
},
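{
"cell_type": "markdown",
"metadata": {},
"source": [
"Many entries in the table above are exactly 0 or multiples of roughly 0.0156 s (for example 0.0156, 0.0313 and 0.0468). This pattern is consistent with the coarse tick of about 15.6 ms that `time.time()` has on some platforms, notably Windows. Read the small-dataset timings as below the clock resolution, not as true zeros. The example below is a sketch of finer timing for fit plus scoring. It uses `time.perf_counter` and keeps the best of several repeats. `time_fit_and_score` and the `DistanceToMean` toy detector are hypothetical stand-ins that only mimic PyOD's `fit`/`decision_function` interface.\n",
"\n",
"```python\n",
"from time import perf_counter\n",
"\n",
"import numpy as np\n",
"\n",
"\n",
"def time_fit_and_score(clf, X_train, X_test, repeats=3):\n",
"    # Hypothetical helper: best-of-`repeats` wall-clock time for fit + decision_function.\n",
"    best = float('inf')\n",
"    scores = None\n",
"    for _ in range(repeats):\n",
"        t0 = perf_counter()  # monotonic, high-resolution clock\n",
"        clf.fit(X_train)\n",
"        scores = clf.decision_function(X_test)\n",
"        best = min(best, perf_counter() - t0)\n",
"    return best, scores\n",
"\n",
"\n",
"class DistanceToMean:\n",
"    # Hypothetical toy detector: score = Euclidean distance to the training mean.\n",
"    def fit(self, X):\n",
"        self.mean_ = X.mean(axis=0)\n",
"        return self\n",
"\n",
"    def decision_function(self, X):\n",
"        return np.linalg.norm(X - self.mean_, axis=1)\n",
"\n",
"\n",
"rng = np.random.RandomState(42)\n",
"X_train, X_test = rng.randn(1000, 5), rng.randn(400, 5)\n",
"elapsed, scores = time_fit_and_score(DistanceToMean(), X_train, X_test)\n",
"print('{:.6f}s'.format(elapsed), scores.shape)\n",
"```\n",
"\n",
"Repeats carry a trade-off. Each re-fit of a detector that draws from the notebook's shared `random_state` consumes extra random numbers, which shifts the results of every later split and detector."
]
},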
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Analyze the ROC AUC and precision @ rank n of each detector"
]
},
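{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to summarize the results across the 17 datasets is to average each detector's score and its rank within each dataset. The sketch below assumes a table shaped like `roc_df` or `prn_df`, with one row per dataset and one column per detector. Those frames are built by concatenating transposed lists, so their detector columns may have `object` dtype, which is why the code casts them to float explicitly. `summarize_detectors` and the toy frame are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def summarize_detectors(result_df, model_cols):\n",
"    # Hypothetical helper: mean score and mean rank of each detector across datasets.\n",
"    scores = result_df[model_cols].astype(float)\n",
"    # Rank detectors within each dataset (row); rank 1 = highest score, ties averaged.\n",
"    ranks = scores.rank(axis=1, ascending=False)\n",
"    summary = pd.DataFrame({'mean_score': scores.mean(axis=0),\n",
"                            'mean_rank': ranks.mean(axis=0)})\n",
"    return summary.sort_values('mean_rank')\n",
"\n",
"\n",
"toy = pd.DataFrame({'Data': ['a', 'b'],\n",
"                    'KNN': [0.9, 0.6],\n",
"                    'LOF': [0.8, 0.7],\n",
"                    'PCA': [0.7, 0.5]})\n",
"print(summarize_detectors(toy, ['KNN', 'LOF', 'PCA']))\n",
"```\n",
"\n",
"With this notebook's variables, the call would be `summarize_detectors(roc_df, df_columns[4:])`, because the first four columns hold dataset metadata."
]
},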
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ROC Performance\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Data</th>\n",
"      <th>#Samples</th>\n",
"      <th># Dimensions</th>\n",
"      <th>Outlier Perc</th>\n",
"      <th>ABOD</th>\n",
"      <th>CBLOF</th>\n",
"      <th>FB</th>\n",
"      <th>HBOS</th>\n",
"      <th>IForest</th>\n",
"      <th>KNN</th>\n",
"      <th>LOF</th>\n",
"      <th>MCD</th>\n",
"      <th>OCSVM</th>\n",
"      <th>PCA</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>arrhythmia</td>\n",
"      <td>452</td>\n",
"      <td>274</td>\n",
"      <td>14.6018</td>\n",
"      <td>0.7687</td>\n",
"      <td>0.778</td>\n",
"      <td>0.7736</td>\n",
"      <td>0.8511</td>\n",
"      <td>0.8217</td>\n",
"      <td>0.782</td>\n",
"      <td>0.7787</td>\n",
"      <td>0.8228</td>\n",
"      <td>0.7986</td>\n",
"      <td>0.7997</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>cardio</td>\n",
"      <td>1831</td>\n",
"      <td>21</td>\n",
"      <td>9.6122</td>\n",
"      <td>0.5668</td>\n",
"      <td>0.8987</td>\n",
"      <td>0.5667</td>\n",
"      <td>0.8102</td>\n",
"      <td>0.8726</td>\n",
"      <td>0.7252</td>\n",
"      <td>0.5313</td>\n",
"      <td>0.7966</td>\n",
"      <td>0.9055</td>\n",
"      <td>0.9237</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>glass</td>\n",
"      <td>214</td>\n",
"      <td>9</td>\n",
"      <td>4.2056</td>\n",
"      <td>0.7605</td>\n",
"      <td>0.7457</td>\n",
"      <td>0.758</td>\n",
"      <td>0.6346</td>\n",
"      <td>0.5556</td>\n",
"      <td>0.8198</td>\n",
"      <td>0.8395</td>\n",
"      <td>0.7728</td>\n",
"      <td>0.5506</td>\n",
"      <td>0.5506</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>ionosphere</td>\n",
"      <td>351</td>\n",
"      <td>33</td>\n",
"      <td>35.8974</td>\n",
"      <td>0.92</td>\n",
"      <td>0.812</td>\n",
"      <td>0.9004</td>\n",
"      <td>0.6005</td>\n",
"      <td>0.8587</td>\n",
"      <td>0.9378</td>\n",
"      <td>0.9063</td>\n",
"      <td>0.9513</td>\n",
"      <td>0.8497</td>\n",
"      <td>0.8025</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>letter</td>\n",
"      <td>1600</td>\n",
"      <td>32</td>\n",
"      <td>6.25</td>\n",
"      <td>0.8992</td>\n",
"      <td>0.5905</td>\n",
"      <td>0.8938</td>\n",
"      <td>0.6328</td>\n",
"      <td>0.6445</td>\n",
"      <td>0.8972</td>\n",
"      <td>0.8821</td>\n",
"      <td>0.8766</td>\n",
"      <td>0.6071</td>\n",
"      <td>0.5265</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>lympho</td>\n",
"      <td>148</td>\n",
"      <td>18</td>\n",
"      <td>4.0541</td>\n",
"      <td>0.7155</td>\n",
"      <td>0.9914</td>\n",
"      <td>0.9483</td>\n",
"      <td>1</td>\n",
"      <td>1</td>\n",
"      <td>0.9397</td>\n",
"      <td>0.9569</td>\n",
"      <td>0.9483</td>\n",
"      <td>0.9655</td>\n",
"      <td>0.9914</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>mnist</td>\n",
"      <td>7603</td>\n",
"      <td>100</td>\n",
"      <td>9.2069</td>\n",
"      <td>0.7747</td>\n",
"      <td>0.8431</td>\n",
"      <td>0.7246</td>\n",
"      <td>0.5769</td>\n",
"      <td>0.8033</td>\n",
"      <td>0.8431</td>\n",
"      <td>0.7101</td>\n",
"      <td>0.9059</td>\n",
"      <td>0.851</td>\n",
"      <td>0.8497</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>musk</td>\n",
"      <td>3062</td>\n",
"      <td>166</td>\n",
"      <td>3.1679</td>\n",
"      <td>0.2716</td>\n",
"      <td>1</td>\n",
"      <td>0.6591</td>\n",
"      <td>1</td>\n",
"      <td>1</td>\n",
"      <td>0.8247</td>\n",
"      <td>0.6128</td>\n",
"      <td>1</td>\n",
"      <td>1</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>optdigits</td>\n",
"      <td>5216</td>\n",
"      <td>64</td>\n",
"      <td>2.8758</td>\n",
"      <td>0.4971</td>\n",
"      <td>0.5922</td>\n",
"      <td>0.4715</td>\n",
"      <td>0.8553</td>\n",
"      <td>0.7033</td>\n",
"      <td>0.4029</td>\n",
"      <td>0.4934</td>\n",
"      <td>0.4041</td>\n",
"      <td>0.4808</td>\n",
"      <td>0.5016</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>pendigits</td>\n",
"      <td>6870</td>\n",
"      <td>16</td>\n",
"      <td>2.2707</td>\n",
"      <td>0.6957</td>\n",
"      <td>0.9329</td>\n",
"      <td>0.4277</td>\n",
"      <td>0.9322</td>\n",
"      <td>0.9756</td>\n",
"      <td>0.7694</td>\n",
"      <td>0.4056</td>\n",
"      <td>0.8413</td>\n",
"      <td>0.9376</td>\n",
"      <td>0.9384</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>pima</td>\n",
"      <td>768</td>\n",
"      <td>8</td>\n",
"      <td>34.8958</td>\n",
"      <td>0.6623</td>\n",
"      <td>0.7654</td>\n",
"      <td>0.6523</td>\n",
"      <td>0.7016</td>\n",
"      <td>0.696</td>\n",
"      <td>0.71</td>\n",
"      <td>0.6455</td>\n",
"      <td>0.6755</td>\n",
"      <td>0.6024</td>\n",
"      <td>0.6508</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>satellite</td>\n",
"      <td>6435</td>\n",
"      <td>36</td>\n",
" 31.6395 | \n",
" 0.5821 | \n",
" 0.5428 | \n",
" 0.5469 | \n",
" 0.7471 | \n",
" 0.7153 | \n",
" 0.6868 | \n",
" 0.5509 | \n",
" 0.8059 | \n",
" 0.6564 | \n",
" 0.5902 | \n",
"
\n",
" \n",
" 0 | \n",
" satimage-2 | \n",
" 5803 | \n",
" 36 | \n",
" 1.2235 | \n",
" 0.8216 | \n",
" 0.9372 | \n",
" 0.3658 | \n",
" 0.9862 | \n",
" 0.9959 | \n",
" 0.9528 | \n",
" 0.3717 | \n",
" 0.9949 | \n",
" 0.9975 | \n",
" 0.9901 | \n",
"
\n",
" \n",
" 0 | \n",
" shuttle | \n",
" 49097 | \n",
" 9 | \n",
" 7.1511 | \n",
" 0.6164 | \n",
" 0.9899 | \n",
" 0.5342 | \n",
" 0.9847 | \n",
" 0.9963 | \n",
" 0.6409 | \n",
" 0.5373 | \n",
" 0.9898 | \n",
" 0.9919 | \n",
" 0.9901 | \n",
"
\n",
" \n",
" 0 | \n",
" vertebral | \n",
" 240 | \n",
" 6 | \n",
" 12.5 | \n",
" 0.3435 | \n",
" 0.504 | \n",
" 0.4013 | \n",
" 0.4278 | \n",
" 0.3628 | \n",
" 0.3226 | \n",
" 0.4382 | \n",
" 0.3772 | \n",
" 0.3868 | \n",
" 0.427 | \n",
"
\n",
" \n",
" 0 | \n",
" vowels | \n",
" 1456 | \n",
" 12 | \n",
" 3.4341 | \n",
" 0.9718 | \n",
" 0.5681 | \n",
" 0.9304 | \n",
" 0.7306 | \n",
" 0.7791 | \n",
" 0.9661 | \n",
" 0.9277 | \n",
" 0.6799 | \n",
" 0.7739 | \n",
" 0.6362 | \n",
"
\n",
" \n",
" 0 | \n",
" wbc | \n",
" 378 | \n",
" 30 | \n",
" 5.5556 | \n",
" 0.9723 | \n",
" 0.9658 | \n",
" 0.9562 | \n",
" 0.9678 | \n",
" 0.9516 | \n",
" 0.962 | \n",
" 0.9542 | \n",
" 0.9523 | \n",
" 0.9555 | \n",
" 0.951 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n",
"0 arrhythmia 452 274 14.6018 0.7687 0.778 0.7736 \n",
"0 cardio 1831 21 9.6122 0.5668 0.8987 0.5667 \n",
"0 glass 214 9 4.2056 0.7605 0.7457 0.758 \n",
"0 ionosphere 351 33 35.8974 0.92 0.812 0.9004 \n",
"0 letter 1600 32 6.25 0.8992 0.5905 0.8938 \n",
"0 lympho 148 18 4.0541 0.7155 0.9914 0.9483 \n",
"0 mnist 7603 100 9.2069 0.7747 0.8431 0.7246 \n",
"0 musk 3062 166 3.1679 0.2716 1 0.6591 \n",
"0 optdigits 5216 64 2.8758 0.4971 0.5922 0.4715 \n",
"0 pendigits 6870 16 2.2707 0.6957 0.9329 0.4277 \n",
"0 pima 768 8 34.8958 0.6623 0.7654 0.6523 \n",
"0 satellite 6435 36 31.6395 0.5821 0.5428 0.5469 \n",
"0 satimage-2 5803 36 1.2235 0.8216 0.9372 0.3658 \n",
"0 shuttle 49097 9 7.1511 0.6164 0.9899 0.5342 \n",
"0 vertebral 240 6 12.5 0.3435 0.504 0.4013 \n",
"0 vowels 1456 12 3.4341 0.9718 0.5681 0.9304 \n",
"0 wbc 378 30 5.5556 0.9723 0.9658 0.9562 \n",
"\n",
" HBOS IForest KNN LOF MCD OCSVM PCA \n",
"0 0.8511 0.8217 0.782 0.7787 0.8228 0.7986 0.7997 \n",
"0 0.8102 0.8726 0.7252 0.5313 0.7966 0.9055 0.9237 \n",
"0 0.6346 0.5556 0.8198 0.8395 0.7728 0.5506 0.5506 \n",
"0 0.6005 0.8587 0.9378 0.9063 0.9513 0.8497 0.8025 \n",
"0 0.6328 0.6445 0.8972 0.8821 0.8766 0.6071 0.5265 \n",
"0 1 1 0.9397 0.9569 0.9483 0.9655 0.9914 \n",
"0 0.5769 0.8033 0.8431 0.7101 0.9059 0.851 0.8497 \n",
"0 1 1 0.8247 0.6128 1 1 1 \n",
"0 0.8553 0.7033 0.4029 0.4934 0.4041 0.4808 0.5016 \n",
"0 0.9322 0.9756 0.7694 0.4056 0.8413 0.9376 0.9384 \n",
"0 0.7016 0.696 0.71 0.6455 0.6755 0.6024 0.6508 \n",
"0 0.7471 0.7153 0.6868 0.5509 0.8059 0.6564 0.5902 \n",
"0 0.9862 0.9959 0.9528 0.3717 0.9949 0.9975 0.9901 \n",
"0 0.9847 0.9963 0.6409 0.5373 0.9898 0.9919 0.9901 \n",
"0 0.4278 0.3628 0.3226 0.4382 0.3772 0.3868 0.427 \n",
"0 0.7306 0.7791 0.9661 0.9277 0.6799 0.7739 0.6362 \n",
"0 0.9678 0.9516 0.962 0.9542 0.9523 0.9555 0.951 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('ROC Performance')\n",
"roc_df"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Precision @ n Performance\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Data | \n",
" #Samples | \n",
" # Dimensions | \n",
" Outlier Perc | \n",
" ABOD | \n",
" CBLOF | \n",
" FB | \n",
" HBOS | \n",
" IForest | \n",
" KNN | \n",
" LOF | \n",
" MCD | \n",
" OCSVM | \n",
" PCA | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" arrhythmia | \n",
" 452 | \n",
" 274 | \n",
" 14.6018 | \n",
" 0.3571 | \n",
" 0.5 | \n",
" 0.5 | \n",
" 0.5714 | \n",
" 0.5 | \n",
" 0.5 | \n",
" 0.4643 | \n",
" 0.4286 | \n",
" 0.5 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 0 | \n",
" cardio | \n",
" 1831 | \n",
" 21 | \n",
" 9.6122 | \n",
" 0.209 | \n",
" 0.5075 | \n",
" 0.1194 | \n",
" 0.3731 | \n",
" 0.3433 | \n",
" 0.2388 | \n",
" 0.1493 | \n",
" 0.3284 | \n",
" 0.3731 | \n",
" 0.4925 | \n",
"
\n",
" \n",
" 0 | \n",
" glass | \n",
" 214 | \n",
" 9 | \n",
" 4.2056 | \n",
" 0 | \n",
" 0 | \n",
" 0.2 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.2 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 0 | \n",
" ionosphere | \n",
" 351 | \n",
" 33 | \n",
" 35.8974 | \n",
" 0.8333 | \n",
" 0.6111 | \n",
" 0.7407 | \n",
" 0.4259 | \n",
" 0.6667 | \n",
" 0.8704 | \n",
" 0.7407 | \n",
" 0.8704 | \n",
" 0.7593 | \n",
" 0.6481 | \n",
"
\n",
" \n",
" 0 | \n",
" letter | \n",
" 1600 | \n",
" 32 | \n",
" 6.25 | \n",
" 0.3438 | \n",
" 0.0625 | \n",
" 0.4062 | \n",
" 0.0312 | \n",
" 0.0312 | \n",
" 0.3438 | \n",
" 0.3125 | \n",
" 0.125 | \n",
" 0.0938 | \n",
" 0.0625 | \n",
"
\n",
" \n",
" 0 | \n",
" lympho | \n",
" 148 | \n",
" 18 | \n",
" 4.0541 | \n",
" 0 | \n",
" 0.5 | \n",
" 0.5 | \n",
" 1 | \n",
" 1 | \n",
" 0.5 | \n",
" 0.5 | \n",
" 0.5 | \n",
" 0.5 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 0 | \n",
" mnist | \n",
" 7603 | \n",
" 100 | \n",
" 9.2069 | \n",
" 0.384 | \n",
" 0.365 | \n",
" 0.3422 | \n",
" 0.1217 | \n",
" 0.2966 | \n",
" 0.4183 | \n",
" 0.3384 | \n",
" 0.5133 | \n",
" 0.3802 | \n",
" 0.3688 | \n",
"
\n",
" \n",
" 0 | \n",
" musk | \n",
" 3062 | \n",
" 166 | \n",
" 3.1679 | \n",
" 0.0714 | \n",
" 1 | \n",
" 0.2143 | \n",
" 1 | \n",
" 1 | \n",
" 0.2857 | \n",
" 0.2143 | \n",
" 0.9762 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 0 | \n",
" optdigits | \n",
" 5216 | \n",
" 64 | \n",
" 2.8758 | \n",
" 0.0299 | \n",
" 0 | \n",
" 0.0448 | \n",
" 0.209 | \n",
" 0.0299 | \n",
" 0 | \n",
" 0.0448 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 0 | \n",
" pendigits | \n",
" 6870 | \n",
" 16 | \n",
" 2.2707 | \n",
" 0.1127 | \n",
" 0.2394 | \n",
" 0.0986 | \n",
" 0.3099 | \n",
" 0.3944 | \n",
" 0.1549 | \n",
" 0.0986 | \n",
" 0.0986 | \n",
" 0.3662 | \n",
" 0.3803 | \n",
"
\n",
" \n",
" 0 | \n",
" pima | \n",
" 768 | \n",
" 8 | \n",
" 34.8958 | \n",
" 0.4906 | \n",
" 0.5755 | \n",
" 0.4811 | \n",
" 0.5283 | \n",
" 0.5283 | \n",
" 0.5094 | \n",
" 0.4717 | \n",
" 0.5094 | \n",
" 0.4528 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 0 | \n",
" satellite | \n",
" 6435 | \n",
" 36 | \n",
" 31.6395 | \n",
" 0.4077 | \n",
" 0.3153 | \n",
" 0.3957 | \n",
" 0.5612 | \n",
" 0.5743 | \n",
" 0.5072 | \n",
" 0.3993 | \n",
" 0.6906 | \n",
" 0.5372 | \n",
" 0.4712 | \n",
"
\n",
" \n",
" 0 | \n",
" satimage-2 | \n",
" 5803 | \n",
" 36 | \n",
" 1.2235 | \n",
" 0.2 | \n",
" 0.6 | \n",
" 0.04 | \n",
" 0.68 | \n",
" 0.88 | \n",
" 0.28 | \n",
" 0.0417 | \n",
" 0.52 | \n",
" 0.96 | \n",
" 0.8 | \n",
"
\n",
" \n",
" 0 | \n",
" shuttle | \n",
" 49097 | \n",
" 9 | \n",
" 7.1511 | \n",
" 0.1785 | \n",
" 0.95 | \n",
" 0.0836 | \n",
" 0.9657 | \n",
" 0.9022 | \n",
" 0.2028 | \n",
" 0.1329 | \n",
" 0.7192 | \n",
" 0.9559 | \n",
" 0.9507 | \n",
"
\n",
" \n",
" 0 | \n",
" vertebral | \n",
" 240 | \n",
" 6 | \n",
" 12.5 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 0 | \n",
" vowels | \n",
" 1456 | \n",
" 12 | \n",
" 3.4341 | \n",
" 0.5 | \n",
" 0 | \n",
" 0.3636 | \n",
" 0.1818 | \n",
" 0.2727 | \n",
" 0.4545 | \n",
" 0.3182 | \n",
" 0.0455 | \n",
" 0.3182 | \n",
" 0.2727 | \n",
"
\n",
" \n",
" 0 | \n",
" wbc | \n",
" 378 | \n",
" 30 | \n",
" 5.5556 | \n",
" 0.7273 | \n",
" 0.6364 | \n",
" 0.7273 | \n",
" 0.7273 | \n",
" 0.6364 | \n",
" 0.6364 | \n",
" 0.7273 | \n",
" 0.5455 | \n",
" 0.6364 | \n",
" 0.6364 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n",
"0 arrhythmia 452 274 14.6018 0.3571 0.5 0.5 \n",
"0 cardio 1831 21 9.6122 0.209 0.5075 0.1194 \n",
"0 glass 214 9 4.2056 0 0 0.2 \n",
"0 ionosphere 351 33 35.8974 0.8333 0.6111 0.7407 \n",
"0 letter 1600 32 6.25 0.3438 0.0625 0.4062 \n",
"0 lympho 148 18 4.0541 0 0.5 0.5 \n",
"0 mnist 7603 100 9.2069 0.384 0.365 0.3422 \n",
"0 musk 3062 166 3.1679 0.0714 1 0.2143 \n",
"0 optdigits 5216 64 2.8758 0.0299 0 0.0448 \n",
"0 pendigits 6870 16 2.2707 0.1127 0.2394 0.0986 \n",
"0 pima 768 8 34.8958 0.4906 0.5755 0.4811 \n",
"0 satellite 6435 36 31.6395 0.4077 0.3153 0.3957 \n",
"0 satimage-2 5803 36 1.2235 0.2 0.6 0.04 \n",
"0 shuttle 49097 9 7.1511 0.1785 0.95 0.0836 \n",
"0 vertebral 240 6 12.5 0 0 0 \n",
"0 vowels 1456 12 3.4341 0.5 0 0.3636 \n",
"0 wbc 378 30 5.5556 0.7273 0.6364 0.7273 \n",
"\n",
" HBOS IForest KNN LOF MCD OCSVM PCA \n",
"0 0.5714 0.5 0.5 0.4643 0.4286 0.5 0.5 \n",
"0 0.3731 0.3433 0.2388 0.1493 0.3284 0.3731 0.4925 \n",
"0 0 0 0 0.2 0 0 0 \n",
"0 0.4259 0.6667 0.8704 0.7407 0.8704 0.7593 0.6481 \n",
"0 0.0312 0.0312 0.3438 0.3125 0.125 0.0938 0.0625 \n",
"0 1 1 0.5 0.5 0.5 0.5 0.5 \n",
"0 0.1217 0.2966 0.4183 0.3384 0.5133 0.3802 0.3688 \n",
"0 1 1 0.2857 0.2143 0.9762 1 1 \n",
"0 0.209 0.0299 0 0.0448 0 0 0 \n",
"0 0.3099 0.3944 0.1549 0.0986 0.0986 0.3662 0.3803 \n",
"0 0.5283 0.5283 0.5094 0.4717 0.5094 0.4528 0.5 \n",
"0 0.5612 0.5743 0.5072 0.3993 0.6906 0.5372 0.4712 \n",
"0 0.68 0.88 0.28 0.0417 0.52 0.96 0.8 \n",
"0 0.9657 0.9022 0.2028 0.1329 0.7192 0.9559 0.9507 \n",
"0 0 0 0 0 0 0 0 \n",
"0 0.1818 0.2727 0.4545 0.3182 0.0455 0.3182 0.2727 \n",
"0 0.7273 0.6364 0.6364 0.7273 0.5455 0.6364 0.6364 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Precision @ n Performance')\n",
"prn_df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}