{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Benchmark of various outlier detection models\n",
"\n",
"### The models are evaluaed on ROC, Precision @ n and execution time on 17 benchmark datasets. All datasets are splitted 60% for training and 40% for testing.\n",
"\n",
"**[PyOD](https://github.com/yzhao062/pyod)** is a comprehensive **Python toolkit** to **identify outlying objects** in \n",
"multivariate data with both unsupervised and supervised approaches.\n",
"The model covered in this example includes:\n",
"\n",
" 1. Linear Models for Outlier Detection:\n",
" 1. **PCA: Principal Component Analysis** use the sum of\n",
" weighted projected distances to the eigenvector hyperplane \n",
" as the outlier outlier scores)\n",
" 2. **MCD: Minimum Covariance Determinant** (use the mahalanobis distances \n",
" as the outlier scores)\n",
" 3. **OCSVM: One-Class Support Vector Machines**\n",
" \n",
" 2. Proximity-Based Outlier Detection Models:\n",
" 1. **LOF: Local Outlier Factor**\n",
" 2. **CBLOF: Clustering-Based Local Outlier Factor**\n",
" 3. **kNN: k Nearest Neighbors** (use the distance to the kth nearest \n",
" neighbor as the outlier score)\n",
" 4. **Median kNN** Outlier Detection (use the median distance to k nearest \n",
" neighbors as the outlier score)\n",
" 5. **HBOS: Histogram-based Outlier Score**\n",
" \n",
" 3. Probabilistic Models for Outlier Detection:\n",
" 1. **ABOD: Angle-Based Outlier Detection**\n",
" \n",
" 4. Outlier Ensembles and Combination Frameworks\n",
" 1. **Isolation Forest**\n",
" 2. **Feature Bagging**\n",
" 3. **LSCP**\n",
" \n",
"Corresponding file could be found at /examples/compare_all_models.py"
]
},
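{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the Precision @ n metric concrete, here is a minimal sketch of the idea, assuming n is taken to be the number of true outliers in the test labels. The benchmark itself calls PyOD's `precision_n_scores`; the helper name `precision_at_n_sketch` and the toy data below are hypothetical, and tie handling may differ from the library.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def precision_at_n_sketch(y_true, scores):\n",
"    # Illustrative helper, not part of PyOD.\n",
"    y_true = np.asarray(y_true)\n",
"    scores = np.asarray(scores)\n",
"    n = int(np.count_nonzero(y_true))  # n = number of true outliers\n",
"    top_n = np.argsort(scores)[::-1][:n]  # indices of the n highest scores\n",
"    return y_true[top_n].sum() / n  # share of true outliers among them\n",
"\n",
"\n",
"y_toy = [0, 0, 1, 0, 1]  # toy labels, 1 = outlier\n",
"s_toy = [0.1, 0.9, 0.8, 0.2, 0.3]  # toy outlier scores\n",
"print(precision_at_n_sketch(y_toy, s_toy))  # 0.5\n",
"```\n",
"\n",
"Here only one of the two highest-scored points is a true outlier, so the sketch reports 0.5."
]
},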
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import division\n",
"from __future__ import print_function\n",
"\n",
"import os\n",
"import sys\n",
"from time import time\n",
"\n",
"# temporary solution for relative imports in case pyod is not installed\n",
"# if pyod is installed, no need to use the following line\n",
"sys.path.append(\n",
" os.path.abspath(os.path.join(os.path.dirname(\"__file__\"), '..')))\n",
"# supress warnings for clean output\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from scipy.io import loadmat\n",
"\n",
"from pyod.models.abod import ABOD\n",
"from pyod.models.cblof import CBLOF\n",
"from pyod.models.feature_bagging import FeatureBagging\n",
"from pyod.models.hbos import HBOS\n",
"from pyod.models.iforest import IForest\n",
"from pyod.models.knn import KNN\n",
"from pyod.models.lof import LOF\n",
"from pyod.models.mcd import MCD\n",
"from pyod.models.ocsvm import OCSVM\n",
"from pyod.models.pca import PCA\n",
"from pyod.models.lscp import LSCP\n",
"\n",
"from pyod.utils.utility import standardizer\n",
"from pyod.utils.utility import precision_n_scores\n",
"from sklearn.metrics import roc_auc_score"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"... Processing arrhythmia.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.7687, precision @ rank n:0.3571, execution time: 0.1454s\n",
"Cluster-based Local Outlier Factor ROC:0.778, precision @ rank n:0.5, execution time: 0.0301s\n",
"Feature Bagging ROC:0.7736, precision @ rank n:0.5, execution time: 0.5825s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.8511, precision @ rank n:0.5714, execution time: 0.0622s\n",
"Isolation Forest ROC:0.8217, precision @ rank n:0.5, execution time: 0.2477s\n",
"K Nearest Neighbors (KNN) ROC:0.782, precision @ rank n:0.5, execution time: 0.0932s\n",
"Local Outlier Factor (LOF) ROC:0.7787, precision @ rank n:0.4643, execution time: 0.0681s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8228, precision @ rank n:0.4286, execution time: 0.5083s\n",
"One-class SVM (OCSVM) ROC:0.7986, precision @ rank n:0.5, execution time: 0.0471s\n",
"Principal Component Analysis (PCA) ROC:0.7997, precision @ rank n:0.5, execution time: 0.0602s\n",
"Locally Selective Combination (LSCP) ROC:0.7754, precision @ rank n:0.4286, execution time: 2.4836s\n",
"\n",
"... Processing cardio.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.5952, precision @ rank n:0.1884, execution time: 0.399s\n",
"Cluster-based Local Outlier Factor ROC:0.8894, precision @ rank n:0.4928, execution time: 0.0261s\n",
"Feature Bagging ROC:0.5628, precision @ rank n:0.1594, execution time: 0.8914s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.8227, precision @ rank n:0.4783, execution time: 0.006s\n",
"Isolation Forest ROC:0.8953, precision @ rank n:0.4493, execution time: 0.3219s\n",
"K Nearest Neighbors (KNN) ROC:0.7442, precision @ rank n:0.2899, execution time: 0.3168s\n",
"Local Outlier Factor (LOF) ROC:0.5459, precision @ rank n:0.1594, execution time: 0.1494s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7774, precision @ rank n:0.4203, execution time: 0.6467s\n",
"One-class SVM (OCSVM) ROC:0.914, precision @ rank n:0.4493, execution time: 0.0983s\n",
"Principal Component Analysis (PCA) ROC:0.9323, precision @ rank n:0.5507, execution time: 0.004s\n",
"Locally Selective Combination (LSCP) ROC:0.622, precision @ rank n:0.1884, execution time: 5.0424s\n",
"\n",
"... Processing glass.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.8588, precision @ rank n:0.0, execution time: 0.0401s\n",
"Cluster-based Local Outlier Factor ROC:0.7765, precision @ rank n:0.0, execution time: 0.012s\n",
"Feature Bagging ROC:0.4235, precision @ rank n:0.0, execution time: 0.0281s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.6, precision @ rank n:0.0, execution time: 0.002s\n",
"Isolation Forest ROC:0.7765, precision @ rank n:0.0, execution time: 0.1684s\n",
"K Nearest Neighbors (KNN) ROC:0.8353, precision @ rank n:0.0, execution time: 0.018s\n",
"Local Outlier Factor (LOF) ROC:0.3882, precision @ rank n:0.0, execution time: 0.003s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8353, precision @ rank n:0.0, execution time: 0.0431s\n",
"One-class SVM (OCSVM) ROC:0.7529, precision @ rank n:0.0, execution time: 0.001s\n",
"Principal Component Analysis (PCA) ROC:0.7176, precision @ rank n:0.0, execution time: 0.002s\n",
"Locally Selective Combination (LSCP) ROC:0.7529, precision @ rank n:0.0, execution time: 0.2878s\n",
"\n",
"... Processing ionosphere.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.9302, precision @ rank n:0.8462, execution time: 0.0792s\n",
"Cluster-based Local Outlier Factor ROC:0.8073, precision @ rank n:0.6154, execution time: 0.0271s\n",
"Feature Bagging ROC:0.9092, precision @ rank n:0.7692, execution time: 0.0722s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.5869, precision @ rank n:0.4038, execution time: 0.008s\n",
"Isolation Forest ROC:0.8734, precision @ rank n:0.7115, execution time: 0.2347s\n",
"K Nearest Neighbors (KNN) ROC:0.9358, precision @ rank n:0.8846, execution time: 0.0251s\n",
"Local Outlier Factor (LOF) ROC:0.9114, precision @ rank n:0.7692, execution time: 0.006s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9576, precision @ rank n:0.9038, execution time: 0.0682s\n",
"One-class SVM (OCSVM) ROC:0.8861, precision @ rank n:0.8077, execution time: 0.005s\n",
"Principal Component Analysis (PCA) ROC:0.8204, precision @ rank n:0.6154, execution time: 0.002s\n",
"Locally Selective Combination (LSCP) ROC:0.9041, precision @ rank n:0.75, execution time: 0.5264s\n",
"\n",
"... Processing letter.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.9035, precision @ rank n:0.4255, execution time: 0.3579s\n",
"Cluster-based Local Outlier Factor ROC:0.5555, precision @ rank n:0.0851, execution time: 0.021s\n",
"Feature Bagging ROC:0.9077, precision @ rank n:0.4894, execution time: 0.751s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.6056, precision @ rank n:0.1915, execution time: 0.008s\n",
"Isolation Forest ROC:0.5945, precision @ rank n:0.1064, execution time: 0.2597s\n",
"K Nearest Neighbors (KNN) ROC:0.8909, precision @ rank n:0.4043, execution time: 0.1584s\n",
"Local Outlier Factor (LOF) ROC:0.8821, precision @ rank n:0.4681, execution time: 0.1203s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8144, precision @ rank n:0.1915, execution time: 1.1551s\n",
"One-class SVM (OCSVM) ROC:0.5727, precision @ rank n:0.1489, execution time: 0.0852s\n",
"Principal Component Analysis (PCA) ROC:0.5104, precision @ rank n:0.1277, execution time: 0.004s\n",
"Locally Selective Combination (LSCP) ROC:0.857, precision @ rank n:0.4043, execution time: 4.7737s\n",
"\n",
"... Processing lympho.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.9357, precision @ rank n:0.3333, execution time: 0.0231s\n",
"Cluster-based Local Outlier Factor ROC:0.9708, precision @ rank n:0.6667, execution time: 0.015s\n",
"Feature Bagging ROC:0.924, precision @ rank n:0.3333, execution time: 0.0231s\n",
"Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.004s\n",
"Isolation Forest ROC:0.9942, precision @ rank n:0.6667, execution time: 0.2065s\n",
"K Nearest Neighbors (KNN) ROC:0.9064, precision @ rank n:0.3333, execution time: 0.011s\n",
"Local Outlier Factor (LOF) ROC:0.924, precision @ rank n:0.3333, execution time: 0.003s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7778, precision @ rank n:0.0, execution time: 0.0491s\n",
"One-class SVM (OCSVM) ROC:0.9357, precision @ rank n:0.3333, execution time: 0.002s\n",
"Principal Component Analysis (PCA) ROC:0.9649, precision @ rank n:0.6667, execution time: 0.001s\n",
"Locally Selective Combination (LSCP) ROC:0.9357, precision @ rank n:0.3333, execution time: 0.2988s\n",
"\n",
"... Processing mnist.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.7978, precision @ rank n:0.3594, execution time: 7.2033s\n",
"Cluster-based Local Outlier Factor ROC:0.8477, precision @ rank n:0.3915, execution time: 0.0622s\n",
"Feature Bagging ROC:0.7451, precision @ rank n:0.3452, execution time: 57.4463s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.5645, precision @ rank n:0.1174, execution time: 0.0451s\n",
"Isolation Forest ROC:0.8154, precision @ rank n:0.3096, execution time: 1.7867s\n",
"K Nearest Neighbors (KNN) ROC:0.8643, precision @ rank n:0.4448, execution time: 7.1259s\n",
"Local Outlier Factor (LOF) ROC:0.7442, precision @ rank n:0.3523, execution time: 6.4782s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8926, precision @ rank n:0.4875, execution time: 2.4916s\n",
"One-class SVM (OCSVM) ROC:0.8595, precision @ rank n:0.3915, execution time: 4.6975s\n",
"Principal Component Analysis (PCA) ROC:0.8572, precision @ rank n:0.3843, execution time: 0.1494s\n",
"Locally Selective Combination (LSCP) ROC:0.7873, precision @ rank n:0.3665, execution time: 191.5348s\n",
"\n",
"... Processing musk.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.2111, precision @ rank n:0.0488, execution time: 2.3532s\n",
"Cluster-based Local Outlier Factor ROC:0.9864, precision @ rank n:0.6829, execution time: 0.0361s\n",
"Feature Bagging ROC:0.6141, precision @ rank n:0.2195, execution time: 13.4944s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9999, precision @ rank n:0.9756, execution time: 0.0822s\n",
"Isolation Forest ROC:0.9997, precision @ rank n:0.9756, execution time: 1.5561s\n",
"K Nearest Neighbors (KNN) ROC:0.8224, precision @ rank n:0.2439, execution time: 2.3803s\n",
"Local Outlier Factor (LOF) ROC:0.6232, precision @ rank n:0.2195, execution time: 2.0785s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9984, precision @ rank n:0.878, execution time: 14.7882s\n",
"One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.3396s\n",
"Principal Component Analysis (PCA) ROC:0.9999, precision @ rank n:0.9512, execution time: 0.1594s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Locally Selective Combination (LSCP) ROC:0.5304, precision @ rank n:0.1463, execution time: 116.7357s\n",
"\n",
"... Processing optdigits.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.4294, precision @ rank n:0.0149, execution time: 3.1473s\n",
"Cluster-based Local Outlier Factor ROC:0.49, precision @ rank n:0.0, execution time: 0.0261s\n",
"Feature Bagging ROC:0.4108, precision @ rank n:0.0149, execution time: 14.9847s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.835, precision @ rank n:0.209, execution time: 0.0341s\n",
"Isolation Forest ROC:0.7365, precision @ rank n:0.0299, execution time: 0.8773s\n",
"K Nearest Neighbors (KNN) ROC:0.3836, precision @ rank n:0.0, execution time: 1.9953s\n",
"Local Outlier Factor (LOF) ROC:0.3996, precision @ rank n:0.0149, execution time: 1.7457s\n",
"Minimum Covariance Determinant (MCD) ROC:0.3791, precision @ rank n:0.0, execution time: 1.0467s\n",
"One-class SVM (OCSVM) ROC:0.532, precision @ rank n:0.0, execution time: 1.4028s\n",
"Principal Component Analysis (PCA) ROC:0.525, precision @ rank n:0.0, execution time: 0.0451s\n",
"Locally Selective Combination (LSCP) ROC:0.3975, precision @ rank n:0.0149, execution time: 60.2475s\n",
"\n",
"... Processing pendigits.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.6608, precision @ rank n:0.1224, execution time: 1.4017s\n",
"Cluster-based Local Outlier Factor ROC:0.934, precision @ rank n:0.2041, execution time: 0.0551s\n",
"Feature Bagging ROC:0.3992, precision @ rank n:0.0408, execution time: 4.9451s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9209, precision @ rank n:0.3061, execution time: 0.011s\n",
"Isolation Forest ROC:0.9296, precision @ rank n:0.3061, execution time: 0.5765s\n",
"K Nearest Neighbors (KNN) ROC:0.7086, precision @ rank n:0.0408, execution time: 0.9656s\n",
"Local Outlier Factor (LOF) ROC:0.419, precision @ rank n:0.0408, execution time: 0.6006s\n",
"Minimum Covariance Determinant (MCD) ROC:0.8369, precision @ rank n:0.0612, execution time: 1.8158s\n",
"One-class SVM (OCSVM) ROC:0.9267, precision @ rank n:0.2449, execution time: 1.0207s\n",
"Principal Component Analysis (PCA) ROC:0.9359, precision @ rank n:0.2653, execution time: 0.01s\n",
"Locally Selective Combination (LSCP) ROC:0.487, precision @ rank n:0.0408, execution time: 25.4254s\n",
"\n",
"... Processing pima.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.6864, precision @ rank n:0.5847, execution time: 0.1334s\n",
"Cluster-based Local Outlier Factor ROC:0.7064, precision @ rank n:0.5678, execution time: 0.0221s\n",
"Feature Bagging ROC:0.618, precision @ rank n:0.4746, execution time: 0.0942s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.7031, precision @ rank n:0.5847, execution time: 0.003s\n",
"Isolation Forest ROC:0.6704, precision @ rank n:0.5424, execution time: 0.2035s\n",
"K Nearest Neighbors (KNN) ROC:0.712, precision @ rank n:0.5847, execution time: 0.0662s\n",
"Local Outlier Factor (LOF) ROC:0.6356, precision @ rank n:0.5169, execution time: 0.01s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7026, precision @ rank n:0.5678, execution time: 0.0852s\n",
"One-class SVM (OCSVM) ROC:0.6297, precision @ rank n:0.5085, execution time: 0.012s\n",
"Principal Component Analysis (PCA) ROC:0.6602, precision @ rank n:0.5508, execution time: 0.002s\n",
"Locally Selective Combination (LSCP) ROC:0.6538, precision @ rank n:0.5339, execution time: 1.1691s\n",
"\n",
"... Processing satellite.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.5676, precision @ rank n:0.4078, execution time: 1.89s\n",
"Cluster-based Local Outlier Factor ROC:0.7307, precision @ rank n:0.4539, execution time: 0.025s\n",
"Feature Bagging ROC:0.5645, precision @ rank n:0.4054, execution time: 8.0935s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.7593, precision @ rank n:0.5804, execution time: 0.019s\n",
"Isolation Forest ROC:0.6947, precision @ rank n:0.5686, execution time: 0.8693s\n",
"K Nearest Neighbors (KNN) ROC:0.6827, precision @ rank n:0.5, execution time: 1.2533s\n",
"Local Outlier Factor (LOF) ROC:0.5676, precision @ rank n:0.4066, execution time: 1.0658s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7991, precision @ rank n:0.6832, execution time: 2.8646s\n",
"One-class SVM (OCSVM) ROC:0.6551, precision @ rank n:0.5355, execution time: 1.3265s\n",
"Principal Component Analysis (PCA) ROC:0.5976, precision @ rank n:0.4787, execution time: 0.0271s\n",
"Locally Selective Combination (LSCP) ROC:0.5809, precision @ rank n:0.4184, execution time: 34.2882s\n",
"\n",
"... Processing satimage-2.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.8497, precision @ rank n:0.1667, execution time: 1.7396s\n",
"Cluster-based Local Outlier Factor ROC:0.9568, precision @ rank n:0.5, execution time: 0.0531s\n",
"Feature Bagging ROC:0.4798, precision @ rank n:0.1, execution time: 7.2252s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9948, precision @ rank n:0.7, execution time: 0.0211s\n",
"Isolation Forest ROC:0.9997, precision @ rank n:0.9333, execution time: 0.6788s\n",
"K Nearest Neighbors (KNN) ROC:0.9693, precision @ rank n:0.4, execution time: 1.1992s\n",
"Local Outlier Factor (LOF) ROC:0.4819, precision @ rank n:0.1, execution time: 0.9445s\n",
"Minimum Covariance Determinant (MCD) ROC:0.996, precision @ rank n:0.7, execution time: 2.1818s\n",
"One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.114s\n",
"Principal Component Analysis (PCA) ROC:0.9974, precision @ rank n:0.8333, execution time: 0.0211s\n",
"Locally Selective Combination (LSCP) ROC:0.6986, precision @ rank n:0.1, execution time: 33.6747s\n",
"\n",
"... Processing vertebral.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.2428, precision @ rank n:0.0, execution time: 0.0371s\n",
"Cluster-based Local Outlier Factor ROC:0.2588, precision @ rank n:0.0, execution time: 0.023s\n",
"Feature Bagging ROC:0.2909, precision @ rank n:0.0, execution time: 0.0391s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.1524, precision @ rank n:0.0, execution time: 0.003s\n",
"Isolation Forest ROC:0.2449, precision @ rank n:0.0, execution time: 0.1775s\n",
"K Nearest Neighbors (KNN) ROC:0.2791, precision @ rank n:0.0, execution time: 0.018s\n",
"Local Outlier Factor (LOF) ROC:0.293, precision @ rank n:0.0, execution time: 0.003s\n",
"Minimum Covariance Determinant (MCD) ROC:0.3273, precision @ rank n:0.0, execution time: 0.0491s\n",
"One-class SVM (OCSVM) ROC:0.2909, precision @ rank n:0.0, execution time: 0.002s\n",
"Principal Component Analysis (PCA) ROC:0.2439, precision @ rank n:0.0, execution time: 0.001s\n",
"Locally Selective Combination (LSCP) ROC:0.2503, precision @ rank n:0.0, execution time: 0.3359s\n",
"\n",
"... Processing vowels.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.9726, precision @ rank n:0.5, execution time: 0.2637s\n",
"Cluster-based Local Outlier Factor ROC:0.574, precision @ rank n:0.0, execution time: 0.0221s\n",
"Feature Bagging ROC:0.955, precision @ rank n:0.25, execution time: 0.2787s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.6683, precision @ rank n:0.0625, execution time: 0.004s\n",
"Isolation Forest ROC:0.7809, precision @ rank n:0.0625, execution time: 0.2557s\n",
"K Nearest Neighbors (KNN) ROC:0.9775, precision @ rank n:0.3125, execution time: 0.1153s\n",
"Local Outlier Factor (LOF) ROC:0.9514, precision @ rank n:0.3125, execution time: 0.0351s\n",
"Minimum Covariance Determinant (MCD) ROC:0.7081, precision @ rank n:0.0625, execution time: 0.7189s\n",
"One-class SVM (OCSVM) ROC:0.8244, precision @ rank n:0.1875, execution time: 0.0441s\n",
"Principal Component Analysis (PCA) ROC:0.5585, precision @ rank n:0.0, execution time: 0.002s\n",
"Locally Selective Combination (LSCP) ROC:0.9604, precision @ rank n:0.3125, execution time: 2.3161s\n",
"\n",
"... Processing wbc.mat ...\n",
"Angle-based Outlier Detector (ABOD) ROC:0.8803, precision @ rank n:0.2, execution time: 0.0632s\n",
"Cluster-based Local Outlier Factor ROC:0.9374, precision @ rank n:0.4, execution time: 0.015s\n",
"Feature Bagging ROC:0.9224, precision @ rank n:0.2, execution time: 0.0702s\n",
"Histogram-base Outlier Detection (HBOS) ROC:0.9415, precision @ rank n:0.4, execution time: 0.008s\n",
"Isolation Forest ROC:0.9102, precision @ rank n:0.2, execution time: 0.2055s\n",
"K Nearest Neighbors (KNN) ROC:0.9034, precision @ rank n:0.2, execution time: 0.0291s\n",
"Local Outlier Factor (LOF) ROC:0.9211, precision @ rank n:0.2, execution time: 0.007s\n",
"Minimum Covariance Determinant (MCD) ROC:0.9129, precision @ rank n:0.2, execution time: 0.0602s\n",
"One-class SVM (OCSVM) ROC:0.9224, precision @ rank n:0.2, execution time: 0.006s\n",
"Principal Component Analysis (PCA) ROC:0.9102, precision @ rank n:0.2, execution time: 0.002s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Locally Selective Combination (LSCP) ROC:0.9116, precision @ rank n:0.4, execution time: 0.5545s\n"
]
}
],
"source": [
"# Define data file and read X and y\n",
"mat_file_list = ['arrhythmia.mat',\n",
" 'cardio.mat',\n",
" 'glass.mat',\n",
" 'ionosphere.mat',\n",
" 'letter.mat',\n",
" 'lympho.mat',\n",
" 'mnist.mat',\n",
" 'musk.mat',\n",
" 'optdigits.mat',\n",
" 'pendigits.mat',\n",
" 'pima.mat',\n",
" 'satellite.mat',\n",
" 'satimage-2.mat',\n",
"# 'shuttle.mat',\n",
" 'vertebral.mat',\n",
" 'vowels.mat',\n",
" 'wbc.mat']\n",
"\n",
"# Define nine outlier detection tools to be compared\n",
"random_state = np.random.RandomState(42)\n",
"\n",
"df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',\n",
" 'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',\n",
" 'OCSVM', 'PCA', 'LSCP']\n",
"roc_df = pd.DataFrame(columns=df_columns)\n",
"prn_df = pd.DataFrame(columns=df_columns)\n",
"time_df = pd.DataFrame(columns=df_columns)\n",
"\n",
"# initialize a set of detectors for LSCP\n",
"detector_list = [LOF(n_neighbors=5), LOF(n_neighbors=10), LOF(n_neighbors=15),\n",
" LOF(n_neighbors=20), LOF(n_neighbors=25), LOF(n_neighbors=30),\n",
" LOF(n_neighbors=35), LOF(n_neighbors=40), LOF(n_neighbors=45),\n",
" LOF(n_neighbors=50)]\n",
"\n",
"for mat_file in mat_file_list:\n",
" print(\"\\n... Processing\", mat_file, '...')\n",
" mat = loadmat(os.path.join('data', mat_file))\n",
"\n",
" X = mat['X']\n",
" y = mat['y'].ravel()\n",
" outliers_fraction = np.count_nonzero(y) / len(y)\n",
" outliers_percentage = round(outliers_fraction * 100, ndigits=4)\n",
"\n",
" # construct containers for saving results\n",
" roc_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n",
" prn_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n",
" time_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]\n",
"\n",
" # 60% data for training and 40% for testing\n",
" X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,\n",
" random_state=random_state)\n",
"\n",
" # standardizing data for processing\n",
" X_train_norm, X_test_norm = standardizer(X_train, X_test)\n",
"\n",
" classifiers = {'Angle-based Outlier Detector (ABOD)': ABOD(\n",
" contamination=outliers_fraction),\n",
" 'Cluster-based Local Outlier Factor': CBLOF(\n",
" contamination=outliers_fraction, check_estimator=False,\n",
" random_state=random_state),\n",
" 'Feature Bagging': FeatureBagging(contamination=outliers_fraction,\n",
" check_estimator=False,\n",
" random_state=random_state),\n",
" 'Histogram-base Outlier Detection (HBOS)': HBOS(\n",
" contamination=outliers_fraction),\n",
" 'Isolation Forest': IForest(contamination=outliers_fraction,\n",
" random_state=random_state),\n",
" 'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),\n",
" 'Local Outlier Factor (LOF)': LOF(\n",
" contamination=outliers_fraction),\n",
" 'Minimum Covariance Determinant (MCD)': MCD(\n",
" contamination=outliers_fraction, random_state=random_state),\n",
" 'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction,\n",
" random_state=random_state),\n",
" 'Principal Component Analysis (PCA)': PCA(\n",
" contamination=outliers_fraction, random_state=random_state),\n",
" 'Locally Selective Combination (LSCP)': LSCP(\n",
" detector_list, contamination=outliers_fraction,\n",
" random_state=random_state),\n",
" }\n",
"\n",
" for clf_name, clf in classifiers.items():\n",
" t0 = time()\n",
" clf.fit(X_train_norm)\n",
" test_scores = clf.decision_function(X_test_norm)\n",
" t1 = time()\n",
" duration = round(t1 - t0, ndigits=4)\n",
" time_list.append(duration)\n",
"\n",
" roc = round(roc_auc_score(y_test, test_scores), ndigits=4)\n",
" prn = round(precision_n_scores(y_test, test_scores), ndigits=4)\n",
"\n",
" print('{clf_name} ROC:{roc}, precision @ rank n:{prn}, '\n",
" 'execution time: {duration}s'.format(\n",
" clf_name=clf_name, roc=roc, prn=prn, duration=duration))\n",
"\n",
" roc_list.append(roc)\n",
" prn_list.append(prn)\n",
"\n",
" temp_df = pd.DataFrame(time_list).transpose()\n",
" temp_df.columns = df_columns\n",
" time_df = pd.concat([time_df, temp_df], axis=0)\n",
"\n",
" temp_df = pd.DataFrame(roc_list).transpose()\n",
" temp_df.columns = df_columns\n",
" roc_df = pd.concat([roc_df, temp_df], axis=0)\n",
"\n",
" temp_df = pd.DataFrame(prn_list).transpose()\n",
" temp_df.columns = df_columns\n",
" prn_df = pd.concat([prn_df, temp_df], axis=0)"
]
},
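{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `standardizer` call in the benchmark loop scales both splits before fitting. A rough equivalent with scikit-learn's `StandardScaler` is sketched below, on the assumption that z-score scaling fitted on the training split and reused for the test split is what is intended; the toy arrays and variable names are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Toy data standing in for X_train and X_test (hypothetical values)\n",
"X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])\n",
"X_test_toy = np.array([[2.0, 40.0]])\n",
"\n",
"# Mean and standard deviation are estimated from the training data only\n",
"scaler = StandardScaler().fit(X_train_toy)\n",
"X_train_scaled = scaler.transform(X_train_toy)\n",
"# The test data reuses the training statistics, so no test information leaks in\n",
"X_test_scaled = scaler.transform(X_test_toy)\n",
"\n",
"print(X_train_scaled.mean(axis=0))\n",
"print(X_test_scaled)\n",
"```\n",
"\n",
"Fitting the scaler on the training split alone keeps the test split unseen until scoring, which matters for distance-based detectors such as kNN and LOF."
]
},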
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Time complexity\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Data | \n",
" #Samples | \n",
" # Dimensions | \n",
" Outlier Perc | \n",
" ABOD | \n",
" CBLOF | \n",
" FB | \n",
" HBOS | \n",
" IForest | \n",
" KNN | \n",
" LOF | \n",
" MCD | \n",
" OCSVM | \n",
" PCA | \n",
" LSCP | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" arrhythmia | \n",
" 452 | \n",
" 274 | \n",
" 14.6018 | \n",
" 0.1454 | \n",
" 0.0301 | \n",
" 0.5825 | \n",
" 0.0622 | \n",
" 0.2477 | \n",
" 0.0932 | \n",
" 0.0681 | \n",
" 0.5083 | \n",
" 0.0471 | \n",
" 0.0602 | \n",
" 2.4836 | \n",
"
\n",
" \n",
" 0 | \n",
" cardio | \n",
" 1831 | \n",
" 21 | \n",
" 9.6122 | \n",
" 0.399 | \n",
" 0.0261 | \n",
" 0.8914 | \n",
" 0.006 | \n",
" 0.3219 | \n",
" 0.3168 | \n",
" 0.1494 | \n",
" 0.6467 | \n",
" 0.0983 | \n",
" 0.004 | \n",
" 5.0424 | \n",
"
\n",
" \n",
" 0 | \n",
" glass | \n",
" 214 | \n",
" 9 | \n",
" 4.2056 | \n",
" 0.0401 | \n",
" 0.012 | \n",
" 0.0281 | \n",
" 0.002 | \n",
" 0.1684 | \n",
" 0.018 | \n",
" 0.003 | \n",
" 0.0431 | \n",
" 0.001 | \n",
" 0.002 | \n",
" 0.2878 | \n",
"
\n",
" \n",
" 0 | \n",
" ionosphere | \n",
" 351 | \n",
" 33 | \n",
" 35.8974 | \n",
" 0.0792 | \n",
" 0.0271 | \n",
" 0.0722 | \n",
" 0.008 | \n",
" 0.2347 | \n",
" 0.0251 | \n",
" 0.006 | \n",
" 0.0682 | \n",
" 0.005 | \n",
" 0.002 | \n",
" 0.5264 | \n",
"
\n",
" \n",
" 0 | \n",
" letter | \n",
" 1600 | \n",
" 32 | \n",
" 6.25 | \n",
" 0.3579 | \n",
" 0.021 | \n",
" 0.751 | \n",
" 0.008 | \n",
" 0.2597 | \n",
" 0.1584 | \n",
" 0.1203 | \n",
" 1.1551 | \n",
" 0.0852 | \n",
" 0.004 | \n",
" 4.7737 | \n",
"
\n",
" \n",
" 0 | \n",
" lympho | \n",
" 148 | \n",
" 18 | \n",
" 4.0541 | \n",
" 0.0231 | \n",
" 0.015 | \n",
" 0.0231 | \n",
" 0.004 | \n",
" 0.2065 | \n",
" 0.011 | \n",
" 0.003 | \n",
" 0.0491 | \n",
" 0.002 | \n",
" 0.001 | \n",
" 0.2988 | \n",
"
\n",
" \n",
" 0 | \n",
" mnist | \n",
" 7603 | \n",
" 100 | \n",
" 9.2069 | \n",
" 7.2033 | \n",
" 0.0622 | \n",
" 57.4463 | \n",
" 0.0451 | \n",
" 1.7867 | \n",
" 7.1259 | \n",
" 6.4782 | \n",
" 2.4916 | \n",
" 4.6975 | \n",
" 0.1494 | \n",
" 191.535 | \n",
"
\n",
" \n",
" 0 | \n",
" musk | \n",
" 3062 | \n",
" 166 | \n",
" 3.1679 | \n",
" 2.3532 | \n",
" 0.0361 | \n",
" 13.4944 | \n",
" 0.0822 | \n",
" 1.5561 | \n",
" 2.3803 | \n",
" 2.0785 | \n",
" 14.7882 | \n",
" 1.3396 | \n",
" 0.1594 | \n",
" 116.736 | \n",
"
\n",
" \n",
" 0 | \n",
" optdigits | \n",
" 5216 | \n",
" 64 | \n",
" 2.8758 | \n",
" 3.1473 | \n",
" 0.0261 | \n",
" 14.9847 | \n",
" 0.0341 | \n",
" 0.8773 | \n",
" 1.9953 | \n",
" 1.7457 | \n",
" 1.0467 | \n",
" 1.4028 | \n",
" 0.0451 | \n",
" 60.2475 | \n",
"
\n",
" \n",
" 0 | \n",
" pendigits | \n",
" 6870 | \n",
" 16 | \n",
" 2.2707 | \n",
" 1.4017 | \n",
" 0.0551 | \n",
" 4.9451 | \n",
" 0.011 | \n",
" 0.5765 | \n",
" 0.9656 | \n",
" 0.6006 | \n",
" 1.8158 | \n",
" 1.0207 | \n",
" 0.01 | \n",
" 25.4254 | \n",
"
\n",
" \n",
" 0 | \n",
" pima | \n",
" 768 | \n",
" 8 | \n",
" 34.8958 | \n",
" 0.1334 | \n",
" 0.0221 | \n",
" 0.0942 | \n",
" 0.003 | \n",
" 0.2035 | \n",
" 0.0662 | \n",
" 0.01 | \n",
" 0.0852 | \n",
" 0.012 | \n",
" 0.002 | \n",
" 1.1691 | \n",
"
\n",
" \n",
" 0 | \n",
" satellite | \n",
" 6435 | \n",
" 36 | \n",
" 31.6395 | \n",
" 1.89 | \n",
" 0.025 | \n",
" 8.0935 | \n",
" 0.019 | \n",
" 0.8693 | \n",
" 1.2533 | \n",
" 1.0658 | \n",
" 2.8646 | \n",
" 1.3265 | \n",
" 0.0271 | \n",
" 34.2882 | \n",
"
\n",
" \n",
" 0 | \n",
" satimage-2 | \n",
" 5803 | \n",
" 36 | \n",
" 1.2235 | \n",
" 1.7396 | \n",
" 0.0531 | \n",
" 7.2252 | \n",
" 0.0211 | \n",
" 0.6788 | \n",
" 1.1992 | \n",
" 0.9445 | \n",
" 2.1818 | \n",
" 1.114 | \n",
" 0.0211 | \n",
" 33.6747 | \n",
"
\n",
" \n",
" 0 | \n",
" vertebral | \n",
" 240 | \n",
" 6 | \n",
" 12.5 | \n",
" 0.0371 | \n",
" 0.023 | \n",
" 0.0391 | \n",
" 0.003 | \n",
" 0.1775 | \n",
" 0.018 | \n",
" 0.003 | \n",
" 0.0491 | \n",
" 0.002 | \n",
" 0.001 | \n",
" 0.3359 | \n",
"
\n",
" \n",
" 0 | \n",
" vowels | \n",
" 1456 | \n",
" 12 | \n",
" 3.4341 | \n",
" 0.2637 | \n",
" 0.0221 | \n",
" 0.2787 | \n",
" 0.004 | \n",
" 0.2557 | \n",
" 0.1153 | \n",
" 0.0351 | \n",
" 0.7189 | \n",
" 0.0441 | \n",
" 0.002 | \n",
" 2.3161 | \n",
"
\n",
" \n",
" 0 | \n",
" wbc | \n",
" 378 | \n",
" 30 | \n",
" 5.5556 | \n",
" 0.0632 | \n",
" 0.015 | \n",
" 0.0702 | \n",
" 0.008 | \n",
" 0.2055 | \n",
" 0.0291 | \n",
" 0.007 | \n",
" 0.0602 | \n",
" 0.006 | \n",
" 0.002 | \n",
" 0.5545 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n",
"0 arrhythmia 452 274 14.6018 0.1454 0.0301 0.5825 \n",
"0 cardio 1831 21 9.6122 0.399 0.0261 0.8914 \n",
"0 glass 214 9 4.2056 0.0401 0.012 0.0281 \n",
"0 ionosphere 351 33 35.8974 0.0792 0.0271 0.0722 \n",
"0 letter 1600 32 6.25 0.3579 0.021 0.751 \n",
"0 lympho 148 18 4.0541 0.0231 0.015 0.0231 \n",
"0 mnist 7603 100 9.2069 7.2033 0.0622 57.4463 \n",
"0 musk 3062 166 3.1679 2.3532 0.0361 13.4944 \n",
"0 optdigits 5216 64 2.8758 3.1473 0.0261 14.9847 \n",
"0 pendigits 6870 16 2.2707 1.4017 0.0551 4.9451 \n",
"0 pima 768 8 34.8958 0.1334 0.0221 0.0942 \n",
"0 satellite 6435 36 31.6395 1.89 0.025 8.0935 \n",
"0 satimage-2 5803 36 1.2235 1.7396 0.0531 7.2252 \n",
"0 vertebral 240 6 12.5 0.0371 0.023 0.0391 \n",
"0 vowels 1456 12 3.4341 0.2637 0.0221 0.2787 \n",
"0 wbc 378 30 5.5556 0.0632 0.015 0.0702 \n",
"\n",
" HBOS IForest KNN LOF MCD OCSVM PCA LSCP \n",
"0 0.0622 0.2477 0.0932 0.0681 0.5083 0.0471 0.0602 2.4836 \n",
"0 0.006 0.3219 0.3168 0.1494 0.6467 0.0983 0.004 5.0424 \n",
"0 0.002 0.1684 0.018 0.003 0.0431 0.001 0.002 0.2878 \n",
"0 0.008 0.2347 0.0251 0.006 0.0682 0.005 0.002 0.5264 \n",
"0 0.008 0.2597 0.1584 0.1203 1.1551 0.0852 0.004 4.7737 \n",
"0 0.004 0.2065 0.011 0.003 0.0491 0.002 0.001 0.2988 \n",
"0 0.0451 1.7867 7.1259 6.4782 2.4916 4.6975 0.1494 191.535 \n",
"0 0.0822 1.5561 2.3803 2.0785 14.7882 1.3396 0.1594 116.736 \n",
"0 0.0341 0.8773 1.9953 1.7457 1.0467 1.4028 0.0451 60.2475 \n",
"0 0.011 0.5765 0.9656 0.6006 1.8158 1.0207 0.01 25.4254 \n",
"0 0.003 0.2035 0.0662 0.01 0.0852 0.012 0.002 1.1691 \n",
"0 0.019 0.8693 1.2533 1.0658 2.8646 1.3265 0.0271 34.2882 \n",
"0 0.0211 0.6788 1.1992 0.9445 2.1818 1.114 0.0211 33.6747 \n",
"0 0.003 0.1775 0.018 0.003 0.0491 0.002 0.001 0.3359 \n",
"0 0.004 0.2557 0.1153 0.0351 0.7189 0.0441 0.002 2.3161 \n",
"0 0.008 0.2055 0.0291 0.007 0.0602 0.006 0.002 0.5545 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Time complexity')\n",
"time_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Analyze the ROC and Precision @ n performance of each model on the benchmark datasets"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ROC Performance\n"
]
},
{
"data": {
"text/plain": [
" Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n",
"0 arrhythmia 452 274 14.6018 0.7687 0.778 0.7736 \n",
"0 cardio 1831 21 9.6122 0.5952 0.8894 0.5628 \n",
"0 glass 214 9 4.2056 0.8588 0.7765 0.4235 \n",
"0 ionosphere 351 33 35.8974 0.9302 0.8073 0.9092 \n",
"0 letter 1600 32 6.25 0.9035 0.5555 0.9077 \n",
"0 lympho 148 18 4.0541 0.9357 0.9708 0.924 \n",
"0 mnist 7603 100 9.2069 0.7978 0.8477 0.7451 \n",
"0 musk 3062 166 3.1679 0.2111 0.9864 0.6141 \n",
"0 optdigits 5216 64 2.8758 0.4294 0.49 0.4108 \n",
"0 pendigits 6870 16 2.2707 0.6608 0.934 0.3992 \n",
"0 pima 768 8 34.8958 0.6864 0.7064 0.618 \n",
"0 satellite 6435 36 31.6395 0.5676 0.7307 0.5645 \n",
"0 satimage-2 5803 36 1.2235 0.8497 0.9568 0.4798 \n",
"0 vertebral 240 6 12.5 0.2428 0.2588 0.2909 \n",
"0 vowels 1456 12 3.4341 0.9726 0.574 0.955 \n",
"0 wbc 378 30 5.5556 0.8803 0.9374 0.9224 \n",
"\n",
" HBOS IForest KNN LOF MCD OCSVM PCA LSCP \n",
"0 0.8511 0.8217 0.782 0.7787 0.8228 0.7986 0.7997 0.7754 \n",
"0 0.8227 0.8953 0.7442 0.5459 0.7774 0.914 0.9323 0.622 \n",
"0 0.6 0.7765 0.8353 0.3882 0.8353 0.7529 0.7176 0.7529 \n",
"0 0.5869 0.8734 0.9358 0.9114 0.9576 0.8861 0.8204 0.9041 \n",
"0 0.6056 0.5945 0.8909 0.8821 0.8144 0.5727 0.5104 0.857 \n",
"0 1 0.9942 0.9064 0.924 0.7778 0.9357 0.9649 0.9357 \n",
"0 0.5645 0.8154 0.8643 0.7442 0.8926 0.8595 0.8572 0.7873 \n",
"0 0.9999 0.9997 0.8224 0.6232 0.9984 1 0.9999 0.5304 \n",
"0 0.835 0.7365 0.3836 0.3996 0.3791 0.532 0.525 0.3975 \n",
"0 0.9209 0.9296 0.7086 0.419 0.8369 0.9267 0.9359 0.487 \n",
"0 0.7031 0.6704 0.712 0.6356 0.7026 0.6297 0.6602 0.6538 \n",
"0 0.7593 0.6947 0.6827 0.5676 0.7991 0.6551 0.5976 0.5809 \n",
"0 0.9948 0.9997 0.9693 0.4819 0.996 1 0.9974 0.6986 \n",
"0 0.1524 0.2449 0.2791 0.293 0.3273 0.2909 0.2439 0.2503 \n",
"0 0.6683 0.7809 0.9775 0.9514 0.7081 0.8244 0.5585 0.9604 \n",
"0 0.9415 0.9102 0.9034 0.9211 0.9129 0.9224 0.9102 0.9116 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('ROC Performance')\n",
"roc_df"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Precision @ n Performance\n"
]
},
{
"data": {
"text/plain": [
" Data #Samples # Dimensions Outlier Perc ABOD CBLOF FB \\\n",
"0 arrhythmia 452 274 14.6018 0.3571 0.5 0.5 \n",
"0 cardio 1831 21 9.6122 0.1884 0.4928 0.1594 \n",
"0 glass 214 9 4.2056 0 0 0 \n",
"0 ionosphere 351 33 35.8974 0.8462 0.6154 0.7692 \n",
"0 letter 1600 32 6.25 0.4255 0.0851 0.4894 \n",
"0 lympho 148 18 4.0541 0.3333 0.6667 0.3333 \n",
"0 mnist 7603 100 9.2069 0.3594 0.3915 0.3452 \n",
"0 musk 3062 166 3.1679 0.0488 0.6829 0.2195 \n",
"0 optdigits 5216 64 2.8758 0.0149 0 0.0149 \n",
"0 pendigits 6870 16 2.2707 0.1224 0.2041 0.0408 \n",
"0 pima 768 8 34.8958 0.5847 0.5678 0.4746 \n",
"0 satellite 6435 36 31.6395 0.4078 0.4539 0.4054 \n",
"0 satimage-2 5803 36 1.2235 0.1667 0.5 0.1 \n",
"0 vertebral 240 6 12.5 0 0 0 \n",
"0 vowels 1456 12 3.4341 0.5 0 0.25 \n",
"0 wbc 378 30 5.5556 0.2 0.4 0.2 \n",
"\n",
" HBOS IForest KNN LOF MCD OCSVM PCA LSCP \n",
"0 0.5714 0.5 0.5 0.4643 0.4286 0.5 0.5 0.4286 \n",
"0 0.4783 0.4493 0.2899 0.1594 0.4203 0.4493 0.5507 0.1884 \n",
"0 0 0 0 0 0 0 0 0 \n",
"0 0.4038 0.7115 0.8846 0.7692 0.9038 0.8077 0.6154 0.75 \n",
"0 0.1915 0.1064 0.4043 0.4681 0.1915 0.1489 0.1277 0.4043 \n",
"0 1 0.6667 0.3333 0.3333 0 0.3333 0.6667 0.3333 \n",
"0 0.1174 0.3096 0.4448 0.3523 0.4875 0.3915 0.3843 0.3665 \n",
"0 0.9756 0.9756 0.2439 0.2195 0.878 1 0.9512 0.1463 \n",
"0 0.209 0.0299 0 0.0149 0 0 0 0.0149 \n",
"0 0.3061 0.3061 0.0408 0.0408 0.0612 0.2449 0.2653 0.0408 \n",
"0 0.5847 0.5424 0.5847 0.5169 0.5678 0.5085 0.5508 0.5339 \n",
"0 0.5804 0.5686 0.5 0.4066 0.6832 0.5355 0.4787 0.4184 \n",
"0 0.7 0.9333 0.4 0.1 0.7 1 0.8333 0.1 \n",
"0 0 0 0 0 0 0 0 0 \n",
"0 0.0625 0.0625 0.3125 0.3125 0.0625 0.1875 0 0.3125 \n",
"0 0.4 0.2 0.2 0.2 0.2 0.2 0.2 0.4 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Precision @ n Performance')\n",
"prn_df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}