{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Outlier detection with Local Outlier Factor (LOF)\n\nThe Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection\nmethod which computes the local density deviation of a given data point with\nrespect to its neighbors. It considers as outliers the samples that have a\nsubstantially lower density than their neighbors. This example shows how to use\nLOF for outlier detection which is the default use case of this estimator in\nscikit-learn. Note that when LOF is used for outlier detection it has no\n`predict`, `decision_function` and `score_samples` methods. See the `User\nGuide ` for details on the difference between outlier\ndetection and novelty detection and how to use LOF for novelty detection.\n\nThe number of neighbors considered (parameter `n_neighbors`) is typically set 1)\ngreater than the minimum number of samples a cluster has to contain, so that\nother samples can be local outliers relative to this cluster, and 2) smaller\nthan the maximum number of close by samples that can potentially be local\noutliers. In practice, such information is generally not available, and taking\n`n_neighbors=20` appears to work well in general.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate data with outliers\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n\nnp.random.seed(42)\n\nX_inliers = 0.3 * np.random.randn(100, 2)\nX_inliers = np.r_[X_inliers + 2, X_inliers - 2]\nX_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))\nX = np.r_[X_inliers, X_outliers]\n\nn_outliers = len(X_outliers)\nground_truth = np.ones(len(X), dtype=int)\nground_truth[-n_outliers:] = -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit the model for outlier detection (default)\n\nUse `fit_predict` to compute the predicted labels of the training samples\n(when LOF is used for outlier detection, the estimator has no `predict`,\n`decision_function` and `score_samples` methods).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.neighbors import LocalOutlierFactor\n\nclf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)\ny_pred = clf.fit_predict(X)\nn_errors = (y_pred != ground_truth).sum()\nX_scores = clf.negative_outlier_factor_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot results\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nfrom matplotlib.legend_handler import HandlerPathCollection\n\n\ndef update_legend_marker_size(handle, orig):\n \"Customize size of the legend marker\"\n handle.update_from(orig)\n handle.set_sizes([20])\n\n\nplt.scatter(X[:, 0], X[:, 1], color=\"k\", s=3.0, label=\"Data points\")\n# plot circles with radius proportional to the outlier scores\nradius = (X_scores.max() - X_scores) / (X_scores.max() - X_scores.min())\nscatter = plt.scatter(\n X[:, 0],\n X[:, 1],\n s=1000 * radius,\n edgecolors=\"r\",\n facecolors=\"none\",\n label=\"Outlier scores\",\n)\nplt.axis(\"tight\")\nplt.xlim((-5, 5))\nplt.ylim((-5, 5))\nplt.xlabel(\"prediction errors: %d\" % (n_errors))\nplt.legend(\n handler_map={scatter: HandlerPathCollection(update_func=update_legend_marker_size)}\n)\nplt.title(\"Local Outlier Factor (LOF)\")\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }