{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Anomaly Detection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook we'll see how to apply deep neural networks to the problem of detecting anomalies. Anomaly detection is a wide-ranging and often weakly defined class of problems in which we try to identify anomalous data points or sequences in a dataset. When dealing with time series specifically (such as readings from a sensor or collection of sensors on a piece of equipment), deciding whether something is anomalous requires taking temporal dependencies into account. This is a challenge that deep learning models are fairly well suited to handle. We'll start with a simple time series of sensor readings and see how to construct an autoencoder using LSTMs that predicts future time steps in the series. We'll then see how to use the distribution of the model's errors to identify points in time that stand out as potentially anomalous.\n", "\n", "The dataset used for this example can be found [here](https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/machine_temperature_system_failure.csv). It's part of a collection of datasets that Numenta has hosted to showcase their own anomaly detection methods.\n", "\n", "Let's start by getting some imports out of the way and reading the data into a data frame." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import datetime\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from jupyterthemes import jtplot\n", "jtplot.style()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(22695, 2)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/machine_temperature_system_failure.csv')\n", "data.shape" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | timestamp | value |\n", "
|---|---|---|\n", "
| 0 | 2013-12-02 21:15:00 | 73.967322 |\n", "
| 1 | 2013-12-02 21:20:00 | 74.935882 |\n", "
| 2 | 2013-12-02 21:25:00 | 76.124162 |\n", "
| 3 | 2013-12-02 21:30:00 | 78.140707 |\n", "
| 4 | 2013-12-02 21:35:00 | 79.329836 |\n", "