{ "cells": [ { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "# Analyzing Prometheus Alerts in Ceph\n", "\n", "For a better understanding of the structure of Prometheus data types, have a look at [Prometheus Metric Types](https://prometheus.io/docs/concepts/metric_types/), especially the [difference between Summaries and Histograms](https://prometheus.io/docs/practices/histograms/).\n", "\n", "The measurements are stored in Ceph. Let's examine what we have stored." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import statistics libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import json\n", "import numpy as np\n", "import seaborn as sns\n", "import sys\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "import pyspark\n", "from pyspark.sql import SparkSession\n", "\n", "from datetime import datetime\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set Spark Configuration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the Spark configuration\n", "# This points to a local Spark instance running in stand-alone mode on the notebook host\n", "conf = pyspark.SparkConf().setAppName('Analyzing Prometheus Alerts in Ceph').setMaster('local[*]')\n", "sc = pyspark.SparkContext.getOrCreate(conf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Access Ceph Object Storage over S3A" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the S3A configuration to access Ceph Object Storage\n", "sc._jsc.hadoopConfiguration().set(\"fs.s3a.access.key\", 'S3user1')\n", "sc._jsc.hadoopConfiguration().set(\"fs.s3a.secret.key\", 'S3user1key')\n", "sc._jsc.hadoopConfiguration().set(\"fs.s3a.endpoint\", 'http://10.0.1.111')" ] 
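}, { "cell_type": "markdown", "metadata": {}, "source": [
 "Before querying the data, note the units: `kubelet_docker_operations_latency_microseconds` records latencies in microseconds, while the alert rule examined later compares seconds. The conversion and thresholding it performs can be sketched in plain Python; the function name, sample values, and 10-second threshold here are illustrative only:\n",
 "\n",
 "```python\n",
 "def latency_alert(latencies_us, threshold_s=10.0):\n",
 "    # Take the worst observed latency (microseconds), convert to seconds,\n",
 "    # round to one decimal place, and compare against the threshold.\n",
 "    max_s = round(max(latencies_us) / 1e6, 1)\n",
 "    return max_s, max_s > threshold_s\n",
 "\n",
 "latency_alert([1_500_000, 12_400_000])  # (12.4, True)\n",
 "latency_alert([500_000])                # (0.5, False)\n",
 "```"
] 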
}, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set SQL Context and Read Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the SQL context\n", "sqlContext = pyspark.SQLContext(sc)\n", "\n", "# Read the bzip2-compressed Prometheus JSON data\n", "jsonFile = sqlContext.read.option(\"multiline\", True).option(\"mode\", \"PERMISSIVE\").json(\"s3a://METRICS/kubelet_docker_operations_latency_microseconds/\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### IMPORTANT: If you ran the step above with incorrect Ceph parameters, you must restart the kernel before corrected settings take effect.\n", "To do so, select 'Restart' from the Kernel menu." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prometheus alerts\n", "\n", "```\n", "alert: DockerLatencyHigh\n", "message: Docker latency is high\n", "description: Docker latency is {{ $value }} seconds for 90% of kubelet operations\n", "expr: round(max(kubelet_docker_operations_latency_microseconds{quantile=\"0.9\"}) BY (hostname) / 1e+06, 0.1) > 10\n", "```\n", "\n", "