{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Bro to Spark: Clustering\n", "In this notebook we will pull Bro data into Spark and then do some analysis and clustering. The first step is to convert your Bro log data into a Parquet file. For instructions on how to do this (just a few lines of Python code using the BAT package), please see this notebook:\n", "\n", "
\n", "\n", "### How to convert Zeek/Bro log to Parquet Notebook\n", "- [Bro to Spark (and Parquet)](https://github.com/SuperCowPowers/bat/blob/master/notebooks/Bro_to_Spark.ipynb)\n", "\n", "Apache Parquet is a columnar storage format focused on performance. Parquet data is often used within the Hadoop ecosystem and we will specifically be using it for loading data into Spark.\n", "\n", "
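To make the word columnar concrete, here is a minimal plain-Python sketch. It is not Parquet's actual on-disk format, and the sample records and the `to_columns` helper are made up for illustration. It pivots row-oriented records into one list per column, which is the layout idea that lets a reader scan a single field without touching the others.

```python
# Toy DNS-like records; the values are illustrative, not taken from a real log.
rows = [
    {'query': 'example.com', 'qtype_name': 'A', 'proto': 'udp'},
    {'query': 'example.org', 'qtype_name': 'AAAA', 'proto': 'udp'},
    {'query': 'example.net', 'qtype_name': 'TXT', 'proto': 'tcp'},
]


def to_columns(records):
    '''Pivot a list of row dicts into a dict of column lists.'''
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Reading one field now touches a single list instead of every record.
print(columns['qtype_name'])
```

A real Parquet file adds encoding, compression, and row groups on top of this idea, so treat the sketch only as a mental model.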
\n", "
\n", "\n", "### Software\n", "- Bro Analysis Tools (BAT): https://github.com/SuperCowPowers/bat\n", "- Parquet: https://parquet.apache.org\n", "- Spark: https://spark.apache.org\n", "- Spark MLlib: https://spark.apache.org/mllib/\n", "\n", "### Data\n", "- Sec Repo: http://www.secrepo.com (no headers on these)\n", "- SuperCowPowers: [data.kitware.com](https://data.kitware.com/#collection/58d564478d777f0aef5d893a) (with headers)" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BAT: 0.2.9\n", "PySpark: 2.2.0\n", "PyArrow: 0.6.0\n" ] } ], "source": [ "# Third Party Imports\n", "import pyspark\n", "from pyspark.sql import SparkSession\n", "import pyarrow\n", "\n", "# Local imports\n", "import bat\n", "from bat.log_to_parquet import log_to_parquet\n", "\n", "# Print library versions so results can be reproduced\n", "print('BAT: {:s}'.format(bat.__version__))\n", "print('PySpark: {:s}'.format(pyspark.__version__))\n", "print('PyArrow: {:s}'.format(pyarrow.__version__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Spark It!\n", "### Spin up Spark with 4 Parallel Executors\n", "Here we're spinning up a local Spark session with 4 parallel executors (in local mode, `local[4]` means 4 worker threads). This might seem a bit silly since we're probably running on a laptop, but there are a couple of important observations:\n", "\n", "
\n", "\n", "- If you have 4 or 8 cores, use them!\n", "- It's the exact same code logic as if we were running on a distributed cluster.\n", "- We run the same code on **Databricks** (www.databricks.com), which is awesome BTW.\n", "\n" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Spin up a local Spark Session (local mode with 4 worker threads)\n", "spark = SparkSession.builder.master(\"local[4]\").appName('my_awesome').getOrCreate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
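As a rough analogy for what `local[4]` buys us, the plain-Python sketch below splits some work into partitions, processes them on 4 threads, and combines the partial results. This is not how Spark is implemented; the `partition` and `count_long_queries` helpers and the toy `queries` list are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_partitions):
    '''Split items into num_partitions roughly equal slices.'''
    return [items[i::num_partitions] for i in range(num_partitions)]


def count_long_queries(queries, min_length=20):
    '''Count the queries in one partition with at least min_length characters.'''
    return sum(1 for q in queries if len(q) >= min_length)


# Toy data: 100 made-up query names, half of them long.
queries = ['example.com', 'a-much-longer-subdomain.example.org'] * 50

# Process each partition on one of 4 worker threads, then combine.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_long_queries, partition(queries, 4)))

total = sum(partial_counts)
print(total)
```

The same split, process, and combine shape is what lets identical Spark code run on a laptop or on a cluster.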
\n", "## Read in our Parquet File\n", "Here we're loading in a Bro DNS log with roughly 428,000 rows to demonstrate the functionality and do some analysis and clustering on the data. For more information on converting Bro logs to Parquet files please see our Bro to Parquet notebook:\n", "\n", "#### Bro logs to Parquet Notebook\n", "- [Bro to Parquet to Spark](https://github.com/SuperCowPowers/bat/blob/master/notebooks/Bro_to_Parquet_to_Spark.ipynb)" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Have Spark read in the Parquet File\n", "spark_df = spark.read.parquet(\"dns.parquet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "# Let's look at our data\n", "We should always inspect our data when it comes in. Look at both the data values and the data types to make sure you're getting exactly what you should be." ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of Rows: 427935\n", "Columns: AA,RA,RD,TC,TTLs,Z,answers,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,qclass,qclass_name,qtype,qtype_name,query,rcode,rcode_name,rejected,trans_id,uid,ts\n" ] } ], "source": [ "# Get information about the Spark DataFrame\n", "num_rows = spark_df.count()\n", "print(\"Number of Rows: {:d}\".format(num_rows))\n", "columns = spark_df.columns\n", "print(\"Columns: {:s}\".format(','.join(columns)))" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-----+------+\n", "|qtype_name|proto| count|\n", "+----------+-----+------+\n", "| A| udp|212473|\n", "| NB| udp| 77199|\n", "| AAAA| udp| 54519|\n", "| PTR| udp| 52991|\n", "| TXT| udp| 12644|\n", "| SRV| udp| 12268|\n", "| -| udp| 3472|\n", "| *| udp| 882|\n", "| AXFR| tcp| 440|\n", "| SOA| udp| 346|\n", "| TXT| tcp| 226|\n", "| -| tcp| 176|\n", "| MX| udp| 169|\n", "| NS| udp| 43|\n", "| HINFO| udp| 30|\n", "| NAPTR| udp| 27|\n", "| PTR| tcp| 26|\n", "| A| tcp| 4|\n", "+----------+-----+------+\n", "\n" ] } ], "source": [ "spark_df.groupby('qtype_name','proto').count().sort('count', ascending=False).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
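The groupby, count, and sort cell above is a frequency table over (qtype_name, proto) pairs. As a hedged illustration of the same logic outside Spark, the plain-Python sketch below counts pairs with `collections.Counter` on a handful of made-up rows. The `sample_rows` data and variable names are assumptions for the example, not part of the notebook's dataset.

```python
from collections import Counter

# Made-up (qtype_name, proto) pairs standing in for DataFrame rows.
sample_rows = [
    ('A', 'udp'), ('A', 'udp'), ('AAAA', 'udp'),
    ('TXT', 'tcp'), ('A', 'udp'), ('TXT', 'tcp'),
]

# Count each distinct pair, then rank from most to least frequent.
pair_counts = Counter(sample_rows)
ranked = pair_counts.most_common()

for (qtype_name, proto), count in ranked:
    print(qtype_name, proto, count)
```

On real data Spark performs the counting per partition and merges the results, but the output has the same shape: one row per distinct pair, ordered by count.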
\n", "\n", "# Data looks good, let's take a deeper dive\n", "Spark has a powerful SQL engine as well as a machine learning library (MLlib). Now that we've loaded our Bro data, we'll use Spark SQL commands to investigate it, and then cluster it with MLlib.\n", "\n", "
\n", "
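The next code cell derives length features from the query and answers columns. The plain-Python sketch below shows the same idea on made-up records; the `records` list and the `add_lengths` helper are hypothetical names for illustration. It follows SQL semantics in one respect: a missing value yields a missing length rather than zero.

```python
# Made-up records; 192.0.2.1 is a documentation-only address.
records = [
    {'query': 'example.com', 'answers': '192.0.2.1'},
    {'query': 'a-much-longer-subdomain.example.org', 'answers': None},
]


def add_lengths(record):
    '''Return a copy of record with query_length and answer_length added.'''
    def length_or_none(value):
        # Keep missing values missing instead of reporting a length of 0.
        return None if value is None else len(value)

    enriched = dict(record)
    enriched['query_length'] = length_or_none(record.get('query'))
    enriched['answer_length'] = length_or_none(record.get('answers'))
    return enriched


enriched_records = [add_lengths(r) for r in records]
for r in enriched_records:
    print(r['query_length'], r['answer_length'])
```

Keeping missing lengths as None matters later, because a zero would look like a real (and very short) value to a histogram or a clustering step.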
" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "# Add a column with the string length of the DNS query\n", "from pyspark.sql.functions import col, length\n", "\n", "# Create new dataframe that includes two new column\n", "spark_df = spark_df.withColumn('query_length', length(col('query')))\n", "spark_df = spark_df.withColumn('answer_length', length(col('answers')))" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Plotting defaults\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "from bat.utils import plot_utils\n", "plot_utils.plot_defaults()" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA70AAAF/CAYAAACFYq46AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X2YpHdZJ/rvnRliQkJ6iUAcZ2RydMBdREhMr+yCEM8C\nEtgzwhqOIi8eYJdxwSy66nrGY7KGCJ5wlmt1RRYZF8UTNCoakDlAVhFQUVckxHCMYBR0RmaCSQCb\nvJmQ5N4/qkrbZiaZnq7p6nr687muuqbq+VU9z13Vz3T3t38vT3V3AAAAYIhOmnUBAAAAcKIIvQAA\nAAyW0AsAAMBgCb0AAAAMltALAADAYAm9AAAADJbQCwAAwGDNReitqrOr6uaq+sD49vBZ1wQAAMDG\nt3XWBazCb3X3c2ddBAAAAPNjLnp6x55UVb9TVT9aVTXrYgAAANj41jX0VtVFVfXhqrqrqt6you3M\nqnp7Vd1eVQeq6vnLmm9MsivJU5I8Ism3rF/VAAAAzKv1Ht58OMmrkzwjyakr2t6Q5O4kZyU5J8m7\nquq67r6+u+9KcleSVNVVSf5Zkl+9vwM97GEP67PPPnu61R+jO++8M6eeuvLtwfA599msnPtsVs59\nNivn/sZwzTXX3NLdD7je07qG3u6+KkmqajHJjsn2qjotyYVJHtvdtyX5YFW9M8mLkuytqod0963j\npz85yceOtP+q2pNkT5Js27Ytb3rTm07Ye7k/Bw4cyM6dO2dybJgl5z6blXOfzcq5z2bl3N8YFhcX\nDxzL8zbKQlaPTnJPd9+wbNt1Sc4f3/+Gqnp1kjuS/EWSS460k+7el2RfkiwuLvZ555134ip+ALM8\nNsySc5/NyrnPZuXcZ7Ny7s+PjRJ6T0/y+RXblpI8JEm6+z1J3rPeRQEAADDfNsrqzbclOWPFtjOS\n3HqE596vqtpdVfuWlpamUhgAAADza6OE3huSbK2qRy3b9vgk1692R929v7v3LCwsTK04AAAA5tN6\nX7Joa1WdkmRLki1VdUpVbe3u25NcleSyqjqtqp6U5NlJrljP+gAAABiW9e7pvTjJnUn2Jnnh+P7F\n47ZX
ZHQZo5uSXJnk5d296p5ew5sBAACYWNfQ292XdnetuF06bvtsdz+nu0/r7kd29y8c5zEMbwYA\nACDJxpnTCwAAAFMn9AIAADBYgwu95vQCAAAwMbjQa04vAAAAE4MLvQAAADAh9MKMbd+xM1V13Lft\nO3bO+i0AAMCGtXXWBUxbVe1OsnvXrl2zLoVNYvuOnTl86OCa9nHB3muP+7VXX37umo4NAABDNrjQ\n2937k+xfXFx82axrYXM4fOig0AoAABuU4c3ApmeIOQDAcA2upxdgtfTWAwAMl55eAAAABmtwobeq\ndlfVvqWlpVmXAgAAwIwNLvR29/7u3rOwsDDrUgAAAJixwYVeAAAAmBB6AQAAGCyhFwAAgMESegEA\nABiswYVeqzcDAAAwMbjQa/VmAAAAJgYXegEAAGBC6AUAAGCwhF6ANTppy8mpqjXdtu/YOeu3AQAw\nSFtnXQDAvLvv3rtzwd5r17SPqy8/d0rVAACwnJ5eAAAABkvoBQAAYLAGF3pdpxcAAICJwYVe1+kF\nAABgYnChFwAAACaE3k1s+46dLrMCAAAMmksWbWKHDx10mRUAAGDQ9PQCAAAwWEIvAAAAgyX0MtfW\nOi/ZnGQAABg2c3qZa2udl2xOMgAADJueXgAAAAZL6AUAAGCwBhd6q2p3Ve1bWlqadSkAAADM2OBC\nb3fv7+49CwsLsy4F5sJaFwOzIBgAABuZhazm2PYdO3P40MFZl7EmQ3gP826ti4ElFgQDAGDjEnrn\n2BBWLh7CewAAADauwQ1vBgAAgAk9vcCanbTl5FTVcb/+y7c/Moc+dWCKFQEAwIjQC6zZfffebZg6\nAAAbkuHNAAAADJbQCzAAa730lMtOAQBDZXgzwABYCR0A4Mj09AIAADBYQi8AAACDNVeht6q+vapu\nnnUdAAAAzIe5Cb1VtSXJ/57kr2ZdCwAAAPNhbkJvkm9P8rYk9826EGBjWevKxQAADNe6rt5cVRcl\neXGSr01yZXe/eFnbmUnenOSbktyS5Ae7+xfGbVuSfGuS5yT5vvWsGdj4rFwMAMDRrPcliw4neXWS\nZyQ5dUXbG5LcneSsJOckeVdVXdfd1yd5YZJf7u779MoAAABwrNZ1eHN3X9Xd70jymeXbq+q0JBcm\nuaS7b+vuDyZ5Z5IXjZ/ymCTfUVVXJ3lUVf3EetYNALAe1jpdo6qyfcfOWb8NgA1lvXt6j+bRSe7p\n7huWbbsuyflJ0t3/52RjVX24u195pJ1U1Z4ke5Jk27Ztueaaa05cxffjwIEDMznurMzqc56Wea8/\n8R6GYtafwayPP8822/d9Tpy1TtdIRlM21uv/s3Ofzcq5P182Sug9PcnnV2xbSvKQlU/s7sWj7aS7\n9yXZlySLi4t93nnnTbPGVZnlsdfbvL/Xea8/8R6GYtafwayPP+98fmwk63k+OvfZrJz782OjrN58\nW5IzVmw7I8mtM6gFAACAgdgoofeGJFur6lHLtj0+yfWr3VFV7a6qfUtLS1MrDgAAgPm0rqG3qrZW\n1SlJtiTZUlWnVNXW7r49yVVJLquq06rqSUmeneSK1R6ju/d3956FhYXpFg8b1ElbTnaNWgAAOIr1\nntN7cZIfXvb4hUleleTSJK9I8jNJbspodeeXjy9XBNyP++692zVqAQDgKNY19Hb3pRkF3CO1fTbJ\nc9Z6jKranWT3rl271rorAAAA5txGmdM7NYY3sxprHRpseDAAAGxsG+WSRTATax0anBgeDAAAG9ng\nenoBAABgYnA9veb0wvyZDDMHAIBpG1zo7e79SfYvLi6+bNa1PJDtO3bm8KGDsy4DZs4K1AAAnCiD\nC73z5PChg37RBwAAOIHM6QUAAGCwBhd6q2p3Ve1bWlqadSkAAADM2OBCr+v0rq+1XucWYKPYvmPn\ncX8vW1xczPYdO2f9FgCAIzCnlzWxABEwFNZZAIBhGlxPLwCrt9ZRG3
o5AYCNSk8vAEZtAACDNbie\nXgtZAQAAMDG40GshKwAAACYGF3oBAABgQugFgClY62JgFgQDgBPDQlYAMAVrXQwssSAYAJwIenoB\nAAAYrMGFXqs3AwAAMDG40Gv1ZgAAACYGF3oBAABgQugFgA1irStAW/0ZAL6Y1ZsBYINY6wrQVn8G\ngC+mpxcAAIDBEnoBAAAYLKEXAACAwRpc6HWdXgAAACYGF3pdpxcAAICJwYVeAAAAmBB6AQAAGCyh\nFwAAgMESegEAABgsoRcAAIDBEnoBAAAYLKEXAACAwRJ6AQAAGCyhFwAAgMEaXOitqt1VtW9paWnW\npQAAADBjgwu93b2/u/csLCzMuhQAAABmbHChFwAAACaEXgAAAAZL6AUAAGCwhF4AAAAGS+gFAABg\nsIReAAAABkvoBQAAYLCEXgAAAAZL6AUAAGCwhF4AAAAGa+usCzgWVXVWkrcn+UKSe5O8oLtvnG1V\nAAAAbHTz0tN7S5Jv6O7zk/y/Sf71jOsBYIPZvmNnquq4bwDAMM1FT29337vs4UOSXD+rWgDYmA4f\nOpgL9l573K+/+vJzp1gNALBRrGtPb1VdVFUfrqq7quotK9rOrKq3V9XtVXWgqp6/ov2cqvqDJBcl\n+cg6lg0AAMCcWu/hzYeTvDrJzxyh7Q1J7k5yVpIXJHljVX3NpLG7/6i7n5DkkiQ/uA61AgAAMOfW\nNfR291Xd/Y4kn1m+vapOS3Jhkku6+7bu/mCSdyZ50bj95GVPX0pyxzqVDAAAwBzbKHN6H53knu6+\nYdm265KcP75/TlW9LqOVm/82yUuPtJOq2pNkT5Js27Yt11xzzYmr+H4cOHBgJscF5tusvmdNy7zX\nPxS+DiTrdx74nYfNyrk/XzZK6D09yedXbFvKaNGqdPeHkjzlgXbS3fuS7EuSxcXFPu+886Zc5rGb\n5bGB+TTv3zfmvf6h8HUgWd/zwDnHZuXcnx8b5ZJFtyU5Y8W2M5LcOoNaAAAAGIiNEnpvSLK1qh61\nbNvjcxyXJqqq3VW1b2lpaWrFAQAAMJ/W+5JFW6vqlCRbkmypqlOqamt3357kqiSXVdVpVfWkJM9O\ncsVqj9Hd+7t7z8LCwnSLBwAAYO6sd0/vxUnuTLI3yQvH9y8et70iyalJbkpyZZKXd/eqe3oBAABg\nYl0XsuruS5NcepS2zyZ5zlqPUVW7k+zetWvXWncFAADAnNsoc3qnxvBmAAAAJgYXegEAYLPbvmNn\nquq4b9t37Jz1W4Cp2SjX6QUAAKbk8KGDuWDvtcf9+qsvP3eK1cBsDa6n1yWLAAAAmBhc6DWnFwAA\ngInBhV4AAACYEHoBAAAYrMGFXnN6AQAAmBhc6DWnFwAAgInjDr1VdWpVPa2qXMQLAACADemYQ29V\nvaWqXjG+f3KSDyX59SR/WlXPPEH1AQAAwHFbTU/vM5L8j/H9b07ykCRfluTS8Q0AAAA2lNWE3ocm\nuWl8/4Ikv9rdNyX5xSSPmXZhx8tCVgAAAEysJvR+Osljq2pLRr2+7x1vPz3JF6Zd2PGykBUAAAAT\nW1fx3J9J8ktJDie5N8lvjrc/IcnHp1wXAAAArNkxh97uvqyqrk/yyCRv6+67x033JHntiSgOAAAA\n1uKYQ29VPSXJr3X3PSuafj7JE6daFQAAAEzBaub0vj/JmUfYvjBuAwAAgA1lNXN6K0kfYfuXJrl9\nOuWsXVXtTrJ7165dsy4F4JidtOXkVNWsywAAGJwHDL1V9c7x3U7y1qq6a1nzliSPTfJ7J6C249Ld\n+5PsX1xcfNmsawE4Vvfde3cu2Hvtcb/+6svPnWI1s7F9x84cPnRw1mUAAANzLD29nxn/W0k+l+TO\nZW13J/lgkp+ecl0AbDKHDx3c9MEfAJi+Bwy93f2SJKmqv0zyuu7eMEOZAQAA4P6s5pJFrzqRhQAA\nAMC0reaSRWcmeU2SpyZ5RFas/N
zdZ0y3NAAAAFib1aze/OYk5ybZl+RwjrySMwAAAGwYqwm9T03y\n9O7+gxNVDAAAAEzTSQ/8lL9zU5LbTlQh01JVu6tq39LS0qxLAQAAYMZWE3p/KMllVXX6iSpmGrp7\nf3fvWVhYmHUpAAAAzNhqhjdfnOTsJDdV1YEkX1je2N2Pm2JdAMyRk7acnKqadRkAAF9kNaH3V05Y\nFQDMtfvuvTsX7L12Tfu4+vJzp1QNx2v7jp05fOjgcb/+y7c/Moc+dWCKFQHA2rlOLwCQJDl86OCa\n/njhDxcAbESrmdMLAAAAc+WYe3qr6tbcz7V5u/uMqVQEAAAAU7KaOb0XrXj8oCTnJrkwyWumVhEA\nAABMyWrm9P7ckbZX1UeSPDXJ66dVFAAAAEzDNOb0vj/J7insBwAAAKZqGqH3eUlumcJ+pqKqdlfV\nvqWlpVmXAgAAwIytZiGr/z//cCGrSnJWkjOTvHzKdR237t6fZP/i4uLLZl0LAAAAs7Wahax+ZcXj\n+5LcnOQD3f3x6ZUEAAAA07GahaxedSILAQAAgGlbTU9vkqSq/kWSx2Q01Pn67v7AtIsCAACAaVjN\nnN7tSd6e5Lwkh8ebv7yqPpzkX3X34aO+GABgE9i+Y2cOHzo46zIAWGY1Pb0/keTeJLu6+y+SpKq+\nMslbx23PnX55AADz4/Chg7lg77XH/fqrLz93itUAkKwu9D49yTdOAm+SdPcnq+qVSX5z6pUBAADA\nGq32Or19jNsAAABg5lYTen8zyeur6ismG6rqkUl+PHp6AQAA2IBWE3pfmeS0JJ+sqgNVdSDJJ8bb\nXnkiigMAAIC1WM11ev+qqr4uydOS/OPx5o9193tPSGUAAACwRg/Y01tVz6yqv6yqM3rkN7r79d39\n+iR/OG57+jrUCgAAAKtyLMObL0ryn7r78ysbunspyWuTfM+0C1upqr6+qn6/qn67qq6sqged6GMC\nAAAw344l9D4uyf0NYX5fksdPp5z79VdJ/kV3PyXJXyZ59jocEwAAgDl2LHN6H57kvvtp7yRfOp1y\n7ucg3Tcue3h37r8mANh0Ttpycqpq1mUw57bv2JnDhw4e9+u/fPsjc+hTB6ZYEcDaHEvo/VRGvb1/\ndpT2xyU5dKwHrKqLkrw4ydcmubK7X7ys7cwkb07yTUluSfKD3f0LK16/c9z+6mM9JgBsBvfde3cu\n2Hvtcb/+6svPnWI1zKvDhw46j4BBOZbhze9K8iNVderKhqp6cJLLxs85VoczCqw/c4S2N2TUi3tW\nkhckeWNVfc2y452R5IokL+7uL6zimAAAAGxCx9LT+5okz01yQ1X9ZJKPj7f/k4wWuaokP3qsB+zu\nq5KkqhaT7Jhsr6rTklyY5LHdfVuSD1bVO5O8KMneqtqa5BeTvKq7//RYjwcAAMDm9YCht7tvqqon\nJnljRuF2Mlmok/z3JN/V3X89hVoeneSe7r5h2bbrkpw/vv/tSZ6Q5JKquiTJG7v7l5bvoKr2JNmT\nJNu2bcs111wzhbJW78AB81gA2Jxm9bOXf2jWX4dZH5/p8HU8Or/vz5dj6elNdx9I8qyqemiSXRkF\n3z/r7s9NsZbTk6y8LNJSkoeMa7gio6HN91fnviT7kmRxcbHPO++8KZa3OrM8NgDMip9/G8Osvw6z\nPj7T4et4/3w+8+OYQu/EOOT+4Qmq5bYkZ6zYdkaSW0/Q8QAAABi4Y1nIar3ckGRrVT1q2bbHJ7l+\nNTupqt1VtW9paWmqxQEAADB/1j30VtXWqjolyZYkW6rqlKra2t23J7kqyWVVdVpVPSnJs/MAQ5pX\n6u793b1nYWFh+sUDAAAwV2bR03txkjuT7E3ywvH9i8dtr0hyapKbklyZ5OXdvaqeXgAAAJhY1Zze\naejuS5NcepS2zyZ5zlr2X1W7k+zetWvXWnYDAADAAGykOb1TYXgzAAAAE4MLvQAAADAh9AIAADBY
\ngwu9LlkEAADAxOBCrzm9AAAATAwu9AIAAMCE0AsATMVJW05OVR33bfuOnbN+CwAM0Lpfp/dEc51e\nAJiN++69Oxfsvfa4X3/15edOsRoAGBlcT685vQAAAEwMLvQCAADAhNALAADAYAm9AAAADNbgQm9V\n7a6qfUtLS7MuBQAAgBkbXOi1kBUAAAATgwu9AAAAMCH0AgAAMFhCLwAAAIMl9AIAADBYgwu9Vm8G\ngPl00paTU1Vrum3fsXPWbwOADWbrrAuYtu7en2T/4uLiy2ZdCwBw7O679+5csPfaNe3j6svPnVI1\nAAzF4Hp6AQAAYELoBQAGY61DpA2PBhiewQ1vBgA2r7UOkTY8GmB49PQCAAAwWEIvAAAAgyX0AgAA\nMFiDC72u0wsAAMDE4EJvd+/v7j0LCwuzLgUAAIAZG1zoBQAAgAmhFwAAgMESegEAABgsoRcAAIDB\nEnoBAAAYLKEXAACAwRJ6AQAAGCyhFwAAgMEaXOitqt1VtW9paWnWpQAAADBjgwu93b2/u/csLCzM\nuhQAAABmbHChFwAAACaEXgAAAAZL6AUAAGCwhF4AAAAGS+gFAABgsIReAAAABkvoBQAAYLCEXgAA\nAAZL6AUAAGCwhF4AAAAGS+gFAABgsOYi9FbVQlV9qKpuq6rHzroeAAAA5sNchN4kdyT5l0l+ZdaF\nAAAAMD/mIvR29xe6++ZZ1wEAAMB8WdfQW1UXVdWHq+quqnrLirYzq+rtVXV7VR2oquevZ20AAAAM\nz9Z1Pt7hJK9O8owkp65oe0OSu5OcleScJO+qquu6+/r1LREAAIChWNee3u6+qrvfkeQzy7dX1WlJ\nLkxySXff1t0fTPLOJC9az/oAAAAYlvXu6T2aRye5p7tvWLbtuiTnTx5U1bsz6gH+6qp6U3e/ZeVO\nqmpPkj1Jsm3btlxzzTUntOijOXDgwEyOCwCs3ax+f5imWb+HWR+f6Zjnr+Mzn7U7N99045r28fBH\nbMt73r3/iG1+358vGyX0np7k8yu2LSV5yORBdz/rgXbS3fuS7EuSxcXFPu+886ZZ46rM8tgAwPEb\nws/wWb+HWR+f6Zjnr+PNN92YC/Zeu6Z9XH35uff7Gczz57PZbJTVm29LcsaKbWckuXUGtQAAADAQ\nGyX03pBka1U9atm2xydZ9SJWVbW7qvYtLS1NrTgAAADm03pfsmhrVZ2SZEuSLVV1SlVt7e7bk1yV\n5LKqOq2qnpTk2UmuWO0xunt/d+9ZWFiYbvEAAADMnfXu6b04yZ1J9iZ54fj+xeO2V2R0GaObklyZ\n5OUuVwQAAMBarOtCVt19aZJLj9L22STPWesxqmp3kt27du1a664AAACYcxtlTu/UGN4MAADAxOBC\nLwAAAEwIvQAAAAzW4EKvSxYBAAAwMbjQa04vAAAAE4MLvQAAADAh9AIAADBYgwu95vQCAACztn3H\nzlTVcd+279g567cwGFtnXcC0dff+JPsXFxdfNutaAACAzenwoYO5YO+1x/36qy8/d4rVbG6D6+kF\nAACACaEXAACAwRJ6AQAAGKzBhV4LWQEAADAxuNDb3fu7e8/CwsKsSwEAAGDGBhd6AQAAYELoBQAA\nYLCEXgAAAAZL6AUAAGCwBhd6rd4MAADAxOBCr9WbAQAAmBhc6AUAAIAJoRcAAIDBEnoBAAAYLKEX\nAACAwRJ6AQAAGKzBhV6XLAIAAGBicKHXJYsAAACYGFzoBQAAgAmhFwAAgMESegEAABgsoRcAAIDB\nEnoBAAAYLKEXAACAwRJ6AQAAGCyhFwAAgMESegEAABiswYXeqtpdVfuWlpZmXQoAADOwfcfOVNVx\n3x70JQ9e0+u379g564+AAThpy8nOwynZOusCpq279yfZv7i4+LJZ1wIAwPo7fOhgLth77XG//urL\nz13z62Gt7rv3bufhlAyupxcAAAAmhF4AAAAGS+gFAABgsIRe
AAAABkvoBQAAYLCEXgAAAAZL6AUA\nAGCwhF4AAAAGS+gFAABgsIReAAAABkvoBQAAYLDmJvRW1Wur6neq6oqqetCs6wEAAGDjm4vQW1WP\nT7K9u5+c5ONJnjvjkgAAAJgDcxF6kzwxya+P71+d5EkzrAUAAIA5sa6ht6ouqqoPV9VdVfWWFW1n\nVtXbq+r2qjpQVc9f1vzQJJ8f319KcuY6lQwAAMAc27rOxzuc5NVJnpHk1BVtb0hyd5KzkpyT5F1V\ndV13X5/kb5KcMX7eQpLPrk+5AAAAzLN17ent7qu6+x1JPrN8e1WdluTCJJd0923d/cEk70zyovFT\nfi/J08b3n5Hkd9epZAAAAObYevf0Hs2jk9zT3Tcs23ZdkvOTpLv/qKr+uqp+J8nBJK870k6qak+S\nPUmybdu2XHPNNSe26qM4cODATI4LAKzdrH5/mKZZv4dZH38jWMtn8Mxn7c7NN904xWqOz2b/Op60\n5eRU1azLOG7TqP/hj9iW97x7/5Qqmp2NEnpPz9/P2Z1YSvKQyYPu/g8PtJPu3pdkX5IsLi72eeed\nN80aV2WWxwYAjt8QfobP+j3M+vgbwVo+g5tvujEX7L12Tce/+vJz1/T6xNfxvnvvXtPXYRpfg7VY\na/3J6D0M4TzYKKs335a/n7M7cUaSW2dQCwAAAAOxUULvDUm2VtWjlm17fJLrV7ujqtpdVfuWlpam\nVhwAAADzab0vWbS1qk5JsiXJlqo6paq2dvftSa5KcllVnVZVT0ry7CRXrPYY3b2/u/csLCxMt3gA\nAADmznr39F6c5M4ke5O8cHz/4nHbKzK6jNFNSa5M8vLx5YoAAADguKzrQlbdfWmSS4/S9tkkz1nr\nMapqd5Ldu3btWuuuAAAAmHMbZU7v1BjeDAAAwMTgQi8AAABMCL0AAAAM1uBCr0sWAQAAMDG40GtO\nLwAAABODC70AAAAwIfQCAAAwWIMLveb0AgAAMFHdPesaToiqujnJgRkd/mFJbpnRsWGWnPtsVs59\nNivnPpuVc39j2NndD3+gJw029M5SVX24uxdnXQesN+c+m5Vzn83Kuc9m5dyfL4Mb3gwAAAATQi8A\nAACDJfSeGPtmXQDMiHOfzcq5z2bl3Gezcu7PEXN6AQAAGCw9vQAAAAyW0AsAAMBgCb1TVFVnVtXb\nq+r2qjpQVc+fdU1wIlTVl1TVm8fn+a1V9UdV9cxl7U+tqo9X1R1V9f6q2jnLemHaqupRVfW3VfXW\nZdueP/4/cXtVvaOqzpxljXAiVNXzqupj4/P8E1X15PF23/cZrKo6u6reXVWfq6pPV9VPVtXWcds5\nVXXN+Ny/pqrOmXW9fDGhd7rekOTuJGcleUGSN1bV18y2JDghtib5qyTnJ1lIcnGSXx7/UHhYkquS\nXJLkzCQfTvJLsyoUTpA3JPnDyYPx9/o3JXlRRj8D7kjyX2dTGpwYVfX0JK9N8pIkD0nylCSf9H2f\nTeC/JrkpybYk52T0+88rqurkJL+W5K1JHprk55L82ng7G4iFrKakqk5L8rkkj+3uG8bbrkhyqLv3\nzrQ4WAdV9dEkr0rypUle3N1PHG8/LcktSc7t7o/PsESYiqp6XpJvSfInSXZ19wur6keTnN3dzx8/\n56uSfCzJl3b3rbOrFqanqn4vyZu7+80rtu+J7/sMWFV9LMn3dfe7x4//U5Izkvxqkp9NsqPHoaqq\nDibZ091Xz6pevpie3ul5dJJ7JoF37LokenoZvKo6K6P/A9dndM5fN2nr7tuTfCL+LzAAVXVGksuS\nfO+KppXn/ScyGvnz6PWrDk6cqtqSZDHJw6vqz6vqU+MhnqfG932G78eTPK+qHlxV25M8M8nVGZ3j\nH+1/2Iv40Tj3Nxyhd3pOT/L5FduWMhr+A4NVVQ9K8vNJfm78F/3TMzr3l/N/gaH4kYx6uj61Yrvz\nnqE7K8mDkjw3yZMzGuJ5
bkbTW5z/DN1vZxRkP5/kUxkN4X9HnPtzQ+idntsyGuaw3BlJDGtjsKrq\npCRXZNSjddF4s/8LDNJ4cZKnJfmxIzQ77xm6O8f/vr67b+zuW5L85yTPivOfARv/rnN1RvPWT0vy\nsIzm7742zv25IfROzw1JtlbVo5Zte3xGwz1hcKqqkrw5o7/+X9jdXxg3XZ/RuT953mlJvir+LzD/\nvjHJ2UliLN3sAAAJGUlEQVQOVtWnk3x/kgur6iP54vP+K5N8SUY/G2DudffnMurhWj6Mc3Lf932G\n7Mwkj0zyk919V3d/JqN5vM/K6Bx/3Ph3oonHxbm/4Qi9UzKev3JVksuq6rSqelKSZ2fUCwZD9MYk\n/yTJ7u6+c9n2tyd5bFVdWFWnJPmPGc13sZgJ825fRr/InzO+/VSSdyV5RkZD/HdX1ZPHv/BfluQq\ni1gxMD+b5N9V1SOq6qFJ/n2S/y++7zNg41ENf5Hk5VW1tar+UZL/I6O5ux9Icm+SV44v5zgZ9fa+\nmRTLUQm90/WKJKdmtKT5lUle3t3+0sPgjK+/+J0Z/eL/6aq6bXx7QXffnOTCJK/JaEXzJyR53uyq\nheno7ju6+9OTW0bD2v62u28ef6//txmF35syms/1ihmWCyfCj2R0qa4bMlqd/Nokr/F9n03gW5Jc\nkOTmJH+e5AtJ/n13353kOUm+I8nfJHlpkueMt7OBuGQRAAAAg6WnFwAAgMESegEAABgsoRcAAIDB\nEnoBAAAYLKEXAACAwRJ6AQAAGCyhFwDYsKrqG6uqq+phs64FgPkk9AIwt6rqLeNA1FX1haq6qare\nX1XfVVUPWvHcD4yf96IV219cVbet2PZvquraqrqtqpaq6qNV9epjqOcJVfXOqvpsVd1VVR+vqh+u\nqlOm845PjPFn85PqAGCIhF4A5t17k2xLcnaSb0qyP8mrkvxOVZ224rl/m+RHqupLjrazqnppkp9I\n8lNJzknyz5L8SJIH318RVfXNSX4nyWeSPC3Jo8d17Eny61V18mrf2GpU1UlVteVEHgMA5pHQC8C8\nu6u7P93dh7r7j7r7Pyf5xiRfl+QHVjz3l5KcmuS77md/35zkqu5+U3f/eXd/rLvf1t3fe7QXVNWD\nk7w5ybu7+yXd/ZHuPtDdVybZneQbknz3sud3VT13xT7+sqq+f9njharaN+69vrWqfquqFpe1v3jc\nE/2sqvrjJHcnedK4x/vLVuz7NVX10ft5z/erqrZX1S9W1efGt3dV1aOWtV9aVX9cVc+rqk+M633H\n8iHJVbW1qn5s2T5+rKreWFUfGLe/Jcn5Sb5rWe/92cvKeHxV/UFV3VFVH66qr1vxWV0x/qz+tqo+\nWVXfc7zvF4BhEXoBGJzu/uMkVye5cEXTbRn1vv5QVf2jo7z800m+vqq+chWHfEaShyX5f45Qy0eS\n/GaS5x/rzqqqkrwryfYk/1uSc5P8dpL3VdW2ZU89JcklSb4zyWOSXJvkE0m+Y9m+Tho/fvMq3s/y\nWh6c5P0Z9ZKfn+SfJ7kxyXvHbRNnJ/m2JP8qox73c5O8Zln79yd5cZJ/k1Hv+Un5h5/Jdyf5/SQ/\nm1HP/bYkf7Ws/f9OsjejP2Z8JsnPjz+nJHl1kq/N6LP66iQvTXLoeN4vAMMj9AIwVH+S5EjBdV9G\noWnvUV73qnH7J6rqz6rqrVX1HSvnCK/w6PG/H7ufWr76GGqe+F8zGlr93O7+0LjH+ZIkn0yyfE7y\nliQXdffvdvcN3X1rkv+W5CXLnvOMJI9I8tZVHH+55yWpJC/p7o9298czCtmnZxQyJ7YmefH4Ob+f\n0ef81GXt353ktd39q939p0m+J6M/MCRJunspo97qO8Y995/u7nuXvf6S7n7/+PiXJfnHGf1RIEl2\nJvnI+LM60N0f6O63Hef7BWBghF4AhqqS9MqN3X1Pkh9K8sqq2n6E9hu7+59n1HP44+P9vC
nJh1b0\nbK7W3at47nkZzSG+eTyE+bbxYluPTfJVy553T5I/WvHan0vylVX1xPHjlyZ5R3d/5jjrPi/J/5Lk\n1mV1LCV56IpaDoyD68ThjMJ2qmohyZcl+dCksbt7+eNjsHx49uHxv48Y//vGJN9WVddV1euq6vxV\n7BeAgds66wIA4AR5TEY9o1+ku982nj97WUaLTx3pOX+c5I+TvKGqvmH8vG9N8pYjPP2GZcf83aPU\ncsOyx51RmF5ueU/ySUn+OsmTj7Cvzy+7f9eK3tB0981V9c4kL62qP81ojvLuI+znWJ2UUbB+3hHa\nPrvs/hdWtHWm+8f15fuf/DHjpCTp7vdU1c4kz8yod/ldVfW27n5JANj0hF4ABqeqHpvkgozmeh7N\nD2Q01/az9/OciT8Z/3v6Udr/e5JbkvyHrAi94wWXnprkomWbb85ozurkOWctf5zkI0nOSnJfdx8x\nuD+An07yKxmF/k9ntML18fpIkm9Pckt3/83x7KC7l6rq00n+aZL3JX83b/mfZtkQ54x6w49rBeru\nviXJFUmuqKr3JLmyqv5td991PPsDYDiEXgDm3ZeMVys+KcnDMwqY/1eSa5K87mgv6u7fqqqrMwqj\nf9dbWlVvzGj47PuSfCqjMHpxkjuS/PpR9nVHVf3rJL9SVT+T5PUZzQt+4riGqzMaIj3xvoxWKf69\n8bF/NKOFoibem1F4/rWq+oEkH89oePAFSd7b3UfsnV7mN8bH/+Ekl3f3fQ/w/CR5WFWds2LbTUl+\nPqNFqH6tqv5jkoNJviLJs5P8VHf/2THsO0n+S5IfqKobMvojwndm9NneuOw5f5nRImJnZ7To2LH8\nQSJVdVlG4fz6jH63+ZYknxR4AUjM6QVg/j0to+B0MKOe229OcmmSp3T37Q/w2r1JVl4/9zeSPCHJ\nL2c0JPnt4+1P7+4bchTd/c4kT8koeL8vyYEkV2bU47p7xTDk78uoF/YD4/b/llHAnOyrkzxrvJ+f\nTvKn43q+On8/n/Woxq//2YyGTP/sAz1/7NsyWv15+e17u/uO8fv6ZJK3ZRTAfy6jOb2fO8Z9J6Pw\nf8W4nv8x3vb2/MOw/7qMenv/JKPe8Ece477vymil6Osy+mPBQ7K2Id0ADEiNfi4CANNUVVsy6iV9\ncpLzu/vP1/n4b0yyq7ufvp7HXY2qujbJB7v73826FgCGy/BmADgBuvveqnpBRpfqeUqSdQm945WS\nH5PRtXm/dT2OeSzGC009I8lvZdQD/bIkjxv/CwAnjJ5eABiQqvpAkq9P8uaN1INaVV+R0XDvr81o\netWfZHTt3SPOkwaAaRF6AQAAGCwLWQEAADBYQi8AAACDJfQCAAAwWEIvAAAAgyX0AgAAMFhCLwAA\nAIP1PwEmF9OY4U56twAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show histogram of the Spark DF request body lengths\n", "bins, counts = spark_df.select('query_length').rdd.flatMap(lambda x: x).histogram(50)\n", "\n", "# This is a bit awkward but I believe this is the correct way to do it\n", "plt.hist(bins[:-1], bins=bins, weights=counts, log=True)\n", "plt.grid(True)\n", "plt.xlabel('DNS Query Lengths')\n", "plt.ylabel('Counts')" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA70AAAF+CAYAAABOPn2fAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X2U5HddJ/r3JzNAYh5aohjHiQxXk+AKa8hOX12JPLiA\nBL0juOG6GEHRlTkL5rB7vbsar+EaIGo4l7O7wkZlFDYaFlC4AZmLZn0MgqxCxhjXYIyiTEwGCA/S\neSAkJPncP6oKimYm0z3T01X1y+t1Tp10fb9Vv/pUf6a78u7v76G6OwAAADBEx826AAAAADhWhF4A\nAAAGS+gFAABgsIReAAAABkvoBQAAYLCEXgAAAAZL6AUAAGCwhF4AAAAGS+gFAABgsIReAAAABmvr\nrAvYaFW1K8muk08++UVnnXXWrMs5pLvvvjsnnHDCrMtgg+nrMOnrcOntMOnrMOnrcOntMG1GX/ft\n2/eJ7n7U4R5X3X1MC5mV5eXlvvbaa2ddxiHt27cvO3funHUZbDB9HSZ9HS69HSZ9HSZ9HS69HabN\n6GtV7evu5cM9zu7NAAAADNbgQm9V7aqqPSsrK7MuBQAAgBkbXOjt7r3dvXtpaWnWpQAAADBjgwu9\nAAAAMCH0AgAAMFiDC72O6QUAAGBicKHXMb0AAABMDC70AgAAwITQCwAAwGANLvQ6phcAAICJwYVe\nx/QCAAAwMbjQCwAAABNC7wxtP31HquqIb9tP3zHrtwAAADDXts66gIeyA7fenPMuuu6In3/1Zeds\nYDUAAADDM7iVXieyAgAAYGJwodeJrAAAAJgYXOgFAACACaEXAACAwRJ6AQAAGCyhFwAAgMESegEA\nABiswYVelywCAABgYnCh1yWLAAAAmBhc6AUAAIAJoRcAAIDBEnoBAAAYLKEXAACAwRJ6AQAAGCyh\nFwAAgMESegEAABiswYXeqtpVVXtWVlZmXQoAAAAzNrjQ2917u3v30tLSrEsBAABgxgYXegEAAGBC\n6AUAAGCwhF4AAAAGS+gFAABgsIReAAAABkvoBQAAYLCEXgAAAAZrIUJvVT2mqj5eVdeMb4+adU0A\nAADMv62zLmAd3t3dz511EQAAACyOhVjpHTu3qt5TVT9bVTXrYgAAAJh/mxp6q+rCqrq2qu6pqitW\nzZ1aVW+vqruqan9VXTA1/ZEkZyR5cpKvSvIvN69qAAAAFtVmr/QeSHJpkjccZO7yJPcmOS3J9yf5\nxap6XJJ09z3dfVd3d5Krkpy9SfUCAACwwDY19Hb3Vd39jiSfnB6vqhOTnJ/kZd19Z3e/N8k7k7xg\nPH/y1MOflORvN6lkAAAAFti8HNN7VpL7uvumqbHrkzxu/PW3VdW+qnpPku1J3rTZBQIAALB45uXs\nzScluX3V2EqSk5Oku387yW8fbiNVtTvJ7iTZtm1b9u3bt8Flbpz9+/dvyHbm+T0+FG1UX5kv+jpc\nejtM+jpM+jpcejtM89TXeQm9dyY5ZdXYKUnuWM9GuntPkj1Jsry83Dt37tyY6ubYQ+E9Lho9GSZ9\nHS69HSZ9HSZ9HS69HaZ56eu87N58U5KtVXXm1NjZSW5Y74aqaldV7VlZWdmw4gAAAFhMm33Joq1V\ndXySLUm2VNXxVbW1u+/K6KzMr6iqE6vq3CTPTnLlel+ju/d29+6lpaWNLR4AAICFs9krvRcnuTvJ\nRUmeP/764vHcS5KckOS2JG9O8uLuXvdKLwAAAExs6jG93X1JkksOMfepJM852teoql1Jdp1xxhlH\nuykAAAAW3Lwc07th7N4MAADAxOBCLwAAAEwMLvQ6ezMAAAATgwu9dm8GAABgYnChFwAAACY
GF3rt\n3gwAAMDE4EKv3ZsBAACYGFzoBQAAgAmhFwAAgMESegEAABiswYVeJ7ICAABgYnCh14msAAAAmBhc\n6AUAAIAJoRcAAIDBEnoBAAAYrMGFXieyAgAAYGJwodeJrAAAAJgYXOgFAACACaEXAACAwRJ6AQAA\nGCyhFwAAgMEaXOh19mYAAAAmBhd6nb0ZAACAicGFXgAAAJgQegEAABgsoRcAAIDBEnoBAAAYLKEX\nAACAwRJ6AQAAGCyhFwAAgMEaXOitql1VtWdlZWXWpQAAADBjgwu93b23u3cvLS3NuhQAAABmbHCh\nFwAAACaEXgAAAAZL6AUAAGCwhF4AAAAGS+gFAABgsIReAAAABkvoBQAAYLCEXgAAAAZroUJvVX1f\nVX181nUAAACwGBYm9FbVliT/e5J/mHUtAAAALIaFCb1Jvi/JW5M8MOtCAAAAWAybGnqr6sKquraq\n7qmqK1bNnVpVb6+qu6pqf1VdMDW3Jcn3Jvn1zawXAACAxbZ1k1/vQJJLkzwzyQmr5i5Pcm+S05I8\nIcm7qur67r4hyfOT/EZ3P1BVm1kvAAAAC2xTV3q7+6rufkeST06PV9WJSc5P8rLuvrO735vknUle\nMH7INyb5gaq6OsmZVfWazawbAACAxbTZK72HclaS+7r7pqmx65M8JUm6+ycmg1V1bXe/9GAbqard\nSXYnybZt27Jv375jV/FR2r9//4ZsZ57f40PRRvWV+aKvw6W3w6Svw6Svw6W3wzRPfZ2X0HtSkttX\nja0kOXn1A7t7+VAb6e49SfYkyfLycu/cuXMja5xLD4X3uGj0ZJj0dbj0dpj0dZj0dbj0dpjmpa/z\ncvbmO5OcsmrslCR3zKAWAAAABmJeQu9NSbZW1ZlTY2cnuWG9G6qqXVW1Z2VlZcOKAwAAYDFt9iWL\ntlbV8Um2JNlSVcdX1dbuvivJVUleUVUnVtW5SZ6d5Mr1vkZ37+3u3UtLSxtbPAAAAAtns1d6L05y\nd5KLMroM0d3jsSR5SUaXMbotyZuTvHh8uaJ1sdILAADAxGZfsuiS7q5Vt0vGc5/q7ud094nd/eju\nftMRvoaVXgAAAJLMzzG9AAAAsOEGF3rt3gwAAMDE4EKv3ZsBAACYGFzoBQAAgAmhFwAAgMEaXOh1\nTC8AAAATgwu9jukFAABgYnChFwAAACaEXgAAAAZrcKHXMb0AAABMDC70OqYXAACAicGFXgAAAJgQ\negEAABgsoRcAAIDBGlzodSIrAAAAJgYXep3ICgAAgInBhV4AAACYEHoBAAAYLKEXAACAwRJ6AQAA\nGCyhFwAAgMEaXOh1ySIAAAAmBhd6XbIIAACAicGFXgAAAJgQegEAABgsoRcAAIDBEnoBAAAYLKEX\nAACAwRJ6AQAAGCyhFwAAgMEaXOitql1VtWdlZWXWpQAAADBjgwu93b23u3cvLS3NuhQAAABmbHCh\nFwAAACaOOPRW1QlV9fSq2rGRBQEAAMBGWXPoraorquol468fnuT9SX4nyV9X1bOOUX0AAABwxNaz\n0vvMJH8y/vq7k5yc5KuTXDK+AQAAwFxZT+h9ZJLbxl+fl+T/7e7bkrwlyTdudGEAAABwtNYTej+a\n5PFVtSWjVd/fG4+flORzG10YAAAAHK2t63jsG5L8epIDSe5P8vvj8W9JcuMG1wUAAABHbc2ht7tf\nUVU3JHl0krd2973jqfuSvOpYFAcAAABHY82ht6qenOQ3u/u+VVP/LckTN7SqL33t05K8PaPdqO9P\n8v3d/ZFj+ZoAAAAsvvUc0/uHSU49yPjSeO5Y+kSSb+vupyT5tST/+hi/HgAAAAOwnmN6K0kfZPwr\nkty1MeUcXHffP3X35CQ3HMvXAwAAYBgOG3qr6p3jLzvJG6vqnqnpLUken+R9a3mxqrowyQuT/NMk\nb+7uF07NnZrk9Um+I6OV3Z/s7jdNzT8hyeuSfPn4MQA
AAPCg1rLS+8nxfyvJPya5e2ru3iTvTfLL\na3y9A0kuzeiSRyesmrt8vL3Tkjwhybuq6vruviFJuvvPk3xLVX1vkp9M8m/W+JoAAAA8RB029Hb3\nDyVJVX04yau7+4h3Ze7uq8bbWk5y+mS8qk5Mcn6Sx3f3nUneO15hfkGSi6rq4VNni15J8pkjrQEA\nAICHjvVcsujlx7COs5Lc1903TY1dn+Qp46+fUFWvzujMzZ9N8sMH20hV7U6yO0m2bduWffv2HbuK\nj9L+/fs3ZDvz/B4fijaqr8wXfR0uvR0mfR0mfR0uvR2meerrei5ZdGqSn0nytCRflVVnfu7uU46i\njpOS3L5qbCWjk1alu9+f5MmH20h370myJ0mWl5d7586dR1HSYngovMdFoyfDpK/DpbfDpK/DpK/D\npbfDNC99Xc/Zm1+f5JyMQuWBHPxMzkfqziSrQ/MpSe5Y74aqaleSXWecccZG1AUAAMACW0/ofVqS\nZ3T3nx6DOm5KsrWqzuzuvxmPnZ0juDRRd+9Nsnd5eflFG1kgAAAAi+e4wz/k827LaEX2iFXV1qo6\nPqNLHW2pquOrauv45FhXJXlFVZ1YVecmeXaSK4/m9QAAAHhoW0/o/amMQulJR/F6F2d0yaOLkjx/\n/PXF47mXZHQZo9uSvDnJiyeXK1qPqtpVVXtWVlaOokwAAACGYD27N1+c5DFJbquq/Uk+Nz3Z3d90\nuA109yVJLjnE3KeSPGcd9RzqNezeDAAAQJL1hd63HbMqAAAA4BiYl+v0bhhnbwYAAGBiPcf0LoTu\n3tvdu5eWlmZdCgAAADO25pXeqrojD3Jt3u5efZ1dAAAAmKn1HNN74ar7D0tyTpLzk/zMhlUEAAAA\nG2Q9x/T+6sHGq+rPkjwtyWs3qqij4ZheAAAAJjbimN4/TLJrA7azIRzTCwAAwMRGhN7nJfnEBmwH\nAAAANtR6TmT1P/PFJ7KqJKclOTXJize4LgAAADhq6zmR1dtW3X8gyceTXNPdN25cSUfHMb0AAABM\nrOdEVi8/loVslO7em2Tv8vLyi2ZdCwAAALO1npXeJElV/Ysk35jRrs43dPc1G10UAAAAbIT1HNO7\nPcnbk+xMcmA8/DVVdW2S7+nuA4d8MgAAAMzAes7e/Jok9yc5o7u/tru/NsmZ47HXHIviAAAA4Gis\nZ/fmZyR5anf//WSgu/+uql6a5Pc3vLIj5ERWAAAATKz3Or29xrGZ6e693b17aWlp1qUAAAAwY+sJ\nvb+f5LVV9bWTgap6dJL/nDla6QUAAICJ9YTelyY5McnfVdX+qtqf5EPjsZcei+IAAADgaKznOr3/\nUFX/LMnTk3zDePivuvv3jkllAAAAcJQOu9JbVc+qqg9X1Sk98rvd/drufm2SD4znnrEJtQIAAMC6\nrGX35guT/D/dffvqie5eSfKqJP9uowsDAACAo7WW0PtNSR5sF+Y/SHL2xpRz9KpqV1XtWVlZmXUp\nAAAAzNhaQu+jkjzwIPOd5Cs2ppyj55JFAAAATKwl9N6S0WrvoXxTkls3phwAAADYOGsJve9K8sqq\nOmH1RFV9WZJXjB8DAAAAc2Utlyz6mSTPTXJTVf2XJDeOx/9JRie5qiQ/e2zKAwAAgCN32NDb3bdV\n1ROT/GJG4bYmU0n+e5If7e6PHbsSAQAA4MisZaU33b0/yXdW1SOTnJFR8P2b7v7HY1kcx9b203fk\nwK03H9U2vmb7o3PrLfs3qCIAAICNtabQOzEOuR84RrWwyQ7cenPOu+i6o9rG1Zeds0HVAAAAbLy1\nnMgKAAAAFtLgQm9V7aqqPSsrK7MuBQAAgBkbXOjt7r3dvXtpaWnWpQAAADBjgwu9AAAAMCH0AgAA\nMFhCLwAAAIMl9AIAADBY67pOL/PluC0PT1XNugwAAIC5JfQusAfuvzfnXXTdET//6svOOeoajjZ4\nf832R+fWW/YfdR0
AAAAHI/RyVOYheAMAABzKwhzTW1XfXFX/o6r+qKreXFUPm3VNAAAAzLeFCb1J\n/iHJv+juJyf5cJJnz7YcAAAA5t3C7N7c3R+ZuntvkgdmVQsAAACLYdNXeqvqwqq6tqruqaorVs2d\nWlVvr6q7qmp/VV1wkOfvSPIdSfZuUskAAAAsqFms9B5IcmmSZyY5YdXc5Rmt4p6W5AlJ3lVV13f3\nDUlSVackuTLJC7v7c5tXMgAAAIto01d6u/uq7n5Hkk9Oj1fViUnOT/Ky7r6zu9+b5J1JXjCe35rk\nLUle3t1/vcllAwAAsIDm6URWZyW5r7tvmhq7Psnjxl9/X5JvSfKyqrqmqv7VZhcIAMy37afvyPLy\ncqrqiG7bT98x67cAwAabpxNZnZTk9lVjK0lOTpLuvjKjXZsPqap2J9mdJNu2bcu+ffuOQZkbY//+\n/bMuYW7Mc5/WS1+HSV+HS2+H58CtNx/19eOH9Lk0JH5eh0tvh2me+jpPoffOJKesGjslyR1r3UB3\n70myJ0mWl5d7586dG1cdx8zQ+jS098OIvg6X3rKafxPzS2+GS2+HaV76Ok+7N9+UZGtVnTk1dnaS\nG9azkaraVVV7VlZWNrQ4AAAAFs8sLlm0taqOT7IlyZaqOr6qtnb3XUmuSvKKqjqxqs5N8uwcZpfm\n1bp7b3fvXlpa2vjiAThmtp++44iPw3QsJgBwKLPYvfniJD89df/5SV6e5JIkL0nyhiS3ZXR25xdP\nLlcEwLBtxLGYD3XbT9+RA7fefMTP/5rtj86tt8zPMVgAsBE2PfR29yUZBdyDzX0qyXOOZvtVtSvJ\nrjPOOONoNgObxv+kAhvFHw4A4EvN04msNkR3702yd3l5+UWzrgXWwv+kAgDAsTNPJ7ICAABgAxzt\nuTKGdL6Mwa302r0ZAAB4qDvavQmT4exROLiVXmdvBgAAYGJwoRcAAAAmBhd6q2pXVe1ZWVmZdSkA\nAADM2OBCr92bAQAAmBhc6AUAAIAJoRcAAIDBEnoBAAAYrMGFXieyAgAAYGJwodeJrAAAAJgYXOgF\nAACACaEXAACAwRJ6AQAAGKzBhV4nsgIAAGBicKHXiawAAACYGFzoBQAAgAmhFwAAgMESegEAABgs\noZeZOm7Lw1NVR3zbfvqOWb8FAABgjm2ddQEbrap2Jdl1xhlnzLoU1uCB++/NeRddd8TPv/qyczaw\nGgAAYGgGt9Lr7M081FgtBwCAQxvcSi881FgtBwCAQxvcSi8AAABMCL0AAAAMltALAADAYAm9AAAA\nDJbQCwAAwGAJvQAAAAzW4EJvVe2qqj0rKyuzLgUAAIAZG1zo7e693b17aWlp1qUAAAAwY4MLvQAA\nADAh9AIAADBYQi8AAACDJfQCAAAwWEIvcNS2n74jVXXEt+2n75j1WwAAYKC2zroAYPEduPXmnHfR\ndUf8/KsvO2cDqwEAgC+w0gsAAMBgCb0AAAAM1kKE3qpaqqr3V9WdVfX4WdcDAADAYliI0JvkM0m+\nK8nbZl0IAAAAi2MhQm93f667Pz7rOgAAAFgsmxp6q+rCqrq2qu6pqitWzZ1aVW+vqruqan9VXbCZ\ntQEAADA8m33JogNJLk3yzCQnrJq7PMm9SU5L8oQk76qq67v7hs0tEdZn++k7cuDWm2ddxhE7bsvD\nU1WzLgMAAI6JTQ293X1VklTVcpLTJ+NVdWKS85M8vrvvTPLeqnpnkhckuWgza4T1WvRr1D5w/71H\nVX8y+/cAAACHstkrvYdyVpL7uvumqbHrkzxlcqeqfiujFeDHVtXruvuK1Rupqt1JdifJtm3bsm/f\nvmNa9NHYv3//rEsYjHnuM2s3z33087o41vvvSG+/1Dz/LG4W34P55Od1uPR2vh3p78R56uu8hN6T\nkty+amwlycmTO939nYfbSHfvSbInSZaXl3vnzp0bWSNzSp+HYd77OO/1MXIkfdLbL
+b74Xswz/Rm\nuPR2fh1Nb+alr/Ny9uY7k5yyauyUJHfMoBYAAAAGYl5C701JtlbVmVNjZydZ90msqmpXVe1ZWVnZ\nsOIAAABYTJt9yaKtVXV8ki1JtlTV8VW1tbvvSnJVkldU1YlVdW6SZye5cr2v0d17u3v30tLSxhYP\nAADAwtnsld6Lk9yd0RmZnz/++uLx3EsyuozRbUnenOTFR3K5Iiu9AAAATGxq6O3uS7q7Vt0uGc99\nqruf090ndveju/tNR/gaVnoBAABIMj/H9AIAAMCGG1zotXszLJ7jtjw8VXXEt+2n75j1WwAAYE7N\ny3V6N0x3702yd3l5+UWzrgVYmwfuvzfnXXTdET//6svO2cBqAAAYksGt9AIAAMCE0AsAAMBgDS70\nOqYXAACAicGFXpcsAgAAYGJwoRcAAAAmhF4AAAAGa3Ch1zG9AAAATAwu9DqmFwAAgInBhV4AAACY\nEHoBAAAYLKEXAACAwRpc6HUiKwAAYNa2n74jVXXEt+2n75j1WxiMrbMuYKN1994ke5eXl18061oA\nAICHpgO33pzzLrruiJ9/9WXnbGA1D22DW+kFAACACaEXAACAwRJ6AQAAGCyhFwAAgMESegEAABis\nwYVelyx6aDluy8OdCh6AueESJfNBH4BpLlnEQnvg/nudCh6AueESJfNBH4Bpg1vpBQAAgAmhFwAA\ngMESegEAABgsoRcAAIDBEnoBAAAYLKEXAACAwRJ6AQAAGKzBhd6q2lVVe1ZWVmZdCgAAADM2uNDb\n3Xu7e/fS0tKsSwEAAGDGBhd6AQAAYELoBQAAYLCEXgAAAAZL6AUAAGCwhF4AAAAGS+gFAABgsIRe\nAAAABkvoBQAAYLAWJvRW1auq6j1VdWVVPWzW9QAAADD/FiL0VtXZSbZ395OS3JjkuTMuCQAAgAWw\nEKE3yROT/M7466uTnDvDWgAAAFgQmxp6q+rCqrq2qu6pqitWzZ1aVW+vqruqan9VXTA1/cgkt4+/\nXkly6iaVDAAAwALbusmvdyDJpUmemeSEVXOXJ7k3yWlJnpDkXVV1fXffkOTTSU4ZP24pyac2p1wA\nAAAW2aau9Hb3Vd39jiSfnB6vqhOTnJ/kZd19Z3e/N8k7k7xg/JD3JXn6+OtnJvnjTSoZAACABbbZ\nK72HclaS+7r7pqmx65M8JUm6+8+r6mNV9Z4kNyd59cE2UlW7k+xOkm3btmXfvn3HtuqjsH///lmX\nwNg8/zthbY7b8vBU1RE//1FftS2//Vt7Dznv53VxrPfnWW+/lN+Js/8ezPr159Vm/7zqw+bxu/jQ\n5uHf4ZHWME99nZfQe1K+cMzuxEqSkyd3uvs/HG4j3b0nyZ4kWV5e7p07d25kjQyUfyeL74H77815\nF113xM+/+rJzDvvvwL+TxXAkfdLbL+b7Mfvvwaxff55t5vdGHzaX7/fBzcP35WhqmIf6k/k5e/Od\n+cIxuxOnJLljvRuqql1VtWdlZWVDCgMAAGBxzUvovSnJ1qo6c2rs7CQ3rHdD3b23u3cvLS1tWHEA\nAAAsps2+ZNHWqjo+yZYkW6rq+Kra2t13JbkqySuq6sSqOjfJs5NcuZn1AQAAMCybvdJ7cZK7k1yU\n5Pnjry8ez70ko8sY3ZbkzUlePL5c0brYvRkAAICJzb5k0SXdXatul4znPtXdz+nuE7v70d39piN8\nDbs3AwAAkGR+jukFAACADTe40Gv3ZgAAACYGF3rt3gwAAMDE4EIvAAAATAi9AAAADNbgQq9jegEA\nAJgYXOh1TC8AAAATgwu9AAAAMFHdPesajomq+niS/bOu40F8ZZJPzLoINpy+DpO+DpfeDpO+DpO+\nDpfeDtNm9HVHdz/qcA8abOidd1V1bXcvz7oONpa+DpO+DpfeDpO+DpO+DpfeDtM89dXuzQAAAAyW\n0AsAAMBgCb2zs2fWBXBM6Osw6etw6e0w6esw6
etw6e0wzU1fHdMLAADAYFnpBQAAYLCEXgAAAAZL\n6N1kVXVqVb29qu6qqv1VdcGsa+LwqurCqrq2qu6pqitWzT2tqm6sqs9U1R9W1Y6puUdU1Ruq6vaq\n+mhV/dimF88hjfvz+vHP4h1V9edV9aypeb1dUFX1xqr6yLg/N1XVj0zN6euCq6ozq+qzVfXGqbEL\nxj/Ld1XVO6rq1Kk5n71zrqquGff0zvHtr6fm9HaBVdXzquqvxj36UFU9aTzud/GCmvo5ndzur6rX\nTs3PXW+F3s13eZJ7k5yW5PuT/GJVPW62JbEGB5JcmuQN04NV9ZVJrkrysiSnJrk2ya9PPeSSJGcm\n2ZHk25P8eFWdtwn1sjZbk/xDkqckWUpycZLfqKrH6O3C+7kkj+nuU5J8d5JLq2qnvg7G5Uk+MLkz\n/hx9XZIXZPT5+pkkv7Dq8T5759+F3X3S+PbYRG8XXVU9I8mrkvxQkpOTPDnJ3/ldvNimfk5PSvLV\nSe5O8tZkfv/f2ImsNlFVnZjkH5M8vrtvGo9dmeTW7r5opsWxJlV1aZLTu/uF4/u7k7ywu584vn9i\nkk8kOae7b6yqA+P53xnPvzLJmd39vJm8AQ6rqv4iycuTfEX0dhCq6rFJrknyb5N8efR1oVXV85L8\nyyQfTHJGdz+/qn42oz9yXDB+zNcn+auMfo4fiM/euVdV1yR5Y3f/yqpxvV1gVfW+JK/v7tevGvf/\nTwNRVT+Y5KeTfH1397z21krv5joryX2TX8xj1yfxF8nF9biMepgk6e67knwoyeOq6pFJtk3PR7/n\nWlWdltHP6Q3R24VXVb9QVZ9JcmOSjyT5rejrQquqU5K8Isnq3eFW9/VDGa3+nRWfvYvk56rqE1X1\nx1X11PGY3i6oqtqSZDnJo6rqb6vqlqr6L1V1QvwuHpIfTPJr/YWV1LnsrdC7uU5KcvuqsZWMdvdg\nMZ2UUQ+nTXp60tT91XPMmap6WJL/luRXu/vG6O3C6+6XZNSTJ2W0q9U90ddF98qMVo1uWTV+uL76\n7J1/P5Hk65Jsz+jannvHq7p6u7hOS/KwJM/N6PfwE5Kck9GhRH4XD8D4WN2nJPnVqeG57K3Qu7nu\nTHLKqrFTktwxg1rYGA/W0zun7q+eY45U1XFJrsxo9eDC8bDeDkB339/d701yepIXR18XVlU9IcnT\nk/yng0wfrq8+e+dcd/9pd9/R3fd0968m+eMk3xm9XWR3j//72u7+SHd/Isl/zNr6mvhdvAhekOS9\n3f33U2Nz2Vuhd3PdlGRrVZ05NXZ2RrtSsphuyKiHST5/3MLXJ7mhu/8xo10qz556vH7PmaqqJK/P\n6C/S53f358ZTejssWzPuX/R1UT01yWOS3FxVH03y75OcX1V/li/t69cleURGn7s+exdTJ6no7cIa\n/069JaMIk2w4AAAJWUlEQVRefn54/F+/i4fhB/LFq7zJvPa2u9028ZbkLUnenOTEJOdmtKT/uFnX\n5XbYvm1NcnxGZ4S9cvz11iSPGvfw/PHYq5L8ydTzLkvy7iSPTPINGf2gnzfr9+P2Rb39pSR/kuSk\nVeN6u6C3JF+V5HkZ7Ua1Jckzk9yV0Vmc9XVBb0m+LKOzhE5ur07ytnFPH5fRbq5PGn++vjHJW6ae\n67N3jm8ZnWDumVOfrd8//pk9S28X+5bRMfgfGP9efmSS92R0mILfxQt+S/LE8c/pyavG57K3M/+G\nPdRuGZ26+x3jfyQ3J7lg1jW5ralvl2T018np2yXjuadndKKcuzM6Q+xjpp73iIwuc3R7ko8l+bFZ\nvxe3L+rrjnEvP5vRLjeT2/fr7eLexh+4707y6XF//meSF03N6+sAbuPfy2+cun/B+HP1riS/meTU\nqTmfvXN8G//MfiCjXRw/ndEfIp+ht4t/y+iY3l8Y9/WjSV6T5PjxnN/FC3zL6FJiVx5ibu5665JF\nAAAADJZje
gEAABgsoRcAAIDBEnoBAAAYLKEXAACAwRJ6AQAAGCyhFwAAgMESegGAhVFVj6mqrqrl\nWdcCwGIQegFYCFV1xTjsdFV9rqpuq6o/rKofraqHrXrsNePHvWDV+Aur6s5VYz9SVddV1Z1VtVJV\nf1FVl66xph+rqvur6meO/h3Oj/H3+v9TBwBDIPQCsEh+L8m2JI9J8h1J9iZ5eZL3VNWJqx772SSv\nrKpHHGpjVfXDSV6T5JeSPCHJP0/yyiRftsZ6/nWSy5K8sKq2rP1tzIeqevisawCAY03oBWCR3NPd\nH+3uW7v7z7v7PyZ5apJ/luTHVz3215OckORHH2R7353kqu5+XXf/bXf/VXe/tbt/7HCFVNW3JvnK\nJJckuTvJs1bNv3C8evy0qvrLqrprvDL9v0w95mur6jer6lNV9ZmqurGqnjeee0tV/dLUYy8dr17/\n86mxf6iq50/d/6Gq+mBVfbaqbqqq/6Oqjpua7/HK+FVVdVeSnz3c+zzEe1+qqj3j1fY7qurd07sb\nr+W9jx/3k1X1sfFjf62qfrqqPjyeuyTJDyb5rqkV/qdOPX1HVf3u+Pv2wap6xtR2H1ZVr6mqA1V1\nz/j7dNmRvFcAFp/QC8BC6+6/THJ1kvNXTd2Z0SrwT1XVlx/i6R9N8s1V9XVH8NI/kuQt3f25JG8c\n31/tEUl+MskPJ/nWJF+e0aryxC9ktKr87Ukel+TfJfn0eO6ajAL9xFOTfGIyVlVnJDl9/LhU1Ysy\nCrH/d5J/kuT/TPITSV6yqqafTvJbSf5pksvX/G7HqqqSvCvJ9iT/W5JzkvxRkj+oqm1TD33Q9z4O\n9z+d5Kcy+qPFXyWZ/mPDq5P8Rr6wur8tyfum5n8mo1X6s5N8IMlbquqk8dxLk3xPkuclOTPJv0ry\n1+t9rwAMg9ALwBB8MMnBguueJJ9MctEhnvfy8fyHqupvquqNVfUDq48RXm0crr43yZXjoSuTfGdV\nffWqh25N8qPd/f7u/ouMgtxTx8ExSXYkeW93X9/df9/dV3f31eO5a5I8tqq2VdWXJflfx8//9vH8\nU5N8qLtvGd9/WZIf7+63jbe1N6Ndr1eH3l/v7l/p7r/r7r9/sPd5CN+e0a7gzx2/r7/t7pcl+bsk\n08dQH+69/9skV4xruam7fy7Jn06e3N13ZrSCPlnd/2h33zu1/f/U3Xu7+2+S/F9JTh3XlYy+rzcl\neU9339zd7+vu/3oE7xWAARB6ARiCStKrB7v7voxWEl9aVdsPMv+R7v7WjFY9//N4O69L8v5x0DyU\n5yW5pbuvHW/nQxmtNv7gqsfd093TK4wHkjw8ySPH938+ycVV9T/Guy/vnKrtxoxWop+a5IlJPpTR\nLtvnjkP5U/OFVd5HJfnaJK8b7yp85/iEXZcl+fpVNV37IO9rLXZmtDr98VWv9fhVr3W49/4NSd6/\natt/mrX7i1XbTpKvGv/3iowC8E1VdXlVfdf0bt4APLRsnXUBALABvjGjlcYv0d1vrap/n+QVSd5z\niMf8ZZK/THJ5VX3b+HHfm1F4OpgfyWgV9r6pseOSPCrJq6bG7ssX66nHprtfX1X/Pcl3Jnl6kvdV\n1c919yXjx707o5XV25L8YXd/uKo+kdGq71My2n3489tL8m/yxbsAH8xdh5k/nOOSfCzJkw4yd/vU\n1w/63jfA5z6/4e4eLyBPvq9/VlWPSfLMJE9L8qtJrq+qZ3T3Axv0+gAsCKEXgIVWVY9Pcl6SB7vM\n0I8n+f0kn1rDJj84/u9JB5usqscl+ZYkz8hoJXbihCR/XFVP7u4/WsPrJEnGuyfvSbKnqn4io91+\nLxlPX5PRsbkfy2hVeDL2okwdz9vdH6uqA0m+vrt/ba2vfYT+LMlpSR7o7oP+oWGNbswovL9hauyb\nVz3m3iRHdFbs7r4jyduSvK2qrkjyJ0nOyGi3ZwAeQoReABbJI8bHzU5WVZ+
W0fGc+zI6ZvSguvvd\nVXV1kguT3D8Zr6pfzGjX2D9IcktGJ0u6OMlnkvzOITb3I0mu6+7fWz1RVb8/nl9T6K2qn0/y2xkF\nsVMyCu8fnHrINUl+MaNjVK+ZGvvlfPHxvMnopFCvrapPZ3SiqodldIKo7ePjZdfrlKp6wqqxT2d0\nYqk/TvKbVfXjGYXXrx7X/nvdfdDV9IP4+ST/tao+kNHK+vdk9MeEf5x6zIeTPKuqHpvRsdcra9lw\nVf1Yko8k+fOMVoQvyGgV+pYHex4Aw+T4FgAWydMzCjM3Z7Ry+90ZrYo+ubsPt9vuRRkdUzrtdzMK\nWr+RUfB8+3j8Gd39JSuCNbqu7fMzWkE8mLcmeW5VLR32nYwcl+S1GQXd381oRffzxwVPHdd7U3d/\nfDx8TUZ/tL5mekPd/SsZnSn5BUmuzyhI7k5yJCerSka7L1+36vbq7u6Mdsf+g4zC919n9P17bL5w\nbO1hdfdbMrom8mXjbT8+o7M7f3bqYb+c0Vmdr03y8STnrnHzdyT5DxkdM/xnGR3f+6zu/sxa6wNg\nOGr02QUAMFtV9fYkW7t716xrAWA47N4MAGy68dmxX5zRNZbvy+g6y8/Ol15vGQCOipVeAGDTVdUJ\nSfYmOSejk4D9TZJXdfebZloYAIMj9AIAADBYTmQFAADAYAm9AAAADJbQCwAAwGAJvQAAAAyW0AsA\nAMBgCb0AAAAM1v8Psk5ERyvwT0QAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show histogram of the Spark DF request body lengths\n", "bins, counts = spark_df.select('answer_length').rdd.flatMap(lambda x: x).histogram(50)\n", "\n", "# This is a bit awkward but I believe this is the correct way to do it\n", "plt.hist(bins[:-1], bins=bins, weights=counts, log=True)\n", "plt.grid(True)\n", "plt.xlabel('DNS Answer Lengths')\n", "plt.ylabel('Counts')" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml import Pipeline\n", "from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler\n", "\n", "categoricalColumns = ['qtype_name', 'proto']\n", "stages = [] \n", "\n", "for categoricalCol in categoricalColumns:\n", " stringIndexer = StringIndexer(inputCol=categoricalCol, \n", " outputCol=categoricalCol+\"Index\")\n", " encoder = OneHotEncoder(inputCol=categoricalCol+\"Index\", \n", " outputCol=categoricalCol+\"classVec\")\n", " stages += [stringIndexer, encoder]\n", "\n", "numericCols = ['query_length', 'answer_length', 'Z', 'rejected']\n", "assemblerInputs = [c + \"classVec\" for c in categoricalColumns] + numericCols\n", 
"assembler = VectorAssembler(inputCols=assemblerInputs, outputCol=\"features\")\n", "stages += [assembler]\n", "\n", "pipeline = Pipeline(stages=stages)\n", "pipelineModel = pipeline.fit(spark_df)\n", "spark_df = pipelineModel.transform(spark_df)" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------------+\n", "| features|\n", "+--------------------+\n", "|(18,[5,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[5,13,14,15],...|\n", "|(18,[5,13,14,15],...|\n", "|(18,[5,13,14,15],...|\n", "|(18,[3,13,14,15],...|\n", "|(18,[3,13,14,15],...|\n", "|(18,[5,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[1,13,14,15,1...|\n", "|(18,[2,13,14,15],...|\n", "+--------------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "spark_df.select('features').show()" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.clustering import KMeans\n", "\n", "# Trains a k-means model.\n", "kmeans = KMeans().setK(70)\n", "model = kmeans.fit(spark_df)" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Within Set Sum of Squared Errors = 120733.85472213484\n", "Cluster Centers: \n", "[ 9.50906344e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 4.74751834e-02 0.00000000e+00 0.00000000e+00 9.71083297e-04\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 6.47388865e-04 9.97410445e-01 1.14872680e+01 1.00000000e+00\n", " 0.00000000e+00 9.17134225e-03]\n", "[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 
0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.72002232e+01 1.00360999e+00\n", " 0.00000000e+00 1.60808638e-03]\n", "[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.\n", " 0. 0. 10. 687. 0. 0.]\n", "[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 9.63855422e-02 0.00000000e+00\n", " 9.03614458e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 9.13253012e+00 3.47000000e+02\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 0. 0. 0. 1. 0. 0.\n", " 0. 0. 0. 0. 0. 0.\n", " 0. 1. 21.65517241 137.27586207 0. 0. ]\n", "[ 7.35499488e-03 0.00000000e+00 7.54119728e-03 0.00000000e+00\n", " 1.86202402e-04 9.70021413e-01 0.00000000e+00 1.48961922e-02\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 5.69467461e+01 1.00000000e+00\n", " 3.52108742e-01 0.00000000e+00]\n", "[ 0.00000000e+00 0.00000000e+00 9.21781975e-01 7.82180250e-02\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.05908438e+01 1.00000000e+00\n", " 8.21186615e-04 4.10593307e-04]\n", "[ 9.97151713e-01 0.00000000e+00 0.00000000e+00 2.26187484e-03\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 5.02638854e-04 0.00000000e+00\n", " 8.37731423e-05 9.99664907e-01 1.50000000e+01 1.00000000e+00\n", " 6.70185139e-04 3.09960627e-03]\n", "[ 8.24082785e-01 0.00000000e+00 1.48871119e-01 2.42238946e-02\n", " 2.82220132e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.30000000e+01 1.00000000e+00\n", " 0.00000000e+00 2.35183443e-03]\n", "[ 0. 0. 0. 0. 0. 0. 1.\n", " 0. 0. 0. 0. 0. 0.\n", " 0.97252747 1. 47.64285714 0. 0. 
]\n", "[ 8.47867380e-02 0.00000000e+00 8.63408738e-03 8.28699706e-01\n", " 7.68433777e-02 1.03609049e-03 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 3.97349335e+01 1.00000000e+00\n", " 0.00000000e+00 9.82559143e-02]\n", "[ 0. 1. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 1.\n", " 4.20721412 1. 0.99945181 0. ]\n", "[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.\n", " 0. 0. 10. 412. 0. 0.]\n", "[ 8.98322492e-01 0.00000000e+00 8.40465594e-02 5.13522766e-03\n", " 3.59465936e-03 0.00000000e+00 0.00000000e+00 8.90106128e-03\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 3.31299213e+01 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 9.70394737e-01 0.00000000e+00 0.00000000e+00 6.57894737e-03\n", " 2.30263158e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 9.80263158e-01 1.40986842e+01 1.19210526e+01\n", " 9.86842105e-03 0.00000000e+00]\n", "[ 0. 0. 0. 0. 0. 0. 1.\n", " 0. 0. 0. 0. 0. 0.\n", " 0.95454545 1. 17.59848485 0. 0. ]\n", "[ 9.84349541e-01 0.00000000e+00 1.56075808e-02 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 4.28779693e-05 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.50398551e+01 1.00000000e+00\n", " 0.00000000e+00 8.57559386e-05]\n", "[ 0. 1. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 1.\n", " 8. 1. 0.99988369 0. 
]\n", "[ 3.88098318e-03 0.00000000e+00 0.00000000e+00 8.58990944e-01\n", " 1.81112549e-02 1.29366106e-02 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.06080207e-01 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 7.21966365e+01 1.07761966e+00\n", " 0.00000000e+00 3.88098318e-02]\n", "[ 9.95342475e-01 0.00000000e+00 0.00000000e+00 3.25701072e-05\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 4.62495522e-03\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.60000000e+01 1.00000000e+00\n", " 3.25701072e-05 1.46565482e-03]\n", "[ 0. 1. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 1.\n", " 14.65577023 1. 0.97712794 0. ]\n", "[ 0. 0. 0. 0.36548223 0.63451777 0. 0.\n", " 0. 0. 0. 0. 0. 0.\n", " 0.53807107 16.69035533 32.99492386 0. 0. ]\n", "[ 1. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 1.\n", " 5.93881886 1. 0. 0. ]\n", "[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 7.59493671e-02\n", " 0.00000000e+00 0.00000000e+00 6.32911392e-02 0.00000000e+00\n", " 8.60759494e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.39240506e-01 1.03924051e+01 1.13240506e+02\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 9.98357964e-01 0.00000000e+00 0.00000000e+00 1.06249396e-03\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 5.79542162e-04 1.00000000e+00 1.39970057e+01 1.00000000e+00\n", " 2.99430117e-03 3.86361441e-03]\n", "[ 9.20485175e-01 0.00000000e+00 3.09973046e-02 3.03234501e-03\n", " 0.00000000e+00 4.54851752e-02 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.98662399e+01 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 0. 0. 0. 0. 0. 0.\n", " 0. 0. 1. 0. 0. 0.\n", " 0. 0. 10. 519.33333333 0. 0. 
]\n", "[ 9.93240447e-01 0.00000000e+00 0.00000000e+00 2.06925093e-03\n", " 0.00000000e+00 8.27700372e-04 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 3.86260174e-03 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.00000000e+01 1.00000000e+00\n", " 0.00000000e+00 3.44875155e-03]\n", "[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.\n", " 0. 0. 10. 277. 0. 0.]\n", "[ 7.62580244e-01 0.00000000e+00 1.17829778e-01 1.15862497e-01\n", " 1.03541106e-04 0.00000000e+00 0.00000000e+00 6.21246635e-04\n", " 0.00000000e+00 3.00269207e-03 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.40000000e+01 1.00000000e+00\n", " 0.00000000e+00 1.65665769e-03]\n", "[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.\n", " 0. 0. 10. 440. 0. 0.]\n", "[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.60000000e+01 1.00000000e+00\n", " 0.00000000e+00 6.07902736e-03]\n", "[ 8.65995032e-01 0.00000000e+00 1.08059619e-01 0.00000000e+00\n", " 2.89815070e-03 1.38007176e-04 0.00000000e+00 1.91829975e-02\n", " 3.31217223e-03 0.00000000e+00 0.00000000e+00 4.14021529e-04\n", " 0.00000000e+00 9.96687828e-01 9.67913331e+00 1.00000000e+00\n", " 0.00000000e+00 5.79630141e-03]\n", "[ 1.22048223e-01 0.00000000e+00 4.77255779e-02 6.94258016e-01\n", " 1.32239622e-01 0.00000000e+00 0.00000000e+00 3.72856078e-03\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 4.25021129e+01 1.00000000e+00\n", " 0.00000000e+00 7.05940840e-02]\n", "[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 7. 1. 1. 
0.]\n", "[ 1.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.40000000e+01 1.00000000e+00\n", " 0.00000000e+00 5.74349549e-05]\n", "[ 2.35294118e-01 0.00000000e+00 1.61764706e-02 2.10294118e-01\n", " 8.08823529e-03 5.30147059e-01 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 3.71735294e+01 1.03088235e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 9.99643589e-01 0.00000000e+00 0.00000000e+00 2.67308206e-04\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.90000000e+01 1.00000000e+00\n", " 0.00000000e+00 2.49487659e-03]\n", "[ 0. 1. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 1.\n", " 6. 1. 0.99921034 0. ]\n", "[ 8.58277625e-01 1.41486433e-01 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.35941801e-04\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.30000000e+01 1.00000000e+00\n", " 1.40464019e-01 1.73023987e-03]\n", "[ 0.00000000e+00 0.00000000e+00 9.61997828e-01 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 3.33876221e-02\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 4.61454940e-03 1.00000000e+00 1.06636808e+01 1.00000000e+00\n", " 0.00000000e+00 7.32899023e-03]\n", "[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 9. 1. 1. 
0.]\n", "[ 9.92063492e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 7.93650794e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 9.92063492e-01 9.04761905e+00 9.13492063e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 9.97958721e-03 0.00000000e+00 6.35064641e-03 8.84781130e-01\n", " 9.88886369e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 4.10000000e+01 1.00000000e+00\n", " 0.00000000e+00 9.70741665e-02]\n", "[ 0. 0. 1. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 1.\n", " 30.99463087 1. 0. 0. ]\n", "[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 24.\n", " 1. 1. 0.]\n", "[ 0.00000000e+00 0.00000000e+00 2.58215962e-02 0.00000000e+00\n", " 0.00000000e+00 9.63615023e-01 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.05633803e-02 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.79495305e+01 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.88638181e+01 1.00000000e+00\n", " 0.00000000e+00 5.89349611e-03]\n", "[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.40000000e+01 1.00000000e+00\n", " 2.51098556e-04 0.00000000e+00]\n", "[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.50000000e+01 1.00000000e+00\n", " 1.78126113e-03 2.49376559e-03]\n", "[ 9.20588235e-01 
0.00000000e+00 0.00000000e+00 5.83823529e-02\n", " 9.11764706e-03 3.38235294e-03 0.00000000e+00 4.70588235e-03\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 9.96323529e-01 2.13551471e+01 1.05882353e+00\n", " 5.42647059e-02 8.82352941e-04]\n", "[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 1. 23. 122. 0. 0.]\n", "[ 0.00000000e+00 0.00000000e+00 6.25000000e-02 8.43750000e-01\n", " 0.00000000e+00 0.00000000e+00 9.37500000e-02 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.56875000e+01 6.38437500e+01\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 9.99437254e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 5.62746201e-04 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 3.10153348e+01 1.00000000e+00\n", " 7.03432752e-04 0.00000000e+00]\n", "[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.\n", " 0. 0. 10. 658. 0. 
0.]\n", "[ 9.98983482e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 7.62388818e-04\n", " 0.00000000e+00 2.54129606e-04 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.70000000e+01 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 9.20746742e-01 0.00000000e+00 0.00000000e+00 2.00774921e-02\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.76118352e-03\n", " 0.00000000e+00 0.00000000e+00 5.74145826e-02 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.80000000e+01 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 9.18107833e-01 0.00000000e+00 2.64496439e-02 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 3.45879959e-02\n", " 0.00000000e+00 5.08646999e-04 0.00000000e+00 2.03458800e-02\n", " 0.00000000e+00 1.00000000e+00 7.84740590e+00 1.00000000e+00\n", " 0.00000000e+00 2.03458800e-03]\n", "[ 9.98236677e-01 0.00000000e+00 1.76332288e-03 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 3.76057994e+00 1.00000000e+00\n", " 0.00000000e+00 1.95924765e-04]\n", "[ 7.80748663e-01 0.00000000e+00 2.18360071e-01 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 8.91265597e-04\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.70713012e+01 1.00000000e+00\n", " 5.94177065e-04 0.00000000e+00]\n", "[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 10.\n", " 1. 1. 
0.]\n", "[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.27907801e+01 1.00000000e+00\n", " 0.00000000e+00 5.80270793e-03]\n", "[ 0.00000000e+00 9.84625240e-01 0.00000000e+00 0.00000000e+00\n", " 1.53747598e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.11755285e+01 1.00000000e+00\n", " 9.92099082e-01 1.70830664e-03]\n", "[ 0. 0. 0. 0. 0. 0. 1.\n", " 0. 0. 0. 0. 0. 0.\n", " 0.72340426 1. 11.55319149 0. 0. ]\n", "[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 8.57142857e-02\n", " 0.00000000e+00 0.00000000e+00 9.14285714e-01 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.80000000e+00 1.57914286e+02\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 0.18624735 0.29167346 0.00195535 0. 0. 0.\n", " 0.52012384 0. 0. 0. 0. 0. 0.\n", " 0.97865407 1.47286948 1. 0.29151051 0.02281245]\n", "[ 0.00000000e+00 0.00000000e+00 6.01967058e-02 9.25702097e-01\n", " 0.00000000e+00 1.41011968e-02 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 2.57784098e+01 1.00000000e+00\n", " 0.00000000e+00 2.72544140e-03]\n", "[ 0. 0. 0. 0. 1. 0. 0.\n", " 0. 0. 0. 0. 0. 
0.\n", " 0.68168168 12.03903904 5.03903904 0.11711712 0.03903904]\n", "[ 0.00000000e+00 4.17310665e-02 4.88408037e-01 0.00000000e+00\n", " 0.00000000e+00 4.63678516e-03 0.00000000e+00 1.66924266e-01\n", " 0.00000000e+00 2.98299845e-01 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 5.37094281e+00 1.00000000e+00\n", " 0.00000000e+00 0.00000000e+00]\n", "[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00\n", " 0.00000000e+00 1.00000000e+00 1.70000000e+01 1.00000000e+00\n", " 2.40673887e-03 0.00000000e+00]\n" ] } ], "source": [ "# Evaluate clustering by computing Within Set Sum of Squared Errors.\n", "wssse = model.computeCost(spark_df)\n", "print(\"Within Set Sum of Squared Errors = \" + str(wssse))\n", "\n", "# Shows the result.\n", "centers = model.clusterCenters()\n", "print(\"Cluster Centers: \")\n", "for center in centers:\n", " print(center)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-----+------------+-------------+---+--------+----------+\n", "|qtype_name|proto|query_length|answer_length| Z|rejected|prediction|\n", "+----------+-----+------------+-------------+---+--------+----------+\n", "| SRV| udp| 57| 1| 1| false| 5|\n", "| NB| udp| 8| 1| 1| false| 17|\n", "| NB| udp| 8| 1| 1| false| 17|\n", "| NB| udp| 8| 1| 1| false| 17|\n", "| NB| udp| 4| 1| 1| false| 11|\n", "| NB| udp| 4| 1| 1| false| 11|\n", "| NB| udp| 4| 1| 1| false| 11|\n", "| NB| udp| 6| 1| 1| false| 38|\n", "| NB| udp| 6| 1| 1| false| 38|\n", "| NB| udp| 6| 1| 1| false| 38|\n", "| SRV| udp| 57| 1| 0| false| 5|\n", "| SRV| udp| 57| 1| 0| false| 5|\n", "| SRV| udp| 57| 1| 0| false| 5|\n", "| PTR| udp| 28| 1| 0| false| 1|\n", "| PTR| udp| 28| 1| 0| false| 1|\n", "| SRV| udp| 57| 1| 1| false| 5|\n", "| NB| udp| 15| 1| 1| false| 20|\n", 
"| NB| udp| 12| 1| 1| false| 62|\n", "| NB| udp| 15| 1| 1| false| 20|\n", "| AAAA| udp| 13| 1| 0| false| 61|\n", "+----------+-----+------------+-------------+---+--------+----------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "# Assign every row to its nearest cluster and keep the raw feature columns for inspection\n", "features = ['qtype_name', 'proto', 'query_length', 'answer_length', 'Z', 'rejected']\n", "transformed = model.transform(spark_df).select(features + ['prediction'])\n", "\n", "# show() only fetches the first 20 rows; a full collect() to the driver is not needed here\n", "transformed.show()" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-----+------------+-------------+---+--------+----------+-----+\n", "|qtype_name|proto|query_length|answer_length| Z|rejected|prediction|count|\n", "+----------+-----+------------+-------------+---+--------+----------+-----+\n", "| A| udp| 12| 1| 0| true| 0| 26|\n", "| A| udp| 11| 1| 0| false| 0| 4713|\n", "| TXT| tcp| 12| 1| 0| false| 0| 19|\n", "| TXT| udp| 12| 1| 0| true| 0| 15|\n", "| HINFO| udp| 12| 1| 0| false| 0| 6|\n", "| TXT| udp| 12| 1| 0| false| 0| 401|\n", "| *| udp| 12| 1| 0| false| 0| 9|\n", "| TXT| tcp| 12| 1| 0| true| 0| 5|\n", "| A| udp| 11| 1| 0| true| 0| 39|\n", "| A| udp| 12| 1| 0| false| 0| 4035|\n", "| PTR| udp| 27| 11| 0| false| 1| 11|\n", "| PTR| udp| 28| 1| 0| true| 1| 1|\n", "| PTR| udp| 27| 1| 0| true| 1| 48|\n", "| PTR| udp| 27| 1| 0| false| 1|24311|\n", "| PTR| udp| 28| 1| 0| false| 1| 6100|\n", "| AXFR| tcp| 10| 687| 0| false| 2| 74|\n", "| AXFR| tcp| 10| 347| 0| false| 3| 75|\n", "| -| tcp| 1| 347| 0| false| 3| 8|\n", "| PTR| udp| 20| 136| 0| false| 4| 3|\n", "| PTR| udp| 23| 137| 0| false| 4| 5|\n", "| PTR| udp| 23| 138| 0| false| 4| 11|\n", "| PTR| udp| 20| 137| 0| false| 4| 10|\n", "| *| udp| 54| 1| 0| false| 5| 144|\n", "| AAAA| udp| 55| 1| 0| false| 5| 12|\n", "| SRV| udp| 50| 1| 0| false| 5| 18|\n", "| A| udp| 51| 1| 0| false| 5| 10|\n", "| *| udp| 51| 1| 0| false| 5| 16|\n", "| A| udp| 55| 1| 0| false| 5| 11|\n", "| SRV| udp| 57| 
1| 1| false| 5| 3782|\n", "| AAAA| udp| 51| 1| 0| false| 5| 10|\n", "| TXT| udp| 64| 1| 0| false| 5| 2|\n", "| SRV| udp| 57| 1| 0| false| 5| 6619|\n", "| A| udp| 59| 1| 0| false| 5| 58|\n", "| AAAA| udp| 59| 1| 0| false| 5| 59|\n", "| AAAA| udp| 21| 1| 0| false| 6| 1825|\n", "| PTR| udp| 21| 1| 0| false| 6| 381|\n", "| AAAA| udp| 22| 1| 0| false| 6| 333|\n", "| AAAA| udp| 21| 1| 0| true| 6| 2|\n", "| AAAA| udp| 20| 1| 0| false| 6| 2326|\n", "| AAAA| udp| 21| 1| 1| false| 6| 4|\n", "| MX| udp| 15| 1| 0| false| 7| 6|\n", "| A| udp| 15| 1| 0| true| 7| 37|\n", "| A| udp| 15| 1| 1| false| 7| 8|\n", "| PTR| udp| 15| 1| 0| false| 7| 27|\n", "| A| udp| 15| 1| 0| false| 7|11854|\n", "| HINFO| udp| 15| 1| 0| false| 7| 1|\n", "| A| tcp| 15| 1| 0| false| 7| 4|\n", "| TXT| udp| 23| 1| 0| true| 8| 2|\n", "| A| udp| 23| 1| 0| false| 8| 3496|\n", "| TXT| udp| 23| 1| 0| false| 8| 10|\n", "+----------+-----+------------+-------------+---+--------+----------+-----+\n", "only showing top 50 rows\n", "\n" ] } ], "source": [ "# Count the rows that share each feature combination and cluster, ordered by cluster id\n", "transformed.groupby(features + ['prediction']).count().sort('prediction').show(50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# More Coming..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrap Up\n", "That's it for this notebook. We pulled in Bro data from a Parquet file, did some digging with high-speed, parallel SQL operations, and clustered the data to organize the results.\n", "\n", "If you liked this notebook, please visit the [BAT](https://github.com/SuperCowPowers/bat) project for more notebooks and examples." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }