{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Zeek to Spark Clustering\n", "In this notebook we will pull Zeek data into Spark, then do some analysis and clustering. The first step is to convert your Zeek log data into a Parquet file. For instructions on how to do this (just a few lines of Python code using the ZAT package), please see this notebook:\n", "\n", "
\n", "\n", "### See these related notebooks\n", "- [Zeek to Parquet](https://nbviewer.jupyter.org/github/SuperCowPowers/zat/blob/main/notebooks/Zeek_to_Parquet.ipynb)\n", "- [Zeek to Spark](https://nbviewer.jupyter.org/github/SuperCowPowers/zat/blob/main/notebooks/Zeek_to_Spark.ipynb)\n", "\n", "Apache Parquet is a columnar storage format focused on performance. Because Parquet stores each column together (along with its type information), reading it is fast and efficient; in this notebook we use it specifically to load data into Spark.\n", "\n", "
\n", "
\n", "\n", "### Software\n", "- Zeek Analysis Tools (ZAT): https://github.com/SuperCowPowers/zat\n", "- Parquet: https://parquet.apache.org\n", "- Spark: https://spark.apache.org\n", "- Spark MLlib: https://spark.apache.org/mllib/\n", "\n", "### Data\n", "- About half a million rows of a Zeek dns.log\n", "- Grab the data here: [data.kitware.com](https://data.kitware.com/#collection/58d564478d777f0aef5d893a) (with headers)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ZAT: 0.3.7\n", "PySpark: 2.4.4\n" ] } ], "source": [ "# Third Party Imports\n", "import pyspark\n", "from pyspark.sql import SparkSession\n", "\n", "# Local imports\n", "import zat\n", "\n", "# Print library versions (useful for reproducibility)\n", "print('ZAT: {:s}'.format(zat.__version__))\n", "print('PySpark: {:s}'.format(pyspark.__version__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Spark It!\n", "### Spin up Spark with 4 Parallel Worker Threads\n", "Here we're spinning up a local Spark session that runs 4 parallel worker threads (`local[4]`). Although this might seem a bit silly since we're probably running on a laptop, there are a couple of important observations:\n", "\n", "
\n", "\n", "- If you have 4 (or 8) cores, use them!\n", "- It's the exact same code logic as if we were running on a distributed cluster.\n", "- We run the same code on **Databricks** (www.databricks.com), which is awesome BTW.\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Spin up a local Spark Session (4 worker threads)\n", "spark = SparkSession.builder.master(\"local[4]\").appName('my_awesome').getOrCreate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Read in our Parquet File\n", "Here we're loading a Zeek DNS log with ~half a million rows to demonstrate the functionality and do some analysis and clustering on the data. For more information on converting Zeek logs to Parquet files, please see our Zeek to Spark notebook:\n", "\n", "#### Zeek logs to Parquet Notebook\n", "- [Zeek to Spark (and Parquet)](https://nbviewer.jupyter.org/github/SuperCowPowers/zat/blob/main/notebooks/Zeek_to_Spark.ipynb)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Have Spark read in the Parquet file (change this path to point at your own dns.parquet)\n", "spark_df = spark.read.parquet('/Users/briford/data/bro/dns.parquet')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Let's look at our data\n", "We should always inspect our data when it comes in. Look at both the data values and the data types to make sure you're getting exactly what you expect." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of Rows: 427935\n", "Columns: ts,uid,id_orig_h,id_orig_p,id_resp_h,id_resp_p,proto,trans_id,query,qclass,qclass_name,qtype,qtype_name,rcode,rcode_name,AA,TC,RD,RA,Z,answers,TTLs,rejected\n" ] } ], "source": [ "# Get information about the Spark DataFrame\n", "num_rows = spark_df.count()\n", "print(\"Number of Rows: {:d}\".format(num_rows))\n", "columns = spark_df.columns\n", "print(\"Columns: {:s}\".format(','.join(columns)))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-----+------+\n", "|qtype_name|proto| count|\n", "+----------+-----+------+\n", "| A| udp|212473|\n", "| NB| udp| 77199|\n", "| AAAA| udp| 54519|\n", "| PTR| udp| 52991|\n", "| TXT| udp| 12644|\n", "| SRV| udp| 12268|\n", "| -| udp| 3472|\n", "| *| udp| 882|\n", "| AXFR| tcp| 440|\n", "| SOA| udp| 346|\n", "| TXT| tcp| 226|\n", "| -| tcp| 176|\n", "| MX| udp| 169|\n", "| NS| udp| 43|\n", "| HINFO| udp| 30|\n", "| NAPTR| udp| 27|\n", "| PTR| tcp| 26|\n", "| A| tcp| 4|\n", "+----------+-----+------+\n", "\n" ] } ], "source": [ "spark_df.groupby('qtype_name','proto').count().sort('count', ascending=False).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Data looks good, let's take a deeper dive\n", "Spark has a powerful SQL engine as well as a Machine Learning library (MLlib). Now that we've loaded our Zeek data, we're going to use Spark SQL and DataFrame operations to investigate it, and then cluster it with MLlib.\n", "\n", "
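As a minimal sketch of the SQL side, the per-type breakdown from the earlier groupby cell can also be written as plain SQL. In PySpark you would register the DataFrame with `spark_df.createOrReplaceTempView('dns')` and pass the query to `spark.sql(...)`. To keep the example self-contained and runnable without a Spark install, it swaps in Python's built-in `sqlite3` module and a handful of made-up rows; the view name `dns`, the helper `qtype_breakdown`, and the sample rows are illustrative assumptions, not part of this notebook's data.

```python
import sqlite3

# Same GROUP BY query you could hand to spark.sql() after registering the
# DataFrame as a temp view, e.g. spark_df.createOrReplaceTempView('dns')
# (the view name 'dns' is an assumption). The secondary ORDER BY keys are
# only a tie-breaker so the output order is deterministic.
QUERY = '''
SELECT qtype_name, proto, COUNT(*) AS count
FROM dns
GROUP BY qtype_name, proto
ORDER BY count DESC, qtype_name, proto
'''


def qtype_breakdown(rows):
    # Stand-in for Spark: load (qtype_name, proto) pairs into in-memory SQLite
    conn = sqlite3.connect(':memory:')
    try:
        conn.execute('CREATE TABLE dns (qtype_name TEXT, proto TEXT)')
        conn.executemany('INSERT INTO dns VALUES (?, ?)', rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Hypothetical sample rows, for illustration only (not taken from this dns.log)
sample = [('A', 'udp'), ('A', 'udp'), ('AAAA', 'udp'), ('A', 'tcp'), ('A', 'udp')]
for qtype_name, proto, count in qtype_breakdown(sample):
    print(qtype_name, proto, count)
```

On the real data, the Spark version of this query should give the same counts as the groupby cell; SQL is simply handy when the investigation grows into joins, filters, and subqueries.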
\n", "
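The histogram cells that follow call `rdd.histogram(50)`, which computes bucket edges and per-bucket counts on the cluster, and then hand those pre-binned counts to `plt.hist`. Below is a rough plain-Python approximation of the evenly spaced case, following the PySpark `RDD.histogram` documentation: buckets span the minimum to the maximum, and every bucket is open on the right except the last, which is closed. It also shows why `plt.hist(bins[:-1], bins=bins, weights=counts)` redraws the same histogram: each left edge falls inside its own bucket, so weighting it by that bucket's count reproduces the counts. The helper names `even_histogram` and `rebin_weighted` and the sample lengths are hypothetical, and edge cases such as NaN or None values are ignored.

```python
def even_histogram(values, num_buckets):
    # Rough stand-in for rdd.histogram(num_buckets) when num_buckets is an int:
    # evenly spaced edges from min to max; every bucket is half-open [a, b)
    # except the last one, which also includes the max value.
    lo, hi = min(values), max(values)
    if lo == hi or num_buckets == 1:
        return [lo, hi], [len(values)]
    width = (hi - lo) / num_buckets
    edges = [lo + i * width for i in range(num_buckets)] + [hi]
    counts = [0] * num_buckets
    for x in values:
        # int() floors the non-negative offset; clamp so x == max lands in the last bucket
        idx = min(int((x - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return edges, counts


def rebin_weighted(points, weights, edges):
    # Per-bucket totals that plt.hist(points, bins=edges, weights=weights) would draw
    # (matplotlib likewise treats only the last bin as closed on the right).
    totals = [0] * (len(edges) - 1)
    last = len(edges) - 2
    for p, w in zip(points, weights):
        for i in range(len(edges) - 1):
            in_bucket = edges[i] <= p < edges[i + 1]
            if in_bucket or (i == last and p == edges[i + 1]):
                totals[i] += w
                break
    return totals


# Made-up query lengths, for illustration only
query_lengths = [3, 7, 7, 12, 15, 15, 15, 28, 40]
bins, counts = even_histogram(query_lengths, 4)
print(bins, counts)
# Left edges as data points, bucket counts as weights: same totals as counts
print(rebin_weighted(bins[:-1], counts, bins))
```

The payoff of this pattern is that only 50 edges and 50 counts travel back to the driver, instead of every query length in the log.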
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Plotting defaults\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "from zat.utils import plot_utils\n", "plot_utils.plot_defaults()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Add a column with the string length of the DNS query\n", "from pyspark.sql.functions import col, length\n", "\n", "# Create new dataframe that includes two new column\n", "spark_df = spark_df.withColumn('query_length', length(col('query')))\n", "spark_df = spark_df.withColumn('answer_length', length(col('answers')))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Counts')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7oAAAF/CAYAAABnvrVDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3dfZild1kn+O+dbmJCQmqIQGy7pbPa4AwiJKZGZkCIO4AEdlsYwyry4gIztANm0FHHbddkDAHcsMM1uiKDtIPiBo2KBqQXyCgCKuqIhBjWCEZBu6U7mASwyJsJSe7545zWoqY76ao6XafOcz6f6zpXn/P8znme+5x6uqq+9Xt5qrsDAAAAQ3HStAsAAACASRJ0AQAAGBRBFwAAgEERdAEAABgUQRcAAIBBEXQBAAAYFEEXAACAQZmJoFtVZ1fVzVX1wfHt4dOuCQAAgM1p67QLWIXf7u7nTrsIAAAANreZ6NEde1JV/W5V/VhV1bSLAQAAYHPa0KBbVRdV1Ueq6q6qeuuKtjOr6h1VdXtVHaiq5y9rvjHJriRPSfKIJN+2cVUDAAAwSzZ66PLhJK9J8owkp65oe2OSu5OcleScJO+uquu6+/ruvivJXUlSVVcl+WdJfu3+DvSwhz2szz777MlWf5zuvPPOnHrqyrcHw+fcZ14595lXzn3mlXN/c7jmmmtu6e6jrt+0oUG3u69KkqpaTLLjyPaqOi3JhUke2923JflQVb0ryYuS7K2qh3T3reOnPznJx4+2/6rak2RPkmzbti1vfvObT9h7uT8HDhzIzp07p3JsmCbnPvPKuc+8cu4zr5z7m8Pi4uKBY7VtlsWoHp3knu6+Ydm265KcP77/TVX1miR3JPnLJJccbSfdvS/JviRZXFzs884778RV/ACmeWyYJuc+88q5z7xy7jOvnPub22YJuqcn+cKKbUtJHpIk3f3eJO/d6KIAAACYPZtl1eXbkpyxYtsZSW49ynPvV1Xtrqp9S0tLEykMAACA2bJZgu4NSbZW1aOWbXt8kutXu6Pu3t/dexYWFiZWHAAAALNjoy8vtLWqTkmyJcmWqjqlqr
Z29+1JrkpyWVWdVlVPSvLsJFdsZH0AAADMvo3u0b04yZ1J9iZ54fj+xeO2V2R0yaGbklyZ5OXdveoeXUOXAQAA5tuGBt3uvrS7a8Xt0nHb57r7Od19Wnc/srt/cY3HMHQZAABgjm2WOboAAAAwEYIuAAAAgzK4oGuOLgAAwHwbXNA1RxcAAGC+DS7oAgAAMN8EXZiy7Tt2pqrWfNu+Y+e03wIAAGwqW6ddwKRV1e4ku3ft2jXtUpgT23fszOFDB9e1jwv2Xrvm1159+bnrOjYAAAzN4IJud+9Psn9xcfFl066F+XD40EFBFQAANhFDl4G5Z/g4AMCwDK5HF2C19MoDAAyLHl0AAAAGZXBBt6p2V9W+paWlaZcCAADAFAwu6Hb3/u7es7CwMO1SAAAAmILBBV0AAADmm6ALAADAoAi6AAAADIqgCwAAwKAMLuhadRkAAGC+DS7oWnUZAABgvg0u6AIAADDfBF0AAAAGRdAFWKeTtpycqlrXbfuOndN+GwAAg7F12gUAzLr77r07F+y9dl37uPrycydUDQAAenQBAAAYFEEXAACAQRlc0HUdXQAAgPk2uKDrOroAAADzbXBBFwAAgPkm6M6x7Tt2uiQKAAAwOC4vNMcOHzrokigAAMDg6NEFAABgUARdAAAABkXQZaatd56xOcYAADA85ugy09Y7z9gcYwAAGB49ugAAAAyKoAsAAMCgDC7oVtXuqtq3tLQ07VIAAACYgsEF3e7e3917FhYWpl0KzIT1LuhlUS8AADYbi1HNsO07dubwoYPTLmNdhvAeZt16F/RKLOoFAMDmIujOsCGsODyE9wAAAGwugxu6DAAAwHzTowus20lbTk5Vrfn1X7n9kTn06QMTrAgAgHkm6ALrdt+9dxuCDgDApmHoMgAAAIMi6AIMwHovE+USUQDAkBi6DDAAVjAHAPgHenQBAAAYFEEXAACAQZmpoFtV31lVN0+7DgAAADavmQm6VbUlyf+W5K+nXQsAAACb18wE3STfmeTtSe6bdiHA5rLeFYcBABiWDV11uaouSvLiJF+f5MrufvGytjOTvCXJtyS5JckPd/cvjtu2JPn2JM9J8gMbWTOw+VlxGACA5Tb68kKHk7wmyTOSnLqi7Y1J7k5yVpJzkry7qq7r7uuTvDDJr3T3fXpfAAAAuD8bOnS5u6/q7ncm+ezy7VV1WpILk1zS3bd194eSvCvJi8ZPeUyS76qqq5M8qqp+ciPrBgDYCOudilFV2b5j57TfBsDUbXSP7rE8Osk93X3Dsm3XJTk/Sbr7/ziysao+0t2vPNpOqmpPkj1Jsm3btlxzzTUnruL7ceDAgakcd1qm9TlPyqzXn3gPQzHtz2Dax59l8/Z9nxNnvVMxktF0jI36/+zcZ1459ze/zRJ0T0/yhRXblpI8ZOUTu3vxWDvp7n1J9iXJ4uJin3feeZOscVWmeeyNNuvvddbrT7yHoZj2ZzDt4886nx+byUaej8595pVzf3PbLKsu35bkjBXbzkhy6xRqAQAAYIZtlqB7Q5KtVfWoZdsen+T61e6oqnZX1b6lpaWJFQcAAMDs2NCgW1Vbq+qUJFuSbKmqU6pqa3ffnuSqJJdV1WlV9aQkz05yxWqP0d37u3vPwsLCZIuHTeqkLSe7hiwAACyz0XN0L07yo8sevzDJq5JcmuQVSX42yU0Zrcr88vGlhYD7cd+9d7uGLAAALLOhQbe7L80o1B6t7XNJnrPeY1TV7iS7d+3atd5dAQAAMIM2yxzdiTF0mdVY77BfQ38BAGDz2SyXF4KpWO+w38TQXwAA2GwG16MLAADAfBtcj645ujB7jgwhBwCASRhc0O3u/Un2Ly4uvmzatTyQ7Tt25vChg9MuA6bOytEAAEzS4ILuLDl86KBf7gEAACbMHF0AAAAGZXBBt6p2V9W+paWlaZcCAADAFAwu6LqO7sZa73VoATaL7Tt2rvl72eLiYrbv2DnttwAAjJmjy7pYRAgYCusmAMBwDK
5HF4DVW+/oDL2ZAMBmokcXAKMzAIBBGVyPrsWoAAAA5tvggq7FqAAAAObb4IIuAAAA803QBYAJWO+CXhb1AoDJsRgVAEzAehf0SizqBQCTokcXAACAQRlc0LXqMgAAwHwbXNC16jIAAMB8G1zQBQAAYL4JugCwSax35WarNgPAiFWXAWCTWO/KzVZtBoARPboAAAAMiqALAADAoAi6AAAADMrggq7r6AIAAMy3wQVd19EFAACYb4MLugAAAMw3QRcAAIBBEXQBAAAYFEEXAACAQRF0AQAAGBRBFwAAgEERdAEAABgUQRcAAIBBEXQBAAAYlMEF3araXVX7lpaWpl0KAAAAUzC4oNvd+7t7z8LCwrRLAQAAYAoGF3QBAACYb4IuAAAAgyLoAgAAMCiCLgAAAIMi6AIAADAogi4AAACDIugCAAAwKIIuAAAAgyLoAgAAMCiCLgAAAIOyddoFHI+qOivJO5J8Mcm9SV7Q3TdOtyoAAAA2o1np0b0lyTd19/lJ/t8k/2rK9QCwyWzfsTNVteYbADAcM9Gj2933Lnv4kCTXT6sWADanw4cO5oK916759Vdffu4EqwEApmlDe3Sr6qKq+khV3VVVb13RdmZVvaOqbq+qA1X1/BXt51TVHya5KMlHN7BsAAAAZshGD10+nOQ1SX72KG1vTHJ3krOSvCDJm6rq6440dvcfd/cTklyS5Ic3oFYAAABm0IYG3e6+qrvfmeSzy7dX1WlJLkxySXff1t0fSvKuJC8at5+87OlLSe7YoJIBAACYMZtlju6jk9zT3Tcs23ZdkvPH98+pqtdntOLy3yV56dF2UlV7kuxJkm3btuWaa645cRXfjwMHDkzluMBsm9b3rEmZ9fqHwteBZOPOA7/zMK+c+5vfZgm6pyf5woptSxktPJXu/nCSpzzQTrp7X5J9SbK4uNjnnXfehMs8ftM8NjCbZv37xqzXPxS+DiQbex4455hXzv3NbbNcXui2JGes2HZGklunUAsAAAAzbLME3RuSbK2qRy3b9vis4TJCVbW7qvYtLS1NrDgAAABmx0ZfXmhrVZ2SZEuSLVV1SlVt7e7bk1yV5LKqOq2qnpTk2UmuWO0xunt/d+9ZWFiYbPEAAADMhI3u0b04yZ1J9iZ54fj+xeO2VyQ5NclNSa5M8vLuXnWPLgAAAPNtQxej6u5Lk1x6jLbPJXnOeo9RVbuT7N61a9d6dwUAAMAM2ixzdCfG0GUAAID5NrigCwAA8277jp2pqjXftu/YOe23AOuyWa6jCwAATMjhQwdzwd5r1/z6qy8/d4LVwMYbXI+uywsBAADMt8EFXXN0AQAA5tvggi4AAADzTdAFAABgUAYXdM3RBQAAmG+DC7rm6AIAAMy3NQfdqjq1qp5WVS6yBQAAwKZx3EG3qt5aVa8Y3z85yYeT/EaSP6uqZ56g+gAAAGBVVtOj+4wk/218/1uTPCTJVyS5dHwDAACAqVtN0H1okpvG9y9I8mvdfVOSX0rymEkXtlYWowIAAJhvqwm6n0ny2KraklHv7vvG209P8sVJF7ZWFqMCAACYb1tX8dyfTfLLSQ4nuTfJb423PyHJJyZcFwAAAKzJcQfd7r6sqq5P8sgkb+/uu8dN9yR53YkoDgAAAFbruINuVT0lya939z0rmn4hyRMnWhUAAACs0Wrm6H4gyZlH2b4wbgMAAICpW80c3UrSR9n+5Ulun0w561dVu5Ps3rVr17RLAThuJ205OVU17TIAAAbhAYNuVb1rfLeTvK2q7lrWvCXJY5P8/gmobU26e3+S/YuLiy+bdi0Ax+u+e+/OBXuvXfPrr7783AlWMx3bd+zM4UMHp10GADAAx9Oj+9nxv5Xk80nuXNZ2d5IPJfmZCdcFwJw5fOjg3Id9AGAyHjDodvdLkqSq/irJ67t70wxTBgAAgJVWc3mhV53IQgAAAGASVnN5oTOTvDbJU5M8IitWbO7uMyZbGgAAAKzealZdfkuSc5PsS3I4R1+BGQ
AAAKZqNUH3qUme3t1/eKKKAQAAgPU66YGf8vduSnLbiSpkUqpqd1XtW1pamnYpAAAATMFqgu6PJLmsqk4/UcVMQnfv7+49CwsL0y4FAACAKVjN0OWLk5yd5KaqOpDki8sbu/txE6wLgBly0paTU1XTLgMAIMnqgu6vnrAqAJhp9917dy7Ye+269nH15edOqBrWavuOnTl86OCaX/+V2x+ZQ58+MMGKAGBtXEcXAEiSHD50cF1/sPDHCgA2i9XM0QUAAIBN77h7dKvq1tzPtXO7+4yJVAQAAADrsJo5uhetePygJOcmuTDJaydWEQAAAKzDaubo/vzRtlfVR5M8NckbJlUUAAAArNUk5uh+IMnuCewHAAAA1m0SQfd5SW6ZwH4moqp2V9W+paWlaZcCAADAFKxmMar/P1+6GFUlOSvJmUlePuG61qy79yfZv7i4+LJp1wIAAMDGW81iVL+64vF9SW5O8sHu/sTkSgIAAIC1W81iVK86kYUAAADAJKymRzdJUlX/IsljMhrGfH13f3DSRQEAAMBarWaO7vYk70hyXpLD481fWVUfSfIvu/vwMV8MADAHtu/YmcOHDk67DIC5t5oe3Z9Mcm+SXd39l0lSVV+d5G3jtudOvjwAgNlx+NDBXLD32jW//urLz51gNQDzazVB9+lJvvlIyE2S7v5UVb0yyW9NvDIAAABYg9VeR7ePcxsAAABMxWqC7m8leUNVfdWRDVX1yCQ/ET26AAAAbBKrCbqvTHJakk9V1YGqOpDkk+NtrzwRxQEAAMBqreY6un9dVd+Q5GlJ/vF488e7+30npDIAAABYgwfs0a2qZ1bVX1XVGT3ym939hu5+Q5I/Grc9fQNqBQAAgAd0PEOXL0ryH7v7CysbunspyeuSfN+kC1upqr6xqv6gqn6nqq6sqged6GMCAAAwe44n6D4uyf0NT35/ksdPppz79ddJ/kV3PyXJXyV59gYcEwAAgBlzPHN0H57kvvtp7yRfPply7ucg3Tcue3h37r8mAJg7J205OVU17TKYcdt37MzhQwfX/Pqv3P7IHPr0gQlWBLB6xxN0P51Rr+6fH6P9cUkOHe8Bq+qiJC9O8vVJruzuFy9rOzPJW5J8S5Jbkvxwd//iitfvHLe/5niPCQDz4L57784Fe69d8+uvvvzcCVbDrDp86KDzCJh5xzN0+d1JXl1Vp65sqKoHJ7ls/JzjdTijkPqzR2l7Y0a9tWcleUGSN1XV1y073hlJrkjy4u7+4iqOCQAAwJw4nh7d1yZ5bpIbquqnknxivP2fZLRQVSX5seM9YHdflSRVtZhkx5HtVXVakguTPLa7b0vyoap6V5IXJdlbVVuT/FKSV3X3nx3v8QAAAJgvDxh0u/umqnpikjdlFGiPTP7pJP81yfd0999MoJZHJ7mnu29Ytu26JOeP739nkickuaSqLknypu7+5eU7qKo9SfYkybZt23LNNddMoKzVO3DAvBQA5tO0fvbypab9dZj28ZkMX8dj8/v+5nc8Pbrp7gNJnlVVD02yK6Ow++fd/fkJ1nJ6kpWXMFpK8pBxDVdkNGz5/urcl2RfkiwuLvZ55503wfJWZ5rHBoBp8fNvc5j212Hax2cyfB3vn89nczuuoHvEONj+0Qmq5bYkZ6zYdkaSW0/Q8QAAABig41mMaqPckGRrVT1q2bbHJ7l+NTupqt1VtW9paWmixQEAADAbNjzoVtXWqjolyZYkW6rqlKra2t23J7kqyWVVdVpVPSnJs/MAw5VX6u793b1nYWFh8sUDAACw6U2jR/fiJHcm2ZvkheP7F4/bXpHk1CQ3Jbkyycu7e1U9ugAAAMy3Vc3RnYTuvjTJpcdo+1yS56xn/1W1O8nuXbt2rWc3AAAAzKjNNEd3IgxdBgAAmG+DC7oAAADMN0EXAACAQRlc0HV5IQAAgPk2uKBrji4AAMB8G1zQBQAAYL4JugDARJy05eRU1Zpv23fsnPZbAGAgNvw6uiea6+gCwHTcd+
/duWDvtWt+/dWXnzvBagCYZ4Pr0TVHFwAAYL4NLugCAAAw3wRdAAAABkXQBQAAYFAGF3SrandV7VtaWpp2KQAAAEzB4IKuxagAAADm2+CCLgAAAPNN0AUAAGBQBF0AAAAGRdAFAABgUAYXdK26DACz6aQtJ6eq1nXbvmPntN8GAJvA1mkXMGndvT/J/sXFxZdNuxYA4Pjdd+/duWDvtevax9WXnzuhagCYZYPr0QUAAGC+CboAwGCsd/izoc8AwzC4ocsAwPxa7/BnQ58BhkGPLgAAAIMi6AIAADAogi4AAACDMrig6zq6AAAA821wQbe793f3noWFhWmXAgAAwBQMLugCAAAw3wRdAAAABkXQBQAAYFAEXQAAAAZF0AUAAGBQBF0AAAAGRdAFAABgUARdAAAABmVwQbeqdlfVvqWlpWmXAgAAwBQMLuh29/7u3rOwsDDtUgAAAJiCwQVdAAAA5pugCwAAwKAIugAAAAyKoAsAAMCgCLoAAAAMiqALAADAoAi6AAAADIqgCwAAwKAIugAAAAyKoAsAAMCgCLoAAAAMykwE3apaqKoPV9VtVfXYadcDAADA5jUTQTfJHUn+lyS/Ou1CAAAA2NxmIuh29xe7++Zp1wEAAMDmt6FBt6ouqqqPVNVdVfXWFW1nVtU7qur2qjpQVc/fyNoAAAAYhq0bfLzDSV6T5BlJTl3R9sYkdyc5K8k5Sd5dVdd19/UbWyIAAACzbEN7dLv7qu5+Z5LPLt9eVacluTDJJd19W3d/KMm7krxoI+sDAABg9m10j+6xPDrJPd19w7Jt1yU5/8iDqnpPRj29X1tVb+7ut67cSVXtSbInSbZt25ZrrrnmhBZ9LAcOHJjKcQGA9ZvW7w+TNO33MO3jMxmz/HV85rN25+abblzXPh7+iG1573v2H7XN7/ub32YJuqcn+cKKbUtJHnLkQXc/64F20t37kuxLksXFxT7vvPMmWeOqTPPYAMDaDeFn+LTfw7SPz2TM8tfx5ptuzAV7r13XPq6+/Nz7/Qxm+fOZB5tl1eXbkpyxYtsZSW6dQi0AAADMsM0SdG9IsrWqHrVs2+OTrHohqqraXVX7lpaWJlYcAAAAs2OjLy+0tapOSbIlyZaqOqWqtnb37UmuSnJZVZ1WVU9K8uwkV6z2GN29v7v3LCwsTLZ4AAAAZsJG9+henOTOJHuTvHB8/+Jx2ysyuuTQTUmuTPJylxYCAABgtTZ0MaruvjTJpcdo+1yS56z3GFW1O8nuXbt2rXdXAAAAzKDNMkd3YgxdBgAAmG+DC7oAAADMN0EXAACAQRlc0HV5IQAAgPk2uKBrji4AAMB8G1zQBQAAYL4JugAAAAzK4IKuOboAAMC0bd+xM1W15tv2HTun/RZm2tZpFzBp3b0/yf7FxcWXTbsWAABgPh0+dDAX7L12za+/+vJzJ1jN/Blcjy4AAADzTdAFAABgUARdAAAABmVwQddiVAAAAPNtcEG3u/d3956FhYVplwIAAMAUDC7oAgAAMN8EXQAAAAZF0AUAAGBQBF0AAAAGZXBB16rLAAAA821wQdeqywAAAPNtcEEXAACA+SboAgAAMCiCLgAAAIMi6AIAADAogi4AAACDMrig6/JCAAAA821wQdflhQAAAObb4IIuAAAA803QBQAAYFAEXQAAAAZF0AUAAGBQBF0AAAAGRdAFAABgUARdAAAABkXQBQAAYFAEXQAAAAZlcEG3qnZX1b6lpaVplwIAwBRs37EzVbXm24O+7MHrev32HTun/REwACdtOdl5uA5bp13ApHX3/iT7FxcXXzbtWgAA2HiHDx3MBXuvXfPrr7783HW/Htbrvnvvdh6uw+B6dAEAAJhvgi4AAACDIugCAAAwKIIuAAAAgyLoAgAAMCiCLgAAAIMi6AIAADAogi4AAACDIugCAAAwKIIuAAAAgyLoAgAAMCgzE3Sr6nVV9btVdUVVPWja9QAAALA5zUTQrarHJ9ne3U9O8okkz51ySQ
AAAGxSMxF0kzwxyW+M71+d5ElTrAUAAIBNbEODblVdVFUfqaq7quqtK9rOrKp3VNXtVXWgqp6/rPmhSb4wvr+U5MwNKhkAAIAZs3WDj3c4yWuSPCPJqSva3pjk7iRnJTknybur6rruvj7J3yY5Y/y8hSSf25hyAQAAmDUb2qPb3Vd19zuTfHb59qo6LcmFSS7p7tu6+0NJ3pXkReOn/H6Sp43vPyPJ721QyQAAAMyYje7RPZZHJ7mnu29Ytu26JOcnSXf/cVX9TVX9bpKDSV5/tJ1U1Z4ke5Jk27Ztueaaa05s1cdw4MCBqRwXAFi/af3+MEnTfg/TPv5msJ7P4JnP2p2bb7pxgtWszbx/HU/acnKqatplrNkk6n/4I7blve/ZP6GKNtZmCbqn5x/m4B6xlOQhRx50979/oJ10974k+5JkcXGxzzvvvEnWuCrTPDYAsHZD+Bk+7fcw7eNvBuv5DG6+6cZcsPfadR3/6svPXdfrE1/H++69e11fh0l8DdZjvfUno/cwq+fBZll1+bb8wxzcI85IcusUagEAAGCGbZage0OSrVX1qGXbHp/k+tXuqKp2V9W+paWliRUHAADA7NjoywttrapTkmxJsqWqTqmqrd19e5KrklxWVadV1ZOSPDvJFas9Rnfv7+49CwsLky0eAACAmbDRPboXJ7kzyd4kLxzfv3jc9oqMLjl0U5Irk7x8fGkhAAAAOG4buhhVd1+a5NJjtH0uyXPWe4yq2p1k965du9a7KwAAAGbQZpmjOzGGLgMAAMy3wQVdAAAA5pugCwAAwKAMLui6vBAAAMB8G1zQNUcXAABgvg0u6AIAADDfBF0AAAAGZXBB1xxdAACA+VbdPe0aToiqujnJgSkd/mFJbpnSsWGanPvMK+c+88q5z7xy7m8OO7v74UdrGGzQnaaq+kh3L067Dthozn3mlXOfeeXcZ1459ze/wQ1dBgAAYL4JugAAAAyKoHti7Jt2ATAlzn3mlXOfeeXcZ1459zc5c3QBAAAYFD26AAAADIqgCwAAwKAIuhNUVWdW1Tuq6vaqOlBVz592TXAiVNWXVdVbxuf5rVX1x1X1zGXtT62qT1TVHVX1garaOc16YdKq6lFV9XdV9bZl254//j9xe1W9s6rOnGaNcCJU1fOq6uPj8/yTVfXk8Xbf9xmsqjq7qt5TVZ+vqs9U1U9V1dZx2zlVdc343L+mqs6Zdr2MCLqT9cYkdyc5K8kLkrypqr5uuiXBCbE1yV8nOT/JQpKLk/zK+AfBw5JcleSSJGcm+UiSX55WoXCCvDHJHx15MP5e/+YkL8roZ8AdSf7zdEqDE6Oqnp7kdUlekuQhSZ6S5FO+7zMH/nOSm5JsS3JORr//vKKqTk7y60neluShSX4+ya+PtzNlFqOakKo6Lcnnkzy2u28Yb7siyaHu3jvV4mADVNXHkrwqyZcneXF3P3G8/bQktyQ5t7s/McUSYSKq6nlJvi3JnybZ1d0vrKofS3J2dz9//JyvSfLxJF/e3bdOr1qYnKr6/SRv6e63rNi+J77vM2BV9fEkP9Dd7xk//o9Jzkjya0l+LsmOHoeqqjqYZE93Xz2tehnRozs5j05yz5GQO3ZdEj26DF5VnZXR/4HrMzrnrzvS1t23J/lk/F9gAKrqjCSXJfn+FU0rz/tPZjTC59EbVx2cOFW1JclikodX1V9U1afHwzdPje/7DN9PJHleVT24qrYneWaSqzM6xz/WX9pz+LE49zcFQXdyTk/yhRXbljIa2gODVVUPSvILSX5+/Jf70zM695fzf4GheHVGPVqfXrHdec/QnZXkQUmem+TJGQ3fPDejqSvOf4budzIKr19I8umMhue/M879TU3QnZzbMhrCsNwZSQxZY7Cq6qQkV2TUc3XReLP/CwzSeIGRpyX58aM0O+8ZujvH/76hu2/s7luS/Kckz4rznwEb/65zdUbz0E9L8rCM5uO+Ls79TU3QnZwbkmytqkct2/b4jIZywuBUVS
V5S0Z/5b+wu784bro+o1uEW20AAAktSURBVHP/yPNOS/I18X+B2ffNSc5OcrCqPpPkB5NcWFUfzf943n91ki/L6GcDzLzu/nxGPVnLh2geue/7PkN2ZpJHJvmp7r6ruz+b0bzcZ2V0jj9u/DvREY+Lc39TEHQnZDwf5aokl1XVaVX1pCTPzqi3C4boTUn+SZLd3X3nsu3vSPLYqrqwqk5J8h8ymr9iQRJm3b6Mfnk/Z3z76STvTvKMjIbv766qJ49/yb8syVUWomJgfi7Jv62qR1TVQ5P8uyT/X3zfZ8DGoxf+MsnLq2prVf2jJP97RnNxP5jk3iSvHF968cjotvdPpVi+hKA7Wa9IcmpGy49fmeTl3e0vOgzO+PqI353RL/ufqarbxrcXdPfNSS5M8tqMViJ/QpLnTa9amIzuvqO7P3PkltGQtb/r7pvH3+v/TUaB96aM5me9Yorlwonw6owuq3VDRquKX5vktb7vMwe+LckFSW5O8hdJvpjk33X33Umek+S7kvxtkpcmec54O1Pm8kIAAAAMih5dAAAABkXQBQAAYFAEXQAAAAZF0AUAAGBQBF0AAAAGRdAFAABgUARdAGDTqqpvrqquqodNuxYAZoegC8DMqqq3jkNQV9UXq+qmqvpAVX1PVT1oxXM/OH7ei1Zsf3FV3bZi27+uqmur6raqWqqqj1XVa46jnidU1buq6nNVdVdVfaKqfrSqTpnMOz4xxp/NT6kDgKEQdAGYde9Lsi3J2Um+Jcn+JK9K8rtVddqK5/5dkldX1Zcda2dV9dIkP5nkp5Ock+SfJXl1kgffXxFV9a1JfjfJZ5M8Lcmjx3XsSfIbVXXyat/YalTVSVW15UQeAwBmhaALwKy7q7s/092HuvuPu/s/JfnmJN+Q5IdWPPeXk5ya5HvuZ3/fmuSq7n5zd/9Fd3+8u9/e3d9/rBdU1YOTvCXJe7r7Jd390e4+0N1XJtmd5JuSfO+y53dVPXfFPv6qqn5w2eOFqto37qW+tap+u6oWl7W/eNzj/Kyq+pMkdyd50rhn+ytW7Pu1VfWx+3nP96uqtlfVL1XV58e3d1fVo5a1X1pVf1JVz6uqT47rfefy4cZVtbWqfnzZPn68qt5UVR8ct781yflJvmdZL/3Zy8p4fFX9YVXdUVUfqapvWPFZXTH+rP6uqj5VVd+31vcLwOwTdAEYnO7+kyRXJ7lwRdNtGfWy/khV/aNjvPwzSb6xqr56FYd8RpKHJfm/j1LLR5P8VpLnH+/OqqqSvDvJ9iT/a5Jzk/xOkvdX1bZlTz0lySVJvjvJY5Jcm+STSb5r2b5OGj9+yyrez/JaHpzkAxn1hp+f5J8nuTHJ+8ZtR5yd5DuS/MuMetbPTfLaZe0/mOTFSf51Rr3kJ+VLP5PvTfIHSX4uox76bUn+eln7/5Vkb0Z/wPhskl8Yf05J8pokX5/RZ/W1SV6a5NBa3i8AwyDoAjBUf5rkaGF1X0ZBae8xXveqcfsnq+rPq+ptVfVdK+f8rvDo8b8fv59avvY4aj7if85o2PRzu/vD457lS5J8KsnyOcZbklzU3b/X3Td0961J/kuSlyx7zjOSPCLJ21Zx/OWel6SSvKS7P9bdn8goWJ+eUbA8YmuSF4+f8wcZfc5PXdb+vUle192/1t1/luT7MvqjQpKku5cy6pW+Y9xD/5nuvnfZ6y/p7g+Mj39Zkn+c0R8CkmRnko+OP6sD3f3B7n77Gt8vAAMg6AIwVJWkV27s7nuS/EiSV1bV9qO039jd/zyjHsKfGO/nzUk+vKIHc7XuXsVzz8toTvDN4+HJt40XzHpskq9Z9rx7kvzxitf+fJKvrqonjh+/NMk7u/uza6z7vCT/U5Jbl9WxlOShK2o5MA6rRxzOKGCnqhaSfEWSDx9p7O5e/vg4LB96fXj87yPG/74pyXdU1XVV9fqqOn8V+wVggLZOuwAAOEEek1EP6P+gu98+ng97WUYLSB3tOX+S5E+SvLGqvmn8vG9P8t
ajPP2GZcf8vWPUcsOyx51RgF5ueY/xSUn+JsmTj7KvLyy7f9eKXs90981V9a4kL62qP8tozvHuo+zneJ2UUZh+3lHaPrfs/hdXtHUm+wf15fs/8geMk5Kku99bVTuTPDOjXuR3V9Xbu/slAWAuCboADE5VPTbJBRnN3TyWH8po7uzn7uc5R/zp+N/Tj9H+X5PckuTfZ0XQHS+a9NQkFy3bfHNGc1CPPOes5Y+TfDTJWUnu6+6jhvUH8DNJfjWjoP+ZjFamXquPJvnOJLd099+uZQfdvVRVn0nyT5O8P/n7ecj/NMuGL2fU672mlaO7+5YkVyS5oqrem+TKqvo33X3XWvYHwGwTdAGYdV82XmX4pCQPzyhU/p9Jrkny+mO9qLt/u6quziiA/n2vaFW9KaOhse9P8umMAujFSe5I8hvH2NcdVfWvkvxqVf1skjdkNM/3ieMars5o+PMR789odeHfHx/7xzJa7OmI92UUmH+9qn4oyScyGvp7QZL3dfdRe6GX+c3x8X80yeXdfd8DPD9JHlZV56zYdlOSX8hoIalfr6r/kORgkq9K8uwkP93df34c+06S/yfJD1XVDRn94eC7M/psb1z2nL/KaCGwszNaOOx4/giRqroso0B+fUa/23xbkk8JuQDzyxxdAGbd0zIKSwcz6qH91iSXJnlKd9/+AK/dm2Tl9W1/M8kTkvxKRsON3zHe/vTuviHH0N3vSvKUjML2+5McSHJlRj2ru1cMMf6BjHpbPzhu/y8Zhcoj++okzxrv52eS/Nm4nq/NP8xPPabx638uo+HQP/dAzx/7joxWbV5++/7uvmP8vj6V5O0Zhe6fz2iO7uePc9/JKPBfMa7nv423vSNfGvBfn1Gv7p9m1Ov9yOPc910ZrfB8XUZ/IHhI1jdcG4AZV6OfhQDAJFXVlox6Q5+c5Pzu/osNPv6bkuzq7qdv5HFXo6quTfKh7v63064FgGExdBkAToDuvreqXpDRZXWekmRDgu54hePHZHTt3G/fiGMej/FiUc9I8tsZ9TS/LMnjxv8CwETp0QWAAamqDyb5xiRv2Uw9pVX1VRkN5f76jKZO/WlG18Y96rxnAFgPQRcAAIBBsRgVAAAAgyLoAgAAMCiCLgAAAIMi6AIAADAogi4AAACDIugCAAAwKP8d1f3TmAS+i0gAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Show histogram of the Spark DF request body lengths\n", "bins, counts = spark_df.select('query_length').rdd.flatMap(lambda x: x).histogram(50)\n", "\n", "# This is a bit awkward but I believe this is the correct way to do it\n", "plt.hist(bins[:-1], bins=bins, weights=counts, log=True)\n", "plt.grid(True)\n", "plt.xlabel('DNS Query Lengths')\n", "plt.ylabel('Counts')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0, 0.5, 'Counts')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7oAAAF6CAYAAAA3cyTwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3dfZild1kn+O+dbiAxLyVRjG0iYTUJjrAmme7VEeTFAYag24IT1sUIig70NWAuZtad0biGNWDUcC3XzAgT0VaYaBhehE2UXjTjaxBkFNLEOAaxFaUjaSC8SCVpQkKSe/84p+Gk6KSrqqvrnPOcz+e6nitVv985z7lP3V1V+dbveanuDgAAAAzFcdMuAAAAADaSoAsAAMCgCLoAAAAMiqALAADAoAi6AAAADIqgCwAAwKBsnXYBG62qdibZefLJJ7/4nHPOmXY5D+quu+7KCSecMO0y2GD6Okz6Olx6O0z6Okz6Olx6O0yb0de9e/d+qrsfdbi5Gup9dHfs2NE33HDDtMt4UHv37s327dunXQYbTF+HSV+HS2+HSV+HSV+HS2+HaTP6WlV7u3vH4eYcugwAAMCgDC7oVtXOqtq9vLw87VIAAACYgsEF3e7e0927lpaWpl0KAAAAUzC4oAsAAMBiE3QBAAAYlMEFXefoAgAALLbBBV3n6AIAACy2wQVdAAAAFpugCwAAwKAMLug6RxcAAGCxDS7oOkcXAABgsQ0u6AIAALDYBN0pOv2MM1NV695OP+PMab8FAACAmbN12gUssgO33pILLrlx3c+/7orzN7AaAACAYRjciq6LUQEAACy2wQVdF6MCAABYbIMLugAAACw2QRcAAIBBEXQBAAAYFEEXAACAQRF0AQAAGJTBBV23FwIAAFhsgwu6bi8EAACw2AYXdAEAAFhsgi4AAACDIugCAAAwKIIuAAAAgyLoAgAAMCiCLgAAAIMi6AIAADAogwu6VbWzqnYvLy9PuxQAAACmYHBBt7v3dPeupaWlaZcCAADAFAwu6AIAALDYBF0AAAAGRdAFAABgUARdAAAABkXQBQAAYFAEXQAAAAZF0AUAAGBQBF0AAAAGZS6CblU9pqo+WVXXj7dHTbsmAAAAZtPWaRewBu/q7udOuwgAAABm21ys6I49sareXVU/V1U17WIAAACYTZsadKvq4qq6oarurqqrVsydWlXXVtXBqtpfVRdNTH8syVlJnpzka5L8y82rGgAAgHmy2Su6B5JcnuQNh5m7Msk9SU5L8gNJXldVj
0uS7r67uw92dye5Jsm5m1QvAAAAc2ZTg253X9Pdv5nk05PjVXVikguTvLy77+zu9yR5R5IXjOdPnnj4k5L87SaVDAAAwJyZlYtRnZPk3u7eNzF2U5KnjD/+jqq6PMnnkvx9kpcfbidVtSvJriTZtm1b9u7de+wqPkr79+/fkP3M8ntcRBvVV2aLvg6X3g6Tvg6Tvg6X3g7TtPs6K0H3pCS3rxhbTnJyknT37yT5nSPtpLt3J9mdJDt27Ojt27dvcJmzZxHe47zRk2HS1+HS22HS12HS1+HS22GaZl9n5arLdyY5ZcXYKUnuWOuOqmpnVe1eXl7ekMIAAACYL7MSdPcl2VpVZ0+MnZvk5rXuqLv3dPeupaWlDSsOAACA+bHZtxfaWlXHJ9mSZEtVHV9VW7v7YEZXU35lVZ1YVU9M8uwkV29mfQAAAMy/zV7RvTTJXUkuSfL88ceXjudemuSEJLcleXOSl3T3mld0HboMAACw2Db79kKXdXet2C4bz32mu5/T3Sd296O7+03rfA2HLgMAACywWTlHFwAAADbE4IKuQ5cBAAAW2+CCrkOXAQAAFtvggi4AAACLbXBB16HLAAAAi21wQdehywAAAIttcEEXAACAxSboAgAAMCiCLgAAAIMyuKDrYlQAAACLbXBB18WoAAAAFtvggi4AAACLTdAFAABgUARdAAAABmVwQdfFqAAAABbb4IKui1EBAAAstsEFXQAAABaboAsAAMCgCLoAAAAMiqALAADAoAwu6LrqMgAAwGIbXNB11WUAAIDFNrigCwAAwGITdAEAABgUQRcAAIBBEXQBAAAYFEEXAACAQRF0AQAAGBRBFwAAgEEZXNCtqp1VtXt5eXnapQAAADAFgwu63b2nu3ctLS1NuxQAAACmYHBBFwAAgMUm6AIAADAogi4AAACDIugCAAAwKIIuAAAAgyLoAgAAMCiCLgAAAIMi6AIAADAocxV0q+r7q+qT064DAACA2TU3QbeqtiT535L8w7RrAQAAYHbNTdBN8v1J3pbk/mkXAgAAwOza1KBbVRdX1Q1VdXdVXbVi7tSquraqDlbV/qq6aGJuS5LvS/LWzawXAACA+bN1k1/vQJLLkzwzyQkr5q5Mck+S05Kcl+SdVXVTd9+c5PlJfqO776+qzawXAACAObOpK7rdfU13/2aST0+OV9WJSS5M8vLuvrO735PkHUleMH7INyf5waq6LsnZVfWazawbAACA+bHZK7oP5pwk93b3vomxm5I8JUm6+ycODVbVDd39ssPtpKp2JdmVJNu2bcvevXuPXcVHaf/+/Ruyn1l+j4too/rKbNHX4dLbYdLXYdLX4dLbYZp2X2cl6J6U5PYVY8tJTl75wO7e8WA76e7dSXYnyY4dO3r79u0bWeNMWoT3OG/0ZJj0dbj0dpj0dZj0dbj0dpim2ddZuerynUlOWTF2SpI7plALAAAAc2xWgu6+JFur6uyJsXOT3LzWHVXVzqravby8vGHFAQAAMD82+/ZCW6vq+CRbkmypquOramt3H0xyTZJXVtWJVfXEJM9OcvVaX6O793T3rqWlpY0tHgAAgLmw2Su6lya5K8klGd0y6K7xWJK8NKNbDt2W5M1JXjK+tdCaWNEFAABYbJt9e6HLurtWbJeN5z7T3c/p7hO7+9Hd/aZ1voYVXQAAgAU2K+foAgAAwIYYXNB16DIAAMBiG1zQdegyAADAYhtc0AUAAGCxCboAAAAMyuCCrnN0AQAAFtvggq5zdAEAABbb4IIuAAAAi03QBQAAYFAGF3SdowsAALDYBhd0naMLAACw2AYXdAEAAFhsgi4AAACDIugCAAAwKIMLui5GBQAAsNgGF3RdjAoAAGCxDS7oAgAAsNgEXQAAAAZF0AUAAGBQBF0AAAAGRdAFAABgUAYXdN1eCAAAYLENLui6vRAAAMBiG1zQBQAAYLEJugAAAAyKoAsAAMCgCLoAAAAMiqALAADAoAi6AAAADIqgCwAAwKAML
uhW1c6q2r28vDztUgAAAJiCwQXd7t7T3buWlpamXQoAAABTMLigCwAAwGITdAEAABiUdQfdqjqhqp5eVWduZEEAAABwNFYddKvqqqp66fjjhyd5X5LfTfLXVfWsY1QfAAAArMlaVnSfmeRPxx9/T5KTk3xtksvGGwAAAEzdWoLuI5PcNv74giT/b3ffluQtSb55owsDAACA9VhL0P14ksdX1ZaMVnd/fzx+UpIvbHRhAAAAsB5b1/DYNyR5a5IDSe5L8gfj8W9L8qENrgsAAADWZdVBt7tfWVU3J3l0krd19z3jqXuTvOpYFHdIVZ2W5NqMVo7vS/ID3f2xY/maAAAAzKdVB92qenKS3+rue1dM/dckT9jQqr7cp5J8R3ffX1UvTPKvklx+jF8TAACAObSWc3T/KMmphxlfGs8dM919X3ffP/705CQ3H8vXAwAAYH6tJehWkj7M+FclObiqHVRdXFU3VNXdVXXVirlTq+raqjpYVfur6qIV8+dV1Z8luTjJB9ZQNwAAAAvkiIcuV9U7xh92kjdW1d0T01uSPD7Je1f5egcyOuT4mUlOWDF3ZZJ7kpyW5Lwk76yqm7r75iTp7j9P8m1V9X1JfjLJv17lawIAALBAVnOO7qfH/60k/5jkrom5e5K8J8mvrObFuvuaJKmqHUnOODReVScmuTDJ47v7ziTvGQfsFyS5pKoePnHxq+Ukn1vN6wEAALB4jhh0u/uHk6SqPpLk1d29qsOU1+icJPd2976JsZuSPGX88XlV9eqMrrj8+SQ/cridVNWuJLuSZNu2bdm7d+8xKHVj7N+/f0P2M8vvcRFtVF+ZLfo6XHo7TPo6TPo6XHo7TNPu61puL/SKY1jHSUluXzG2nNGFp9Ld70vy5CPtpLt3J9mdJDt27Ojt27dvcJmzZxHe47zRk2HS1+HS22HS12HS1+HS22GaZl9XfTGq8cWiXldV+6rqs1V1++R2lHXcmeSUFWOnJLljrTuqqp1VtXt5efkoSwIAAGAerXpFN8nrk5yf0YrpgRz+CszrtS/J1qo6u7v/Zjx2btZxG6Hu3pNkz44dO168gfUBAAAwJ9YSdJ+W5Bnd/WfrfbGq2jp+zS1JtlTV8Rmdm3uwqq5J8sqqelFGV11+dpInrPe1AAAAWExruY/ubRkdYnw0Ls3oqs2XJHn++ONLx3MvzeiWQ7cleXOSlxy6tdBaOHQZAABgsa0l6P5URiuuJ633xbr7su6uFdtl47nPdPdzuvvE7n50d79pna+xp7t3LS0trbdMAAAA5thaDl2+NMljktxWVfuTfGFysru/ZQPrAgAAgHVZS9B9+zGrYgNV1c4kO88666xplwIAAMAUzMp9dDeMqy4DAAAstrWcowsAAAAzb9UrulV1Rx7i3rndfcqGVAQAAABHYS3n6F684vOHJTk/yYVJfnbDKjpKztEFAABYbGs5R/fXDjdeVR9I8rQkr92ooo6Gc3QBAAAW20aco/tHSXZuwH4AAADgqG1E0H1ekk9twH4AAADgqK3lYlT/Iw+8GFUlOS3JqUlessF1rZtzdAEAABbbWi5G9fYVn9+f5JNJru/uD21cSUfHOboAAACLbS0Xo3rFsSwEAAAANsJaVnSTJFX1z5N8c0aHMd/c3ddvdFEAAACwXms5R/f0JNcm2Z7kwHj466rqhiTf290HHvTJAAAAsEnWctXl1yS5L8lZ3f313f31Sc4ej73mWBS3HlW1s6p2Ly8vT7sUAAAApmAtQfcZSX60u//+0EB3/12Sl43nZkJ37+nuXUtLS9MuBQAAgClY6310e5VjAAAAMBVrCbp/kOS1VfX1hwaq6tFJ/tN4DgAAAKZuLUH3ZUlOTPJ3VbW/qvYn+fB47GXHojgAAABYq7XcR/cfquqfJnl6km8aD/9Vd//+MakMAAAA1uGIK7pV9ayq+khVndIjv9fdr+3u1yZ5/3huZi5GBQAAwGJbzaHLFyf5f7r79pUT3b2c5FVJ/u1GF7Zeb
i8EAACw2FYTdL8lyUMdnvyHSc7dmHKOntsLAQAALLbVBN1HJbn/IeY7yVdtTDkAAABwdFYTdD+a0arug/mWJLduTDkAAABwdFYTdN+Z5Geq6oSVE1X1FUleOX4MAAAATN1qbi/0s0mem2RfVf3nJB8aj/+TjC5UVUl+7tiUBwAAAGtzxKDb3bdV1ROSvC6jQFuHppL8tyQ/2t2fOHYlcqycfsaZOXDrLUe1j687/dG59aP7N6giAACAo7eaFd109/4k31VVj0xyVkZh92+6+x+PZXEcWwduvSUXXHLjUe3juivO36BqAAAANsaqgu4h42D7/mNUCwAAABy11VyMaq5U1c6q2r28vDztUgAAAJiCwQXd7t7T3buWlpamXQoAAABTMLigCwAAwGITdAEAABgUQRcAAIBBEXQBAAAYlDXdXojZctyWh6eqpl0GAADATBF059j9992TCy65cd3Pv+6K84+6hqMN2193+qNz60f3H3UdAAAAhwi6HJVZCNsAAACTnKMLAADAoMxN0K2qb62q/15Vf1xVb66qh027JgAAAGbP3ATdJP+Q5J9395OTfCTJs6dbDgAAALNobs7R7e6PTXx6T5L7p1ULAAAAs2vTV3Sr6uKquqGq7q6qq1bMnVpV11bVwaraX1UXHeb5Zyb5F0n2bFLJAAAAzJFprOgeSHJ5kmcmOWHF3JUZrdaeluS8JO+sqpu6++YkqapTklyd5IXd/YXNKxkAAIB5selBt7uvSZKq2pHkjEPjVXVikguTPL6770zynqp6R5IXJLmkqrYmeUuSV3T3X2923QDA7Dv9jDNz4NZb1v1893cHGIZZOkf3nCT3dve+ibGbkjxl/PH3J/m2JC+vqpcneV13v3VyB1W1K8muJNm2bVv27t177Ktep/37/RI9ZJb7tFb6Okz6Olx6OzwHbr3lqO/vPqTfS0Pi+3W49HaYpt3XWQq6JyW5fcXYcpKTk6S7r87osOUH1d27k+xOkh07dvT27duPQZlstKH1aWjvhxF9HS69ZSX/JmaX3gyX3g7TNPs6S7cXujPJKSvGTklyx1p2UlU7q2r38vLyhhUGAADA/JiloLsvydaqOnti7NwkN69lJ929p7t3LS0tbWhxABxbp59xZqpq3dvpZ5w57bcAAMyITT90eXxRqa1JtiTZUlXHZ3Ru7sGquibJK6vqRRlddfnZSZ6w2TUCsPk24tzKRedCTAAwMo1zdC9N8tMTnz8/ySuSXJbkpUnekOS2JJ9O8pJDtxZararamWTnWWedtSHFwrHmf0yBjeKPBQAwMo3bC12WUag93NxnkjznKPe/J8meHTt2vPho9gObxf+YAgDAxpqlc3QBAADYAEd77Yt5v/7FLN1eaEM4dBkAAFh0R3vUYDLfRw4ObkXXVZcBAAAW2+CCLgAAAIttcEG3qnZW1e7l5eVplwIAAMAUDC7oOnQZAABgsQ0u6AIAALDYBF0AAAAGRdAFAABgUAYXdF2MCgAAYLENLui6GBUAAMBiG1zQBQAAYLEJugAAAAyKoAsAAMCgDC7ouhgVAADAYhtc0HUxKgAAgMU2uKALAADAYhN0AQAAGBRBFwAAgEERdJmq47Y8PFW17u30M86c9lsAAABmzNZpF7DRqmpnkp1nnXXWtEthFe6/755ccMmN637+dVecv4HVAAAAQzC4FV1XXWbRWBUHAIAHGtyKLiwaq+IAAPBAg1vRBQAAYLEJugAAAAyKoAsAAMCgCLoAAAAMiqALAADAoAi6AAAADMrggm5V7ayq3cvLy9MuBQAAgCkYXNDt7j3dvWtpaWnapQAAADAFgwu6AAAALDZBFwAAgEERdAEAABgUQRcAAIBBEXSBo3b6GWemqta9nX7GmdN+CwAADMjWaRcAzL8Dt96SCy65cd3Pv+6K8zewGgAAFp0VXQAAAAZF0AUAAGBQ5iLoVtVSVb2vqu6sqsdPux4AAABm11wE3SSfS/LdSd4+7UIAAACYbXMRdLv7C
939yWnXAQAAwOzb1KBbVRdX1Q1VdXdVXbVi7tSquraqDlbV/qq6aDNrAwAAYBg2+/ZCB5JcnuSZSU5YMXdlknuSnJbkvCTvrKqbuvvmzS0R1ub0M87MgVtvmXYZ63bcloenqqZdBgAAbJhNDbrdfU2SVNWOJGccGq+qE5NcmOTx3X1nkvdU1TuSvCDJJZtZI6zVvN9D9v777jmq+pPpvwcAAJi02Su6D+acJPd2976JsZuSPOXQJ1X12xmt9D62qn65u69auZOq2pVkV5Js27Yte/fuPaZFH439+/dPu4TBmOU+s3qz3Effr/Njrf+O9PbLzfL34mbxNZhNvl+HS29n23p/Jk67r7MSdE9KcvuKseUkJx/6pLu/60g76e7dSXYnyY4dO3r79u0bWSMzSp+HYdb7OOv1MbKePuntA/l6+BrMMr0ZLr2dXUfTm2n2dVauunxnklNWjJ2S5I4p1AIAAMAcm5Wguy/J1qo6e2Ls3CRrvhBVVe2sqt3Ly8sbVhwAAADzY7NvL7S1qo5PsiXJlqo6vqq2dvfBJNckeWVVnVhVT0zy7CRXr/U1untPd+9aWlra2OIBAACYC5u9ontpkrsyupLy88cfXzqee2lGtxy6Lcmbk7xkPbcWsqILAACw2DY16Hb3Zd1dK7bLxnOf6e7ndPeJ3f3o7n7TOl/Dii4AAMACm5VzdAEAAGBDDC7oOnQZ5s9xWx6eqlr3dvoZZ077LQAAMENm5T66G6a79yTZs2PHjhdPuxZgde6/755ccMmN637+dVecv4HVAAAw7wa3ogsAAMBiE3QBAAAYlMEFXefoAgAALLbBBV23FwIAAFhsgwu6AAAALDZBFwAAgEEZXNB1ji4AAMBiG1zQdY4uAADAYhtc0AUAAGCxCboAAAAMiqALAADAoAwu6LoYFQAAMG2nn3Fmqmrd2+lnnDnttzDXtk67gI3W3XuS7NmxY8eLp10LAACwmA7ceksuuOTGdT//uivO38BqFs/gVnQBAABYbIIuAAAAgyLoAgAAMCiCLgAAAIMi6AIAADAogwu6bi+0WI7b8nCXbQdgZridyGzQB8DthZhr9993j8u2AzAz3E5kNugDMLgVXQAAABaboAsAAMCgCLoAAAAMiqALAADAoAi6AAAADIqgCwAAwKAIugAAAAzK4IJuVe2sqt3Ly8vTLgUAAIApGFzQ7e493b1raWlp2qUAAAAwBYMLugAAACw2QRcAAIBBEXQBAAAYFEEXAACAQRF0AQAAGBRBFwAAgEERdAEAABgUQRcAAIBBmZugW1Wvqqp3V9XVVfWwadcDAADAbJqLoFtV5yY5vbuflORDSZ475ZIAAACYUXMRdJM8Icnvjj++LskTp1gLAAAAM2xTg25VXVxVN1TV3VV11Yq5U6vq2qo6WFX7q+qiielHJrl9/PFyklM3qWQAAADmzNZNfr0DSS5P8swkJ6yYuzLJPUlOS3JekndW1U3dfXOSzyY5Zfy4pSSf2ZxyAQAAmDebuqLb3dd0928m+fTkeFWdmOTCJC/v7ju7+z1J3pHkBeOHvDfJ08cfPzPJn2xSyQAAAMyZzV7RfTDnJLm3u/dNjN2U5ClJ0t1/XlWfqKp3J7klyasPt5Oq2pVkV5Js27Yte/fuPbZVH4X9+/dPuwTGZvnfCatz3JaHp6rW/fxHfc22/M5v73nQed+v82Ot3896++X8TJz+12Darz+rNvv7VR82j5/FD24W/h2ut4Zp93VWgu5J+dI5uIcsJzn50Cfd/e+PtJPu3p1kd5Ls2LGjt2/fvpE1MlD+ncy/+++7JxdccuO6n3/dFecf8d+BfyfzYT190tsH8vWY/tdg2q8/yzbza6MPm8vX+/Bm4etyNDVMs/5ZuerynfnSObiHnJLkjrXuqKp2VtXu5eXlDSkMAACA+TIrQXdfkq1VdfbE2LlJbl7rjrp7T3fvWlpa2rDiAAAAmB+bfXuhrVV1fJItSbZU1fFVtbW7Dya5Jskrq+rEq
npikmcnuXoz6wMAAGD+bfaK7qVJ7kpySZLnjz++dDz30oxuOXRbkjcnecn41kJr4tBlAACAxbbZtxe6rLtrxXbZeO4z3f2c7j6xux/d3W9a52s4dBkAAGCBzco5ugAAALAhBhd0HboMAACw2AYXdB26DAAAsNgGF3QBAABYbIIuAAAAgzK4oOscXQAAgMU2uKDrHF0AAIDFNrigCwAAwGKr7p52DcdEVX0yyf5p1/EQvjrJp6ZdBBtOX4dJX4dLb4dJX4dJX4dLb4dpM/p6Znc/6nATgw26s66qbujuHdOug42lr8Okr8Olt8Okr8Okr8Olt8M07b46dBkAAIBBEXQBAAAYFEF3enZPuwCOCX0dJn0dLr0dJn0dJn0dLr0dpqn21Tm6AAAADIoVXQAAAAZF0AUAAGBQBN1NVlWnVtW1VXWwqvZX1UXTrokjq6qLq+qGqrq7qq5aMfe0qvpQVX2uqv6oqs6cmHtEVb2hqm6vqo9X1Y9tevE8qHF/Xj/+Xryjqv68qp41Ma+3c6qq3lhVHxv3Z19VvWhiTl/nXFWdXVWfr6o3ToxdNP5ePlhVv1lVp07M+d0746rq+nFP7xxvfz0xp7dzrKqeV1V/Ne7Rh6vqSeNxP4vn1MT36aHtvqp67cT8TPRW0N18Vya5J8lpSX4gyeuq6nHTLYlVOJDk8iRvmBysqq9Ock2Slyc5NckNSd468ZDLkpyd5Mwk35nkx6vqgk2ol9XZmuQfkjwlyVKSS5P8RlU9Rm/n3s8neUx3n5Lke5JcXlXb9XUwrkzy/kOfjH+P/nKSF2T0+/VzSX5xxeP97p19F3f3SePtsYnezruqekaSVyX54SQnJ3lykr/zs3i+TXyfnpTka5PcleRtyWz9v7GLUW2iqjoxyT8meXx37xuPXZ3k1u6+ZKrFsSpVdXmSM7r7hePPdyV5YXc/Yfz5iUk+leT87v5QVR0Yz//ueP5nkpzd3c+byhvgiKrqL5K8IslXRW8Hoaoem+T6JP8myVdGX+daVT0vyb9M8sEkZ3X386vq5zL6w8ZF48d8Y5K/yuj7+P743Tvzqur6JG/s7l9dMa63c6yq3pvk9d39+hXj/v9pIKrqh5L8dJJv7O6epd5a0d1c5yS599AP47GbkvjL4/x6XEY9TJJ098EkH07yuKp6ZJJtk/PR75lWVadl9H16c/R27lXVL1bV55J8KMnHkvx29HWuVdUpSV6ZZOWhbiv7+uGMVvnOid+98+Tnq+pTVfUnVfXU8Zjezqmq2pJkR5JHVdXfVtVHq+o/V9UJ8bN4SH4oya/3l1ZPZ6a3gu7mOinJ7SvGljM6lIP5dFJGPZx0qKcnTXy+co4ZU1UPS/Jfk/xad38oejv3uvulGfXkSRkdRnV39HXe/UxGq0MfXTF+pL763Tv7fiLJNyQ5PaN7b+4Zr97q7fw6LcnDkjw3o5/D5yU5P6PThPwsHoDxubdPSfJrE8Mz01tBd3PdmeSUFWOnJLljCrWwMR6qp3dOfL5yjhlSVccluTqjVYKLx8N6OwDdfV93vyfJGUleEn2dW1V1XpKnJ/mPh5k+Ul/97p1x3f1n3X1Hd9/d3b+W5E+SfFf0dp7dNf7va7v7Y939qST/Iavra+Jn8Tx4QZL3dPffT4zNTG8F3c21L8nWqjp7YuzcjA6TZD7dnFEPk3zxPIRvTHJzd/9jRodLnjvxeP2eMVVVSV6f0V+eL+zuL4yn9HZYtmbcv+jrvHpqksckuaWqPp7k3yW5sKo+kC/v6zckeURGv3f97p1PnaSit3Nr/DP1oxn18ovD4//6WTwMP5gHruYms9Tb7rZt4pbkLQmegvEAAAlASURBVEnenOTEJE/MaLn+cdOuy3bEvm1NcnxGV3K9evzx1iSPGvfwwvHYq5L86cTzrkjyriSPTPJN42/uC6b9fmwP6O0vJfnTJCetGNfbOd2SfE2S52V0iNSWJM9McjCjqy/r65xuSb4io6t7HtpeneTt454+LqNDW
J80/v36xiRvmXiu370zvGV0kbhnTvxu/YHx9+w5ejvfW0bn1L9//HP5kUnendEpCH4Wz/mW5Anj79OTV4zPTG+n/kVatC2jy2z/5vgfxi1JLpp2TbZV9e2yjP4KObldNp57ekYXu7kroyu7PmbieY/I6JZEtyf5RJIfm/Z7sT2gr2eOe/n5jA6nObT9gN7O7zb+JfuuJJ8d9+d/JHnxxLy+DmAb/1x+48TnF41/rx5M8ltJTp2Y87t3hrfx9+z7Mzp88bMZ/fHxGXo7/1tG5+j+4rivH0/ymiTHj+f8LJ7jLaPbfl39IHMz0Vu3FwIAAGBQnKMLAADAoAi6AAAADIqgCwAAwKAIugAAAAyKoAsAAMCgCLoAAAAMiqALAMyNqnpMVXVV7Zh2LQDMLkEXgLlQVVeNA05X1Req6raq+qOq+tGqetiKx14/ftwLVoy/sKruXDH2oqq6sarurKrlqvqLqrp8lTX9WFXdV1U/e/TvcHaMv9b/nzoAmFeCLgDz5PeTbEvymCT/IsmeJK9I8u6qOnHFYz+f5Geq6hEPtrOq+pEkr0nyS0nOS/LPkvxMkq9YZT3/KskVSV5YVVtW/zZmQ1U9fNo1AMCxIOgCME/u7u6Pd/et3f3n3f0fkjw1yT9N8uMrHvvWJCck+dGH2N/3JLmmu3+5u/+2u/+qu9/W3T92pEKq6tuTfHWSy5LcleRZK+ZfOF4lflpV/WVVHRyvQP9PE4/5+qr6rar6TFV9rqo+VFXPG8+9pap+aeKxl49Xqf/ZxNg/VNXzJz7/4ar6YFV9vqr2VdX/UVXHTcz3eAX8mqo6mOTnjvQ+H+S9L1XV7vGq+h1V9a7JQ4lX897Hj/vJqvrE+LG/XlU/XVUfGc9dluSHknz3xEr+UyeefmZV/d746/bBqnrGxH4fVlWvqaoDVXX3+Ot0xXreKwDzSdAFYK51918muS7JhSum7sxotfenquorH+TpH0/yrVX1Det46RcleUt3fyHJG8efr/SIJD+Z5EeSfHuSr8xo9fiQX8xo9fg7kzwuyb9N8tnx3PUZhfhDnprkU4fGquqsJGeMH5eqenFGwfX/TvJPkvyfSX4iyUtX1PTTSX47yf+c5MpVv9uxqqok70xyepL/Ncn5Sf44yR9W1baJhz7kex8H+p9O8lMZ/aHir5JM/oHh1Ul+I19axd+W5L0T8z+b0Wr8uUnen+QtVXXSeO5lSb43yfOSnJ3kf0/y12t9rwDML0EXgCH4YJLDhdXdST6d5JIHed4rxvMfrqq/qao3VtUPrjznd6VxoPq+JFePh65O8l1V9bUrHro1yY929/u6+y8yCm9PHYfFJDkzyXu6+6bu/vvuvq67rxvPXZ/ksVW1raq+Isn/Mn7+d47nn5rkw9390fHnL0/y49399vG+9mR0WPXKoPvW7v7V7v677v77h3qfD+I7MzrM+7nj9/W33f3yJH+XZPKc6CO993+T5KpxLfu6++eT/NmhJ3f3nRmtlB9axf94d98zsf//2N17uvtvkvxfSU4d15WMvq77kry7u2/p7vd2939Zx3sFYE4JugAMQSXplYPdfW9GK4Yvq6rTDzP/se7+9oxWN//TeD+/nOR943D5YJ6X5KPdfcN4Px/OaFXxh1Y87u7unlxJPJDk4UkeOf78F5JcWlX/fXxo8vaJ2j6U0YrzU5M8IcmHMzoc+4njIP7UfGk191FJvj7JL48PA75zfNGtK5J844qabniI97Ua2zNahf7kitd6/IrXOtJ7/6Yk71ux7z/L6v3Fin0nydeM/3tVRqF3X1VdWVXfPXkINwDDt3XaBQDABvjmjFYUv0x3v62q/l2SVyZ594M85i+T/GWSK6vqO8aP+76MAtPhvCij1dZ7J8aOS/KoJK+aGLs3D9QTj013v76q/luS70ry9CTvraqf7+7Lxo97V0YrqLcl+aPu/khVfSqj1d2nZHRo8Bf3l+Rf54GH9x7OwSPMH8lxST6R5EmHmbt94uOHfO8b4
Atf3HF3jxeKD31dP1BVj0nyzCRPS/JrSW6qqmd09/0b9PoAzDBBF4C5VlWPT3JBkoe6JdCPJ/mDJJ9ZxS4/OP7vSYebrKrHJfm2JM/IaMX1kBOS/ElVPbm7/3gVr5MkGR96vDvJ7qr6iYwO6b1sPH19RufafiKj1d9DYy/OxPm53f2JqjqQ5Bu7+9dX+9rr9IEkpyW5v7sP+8eFVfpQRoH9DRNj37riMfckWdfVrLv7jiRvT/L2qroqyZ8mOSujQ5oBGDhBF4B58ojxebCHVk+fltH5mXszOgf0sLr7XVV1XZKLk9x3aLyqXpfRYa9/mOSjGV3w6NIkn0vyuw+yuxclubG7f3/lRFX9wXh+VUG3qn4hye9kFL5OySiwf3DiIdcneV1G55xePzH2K3ng+bnJ6MJOr62qz2Z0samHZXSRp9PH57+u1SlVdd6Ksc9mdHGoP0nyW1X14xkF1q8d1/773X3YVfPD+IUk/6Wq3p/RCvr3ZvQHhH+ceMxHkjyrqh6b0bnUy6vZcVX9WJKPJfnzjFZ+L8potfmjD/U8AIbD+SoAzJOnZxRgbslohfZ7Mlr9fHJ3H+mQ3EsyOkd00u9lFK5+I6Owee14/Bnd/WUrfzW67+zzM1opPJy3JXluVS0d8Z2MHJfktRmF29/LaOX2i+f5Tpynu6+7Pzkevj6jP1RfP7mj7v7VjK5w/IIkN2UUHnclWc8Fp5LRock3rthe3d2d0aHWf5hR4P7rjL5+j82XzpU9ou5+S0b3LL5ivO/HZ3RV5s9PPOxXMroa8w1JPpnkiavc/R1J/n1G5wB/IKPzdZ/V3Z9bbX0AzLca/b4CAJiuqro2ydbu3jntWgCYbw5dBgA23fiq1i/J6B7I92Z0H+Rn58vvhwwAa2ZFFwDYdFV1QpI9Sc7P6EJef5PkVd39pqkWBsAgCLoAAAAMiotRAQAAMCiCLgAAAIMi6AIAADAogi4AAACDIugCAAAwKIIuAAAAg/L/AxY/dl7AO4bKAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Show a histogram of the DNS answer lengths in the Spark DataFrame\n", "bins, counts = spark_df.select('answer_length').rdd.flatMap(lambda x: x).histogram(50)\n", "\n", "# RDD.histogram() returns (bucket edges, counts), so replay the counts as weights on each bucket's left edge\n", "plt.hist(bins[:-1], bins=bins, weights=counts, log=True)\n", "plt.grid(True)\n", "plt.xlabel('DNS Answer Lengths')\n", "plt.ylabel('Counts')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
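To make the `RDD.histogram(50)` call in the cell above a bit less opaque, here is a minimal pure-Python sketch of the equal-width bucketing it performs. The range between the smallest and largest value is split into evenly spaced buckets that are open on the right, except for the last bucket, which also includes the maximum. `equal_width_histogram` is a hypothetical helper for illustration, not part of Spark or ZAT, and it ignores extra edge cases (such as NaN values) that Spark handles.

```python
from typing import List, Sequence, Tuple


def equal_width_histogram(values: Sequence[float],
                          num_buckets: int) -> Tuple[List[float], List[int]]:
    # Hypothetical helper: returns (bucket_edges, counts) like RDD.histogram(n),
    # with len(bucket_edges) == len(counts) + 1
    if num_buckets < 1:
        raise ValueError('num_buckets must be at least 1')
    low, high = min(values), max(values)
    if low == high:
        # Degenerate range: everything lands in a single bucket [low, high]
        return [low, high], [len(values)]
    width = (high - low) / num_buckets
    edges = [low + i * width for i in range(num_buckets + 1)]
    edges[-1] = float(high)  # pin the last edge to the maximum exactly
    counts = [0] * num_buckets
    for value in values:
        # Buckets are [edge_i, edge_i+1); the last bucket is closed on the right
        index = min(int((value - low) / width), num_buckets - 1)
        counts[index] += 1
    return edges, counts


edges, counts = equal_width_histogram([1, 1, 2, 5, 9, 10], num_buckets=3)
print(edges)   # [1.0, 4.0, 7.0, 10.0]
print(counts)  # [3, 1, 2]
```

The edges list has one more entry than the counts list. That is why the plotting cell passes `bins[:-1]` (the left edges) as the data and the counts as weights.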
\n", "\n", "# Cleanup\n", "**Note:** This cleanup code is no longer needed because the ZAT `log_to_sparkdf` module now handles both steps for us. :)\n", "\n", "Without that help, there are two bits of cleanup that we MUST do:\n", "- Remove '.' from the column names (see the note below)\n", "- Drop NULLs\n", "\n", "**Note:** Yes, you can use backticks when selecting column names, BUT some of the pipeline operations below will FAIL internally if the column names have a '.' in them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
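If you are on an older ZAT version that does not do this cleanup for you, the two steps could look something like this minimal sketch. It assumes `df` is a PySpark DataFrame. Only its `columns` attribute and its `toDF()` and `dropna()` methods are used, so the sketch itself does not import pyspark. The helper names and the choice of '_' as the replacement character are our own, hypothetical choices.

```python
from typing import List


def sanitize_column_names(columns: List[str]) -> List[str]:
    # Zeek field names such as 'id.orig_h' become 'id_orig_h', so no
    # pipeline stage ever sees a '.' in a column name
    return [name.replace('.', '_') for name in columns]


def clean_spark_df(df):
    # Hypothetical helper: rename every column, then drop incomplete rows.
    # DataFrame.toDF(*cols) returns a new DataFrame with the given column names,
    # and dropna() with no arguments drops rows where any value is NULL or NaN
    renamed = df.toDF(*sanitize_column_names(df.columns))
    return renamed.dropna()


print(sanitize_column_names(['ts', 'id.orig_h', 'id.resp_p', 'qtype_name']))
# ['ts', 'id_orig_h', 'id_resp_p', 'qtype_name']
```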
\n", "\n", "# Spark Pipelines\n", "A Spark pipeline combines a sequence of data transformations and learning algorithms into a single workflow. Calling `fit()` on a pipeline runs its stages in order and returns a `PipelineModel`, which applies the same sequence of steps to a DataFrame with `transform()`.\n", "\n", "Our pipeline below consists of the following stages:\n", "- **StringIndexer:** Maps each unique string in our categorical columns to a numeric index\n", "- **OneHotEncoder:** Maps each string index to a sparse binary (one-hot) vector\n", "- **StandardScaler:** Packs the numeric columns into one vector (with a VectorAssembler) and scales it to unit standard deviation; by default it does not subtract the mean, so the values are not confined to a 0-1 range\n", "- **VectorAssembler:** Combines the one-hot encoded categorical vectors and the scaled numeric vector into a single `features` vector column\n", "\n", "\n", "For more details on one-hot encoding categorical types, see our SCP Labs [Encoding Dangers](https://nbviewer.jupyter.org/github/SuperCowPowers/scp-labs/blob/main/notebooks/Categorical_Encoding_Dangers.ipynb) notebook." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml import Pipeline\n", "from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler, StandardScaler\n", "\n", "cat_columns = ['qtype_name', 'proto']\n", "num_columns = ['query_length', 'answer_length']\n", "features = cat_columns + num_columns\n", "stages = []\n", "\n", "# String Indexer + One Hot Encoder (for categorical columns)\n", "for cat_col in cat_columns:\n", " string_indexer = StringIndexer(inputCol=cat_col, outputCol=cat_col + '_index')\n", " encoder = OneHotEncoder(inputCol=cat_col + '_index', outputCol=cat_col + '_onehot')\n", " stages += [string_indexer, encoder]\n", "\n", "# Assemble the numeric columns into one vector, then scale it with StandardScaler\n", "num_vector = VectorAssembler(inputCols=num_columns, outputCol='num_features')\n", "norm = StandardScaler(inputCol='num_features', outputCol='num_features_norm')\n", "stages += [num_vector, norm]\n", "\n", "# Assemble the categorical (one hot vectors) and numeric columns together\n", "assembler_inputs = [c + \"_onehot\" for c in 
cat_columns] + ['num_features_norm']\n", "assembler = VectorAssembler(inputCols=assembler_inputs, outputCol='features')\n", "stages += [assembler]\n", "\n", "# Run the pipeline\n", "pipeline = Pipeline(stages=stages)\n", "pipelineModel = pipeline.fit(spark_df)\n", "spark_df = pipelineModel.transform(spark_df)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+------------------------------------------------------------------+\n", "|features |\n", "+------------------------------------------------------------------+\n", "|(16,[0,13,14,15],[1.0,1.0,2.280420188456751,0.07560960809619262]) |\n", "|(16,[0,13,14,15],[1.0,1.0,1.1858184979975104,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.4594689206123206,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,3.0101546487629114,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.3682521130740506,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,3.0101546487629114,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.4594689206123206,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,2.280420188456751,0.07560960809619262]) |\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", "|(16,[0,13,14,15],[1.0,1.0,1.9155529583036708,0.07560960809619262])|\n", 
"|(16,[3,13,14,15],[1.0,1.0,2.462853803533291,0.07560960809619262]) |\n", "|(16,[2,13,14,15],[1.0,1.0,1.5506857281505906,0.07560960809619262])|\n", "+------------------------------------------------------------------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "spark_df.select('features').show(truncate = False)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "from pyspark.ml.clustering import KMeans\n", "\n", "# Train a k-means model.\n", "kmeans = KMeans().setK(40)\n", "model = kmeans.fit(spark_df)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Within Set Sum of Squared Errors = 6885.514765318761\n" ] } ], "source": [ "# Evaluate clustering by computing Within Set Sum of Squared Errors.\n", "wssse = model.computeCost(spark_df)\n", "print(\"Within Set Sum of Squared Errors = \" + str(wssse))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-----+----------+-----+\n", "|qtype_name|proto|prediction|count|\n", "+----------+-----+----------+-----+\n", "| TXT| tcp| 0| 135|\n", "| TXT| udp| 0|11149|\n", "| PTR| udp| 1|11809|\n", "| AXFR| tcp| 2| 87|\n", "| AXFR| tcp| 3| 78|\n", "| A| udp| 4|28609|\n", "| PTR| udp| 5|40370|\n", "| PTR| tcp| 5| 25|\n", "| A| udp| 6|55667|\n", "| NB| udp| 7|11374|\n", "| NB| udp| 8|20059|\n", "| AAAA| udp| 9| 2|\n", "| -| udp| 9| 180|\n", "| -| tcp| 9| 5|\n", "| AAAA| udp| 10| 9241|\n", "| *| udp| 11| 144|\n", "| AAAA| udp| 11| 71|\n", "| SRV| udp| 11|10419|\n", "| NAPTR| udp| 12| 27|\n", "| A| udp| 12|25369|\n", "| MX| udp| 12| 163|\n", "| NB| udp| 13|15787|\n", "| AAAA| udp| 14| 6062|\n", "| AXFR| tcp| 15| 68|\n", "| -| udp| 15| 37|\n", "| PTR| udp| 15| 48|\n", "| *| udp| 16| 52|\n", "| A| udp| 16| 6059|\n", "| PTR| udp| 17| 100|\n", "| TXT| udp| 17| 12|\n", "| PTR| tcp| 17| 1|\n", "| AXFR| 
tcp| 18| 107|\n", "| SOA| udp| 19| 31|\n", "| NB| udp| 19| 1920|\n", "| MX| udp| 20| 6|\n", "| AXFR| tcp| 20| 24|\n", "| *| udp| 20| 652|\n", "| HINFO| udp| 20| 30|\n", "| AAAA| udp| 21|14022|\n", "| AAAA| udp| 22| 3764|\n", "| -| tcp| 23| 163|\n", "| -| udp| 23| 3255|\n", "| SRV| udp| 24| 727|\n", "| A| udp| 25| 614|\n", "| *| udp| 25| 31|\n", "| AAAA| udp| 25| 38|\n", "| A| udp| 26| 5429|\n", "| TXT| udp| 27| 1445|\n", "| A| udp| 28|13479|\n", "| A| udp| 29|50330|\n", "+----------+-----+----------+-----+\n", "only showing top 50 rows\n", "\n" ] } ], "source": [ "# Let's look at some of the clustering results\n", "transformed = model.transform(spark_df).select(features + ['prediction'])\n", "transformed.groupby(cat_columns + ['prediction']).count().sort('prediction').show(50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
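The Within Set Sum of Squared Errors printed earlier is the sum, over every row, of the squared Euclidean distance between its `features` vector and the nearest of the k cluster centers. Here is a minimal pure-Python sketch of that formula on plain lists. `wssse` is a hypothetical helper for illustration, not how Spark computes the metric internally.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wssse(points: List[Sequence[float]], centers: List[Sequence[float]]) -> float:
    # Each point contributes its squared distance to the closest center
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
centers = [[0.5, 0.0], [10.5, 10.0]]
print(wssse(points, centers))  # 1.0
```

This cost generally keeps shrinking as k grows. It is therefore more useful for comparing runs with different values of k (for example, looking for an elbow) than as an absolute measure of cluster quality.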
\n", "\n", "# The clusters seem to look okay\n", "We can see some natural grouping around the different qtype_names and protocols, but we also see that many of the query types/protocols appear in several clusters... so let's take a closer look at the 'TXT' queries (Note: replace 'TXT' with any other type and feel free to explore the other 'sub-clusters')" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----------+-----+------------+-------------+----------+-----+\n", "|qtype_name|proto|query_length|answer_length|prediction|count|\n", "+----------+-----+------------+-------------+----------+-----+\n", "| TXT| udp| 12| 1| 0| 488|\n", "| TXT| udp| 9| 1| 0| 21|\n", "| TXT| udp| 12| 12| 0| 1|\n", "| TXT| tcp| 12| 5| 0| 106|\n", "| TXT| udp| 23| 1| 0| 12|\n", "| TXT| udp| 22| 1| 0| 62|\n", "| TXT| udp| 14| 17| 0| 1|\n", "| TXT| tcp| 12| 1| 0| 24|\n", "| TXT| udp| 12| 5| 0| 214|\n", "| TXT| udp| 14| 1| 0|10305|\n", "| TXT| udp| 13| 1| 0| 31|\n", "| TXT| udp| 24| 1| 0| 1|\n", "| TXT| tcp| 12| 12| 0| 5|\n", "| TXT| udp| 13| 6| 0| 13|\n", "| TXT| udp| 29| 32| 17| 6|\n", "| TXT| udp| 36| 33| 17| 6|\n", "| TXT| udp| 36| 1| 27| 9|\n", "| TXT| udp| 33| 1| 27| 2|\n", "| TXT| udp| 36| 22| 27| 2|\n", "| TXT| udp| 40| 1| 27| 347|\n", "| TXT| udp| 39| 1| 27| 98|\n", "| TXT| udp| 41| 1| 27| 436|\n", "| TXT| udp| 35| 1| 27| 16|\n", "| TXT| udp| 42| 1| 27| 532|\n", "| TXT| udp| 34| 1| 27| 3|\n", "| TXT| udp| 72| 1| 34| 4|\n", "| TXT| udp| 64| 1| 34| 2|\n", "| TXT| udp| 67| 1| 34| 2|\n", "| TXT| udp| 82| 11| 34| 6|\n", "| TXT| udp| 83| 1| 34| 2|\n", "| TXT| udp| 12| 33| 37| 22|\n", "| TXT| tcp| 12| 33| 37| 91|\n", "+----------+-----+------------+-------------+----------+-----+\n", "\n" ] } ], "source": [ "# Let's look at the 'TXT' qtype_name clusters\n", "txt_queries = transformed.where(transformed['qtype_name'] == 'TXT').groupby(features + ['prediction']).\\\n", 
"count().sort('prediction')\n", "txt_queries.show(50)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7MAAAF6CAYAAADLZg86AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzdf3TV9Z3v++eHJGTvJBuSQAgQyw4QSbii2JjaUyqTgGiwOHqxlUsHuVItlRHR9mgPLpwZMt6D9HjxrmWBUdAuYWKpRxzldKLHjqWwkWqnDeFHdWCHaLKZxhiJ+SHJDrCTfO4fX9gSSSA77CQkvB5rfRfZ38/n/f2+E5bgm88vY61FREREREREZDAZNtAJiIiIiIiIiERKxayIiIiIiIgMOipmRUREREREZNBRMSsiIiIiIiKDjopZERERERERGXRiBzqBSzF69GibmZk50GmIiIiIiIhIH9i3b1+dtTatq7ZBXcxmZmZSWlo60GmIiIiIiIhIHzDGBLpr69dpxsaYl40xNcaYL4wx5caYH565n2mMscaY5nOuv+/P3ERERERERGTw6O+R2bXA/dbaU8aYHGC3MWY/8PmZ9mRrbVs/5yQiIiIiIiKDTL+OzFprP7TWnjr78cw1uT9zEBERERERkcGv33czNsb8kzEmCBwBaoC3zmkOGGP+Yox5yRgzur9zExERERERkcGh34tZa+2DgAeYCbwOnALqgG8AXuCGM+2/7CreGPMjY0ypMab0+PHj/ZO0iIiIiIiIXFaMtXbgXm7M88B/WGt//pX7Y3FGbUdYa090F5+Xl2e1m7GIiIiIyNDwxRdf8NlnnxEKhQY6FekHcXFxjBkzhhEjRnTbxxizz1qb11XbQB/NE0vXa2bPVtj9PnIsIiIiIiL974svvqC2tpaMjAzcbjfGmIFOSfqQtZbW1laqq6sBLljQdqffikVjzBhjzEJjTJIxJsYYUwh8H9hpjPmmMSbbGDPMGDMK+Dmw21rb1F/5iYiIiIjIwPnss8/IyMggISFBhewVwBhDQkICGRkZfPbZZ716Rn+OfFrgb4G/AA3AOuDH1tpfA5OAt4ETwAc462i/34+5iYiIiIjIAAqFQrjd7oFOQ/qZ2+3u9bTyfptmbK09DuR30/Yr4Ff9lYuIiIiIiFx+NCJ75bmU3/OBXjMrIiIiIiJyaayFQAB8PvD7obUV3G7Izob8fPB6QYXykKNiVkREREREBq+KCigudorZuDhITgaXC0Ih2LsXdu1yitnFiyEra6CzlSjSbsEiIiIiIjI4lZXBmjXQ0OAUrBkZkJgI8fHOrxkZzv2GBqdfWdmApZqZmcm6desG7P1DkYrZoaijAz7bC3/4IfxmBvzvXOfXP/zQud/RMdAZikg/s9ZS1VjF1gNbWbVzFT/5zU9YtXMVWw9spaqxioE8c1xERKRXKipg/XpIS4PU1O6nERvjtKelwYYNTlyU1dbW8sgjjzB58mTi4+PJyMjgtttu46233or6u85asmQJt99+e589vzs+n48bbrgBl8vFpEmTeP755/s9h7M0zXio+XQn/PlJaK0GEwuxHjDDoSMEx/fCZ7vBnQHX/gOMvXmgsxWRflBRX0HxwWICTQHihsWR7ErGFeMi1B5i77G97KrahXekl8XTF5OVqulXIiIyCFjrTC32eCAhoWcxCQmQlOTEFRVFbQ1tVVUV3/72t/F4PKxdu5bp06fT0dHBzp07WbZsGceOHYvKe/pKW1sbMTExPdqIqbKyku985zvcd999vPzyy+zdu5cHH3yQtLQ0vvvd7/ZDtp1pZHYoqfwl/HEZnG4
E13hwjYXYRIhxOb+6xjr3Tzc6/Sp/OdAZi0gfK6spY827a2g42YB3pJeMERkkDk8kPjaexOGJZIzIwDvSS8PJBta8u4aymoGbfiUiItJjgYBzpaREFpeS8mVslDz44IMAlJaWsmDBArKzs5k6dSoPPfQQhw4d6jbOGMNrr73W6d5XpyJv2rSJKVOm4HK5GD16NIWFhbS1tVFUVMTWrVt58803McZgjGH37t0AVFdXs3DhQlJSUkhJSWHevHkcPXo0/MyioiKmTZvGli1bwiPJLS0tPfpen3/+ecaPH8/69euZOnUqS5cu5d577x2w6dMqZoeKT3fCn4sgLgWGp4Dp5rfWDHPa41Kc/p/u7M8sRaQfVdRXsP6P60lLSCPVndrtv7gaY0h1p5KWkMaGP26goj76069ERESiyudzNnuKdHTVGCfO54tKGvX19bz99tssX76cpKSk89qTk5N7/ezS0lKWL1/O6tWr8fv97Ny5k7lz5wLw2GOPsWDBAubMmUNNTQ01NTXMmDGDYDDIrFmzcLlc+Hw+3n//fcaNG8ecOXMIBoPhZ1dWVrJt2za2b9/OwYMHcblcbNmyBWMMVVVV3eb0/vvvc+utt3a6V1hYSGlpaa/Pir0UKmaHgo4OZ2pxTALE9nCaRWyC0//PT2oNrcgQZK2l+GAxnuEeEuJ69udCQlwCScOTKD5YrDW0IiJyefP7nV2LeyM5GcrLo5JGRUUF1lqmTp0aleed69ixYyQmJnLHHXfg9XqZPn06P/nJT4iNjSUpKQm32018fDxjx45l7NixDB8+nFdeeQVrLS+99BLXXXcdOTk5bNq0iebmZkpKSsLPPn36NMXFxeTm5jJt2jRiY2MZOXIk2dnZxMXFdZvTp59+Snp6eqd76enptLW1UVdXF/WfwcWomB0K6t5z1sjGjYwsLm6kE1f3Xt/kJSIDJtAUINAUIMUV2fSrFFdKOFZEROSy1doKsb3c/icmxomPgr78x99bbrkFr9fLxIkTWbRoEVu3buXEiRMXjNm3bx+VlZV4PB6SkpJISkpi5MiRNDQ08NFHH4X7XXXVVecVpfPnz+fIkSNkZGT0yffTF1TMDgUfb3E2e+puanF3zDAn7uMtfZGViAwgX5WPuGFxPdrM4VzGGOKGxeGris70KxERkT7hdkNbW+9i29ud+Ci4+uqrMcZw+PDhiGONMecVw+dO1fV4PJSVlfHqq68yYcIE1q5dS05ODp988km3z+zo6OD666/nwIEDna7y8nIeeOCBcL/ExMSI8wUYO3YstbW1ne7V1tYSGxvL6NGje/XMS6Fidiho+g9n1+LeiPVAU+T/8YnI5c3/uZ9kV++mXyW7kin/PDrTr0RERPpEdjY0NvYutrERpkyJShqpqakUFhayYcMGmpubu3hV9zmmpaVRU1MT/lxbW9vpM0BsbCyzZ89m7dq1HDp0iJaWlvB04eHDh9Pe3t6pf25uLhUVFYwePZqsrKxOV2pq6qV8qwB861vf4p133ul075133iEvL++C05P7iorZoaDjJJiY3sWaGOiIzjQLEbl8tLa1Ejusd9OvYobF0NqmPxdEROQylp8PoZBzRE8krHXi8vOjlsrGjRux1pKXl8f27dvx+/0cOXKE5557juuuu67buNmzZ7Nx40ZKS0vZv38/S5YsweVyhdtLSkp49tln2b9/P4FAgG3btnHixInw+tzMzEw++OAD/H4/dXV1hEIhFi1aRHp6OnfeeSc+n4/Kykr27NnDo48+2mlH46688cYb5OTkUF1d3W2fZcuWUV1dzY9//GMOHz7Miy++yJYtW3jsscci/KlFh86ZHQqGuZxzZHvDtsOw6EyzEOlP1loCTQF8VT78n/tpbWvFHesme1Q2+Zn5eEd6I55iO5S4Y92E2kPEEx9xbHtHO+5Y/bkgIiKXMa/XuRoaIJIRx4aGL2OjZNKkSZSVlfHUU0+xcuVKqqurGTVqFNOnT2fz5s3dxj3zzDPcf//9FBQ
UkJ6eztNPP91punJycjI7duzgySefJBgMMnnyZF588UVmzpwJwNKlS9m9ezd5eXk0Nzeza9cuCgoK2LNnD48//jh33303TU1NjB8/nlmzZpFykWOMmpqa8Pv9F9yVeOLEibz11lv85Cc/4bnnnmP8+PH8/Oc/H5AzZgHMYN6xMi8vz5aWlg50GgPvDz+E43udc2QjdfJTSLsJ/suL0c9LpI9U1FdQfLCYQFOAuGFxJLuSiR0WS1tHG40nGwl1hPCO9LJ4+mKyUrMGOt0BsfXAVvYe20vGiMg3caj+opqbJtzEvdff2weZiYiIdO3w4cOR7QpcUQFr1kBaGiT0YOf+YBDq6mDVKsi6Mv//4HJ1od97Y8w+a21eV22aZjwUTFoCtg1shEfs2A4nbtKSvshKpE+U1ZSx5t01NJxswDvSS8aIDBKHJxIfG0/i8EQyRmTgHeml4WQDa95dQ1lN2UCnPCDyM/MJdYQi3mXRWkuoI0R+ZvSmX4mIiPSJrCxYsQKOH4f6+u6nHFvrtNfVOf1VyA4ZKmaHgtEzwJ0BoabI4kJNTtzoGX2Tl0iUVdRXsP6P60lLSCPVndrtNGJjDKnuVNIS0tjwxw1U1Ff0c6YDzzvSGy7qI3H2Hwm8I6M3/UpERKTP5ObCE09ASgoEAlBdDS0tcPKk82t1tXM/JcUZkf361wc6Y4kiFbNDwbBhcO0/QHsQ2oI9i2kLOv2v/QcnXuQyZ62l+GAxnuEeEuJ6MJUISIhLIGl4EsUHi/v0HLjLkTGGxdMXc+L0CYKhnv25EAwFaT7dzOLpi6/o9cYiIjLIZGVBUZFz3XQTDB8Op087v950E/zjPzptGpEdcrQB1FAx9ma4tgj+XAQdpyBuZNfnztoOZ0S2PQjXPunEiQwCgaYAgaZAxCOGKa6UcGxmcmbfJHeZykrNYsWNK1j/x/V4hntIcaV0WaRaa2k42UDz6WZW3Ljiil1nLCIig5gxkJnpXHLF0JDcUDJxEdz4PAxPhpOfOJs7tbVA+0nn15OfOveHJzv9Jn5/oDMW6TFflY+4YXERjxgaY4gbFoevytdHmV3ecsfl8sTMJ8JFffUX1bScbuFk20laTrdQ/UU1gaYAKa4UVs1cxdfHafqViIiIDA4amR1qxt4MY2ZB3Xvw8RZoOuycIzvM7exaPPkHMOpbmlosg47/cz/JruRexSa7kin/vDzKGQ0eWalZFBUUhY8yKv+8PHyU0U0TbqIgs4AJIydoarGIiIgMKipmh6Jhw2DMTc4lMkS0trXiinFdvGMXYobF0NrWGuWMBhdjDJnJmWRenznQqYiIiIhEhYpZERkU3LFuQu0h4omPOLa9ox13rLsPshIREZHLgbU2PAPJ/7k/PAMpe1Q2+Zn5eEd6NQNpCFIxKyKDQvaobPYe20vi8MSIYxtPNnLTBM1UEBERGYoq6isoPlhMoClA3LA4kl3JuGJchNpD7D22l11Vu/CO9LJ4+mJtcjjEaOGkiAwK+Zn5hDpCER+xY60l1BEiPzO/jzITERGRgVJWU8aad9eEz0nPGJFB4vBE4mPjSRyeSMaIjPC562veXUNZTdmA5ZqZmcm6desG7P1DkYpZERkUvCO94b+MInH2L7dIj/QRERGRy1tFfQXr/7ietIQ0Ut2p3U4jNsaQ6k4lLSGNDX/cQEV9RdRzqa2t5ZFHHmHy5MnEx8eTkZHBbbfdxltvvRX1d521ZMkSbr/99j57fldqamr4m7/5G3JycoiJiWHJkiXn9SkoKMAYc951zTXXRD0fFbMiMigYY1g8fTEnTp8gGAr2KCYYCtJ8upnF0xdrnYyIiMgQYq2l+GAxnuEeEuISehSTEJdA0vAkig8WRzzT60KqqqrIzc3lN7/5DWvXruXQoUP89re/Zd68eSxbtixq7+krbW1tPf55nDp1itGjR/P444/zzW9+s8s+r7/+OjU1NeGrqqoKj8fDggULopk2oGJWRAaRrNQsVty4guPB49S31nf7B6+
1lvrWeuqCday4cYXWx4iIiAwxgaZA+Jz0SJw9dz3QFIhaLg8++CAApaWlLFiwgOzsbKZOncpDDz3EoUOHuo0zxvDaa691uvfVqcibNm1iypQpuFwuRo8eTWFhIW1tbRQVFbF161befPPN8Mjn7t27AaiurmbhwoWkpKSQkpLCvHnzOHr0aPiZRUVFTJs2jS1btoRHkltaWnr0vWZmZvLzn/+cJUuWkJqa2mWf1NRUxo4dG7727t1LMBjkvvvu69E7IqFiVkQGldxxuTwx84nwX0bVX1TTcrqFk20naTndQvUX1eG/3FbNXMXXx319oFMWERGRKPNV+YgbFhfxzCtjDHHD4vBV+aKSR319PW+//TbLly8nKSnpvPbk5OReP7u0tJTly5ezevVq/H4/O3fuZO7cuQA89thjLFiwgDlz5oRHQGfMmEEwGGTWrFm4XC58Ph/vv/8+48aNY86cOQSDX85sq6ysZNu2bWzfvp2DBw/icrnYsmULxhiqqqp6nXNXXnjhBebOncvXvva1qD4XtJuxiAxCWalZFBUUhbfgL/+8PLwF/00TbqIgs4AJIydoarGIiMgQ5f/cT7Krd4VisiuZ8s/Lo5JHRUUF1lqmTp0aleed69ixYyQmJnLHHXfg8Xjwer1Mnz4dgKSkJNxuN/Hx8YwdOzYc8/LLL2Ot5aWXXgr/f9CmTZsYM2YMJSUl4am+p0+fpri4mPT09HDsyJEjyc7OJi4uLmrfQ3l5OT6fjx07dkTtmedSMSsig5IxhszkTDKvzxzoVERERKSftba14opx9So2ZlgMrW2tUckjmmtvv+qWW27B6/UyceJECgsLufXWW7nrrrvweDzdxuzbt4/Kysrz+gSDQT766KPw56uuuqpTIQswf/585s+fH9Xv4YUXXmDcuHHMmzcvqs89S9OMRURERERkUHHHumnraOtVbHtHO+5Yd1TyuPrqqzHGcPjw4YhjjTHnFcOhUCj8tcfjoaysjFdffZUJEyawdu1acnJy+OSTT7p9ZkdHB9dffz0HDhzodJWXl/PAAw+E+yUmJkacb6ROnz7N1q1b+cEPfkBsbN+MoaqYFRERERGRQSV7VDaNJxt7Fdt4spEpo6ZEJY/U1FQKCwvZsGEDzc3N57+rsfsc09LSqKmpCX+ura3t9BkgNjaW2bNnh3dJbmlpoaSkBIDhw4fT3t7eqX9ubi4VFRWMHj2arKysTld3Gzb1lR07dlBXV8f999/fZ+9QMSsiIiIiIoNKfmY+oY5QxNN8rbWEOkLkZ+ZHLZeNGzdirSUvL4/t27fj9/s5cuQIzz33HNddd123cbNnz2bjxo2Ulpayf/9+lixZgsv15dTpkpISnn32Wfbv308gEGDbtm2cOHEivD43MzOTDz74AL/fT11dHaFQiEWLFpGens6dd96Jz+ejsrKSPXv28Oijj3ba0bgrb7zxBjk5OVRXV1+w39nR3i+++IL6+noOHDjAf/zHf5zXb/Pmzdx8881MmjTpgs+7FFozKyIiIiIig4p3pBfvSC8NJxtIdfd8xLHhZEM4NlomTZpEWVkZTz31FCtXrqS6uppRo0Yxffp0Nm/e3G3cM888w/33309BQQHp6ek8/fTTnaYrJycns2PHDp588kmCwSCTJ0/mxRdfZObMmQAsXbqU3bt3k5eXR3NzM7t27aKgoIA9e/bw+OOPc/fdd9PU1MT48eOZNWsWKSkXPsaoqakJv9/faapzV77+9c4nRfzrv/4rXq+30y7IH3/8Mb/73e945ZVXLvisS2X6ctHyeS8z5mXgZiAR+BR42lr74pm2m4GNwATg34El1toLHgCVl5dnS0tL+zZpERERERHpc4cPH45oV+CK+grWvLuGtIQ0EuISLto/GApSF6xj1cxVOoP+MnOh33tjzD5rbV5Xbf09zXgtkGmtHQHcAfx3Y8wNxpjRwOvA3wOpQCnwP/s5NxERERERGSSyUrNYceMKjgePU99a3+2UY2st9a311AX
rWHHjChWyQ0i/TjO21n547scz12TgBuBDa+12AGNMEVBnjMmx1h7pzxxFRERERGRwyB2XyxMzn6D4YDGBpgBxw+JIdiUTMyyG9o52Gk82EuoI4R3p5eFvPqxCdojp9zWzxph/ApYAbmA/8BawBjh4to+1tsUY8xFwDXDkK/E/An4EMGHChP5JWkRERERELktZqVkUFRQRaArgq/JR/nk5rW2tuGPd3DThJgoyC5gwcgLGmIFOVaKs34tZa+2DxpgVwLeAAuAUkAQc/0rXJuC8E4GttZuBzeCsme3TZEVERERE5LJnjCEzOZPM6zMHOhXpRwNyNI+1tt1auxe4CvhboBkY8ZVuI4AT/Z2biIiIiIiIXP4G+pzZWJw1sx8C08/eNMYknnNfREREREREpJN+K2aNMWOMMQuNMUnGmBhjTCHwfWAn8AYwzRjzXWOMC/gH4JA2fxIREREREZGu9OeaWYszpfh5nCI6APzYWvtrAGPMd4ENwMs458wu7MfcRERERERkkLIWAgHw+cDvh9ZWcLshOxvy88HrBe3/NPT0WzFrrT0O5F+g/bdATn/lIyIiIiIig19FBRQXO8VsXBwkJ4PLBaEQ7N0Lu3Y5xezixZClk3mGlIFeMysiIiIiItIrZWWwZg00NDgFa0YGJCZCfLzza0aGc7+hwelXVjZwuWZmZrJu3bqBS2AIUjErIiIiIiKDTkUFrF8PaWmQmtr9NGJjnPa0NNiwwYmLttraWh555BEmT55MfHw8GRkZ3Hbbbbz11lvRf9kZS5Ys4fbbb++z53fl9ddf59ZbbyUtLQ2Px8M3v/lNfv3rX3fqU1BQgDHmvOuaa66Jej4qZkVEREREZFCx1pla7PFAQkLPYhISICnJibM2erlUVVWRm5vLb37zG9auXcuhQ4f47W9/y7x581i2bFn0XtRH2trasD38gfh8PmbPns2bb77J/v37+c53vsP8+fN59913w31ef/11ampqwldVVRUej4cFCxZEPXcVsyIiIiIiMqgEAs6VkhJZXErKl7HR8uCDDwJQWlrKggULyM7OZurUqTz00EMcOnSo2zhjDK+99lqne1+dirxp0yamTJmCy+Vi9OjRFBYW0tbWRlFREVu3buXNN98Mj3zu3r0bgOrqahYuXEhKSgopKSnMmzePo0ePhp9ZVFTEtGnT2LJlS3gkuaWlpUff67PPPsvjjz/OjTfeSFZWFqtXr+aGG25gx44d4T6pqamMHTs2fO3du5dgMMh9993Xo3dEQsWsiIiIiIgMKj6fs9lTpDsUG+PE+XzRyaO+vp63336b5cuXk5SUdF57cnJyr59dWlrK8uXLWb16NX6/n507dzJ37lwAHnvsMRYsWMCcOXPCI6AzZswgGAwya9YsXC4XPp+P999/n3HjxjFnzhyCwWD42ZWVlWzbto3t27dz8OBBXC4XW7ZswRhDVVVVRHmeOHGClAv8q8ILL7zA3Llz+drXvtarn8OF9OfRPCIiIiIiIpfM73d2Le6N5GQoL49OHhUVFVhrmTp1anQeeI5jx46RmJjIHXfcgcfjwev1Mn36dACSkpJwu93Ex8czduzYcMzLL7+MtZaXXnoJc6bS37RpE2PGjKGkpCQ81ff06dMUFxeTnp4ejh05ciTZ2dnExcX1OMeNGzfyl7/8hcWLF3fZXl5ejs/n6zRyG00amRURERERkUGltRViezksFxPjxEdDT9ea9sYtt9yC1+tl4sSJLFq0iK1bt3LixIkLxuzbt4/Kyko8Hg9JSUkkJSUxcuRIGhoa+Oijj8L9rrrqqk6FLMD8+fM5cuQIGRkZPcrvX/7lX/jpT3/Ktm3b8Hq9XfZ54YUXGDduHPPmzevRMyOlYlZERERERAYVtxva2noX297uxEfD1VdfjTGGw4cPRxxrjDmvGA6FQuGvPR4PZWVlvPrqq0yYMIG1a9eSk5PDJ5980u0zOzo6uP766zlw4ECnq7y8nAceeCDcLzExMeJ8z/Xaa6+
xePFi/vmf/5m//uu/7rLP6dOn2bp1Kz/4wQ+I7e2/PFyEilkRERERERlUsrOhsbF3sY2NMGVKdPJITU2lsLCQDRs20Nzc3MW7uk8yLS2Nmpqa8Ofa2tpOnwFiY2OZPXt2eJfklpYWSkpKABg+fDjt7e2d+ufm5lJRUcHo0aPJysrqdKWmpl7Ktxr26quvsnjxYrZs2cL3vve9bvvt2LGDuro67r///qi8tysqZkVEREREZFDJz4dQKPIjdqx14vLzo5fLxo0bsdaSl5fH9u3b8fv9HDlyhOeee47rrruu27jZs2ezceNGSktL2b9/P0uWLMHlcoXbS0pKePbZZ9m/fz+BQIBt27Zx4sSJ8PrczMxMPvjgA/x+P3V1dYRCIRYtWkR6ejp33nknPp+PyspK9uzZw6OPPtppR+OuvPHGG+Tk5FBdXd1tn1deeYVFixbxs5/9jL/6q7/i008/5dNPP6W+vv68vps3b+bmm29m0qRJF/sR9pqKWRERERERGVS8XudqaIgsrqHhy9homTRpEmVlZdxyyy2sXLmS6667jtmzZ/PrX/+azZs3dxv3zDPPMGnSJAoKCvje977HD3/4Q8aMGRNuT05OZseOHcyZM4ecnBzWrVvHiy++yMyZMwFYunQpU6dOJS8vj7S0NH7/+9+TkJDAnj17mDRpEnfffTc5OTnce++9NDQ0XHDHYYCmpib8fn+nqc5f9fzzz9PW1saPf/xjxo0bF77uuuuuTv0+/vhjfve737F06dKe/Ah7zfTlouW+lpeXZ0tLSwc6DRERERERuUSHDx+OaFfgigpYswbS0iAh4eL9g0Goq4NVqyAr6xISlai70O+9MWaftTavqzaNzIqIiIiIyKCTlQUrVsDx41Bf3/2UY2ud9ro6p78K2aFD58yKiIiIiMiglJsLTzwBxcUQCEBcnHOObEyMs2txY6OzRtbrhYcfViE71KiYFRERERGRQSsrC4qKnGLW54PycuccWbcbbroJCgpgwgQwZqAzlWhTMSsiIiIiIoOaMZCZ6Vxy5dCaWRERERERERl0VMyKiIiIiIjIoKNiVkRERERERAYdrZkVEREREZFBzVpLIBDA5/Ph9/tpbW3F7XaTnZ1Nfn4+Xq8Xox2ghhwVsyIiIiIiMmhVVFRQXFxMIBAgLi6O5ORkXC4XoVCIvXv3smvXLrxeL4sXLyZLZ/MMKZpmLCIiIiIig1JZWRlr1qyhoaEBr9dLRkYGiYmJxMfHk5iYSEZGBl6vl4aGBtasWUNZWdmA5ZqZmXHQ7LoAACAASURBVMm6desG7P1DkYpZEREREREZdCoqKli/fj1paWmkpqZ2O43YGENqaippaWls2LCBioqKqOdSW1vLI488wuTJk4mPjycjI4PbbruNt956K+rvOmvJkiXcfvvtffb8rvh8PmbMmMGoUaNwu93k5ORcsED/1a9+hTGmz/LUNGMRERERERlUrLUUFxfj8XhISEjoUUxCQgJJSUkUFxdTVFQUtTW0VVVVfPvb38bj8bB27VqmT59OR0cHO3fuZNmyZRw7diwq7+krbW1txMTE9OjnkZSUxMMPP8y1115LQkICv//973nggQdISEjgwQcf7NT3448/5qc//SkzZ87sq9Q1MisiIiIiIoNLIBAgEAiQkpISUVxKSko4NlrOFnGlpaUsWLCA7Oxspk6dykMPPcShQ4e6jTPG8Nprr3W699WpyJs2bWLKlCm4XC5Gjx5NYWEhbW1tFBUVsXXrVt58802MMRhj2L17NwDV1dUsXLiQlJQUUlJSmDdvHkePHg0/s6ioiGnTprFly5bwSHJLS0uPvtcbbriBhQsXcs011zBx4kTuueceCgsLeffddzv1C4VCfP/732fNmjVMmjSpR8/uDRWzIiIiIiIyqPh8PuLi4iIeXTXGEBcXh8/ni0oe9fX1vP322yxfvpykpKTz2pOTk3v97NLSUpYvX87q1avx+/3s3LmTuXPnAvDYY4+xYMEC5sy
ZQ01NDTU1NcyYMYNgMMisWbNwuVz4fD7ef/99xo0bx5w5cwgGg+FnV1ZWsm3bNrZv387BgwdxuVxs2bIFYwxVVVU9znH//v2899575Ofnd7r/xBNPkJmZyb333tvr778nNM1YREREREQGFb/f3+tCMTk5mfLy8qjkUVFRgbWWqVOnRuV55zp27BiJiYnccccdeDwevF4v06dPB5zpvm63m/j4eMaOHRuOefnll7HW8tJLL4UL/U2bNjFmzBhKSkpYsGABAKdPn6a4uJj09PRw7MiRI8nOziYuLu6iuV111VUcP36ctrY2Vq9ezbJly8Jt//Zv/8arr77KgQMHovJzuBAVsyIiIiIiMqi0trbicrl6FRsTE0Nra2tU8rDWRuU5Xbnlllvwer1MnDiRwsJCbr31Vu666y48Hk+3Mfv27aOysvK8PsFgkI8++ij8+aqrrupUyALMnz+f+fPn9yi3d999l+bmZv7whz+wcuVKJk6cyOLFizl+/DhLlizhV7/61SWNSveUilkRERERERlU3G43oVCI+Pj4iGPb29txu91RyePqq6/GGMPhw4d7XAieZYw5rxgOhULhrz0eD2VlZezZs4d33nmHtWvXsmrVKv70pz8xfvz4Lp/Z0dHB9ddfzyuvvHJeW2pqavjrxMTEiHL9qokTJwJw7bXXUltbS1FREYsXL+bDDz+kpqaGm2++uVNOALGxsXz44YdkZ2df0rvPpTWzIiIiIiIyqGRnZ9PY2Nir2MbGRqZMmRKVPFJTUyksLGTDhg00Nzd3+a7upKWlUVNTE/5cW1vb6TM4BeDs2bNZu3Ythw4doqWlhZKSEgCGDx9Oe3t7p/65ublUVFQwevRosrKyOl3nFrPR1NHRwalTpwD4xje+wZ///GcOHDgQvu644w5mzpzJgQMHwkVwtKiYFRERERGRQSU/P59QKBTxNF9rLaFQ6LwNiy7Fxo0bsdaSl5fH9u3b8fv9HDlyhOeee47rrruu27jZs2ezceNGSktL2b9/P0uWLOk0dbqkpIRnn32W/fv3EwgE2LZtGydOnAivz83MzOSDDz7A7/dTV1dHKBRi0aJFpKenc+edd+Lz+aisrGTPnj08+uijnXY07sobb7xBTk4O1dXV3fZZv349JSUlHD16lKNHj/KLX/yCdevWcc899wDOiO+0adM6XcnJyXg8HqZNm8bw4cMj+dFelKYZi4iIiIjIoOL1evF6vTQ0NEQ04tjQ0BCOjZZJkyZRVlbGU089xcqVK6murmbUqFFMnz6dzZs3dxv3zDPPcP/991NQUEB6ejpPP/00hw8fDrcnJyezY8cOnnzySYLBIJMnT+bFF18Mn9u6dOlSdu/eTV5eHs3NzezatYuCggL27NnD448/zt13301TUxPjx49n1qxZFz3GqKmpCb/f32mq81e1t7ezcuVKqqqqiI2NZfLkyfzsZz/rtAFUfzJ9uWi5r+Xl5dnS0tKBTkNERERERC7R4cOHI9oVuKKigjVr1pCWlkZCQsJF+weDQerq6li1ahVZWVmXkqpE2YV+740x+6y1eV21aZqxiIiIiIgMOllZWaxYsYLjx49TX1/f7ZRjay319fXU1dWxYsUKFbJDSL8Vs8aYeGPML4wxAWPMCWPMAWPMbWfaMo0x1hjTfM719/2Vm4iIiIiIDD65ubk88cQTpKSkEAgEqK6upqWlhZMnT9LS0kJ1dTWBQICUlBRWrVrF17/+9YFOWaKoP9fMxgL/CeQDx4DvAK8aY649p0+ytbatH3MSEREREZFBLCsri6KiIgKBAD6fj/LyclpbW3G73dx0000UFBQwYcIEjDEDnapEWb8Vs9baFqDonFslxphK4AZgX3/lISIiIiIiQ4sxhszMTDIzMwc6FelHA7Zm1hiTDkwBPjzndsAY8xdjzEvGmNHdxP3IGFNqjCk9fvx4v+QqIiIiIiIil5cBKWaNMXHAL4Gt1tojQB3wDcCLM1LrOdN+HmvtZmttnrU2Ly0
trb9SFhERERERkctIv58za4wZBhQDp4GHAKy1zcDZM3ZqjTEPATXGGI+19kR/5ygiIiIiIiKXt34tZo2z6voXQDrwHWttdyfynt1XW0cHiYiIiIjIhVkLLQH4zAdf+KG9FWLcMCIbxuRDohe0AdSQ098js88BU4E51trWszeNMd8EGoGjQArwc2C3tbapn/MTEREREZHB5EQFVBY7xeywOIhLhmEu6AjB8b1Qu8spZicuBo/OmB1K+vOcWS/wAHA98Ok558kuAiYBbwMngA+AU8D3+ys3EREREREZhOrL4IM1cKoBErzgzoDYRIiJd351Zzj3TzU4/erLBizVzMxM1q1bN2DvH4r6rZi11gastcZa67LWJp1z/dJa+ytr7URrbaK1dpy19v+21n7aX7mJiIiIiMggc6IC/OshPg3iU7ufRmyM0x6fBv4NTlyU1dbW8sgjjzB58mTi4+PJyMjgtttu46233or6u85asmQJt99+e589vys+n48ZM2YwatQo3G43OTk55xXoBQUFGGPOu6655pqo59PvG0CJiIiIiIhcEmudqcWxHohN6FlMbAK0Jzlx1xZFbQ1tVVUV3/72t/F4PKxdu5bp06fT0dHBzp07WbZsGceOHYvKe/pKW1sbMTExmB78PJKSknj44Ye59tprSUhI4Pe//z0PPPAACQkJPPjggwC8/vrrnD59Ohxz6tQprr32WhYsWBD13LXBkoiIiIiIDC4tAecanhJZ3PCUL2Oj5GwRV1payoIFC8jOzmbq1Kk89NBDHDp0qNs4YwyvvfZap3tfnYq8adMmpkyZgsvlYvTo0RQWFtLW1kZRURFbt27lzTffDI987t69G4Dq6moWLlxISkoKKSkpzJs3j6NHj4afWVRUxLRp09iyZUt4JLmlpaVH3+sNN9zAwoULueaaa5g4cSL33HMPhYWFvPvuu+E+qampjB07Nnzt3buXYDDIfffd16N3RELFrIiIiIiIDC6f+ZzNniIdXTXGifvMF5U06uvrefvtt1m+fDlJSUnntScnJ/f62aWlpSxfvpzVq1fj9/vZuXMnc+fOBeCxxx5jwYIFzJkzh5qaGmpqapgxYwbBYJBZs2bhcrnw+Xy8//77jBs3jjlz5hAMBsPPrqysZNu2bWzfvp2DBw/icrnYsmULxhiqqqp6nOP+/ft57733yM/P77bPCy+8wNy5c/na177W659FdzTNWEREREREBpcv/M6uxb0RlwxflEcljYqKCqy1TJ06NSrPO9exY8dITEzkjjvuwOPx4PV6mT59OuBM93W73cTHxzN27NhwzMsvv4y1lpdeeik8bXjTpk2MGTOGkpKS8FTf06dPU1xcTHp6ejh25MiRZGdnExcXd9HcrrrqKo4fP05bWxurV69m2bJlXfYrLy/H5/OxY8eOXv8cLkTFrIiIiIiIDC7trc7xO71hYpz4KLDWRuU5Xbnlllvwer1MnDiRwsJCbr31Vu666y48Hk+3Mfv27aOysvK8PsFgkI8++ij8+aqrrupUyALMnz+f+fPn9yi3d999l+bmZv7whz+wcuVKJk6cyOLFi8/r98ILLzBu3DjmzZvXo+dGSsWsiIiIiIgMLjFu5xxZ4iOPte1OfBRcffXVGGM4fPhwjwvBs4wx5xXDoVAo/LXH46GsrIw9e/bwzjvvsHbtWlatWsWf/vQnxo8f3+UzOzo6uP7663nllVfOa0tNTQ1/nZiYGFGuXzVx4kQArr32WmpraykqKjqvmD19+jRbt25l6dKlxMb2TdmpNbMiIiIiIjK4jMiGUGPvYkONMGJKVNJITU2lsLCQDRs20NzcfF57Y2P3OaalpVFTUxP+XFtb2+kzQGxsLLNnz2bt2rUcOnSIlpYWSkpKABg+fDjt7e2d+ufm5lJRUcHo0aPJysrqdJ1bzEZTR0cHp06dOu/+jh07qKur4/777++T94KKWRERERERGWzG5Dsjs5FO87XWiRvT/YZFkdq4cSPWWvLy8ti
+fTt+v58jR47w3HPPcd1113UbN3v2bDZu3EhpaSn79+9nyZIluFxfTp0uKSnh2WefZf/+/QQCAbZt28aJEyfC63MzMzP54IMP8Pv91NXVEQqFWLRoEenp6dx55534fD4qKyvZs2cPjz76aKcdjbvyxhtvkJOTQ3V1dbd91q9fT0lJCUePHuXo0aP84he/YN26ddxzzz3n9d28eTM333wzkyZNutiPsNc0zVhERERERAaXRK9znWqA+AhGHE83fBkbJZMmTaKsrIynnnqKlStXUl1dzahRo5g+fTqbN2/uNu6ZZ57h/vvvp6CggPT0dJ5++mkOHz4cbk9OTmbHjh08+eSTBINBJk+ezIsvvsjMmTMBWLp0Kbt37yYvL4/m5mZ27dpFQUEBe/bs4fHHH+fuu++mqamJ8ePHM2vWLFJSLnyMUVNTE36/v9NU569qb29n5cqVVFVVERsby+TJk/nZz3523gZQH3/8Mb/73e+6nO4cTaYvFy33tby8PFtaWjrQaYiIiIiIyCU6fPhwZLsCn6iAD9ZAfBrEJly8f1sQTtXBtFXgyep9ohJ1F/q9N8bss9bmddWmacYiIiIiIjL4eLIgewWcOg6n6rufcmyt036qzumvQnbI0DRjEREREREZnFJzYdoTUFkMLQEYFuecI2tinF2LQ43OGtlEL+Q8rEJ2iFExKyJyBbAWAgHw+cDvh9ZWcLshOxvy88HrhTNnq4uIiAwuniy4tsgpZj/zwRflzjmyMW5IuwnSCyBhgv6iG4JUzIqIDHEVFVBc7BSzcXGQnAwuF4RCsHcv7NrlFLOLF0OW/sFaREQGI2MgKdO55IqhYlZEZAgrK4P168HjOX/0NT4eEhOdUduGBlizBlasgNzcgctXRESubNZajEZQryiXsiGxNoASERmiKiqcQjYtDVJTu59dZYzTnpYGGzY4cSIiIv0tLi6O1tbWgU5D+llraytxcXG9io2omDXGJBhjZhhj/k9jzF3nXr16u4iI9AlrnanFHg8k9OC0AnD6JSU5cYP41DYRERmkxowZQ3V1NcFg8JJG62RwsNYSDAaprq5mzJgxvXpGj6cZG2PmAL8CRnWVCxDTqwxERCTqAgHn8kZ4JnxKypexmZl9kpqIiEiXRowYAcAnn3xCKBQa4GykP8TFxZGenh7+vY9UJGtmnwXeBFZZaz/p1dtERKRf+HzOZk+RLjsyxonz+VTMiohI/xsxYkSvCxu58kRSzGYCd6iQFRG5/Pn9zq7FvZGcDOXl0c1HREREJNoiWTP7eyC7rxIREZHoaW2F2F7uVx8T48SLiIiIXM4u+L86xphzD2h4HlhnjBkP/BnoNJHdWlsW/fRERKQ33G7nHNn4+Mhj29udeBEREZHL2cX+3b4UZ3Onc1ddbe6inzaAEhG5jGRnw969zjmykWpshJtuin5OIiIiItF0sWJ2Yr9kISIiUZWfD7t2OUfsRLIJlLXOiG5+ft/lJiIiIhINFyxmrbWBs18bY/4KeM9a23ZuH2NMLDADCCAiIpcFr9e5GhogNbXncQ0NX8aKiIiIXM4i2QBqF9DV/xKNPNMmIiKXCWNg8WI4cQKCwZ7FBIPQ3OzERXqkj4iIiEh/i6SYNThrY79qFNASnXRERCRasrJgxQo4fhzq650pxF2x1mmvq3P6Z2X1b54iIiIivXHRgxuMMb8+86UFXjbGnDqnOQaYBrzXB7mJiMglys2FJ56A4mIIBCAuzjlHNibG2bW4sdFZI+v1wsMPq5AVERGRwaMnpxB+fuZXAzQA554+eBrYC7wQ5bxERCRKsrKgqMgpZn0+KC93zpF1u51diwsKYMIETS0WERGRweWixay19gcAxpgqYJ21VlOKRUQGGWMgM9O5RERERIaCnozMAmCt/ce+TERERERERESkp3pczBpjKul6AygLnAQqgF9Ya3/dRR8RERERERGRqIlkN+OXcI7mOQq8fOY6euber4F24HVjzP8V7SRFREREREREztX
jkVlgEvAza+3Pzr1pjPlvwP9hrb3LGLMKeBz4n1HMUURERERERKSTSEZm7wJe6+L+62faAP4FuLqrYGNMvDHmF8aYgDHmhDHmgDHmtnPabzbGHDHGBI0xu4wx3ghyExERERERkStIJMVsEJjZxf2ZZ9rAOXe2tYs+4IwC/yeQD4wE/g541RiTaYwZjVMU/z3OtOVSNLorIiIiIiIi3YhkmvGzwD8ZY/KAP5259w1gCfD/nPk8FzjQVfCZI32KzrlVcmZTqRuAUcCH1trtAMaYIqDOGJNjrT0SQY4iIiIiIiJyBYjkaJ61Z4rPh4Hvn7l9BLjPWnt2FPU54J968jxjTDowBfgQ+Fvg4DnvajHGfARcc+Yd58b9CPgRwIQJE3qavoiIiIiIiAwhkYzMYq19BXjlAu3dTTHuxBgTB/wS2GqtPWKMSQKOf6VbE+Dp4h2bgc0AeXl5XR0VJCIiIiIiIkNcRMXsWcaYZL6y3tZaW9/D2GFAMXAaeOjM7WZgxFe6jgBO9CY/ERERERERGdp6vAGUMcZrjPnfxphW4HOckdTjQB3nj6p29wwD/AJIB75rrQ2dafoQmH5Ov0Rg8pn7IiIiIiIiIp1EMjL7EpAM3A98AvRmiu9zwFRgzlemJL8B/L/GmO8CbwL/ABzS5k8iIiIiIiLSlUiK2RuB/2Kt/aA3LzpzbuwDwCngU2eQFoAHrLW/PFPIbgBeBv4dWNib94iIiIiIiMjQF0kxWwnE9/ZF1toAYC7Q/lsgp7fPFxERERERkStHj9fMAo8Aa40xWX2VjIiIiIiIiEhPRDIy+79wRmb9xphTQNu5jdbar+5GLCIiIiIiItInIilmH7p4FxEREREREZG+1+Ni1lq7tS8TEREREREREempSNbMYoxJN8Y8Zox5zhgz+sy9bxtjJvZNeiIiIiIiIiLn63Exa4y5AfADi3DOmj27RvYWYE30UxMRERERERHpWiQjs+uAZ621X8c5K/as3wDfjmpWIiIiIiIiIhcQSTF7A9DVutkaID066YiIiIiIiIhcXCTFbCuQ0sX9HOCz6KQjIiIiIiIicnGRFLP/C1htjIk/89kaYzKB/wH8S5TzEhEREREREelWJMXsY0AqcBxIAPYCFUAj8HfRT01ERERERESka5GcM/sFcJMxZjaQi1MIl1lrf9tXyYmIiIiIiIh0pcfF7FnW2t8Bv+uDXORyZy0EAuDzgd8Pra3gdkN2NuTng9cLxgx0liIiIiIicgW4YDFrjPmvPX2Qtfb/u/R05LJVUQHFxU4xGxcHycngckEoBHv3wq5dTjG7eDFkZQ10tiIiIiIiMsQZa233jcZU9vA51lo7KTop9VxeXp4tLS3t79deecrKYP168HggJaXr0VdroaEBTpyAFSsgN7f/8xQRERERkSHFGLPPWpvXVdsFR2attRP7JiUZNCoqnEI2LQ0SErrvZwykpjqjtRs2wKpVGqEVEREREZE+E8luxj1ijPmzMeZr0X6uDABrnanFHs+FC9lzJSRAUpITd4FRfxERERERkUsR9WIWyATi+uC50t8CAedKSYksLiXly1gREREREZE+0BfFrAwVPp+z2VOkOxQb48T5fH2Tl4iIiIiIXPFUzEr3/H5n1+LeSE6G8vLo5iMiIiIiInKGilnpXmsrxEZ8FLEjJsaJFxERERER6QMqZqV7bje0tfUutr3diRcREREREekDKmale9nZ0NjYu9jGRpgyJbr5iIiIiIiInNGjYtYYE2eM+XdjTHYPuj8A1F5aWnJZyM+HUCjyI3asdeLy8/smLxERERERueL1qJi11oaAicBFqxpr7TZrbculJiaXAa/XuRoaIotraPgyVkREREREpA9EMs14K7C0rxKRy5AxsHgxnDgBwWDPYoJBaG524iI90kdERERERKSHItmqNhFYZIy5BdgHdBp9tdY+HM3E5DKRlQUrVsD69eDxQEpK10Wqtc6IbHOz0z8rq/9zFRERERGRK0Ykxex
UoOzM15O+0hbhokoZVHJz4YknoLgYAgGIi3POkY2JcXYtbmx01sh6vfDwwypkRURERESkzxkb6eY+l5G8vDxbWlo60GlcOax1ilmfD8rLnXNk3W5n1+KCApgwQVOLRUREREQkaowx+6y1eV21RTIye/Zho4HJwAFr7alLTU4GEWMgM9O5REREREREBlCPN4AyxniMMduBz4D3gIwz9583xhT1TXoiIiIiIiIi54tkN+P/AYwHcoHWc+6XAPOjmZSIiIiIiIjIhUQyzfgOYL619oAx5tyFtoc5f0MoERERERERkT4TychsCvB5F/c9QHt00hERERERERG5uEiK2T/hjM6edXZ09gGcNbQXZYx5yBhTaow5ZYzZcs79TGOMNcY0n3P9fQS5iYiIiIiIyBUkkmnGq4DfGGOuORP3X898fSPwVz18xifAfwcKAXcX7cnW2rYIchIREREREZErUI9HZq217wEzgOHAR8DNOMXpt6y1ZT18xuvW2h10PV1ZREREREREpEciOmfWWvtn4N4+ygUgcGZzqXeAn1pr677awRjzI+BHABMmTOjDVERERERERORyFck5s5uNMd83xozrgzzqgG8AXuAGnE2lftlVR2vtZmttnrU2Ly0trQ9SERERERERkctdJCOzCThnzWYYYz4Cdp+9rLWfXEoS1tpmoPTMx1pjzENAjTHGY609cSnPFhERERERkaEnkjWz91hrJwDZOEWtG1gL/Kcxxh/lvM7ulBzJbssiIiIiIiJyhYhozewZHwOjgDFAOjAOZ1OoizLGxJ55ZwwQY4xxAW04U4sbgaM459n+HGfEt6kX+YmIiIiIiMgQF8ma2f9mjHkLp+j8FTAFZ13r1dbaiT18zN8BrcDjwD1nvv47YBLwNnAC+AA4BXy/p7mJiIiIiIjIlcVYay/eCzDGdADHgXXAFmvt8b5MrCfy8vJsaWnpxTuKiIiIiIjIoGOM2WetzeuqLZI1qbcAm4E7gGPGmD8bY9YbY+4yxoyKRqIiIvL/t3f/0XWc5YHHv48lxZZsxZLBEDDHMolrk8AhPzAF3BCFNSxb9nSXkp4uJVVxYJsCi9styyktUG/WJaUslHaP3ZamBQwiy9IfUErpoWXdRJzEe0qdlJ8O9lECl8aQYmPZyJbsyNK7f7xX+FpIsUbS9Z0rfT/n3GPdmXfmPjPPjHwfve/MSJIkaTZmfc1sSmkfsA8gItqBrcCt5CHHy4C2egQoSZIkSdJUhW4AFRFPAV4C3Fz9dxPwGDCw4JFJkiRJkjSDWRezEfEQuXj9V3Lx+nvkOw4v9GN5JEmSJEl6QkV6Zn8fi1dJkiRJUgkUuWb2j6dOi4iNwKMppTMLGpUkSZIkSU+gyHNmfzsiXlv9OSLi88Bh4LsR8YJ6BShJkiRJ0lRFHs1zKzA5xPgngeuAFwIfBX5ngeOSJEmSJGlGRa6ZfSrwaPXnVwB/llL6YkQcBw4seGSSJEmSJM2gSM/s94Ge6s//luozZ8kFcSxkUJIkSZIkPZEiPbN/CfzviDgMrAH+rjr9OmBwoQOTJEmSJGkmRYrZtwAVYD3wayml09XpTwP+aKED0zxMTMD+/bB3Lxw8CGfOwIoVcM01sH07bN0Ky4p0ykuSJElSuURKqdExzNmWLVvSgQNernuBfftg1y44cgRaW6GzE1paYHwchofh3DlYtw527oRt2xodrSRJkiTNKCIeSCltmW5ekZ5ZIqKDPKz4KVx4vW1KKX1q7iFqQdx9N9xxB3R0wNOf/qO9rytX5l7bEyfgDW/IbW+9tRGRSpIkSdK8zLqYjYiXAh8HnjTN7AS0LFRQmoN9+3Jx2t2di9mZLFuW2yxfnttfcYU9tJIkSZKaTpELJ/8X8FngGSmlZVNeFrKNNDGRhxZ3dDxxIVtrsu2uXXl5SZIkSWoiRYrZDcBvpZS+U6dYNFf79+drZFevLrbc6tV5uf376xOXJEmSJNVJkWL2fmBzvQLRPOzdm2/2VPQOxcu
W5eX27q1HVJIkSZJUN0VuAPUB4H0R8XTgq8BY7cyU0oMLGZgKOHgw37V4Ljo74aGHFjYeSZIkSaqzIsXsX1T/vWuaed4AqpHOnIHLLpvbsi0tMDq6sPFIkiRJUp0VKWafWbcoND8rVsDY2MXbTWd8HNrbFzYeSZIkSXWTUqJSqTAwMMChQ4cYHR2lvb2dzZs309vbS09PDxHR6DDrbtbFbEqpRfLqIgAAH5tJREFUEhGtwI8D64HarsAE9C9wbJqta66B++7Lz5EtangYbrxx4WOSJEmStOAGBwfp7++nUqnQ1tZGV1cXK1asYGxsjPvuu4977rmHnp4e+vr62LhxY6PDrasiz5l9FvAZcg9tAOPV5ceAs1jMNs727XDvvfkRO0VuAjUxAefO5eUlSZIkldqDDz7I7t276ezs/JHe1+XLl7Ny5UpSSgwNDXHnnXeyY8cObrjhhgZGXF9Fbn/7+8ADwGpgBLga2AJ8Cbhl4UPTrG3dCuvWwcmTxZY7eTIvt3VrfeKSJEmStCAGBwfZvXs3a9euZc2aNTMOI44I1qxZw9q1a9mzZw+Dg4OXONJLp0gx+3zgXSml08AE0Fq9g/GvAb9bj+A0S8uWwc6dMDKSX7Mx2XbnzuKP9JEkSZJ0yaSU6O/vp7Ozk46Ojlkt09HRwapVq+jv7yelVOcIG6NIFRPkHlmAo8C66s+PAot7MHYz2LYN7rgDhobya2Ji+nYTE+fb7NqVl5MkSZJUWpVKhUqlQnd3d6Hluru7f7jsYlTkbsZfA64FHgG+CLwtIsaBXwQWb991M7n1VrjiilykHjkCra35ObItLfmuxcPD+RrZdevg/e+3kJUkSZKawMDAAG1tbYXvUBwRtLW1MTAwwIYNG+oTXAMVKWbvBCZvl/tO4LPAPcAx4GcXOC7N1bZt8JKXwP79sHcvPPRQfo5se3u+a/Ftt8GLXuTQYkmSJKlJHDp0iK6urjkt29XVxeHDhxc4onIo8miev6v5+RHg6ohYAwylxToIu1ktW5YLVx+5I0mSJDW90dFRVqxYMadlW1paGB0dXeCIymFe3XMppeMWspIkSZJUP+3t7Zw7d25Oy46Pj9Pe3r7AEZWDY00lSZIkqcQ2b97MiRMn5rTsiRMn2LRp0wJHVA4Ws5IkSZJUYr29vYyNjRV+xE5KibGxMXp7e+sUWWNZzEqSJElSifX09NDT08PQ0FCh5YaGhn647GJkMStJkiRJJRYR9PX1MTw8zMjIyKyWGRkZ4dSpU/T19RV+pE+zuKTFbES8OSIORMTZiNg7Zd62iPhGRIxExD0RsTj/fCBJkiRJBW3cuJEdO3Zw9OhRjh8/PuOQ45QSx48f59ixY+zYsYONGzde4kgvnUvdM/sd4F3Ah2onRsSTgU8CvwmsAQ4An7jEsUmSJElSad1www284x3voLu7m0qlwpEjRzh9+jRnzpzh9OnTHDlyhEqlQnd3N29/+9u5/vrrGx1yXUUjnqwTEe8CnpFS2l59fzuwPaW0tfp+JXAMuD6l9I2Z1rNly5Z04MCBSxCxJEmSJJVDSolKpcLAwACHDx9mdHSU9vZ2Nm3axM0338z69esXzdDiiHggpbRlunmtlzqYGTwb+PLkm5TS6Yh4uDr9gmK2WvjeDrB+/fpLGaMkSZIkNVxEsGHDBjZs2NDoUBqqLDeAWgWcnDLtJNA5tWFK6a6U0paU0pa1a9dekuAkSZIkSeVSlmL2FHD5lGmXA8MNiEWSJEmSVHJlKWa/Dlw7+aZ6zexV1emSJEmSJF3gUj+apzUiVgAtQEtErIiIVuBTwHMi4pbq/J3AV57o5k+SJEmSpKXrUvfMvhMYBX4d+Pnqz+9MKR0FbgHuBIaAFwCvvsSxSZIkSZKaxCW9m3FK6Q7gjhnm/V/gWZcyHkmSJElScyrLNbOSJEmSJM2axawkSZIkqelYzEqSJEmSmo7FrCRJkiSp6VjMSpIkSZKajsWsJEmSJKnpWMxKkiRJkpqOxaw
kSZIkqelYzEqSJEmSmo7FrCRJkiSp6VjMSpIkSZKajsWsJEmSJKnpWMxKkiRJkpqOxawkSZIkqelYzEqSJEmSmo7FrCRJkiSp6VjMSpIkSZKajsWsJEmSJKnpWMxKkiRJkpqOxawkSZIkqem0NjoANVhKUKnAwAAcOgSjo9DeDps3Q28v9PRARKOjlCRJkqQLWMwuZYOD0N+fi9m2NujqghUrYGwM7rsP7rknF7N9fbBxY6OjlSRJkqQfsphdqh58EHbvhs7OH+19Xb4cVq7MvbZDQ3DnnbBjB9xwQ+PilSRJkqQaXjO7FA0O5kJ27VpYs2bmYcQRef7atbBnT15OkiRJkkrAYnapSSkPLe7shI6O2S3T0QGrVuXlUqpvfJIkSZI0CxazS02lkl/d3cWW6+4+v6wkSZIkNZjF7FIzMJBv9lT0DsURebmBgfrEJUmSJEkFWMwuNYcO5bsWz0VXFxw+vLDxSJIkSdIcWMwuNaOj0DrHm1i3tOTlJUmSJKnBfDTPYjQxAfv3w969cPAgnDmTnx97zTVw9iw89an58TtFjY9De/uChytJkiRJRVnMLjb79sGuXXDkSO6B7eyEyy6DsTG47z44fjz3sL7sZXDllcXWfeIE3HhjfeKWJEmSpAIcZryY3H03vOENueh8+tPhiitg5crcK7tyZX6/YUPuqf3MZ+ArX5n9ulPKBXFvb93ClyRJkqTZsphdLPbtgzvuyI/Q6e6GZTOktr0dLr8899beey888sjs1j80BD09+SVJkiRJDWYxuxhMTOShxR0d+fVEInIPbUQebjwwkHtdn8jICJw6BX19xR/pI0mSJEl1UKpiNiLujYgzEXGq+jrU6Jiawv79+RrZ1atn1769Hdaty723J07At789fbuU8jW2x47Bjh2wcePCxSxJkiRJ81DGG0C9OaX0p40Ooqns3Ztv9jTT0OLpdHbm62cffhjuvz8v39WVe2vHx3OROzaWhxX/8i9byEqSJEkqlTIWsyrq4MFcnBbV3p4L2pTyXYoPH87PkW1vz+9vvhnWr3dosSRJkqTSKWMx++6I+B3gEPCOlNK9tTMj4nbgdoD169df+ujK6MyZfEOnuWhthccfh9e+dmFjkiRJkqQ6KtU1s8DbgCuBdcBdwGci4qraBimlu1JKW1JKW9auXduIGMtnxYo8NHguxsdzT6wkSZIkNZFSFbMppX9MKQ2nlM6mlD4C3A+8otFxld4118Dw8NyWHR6Gq69e2HgkSZIkqc5KVcxOIwFesHkx27fDuXP5ET1FTEzk5bZvr0dUkiRJklQ3pSlmI6IrIl4eESsiojUibgVuAj7X6NhKb+vW/KidkyeLLXfyZF5u69b6xCVJkiRJdVKaYhZoA94FHAWOATuAV6aUDjc0qmawbBns3AkjI/k1G5Ntd+4s9kgfSZIkSSqB0lQxKaWjKaXnp5Q6U0pdKaUXppQ+3+i4msa2bXDHHTA0lF8zDTmemDjfZteuvJwkSZIkNZkyPppHc3XrrXDFFblIPXIkP3ansxNaWvJdi4eH8zWy69bB+99vIStJkiSpaVnMLjbbtsFLXgL798PevfDQQzA6mh+/c+ONcNtt8KIXObRYkiRJUlOzmF2Mli3LheuNNzY6EkmSJEmqC7vnJEmSJElNx2JWkiRJktR0LGYlSZIkSU3HYlaSJEmS1HQsZiVJkiRJTcdiVpIkSZLUdCxmJUmSJElNx2JWkiRJktR0LGYlSZIkSU3HYlaSJEmS1HQsZiVJkiRJTcdiVpIkSZLUdCxmJUmSJElNx2JWkiRJktR0LGYlSZIkSU3HYlaSJEmS1HQsZiVJkiRJTae10QEsOilBpQIDA3DoEIyOQns7bN4Mvb3Q0wMR9Y1hYgL274e9e+HgQThzBlasgGuuge3bYetWWLaseLxl2LbFql55MGelUIb0luFQaPS2lWEfSJq7lBKVSoWBgQEOHTrE6Ogo7e3tbN68md7eXnp6eog6nsSN/vx6Wsz
bpsWd30gpNTqGOduyZUs6cOBAo8M4b3AQ+vvzt6W2NujqgtZWOHcOTpyAsbH8bamvDzZurE8M+/bBrl1w5Ej+7M5OaGmB8XEYHs6xrFsHO3fmWGYbLzR+2xarIscNmLMmU4b0Fmlbr0Oh0fvhppvgC1/wdJCa1eDgIP39/VQqFdra2ujq6qK1tZVz585x4sQJxsbG6Onpoa+vj411OIkb/fn1tJi3TYsjvxHxQEppy7TzLGYXyIMPwu7duXjs7p7+z/spwdBQLip37IAbbljYGO6+G+64Azo6YPXq872vtSYm4ORJOH4cnvlMuO66i8f7rW/l9xs2NG7bFqsix02RPJizUihDestwKDR6Pxw6BA88AM97Xu6F9XSQmsuDDz7I7t276ezspLu7e9oepJQSQ0NDDA8Ps2PHDm5YwJO40Z9fT4t527R48vtExazXzC6EwcH8TW3tWlizZuZxahF5/tq1sGdPXm6h7NuXC9nu7vyarpCFPH3Fijy+7qtfzV0STxQvwDe/mV+106ZrW69tW6yKHDcw+zwAPPKIOWuwMqS3SNt6HQpF90ORbZvNPhsagm98I/9aPHQov59pnZ4OUvkMDg6ye/du1q5dy5o1a2YcChkRrFmzhrVr17Jnzx4GF+gkbvTn19Ni3jYtnfxazM5XSnmMW2dn7hGdjY4OWLUqL7cQPeMTE3locUfHxWNICR57DJYvh8suyxePTUzM3PbLXz6/3i9/+eLxLvS2LVZFjpsieZhsu3KlOWugMqS3DIfCXPZDkW272D6bbLd8eW532WUX3w+eDlJ5pJTo7++ns7OTjll+x+ro6GDVqlX09/cz39GHjf78elrM26allV+L2fmqVPKru7vYct3d55edr/378zWyq1dfvO3Zs/nV2pq/4Q0Pw6OPTt/25Mn8WrEivybfX8xCbttiVeS4KZIHc1YKZUhvGQ6FRu+H2nYw+/3g6SCVQ6VSoVKp0F3wO1Z3d/cPl23mz6+nxbxtWlr5tZidr4GBfDeRoncAi8jLDQzMP4a9e3NxOtPQ4lqTw4ojcvsI+NKXpm/7rW+dbzPZfvIitSeykNu2WBU5borkwZyVQhnSW4ZDodH7obYdzH4/eDpI5TAwMEBbW1vhu6xGBG1tbQzM8yRu9OfX02LeNi2t/FrMztehQ/m2mHPR1QWHD88/hoMH8zi+2RgZyYXvpOXL4ejR6dt+//vnuzQg//z978/ucxZq2xarIsdNkTyYs1IoQ3rLcCg0ej9MbXex9dbydJAa79ChQ3TN8TtWV1cXh+d5Ejf68+tpMW+bllZ+LWbna3T0wuKwiJaWvPx8nTmT1zUbExMXdpMsW5afTTGdc+cu7O19orZTLdS2LVZFjpsieRgbM2clUIb0luFQKLIfisQ72302dZ0XW28tTwep8UZHR2md43eslpYWRud5Ejf68+tpMW+bllZ+LWbnq7199t8Qpxofz8vP14oVeV2zsWzZhXc1mZiY+dtma+uFN4d6orZTLdS2LVZFjpsieWhrM2clUIb0luFQKLIfisQ72302dZ0XW28tTwep8drb2zk3x+9Y4+PjtM/zJG7059fTYt42La38WszO1+bN+TrUuThxAjZtmn8M11yTb+Q0Gx0dF367PHs2P4tiOk96Uu71nXTmTJ42Gwu1bYtVkeOmSB7MWSmUIb1lOBQavR+mtrvYemt5OkiNt3nzZk7M8TvWiRMn2DTPk7jRn19Pi3nbtLTyazE7X729eSxb0VtYp5SX6+2dfwzbt+cCdaZH7NTq6sqfnVJunxJcd930bTdsON9msv2GDRf/jIXctsWqyHFTJA/mrBTKkN4yHAqN3g+17WD2+8HTQSqH3t5exsbGCj8mJKXE2NgYvfM8iRv9+fW0mLdNSyu/FrPz1dOTX0NDxZYbGjq/7Hxt3Qrr1s3uuRvLl+fXuXO5V7azE57xjOnbrl6dX2fO5Nfk+4tZyG1brIocN0XyYM5KoQzpLcOh0Oj9UNsOZr8
fPB2kcujp6aGnp4ehgt+xhoaGfrhsM39+PS3mbdPSyq/F7HxFQF9fHuY7MjK7ZUZG4NSpvFzRR/pMZ9ky2Lkzr/diMUTAFVfkQvbxx3PXw0yP9ImAa689v95rr714vAu9bYtVkeOmSB4m254+bc4aqAzpLcOhMJf9UGTbLrbPJtudPZvbPf74xfeDp4NUHhFBX18fw8PDjMzyO9bIyAinTp2ir6+v8GNJyvb59bSYt01LK7+lKmYjYk1EfCoiTkdEJSJe0+iYZmXjRtixIz/i5vjxmcfUpZTnHzuW22/cuHAxbNsGd9yRuxSGhmYecjwxkbsn2tvhuc89P+x4pngBnvlMuPLKC6dN17Ze27ZYFTluYPZ5ALjqqtz+Ym3NWd2UIb1F2tbrUCi6H4ps22z2WXc3POtZ+dfis56V38+0Tk8HqXw2btzIjh07OHr0KMePH59x2GRKiePHj3Ps2DF27NjBxgU6iRv9+fW0mLdNSye/UXQsdT1FxMfJBfbrgeuAzwJbU0pfn679li1b0oEDBy5hhBcxOAj9/VCp5NtodnXl5zuMj+e7iYyN5XFrfX31+6a0bx/s2gVHjuRbdnZ2no9heDgPL163Lvfk9vTMPl5o/LYtVkWOGzBnTaYM6S3Stl6HQqP3w003wRe+4OkgNavBwUH6+/upVCq0tbXR1dVFS0sL4+PjnDhxgrGxMXp6eujr66vLl/FGf349LeZt0+LIb0Q8kFLaMu28shSzEbESGAKek1I6XJ3WDxxJKf36dMuUrpiF/Of9SgUGBuDw4fygwvb2fFvMm2+G9evrP3ZtYgL274e9e+Ghh87HcPXVcNtt8KIXnR9aXCTeMmzbYlWvPJizUihDestwKDR628qwDyTNXUqJSqXCwMAAhw8fZnR0lPb2djZt2sTNN9/M+vXr6zo8stGfX0+LedvU/PltlmL2euD+lFJHzbS3Ar0ppZ+qmXY7cDvA+vXrn1epVC55rJIkSZKk+nuiYrZM18yuAn4wZdpJoLN2QkrprpTSlpTSlrUzPR9VkiRJkrSolamYPQVcPmXa5cBwA2KRJEmSJJVYmYrZw0BrRPxYzbRrgWlv/iRJkiRJWrpKU8ymlE4DnwR2RcTKiPgJ4D8C/Y2NTJIkSZJUNqUpZqveBLQD3wM+DrxxpsfySJIkSZKWrtZGB1ArpXQceGWj45AkSZIklVvZemYlSZIkSbooi1lJkiRJUtOxmJUkSZIkNR2LWUmSJElS07GYlSRJkiQ1HYtZSZIkSVLTsZiVJEmSJDWdSCk1OoY5i4ijQKXRcSygJwPHGh2ECjFnzcecNSfz1nzMWfMxZ83HnDUfc1ZcT0pp7XQzmrqYXWwi4kBKaUuj49DsmbPmY86ak3lrPuas+Ziz5mPOmo85W1gOM5YkSZIkNR2LWUmSJElS07GYLZe7Gh2ACjNnzcecNSfz1nzMWfMxZ83HnDUfc7aAvGZWkiRJktR07JmVJEmSJDUdi1lJkiRJUtOxmG2AiHhzRByIiLMRsXfKvG0R8Y2IGImIeyKip0FhqkZELI+ID0ZEJSKGI+JLEfGTNfPNWwlFxMci4rsR8YOIOBwR/7lmnjkrsYj4sYg4ExEfq5n2muo5eDoi/ioi1jQyRmURcW81V6eqr0M188xZSUXEqyPioWpuHo6IF1en+7uxhGrOr8nXeETsrplv3kooIjZExN9GxFBEPBYReyKitTrvuoh4oJqzByLiukbH24wsZhvjO8C7gA/VToyIJwOfBH4TWAMcAD5xyaPTdFqBfwF6gdXAO4E/q/6SMm/l9W5gQ0rpcuA/AO+KiOeZs6bwB8A/Tb6JiGcDfwz0AU8FRoA/bExomsabU0qrqq/NYM7KLCJeBrwHuA3oBG4CHvF3Y3nVnF+rgCuAUeDPwe+PJfeHwPeApwHXkb9HvikiLgM+DXwM6AY+Any6Ol0FeAOoBoqIdwHPSCltr76/HdieUtpafb8SOAZcn1L6RsMC1bQi4ivA/wC
ehHkrvYjYDNwL/ArQhTkrrYh4NfAq4CCwMaX08xHx2+Q/TLym2uYq4CHgSSml4cZFq4i4F/hYSulPp0w3ZyUVEfuBD6aUPjhlut9DmkBEvBb478BVKaVk3sorIh4C/ltK6W+r798LXA78JfBhch2QqvO+DdyeUvpco+JtRvbMlsuzgS9PvkkpnQYerk5XiUTEU4FNwNcxb6UWEX8YESPAN4DvAn+LOSutiLgc2AW8ZcqsqTl7GHicfB6q8d4dEcci4v6IuLk6zZyVUES0AFuAtRExGBGPVoc+tuPvxmbxWuCj6XyPlHkrr98HXh0RHRGxDvhJ4HPk3HylJocAX8GcFWYxWy6rgJNTpp0kDwFSSUREG3A38JHqXzzNW4mllN5EzsWLycOwzmLOyuy3yD1Gj06Zbs7K623AlcA68vMTP1PthTVn5fRUoA34GfLvxeuA68mXz5izkqteC9tLHpY6ybyV1xfIBeoPgEfJQ8D/CnO2YCxmy+UUeehBrcsBh2OVREQsA/rJvQtvrk42byWXUhpPKd0HPAN4I+aslKo3v3gp8HvTzDZnJZVS+seU0nBK6WxK6SPA/cArMGdlNVr9d3dK6bsppWPA+zFnzaIPuC+l9M2aaeathKrfGT9H/kP6SuDJ5Otj34M5WzAWs+XydeDayTfVax6uqk5Xg0VEAB8k/1X7lpTSWHWWeWserZzPjTkrn5uBDcC3I+Ix4K3ALRHxID+asyuB5cDhSx+mLiIBgTkrpZTSELmHqHZ44+TP/m4sv1/gwl5ZMG9ltQZYD+yp/rHv++TrZF9Bzs1zq98tJz0Xc1aYxWwDRERrRKwAWoCWiFhRvU33p4DnRMQt1fk7yePpvXi/HP4IuBr4qZTSaM1081ZCEfGU6qMnVkVES0S8HPg5YB/mrKzuIn8Bu676+gDwWeDl5KH9PxURL65+UdsFfNIbCTVWRHRFxMsn/x+LiFvJd8b9HOaszD4M7Kj+nuwGfhX4G/zdWGoRsZU8nP/Pp8wybyVUHfXwTeCN1d+PXeTrnb9CviHlOPDLkR//ODna7x8aEmwTs5htjHeSh/n8OvDz1Z/fmVI6CtwC3AkMAS8AXt2oIHVe9RqVXyJ/wX6s5jlvt5q30krkIcWPkvPyPuC/ppT+2pyVU0ppJKX02OSLPAzrTErpaErp68AbyAXS98jXFb2pgeEqayM/au4o+e6pO4BXppQOm7NS+y3yo68Ok+8w/c/Anf5uLL3XMs0fhMxbqb0K+Hfk35GDwBjwqymlx4FXknvaTwCvI//ufLxRgTYrH80jSZIkSWo69sxKkiRJkpqOxawkSZIkqelYzEqSJEmSmo7FrCRJkiSp6VjMSpIkSZKajsWsJEmSJKnpWMxKkiRJkpqOxawkSU0mIjZERIqILY2OBSAitkfEqUbHIUlaWixmJUnSrEXEtyLirY2OQ5Iki1lJki6xiGhrdAySJDU7i1lJ0pIVER0RsTciTkXEv0bE2yPibyJib3X+j/RCRsS9EbGn5v1lEfGeiHg0IkYi4p8i4uU182+uDgl+RUR8MSIeB34pIiamDhOOiF+MiGMRcdkctuWaiPhsRAxHxPci4uMRcUXN/L3VbfuViDgSEUMR8eGI6KhpszIiPlqzP35jyv64F+gB3lvdpjQlhm0R8bWIOB0R90TEM4tuhyRJs2UxK0layt4HvAy4BdgGXA/cVHAdHwZ6gdcAzwE+AnwmIq6d0u49wDuBZwGfAD4PvG5Km9cB/Smlx4sEEBFPA74AfA34ceClwCrg0xFR+3/9i6sxvhT4T8BPA79SM/93q9vy08C/Aa6tLjPpVcCjwC7gadXXpOXAb1S34UVAF/CBItshSVIRrY0OQJKkRoiIVcDrgdellP6uOu02crE223VcBfwcsCGl9O3q5D0R8VLgl4A31TS/I6X09zXL/gnwJxHxlpTSmYi4Gngh8Itz2Jw3Al9OKb2tZv2/ABwHtgBfrE7+AfCGlNI48FBE/Dm5iH9
3dX+8DviFlNLnq+t4PTX7I6V0PCLGgeGU0mNTYmgF/ktK6VB12fcBH4qISCklJElaYPbMSpKWqquAy4D/NzkhpXQK+GqBddwABHCwOjT3VPWuvv++uv5aB6a8/zTwOLm3E3Ih+cWU0tcKfP6k5wE3TYnhX6rzauM4WC1kJ30HeEpNuzbOF76klE6Te3tn4+xkIVuz7suA7tlvhiRJs2fPrCRJM5sgF6u1am/etAxIwPOBsSntRqe8P137JqU0FhEfBV4XEX8G9AE75xjnMuCzwHR3Gf7Xmp+nxphYuD9sn5tm3Szg+iVJuoDFrCRpqXqYXNy9EHgE8g2QyNeUPlxtc5Sa60IjYgX5mtd/rk76Z3Kxe0VK6Z45xPCnwEHycORO4P/MYR0ADwI/C1RSSlML1tma3B/P5/z+6ODC/QG5N7lljp8hSdKC8a+lkqQlqTqk+IPAeyLiZRHxbOBDXFio/QNwa/WOxJPzW2vWcRi4G9gbET8TEVdGxJaIeGtEvIqLqA7LvQ94L/AXKaUfzHFz/gBYDXwiIl5QjeOlEXFXRHTOZgXV/fEh8v7YFhHXkIvtyd7nSd8CXhwR6yLiyXOMV5KkebNnVpK0lL0VWAl8ChgBdlffT3o3sIF8fesp4E7g6VPWcRvwDuB/As8g33Tpi8Bse2o/SL6D8gfnsgEAKaXvRMRPVOP9HLAC+Dbw98DZAqua3B9/Td7e3wOeCpypabMT+GNyb+1yfnQYtiRJl0R4g0FJks6LiL8BjqWUtl+iz3sb8PqU0qZL8XlFRMRyoAK8N6X0u42OR5KkWvbMSpLUANVH4fSQn/N6Z4PDASAirgeuJvcsdwJvq/77iUbGJUnSdLxmVpKkxthDvnHT/eRhuz8UER+ofczOlNcH6hzXW8g3tvoH8hDjm1JKs372riRJl4rDjCVJKpmIeApw+Qyzf5BS+t6ljEeSpDKymJUkSZIkNR2HGUuSJEmSmo7FrCRJkiSp6VjMSpIkSZKajsWsJEmSJKnp/H+YOd/C7PppWQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# The groupby/count produces a very small amount of data that\n", "# we can easily pull down and plot from our local client\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "spark.conf.set(\"spark.sql.execution.arrow.enabled\", \"true\")\n", "\n", "# Convert to Pandas (make sure it's small)\n", "txt_df = txt_queries.toPandas()\n", "\n", "# Group the Pandas dataframe by cluster prediction\n", "cluster_groups = txt_df.groupby('prediction')\n", "\n", "# Plot the Machine Learning results, one color per cluster\n", "choices = ['red', 'green', 'blue', 'black', 'orange', 'purple', 'brown',\n", "           'pink', 'lightblue', 'grey', 'yellow']\n", "colors = {value: choices[index] for index, value in enumerate(txt_df['prediction'].unique())}\n", "\n", "fig, ax = plt.subplots()\n", "for key, group in cluster_groups:\n", "    group.plot(ax=ax, kind='scatter', x='query_length', y='answer_length', alpha=0.5, s=250,\n", "               label='Cluster: {:d}'.format(key), color=colors[key])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Interesting...\n", "So we gave the clustering algorithm both categorical and numerical features, and it seems to have done a reasonable job using both: the data points are first grouped by their categorical values, and within each of those categorical clusters we see a set of 'sub-clusters' based on the numerical values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Wrap Up\n", "Well, that's it for this notebook. We pulled in Zeek log data from a Parquet file, did some digging with high-speed, parallel SQL operations, and finally clustered our data to organize the results.\n", "\n", "If you liked this notebook, please visit the [ZAT](https://github.com/SuperCowPowers/zat) project for more notebooks and examples.\n", "\n", "## About SuperCowPowers\n", "The company was formed so that its developers could follow their passion for Python, streaming data pipelines, and having fun with data analysis. We also think cows are cool and should be superheroes, or at least carry around ray guns and burner phones. Visit SuperCowPowers" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }