{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Having Fun with Syslog.\n",
"\n",
"\n",
"In the next exercise we're going to look at some syslog data. We'll take it up a notch by computing similarities with 'Banded MinHash' and running a hierarchical clustering algorithm.\n",
"\n",
"System logs are particularly challenging because they often lack any kind of structure at all. For this exercise we're going to be looking at the /var/log/system.log of a typical Mac OS X laptop. The first steps will be standard data inspection and plots; after that we'll pull out the big guns!\n",
"\n",
"\n",
"|  | features | label | type |\n",
"|---|---|---|---|\n",
"| 2014-01-11 00:16:09 | [kernel[0]:, Wake, reason:, RTC, (Alarm)] | kernel:Wake:reason:RTC:(Alarm | kernel |\n",
"| 2014-01-11 00:16:09 | [kernel[0]:, RTC:, SleepService, 2014/1/11, 07... | kernel:RTC::SleepS:2014/1:07:16::sleep:2014/1:... | kernel |\n",
"| 2014-01-11 00:16:09 | [kernel[0]:, AirPort_Brcm43xx::powerChange:, S... | kernel:AirPor:System:Wake:-:Full:Wake/:Dark:Wa... | kernel |\n",
"| 2014-01-11 00:16:09 | [kernel[0]:, Previous, Sleep, Cause:, 5] | kernel:Previo:Sleep:Cause::5 | kernel |\n",
"| 2014-01-11 00:16:09 | [kernel[0]:, IOPPF:, Sent, gpu-internal-plimit... | kernel:IOPPF::Sent:gpu-in:last:value:0:(round:... | kernel |\n",
"\n",
"5 rows × 3 columns\n",
"\n",
"|  | features | label | type |\n",
"|---|---|---|---|\n",
"| 2014-01-11 14:03:10 | [Google, Chrome, Helper[9524]:, CoreText, Copy... | Google:Chrome:Helper:CoreTe:CopyFo:receiv:mig:... |  |\n",
"| 2014-01-11 14:03:10 | [last, message, repeated, 1, time, ---] | last:messag:repeat:1:time:--- | last |\n",
"| 2014-01-11 14:03:10 | [kernel[0]:, SMC::smcReadKeyAction, ERROR, TC0... | kernel:SMC::s:ERROR:TC0D:kSMCBa:fKeyHa | kernel |\n",
"| 2014-01-11 14:03:40 | [last, message, repeated, 24, times, ---] | last:messag:repeat:24:times:--- | last |\n",
"| 2014-01-11 14:03:40 | [kernel[0]:, SMC::smcReadKeyAction, ERROR, TC0... | kernel:SMC::s:ERROR:TC0D:kSMCBa:fKeyHa | kernel |\n",
"\n",
"5 rows × 3 columns\n",
"\n",
"> cd data_hacking/fun_with_syslog\n",
"> python3 -m http.server 9999 &\n",
"\n",
"\n",
"Now point your browser at the HTML file [http://localhost:9999/syslog_vis.html](http://localhost:9999/syslog_vis.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ToDo\n",
"Really want to improve the interactive D3-based hierarchical tree visualization ([D3 Data Driven Documents](http://d3js.org)).\n",
"\n",
"### Conclusions\n",
"We pulled some syslog data into a Pandas DataFrame, made some plots, computed row similarities with 'Banded MinHash' and used single-linkage clustering to build an agglomerative hierarchical cluster. There are lots of possibilities from here; you could use just the LSH data structure or the H-Cluster...\n",
"\n",
" - LogFile Viewer\n",
" - Click on a row, filter out everything that's similar\n",
" - Click on a row, color code all the other rows based on similarity\n",
" - Super fancy D3 zoom in/filter awesomeness..."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}