{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "# Zeek to Kafka\n", "This notebook covers how to stream Zeek data using Kafka as a message queue. The setup takes a bit of work, but the result will be a robust way to stream data from Zeek.\n", "\n", "\n", "\n", "### Software\n", "- Zeek Network Monitor: https://www.zeek.org\n", "- Kafka Zeek Plugin: https://github.com/apache/metron-bro-plugin-kafka\n", "- Kafka: https://kafka.apache.org" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 1: Streaming data pipeline\n", "To set some context, our long-term plan is to build out a streaming data pipeline. This notebook will help you get started on this path. After completing this notebook, you can take the next steps with our notebooks that run Spark on Zeek output:\n", " - [Zeek to Spark](https://nbviewer.jupyter.org/github/SuperCowPowers/zat/blob/main/notebooks/Zeek_to_Spark.ipynb)\n", " - [Zeek to Kafka to Spark](https://nbviewer.jupyter.org/github/SuperCowPowers/zat/blob/main/notebooks/Zeek_to_Kafka_to_Spark.ipynb)\n", "\n", "Conceptually, our streaming pipeline looks like this:\n", "\n", "\n", "- **Kafka Plugin for Zeek**\n", "- **Publish (provides a nice decoupled architecture)**\n", "- **Pull/Subscribe to whatever feed you want (http, dns, conn, x509...)**\n", "- ETL (Extract, Transform, Load) on the raw message data (parsed data with types)\n", "- Perform Filtering/Aggregation\n", "- Data Analysis and Machine Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "# Getting Everything Set Up\n", "Things you'll need:\n", "- A running Zeek network security monitor: https://docs.zeek.org/en/stable/install/install.html\n", "- The Kafka Plugin for Zeek: https://github.com/apache/metron-bro-plugin-kafka\n", "- A Kafka Broker: https://kafka.apache.org\n", "\n", "The links above do a pretty good job of getting you set up with Zeek, Kafka, and the Kafka plugin. If you already have these things set up, then you're good to go. 
If not, take some time and get them up and running. If you're a bit wacky (like me) and want to set these things up on a Mac, you might check out my notes here: [Zeek/Kafka Mac Setup](https://github.com/SuperCowPowers/zat/blob/main/docs/zeek_kafka_mac.md)\n", "\n", "## Systems Check\n", "Okay, now that Zeek with the Kafka plugin is set up, let's do a bit of testing to make sure it's all AOK before we get into making a Kafka consumer in Python.\n", "\n", "**Test the Zeek Kafka Plugin**\n", "\n", "Make sure the Kafka plugin is ready to go by running the following command on your Zeek instance:\n", "\n", "```\n", "$ zeek -N Apache::Kafka\n", "Apache::Kafka - Writes logs to Kafka (dynamic, version 0.3.0)\n", "```\n", "\n", "**Activate the Kafka Plugin**\n", "\n", "There's a good explanation of all the options here (