{
 "metadata": {
  "name": "",
  "signature": "sha256:5d684fdc22f9bd79a89a3bf5e6c80e01af4a1ba431b3476743d051ca7c665359"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<div style=\"float: right; margin: 0px 0px 0px 30px\"><img src=\"files/images/workbench.jpg\" width=\"400px\"></div>\n",
      "# PE Static Analysis with Workbench: \n",
      "**Super Big Thanks**\n",
      "\n",
      "- [IOCBucket](http://www.iocbucket.com/). Resources like this are terrific and greatly appreciated by us and the community.\n",
      "- [Yara](http://plusvic.github.io/yara/). Super Spiffy!\n",
      "\n",
      "### Tools in this Notebook:\n",
      "- Workbench: Open Source Security Framework [Workbench GitHub](https://github.com/SuperCowPowers/workbench)\n",
      "- Yara: The pattern matching swiss knife for malware researchers [Yara](http://plusvic.github.io/yara/)\n",
      "\n",
      "### Lets start up the workbench server...\n",
      "Run the workbench server (from somewhere, for the demo we're just going to start a local one)\n",
      "<pre>\n",
      "$ workbench_server\n",
      "</pre>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Lets start to interact with workbench, please note there is NO specific client to workbench,\n",
      "# Just use the ZeroRPC Python, Node.js, or CLI interfaces.\n",
      "import zerorpc\n",
      "c = zerorpc.Client()\n",
      "c.connect(\"tcp://127.0.0.1:4242\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 1,
       "text": [
        "[None]"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<div style=\"float: left; margin: 0px 0px 0px 0px\"><img src=\"files/images/confused.jpg\" width=\"350px\"></div>\n",
      "\n",
      "## So I'm confused what am I suppose to do with workbench? \n",
      "<br>\n",
      "<font size=4> Workbench is often confusing for new users (we're trying to work on that). Please see our github repository https://github.com/SuperCowPowers/workbench for the latest documentation and notebooks examples (the notebook examples can really help). New users can start by typing **c.help()** after they connect to workbench.</font>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# I forgot what stuff I can do with workbench\n",
      "print c.help()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Welcome to Workbench: Here's a list of help commands:\n",
        "\t - Run c.help_basic() for beginner help\n",
        "\t - Run c.help_commands() for command help\n",
        "\t - Run c.help_workers() for a list of workers\n",
        "\t - Run c.help_advanced() for advanced help\n",
        "\n",
        "See https://github.com/SuperCowPowers/workbench for more information\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print c.help_basic()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Workbench: Getting started...\n",
        "\t - 1) $ print c.help_commands() for a list of commands\n",
        "\t - 2) $ print c.help_command('store_sample') for into on a specific command\n",
        "\t - 3) $ print c.help_workers() for a list a workers\n",
        "\t - 4) $ print c.help_worker('meta') for info on a specific worker\n",
        "\t - 5) $ my_md5 = c.store_sample(...)\n",
        "\t - 6) $ output = c.work_request('meta', my_md5)\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 1:\n",
      "# Okay get the list of commands from workbench\n",
      "print c.help_commands()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Workbench Commands:\n",
        "\tadd_node(node_id, name, labels)\n",
        "\tadd_rel(source_id, target_id, rel)\n",
        "\tclear_db()\n",
        "\tclear_graph_db()\n",
        "\tget_datastore_uri()\n",
        "\tget_sample(md5)\n",
        "\tget_sample_set(md5)\n",
        "\tget_sample_window(type_tag, size)\n",
        "\thas_node(node_id)\n",
        "\thave_sample(md5)\n",
        "\thelp()\n",
        "\thelp_advanced()\n",
        "\thelp_basic()\n",
        "\thelp_command(command)\n",
        "\thelp_commands()\n",
        "\thelp_worker(worker)\n",
        "\thelp_workers()\n",
        "\tindex_sample(md5, index_name)\n",
        "\tindex_worker_output(worker_class, md5, index_name, subfield)\n",
        "\tsearch(index_name, query)\n",
        "\tstore_sample(input_bytes, filename, type_tag)\n",
        "\tstore_sample_set(md5_list)\n",
        "\twork_request(worker_class, md5, subkeys=None)\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 2:\n",
      "# Lets gets the infomation on a specific command 'store_sample'\n",
      "print c.help_command('store_sample')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        " Command: store_sample(input_bytes, filename, type_tag) \n",
        " Store a sample into the DataStore.\n",
        "            Args:\n",
        "                filename: name of the file (used purely as meta data not for lookup)\n",
        "                input_bytes: the actual bytes of the sample e.g. f.read()\n",
        "                type_tag: ('exe','pcap','pdf','json','swf', or ...)\n",
        "            Returns:\n",
        "                the md5 of the sample\n",
        "        \n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 3:\n",
      "# Now lets get infomation about the dynamically loaded workers (your site may have many more!)\n",
      "# Next to each worker name is the list of dependences that worker has declared\n",
      "print c.help_workers()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Workbench Workers:\n",
        "\tjson_meta ['sample', 'meta']\n",
        "\tlog_meta ['sample', 'meta']\n",
        "\tmem_base ['sample']\n",
        "\tmem_connscan ['sample']\n",
        "\tmem_dlllist ['sample']\n",
        "\tmem_meta ['sample']\n",
        "\tmem_procdump ['sample']\n",
        "\tmem_pslist ['sample']\n",
        "\tmeta ['sample']\n",
        "\tmeta_deep ['sample', 'meta']\n",
        "\tpcap_bro ['sample']\n",
        "\tpcap_graph ['pcap_bro']\n",
        "\tpcap_graph_0_1 ['pcap_bro']\n",
        "\tpcap_http_graph ['pcap_bro']\n",
        "\tpe_classifier ['pe_features', 'pe_indicators']\n",
        "\tpe_deep_sim ['meta_deep']\n",
        "\tpe_features ['sample']\n",
        "\tpe_indicators ['sample']\n",
        "\tpe_peid ['sample']\n",
        "\tstrings ['sample']\n",
        "\tswf_meta ['sample', 'meta']\n",
        "\tunzip ['sample']\n",
        "\turl ['strings']\n",
        "\tview ['meta']\n",
        "\tview_customer ['meta']\n",
        "\tview_log_meta ['log_meta']\n",
        "\tview_meta ['meta']\n",
        "\tview_pcap ['pcap_bro']\n",
        "\tview_pcap_details ['view_pcap']\n",
        "\tview_pdf ['meta', 'strings']\n",
        "\tview_pe ['meta', 'strings', 'pe_peid', 'pe_indicators', 'pe_classifier', 'pe_disass']\n",
        "\tview_zip ['meta', 'unzip']\n",
        "\tvt_query ['meta']\n",
        "\tyara_sigs ['sample']\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 4:\n",
      "# Lets gets the infomation about the meta worker\n",
      "print c.help_worker('meta')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        " Worker: meta ['sample']\n",
        "\t This worker computes meta data for any file type. \n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<div style=\"float: right; margin: 0px 30px 0px 0px\"><img src=\"files/images/feeling_awesome.jpg\" width=\"300px\"></div>\n",
      "# Alright, are we feeling awesome yet? \n",
      "## Let do Static Analysis on PE Files!\n",
      "<font size=4> Workbench has lots of samples that you can try out (some of them are malicious, be careful!). So you can grab a PE file, a PCAP, or a PDF file and throw it in! For this notebook we're just going to put in PE files.\n",
      "</font>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 5:\n",
      "# Okay when we load up a file, we get the md5 back\n",
      "filename = '../data/pe/bad/0cb9aa6fb9c4aa3afad7a303e21ac0f3'\n",
      "with open(filename,'rb') as f:\n",
      "    my_md5 = c.store_sample(f.read(), filename, 'exe')\n",
      "print my_md5"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0cb9aa6fb9c4aa3afad7a303e21ac0f3\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<div style=\"float: left; margin: 0px 30px 0px 0px\"><img src=\"files/images/feeling_more_awesome.jpg\" width=\"350px\"></div>\n",
      "# Alright, now it's time to get fabulous!\n",
      "### Let check out some of the other workers\n",
      "<font size=4> We saw a bunch of workers for PE Files, so lets look at the help for those and start taking it up a notch!\n",
      "</font>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Lets see what view_pe does\n",
      "print c.help_worker('view_pe')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        " Worker: view_pe ['meta', 'strings', 'pe_peid', 'pe_indicators', 'pe_classifier', 'pe_disass']\n",
        "\t Generates a high level summary view for PE files that incorporates a large set of workers \n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Okay lets give it a try\n",
      "c.work_request('view_pe', my_md5)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "{'view_pe': {'classification': 'Evil!',\n",
        "  'customer': 'BearTron',\n",
        "  'disass': 'plugin_failed',\n",
        "  'encoding': 'binary',\n",
        "  'file_size': 20480,\n",
        "  'file_type': 'PE32 executable (GUI) Intel 80386, for MS Windows',\n",
        "  'filename': '../data/pe/bad/0cb9aa6fb9c4aa3afad7a303e21ac0f3',\n",
        "  'import_time': '2014-06-15T01:20:52.355000Z',\n",
        "  'indicators': [{'attributes': ['findwindowexa', 'findwindowa'],\n",
        "    'category': 'ANTI_DEBUG',\n",
        "    'description': 'Imported symbols related to anti-debugging',\n",
        "    'severity': 3},\n",
        "   {'category': 'MALFORMED', 'description': 'Checksum of Zero', 'severity': 1},\n",
        "   {'category': 'MALFORMED',\n",
        "    'description': 'Reported Checksum does not match actual checksum',\n",
        "    'severity': 2},\n",
        "   {'attributes': ['sendmessagea'],\n",
        "    'category': 'COMMUNICATION',\n",
        "    'description': 'Imported symbols related to network communication',\n",
        "    'severity': 1},\n",
        "   {'attributes': ['getmodulehandlea', 'getstartupinfoa'],\n",
        "    'category': 'PROCESS_MANIPULATION',\n",
        "    'description': 'Imported symbols related to process manipulation/injection',\n",
        "    'severity': 3},\n",
        "   {'attributes': ['getsystemmetrics'],\n",
        "    'category': 'PROCESS_SPAWN',\n",
        "    'description': 'Imported symbols related to spawning a new process',\n",
        "    'severity': 2}],\n",
        "  'length': 20480,\n",
        "  'md5': '0cb9aa6fb9c4aa3afad7a303e21ac0f3',\n",
        "  'mime_type': 'application/x-dosexec',\n",
        "  'peid_Matches': ['Microsoft Visual C++ v6.0'],\n",
        "  'type_tag': 'exe'}}"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# The Workbench framework is client/server so here's what just happened...\n",
      "<br>\n",
      "<div style=\"margin: 0px 30px 0px 0px\"><img src=\"files/images/client_server.png\" width=\"900px\"></div>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Okay, that worker needed the output of pe_features and pe_indicators\n",
      "# so what happened? The worker has a dependency list and workbench\n",
      "# recursively satisfies that dependency list.. this is powerful because\n",
      "# when we're interested in one particular analysis we just want to get\n",
      "# the darn thing without having to worry about a bunch of details\n",
      "\n",
      "# Well lets throw in a bunch of files!\n",
      "import os\n",
      "file_list = [os.path.join('../data/pe/bad', child) for child in os.listdir('../data/pe/bad')]\n",
      "working_set = []\n",
      "for filename in file_list:\n",
      "    with open(filename,'rb') as f:\n",
      "        md5 = c.store_sample(f.read(), filename, 'exe')\n",
      "        working_set.append(md5)\n",
      "print working_set[:5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "['033d91aae8ad29ed9fbb858179271232', '0cb9aa6fb9c4aa3afad7a303e21ac0f3', '0e882ec9b485979ea84c7843d41ba36f', '0e8b030fb6ae48ffd29e520fc16b5641', '0eb9e990c521b30428a379700ec5ab3e']\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Okay we just pushed in a bunch of files, now we can extract features, \n",
      "# look at indicators, peids, and yara sigs!\n",
      "\n",
      "# Lets just randomly pick one to understand the details and then we'll look\n",
      "# at running all of them a bit later.\n",
      "c.work_request('pe_features', working_set[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 14,
       "text": [
        "{'pe_features': {'dense_features': {'check_sum': 0,\n",
        "   'compile_date': 585810474,\n",
        "   'datadir_IMAGE_DIRECTORY_ENTRY_BASERELOC_size': 0,\n",
        "   'datadir_IMAGE_DIRECTORY_ENTRY_EXPORT_size': 0,\n",
        "   'datadir_IMAGE_DIRECTORY_ENTRY_IAT_size': 44,\n",
        "   'datadir_IMAGE_DIRECTORY_ENTRY_IMPORT_size': 40,\n",
        "   'datadir_IMAGE_DIRECTORY_ENTRY_RESOURCE_size': 0,\n",
        "   'debug_size': 0,\n",
        "   'export_size': 0,\n",
        "   'generated_check_sum': 98624,\n",
        "   'iat_rva': 17518,\n",
        "   'major_version': 0,\n",
        "   'minor_version': 0,\n",
        "   'number_of_bound_import_symbols': 0,\n",
        "   'number_of_bound_imports': 0,\n",
        "   'number_of_export_symbols': 0,\n",
        "   'number_of_import_symbols': 10,\n",
        "   'number_of_imports': 1,\n",
        "   'number_of_rva_and_sizes': 16,\n",
        "   'number_of_sections': 4,\n",
        "   'pe_char': 271,\n",
        "   'pe_dll': 0,\n",
        "   'pe_driver': 0,\n",
        "   'pe_exe': 1,\n",
        "   'pe_i386': 1,\n",
        "   'pe_majorlink': 6,\n",
        "   'pe_minorlink': 0,\n",
        "   'pe_warnings': 1,\n",
        "   'sec_entropy_brdata': 7.992004822536996,\n",
        "   'sec_entropy_data': 7.996253697966639,\n",
        "   'sec_entropy_rdata': 0.0,\n",
        "   'sec_entropy_reloc': 0,\n",
        "   'sec_entropy_rsrc': 0,\n",
        "   'sec_entropy_text': 6.4550179842911195,\n",
        "   'sec_raw_execsize': 84480,\n",
        "   'sec_rawptr_brdata': 57856,\n",
        "   'sec_rawptr_data': 14848,\n",
        "   'sec_rawptr_rdata': 0,\n",
        "   'sec_rawptr_rsrc': 0,\n",
        "   'sec_rawptr_text': 1024,\n",
        "   'sec_rawsize_brdata': 27648,\n",
        "   'sec_rawsize_data': 43008,\n",
        "   'sec_rawsize_rdata': 0,\n",
        "   'sec_rawsize_rsrc': 0,\n",
        "   'sec_rawsize_text': 13824,\n",
        "   'sec_va_execsize': 170626,\n",
        "   'sec_vasize_brdata': 49152,\n",
        "   'sec_vasize_data': 42586,\n",
        "   'sec_vasize_rdata': 65536,\n",
        "   'sec_vasize_rsrc': 0,\n",
        "   'sec_vasize_text': 13352,\n",
        "   'size_code': 13824,\n",
        "   'size_image': 180224,\n",
        "   'size_initdata': 43008,\n",
        "   'size_uninit': 65536,\n",
        "   'std_section_names': 0,\n",
        "   'total_size_pe': 85504,\n",
        "   'virtual_address': 4096,\n",
        "   'virtual_size': 13352,\n",
        "   'virtual_size_2': 65536},\n",
        "  'md5': '033d91aae8ad29ed9fbb858179271232',\n",
        "  'sparse_features': {'imp_hash': 'Not found: Install pefile 1.2.10-139 or later',\n",
        "   'imported_symbols': ['kernel32.dll:name=getenvironmentvariablew',\n",
        "    'kernel32.dll:name=opendatafile',\n",
        "    'kernel32.dll:name=createeventa',\n",
        "    'kernel32.dll:name=removelocalalternatecomputernamew',\n",
        "    'kernel32.dll:name=getprocessheaps',\n",
        "    'kernel32.dll:name=process32nextw',\n",
        "    'kernel32.dll:name=createactctxa',\n",
        "    'kernel32.dll:name=widechartomultibyte',\n",
        "    'kernel32.dll:name=setcompluspackageinstallstatus',\n",
        "    'kernel32.dll:name=setcommtimeouts'],\n",
        "   'pe_warning_strings': ['Suspicious flags set for section 0. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',\n",
        "    'Suspicious flags set for section 3. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',\n",
        "    'Rich Header corrupted'],\n",
        "   'section_names': ['.text', '.rdata', '.data', '.brdata']}}}"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "c.work_request('pe_indicators', working_set[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "{'pe_indicators': {'indicator_list': [{'category': 'PE_WARN',\n",
        "    'description': 'Suspicious flags set for section 0. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',\n",
        "    'severity': 2},\n",
        "   {'category': 'PE_WARN',\n",
        "    'description': 'Suspicious flags set for section 3. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',\n",
        "    'severity': 2},\n",
        "   {'category': 'PE_WARN',\n",
        "    'description': 'Rich Header corrupted',\n",
        "    'severity': 2},\n",
        "   {'category': 'MALFORMED', 'description': 'Checksum of Zero', 'severity': 1},\n",
        "   {'category': 'MALFORMED',\n",
        "    'description': 'Reported Checksum does not match actual checksum',\n",
        "    'severity': 2},\n",
        "   {'category': 'MALFORMED',\n",
        "    'description': 'Image size does not match reported size',\n",
        "    'severity': 3},\n",
        "   {'attributes': ['.brdata'],\n",
        "    'category': 'MALFORMED',\n",
        "    'description': 'Section(s) with a non-standard name, tamper indication',\n",
        "    'severity': 3},\n",
        "   {'attributes': ['process32nextw'],\n",
        "    'category': 'PROCESS_MANIPULATION',\n",
        "    'description': 'Imported symbols related to process manipulation/injection',\n",
        "    'severity': 3}],\n",
        "  'md5': '033d91aae8ad29ed9fbb858179271232'}}"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Your data - Transparent, Organized, Accessible\n",
      "Everything done by workbench is pushed into the MongoDB backend, worker output is automatically pushed into the datastore and a very lightweight call is made to get the results. As seen in the workflow diagrame above, workers are chained together within a gevent-coprocessing server, data outputs are 'shallow copied' and pipelined into other workers. The whole process is efficient and elegant.\n",
      "<div style=\"margin: 50px 0px 0px 30px\"><img src=\"files/images/mongo_data.png\" width=\"800px\"></div>\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Now we rip the peid on all the PE files\n",
      "output = c.batch_work_request('pe_peid', {'md5_list':working_set})\n",
      "output"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "<generator object iterator at 0x10b8c0a50>"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<div style=\"float: left; margin: 0px 0px 0px 30px\"><img src=\"files/images/head_explode.jpg\" width=\"350px\"></div>\n",
      "## Holy s#@&! The server batch request returned a generator?\n",
      "#### Yes generators are awesome but getting one from a server request! Are u serious?!  Yes, thanks to ZeroRPC...dead serious.. like chopping off your head and kicking your body into a shallow grave and putting your head on a stick... serious.\n",
      "\n",
      "#### Now we're going to take that generator and populate a Pandas Dataframe in ONE LINE of CODE! Are people paying attention?!? This is like Butter!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# At this granularity it opens up a new world\n",
      "import pandas as pd\n",
      "df = pd.DataFrame(output)\n",
      "df.head(10)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>match_list</th>\n",
        "      <th>md5</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>                                   []</td>\n",
        "      <td> 033d91aae8ad29ed9fbb858179271232</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>          [Microsoft Visual C++ v6.0]</td>\n",
        "      <td> 0cb9aa6fb9c4aa3afad7a303e21ac0f3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> [Microsoft Visual Basic v5.0 - v6.0]</td>\n",
        "      <td> 0e882ec9b485979ea84c7843d41ba36f</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td>                                   []</td>\n",
        "      <td> 0e8b030fb6ae48ffd29e520fc16b5641</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td>                                   []</td>\n",
        "      <td> 0eb9e990c521b30428a379700ec5ab3e</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5</th>\n",
        "      <td>          [Microsoft Visual C++ v6.0]</td>\n",
        "      <td> 127f2bade752445b3dbf2cf2ea75c201</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6</th>\n",
        "      <td>                                   []</td>\n",
        "      <td> 139385a91b9bca0833bdc1fa77e42b91</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7</th>\n",
        "      <td>          [Microsoft Visual C++ v6.0]</td>\n",
        "      <td> 13dcc5b4570180118eb65529b77f6d89</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>8</th>\n",
        "      <td>                     [Armadillo v4.x]</td>\n",
        "      <td> 1cac80a2147cd8f3860547e43edcaa00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9</th>\n",
        "      <td>                                   []</td>\n",
        "      <td> 1cea13cf888cd8ce4f869029f1dbb601</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>10 rows \u00d7 2 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 21,
       "text": [
        "                             match_list                               md5\n",
        "0                                    []  033d91aae8ad29ed9fbb858179271232\n",
        "1           [Microsoft Visual C++ v6.0]  0cb9aa6fb9c4aa3afad7a303e21ac0f3\n",
        "2  [Microsoft Visual Basic v5.0 - v6.0]  0e882ec9b485979ea84c7843d41ba36f\n",
        "3                                    []  0e8b030fb6ae48ffd29e520fc16b5641\n",
        "4                                    []  0eb9e990c521b30428a379700ec5ab3e\n",
        "5           [Microsoft Visual C++ v6.0]  127f2bade752445b3dbf2cf2ea75c201\n",
        "6                                    []  139385a91b9bca0833bdc1fa77e42b91\n",
        "7           [Microsoft Visual C++ v6.0]  13dcc5b4570180118eb65529b77f6d89\n",
        "8                      [Armadillo v4.x]  1cac80a2147cd8f3860547e43edcaa00\n",
        "9                                    []  1cea13cf888cd8ce4f869029f1dbb601\n",
        "\n",
        "[10 rows x 2 columns]"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# So lets get a breakdown of the PEID matches\n",
      "df['match'] = [str(match) for match in df['match_list']]\n",
      "df['match'].value_counts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 29,
       "text": [
        "[]                                               26\n",
        "['Microsoft Visual C++ v6.0']                     4\n",
        "['Borland Delphi 3.0 (???)']                      3\n",
        "['Microsoft Visual C++ v7.0']                     3\n",
        "['UPX v1.25 (Delphi) Stub']                       2\n",
        "['Armadillo v4.x']                                1\n",
        "['ASPack v1.06b']                                 1\n",
        "['Microsoft Visual Basic v5.0 - v6.0']            1\n",
        "['Safeguard 1.03 -> Simonzh']                     1\n",
        "['Upack v0.399 -> Dwing']                         1\n",
        "['UPX v0.71 - v0.72', 'tElock v0.7x - v0.84']     1\n",
        "['Pack Master v1.0', 'PEX v0.99']                 1\n",
        "['Microsoft Visual Basic v5.0']                   1\n",
        "['UPX -> www.upx.sourceforge.net']                1\n",
        "['Dev-C++ v5']                                    1\n",
        "['Borland Delphi 4.0']                            1\n",
        "['BobSoft Mini Delphi -> BoB / BobSoft']          1\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 29
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Now we do the same thing for yara sigs\n",
      "output = c.batch_work_request('yara_sigs', {'md5_list':working_set})\n",
      "output"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 42,
       "text": [
        "<generator object iterator at 0x10b9310f0>"
       ]
      }
     ],
     "prompt_number": 42
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Pop it into a dataframe with one line of code\n",
      "df_yara = pd.DataFrame(output)\n",
      "df_yara.head(10)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>matches</th>\n",
        "      <th>md5</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>                                                {}</td>\n",
        "      <td> 033d91aae8ad29ed9fbb858179271232</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> {u'anti_debug': [{u'matches': True, u'meta': {...</td>\n",
        "      <td> 0cb9aa6fb9c4aa3afad7a303e21ac0f3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> {u'anti_debug': [{u'matches': True, u'meta': {...</td>\n",
        "      <td> 0e882ec9b485979ea84c7843d41ba36f</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> {u'import': [{u'matches': True, u'meta': {'des...</td>\n",
        "      <td> 0e8b030fb6ae48ffd29e520fc16b5641</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td>                                                {}</td>\n",
        "      <td> 0eb9e990c521b30428a379700ec5ab3e</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5</th>\n",
        "      <td> {u'import': [{u'matches': True, u'meta': {'des...</td>\n",
        "      <td> 127f2bade752445b3dbf2cf2ea75c201</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6</th>\n",
        "      <td>                                                {}</td>\n",
        "      <td> 139385a91b9bca0833bdc1fa77e42b91</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7</th>\n",
        "      <td> {u'import': [{u'matches': True, u'meta': {'des...</td>\n",
        "      <td> 13dcc5b4570180118eb65529b77f6d89</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>8</th>\n",
        "      <td> {u'anti_debug': [{u'matches': True, u'meta': {...</td>\n",
        "      <td> 1cac80a2147cd8f3860547e43edcaa00</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9</th>\n",
        "      <td> {u'anti_debug': [{u'matches': True, u'meta': {...</td>\n",
        "      <td> 1cea13cf888cd8ce4f869029f1dbb601</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>10 rows \u00d7 2 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 43,
       "text": [
        "                                             matches  \\\n",
        "0                                                 {}   \n",
        "1  {u'anti_debug': [{u'matches': True, u'meta': {...   \n",
        "2  {u'anti_debug': [{u'matches': True, u'meta': {...   \n",
        "3  {u'import': [{u'matches': True, u'meta': {'des...   \n",
        "4                                                 {}   \n",
        "5  {u'import': [{u'matches': True, u'meta': {'des...   \n",
        "6                                                 {}   \n",
        "7  {u'import': [{u'matches': True, u'meta': {'des...   \n",
        "8  {u'anti_debug': [{u'matches': True, u'meta': {...   \n",
        "9  {u'anti_debug': [{u'matches': True, u'meta': {...   \n",
        "\n",
        "                                md5  \n",
        "0  033d91aae8ad29ed9fbb858179271232  \n",
        "1  0cb9aa6fb9c4aa3afad7a303e21ac0f3  \n",
        "2  0e882ec9b485979ea84c7843d41ba36f  \n",
        "3  0e8b030fb6ae48ffd29e520fc16b5641  \n",
        "4  0eb9e990c521b30428a379700ec5ab3e  \n",
        "5  127f2bade752445b3dbf2cf2ea75c201  \n",
        "6  139385a91b9bca0833bdc1fa77e42b91  \n",
        "7  13dcc5b4570180118eb65529b77f6d89  \n",
        "8  1cac80a2147cd8f3860547e43edcaa00  \n",
        "9  1cea13cf888cd8ce4f869029f1dbb601  \n",
        "\n",
        "[10 rows x 2 columns]"
       ]
      }
     ],
     "prompt_number": 43
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Here the yara output is a bit more details so we're going to carve it up a bit\n",
      "import numpy as np\n",
      "df_yara['match'] = [str(match.keys()) if match.keys() else np.nan for match in df_yara['matches'] ]\n",
      "df_yara = df_yara.dropna()\n",
      "df_yara['count'] = 1\n",
      "df_yara.groupby(['match','md5']).sum()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th>count</th>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>match</th>\n",
        "      <th>md5</th>\n",
        "      <th></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th rowspan=\"11\" valign=\"top\">['anti_debug']</th>\n",
        "      <th>0cb9aa6fb9c4aa3afad7a303e21ac0f3</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>0e882ec9b485979ea84c7843d41ba36f</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1cac80a2147cd8f3860547e43edcaa00</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1cea13cf888cd8ce4f869029f1dbb601</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2d094b6c69020091b68d1bcf5d11fa4b</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2d09b5768e3617523d8afa110361919c</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2d09b8d9852c3176259915e3509bcbd1</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2d09cc92bbe29d96bb3a91b350d1725f</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9ceccd9f32cb2ad0b140b6d15d8993b6</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9e42ff1e6f75ae3e60b24e48367c8f26</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>cc113aa59c04b17e7cb832fc417f104d</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th rowspan=\"5\" valign=\"top\">['import', 'anti_debug']</th>\n",
        "      <th>0e8b030fb6ae48ffd29e520fc16b5641</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>127f2bade752445b3dbf2cf2ea75c201</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>13dcc5b4570180118eb65529b77f6d89</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2058c50de5976c67a09dfa5e0e1c7eb5</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>b681485cb9e0cad73ee85b9274c0d3c2</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>['import']</th>\n",
        "      <th>2d09e4aff42aebac87ae2fd737aba94f</th>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>17 rows \u00d7 1 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 54,
       "text": [
        "                                                           count\n",
        "match                    md5                                    \n",
        "['anti_debug']           0cb9aa6fb9c4aa3afad7a303e21ac0f3      1\n",
        "                         0e882ec9b485979ea84c7843d41ba36f      1\n",
        "                         1cac80a2147cd8f3860547e43edcaa00      1\n",
        "                         1cea13cf888cd8ce4f869029f1dbb601      1\n",
        "                         2d094b6c69020091b68d1bcf5d11fa4b      1\n",
        "                         2d09b5768e3617523d8afa110361919c      1\n",
        "                         2d09b8d9852c3176259915e3509bcbd1      1\n",
        "                         2d09cc92bbe29d96bb3a91b350d1725f      1\n",
        "                         9ceccd9f32cb2ad0b140b6d15d8993b6      1\n",
        "                         9e42ff1e6f75ae3e60b24e48367c8f26      1\n",
        "                         cc113aa59c04b17e7cb832fc417f104d      1\n",
        "['import', 'anti_debug'] 0e8b030fb6ae48ffd29e520fc16b5641      1\n",
        "                         127f2bade752445b3dbf2cf2ea75c201      1\n",
        "                         13dcc5b4570180118eb65529b77f6d89      1\n",
        "                         2058c50de5976c67a09dfa5e0e1c7eb5      1\n",
        "                         b681485cb9e0cad73ee85b9274c0d3c2      1\n",
        "['import']               2d09e4aff42aebac87ae2fd737aba94f      1\n",
        "\n",
        "[17 rows x 1 columns]"
       ]
      }
     ],
     "prompt_number": 54
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Alright now that we have an overview of matches we drill down again\n",
      "c.work_request('yara_sigs', 'b681485cb9e0cad73ee85b9274c0d3c2')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 60,
       "text": [
        "{'yara_sigs': {'matches': {'anti_debug': [{'matches': True,\n",
        "     'meta': {'description': 'Anti-Debug Imports'},\n",
        "     'rule': 'process_manip',\n",
        "     'strings': [{'data': 'FindWindow',\n",
        "       'flags': 283,\n",
        "       'identifier': '$',\n",
        "       'offset': 44110},\n",
        "      {'data': 'GetTickCount',\n",
        "       'flags': 283,\n",
        "       'identifier': '$',\n",
        "       'offset': 43080}],\n",
        "     'tags': []}],\n",
        "   'import': [{'matches': True,\n",
        "     'meta': {'description': 'Communication Calls (Winsock WSA)'},\n",
        "     'rule': 'winsock_wsa',\n",
        "     'strings': [{'data': 'WSASocket',\n",
        "       'flags': 275,\n",
        "       'identifier': '$',\n",
        "       'offset': 44206}],\n",
        "     'tags': []},\n",
        "    {'matches': True,\n",
        "     'meta': {'description': 'Communication Calls (Winsock Generic)'},\n",
        "     'rule': 'winsock_generic',\n",
        "     'strings': [{'data': 'closesocket',\n",
        "       'flags': 275,\n",
        "       'identifier': '$',\n",
        "       'offset': 44290},\n",
        "      {'data': 'connect', 'flags': 275, 'identifier': '$', 'offset': 44280},\n",
        "      {'data': 'recv', 'flags': 275, 'identifier': '$', 'offset': 44272},\n",
        "      {'data': 'send', 'flags': 275, 'identifier': '$', 'offset': 44264},\n",
        "      {'data': 'socket', 'flags': 275, 'identifier': '$', 'offset': 44295}],\n",
        "     'tags': []}]},\n",
        "  'md5': 'b681485cb9e0cad73ee85b9274c0d3c2'}}"
       ]
      }
     ],
     "prompt_number": 60
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#Wrap Up\n",
      "Well for this short notebook we focused on PE File static analysis. Obviously we need lots more yara sigs (please some nice person help us out with yara sigs). We hope this exercise showed some neato functionality using [Workbench](https://github.com/SuperCowPowers/workbench), we encourage you to check out the GitHub repository and our other notebooks:\n",
      "- [PCAP_to_Graph](http://nbviewer.ipython.org/github/SuperCowPowers/workbench/blob/master/workbench/notebooks/PCAP_to_Graph.ipynb) for a short notebook on turning this PCAP into a Neo4j graph.\n",
      "- [Workbench Demo](http://nbviewer.ipython.org/url/raw.github.com/SuperCowPowers/workbench/master/workbench/notebooks/Workbench_Demo.ipynb) general introduction to Workbench.\n",
      "- [PCAP_DriveBy](http://nbviewer.ipython.org/url/raw.github.com/SuperCowPowers/workbench/master/workbench/notebooks/PCAP_DriveBy.ipynb) a detail look at a Web DriveBy from the [ThreatGlass](http://www.threatglass.com) repository.\n",
      "- [PE File Sim Graph](http://nbviewer.ipython.org/url/raw.github.com/SuperCowPowers/workbench/master/workbench/notebooks/PE_SimGraph.ipynb) using Neo4j to generate a similarity graph using PE File features.\n",
      "- [Generator Pipelines](http://nbviewer.ipython.org/url/raw.github.com/SuperCowPowers/workbench/master/workbench/notebooks/Generator_Pipelines.ipynb) using the client/server streaming generators to demonstrate 'chaining' generators."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}