{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "_Yara Cell Magic by Eric Hutchins_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Yara](https://code.google.com/p/yara-project/) is a powerful, flexible signature language. From the [documentation \\(PDF\\)](https://code.google.com/p/yara-project/downloads/detail?name=YARA%20User%27s%20Manual%201.6.pdf), it is a tool \"aimed at helping malware researchers to identify and classify malware families.\" Far beyond just malware, Yara provides a language to define sets of rules each with sets of complex conditions with robust regular expressions, match offset specifications, and many other features that, when combined, make Yara a swiss army knife for finding stuff. Wherever you have stuff and want to find a complex set of conditions, Yara can help with that.\n", "\n", "It also has a nice Python [API](https://pypi.python.org/pypi/yara). The typical way to use Yara in Python is to either read in the rules from an external file or specify the rules in a Python string. Using one language (Python) to write another language (Yara) always frustrates me. At least python's triple quote \"\"\" lets you nicely write multi-line strings.\n", "\n", "```python\n", "import yara\n", "yarasig = \"\"\"rule helloworld\n", "{ \n", " strings: \n", " $a = \"hello\" ascii\n", " $b = \"world\" nocase\n", " condition: \n", " any of them\n", "}\"\"\"\n", "myrules = yara.compile(source=yarasig)\n", "```\n", "\n", "But still, I want to edit a blob of text for a language in a native editor. I want to see matching parentheses and brackets as I type them. I want to be able to bulk comment out lines. I want to see line numbers so I can debug Yara error messages. I want syntax highlighting. \n", "\n", "IPython lets us accomplish in a nice, clean fashion. I wrote a custom magic `%%yara` that turns a code cell into a inline Yara editor. Write rules with proper syntax highlighting (using a new CodeMirror mode for yara that I also wrote). Run the cell, and yara compiles the code and puts the compiled rule object back in your namespace. Then use it to Find Stuff." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Notebook Prerequisites__\n", "\n", "_Modules_\n", "\n", "* [Yara](https://pypi.python.org/pypi/yara)\n", "* [Volatility Framework](https://code.google.com/p/volatility/)\n", "* [Pandas](http://pandas.pydata.org/)\n", "\n", "_Custom_\n", "\n", "* yaramagic.py -- Custom code for `%%yara` cell magic\n", "* yara.js -- [CodeMirror](http://codemirror.net/) custom [mode](http://codemirror.net/doc/manual.html#modeapi) for yara syntax highlighting\n", "\n", "_Data_\n", "\n", "* [ds_fuzz_hidden_proc.img](http://amnesia.gtisc.gatech.edu/~moyix/ds_fuzz_hidden_proc.img.bz2) -- Sample memory image via [Sample Memory Images](https://code.google.com/p/volatility/wiki/SampleMemoryImages) directory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Hello World__" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from pprint import pprint" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "%%yara -n myrules\n", "/* \n", " My first rule called \"helloworld\" \n", " with category \"testing\"\n", "*/\n", "rule helloworld : testing\n", "{\n", " meta:\n", " version = \"0.1\"\n", " strings:\n", " $a = \"hello\" ascii\n", " $b = \"world\" nocase\n", " condition:\n", " all of them\n", "}" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Adding compiled rules as \"myrules\" to namespace\n" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "pprint( myrules.match_data(data=\"This is a hello WoRlD test\") )" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "{'main': [{'matches': True,\n", " 'meta': {'version': '0.1'},\n", " 'rule': 'helloworld',\n", " 'strings': [{'data': 'WoRlD',\n", " 'flags': 27,\n", " 'identifier': '$b',\n", " 'offset': 16L},\n", " {'data': 'hello',\n", " 'flags': 19,\n", " 'identifier': '$a',\n", " 'offset': 10L}],\n", " 'tags': ['testing']}]}\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To help debug a rule, toggle the line numbers in your `%%yara` cell by typing `Ctrl-m-l`. The yara CodeMirror mode starts line numbering at 0 so Yara's offending line number error message matches the notebook display." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%yara\n", "rule badrule\n", "{\n", " strings:\n", " $a = \"hello\"\n", " conditio: //<-- oops\n", " all of them\n", "}" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Syntax error! :5: syntax error, unexpected _IDENTIFIER_, expecting _CONDITION_ \n" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Classic Use Case: Volatility" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Volatility Framework has a built-in [YaraScan](https://code.google.com/p/volatility/source/browse/tags/Volatility-2.1.0/volatility/plugins/malware/malfind.py#340) plugin, but this only accepts an external yara rule file or a plain string. In order to use our compiled rules, I'm hacking together pieces from the various classes under YaraScan for this demonstration. I'm using the same __ds_fuzz_hidden_proc.img__ sample from Volatility's [public memory images](https://code.google.com/p/volatility/wiki/PublicMemoryImages) page.\n", "\n", "The following code will flag processes that match a specific Yara rules object." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Base Volatility import\n", "import volatility.conf as conf\n", "import volatility.registry as registry\n", "import volatility.commands as commands\n", "import volatility.addrspace as addrspace\n", "\n", "registry.PluginImporter()\n", "config = conf.ConfObject()\n", "\n", "registry.register_global_options(config, commands.Command)\n", "registry.register_global_options(config, addrspace.BaseAddressSpace)\n", "\n", "cmds = registry.get_plugin_classes(commands.Command, lower = True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "# Use-case specific imports \n", "from volatility.plugins.filescan import PSScan\n", "import volatility.utils as utils\n", "import volatility.constants as constants" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "config.PROFILE = \"WinXPSP2x86\"\n", "config.LOCATION = \"file:///c:/ds_fuzz_hidden_proc.img\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# A simplified (and surely imperfect) merging of code from YaraScan, VadYaraScanner, and BaseYaraScanner from volatility.malfind\n", "def yrscan(task, rules, contextsize=16):\n", " results = []\n", " \n", " for vad, address_space in task.get_vads():\n", " offset = vad.Start\n", " maxlen = vad.Length\n", " \n", " # Start scanning from offset until maxlen:\n", " i = offset\n", " \n", " while i < offset + maxlen:\n", " # Read some data and match it.\n", " to_read = min(constants.SCAN_BLOCKSIZE + 1024, offset + maxlen - i)\n", " data = address_space.zread(i, to_read)\n", " if data:\n", " for match in rules.match_data(data).get('main', []):\n", " if all([hit['offset'] < constants.SCAN_BLOCKSIZE for hit in match.get('strings', [])]):\n", " results.append((i, match))\n", "\n", " i += constants.SCAN_BLOCKSIZE\n", " return results" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "%%yara -n volarules\n", "\n", "rule exe_on_desktop\n", "{\n", " // Look for files on the Desktop that end in .exe\n", " strings:\n", " $a = /\\\\Desktop\\\\[\\w .-]{1,20}\\.exe/ nocase\n", " condition:\n", " all of them\n", "}\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Adding compiled rules as \"volarules\" to namespace\n" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "ps = PSScan(config)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "for task in ps.calculate():\n", " \n", " # Pass the compiled rules object to our method yrscan\n", " hits = yrscan(task, volarules)\n", " \n", " if len(hits) > 0:\n", " print '-----------------------------------'\n", " print 'Process name: %s' % task.ImageFileName\n", " print 'PID: %s' % task.UniqueProcessId\n", " print 'PPID: %s' % task.InheritedFromUniqueProcessId\n", " print 'Create time: %s' % (task.CreateTime or '')\n", " print 'Exit time: %s' % (task.ExitTime or '')\n", " else:\n", " next\n", " \n", " for addr, hit in hits:\n", " print '> Rule name: %s' % hit.get('rule')\n", " for string in hit.get('strings', []):\n", " # Modified from original\n", " # https://code.google.com/p/volatility/source/browse/tags/Volatility-2.1.0/volatility/plugins/malware/malfind.py#481\n", " print \"\".join(\n", " [\"{0:#010x} {1:<48} {2}\\n\".format(string.get('offset') + addr + o, h, ''.join(c))\n", " for o, h, c in utils.Hexdump(string.get('data', ''))\n", " ])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "-----------------------------------\n", "Process name: wuauclt.exe\n", "PID: 1372\n", "PPID: 1064\n", "Create time: 2008-11-26 07:39:38 \n", "Exit time: \n", "> Rule name: exe_on_desktop\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "> Rule name: exe_on_desktop\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "> Rule name: exe_on_desktop\n", "0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "-----------------------------------" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Process name: explorer.exe\n", "PID: 1516\n", "PPID: 1452\n", "Create time: 2008-11-26 07:38:27 \n", "Exit time: \n", "> Rule name: exe_on_desktop\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "> Rule name: exe_on_desktop\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "-----------------------------------" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Process name: svchost.exe\n", "PID: 844\n", "PPID: 672\n", "Create time: 2008-11-26 07:38:18 \n", "Exit time: \n", "> Rule name: exe_on_desktop\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "-----------------------------------" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Process name: svchost.exe\n", "PID: 1064\n", "PPID: 672\n", "Create time: 2008-11-26 07:38:20 \n", "Exit time: \n", "> Rule name: exe_on_desktop\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x0214342f 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x0214343f 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "> Rule name: exe_on_desktop\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x022fecb7 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x022fecc7 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "> Rule name: exe_on_desktop\n", "0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n", "0x023e9e87 5c 44 65 73 6b 74 6f 70 5c 6e 65 74 77 6f 72 6b \\Desktop\\network\n", "0x023e9e97 5f 6c 69 73 74 65 6e 65 72 2e 65 78 65 _listener.exe\n", "\n" ] } ], "prompt_number": 11 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Unorthodox Use Case: Filtering Logs in Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As stated, Yara is great at Finding Stuff. When you have a lot of data in a Pandas DataFrame, you'll want to filter it to find important stuff. Pandas provides robust [string matching/searching](http://pandas.pydata.org/pandas-docs/stable/basics.html#vectorized-string-methods) capabilities, but perhaps you already detections defined in yara sigs and want to apply those sigs to this data. Perhaps your analysts are more accustomed to writing Yara sigs than complex Pandas filtering conditions.\n", "\n", "For this example, we load in a handful of user-agent strings into a DataFrame and we'll use Yara to match on a set of rules. This example highlights one more feature of Yara and the `%%yara` magic: __external variables__. When the Yara engine scans data, it normally is scanning a blob of data, but it can also load in named chunks of data into a dictionary called external variables. Since a DataFrame is intuitively chunked into columns, we can specify an external variable container for each column in the DataFrame." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "from cStringIO import StringIO" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "useragentcsv = \"\"\"useragent\n", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)\n", "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17\"\n", "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.65 Safari/537.36\"\n", "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36\"\n", "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36\"\n", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)\n", "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36\"\n", "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22\"\n", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\n", "\"Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B144 Safari/8536.25\"\n", "\"Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25\"\n", "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0\n", "Googlebot/2.1 (+http://www.google.com/bot.html)\n", "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36\"\n", "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17\"\n", "\"\"\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "df = pd.read_csv(StringIO(useragentcsv))\n", "df.head(5)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
useragent
0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
2 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53...
3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...
4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)...
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ " useragent\n", "0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...\n", "1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...\n", "2 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53...\n", "3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3...\n", "4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)..." ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the cell below, the `-e` option to the `%%yara` magic specifies the external variable. The externals ___must___ be specified at the time of compilation, which, for the `%%yara` cell magic, that means at write-time. `%%yara` accepts multiple `-e` parameters as well as `comma,separated,lists`. External variables are referenced directly in the `condition` block rather than a `strings` statement. Ensure the external variable names match the column names. (For more details, see the docstring by running `%%yara?` in a cell)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%yara -n uarules -e useragent\n", "rule iPad_iPhone\n", "{\n", " condition:\n", " useragent contains \"iPhone;\" or \n", " useragent contains \"iPad;\"\n", "}\n", "\n", "rule Chrome25Plus\n", "{\n", " condition:\n", " useragent matches /Chrome\\/((2[5-9])|3[0-9])/\n", "}" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Adding compiled rules as \"uarules\" to namespace\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the next method, we have to get a little creative. [`DataFrame.apply`](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html) provides the means to apply a function to rows or columns of the DataFrame. It does not, however, let you specify parameters to the invoked method; the only parameter will be the data from the column or row. (In our example, we use `axis=1` which tells pandas to look at data row-by-row). \n", "\n", "In order to have flexibility to pass in various Yara rule files (and avoid global variables), we have a general yarafilter method that takes a compiled yara rules object as a parameter. The function generates the necessary worker function that `DataFrame.apply` will accept and preloads a subset of columns as external variables for the yara engine. The function returns a comma separated list of rules that matched on each row or blank string if no matches." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def yarafilter(rules):\n", " \n", " # Specify a list of the column names we used in for external variables\n", " # in the yara rules\n", " externals = ['useragent']\n", " \n", " def worker(row):\n", " m = rules.match(data=\" \", externals=row[externals].to_dict())\n", " \n", " if m:\n", " return ','.join( [y.get('rule', '') for y in m.get('main', [])] )\n", " else:\n", " return ''\n", " \n", " return worker" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "df['yarahits'] = df.apply(yarafilter(uarules), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
useragentyarahits
0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
2 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53... Chrome25Plus
3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... Chrome25Plus
4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)... Chrome25Plus
5 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...
6 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... Chrome25Plus
7 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)... Chrome25Plus
8 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ...
9 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Ma... iPad_iPhone
10 Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) A... iPad_iPhone
11 Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20...
12 Googlebot/2.1 (+http://www.google.com/bot.html)
13 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... Chrome25Plus
14 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ " useragent yarahits\n", "0 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... \n", "1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... \n", "2 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53... Chrome25Plus\n", "3 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... Chrome25Plus\n", "4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4)... Chrome25Plus\n", "5 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... \n", "6 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... Chrome25Plus\n", "7 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3)... Chrome25Plus\n", "8 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT ... \n", "9 Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Ma... iPad_iPhone\n", "10 Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) A... iPad_iPhone\n", "11 Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20... \n", "12 Googlebot/2.1 (+http://www.google.com/bot.html) \n", "13 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.3... Chrome25Plus\n", "14 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... " ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\u00a9 2013 Lockheed Martin Corporation. All Rights Reserved." ] } ], "metadata": {} } ] }