{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Finding Correlations in a CSV of Malware Events via Hypergraph Views\n", "\n", "To find patterns and outliers in CSVs and event data, Graphistry provides the hypergraph transform.\n", "\n", "As an example, this notebook examines malware files reported to a security vendor. It reveals phenomena such as:\n", "\n", "* The malware files cluster into several families\n", "* The nodes central to a cluster reveal attributes specific to a strain of malware\n", "* The nodes bordering a cluster reveal attributes that appear throughout a strain but take a unique value in each instance of it\n", "* Several families are connected by shared attributes, suggesting they have the same authors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load CSV" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "import pandas as pd\n", "import graphistry as g\n", "# g.register(key='...')  # uncomment and supply your Graphistry API key" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('# samples', 999)\n" ] }, { "data": { "text/plain": [ "{'Campaign': 'TRANSFORMICE',\n", " 'Date': '2015-11-19 14:04:23',\n", " 'Domain': 'spynet1.ddns.net',\n", " 'InstallDir': 'TEMP',\n", " 'InstallFlag': 'True',\n", " 'InstallName': 'svchost.exe',\n", " 'NetworkSeparator': \"|'|'|\",\n", " 'Origin': 'vt',\n", " 'Port': '1177',\n", " 'RegistryValue': 'ba4c12bee3027d94da5c81db2d196bfd',\n", " 'Version': '0.6.4',\n", " 'compile_date': '2015-11-18 21:25:59',\n", " 'imphash': 'f34d5f2d4577ed6d9ceec516c1f5a744',\n", " 'magic': 'PE32 executable for MS Windows (GUI) Intel 80386 32-bit Mono/.Net assembly',\n", " 'md5': '007a8403b3281fd4d48c69f4c96da0b8',\n", " 'rat_name': 'njRat',\n", "
'section_.RELOC': '7905c1aa858eb5484ad08a2e10b7e50e',\n", " 'section_.RSRC': '5b346ed223699f15252c1fdad182859f',\n", " 'section_.TEXT': 'f414cace41511d02fb8e278cf36fd2a3',\n", " 'sha1': 'd215edec90c5487800d961cc1ac2808e221818fa',\n", " 'sha256': '2beb53ca652d9d4f73516ce45365ae824370d2408d6b0d5a809cf3cd177ba694'}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "\n", "df = pd.read_csv('barncat.1k.csv', encoding='utf8')\n", "print(\"# samples\", len(df))\n", "# Each 'value' cell stores a sample's attributes as a JSON string; parse it rather than calling eval() on untrusted data\n", "json.loads(df['value'].iloc[0])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
\n", " | uuid | \n", "event_id | \n", "category | \n", "type | \n", "value | \n", "to_ids | \n", "date | \n", "
0 | \n", "56e1af55-22f4-4b76-881a-50feac1f3af3 | \n", "417 | \n", "External analysis | \n", "comment | \n", "{\"InstallFlag\": \"True\", \"RegistryValue\": \"ba4c... | \n", "0 | \n", "20160310 | \n", "
\n", " | ActivateKeylogger | \n", "ActiveXKey | \n", "ActiveXStartup | \n", "BackupDNSServer | \n", "BypassUAC | \n", "Campaign | \n", "ChangeCreationDate | \n", "ClearAccessControl | \n", "ClearZoneIdentifier | \n", "ConnectDelay | \n", "... | \n", "section_.TEXT | \n", "section_.TLS | \n", "section_BSS | \n", "section_CODE | \n", "section_DATA | \n", "sha1 | \n", "sha256 | \n", "to_ids | \n", "type | \n", "uuid | \n", "
0 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "0.0 | \n", "comment | \n", "56e1af55-22f4-4b76-881a-50feac1f3af3 | \n", "
1 rows × 116 columns
\n", "