{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Obfuscation Library\n",
"\n",
"Sharing data, creating documents, and giving public demonstrations often require that data containing\n",
"PII or other sensitive material be obfuscated.\n",
"\n",
"MSTICPy contains a simple library to obfuscate data using hashing and random mapping of values.\n",
"You can use these functions on individual data items or on entire DataFrames.\n",
"\n",
"## Contents\n",
"- [Import the module](#Import-the-module)\n",
"- [Individual Obfuscation Functions](#Individual-Obfuscation-Functions)\n",
"- [Obfuscating DataFrames](#Obfuscating-DataFrames)\n",
"- [Creating custom column mappings](#Creating-custom-mappings)\n",
"- [Using hash_item with delimiters](#Using-hash_item-with-delimiters-to-preserve-the-structure/look-of-the-hashed-input)\n",
"- [Checking Your Obfuscation](#Checking-Your-Obfuscation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import the module"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from msticpy.common.utility import md\n",
"from msticpy.data import data_obfus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in some data for the examples"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
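"import ast\n",
"\n",
"netflow_df = pd.read_csv(\"data/az_net_flows.csv\")\n",
"\n",
"# List values are read from the CSV as strings; convert them back to lists.\n",
"# ast.literal_eval only parses Python literals, so it is safer than eval.\n",
"def str_to_list(val):\n",
"    if isinstance(val, str):\n",
"        return ast.literal_eval(val)\n",
"    return val  # leave non-string values (e.g. NaN) unchanged\n",
"\n",
"netflow_df[\"PublicIPs\"] = netflow_df[\"PublicIPs\"].apply(str_to_list)\n",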
"\n",
"# Define subset of output columns\n",
"out_cols = [\n",
" 'TenantId', 'TimeGenerated', 'FlowStartTime',\n",
" 'ResourceGroup', 'VMName', 'VMIPAddress', 'PublicIPs',\n",
" 'SrcIP', 'DestIP', 'L4Protocol', 'AllExtIPs'\n",
"]\n",
"netflow_df = netflow_df[out_cols]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Individual Obfuscation Functions\n",
"\n",
"Here we're importing the individual functions, but you can also access them through the\n",
"single import statement above, for example:\n",
"```\n",
"data_obfus.hash_string(...)\n",
"```\n",
"etc.\n",
"\n",
"> **Note:** In the next cell we use a helper function to output documentation and examples.\n",
"> You can ignore this helper. The usage of each function is shown in the output of\n",
"> the subsequent cells."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from msticpy.data.data_obfus import (\n",
" hash_dict,\n",
" hash_ip,\n",
" hash_item,\n",
" hash_list,\n",
" hash_sid,\n",
" hash_string,\n",
" replace_guid\n",
")\n",
"\n",
"# Helper function to format and display the examples below. You can ignore this.\n",
"def show_func(func, examples):\n",
" func_name = func.__name__\n",
" if func.__name__.startswith(\"_\"):\n",
" func_name = func_name[1:]\n",
" md(func_name, \"bold\")\n",
" print(func.__doc__)\n",
" md(\"Examples\", \"bold\")\n",
" for example in examples:\n",
" if isinstance(example, tuple):\n",
" arg, delim = example\n",
" print(\n",
" f\"{func_name}('{arg}', delim='{delim}') =>\", func(*example)\n",
" )\n",
" else:\n",
" print(\n",
" f\"{func_name}('{example}') =>\", func(example)\n",
" )\n",
" md(\"
**hash_string**

hash_string does a simple hash of the input. If the input is a numeric string, the output will also be numeric.
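To make the idea of a "simple hash" concrete, here is a minimal, self-contained sketch of a deterministic, length-preserving string hash. It is *not* MSTICPy's implementation: the name `sketch_hash_string`, the use of chained SHA-256 digests, and the mapping of hex characters onto the letters a-p are all assumptions made for illustration. Numeric input is reduced to digits so that, as described above, a numeric string stays numeric.

```python
import hashlib


def sketch_hash_string(value: str) -> str:
    """Hypothetical length-preserving hash; not MSTICPy's implementation."""
    # Chain SHA-256 digests (64 hex chars each) until we have enough
    # characters to cover the input length.
    digest = ""
    seed = value
    while len(digest) < len(value):
        seed = hashlib.sha256(seed.encode("utf-8")).hexdigest()
        digest += seed
    digest = digest[: len(value)]
    if value.isdigit():
        # Numeric input -> numeric output: reduce each hex char to 0-9.
        return "".join(str(int(ch, 16) % 10) for ch in digest)
    # Other input -> lowercase letters: map hex 0-f onto the letters a-p.
    return "".join(chr(ord("a") + int(ch, 16)) for ch in digest)
```

Because the output depends only on the input, the same value always hashes to the same string, which keeps joins and group-bys meaningful after obfuscation.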
**hash_item**

hash_item allows you to specify delimiters. This is useful for preserving the look of domains, email addresses, etc.
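A delimiter-aware hash can be sketched by splitting on the delimiter characters, hashing each segment, and re-joining with the original delimiters. This is an illustrative sketch only: `sketch_hash_item` and `_hash_segment` are hypothetical names, and treating `delim` as a string of single delimiter characters (for example `"@."` for email addresses) is an assumption, not a description of MSTICPy's `hash_item` signature.

```python
import hashlib
import re


def _hash_segment(segment: str) -> str:
    """Hypothetical length-preserving letter hash for one segment."""
    digest = ""
    seed = segment
    while len(digest) < len(segment):
        seed = hashlib.sha256(seed.encode("utf-8")).hexdigest()
        digest += seed
    return "".join(chr(ord("a") + int(ch, 16)) for ch in digest[: len(segment)])


def sketch_hash_item(value: str, delim: str = "") -> str:
    """Hash each delimited segment, keeping the delimiters in place."""
    if not delim:
        return _hash_segment(value)
    # A capturing group makes re.split keep the delimiters at odd indices.
    parts = re.split(f"([{re.escape(delim)}])", value)
    return "".join(
        part if idx % 2 else _hash_segment(part) for idx, part in enumerate(parts)
    )
```

Since each segment is hashed independently, a shared component such as the same domain name hashes identically wherever it appears, so related domains and email addresses remain visibly related.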
**hash_ip**

hash_ip will output random mappings of input IPv4 and IPv6 addresses. Within a Python session the mapping will remain constant.
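Random mappings that "remain constant within a session" can be modeled with a module-level cache: the first time an address is seen, a random replacement is generated and stored, and later lookups reuse it. The sketch below is hypothetical (`sketch_hash_ip` and the cache names are not MSTICPy API). It also maps IPv4 addresses octet by octet, so addresses that share leading octets still share them after mapping, and sibling octets under the same prefix get distinct values so two different inputs never collide. That prefix-preserving behavior is an assumption chosen for illustration.

```python
import ipaddress
import random
from typing import Dict, Set, Tuple

# Hypothetical session-level caches: same input -> same output while the
# Python process is running.
_OCTET_MAP: Dict[Tuple[int, ...], int] = {}
_USED: Dict[Tuple[int, ...], Set[int]] = {}
_V6_MAP: Dict[str, str] = {}


def _map_octet(prefix: Tuple[int, ...], octet: int) -> int:
    """Map one octet, keeping sibling octets under the same prefix distinct."""
    key = prefix + (octet,)
    if key not in _OCTET_MAP:
        used = _USED.setdefault(prefix, set())
        choice = random.choice([v for v in range(256) if v not in used])
        used.add(choice)
        _OCTET_MAP[key] = choice
    return _OCTET_MAP[key]


def sketch_hash_ip(ip: str) -> str:
    """Randomly map an IPv4/IPv6 address, consistently within the session."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        key = str(addr)  # normalized form, e.g. "2001:db8::1"
        if key not in _V6_MAP:
            _V6_MAP[key] = str(ipaddress.IPv6Address(random.getrandbits(128)))
        return _V6_MAP[key]
    octets = [int(part) for part in str(addr).split(".")]
    mapped = [_map_octet(tuple(octets[:i]), octets[i]) for i in range(4)]
    return ".".join(str(v) for v in mapped)
```

Because the cache lives in memory, a new Python session produces a different mapping, which matches the "constant within a session" behavior described above.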
**hash_sid**

hash_sid will randomize the domain-specific parts of a SID. It preserves built-in SIDs and well-known RIDs (e.g. the Administrator RID, -500).
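Domain SIDs have the form `S-1-5-21-<sub1>-<sub2>-<sub3>-<RID>`, where the three sub-authorities identify the domain and the RID identifies the account or group. Built-in SIDs such as `S-1-5-18` (LocalSystem) or `S-1-5-32-544` (BUILTIN\Administrators) carry no domain part. The hypothetical `sketch_hash_sid` below randomizes the domain part consistently and leaves built-in SIDs untouched. Treating every RID below 1000 as well-known is an assumption made for this sketch, not necessarily the rule MSTICPy uses.

```python
import random
import re
from typing import Dict, Tuple

# Domain SIDs look like S-1-5-21-<sub1>-<sub2>-<sub3>-<RID>.
_DOMAIN_SID = re.compile(r"^S-1-5-21-(\d+)-(\d+)-(\d+)-(\d+)$", re.IGNORECASE)
# Hypothetical session caches so repeated SIDs map consistently.
_DOMAIN_MAP: Dict[Tuple[str, ...], str] = {}
_RID_MAP: Dict[Tuple[Tuple[str, ...], int], int] = {}


def sketch_hash_sid(sid: str) -> str:
    """Randomize the domain part of a SID; keep built-in SIDs and low RIDs."""
    match = _DOMAIN_SID.match(sid)
    if not match:
        # Built-in/well-known SIDs (no S-1-5-21 domain part) pass through.
        return sid
    domain = match.group(1, 2, 3)
    rid = int(match.group(4))
    if domain not in _DOMAIN_MAP:
        _DOMAIN_MAP[domain] = "-".join(
            str(random.randint(1, 2**32 - 1)) for _ in range(3)
        )
    if rid >= 1000:
        # Assumed non-well-known RID: replace it with a random value.
        key = (domain, rid)
        if key not in _RID_MAP:
            _RID_MAP[key] = random.randint(1000, 2**32 - 1)
        rid = _RID_MAP[key]
    return f"S-1-5-21-{_DOMAIN_MAP[domain]}-{rid}"
```

Keeping the domain mapping consistent means that all SIDs from one domain still appear to come from a single (new) domain after obfuscation.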
**hash_list**

hash_list will randomize a list of items preserving the list structure.
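"Preserving the list structure" can be sketched as a recursive walk that hashes each string element and keeps the list's shape, including nested lists. The names `sketch_hash_list` and `_hash_str`, and the choice of a 12-character SHA-256 prefix as the replacement token, are hypothetical and do not describe MSTICPy's internals.

```python
import hashlib
from typing import Any, List


def _hash_str(value: str) -> str:
    """Hypothetical helper: short, deterministic hex token for a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def sketch_hash_list(items: List[Any]) -> List[Any]:
    """Hash every string in a (possibly nested) list, keeping its shape."""
    result: List[Any] = []
    for item in items:
        if isinstance(item, list):
            result.append(sketch_hash_list(item))  # recurse into sub-lists
        elif isinstance(item, str):
            result.append(_hash_str(item))
        else:
            result.append(item)  # non-string values pass through unchanged
    return result
```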
**hash_dict**

hash_dict will randomize a dict of items preserving the structure and the dict keys.
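The same recursive idea extends to dictionaries: keys are kept as-is and only the values are hashed, descending into nested dicts and lists. As before, `sketch_hash_dict`, `_hash_value`, and `_hash_str` are hypothetical names for an illustrative sketch, not MSTICPy's `hash_dict`.

```python
import hashlib
from typing import Any, Dict


def _hash_str(value: str) -> str:
    """Hypothetical helper: short, deterministic hex token for a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def _hash_value(value: Any) -> Any:
    """Hash strings, recurse into dicts/lists, pass other types through."""
    if isinstance(value, dict):
        return sketch_hash_dict(value)
    if isinstance(value, list):
        return [_hash_value(item) for item in value]
    if isinstance(value, str):
        return _hash_str(value)
    return value


def sketch_hash_dict(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with the same keys and hashed values."""
    return {key: _hash_value(val) for key, val in data.items()}
```

Keeping the keys unchanged means that code which reads fields by name (for example, parsed JSON event data) still works on the obfuscated copy.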
**replace_guid**

replace_guid will output a random UUID mapped to the input. An input GUID will always be mapped to the same newly generated output UUID. In the examples, UUID #4 is the same as #1 and is mapped to the same output UUID.
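The "same input, same newly generated output" behavior can be sketched with a cache keyed on the normalized GUID, generating a fresh `uuid.uuid4()` only for unseen inputs. `sketch_replace_guid` and `_GUID_MAP` are hypothetical names. Normalizing through `uuid.UUID` is a design choice of this sketch: it makes differently formatted copies of one GUID (upper case, braces) map to the same output.

```python
import uuid
from typing import Dict

# Hypothetical session-level cache: normalized input GUID -> replacement.
_GUID_MAP: Dict[str, str] = {}


def sketch_replace_guid(guid: str) -> str:
    """Return a random UUID, reusing the same output for a repeated input."""
    # uuid.UUID accepts braces, hyphens and either case, and str() gives
    # the canonical lowercase hyphenated form, used here as the cache key.
    key = str(uuid.UUID(guid))
    if key not in _GUID_MAP:
        _GUID_MAP[key] = str(uuid.uuid4())
    return _GUID_MAP[key]
```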

|   | TenantId | TimeGenerated | FlowStartTime | ResourceGroup | VMName | VMIPAddress | PublicIPs | SrcIP | DestIP | L4Protocol | AllExtIPs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52b1ab41-869e-4138-9e40-2a4457f09bf0 | 2019-02-12 14:22:40.697 | 2019-02-12 13:00:07.000 | asihuntomsworkspacerg | msticalertswin1 | 10.0.3.5 | [65.55.44.109] | NaN | NaN | T | 65.55.44.109 |
| 1 | 52b1ab41-869e-4138-9e40-2a4457f09bf0 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | asihuntomsworkspacerg | msticalertswin1 | 10.0.3.5 | [13.71.172.130, 13.71.172.128] | NaN | NaN | T | 13.71.172.128 |
| 2 | 52b1ab41-869e-4138-9e40-2a4457f09bf0 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | asihuntomsworkspacerg | msticalertswin1 | 10.0.3.5 | [13.71.172.130, 13.71.172.128] | NaN | NaN | T | 13.71.172.130 |

|   | TenantId | TimeGenerated | FlowStartTime | ResourceGroup | VMName | VMIPAddress | PublicIPs | SrcIP | DestIP | L4Protocol | AllExtIPs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.697 | 2019-02-12 13:00:07.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.0.3.5 | [65.55.44.109] | NaN | NaN | T | 65.55.44.109 |
| 1 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.0.3.5 | [13.71.172.130, 13.71.172.128] | NaN | NaN | T | 13.71.172.128 |
| 2 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.0.3.5 | [13.71.172.130, 13.71.172.128] | NaN | NaN | T | 13.71.172.130 |

|   | TenantId | TimeGenerated | FlowStartTime | ResourceGroup | VMName | VMIPAddress | PublicIPs | SrcIP | DestIP | L4Protocol | AllExtIPs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.697 | 2019-02-12 13:00:07.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.0.3.5 | [65.55.44.109] | NaN | NaN | T | 65.55.44.109 |
| 1 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.0.3.5 | [13.71.172.130, 13.71.172.128] | NaN | NaN | T | 13.71.172.128 |
| 2 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.0.3.5 | [13.71.172.130, 13.71.172.128] | NaN | NaN | T | 13.71.172.130 |

|   | TenantId | TimeGenerated | FlowStartTime | ResourceGroup | VMName | VMIPAddress | PublicIPs | SrcIP | DestIP | L4Protocol | AllExtIPs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.697 | 2019-02-12 13:00:07.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.112.51.93 | [100.11.187.82] | NaN | NaN | T | 100.11.187.82 |
| 1 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.112.51.93 | [144.169.193.140, 144.169.193.144] | NaN | NaN | T | 144.169.193.144 |
| 2 | f9ef3428-3ccb-4ecd-8466-dbedc7044293 | 2019-02-12 14:22:40.681 | 2019-02-12 13:00:48.000 | ibmkajbmepnmiaeilfofa | fmlmbnlpdcbnbnn | 10.112.51.93 | [144.169.193.140, 144.169.193.144] | NaN | NaN | T | 144.169.193.140 |
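In the sample output above, the TenantId column is replaced by a new GUID (identical across rows that shared the original value), while ResourceGroup and VMName are hashed into lowercase strings of the same length. A minimal sketch of column-driven obfuscation is shown below, using plain Python rows rather than a DataFrame so that it runs with the standard library alone. `COLUMN_MAP`, `sketch_obfuscate_rows`, and the helper functions are hypothetical and are not MSTICPy's column-mapping API.

```python
import hashlib
import uuid
from typing import Callable, Dict, List

_GUID_MAP: Dict[str, str] = {}


def _hash_str(value: str) -> str:
    """Length-preserving letter hash (hypothetical helper)."""
    digest, seed = "", value
    while len(digest) < len(value):
        seed = hashlib.sha256(seed.encode("utf-8")).hexdigest()
        digest += seed
    return "".join(chr(ord("a") + int(ch, 16)) for ch in digest[: len(value)])


def _replace_guid(value: str) -> str:
    """Consistent random replacement for a GUID (hypothetical helper)."""
    key = str(uuid.UUID(value))
    if key not in _GUID_MAP:
        _GUID_MAP[key] = str(uuid.uuid4())
    return _GUID_MAP[key]


# Hypothetical column map: column name -> obfuscation function.
COLUMN_MAP: Dict[str, Callable[[str], str]] = {
    "TenantId": _replace_guid,
    "ResourceGroup": _hash_str,
    "VMName": _hash_str,
}


def sketch_obfuscate_rows(
    rows: List[dict], column_map: Dict[str, Callable[[str], str]]
) -> List[dict]:
    """Return obfuscated copies of the rows; unmapped columns are untouched."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for column, func in column_map.items():
            if isinstance(new_row.get(column), str):
                new_row[column] = func(new_row[column])
        result.append(new_row)
    return result
```

With pandas, the same idea can be applied per column with `df[column] = df[column].apply(func)` on a copy of the DataFrame. IP columns would need an IP mapper in the column map; they are omitted here to keep the sketch short.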