<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Guided-Hunting---Covid-19-Themed-Threats" data-toc-modified-id="Guided-Hunting---Covid-19-Themed-Threats-1">Guided Hunting - Covid-19 Themed Threats</a></span><ul class="toc-item"><li><span><a href="#Setup" data-toc-modified-id="Setup-1.1">Setup</a></span></li><li><span><a href="#Network-event-investigation" data-toc-modified-id="Network-event-investigation-1.2">Network event investigation</a></span></li><li><span><a href="#Select-a-domain-to-get-more-details-on:" data-toc-modified-id="Select-a-domain-to-get-more-details-on:-1.3">Select a domain to get more details on:</a></span></li></ul></li><li><span><a href="#Office-Activity-Investigation" data-toc-modified-id="Office-Activity-Investigation-2">Office Activity Investigation</a></span><ul class="toc-item"><li><span><a href="#Host-Activity-Investigation" data-toc-modified-id="Host-Activity-Investigation-2.1">Host Activity Investigation</a></span></li></ul></li></ul></div>

# Guided Hunting - Covid-19 Themed Threats
**Notebook Version:** 1.0<br>
**Python Version:** Python 3.6 (including Python 3.6 - AzureML)<br>
**Data Sources Required:** CommonSecurityLog, OfficeActivity, SecurityEvent<br>
 
This Notebook assists defenders in hunting for Covid-19 themed attacks by identifying anomalous Covid-19 related events within your Azure Sentinel Workspace. It is designed as a hunting notebook and has a high probability of returning false positives, so the returned data points should not be treated as detections without further investigation.

**How to use:**<br>
Run the cells in this Notebook in order. At various points in the Notebook flow you will be prompted to enter or select options relevant to the scope of your triage.<br>
This Notebook presumes you have Azure Sentinel Workspace settings and Threat Intelligence providers configured in a config file. If you do not have this in place, please refer to https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html# and https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb to set this up.

## Setup

In [None]:
from pathlib import Path
from IPython.display import display, HTML

REQ_PYTHON_VER=(3, 6)
REQ_MSTICPY_VER=(1, 0, 0)
REQ_MP_EXTRAS = ["ml"]

update_nbcheck = (
    "<p style='color: orange; text-align=left'>"
    "<b>Warning: we needed to update '<i>utils/nb_check.py</i>'</b><br>"
    "Please restart the kernel and re-run this cell."
    "</p>"
)

display(HTML("<h3>Starting Notebook setup...</h3>"))
if Path("./utils/nb_check.py").is_file():
    try:
        from utils.nb_check import check_versions
    except ImportError as err:
        %xmode Minimal
        !curl https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/utils/nb_check.py > ./utils/nb_check.py 2>/dev/null
        display(HTML(update_nbcheck))
    if "check_versions" not in globals():
        raise ImportError("Old version of nb_check.py detected - see instructions below.")
    %xmode Verbose
    check_versions(REQ_PYTHON_VER, REQ_MSTICPY_VER, REQ_MP_EXTRAS)


# If not using Azure Notebooks, install msticpy with
# !pip install msticpy
from msticpy.nbtools import nbinit
extra_imports = [
    "tqdm.notebook, tqdm",
    "whois",
    "dns",
    "tldextract",
    "datetime",
    "msticpy.nbtools.foliummap, get_map_center",
    "msticpy.nbtools.foliummap, get_center_ip_entities",
    "msticpy.sectools.ip_utils, convert_to_ip_entities",
    "functools, lru_cache",
]

additional_packages = [
    "tldextract", "IPWhois", "python-whois"
]
nbinit.init_notebook(
    namespace=globals(),
    additional_packages=additional_packages,
    extra_imports=extra_imports,
);

from bokeh.plotting import figure

In [None]:
# See if we have an Azure Sentinel Workspace defined in our config file.
# If not, let the user specify Workspace and Tenant IDs

ws_config = WorkspaceConfig()
if not ws_config.config_loaded:
    ws_config.prompt_for_ws()
    
qry_prov = QueryProvider(data_environment="AzureSentinel")
print("done")
    
ti = TILookup()

In [None]:
# Authenticate to Azure Sentinel workspace
qry_prov.connect(ws_config)
table_index = qry_prov.schema_tables

## Network event investigation

Select the time window in which to review network logs (the default is the last 24 hours). In large environments you may need to make this time window smaller to avoid query timeouts.

In [None]:
query_times = nbwidgets.QueryTime(units='hours',
                                      max_before=72, max_after=0, before=24)
query_times.display()

In [None]:
start = query_times.start
end = query_times.end
# Get Covid-19 related URLs from Network Logs
url_q = f"""
CommonSecurityLog
| where TimeGenerated between (datetime({start})..datetime({end}))
| extend Url = iif(isnotempty(RequestURL), RequestURL , iif(isnotempty(DestinationHostName), DestinationHostName, "None"))
| where Url != "None" 
| distinct Url
| where tolower(Url) matches regex("(?i)(covid|corona.*virus)")
"""
print("Collecting data...")
url_data = qry_prov.exec_query(url_q)
print("Done")
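
The query above keeps only URLs that match the pattern `(?i)(covid|corona.*virus)`. As a rough illustration of what that pattern does and does not catch, here is a minimal Python sketch. It assumes Python's `re` behaves like the KQL `matches regex` operator for this simple pattern (case-insensitive, match anywhere in the string); the helper name and the sample URLs are hypothetical and not taken from any log.

```python
import re

# Same pattern the KQL queries in this notebook pass to `matches regex`.
COVID_PATTERN = re.compile(r"(?i)(covid|corona.*virus)")


def is_covid_themed(value: str) -> bool:
    """Return True if the pattern occurs anywhere in the value."""
    return COVID_PATTERN.search(value) is not None


# Hypothetical sample values, for illustration only.
samples = [
    "https://covid19-relief.example.com/login",
    "CORONA-anti-VIRUS.example.net",
    "https://www.example.org/corona-beer",
    "https://www.example.org/index.html",
]
for sample in samples:
    print(f"{sample!r}: {is_covid_themed(sample)}")
```

Note that `corona` on its own is not enough to match: it must be followed somewhere later in the string by `virus`.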

In [None]:
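@lru_cache(maxsize=5000)
def get_domain(url):
    """Return the registered domain (domain.suffix) for a URL, or None."""
    try:
        # Use attribute access rather than tuple unpacking: the number of
        # fields in the tldextract result differs between versions.
        ext = tldextract.extract(url)
        return f"{ext.domain}.{ext.suffix}"
    except Exception:
        return None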

@lru_cache(maxsize=5000)
def whois_url(dom):
    try:
        wis = whois.whois(dom)
        return wis['creation_date']
    except (KeyError, whois.parser.PywhoisError, UnicodeError) as e:
        return None

tqdm.pandas(desc="Lookup progress")

if isinstance(url_data, pd.DataFrame) and not url_data.empty:
    md("Extracting domains")
    url_data['domain'] = url_data['Url'].progress_apply(get_domain)
    url_data = url_data['domain'].drop_duplicates().reset_index().drop(['index'], axis=1)
    md(f"Getting domain registration dates for {len(url_data)} unique domains")
    md("This can take a while for large numbers of domains ~ 100 domains/min")
    url_data['creation_date'] = url_data['domain'].progress_apply(whois_url)
else:
    md("No matches found.")

Pick a date to filter out domains registered before:

In [None]:
import datetime
#Date Picker for excluding dates
date_pick = widgets.DatePicker(
    description='Pick a Date',
    disabled=False,
    value= datetime.date(2020, 3, 1)
)
display(date_pick)

In [None]:
start_time = datetime.datetime.combine(date_pick.value, datetime.datetime.min.time())


def clean_dates(row):
    """Normalize the WHOIS creation_date value to a single datetime."""
    created = row['creation_date']
    if isinstance(created, datetime.datetime):
        return created
    elif isinstance(created, list) and created:
        # Some WHOIS responses contain several dates; keep the first one.
        return created[0]
    elif created == "before Aug-1996":
        return datetime.datetime(1996, 8, 1, 0, 0, 0)
    else:
        return None
    
recent_urls_df = None
if isinstance(url_data, pd.DataFrame) and not url_data.empty:   
    url_data = url_data.mask(url_data.eq('None')).dropna()    
    md("Formatting and filtering dates")
    url_data['creation_date'] = url_data.progress_apply(clean_dates, axis=1)
    filter_mask = url_data['creation_date'] > start_time
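    # Copy the filtered rows so columns added later do not trigger
    # pandas' SettingWithCopyWarning.
    recent_urls_df = url_data[filter_mask].copy()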
    md("Recently registered domains relating to Covid-19","bold")
    display(recent_urls_df)
else:
    md("No URL data to process.")
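
WHOIS libraries can return the creation date in several shapes (a single `datetime`, a list of them, or a placeholder string), which is why the cell above normalizes the values before filtering. The following is a standalone sketch of that normalization, not the notebook's own function: the name `normalize_creation_date` is hypothetical, and unlike the cell above it keeps the earliest date in a list rather than the first one, on the assumption that the earliest value is the original registration.

```python
import datetime
from typing import Optional


def normalize_creation_date(value) -> Optional[datetime.datetime]:
    """Reduce a WHOIS creation_date value to one datetime, or None."""
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, list):
        # Assumption: all datetimes in the list are either naive or
        # timezone-aware, so they can be compared with each other.
        dates = [v for v in value if isinstance(v, datetime.datetime)]
        return min(dates) if dates else None
    if value == "before Aug-1996":
        return datetime.datetime(1996, 8, 1)
    return None


cutoff = datetime.datetime(2020, 3, 1)
created = normalize_creation_date(
    [datetime.datetime(2020, 4, 1), datetime.datetime(2020, 3, 2)]
)
print(created, created is not None and created > cutoff)
```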

In [None]:
# Lookup domain in threat intel
def lookup_dom(row):
    sev = []
    resp = ti.lookup_ioc(observable=row["domain"], providers=['OTX', 'XForce', 'VirusTotal'])
    for response in resp[1]:
        sev.append(response[1].severity)
    if 'high' in sev:
        severity = "High"
    elif 'warning' in sev:
        severity = "Warning"
    elif 'information' in sev:
        severity = "Information"
    else:
        severity = "None"
    return severity

# Lookup primary IP address of domain
def get_ip(row):
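    try:
        # dns.resolver.query is deprecated in dnspython 2.x in favour of
        # dns.resolver.resolve, so prefer resolve when it is available.
        resolve = getattr(dns.resolver, "resolve", None) or dns.resolver.query
        ip = resolve(row['domain'], "A")
        return ip[0]
    except Exception:
        return None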

# Lookup IP address in threat intel
def lookup_ip(row):
    sev = []
    resp = ti.lookup_ioc(observable=str(row["ip"]), providers=['OTX', 'XForce', 'VirusTotal'])
    for response in resp[1]:
        sev.append(response[1].severity)
    if 'high' in sev:
        severity = "High"
    elif 'warning' in sev:
        severity = "Warning"
    elif 'information' in sev:
        severity = "Information"
    else:
        severity = "None"
    return severity


# Highlight cells based on Threat Intelligence results.        
def color_cells(val):
    if isinstance(val, str):
        if val.lower() == "high":
            color = 'Red'
        elif val.lower() == 'warning':
            color = 'Orange'
        elif val.lower() == 'information':
            color = 'Green'
        else:
            color = 'none'
    else:
        color = 'none'
    return 'background-color: %s' % color 

if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:
    md("Getting IP addresses for domain")
    recent_urls_df['ip'] = recent_urls_df.progress_apply(get_ip, axis=1)
    md("Looking up IP addresses in threat intelligence")
    recent_urls_df['IP TI Risk'] = recent_urls_df.progress_apply(lookup_ip, axis=1)
    md("Looking up domains in threat intelligence")
    recent_urls_df['Domain TI Risk'] = recent_urls_df.progress_apply(lookup_dom, axis=1)
    md("Threat Intelligence results for domains and associated IP addresses:", "bold")
    display(recent_urls_df.style.applymap(color_cells).hide_index())
else:
    md(f"No domains registered since {start_time} were found")
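
Both `lookup_dom` and `lookup_ip` above collapse the per-provider results into a single label by taking the most severe value seen. A minimal sketch of that roll-up as one shared helper is shown below. It assumes the provider severities arrive as strings such as `"high"`, `"warning"` and `"information"`; the names `SEVERITY_ORDER` and `highest_severity` are hypothetical and not part of msticpy.

```python
# Severities from least to most severe (assumed string values).
SEVERITY_ORDER = ["information", "warning", "high"]


def highest_severity(severities) -> str:
    """Return the most severe label in the input, or "None" if empty."""
    known = [
        s.lower()
        for s in severities
        if isinstance(s, str) and s.lower() in SEVERITY_ORDER
    ]
    if not known:
        return "None"
    # Rank each label by its position in SEVERITY_ORDER.
    return max(known, key=SEVERITY_ORDER.index).capitalize()


print(highest_severity(["information", "high", "warning"]))
print(highest_severity([]))
```

Using one helper for both domains and IP addresses would keep the two lookups consistent if the set of severity labels ever changes.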

## Select a domain to get more details on:

In [None]:
if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:
    doms = list(recent_urls_df['domain'])
    dom_picker = widgets.Dropdown(
        options=doms,
        description='Domain:',
        disabled=False,
    )
    display(dom_picker)
else:
    md(f"No domains registered since {start_time} were found")

In [None]:
if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:
    dom_details = recent_urls_df[recent_urls_df['domain'] == dom_picker.value]
    md("Domain details:","bold")
    display(dom_details)
    resp = ti.lookup_ioc(observable=dom_details.iloc[0]['domain'], providers=['OTX', 'XForce', 'VirusTotal'])
    resp = ti.result_to_df(resp)
    md("Domain TI details:","bold")
    display(resp)
    ip_resp = ti.lookup_ioc(observable=str(dom_details.iloc[0]['ip']), providers=['OTX', 'XForce', 'VirusTotal'])
    ip_resp = ti.result_to_df(ip_resp)
    md("IP address TI details:","bold")
    display(ip_resp)
    dom_q = f"""
    CommonSecurityLog
    | where TimeGenerated between (datetime({start})..datetime({end}))
    //| extend Url = iif(isnotempty(RequestURL), RequestURL , iif(isnotempty(DestinationHostName), DestinationHostName, "None"))
    //| where Url != "None" and tolower(Url) contains ('{dom_picker.value}')
    | where RequestURL has "{dom_picker.value.strip()}" or DestinationHostName has "{dom_picker.value.strip()}"
    """
    print("Getting raw events (this can take a few minutes)...")
    dom_data = qry_prov.exec_query(dom_q)
    md("Raw log events:","bold")
    display(dom_data)
else:
    md(f"No domains registered since {start_time} were found")

In [None]:
if isinstance(recent_urls_df, pd.DataFrame) and not recent_urls_df.empty:
    ips = dom_data['DestinationIP'].unique()
    folium_map = FoliumMap()
    ip_entities = []  
    for ip in ips:
        ip_entities.append(convert_to_ip_entities(ip)[0])
    md(f'Map of destination IP addresses associated with {dom_picker.value}', 'bold')
    icon_props = {'color': 'orange'}
    location = get_map_center(ip_entities)
    folium_map.add_ip_cluster(ip_entities=ip_entities, location=location,
                                    **icon_props)
    display(folium_map.folium_map)
else:
    md(f"No domains registered since {start_time} were found")

# Office Activity Investigation

Review Covid-19 related files that have been accessed by a large part of your organization. There is a good chance many of these are legitimate organizational documents, but some may be widely shared malicious or misleading documents.
Enter the approximate number of users in your organization and the query will identify documents accessed by more than 10% of your user base.


In [None]:
users = widgets.IntText(
    value=1000,
    description='No. of users:',
    disabled=False
)
display(users)

In [None]:
# Widely accessed file query
percentage_users = users.value * 0.1
files_q = f"""
OfficeActivity
| where SourceFileName matches regex("(?i)(covid|corona.*virus)")
| where Operation == "FileAccessed"
| summarize dcount(UserId) by SourceFileName, OfficeObjectId
| where dcount_UserId > {percentage_users}
| sort by dcount_UserId
"""

files = qry_prov.exec_query(files_q)
if isinstance(files, pd.DataFrame) and not files.empty:
    display(files)
else:
    md("No Covid-19 related files found.")
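
The query above counts distinct users per file and keeps only files whose count exceeds 10% of the user base. The same logic can be sketched in plain Python as follows. This is an illustration only: the function name, the `(user_id, file_name)` event shape, and the sample events are hypothetical, and the sketch groups by file name alone, whereas the query also groups by `OfficeObjectId`.

```python
from collections import defaultdict


def widely_accessed(events, total_users, fraction=0.1):
    """Return (file_name, distinct_users) pairs above the threshold.

    events is an iterable of (user_id, file_name) tuples.
    """
    threshold = total_users * fraction
    users_by_file = defaultdict(set)
    for user_id, file_name in events:
        # A set counts each user once per file, like dcount(UserId).
        users_by_file[file_name].add(user_id)
    counts = {
        name: len(users)
        for name, users in users_by_file.items()
        if len(users) > threshold
    }
    # Most widely accessed files first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical events for an organization of 20 users (threshold = 2).
events = [
    ("u1", "covid-policy.docx"),
    ("u2", "covid-policy.docx"),
    ("u3", "covid-policy.docx"),
    ("u1", "covid-policy.docx"),  # repeat access, counted once
    ("u1", "covid-faq.pdf"),
    ("u2", "covid-faq.pdf"),
]
print(widely_accessed(events, total_users=20))
```

Because the comparison is strictly greater than the threshold, a file accessed by exactly 10% of users is not returned.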

Look for Covid-19 related documents that have been uploaded by User Agents that have not been widely seen in the environment over the last 30 days. This may indicate malicious users uploading documents.

In [None]:
rare_uas_q = """
let rare_uas = (
OfficeActivity
| where TimeGenerated > ago(30d)
| where Operation == "FileUploaded"
| where UserAgent !startswith "Microsoft Office" and UserAgent !startswith "OneNote" and UserAgent !startswith "OneDrive" and UserAgent !startswith "Outlook" and UserAgent !startswith "Exchange" 
| summarize count() by UserAgent
| project UserAgent);
OfficeActivity
| where Operation == "FileUploaded"
| where SourceFileName matches regex("(?i)(covid|corona.*virus)")
| where UserAgent in~ (rare_uas)
"""
print("Collecting data...")
rare_uas = None
rare_uas = qry_prov.exec_query(rare_uas_q)

if isinstance(rare_uas, pd.DataFrame) and not rare_uas.empty:
    md("Done")
else:
    md("No Covid-19 related activity found.")

In [None]:
uas_picker = None
if isinstance(rare_uas, pd.DataFrame) and not rare_uas.empty:
    uas = rare_uas['UserAgent'].unique()
    uas_picker = widgets.Dropdown(
        options=uas,
        description='User Agent:',
        disabled=False,
    )
    display(uas_picker)
else:
    md(f"No rare User Agents Found")

In [None]:
if uas_picker:
    filtered_rare_uas = rare_uas[rare_uas['UserAgent']==uas_picker.value]
    display(filtered_rare_uas)
else:
    filtered_rare_uas = None
    md(f"No rare User Agents Found")

In [None]:
if isinstance(filtered_rare_uas, pd.DataFrame) and not filtered_rare_uas.empty:
    rare_uas_ips = filtered_rare_uas['ClientIP'].unique()
    folium_map = FoliumMap()
    ip_entities = []  
    for ip in rare_uas_ips:
        ip_entities.append(convert_to_ip_entities(ip)[0])
    md('Map of the source IP of file uploads from rare user agents', 'bold')
    icon_props = {'color': 'orange'}
    location = get_map_center(ip_entities)
    folium_map.add_ip_cluster(ip_entities=ip_entities, location=location,
                                    **icon_props)
    display(folium_map.folium_map)
else:
    md(f"No rare User Agents Found")

## Host Activity Investigation
Look for new processes spawned from a command line containing COVID-19 related names that may be phishing lures. We start by looking at the number of hosts observed with the command line spawning the particular process and from there can drill down into a specific command line.

In [None]:
process_q = f"""
SecurityEvent
| where TimeGenerated between (datetime({start})..datetime({end}))
| where EventID == 4688
| where CommandLine matches regex("(?i)(covid|corona.*virus)")
| summarize dcount(Computer) by CommandLine, NewProcessName
| sort by dcount_Computer"""

process_data = qry_prov.exec_query(process_q)

cmd_lines = None
if isinstance(process_data, pd.DataFrame) and not process_data.empty:
    display(process_data)
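    # Convert to a list: the truth value of a NumPy array with more than
    # one element is ambiguous, so later `if cmd_lines:` checks would fail.
    cmd_lines = list(process_data['CommandLine'].unique())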
else:
    md("No Covid related process data found")

Select command line to look at in more detail:

In [None]:
if cmd_lines:
    cmd_line = widgets.Dropdown(
        options=cmd_lines,
        description='Command Lines:',
        disabled=False,
    )
    display(cmd_line)
else:
    md("No Covid related process data found")

In [None]:
if cmd_lines:
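    # Escape backslashes first, then double quotes, so the command line
    # can be embedded in a double-quoted KQL string literal.
    cmd_line_clean = cmd_line.value.replace('\\', "\\\\").replace('"', '\\"')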
    cmd_line_q = f"""
    SecurityEvent
    | where TimeGenerated between (datetime({start})..datetime({end}))
    | where EventID == 4688
    | where CommandLine == "{cmd_line_clean}"
    """

    cmd_line_events = qry_prov.exec_query(cmd_line_q)

    if isinstance(cmd_line_events, pd.DataFrame) and not cmd_line_events.empty:
        display(cmd_line_events)
    else:
        md("No events found")
else:
    md("No Covid related process data found")
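
The cell above embeds a selected command line in a double-quoted KQL string, which only works if backslashes and double quotes in the value are escaped. A minimal sketch of that escaping as a reusable helper follows. The helper name `kql_string_literal` is hypothetical, and the sketch only handles backslashes and double quotes; it is not a general-purpose defence against query injection.

```python
def kql_string_literal(value: str) -> str:
    """Return value as a double-quoted KQL string literal."""
    # Escape backslashes first so the backslashes added for the quotes
    # are not themselves doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Hypothetical command line containing both backslashes and quotes.
command_line = 'C:\\Temp\\covid.exe "report 2020.docx"'
query = f"""
SecurityEvent
| where EventID == 4688
| where CommandLine == {kql_string_literal(command_line)}
"""
print(query)
```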

Use Sysmon data to identify files that are included in the Microsoft Covid-19 threat intelligence data.

In [None]:
sysmon_q = f"""Event
| where TimeGenerated between (datetime({start})..datetime({end}))
| where Source == "Microsoft-Windows-Sysmon"
| extend RenderedDescription = tostring(split(RenderedDescription, ":")[0])
| project TimeGenerated, Source, EventID, Computer, UserName, EventData, RenderedDescription
| extend EvData = parse_xml(EventData)
| extend EventDetail = EvData.DataItem.EventData.Data
| extend Hashes = tostring(EventDetail[17].["#text"])
| where isnotempty(Hashes)"""

sysmon_df = qry_prov.exec_query(sysmon_q)

covid_ti = pd.read_csv("https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Sample%20Data/Feeds/Microsoft.Covid19.Indicators.csv", 
                        index_col=False,
                        names=["TimeGenerated","Hash","Hash Type", "TLP", "Service", "Type", "Source"],
                        parse_dates=["TimeGenerated"],
                        infer_datetime_format=True
                      )
hash_iocs = covid_ti['Hash']

if isinstance(sysmon_df, pd.DataFrame) and not sysmon_df.empty:
    md("Sysmon events with hashes that appear in Microsoft Covid-19 TI feed:")
    display(sysmon_df[sysmon_df['Hashes'].isin(hash_iocs)])
else:
    md("No Sysmon data present")
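
The comparison above matches the whole `Hashes` value against the indicator list. Depending on the Sysmon configuration, that field can hold several algorithms in one string, for example `SHA1=...,MD5=...,SHA256=...`, in which case an exact match would miss hits. The sketch below shows one way to split such a value before matching. It assumes the comma-separated `ALGORITHM=digest` layout; the function names and the sample digests are hypothetical.

```python
def parse_sysmon_hashes(hashes: str) -> dict:
    """Split 'SHA1=..,MD5=..' into a {algorithm: digest} dictionary."""
    result = {}
    for part in hashes.split(","):
        algorithm, separator, digest = part.partition("=")
        if separator and digest.strip():
            result[algorithm.strip().upper()] = digest.strip().lower()
        elif part.strip():
            # No 'ALGORITHM=' prefix: treat the value as a bare digest.
            result.setdefault("UNKNOWN", part.strip().lower())
    return result


def matches_ioc(hashes: str, iocs) -> bool:
    """Return True if any digest in the field is in the IOC collection."""
    ioc_set = {str(ioc).lower() for ioc in iocs}
    return any(d in ioc_set for d in parse_sysmon_hashes(hashes).values())


# Made-up digests, for illustration only.
field = "SHA1=AAAA1111,MD5=BBBB2222,SHA256=CCCC3333"
print(parse_sysmon_hashes(field))
print(matches_ioc(field, ["cccc3333"]))
```

With a DataFrame such as `sysmon_df`, a helper like this could be applied per row, for example `sysmon_df['Hashes'].apply(lambda h: matches_ioc(h, hash_iocs))`, to build the filter mask.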
